This is the first of a couple of articles which are about SQL*Loader. SQL*Loader allows data to be loaded into an Oracle database from a file on the server.
There are a couple of things we need to know about before we try an example.
1) Input file
Contains the data to be loaded. It can be in any format supplied by the source system - comma separated, tab separated, positional.
2) Control file
A detailed definition of the data in the file which enables Oracle to turn it into fields and load it. Besides the data layout, the file can also contain metadata about the data, for example date format information and the number of records to skip when loading.
Input File
For this example, I created a file containing comma separated data. Although I am using commas in this example, other characters can be used.
Graham,Wallace,01/01/1970
David,"Gilmour",05/05/1957
"Roger",Waters,10/10/1980
Things to note:
1) the fields are separated ("terminated") by a comma
2) the fields are also sometimes ("optionally") surrounded ("enclosed") by double quotes
3) the date field contains a consistent date format (day/month/year)
Control File
A sample control file can be seen below. Note that options to be applied to the load can be specified on the command line and within the control file.
LOAD DATA
INFILE 'input_file.csv'
BADFILE 'first_bad.txt'
DISCARDFILE 'first_dsc.txt'
APPEND
INTO TABLE person
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
forename CHAR,
surname CHAR,
dob DATE "dd/mm/yyyy",
person_id "person_seq.nextval"
)
To explain further, we'll look at the file line by line.
LOAD DATA
This tells SQL*Loader that the file contains instructions for load.
INFILE 'input_file.csv'
This details where the input file can be found.
BADFILE 'first_bad.txt'
When records fail parsing because of a missing field or error, they are written to the BADFILE.
DISCARDFILE 'first_dsc.txt'
If there are records which are not inserted or rejected (for example if there is a condition on the load) then records are written to the DISCARDFILE.
Note that default values are used for the BADFILE and DISCARDFILE if they are not specified in the control file or on the command line.
APPEND
INTO TABLE person
This details the method of loading data into the target table as well as the name of the target table, in this case "person".
The four possible methods that can be specified are:
- INSERT inserts the new data and expects the target table to be empty otherwise an error occurs
- APPEND inserts the new data with no impact to the existing data
- REPLACE uses a delete to remove the existing data
- TRUNCATE uses a truncate to remove the existing data
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
This tells SQL*Loader that the each field in the input file is terminated by a comma and might have a double quote surrounding it. If you look back at the input data, you will see that some fields are quoted.
TRAILING NULLCOLS
This this tells SQL*Loader that there may be extra fields at the end of the file which are null.
(
forename CHAR,
surname CHAR,
dob DATE "dd/mm/yyyy",
person_id "person_seq.nextval"
)
Finally we get to the point in the file where we define the layout of the data. There are three fields in the data: forename, surname and finally date of birth, which has the mask dd/mm/yyyy applied to it.
The fourth field is a virtual field which does not appear in the file. It tells SQL*Loader to use person_seq.nextval to populate the person_id field.
Oracle Table
The final things we need are 1) to create the person table to hold the data from the input file and 2) create the sequence we refer to above.
CREATE TABLE person
( PERSON_ID NUMBER,
FORENAME VARCHAR2(100),
SURNAME VARCHAR2(100),
DOB DATE
);
CREATE SEQUENCE person_seq
MINVALUE 1
MAXVALUE 9999
INCREMENT BY 1
START WITH 1;
Invoking SQL*Loader
Loader is invoked using the sqlldr program from the command line.
sqlldr userid=top control=first.ctl log=first_log.txt
The parameters for the command above are:
- userid. The user being used to run SQL*Loader
- control. The name of the control file to be used for the load
- log. The file to which the load will write output
A zipped version of the files I used can be found
here.