Monday, February 18, 2013

Timestamp with time zone

One of the databases I support is based in America. The application code makes extensive use of the TIMESTAMP WITH TIME ZONE data type to ensure that the date/time fields are correctly displayed according to the time zone. Lets try querying on using the TIMESTAMP WITH LOCAL TIME ZONE column.

Here's a simple example. I create a small table with 10,000 rows in Oracle 11.2.0.3.
 CREATE TABLE dates(  
  id NUMBER,  
  lt TIMESTAMP WITH LOCAL TIME ZONE  
 );

 INSERT /*+ APPEND */  
 INTO   dates  
 SELECT level,  
        TRUNC(SYSTIMESTAMP) + level  
 FROM   dual  
 CONNECT BY level <= 10000;  

 ALTER TABLE dates ADD CONSTRAINT pk_dates PRIMARY KEY (id);  
 CREATE INDEX i_dates_03 ON dates(lt);  
 exec dbms_stats.gather_table_stats(user, 'dates');  

The key thing to note is the index on the "lt" column. Lets run a simple query against the table, using the TO_TIMESTAMP function.
 SELECT *  
 FROM   dates  
 WHERE  lt = TO_TIMESTAMP('28/02/2013 00:00:00','dd/mm/yyyy hh24:mi:ss');

 --------------------------------------------------  
 | Id | Operation                  | Name         |  
 --------------------------------------------------  
 |  0 | SELECT STATEMENT           |              |  
 |  1 | TABLE ACCESS BY INDEX ROWID| DATES        |  
 |* 2 |  INDEX RANGE SCAN          | I_DATES_03   |  
 --------------------------------------------------  
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   2 - access("LT"=TIMESTAMP' 2013-02-28 00:00:00.000000000')  
Note that an the index is being used to access the table. The access predicate on step 2 confirms this.

What if a TO_TIMESTAMP_TZ function is applied? I thought that this should also use the index since we are applying a timezone/timestamp comparison to a field stored as TIMESTAMP WITH LOCAL TIME ZONE.
 SELECT *
 FROM   dates
 WHERE  lt = TO_TIMESTAMP_TZ('28/02/2013 00:00:00 0:00','dd/mm/yyyy hh24:mi:ss tzh:tzm');

 -----------------------------------  
 | Id | Operation        | Name    |  
 -----------------------------------  
 |  0 | SELECT STATEMENT |         |  
 |* 1 | TABLE ACCESS FULL| DATES   |  
 -----------------------------------  
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   1 - filter(SYS_EXTRACT_UTC(INTERNAL_FUNCTION("LT"))=  
        SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('28/02/2013 00:00:00 0:00','dd/mm/yyyy hh24:mi:ss tzh:tzm')))  
This is what surprised me - the index is not being used. Instead, a full table scan and an INTERNAL_FUNCTION is used (which would have decreased the speed of my update operation had I not noticed it).

What I decided to do was use an alter session command so my session was in the correct time zone:
 ALTER SESSION SET TIME_ZONE='-5:00';  

As well as this I converted my TO_TIMESTAMP_TZ function calls to be TO_DATE.

Tuesday, February 12, 2013

SQL*Loader using a CSV file

This is the first of a couple of articles which are about SQL*Loader. SQL*Loader allows data to be loaded into an Oracle database from a file on the server.

There are a couple of things we need to know about before we try an example.

1) Input file
Contains the data to be loaded. It can be in any format supplied by the source system - comma separated, tab separated, positional.

2) Control file
A detailed definition of the data in the file which enables Oracle to turn it into fields and load it. Besides the data layout, the file can also contain metadata about the data, for example date format information and the number of records to skip when loading.

Input File
For this example, I created a file containing comma separated data. Although I am using commas in this example, other characters can be used.

 Graham,Wallace,01/01/1970  
 David,"Gilmour",05/05/1957  
 "Roger",Waters,10/10/1980  
Things to note:
1) the fields are separated ("terminated") by a comma
2) the fields are also sometimes ("optionally") surrounded ("enclosed") by double quotes
3) the date field contains a consistent date format (day/month/year)

Control File
A sample control file can be seen below. Note that options to be applied to the load can be specified on the command line and within the control file.

 LOAD DATA  
 INFILE      'input_file.csv'  
 BADFILE     'first_bad.txt'  
 DISCARDFILE 'first_dsc.txt'  
 APPEND  
 INTO TABLE person  
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'  
 TRAILING NULLCOLS
 (  
   forename  CHAR,  
   surname   CHAR,  
   dob       DATE "dd/mm/yyyy",
   person_id "person_seq.nextval"
 )  
To explain further, we'll look at the file line by line.

 LOAD DATA  
This tells SQL*Loader that the file contains instructions for load.

 INFILE   'input_file.csv'   
This details where the input file can be found.

 BADFILE   'first_bad.txt'   
When records fail parsing because of a missing field or error, they are written to the BADFILE.

 DISCARDFILE 'first_dsc.txt'  
If there are records which are not inserted or rejected (for example if there is a condition on the load) then records are written to the DISCARDFILE.
Note that default values are used for the BADFILE and DISCARDFILE if they are not specified in the control file or on the command line.

 APPEND 
 INTO TABLE person
This details the method of loading data into the target table as well as the name of the target table, in this case "person".
The four possible methods that can be specified are:
- INSERT inserts the new data and expects the target table to be empty otherwise an error occurs
- APPEND inserts the new data with no impact to the existing data
- REPLACE uses a delete to remove the existing data
- TRUNCATE uses a truncate to remove the existing data

 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'  
This tells SQL*Loader that the each field in the input file is terminated by a comma and might have a double quote surrounding it. If you look back at the input data, you will see that some fields are quoted.

 TRAILING NULLCOLS  
This this tells SQL*Loader that there may be extra fields at the end of the file which are null.

 (  
   forename  CHAR,  
   surname   CHAR,  
   dob       DATE "dd/mm/yyyy",
   person_id "person_seq.nextval"
 )
Finally we get to the point in the file where we define the layout of the data. There are three fields in the data: forename, surname and finally date of birth, which has the mask dd/mm/yyyy applied to it.
The fourth field is a virtual field which does not appear in the file. It tells SQL*Loader to use person_seq.nextval to populate the person_id field.

Oracle Table
The final things we need are 1) to create the person table to hold the data from the input file and 2) create the sequence we refer to above.
 CREATE TABLE person   
 ( PERSON_ID NUMBER,   
  FORENAME VARCHAR2(100),   
  SURNAME  VARCHAR2(100),   
  DOB      DATE
 );  

CREATE SEQUENCE person_seq
MINVALUE 1 
MAXVALUE 9999 
INCREMENT BY 1 
START WITH 1;

Invoking SQL*Loader
Loader is invoked using the sqlldr program from the command line.

 sqlldr userid=top control=first.ctl log=first_log.txt  
The parameters for the command above are:
- userid. The user being used to run SQL*Loader
- control. The name of the control file to be used for the load
- log. The file to which the load will write output

A zipped version of the files I used can be found here.