Friday, March 15, 2013

Datapump and network links

I was asked to develop a solution which copied data from one schema to another. The solution was required to be generic so that it would also work if the schemas resided on different databases.

I thought this was a good situation to use Data Pump rather than a one-off set of scripts.

When I reviewed the Oracle documentation, the following were needed:
- A database link between source and target databases
- Specify NETWORK_LINK in the parameter file

Creation of a database link can be done as follows:
 CREATE DATABASE LINK loopback  
 CONNECT TO scott  
 IDENTIFIED BY "xxx"   
 USING 'orcl11g';  

As can be seen in the code above, the link is called "loopback". It connects as the user SCOTT, using the password xxx, to the orcl11g database.
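
As a quick check that the link was created as intended, the data dictionary can be queried. This is a minimal sketch, assuming the link is a private one owned by the current user (so that it appears in USER_DB_LINKS):

```sql
-- List the private database links owned by the current user
SELECT db_link, username, host
FROM   user_db_links;
```

The DB_LINK value may be shown with the database domain appended, depending on how the database is configured.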

Next step is to confirm the link is working:
 SELECT COUNT(*)  
 FROM  emp@loopback;  
 COUNT(*)  
 ----------  
    14  

Works fine. Now for the Data Pump parameter file.
 CONTENT=ALL  
 FULL=N  
 LOGFILE=import.log  
 METRICS=YES  
 NETWORK_LINK=loopback  
 REMAP_SCHEMA=scott:top  
 TABLES=scott.emp  
 TABLE_EXISTS_ACTION=TRUNCATE  

The important parameters are:
- CONTENT, when set to ALL, both metadata and data are transferred
- NETWORK_LINK, set to use the LOOPBACK database link created above
- REMAP_SCHEMA, specifies the source and target schemas
- TABLES, only copy the EMP table between the schemas. Note that I have prefixed the EMP table with the source schema name - this prevents the following errors:
 ORA-39166: Object TOP.EMP was not found.
 ORA-31655: no data or metadata objects selected for job

Now for the Data Pump import. I used the following command:
 impdp top parfile=scott_imp.par  

Finally, check that the table is populated in the target schema
 SELECT COUNT(*)  
 FROM top.emp;  
 COUNT(*)  
 ----------  
    14  

As an aside, I received the following error when testing:
 ORA-31631: privileges are required  
 ORA-39149: cannot link privileged user to non-privileged user  

To resolve this, I granted the roles IMP_FULL_DATABASE and EXP_FULL_DATABASE to both the source and target users.
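
The grants can be sketched as follows. This assumes they are run from a suitably privileged account (for example a DBA user) and that SCOTT on the source and TOP on the target are the users involved, as in the example above:

```sql
-- Run on the source database: the user the database link connects to
GRANT EXP_FULL_DATABASE TO scott;
GRANT IMP_FULL_DATABASE TO scott;

-- Run on the target database: the user running impdp
GRANT EXP_FULL_DATABASE TO top;
GRANT IMP_FULL_DATABASE TO top;
```

These are powerful roles, so it is worth reviewing whether both are really needed for each user in your environment.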

Monday, February 18, 2013

Timestamp with time zone

One of the databases I support is based in America. The application code makes extensive use of the TIMESTAMP WITH TIME ZONE data type to ensure that the date/time fields are correctly displayed according to the time zone. Let's try querying a TIMESTAMP WITH LOCAL TIME ZONE column.

Here's a simple example. I create a small table with 10,000 rows in Oracle 11.2.0.3.
 CREATE TABLE dates(  
  id NUMBER,  
  lt TIMESTAMP WITH LOCAL TIME ZONE  
 );

 INSERT /*+ APPEND */  
 INTO   dates  
 SELECT level,  
        TRUNC(SYSTIMESTAMP) + level  
 FROM   dual  
 CONNECT BY level <= 10000;  

 ALTER TABLE dates ADD CONSTRAINT pk_dates PRIMARY KEY (id);  
 CREATE INDEX i_dates_03 ON dates(lt);  
 exec dbms_stats.gather_table_stats(user, 'dates');  

The key thing to note is the index on the "lt" column. Let's run a simple query against the table, using the TO_TIMESTAMP function.
 SELECT *  
 FROM   dates  
 WHERE  lt = TO_TIMESTAMP('28/02/2013 00:00:00','dd/mm/yyyy hh24:mi:ss');

 --------------------------------------------------  
 | Id | Operation                  | Name         |  
 --------------------------------------------------  
 |  0 | SELECT STATEMENT           |              |  
 |  1 | TABLE ACCESS BY INDEX ROWID| DATES        |  
 |* 2 |  INDEX RANGE SCAN          | I_DATES_03   |  
 --------------------------------------------------  
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   2 - access("LT"=TIMESTAMP' 2013-02-28 00:00:00.000000000')  
Note that the index is being used to access the table. The access predicate on step 2 confirms this.

What if a TO_TIMESTAMP_TZ function is applied? I thought that this should also use the index since we are applying a timezone/timestamp comparison to a field stored as TIMESTAMP WITH LOCAL TIME ZONE.
 SELECT *
 FROM   dates
 WHERE  lt = TO_TIMESTAMP_TZ('28/02/2013 00:00:00 0:00','dd/mm/yyyy hh24:mi:ss tzh:tzm');

 -----------------------------------  
 | Id | Operation        | Name    |  
 -----------------------------------  
 |  0 | SELECT STATEMENT |         |  
 |* 1 | TABLE ACCESS FULL| DATES   |  
 -----------------------------------  
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   1 - filter(SYS_EXTRACT_UTC(INTERNAL_FUNCTION("LT"))=  
        SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('28/02/2013 00:00:00 0:00','dd/mm/yyyy hh24:mi:ss tzh:tzm')))  
This is what surprised me - the index is not being used. Instead, a full table scan and an INTERNAL_FUNCTION are used (which would have slowed down my update operation had I not noticed it).

What I decided to do was use an ALTER SESSION command so that my session was in the correct time zone:
 ALTER SESSION SET TIME_ZONE='-5:00';  

As well as this, I converted my TO_TIMESTAMP_TZ function calls to TO_DATE.
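
Put together, the workaround looks something like the sketch below. The date literal and the '-5:00' offset are taken from the examples above; the EXPLAIN PLAN step is an assumption about how the result might be confirmed, not output captured from the original test.

```sql
-- Put the session in the time zone the literal values are expressed in
ALTER SESSION SET TIME_ZONE='-5:00';

-- Compare against a value that carries no explicit time zone
EXPLAIN PLAN FOR
SELECT *
FROM   dates
WHERE  lt = TO_DATE('28/02/2013 00:00:00','dd/mm/yyyy hh24:mi:ss');

-- Display the plan and check for an INDEX RANGE SCAN on I_DATES_03
SELECT * FROM TABLE(dbms_xplan.display);
```

It is worth checking the plan on your own version, since the behaviour may differ between releases.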

Tuesday, February 12, 2013

SQL*Loader using a CSV file

This is the first of a couple of articles about SQL*Loader. SQL*Loader allows data to be loaded into an Oracle database from an external file.

There are a couple of things we need to know about before we try an example.

1) Input file
Contains the data to be loaded. It can be in any format supplied by the source system - comma separated, tab separated, positional.

2) Control file
A detailed definition of the data in the file which enables Oracle to turn it into fields and load it. Besides the data layout, the file can also contain metadata about the data, for example date format information and the number of records to skip when loading.

Input File
For this example, I created a file containing comma separated data. Although I am using commas in this example, other characters can be used.

 Graham,Wallace,01/01/1970  
 David,"Gilmour",05/05/1957  
 "Roger",Waters,10/10/1980  
Things to note:
1) the fields are separated ("terminated") by a comma
2) the fields are also sometimes ("optionally") surrounded ("enclosed") by double quotes
3) the date field contains a consistent date format (day/month/year)

Control File
A sample control file can be seen below. Note that options to be applied to the load can be specified on the command line and within the control file.

 LOAD DATA  
 INFILE      'input_file.csv'  
 BADFILE     'first_bad.txt'  
 DISCARDFILE 'first_dsc.txt'  
 APPEND  
 INTO TABLE person  
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'  
 TRAILING NULLCOLS
 (  
   forename  CHAR,  
   surname   CHAR,  
   dob       DATE "dd/mm/yyyy",
   person_id "person_seq.nextval"
 )  
To explain further, we'll look at the file line by line.

 LOAD DATA  
This tells SQL*Loader that the file contains instructions for a load.

 INFILE   'input_file.csv'   
This details where the input file can be found.

 BADFILE   'first_bad.txt'   
When records fail parsing because of a missing field or error, they are written to the BADFILE.

 DISCARDFILE 'first_dsc.txt'  
If there are records which are neither inserted nor rejected (for example, because they do not meet a WHEN condition on the load) then those records are written to the DISCARDFILE.
Note that default values are used for the BADFILE and DISCARDFILE if they are not specified in the control file or on the command line.

 APPEND 
 INTO TABLE person
This details the method of loading data into the target table as well as the name of the target table, in this case "person".
The four possible methods that can be specified are:
- INSERT inserts the new data and expects the target table to be empty, otherwise an error occurs
- APPEND inserts the new data with no impact to the existing data
- REPLACE uses a delete to remove the existing data
- TRUNCATE uses a truncate to remove the existing data

 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'  
This tells SQL*Loader that each field in the input file is terminated by a comma and might have double quotes surrounding it. If you look back at the input data, you will see that some fields are quoted.

 TRAILING NULLCOLS  
This tells SQL*Loader to treat any fields that are missing from the end of a record as null, rather than rejecting the record.

 (  
   forename  CHAR,  
   surname   CHAR,  
   dob       DATE "dd/mm/yyyy",
   person_id "person_seq.nextval"
 )
Finally we get to the point in the file where we define the layout of the data. There are three fields in the data: forename, surname and finally date of birth, which has the mask dd/mm/yyyy applied to it.
The fourth field is a virtual field which does not appear in the file. It tells SQL*Loader to use person_seq.nextval to populate the person_id field.

Oracle Table
The final things we need are 1) to create the person table to hold the data from the input file and 2) create the sequence we refer to above.
 CREATE TABLE person   
 ( PERSON_ID NUMBER,   
  FORENAME VARCHAR2(100),   
  SURNAME  VARCHAR2(100),   
  DOB      DATE
 );  

CREATE SEQUENCE person_seq
MINVALUE 1 
MAXVALUE 9999 
INCREMENT BY 1 
START WITH 1;

Invoking SQL*Loader
SQL*Loader is invoked using the sqlldr program from the command line.

 sqlldr userid=top control=first.ctl log=first_log.txt  
The parameters for the command above are:
- userid. The user being used to run SQL*Loader
- control. The name of the control file to be used for the load
- log. The file to which the load will write output
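
Once the load has finished, a query along these lines can be used to check the result. It is only a sketch, and assumes the load ran against the person table and sequence created above:

```sql
-- Show the loaded rows, formatting the date with the same mask used in the control file
SELECT person_id,
       forename,
       surname,
       TO_CHAR(dob, 'dd/mm/yyyy') AS dob
FROM   person
ORDER  BY person_id;
```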

A zipped version of the files I used can be found here.


Monday, January 28, 2013

SQL*Plus bind and substitution variables

Is it possible to create generic grant scripts for a release across all of the environments which are used on the path to production, e.g. SIT, UAT?

Each of the environments in use has "read only" and "fire fighter" users created to prevent the schema owner password being too widely used. To complicate things, the read only and fire fighter user names do not remain constant. Grant scripts are created on a per environment basis, which means the same script is never run more than once - increasing the risk that errors are made when the scripts are amended.

The scenario is best described in the following script:
CREATE USER uat    IDENTIFIED BY uat;  
CREATE USER ro_uat IDENTIFIED BY ro_uat;
CREATE USER ff_uat IDENTIFIED BY ff_uat;
-- Grant privileges to connect

connect uat/uat
CREATE TABLE x
(x_id NUMBER);

GRANT SELECT ON x TO ro_uat;
GRANT SELECT, INSERT, UPDATE, DELETE ON x to ff_uat;

I have created an application owner user and two users to be used for read only and update. The GRANT statements need to be executed once per environment - and amended on deployment to each new environment.

The solution uses a mixture of substitution and bind variables. A substitution variable is identified in a SQL*Plus script by & or &&; the && form keeps the value defined, which allows repeated use within a script without being prompted again. A bind variable in this context is a variable which is declared in SQL*Plus and can be referenced within SQL or PL/SQL executed as part of the script.

My first attempt to solve the problem resulted in the following script:
VARIABLE ro VARCHAR2(30);  -- Declare bind variables to use in block below
VARIABLE ff VARCHAR2(30);  
-- Create an anonymous block to populate the variables
DECLARE  
  v_user VARCHAR2(30);  
BEGIN  
  -- Get the user
  SELECT user   
  INTO   v_user  
  FROM   dual;  
  -- Set variables depending on the user, notice the use of the ':' to signify a bind variable
  CASE v_user  
    WHEN 'PROD' THEN   
      :ro := 'ROPROD';  
      :ff := 'FFPROD';  
    WHEN 'UAT' THEN  
      :ro := 'ROUAT';  
      :ff := 'FFUAT';  
    WHEN 'DEV' THEN  
      :ro := 'RODEV';  
      :ff := 'FFDEV';  
  END CASE;  
END;  
/  
-- Grant permissions
GRANT SELECT ON x TO :ro;
GRANT SELECT ON x TO :ro
                     *
ERROR at line 1:
ORA-00987: missing or invalid username(s)

We cannot use a bind variable here.

Let's try converting the bind variable to a substitution variable.
-- This code can be run after the code above   
-- Ensure the parameters are reset 
undefine ro_param   
undefine ff_param   
-- create a user variable. new_value stores the result of the query in the variable  
-- note that the column in the query must match the column name in the command  
column rousr format a30 new_value ro_param   
column ffusr format a30 new_value ff_param   
SELECT :ro AS rousr,   
       :ff AS ffusr   
FROM dual; 

GRANT SELECT ON x TO &&ro_param;
old   1: GRANT SELECT ON x TO &&ro_param
new   1: GRANT SELECT ON x TO RODEV

Grant succeeded.  

Please note that "set define on" should be set in the script to ensure that the && characters are identified and used properly. "set verify on" is also useful - it displays the lines marked old and new in the script above.
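
For completeness, the grants from the original scenario could then be written once using both substitution variables. This is a sketch which assumes the COLUMN ... NEW_VALUE commands above have already populated ro_param and ff_param:

```sql
set define on
set verify on

-- &&ro_param and &&ff_param are replaced by SQL*Plus before each statement is sent to the database
GRANT SELECT ON x TO &&ro_param;
GRANT SELECT, INSERT, UPDATE, DELETE ON x TO &&ff_param;
```

Because the substitution happens in SQL*Plus, the database only ever sees the resolved user names, which is why this works where the bind variable did not.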

Tuesday, January 22, 2013

SQL*Plus Scripting


I have been doing a lot of work for implementations with SQL scripts recently.

SQL scripts are extremely useful for automation purposes. I also like to output the contents of my scripts to a file so I have a permanent record.

When I first started looking at the scripts, they typically looked like the following:
 DROP TABLE t;  
 CREATE TABLE t  
 (  
  t_id NUMBER,  
  t_txt VARCHAR2(50)  
 );  
 CREATE UNIQUE INDEX pk_t ON t(t_id);  
 ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (t_id) USING INDEX;  
 exit  

Running this script through SQL*Plus produces the following output.
 top@ORCL11G> @sql_scripts.sql  
 Table dropped.  
 Table created.  
 Index created.  
 Table altered.  

Bare bones, isn't it? The problems at first sight are
- No output to a file, so no permanent record
- No record on the screen of which command is currently executing - imagine a situation with a large script containing long running updates - how can progress be assessed?

Let's amend the script to be the following:
 spool s2_log.txt  
 set echo on  
 DROP TABLE t;  
 CREATE TABLE t  
 (  
  t_id NUMBER,  
  t_txt VARCHAR2(50)  
 );  
 CREATE UNIQUE INDEX pk_t ON t(t_id);  
 ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (t_id) USING INDEX;  
 exit  

The "spool" command tells SQL*Plus to direct output to a file. The "set echo on" command is also SQL*Plus specific and tells it to send each SQL statement it is executing to the screen.

Please see the sample output below:
 top@ORCL11G> DROP TABLE t;  
 Table dropped.  
 top@ORCL11G> CREATE TABLE t  
  2 (  
  3  t_id NUMBER,  
  4  t_txt VARCHAR2(50)  
  5 );  
 Table created.  
 top@ORCL11G>  
 top@ORCL11G> CREATE UNIQUE INDEX pk_t ON t(t_id);  
 Index created.  
 top@ORCL11G>  
 top@ORCL11G> ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (t_id) USING INDEX;  
 Table altered.  
Far more user friendly! As can be seen, SQL*Plus now outputs each command followed by its result.

Some of the other useful commands I use are:
host - also very useful, it executes an operating system command without leaving SQL*Plus. For example, if you forget the name of the script you need to run, you can enter "host ls" at the prompt and it will list the contents of the current directory.

set timing on - prints the execution time per statement to the screen

set serveroutput on - Shows output from PL/SQL blocks when dbms_output is used

connect user/password - Allows the script to log on as a different user to execute commands in the other schema
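
As an illustration, several of these commands could be combined at the top of a script. This is a sketch only; the spool file name s3_log.txt and the message text are made up for the example:

```sql
spool s3_log.txt
set echo on
set timing on
set serveroutput on

-- A PL/SQL block whose dbms_output text is shown because serveroutput is on
BEGIN
  dbms_output.put_line('Starting release script');
END;
/

-- Stop writing to the log file
spool off
```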

There are far more options than I have covered here. The Oracle documentation shows all of the possibilities.

That's it for the moment - I will write another post regarding bind variables in scripts.