What Is SQL*Loader

Oracle Databases today are ever increasing in complexity and size. Gigabyte-sized databases are common, and data warehouses are often reaching the terabyte-sized range. With the growth of these databases, the need to populate them with external data quickly and efficiently is of paramount importance. To handle this challenge, Oracle provides a tool called SQL*Loader that loads data from external data files into an Oracle database.

SQL*Loader has many functions that include the following capabilities:

  • Data can be loaded from multiple input data files of differing file types.

  • Input records can be of fixed and variable lengths.

  • Multiple tables can be loaded in the same run. SQL*Loader can also logically load selected records into each respective table.

  • SQL functions can be used against input data before being loaded into tables.

  • Multiple physical records can be combined into a single logical record. Likewise, SQL can take a single physical record and load it as multiple logical records.

SQL*Loader is a flexible tool as well. SQL*loader can load data from specifically formatted files (files with data fields in fixed locations), files where the fields have some kind of field separator (such as a '.'), data that is in its own definition file, or it can even load data via the Unix pipe command. SQL*Loader can also take advantage of Oracle Parallel Server, is one of the tools covered by Oracle9i resumable features, and can perform direct-path loads to the database. This method using the direct-path option skips the traditional logging of data to Oracle's recovery mechanisms and simply loads data blocks in the tablespaces, it is much faster than the conventional load previously described here.

SQL*Loader works in conjunction with a control file (see Figure 17.1). This control file contains syntax defining what format and field separation is in the input data file, what table and table columns are to be populated, and can contain many of the command-line options such as the reject file (rejected records), commit point (how many records to insert prior to doing a database commit), degree of parallelism, and so on.

Figure 17.1. SQL*Loader configuration.


SQL*Loader Syntax

SQL*Loader has many options that can be defined both in a control file or on the command line when executing SQL*Loader. Listing 17.1 shows the SQL*Loader help text. As you can see, SQL*Loader has many options. This section covers the basics and provides useful examples that comprise some typical data-loading exercises.

Notice near the end of Listing 17.1, the command line syntax: 'sqlldr scott/tiger control=foo logfile=log'. SQL*Loader needs a valid userid and password and it needs a control file. The control file will be covered shortly. You can use the keyword syntax userid=scott/tiger or the userid and password can be the first syntax after the command sqlldr. In other words, some of these options are positional on the command line, others are not.

TIP

The author recommends using keywords on the command line, which helps for readability when used inside command scripts, and it ensures that the options desired are the options specified.


Listing 17.1. SQL*Loader Help Text
SQL*Loader: Release 9.0.1.0.0 - Production on Sun Aug 12 14:36:04 2001

 Copyright 2001 Oracle Corporation.  All rights reserved.


Usage: SQLLOAD keyword=value [,keyword=value,...]

Valid Keywords:
    userid -- ORACLE username/password
   control -- Control file name
       log -- Log file name
       bad -- Bad file name
      data -- Data file name
   discard -- Discard file name
discardmax -- Number of discards to allow          (Default all)
      skip -- Number of logical records to skip    (Default 0)
      load -- Number of logical records to load    (Default all)
    errors -- Number of errors to allow            (Default 50)
      rows -- Number of rows in conventional path bind array or
    between direct path data saves
               (Default: Conventional path 64, Direct path all)
  bindsize -- Size of conventional path bind array in bytes  (Default 256000)
    silent -- Suppress messages during run (header,feedback,errors,discards,
    partitions)
    direct -- use direct path                      (Default FALSE)
   parfile -- parameter file: name of file that contains parameter specifications
  parallel -- do parallel load                     (Default FALSE)
      file -- File to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions
      (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as
    unusable  (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued
    (Default FALSE)
  readsize -- Size of Read buffer                  (Default 1048576)
external_table -- use external table for load; NOT_USED, GENERATE_ONLY,
    EXECUTE  (Default NOT_USED)
columnarrayrows -- Number of rows for direct path column array  (Default 5000)
streamsize -- Size of direct path stream buffer in bytes  (Default 256000)
multithreading -- use multithreading in direct path
 resumable -- enable or disable resumable for current session  (Default FALSE)
resumable_name -- text string to help identify resumable statement
resumable_timeout -- wait time (in seconds) for RESUMABLE  (Default 7200)
PLEASE NOTE: Command-line parameters may be specified either by
position or by keywords.  An example of the former case is 'sqlldr
scott/tiger foo'; an example of the latter is 'sqlldr control=foo
userid=scott/tiger'.  One may specify parameters by position before
but not after parameters specified by keywords.  For example,
'sqlldr scott/tiger control=foo logfile=log' is allowed, but
'sqlldr scott/tiger control=foo log' is not, even though the
position of the parameter 'log' is correct.

The control file is used to tell SQL*Loader what format the data to be loaded is in and it also tells SQL*Loader where to load the data. Many of the command-line options can also be specified in the control file. The control file typically has a suffix of .ctl.

In Listing 17.1, with CONTROL=FOO, it would be assumed that there is a control file named FOO.CTL in the computer directory where the SQL*Loader was executed from. The logfile can be specified either on the command line or in the control file. This is the journal of how the loader session went, what time it was initiated, and how many records were loaded into what object from what data source. Any errors that might occur will also appear in this log file. If a suffix is not given to the log file name, a .log suffix will be added to it. Another file, refer to Figure 17.1, is the .bad file. This file will be created even if it is not defined. If not defined, it will have the same filename as the control file but with a .bad suffix. This file will contain any records not loaded by SQL*Loader. This gives the user the ability to load most of the records that are not problematic and deal specifically with those records that are.

Another key syntax in Listing 17.1 is SKIP, the number of records to skip at the beginning of the load; LOAD, the number of records to load then stop the load process; and ERRORS, the number of bad records to encounter before stopping the load process. The remainder of the syntax listed in Listing 17.1 is useful but beyond the scope of this chapter.

The control file syntax is as follows. It must start with LOAD DATA or with the recoverable or unrecoverable option set as in UNRECOVERABLE LOAD DATA. The default is recoverable. Unrecoverable is like a direct-path load, it bypasses the Oracle recovery and read-consistency mechanisms giving a faster load but with data loss if something happened during the load. Listing 17.2 shows the control file syntax (defaults are underlined, syntax in [ ] are optional) and Listing 17.3 shows a simple example with its data file.

The INSERT mode requires that the table being loaded is empty. An error will result if the table has rows in it and SQL*Loader will stop. APPEND simply does inserts to the table whether there are any rows or not. REPLACE will do a DELETE of any rows in the table and then insert the data, and TRUNCATE will truncate the table the insert the data. The BEGINDATA section is needed if the data being loaded is actually in the control file. Notice that you can specify more than one INFILE.

Listing 17.2. Control File Syntax
 [UNRECOVERABLE | RECOVERABLE] LOAD DATA
[INFILE <computer file name> [INFILE <computer name>]...]
[INSERT | APPEND | REPLACE | TRUNCATE] [Options] INTO TABLE <table name> ...)]
     [FIELDS TERMINATED BY] [( <column name column data type and field
    location or delimiter, [ ...]
[BEGINDATA and data in control file]

The columns that the data is associated with are named. These columns can have a data type, they should have either the infile location by location start and location end position or have a field delimiter specified. Files being loaded where the fields are in fixed locations use the location start and end position, as shown in Listing 17.3. The position is the number of positions from the beginning of the record.

Listing 17.3. Control File Column Definition Syntax
( EMPNO    POSITION(01:04) INTEGER EXTERNAL,
  ENAME    POSITION(06:15) CHAR,
  JOB      POSITION(17:25) CHAR,
  MGR      POSITION(27:30) INTEGER EXTERNAL,
  SAL      POSITION(32:39) DECIMAL EXTERNAL,
  COMM     POSITION(41:48) DECIMAL EXTERNAL,
  DEPTNO   POSITION(50:51) INTEGER EXTERNAL)

Notice the CHAR, INTEGER EXTERNAL, and other commands in Listing 17.3. This is defining the type of field in the data file. INTEGER EXTERNAL is commonly used for numbers with no decimal positions. DECIMAL EXTERNAL is used for numbers with decimal locations, CHAR is used for character fields, and so on. There are many kinds of field types but these are the common ones. An Oracle date mask (HIRE_DATE DATE 'mm/dd/yy') follows the DATE column type.

SQL*Loader Examples

Notice how the control file in Listing 17.4 follows the column format rules of Listing 17.3. This control file uses a delimited file to separate the fields. Notice the INFILE syntax, the REPLACE syntax is used causing any rows that were in the table TEST_PWD to be deleted. When loading a delimited file, the first field in the data file will be loaded into the column login_id, the second field (determined by the delimiter ':' in this example) will be loaded into the column xfield, and so on.

Listing 17.4. Simple Control File Example
load data
infile 'passwd.dat'
replace into table test_pwd
fields terminated by ':'
(
   login_id     char,
   xfield       char,
   user_id      integer external,
   group_id     integer external,
   full_name    char,
   full_path    char,
   start_shell  char)

passwd.dat data file

skanathu:x:1030:10:Swamy Kanathur:/home/skanathu:/bin/ksh
pmcgrath:x:1100:10:Patrick McGrath:/export/home/pmcgrath:/bin/ksh
rvanderz:x:1101:10:Ryan Vanderzanden:/export/home/rvanderz:/bin/ksh
jgreenle:x:1102:10:Jerry Greenlee:/export/home/jgreenle:/bin/ksh
amohamed:x:1103:10:Azeem Mohamed:/export/home/amohamed:/bin/ksh
cspacko:x:1107:10:Chris Spacko:/export/home/cspacko:/bin/ksh
mkurtz:x:1108:10:Mark Kurtz:/export/home/mkurtz:/bin/ksh
solson:x:1109:10:Scott Olson:/export/home/solson:/bin/ksh
shealy:x:1110:10:Steve Healy:/export/home/shealy:/bin/ksh
imnadm:x:1111:2504:NetQuestion:/opt/IMNSearch:/bin/sh
jsteffan:x:1034:102:Joshua Steffan:/export/home/jsteffan:/bin/ksh
rbuglio:x:1001:102:Rick Buglio:/export/home/rbuglio:/bin/ksh
djenson:x:1007:102:Deb Jenson:/export/home/djenson:/bin/ksh
mfriel:x:1009:102:Mike Friel:/export/home/mfriel:/bin/ksh

After the load is complete, Listing 17.5 illustrates the data in the table.

Listing 17.5. SQL Query Showing Data Loaded by Listing 17.4
SQL> select login_id, user_id
  2  from test_pwd;

LOGIN_ID                USER_ID
-------------------- ----------
skanathu                   1030
pmcgrath                   1100
rvanderz                   1101
jgreenle                   1102
amohamed                   1103
cspacko                    1107
mkurtz                     1108
solson                     1109
shealy                     1110
imnadm                     1111
jsteffan                   1034
rbuglio                    1001
djenson                    1007
mfriel                     1009

Listing 17.6 shows the SQL*Plus help system control file. Notice the INFILE is a *, which means that the data will be supplied (see the BEGINDATA syntax). This control file is using the APPEND feature and loads a table called HELP.

Listing 17.6. SQL*Plus HELP Table Control File
--
-- Copyright  1997 by Oracle Corporation. All Rights Reserved.
--
load data
infile *
preserve blanks
into table help append
fields terminated by '^'
TRAILING NULLCOLS
(
    topic, seq, info
)
BEGINDATA
Assignment Statement^1^
Assignment Statement^2^ Assignment Statement
.
.
.

Listing 17.7 shows the familiar EMP table being loaded with SQL*Loader. Notice the INFILE has fixed position fields. Because there is no MODE specified, INSERT is assumed and the EMP table being loaded must not have any rows in it.

Listing 17.7. EMP Table Control File
LOAD DATA
INFILE 'employees.dat'
INTO TABLE EMP

( EMPNO    POSITION(01:04) INTEGER EXTERNAL,
  ENAME    POSITION(06:15) CHAR,
  JOB      POSITION(17:25) CHAR,
  MGR      POSITION(27:30) INTEGER EXTERNAL,
  SAL      POSITION(32:39) DECIMAL EXTERNAL,
  COMM     POSITION(41:48) DECIMAL EXTERNAL,
  DEPTNO   POSITION(50:51) INTEGER EXTERNAL)

Listing 17.8 illustrates the log file generated from the load of the table in Listing 17.4. If there were any errors, they too would have appeared in this file. Notice all the options and any default options taken are specified. This was a successful load because all 14 records loaded with no errors or rejects. Be sure to look at the time stamp at the end of the listing.

Listing 17.8. TEST_PWD Table Load Log File
SQL*Loader: Release 9.0.1.0.0 - Production on Sat Aug 11 17:10:13 2001

 Copyright 2001 Oracle Corporation.  All rights reserved.

Control File:   passwd.ctl
Data File:      passwd.dat
  Bad File:     passwd.bad
  Discard File:  none specified

 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional

Table TEST_PWD, loaded from every logical record.
Insert option in effect for this table: REPLACE

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
LOGIN_ID                            FIRST     *   :       CHARACTER
XFIELD                               NEXT     *   :       CHARACTER
USER_ID                              NEXT     *   :       CHARACTER
GROUP_ID                             NEXT     *   :       CHARACTER
FULL_NAME                            NEXT     *   :       CHARACTER
FULL_PATH                            NEXT     *   :       CHARACTER
START_SHELL                          NEXT     *   :       CHARACTER


Table TEST_PWD_EXT:
  14 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.
Space allocated for bind array:                 115584 bytes(64 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:            14
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Sat Aug 11 17:10:13 2001
Run ended on Sat Aug 11 17:10:20 2001

Elapsed time was:     00:00:07.50
CPU time was:         00:00:00.07

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.17.205.232