External tables

Data can be imported into and exported from Oracle 10g in different ways. E.g. by using the SQL*Loader tool for text files (sqlldr) or the Data Pump (impdp/expdp) tools for the propriety Oracle RAW format. For previous Oracle versions, the tools imp and are also available. Those import/export to external locations on the client machine, not the server machine.

Another way to get data in and out of Oracle is to define an external table. To do this you have to create a table in Oracle in the regular way using a CREATE TABLE statement, but with adding an additional ORGANIZATION EXTERNAL clause.

Within this clause the properties of the external (server side!) location can be specified.

EXAMPLE

CREATE TABLE TEST
(
  "CODE" VARCHAR2(17 BYTE)
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DATA_PUMP_DIR
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET WE8MSWIN1252
    FIELDS TERMINATED BY 0X'09'
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
  )
  LOCATION ( DATA_PUMP_DIR: 'EXTFILE.TXT' )
)

FIELD LIST

The list of fields following the create table keywords cannot be inferred dynamically, but has to be specified explicitly. So it is not possible to "hook up" a text file and retrieving the list of fields automatically.

Trick: import the data first into a table using SQL Developer, TOAD, etc, and create the DDL script from it. Finally, copy and paste the field list from this script.

It is also possible to specify the "external" list of fields inside the organization external clause. This is, for example, similar to how it is done using Oracle Warehouse Builder (OWB).

TYPE

One of these properties is "TYPE", which specifies the access driver that Oracle uses. There are two values for this property:
  • ORACLE_LOADER: only read access, from all kinds of text files (but not, for example, Excel .xls)
  • ORACLE_DATAPUMP: read and write access, a propriety RAW format
The access driver is a wrapper Oracle uses to call the previously mentioned SQL*Loader or Data Pump functionality.

An error occurs when an insert statement is performed on a table is created with access driver ORACLE_LOADER:
SQL Error: ORA-30657: operation not supported on external organized table30657.0000 -  "operation not supported on external organized table"
*Cause:    User attempted on operation on an external table which is
           not supported.
*Action:   Don't do that!
The ORACLE_DATAPUMP type has to be used to export data from Oracle using an external table. Unfortunately, this data cannot be humanly read or imported into any other tool then an Oracle database.

DEFAULT DIRECTORY / LOCATION

A directory object has to be available in Oracle. If it is not already present, it has to be created. For example:

CREATE OR REPLACE DIRECTORY DATA_PUMP_DIR AS 'C:\'

Now the tag DATA_PUMP_DIR can be used in the organization external clause. The DEFAULT DICTIONARY property is mandatory and has to be used when setting the LOCATION.

A path specification is not allowed between the single-quotes.

This is not valid:
  • LOCATION ( 'C:\EXTFILE.TXT' ) 
ACCESS PARAMETERS

The options are rather straightforward, except perhaps "FIELDS TERMINATED BY". To set the field delimiter a character can be specified between single-quotes. But it is also possible to specify special characters like tab or linefeed.

To specify a tab:

FIELDS TERMINATED BY 0X'09'

The 0X instructs Oracle that the value between the single-quotes should be interpreted as an ASCII value. In the example, ASCII character 9 is a tab.

OPTIONALLY ENCLOSED BY '"'

It is also possible to specify that some values have double-quotes around them. This however does not work in the following case:

Column 1 value - tab - tab - Column 3 value

Here a record has a value for columns 1 and 3, but not for column 2. It should be read as NULL, but is actually skipped. This results in the value of column 3 being wrongly inserted in column 2.

CHARACTER SET

Explicitly setting the character set of the external location prevents problems with interpreting the data and can improve performance.

From Odi's blog:

If you do not specify the CHARACTERSET parameter, the database character set is used to interpret the input file. As the input file is in WE8MSWIN1252, the ORACLE_LOADER driver gets confused trying to interpret single-byte WE8MSWIN1252 codes as multibyte AL32UTF8 codes.

The character set of the input file depends on the way it was created. Even on US Windows, you can create text files in different encodings. Notepad allows you to save the file in ANSI code page (=WE8MSWIN1252 on US Windows), Unicode (=AL16UTF16LE), Unicode big endian (=AL16UTF16), and UTF-8 (=AL32UTF8). The Command Prompt edit.exe editor saves the files in the OEM code page (=US8PC437 on US Windows).

-- Sergiusz

LOGGING

All errors are logged to a bad file - containing the violating records - and a log file - containing messages about the error that occured.

To disable logging, (some of) the following keywords can be added to the clause:
  • NODISCARDFILE
  • NOBADFILE
  • NOLOGFILE
Oracle specification of the ORACLE_LOADER access driver:

No comments: