Question and Answer

Tom Kyte

Thanks for the question, Elie.

Asked: December 08, 2002 - 12:11 am UTC

Last updated: March 12, 2013 - 9:35 am UTC

Version: 9.2.0


You Asked

Hi, Tom.

I am trying to use Oracle 9i's ability to treat ASCII files as Oracle tables and have run into a few problems.

First off, here is the script I've written that creates a directory called "external_tables_dir" and then reads an ASCII file called "clntques.txt" whose fields are pipe delimited:

declare
ddl varchar2(2000);
cnt integer;
begin
ddl := 'create or replace directory external_tables_dir as ''/home/athena_d/ora_ext_tbl''';
execute immediate ddl;

select count(*)
into cnt
from user_tables
where table_name = 'EXTERNAL_CLNTQUES';

if ( cnt = 1 ) then
ddl := 'drop table athena.external_clntques';
execute immediate ddl;
end if;

ddl := '
create table athena.external_clntques
(CLIENT_ID number(10),
NAME varchar2(60),
SHORT_NAME varchar2(15),
QTEXT varchar2(4000),
QUESTION_CATEGORY varchar2(60),
CLI_QUESTION_CODE varchar2(60),
IS_ACTIVE_CHK varchar2(1),
IS_DELETED_CHK varchar2(1),
CREATED_ON date,
UPDATED_ON date,
CREATED_BY_ID number(10),
UPDATED_BY_ID number(10),
ALT_KEY varchar2(60)
)
organization external
(type oracle_loader
default directory athena_external_tables_dir
access parameters
(fields terminated by ''|''
missing field values are null
(client_id,
name,
short_name,
qtext,
question_category,
cli_question_code,
is_active_chk,
is_deleted_chk,
created_on,
updated_on char date_format date mask "dd-mon-yyyy hh24:mi:ss",
created_by_id,
updated_by_id,
alt_key)
)
location (''clntques.txt'')
)
parallel 2
reject limit unlimited';
execute immediate ddl;
end;
/

And here are the first few rows from the ASCII file:

CLIENT_ID|NAME|SHORT_NAME|QTEXT|QUESTION_CATEGORY|CLI_QUESTION_CODE|IS_ACTIVE_CHK|IS_DELETED_CHK|CREATED_ON|UPDATED_ON|CREATED_BY_ID|UPDATED_BY_ID|ALT_KEY
6|||Would you be interested in speaking with a Ryder Vehicle Sales manager to discuss your current needs?|||Y|N|||||RYDER*00527
6|||Name of person completing the survey if different than addressed.|||Y|N|||||RYDER*00049
6|||How would you classify your company?|||Y|N|||||RYDER*00050
6|||Supply Chain Services:|||Y|N|||||RYDER*00372
6|||Were you contacted by Ryder Vehicle Sales?|||Y|N|||||RYDER*00523
6|||Are <COMPANY> customers requesting 'Just in Time Delivery'?|||Y|N|||||RYDER*00007

=============================================================

Question 1:
Running the above script produces no errors. However,
if I then run SELECT * FROM EXTERNAL_CLNTQUES, Oracle creates
a file EXTERNAL_CLNTQUES_5466.bad containing 6 records that
were rejected from my ASCII file. Oracle also creates the
file EXTERNAL_CLNTQUES_5466.log that contains these error
messages, one for each of the 6 rejected records:

KUP-04021: field formatting error for field QTEXT
KUP-04026: field too long for datatype
KUP-04101: record 54 rejected in file /home/athena_d/ora_ext_tbl/clntques.txt

This log file also defines the QTEXT field as:

QTEXT CHAR (255)
Terminated by "|"
Trim whitespace same as SQL Loader

And so, Oracle thinks that the QTEXT field in the ASCII file
is a CHAR(255) column. However, as specified in my sqlplus
script, the column name QTEXT is defined as VARCHAR2(4000).
How can I get my sqlplus script to recognize QTEXT as
VARCHAR2(4000)?


Question 2:
The first row in ASCII file clntques.txt is a header record, simply giving the names of the fields making up the file. After running my script, somehow Oracle "knows" that this very first row is really not part of the file's data. I know this is true because doing a SELECT * FROM EXTERNAL_CLNTQUES starts showing data from the second row in the file. How does Oracle know that this file's first row is, in fact, just a header record? There is no "SKIP=1" clause that I can put into my sqlplus script, is there?


Question 3:
Do you know how I can specify "log" and "bad" directories in
my sqlplus script? I have tried to use the clauses:

badfile external_tables_dir: ''bad_clntques''
logfile external_tables_dir: ''log_clntques''

in the script but Oracle then complains with "Invalid statement".


Question 4:
Whenever I run any type of SELECT query against virtual table EXTERNAL_CLNTQUES, Oracle creates at least 2 log files in the defined directory. I've noticed that Oracle sometimes even creates 3 files: 2 log files and a "bad" file if errors occurred. Why is Oracle doing this? In a real life application, how can I tell if a query running against such a virtual table caused errors? In order to know that my query ran OK, it seems that I have to constantly look for any new "bad" files generated in the defined directory as opposed to simply getting an Oracle error either displayed on a terminal (as in a SQLPLUS session) or an error that I can "catch" in an exception handler (as in a plsql package).

I would be interested to know your answers/comments to these questions.

Thanks very much.

Elie


and Tom said...

parallel is meaningful ONLY with fixed length (positional) files. We need to know that each record is, say, "100 bytes" in order to parallelize on it. That's just an FYI...
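
For illustration, a minimal sketch of a fixed-record definition that parallel could make use of (the 20-byte record size and the field layout are invented, just to show the shape):

access parameters
( records fixed 20       -- every record is exactly 20 bytes, trailing newline included
  fields
  ( empno  position(1:4),
    ename  position(5:14),
    deptno position(15:18)
  )
)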

Also, don't you find it tedious to run ddl in plsql? Just use sqlplus (that's what I'm going to do).

I see:

create or replace directory external_tables_dir as

but also:

(type oracle_loader
default directory athena_external_tables_dir

so, perhaps (99.999% likely) you are looking at one file - but loading a totally different one? I say that because I ran this pretty much unchanged (except no plsql -- just a script)

ops$tkyte@ORA920.US.ORACLE.COM> @test
ops$tkyte@ORA920.US.ORACLE.COM> drop table external_clntques;

Table dropped.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> create or replace directory external_tables_dir as
2 '/tmp/'
3 /

Directory created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> create table external_clntques
2 (CLIENT_ID number(10),
3 NAME varchar2(60),
4 SHORT_NAME varchar2(15),
5 QTEXT varchar2(4000),
6 QUESTION_CATEGORY varchar2(60),
7 CLI_QUESTION_CODE varchar2(60),
8 IS_ACTIVE_CHK varchar2(1),
9 IS_DELETED_CHK varchar2(1),
10 CREATED_ON date,
11 UPDATED_ON date,
12 CREATED_BY_ID number(10),
13 UPDATED_BY_ID number(10),
14 ALT_KEY varchar2(60)
15 )
16 organization external
17 (type oracle_loader
18 default directory external_tables_dir
19 access parameters
20 (fields terminated by '|'
21 missing field values are null
22 (client_id,
23 name,
24 short_name,
25 qtext,
26 question_category,
27 cli_question_code,
28 is_active_chk,
29 is_deleted_chk,
30 created_on,
31 updated_on char date_format date mask "dd-mon-yyyy hh24:mi:ss",
32 created_by_id,
33 updated_by_id,
34 alt_key)
35 )
36 location ('clntques.txt')
37 )
38 parallel 2
39 reject limit unlimited;

Table created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> exec print_table( 'select * from external_clntques' )
CLIENT_ID : 6
NAME :
SHORT_NAME :
QTEXT : Would you be interested in speaking with a Ryder Vehicle Sales manager to discuss your current
needs?
QUESTION_CATEGORY :
CLI_QUESTION_CODE :
IS_ACTIVE_CHK : Y
IS_DELETED_CHK : N
CREATED_ON :
UPDATED_ON :
CREATED_BY_ID :
UPDATED_BY_ID :
ALT_KEY : RYDER*00527
-----------------
CLIENT_ID : 6
NAME :
SHORT_NAME :
QTEXT : Name of person completing the survey if different than addressed.
QUESTION_CATEGORY :
CLI_QUESTION_CODE :
IS_ACTIVE_CHK : Y
IS_DELETED_CHK : N
CREATED_ON :
UPDATED_ON :
CREATED_BY_ID :
UPDATED_BY_ID :
ALT_KEY : RYDER*00049
-----------------
CLIENT_ID : 6
NAME :
SHORT_NAME :
QTEXT : How would you classify your company?
QUESTION_CATEGORY :
CLI_QUESTION_CODE :
IS_ACTIVE_CHK : Y
IS_DELETED_CHK : N
CREATED_ON :
UPDATED_ON :
CREATED_BY_ID :
UPDATED_BY_ID :
ALT_KEY : RYDER*00050
-----------------
CLIENT_ID : 6
NAME :
SHORT_NAME :
QTEXT : Supply Chain Services:
QUESTION_CATEGORY :
CLI_QUESTION_CODE :
IS_ACTIVE_CHK : Y
IS_DELETED_CHK : N
CREATED_ON :
UPDATED_ON :
CREATED_BY_ID :
UPDATED_BY_ID :
ALT_KEY : RYDER*00372
-----------------
CLIENT_ID : 6
NAME :
SHORT_NAME :
QTEXT : Were you contacted by Ryder Vehicle Sales?
QUESTION_CATEGORY :
CLI_QUESTION_CODE :
IS_ACTIVE_CHK : Y
IS_DELETED_CHK : N
CREATED_ON :
UPDATED_ON :
CREATED_BY_ID :
UPDATED_BY_ID :
ALT_KEY : RYDER*00523
-----------------
CLIENT_ID : 6
NAME :
SHORT_NAME :
QTEXT : Are <COMPANY> customers requesting 'Just in Time Delivery'?
QUESTION_CATEGORY :
CLI_QUESTION_CODE :
IS_ACTIVE_CHK : Y
IS_DELETED_CHK : N
CREATED_ON :
UPDATED_ON :
CREATED_BY_ID :
UPDATED_BY_ID :
ALT_KEY : RYDER*00007
-----------------

PL/SQL procedure successfully completed.


and it worked immediately? As for the QTEXT question -- sure, your database column definition -- what the input data will be mapped to -- is a varchar2(4000), but the input type is defaulted -- you didn't specify an input type for QTEXT. Hence, it defaults to char(255). Do this:

...
short_name,
qtext CHAR(4000),
question_category,
cli_question_code,
.....

and your log will have:

SHORT_NAME CHAR (255)
Terminated by "|"
Trim whitespace same as SQL Loader
QTEXT CHAR (4000)
Terminated by "|"
Trim whitespace same as SQL Loader
QUESTION_CATEGORY CHAR (255)
Terminated by "|"

instead....



2) I'm thoroughly confused now. Q1 was all about "help, I get nothing from my external table" but this one says "ok, when I select from it -- I only get the valid data -- it seems to skip the header"?????

So - can you or can you not select from this external table?

Anyway - assuming you can, you'll find your header in the BAD file. Why? Well:

create table athena.external_clntques
(CLIENT_ID number(10),
....

the string client_id is not convertible to a number(10) so that record is quite simply rejected. You can skip this record via:

default directory external_tables_dir
access parameters
(
records delimited by newline skip 1

fields terminated by '|'
missing field values are null

(documented in
http://docs.oracle.com/cd/B10501_01/server.920/a96652/ch12.htm#1009462
)....




q3)

access parameters
(
records delimited by newline
badfile external_tables_dir:foobar

skip 1

will create /tmp/foobar.bad for bad records (given my example)...

Since I don't know where you tried to stick this before, I cannot comment on why it did not work.


q4) Using your original create that worked (no skip, no badfile) I do not get the same results you do:

ops$ora920@ORA920.US.ORACLE.COM> !rm -f /tmp/*.bad /tmp/*.log

ops$ora920@ORA920.US.ORACLE.COM> !ls -l /tmp/*.bad /tmp/*.log
ls: /tmp/*.bad: No such file or directory
ls: /tmp/*.log: No such file or directory

ops$ora920@ORA920.US.ORACLE.COM> select count(*) from external_clntques;

COUNT(*)
----------
6

ops$ora920@ORA920.US.ORACLE.COM> !ls -l /tmp/*.bad /tmp/*.log
-rw-r--r-- 1 ora920 ora920 155 Dec 8 10:50 /tmp/EXTERNAL_CLNTQUES_23676.bad
-rw-r--r-- 1 ora920 ora920 1859 Dec 8 10:50 /tmp/EXTERNAL_CLNTQUES_23676.log

ops$ora920@ORA920.US.ORACLE.COM>


So, I cannot comment on your "two" log files since I don't see two and you don't show us what they were....

The very existence of a BAD file would indicate errors and you could do the following. Here I did use a BADFILE filename (foobar.bad) and created a "bad table"

ops$ora920@ORA920.US.ORACLE.COM> create table external_clntques_bad
2 ( text1 varchar2(4000) ,
3 text2 varchar2(4000) ,
4 text3 varchar2(4000)
5 )
6 organization external
7 (type oracle_loader
8 default directory external_tables_dir
9 access parameters
10 (
11 records delimited by newline
12 fields
13 missing field values are null
14 ( text1 position(1:4000),
15 text2 position(4001:8000),
16 text3 position(8001:12000)
17 )
18 )
19 location ('foobar.bad')
20 )
21 /

Table created.

ops$ora920@ORA920.US.ORACLE.COM>
ops$ora920@ORA920.US.ORACLE.COM>
ops$ora920@ORA920.US.ORACLE.COM> select count(*) from external_clntques_bad;

COUNT(*)
----------
1

It lets you see how many bad records there are (and even view them if you want)

ops$ora920@ORA920.US.ORACLE.COM> select * from external_clntques_bad;

TEXT1
-----------------------------------------------------------------------------------------------------------------------------------
TEXT2
-----------------------------------------------------------------------------------------------------------------------------------
TEXT3
-----------------------------------------------------------------------------------------------------------------------------------
CLIENT_ID|NAME|SHORT_NAME|QTEXT|QUESTION_CATEGORY|CLI_QUESTION_CODE|IS_ACTIVE_CHK|IS_DELETED_CHK|CREATED_ON|UPDATED_ON|CREATED_BY_I
D|UPDATED_BY_ID|ALT_KEY



The entire concept of the "bad" file is to IGNORE records that would cause an error -- yet you seem to imply you want an error, given the comment "it seems that I have to constantly look for any new "bad" files generated in the defined directory as opposed to simply getting an Oracle error either displayed on a terminal"...

You suppressed the errors with reject limit unlimited. So yes, you would have to "constantly" look for bad files in your case. You can have an exception raised -- that is easy, just set your reject limit down.
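
For example, a sketch of forcing the error instead (ORA-30653 is the error I would expect when the limit is exceeded, but verify in your environment):

alter table external_clntques reject limit 0;

select count(*) from external_clntques;
-- with a bad record in the file, the select itself should now fail
-- with something like: ORA-30653: reject limit reached
-- which you can catch in a plsql exception handler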




Comments

Questions About External Files

Elie Grunhaus, December 08, 2002 - 4:17 pm UTC

Thanks very much for your quick reply.

Your comments and examples are very instructive, especially the link you included to the Oracle documentation on external file access. I see there several clauses that will be very useful for me.


A reader, December 11, 2002 - 4:10 pm UTC

Hi Tom,

I have created an external table and I am getting error while accessing it.

 SQL> CREATE directory MY_DIR AS 'E:\ADMIN\';

Directory created.
SQL> grant all  on directory my_dir to public;

Grant succeeded.

  1   create table external_table
  2       (EMPNO NUMBER(4) ,
  3        ENAME VARCHAR2(10),
  4        JOB VARCHAR2(9),
  5        MGR NUMBER(4),
  6        HIREDATE DATE,
  7        SAL NUMBER(7, 2),
  8        COMM NUMBER(7, 2),
  9        DEPTNO NUMBER(2)
 10      )
 11      ORGANIZATION EXTERNAL
 12      ( type oracle_loader
 13        default directory MY_DIR
 14        access parameters
 15        ( fields terminated by ',' )
 16        location ('emp.txt')
 17* )
SQL> /

Table created.


SQL> select * from external_table;
select * from external_table
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file emp.txt in MY_DIR not found
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1


-- I have EMP.TXT file on e:\admin dir. on server

Thanks, 

Tom Kyte
December 11, 2002 - 10:32 pm UTC

Is e: a permanent drive on the server? (if it is a network mount -- you are using windows, which is not as easy as unix with mounts)

Is e:\admin accessible to everyone (remember, oracle isn't running as YOU)?

Is e:\admin\emp.txt accessible to everyone?


99999 times out of 100000 -- if the answers are no, yes, yes -- the file really isn't there, it is on the client -- check again.

A reader, December 12, 2002 - 8:26 am UTC

Hi Tom,

e: a permanent drive on the server
----------------------------------
-- e: is a removable (external) disk drive. (We have test PCs here in our lab and they don't have enough space, so I added an external disk on one of the NT workstations and created drive E:\)

Is e:\admin accesible to everyone
---------------------------------

-- About four people are using this instance from their machines. We have given all rights on E:\ to all of them.

Is e:\admin\emp.txt accesible to everyone?
------------------------------------------
-- Since I gave rights on E:\ to all, it is accessible to everybody. Everybody mapped drive e:\ on their machine and they are able to do all operations (rwx) on this drive.

Thanks,


Tom Kyte
December 12, 2002 - 10:07 am UTC

and if you use a permanent drive, in order to test the theory that maybe this removable media is not available to the services that start on boot up -- what happens then?

See
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:241814624807
for where I'm going here -- that the removable media is like a network drive and not universally accessible to the services.

Test by using a real disk.

fields are not terminated

Pushparaj, September 25, 2003 - 4:15 pm UTC

Tom,

I have a data file and the fields are not terminated by anything -- they are basically fixed by position. What should I provide in the access parameters list?

For example,
My datafile is

11SCOTT10
12SAMMY20

empno position (1,2)
ename position (3,7)
deptno position (8,9)


CREATE TABLE emp_ext
( empno number(2),
ename char(5),
deptno number(2)
)
organization external
( type oracle_loader
default directory DATA_DIR
access parameters
(
fields terminated by ????????
)
location ('emp.dat')
)
/

Tom Kyte
September 25, 2003 - 11:33 pm UTC

don't use terminated by, use POSITION(n:m) instead
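
Using your layout, the access parameters might look like this (a sketch -- untested against your file):

CREATE TABLE emp_ext
( empno  number(2),
  ename  char(5),
  deptno number(2)
)
organization external
( type oracle_loader
  default directory DATA_DIR
  access parameters
  ( records delimited by newline
    fields
    ( empno  position(1:2),
      ename  position(3:7),
      deptno position(8:9)
    )
  )
  location ('emp.dat')
)
/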

This is great! BUT how do I do this?

chneih, October 21, 2003 - 10:49 am UTC

I have a CSV datafile optionally enclosed by '"' but with comma inside the enclosed '"' field.

For example

AcctNO Name JobTitle Age

123,"Schmoe, Joe","Mechaic",23
456,"David","Manager",45
789,"Blaine, David, E","",20


Thanks,
Chneih

Tom Kyte
October 21, 2003 - 5:09 pm UTC

tis ok, that is what the enclosed by is about

Question

chneih, October 22, 2003 - 11:05 am UTC

How do I load everything to a 4 column table. Right now I'm using

fields delimited by ',' optionally enclosed by '"'

AcctNO Name JobTitle Age

123,"Schmoe, Joe","Mechaic",23
456,"David","Manager",45
789,"Blaine, David, E","",20

Thanks again


Tom Kyte
October 22, 2003 - 6:03 pm UTC

ops$tkyte@ORA920> desc t
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 A                                            NUMBER(38)
 B                                            VARCHAR2(25)
 C                                            VARCHAR2(15)
 D                                            NUMBER(38)
 
ops$tkyte@ORA920> !cat t.ctl
LOAD DATA
INFILE *
INTO TABLE T
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(a,b,c,d)
BEGINDATA
123,"Schmoe, Joe","Mechaic",23
456,"David","Manager",45
789,"Blaine, David, E","",20
 
ops$tkyte@ORA920> !sqlldr / t.ctl
 
SQL*Loader: Release 9.2.0.3.0 - Production on Wed Oct 22 18:05:09 2003
 
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
 
Commit point reached - logical record count 3
 
  1* select * from t
ops$tkyte@ORA920> /
 
         A B                         C                        D
---------- ------------------------- --------------- ----------
       123 Schmoe, Joe               Mechaic                 23
       456 David                     Manager                 45
       789 Blaine, David, E                                  20
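
The external table equivalent would be something along these lines (a sketch, assuming the same data sits in a file t.dat under a directory object data_dir):

create table t_ext
( a number,
  b varchar2(25),
  c varchar2(15),
  d number
)
organization external
( type oracle_loader
  default directory data_dir
  access parameters
  ( records delimited by newline
    fields terminated by ',' optionally enclosed by '"'
    missing field values are null
  )
  location ('t.dat')
)
/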
 
 

Quick question about utl_file.fgetattr

Jimmy, February 05, 2004 - 5:25 am UTC

Hey Tom!

Just a quick question. When loading data into my database via External Tables there is sometimes a External_Table.bad file that is created in the DATA_DIR.

This is fine but is there any way of using utl_file.fgetattr to scan if the file exists? Normally I would do something like:

l_filename := 'EXTERNAL_TABLE.bad';
utl_file.fgetattr('DATA_DIR',l_filename,l_exists,l_size,l_block_size);

But the problem i am having is that the data loading from the external table is done monthly and the latter part of the .bad file changes depending on when the procedure is run. Is there any way of using a
LIKE(EXTERNAL_TABLE_%.bad) to pick up the .bad file from the DATA_DIR?

Searching for any .bad file will not do because I have multiple external tables, each potentially creating a .bad file.

Any help would be most appreciated!

Thanks



Tom Kyte
February 05, 2004 - 7:42 am UTC

what is the format you used in the create table statement to name the bad file? did you use a format?

Jimmy, February 05, 2004 - 9:41 am UTC

Not at first, and this was the underlying reason why I needed to search for the filename of the bad file within the DATA_DIR.

I think I am going to have to define a fixed name for each bad file within my create external table scripts, thus avoiding the problem.

Thanks for your help though

Regards
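
That fixed-name approach might look like this (a sketch -- the badfile name 'external_table.bad' and the DATA_DIR directory are assumptions). Fix the bad file name in the table definition, and fgetattr becomes trivial:

... access parameters
( records delimited by newline
  badfile data_dir:'external_table.bad'
  fields terminated by ','
) ...

declare
    l_exists     boolean;
    l_size       number;
    l_block_size binary_integer;
begin
    utl_file.fgetattr( 'DATA_DIR', 'external_table.bad',
                       l_exists, l_size, l_block_size );
    if ( l_exists )
    then
        dbms_output.put_line( 'bad file present, ' || l_size || ' bytes' );
    end if;
end;
/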

Accessing a flat file stored in XDB

Dave, February 05, 2004 - 3:09 pm UTC

I have recently been exploring the use of XDB, specifically using XDB as an ftp server.

So people/systems can ftp files to Oracle, through the listener, and they are stored directly in the database. Now my question, and it may be a rather broad one I'm afraid, is this: since this file is now within the database, is there a way for me to ...

i) find it
ii) read the data in it

... through SQL?

I suspect that there's some XML-based way of doing this, but I've been a relational database guy for a long time and the XML stuff is an intimidating learning curve. Obviously I'd like to be able to accept files through ftp and access them directly without writing them out to the o/s and reading them in again.

Does this seem achievable?

Tom Kyte
February 06, 2004 - 2:15 pm UTC

I asked Sean Dillon, our XML technologist, to take a look at this and here's what he had to say:
-----

Hi Dave,

Yes, absolutely. There are a couple of ways to do what you want to do, and the good news is they've all been documented/sampled/explained time and again. I'll explain some of these things and then point you to some resources on OTN that you should read that will tell you what you want to know.

First... how do you FIND your XML files in the XML DB repository? Well, that's pretty simple. With XML DB comes a couple of views you'll want to familiarize yourself with. These are RESOURCE_VIEW and PATH_VIEW. These views provide you a mechanism to access data stored in the XML Repository from SQL.

RESOURCE_VIEW:

Column    Datatype  Description
--------  --------  -----------------------------------------------------
RES       XMLTYPE   A resource in Oracle XML repository
ANY_PATH  VARCHAR2  A path used to access the resource in the repository

PATH_VIEW:

Column  Datatype  Description
------  --------  -----------------------------
PATH    VARCHAR2  Path name of a resource
RES     XMLTYPE   The resource referred by PATH
LINK    XMLTYPE   Link property

If you look in the XML DB Developer's Guide, chapter 18 is entitled SQL Access Using RESOURCE_VIEW and PATH_VIEW. You can find the documentation on OTN here: https://docs.oracle.com

Go to the bottom of that page, get the XML DB Developer's Guide.

For accessing the data, there are a number of ways to do this.  If you load in arbitrary XML (i.e., XML that does NOT use a database-registered XML Schema), the only option you have is through XMLType access methods (because essentially, that's what you've loaded into the repository).

There have been quite a few articles on this subject in Oracle Magazine, I'd suggest reading those and doing the examples to get a better understanding of how you can access the XML data stored in the repository.  If after that you still have questions, come back to AskTom and fire away ;-)  (i.e., no need to rewrite the article's contents here ;-)).  Check out these articles:

https://docs.oracle.com https://asktom.oracle.com/Misc/oramag/on-procedures-flushes-and-writes.html

Hope that helps!

_smd_







External tables or ETL tool

A reader, February 07, 2004 - 12:07 am UTC

Hi Tom,
There is a requirement to create logical records from ASCII input files (10 in total). The input file sizes vary from 1GB to 5GB. The relationships between the various input ASCII files are known. The final output is required as an ASCII file. We also have a third-party ETL tool. Initially I was thinking of using the ETL tool, but after reading your new book (page 411) I am considering external tables (we have Oracle9i - 9.2). Could you please give your advice on the approach below:

1. Create an external table for each input file, and then create tables (as per the example on page 411) in parallel.
2. Create the required indexes.
3. Join the tables created in step 1, and then create the ASCII file through sqlplus.

The advantage of the ETL tool is that I need not create tables, i.e., I can join the ASCII files and produce the final ASCII file directly. But I am not sure about performance. With the external tables approach, creating the tables, then creating the indexes, and finally spooling the file will also have a performance impact. Further, I noticed before (in older versions of Oracle) that there was some limitation on the output file size produced through sqlplus. I am expecting an output file size of around 10 GB (records are formed from various fields of the input files). Is there a maximum file size limit for a file produced through sqlplus in 9.2 (64-bit)?
Thanks


Tom Kyte
February 07, 2004 - 2:39 pm UTC

well, there is 0% chance I would even remotely consider using sqlplus for a 10gig output file.

I might use my array_flat program and just run a query that queries the external tables directly to produce the resulting output file. Just configure sufficient TEMP space, pga_aggregate_target (or set big sort_area/hash_area sizes) and let it rip. It could be as easy as 5 create tables (with dbms_stats.set_table_stats to give us a tiny bit of info) and a query. (don't even bother indexing, just do a big bulk query and let it rip).
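
A sketch of that shape (the table and column names are made up):

-- one per input file: parallel, direct path, straight from the file
create table stage1
parallel nologging
as
select * from ext_file1;

-- tell the optimizer roughly how big it is, without gathering full stats
begin
    dbms_stats.set_table_stats( user, 'STAGE1',
                                numrows => 50000000,
                                numblks => 500000 );
end;
/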



http://asktom.oracle.com/~tkyte/flat/index.html
look for the pro*c program.

Dave, February 08, 2004 - 10:51 am UTC

Thanks Sean (and Tom), those are great references to start with.

As an architectural followup, is the inclusion of web-server capabilities in the database product a sign that Oracle is aiming to move away from Apache for some products? How "well-featured" do you regard Oracle as an HTTP server currently?

Thanks again.

Tom Kyte
February 08, 2004 - 8:56 pm UTC

Not a chance -- apache is part and parcel of the solution -- still there in 10g.

HTTP is a very very simple protocol when used simply. apache offers lots more (the whole module concept and all of the existing modules). Setting up http services is pretty easy -- doesn't require apache in all cases. apache is a web server -- we are not trying to take over that entire realm there.

Another question on external files.

Matt, February 10, 2004 - 7:25 pm UTC

I am working on Windows and have to load UNIX base files.

I have been using the following access parameters:

records delimited by newline

Obviously on windows this is "\r\n". The UNIX files are "\n". I am currently getting the message:

*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04020: found record longer than buffer size supported, 524288, in D:\GRMS
Migration Letterbox\REMCODEV\inbox\regdata.csv
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

Looking at the utilities doco I understand that I can use hex values. I have tried the following:

records delimited by "OA"

records delimited by "OD"

and I get the error;

*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04020: found record longer than buffer size supported, 524288, in D:\GRMS
Migration Letterbox\REMCODEV\inbox\regdata.csv
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

What am I doing wrong?

Best Regards,

Tom Kyte
February 11, 2004 - 8:50 am UTC

can you ftp the files to your machine to get the newlines correct? ftp from unix to windows will accommodate the fact that windows wants 2 things to end a line.


Short of that -- not having a windows machine to try anything with and not seeing the entire example, I cannot really comment further right now.

An example

Matt, February 16, 2004 - 2:24 am UTC

Tom,

According to

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96652/ch12.htm#1007431

If I am reading the doco correctly, I should be able to use 'delimited by "OA"' to enable me to access a unix file on a windows platform.

Can you please provide a quick and dirty example of this syntax working - it doesn't for me. I have a call open with Oracle support and am not getting a very good response.

If you can please just show me a unix file on a unix box (if that is all you have) being accessed as an external table using the "delimited by "<hex value here>"" syntax it would really be appreciated.

Thanks.

Tom Kyte
February 16, 2004 - 10:03 am UTC

Ok, so I installed 9ir2 on my windoze VM (vmware -- very cool, lets me run windows, ironically enough, in a window. perfection).....

created a flat file in unix, binary ftp'ed it, then:

CREATE TABLE ext_table
(
EMPNO NUMBER(4),
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO varchar2(4000)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY MYFILES
ACCESS PARAMETERS
(
RECORDS DELIMITED BY 0x'0a' CHARACTERSET WE8ISO8859P1

FIELDS TERMINATED BY "|" LDRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
EMPNO CHAR(255)
TERMINATED BY "|",
ENAME CHAR(255)
TERMINATED BY "|",
JOB CHAR(255)
TERMINATED BY "|",
MGR CHAR(255)
TERMINATED BY "|",
HIREDATE CHAR(255)
TERMINATED BY "|",
SAL CHAR(255)
TERMINATED BY "|",
COMM CHAR(255)
TERMINATED BY "|",
DEPTNO CHAR(255)
TERMINATED BY "|"
)
)
location
(
'emp.dat'
)
)REJECT LIMIT UNLIMITED
/


loads it -- the trick is the 0x'0a' record delimiter syntax.

Excellent - thanks!

Matt, February 16, 2004 - 4:55 pm UTC

I think that the doco that I have read is a little thin on the ground on this part of the syntax - maybe it has improved in 10G?

Thanks for the quick turnaround - questions like these seem to take an age for Oracle Support to respond to accurately.

How do you find the VMWare - does it work well? I plan to use it to get a 10G install on Linux working on my Win2K machine.

Thanks again.



Tom Kyte
February 17, 2004 - 7:56 am UTC

you got that backwards. You should use it to get 10g under windows running on your linux machine! windows in a window is such a nice thing.


I find vmware to be pretty stable -- use it to access AOL and other "we only do windoze" software when needed. The database is fully supported under vmware as well.

A reader, March 04, 2004 - 10:31 pm UTC

Tom,

Let's say I have a text file which is delimited by both '|' and ','.

Can you please provide an example by which I can take care of both of these delimiters with one external table?

Thanks.




Tom Kyte
March 05, 2004 - 7:53 am UTC

it doesn't do "delimited by this, that, or the other thing" -- only "delimited by this"

A reader, March 05, 2004 - 9:01 am UTC

Can I do this in sqlldr then?

Tom Kyte
March 05, 2004 - 10:20 am UTC

sqlldr and external tables are virtually synonymous.


fields are delimited by "a thing"

we'd have to parse a line ourselves (see
http://asktom.oracle.com/~tkyte/SkipCols/index.html for an example)
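
A sketch of that approach -- read each record as one big field and normalize the two delimiters down to one before splitting (the file name mixed.dat, the directory data_dir and the two-column split are all made up):

create table raw_lines
( data varchar2(4000) )
organization external
( type oracle_loader default directory data_dir
  access parameters
  ( records delimited by newline fields ( "DATA" char(4000) ) )
  location ('mixed.dat')
)
/

-- turn every '|' into ',' and then split on the single delimiter
select substr( line, 1, instr(line,',',1,1)-1 ) c1,
       substr( line, instr(line,',',1,1)+1,
               instr(line,',',1,2)-instr(line,',',1,1)-1 ) c2
  from ( select translate( data, '|', ',' ) line from raw_lines );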

Thought about preprocessing?

Tanja Wilson, March 05, 2004 - 11:55 am UTC

Depending upon file size, it might be faster to preprocess the file and change the "|" to a "," or vice versa, so you have a file delimited by only one thing.

We use sed (a basic Unix command) to do simple things like this to "clean up" files before passing them to sqlldr / external tables.

location of files

reader, March 07, 2004 - 4:04 pm UTC

Should the flat files be located on the server where the database is if I use external tables? Thanks.

Tom Kyte
March 07, 2004 - 4:48 pm UTC

yes, they must be accessible to that server.

A reader, March 08, 2004 - 9:43 am UTC

Thanks Tom and Tanja..


Any limitations on the number of external tables you can create

Grant in Calgary, March 25, 2004 - 6:02 pm UTC

Hi Tom,

I have recently built a procedure that uses dynamic sql to generate an external table with a unique name based on the business file type and loads the data into the system. This is an operation that occurs frequently every day and includes files with sizes exceeding 100MB. The end users of the apps that call this procedure also have the ability to "correct" the data after it has been loaded into the system and occasionally make the wrong correction. Because of this, our old system used to involve tracking down the load file and finding out what the original values looked like to find the errors.
(Sorry for being so long winded!)

With the new system, I would like to maintain the external tables for 6 months to a year, in order to build queries that compare the data in the file to what exists in the system. The only thing that I can think of with this approach is that it may mean there could be thousands or tens of thousands of these external tables.

I wanted to know if there is a downside to maintaining all of these external tables or any limitations in oracle as to how many external table definitions the database can maintain at one time.

If there is, I can build into the procedure, the functionality necessary to create and drop the external table whenever I access it. Is this necessary?

Thanks Again Tom in advance

Tom Kyte
March 25, 2004 - 9:31 pm UTC

you would just be having a really large data dictionary, on the order of an Oracle Apps/peoplesoft/SAP one :)

Queries that hit the data dictionary (monitoring tools, guis mostly) might be affected, but it should not be a serious issue in most cases -- it wouldn't even affect things as much as a "real" table would, since it won't be filling up the extents views like real ones do.

External table and tab delimited data file

SAJ, April 07, 2004 - 2:31 pm UTC

I have a problem with external tables and a tab-delimited text file -- specifically, when there are fields in the data that are null. Can you help please?

Here are the scripts

drop TABLE testload;

CREATE TABLE testload (
deptno NUMBER(6),
dname VARCHAR2(20),
loc VARCHAR2(25)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY GMF_DATA_AU
ACCESS PARAMETERS
( FIELDS TERMINATED BY '\t'
OPTIONALLY ENCLOSED BY '"'
missing field values are null
(
deptno integer EXTERNAL(6),
dname CHAR(20),
loc CHAR(25)
)
)
LOCATION ('testload.txt')
)
REJECT LIMIT UNLIMITED
LOGGING
NOCACHE
NOPARALLEL;

The data file is below

12 SARATOGA
10 ACCOUNTING CLEVELAND
11 ART SALEM
13 FINANCE BOSTON
21 SALES PHILA.
22 SALES ROCHESTER
42 INT'L SAN FRAN

With the above script, when I select from the external table
this is what i get

SELECT * FROM testload
deptno dname loc
-----------------------------
12 SARATOGA
10 ACCOUNTING CLEVELAND
11 ART SALEM
13 FINANCE BOSTON
21 SALES PHILA.
22 SALES ROCHESTER
42 INT'L SAN FRAN



Tom Kyte
April 08, 2004 - 9:37 am UTC

that is the way it works -- when it sees multiple 'termination' characters, it eats them all up.

My first thought is -- you have SQL, lets use it:


ops$tkyte@ORA9IR2> CREATE TABLE "TESTLOAD"
  2  (
  3    DATA VARCHAR2(4000)
  4  )
  5  ORGANIZATION external
  6  (
  7    TYPE oracle_loader
  8    DEFAULT DIRECTORY TEST
  9    ACCESS PARAMETERS
 10    (
 11      RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
 12      BADFILE 'testload.bad'
 13      LOGFILE 'test.log_xt'
 14      READSIZE 1048576
 15      FIELDS LDRTRIM
 16      REJECT ROWS WITH ALL NULL FIELDS
 17      (
 18        "DATA" CHAR(4000)
 19      )
 20    )
 21    location
 22    (
 23      'testload.txt'
 24    )
 25  )REJECT LIMIT UNLIMITED
 26  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from testload;
 
DATA
------------------------------------------------------------------------
12              SARATOGA
10      ACCOUNTING      CLEVELAND
11      ART     SALEM
13      FINANCE BOSTON
21      SALES   PHILA.
22      SALES   ROCHESTER
42      INT'L   SAN FRAN
 
7 rows selected.
 
ops$tkyte@ORA9IR2> column c1 format a15
ops$tkyte@ORA9IR2> column c2 format a15
ops$tkyte@ORA9IR2> column c3 format a15
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select substr( data, tab0+1, tab1-tab0-1 ) c1,
  2         substr( data, tab1+1, tab2-tab1-1 ) c2,
  3         substr( data, tab2+1, tab3-tab2-1 ) c3
  4    from (
  5  select data,
  6         0 tab0,
  7         instr(data,chr(9),1,1) tab1,
  8         instr(data,chr(9),1,2) tab2,
  9         length(data)+1 tab3
 10    from testload
 11         )
 12  /
 
C1              C2              C3
--------------- --------------- ---------------
12                              SARATOGA
10              ACCOUNTING      CLEVELAND
11              ART             SALEM
13              FINANCE         BOSTON
21              SALES           PHILA.
22              SALES           ROCHESTER
42              INT'L           SAN FRAN
 
7 rows selected.


use that last query as a view of testload and you have what you want.  as you can see, it is easy to generalize this to as many columns as you like. 
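
That is, something like the following (a sketch; to_number/to_date the columns as needed):

create or replace view testload_v
as
select to_number( substr( data, tab0+1, tab1-tab0-1 ) ) deptno,
       substr( data, tab1+1, tab2-tab1-1 ) dname,
       substr( data, tab2+1, tab3-tab2-1 ) loc
  from ( select data,
                0 tab0,
                instr(data,chr(9),1,1) tab1,
                instr(data,chr(9),1,2) tab2,
                length(data)+1 tab3
           from testload );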

Thanks. It works like a charm!

SAJ, April 08, 2004 - 5:21 pm UTC

Not that I doubted you! You always come up with the most elegant solutions

But I still don't understand why SQL*Loader behaves differently than an external table. Doesn't external table creation use SQL*Loader?

I created a control file for the same data file and used SQL*Loader for the test run, and that loaded the file correctly into its respective columns.

Tom Kyte
April 08, 2004 - 5:41 pm UTC

sqlldr does the same thing with that. I have that documented in my book "Expert One-on-One Oracle".

it didn't load the way you describe -- two consecutive tabs in sqlldr would do the SAME thing (in sqlldr, it rejected the row unless I used trailing nullcols in fact)



Help Wanted!!!!!!!!!!!

A reader, June 22, 2004 - 4:41 pm UTC

Hi Tom,
I am using external tables. But I am getting the following error:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04063: unable to open log file ext_vende_1012_1032.log
OS error The system cannot find the file specified.
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1


The code is as follows:

drop table ext_VENDEDORES;

CREATE TABLE ext_VENDEDORES
(
VENDID NUMBER(10) ,
CODIGO VARCHAR2(10) ,
EMPRESA VARCHAR2(30) ,
NOMBRE VARCHAR2(80) ,
IDENTIFI NUMBER(10) ,
DIRECCION VARCHAR2(80) ,
TELEFONO NUMBER(10) ,
PROMOTOR NUMBER(1) ,
FECHA DATE ,
ACOPIO VARCHAR2(15) ,
ESTADO VARCHAR2(15)
)
organization external
(type oracle_loader
default directory my_dir
access parameters
(records delimited by newline
badfile my_dir:'ext_vende%a_%p.bad'
logfile my_dir:'ext_vende%a_%p.log'
fields terminated by ','
missing field values are null
(
VENDID ,
CODIGO ,
EMPRESA,
NOMBRE ,
IDENTIFI ,
DIRECCION,
TELEFONO ,
PROMOTOR ,
FECHA char date_format date mask "dd/mm/yyyy hh24:mi:ss",
ACOPIO,
ESTADO
)
)
location ('vendedores.txt')
)
reject limit unlimited;

I created the directory as my_dir on the associated path and copied the .txt data file to that directory.
The concerned user was granted read-write privileges on that directory. Also, I verified the existence of the directory from all_directories dictionary.

I also tried by copying and changing the location from 'vendedores.txt' to 'c:/vendedores.txt'

But it generated the same error. It creates the external table but generates the error when I do a select * from ext_vendedores;

Please help me resolve this issue..
Thanks as always

Tom Kyte
June 22, 2004 - 10:17 pm UTC

the directory must exist on the SERVER.

the concerned user is the "oracle software owner" as far as the OS is concerned.

oracle must have read write access to this directory
and the directory must exist on the SERVER (database server) itself.

Disable log

Jan, June 23, 2004 - 5:37 am UTC

Every time I select from an external table, the log file "TABLENAME_SESSIONID.LOG" is generated. How can I disable this?

Thanks

Bad file

A reader, June 23, 2004 - 9:04 pm UTC

I have a lot of SQLLDR scripts to load data into staging tables and then upsert into main tables. The new external tables and MERGE features would be really useful here.

But with SQLLDR, which is invoked from Unix shell scripts, I usually look at the return code ($?) from 'sqlldr' and do a 'wc -l' on the bad file to determine success/failure, etc.

So, when using external tables, I would need to create a bad file to see if the load succeeded or not, do a count(*) (similar to wc -l badfile.bad), etc.

Since the bad file has exactly the same format as my data file, instead of duplicating the DDL for the bad file/table, can I do the usual trick we do to copy a table's definition like

create table my_bad_table as select * from ext_table where 1=2;



Tom Kyte
June 24, 2004 - 9:16 am UTC

well, if the DDL for the original input wasn't able to be processed -- why would copying it to a "bad table" definition work???

the bad table would just reject all of the records!


your bad table would be something so simple the records could not be rejected, like:

ops$tkyte@ORA9IR2> create table t
  2  ( data varchar2(4000) )
  3  organization external
  4  ( type oracle_loader default directory data_dir
  5    access parameters
  6    ( records delimited by newline fields ( "DATA" char(4000) )  )
  7    location ('test.dat' )
  8  )
  9  /



if your rowsize exceeds 4000 characters, you would just use multiple fields.

to find out how many rejects -- "select count(*) from t", to see the rejects -- select * from t;

 

A reader, June 24, 2004 - 3:41 pm UTC

No, what I meant was...

In the SQLLDR world, the bad file has exactly the same format as my input data file. I don't have to do anything for that; that's just the way SQLLDR works.

In contrast, if I want to trap bad records in the external table world, I have to specifically create the DDL for the bad table. This just seems like extra work to me!

Alternatively, instead of

access parameters
(records delimited by newline
badfile external_tables_dir:foobar


I would have liked to do

access parameters
(records delimited by newline
badTABLE foobar)

and then Oracle would automagically create the badfile and the corresponding external table to access that for me, in one swoop.

I know, I know, all these new features are making us lazy!






Performance Issues with External Tables

External Table user, June 25, 2004 - 9:45 am UTC

Hi,
I am using external tables to create a database. Data to be loaded is in the form of flat files. These files have been used as external tables, and there are PL/SQL procedures to transform them, and load them into database tables.

Recently, there have been performance issues with the databases (the UNIX server hosts many databases). The DBAs have come to the conclusion that it might be caused by the external tables. (This is a development database, and there are only around 12 external tables, with each file less than 10 KB. The external tables are referenced only for loading data. There are only a couple of joins in one of the procedures.) Do you think this conclusion is right? Can you throw some light on the issue?

Thanks

Tom Kyte
June 25, 2004 - 2:36 pm UTC

ROTFL

"The DBA's have come to the conclusion that it might be
caused due to the external tables. "


How's about THEY throw a little (just a little) light on the issue and explain how they came to a conclusion that it MIGHT be something....

Hmmm. It might be the alignment of mercury with saturn too -- or sun spots.

But, without any sort of analysis or facts to back it up, we cannot really be sure.

Data File In Application Server Can we USE EXTERNAL TABLE ??

Arindam, July 02, 2004 - 10:54 am UTC

Tom,
Thanks for answering our questions (sometimes as stupid as mine) day in and day out.
We have an application server which has an Oracle client installed, and the data resides on it. We usually use SQL Loader to push the data from the app server to the DB server. Can we use an external table here? I know we need to create a directory first, but we don't have any links (soft or hard) between those servers. Could you comment on this?

Regards

Tom Kyte
July 02, 2004 - 11:12 am UTC

the file must be accessible to the DATABASE SERVER, as your dedicated/shared server is the process that actually reads it. It could be an NFS mount.

Error checking, lookups, etc

A reader, July 13, 2004 - 10:16 am UTC

I have many processes that get a file from a external vendor, load it into a staging table, run PL/SQL code to process each row and merge it into the main table.

When we upgrade to 9iR2, I would like to use the external tables feature for any new files we get in the future.

But I see the following issues that I would appreciate your help with.

1. The vendor sends us their primary keys (cust_no, account_no, etc). The PL/SQL looks these up into our tables and translates this into our internal ids.

2. The current process looks at the exit code from 'sqlldr' and/or the number of rows in the bad file and throws an error if either of them are not good.

3. The current process loops thru all the rows in the staging table, does all the validations and if any record fails validation, it writes that to an error table and a post-load sqlplus script queries this error table and generates a report.

All these things seem more suited towards a procedural, row-by-row approach rather than a set-based external table approach.

Can you please address the above issues and advise if it makes sense to convert these kinds of processes to use external tables?

Thanks

Tom Kyte
July 13, 2004 - 12:18 pm UTC

it absolutely makes sense.

1) scalar subqueries are awesome for automating that. you can select your lookup right in the query.

2) reject = 0, the queries against external tables will get an error, same difference.

3) if you need row by row validation -- you would BULK collect 100 or so rows, and then forall i insert them after "cleansing them"
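
Putting those three together, the skeleton might look like this (a sketch -- my_table, cust and external_table are hypothetical, and my_table is assumed to have exactly the two columns fetched):

declare
    type rec_array is table of my_table%rowtype index by binary_integer;
    l_rows rec_array;

    cursor c is
        select ( select i.our_cust_no
                   from cust i
                  where i.their_cust_no = et.their_cust_no ) our_cust_no,
               et.cust_name
          from external_table et;          -- scalar subquery does the lookup
begin
    open c;
    loop
        fetch c bulk collect into l_rows limit 100;
        -- row by row validation / cleansing of l_rows goes here...
        forall i in 1 .. l_rows.count
            insert into my_table values l_rows( i );
        exit when c%notfound;
    end loop;
    close c;
end;
/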

Error checking, lookups, etc

A reader, July 13, 2004 - 1:10 pm UTC

Not sure if I understand you, lets take an example and work with it.

create table t
(
our_cust_no int,
their_cust_no int,
cust_name varchar2(50),
state_desc varchar2(50)
)

insert into t values (100,1,'First',null);
insert into t values (101,2,'Second',null);


My external file is defined as

1,First modified,NJ
2,Second modified,NY
3,Doesnt exist,PA
4,Bad record,ZZ

The first and second records should update my table with the modified cust_names and the expanded state descriptions (New Jersey and New York)

The 3rd record should get inserted as

(103,3,'Doesnt exist','Pennsylvania') -- The 103 here is just the max(our_cust_no) from t + 1

The 4th record should write to an error table since it has a bad state code (ZZ) or some other invalid data.

Given this setup, how can I apply your 3 suggestions?

Thanks



Tom Kyte
July 13, 2004 - 1:50 pm UTC

you'll turn a "load file into stage table, read record by record, process" into

o open a cursor on the external table -- perhaps outer joining that to the "current" table so you'll know if a record is to be updated or inserted....

o this cursor can also use a join or scalar subquery to turn NJ into New Jersey (or null if ZZ is found)

o you bulk fetch 100 or so of these.

o now you fill up three arrays
array 1 -- good records to be inserted
array 2 -- good records to be updated
array 3 -- bad records to be logged


and you forall i process them. You skip the sqlldr, you change the logic to be BULK.



Error checking, lookups, etc

A reader, July 13, 2004 - 6:44 pm UTC

This is my first time using external tables, so please bear with me, but I still haven't quite got it.

Using my earlier example, how would this go?

select col1,col2,
(select state_desc from lookup
where state_code=a.state_code)
bulk collect into ...
from external_table a,current_table b
where a.pk=b.pk(+)

Now, depending on my validations and lookups, I fill up the 3 arrays you suggest and then forall i bulk insert/update/log the records for EACH of the 3 arrays.

Questions:

1. If I set my REJECT LIMIT to 0 for the external table and the access driver rejects a row, what would happen? The entire SELECT BULK COLLECT above would fail and throw an exception, right? How do I handle this? I would want to just log that bad row into a bad table and proceed, not abort the entire bulk collect

2. From an ease-of-use perspective, how in the world is all this better?! I mean, even with my existing load-into-staging, process-row-by-row logic, I can just replace external_table with staging_table in my code above and still get the advantages of bulk processing, right? So, what exactly is the external table feature buying me here?

'select * from' a flat file sounds very cool, but if you want all the robust error-checking, logging, validations, lookups, etc., typically associated with load-into-staging-and-process jobs, external tables don't really add much value, do they?

Thanks

Tom Kyte
July 13, 2004 - 8:16 pm UTC

1) a fetch would fail -- yes. Your logic said you wanted to stop above:

<quote>
2. The current process looks at the exit code from 'sqlldr' and/or the number of
rows in the bad file and throws an error if either of them are not good.
</quote>

the reading of the external table is like the sqlldr run; I assumed you wanted to stop. If not, set the reject limit to unlimited.


2) everything is in the database, one process, one procedure, less work for the database -- no load into stage, read stage. Just read/process the file.

Just even using external tables as "sqlldr" adds a ton -- even if you did not change the process and just dropped the shell scripts, you've bought a lot.

heck, if you want to -- you can:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:19481671347143

whereby you put the validation code into a pipelined function -- route bad records to a bad table, output good records after doing whatever lookups you wanted -- and use merge to make your entire etl become:

merge into existing
using ( select * from table( pipelined_function ) ) et
on ( ... )
when matched then update...
when not matched then insert ...


Correction

A reader, July 13, 2004 - 7:18 pm UTC

Sorry, I contradicted myself. If I set reject limit=0 and the bulk collect fails because a row was rejected, I indeed want the entire thing to fail (similar to how I get a non-zero return code $? from sqlldr in my current process)

But I would appreciate your comments on my code and my other observations.

Thanks

Bulk fetch

A reader, July 13, 2004 - 7:21 pm UTC

1. So I bulk fetch from my external_table cursor into one array
2. Loop thru the array and determine insert/update/bad and populate the corresponding array
3. forall i EACH array and insert/update my final table and bad table

And all this is faster than just processing the external_table cursor one row at a time?

Thanks


Tom Kyte
July 13, 2004 - 8:17 pm UTC

bulk is better, yes.

Awesome!

A reader, July 13, 2004 - 8:29 pm UTC

<quote>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:19481671347143

merge into existing
using ( select * from table( pipelined_function ) ) et
on ( ... )
when matched then update...
when not matched then insert ...
</quote>

This is awesome! Cant wait to try it out in my environment!

Thanks a lot, you are the best. Don't know how you do what you do on this forum!

Sequence

A reader, July 14, 2004 - 5:12 pm UTC

Another question:

How can I make it such that all the rows loaded from the external table have the same sequence number, which is one more than the max(seq_no) in my real table?

Thanks

Tom Kyte
July 15, 2004 - 11:46 am UTC

insert into your_table
select (select max(seq_no)+1 from your_table), et.* from external_table et;



More questions

A reader, July 14, 2004 - 6:55 pm UTC

Some more questions about using external tables:

1. I don't seem to have much flexibility in naming my logfile. Is there a way to have some dynamic file name generation based on the time of day, input parameters passed into the stored proc, etc?

2. Similar to UTL_FILE_DIR, the directory specified by CREATE DIRECTORY seems to refer to only that specific directory, not subdirectories under it. Is there a way to grant access to an entire directory tree?

3. SQLLDR has the ability to load multiple tables using various INTO TABLE and WHEN clauses. How can I do this using external tables? I guess I can use the INSERT ALL (multi-table insert) feature? How would this work in conjunction with the LOAD WHEN clause of the external table's access parameters?

I mean, suppose I have

INSERT ALL
when ... then into table ...
when ... then into table ...
select ... from external_table;

I guess it makes sense to NOT have any LOAD WHEN in the external table definition and have the WHEN clauses in the INSERT ALL control that, right?

Thanks a lot

Tom Kyte
July 15, 2004 - 11:57 am UTC

1) there are modifiers %p and %a you can include in there and utl_file can be used to rename the log file to whatever you like after you are done with it.

2) no subdirectories, you must be explicit.

3) multi-table inserts. you have a where clause -- infinitely more powerful than anything you could do in sqlldr

i would just use "where" -- use the load when to exclude records that should never be seen.
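
For example, a sketch of what the multi-table insert could look like (rec_type and the target tables are invented):

insert all
    when rec_type = 'A' then
        into table_a ( id, payload ) values ( id, payload )
    when rec_type = 'B' then
        into table_b ( id, payload ) values ( id, payload )
select rec_type, id, payload
  from external_table
 where rec_type in ( 'A', 'B' );   -- the "load when" equivalent, done in SQL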

Dates

A reader, July 15, 2004 - 12:10 pm UTC

Thanks a lot, another question, if you dont mind.

I have a pipe-delimited file with 100 fields, 2 of which are dates.

I did

create table ext
(
f1 varchar2(10),
f2 number,
...
f99 date,
f100 date
)
organization external
(type oracle_loader
default directory my_dir
access parameters
(
records delimited by newline
fields terminated by '|'
)
location ('test.dat')
);

Now my date field is in YYYYMMDDHH24MISS format.

Yes, I can specify all the fields again in the access parameters section and use the CHAR DATE_FORMAT DATE MASK ... for these 2 fields.

But is there a way to specify the date format just for these 2 fields and default the rest? I hate to duplicate all the other 98 field definitions!

Thanks

Tom Kyte
July 15, 2004 - 1:32 pm UTC

you can either

o map them as a varchar2 and just to_date them in the select (see the sketch below)
o just list the other column names a,b,c,d,e,f,..... and just give the datatype spec for these two.
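
The first option would look like this (a sketch, assuming f99/f100 are declared varchar2(14) in the external table instead of date):

create or replace view ext_v
as
select f1, f2, /* ... f98 ... */
       to_date( f99,  'yyyymmddhh24miss' ) f99,
       to_date( f100, 'yyyymmddhh24miss' ) f100
  from ext;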

Skipping first and last records

A reader, July 15, 2004 - 12:28 pm UTC

How can I skip loading the first and last records in the file?

SKIP 1 will skip the first but how can I skip the last?

Thanks

Tom Kyte
July 15, 2004 - 5:49 pm UTC

there would have to be something that denotes the "last record", a where clause would filter it -- although we could use rownum and a count(*) over () to skip it as well (less efficiently)
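
Something like this sketch (skip 1 in the access parameters handles the first record; the inline view drops the last -- assuming a serial read so rownum follows file order):

select *
  from ( select t.*,
                rownum r,
                count(*) over () cnt
           from ext_table t )
 where r < cnt;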

Rename

A reader, July 15, 2004 - 1:22 pm UTC

You said

"there are modifiers %p and %a you can include in there and utl_file can be used to rename the log file to whatever you like after you are done with it"

If the logfile has the %p modifier in it, the filename is dynamic. How can I use utl_file.frename to rename the file if I don't know the name?!

Thanks

Tom Kyte
July 15, 2004 - 6:03 pm UTC

%p is your process id, just query it up.

ops$tkyte@ORA9IR2> create table external_table
  2  (EMPNO NUMBER(4) ,
  3   ENAME VARCHAR2(10),
  4   JOB VARCHAR2(9),
  5   MGR NUMBER(4),
  6   HIREDATE DATE,
  7   SAL NUMBER(7, 2),
  8   COMM NUMBER(7, 2),
  9   DEPTNO NUMBER(2)
 10  )
 11  ORGANIZATION EXTERNAL
 12  ( type oracle_loader
 13    default directory data_dir
 14    access parameters
 15    (
 16      RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
 17    logfile 'foo_%p.log'
 18    fields terminated by ','
 19    )
 20    location ('emp.dat')
 21  )
 22  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set linesize 105
ops$tkyte@ORA9IR2> set echo on
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> column log new_val log
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select '/tmp/foo_' || a.spid || '.log' log
  2    from v$process a, v$session b
  3   where a.addr = b.paddr
  4     and b.sid = (select sid from v$mystat where rownum=1)
  5  /
 
LOG
-------------------------
/tmp/foo_15741.log
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> !ls -l &LOG
ls: /tmp/foo_15741.log: No such file or directory
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select empno from external_table where rownum = 1
  2  /
 
     EMPNO
----------
      7369
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> !ls -l &LOG
-rw-r--r--    1 ora9ir2  ora9ir2      1137 Jul 15 18:00 /tmp/foo_15741.log
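
And once you have the name, the rename itself is just (a sketch -- querying v$process needs the appropriate privileges, and the target name load.log is made up):

declare
    l_spid v$process.spid%type;
begin
    select a.spid
      into l_spid
      from v$process a, v$session b
     where a.addr = b.paddr
       and b.sid = ( select sid from v$mystat where rownum = 1 );

    utl_file.frename( 'DATA_DIR', 'foo_' || l_spid || '.log',
                      'DATA_DIR', 'load.log',
                      overwrite => true );
end;
/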



 

Logging exceptions

A reader, July 15, 2004 - 4:47 pm UTC

After reviewing all your suggestions in the context of my environment, I have the following so far.

declare
type my_tab_t is table of my_table%rowtype index by binary_integer;
my_tab my_tab_t;
errors exception;
PRAGMA exception_init(errors, -24381);

begin

select
seq.nextval,
(select nvl(max(load_seq_no),0)+1 from my_table),
(select our_cust_no from cust where their_cust_no=ext.their_cust_no),
...
BULK COLLECT INTO
my_tab
from external_table ext;

-- do some more validations

begin
forall i in 1..my_tab.count SAVE EXCEPTIONS
insert into my_table values my_tab(i);
exception
when errors then
...
end;

Now, in my WHEN ERRORS section above, I need to log the 'their_cust_no' field from my vendor's file since we communicate with them on that key. But since I have translated it to our_cust_no in my scalar subquery, it is lost!

How can I get it back?

Yes, instead of defining a %ROWTYPE table, I can create a record with all the columns and add this additional column, but then I lose the ability to simply

insert into table values record; -- this is really cool

Is there a more elegant solution?

Thanks

Tom Kyte
July 15, 2004 - 6:38 pm UTC

translate it back when you log the error??

A reader, July 15, 2004 - 6:45 pm UTC

"translate it back when you log the error??"

um, maybe I didn't explain it clearly. I can't translate it back because I don't have anything to translate from, that's the point.

See the file has their_cust_no and I translate this using a scalar subquery into our_cust_no. My target table has a NOT NULL and a FK on our_cust_no. So if the lookup fails, my bulk collect into (and hence my sql%bulk_exceptions) has NULL. In this case, I want to get the original their_cust_no back.

How can I do this?

Thanks

Tom Kyte
July 15, 2004 - 6:52 pm UTC

umm, if you can do this:

(select our_cust_no from cust where their_cust_no=ext.their_cust_no),

you can do this:

(select their_cust_no from cust where our_cust_no=ERROR_RECORD.cust_no),


you appear to have a lookup table that goes back and forth no????

A reader, July 15, 2004 - 6:46 pm UTC

And in any case, to translate it back, I would have to query the external table again i.e. launch the loader process again. This would be expensive.

I was hoping there would be a PL/SQL way to do this. Thanks

Tom Kyte
July 15, 2004 - 6:52 pm UTC

see above.

A reader, July 15, 2004 - 8:39 pm UTC

<quote>
umm, if you can do this:

(select our_cust_no from cust where their_cust_no=ext.their_cust_no),

you can do this:

(select their_cust_no from cust where our_cust_no=ERROR_RECORD.cust_no),

you appear to have a lookup table that goes back and forth no????
</quote>

yes, but as I explained, (select our_cust_no from cust where their_cust_no=ext.their_cust_no), returns no rows (null in the scalar subquery) because their_cust_no is a BAD number, NOT in my lookup table.

So, error_record.cust_no is NULL, I can't use it to translate back, hence my comment about ext.cust_no being "lost"



Tom Kyte
July 16, 2004 - 10:41 am UTC

well, I guess you would have to fetch it out in the "bulk collect" -- and since you are thinking of moving into 3 arrays anyhow, just don't move that field into the 3 arrays. you need to split them out anyway, right?

How does the access driver work?

A reader, July 15, 2004 - 9:29 pm UTC

These might be stupid questions, but please indulge me...

1. If I 'select * from ext_table where rownum<=10' does it just load the first 10 records in the file? (similar to sqlldr's load=10 option)

2. Or is the entire file loaded every time the ext_table is referenced in any manner in any DML?

3. To change the file location, I can simply do
alter table ext_table location ('new_filename');

But to change from LOGFILE to NOLOGFILE, I need to specify the entire 'access parameters' section! This is a pain.
From a design perspective, do you know why the logfile clause is treated any differently than the 'location' clause?

Thanks

Tom Kyte
July 16, 2004 - 10:42 am UTC

1) yes.

2) it stops

3) it seems you would either

a) log it
b) not log it?

I suppose the technical reason is that logfile is a small part of a larger clause, but it seems you would either always want to log it or never.

Reply to Reader with translation issue

Bill, July 16, 2004 - 8:29 am UTC

<quote>
umm, if you can do this:

(select our_cust_no from cust where their_cust_no=ext.their_cust_no),

you can do this:

(select their_cust_no from cust where our_cust_no=ERROR_RECORD.cust_no),

you appear to have a lookup table that goes back and forth no????
</quote>

yes, but as I explained, (select our_cust_no from cust where
their_cust_no=ext.their_cust_no), returns no rows (null in the scalar subquery)
because their_cust_no is a BAD number, NOT in my lookup table.

So, error_record.cust_no is NULL, I can't use it to translate back, hence my
comment about ext.cust_no being "lost"
===================================================
Seems obvious to me, but:
why not put a holding variable in your code, and store your their_cust_no in there until you are finished error processing? This way, you will not lose it - just set the holding variable every time you loop through a new record.


Tom Kyte
July 16, 2004 - 11:21 am UTC

won't work -- we are using bulk operations -- not slow by slow processing.

Reply to Reader with translation issue - another thought

Bill, July 16, 2004 - 8:47 am UTC

Just realized that if you do:
select our_cust_no from cust where
their_cust_no=ext.their_cust_no)

and it returns no rows, why can't you just set

error_record.cust_no = their_cust_no

at this point? You already "know" what their_cust_no is - just add a check to see if error_record.cust_no is null and if it is, redirect it to their_cust_no. Make sense?

To Bill

A reader, July 16, 2004 - 9:29 am UTC

We are not on the same page, let me explain...

I have a target table I am loading from an external table. I am BULK COLLECTING into a table type based on %ROWTYPE. Look at my code above. This lets me do

select
<all the columns in my target table>
bulk collect into myrectab
from external_table;

forall i in 1..myrectab.count save exceptions
insert into target_table values myrectab(i);

Now, at this point, myrectab already contains all the data I need to insert. If a few of the rows have a bad their_cust_no, the scalar subquery in my select above would have stored NULL in that record in myrectab. So when I FORALL INSERT into my table, I would get a 'not null constraint violated' error. At this point, I need to log the their_cust_no *for that record* into an error table. But since I translated it to NULL, the their_cust_no is lost!

"why not put a holding variable in your code, and store your their_cust_no in there until you are finished error processing? This way, you will not lose it -
just set the holding variable every time you loop through a new record"

Right, all I need to do is select both the their_cust_no and the translated our_cust_no, but then I can't bulk collect into a ROWTYPE collection. i.e. it would be like

select
<columns needed for my target_table>,
ext.their_cust_no
bulk collect into XXX
from external_table ext;

But the XXX above has to be a ROWTYPE collection based on my target_table, otherwise I can't use it in my FORALL to insert the row.



To Reader - OOPS! I missed that, sorry!

Bill, July 16, 2004 - 10:10 am UTC

Did not catch the "has to be a %ROWTYPE collection" part - sorry about that! I must learn to READ!

How does the access driver work?

A reader, July 16, 2004 - 11:03 am UTC

1. OK so if I have a rownum<=10 against the external table, it will load only the first 10 records.

But if I access the external table in a complicated query using a few other tables and joins and WHERE clauses and stuff, it *has* to load the entire table in memory or temp to satisfy the query, right?

In other words, from an EXPLAIN PLAN perspective, what are the various access paths that Oracle has for an external table? The only one I see after I analyze the external table is EXTERNAL TABLE ACCESS (FULL). Are there others?

2. Regarding the LOGFILE clause, I was thinking that I would set it to NOLOGFILE in the table DDL, assuming that most times the file is fine, no errors, so there is no benefit in having a logfile generated and cluttering up my directory. But when there is a problem, I can just ALTER TABLE LOGFILE and re-run the code to have a logfile generated. Sort of like turning on tracing to diagnose a problem.

Thanks

Tom Kyte
July 16, 2004 - 1:52 pm UTC

1) no, it all depends. it might, it might not.

table access full is pretty much it for external tables -- it does not mean however that it has to read the entire thing up front for any query. just means we'll be reading it row by row at some point -- but we can always stop.


2) then you'll be using the entire access parameter - set it up in a stored procedure to make it easier (that's what sp's are really good at)

Bulk collect into multiple collections

A reader, July 16, 2004 - 11:16 am UTC

Here is what I want to do:

type rec_t is record
(
emp_rec emp%rowtype,
extra_column varchar2(100)
);

type mytype is table of rec_t index by binary_integer;
mytab mytype;

select
a.*,
'zzz'
bulk collect into mytab
from emp a;

forall i in 1..mytab.count
insert into emp_copy values mytab(i).emp_rec;

This gives me an error about not being able to mix single-row and multi-row in a bulk collect.

How can I split this into 2 collections and still use the rowtype and bulk collect? (I want to use the rowtype so that I can insert the entire record instead of specifying each field)

Thanks


Tom Kyte
July 16, 2004 - 1:55 pm UTC

you cannot -- as i said, you'd have to move it into another array -- which seems to be what you would want to do anyhow

a) array 1 to be updated
b) array 2 to be inserted
c) array 3 to be error logged.



Splitting into 3 arrays

A reader, July 16, 2004 - 2:45 pm UTC

Um, sorry, I was trying to figure this out myself but wasn't quite able to modify my existing code (posted earlier) to split into the 3 arrays you suggest (and still maintain my %rowtype construct)

Could you please get me started on how to split into these 3 arrays?

Thanks

Tom Kyte
July 16, 2004 - 10:52 pm UTC

you would have 4 arrays - one you fetch into, one you insert with, one you update with and one you log errors with.

they can all be different.

write three stored procedures to assign a fetched row into the other types and your code will be "nice looking"

How?

A reader, July 17, 2004 - 12:29 am UTC

Um, thanks, but can you post some sample code to illustrate what you suggest?

"you would have 4 arrays - one you fetch into, one you insert with, one you update with and one you log errors with"

so, I get all my rows bulk collected into one array. Now what? how can I split them into the other 3 arrays *and maintain bulk processing*?


"write three stored procedures to assign a fetched row into the other types and your code will be "nice looking""

Not sure what you mean by nice looking?

Tom Kyte
July 17, 2004 - 2:34 pm UTC

you are procedurally processing data here - it is called "assignment", you assign the elements of the array you fetched into -- into one of three arrays. then, after you fill them up -- you execute three bulk statements.


by nice looking, I mean readable and terse:

if ( inserting ) then
insert_array(insert_array.count+1) :=
make_me_an_insert_record( bulk_fetched_record(i) );
elsif ( updating ) then
update_array(update_array.count+1) :=
make_me_an_update_record( bulk_fetched_record(i) );
else
bad_array(bad_......)
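
Roughly, a sketch of that shape (all names hypothetical -- say TARGET_TABLE(cust_no, i, j, k) and ERR_LOG(their_cust_no); the cursor carries their_cust_no alongside the translated our_cust_no so nothing is "lost", and an update array would be handled the same way as the two arrays shown):

declare
    cursor c is
        select ext.their_cust_no, ext.i, ext.j, ext.k,
               ( select our_cust_no
                   from cust
                  where their_cust_no = ext.their_cust_no ) our_cust_no
          from external_table ext;

    type src_tab_t is table of c%rowtype            index by binary_integer;
    type ins_tab_t is table of target_table%rowtype index by binary_integer;
    type err_tab_t is table of err_log%rowtype      index by binary_integer;

    l_src src_tab_t;
    l_ins ins_tab_t;
    l_err err_tab_t;
begin
    open c;
    fetch c bulk collect into l_src;
    close c;

    for i in 1 .. l_src.count
    loop
        if ( l_src(i).our_cust_no is not null )   -- lookup succeeded
        then
            l_ins(l_ins.count+1).cust_no := l_src(i).our_cust_no;
            l_ins(l_ins.count).i := l_src(i).i;
            l_ins(l_ins.count).j := l_src(i).j;
            l_ins(l_ins.count).k := l_src(i).k;
        else                                      -- failed lookup, log their key
            l_err(l_err.count+1).their_cust_no := l_src(i).their_cust_no;
        end if;
    end loop;

    forall i in 1 .. l_ins.count
        insert into target_table values l_ins(i);
    forall i in 1 .. l_err.count
        insert into err_log values l_err(i);
end;
/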





A reader, July 30, 2004 - 10:38 am UTC

<quote>
2) then you'll be using the entire access parameter - set it up in a stored procedure to make it easier (that's what sp's are really good at)
</quote>

Hm, I just realized the ACCESS_PARAMETERS is a CLOB in the dictionary. Can you help me with writing an SP to change the LOGFILE parameter and keep the rest unchanged? The spec for the SP would be

my_pkg.alter_external_table(table_name,file_type,dir,name)

file_type would be log, bad, or disc or NONE (to put the NOLOGFILE)

I haven't worked with DBMS_LOB much, so any help would be much appreciated.

Since it is free-form text, the SP would need to parse out

LOGFILE <bunch of whitespace including newline> ....

and stuff.

Thanks for your help

Tom Kyte
July 30, 2004 - 5:28 pm UTC

not a chance.

you do not touch the data dictionary -- ever.

A reader, July 30, 2004 - 7:56 pm UTC

No, no, you misunderstand. I was just trying to implement your suggestion to replace the entire access parameters clause.

i.e. read in the user_external_tables.access_parameters clob, parse out the logfile clause, write the new logfile clause and write the entire access parameters clause back using the regular ALTER TABLE ... ACCESS PARAMETERS clause.

All this just because the logfile clause is not available via a simple ALTER TABLE like the LOCATION clause is.

Help?

Thanks

Tom Kyte
July 31, 2004 - 10:45 am UTC

Ok, just read the lob out into a plsql variable of type "long". Unless you make it bigger than 32k (doubtful), it'll just be a "string"

is
l_variable long;
...
begin
....
l_variable := dbms_lob.substr( l_clob_value, 32765, 1 );
....


then use substr, instr, etc to "parse" it and do your replacement.
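
For instance, a rough sketch (MY_EXT_TABLE is hypothetical; it assumes the access parameters fit in 32k and that the logfile name is a quoted literal right after the LOGFILE keyword):

declare
    l_params varchar2(32767);
    l_pos    number;
    l_end    number;
begin
    select dbms_lob.substr( access_parameters, 32765, 1 )
      into l_params
      from user_external_tables
     where table_name = 'MY_EXT_TABLE';

    l_pos := instr( upper(l_params), 'LOGFILE' );
    if ( l_pos > 0 )
    then
        -- find the closing quote of the logfile name literal
        l_end := instr( l_params, '''', instr( l_params, '''', l_pos ) + 1 );

        l_params := substr( l_params, 1, l_pos-1 ) || ' NOLOGFILE ' ||
                    substr( l_params, l_end+1 );

        execute immediate 'alter table my_ext_table access parameters ( ' ||
                          l_params || ' )';
    end if;
end;
/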

how to load this data

Pukala, August 03, 2004 - 5:24 am UTC

Hi

I have this data

200308;1
200405;2

I then create this table

create table go$external
(a date,
b number)
organization external
(type oracle_loader
default directory prueba
access parameters
( records delimited by newline
badfile 'go.bad'
logfile 'go.log'
fields terminated by ';'
(a char(6) date_format date mask "yyyymm",
b)
)
location ('go.txt')
)
reject limit unlimited;

the problem is, since the second field hasn't got a ";" I get this error

error processing column B in row 1 for datafile /tmp/go.txt
ORA-01722: invalid number

how can I overcome this?

Tom Kyte
August 03, 2004 - 8:49 am UTC

you'll have to show me how to reproduce step by step -- for the example you gave works "out of the box"


ops$tkyte@ORA9IR2> create or replace directory prueba as '/tmp'
  2  /
 
Directory created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table go$external;
 
Table dropped.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table go$external
  2  (a date,
  3   b number)
  4  organization external
  5  (type oracle_loader
  6   default directory  prueba
  7   access parameters
  8                     ( records delimited by newline
  9                       badfile 'go.bad'
 10                       logfile 'go.log'
 11                       fields terminated by ';'
 12                       (a char(6) date_format date mask "yyyymm",
 13                        b)
 14                     )
 15   location ('go.txt')
 16  )
 17  reject limit unlimited;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> !cat /tmp/go.txt
200308;1
200405;2
 
ops$tkyte@ORA9IR2> select * from go$external;
 
A                  B
--------- ----------
01-AUG-03          1
01-MAY-04          2
 
 

overwrite the logfile

A reader, August 03, 2004 - 5:33 am UTC

Every time we query the external table the logfile is appended to; is there a way to overwrite it? Can't find anything in the docs!

Cheers

Tom Kyte
August 03, 2004 - 8:52 am UTC

<quote>
The LOGFILE clause names the file that contains messages generated by the external tables utility while it was accessing data in the datafile. If a log file already exists by the same name, the access driver reopens that log file and appends new log information to the end. This is different from bad files and discard files, which overwrite any existing file. NOLOGFILE is used to prevent creation of a log file.
</quote>

that is the way it works by design, you'd have to use utl_file to "erase it" first if this was not desired.
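
So, if the append behaviour is not wanted, something like this before each query would do it (directory object and file names hypothetical; fremove raises an exception when the file does not exist, hence the handler):

begin
    utl_file.fremove( 'MY_LOG_DIR', 'ext_table.log' );
exception
    when others then null;   -- no log file yet, nothing to erase
end;
/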

Accessing logfile from client

va, August 06, 2004 - 11:25 pm UTC

The loader process creates the logfile/badfile on the server (Unix). How can developers read these from the client side (Windows)? They don't have Unix user accounts.

Help? Thanks

Tom Kyte
August 07, 2004 - 10:08 am UTC

create YAET

yet another external table


just like external_clntques_bad demonstrated above.

CLOB

A reader, August 07, 2004 - 11:23 am UTC

Instead of doing

ops$ora920@ORA920.US.ORACLE.COM> create table external_clntques_bad
2 ( text1 varchar2(4000) ,
3 text2 varchar2(4000) ,
4 text3 varchar2(4000)
5 )
6 organization external
...

Can I just load the file into a CLOB column and be done with it instead of these text1, text2, text3 columns?

Thanks

Tom Kyte
August 07, 2004 - 1:47 pm UTC

ops$tkyte@ORA9IR2> create table external_clntques_bad
  2  ( txt clob
  3  )
  4  organization external
  5  (type oracle_loader
  6   default directory external_tables_dir
  7   access parameters
  8   (
  9    records delimited by newline
 10     fields
 11    missing field values are null
 12  ( txt position(1:1000000)
 13     )
 14   )
 15   location ('foobar.bad')
 16  )
 17  /
( txt clob
  *
ERROR at line 2:
ORA-30656: column type not supported on external organized table
 

Concurrent access

A reader, August 11, 2004 - 6:56 am UTC

1. How does Oracle handle concurrent access to external tables? Suppose 2 sessions do a 'select' from the external table; they don't interfere with each other in any way, right?

2. Suppose one session is in the middle of a select * from a million row file, and another session comes along and does an ALTER TABLE LOCATION ('newfile.dat'). Would the first session get confused or would the second session block?

Thanks

Tom Kyte
August 11, 2004 - 10:13 am UTC

1) they are read only -- other than "io will be contended for" they do not "bother" each other

2) once a plan is in place, a plan is in place and nothing is going to alter the course of the guy executing that plan. no confusion.

Concurrent access

A reader, August 11, 2004 - 11:17 am UTC

"once a plan is in place, a plan is in place and nothing is going to alter the course of the guy executing that plan. no confusion"

Right, I don't doubt that the first session will continue to keep fetching the 1 million rows.

But when this is going on, would the second session's

ALTER TABLE EXT_TABLE LOCATION('new.dat')

succeed?

What if the first session has paused the fetching using 'sqlplus' pause or something? And resumes fetching after the LOCATION has changed? Would the first session error out because it is now pointing to a different file?

Tom Kyte
August 11, 2004 - 1:33 pm UTC

the first session has a plan -- plan is "read this file foo.dat".  that plan is not affected by an alter -- the plan is fixed in place.  no matter what happens -- plan will stay "read this file foo.dat" until the query completes.

ops$tkyte@ORA9IR2> create or replace directory data_dir as '/tmp/'
  2  /
 
Directory created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table external_table
  2  (EMPNO NUMBER(4) ,
  3   ENAME VARCHAR2(10),
  4   JOB VARCHAR2(9),
  5   MGR NUMBER(4),
  6   HIREDATE DATE,
  7   SAL NUMBER(7, 2),
  8   COMM NUMBER(7, 2),
  9   DEPTNO NUMBER(2)
 10  )
 11  ORGANIZATION EXTERNAL
 12  ( type oracle_loader
 13    default directory data_dir
 14    access parameters
 15    ( fields terminated by ',' )
 16    location ('emp.dat')
 17  )
 18  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> host flat scott/tiger emp > /tmp/emp.dat
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable x refcursor
ops$tkyte@ORA9IR2> variable y refcursor
ops$tkyte@ORA9IR2> exec open :x for select empno from external_table;
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> alter table external_table location( 'new.dat' );
 
Table altered.
 
ops$tkyte@ORA9IR2> exec open :y for select empno from external_table;
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> print x
 
     EMPNO
----------
      7369
      7499
      7521
      7566
      7654
      7698
      7782
      7788
      7839
      7844
      7876
      7900
      7902
      7934
 
14 rows selected.
 
ops$tkyte@ORA9IR2> print y
ERROR:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file new.dat in DATA_DIR not found
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1
 
 
 
no rows selected
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table external_table;
 
Table dropped.
 
 

Log file directory - Oracle 9.2.0.4

Raja, August 12, 2004 - 3:01 pm UTC

Hi Tom,
Just wanted to know if a separate logfile directory could be specified in the external table DDL, and how. If not, is there any other way to get around this?

Thanks,
Raja

Tom Kyte
August 12, 2004 - 3:28 pm UTC

organization external
(type oracle_loader
default directory my_dir
access parameters
(records delimited by newline
badfile my_bad_dir:'ext_vende%a_%p.bad'
logfile my_log_dir:'ext_vende%a_%p.log'

fields terminated by ','
missing field values are null
(

you can specify the directory, yes.

Log file directory - Oracle 9.2.0.4

Raja, August 12, 2004 - 4:05 pm UTC

Thanks Tom it works!!

array processing external files

A reader, September 19, 2004 - 3:00 pm UTC

Hi

We have databases in Oracle 8i and 9i. We process text files daily with Pro*C. With 9i this is easy: we make use of external tables. However, in 8i we process the text files line by line and it's not as fast.

My question is, is it possible to process arrays of lines with Pro*C and text files?

Tom Kyte
September 19, 2004 - 3:15 pm UTC

sure, C is C is C...

in C you can fread as much data as you want -- you have infinite control.

Not sure, since Oracle would not even be involved in the reading of an OS file via Pro*C directly -- it is all about C at that point, what else to say...

fread vs fget

A reader, September 19, 2004 - 4:12 pm UTC

Hi

I am afraid we aren't using fread since the text file has variable line lengths, so we must use fgets. It's hard to modify the text format since these files come from other companies.

I guess this is not related to Oracle, as you say. However, do you know if it's possible to read arrays of lines using fgets?


Thx a lot


Tom Kyte
September 19, 2004 - 5:09 pm UTC

fread plus strchr can do wonders.........

parsing a line in C is simple.


just read a big chunk, and strchr looking for newlines

Spaces not getting loaded

DAP, September 23, 2004 - 6:21 pm UTC

I'm trying to load a space from an external file into a table, but the space is getting treated as a null.  Is there an access parameter to make this work?

DATAFILE:
--------
abcde fghi


TABLE DEFINITION:
----------------
SQL> desc a
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 A                                         NOT NULL VARCHAR2(5)
 B                                         NOT NULL VARCHAR2(1)
 C                                         NOT NULL VARCHAR2(4)


EXTERNAL FILE DEFINITION:
------------------------
drop table a_load;
create table a_load
                           (a  varchar2(5),
                            b  varchar2(1),
                            c  varchar2(4))
  ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_dir
                         ACCESS PARAMETERS (RECORDS delimited by newline 
                                            FIELDS
                           (a  char(5),
                            b  char(1),
                            c  char(4)))
                          LOCATION ('a')) nologging;

insert into a
                           (a,
                            b,
                            c)
(SELECT
                            a,
                            b,
                            c
FROM                        a_LOAD);

COMMIT;


insert into a
*
ERROR at line 1:
ORA-01400: cannot insert NULL into ("CPUT"."A"."B")
 

Tom Kyte
September 24, 2004 - 10:04 am UTC

ops$tkyte@ORA9IR2> create table a_load
  2  (a  varchar2(5),
  3  b  varchar2(1),
  4  c  varchar2(4))
  5  ORGANIZATION EXTERNAL
  6  (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_dir
  7  ACCESS PARAMETERS (RECORDS delimited by newline
  8  FIELDS NOTRIM
  9  (
 10  a  char(5),
 11  b   char(1),
 12  c   char(4)))
 13  LOCATION ('a')) nologging;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select '"'||a||'"', '"'||b||'"', '"'||c||'"' from a_load;
 
'"'||A| '"' '"'||C
------- --- ------
"abcde" " " "fghi"
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into a
  2                             (a,
  3                              b,
  4                              c)
  5  (SELECT
  6                              a,
  7                              b,
  8                              c
  9  FROM                        a_LOAD);
 
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> COMMIT;
 
Commit complete.
 

9.2.0.5.0 prevents external tables reading from link (hard/symbolic)

Jim Dickson, September 24, 2004 - 6:22 am UTC

Can you reproduce this problem?
Under Standard Edition 9.2.0.3.0 on Solaris 2.8 an external table could happily read a file that was a symbolic link.

This appears to be no longer possible in 9.2.0.5.0.
I cannot find any documentation to support the theory that perhaps this was a security hole Oracle has plugged.

We extensively use external tables and symbolic links.

TEST CASE:
( You will have to substitute your directory name/path )

Since upgrading to 9.2.0.5.0, it fails.
hh-l:/home/jdickson/External_Tables.92050->more external_tables_link.sql
spool external_tables_link.out
set echo on
show user

!rm /scv/dev/external_tables/file1.psv
!rm /scv/dev/external_tables/file2.psv

!echo "file1" > /scv/dev/external_tables/file1.psv
!echo "file2" > /scv/dev/external_tables/file2.psv
!cat /scv/dev/external_tables/file1.psv
!cat /scv/dev/external_tables/file2.psv

drop table ext_table;
create table ext_table
(
col1 varchar2(15)
)
ORGANIZATION EXTERNAL
( type oracle_loader
default directory external_tables
access parameters
(
RECORDS DELIMITED BY NEWLINE
fields terminated by ',' NOTRIM
MISSING FIELD VALUES ARE NULL
)
location ('file0.psv')
)
reject limit 0
;

alter table ext_table location ('file1.psv');
select * from ext_table;

alter table ext_table location ('file2.psv');
select * from ext_table;

!rm /scv/dev/external_tables/file2.psv
!ln /scv/dev/external_tables/file1.psv /scv/dev/external_tables/file2.psv
!cat /scv/dev/external_tables/file1.psv
!cat /scv/dev/external_tables/file2.psv


select * from ext_table;

spool off
exit;

RESULTS:
jdickson@DV01> show user
USER is "JDICKSON"
jdickson@DV01>
jdickson@DV01> !rm /scv/dev/external_tables/file1.psv

jdickson@DV01> !rm /scv/dev/external_tables/file2.psv

jdickson@DV01>
jdickson@DV01> !echo "file1" > /scv/dev/external_tables/file1.psv

jdickson@DV01> !echo "file2" > /scv/dev/external_tables/file2.psv

jdickson@DV01> !cat /scv/dev/external_tables/file1.psv

jdickson@DV01> !cat /scv/dev/external_tables/file2.psv

jdickson@DV01>
jdickson@DV01> drop table ext_table;

Table dropped.

jdickson@DV01> create table ext_table
2 (
3 col1 varchar2(15)
4 )
5 ORGANIZATION EXTERNAL
6 ( type oracle_loader
7 default directory external_tables
8 access parameters
9 (
10 RECORDS DELIMITED BY NEWLINE
11 fields terminated by ',' NOTRIM
12 MISSING FIELD VALUES ARE NULL
13 )
14 location ('file0.psv')
15 )
16 reject limit 0
17 ;

Table created.

jdickson@DV01>
jdickson@DV01> alter table ext_table location ('file1.psv');

Table altered.

jdickson@DV01> select * from ext_table;

COL1
---------------
file1

jdickson@DV01>
jdickson@DV01> alter table ext_table location ('file2.psv');

Table altered.

jdickson@DV01> select * from ext_table;

COL1
---------------
file2
jdickson@DV01>
jdickson@DV01> !rm /scv/dev/external_tables/file2.psv

jdickson@DV01> !ln /scv/dev/external_tables/file1.psv /scv/dev/external_tables/file2.psv

jdickson@DV01> !cat /scv/dev/external_tables/file1.psv

jdickson@DV01> !cat /scv/dev/external_tables/file2.psv

jdickson@DV01>
jdickson@DV01>
jdickson@DV01> select * from ext_table;
select * from ext_table
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04001: error opening file /scv/dev/external_tables/file2.psv
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

jdickson@DV01>
jdickson@DV01> spool off

PLUS
hh-l:/home/jdickson/External_Tables.92050->tail /scv/dev/external_tables/EXT_TABLE_19785.log
Record format DELIMITED BY NEWLINE
Data in file has same endianness as the platform
Rows with all null fields are accepted

Fields in Data Source:

COL1 CHAR (255)
Terminated by ","
KUP-04001: error opening file /scv/dev/external_tables/file2.psv
KUP-04017: OS message: Permission denied


Tom Kyte
September 24, 2004 - 11:14 am UTC

confirmed in 10g and 9205 -- not sure if this was the expected outcome or not. You'll want to contact support and ask them about:

BugTag: Support notes on <Bug:2290323> - DDR info <BugDesc:2290323>
Affects: RDBMS (90-A0)
NB: FIXED
Abstract: EXTERNAL TABLE parameters should disallow symbolic links
Fixed-Releases: 9202 A000
Tags: EXTTAB
Details:
EXTERNAL TABLE parameters should disallow symbolic links
(for security reasons).
Note: This fix is modified slightly in 9205


the fix was for files written by external tables in 9202 (log, bad, etc) -- the 'modification' in 9205 seems to be affecting a read, but I didn't see any more information than that. They'll have to research it a little to see if this is an unexpected behaviour or the behaviour going forward.

External tables are awesome

Scott, October 10, 2004 - 11:11 pm UTC

G'day Tom.

I created my first external table today, and it worked first time (well, almost, after I defined one of my columns to the correct width...)

Great feature, and thanks for all these examples.

Skip Column - Filler in External Tables

Jan, October 21, 2004 - 5:55 am UTC

How do I skip a column if it has a variable size but is delimited by e.g. ',' (without using PL/SQL or such tricks)?

Thanks

Tom Kyte
October 21, 2004 - 7:02 am UTC

in Loader yes, but in External Tables?

Jan, October 21, 2004 - 8:16 am UTC

FILLER seems to work for Loader only, or I have a syntax error somewhere...
Please correct me where I made a mistake.
I have a text file with 3 columns, but in my external table I want to have just the 2nd and 3rd columns.

Thanks.


CREATE TABLE T_EXT
(
col2 VARCHAR2(10),
col3 VARCHAR2(10)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY MY_DIR
ACCESS PARAMETERS
( FIELDS TERMINATED BY '|' RTRIM
MISSING FIELD VALUES ARE NULL (col1 FILLER CHAR, col2, col3)
)
LOCATION (MY_DIR:'text.txt')
)

Tom Kyte
October 21, 2004 - 2:24 pm UTC

oh, didn't see EXTERNAL table in the 'subject', but isn't the answer sort of obvious?


instead of select * from et;
select a, c from et;

to skip b? seems "intuitive"?



but, if you have a control file:

LOAD DATA
INFILE *
INTO TABLE t
replace
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
a ,
b filler,
c
)


all you need to do is run sqlldr ..... external_table=generate_only to see what you would need to do.
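
For example (a hypothetical invocation; no data is loaded -- the generated CREATE TABLE ... ORGANIZATION EXTERNAL statement, with the FILLER field handled for you, is written to the log file):

sqlldr scott/tiger control=t.ctl log=t.log external_table=generate_only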

Is SQL*Loader My Only Choice for Normalization?

Doug Wingate, October 26, 2004 - 7:48 pm UTC

Tom,

I've been asked to extract records from a legacy application implemented in VSAM and written in COBOL and to load the information into an Oracle schema (as the first step to construction of a new data warehouse). One of the dubious features of COBOL is that it encourages storage of unnormalized information by allowing one to implement a one-to-many relationship as a record description containing a mixture of nonrepeating and repeating fields or groups of fields. My plan for the numerous extractions and loads I have to do was to extract whole, intact records, FTP them to the Oracle data warehouse server, and use external tables as the means to read the raw extract files. However, although I think I know how to load the files into multiple tables in a normalized way using SQL*Loader, I haven't been able to find syntax for the access parameters of external tables that would allow me to create normalized external tables on top of the unnormalized external files.

In case that description isn't clear, here is a high-level description of a record type of the kind I want to be able to read by means of multiple external tables. Each record contains

(1) several fields forming a composite key to the record,
(2) several nonrepeating fields,
(3) a group of fields (a, b, c) that repeat a fixed number of times as (a1, b1, c1, a2, b2, c2, a3, b3, c3), and
(4) several additional nonrepeating fields.

Each record type and all of its parts are physically fixed length. Variation in the number of logical occurrences in (3) is realized physically as zeroed or blanked fields in the trailing physical occurrences.

As I said, I've been asked to normalize the information when I load it. As the first step, I would like to be able to read each extract file via multiple external tables that give each unnormalized external file the appearance of being multiple normalized tables. In the case of the example I'm using, I would like to create external tables that display the information like this:

External Table #1
(1) composite key (2) non-repeaters (4) non-repeaters

External Table #2
(1) composite key (1b) 1 (3) a1, b1, c1
(1) composite key (1b) 2 (3) a2, b2, c2
(1) composite key (1b) 3 (3) a3, b3, c3

"Column" 1b is intended to preserve whatever information may be implicit in the physical order of the occurrences.

Describing external table #1 doesn't appear to be a problem. However, I haven't been able to find a combination of syntactic elements that will allow me to describe the relationship between an external file containing unnormalized information and an external table like the one described in #2. Do you know of external table syntax that I've overlooked and that would allow me to describe a table like #2?

Thanks as always.

Doug Wingate



Tom Kyte
October 26, 2004 - 8:37 pm UTC

can you give me an example of an input record or two or three (single char(1) fields to keep it small) and an example of what tables you want to put it into.

this sounds just like a multi-table insert.

Afterthought to Preceding Problem Description

Doug Wingate, October 26, 2004 - 7:53 pm UTC

I should have included mention of the fact that the ETL project I'm working on is being implemented in Oracle Database 10g. Thanks.

Mock-Up Example You Asked For

Doug Wingate, October 27, 2004 - 2:05 pm UTC

Here are tables intended to be modeled on the "example" I used above.

CREATE TABLE Table_for_Nonrepeating_Fields
(Key_Col_1 NUMBER(2,0),
Nonrepeating_Col_2A CHAR(2),
Nonrepeating_Col_2B CHAR(1),
Nonrepeating_Col_4A NUMBER(3,0),
Nonrepeating_Col_4B CHAR(1));

CREATE TABLE Table_for_Repeating_Groups
(Key_Col_1 NUMBER (2,0),
Occurrence_Key_Col_1B NUMBER(1,0),
Repeating_Col_3A NUMBER(2,0),
Repeating_Col_3B CHAR(1),
Repeating_Col_3C CHAR(1));

The corresponding input format looks like this:

Key_Col_1
Nonrepeating_Col_2A
Nonrepeating_Col_2B
Repeating_Col_3A
Repeating_Col_3B
Repeating_Col_3C
Repeating_Col_3A
Repeating_Col_3B
Repeating_Col_3C
Repeating_Col_3A
Repeating_Col_3B
Repeating_Col_3C
Nonrepeating_Col_4A
Nonrepeating_Col_4B

Here are a couple of sample records. For discussion, I've introduced whitespace and made all of the numbers "external" instead of packed. However, my actual records have fixed-length fields without whitespace and do contain IBM-type packed decimal numbers. Please note that the iteration is physically represented in the record and I have in mind to represent the physical order logically with column Occurrence_Key_Col_1B NUMBER(1,0).

12 AB A 12 A A 34 B B 56 C C 123 A
34 CD C 78 C D 90 E F 12 G H 456 C

I do understand that I could load this with a multi-table load using SQL*Loader. I could also define an unnormalized external table in which the repeating columns were represented with distinct names; then I could insert from the external table into multiple tables using a multi-table INSERT. I'm wondering if there's a way to define external tables like the two (ordinary) tables above so that the record contents appear in normal form without having to be written to new tables as a first step. In this respect, the syntax of external table access parameters doesn't seem to be as powerful as the syntax of SQL*Loader control files, but I'm hoping you know a trick or two.

I hope this is all somewhat clearer with the mock-up example.

Thanks,
Doug Wingate

Tom Kyte
October 27, 2004 - 4:17 pm UTC

set up the external table so that you can:

ops$tkyte@ORA9IR2> select * from et;
 
KEY NON NON REP REP REP REP REP REP REP REP REP NON NON
--- --- --- --- --- --- --- --- --- --- --- --- --- ---
12  AB  A   12  A   A   34  B   B   56  C   C   123 A
34  CD  C   78  C   D   90  E   F   12  G   H   456 C
 

ops$tkyte@ORA9IR2> insert all
  2  into Table_for_Nonrepeating_Fields values
  3  ( key_col_1, Nonrepeating_Col_2A, Nonrepeating_Col_2B, Nonrepeating_Col_4A, Nonrepeating_Col_4B )
  4  into Table_for_Repeating_Groups values
  5  ( key_col_1, 1, Repeating_Col_3A1, Repeating_Col_3B1, Repeating_Col_3C1 )
  6  into Table_for_Repeating_Groups values
  7  ( key_col_1, 2, Repeating_Col_3A2, Repeating_Col_3B2, Repeating_Col_3C2 )
  8  into Table_for_Repeating_Groups values
  9  ( key_col_1, 3, Repeating_Col_3A3, Repeating_Col_3B3, Repeating_Col_3C3 )
 10  select * from et;
 
8 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from Table_for_Nonrepeating_Fields;
 
 KEY_COL_1 NO N NONREPEATING_COL_4A N
---------- -- - ------------------- -
        12 AB A                 123 A
        34 CD C                 456 C
 
ops$tkyte@ORA9IR2> select * from Table_for_Repeating_Groups;
 
 KEY_COL_1 OCCURRENCE_KEY_COL_1B REPEATING_COL_3A R R
---------- --------------------- ---------------- - -
        12                     1               12 A A
        34                     1               78 C D
        12                     2               34 B B
        34                     2               90 E F
        12                     3               56 C C
        34                     3               12 G H
 
6 rows selected.


 

Thanks

Doug Wingate, October 27, 2004 - 8:11 pm UTC

All right, thanks, Tom. I was hoping there was a way to overlay normalized external tables on unnormalized external files. However, even though it seems there's not, I can see that by using external tables such as the one you've made (and leaving normalization to be accomplished in the process of insertion), I can at least accomplish two other goals: (1) I can provide human-readable views of the EBCDIC, packed contents of the external files prior to their insertion into the database (by letting the Oracle_Loader access driver make the characterset and numeric datatype conversions) and (2) I can internalize the load process to the instance instead of using a utility. By themselves, those still seem like worthy reasons to use external tables.

Load only different records

A reader, November 04, 2004 - 4:00 pm UTC

I have a table like

create table cust_hist(cust_id int,modified_dt date,i int,j int,k int);

i.e. all modifications of the cust_id are timestamped and stored in the table.

These changes come from a vendor-supplied external file. I have created an external table based on this file.

Question:

How can I load only records from the file/table that are different than the latest (based on modified_dt) record for that cust_id? Of course, if the cust_id doesn't exist in my table, load it in.

I tried something like

select * from ext_Table
where (...) not in (select * from (select max(modified_dt) over (partition by cust_id) max_dt,a.* from cust_hist a) where modified_dt=max_dt)

but nulls are creating havoc?

Thanks

Tom Kyte
November 05, 2004 - 3:05 pm UTC

ops$tkyte@ORA9IR2> create table cust_hist(cust_id int,modified_dt date,i int,j int,k int);
 
Table created.
 
ops$tkyte@ORA9IR2> create table et(cust_id int,modified_dt date,i int,j int,k int);
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into et
  2  select rownum, created, user_id, user_id, user_id
  3   from all_users;
 
20 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> merge into cust_hist
  2  using ( select et.*
  3            from et left OUTER join cust_hist on (et.cust_id = cust_hist.cust_id)
  4           where et.modified_dt > nvl(cust_hist.modified_dt,et.modified_dt-1) ) et
  5  on (cust_hist.cust_id = et.cust_id)
  6  when matched then update set modified_dt = et.modified_dt, i = et.i, j = et.j, k = et.k
  7  when not matched then insert values ( et.cust_id, et.modified_dt, et.i, et.j, et.k )
  8  /
 
20 rows merged.
 
ops$tkyte@ORA9IR2> /
 
0 rows merged.
 
ops$tkyte@ORA9IR2> update et set modified_dt = sysdate, i = 55 where cust_id = 5;
 
1 row updated.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from cust_hist where cust_id = 5;
 
   CUST_ID MODIFIED_          I          J          K
---------- --------- ---------- ---------- ----------
         5 24-SEP-04         49         49         49
 
ops$tkyte@ORA9IR2> merge into cust_hist
  2  using ( select et.*
  3            from et left OUTER join cust_hist on (et.cust_id = cust_hist.cust_id)
  4           where et.modified_dt > nvl(cust_hist.modified_dt,et.modified_dt-1) ) et
  5  on (cust_hist.cust_id = et.cust_id)
  6  when matched then update set modified_dt = et.modified_dt, i = et.i, j = et.j, k = et.k
  7  when not matched then insert values ( et.cust_id, et.modified_dt, et.i, et.j, et.k )
  8  /
 
1 row merged.
 
ops$tkyte@ORA9IR2> select * from cust_hist where cust_id = 5;
 
   CUST_ID MODIFIED_          I          J          K
---------- --------- ---------- ---------- ----------
         5 05-NOV-04         55         49         49
 
ops$tkyte@ORA9IR2> insert into et values ( 100, sysdate, 1,2,3 );
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from cust_hist where cust_id = 100;
 
no rows selected
 
ops$tkyte@ORA9IR2> merge into cust_hist
  2  using ( select et.*
  3            from et left OUTER join cust_hist on (et.cust_id = cust_hist.cust_id)
  4           where et.modified_dt > nvl(cust_hist.modified_dt,et.modified_dt-1) ) et
  5  on (cust_hist.cust_id = et.cust_id)
  6  when matched then update set modified_dt = et.modified_dt, i = et.i, j = et.j, k = et.k
  7  when not matched then insert values ( et.cust_id, et.modified_dt, et.i, et.j, et.k )
  8  /
 
1 row merged.
 
ops$tkyte@ORA9IR2> select * from cust_hist where cust_id = 100;
 
   CUST_ID MODIFIED_          I          J          K
---------- --------- ---------- ---------- ----------
       100 05-NOV-04          1          2          3
 
 

A reader, November 05, 2004 - 3:36 pm UTC

Um, not exactly.

update et set modified_dt=sysdate,k=999 where cust_id=100;

Now when I run your MERGE, it updates the cust_id=100 row. I want it to create a NEW row with cust_id=100 and the new modified_dt (that's the reason for having the modified_dt; I don't want to lose any modifications to the cust_id, but I don't want to keep storing stuff if nothing changed!)

So, I never want to UPDATE my cust_hist table. I only want to keep appending new rows to the table if the row in the external table is different than the latest row (based on modified_dt) for that cust_id

Help? Thanks


Tom Kyte
November 05, 2004 - 5:48 pm UTC

the problem description was not "very clear".


insert into cust_hist
select * from et
where (cust_id, modified_dt) not in ( select cust_id, modified_dt from cust_hist);

period, done.

cust_id and modified_dt should BOTH be not null -- no null issues.

Not quite

A reader, November 05, 2004 - 6:06 pm UTC

Um, not quite again.

"insert into cust_hist
select * from et
where (cust_id, modified_dt) not in ( select cust_id, modified_dt from cust_hist);"

I don't know how to communicate this better. My vendor is giving me a "dupe" with only the modified_dt different! They insist that they cannot prevent this since they are giving us only a subset of columns and if some other column on the record changes, they ship it to us. But as far as we are concerned, it is a dupe.

In our example, if I have

cust_id=1,modified_dt=1/1/2004,i=1,j=1,k=1

They send me

cust_id=1,modified_dt=1/2/2004,i=1,j=1,k=1

Your SQL above will load this. But I dont want to load this since the content of the record (i/j/k) is the same as the latest record I already have for that cust_id.

I would want to load the following records

cust_id=1,modified_dt=1/2/2004,i=1,j=1,k=null
cust_id=1,modified_dt=1/2/2004,i=1,j=1,k=2
i.e. a different modified_dt (most likely in the future of what I already have, but it could be in the past too). More importantly, the "data" should be "different"!

Help? Thanks

Tom Kyte
November 05, 2004 - 6:29 pm UTC

ahh, that's more informative.


select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from cust_hist

shows the records to load (nulls are taken care of)

insert into cust_hist
select * from et
where cust_id in (select cust_id from (THAT_QUERY) );

if you need to get their modified date (unfortunately, we'd have to read that ET twice here)

if you can get away with just loading the CURRENT sysdate as the modified dt, you can:

insert into cust_hist
select x.*, sysdate
from (THAT_QUERY) x;




Brilliant!

A reader, November 05, 2004 - 8:23 pm UTC

Thanks! Nah, the external table is really small, so reading it twice is not a problem

Thanks again

A reader, November 05, 2004 - 9:33 pm UTC

Hm, wait a minute

select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from cust_hist

will give me all the records in the file that are different than any record I already have. I want the record from the file only if it is different than the latest record I have for that cust_id?

Tom Kyte
November 06, 2004 - 10:33 am UTC

this is the strangest "delta" file I've ever heard of then.

you have to join and compare

select et.*
  from et,
       ( select *
           from ( select cust_hist.*,
                         max(modified_dt) over (partition by cust_id) maxdt
                    from cust_hist )
          where modified_dt = maxdt ) cust_hist
 where et.cust_id = cust_hist.cust_id(+)
   and ( et.modified_dt > cust_hist.modified_dt or cust_hist.cust_id is null )
   and ( ( cust_hist.cust_id is null ) OR
         ( et.i <> cust_hist.i OR (et.i is null and cust_hist.i is not null) OR
           (et.i is not null and cust_hist.i is null) ) OR
         ( et.j <> cust_hist.j OR (et.j is null and cust_hist.j is not null) OR
           (et.j is not null and cust_hist.j is null) ) OR
         .......
       )

Outer join ET to CUST_HIST.

Keep the records where et.modified_dt exceeds the cust_hist.modified_dt OR the cust_hist.cust_id is NULL (new record).

Then -- look to see if this is a new record OR i's differ OR j's differ and so on....

Deltafile

A reader, November 06, 2004 - 2:51 pm UTC

I already told you the reason...

My vendor is giving me a "dupe" with only the modified_dt different! They insist that they cannot prevent this since
they are giving us only a subset of columns and if some other column on the record changes, they ship it to us. But as far as we are concerned, it is a dupe. So I want to load the record only if something is different than the last version of the record I have!

It's not that strange?

I was hoping that query wouldn't be hairy (there are 50 columns), but I guess there is no other way

Thanks

A reader, November 06, 2004 - 3:58 pm UTC

Instead, cant I do

select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from
(select max(modified_dt) over (partition by cust_id) max_dt,
a.* from cust_hist a)
where max_dt=modified_dt

insert into cust_hist
select * from et
where cust_id in (select cust_id from (THAT_QUERY));

I like the way MINUS takes care of all the nulls for me, doing (a is null and b is not null ...) for 50 columns is a little painful?

Would this work? Thanks

Tom Kyte
November 06, 2004 - 6:03 pm UTC

no, for the same reason the simple minus won't work (as far as I can tell..)

A reader, November 07, 2004 - 8:18 pm UTC

"no, for the same reason the simple minus wont work (as far as I can tell..)"

Um, I am a little lost with all the back and forth we have done on this.

What is the reason this won't work?

Tom Kyte
November 08, 2004 - 9:44 am UTC

You said:

<quote>
Hm, wait a minute

select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from cust_hist

will give me all the records in the file that are different than any record
I already have. I want the record from the file only if it is different than the
latest record I have for that cust_id?
</quote>

Well:

select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from
(select max(modified_dt) over (partition by cust_id) max_dt,
a.* from cust_hist a)
where max_dt=modified_dt

will give you even MORE records -- so, if the first one gave you too many, this'll give you even "more"


btw: should have mentioned


where (a.c <> b.c OR (a.c is null and b.c is not null) OR (a.c is not null and b.c is null))


can be expressed as:

where decode( a.c, b.c, 1, 0 ) = 0;

decode treats NULLs as the same, so that is like saying:

if ( a.c = b.c /* treating nulls as the same */ )
then
return 1
else
return 0
end if
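
Applied to the join above (sketched with the example's i/j/k columns), the per-column comparison collapses to:

and (    cust_hist.cust_id is null
      OR decode( et.i, cust_hist.i, 1, 0 ) = 0
      OR decode( et.j, cust_hist.j, 1, 0 ) = 0
      OR decode( et.k, cust_hist.k, 1, 0 ) = 0 )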

A reader, November 08, 2004 - 9:48 am UTC

select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from
(select max(modified_dt) over (partition by cust_id) max_dt,
a.* from cust_hist a)
where max_dt=modified_dt

will give you even MORE records -- so, if the first one gave you too many, this'll give you even "more"

Maybe I need my first cup of coffee, but I don't understand why this would give me more records than the first one? I am just selecting the latest record for each cust_id, and minusing that out from all the records? This would give me all the records in my file that are different than the latest record for that cust_id. Exactly what I wanted? Your MINUS would give me records which are different than ANY record for that cust_id?

Tom Kyte
November 08, 2004 - 4:52 pm UTC

if this:

select cust_id, i, j, k from ET
MINUS
select cust_id, i, j, k from cust_hist

gives you more than you want, then that one (which selects FEWER records from cust_hist) gives at least as many records from ET, if not more (since you are selecting fewer records from cust_hist, fewer rows get subtracted, so at least as many ET records, if not more, survive the MINUS)

A reader, November 08, 2004 - 4:57 pm UTC

But it's not about which query gives me more or less records. It is about which one is more accurate, given my requirements?

As I said, my version of the MINUS would give me all
the records in my file that are different than the latest record for that cust_id. Exactly what I wanted? Your MINUS would give me records which are different than ANY record for
that cust_id?

What am I missing? Thanks

Tom Kyte
November 08, 2004 - 8:35 pm UTC


you want "older records"?

what if in the ET, there is a record for 01-jan-2004

and the current record in your cust_hist is for 04-jan-2004


do you want that record from the ET?


(i see what you are saying now)

A reader, November 08, 2004 - 9:31 pm UTC

"what if in the ET, there is a record for 01-jan-2004
and the current record in your cust_hist is for 04-jan-2004
do you want that record from the ET?"

Yes

As I said earlier, ..."i.e. a different modified_dt (most likely in the future of what I already have, but it could be in the past too). More importantly, the "data" should be
"different"!


So, given all this, is my version of the MINUS query the best way to do this?

Thanks

Tom Kyte
November 08, 2004 - 9:38 pm UTC

but your minus does not filter out "old different" data -- which is the crux of my prior statements.

it only filters out "last modified_dt differences"

We would be back to a simple


select ... from et
minus
select ... from cust_hist


to get:


i.e. a different modified_dt (most likely in the future
of what I already have, but it could be in the past too). More importantly, the
"data" should be
"different"!


if your last modified_dt is 04-jan-2004 for cust_id = 123

And they give you a record with a modified_dt of 01-jan-2004 for cust_id = 123

and the data is different in that record

AND YOU HAVE a cust_hist record with a modified_dt of 01-jan-2004 for cust_id = 123

did you really want to update it?




A reader, November 08, 2004 - 9:48 pm UTC

Well, my cust_hist table is NEVER updated, just inserted into. It has a surrogate PK. So, yes, if they give me a cust_id=123 record dated 1/1/2004, I would just want to generate a new surrogate PK (sequence) and just stuff it in. Typically, though, I would never get a modified_dt from the vendor older than what I already have.

As I said, right now, we are inserting tons of dupes every day because all these checks are not there. I just want to minimize putting in data if nothing has changed.

So, to answer your question,

"if your last modified_dt is 04-jan-2004 for cust_id = 123
And they give you a record with a modified_dt of 01-jan-2004 for cust_id = 123
and the data is different in that record
AND YOU HAVE a cust_hist record with a modified_dt of 01-jan-2004 for cust_id = 123
did you really want to update it?"

In this case, I would just generate a new PK for the record in my cust_hist table and add it in because it is different than the latest record I have for that cust_id?

So, given all this, which would work? Your MINUS or mine? I am still not sure. Thanks

Tom Kyte
November 09, 2004 - 8:29 am UTC

well, then, do you really care about the "last modified dt"?

why are you not just diffing the ET with CUST_HIST by your entire record, including the modified_dt?

If you have it, skip it; if you don't, you appear to want it.
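
In SQL, a sketch of that final suggestion using the demo tables from above (MINUS treats NULLs as equal, so none of the 50 columns needs special null handling; a real load would also populate the poster's surrogate key, e.g. from a sequence):

insert into cust_hist
select cust_id, modified_dt, i, j, k from et
minus
select cust_id, modified_dt, i, j, k from cust_hist;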

How to discard records starting with TAB or' ['?

Arun Gupta, November 11, 2004 - 11:08 am UTC

Tom,
I am working on creating an external table. I want to reject records starting with a TAB character (0x09) or starting with '['. I tried LOAD WHEN, but the syntax eludes me. The records are delimited by newline and the fields are terminated by whitespace.

Thanks

Tom Kyte
November 11, 2004 - 11:16 am UTC

is column 1 part of the data on the other records?

I'd just as soon "map it"

and then query:


select * from et where column_1 not in ( chr(9), '[' );



Arun Gupta, November 11, 2004 - 11:42 pm UTC

Actually, the file I am trying to load contains logs generated by a web logging program. It was very easy: records delimited by newline and fields separated by whitespace. The correct pattern is:

Action Hostname [Timestamp] User details Other details(newline)
Action Hostname [Timestamp] User details Other details(newline)

Then I hit a road block when the logs suddenly broke the pattern. I hit upon
[Timestamp](newline)
(tab)other_detail_1(newline)
(tab)other_detail_2(newline)
(tab)other_detail_3(newline)

and then the correct pattern continues. The malformed pattern is found throughout the input file mixed with the correct pattern. I have to find a way to reject the malformed records. These records are security information and cannot be lost. They have to go into either badfile or discardfile as a whole without any modifications.

The Timestamp is enclosed in []. When Oracle finds the first record beginning with [, it puts this record into the badfile. Then it continues to load the other_detail_1, other_detail_2 and other_detail_3 into the database. This causes the malformed record to split, which is not acceptable.

One option I thought of was if somehow records beginning with [ and TAB can be rejected, the malformed records will go into the discardfile as a whole and will not be lost or modified. The condition_clause of LOAD WHEN does allow range comparison but I could not make it work. Something like
load when (1:1 != '0x09' OR 1:1 != '[') would be ideal.

Thanks...

Tom Kyte
November 12, 2004 - 7:03 am UTC

sure, you could do that or just map the first character as a column and say:

select * from et where c1 not in ( chr(9), '[' );


which is what I'm going to recommend as the path of least resistance for now :)
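
A sketch of "mapping it" (directory and file names hypothetical -- the first character becomes its own column, with the whole record kept beside it for further parsing):

create table et
( c1  varchar2(1),
  txt varchar2(4000)
)
organization external
( type oracle_loader
  default directory my_dir
  access parameters
  ( records delimited by newline
    fields
    missing field values are null
    ( c1  position(1:1)    char(1),
      txt position(1:4000) char(4000)
    )
  )
  location ('test.log')
);

select * from et where c1 not in ( chr(9), '[' );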

Another discard related question

Paul, November 15, 2004 - 3:30 am UTC

I have attempted to use the external table process as a replacement for SQL*Loader. It was progressing well until I bumped into an issue with NULLs.

I am loading the following table:
CREATE TABLE TEST_TBL
(CLAIM_ID VARCHAR2(38) NOT NULL,
DIAG_SEQUENCE NUMBER(10) NOT NULL,
POLICY_HOLDER_ID VARCHAR2(26) NOT NULL,
SUFFIX_ID VARCHAR2(4) NOT NULL,
DIAG_CD VARCHAR2(7) NOT NULL,
DISCHARGE_DATE DATE );

Notice that most of the fields are NOT NULL.

I am using the following external definition:

Create table Test_Tbl_Ext
(CLAIM_ID VARCHAR2(38),
DIAG_SEQUENCE NUMBER(10),
POLICY_HOLDER_ID VARCHAR2(26),
SUFFIX_ID VARCHAR2(4),
DIAG_CD VARCHAR2(7),
DISCHARGE_DATE DATE
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY Test_external_tables
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
BADFILE 'Test_tbl.bad'
LOGFILE 'Test_tbl.log'
FIELDS TERMINATED BY ' '
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
( CLAIM_ID,
DIAG_SEQUENCE,
POLICY_HOLDER_ID,
SUFFIX_ID,
DIAG_CD,
DISCHARGE_DATE date mask "MM/DD/YYYY"
)
)
LOCATION ('Test_tbl.tab')
) REJECT LIMIT UNLIMITED;

As you can see, the records are tab delimited. In testing this, I use the basic "insert into test_tbl ( select * from test_tbl_ext );" syntax.

I was surprised to discover that while it behaves like SQL*Loader and rejects format, size and other data related issues, it does not handle the NOT NULL issue; it just stops with "ORA-01400: cannot insert NULL into ("TEST"."TEST_TBL"."DIAG_CD")".

Is there no good way to get an EXTERNAL table to handle the NULLs properly and put them in the bad file? Or is the answer to add NOT NULL tests to every column in the SELECT statement (which seems like overkill)?

Tom Kyte
November 15, 2004 - 6:42 am UTC

umm, you TOLD it that "missing field values are null"

but in any case -- this did handle the nulls properly? having a NULL value is not an error (until you go to load it)

but yes, using "where c1 is not null" would be appropriate here.
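For example, a sketch using the tables above (extend the predicate to every NOT NULL column that matters):

insert into test_tbl
select *
  from test_tbl_ext
 where claim_id is not null
   and diag_sequence is not null
   and policy_holder_id is not null
   and suffix_id is not null
   and diag_cd is not null;

Note that rows filtered out this way are skipped silently rather than written to the bad file; if they must be kept, a second insert of the complement (or a multi-table INSERT ALL ... WHEN) could route them to a reject table.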


Arun Gupta, November 15, 2004 - 10:27 am UTC

Tom,
I followed your suggested method and it works fine. However, I still wanted to experiment with LOAD WHEN and it doesn't seem to work. I used the syntax:

load when ((1:1) != '0x09' or (1:1) != '[')

As per documentation, this should put any records starting with either a TAB or '[' into the discardfile. The behaviour I am getting is that records beginning with '[' are being put in the badfile and records beginning with TAB are being loaded in the database. I have tried on 9i (Solaris), 10g (Linux). The statement is:
CREATE TABLE t_ext_sm_a
(
sm_eventid VARCHAR2(30),
sm_hostname VARCHAR2(30),
sm_timestamp VARCHAR2(30),
sm_username VARCHAR2(1000),
sm_agentname VARCHAR2(1000),
sm_status_sessid VARCHAR2(1000),
sm_reason VARCHAR2(1000),
sm_dummy1 VARCHAR2(100),
sm_dummy2 VARCHAR2(100),
sm_dummy3 VARCHAR2(100),
sm_dummy4 VARCHAR2(100)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY SM
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
BADFILE sm:'bad_records.txt'
DISCARDFILE sm:'discard_records.txt'
NOLOGFILE
load when ((1:1) != '0x09' or (1:1) != '[')
FIELDS TERMINATED BY WHITESPACE
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
(
sm_eventid char(30) terminated by whitespace,
sm_hostname char(30) terminated by whitespace,
sm_timestamp char terminated by whitespace enclosed by '[' and ']',
sm_username char(1000) terminated by whitespace enclosed by '"',
sm_agentname char(1000) terminated by whitespace optionally enclosed by '"',
sm_status_sessid char(1000) terminated by whitespace optionally enclosed by '[' and ']',
sm_reason char(1000) terminated by whitespace optionally enclosed by '[' and ']',
sm_dummy1 char(100) terminated by ',',
sm_dummy2 char(100) terminated by ',',
sm_dummy3 char(100) terminated by ',',
sm_dummy4 char(100) terminated by whitespace
)
)
location ('test.log'))
reject limit unlimited;

I also tried specifying the characterset as WE8MSWIN1252 since the file was generated on a Windows machine, but no success.
Thanks


Is this NOT an AND ?

Alan Stephen, November 15, 2004 - 12:06 pm UTC

I suspect that the OR should be an AND. The current expression will allow a row through if the first condition is true OR the second condition is. The '[' passes the first test, but fails the second.

Tom Kyte
November 15, 2004 - 3:49 pm UTC

very good :)
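So the corrected clause would read something like:

load when ((1:1) != 0X'09' AND (1:1) != '[')

Note also that '0x09' in quotes is the four-character literal string "0x09", not a tab; I believe the ORACLE_LOADER hex string syntax is 0X'09', but verify against the Utilities guide for your release. With the AND in place, records failing the condition should land in the discardfile whole, as required.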

Rearranging a record

A reader, November 23, 2004 - 7:13 am UTC

I have a file with 3 record types with different layouts. I want to rearrange the layout as per a mapping and output the new records

A......
B.....
C....

The mapping is

A5-A10 -> B6-B11
A11 -> C56

and so on

How can I use external tables and a mapping table to do this? Yes, I know I can't write to external tables in 9iR2, but I can write the records to a table and dump them out.

Or are Unix text processing tools like awk/perl better suited for this?

Thanks


Tom Kyte
November 23, 2004 - 7:49 am UTC

do you have a mapping table:

old new
---- ----
A5 B6
A6 B7
A7 B8
A8 B9
A9 B10
A10 B11
A11 C56

if you have that, you can

update ( select t1.c1 old_c1, t2.new new_c1
from existing_table t1, mapping_table t2
where t1.c1 = t2.old )
set old_c1 = new_c1;







A reader, November 23, 2004 - 9:19 am UTC

Guess I should have explained better: those A1, B1, etc. are not data elements, they are the record type and column positions of the data in the record!

For example, the file has records like

Aasajahaldkhadlahdakldatyu
Bjfksgajasdgalashdajhaadar

I want the records rearranged like

Ahsjdsajksadshakgdadadjassa
Bahsdsajjsahakgakjshsakgads

where the mapping is as per that table i.e. record A, column position 1-5 should go to record B column position 6-10 and so on

I don't see how your update solves the problem?

Tom Kyte
November 23, 2004 - 9:35 am UTC

you have two records? just edit them -- not sure where you are going with this, I've never really heard of exchanging fields between rows like this before.

but if you have just two records, do whatever -- unless there is more "hidden" information here, doesn't seem like a problem you'd even need a computer for.

A reader, November 23, 2004 - 9:46 am UTC

I have 1000s of records in a file, but 5 record types. This is a flat-file from a vendor. They changed the layout on us and I am trying to adjust. The layout changed as per the mapping I keep talking about. Basically, they moved fields around. So, when we send them the records, I want to do the same re-arranging.

Yes, this is purely a text/file processing problem, but I was hoping some cool external tables/SQL trick would do the job?

I already explained the full scope of the problem in my previous followup?

Tom Kyte
November 23, 2004 - 10:28 am UTC

not really (on the explain full scope)

is there "one" A record and "one" B record or does order matter or ......


give me a 10 record example with as much detail as possible.

A reader, November 23, 2004 - 11:37 am UTC

OK here goes

The file contains 10 records with the following format

1:1 = pk
2:2 = record type
3 onwards depends on the record type; it doesn't matter what the layout is, we want to access each one as a 1-character field anyway.

The input file is

1Aabcdxyz
1Bqwertyu
1Cpopopoy

The mapping says the data in record type A, position 3-6 needs to go on a B record on positions 10-13
B4-5 -> C3-4
C8-9 -> D6-7

So, given the above input, the output should be

1B abcd (10-13)
1Cwe (3-4)
1D oy (6-7)

i.e. fill up the rhs of the mapping from the lhs and pad with spaces as necessary. Not all information from the source record might be mapped on the target record. Multiple source positions might map to the same target position, in which case the last one wins (as ordered by pk,rec_type).

Since the target could be mapped from any source record type (for that pk), I guess some sort of break-processing is needed. i.e. wait for all source records for pk=1 to be filled up and then start to map and output the target records?

Help? Thanks

Tom Kyte
November 23, 2004 - 12:42 pm UTC

How many A records, B records, C records, etc -- or is there a map PER record (which sounds unreasonable given the number of records you say there would be)

and if there are many A records, many B records -- how do you know "which" is "what"

A reader, November 23, 2004 - 11:47 am UTC

The pk for each "set" is the pk+rec_type, so for id=1, there will be only 1 A record, 1 B record, etc.

Tom Kyte
November 23, 2004 - 1:32 pm UTC

ops$tkyte@ORA9IR2> select pk || typ ||
  2         case when typ = 'A' then arec
  3                  when typ = 'B' then rpad(nvl(substr(brec,1,9-2),' '),7) || nvl(substr(arec,3-2,4),rpad(' ',4)) || substr(brec,14-2)
  4                  when typ = 'C' then rpad(nvl(substr(crec,1,2-2),' '),0) || nvl(substr(brec,4-2,2),rpad(' ',2)) || substr(crec,5-2)
  5                  when typ = 'D' then rpad(nvl(substr(drec,1,6-2),' '),4) || nvl(substr(crec,8-2,2),rpad(' ',2)) || substr(drec,8-2)
  6                  end rec
  7    from (
  8  select pk, typ,
  9         max( case when typ='A' then data end ) over (partition by pk) arec,
 10         max( case when typ='B' then data end ) over (partition by pk) brec,
 11         max( case when typ='C' then data end ) over (partition by pk) crec,
 12         max( case when typ='D' then data end ) over (partition by pk) drec
 13    from t
 14         )
 15  /
 
REC
--------------------------------------------------------------
1Aabcdxyz
1Bqwertyuabcd
1Cwepopoy
1D    oy
 


would be a sql approach. 

A reader, November 23, 2004 - 1:34 pm UTC

"How many A records, B records, C records, etc -- or is there a map PER record"

1 record of each type per pk (pk+rec_type is unique)

Well, the map is per record type

Ax - Ay maps to Bm - Bn
for all A records

Think of it as one logical entity split over 5 physical records



Tom Kyte
November 23, 2004 - 2:28 pm UTC

sorry -- you've seen my "approach", i'm not following your lingo -- ax-ay maps to bm-bn ?

My approach using sql would be a variation on the above -- substr. See what you see with that, else I guess use whatever tool you see fit.

A reader, November 23, 2004 - 1:43 pm UTC

Your output doesn't match my required output.

1. If there is no mapping for the target position, it should be blank, the source record shouldn't "peek through".

2. The output shouldn't have the A record since it is not present in the target after mapping

3. How can I put the mapping into a table so that the query can be table driven?

4. In any case, can you please explain your solution so I can possibly adapt it for my needs?

Thanks

Tom Kyte
November 23, 2004 - 2:46 pm UTC

please -- use the approach, you should have everything you need to develop your own approach if you understood what I did.

I put the "old records" value there, should be pretty trivial for you to blank it out.

you can use a where clause to remove whatever you want.


All my thing is doing is making every record type (a,b,c,d,....) available to every row by PK value -- so PK = 1 and TYP=A record, it has the arec, brec, crec, drec, whatever.




A reader, November 23, 2004 - 3:53 pm UTC

Right, I understand your approach; my main question was how I can make it table-driven, i.e. your query has 3 concatenation terms (||) on each column because I specified 3 "transformation rules".

What if I want these mapping rules in a table and apply all the rules on each row? The table would look like

CREATE TABLE mapping
(
S_REC_TYPE VARCHAR2(1),
S_START INTEGER,
S_END INTEGER,
T_REC_TYPE VARCHAR2(1),
T_START INTEGER,
T_END INTEGER
)

Thanks

Tom Kyte
November 23, 2004 - 7:25 pm UTC

"join"

join by s_rec_type. you have your substr subscripts now.
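For instance, a sketch of that join -- assuming the raw records sit in a table src with a single text column, with the pk in position 1 and the record type in position 2 as you described:

select substr(s.text,1,1)                                 pk,
       m.t_rec_type,
       substr(s.text, m.s_start, m.s_end - m.s_start + 1) piece,
       m.t_start,
       m.t_end
  from src s, mapping m
 where substr(s.text,2,1) = m.s_rec_type;

Each row of the result says which slice of which source record goes where; stitching the slices into one target record per (pk, target record type) still needs an aggregation step, which the follow-ups below work through.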

in case you can tolerate another opinion ...

Gabe, November 23, 2004 - 5:15 pm UTC

To “A reader” …

You seem to really want an answer here … if you can take a bit of criticism … this piecemeal approach to _fully_ and _clearly_ define the context of your question (plus the absence of resources … create table, etc.) doesn’t usually help in getting a quick and complete answer from Tom. BTW, your requirements are still not fully and clearly specified.

In any case, maybe this will help a bit and get you started (my resources are at the end) …

Source Data:

flip@FLOP> select * from src t order by substr(text,1,2);

TEXT
---------------------------------------------------------------
1Aabcdxyz
1Bqwertyu
1Cpopopoy
1Dabcdefghijkl
1E1234567890
2Amnbvcxz
2Blmnbgre
2Ciuytree
2Dlkjghfdsaert
2Ethfsgj123456

Maps:

flip@FLOP> select * from map_s2t_vw;

MAPPING_RULE
-------------------------------------------
A3-6 -> B10-13
B4-5 -> C3-4
C8-9 -> D6-7
D10-14 -> E6-10

Results:

flip@FLOP> col text format a30
flip@FLOP> select t.pk,t.tgt_rt,collapse(collapse_expr(t.field,t.tgt_from,t.tgt_thru)) text
2 from (
3 select s.text
4 ,substr(s.text,1,1) pk
5 ,substr(s.text,2,1) rt
6 ,substr(s.text,m.src_from,m.src_thru-m.src_from+1) field
7 ,m.tgt_rt
8 ,m.tgt_from
9 ,m.tgt_thru
10 from src s, map_s2t m
11 where substr(s.text,2,1) = m.src_rt
12 ) t
13 group by t.pk,t.tgt_rt
14 order by t.pk,t.tgt_rt;

P T TEXT
- - ------------------------------
1 B _________abcd
1 C __we
1 D _____oy
1 E _____hijkl
2 B _________mnbv
2 C __mn
2 D _____ee
2 E _____saert

Notes:
1. the results are not deterministic … the order for collapsing is not deterministic … you would have to modify the “collapse_agg_type” to implement a _controlled_ collapsing (I didn’t do it because your requirement for ordering is ambiguous).
2. I put underscores instead of blanks (one can see them !!!)
3. There is a limit of 4000 characters.
4. I didn't bother to look into the correctness of odciaggregatemerge.

Resources:

create table src ( text varchar2(4000) not null );

create table map_s2t
( src_rt varchar2(1) not null check (src_rt in ('A','B','C','D','E'))
,src_from number(9) not null check (src_from>2)
,src_thru number(9) not null check (src_thru>2)
,tgt_rt varchar2(1) not null check (tgt_rt in ('A','B','C','D','E'))
,tgt_from number(9) not null check (tgt_from>2)
,tgt_thru number(9) not null check (tgt_thru>2)
,constraint chk_map check (src_thru-src_from = tgt_thru-tgt_from)
);

insert into map_s2t values ('A', 3, 6,'B',10,13);
insert into map_s2t values ('B', 4, 5,'C', 3, 4);
insert into map_s2t values ('C', 8, 9,'D', 6, 7);
insert into map_s2t values ('D',10,14,'E', 6,10);

create or replace view map_s2t_vw as
select decode( src_from,src_thru
,src_rt||to_char(src_from)||' -> '||tgt_rt||to_char(tgt_from)
,src_rt||to_char(src_from)||'-'||to_char(src_thru)||' -> '
||tgt_rt||to_char(tgt_from)||'-'||to_char(tgt_thru)
) mapping_rule
from map_s2t
;

--12345678901234
insert into src values ('1Aabcdxyz');
insert into src values ('1Bqwertyu');
insert into src values ('1Cpopopoy');
insert into src values ('1Dabcdefghijkl');
insert into src values ('1E1234567890');

insert into src values ('2Amnbvcxz');
insert into src values ('2Blmnbgre');
insert into src values ('2Ciuytree');
insert into src values ('2Dlkjghfdsaert');
insert into src values ('2Ethfsgj123456');

create or replace type collapse_expr as object
( str varchar2(4000)
,pos_from number(9)
,pos_thru number(9)
);
/
show errors

CREATE OR REPLACE TYPE collapse_agg_type AS OBJECT
(
str VARCHAR2 (4000),

STATIC FUNCTION odciaggregateinitialize (
sctx IN OUT collapse_agg_type)
RETURN NUMBER,

MEMBER FUNCTION odciaggregateiterate (
SELF IN OUT collapse_agg_type,
ctx IN collapse_expr)
RETURN NUMBER,

MEMBER FUNCTION odciaggregateterminate (
SELF IN collapse_agg_type,
returnvalue OUT VARCHAR2,
flags IN NUMBER)
RETURN NUMBER,

MEMBER FUNCTION odciaggregatemerge (
SELF IN OUT collapse_agg_type,
ctx2 collapse_agg_type)
RETURN NUMBER
);
/
show errors

CREATE OR REPLACE TYPE BODY collapse_agg_type
AS
STATIC FUNCTION odciaggregateinitialize (
sctx IN OUT collapse_agg_type)
RETURN NUMBER
IS
BEGIN
sctx := collapse_agg_type(null);
RETURN odciconst.success;
END;

MEMBER FUNCTION odciaggregateiterate (
SELF IN OUT collapse_agg_type,
ctx IN collapse_expr)
RETURN NUMBER
IS
len pls_integer;
BEGIN
if self.str is null then
self.str := lpad(ctx.str,ctx.pos_thru,'_');
else
len := length(self.str);

if len < ctx.pos_from then
self.str := self.str || lpad(ctx.str,ctx.pos_from-len-1,'_');
elsif len >= ctx.pos_thru then
self.str := substr(self.str,1,ctx.pos_from-1)||ctx.str||substr(self.str,ctx.pos_thru+1);
else
self.str := substr(self.str,1,ctx.pos_from-1)||ctx.str;
end if;
end if;
RETURN odciconst.success;
END;

MEMBER FUNCTION odciaggregateterminate (
SELF IN collapse_agg_type,
returnvalue OUT VARCHAR2,
flags IN NUMBER)
RETURN NUMBER
IS
BEGIN
returnvalue := SELF.str;
RETURN odciconst.success;
END;

MEMBER FUNCTION odciaggregatemerge (
SELF IN OUT collapse_agg_type,
ctx2 IN collapse_agg_type)
RETURN NUMBER
IS
BEGIN
SELF.str := SELF.str || ctx2.str;
RETURN odciconst.success;
END;
END;
/
show errors

CREATE OR REPLACE FUNCTION collapse ( ctx IN collapse_expr)
RETURN VARCHAR2 DETERMINISTIC PARALLEL_ENABLE
AGGREGATE USING collapse_agg_type;
/
show errors

Good Luck.

To Gabe

A reader, November 23, 2004 - 9:54 pm UTC

Gasp! That is exactly what I am looking for; wish I had put it that way the first time and not gone back and forth with Tom.

1. Why do you say the collapsing is not deterministic?
2. I don't really care about odciaggregatemerge, so I can just remove the parallel_enable?
3. Finally, I tried but didn't quite get how the aggregate stuff is working? What exactly does collapse(collapse_expr(t.field,t.tgt_from,t.tgt_thru)) do?

Thanks a lot!

To Gabe

A reader, November 23, 2004 - 10:40 pm UTC

I think I understand this now. I added another row to the mapping table

insert into map_s2t values ('A',7,9,'B', 3,5);

and did (in 10g)

set lines 100
col stragg format a40
col text format a40

SQL> select t.pk,t.tgt_rt,
  2  cast(collect(t.field||'/'||t.tgt_from||'/'||t.tgt_thru) as varchar2_tab) stragg,
  3  collapse(collapse_expr(t.field,t.tgt_from,t.tgt_thru)) text
  4   from (
  5   select s.text
  6         ,substr(s.text,1,1) pk
  7         ,substr(s.text,2,1) rt
  8         ,substr(s.text,m.src_from,m.src_thru-m.src_from+1) field
  9         ,m.tgt_rt
 10         ,m.tgt_from
 11         ,m.tgt_thru
 12   from   src s, map_s2t m
 13   where  substr(s.text,2,1) = m.src_rt
 14   ) t
 15   group by t.pk,t.tgt_rt
 16   order by t.pk,t.tgt_rt;

P T STRAGG                                             TEXT
_ _ __________________________________________________ ______________________________
1 B VARCHAR2_TAB('xyz/3/5', 'abcd/10/13')              __xyzabcd
1 C VARCHAR2_TAB('we/3/4')                             __we
1 D VARCHAR2_TAB('oy/6/7')                             _____oy
1 E VARCHAR2_TAB('hijkl/6/10')                         _____hijkl
2 B VARCHAR2_TAB('cxz/3/5', 'mnbv/10/13')              __cxzmnbv
2 C VARCHAR2_TAB('mn/3/4')                             __mn
2 D VARCHAR2_TAB('ee/6/7')                             _____ee
2 E VARCHAR2_TAB('saert/6/10')                         _____saert

8 rows selected.

I see that in the VARCHAR2_TAB('xyz/3/5', 'abcd/10/13'), the abcd/10/13 part is getting processed first and then the xyz/3/5, resulting in a final result of "__xyzabcd". The result should be "__xyz____abcd".

But how can the order in which the aggregate function gets it inputs be controlled?

Thanks 

To Gabe

A reader, November 23, 2004 - 11:00 pm UTC

SQL> select t.pk,t.tgt_rt,
  2  cast(collect(t.field||'/'||t.tgt_from||'/'||t.tgt_thru) as varchar2_tab) stragg,
  3  collapse(collapse_expr(t.field,t.tgt_from,t.tgt_thru))  text
  4   from (
  5   select s.text
  6         ,substr(s.text,1,1) pk
  7         ,substr(s.text,2,1) rt
  8         ,substr(s.text,m.src_from,m.src_thru-m.src_from+1) field
  9         ,m.tgt_rt
 10         ,m.tgt_from
 11         ,m.tgt_thru
 12   from   src s, <b>(select * from map_s2t order by tgt_rt,tgt_from)</b> m
 13   where  substr(s.text,2,1) = m.src_rt
 14   ) t
 15   group by t.pk,t.tgt_rt
 16   order by t.pk,t.tgt_rt;

P T STRAGG                                             TEXT
- - -------------------------------------------------- ------------------------------
1 B VARCHAR2_TAB('abcd/10/13', 'xyz/3/5')              --xyz----abcd
1 C VARCHAR2_TAB('we/3/4')                             --we
1 D VARCHAR2_TAB('oy/6/7')                             -----oy
1 E VARCHAR2_TAB('hijkl/6/10')                         -----hijkl
2 B VARCHAR2_TAB('mnbv/10/13', 'cxz/3/5')              --cxz----mnbv
2 C VARCHAR2_TAB('mn/3/4')                             --mn
2 D VARCHAR2_TAB('ee/6/7')                             -----ee
2 E VARCHAR2_TAB('saert/6/10')                         -----saert

seems to work as I want. Did I just get lucky or is that what you meant by controlled collapsing? 

good to see you got busy on your task ...

Gabe, November 24, 2004 - 12:07 am UTC

To "A reader" ...

Yes ... you did get lucky ... see Tom's approach
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:15637744429336#16551777586484
Basically you have to cache/array the fields during odciaggregateiterate and only assemble the final aggregate in odciaggregateterminate ... you can _control_ the collapsing in there by sorting the cache/array based on whatever criteria you decide (you may have to extend the collapse_expr object to send more stuff in).




To Gabe

A reader, November 24, 2004 - 9:15 am UTC

OK even without the 'order by' on the select from map_s2t, how about if I do this?

if len < ctx.pos_from then
self.str := self.str || rpad('_',ctx.pos_from-len,'_') || lpad(ctx.str,ctx.pos_from-len-1,'_');

That also seems to solve my deterministic/ordering issue? i.e. the results are the same regardless of the ordering of the rows in the map table? Comments?

Thanks

P.S: This is a really creative use of user-defined aggregates. Never thought of using them to do arbitrary string "aggregation" in this fashion! Opens up a lot of possibilities, thanks!

How to Pass unix parameter to sqlplus within a create external table statement..

Ananth, January 14, 2005 - 11:53 am UTC

The Unix variable giving the input file location for the external table is not being passed to the SQL LOCATION clause. Need your help .....

[oraerpp@test ap_inv]$ INPUT_DATA=`ls -mtr|cut -d',' -f1|head -1`

[oraerpp@test ap_inv]$ echo $INPUT_DATA
demo.dat

[oraerpp@test ap_inv]$ sqlplus apps/apps@erpp @ext_tab.sql '${INPUT_DATA}'

SQL*Plus: Release 9.2.0.3.0 - Production on Fri Jan 14 08:06:28 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production

Enter value for input_data: 
old  22: location('&INPUT_DATA'))
new  22: location(''))

Table created.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production
[oraerpp@test ap_inv]$ 
CREATE TABLE apps.ext_tab (
           empno    VARCHAR2(5 BYTE),
           ename    VARCHAR2(25 BYTE),
           job      VARCHAR2(10  BYTE),
           deptno   VARCHAR2(5  BYTE)
       )
ORGANIZATION EXTERNAL 
(TYPE oracle_loader DEFAULT DIRECTORY LOG_DIR_AP_INV 
ACCESS parameters 
(records delimited BY newline 
badfile LOG_DIR_AP_INV:'demodata%a_%p.bad' 
discardfile LOG_DIR_AP_INV:'demodata%a_%p.dis' 
LOGFILE LOG_DIR_AP_INV:'demodata%a_%p.log'
fields LRTRIM
MISSING FIELD VALUES ARE NULL 
(EMPNO     POSITION(1:5)       CHAR
,ENAME     POSITION(6:30)      CHAR
,JOB       POSITION(31:40)     CHAR
,DEPTNO    POSITION(41:45)     CHAR
)
) 
location('&INPUT_DATA'))
REJECT LIMIT UNLIMITED NOPARALLEL;

Need help to pass location as parameter from unix to sqlplus 

Tom Kyte
January 14, 2005 - 7:59 pm UTC

it would be &1 as a parameter to the script, not the name of your environment variable.


put

define INPUT_DATA=&1

at the top of your script.
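That is, something like this (a sketch -- note the single quotes in the original invocation also stop the shell from expanding ${INPUT_DATA}, so use double quotes or none):

[oraerpp@test ap_inv]$ sqlplus apps/apps@erpp @ext_tab.sql "${INPUT_DATA}"

and at the top of ext_tab.sql:

define INPUT_DATA=&1

after which the existing location('&INPUT_DATA') clause will see the file name.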

Validating external tables

A reader, January 17, 2005 - 12:27 pm UTC

I have a set of 4 files that need to be cross-checked against each other to be considered "good". Also, each file has a header/trailer with sums to ensure that the file is self-consistent.

If I create these 4 files as external tables and do all of the above, the files would, in effect, be "loaded" multiple times during the course of the validations.

In this case, would it make sense to just use traditional sqlldr and load the files into "staging" tables instead of using the external table feature?

Thanks

Tom Kyte
January 17, 2005 - 5:18 pm UTC

insufficient data to answer -- but probably just use the ETs as if they were your staging tables. You'll be full scanning SOMETHING multiple times. So why add an EXTRA full scan (to load into stage)?

Full scanning something

A reader, January 17, 2005 - 7:55 pm UTC

"You'll be full scanning SOMETHING multiple times"

Right, but it doesn't quite "feel" right to invoke the access driver multiple times for the same file?

In terms of resource usage, wouldn't it be more efficient to do a traditional load into staging and then FTS the staging tables multiple times?

Tom Kyte
January 17, 2005 - 8:41 pm UTC

Maybe, maybe not -- I'd benchmark your situation....


ops$tkyte@ORA9IR2> drop table stage;
 
Table dropped.
 
ops$tkyte@ORA9IR2> REM drop table external_table;
ops$tkyte@ORA9IR2> REM drop table emp;
ops$tkyte@ORA9IR2> REM
ops$tkyte@ORA9IR2> REM create or replace directory data_dir as '/tmp/'
ops$tkyte@ORA9IR2> REM /
ops$tkyte@ORA9IR2> REM
ops$tkyte@ORA9IR2> REM create table external_table
ops$tkyte@ORA9IR2> REM (EMPNO NUMBER(4) ,
ops$tkyte@ORA9IR2> REM  ENAME VARCHAR2(10),
ops$tkyte@ORA9IR2> REM  JOB VARCHAR2(9),
ops$tkyte@ORA9IR2> REM  MGR NUMBER(4),
ops$tkyte@ORA9IR2> REM  HIREDATE DATE,
ops$tkyte@ORA9IR2> REM  SAL NUMBER(7, 2),
ops$tkyte@ORA9IR2> REM  COMM NUMBER(7, 2),
ops$tkyte@ORA9IR2> REM  DEPTNO NUMBER(2)
ops$tkyte@ORA9IR2> REM )
ops$tkyte@ORA9IR2> REM ORGANIZATION EXTERNAL
ops$tkyte@ORA9IR2> REM ( type oracle_loader
ops$tkyte@ORA9IR2> REM   default directory data_dir
ops$tkyte@ORA9IR2> REM   access parameters
ops$tkyte@ORA9IR2> REM   ( fields terminated by ',' )
ops$tkyte@ORA9IR2> REM   location ('emp.dat')
ops$tkyte@ORA9IR2> REM )
ops$tkyte@ORA9IR2> REM /
ops$tkyte@ORA9IR2> REM
ops$tkyte@ORA9IR2> REM create table emp as select * from scott.emp where 1=0;
ops$tkyte@ORA9IR2> REM exec gen_data('EMP',50000);
ops$tkyte@ORA9IR2> REM insert /*+ append */ into emp select * from emp;
ops$tkyte@ORA9IR2> REM commit;
ops$tkyte@ORA9IR2> REM insert /*+ append */ into emp select * from emp;
ops$tkyte@ORA9IR2> REM commit;
ops$tkyte@ORA9IR2> REM insert /*+ append */ into emp select * from emp;
ops$tkyte@ORA9IR2> REM commit;
ops$tkyte@ORA9IR2> REM insert /*+ append */ into emp select * from emp;
ops$tkyte@ORA9IR2> REM commit;
ops$tkyte@ORA9IR2> REM insert /*+ append */ into emp select * from emp;
ops$tkyte@ORA9IR2> REM commit;
ops$tkyte@ORA9IR2> REM host flat / emp > /tmp/emp.dat
ops$tkyte@ORA9IR2> set autotrace on
ops$tkyte@ORA9IR2> select count(*) from external_table
  2  /
 
  COUNT(*)
----------
   1600000
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     EXTERNAL TABLE ACCESS (FULL) OF 'EXTERNAL_TABLE'
 
 
 
 
Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
         25  consistent gets
          0  physical reads
          0  redo size
        380  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
ops$tkyte@ORA9IR2> select count(*) from emp
  2  /
 
  COUNT(*)
----------
   1600000
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP'
 
 
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      12619  consistent gets
       7649  physical reads
        420  redo size
        380  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_start;
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select count(*) from external_table;
 
  COUNT(*)
----------
   1600000
 
ops$tkyte@ORA9IR2> select count(*) from external_table;
 
  COUNT(*)
----------
   1600000
 
ops$tkyte@ORA9IR2> select count(*) from external_table;
 
  COUNT(*)
----------
   1600000
 
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_middle;
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> create table stage as select * from external_table;
 
Table created.
 
ops$tkyte@ORA9IR2> select count(*) from stage;
 
  COUNT(*)
----------
   1600000
 
ops$tkyte@ORA9IR2> select count(*) from stage;
 
  COUNT(*)
----------
   1600000
 
ops$tkyte@ORA9IR2> select count(*) from stage;
 
  COUNT(*)
----------
   1600000
 
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_stop(1000);
Run1 ran in 390 hsecs
Run2 ran in 780 hsecs
run 1 ran in 50% of the time
 
Name                                  Run1        Run2        Diff
STAT...OS Page reclaims                  0       1,038       1,038
STAT...OS User time used                 0       1,049       1,049
STAT...db block gets                   530       1,642       1,112
STAT...recursive calls                  33       1,150       1,117
STAT...db block changes              1,024       2,230       1,206
STAT...redo entries                    509       1,723       1,214
LATCH.redo allocation                  509       1,735       1,226
LATCH.library cache                    156       1,756       1,600
LATCH.simulator lru latch                1       2,365       2,364
LATCH.simulator hash latch               1       4,799       4,798
LATCH.multiblock read objects            0       9,450       9,450
STAT...physical writes                   0      12,599      12,599
STAT...physical writes non che           0      12,599      12,599
STAT...physical writes direct            0      12,599      12,599
LATCH.row cache enqueue latch       26,826      10,310     -16,516
LATCH.row cache objects             40,236      14,906     -25,330
STAT...prefetched blocks                 0      33,072      33,072
STAT...table scan blocks gotte           6      37,801      37,795
STAT...physical reads                    0      37,797      37,797
LATCH.cache buffers lru chain            8      37,807      37,799
STAT...free buffer requested             4      37,808      37,804
STAT...no work - consistent re          14      37,986      37,972
STAT...buffer is not pinned co          75      38,258      38,183
STAT...consistent gets                  82      38,511      38,429
STAT...session logical reads           612      40,153      39,541
STAT...session pga memory                0      65,536      65,536
STAT...session pga memory max            0      65,536      65,536
STAT...redo size                    65,300     180,832     115,532
LATCH.cache buffers chains           2,698     121,499     118,801
STAT...table scan rows gotten           24   4,800,010   4,799,986
 
Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
70,908     209,249     138,341     33.89%
 
PL/SQL procedure successfully completed.
 
<b>this is by NO MEANS conclusive proof of anything</b>....  I would seriously consider a benchmark -- since it is so easy to do.... just name your external table after your stage table name and see how it does...

 

Benchmark

A reader, January 17, 2005 - 9:39 pm UTC

But that's just the thing: it's kind of hard to benchmark resource usage with these tools when using external tables. In the example you showed,

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
         25  consistent gets
          0  physical reads

the LIOs above are all due to reading the dictionary (the 8 recursive calls) to parse the access parameters and such. They have nothing to do with the resource usage of the access driver actually reading the file.

Comparing this to

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      12619  consistent gets
       7649  physical reads

is not really apples-to-apples, is it?

The only relevant statistic regarding external table resource usage is wall-clock time, which is really useless except in a single-user system.

Thanks

Tom Kyte
January 18, 2005 - 9:36 am UTC

a benchmark is many things -- that is why I say sometimes "in general, when tuning a query, the goal is to reduce LIOs" -- but the keyword is in general.  Consider:


ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set arraysize 2
ops$tkyte@ORA9IR2> set autotrace traceonly statistics;
ops$tkyte@ORA9IR2> select * from t;
 
27947 rows selected.
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      14176  consistent gets
        383  physical reads
          0  redo size
    3418435  bytes sent via SQL*Net to client
     154202  bytes received via SQL*Net from client
      13975  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      27947  rows processed
 
ops$tkyte@ORA9IR2> select * from t order by 2;
 
27947 rows selected.
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        387  consistent gets
          0  physical reads
          0  redo size
    3212746  bytes sent via SQL*Net to client
     154202  bytes received via SQL*Net from client
      13975  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      27947  rows processed
 
ops$tkyte@ORA9IR2> set autotrace off


Now, that second query -- where did the LIOs go?  Well, they went away obviously, but where?  They went to TEMP.

For you see, we don't do an LIO on TEMP -- we don't do a consistent get against temp, we don't NEED to.

Same with ET's -- we don't need to goto the buffer cache, we don't need to hit the buffer cache over and over.

So, which query is "better"?  


Benchmark it -- whichever one in your environment runs best wins. Factors to consider are latching and such -- but hey, this is a bulk load so multi-user scaling doesn't count.


You need to look at what you are trying to do, benchmark different implementations (easy in this case since you just name your external table after your stage table and run the same code effectively "two times") and see which flys the best in your case.


Here, I was looking predominantly at "time to complete".  If we just benchmark 3 full scans against an external table and a real table, the real table wins hands down.  However, throw in the full scan of the external table and a load of a stage table followed by 3 full scans and the external table wins -- but then again, when you do something more complex than just full scan -- you join, hash, whatever -- you might find something different. 

Benchmark

A reader, January 18, 2005 - 9:51 am UTC

Valid points, I will keep them in mind.

Another question:

When I do something like

select ...
from et1,et2,et3
where ...

i.e. treat the external tables as real tables and join them, the only access path for an external table is a full scan, so the above statement will always perform better with 3 real tables than 3 external tables, right?

Thanks

Tom Kyte
January 18, 2005 - 10:13 am UTC

maybe -- maybe not.

maybe not if before you do this -- you have to load 3 real tables!

remember, they will probably be hash joined -- so they are full scanned into memory, hashed, perhaps swapped out to temp. Just like a "real table". beyond the act of full scanning, what happens afterwards is the same.


If you have fixed width files (not delimited) you might even find the external table to be "as fast" as a real table (most of the time with external tables is spent parsing; positional files -- which are larger -- parse faster)
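For example, a positional layout (reusing the field names from the EXT_TAB example above) parses with no delimiter scanning at all:

access parameters
( records delimited by newline
  fields lrtrim
  missing field values are null
  ( empno  position(1:5)   char,
    ename  position(6:30)  char,
    job    position(31:40) char,
    deptno position(41:45) char
  )
)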

problem with external table

yogi, January 21, 2005 - 6:50 pm UTC

Tom,
I have problem with using external table as follows:

create table external_table
(switch varchar2(200),
location varchar2(200),
npa varchar2(20)
)
organization external
(typ oracle_loader
default directory data_dir
access parameters
( records delimited by newline
nologfile
fields terminated by ','
missing field values are null
)
location('switch_npa.csv')
)
reject limit unlimited
/
I get following error .
ORA-29913: error in executing ODCIEXTTABLEFETCH callout.

Surprisingly, I don't get any other details following this error.

Can you please tell me what the possible causes for this error could be? In fact, I can read data from the csv file using utl_file with the same directory, so it is not a problem with file permissions.


Tom Kyte
January 21, 2005 - 8:19 pm UTC

I did have to change "TYP" to "TYPE" in the typ oracle_loader line... 

but if that wasn't the case for you -- can you make a full test case like this fail?




ops$tkyte@ORA9IR2> create or replace directory data_dir as '/tmp'
  2  /
 
Directory created.
 
ops$tkyte@ORA9IR2> !echo a,b,c > /tmp/switch_npa.csv
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table external_table
  2  (switch varchar2(200),
  3   location varchar2(200),
  4   npa         varchar2(20)
  5  )
  6  organization external
  7  (type oracle_loader
  8   default directory data_dir
  9   access parameters
 10  ( records delimited by newline
 11    nologfile
 12    fields terminated by ','
 13    missing field values are null
 14  )
 15   location('switch_npa.csv')
 16  )
 17  reject limit unlimited
 18  /
 
Table created.
 
ops$tkyte@ORA9IR2> select * from external_table;
 
SWITC LOCAT NPA
----- ----- --------------------
a     b     c
 
 

External table

A reader, January 22, 2005 - 2:00 pm UTC

thanks for the reply.
I still get 'ORA-29913: error in executing ODCIEXTTABLEFETCH callout'
Quick question on file permission.
The Oracle directory was created for '/btd/sql/data'. I have read permission on '/btd/sql/data' but don't have write permission. Does Oracle need write permission on the unix directory? I use 'NOLOGFILE' in the access parameters so that Oracle does not create any log files in that directory.

I truly appreciate your help.
thanks.



Tom Kyte
January 22, 2005 - 4:31 pm UTC

yes, if you didn't specify a log/bad file location to be "elsewhere".

it'll try to create them there and fail.

you still have the "bad" files to worry about too -- best to set up another directory you do have write on and send them over there (they would be useful for you to figure out "whats wrong")
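For example (a sketch -- the directory name and path are made up):

create or replace directory log_dir as '/some/writable/path';

and then point the log and bad files there in the access parameters:

( records delimited by newline
  badfile log_dir:'switch_npa.bad'
  logfile log_dir:'switch_npa.log'
  fields terminated by ','
  missing field values are null
)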

Validating data

A reader, January 31, 2005 - 4:07 pm UTC

I have an external table that needs to be validated against some business rules (rule1 through ruleN), and data that violates the rules should be spit out:

select
pk,
case when a+b!=c then 'rule1 violated'
when x+y!=z then 'rule2 violated'
...
end
from
et
where
(
(a+b!=c) or
(x+y!=z) or
...
);


How can I avoid putting the rules in both the SELECT and the WHERE clauses?

Thanks

Tom Kyte
January 31, 2005 - 4:17 pm UTC

select *
from ( select pk, case when this then 'rule1 violated'
when that then 'rule2 violated'
....
end rule_check
from et
)
where rule_check IS NOT NULL;



A reader, January 31, 2005 - 4:21 pm UTC


Brilliant!

A reader, January 31, 2005 - 4:32 pm UTC


Elapsed time

A reader, January 31, 2005 - 4:44 pm UTC

Strange...

select
pk,
case when a+b!=c then 'rule1 violated'
when x+y!=z then 'rule2 violated'
...
end
from
et
where
(
(a+b!=c) or
(x+y!=z) or
...
)

shows

Elapsed: 00:00:18.31

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      37312  consistent gets
      34798  physical reads
         60  redo size
       7974  bytes sent via SQL*Net to client
       1161  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

while

your version shows

Elapsed: 00:01:07.72

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      37311  consistent gets
      34801  physical reads
          0  redo size
        995  bytes sent via SQL*Net to client
       1174  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

The LIO/PIO are almost identical, so why does your version take so much longer elapsed time than mine?

Thanks

Tom Kyte
January 31, 2005 - 4:54 pm UTC

tkprof it. with 10046 level 12 (waits...)


and for grins, add rownum to the inline view first and a where rownum > 0 to the inline view.
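That is, something like the following (the rownum references prevent the outer predicate from being merged or pushed into the inline view; a, b, c, x, y, z stand in for your rule columns):

select *
  from ( select pk,
                case when a+b != c then 'rule1 violated'
                     when x+y != z then 'rule2 violated'
                end rule_check,
                rownum r
           from et
          where rownum > 0
       )
 where rule_check is not null;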

Here is the tkprof with 10046 level 12

A reader, January 31, 2005 - 4:56 pm UTC

Also, I loaded the ET into a real table to remove any differences due to the access driver

Mine

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     14.07      18.43      34796      37312          0           6
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     14.09      18.45      34796      37312          0           6

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 24

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  TABLE ACCESS FULL REAL_TABLE


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file scattered read                        634        0.05          7.16
  db file sequential read                       627        0.00          0.15
  SQL*Net message from client                     2        0.00          0.01


Yours

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     74.16      76.42      34788      37312          0           6
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     74.17      76.44      34788      37312          0           6

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 24

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  TABLE ACCESS FULL REAL_TABLE


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file sequential read                       640        0.01          0.09
  db file scattered read                        623        0.04          6.57
  SQL*Net message from client                     2        0.00          0.01
********************************************************************************


Tom Kyte
January 31, 2005 - 5:56 pm UTC

can I have a real test case to play with?

did you do rownum like I asked?

rownum

A reader, January 31, 2005 - 5:19 pm UTC

If I add the rownum stuff to the inline view and the outer query, it doesn't make it any faster, it just changes my row source operation counts:

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  VIEW
1415090  COUNT
1415090  FILTER
1415090  TABLE ACCESS FULL REAL_TABLE

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  VIEW
1415090  COUNT
1415090  TABLE ACCESS FULL REAL_TABLE

My version is still 18 sec vs your 75 sec

Thanks



Tom Kyte
January 31, 2005 - 6:03 pm UTC

need test

Test case

VA, January 31, 2005 - 6:30 pm UTC

I will try to whip up a small concise complete test case, but the reason seems obvious to me?

Since Oracle fetches rows satisfying the WHERE clause and then applies any expressions in the SELECT list, my version is fetching 6 rows and applying my business rules (CASE ... END) to these 6 rows.

Your version fetches all 1.5 million rows and executes the CASE ... END for each of these rows and finally throws out all but 6 rows.

Won't that account for the large differences in elapsed time between the 2 approaches?

Tom Kyte
January 31, 2005 - 8:27 pm UTC

well,

select pk, case when this then 'x'
when that then 'y'
when the_other_thing then 'z'
end
from table
where this OR that OR the_other_thing


would do the where clause against 1.5 million rows and then apply the case (with the same "where"s, basically).

this could be a situation where "where clause processing is much faster than doing the equivalent in a CASE"


Because, I do case against 1.5, then dump.

You do where against 1.5, then case against 6 then dump.


So, the only "obvious" thing to me would be CASE when processing is much slower than "where" because we both process 1.5 million rows with the same condition.

This is why we need test cases to see what you see..

For a simple test case for me (1,000,000 rows)

select *
from big_table
where object_id+5-to_char(created,'ddyyyy') > 1000000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.51       1.55      11789      14381          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.51       1.55      11789      14381          0           0
********************************************************************************
select *
from (
select big_table.* ,
case when object_id+5-to_char(created,'ddyyyy') > 1000000 then 1 end cond
from big_table
)
where cond is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.65       1.66      11789      14381          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.65       1.67      11789      14381          0           0


shows no significant difference (run this again and the timings are reversed)

select *
from big_table
where object_id+5-to_char(created,'ddyyyy') > 1000000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.71       1.73      11791      14381          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.71       1.73      11791      14381          0           0

********************************************************************************
select *
from (
select big_table.* ,
case when object_id+5-to_char(created,'ddyyyy') > 1000000 then 1 end cond
from big_table
)
where cond is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.63       1.66      11790      14381          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.63       1.66      11790      14381          0           0


so there must be something special about your case, and I'm curious....



Test case

A reader, January 31, 2005 - 10:40 pm UTC

Don't know how I can make a test case; it's a plain table with about 100 columns and 1.5 million rows. The CASE statement does stuff like

case
when the_type=0 and a+b!=c then 'rule1 violated'
when the_type=1 and a+b+c!=d then 'rule2 violated'
...

there are 6 such WHEN conditions

Why would my results be different from yours? Do you know of other instances where "WHERE clause processing is faster than the same thing in a CASE"?

Thanks

Tom Kyte
February 01, 2005 - 8:39 am UTC

show me the actual query then.

Look -- I'm willing to look into it, try to figure it out, file a bug if need be

But -- I need help reproducing your issue. I just showed above "in general, not always true".

Show us the counter case.

Actual query

A reader, February 01, 2005 - 9:07 am UTC

Here is your version

SELECT * FROM (
SELECT
CASE
WHEN transaction_code = 'Z' AND adjustment_type = 0
THEN transaction_id || ' adjustment trade with no adjustment'
WHEN transaction_code != 'Z' AND adjustment_type != 0
THEN transaction_id || ' non-adjustment trade with an adjustment'
WHEN adjustment_type = 0 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains unexpected adjustment'
WHEN adjustment_type = 1 AND (oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than OID on sale adjustment'
WHEN adjustment_type = 2 AND (oid_on_sale_amt + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than OID on deliver adjustment'
WHEN adjustment_type = 3 AND (oid_on_sale_amt + oid_on_deliver_amount + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than OID owned adjustment'
WHEN adjustment_type = 4 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than MTM income adjustment'
WHEN adjustment_type = 5 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than MTM adjustment'
WHEN adjustment_type = 6 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt) != 0
THEN transaction_id || ' contains more than bond adjustment'
WHEN adjustment_type NOT IN (0, 1, 2, 3, 4, 5, 6)
THEN transaction_id || ' contains unknown adjustment type'
ELSE NULL
END error_message,a.transaction_id
FROM stagetransactions a
WHERE run_sequence = -1
)
WHERE error_message IS NOT NULL;

Here is my version

SELECT
CASE
WHEN transaction_code = 'Z' AND adjustment_type = 0
THEN transaction_id || ' adjustment trade with no adjustment'
WHEN transaction_code != 'Z' AND adjustment_type != 0
THEN transaction_id || ' non-adjustment trade with an adjustment'
WHEN adjustment_type = 0 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains unexpected adjustment'
WHEN adjustment_type = 1 AND (oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than OID on sale adjustment'
WHEN adjustment_type = 2 AND (oid_on_sale_amt + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than OID on deliver adjustment'
WHEN adjustment_type = 3 AND (oid_on_sale_amt + oid_on_deliver_amount + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than OID owned adjustment'
WHEN adjustment_type = 4 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_adj_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than MTM income adjustment'
WHEN adjustment_type = 5 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + adj_to_bond_amt) != 0
THEN transaction_id || ' contains more than MTM adjustment'
WHEN adjustment_type = 6 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt) != 0
THEN transaction_id || ' contains more than bond adjustment'
WHEN adjustment_type NOT IN (0, 1, 2, 3, 4, 5, 6)
THEN transaction_id || ' contains unknown adjustment type'
ELSE NULL
END error_message,a.transaction_id
FROM stagetransactions a
WHERE
(
(transaction_code = 'Z' AND adjustment_type = 0) OR
(transaction_code != 'Z' AND adjustment_type != 0) OR
(adjustment_type = 0 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0) OR
(adjustment_type = 1 AND (oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0) OR
(adjustment_type = 2 AND (oid_on_sale_amt + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0) OR
(adjustment_type = 3 AND (oid_on_sale_amt + oid_on_deliver_amount + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0) OR
(adjustment_type = 4 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_adj_amt + adj_to_bond_amt) != 0) OR
(adjustment_type = 5 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + adj_to_bond_amt) != 0) OR
(adjustment_type = 6 AND (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt) != 0) OR
(adjustment_type NOT IN (0, 1, 2, 3, 4, 5, 6))
)


Thanks

Tom Kyte
February 01, 2005 - 2:00 pm UTC

I'll need more help reproducing...

drop table stagetransactions;

create table stagetransactions
( transaction_code      varchar2(1),
  adjustment_type       number,
  transaction_id        number,
  oid_on_sale_amt       number,
  oid_on_deliver_amount number,
  oid_owned_amt         number,
  mtm_income_amt        number,
  mtm_adj_amt           number,
  adj_to_bond_amt       number,
  run_sequence          number
)
/
insert into stagetransactions
select 'Z', 1, 42, 0, 0, 0, 0, 0, 0, -1
from big_table.big_table where rownum <= 1500000;






SELECT * FROM (
SELECT
CASE
WHEN transaction_code = 'Z' AND adjustment_type = 0
THEN transaction_id || ' adjustment trade with no adjustment'
...
THEN transaction_id || ' contains unknown adjustment type'
ELSE NULL
END error_message,a.transaction_id
FROM stagetransactions a
WHERE run_sequence = -1
)
WHERE error_message IS NOT NULL

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      2.97       2.94        921       6049          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      2.97       2.94        921       6049          0           0

********************************************************************************

SELECT
CASE
WHEN transaction_code = 'Z' AND adjustment_type = 0
THEN transaction_id || ' adjustment trade with no adjustment'
...
THEN transaction_id || ' contains unknown adjustment type'
ELSE NULL
END error_message,a.transaction_id
FROM stagetransactions a
WHERE
(
(transaction_code = 'Z' AND adjustment_type = 0) OR
...
(adjustment_type NOT IN (0, 1, 2, 3, 4, 5, 6))
)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      2.51       2.48        524       6049          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      2.51       2.48        524       6049          0           0


A reader, February 01, 2005 - 9:12 am UTC

Ignore the ELSE NULL part in the CASE expression above; I thought it might make a difference but it didn't.

A reader, February 01, 2005 - 2:29 pm UTC

Sigh... don't know why we are getting such different results. I did the exact same thing

SQL> drop table t;

Table dropped.

  1  create table t as select
  2  adj_to_bond_amt,
  3  adjustment_type,
  4  mtm_adj_amt,
  5  mtm_income_amt,
  6  oid_on_deliver_amount,
  7  oid_on_sale_amt,
  8  oid_owned_amt,
  9  transaction_code,
 10  transaction_id,
 11  run_sequence
 12* from stagetransactions
SQL> /

Table created.

SQL> select count(*) from t;

  COUNT(*)
----------
   1415090

SQL> set autotrace traceonly
SQL> /


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6465  consistent gets
       6462  physical reads
          0  redo size
        494  bytes sent via SQL*Net to client
        651  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Then I ran the 2 queries above and got

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     61.51      60.41       6462       6466          0           6
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     61.52      60.42       6462       6466          0           6

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 24  

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  TABLE ACCESS FULL T 


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file scattered read                        101        0.06          0.81
  SQL*Net message from client                     2        0.02          0.03

for your version

and 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     10.02      10.09       6462       6466          0           6
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     10.04      10.10       6462       6466          0           6

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 24

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  TABLE ACCESS FULL T


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file scattered read                        101        0.10          0.78
  SQL*Net message from client                     2        0.00          0.00
********************************************************************************

for mine

LIOs are identical but huge difference in elapsed time.

Anything else I can look at? init.ora parameters or something? Thanks 

Tom Kyte
February 01, 2005 - 3:45 pm UTC

what is your version, what is your machine? (I was doing this on a desktop, a year old desktop -- your machine seems "slow" from the get go)

how big would that dmp file for that table be after being compressed and would you be able to send it?

(DO NOT SEND it just let me know the details)


one more thing, I wonder -- is this a sql_trace bug.... there was one where cpu time would be "over done" and over done differently for different queries.

use this:

------------- mystat --------------
set echo off
set verify off
column value new_val V
define S="&1"

set autotrace off
select a.name, b.value 
from v$statname a, v$mystat b
where a.statistic# = b.statistic#
and lower(a.name) like '%' || lower('&S')||'%'
/
set echo on
-----------------------------------

------------ mystat2 ------------
set echo off
set verify off
select a.name, b.value V, b.value-&V diff
from v$statname a, v$mystat b
where a.statistic# = b.statistic#
and lower(a.name) like '%' || lower('&S')||'%'
/
set echo on

----------------------------------------


and test like this:

ops$tkyte@ORA9IR2> @mystat "cpu used by this session"
ops$tkyte@ORA9IR2> set echo off
 
NAME                                VALUE
------------------------------ ----------
CPU used by this session              568
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT * FROM (
  2  SELECT
  3      CASE
  4          WHEN transaction_code = 'Z' AND adjustment_type = 0
  5          THEN transaction_id || ' adjustment trade with no adjustment'
...
 25        END error_message,a.transaction_id
 26  FROM stagetransactions a
 27  WHERE run_sequence = -1
 28  )
 29  WHERE error_message IS NOT NULL;
 
no rows selected
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                                    V       DIFF
------------------------------ ---------- ----------
CPU used by this session              856        288
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "cpu used by this session"
ops$tkyte@ORA9IR2> set echo off
 
NAME                                VALUE
------------------------------ ----------
CPU used by this session              856
 
ops$tkyte@ORA9IR2> SELECT
  2      CASE
  3          WHEN transaction_code = 'Z' AND adjustment_type = 0
...
 37      (adjustment_type NOT IN (0, 1, 2, 3, 4, 5, 6))
 38  )
 39  /
 
no rows selected
 
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                                    V       DIFF
------------------------------ ---------- ----------
CPU used by this session             1110        254
 
 

A reader, February 01, 2005 - 3:53 pm UTC

Oracle 9.2.0.6

Machine is Sun Enterprise 450 running Solaris 8

$ uname -a
SunOS xxxx 5.8 Generic_117350-14 sun4u sparc SUNW,Ultra-4

$ prtconf -v|head
System Configuration:  Sun Microsystems  sun4u
Memory size: 4096 Megabytes
System Peripherals (Software Nodes):

SUNW,Ultra-4

SQL> show sga

Total System Global Area 1499435184 bytes
Fixed Size                   733360 bytes
Variable Size             436207616 bytes
Database Buffers         1056964608 bytes
Redo Buffers                5529600 bytes

pga_aggregate_target                 big integer 1048576000
db_cache_size                        big integer 1056964608
shared_pool_size                     big integer 218103808

ls -l expdat.dmp.gz for that 1 table is 4375307 bytes

mpstat

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    2   0  333   400  300   57    0    1    1    0   130    1   1   1  98
  1    2   0   33   160  159   93    0    3    3    0    96    0   1   1  98
  2    2   0   29   103  101   95    0    3    1    0   139    1   1   1  98
  3    2   0   24   129  127  100    0    3    1    0   139    1   1   1  98

Thanks 

Tom Kyte
February 01, 2005 - 4:07 pm UTC


go ahead, email the dump file (tkyte@oracle.com).  I might not look at it for a day or two, just getting on a plane to go far away...
....


Ok, turns out it was your specific set of data.


  1* select adjustment_type, count(*) from t  group by adjustment_type
ops$tkyte@ORA9IR2> /
 
A   COUNT(*)
- ----------
1          6
2          2
3          1
5          2
6          6
     1415073
 
6 rows selected.


almost all of your adjustment_types were null, the where clause was better able to "remove them from consideration".

Adding a simple "when ... then" branch to the top of the CASE makes the CASE by itself faster (than the WHERE + CASE):


ops$tkyte@ORA9IR2> SELECT * FROM (
  2  SELECT
  3      CASE
  4              when adjustment_type is null
  5                  then null
  6          WHEN transaction_code = 'Z' AND adjustment_type = 0
  7          THEN transaction_id || ' adjustment trade with no adjustment'
  8          WHEN transaction_code != 'Z' AND adjustment_type != 0
  9          THEN transaction_id || ' non-adjustment trade with an adjustment'
 10          WHEN (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0 and adjustment_type=0
 11          THEN transaction_id || ' contains unexpected adjustment'
 12          WHEN (oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0 and adjustment_type=1
 13          THEN transaction_id || ' contains more than OID on sale adjustment'
 14          WHEN (oid_on_sale_amt + oid_owned_amt + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0 and adjustment_type=2
 15          THEN transaction_id || ' contains more than OID on deliver adjustment'
 16          WHEN (oid_on_sale_amt + oid_on_deliver_amount + mtm_income_amt + mtm_adj_amt + adj_to_bond_amt) != 0 and adjustment_type=3
 17          THEN transaction_id || ' contains more than OID owned adjustment'
 18          WHEN (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_adj_amt + adj_to_bond_amt) != 0 and adjustment_type=4
 19          THEN transaction_id || ' contains more than MTM income adjustment'
 20          WHEN (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + adj_to_bond_amt) != 0 and adjustment_type=5
 21          THEN transaction_id || ' contains more than MTM adjustment'
 22          WHEN (oid_on_sale_amt + oid_on_deliver_amount + oid_owned_amt + mtm_income_amt + mtm_adj_amt) != 0 and adjustment_type=6
 23          THEN transaction_id || ' contains more than bond adjustment'
 24          WHEN adjustment_type NOT IN (0, 1, 2, 3, 4, 5, 6)
 25          THEN transaction_id || ' contains unknown adjustment type'
 26        ELSE NULL
 27        END error_message,a.transaction_id, rownum r
 28  FROM t a
 29  WHERE run_sequence = -1
 30  )
 31  WHERE error_message IS NOT NULL;
 

sql_trace

A reader, February 01, 2005 - 3:59 pm UTC

I don't think sql_trace has anything to do with this.

even if I do just

set timing on

without any tracing, one SQL is much much slower than the other

Tom Kyte
February 02, 2005 - 4:40 am UTC

see above....

monitor file system growth using external tables

N.Balasubramanian, February 02, 2005 - 12:41 am UTC

Hi Tom
I am using HP-UX. I want the output of the bdf command loaded into a table using an external table. Following the example given in your book I am able to do it successfully. However, for certain filesystems, where the filesystem name is long, the result is displayed on two lines. In such cases I have to manually alter the flat file. How can I overcome this so that I can automate the entire process?

Tom Kyte
February 02, 2005 - 5:02 am UTC

well, if you want the long name -- you'll have to bdf | awk ... and fix it up.

if you just want the truncated name, bdf | grep to filter out the ones you don't need.

else, you'll be writing a plsql loop or using lag/lead to fix it up.


but not having an example to work with, well, makes it hard.
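For what it's worth, a minimal sketch of the lag/lead route (table, directory, and file names are hypothetical; it assumes a wrapped filesystem name arrives as a line with no embedded space, followed by the line carrying the numbers):

create table bdf_ext ( line varchar2(4000) )
organization external
( type oracle_loader default directory data_dir
  access parameters ( records delimited by newline
                      fields
                      missing field values are null
                      ( line position(1:4000) ) )
  location ('bdf.out') )
reject limit unlimited;

select case when prev_line not like '% %'
            then prev_line || ' ' || line   -- glue the wrapped name back on
            else line
       end fixed_line
  from (select line,
               lag(line) over (order by r) prev_line
          from (select rownum r, line from bdf_ext))
 where line like '% %';   -- drop the name-only lines themselves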

I am using awk

N.Balasubramanian, February 02, 2005 - 7:06 am UTC

I am using awk to make the comma-separated file, but it still doesn't work.
The output of bdf is as follows.

Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 155648 143743 11349 93% /
/dev/vg00/lvol1 63765 27760 29628 48% /stand
/dev/vg02/ltraceora
2097152 1213714 835104 59% /traceora
/dev/vg00/lvol5 524288 119150 380713 24% /tmp
/dev/vg02/ltestzip 5120000 2863016 2116654 57% /testzip
/dev/vg01/lorarac 5324800 3392018 1823235 65% /rac9i
/dev/vg02/lcrashdmp
4194304 2018936 2039445 50% /q4crash
/dev/vg02/lpstage 4194304 2026629 2032213 50% /pstage
/dev/vg01/lotrace 716800 714377 2423 100% /oratrace
/dev/vg02/lorapatch
6553600 6346021 194607 97% /orapatch
/dev/vg02/ltestora 5632000 4692376 882518 84% /ora9i

Moreover, I want all the file systems loaded into the table. So I can't use grep -v to avoid the long ones.

Please help.

Thanks

Tom Kyte
February 02, 2005 - 7:56 am UTC

so modify your awk so that if the kbytes are missing, don't output; get the next line instead (remembering the filesystem) and then output the filesystem, kbytes, etc....

Thank You

N.Balasubramanian, February 02, 2005 - 8:28 am UTC

Thank you Tom. I will try that.


Brilliant!

VA, February 02, 2005 - 9:35 am UTC

"almost all of your adjustment_types were null, the where clause was better able to "remove them from consideration""

Your version works like a champ now, much faster than mine.

But I am not sure I understand what happened here.

All the predicates in my WHERE clause involved equality/inequality tests using adjustment_type. Since this was NULL for most of the rows, are you saying that Oracle intelligently skipped these rows without evaluating the predicate at all, and that's what resulted in the speed boost for my version?

Tom Kyte
February 02, 2005 - 9:49 am UTC

I think (guessing here) that the where clause was able to abort evaluation since adjustment_type was used in the entire thing. (each OR branch returns UNKNOWN if it is null)

the case statement would have tried each of the when's (it isn't a predicate tree, it is a big "function")

by handling the case that would make all other cases moot -- we blow out of the case early - just like the where clause did (faster even)

can not access in procedure

A reader, February 04, 2005 - 5:13 pm UTC

Tom,
I have created a directory object ref_log_dir as '/bsd/log';
I can use this directory from an independent block as follows:
SQL> declare
  2   v_output_file        utl_file.file_type;
  3  begin
  4        v_output_file := utl_file.fopen( 'REF_LOG_DIR', 'convert.log', 'a');
  5        utl_file.put( v_output_file, 'testing' || USER);
  6        utl_file.new_line(v_output_file);
  7        utl_file.fclose(v_output_file );
  8  end;
  9  /

PL/SQL procedure successfully completed.

If I use this directory in a procedure then I get an access denied error:

  1  create or replace procedure test_dir IS
  2  v_output_file        utl_file.file_type;
  3  begin
  4        v_output_file := utl_file.fopen( 'REF_LOG_DIR', 'convert.log', 'a');
  5        utl_file.put( v_output_file, 'testing' || USER);
  6        utl_file.new_line(v_output_file);
  7        utl_file.fclose(v_output_file );
  8* end;
SQL> /

Procedure created.

SQL> execute test_dir;
BEGIN test_dir; END;

*
ERROR at line 1:
ORA-29289: directory access denied
ORA-06512: at "SYS.UTL_FILE", line 18
ORA-06512: at "SYS.UTL_FILE", line 424
ORA-06512: at "OMSCONV.TEST_DIR", line 4
ORA-06512: at line 1

Do I have to give any other grants/permissions to the directory object in order to use it in a procedure?

thanks.

 

Tom Kyte
February 05, 2005 - 5:17 am UTC

External Table files are on client machine

Praveen, March 10, 2005 - 5:00 am UTC

Tom,

I am trying to write a procedure which will accept a string as an IN parameter which is a directory path. Using this path, a 'directory' object is created. External tables are then created based on this path (and will later take part in a complex ETL process). Platform: 9iR2 on Windows 2000.

a) Is it possible to create directory objects based on network paths? One cannot say on which client machine the files, upon which the external tables are created, reside at a particular time.

b) If not possible, any work-around for the above problem?

Regards
Praveen

Tom Kyte
March 10, 2005 - 7:46 am UTC

mounted file systems are hard in windows.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:241814624807 <code>

In general, it seems like a bad idea to be creating directories on the fly; it seems to me it could lead to some nasty security-related things over time.

external tables : field separator and date_format

laurent, March 17, 2005 - 12:45 pm UTC

Hi Tom,

I'm trying to create an external table for the following file sample:
assurance3-x_XGDR00 C /base/oracle/XGDR00 06 Mar 05 08:05:08 06 Mar 05 11:28:00
assurance1-x_pro C /base/oracle/pro 06 Mar 05 08:07:05 06 Mar 05 12:49:00

I need to have it that way (commas separating the fields I want):
assurance3-x_XGDR00,C,/base/oracle/XGDR00,06 Mar 05 08:05:08,06 Mar 05 11:28:00

I think the problem is that my field separator ' ' also appears in the date fields.

I tried the following, but I got an error when I select from the table:

create or replace directory FIT_DIR AS '/projets/fit/home/fitbatch/rp';
grant read on directory FIT_DIR to fit00batch;

create table ext_tina_jour_save_ctrlm  (
   Client                 VARCHAR2(30),
   Strategie              VARCHAR2(10),
   Repertoire_sauvegarde  VARCHAR2(255),
   Date_debut             date,
   Date_fin               date
)
ORGANIZATION EXTERNAL
(       TYPE ORACLE_LOADER
        DEFAULT DIRECTORY FIT_DIR
        ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE
                            NOLOGFILE
                            FIELDS TERMINATED BY ' '
                            (client,
                             strategie,
                             Repertoire_sauvegarde,
                             Date_debut date_format date mask "dd-mon-yy hh24:mi:ss",
                             Date_fin date_format date mask "dd mon yy hh24:mi:ss"
                            )
                            LRTRIM )
        LOCATION ('tina_psa089_jour_save_ctrlm')
);

SQL> select * from ext_tina_jour_save_ctrlm
  2  ;
select * from ext_tina_jour_save_ctrlm
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "date_format": expecting one of: "comma, char,
date, defaultif, decimal, double, float, integer, (, nullif, oracle_date,
oracle_number, position, raw, recnum, ), unsigned, varrawc, varchar, varraw,
varcharc, zoned"
KUP-01007: at line 7 column 41
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

thanks for your help and sharing your knowledge with us on this website.
Laurent 

error from my 1st post

laurent, March 17, 2005 - 12:59 pm UTC

sorry, I copy-pasted the wrong create table :/
This one gives me no error, but no lines are retrieved from the file:

SQL> select * from ext_tina_jour_save_ctrlm;

no rows selected

SQL>


create table ext_tina_jour_save_ctrlm  (
   Client                 VARCHAR2(30),
   Strategie              VARCHAR2(10),
   Repertoire_sauvegarde  VARCHAR2(255),
   Date_debut             date,
   Date_fin               date
)
ORGANIZATION EXTERNAL
(       TYPE ORACLE_LOADER
        DEFAULT DIRECTORY FIT_DIR
        ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE
                            NOLOGFILE
                            FIELDS TERMINATED BY ' '
                            (client,
                             strategie,
                             Repertoire_sauvegarde,
                             Date_debut char date_format date mask "dd-mon-yy hh24:mi:ss",
                             Date_fin char date_format date mask "dd mon yy hh24:mi:ss"
                            )
                            )
        LOCATION ('tina_psa089_jour_save_ctrlm')
)
reject limit unlimited;
 

Tom Kyte
March 17, 2005 - 2:02 pm UTC

if you say "fields delimited by ' '" and you have spaces in your fields??? not sure what I can say.

any chance?

laurent, March 17, 2005 - 6:43 pm UTC

well, I guess I will have to edit my file using perl to remove the spaces in the date fields.

Is there by any chance a way I could load that file into an external table without editing it, like using fields delimited by ' ' for the first 3 and fixed length for the last 2 (18 characters for each date)?

thanks again

Tom Kyte
March 17, 2005 - 6:49 pm UTC

no reason you cannot map the entire line as a single varchar2(4000) and just use SQL on it:


ops$tkyte@ORA9IR2> select substr( x, 1, instr( x, ' ', 1, 1 ) ) ,
  2         substr( x, instr( x, ' ', 1, 1 )+1, 
                       instr( x, ' ', 1, 2 )-instr(x, ' ', 1, 1 ) ) ,
  3         substr( x, instr( x, ' ', 1, 2 )+1, 
                       instr( x, ' ', 1, 3 )-instr(x, ' ', 1, 2 ) ) ,
  4         substr( x, instr( x, ' ', 1, 3 )+1, 18 ),
  5         substr( x, instr( x, ' ', 1, 3 )+1+19 )
  6    from t
  7  /
 
SUBSTR(X,1,INSTR(X,'',1,1))
-------------------------------------------------------------------------------
SUBSTR(X,INSTR(X,'',1,1)+1,INSTR(X,'',1,2)-INSTR(X,'',1,1))
-------------------------------------------------------------------------------
SUBSTR(X,INSTR(X,'',1,2)+1,INSTR(X,'',1,3)-INSTR(X,'',1,2))
-------------------------------------------------------------------------------
SUBSTR(X,INSTR(X,'
------------------
SUBSTR(X,INSTR(X,'',1,3)+1+19)
-------------------------------------------------------------------------------
assurance3-x_XGDR00
C
/base/oracle/XGDR00
06 Mar 05 08:05:08
06 Mar 05 11:28:00
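For completeness, here is a sketch of the one-column external table that the select above assumes (reusing the FIT_DIR directory created earlier; file name as before):

create table t ( x varchar2(4000) )
organization external
( type oracle_loader default directory fit_dir
  access parameters ( records delimited by newline
                      nologfile
                      fields
                      missing field values are null
                      ( x position(1:4000) ) )
  location ('tina_psa089_jour_save_ctrlm') )
reject limit unlimited;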
 

External Table Problem

Faisal, June 16, 2005 - 10:41 am UTC

Hi Tom,

I have a problem creating an external table dynamically using a stored procedure. Here is the info.

I have one external table template.

CREATE TABLE CLM_HWINGCLMAC_EXT
(
TEST_COL1 VARCHAR2(5 BYTE),
TEST_COL2 VARCHAR2(5 BYTE)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY EISHCLM_DIR
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
BADFILE eishclm_dir:'hwingclmac.bad'
FIELDS RTRIM
( test_col1 (1:5),
test_col2 (6:10))
)
LOCATION (EISHCLM_DIR:'hwingclmac.ctl')
)
REJECT LIMIT UNLIMITED;

The following block of code works fine without any problem.

DECLARE
string1 VARCHAR2(2000);
string2 VARCHAR2(2000);
tname VARCHAR2(30) := 'CLM_HWINGCLMAC_EXT';
w_table_owner VARCHAR2(30) := 'CLAIM';
ttype VARCHAR2(30) := 'TABLE';
tnewname VARCHAR2(30)
:= 'CLM_HWINGCLMAC_EXT_NEW';
w_file_name VARCHAR2(30) := 'hwingclmac.ctl';
filedir VARCHAR2(30) := 'EISHCLM_DIR';
BEGIN
SELECT REPLACE(DBMS_METADATA.get_ddl(ttype
,tname
,w_table_owner
)
,tname
,tnewname
)
INTO string1
FROM DUAL;

string2 :=
'alter table '
|| tnewname
|| ' LOCATION ('
|| filedir
|| ':'
|| ''''
|| w_file_name
|| ''''
|| ')';

EXECUTE IMMEDIATE string1;

EXECUTE IMMEDIATE string2;
END;
/

But it did not work using a stored procedure:

CREATE OR REPLACE PROCEDURE create_ext_tables IS
string1 VARCHAR2(2000);
string2 VARCHAR2(2000);
tname VARCHAR2(30) := 'CLM_HWINGCLMAC_EXT';
w_table_owner VARCHAR2(30) := 'CLAIM';
ttype VARCHAR2(30) := 'TABLE';
tnewname VARCHAR2(30)
:= 'CLM_HWINGCLMAC_EXT_NEW';
w_file_name VARCHAR2(30) := 'hwingclmac.ctl';
filedir VARCHAR2(30) := 'EISHCLM_DIR';
BEGIN
SELECT REPLACE(DBMS_METADATA.get_ddl(ttype
,tname
,w_table_owner
)
,tname
,tnewname
)
INTO string1
FROM DUAL;

string2 :=
'alter table '
|| tnewname
|| ' LOCATION ('
|| filedir
|| ':'
|| ''''
|| w_file_name
|| ''''
|| ')';

EXECUTE IMMEDIATE string1;

EXECUTE IMMEDIATE string2;
END;
/

I got the following error

ORA-01031: insufficient privileges
ORA-06512: at "CLAIM.CREATE_EXT_TABLES", line 33
ORA-06512: at line 2

Please help me!!!

Tom Kyte
June 16, 2005 - 1:10 pm UTC

More Info

A reader, June 16, 2005 - 2:27 pm UTC

Hi Tom,

I am sorry, I missed one point: I can create the stored procedure without any error; the error appears when the procedure tries to create the external table using execute immediate.

Regards,


Tom Kyte
June 16, 2005 - 2:46 pm UTC

see same link. It is exactly that reason.

Thanks

Faisal, June 16, 2005 - 4:12 pm UTC

Hi Tom,

Problem resolved, thanks a million!!!



Your suggestion Required

Faisal, June 17, 2005 - 6:07 pm UTC

Hi Tom,

Is the way I am creating dynamic external tables on a daily basis the right approach, or is there a better way to do it? I need your opinion.

CREATE OR REPLACE PROCEDURE create_ext_tables IS
string1 VARCHAR2(2000);
string2 VARCHAR2(2000);
tname VARCHAR2(30) := 'CLM_HWINGCLMAC_EXT';
w_table_owner VARCHAR2(30) := 'CLAIM';
ttype VARCHAR2(30) := 'TABLE';
tnewname VARCHAR2(30) := 'CLM_HWINGCLMAC_EXT_NEW';
w_file_name VARCHAR2(30) := 'hwingclmac.ctl';
filedir VARCHAR2(30) := 'EISHCLM_DIR';
BEGIN
SELECT REPLACE(DBMS_METADATA.get_ddl(ttype
,tname
,w_table_owner
)
,tname
,tnewname
)
INTO string1
FROM DUAL;

string2 :=
'alter table '
|| tnewname
|| ' LOCATION ('
|| filedir
|| ':'
|| ''''
|| w_file_name
|| ''''
|| ')';

EXECUTE IMMEDIATE string1;

EXECUTE IMMEDIATE string2;
END;

Thanks,


Tom Kyte
June 17, 2005 - 6:59 pm UTC

I don't know - what is your "goal"?

More Info.

Faisal, June 17, 2005 - 9:39 pm UTC

Hi Tom,

For my DWH project, we have two source systems; each source system has 4 companies, and each company will transfer 5 flat files (of different structures) to the DWH. On a daily basis, we will receive (2 source systems X 8 companies X 5 types of files = 80 files). The structure of the flat files of source A is different from source B, but within a source, the structure of the files is identical across the companies. So basically I have two different sets of files for multiple companies. Now, in order to load the data into the DWH using external tables, I am planning to create two sets of external tables (templates), i.e. I will have 10 external tables permanently. Once I receive the data for source A and company C1, I will create the external tables using the predefined metadata table as I am doing in the above procedure, load the data, and then drop the external tables. To achieve this functionality, do you think the procedure I wrote is the right approach?

Tom Kyte
June 18, 2005 - 7:11 pm UTC

why do you need to copy the external table definition? why not just alter the file name and load from it? (from the one)

A reader, June 19, 2005 - 10:09 am UTC

Because we have to load multiple companies in parallel, I cannot create one definition; I need separate tables for each company and source system.

Regards,

Faisal

Tom Kyte
June 19, 2005 - 11:47 am UTC

ok, so create them all at once. why do you need to create them dynamically?

Faisal, June 19, 2005 - 1:42 pm UTC

Because if there is a structure change (adding a new column) then I have to update 8 tables (identical tables for the 8 companies), but if I have one template (one for each source) then the change will be in one place, i.e. the template. This is the only reason I am going with the dynamic approach.

Tom Kyte
June 19, 2005 - 2:06 pm UTC

"The way I am creating dynmaic external on daily basis "

why on a daily basis.

No problem with scripting the creation, but why create them every single day???? Why not just create them once and when they change, not every time?

Faisal, June 19, 2005 - 9:45 pm UTC

Hi Tom,

I understand your point. My only objective was to change the definition of the template external table once and have the copies created on the fly. But I think you are right, there is no need to drop and create them on a daily basis. I can create them once (80 tables), and while changes will be quite frequent at the start of the project, after some time it will be stable and there is really no point in doing it dynamically. Thanks for the guidance....

Faisal,

Implicit Ordering for External Tables

John Gilmore, June 28, 2005 - 7:35 am UTC

Hi Tom,

I know that there is no explicit ordering of data in a normal table but can I assume that data read by SQL*Loader from an external flat file will be ordered in the same order as the physical file?

What I'm trying to do is read from an external table into a physical table. In the process, I want to include a pseudo-column based on a sequence so that selecting from the physical table (and ordering by the sequence column) will give the same order that is present in the external file. Will this work or is it not supported? Any suggestions for a workaround would be gratefully accepted.

Tom Kyte
June 28, 2005 - 8:06 am UTC

as long as there are no parallel/set operations, the file is read as a sequential file. if you do parallel or anything "set" oriented that implies a group or a sort of any kind, it'll be whatever order it feels like.

(rownum is useful here -- select rownum R, et.* from et)

Interesting

John Gilmore, June 29, 2005 - 2:44 am UTC

So an implicit ordering in external tables is supported. Do you know if this is documented anywhere? I couldn't find anything in the documentation to suggest that this was a supported feature.

Tom Kyte
June 29, 2005 - 8:55 am UTC

as long as you don't order it, group it, "set it", parallel it, it has to read the sequential file from top to bottom. I do not believe it to be documented, no.
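So, to freeze the file order into the physical table (a sketch with hypothetical table names; run it serially, with no PARALLEL on the external table, so the file is read top to bottom):

insert into physical_table
select rownum seq, et.*    -- rownum captures the file order
  from external_table et;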

External Table with POSITIONal fields and the DECODE function

Ramasamy, July 18, 2005 - 11:09 am UTC


Control File Structure:

LOAD DATA
TRUNCATE
INTO TABLE table1
(column1 POSITION(1:8) "LTRIM(:column1,'0')",
column2 POSITION (54:54) "DECODE(:column1,'O',1,'S',3,2)" ,
...

How can I create an external table that calls functions for positional fields like I do above in control files?
Mainly functions like DECODE, LTRIM, etc.

Thanks a lot for your help.

Tom Kyte
July 18, 2005 - 11:19 am UTC

$ sqlldr u/p controlfile.name EXTERNAL_TABLE=GENERATE_ONLY

let sqlldr do the heavy lifting, the log file from that (it won't load ANYTHING) will have the external table definition.

What you'll discover is the SQL functions are used in .... SQL.


The ET doesn't have ltrim, decode, the SELECT against the ET does.
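In other words, something like this (a sketch with hypothetical names and directory; the positional fields live in the external table and the functions move into the SQL - mirroring the control file, where the DECODE references :column1):

create table table1_ext
( column1 varchar2(8),
  column2 varchar2(1)
)
organization external
( type oracle_loader default directory data_dir
  access parameters ( records delimited by newline
                      fields
                      missing field values are null
                      ( column1 position(1:8),
                        column2 position(54:54) ) )
  location ('table1.dat') )
reject limit unlimited;

insert into table1 (column1, column2)
select ltrim(column1,'0'),
       decode(column1,'O',1,'S',3,2)
  from table1_ext;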



bit late, but ...

Gabe, July 18, 2005 - 11:43 am UTC

John Gilmore said ...
<quote>What I'm trying to do is read from an external table into a physical table. In the process, I want to include a pseudo-column based on a sequence so that selecting from the physical table (and ordering by the sequence column) will give the same order that is present in the external file. Will this work or is it not supported? Any suggestions for a workaround would be gratefully
accepted.
</quote>

RECNUM in the external table definition?

Tom Kyte
July 18, 2005 - 12:17 pm UTC

RECNUM is ok as long as you don't need it to be sequential like rownum would be, yes.
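That is, something along these lines (a sketch with hypothetical names; the access driver fills R with the number of each record as it reads the file):

create table et
( r    number,
  data varchar2(100)
)
organization external
( type oracle_loader default directory data_dir
  access parameters ( records delimited by newline
                      fields terminated by ','
                      missing field values are null
                      ( r recnum,
                        data char(100) ) )
  location ('data.txt') )
reject limit unlimited;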

Using External Table

Yoav, September 13, 2005 - 3:22 am UTC

Hi Mr. Kyte

I would like to know if it is possible to do the following task using an external table:
I have a directory on a Windows mapped network drive.
I need to read the contents of all the files in that directory (the names of the files are their creation dates).
Each file contains one header line that should be inserted into the DEPT table, and the rest of the lines
should be inserted into the EMP table.

q1. How can I direct the first line to the DEPT table and the rest of the lines to the EMP table using an external table?
q2. As I said, the names of the files change. Is it possible to dynamically change the value of the location clause?

Thank you.


Tom Kyte
September 13, 2005 - 12:05 pm UTC

q1)
insert
when r = 1 then into t1 (username, user_id, created) values (username,user_id,created)
when r <>1 then into t2 (username, user_id, created) values (username,user_id,created)
select rownum r, username, user_id, created
from external_table;


q2) you can use alter table to change the file location referenced by the external table.
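for example (hypothetical file name):

alter table external_table location ('emp_2005_09_14.txt');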


However, your two main hurdles will be:

a) getting the list of files (java stored procedure can do that)
b) getting Oracle the ability to see that directory, it is not easy like Unix would be.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:241814624807 <code>


External Table

Yoav, September 14, 2005 - 4:25 am UTC

Hi Mr.Kyte
Your answer was very helpful.
I would like to ask one more question:
What if the structure of the first line in the file is different from the rest of the lines in the same file?
Is it still possible to read the contents of the file using an external table?

In the following example, the first line represents the department the users belong to and is separated
by "@", while the rest of the lines, which include details about the employees, are separated by commas.

10@FINANCE@NEY YORK
RAN,132,01-JUL-2005
MICK,201,01-APR-2005
JOHN,228,01-MAR-2005

Thank You!!

Tom Kyte
September 14, 2005 - 8:31 am UTC

You would probably want to use two different external tables.

One, you just query the first row.

The other, you have it SKIP the first row (you can do that in the syntax of the external table create: "SKIP 1").


else, you would have to generically map all of the fields of the widest table (the one with the most columns) to varchar2's and use to_date/to_number to convert them all.
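A minimal sketch of the two-external-tables approach (hypothetical names and directory; both tables point at the same file):

-- table 1: '@'-delimited; only the first row is ever used
create table dept_hdr_ext
( deptno varchar2(10), dname varchar2(30), loc varchar2(30) )
organization external
( type oracle_loader default directory data_dir
  access parameters ( records delimited by newline
                      fields terminated by '@'
                      missing field values are null )
  location ('emp_file.txt') )
reject limit unlimited;   -- the comma rows will not parse, let them reject

-- table 2: comma-delimited, skipping the header line
create table emp_det_ext
( ename varchar2(30), empno varchar2(10), hiredate varchar2(11) )
organization external
( type oracle_loader default directory data_dir
  access parameters ( records delimited by newline
                      skip 1
                      fields terminated by ','
                      missing field values are null )
  location ('emp_file.txt') )
reject limit unlimited;

-- select * from dept_hdr_ext where rownum = 1;   -- the header
-- select * from emp_det_ext;                     -- the details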

whitespace before and after in the column data

Vin, November 08, 2005 - 9:08 pm UTC

Oracle 10.2.0.1.0
With the external table as defined below with seed data as given
Seed Data Example: whitespace exist for second column data
whose lenght is 11
'whitespace"C"whitespacewhitespace"P64"whitespacewhitespace"2"whitespace"

2002, C P64 2 ,180390130
2002, EW65 2 ,180445120

create table ET_VIN
(
MYEAR varchar2(4),
MVIN varchar2(11),
MCODE varchar2(9)
)
organization external
( type oracle_loader default directory ff_mvv_dir
access parameters ( records delimited by ','
fields notrim
( MYEAR, MVIN, MCODE )
)
location ('mvv.csv')
)
reject limit unlimited;

also tried the below format without success
fields notrim
( MYEAR char(4),
MVIN char(11),
MCODE char(9)
)

Also tried using SQLLDR with the control file defined below
load data
infile 'mvv.csv'
append into table MVV
fields terminated by ","
optionally enclosed by '"'
trailing nullcols
(
MYEAR ,
MVIN char "substr(:VIN,1,11)",
MCODE
)

Is there any way to load that whitespace along with the other characters? The requirement was: after loading this col(11), I need to split this one column of size 11 into 11 columns of size 1, so loading the whitespace was important to get the correct position of each character in the column of length 11.
As always, I appreciate your help and the valuable time you give in educating and helping the Oracle user community.

whitespace before and after in the column data

Vin, November 11, 2005 - 9:36 am UTC

Just so that you know, this worked for me using sqlldr.
(
MYEAR position(1:4),
MVIN position(6:16),
MCODE position(18:26)
)

I think I could use the same positioning of data when using external tables. Please correct me if I am wrong.
As always, your sharing of these tricks and knowledge, and the valuable time you spare, is immensely appreciated.


Tom Kyte
November 12, 2005 - 8:35 am UTC

take your control file and run

sqlldr u/p controlfilename external_table=generate_only

and you'll have your external table in the log file.

whitespace before and after in the column data

Vin, November 11, 2005 - 7:54 pm UTC

This worked for me while using a External Table.

create table ET_VIN
(
MYEAR varchar2(4),
MVIN varchar2(11),
MCODE varchar2(9)
)
organization external
( type oracle_loader default directory ff_mvv_dir
access parameters ( records delimited by newline
nologfile
fields terminated by ','
optionally enclosed by '"' and '"' notrim
missing field values are null
reject rows with all null fields
(MYEAR position(1:4),
MVIN position(6:16),
MCODE position(18:26)
)
)
location ('mvv.csv')
)
reject limit unlimited;

Regards


Very Simple SQLPlus Script Calls - @@ - C:\Documents and Settings\

Prince Faran, November 18, 2005 - 10:27 am UTC

Hello Tom

I have two very simple SQL*Plus script files, script1 and script2, placed within the same folder. One script calls the other using @@. The fully qualified names of these scripts and their code are as under:


-- C:\Documents and Settings\Administrator\My Documents\My Scripts\script1.sql
======================start of script=====================
connect scott/tiger;
@@script2.sql
=======================end of script======================


-- C:\Documents and Settings\Administrator\My Documents\My Scripts\script2.sql
======================start of script=====================
SELECT user FROM dual;
=======================end of script======================


Here you can see what happened when I called script1 from the SQL prompt:

SQL> @C:\Documents and Settings\Administrator\My Documents\My Scripts\script1.sql
SP2-0310: unable to open file "C:\Documents.sql"
SQL> -- Sorry, I forgot to enclose the filename in double quotes.
SQL> @"C:\Documents and Settings\Administrator\My Documents\My Scripts\script1.sql"
Connected.
SP2-0310: unable to open file "C:\Documents.sql"
SQL> -- connect command in script1 worked but script1 could not call script2 within the same folder

I guess that if you are using Oracle9i on Windows 2003, @@ cannot handle a current path value that includes white space (in the ordinary way). Can you guide me: what would be the solution for the above-mentioned scenario?

Thanks

Faran 

Tom Kyte
November 18, 2005 - 3:38 pm UTC

My suggestion is to not use spaces in path names - you've diagnosed the issue all of the way.

spaces in path names kill so many things (not just oracle things).

RE: Problem with External table and Timestamps

Kevin, November 28, 2005 - 2:06 pm UTC

Tom:

I have a table which has some date fields in it:

Date_Table
============
FieldA DATE,
FieldB DATE

The data is coming from a MSSQL server in the format of a timestamp:

outfile.dat
------------------

2005-11-11 00:00:00.000||2003-12-10 00:00:00.000||

I setup the externally organized table as:

CREATE TABLE test_file_extern(SYS_UPDATE_DATE DATE,
DW_LOAD_DT DATE)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY export_dir
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '||'
(SYS_UPDATE_DATE TIMESTAMP 'YYYY-MM-DD HH:MI:SSxFF3', DW_LOAD_DT TIMESTAMP 'YYYY-MM-DD HH:MI:SSxFF3')
)
LOCATION ('somefile.dat')
)
REJECT LIMIT 0;

The documentation for 9i CLEARLY states that TIMESTAMP is a valid input format:

"The opaque_format_spec lets you list the parameters and their values. Please refer to Oracle9i Database Utilities for information on how to specify values for the opaque_format_spec"

and

Table 6-2 Datatype Conversions for Datetime and Interval Datatypes

SQL*Loader   Oracle Database Datatype (conversion supported?)
Datatype      N    C    D    T    TS   YM   DS
----------   ---  ---  ---  ---  ---  ---  ---
N            Yes  Yes  No   No   No   No   No
C            Yes  Yes  Yes  Yes  Yes  Yes  Yes
D            No   Yes  Yes  No   Yes  No   No
T            No   Yes  No   Yes  Yes  No   No
TS           No   Yes  Yes  Yes  Yes  No   No
YM           No   Yes  No   No   No   Yes  No
DS           No   Yes  No   No   No   No   Yes

and
date_format_spec ::=

DATE_FORMAT
{
{ DATE | { TIME | TIMESTAMP } [WITH TIMEZONE] } MASK \"date/time mask\"
| INTERVAL { YEAR_TO_MONTH | DAY_TO_SECOND }
}
For information on how to read this syntax, see

However, when I try to select data from the table I get the following error:

ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "timestamp": expecting one of: "comma, char,
date, defaultif, decimal, double, float, integer, (, nullif, oracle_date,
oracle_number, position, raw, recnum, ), unsigned, varrawc, varchar, varraw,
varcharc, zoned"
KUP-01007: at line 7 column 38
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

For the life of me I can't figure out why... The documentation says I can, and so do tons of other examples.

I'm using the following:

Oracle Version: 9iR2 v9.2.0.6
OS Version: Solaris v5.9

Any ideas?

Tom Kyte
November 28, 2005 - 3:29 pm UTC

Nullable FKs and external tables

Sam Chowdary, November 28, 2005 - 3:11 pm UTC

We depend upon external tables a lot to import data from CSV format files. Though we could overcome many other issues, we could not overcome this issue of nullable FKs.

Suppose there are master and detail tables, and the detail table has the master table's FK as nullable. When we try to load data from external tables, we need to have some data in all FK columns - otherwise we cannot import! Until now we have overcome this by having dummy records in all master tables and carrying dummy data in the FK columns of the child tables (wherever data is not required). This approach is not looking good.

Do you have any suggestion to over come this issue?

Thanks in advance.

Tom Kyte
November 28, 2005 - 3:34 pm UTC

why cannot you import?

If the fkey columns are allowed to be NULL, set them null and away you go?

problem is not clear, do you have an example?

Nullable FKs and external tables

A reader, November 28, 2005 - 3:58 pm UTC

Problem here is with state_id. Unless I provide some dummy data in parent (state) and child (supplier -> SPLR_TAG) I am not able to load external data.

1) SUPPLIER table

SPLR_ID NOT NULL NUMBER(10)
SPLR_TAG NOT NULL VARCHAR2(50)
SPLR_NAME VARCHAR2(50)
SPLR_ADDRS VARCHAR2(100)
SPLR_CITY VARCHAR2(50)
STATE_ID NUMBER(10) -> Nullable FK to STATE parent table
SPLR_ZIP VARCHAR2(20)
CNTRY_ID NOT NULL NUMBER(10) -> FK to COUNTRY parent table
TZONE_ID NOT NULL NUMBER(10) -> FK to TIMEZONE parent table
CALN_ID NOT NULL NUMBER(10) -> FK to CALENDAR parent table

2) Script to load external data.

create table SUPPLIER_EXT
(
SPLR_TAG VARCHAR2(50),
SPLR_NAME VARCHAR2(50),
SPLR_ADDRS VARCHAR2(100),
SPLR_CITY VARCHAR2(50),
STATE_TAG VARCHAR2(50),
SPLR_ZIP VARCHAR2(20),
CNTRY_TAG VARCHAR2(50),
TZONE_TAG VARCHAR2(50),
CALN_TAG VARCHAR2(50)
)
ORGANIZATION EXTERNAL
( type oracle_loader default directory data_dir
access parameters
(
RECORDS DELIMITED BY NEWLINE
BADFILE bad_dir:'SUPPLIER.bad'
SKIP 1
LOGFILE log_dir:'SUPPLIER.log'
fields terminated by ','
MISSING FIELD VALUES ARE NULL
)
location('SUPPLIER.csv'))
REJECT LIMIT UNLIMITED;


MERGE INTO SUPPLIER
USING (
SELECT SUPPLIER_EXT.*, t271.STATE_ID "STATE_ID", t272.CNTRY_ID "CNTRY_ID", t273.TZONE_ID "TZONE_ID", t274.CALN_ID "CALN_ID"
FROM SUPPLIER_EXT, STATE t271, COUNTRY t272, TIMEZONE t273, CALENDAR t274
WHERE SUPPLIER_EXT.STATE_TAG = t271.STATE_TAG AND SUPPLIER_EXT.CNTRY_TAG = t272.CNTRY_TAG AND SUPPLIER_EXT.TZONE_TAG = t273.TZONE_TAG AND SUPPLIER_EXT.CALN_TAG = t274.CALN_TAG
) ABC ON ( SUPPLIER.SPLR_TAG = ABC.SPLR_TAG)
WHEN MATCHED THEN
UPDATE SET
SUPPLIER.SPLR_NAME = ABC.SPLR_NAME, SUPPLIER.SPLR_ADDRS = ABC.SPLR_ADDRS, SUPPLIER.SPLR_CITY = ABC.SPLR_CITY, SUPPLIER.STATE_ID = ABC.STATE_ID, SUPPLIER.SPLR_ZIP = ABC.SPLR_ZIP, SUPPLIER.CNTRY_ID = ABC.CNTRY_ID, SUPPLIER.TZONE_ID = ABC.TZONE_ID, SUPPLIER.CALN_ID = ABC.CALN_ID
WHEN NOT MATCHED THEN
INSERT (
SUPPLIER.SPLR_ID, SUPPLIER.SPLR_TAG, SUPPLIER.SPLR_NAME, SUPPLIER.SPLR_ADDRS, SUPPLIER.SPLR_CITY, SUPPLIER.STATE_ID, SUPPLIER.SPLR_ZIP, SUPPLIER.CNTRY_ID, SUPPLIER.TZONE_ID, SUPPLIER.CALN_ID
) VALUES (
SEQ_SUPPLIER.nextval, ABC.SPLR_TAG, ABC.SPLR_NAME, ABC.SPLR_ADDRS, ABC.SPLR_CITY, ABC.STATE_ID, ABC.SPLR_ZIP, ABC.CNTRY_ID, ABC.TZONE_ID, ABC.CALN_ID
);
drop table SUPPLIER_EXT;


Tom Kyte
November 28, 2005 - 11:44 pm UTC

still don't get it. why is there a problem?


can you take the example down to:

a) simple table - just one column and one FK column should be sufficient
b) supply everything needed for simple example.


If the incoming fk is null - what is the issue?

correction

A reader, November 28, 2005 - 4:01 pm UTC

PS: read SPLR_TAG as STATE_TAG

Problem here is with state_id. Unless I provide some dummy data in parent
(state) and child (supplier -> STATE_TAG) I am not able to load external data.




Issue with import

A reader, November 30, 2005 - 7:33 pm UTC

When I try to import SUPPLIER with an empty state_tag, it does not import any records. Maybe because the SQL is trying to match a parent record and is not able to find one?

1) SUPPLIER table

SPLR_ID NOT NULL NUMBER(10)
SPLR_TAG NOT NULL VARCHAR2(50)
STATE_ID NUMBER(10) -> Nullable FK to STATE parent table

2) Script to load external data.

create table SUPPLIER_EXT
(
SPLR_TAG VARCHAR2(50),
STATE_TAG VARCHAR2(50)
)
ORGANIZATION EXTERNAL
( type oracle_loader default directory data_dir
access parameters
(
RECORDS DELIMITED BY NEWLINE
BADFILE bad_dir:'SUPPLIER.bad'
SKIP 1
LOGFILE log_dir:'SUPPLIER.log'
fields terminated by ','
MISSING FIELD VALUES ARE NULL
)
location('SUPPLIER.csv'))
REJECT LIMIT UNLIMITED;


MERGE INTO SUPPLIER
USING (
SELECT SUPPLIER_EXT.*, t271.STATE_ID "STATE_ID"
FROM SUPPLIER_EXT, STATE t271
WHERE SUPPLIER_EXT.STATE_TAG = t271.STATE_TAG ) ABC ON ( SUPPLIER.SPLR_TAG = ABC.SPLR_TAG)
WHEN MATCHED THEN
UPDATE SET SUPPLIER.STATE_ID = ABC.STATE_ID
WHEN NOT MATCHED THEN
INSERT (
SUPPLIER.SPLR_ID, SUPPLIER.SPLR_TAG, SUPPLIER.STATE_ID
) VALUES (
SEQ_SUPPLIER.nextval, ABC.SPLR_TAG, ABC.STATE_ID
);
drop table SUPPLIER_EXT;





Tom Kyte
November 30, 2005 - 9:14 pm UTC

I need a full example, in the fashion I provide to you on this site - something anyone could run.

Problem loading data from a file with variable length

Knut Billerbeck, January 12, 2006 - 9:13 am UTC

Hi Tom,
I have a file with the following contense:

0028This is the first record0029This is the second record

I would like to get the contents of such a file into an external table. The problem is that when I declare the record with a variable length, let me say 'RECORDS VARIABLE 4', the loader expects the 4 bytes to be excluded from the number of bytes to be read (meaning the loader would read 28 bytes after the 0028), but I have them included (meaning the loader should read only 28-4=24 bytes after the 0028). How can I do this?
Thanks in advance
Knut

Tom Kyte
January 12, 2006 - 11:04 am UTC

not sure that you can

Alex, March 22, 2006 - 2:44 pm UTC

Hi Tom,

Sorry to bother you with such an easy question, but I cannot find the answer anywhere. All I want to do is specify a default value for a column in my external table.
So say I'm loading 3 columns, but only two have values in my input file; I want to use the same value for all the rows of this third column.

I received an error when trying to use the default clause on the column in my external table, as if it were a regular table. I hope that makes sense.

Tom Kyte
March 22, 2006 - 5:14 pm UTC

define "default value" in the context of an external table?

do you mean "if this field is null return this"
do you mean "add this field with this constant value"

Either of which is easily done via a view:


create view et_view
as
select et.*,
       nvl(that_column,'x') case_1,
       'y' case_2
  from et;


and use the view, not the external table.

Alex, March 22, 2006 - 5:43 pm UTC

Ok what I mean is, let's say the data in my file has the value for two columns, delimited by '|', e.g.

Joe Stevens | 603-321-1234

But the Oracle table I will be copying the external table data into has three columns: name, number, and data_tag. The data_tag column is something I will be using to identify the source of the data. So, other than copying and pasting in textpad over and over for every row "| data_load", can I default that value to be loaded for the data_tag column?
Hope that is more clear, thank you.

Tom Kyte
March 22, 2006 - 7:13 pm UTC

did you see my answer above? "create view" and done.

or just select the constant in your query.
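that is, something like (hypothetical table names; 'data_load' being your constant tag):

insert into phone_list( name, phone, data_tag )
select name, phone, 'data_load'
  from et;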

EXTERNAL TABLE - Selects

VIKAS, April 03, 2006 - 3:31 am UTC

Dear Mr. Kyte

It is required by our application to select some hardcoded values from a flat Notepad text file. For this I thought of using external tables. I've tried creating the tables as:


create or replace directory EXTTAB as '\\vikas\E:\Shared\EXTTAB';

Directory created

Subsequently, in the above-created directory, I've created a text file "XtrLeads.txt"

(Having values 
32, 001, 
32, 002, 
32, 003, ... and so on)

This file would be used to hold the above values to be picked/selected by the external table below.

Then, 

SQL> create table EXTLEADS
  2  (
  3    OID number(5),
  4    LID NUMBER(8)
  5  )
  6  organization external
  7  (
  8    type oracle_loader
  9    default directory EXTTAB
 10    access parameters
 11    (
 12      fields terminated by ','
 13      records delimited by newline
 14      nobadfile
 15      nodiscardfile
 16      nologfile
 17    )
 18     location('XtrLeads.txt')
 19  )
 20  reject limit unlimited
 21  /

Table created

But now, when I

Select * from EXTLEADS;

I get the following messages...

ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "records": expecting one of: "enclosed, exit, (, ltrim, lrtrim, ldrtrim, missing, notrim, optionally, rtrim, reject"
KUP-01007: at line 2 column 5
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

Here, I have my Oracle database server on Fedora Core 4 Linux, my client machine is Windows XP (where I am also creating the above directory), and lastly, I have not configured the UTL_FILE_DIR parameter (it is at its default, empty).

Please tell me: where am I going wrong? And how can I do it right and achieve the desired results?

Take care, regards...
 

Tom Kyte
April 03, 2006 - 8:12 am UTC

you do know that windows is not like Unix - it is not a multi-user OS really. That share - not like a mount point in Unix - likely will NOT be visible to the server at all - the file system you are trying to read just won't be there. Getting a windows service able to see a filesystem is "not trivial".

Also, your access parameters need the RECORDS clause ahead of the FIELDS clause - that ordering is what the KUP-01005 message was complaining about:


access parameters
(
records delimited by newline
nobadfile
nodiscardfile
nologfile
fields terminated by ','
)
location('XtrLeads.txt')
)


EXTERNAL TABLE - Selects

VIKAS, April 04, 2006 - 2:12 am UTC

Dear Mr. Kyte,

Thanks for the reply to the above post.

Based on the above reply, I would like to know: how can we achieve this in a Windows environment?

I mean, how does one create and make use of external tables in a client-server network where the database server is Linux based and the client machine is Windows XP, and a user wants to use his Windows client machine to create external tables and make use of them?

Also, whenever I try to create the directory on my client machine (Windows XP), the creation is successful, but when I try to physically check the existence of my directory, it is not there. Where is the directory actually being created? Is it created on the database server? If not, then where?

Why is this command successful?
SQL > create directory dir1 as 'C:\dir1'

What workarounds should one try in such conditions?

Take care, Regards...


Tom Kyte
April 04, 2006 - 9:54 am UTC

if the database server is Linux, you can MOUNT file systems on Linux and Linux will be able to see them.

it would be a problem if the server was WINDOWS - and since you used a UNC name in your directory.... it would appear that it was.

SELECT COUNT(*) VS SELECT * ON EXTERNAL TABLES ...

VKOUL, April 04, 2006 - 1:59 pm UTC

SQL>
SQL> select * from iislog_web1;

no rows selected

SQL> select count(*) from iislog_web1;

  COUNT(*)
----------
        44

SQL>
SQL>
SQL> !file iislog_web1.dat
iislog_web1.dat: ASCII text, with CRLF line terminators

SQL>


Q1 : This file contains some unwanted characters at the end of each line, which makes the external table reject all records. Why does COUNT(*) show 44 records vs. none for SELECT *?

Q2 : Is there a way for an external table to clean and load the file without running a UNIX command like "tr -cd '\11\12\40-\176' < input_file > output_file"?

Thanks
 

Tom Kyte
April 04, 2006 - 7:32 pm UTC

and how did you create the external table.

SELECT COUNT(*) VS SELECT * ON EXTERNAL TABLES ...

VKOUL, April 05, 2006 - 3:17 pm UTC

CREATE TABLE iislog_web1
(
"date" VARCHAR2(10),
"time" VARCHAR2(8),
"s-ip" VARCHAR2(15),
"cs-method" VARCHAR2(4),
"cs-uri-stem" VARCHAR2(500),
"cs-uri-query" VARCHAR2(500),
"c-ip" VARCHAR2(15),
"sc-status" NUMBER,
"sc-bytes" NUMBER,
"cs-bytes" NUMBER,
"time-taken" NUMBER
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY report_dat_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
BADFILE report_bad_dir:'iislog_web1.bad'
DISCARDFILE report_bad_dir:'iislog_web1.dis'
LOGFILE report_log_dir:'iislog_web1.log'
SKIP 4
FIELDS TERMINATED BY ' '
REJECT ROWS WITH ALL NULL FIELDS
(
"date",
"time",
"s-ip",
"cs-method",
"cs-uri-stem",
"cs-uri-query",
"c-ip",
"sc-status",
"sc-bytes",
"cs-bytes",
"time-taken"
)
)
LOCATION ('iislog_web1.dat')
)
REJECT LIMIT UNLIMITED
/


Tom Kyte
April 06, 2006 - 9:37 am UTC

I assume you are on 9i - in 10g you can control this. By default in 10g, the following script:

create or replace directory report_bad_dir as '/tmp/'
/
create or replace directory report_log_dir as '/tmp/'
/
create or replace directory report_dat_dir as '/tmp/'
/

!echo a b c d e f g 1 2 3 4 > /tmp/iislog_web1.dat
!echo a b c d e f g 1 2 3 4 >> /tmp/iislog_web1.dat
!echo a b c d e f g 1 2 3 4 >> /tmp/iislog_web1.dat
!echo a b c d e f g 1 2 3 4 >> /tmp/iislog_web1.dat
!echo a b c d e f g 1 2 3 4 >> /tmp/iislog_web1.dat
!echo a b c d e f g 1 2 3 a >> /tmp/iislog_web1.dat


drop TABLE iislog_web1;
CREATE TABLE iislog_web1
(
...... your create here .....
/

select * from iislog_web1;
select count(*) from iislog_web1;
select "date", "time", "s-ip" from iislog_web1;
select "date", "time", "s-ip", "time-taken" from iislog_web1;


produces:

ops$tkyte@ORA10GR2> select * from iislog_web1;

date       time     s-ip            cs-m
---------- -------- --------------- ----
cs-uri-stem
-------------------------------------------------------------------------------
cs-uri-query
-------------------------------------------------------------------------------
c-ip             sc-status   sc-bytes   cs-bytes time-taken
--------------- ---------- ---------- ---------- ----------
a          b        c               d
e
f
g                        1          2          3          4


ops$tkyte@ORA10GR2> select count(*) from iislog_web1;

  COUNT(*)
----------
         1

ops$tkyte@ORA10GR2> select "date", "time", "s-ip" from iislog_web1;

date       time     s-ip
---------- -------- ---------------
a          b        c

ops$tkyte@ORA10GR2> select "date", "time", "s-ip", "time-taken" from iislog_web1;

date       time     s-ip            time-taken
---------- -------- --------------- ----------
a          b        c                        4


but in 9i, it'll:


ops$tkyte@ORA9IR2> select * from iislog_web1;

date       time     s-ip            cs-m
---------- -------- --------------- ----
cs-uri-stem
-------------------------------------------------------------------------------
cs-uri-query
-------------------------------------------------------------------------------
c-ip             sc-status   sc-bytes   cs-bytes time-taken
--------------- ---------- ---------- ---------- ----------
a          b        c               d
e
f
g                        1          2          3          4


ops$tkyte@ORA9IR2> select count(*) from iislog_web1;

  COUNT(*)
----------
         2

ops$tkyte@ORA9IR2> select "date", "time", "s-ip" from iislog_web1;

date       time     s-ip
---------- -------- ---------------
a          b        c
a          b        c

ops$tkyte@ORA9IR2> select "date", "time", "s-ip", "time-taken" from iislog_web1;

date       time     s-ip            time-taken
---------- -------- --------------- ----------
a          b        c                        4


in 9i, it only parses the fields you select - resulting in potentially different record counts if you select (or not) a field that causes a record to be rejected...



One solution would be to map each field as a varchar2(4000) instead and perform the validation/translation in the select itself.


select substr( f1, 1, 20 ) f1, 
       ...
       to_number( substr( translate(f10, ... ) , ..... ) )
  from et
 where <any condition to filter bad data>

 

SELECT COUNT(*) VS SELECT * ON EXTERNAL TABLES ...

VKOUL, April 06, 2006 - 1:49 pm UTC

You are right Tom,

I am using 9i here, but one thing I do not understand: since there are 6 records, why does it show only 2 in COUNT(*)?

$ cat iislog_web1.dat | wc -l
6
$

The records which are replicated show up only once, I wonder!!

It seems to me that COUNT(*) shows the actual number of records in the file. But when you select specific column(s), it parses only those, which is actually kind of cool (I don't think we can do that in SQL*Loader that easily): if COUNT(*) and a query selecting a particular column produce the same number of records, then that column's data is free from errors. Which means that just by selecting a column or a group of columns you can determine which column's data is erroneous.

In 10g, as you showed, it returns only the successfully loaded records of the external table. Hmm...


Tom Kyte
April 07, 2006 - 8:18 pm UTC

in 10g, you get to "pick" the behaviour - there is an option that lets the parser accept any record as long as the SELECTED columns parse OK.
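The option being referred to is, I believe, the PROJECT COLUMN clause:

-- 10g default: all fields are parsed for every query, so the
-- counts stay consistent no matter which columns you select
alter table iislog_web1 project column all;

-- 9i-style behaviour: only the referenced columns are parsed
alter table iislog_web1 project column referenced;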

SELECT COUNT(*) VS SELECT * ON EXTERNAL TABLES ...

VKOUL, April 06, 2006 - 1:55 pm UTC

As for it showing fewer records: I didn't realize there was a SKIP 4 in the definition.

I looked here, but it didn't answer my question

DW, April 12, 2006 - 7:28 pm UTC

I have a question about utl_file.fgetattr, but I'll take your "fair warning" to heart and I won't ask it here--it isn't directly related to the original post. I've been using this site regularly for three years and I'm a huge fan. I usually find great answers to my questions in pre-existing threads, but I'm out of luck if I can't find an answer. On at least one occasion my question was a direct follow-on to a particular thread, so I was able to ask it in the review without guilt, but not today. In three years I have never ever seen anything but "Sorry, I have a large backlog..." on the home page. You're a popular guy, Tom.

records delimited by newline option in external tables

PADMAJA, May 31, 2006 - 11:25 am UTC

Hello tom,
I have a text file on Unix box. And I created an external table as follows.
create table xt_tablenm (
-- place oracle field definitions here
--------------------------------------
field1 VARCHAR2(20),
field2 VARCHAR2(255),
field3 VARCHAR2(4),
field4 VARCHAR2(4),
field5 VARCHAR2(255),
field6 VARCHAR2(10),
field7 varchar2(8)
--------------------------------------
) organization external
( type oracle_loader default directory defdir access parameters
( records delimited by newline
badfile ext_bad:'filenm.bad'
logfile ext_log:'filenm.log'
fields terminated by '|' optionally enclosed by '"' rtrim missing
field values are null (
-- place physical field definitions here
----------------------------------------
field1,
field2,
field3,
field4,
field5,
field6,
field7
----------------------------------------
)) location ( defdir : 'filenm.txt'))
reject limit 50 nologging nocache parallel ( degree 8 instances 1 );


And field5 holds comments data, which has newlines inside it. So, because of
'records delimited by newline', Oracle is splitting the comments data into a
second record wherever a newline occurs.

Example data:

"1111"|"field2 data"|"0 "|"1 "|"field5 data comes here.
And has few new lines in this data. "|"23122006"|"adf1"|
"1112"|"field2 data"|"2 "|"2 "|"field5 data comes here.
And has few new lines in this data. "|"23122006"|"adf2"|


in "field5 data comes here.
And has few new lines in this data. " data there is a newline so after new line rest of the comments go to socond record and i get error there that data is no correct.

Can you please suggest any option I can use to avoid this?

Thanks in Advance
Padmaja


Tom Kyte
May 31, 2006 - 3:21 pm UTC

You cannot do this unless you have a well defined record TERMINATOR.

Is it assured that your logical record ends with |\n ??

Every record ends with |<newline>?


And -you'll never ever have:

"1111"|"field2 data"|"0 "|"1 "|"field5 data comes here with a pipe!.|
And has few new lines in this data. "|"23122006"|"adf1"|

records delimited by newline option in external tables

PADMAJA, June 06, 2006 - 1:42 pm UTC

Hi Tom,
Thank you very much for your response.
Yes, records always end with |newline. But because I specified in the program that newline is the record delimiter, the program reads the data after a newline in the comments field as a new record.
The tool we use to extract this text file is DataStage, and I have now learned that we could scrub this newline from the comments field; it is loading data properly now. But I want to know how we can fix it on the Oracle side.

Thank you
Padmaja


Tom Kyte
June 06, 2006 - 2:07 pm UTC

use:

ACCESS PARAMETERS
(
RECORDS DELIMITED BY 0x'7C0A' ....


that is delimited by |\n
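Applied to the table above, the access parameters might start like this (a sketch only, untested; the field list stays as before, and the trailing | of each record is now consumed as part of the record terminator):

access parameters
( records delimited by 0x'7C0A'
  badfile ext_bad:'filenm.bad'
  logfile ext_log:'filenm.log'
  fields terminated by '|' optionally enclosed by '"' rtrim
  missing field values are null
  ( field1, field2, field3, field4, field5, field6, field7 )
)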



records delimited by newline option in external tables

PADMAJA, June 13, 2006 - 10:31 am UTC

Hi Tom,

Thank you again for your excellent answer. I haven't tested it yet, but I will. How would I know that 0x'7C0A' stands for |\n? And if the terminator is something other than |\n, how can I convert it into the format you gave?

Thank you very much,
padmaja


Tom Kyte
June 13, 2006 - 12:27 pm UTC

ops$tkyte@ORA9IR2> select to_char(ascii('|'),'xx') from dual;
 
TO_
---
 7c
 

 
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2> select to_char(ascii('
  2  '),'xx') from dual;
 
TO_
---
  a
 
Elapsed: 00:00:00.00
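As an aside, a one-step way to get the hex string for an arbitrary terminator is UTL_RAW (in a single-byte character set the raw bytes are simply the ASCII codes):

select rawtohex(utl_raw.cast_to_raw('|' || chr(10))) from dual;

-- returns 7C0A: 7C for '|' and 0A for the newline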
 

records delimited by newline option in external tables

PADMAJA, June 13, 2006 - 3:49 pm UTC

Hi Tom,

Thank you for that code. That is very helpful.

Padmaja.



Using Merge while loading from External Tables

Kumar, July 16, 2006 - 1:30 pm UTC

Tom,

Can you please look into the following merge statement? I am trying to insert into a blank table from an external table while simultaneously doing updates on the data.

The external table data is:

ab/10/0/
ab/10/0/
ab/10/0/
bc/20/0/
cd/10/0/
cd/10/0/
de/10/0/
fg/10/0/

The external table 'T_EXT' is created successfully.

My source table is: ( g is the primary key)
g h
-------------------
ab 20
bc 20
cd 30
de 20
ef 20

The target table 't' after the merge should look like:
a b c
-------------------------
ab 10 3
ab 10 3
ab 10 3
bc 20 1
cd 10 3
cd 10 3
de 10 3
ef 20 2
fg 10 4

My merge is:

merge into T
using
(SELECT
te.A A1,
te.B,
Z.G,
Z.H,
(CASE
WHEN Z.B = Z.h
THEN 1
WHEN (Z.B > Z.h OR Z.B < Z.h)
THEN 3
ELSE 4
END ) tag
from
(
SELECT TEX.A,TEX.B,S.G,S.H FROM
(select a,c,sum(b) b from T_EXT group by a,c) TEX
FULL OUTER JOIN S
on (TEX.a=s.g )
)z,t_EXT TE WHERE Z.A= TE.A

) SR

on (t.a = sr.a1)
when matched then update set t.C = sR.tag
when not matched then
insert
(T.A,T.B,T.C)
values
(NVL(sr.a1,sr.g),NVL(sR.b,sr.h),sr.tag);

The above statement actually produces in table 't':
a b c
-------------------------
ab 10 3
ab 10 3
ab 10 3
bc 20 1
cd 10 3
cd 10 3
de 10 3
fg 10 4

a) I am not able to see the NVL values, i.e., 'ef' from table 's'.
b) Is it required for me to use T_EXT twice in the statement (the table has millions of records): once for doing the grouping and once to get the individual values?
c) I get the same result even using LEFT OUTER JOIN instead of FULL OUTER JOIN. Why is this?




Tom Kyte
July 16, 2006 - 4:25 pm UTC

interesting - you lost me early on with "my external table is this", "my source table is that", "my target should look like this afterwards"

All we need is

a) source table (the create table command AND the inserts to populate it)
b) target table (the create and INITIAL inserts - sort of nice to know what it started from)
c) the specification for the desired outcome

external table seems not relevant here.


A couple of create tables and the requisite inserts into them - that is what's necessary here. Something we can cut and paste and run.

Sorry... here are the create table statements

Kumar, July 16, 2006 - 1:37 pm UTC

For my above request:

create table t (a varchar2(2),b number, c number);

create table s (g varchar2(2), h number);

insert into s values('ab',20);
insert into s values('bc',20);
insert into s values('cd',30);
insert into s values('de',20);
insert into s values('ef',20);

create table t_EXT (a varchar2(2),b number,c number)
ORGANIZATION EXTERNAL
( type oracle_loader
default directory TEMP
access parameters
( fields terminated by '/'
OPTIONALLY ENCLOSED BY '"' and '"'
LRTRIM(A,B,C))
location ('TESTLOAD.TXT'));




Tom Kyte
July 16, 2006 - 4:46 pm UTC

no external tables, I don't have your data, not relevant to the problem at all.

just normal tables will do - since at runtime you cannot tell the difference between a regular table and an external table!

So, table T is empty - the target is empty. 

After I run your merge, I see in T the values of the SELECT in the using. That is the only thing I "should" see.

And I see them all.  

Now what?  I'm missing something - it did exactly what you coded and I'm not sure what you were expecting exactly.

ops$tkyte%ORA10GR2> create table t_ext( a varchar2(2), b number, c number );

Table created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'ab',10,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'ab',10,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'ab',10,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'bc',20,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'cd',10,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'cd',10,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'de',10,0);

1 row created.

ops$tkyte%ORA10GR2> insert into t_ext values ( 'fg',10,0);

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT
  2       te.A A1,
  3       te.B,
  4       Z.G,
  5       Z.H,
  6        (CASE
  7        WHEN Z.B = Z.h
  8        THEN 1
  9        WHEN (Z.B > Z.h OR Z.B < Z.h)
 10        THEN 3
 11        ELSE 4
 12        END ) tag
 13        from
 14        (
 15        SELECT TEX.A,TEX.B,S.G,S.H FROM
 16        (select a,c,sum(b) b from T_EXT group by a,c) TEX
 17        FULL OUTER JOIN S
 18        on (TEX.a=s.g )
 19        )z,t_EXT TE WHERE Z.A= TE.A
 20  /

A1          B G           H        TAG
-- ---------- -- ---------- ----------
ab         10 ab         20          3
ab         10 ab         20          3
ab         10 ab         20          3
bc         20 bc         20          1
cd         10 cd         30          3
cd         10 cd         30          3
de         10 de         20          3
fg         10                        4

8 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> merge into  T
  2  using
  3      (SELECT
  4       te.A A1,
  5       te.B,
  6       Z.G,
  7       Z.H,
  8        (CASE
  9        WHEN Z.B = Z.h
 10        THEN 1
 11        WHEN (Z.B > Z.h OR Z.B < Z.h)
 12        THEN 3
 13        ELSE 4
 14        END ) tag
 15        from
 16        (
 17        SELECT TEX.A,TEX.B,S.G,S.H FROM
 18        (select a,c,sum(b) b from T_EXT group by a,c) TEX
 19        FULL OUTER JOIN S
 20        on (TEX.a=s.g )
 21        )z,t_EXT TE WHERE Z.A= TE.A
 22         ) SR
 23      on (t.a = sr.a1)
 24      when matched then update set t.C = sR.tag
 25      when not matched then
 26      insert
 27      (T.A,T.B,T.C)
 28      values
 29   (NVL(sr.a1,sr.g),NVL(sR.b,sr.h),sr.tag);

8 rows merged.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from t;

A           B          C
-- ---------- ----------
ab         10          3
ab         10          3
ab         10          3
bc         20          1
cd         10          3
cd         10          3
de         10          3
fg         10          4

8 rows selected.
 

Kumar, July 16, 2006 - 6:00 pm UTC

Thanks for such a quick reply Tom. I was expecting to see in the target table 't' after the merge:
a b c
-------------------------
ab 10 3
ab 10 3
ab 10 3
bc 20 1
cd 10 3
cd 10 3
de 10 3
ef 20 2 -- This is not in T_EXT but is in table 'S'.
fg 10 4

And I see you have run the merge statement I used. It is not producing the row with value 'ef' from table S.

You said the external table is not relevant here. I am hoping to use an external table for the data in the flat file and then merge into 'T' using source 'S'.

Also, as you observed, I have used T_EXT twice in the merge. Is that required for the result I need? I was trying to find a way to use T_EXT only once, but have been unsuccessful.

Tom Kyte
July 17, 2006 - 1:03 pm UTC

maybe if you explain what your desired "end result" is (heck, skip the MERGE - the SELECT is the problem here) we can look.  Problem is, you have given us a "query that doesn't work" but no real explanation of the goal - the specification.

my comment about the external table is simply "it doesn't need to be an external table for purposes of this problem - it just needs to be a table, hence the example should be SO MUCH SIMPLER, smaller, more concise, less noise, fewer things to distract from the problem at hand"

We don't need the merge (not relevant to your problem, the problem is your select doesn't return what you want)

We don't need the external table (not relevant again, a plain old simple table does the job)

Maybe you can work with this query to get what you need - I don't see how B would be "20" since B is the sum from T_EXT and there is no EF to sum up in that table - nor how the tag would be 2 since B is missing:


ops$tkyte@ORA10GR2> select coalesce(tex.a,s.g), tex.b, s.h,
  2         (CASE WHEN tex.B = s.h THEN 1
  3               WHEN (tex.B > s.h OR tex.B < s.h) THEN 3
  4               ELSE 4
  5           END ) tag
  6    from
  7  (
  8  select a, c, sum(b) over (partition by a,c) b
  9    from t_ext
 10  ) tex
 11    full outer join
 12  (
 13  select g, h
 14    from s
 15  ) s
 16  on (tex.a = s.g)
 17  /

CO          B          H        TAG
-- ---------- ---------- ----------
ab         30         20          3
ab         30         20          3
ab         30         20          3
bc         20         20          1
cd         20         30          3
cd         20         30          3
de         10         20          3
fg         10                     4
ef                    20          4

9 rows selected.
 

External table and outer join

Lise, July 25, 2006 - 8:49 am UTC

Back to the initial question in this thread, I find that when I outer join a table with an external table, and there are no matches in the external table, the last row in the flat file is always returned.
This is incorrect since the fields in the outer join do not match.
Say I select 1 row from the database table. There are no matching rows in the external table, but it still returns the last row in the flat file.
How can I get around this?

Tom Kyte
July 25, 2006 - 11:48 am UTC

examples always help - I cannot reproduce.  Show us how to set this up step by step by step.

ops$tkyte%ORA10GR2> create or replace directory data_dir as '/tmp/'
  2  /

Directory created.

ops$tkyte%ORA10GR2> create table external_table
  2  (EMPNO NUMBER(4) ,
  3   ENAME VARCHAR2(10),
  4   JOB VARCHAR2(9),
  5   MGR NUMBER(4),
  6   HIREDATE DATE,
  7   SAL NUMBER(7, 2),
  8   COMM NUMBER(7, 2) ,
  9   DEPTNO NUMBER(2)
 10  )
 11  ORGANIZATION EXTERNAL
 12  ( type oracle_loader
 13    default directory data_dir
 14    access parameters
 15    ( fields terminated by ',' )
 16    location ('emp.dat')
 17  )
 18  /

Table created.

ops$tkyte%ORA10GR2> host flat scott/tiger emp > /tmp/emp.dat

ops$tkyte%ORA10GR2> set linesize 105
ops$tkyte%ORA10GR2> set echo on
ops$tkyte%ORA10GR2> select * from external_table
  2  /

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

14 rows selected.

ops$tkyte%ORA10GR2> create table t ( x number );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select t.x, et.empno from t left outer join external_table et on (t.x = et.empno);

no rows selected

ops$tkyte%ORA10GR2> select t.x, et.empno from t right outer join external_table et on (t.x = et.empno);

         X      EMPNO
---------- ----------
                 7788
                 7369
                 7499
                 7521
                 7566
                 7654
                 7698
                 7782
                 7839
                 7844
                 7876
                 7900
                 7902
                 7934

14 rows selected.

 

this is what I mean...

Lise, July 26, 2006 - 8:32 am UTC

alter table t add constraint PK_1 primary key (x)
  using index;

insert into t values(1);

SQL> SELECT * FROM external_table;

EMPNO
-----
    2
    3
    4
    5
    0

SQL> select * from t;

         X N
---------- --
  99999999 
         2 
         1 

SQL> 
SQL> select t.x, et.empno from t left outer join external_table
  2  et on (t.x = et.empno) where t.x = 1;

         X EMPNO
---------- -----
         1     0

SQL> 
SQL> alter table T
  2    disable constraint PK_1;

Table altered

SQL> 
SQL> select t.x, et.empno from t left outer join external_table
  2    et on (t.x = et.empno)
  3    WHERE t.x =  1
  4  ;

         X EMPNO
---------- -----
         1 

It seems that if you add a unique key on the database table, and there are no matching entries in the external table, it will always return the last row in the file.
If this is the case, how can I get round this?


  

Tom Kyte
July 26, 2006 - 11:29 am UTC

that would be a bug, and it is not one that reproduces on 10gR2 for me.

Please contact support for that one.

Thanks

Lise, July 27, 2006 - 3:42 am UTC

Thanks Tom. I thought I was going mad, but there is always a logical explanation to everything.

a student, October 16, 2006 - 11:03 pm UTC

Sorry, I don't know how to ask a new question. My question is this:
I have about 30 tables for realtime data, and the data records grow rapidly. I have to dump the data of these tables periodically in PL/SQL. I want to create range-partitioned tables on a date column and put the partitions for the same date in the same tablespace, then dump the partitions by transporting the tablespace. But I found it doesn't work.
Could you advise me?

Tom Kyte
October 17, 2006 - 4:21 am UTC

what do you mean you "found it doesn't work"

You can

a) range partition - definitely
b) put the partitions wherever you like - definitely
c) alter the table and exchange the partition with an empty table - making it so you can transport it
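For (c), a rough sketch with hypothetical names (partition P_20061016 of RT_DATA living in tablespace TS_20061016):

-- rt_data_stage is an empty table of the same shape as rt_data,
-- created in tablespace ts_20061016
alter table rt_data exchange partition p_20061016
  with table rt_data_stage including indexes without validation;

alter tablespace ts_20061016 read only;

-- then export the tablespace metadata for transport, e.g. with Data Pump:
-- expdp system transport_tablespaces=ts_20061016 dumpfile=ts_20061016.dmp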

Regarding Limitation on File Size

Sourabh, December 13, 2006 - 9:05 am UTC

Hi Tom,

The following article states that in Oracle 9i an external table cannot access a file larger than 2GB:

http://www.rampant-books.com/art_more_ext_tables.htm

I was not able to find out whether the same is true of an Oracle 10g database. Could you please let me know whether the same limit applies on 10g, or whether it is an operating-system file-size issue?

Thanks in Advance.

A hard nut to crack

a reader, December 19, 2006 - 4:59 am UTC

Hi,Tom:
I've encountered a difficult problem.

I've created a new database, oradb; the Chinese characters were saved and displayed well, like this:

CREATE TABLE AA (A VARCHAR2(10), B VARCHAR2(10), C DATE);

insert into aa values('中中','aa','2006-12-10');

select * from aa;
A B C
---------- ---------- -------------------
中中 aa 2006-12-10 00:00:00

But after doing some exercises with backup, restore, and recovery, I could no longer save and display Chinese characters correctly.
I don't remember everything I did; I just remember these things:
1)I've done an exercise of rebuilding control file
2)I used RMAN to backup database and restore the database.


Now , the problem is :

CREATE TABLE AA (A VARCHAR2(10), B VARCHAR2(10), C DATE);

insert into aa values('中中','aa','2006-12-18');

Now select data in sqlplus
select * from aa;
A B C
---------- ---------- -------------------
VPVP aa 2006-12-18 00:00:00


select data in OEM's data editor
A B C
---------- ---------- -------------------
VPVP aa 18-12月-2006 12:00:00AM



I've checked the NLS parameters in NLS_DATABASE_PARAMETERS like this:

select * from nls_database_parameters
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET ZHS16GBK
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 9.0.1.1.1

select * from v$nls_parameters
NLS_LANGUAGE SIMPLIFIED CHINESE
NLS_TERRITORY CHINA
NLS_CURRENCY RMB
NLS_ISO_CURRENCY CHINA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT YYYY-MM-DD HH24:MI:SS
NLS_DATE_LANGUAGE SIMPLIFIED CHINESE
NLS_CHARACTERSET ZHS16GBK
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY RMB
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE

I've also checked window's regist table,the NLS_LANG is :
NLS_LANG=simplified chinese_china.zhs16gbk


In order to find the problem, I also imported a DMP file from another database.

There is a table 'YHB' in the dmp file, and its data displays correctly in the source database,

like this:

imp jhw/306@oradb tables=(YHB) file=c:\yhb.dmp;

But when I select data from the YHB table I just imported,
I find the data is not correct:

XM SM
--------- -----------------
V50`T11 3,<69\@mT1

and the right data should be
XM SM
--------- -----------------
值班员1 超级管理员


Then the questions are:
1. Why can't I save and display the Chinese characters correctly?
2. In NLS_DATABASE_PARAMETERS and V$NLS_PARAMETERS, some parameter values, like NLS_LANGUAGE, are not the same. Why?


Thanks!!





How to specify the date format with time zone as date mask in External Tables?

Gowtham Sen, April 18, 2007 - 10:22 am UTC

Hi Tom,

I have a scenario where I am getting a date field with a time zone.

I created the external table as following

CREATE TABLE "Test_EXTBL"
(
"Test_dt" Date
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY TEST_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
CHARACTERSET WE8MSWIN1252
STRING SIZES ARE IN BYTES
BADFILE TEST_DIR:'bad_test.txt'
DISCARDFILE TEST_DIR:'discard_test.txt'
LOGFILE TEST_DIR:'log_test.txt'
FIELDS
TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"' AND '"'
LRTRIM
MISSING FIELD VALUES ARE NULL
(
"Test_dt" CHAR(30) DATE_FORMAT DATE MASK 'RRMMDDHH24MISS'
)
)
LOCATION (
TEST_DIR:'test.txt'
)
)
REJECT LIMIT UNLIMITED
NOPARALLEL
;

The data in the file test.txt is as follows.
061109235500
061109235501
061109235502
061109235503
061110235504
061110235505

SQL> SELECT * FROM "Test_EXTBL";

Test_dt
-----------
11/9/2006 1
11/9/2006 1
11/9/2006 1
11/9/2006 1
11/10/2006
11/10/2006

6 rows selected

SQL>

Now it's working fine.
But I am getting the data in the file as follows:

061109235500CST
061109235501CST
061109235502CST
061109235503CST
061110235504CST
061110235505CST

Here CST is time zone.

Here I would like to set the date mask for this data, and if a row fails processing (that is, if the data is wrong), it should go to the log.
For example, if I get a row value of 0611, that record should go to the log.

Then I created the external table in the following way.

drop table "Test_EXTBL";

CREATE TABLE "Test_EXTBL"
(
"Test_dt" Date
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY TEST_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
CHARACTERSET WE8MSWIN1252
STRING SIZES ARE IN BYTES
BADFILE TEST_DIR:'bad_test.txt'
DISCARDFILE TEST_DIR:'discard_test.txt'
LOGFILE TEST_DIR:'log_test.txt'
FIELDS
TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"' AND '"'
LRTRIM
MISSING FIELD VALUES ARE NULL
(
"Test_dt" CHAR(30) DATE_FORMAT DATE MASK 'RRMMDDHH24MISSTZR'
)
)
LOCATION (
TEST_DIR:'test.txt'
)
)
REJECT LIMIT UNLIMITED
NOPARALLEL
;

The external table is created, but querying the data shows the error "Invalid date mask is specified".

I am using Oracle 10g.
Does Oracle accept a date format with a time zone in an external table's date mask?

Please suggest how to implement this.

Thanks in Advance.

Thank you,
Regards,

Gowtham Sen.
Tom Kyte
April 18, 2007 - 12:40 pm UTC

it would be a timestamp, not a date
you would use a timestamp mask

DATE_FORMAT timestamp MASK


but, what does to_char(systimestamp,'tzr') return on your system?

Still it is showing the error "Invalid date mask"

Gowtham Sen, April 19, 2007 - 2:51 am UTC

Thanks Tom.

I tried to_char(systimestamp,'tzr') on my system.

The output is as follows.
SQL> select to_char(systimestamp,'tzr') from dual;

TO_CHAR(SYSTIMESTAMP,'TZR')
--------------------------------
+05:30

I tried the date mask option with "timestamp" specified, in the following way.

drop table "Test_EXTBL";

CREATE TABLE "Test_EXTBL"
(
"Test_dt" Date
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY TEST_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
CHARACTERSET WE8MSWIN1252
STRING SIZES ARE IN BYTES
BADFILE TEST_DIR:'bad_test.txt'
DISCARDFILE TEST_DIR:'discard_test.txt'
LOGFILE TEST_DIR:'log_test.txt'
FIELDS
TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"' AND '"'
LRTRIM
MISSING FIELD VALUES ARE NULL
(
"Test_dt" CHAR(30) DATE_FORMAT TIMESTAMP MASK 'RRMMDDHH24MISSTZR'
)
)
LOCATION (
TEST_DIR:'test.txt'
)
)
REJECT LIMIT UNLIMITED
NOPARALLEL
;

Even then, it showed the same error, "Invalid date mask".

Then, I tried with the following option

"Test_dt" CHAR(30) DATE_FORMAT TIMESTAMP WITH TIMEZONE MASK 'RRMMDDHH24MISSTZR'

It works successfully!!

Thanks a lot.
You put me in the right direction to think.

Thank you,
Regards,

Gowtham Sen

ORA-01882: timezone region not found

Gowtham Sen, April 22, 2007 - 7:24 am UTC

I am facing one more problem while converting a date field with a time zone.

Initially I am getting the data in the file as follows:

061109235500CST
061109235501CST
061109235502CST
061109235503CST
061110235504CST
061110235505CST

The timezone masking is working fine.

But in some records I am getting data as follows.

061109235500CDT
061109235501CDT
061109235502CDT
061109235503CDT
061110235504CDT
061110235505CDT

Then it shows the error "Invalid date/time conversion".

Then I verified this conversion at the SQL prompt as follows.

SQL> select to_timestamp_tz('061109235500CST', 'RRMMDDHH24MISSTZR') from dual;

TO_TIMESTAMP_TZ('061109235500C
------------------------------
09-NOV-06 11.

SQL>

It works fine with time zone CST.
But when I specify the time zone as CDT, I get the following error:

SQL> select to_timestamp_tz('061109235500CDT', 'RRMMDDHH24MISSTZR') from dual;

select to_timestamp_tz('061109235500CDT', 'RRMMDDHH24MISSTZR') from dual

ORA-01882: timezone region not found

I even verified this in the view V$TIMEZONE_NAMES; it's there.

I don't know why I am getting this error.

Please help me in this.

Thanks in advance.

Thank you,
Regards,

Gowtham Sen.

Position based data

Chan, April 15, 2008 - 9:17 am UTC

Tom:
It looks like I can create external tables for position-based data files also.

Now.. I have my data file organized something like this..

VE2ACEX76F5426800000002299K250404 11160746071 3 2
VF2ACEX76F542682000 00001 90L03967AAA
VG2ACEX76F54268 90L03967 RETURN/REWORK TO SUPPL
--- some more -- data
VE2ACEX76F5426900000002099K250405 11160746071 3 2
VF2ACEX76F542692000 00001 90L03967AAA
VG2ACEX76F54269 90L03967 RETURN/REWORK TO SUPPL
---

Lines starting with VE, VF, and VG in the data file together make up a single record.

Values like 54268 and 54269 are key columns, and each line carries this key value.

My question: can I load this kind of data file, where a record's data is scattered across multiple lines, each line carrying the key value?

How to Load multiple files using External Table

Anan, August 27, 2008 - 3:48 pm UTC

Tom,
Can you help me with an external table script to load text data from multiple files (data_daily1.txt, 1.4GB; data_daily2.txt, 1.1GB; data_daily3.txt, 2.1GB) using external tables?
Thanks,

I am using sqlloader
====================
load data
infile 'data_daily1.txt'
infile 'data_daily2.txt'
infile 'data_daily3.txt'
append
into table daily_upload
fields terminated by ","
optionally enclosed by '"'
trailing nullcols
(col1,
col2,
col3,
col4,
col5)

Tom Kyte
August 29, 2008 - 1:11 pm UTC

$ sqlldr / test.ctl external_table=generate_only


sqlldr will generate all you need for you..

(test.ctl is your control file, / is your user/password)

I created a table daily_load with 5 columns, all numbers - you would of course have your real types in there... and then

$ cat test.log

SQL*Loader: Release 10.2.0.2.0 - Production on Fri Aug 29 12:48:05 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Control File:   test.ctl

There are 3 data files:
Data File:      data_daily1.txt
  Bad File:     data_daily1.bad
  Discard File:  none specified

 (Allow all discards)
Data File:      data_daily2.txt
  Bad File:     data_daily2.bad
  Discard File:  none specified

 (Allow all discards)
Data File:      data_daily3.txt
  Bad File:     data_daily3.bad
  Discard File:  none specified

 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation:    none specified
Path used:      External Table

Table DAILY_UPLOAD, loaded from every logical record.
Insert option in effect for this table: APPEND
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
COL1                                FIRST     *   ,  O(") CHARACTER
COL2                                 NEXT     *   ,  O(") CHARACTER
COL3                                 NEXT     *   ,  O(") CHARACTER
COL4                                 NEXT     *   ,  O(") CHARACTER
COL5                                 NEXT     *   ,  O(") CHARACTER



CREATE TABLE statement for external table:
------------------------------------------------------------------------
CREATE TABLE "SYS_SQLLDR_X_EXT_DAILY_UPLOAD"
(
  "COL1" NUMBER(38),
  "COL2" NUMBER(38),
  "COL3" NUMBER(38),
  "COL4" NUMBER(38),
  "COL5" NUMBER(38)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY UPLOAD_DIRECTORY
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    BADFILE 'UPLOAD_DIRECTORY':'data_daily1.bad'
    LOGFILE 'test.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (
      "COL1" CHAR(255)
        TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "COL2" CHAR(255)
        TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "COL3" CHAR(255)
        TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "COL4" CHAR(255)
        TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
      "COL5" CHAR(255)
        TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
    )
  )
  location
  (
    'data_daily1.txt',
    'data_daily2.txt',
    'data_daily3.txt'
  )
)REJECT LIMIT UNLIMITED


INSERT statements used to load internal tables:
------------------------------------------------------------------------
INSERT /*+ append */ INTO DAILY_UPLOAD
(
  COL1,
  COL2,
  COL3,
  COL4,
  COL5
)
SELECT
  "COL1",
  "COL2",
  "COL3",
  "COL4",
  "COL5"
FROM "SYS_SQLLDR_X_EXT_DAILY_UPLOAD"




How to Load multiple files using External Table

Anan, September 05, 2008 - 2:46 pm UTC

Thanks Tom.
I have another question regarding location.
.....
location
(
'data_daily1.txt',
'data_daily2.txt',
'data_daily3.txt'
)
....

I may have 1 to n daily text files to load instead of using static files.
'data_daily1.txt',
'data_daily2.txt',
'data_daily3.txt',
......
......
'data_dailyn.txt'

Can I use data_daily*.txt instead of static file names, or can you suggest an alternate method for using multiple text files in LOCATION?

Thanks,

Tom Kyte
September 05, 2008 - 4:59 pm UTC

you would have to use the alter table command to provide the list; no, "*" won't work
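For example (hypothetical table and file names):

alter table daily_upload_ext
  location ('data_daily1.txt', 'data_daily2.txt', 'data_daily3.txt', 'data_daily4.txt');

If the list changes every day, the load script can build it and issue this statement dynamically with EXECUTE IMMEDIATE before the INSERT ... SELECT.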

How to Load multiple files with Header Rows using External Table

Liz, September 12, 2008 - 8:17 am UTC

Hi Tom,

I have a similar issue loading multiple files into one external table, except the files all have header rows that I wish to skip.

I have tried using skip 1 but this only seems to skip over the header row of the first file and not all files.

Any ideas how I can do this?
Tom Kyte
September 16, 2008 - 1:23 pm UTC

is there something you can use to "identify" your header rows - what do they look like?

Map all of the fields using varchar2 and use a where clause to filter out the rows you do not want to consider.
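For instance, with a hypothetical all-varchar2 external table and a header whose first field is always 'Account Reference Number':

insert into real_table (acct_ref, comp_ref, completion_dt)
select c1, c2, to_date(c3, 'dd-mon-yyyy')
  from et_all_varchar2
 where c1 <> 'Account Reference Number';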

How to Load multiple files with Header Rows using External Table

Liz, September 19, 2008 - 7:15 am UTC

The header row is literally just column names...

Account Reference Number,Component Reference Number,CompleteionDate,PreviousCompletionDate,TransferDate,Customer ID,Number of Borrowers...

I have used Unix to remove the header rows to resolve this for now. I did have a go at using LOAD WHEN, but I wanted to avoid changing the datatypes on the external table if possible.

Thank you

Tom Kyte
September 19, 2008 - 8:41 am UTC

ops$tkyte%ORA10GR2> create or replace directory tmpdir as '/home/tkyte'
  2  /

Directory created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> CREATE TABLE ET
  2  (
  3    "X" NUMBER(38),
  4    "Y" DATE,
  5    "Z" VARCHAR2(30)
  6  )
  7  ORGANIZATION external
  8  (
  9    TYPE oracle_loader
 10    DEFAULT DIRECTORY TMPDIR
 11    ACCESS PARAMETERS
 12    (
 13      RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
 14      LOAD WHEN ((1: 1) != "A")
 15      BADFILE 'TMPDIR':'test.bad'
 16      LOGFILE 'test.log_xt'
 17      READSIZE 1048576
 18      FIELDS TERMINATED BY "," LDRTRIM
 19      MISSING FIELD VALUES ARE NULL
 20      REJECT ROWS WITH ALL NULL FIELDS
 21      (
 22        "X" CHAR(255)
 23          TERMINATED BY ",",
 24        "Y" CHAR(255)
 25          TERMINATED BY ",",
 26        "Z" CHAR(255)
 27          TERMINATED BY ","
 28      )
 29    )
 30    location
 31    (
 32      'test1.dat', 'test2.dat'
 33    )
 34  )REJECT LIMIT UNLIMITED
 35  /

Table created.



and given inputs:

ops$tkyte%ORA10GR2> !cat test1.dat
Account Reference Number,Component Reference Number,CompleteionDate
1,01-jan-2008,hello
2,01-jun-2008,world

ops$tkyte%ORA10GR2> !cat test2.dat
Account Reference Number,Component Reference Number,CompleteionDate
3,11-jan-2008,HELLO
4,11-jun-2008,WORLD

ops$tkyte%ORA10GR2> select * from et;

         X Y         Z
---------- --------- ------------------------------
         1 01-JAN-08 hello
         2 01-JUN-08 world
         3 11-JAN-08 HELLO
         4 11-JUN-08 WORLD


and the discard file would have the headers:

ops$tkyte%ORA10GR2> !cat ET_17048.dsc
Account Reference Number,Component Reference Number,CompleteionDate
Account Reference Number,Component Reference Number,CompleteionDate

Changing a column width

Craig, April 01, 2009 - 12:02 pm UTC

I'd like to further address question #1 of the original post. Is there a way to change a column width on an existing external table?

DROP TABLE et_alter_test PURGE;
create table et_alter_test
(
  col_1  VARCHAR2(10)
, col_2  VARCHAR2(10)
)
ORGANIZATION external 
(
  TYPE oracle_loader
  DEFAULT DIRECTORY IN_DIR
  ACCESS PARAMETERS 
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
    BADFILE 'IN_DIR':'test.bad'
    LOGFILE 'IN_DIR':'test.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY '::' LDRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (
      col_1               CHAR(10)
    , col_2               CHAR(10)
    )
  )
  location 
  (
    'file.txt'
  )
)REJECT LIMIT 0;

Representative data:
hello::1234567890
hi::12345
error::123456789012345678

Everything was working fine until the source of this file decided that col_2 needed to be 20 characters wide instead of 10 (without telling me). So this morning I started getting rejects because the data is wider than the field.
SQL> 
SQL> SELECT *
  2  FROM et_alter_test;
 
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-30653: reject limit reached
ORA-06512: at "SYS.ORACLE_LOADER", line 52


So I try to widen the field in the external table.
SQL> 
SQL> SELECT c.column_name
  2       , c.data_length
  3  FROM user_tab_cols c
  4  WHERE table_name = 'ET_ALTER_TEST';
 
COLUMN_NAME                    DATA_LENGTH
------------------------------ -----------
COL_1                                   10
COL_2                                   10
 
SQL> ALTER TABLE et_alter_test MODIFY col_2 VARCHAR2(20);
 
Table altered
 
SQL> 
SQL> SELECT c.column_name
  2       , c.data_length
  3  FROM user_tab_cols c
  4  WHERE table_name = 'ET_ALTER_TEST';
 
COLUMN_NAME                    DATA_LENGTH
------------------------------ -----------
COL_1                                   10
COL_2                                   20
 
SQL> SELECT *
  2  FROM et_alter_test;
 
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-30653: reject limit reached
ORA-06512: at "SYS.ORACLE_LOADER", line 52


I still get the rejects, because the alter didn't alter the bottom portion (the sql*ldr portion) of the external table definition. A quick look at user_external_tables reveals this. So, short of dropping/recreating the external table, is there a way to make such a modification to an existing external table?

Changing the DDL to read as follows yields the identical result...except that now issuing the alter actually fixes the problem.
    ...
    REJECT ROWS WITH ALL NULL FIELDS
    (
      col_1               CHAR(10)
    , col_2               CHAR(4000)
    )
    ...

SQL> ALTER TABLE et_alter_test MODIFY col_2 VARCHAR2(20);
 
Table altered
SQL> SELECT *
  2  FROM et_alter_test;
 
COL_1      COL_2
---------- --------------------
hello      1234567890
hi         12345
error      123456789012345678
 
SQL> 


Assuming you can't change a column width, is there any reason at all why I shouldn't always declare the lower portion to be CHAR(4000) and set the upper declarations to the proper width? And if so, why is the default width only 255?
Tom Kyte
April 01, 2009 - 4:33 pm UTC

by default the char(NN) is 255 in sqlldr/external tables. Why 255? Because it dates back to the beginning of time when strings had smaller sizes - way way way back in time.

having it at 4000 will just affect (marginally) the memory needed (a staging buffer).

Changing a column width

Craig, April 01, 2009 - 5:42 pm UTC

So there's no way to change it short of dropping/recreating the table?
Tom Kyte
April 02, 2009 - 9:18 am UTC

I didn't say that; a quick glance at the documentation shows:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_3001.htm#i2104234

ops$tkyte%ORA10GR2> create table t
  2  (
  3    col_1  VARCHAR2(10)
  4  , col_2  VARCHAR2(10)
  5  )
  6  ORGANIZATION external
  7  (
  8    TYPE oracle_loader
  9    DEFAULT DIRECTORY IN_DIR
 10    ACCESS PARAMETERS
 11    (
 12      RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
 13      BADFILE 'IN_DIR':'test.bad'
 14      LOGFILE 'IN_DIR':'test.log_xt'
 15      READSIZE 1048576
 16      FIELDS TERMINATED BY '::' LDRTRIM
 17      MISSING FIELD VALUES ARE NULL
 18      REJECT ROWS WITH ALL NULL FIELDS
 19      (
 20        col_1               CHAR(10)
 21      , col_2               CHAR(10)
 22      )
 23    )
 24    location
 25    (
 26      'file.txt'
 27    )
 28  )REJECT LIMIT 0;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t default directory in_dir access parameters
  2    (
  3      RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
  4      BADFILE 'IN_DIR':'test.bad'
  5      LOGFILE 'IN_DIR':'test.log_xt'
  6      READSIZE 1048576
  7      FIELDS TERMINATED BY '::' LDRTRIM
  8      MISSING FIELD VALUES ARE NULL
  9      REJECT ROWS WITH ALL NULL FIELDS
 10      (
 11        col_1               CHAR(20)
 12      , col_2               CHAR(10)
 13      )
 14    )
 15  /

Table altered.

ops$tkyte%ORA10GR2> alter table t modify col_1 varchar2(20);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select dbms_metadata.get_ddl( 'TABLE', 'T' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','T')
-------------------------------------------------------------------------------

  CREATE TABLE "OPS$TKYTE"."T"
   (    "COL_1" VARCHAR2(20),
        "COL_2" VARCHAR2(10)
   )
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "IN_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
    BADFILE 'IN_DIR':'test.bad'
    LOGFILE 'IN_DIR':'test.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY '::' LDRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (
      col_1               CHAR(20)
    , col_2               CHAR(10)
    )
      )
      LOCATION
       ( 'file.txt'
       )
    )



How to get external table data from another unix server

Anan, February 02, 2010 - 11:52 am UTC

Hi Tom
The Oracle 10g DB is on Unix server A.
The external table's file, ou.txt, is on Unix server B.
(The ou.txt data is in the ABMT_FTP directory on Unix server B: /apps01/abm/ou.txt)

CREATE TABLE T_OU_MAP_EXT
(
EFF_DT DATE,
OU_ID VARCHAR2(12 BYTE),
OU_PRNT_ID VARCHAR2(12 BYTE),
BILL_ENT_ID VARCHAR2(6 BYTE)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY ABMT_FTP
ACCESS PARAMETERS
( records delimited by newline
badfile ABMT_FTP:'ou_ext.bad'
logfile ABMT_FTP:'ou_ext.log'
fields terminated by ','
OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS )
LOCATION (ABMT_FTP:'ou.txt')
)
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;

How do I map the LOCATION to the other server (B)?

Thanks,
Anan


Tom Kyte
February 02, 2010 - 12:49 pm UTC

NFS would be a path...

basically, the file system MUST be available to the database server instance - it must be.

Else, you would be asking the database server instance to be able to read the file system of another machine - very virus'y like. It won't happen.

External tables and Multilingual characters

MK, July 02, 2010 - 3:33 pm UTC

Hi,

I am trying to help a client have a better and more efficient way of unloading data from a Microsoft SQL Server 2008 database to Oracle 11g at a remote location. They currently use SSIS to do this and have complained about it being extremely slow. I have managed to convince them to use Oracle's high-speed flat file loaders instead, which do the job much quicker. They upload music-related data like music tracks.

My question: will the external table script that I have written below work to load track names or artist names that can contain any international character set, ranging from Chinese to Hungarian, or would I need to do something special to make this work? The target Oracle instance has an NLS_CHARSET of AL32UTF8.

Also, if you have any suggestions on better formatting or better options, I would be more than interested to know about them!

-- CREATE OR REPLACE DIRECTORY dmpdir AS '/home/oracle/dpump_dir';
-- grant all on DIRECTORY dmpdir to public;

/* External Table Creation */
CREATE TABLE EXT_TRACK
(
"ID" NUMBER(10) ,
"PRIMARY_GENRE_ID" NUMBER(10) ,
"PRIMARY_ARTIST_ID" NUMBER(10) ,
"PRIMARY_IMAGE_ID" NUMBER(10) ,
"PRIMARY_STORY_ID" NUMBER(10) ,
"SPECIFIC_PRICE_ID" NUMBER(10) ,
"EDITOR_NOTE_ID" NUMBER(10) ,
"RELEASE_ID" NUMBER(10) ,
"SIDELOADED_BY_CUSTOMER_ID" NUMBER(10) ,
"LYRICS_STORY_ID" NUMBER(10) ,
"ARTIST_SET_ID" NUMBER(10) ,
"IMAGE_SET_ID" NUMBER(10) ,
"STORY_SET_ID" NUMBER(10) ,
"SIMILAR_TRACK_SET_ID" NUMBER(10) ,
"RELATED_TRACK_SET_ID" NUMBER(10) ,
"RECORD_LABEL_COMPANY_ID" NUMBER(10) ,
"ENABLED" NUMBER(1) ,
"ENCRYPTED" NUMBER(1) ,
"ADULT_CONTENT" NUMBER(1) ,
"NOT_IN_LIBRARY" NUMBER(1) ,
"SIDELOADED_BY_CUSTOMER" NUMBER(1) ,
"YEAR_RELEASED" NUMBER(4) ,
"BEATS_PER_MINUTE" NUMBER(4) ,
"DISC_NUMBER" NUMBER(4) ,
"DISC_COUNT" NUMBER(4) ,
"TRACK_NUMBER" NUMBER(4) ,
"TRACK_COUNT" NUMBER(4) ,
"TOTAL_TIME_IN_SECONDS" NUMBER(12) ,
"ISRC_CODE" VARCHAR2(16 CHAR) ,
"KIND" VARCHAR2(32 CHAR) ,
"GUID" VARCHAR2(32 CHAR) ,
"RECORD_LABEL_COMPANY_NAME" VARCHAR2(256 CHAR) ,
"TITLE" VARCHAR2(256 CHAR) ,
"RELEASE_TITLE" VARCHAR2(256 CHAR) ,
"ARTIST_NAME" VARCHAR2(256 CHAR) ,
"DESCRIPTION" VARCHAR2(256 CHAR) ,
"P_LINE" VARCHAR2(256 CHAR) ,
"C_LINE" VARCHAR2(256 CHAR) ,
"DATA_CLASSIFICATION" NUMBER(10) ,
"CREATED" TIMESTAMP(3) ,
"INSERTED" TIMESTAMP(3) ,
"MODIFIED" TIMESTAMP(3) ,
"SUB_RECORD_LABEL_COMPANY_NAME" VARCHAR2(256 CHAR) ,
"COMPOSER_SET_ID" NUMBER(10)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY dmpdir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET AL32UTF8
BADFILE 'xt_track.bad'
LOGFILE 'xt_track.log'
READSIZE 1048576
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LDRTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
(
"ID" CHAR(300)
TERMINATED BY "|" ,
"PRIMARY_GENRE_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"PRIMARY_ARTIST_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"PRIMARY_IMAGE_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"PRIMARY_STORY_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"SPECIFIC_PRICE_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"EDITOR_NOTE_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"RELEASE_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"SIDELOADED_BY_CUSTOMER_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"LYRICS_STORY_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"ARTIST_SET_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"IMAGE_SET_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"STORY_SET_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"SIMILAR_TRACK_SET_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"RELATED_TRACK_SET_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"RECORD_LABEL_COMPANY_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"ENABLED" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"ENCRYPTED" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"ADULT_CONTENT" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"NOT_IN_LIBRARY" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"SIDELOADED_BY_CUSTOMER" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"YEAR_RELEASED" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"BEATS_PER_MINUTE" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"DISC_NUMBER" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"DISC_COUNT" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"TRACK_NUMBER" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"TRACK_COUNT" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"TOTAL_TIME_IN_SECONDS" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"ISRC_CODE" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"KIND" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"GUID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"RECORD_LABEL_COMPANY_NAME" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"TITLE" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"RELEASE_TITLE" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"ARTIST_NAME" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"DESCRIPTION" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"P_LINE" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"C_LINE" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"DATA_CLASSIFICATION" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"CREATED" CHAR(300) date_format timestamp mask "DD/MM/YYYY HH24:MI:SS.FF",
"INSERTED" CHAR(300) date_format timestamp mask "DD/MM/YYYY HH24:MI:SS.FF",
"MODIFIED" CHAR(300) date_format timestamp mask "DD/MM/YYYY HH24:MI:SS.FF",
"SUB_RECORD_LABEL_COMPANY_NAME" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"',
"COMPOSER_SET_ID" CHAR(300)
TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
)
)
location
(
'track.dat'
)
)REJECT LIMIT UNLIMITED PARALLEL
;

Inserting into multiple tables from external file

Shimmy, January 18, 2012 - 11:41 am UTC

Hi,

I have an external file with the following info
ST*850*0001
PO1*1*1*2120251
PO1*2*1*1141332
SE*3*0001
ST*850*0002
PO1*1*2*1382225
PO1*2*1*3444486
PO1*3*1*2463487
PO1*4*1*1146380
SE*5*0002


CREATE TABLE EXTERNAL_FILE
(
  EDI_TEXT  VARCHAR2(2000)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY EXT
     ACCESS PARAMETERS 
       ( RECORDS DELIMITED BY NEWLINE
        NOBADFILE NODISCARDFILE NOLOGFILE
        FIELDS(
        EDI_TEXT        CHAR(2000)
        )
         )
     LOCATION (EXT:'EDI850.txt')
  )
REJECT LIMIT UNLIMITED;



CREATE TABLE ORD_HDR
(ORDER_NUMBER   NUMBER(10) NOT NULL PRIMARY KEY);

CREATE TABLE ORD_DTL
(ORDER_NUMBER   NUMBER(10) NOT NULL,
 ITEM_NUMBER NUMBER(10),
 QUANTITY    NUMBER(10));


Lines starting with ST denote the beginning of an order and lines starting with SE denote the end of the order.
Lines starting with PO1 indicate order detail info.
The fields are separated by *.
On the ST and SE lines, the third field is the order number.
On the PO1 line, the third field is the quantity and the fourth field is the item number.

Is it possible to just use SQL to populate ORD_HDR and ORD_DTL by reading the external table just once?


I am looking for an output which looks like the following:
SELECT ORDER_NUMBER FROM ORD_HDR;
ORDER_NUMBER
0001
0002

SELECT ORDER_NUMBER FROM ORD_DTL;
ORDER_NUMBER ITEM_NUMBER   QUANTITY
0001         2120251       1
0001         1141332       1
0002         1382225       2
0002         3444486       1
0002         2463487       1
0002         1146380       1 

Thank you

Tom Kyte
January 18, 2012 - 12:58 pm UTC

ops$tkyte%ORA11GR2> create or replace directory MY_DIR as '/home/tkyte'
  2  /

Directory created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> CREATE TABLE edifact_data
  2  (
  3    "REC_TYPE" VARCHAR2(20),
  4    "EDIFACT_TT" VARCHAR2(20),
  5    "FIELD1" VARCHAR2(20),
  6    "FIELD2" VARCHAR2(20),
  7    "FIELD3" VARCHAR2(20),
  8    "FIELD4" VARCHAR2(20)
  9  )
 10  ORGANIZATION external
 11  (
 12    TYPE oracle_loader
 13    DEFAULT DIRECTORY MY_DIR
 14    ACCESS PARAMETERS
 15    (
 16      RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
 17      BADFILE 'data.bad'
 18      LOGFILE 'x.log_xt'
 19      READSIZE 1048576
 20      FIELDS TERMINATED BY "*" LDRTRIM
 21      MISSING FIELD VALUES ARE NULL
 22      REJECT ROWS WITH ALL NULL FIELDS
 23      (
 24        "REC_TYPE" CHAR(255)
 25          TERMINATED BY "*",
 26        "EDIFACT_TT" CHAR(255)
 27          TERMINATED BY "*",
 28        "FIELD1" CHAR(255)
 29          TERMINATED BY "*",
 30        "FIELD2" CHAR(255)
 31          TERMINATED BY "*",
 32        "FIELD3" CHAR(255)
 33          TERMINATED BY "*",
 34        "FIELD4" CHAR(255)
 35          TERMINATED BY "*"
 36      )
 37    )
 38    location
 39    (
 40      'data.txt'
 41    )
 42  )REJECT LIMIT UNLIMITED
 43  /

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table ord_hdr( order_number varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> create table ord_dtl( order_number varchar2(20), item_number varchar2(20), quantity varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert
  2  when rec_type = 'ST' then into ord_hdr(order_number) values ( ONUM )
  3  when rec_type = 'PO1' then into ord_dtl(order_number,item_number,quantity) values ( ONUM,  field2, field1 )
  4  select rec_type,
  5         max(case when rec_type = 'ST' then field1 end) over (order by r) onum,
  6         field1, field2
  7    from (select e.*, rownum r from edifact_data e)
  8  /

8 rows created.

ops$tkyte%ORA11GR2> select * from ord_hdr;

ORDER_NUMBER
--------------------
0001
0002

ops$tkyte%ORA11GR2> select * from ord_dtl;

ORDER_NUMBER         ITEM_NUMBER          QUANTITY
-------------------- -------------------- --------------------
0001                 2120251              1
0001                 1141332              1
0002                 1382225              2
0002                 3444486              1
0002                 2463487              1
0002                 1146380              1

6 rows selected.


Inserting into multiple tables from external file

Shimmy, January 20, 2012 - 10:42 am UTC

Thank you for the quick feedback.

The above query will work only if the order number field is in ascending order. If the data in the file is as follows, it won't work: 0002 is greater than 0001, so the running MAX never drops back to 0001 when it reaches the "ST*850*0001" line.
ST*850*0002
PO1*1*2*1382225
PO1*2*1*3444486
PO1*3*1*2463487
PO1*4*1*1146380
SE*5*0002
ST*850*0001
PO1*1*1*2120251
PO1*2*1*1141332
SE*3*0001

Can you please offer any suggestion to fix it?

Thank you

Tom Kyte
January 20, 2012 - 10:46 am UTC

use last_value(column ignore nulls) instead then.

ops$tkyte%ORA11GR2> create table ord_hdr( order_number varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> create table ord_dtl( order_number varchar2(20), item_number varchar2(20), quantity varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> !cat data.txt
ST*850*0002
PO1*1*2*1382225
PO1*2*1*3444486
PO1*3*1*2463487
PO1*4*1*1146380
SE*5*0002
ST*850*0001
PO1*1*1*2120251
PO1*2*1*1141332
SE*3*0001

ops$tkyte%ORA11GR2> insert
  2  when rec_type = 'ST' then into ord_hdr(order_number) values ( ONUM )
  3  when rec_type = 'PO1' then into ord_dtl(order_number,item_number,quantity) values ( ONUM,  field2, field1 )
  4  select rec_type,
  5         last_value(case when rec_type = 'ST' then field1 end IGNORE NULLS) over (order by r) onum,
  6         field1, field2
  7    from (select e.*, rownum r from edifact_data e)
  8  /

8 rows created.

ops$tkyte%ORA11GR2> select * from ord_hdr;

ORDER_NUMBER
--------------------
0002
0001

ops$tkyte%ORA11GR2> select * from ord_dtl;

ORDER_NUMBER         ITEM_NUMBER          QUANTITY
-------------------- -------------------- --------------------
0002                 1382225              2
0002                 3444486              1
0002                 2463487              1
0002                 1146380              1
0001                 2120251              1
0001                 1141332              1

6 rows selected.


External Table

A reader, January 26, 2012 - 1:52 am UTC

hi Tom,

I have to create an external table. There is an FTP server whose path is provided in utl_file_dir.

1. Is it possible to keep the data file on one Unix server when we have Oracle installed on another server? Does creating the logical mapping through CREATE DIRECTORY work? I read about NFS mount points, but can an FTP server, with an entry in the init.ora parameter, be equivalent to that?

2. If it is possible, can we change the names of the data files? We may have multiple data files every day.

Version: Oracle 11g R2
Tom Kyte
January 26, 2012 - 10:34 am UTC

a) do not use utl_file_dir, it is not secure, inflexible, and considered deprecated

b) how do you have a ftp server path in utl_file_dir??? that doesn't compute, I don't know what that means.


1) you have to have the file available in the file system of the server - via an nfs mount or local file system.

2) yes, you can rename files using utl_file, you can rename the file the external table points to using ALTER TABLE
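A minimal sketch of (2), assuming a directory object DATA_DIR and an external table EXT_T (both names hypothetical):

begin
  -- rename the incoming file to a dated name
  utl_file.frename('DATA_DIR', 'incoming.txt', 'DATA_DIR', 'data_20120126.txt');
  -- ALTER TABLE is DDL, so issue it dynamically from PL/SQL
  execute immediate q'[alter table ext_t location ('data_20120126.txt')]';
end;
/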

external table with multiple files and skip

Steve, March 14, 2012 - 11:02 am UTC

Tom,

How is the Skip option supposed to work when you have multiple files associated with the external table?

I have a situation where I would like to read from multiple files all of the same format, but each file would have a header row.

I've tried this below, but it seems to only "skip 1" for the first file in the list and not "skip 1" in each file. Would it be fair to say that the 3 files in this case are glued together and then the skip 1 is applied?

Create Or Replace Directory Test_Dir as 'd:\'
/

Drop Table Ext_Test_Tab
/

Create Table Ext_Test_Tab
(
A_Column Varchar2(25),
B_Column Varchar2(30)
)
Organization External
(
Type Oracle_Loader
Default Directory Test_Dir
Access Parameters (
Records Delimited By Newline
Badfile Test_Dir:'ext.bad'
Skip 1
Logfile Test_Dir:'ext.log'
Fields Terminated By ","
Optionally Enclosed By '"'
Missing Field Values Are Null
(
A_Column,
B_Column
)
)
Location ('ext_001.csv')
)
Reject Limit Unlimited
/

/*
File ext_001.csv:

F1 Header 1,F1 Header 2
A,AA

File ext_002.csv:

F2 Header 1,F2 Header 2
B,BB

File ext_003.csv:

F3 Header 1,F3 Header 2
C,CC
*/


Select *
From Ext_Test_Tab

Alter table Ext_Test_Tab Location ('ext_001.csv','ext_002.csv','ext_003.csv')
/

Select *
From Ext_Test_Tab

Thanks
Steve
Tom Kyte
March 14, 2012 - 3:23 pm UTC

skip N would skip the first N records of the first file.


is there something in the header record you can look at and filter out with a where clause?

Steve, March 15, 2012 - 3:01 am UTC

Tom,

Unfortunately not. The files are created by users of our software in MS Excel and thus all have headers, which can differ from file to file because they are specified by the user. The reason for multiple files is that each area of their business has a separate file, which we then suck in via the external table. The files contain date and number fields too (not just varchar2 as in my example), so the header rows from the two other files currently end up in the bad file. Which at a push is OK, just not ideal.

Steve
Tom Kyte
March 15, 2012 - 7:34 am UTC

the other alternative is to have it point to one file at a time and load each one:

loop while more files
   alter table external_table to point to new location
   insert into table select * from external_table
end loop
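In PL/SQL that loop might look like this (a sketch using the file names from the example above; TARGET_TAB is a hypothetical destination table). Since each file now becomes "the first file" of its own load, the SKIP 1 applies to every file:

begin
  for i in 1 .. 3 loop
    -- repoint the external table at the next file
    execute immediate
      'alter table ext_test_tab location (''ext_00' || i || '.csv'')';
    insert into target_tab select * from ext_test_tab;
  end loop;
  commit;
end;
/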

Stop those logfiles !!

Ankit, March 19, 2012 - 9:46 am UTC

Hi Tom
This is in relation to the initial question, but I was not able to find what I needed to know, so I am posting the question here. After creating an external table, every select creates/appends to the log file. Why does this happen? Can it be stopped once the table has been created? I fear that the logfile will keep growing and occupy space that is not required at all.

Thanks a lot

Ankit
Tom Kyte
March 19, 2012 - 10:33 am UTC

you can use

"nologfile"

instead of logfile 'filename' in your create table.



or, just set up a process to occasionally zero out the log file - which might be better as it would give you some diagnostic information if something goes wrong.
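For example, in the access parameters (the bad and discard files can be suppressed the same way):

access parameters
( records delimited by newline
  nobadfile nodiscardfile nologfile
  fields terminated by ','
)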

Creating Directories with Parameters

Rami, August 04, 2012 - 2:35 am UTC

Hi Tom,

I am trying to create a directory in my code so I can write to a file. However, I want this directory's path to be set using the function z_site_preference('ANYTHING').
I keep receiving an error when I try:
create or replace directory DIRE_CTORY as z_site_preference('ANYTHING')
1. How would I do it?
2. Can I pass a parameter upon creation of the directory, or is that not possible?

Thanks a lot
Tom Kyte
August 17, 2012 - 10:08 am UTC

I don't even know what you mean.

If you are asking "can I bind an identifier in DDL", the answer is "absolutely not"


If you have an application that is dynamically creating directories - I have a real problem with your application. It is a HUGE security risk. I would definitely suggest (strongly) that you reconsider this - these directories are on the database server, they are accessed with the Oracle account privileges, they can cause real damage if you mess up. I would not do this under any circumstance.

In fact, NO developer account - no code account would ever be granted an "ANY" privilege - which is needed to create directories.

Load data using external table vs java batch inserts

Lal, March 05, 2013 - 4:22 am UTC

Tom,
We have a module to load data from a file of about 2GB in size. This needs to be loaded into a database table; the number of records is around 3,000,000 rows.
Currently it's implemented in the app tier (Java): the data is loaded from the file as batch inserts of 200 records each, committed per batch.
This takes around 30 minutes to complete.

I know that external tables are more useful for these type of loads. But to substantiate this i need some stats.
My question is what are the benefits of using external tables over using a java program which does the parsing and batch inserts and commits 200 records per batch.

Also if i use external table, do i need to split the source file into different files of small sizes rather than loading from a big 2GB file?

One more doubt i have is the table to which the data is to be loaded have heavy read operations.
What would be best way to load this table with the new data.

Load data in a temp table and exchange partition to the current table?
How exactly can this be done. What will happen to the read operations when the new partition is added and old partition is deleted?
The file loading will happen on a weekly basis and after the file load the old data needs to be removed and new data has to be shown.

Also while removing the old data, there are some data which is not part of the file load which should not be deleted at all.

As always expecting your precious input on the same.
Tom Kyte
March 05, 2013 - 11:53 am UTC

I know that external tables are more useful for these type of loads. But to
substantiate this i need some stats.


generate them then!!!! that is what I've been trying to show here for 13+ years. Benchmark, set up a *test*, in your environment with your data, on your hardware.


My question is what are the benefits of using external tables over using a java
program which does the parsing and batch inserts and commits 200 records per
batch.


you can use direct path load - insert /*+ APPEND */ (skips undo, no silly commits)
you can replace a ton of code with a single line of code
it is going to be faster than a slow by slow 200 row commit process.


Also if i use external table, do i need to split the source file into different
files of small sizes rather than loading from a big 2GB file?


no. 2gb / 3,000,000 rows is pretty darn small.

What would be best way to load this table with the new data.


direct path loads, no doubt.

using partitioning would be *great* if you can do that.

What will happen to the read operations when the
new partition is added and old partition is deleted?


already running queries will still see the old data, new queries will see the new data. In effect "nothing happens to them"



Batch load and exchange partition

Lal, March 09, 2013 - 1:30 pm UTC

Hi Tom,
Thanks for your reply. I understand that an external table is the best option. If I use insert with the APPEND hint (after a truncate), will the number of blocks be similar to a normal insert? For fast read operations, the number of blocks should be low, right?


Regarding the table load i am rephrasing the question as given below.


I have a table T with 20 million rows, with the data distributed across different codes as follows:

A 10 million
B 5 million
C 1 million
the rest in thousands of rows per code


This table is queried very frequently and should be available 24*7 for read operations. The data for each code is refreshed weekly; for example, code A data is refreshed every Monday, code B every Tuesday, and so on.

The following are the options we have thought for loading data.

Option 1
Insert into T with a publish indicator of N for the loaded data; if the load is successful, delete the old entries for the code and update the indicator to Y. This can cause issues for code A, since the delete is a heavy operation.

Option 2
Maintain a separate table T1 and load the data there (the passive area); if the load is successful, copy the rest of the data from T and make T1 the active table. When the next load comes, T will be truncated and the same process followed. This way the heavy delete operation can be avoided, but the code becomes complex in order to manage the active/passive setup. The application code refers to a synonym T which points to the T or T1 table based on the active/passive logic.
One problem with the synonym approach is that if the synonym is referenced multiple times within one read transaction, one query can go to T and the next query to T1 (if a synonym repoint happened in between).

Option 3
Partition table T by code (list partition). Load the data into a normal table and, if the load is successful, exchange it with the corresponding partition of T.

Which option do you suggest, considering the heavy read operations, the occasional DML updates on T, and the availability requirements?

During the exchange partition, will DML/queries be affected? Will it be a resource-intensive operation?

Please give your thoughts (new options?).
Tom Kyte
March 11, 2013 - 8:36 am UTC

If i use insert with append hint (after truncate) will the no of blocks be similar to the normal insert? For making the read operations fast, the no of blocks should be less right?


they can be - because you can use direct path compression (basic compression): alter table t compress; - then the next time you direct path load it, it'll load compressed and might be 50% or less of the size of the original table.


option 3 wins hands down. 100% I would not consider anything else actually.

queries will not be affected; modifications of course would be - we'd need to lock the partition in question. I'd have to question how you could be modifying data that is just about to be entirely wiped out, however; that doesn't even begin to make sense to me.
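
A hypothetical sketch of option 3, with simplified column names (t, t_stage, and ext_tab are illustrative names, not from the thread):

create table t
( code  varchar2(1),
  col1  number,
  col2  number
)
partition by list (code)
( partition p_a values ('A'),
  partition p_b values ('B'),
  partition p_c values ('C')
)
/

-- the stage table must match the shape of a single partition
create table t_stage as select * from t where 1 = 0
/

-- optional: basic compression, picked up by the direct path load
alter table t_stage compress
/

insert /*+ APPEND */ into t_stage    -- direct path load of the new week's data
select * from ext_tab                -- ext_tab: the external table over the file
/
commit
/

-- a near-instantaneous dictionary swap; add UPDATE GLOBAL INDEXES if T has
-- global indexes that must stay usable through the exchange
alter table t
  exchange partition p_a with table t_stage
  without validation
/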

Batch load and exchange partition

Lal, March 12, 2013 - 8:38 am UTC

Tom,
Thanks very much for your inputs and time.

When we tried exchanging a partition with 2 million rows, it took 4 minutes. Most of the time was spent updating the global indexes (two global indexes, one for the PK (an id column) and one for the fetch). Without the global index update, the time to exchange the partition was 4 seconds.

Also we faced the following error when we tried fetching data from the table while the exchange partition was happening.

java.sql.SQLException: ORA-00600: internal error code, arguments: [12406], [], [], [], [], [], [], [], [], [], [], []
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)

A simple Java program was used to fetch data from the table via the global index in a single session, executed in a loop while the exchange partition with the global index update was happening in a separate session. When we removed the update global indexes clause, and also when the fetches were done over different connections, this error didn't occur. But on the same connection, just as the exchange completes, one select fails with this error; the rest of the selects work. So in a connection pool environment, one fetch per connection can fail.

Have you come across any such issue? Are we doing something wrong? We are using version 11.2.0.3.
Or is this approach not recommended when the fetches use a global index?

I have a few more queries.
Ours is an OLTP application; the fetch will be like select * from T where (col1,col2,code) in ((?,?,?),(?,?,?),...).
Will partitioning impact the fetch performance?

If I use a local index, the exchange partition error does not occur. If there can be a maximum of 700 partitions (codes), will a local index deteriorate the performance?

For the exchange partition, will the reads be affected because of the global index updates? (Query plan changes? Hard parses due to index changes?)

We are planning to proceed with the synonym option because of the ORA-600 error. Do you foresee any issues with the synonym approach?
You recommended the synonym approach for a different OLTP scenario in the following thread:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1238800184155 (search for "What about OLTP ?")
That case differs from ours in that a full refresh was happening there, whereas ours is a selective refresh.
Tom Kyte
March 12, 2013 - 8:40 am UTC

please utilize support for ora-600's, that is obviously a support issue (the error message says as much).



Batch load and exchange partition

Lal, March 12, 2013 - 8:58 am UTC

Tom,
We will take up the 600 error with support. Can you please answer the following queries?

Ours is an OLTP application; the fetch will be like select * from T where (col1,col2,code) in ((?,?,?),(?,?,?),...).
Will partitioning impact the fetch performance?

If I use a local index, the exchange partition error does not occur. If there can be a maximum of 700 partitions (codes), will a local index deteriorate the performance?

For the exchange partition, will the reads be affected because of the global index updates? (Query plan changes? Hard parses due to index changes?)

Do you foresee any issues with the synonym approach?
You recommended the synonym approach for a different OLTP scenario in the following thread: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1238800184155 (search for "What about OLTP ?")
That case differs from ours in that a full refresh was happening there, whereas ours is a selective refresh.
Tom Kyte
March 12, 2013 - 9:35 am UTC

Will partitioning impact the fetch performance?


it will not make it faster; if you index improperly, it could make it much slower. so be careful with your indexing strategy (unless you are partitioning by col1, col2, code - use a global index on col1, col2, code to ensure you don't do MANY index range scans, one for each partition).

if you have 700 partitions - you could be doing 700 index range scans.
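
To illustrate, with hypothetical index definitions on the table T above:

-- with ~700 list partitions, a LOCAL index on (col1, col2) means a query
-- that cannot prune on code may probe up to 700 small index segments
create index t_local_ix on t (col1, col2) local
/

-- a single GLOBAL index on (col1, col2, code) is one b*tree probe,
-- no matter how many partitions exist
create index t_global_ix on t (col1, col2, code)
/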

... For exchange partion Will the reads be affected because of the global index
updates? (Query plan changes? hard parses due to index changes) ...

you'll be doing more IOs - yes. your IOs might take longer; you'll be doing CPU stuff and IO stuff during this operation - that could affect performance, absolutely.

there could be some plan invalidation as well.


I don't see how synonyms would do anything for you.


Look - you are hitting a bug, it should not happen, please work with support to get a patch for that. You want your global indexes for queries, you want to use partitions for exchanging data in.



Skip first record for each flat file

Charlie 木匠, February 02, 2015 - 11:15 pm UTC

Hi Tom,

When there are many data files for an external table, could we skip the first record (the header) of each file?

Thanks,
Charlie
