Any performance issues with bigger tables?
Avid Reader, October 19, 2006 - 3:15 pm UTC
As usual, ur examples are crystal clear, and that makes understanding anything a lot easier.
I do, however, have a follow-up question.
Have you ever encountered a performance bottleneck with the MERGE feature when using it for a large data set over a database link, something similar to the ETL in a data warehouse application?
thanks
Avid Reader.
October 20, 2006 - 3:46 am UTC
who is "UR", I looked for their examples - I cannot find them anywhere?
the more you do, the longer it takes. Is there a "bottleneck"? That depends on the link, the network, everything.
A reader, October 19, 2006 - 11:28 pm UTC
Hi Tom,
I am really interested in knowing how to improve the performance of merge. I usually need to merge about a million records, and it takes a lot of time.
Regards,
HH
October 20, 2006 - 4:11 am UTC
merge fewer records?
what is "less time"?
what is a "long time"?
what are your "expectations"?
are they "reasonable"?
merge is brutally efficient.
How to insert and update into the same table using SQL*Loader
Duke Ganote, October 21, 2006 - 7:35 am UTC
For smaller loads, have SQL*LOADER insert into a view. Put an INSTEAD OF trigger on the view that MERGEs into the table.
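For illustration only, a minimal sketch of that approach (the table, view, trigger, and column names here are made up, not from the original post):

create table t ( x int primary key, y int );

-- the view SQL*Loader would load into (conventional path)
create or replace view t_load as select x, y from t;

-- INSTEAD OF trigger that merges each loaded row into the real table
create or replace trigger t_load_ioi
instead of insert on t_load
for each row
begin
    merge into t
    using ( select :new.x x, :new.y y from dual ) s
    on ( t.x = s.x )
    when matched then update set t.y = s.y
    when not matched then insert ( x, y ) values ( s.x, s.y );
end;
/

The SQL*Loader control file would then name T_LOAD, not T, as the INTO TABLE target.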
October 21, 2006 - 11:08 am UTC
be careful with that axe eugene.....
That will be row by row and will NOT be deterministic. Say the table T (x int primary key, y int) is empty and you load the file:
1,2
1,3
and someone else loaded the file:
1,3
1,2
The files have the same data, yet you would come to different end results.
Merge would not - merge would alert you to the ambiguity.
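For illustration, a minimal sketch of that alert, reusing the hypothetical T(x int primary key, y int) above and simulating the duplicate-key file with a UNION ALL:

-- with T empty, both source rows fall into the "when not matched" branch,
-- so the second insert of x = 1 fails on the primary key (ORA-00001);
-- if x = 1 already existed in T, both rows would try to update the same
-- target row and the merge would fail with ORA-30926 instead.
merge into t
using ( select 1 x, 2 y from dual
        union all
        select 1 x, 3 y from dual ) f
on ( t.x = f.x )
when matched then update set t.y = f.y
when not matched then insert ( x, y ) values ( f.x, f.y );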
I'm not a fan of the "view does automagic trickery in the background" approach.
How does a MERGE work with external tables
Wolfgang, October 21, 2006 - 12:07 pm UTC
Hi Tom,
we are planning to use external tables together with the MERGE statement. I'm just interested in how the MERGE works with external tables. As you cannot create an index on an external table, the MERGE must look at the source table's PK for every single row in the external table (because it is not sorted). So if you have about 2 million rows (as we will have) in the external table, the merge will execute 2 million single lookups against the PK of the source table (root block -> branch block 1 -> ... -> branch block X -> leaf block).
Is this correct?
Regards
Wolfgang
October 21, 2006 - 1:03 pm UTC
I would seriously hope not a SINGLE index was used anywhere for a merge of a large set into a large set!!!!
a merge with a "when matched and when NOT matched" clause is a lot like an outer join:
select *
from source left outer join target on (join condition)
and when target.* is NULL, we insert; when target.* is not null, we update.
I would hope for two full scans and a hash outer join.
ops$tkyte%ORA10GR2> merge into t2
2 using t1
3 on (t1.user_id = t2.user_id)
4 when matched then update set username = t1.username, created = t1.created
5 when not matched then insert( user_id, username, created) values ( t1.user_id, t1.username, t1.created );
36 rows merged.
Execution Plan
----------------------------------------------------------
Plan hash value: 791008889
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%
------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 2000K| 198M| | 100K
| 1 | MERGE | T2 | | | |
| 2 | VIEW | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 2000K| 171M| 60M| 100K
| 4 | TABLE ACCESS FULL | T2 | 1000K| 48M| | 30855
| 5 | TABLE ACCESS FULL | T1 | 2000K| 74M| | 61708
------------------------------------------------------------------------
Just like that, if it uses an index - that would indicate a big problem!
Thanks a lot
Wolfgang, October 21, 2006 - 4:16 pm UTC
>if it uses an index - that would indicate a big problem!
As I read your books, I would have thought the same...
We will implement this over the next 2-3 months. We don't load only one table but about 80. Perhaps it would be better to split the file delivered from the mainframe into smaller pieces, so that we don't have to full scan a 10-million-row table for just 10,000 inserts.
As far as I know, it's not possible to gather stats on an external table. Am I right?
That leads to another question: how does the CBO know how many rows we (might) have in the external table? Is Oracle aware of the file size?
If you're interested, I will post our procedure for how we solved it. But it will take a few months... we are currently merging with another company (that's why we must speed up loading).
Regards, Wolfgang
PS: You have heard that many times, but I really want to thank you for writing your books in such an understandable way. They have kept me from believing some Oracle gurus like Donald B. and friends.
October 21, 2006 - 5:16 pm UTC
How would splitting help? You have the same number of things to eventually process; if you split the big file, you'll end up scanning the target table that many more times.
You can use dbms_stats to set table stats, sure.
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from external_table;
Execution Plan
----------------------------------------------------------
Plan hash value: 1024155065
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Co
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 693K|
| 1 | EXTERNAL TABLE ACCESS FULL| EXTERNAL_TABLE | 8168 | 693K|
------------------------------------------------------------------------
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'EXTERNAL_TABLE', numrows=> 123 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> select * from external_table;
Execution Plan
----------------------------------------------------------
Plan hash value: 1024155065
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Co
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 123 | 10701 |
| 1 | EXTERNAL TABLE ACCESS FULL| EXTERNAL_TABLE | 123 | 10701 |
------------------------------------------------------------------------
ops$tkyte%ORA10GR2> set autotrace off
8168 is the default guess on an 8k block size database...
Splitting
Wolfgang, October 22, 2006 - 3:04 am UTC
>how would splitting help?
We get one big file from the mainframe. It will look like this:
TABLE1|field1|field2|field3
TABLE1|field1|field2|field3
TABLE2|fieldA|fieldB|fieldC|fieldD
TABLE3|...
more data for about 80-90 different tables
...
There will be a convention that the first field in the file is always the PK of the table. The MERGE statement for each table will be generated dynamically in a little PL/SQL procedure (a sketch of that generation follows below).
When we split the file and create several external tables, we can merge more than one table at a time (OK, we could do that without splitting), and, for example, if TABLE2 has only 100 rows in the file, we would otherwise have to full scan the whole 1-million-row file and eventually its associated 10-million-row table, just for 100 changes.
Splitting the file will enable us to avoid that, and we can set proper statistics because the split program knows how many rows exist for a certain table.
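Purely as a sketch of the generation idea described above (the EXT_ table-name convention and the use of user_tab_columns are assumptions, not the actual procedure):

create or replace procedure gen_merge( p_table in varchar2 )
as
    l_pk    varchar2(30);
    l_cols  varchar2(4000);
    l_set   varchar2(4000);
    l_vals  varchar2(4000);
begin
    -- convention: the first column of the table is its primary key
    for c in ( select column_name, column_id
                 from user_tab_columns
                where table_name = p_table
                order by column_id )
    loop
        if c.column_id = 1 then
            l_pk := c.column_name;
        else
            l_set := l_set || case when l_set is not null then ', ' end
                            || 't.' || c.column_name || ' = s.' || c.column_name;
        end if;
        l_cols := l_cols || case when l_cols is not null then ', ' end || c.column_name;
        l_vals := l_vals || case when l_vals is not null then ', ' end || 's.' || c.column_name;
    end loop;

    -- EXT_<table> is assumed to be the external table over that table's split file
    execute immediate
        'merge into ' || p_table || ' t using ext_' || p_table || ' s'
        || ' on ( t.' || l_pk || ' = s.' || l_pk || ' )'
        || ' when matched then update set ' || l_set
        || ' when not matched then insert ( ' || l_cols || ' ) values ( ' || l_vals || ' )';
end;
/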
Regards Wolfgang
October 22, 2006 - 7:45 am UTC
yes, you could do that without splitting
AND
if you split them AND the key is actually repeated (unbeknownst to you), you would get different results......
I still don't get it - if you have N rows - regardless of the NUMBER OF FILES YOU SPLIT THEM INTO - you have N rows to process. I see this saving *nothing*, not unless you can actually SKIP A FILE - which you cannot. So, you still have the same amount of data to process, you would just process the target over and over and over and over....
yes we can update using sqlldr
Rajeshwar, October 22, 2006 - 4:26 am UTC
Actually, I wrote a function which will update/insert based on input from the file.
Inside the function I use autonomous transactions.
I call the function inside SQL*Loader.
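For illustration only, a rough sketch of the pattern being described (the CONTACTS table and the function name are made up; as the followup points out, this runs row by row):

create or replace function upsert_contact( p_email in varchar2,
                                            p_first in varchar2,
                                            p_last  in varchar2,
                                            p_phone in varchar2 ) return varchar2
as
    pragma autonomous_transaction;  -- allows DML from a function invoked in SQL
begin
    update contacts
       set first_name = p_first, last_name = p_last, phone = p_phone
     where email = p_email;
    if sql%rowcount = 0 then
        insert into contacts ( email, first_name, last_name, phone )
        values ( p_email, p_first, p_last, p_phone );
    end if;
    commit;
    return 'OK';
end;
/

The control file would then invoke the function from a column SQL expression, e.g. status "upsert_contact(:email, :first_name, :last_name, :phone)", while loading into a dummy logging table.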
October 22, 2006 - 7:48 am UTC
umm, no - you are doing the "view" trick, only "not even as good as the view trick".
You are not loading into the table you actually want to load into. In your case, it would be faster, better, easier to load into a view with INSTEAD OF triggers.
Actually, you would be best by using MERGE with an external table.
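A minimal sketch of that, with made-up table names, a made-up directory, and a made-up file:

create or replace directory load_dir as '/u01/app/loads';

create table contacts_ext
( email       varchar2(50),
  first_name  varchar2(50),
  last_name   varchar2(50),
  phone       varchar2(20)
)
organization external
( type oracle_loader
  default directory load_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
    missing field values are null )
  location ( 'contacts.dat' )
);

merge into contacts c
using contacts_ext e
on ( c.email = e.email )
when matched then update set c.first_name = e.first_name,
                             c.last_name  = e.last_name,
                             c.phone      = e.phone
when not matched then insert ( email, first_name, last_name, phone )
                      values ( e.email, e.first_name, e.last_name, e.phone );

One set-based statement against the whole file, instead of a call per row.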
clarification
Wolfgang, October 22, 2006 - 8:43 am UTC
The big file contains data for different tables (80-90). Every row has, as its first column, the name of the table the data belongs to.
We split the big file into smaller files, which then contain only the data for one table.
Regards Wolfgang
October 22, 2006 - 4:09 pm UTC
that would be different then - yes, if you split them (and splitting is "low cost") that would make sense.
Muthu Maheswaran, October 23, 2006 - 1:36 am UTC
I find the answer very useful. In fact, it changed my view of the logic I had been thinking of so far.
Helena Marková, October 23, 2006 - 8:32 am UTC
Reader, December 13, 2006 - 7:19 am UTC
I have a flat file in which I do not have an id column:
e_mail,first_name,last_name,phone_no
Record #1 in the flat file may have all fields (i.e. e_mail, first_name, last_name, phone_no):
abc@yahoo.com,abc,def,123-456-789
Record #2 in the flat file may have only the phone_no:
111-111-1111
I already have a table which contains some of the data that is in this file. I do not have a primary key / unique key column in this table. I need to update existing records if they are present, or insert the records if they are not present. Can I load the data in this flat file into an external table and use a MERGE statement to do the insert/update (if the e-mail id is present for a record, use that to find whether there is already a record in the table; if the e-mail id is not found, use the phone number to find whether the record exists, and then update/insert accordingly)? Can you please let me know how to use the MERGE statement for this scenario?
December 13, 2006 - 7:52 am UTC
if it is
a) match by email and then if nothing found
b) match by phone number and then if nothing found
c) insert
not really.
Merge
emob, December 13, 2006 - 9:20 am UTC
This may be a very basic question, but it's this type that usually stymies me :)
Previously, you gave as (part of) an example:
ops$tkyte%ORA10GR2> merge into t2
2 using t1
3 on (t1.user_id = t2.user_id)
4 when matched then update set username = t1.username, created = t1.created
5 when not matched then insert(user_id, username, created) values (t1.user_id, t1.username, t1.created );
36 rows merged.
The part I'm missing is: what do I do if, when matched, I want to do nothing?
I only want to insert new recs, not update.
Thanks.
December 13, 2006 - 11:38 am UTC
well, 10g can do that (just leave off the when matched bit)
but......
insert into t2
select * from t1
where user_id not in (select user_id from t2);
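For reference, the 10g form mentioned above (simply leaving off the "when matched" clause) would look something like this, reusing the t1/t2 example from earlier:

merge into t2
using t1
on ( t1.user_id = t2.user_id )
when not matched then insert ( user_id, username, created )
                      values ( t1.user_id, t1.username, t1.created );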
Reader, December 13, 2006 - 12:02 pm UTC
I asked:
I have a flat file in which I do not have an id column:
e_mail,first_name,last_name,phone_no
Record #1 in the flat file may have all fields (i.e. e_mail, first_name, last_name, phone_no):
abc@yahoo.com,abc,def,123-456-789
Record #2 in the flat file may have only the phone_no:
111-111-1111
I already have a table which contains some of the data that is in this file. I do not have a primary key / unique key column in this table. I need to update existing records if they are present, or insert the records if they are not present. Can I load the data in this flat file into an external table and use a MERGE statement to do the insert/update (if the e-mail id is present for a record, use that to find whether there is already a record in the table; if the e-mail id is not found, use the phone number to find whether the record exists, and then update/insert accordingly)? Can you please let me know how to use the MERGE statement for this scenario?
Followup:
You said:
if it is
a) match by email and then if nothing found
b) match by phone number and then if nothing found
c) insert
not really.
My question:
I loaded the data from flat file into a table called test_new
create table test_new
(email varchar2(20)
,first_name varchar2(20)
,last_name varchar2(20)
,phone varchar2(20)
);
insert into test_new
values
('abc@yahoo.com','abc','def','123-456-789');
insert into test_new
values
('xyz@yahoo.com','xyz','wjr','111-111-345');
insert into test_new
(first_name,last_name,phone)
values
('tqr','ghi','999-999-999');
insert into test_new
values
('pqr@yahoo.com','pqr','stv','777-666-888');
create table test_old
(email varchar2(20)
,first_name varchar2(20)
,last_name varchar2(20)
,phone varchar2(20)
);
insert into test_old
values
('abc@yahoo.com','abc','xxx','123-456-789');
insert into test_old
values
('xyz@yahoo.com','xyz','wjr','666-666-666');
insert into test_old
(first_name,last_name,phone)
values
('ttt','ghi','999-999-999');
Could you please let me know how to merge the records from test_new into test_old (if an e-mail id is found, match using that; if not, match using the phone number)?
Thanks
December 14, 2006 - 9:58 pm UTC
sort of already said "merge is not going to work", didn't I?
it is called "write code" at this point.
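Purely to illustrate the shape such code could take (a row-by-row sketch against the test_new/test_old tables from the question, not a recommendation from the answer):

begin
    for r in ( select email, first_name, last_name, phone from test_new )
    loop
        -- try to match by e-mail first, when the incoming row has one
        if r.email is not null then
            update test_old
               set first_name = r.first_name, last_name = r.last_name, phone = r.phone
             where email = r.email;
        end if;
        -- no e-mail, or no e-mail match: fall back to the phone number
        if r.email is null or sql%rowcount = 0 then
            update test_old
               set email = r.email, first_name = r.first_name, last_name = r.last_name
             where phone = r.phone;
            -- still nothing matched: it is a new record, insert it
            if sql%rowcount = 0 then
                insert into test_old ( email, first_name, last_name, phone )
                values ( r.email, r.first_name, r.last_name, r.phone );
            end if;
        end if;
    end loop;
end;
/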
data source
Robert, December 14, 2006 - 6:34 pm UTC
Hi everybody
I have this situation: on one server I have Oracle 9, and on another one I have text files in this path (//10.10.10.10/data_source/files.txt). I would like to ask whether it is possible to create an external table using data files stored on a different server. If yes, could you tell me how to create that directory in Oracle?
Thank you very much
Robert
December 15, 2006 - 8:40 am UTC