Any performance issues with bigger tables?
Avid Reader, October 19, 2006 - 3:15 pm UTC
As usual, ur examples are crystal clear, and that makes understanding anything a lot easier.
I do, however, have a follow-up question.
Have you ever encountered a performance bottleneck with the MERGE feature when using it for a large data set over a database link, something similar to the ETL in a data warehouse application?
thanks
Avid Reader.
October 20, 2006 - 3:46 am UTC
who is "UR", I looked for their examples - I cannot find them anywhere?
the more you do, the longer it takes. Is there a "bottleneck"? That depends on the link, the network, everything.
A reader, October 19, 2006 - 11:28 pm UTC
Hi Tom,
I am really interested in knowing how to improve the performance of merge. I usually need to merge about a million records, and it takes a lot of time.
Regards,
HH
October 20, 2006 - 4:11 am UTC
merge fewer records?
what is "less time"?
what is a "long time"?
what are your "expectations"?
are they "reasonable"?
merge is brutally efficient.
How to insert and update into the same table using SQL*Loader
Duke Ganote, October 21, 2006 - 7:35 am UTC
For smaller loads, have SQL*LOADER insert into a view. Put an INSTEAD OF trigger on the view that MERGEs into the table.
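For illustration only, a minimal sketch of that approach (the table, view, trigger, and column names here are made up, not from the original post):

create table t ( x int primary key, y int );

-- the view SQL*Loader would load into (conventional path)
create or replace view t_load as select x, y from t;

-- INSTEAD OF trigger that merges each loaded row into the real table
create or replace trigger t_load_ioi
instead of insert on t_load
for each row
begin
    merge into t
    using ( select :new.x x, :new.y y from dual ) s
    on ( t.x = s.x )
    when matched then update set t.y = s.y
    when not matched then insert ( x, y ) values ( s.x, s.y );
end;
/

The SQL*Loader control file would then name T_LOAD, not T, as the INTO TABLE target.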
October 21, 2006 - 11:08 am UTC
be careful with that axe eugene.....
That will be row by row and will NOT be deterministic. Say the table T (x int primary key, y int) is empty and you load the file:
1,2
1,3
and someone else loaded the file:
1,3
1,2
The files have the same data, yet you would come to different end results.
Merge would not - merge would alert you to the ambiguity.
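For illustration, a minimal sketch of that alert, reusing the hypothetical T(x int primary key, y int) above and simulating the duplicate-key file with a UNION ALL:

-- with T empty, both source rows fall into the "when not matched" branch,
-- so the second insert of x = 1 fails on the primary key (ORA-00001);
-- if x = 1 already existed in T, both rows would try to update the same
-- target row and the merge would fail with ORA-30926 instead.
merge into t
using ( select 1 x, 2 y from dual
        union all
        select 1 x, 3 y from dual ) f
on ( t.x = f.x )
when matched then update set t.y = f.y
when not matched then insert ( x, y ) values ( f.x, f.y );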
I'm not a fan of the "view does automagic trickery in the background" approach.
How does a MERGE work with external tables
Wolfgang, October 21, 2006 - 12:07 pm UTC
Hi Tom,
we are planning to use external tables together with the MERGE statement. I'm just interested in how the MERGE works with external tables. As you cannot create an index on an external table, the MERGE must look at the source table's PK for every single row in the external table (because it is not sorted). So if you have about 2 million rows (as we will have) in the external table, the merge will execute 2 million single lookups against the PK of the source table (root block -> branch block 1 -> ... -> branch block X -> leaf block).
Is this correct?
Regards
Wolfgang
October 21, 2006 - 1:03 pm UTC
I would seriously hope not a SINGLE index was used anywhere for a merge of a large set into a large set!!!!
a merge with a "when matched and when NOT matched" clause is a lot like an outer join:
select *
from source left outer join target on (join condition)
and when target.* is NULL, we insert; when target.* is not null, we update.
I would hope for two full scans and a hash outer join.
ops$tkyte%ORA10GR2> merge into t2
2 using t1
3 on (t1.user_id = t2.user_id)
4 when matched then update set username = t1.username, created = t1.created
5 when not matched then insert( user_id, username, created) values ( t1.user_id, t1.username, t1.created );
36 rows merged.
Execution Plan
----------------------------------------------------------
Plan hash value: 791008889
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%
------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 2000K| 198M| | 100K
| 1 | MERGE | T2 | | | |
| 2 | VIEW | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 2000K| 171M| 60M| 100K
| 4 | TABLE ACCESS FULL | T2 | 1000K| 48M| | 30855
| 5 | TABLE ACCESS FULL | T1 | 2000K| 74M| | 61708
------------------------------------------------------------------------
Just like that, if it uses an index - that would indicate a big problem!
Thanks a lot
Wolfgang, October 21, 2006 - 4:16 pm UTC
>if it uses an index - that would indicate a big problem!
As I read your books, I would have thought the same...
We will implement this over the next 2-3 months. We don't load only one table but about 80. Perhaps it would be better to split the file delivered from the mainframe into smaller pieces, so that we don't have to full scan a 10-million-row table for just 10,000 inserts.
As far as I know, it's not possible to gather stats on an external table. Am I right?
That leads to another question: how does the CBO know how many rows we (might) have in the external table? Is Oracle aware of the file size?
If you're interested, I will post our procedure for how we solved it. But it will take a few months... we are currently merging with another company (that's why we must speed up loading).
Regards, Wolfgang
PS: You have heard that many times, but I really want to thank you for writing your books in such an understandable way. They have kept me from believing some Oracle gurus like Donald B. and friends.
October 21, 2006 - 5:16 pm UTC
How would splitting help? You have the same number of things to eventually process; if you split the big file, you'll end up scanning the target table that many more times.
You can use dbms_stats to set table stats, sure.
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from external_table;
Execution Plan
----------------------------------------------------------
Plan hash value: 1024155065
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Co
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 693K|
| 1 | EXTERNAL TABLE ACCESS FULL| EXTERNAL_TABLE | 8168 | 693K|
------------------------------------------------------------------------
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'EXTERNAL_TABLE', numrows=> 123 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> select * from external_table;
Execution Plan
----------------------------------------------------------
Plan hash value: 1024155065
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Co
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 123 | 10701 |
| 1 | EXTERNAL TABLE ACCESS FULL| EXTERNAL_TABLE | 123 | 10701 |
------------------------------------------------------------------------
ops$tkyte%ORA10GR2> set autotrace off
8168 is the default guess on an 8k block size database...
Splitting
Wolfgang, October 22, 2006 - 3:04 am UTC
>how would splitting help?
We get one big file from the mainframe. It will look like this:
TABLE1|field1|field2|field3
TABLE1|field1|field2|field3
TABLE2|fieldA|fieldB|fieldC|fieldD
TABLE3|...
more data for about 80-90 different tables
...
There will be a convention that the first field in the file is always the PK of the table. The MERGE statement for each table will be generated dynamically in a little PL/SQL procedure (a sketch of that generation follows below).
When we split the file and create several external tables, we can merge more than one table at a time (OK, we could do that without splitting), and, for example, if TABLE2 has only 100 rows in the file, we would otherwise have to full scan the whole 1-million-row file and eventually its associated 10-million-row table, just for 100 changes.
Splitting the file will enable us to avoid that, and we can set proper statistics because the split program knows how many rows exist for a certain table.
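Purely as a sketch of the generation idea described above (the EXT_ table-name convention and the use of user_tab_columns are assumptions, not the actual procedure):

create or replace procedure gen_merge( p_table in varchar2 )
as
    l_pk    varchar2(30);
    l_cols  varchar2(4000);
    l_set   varchar2(4000);
    l_vals  varchar2(4000);
begin
    -- convention: the first column of the table is its primary key
    for c in ( select column_name, column_id
                 from user_tab_columns
                where table_name = p_table
                order by column_id )
    loop
        if c.column_id = 1 then
            l_pk := c.column_name;
        else
            l_set := l_set || case when l_set is not null then ', ' end
                            || 't.' || c.column_name || ' = s.' || c.column_name;
        end if;
        l_cols := l_cols || case when l_cols is not null then ', ' end || c.column_name;
        l_vals := l_vals || case when l_vals is not null then ', ' end || 's.' || c.column_name;
    end loop;

    -- EXT_<table> is assumed to be the external table over that table's split file
    execute immediate
        'merge into ' || p_table || ' t using ext_' || p_table || ' s'
        || ' on ( t.' || l_pk || ' = s.' || l_pk || ' )'
        || ' when matched then update set ' || l_set
        || ' when not matched then insert ( ' || l_cols || ' ) values ( ' || l_vals || ' )';
end;
/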
Regards Wolfgang
October 22, 2006 - 7:45 am UTC
yes, you could do that without splitting
AND
if you split them AND the key is actually repeated (unbeknownst to you), you would get different results......
I still don't get it - if you have N rows - regardless of the NUMBER OF FILES YOU SPLIT THEM INTO - you have N rows to process. I see this saving *nothing*, not unless you can actually SKIP A FILE - which you cannot. So, you still have the same amount of data to process, you would just process the target over and over and over and over....
yes we can update using sqlldr
Rajeshwar, October 22, 2006 - 4:26 am UTC
Actually, I wrote a function which will update/insert based on input from the file.
Inside the function I use autonomous transactions.
I call the function inside SQL*Loader.
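For illustration only, a rough sketch of the pattern being described (the CONTACTS table and the function name are made up; as the followup points out, this runs row by row):

create or replace function upsert_contact( p_email in varchar2,
                                            p_first in varchar2,
                                            p_last  in varchar2,
                                            p_phone in varchar2 ) return varchar2
as
    pragma autonomous_transaction;  -- allows DML from a function invoked in SQL
begin
    update contacts
       set first_name = p_first, last_name = p_last, phone = p_phone
     where email = p_email;
    if sql%rowcount = 0 then
        insert into contacts ( email, first_name, last_name, phone )
        values ( p_email, p_first, p_last, p_phone );
    end if;
    commit;
    return 'OK';
end;
/

The control file would then invoke the function from a column SQL expression, e.g. status "upsert_contact(:email, :first_name, :last_name, :phone)", while loading into a dummy logging table.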
October 22, 2006 - 7:48 am UTC
umm, no - you are doing the "view" trick, only "not even as good as the view trick".
You are not loading into the table you actually want to load into. In your case, it would be faster, better, easier to load into a view with INSTEAD OF triggers.
Actually, you would be best by using MERGE with an external table.
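A minimal sketch of that, with made-up table names, a made-up directory, and a made-up file:

create or replace directory load_dir as '/u01/app/loads';

create table contacts_ext
( email       varchar2(50),
  first_name  varchar2(50),
  last_name   varchar2(50),
  phone       varchar2(20)
)
organization external
( type oracle_loader
  default directory load_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
    missing field values are null )
  location ( 'contacts.dat' )
);

merge into contacts c
using contacts_ext e
on ( c.email = e.email )
when matched then update set c.first_name = e.first_name,
                             c.last_name  = e.last_name,
                             c.phone      = e.phone
when not matched then insert ( email, first_name, last_name, phone )
                      values ( e.email, e.first_name, e.last_name, e.phone );

One set-based statement against the whole file, instead of a call per row.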
clarification
Wolfgang, October 22, 2006 - 8:43 am UTC
The big file contains data for different tables (80-90). Every row has, as its first column, the name of the table the data belongs to.
We split the big file into smaller files, which then contain only the data for one table.
Regards Wolfgang
October 22, 2006 - 4:09 pm UTC
that would be different then - yes, if you split them (and splitting is "low cost") that would make sense.
Muthu Maheswaran, October 23, 2006 - 1:36 am UTC
I find the answer very useful. In fact, it changed my view of the logic I had been thinking of so far.
Helena Marková, October 23, 2006 - 8:32 am UTC
Reader, December 13, 2006 - 7:19 am UTC
I have a flat file in which I do not have an id column:
e_mail,first_name,last_name,phone_no
Record #1 in the flat file may have all fields (i.e. e_mail, first_name, last_name, phone_no):
abc@yahoo.com,abc,def,123-456-789
Record #2 in the flat file may have only the phone_no:
111-111-1111
I already have a table which contains some of the data that is in this file. I do not have a primary key / unique key column in this table. I need to update existing records if they are present, or insert the records if they are not present. Can I load the data in this flat file into an external table and use a MERGE statement to do the insert/update (if the e-mail id is present for a record, use that to find whether there is already a record in the table; if the e-mail id is not found, use the phone number to find whether the record exists, and then update/insert accordingly)? Can you please let me know how to use the MERGE statement for this scenario?
December 13, 2006 - 7:52 am UTC
if it is
a) match by email and then if nothing found
b) match by phone number and then if nothing found
c) insert
not really.
Merge
emob, December 13, 2006 - 9:20 am UTC
This may be a very basic question, but it's this type that usually stymies me :)
Previously, you gave as (part of) an example:
ops$tkyte%ORA10GR2> merge into t2
2 using t1
3 on (t1.user_id = t2.user_id)
4 when matched then update set username = t1.username, created = t1.created
5 when not matched then insert(user_id, username, created) values (t1.user_id, t1.username, t1.created );
36 rows merged.
The part I'm missing is: what do I do if, when matched, I want to do nothing?
I only want to insert new recs, not update.
Thanks.
December 13, 2006 - 11:38 am UTC
well, 10g can do that (just leave off the when matched bit)
but......
insert into t2
select * from t1
where user_id not in (select user_id from t2);
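For reference, the 10g form mentioned above (simply leaving off the "when matched" clause) would look something like this, reusing the t1/t2 example from earlier:

merge into t2
using t1
on ( t1.user_id = t2.user_id )
when not matched then insert ( user_id, username, created )
                      values ( t1.user_id, t1.username, t1.created );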
Reader, December 13, 2006 - 12:02 pm UTC
I asked:
I have a flat file in which I do not have an id column:
e_mail,first_name,last_name,phone_no
Record #1 in the flat file may have all fields (i.e. e_mail, first_name, last_name, phone_no):
abc@yahoo.com,abc,def,123-456-789
Record #2 in the flat file may have only the phone_no:
111-111-1111
I already have a table which contains some of the data that is in this file. I do not have a primary key / unique key column in this table. I need to update existing records if they are present, or insert the records if they are not present. Can I load the data in this flat file into an external table and use a MERGE statement to do the insert/update (if the e-mail id is present for a record, use that to find whether there is already a record in the table; if the e-mail id is not found, use the phone number to find whether the record exists, and then update/insert accordingly)? Can you please let me know how to use the MERGE statement for this scenario?
Followup:
You said:
if it is
a) match by email and then if nothing found
b) match by phone number and then if nothing found
c) insert
not really.
My question:
I loaded the data from flat file into a table called test_new
create table test_new
(email varchar2(20)
,first_name varchar2(20)
,last_name varchar2(20)
,phone varchar2(20)
);
insert into test_new
values
('abc@yahoo.com','abc','def','123-456-789');
insert into test_new
values
('xyz@yahoo.com','xyz','wjr','111-111-345');
insert into test_new
(first_name,last_name,phone)
values
('tqr','ghi','999-999-999');
insert into test_new
values
('pqr@yahoo.com','pqr','stv','777-666-888');
create table test_old
(email varchar2(20)
,first_name varchar2(20)
,last_name varchar2(20)
,phone varchar2(20)
);
insert into test_old
values
('abc@yahoo.com','abc','xxx','123-456-789');
insert into test_old
values
('xyz@yahoo.com','xyz','wjr','666-666-666');
insert into test_old
(first_name,last_name,phone)
values
('ttt','ghi','999-999-999');
Could you please let me know how to merge the records from test_new into test_old (if an e-mail id is found, match using that; if not, match using the phone number)?
Thanks
December 14, 2006 - 9:58 pm UTC
sort of already said "merge is not going to work", didn't I?
it is called "write code" at this point.
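Purely to illustrate the shape such code could take (a row-by-row sketch against the test_new/test_old tables from the question, not a recommendation from the answer):

begin
    for r in ( select email, first_name, last_name, phone from test_new )
    loop
        -- try to match by e-mail first, when the incoming row has one
        if r.email is not null then
            update test_old
               set first_name = r.first_name, last_name = r.last_name, phone = r.phone
             where email = r.email;
        end if;
        -- no e-mail, or no e-mail match: fall back to the phone number
        if r.email is null or sql%rowcount = 0 then
            update test_old
               set email = r.email, first_name = r.first_name, last_name = r.last_name
             where phone = r.phone;
            -- still nothing matched: it is a new record, insert it
            if sql%rowcount = 0 then
                insert into test_old ( email, first_name, last_name, phone )
                values ( r.email, r.first_name, r.last_name, r.phone );
            end if;
        end if;
    end loop;
end;
/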
data source
Robert, December 14, 2006 - 6:34 pm UTC
Hi everybody
I have this situation: on one server I have Oracle 9, and on another one I have text files in this path (//10.10.10.10/data_source/files.txt). I would like to ask whether it is possible to create an external table using data files stored on a different server. If yes, could you tell me how to create that directory in Oracle?
Thank you very much
Robert
December 15, 2006 - 8:40 am UTC