Thanks for the question, jing.

Asked: April 27, 2012 - 2:52 am UTC

Last updated: May 06, 2012 - 3:24 pm UTC

Version: 10.2.0.4

Viewed 10K+ times! This question is

You Asked

hello tom:
we have a target table msisdn_ua1 which has more than 100 million rows,and we want to merge the rows in table file_temp with about one million rows into msisdn_ua,the sql we use as follow:

MERGE  /*+parallel(t1,8)*/ INTO msisdn_ua t1
               USING (SELECT t.msisdn, t.ua_code,
                             TO_DATE (t.DAY, 'yyyymmdd') record_time,
                             t.model_id
                        FROM file_temp t) t2
               ON (    t1.msisdn = t2.msisdn
                   AND t1.ua_code = t2.ua_code
                   AND t1.model_id = t2.model_id)
               WHEN NOT MATCHED THEN
                  INSERT (msisdn, ua_code, record_time, model_id)
                  VALUES (t2.msisdn, t2.ua_code, t2.record_time, t2.model_id);

we have a index IDX_MSISDN_UA on table msisdn_ua(msisdn,model_id,ua_code),now we find that the plan is to use hash join,and the operation is very slow;
i mean that ,if we can use fast full index scan on the msisdn_ua,we can use nested loop join,i think it will be faster;so i use the sql like this :

MERGE  /*+index_ffs(t1 IDX_MSISDN_UA) use_nl(t2,t1) */ INTO msisdn_ua t1
               USING (SELECT t.msisdn, t.ua_code,
                             TO_DATE (t.DAY, 'yyyymmdd') record_time,
                             t.model_id
                        FROM file_temp t) t2
               ON (    t1.msisdn = t2.msisdn
                   AND t1.ua_code = t2.ua_code
                   AND t1.model_id = t2.model_id)
               WHEN NOT MATCHED THEN
                  INSERT (msisdn, ua_code, record_time, model_id)
                  VALUES (t2.msisdn, t2.ua_code, t2.record_time, t2.model_id);

but ,the plan shows that the merge operation will not use fast full index scan on msisdn_ua but index range scan ,so the operation will slower;
my question is :
1.why dosen't it use fast full index scan here,the operation just to test if the msisdn,ua_code,model_id which in the index not in the msisdn_ua table,then inert it,they don't use any other field in the msisdn_ua;
2.how can i make the operation faster?

thanks very much!

and Tom said...

first, an index fast full scan is the ability of the database to use an INDEX instead of a TABLE - the index is read "not in order", but full scanned out of order.

second, if it used an index and got a nested loops join, it would be just about as slow as you can make it

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6749454952894#6760861174154

an index is good to retrieve A SMALL NUMBER of rows from a table. 1,000,000 rows is *NOT* a small number. 100,000,000 rows is *NOT* a small number. Using an index would probably not make sense

Now, that said, it sounds like you want to add rows from FILE_TEMP that do not already exist in MSISDN_UA. Assuming there is a primary key in place, you would want to evaluate:

ops$tkyte%ORA11GR2> create table file_temp (msisdn number, ua_code number, model_id number, data varchar2(2000) );

Table created.

ops$tkyte%ORA11GR2> create table msisdn_ua (msisdn number, ua_code number, model_id number, data varchar2(2000), primary key(msisdn,ua_code,model_id) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'FILE_TEMP', numrows => 1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'MSISDN_UA', numrows => 10000000, numblks => 1000000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace on explain
ops$tkyte%ORA11GR2> insert into msisdn_ua
  2  select file_temp.*
  3    from file_temp, msisdn_ua
  4   where file_temp.msisdn   = msisdn_ua.msisdn (+)
  5     and file_temp.ua_code  = msisdn_ua.ua_code (+)
  6     and file_temp.model_id = msisdn_ua.model_id (+)
  7     and msisdn_ua.rowid is null;

0 rows created.


Execution Plan
----------------------------------------------------------
Plan hash value: 1644611145

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |  1000K|   144M| 27203   (1)| 00:05:27 |
|   1 |  LOAD TABLE CONVENTIONAL | MSISDN_UA    |       |       |            |          |
|*  2 |   FILTER                 |              |       |       |            |          |
|   3 |    NESTED LOOPS OUTER    |              |  1000K|   144M| 27203   (1)| 00:05:27 |
|   4 |     TABLE ACCESS FULL    | FILE_TEMP    |  1000K|    95M| 27123   (1)| 00:05:26 |
|*  5 |     INDEX UNIQUE SCAN    | SYS_C0032619 |     1 |    51 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MSISDN_UA".ROWID IS NULL)
   5 - access("FILE_TEMP"."MSISDN"="MSISDN_UA"."MSISDN"(+) AND
              "FILE_TEMP"."UA_CODE"="MSISDN_UA"."UA_CODE"(+) AND
              "FILE_TEMP"."MODEL_ID"="MSISDN_UA"."MODEL_ID"(+))

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into msisdn_ua
  2  select * from file_temp
  3  where (msisdn,ua_code,model_id) not in (select msisdn, ua_code, model_id from msisdn_ua);

0 rows created.


Execution Plan
----------------------------------------------------------
Plan hash value: 2572901664

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |  1000K|   132M|       |   150K  (1)| 00:30:05 |
|   1 |  LOAD TABLE CONVENTIONAL | MSISDN_UA    |       |       |       |            |          |
|   2 |   MERGE JOIN ANTI SNA    |              |  1000K|   132M|       |   150K  (1)| 00:30:05 |
|   3 |    SORT JOIN             |              |  1000K|    95M|   217M| 49829   (1)| 00:09:58 |
|   4 |     TABLE ACCESS FULL    | FILE_TEMP    |  1000K|    95M|       | 27123   (1)| 00:05:26 |
|*  5 |    SORT UNIQUE           |              |    10M|   371M|   995M|   100K  (1)| 00:20:07 |
|   6 |     INDEX FAST FULL SCAN | SYS_C0032619 |    10M|   371M|       |     9   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(INTERNAL_FUNCTION("MSISDN")=INTERNAL_FUNCTION("MSISDN") AND
              INTERNAL_FUNCTION("UA_CODE")=INTERNAL_FUNCTION("UA_CODE") AND
              INTERNAL_FUNCTION("MODEL_ID")=INTERNAL_FUNCTION("MODEL_ID"))
       filter(INTERNAL_FUNCTION("MODEL_ID")=INTERNAL_FUNCTION("MODEL_ID") AND
              INTERNAL_FUNCTION("UA_CODE")=INTERNAL_FUNCTION("UA_CODE") AND
              INTERNAL_FUNCTION("MSISDN")=INTERNAL_FUNCTION("MSISDN"))

ops$tkyte%ORA11GR2> set autotrace off

I don't have valid stats on my indexes and tables - so I would expect your plans to be a little different.

Rating

(6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

jing wei, April 28, 2012 - 12:50 am UTC

tom,thanks for your reply,and i'm sorry my englinsh is so poor .

i think maybe there are some differences between my Oracle environment and yours,i change my code as follow:

insert into TMP_MSISDN_UA1 t1
select t.msisdn, t.ua_code,TO_DATE (t.DAY, 'yyyymmdd') record_time,t.model_id 
from  file_temp1 t
where (t.msisdn,t.ua_code,t.model_id) not in (select t.msisdn,t.ua_code,t.model_id from TMP_MSISDN_UA1);

the execute plan is :

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1865280055

------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT      |                    |  1000K|    32M|   299G  (1)|999:59:59 |       |       |
|*  1 |  FILTER               |                    |       |       |            |          |       |       |
|   2 |   TABLE ACCESS FULL   | FILE_TEMP1         |  1000K|    32M| 18053   (1)| 00:03:37 |       |       |
|*  3 |   FILTER              |                    |       |       |            |          |       |       |
|   4 |    PARTITION RANGE ALL|                    |    63M|       |   300K  (1)| 01:00:08 |     1 |    10 |
|   5 |     PARTITION HASH ALL|                    |    63M|       |   300K  (1)| 01:00:08 |     1 |     8 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   6 |      INDEX FULL SCAN  | IDX_TMP_MSISDN_UA1 |    63M|       |   300K  (1)| 01:00:08 |     1 |    80 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TMP_MSISDN_UA1" "TMP_MSISDN_UA1" WHERE
              LNNVL(:B1<>:B2) AND LNNVL(:B3<>:B4) AND LNNVL(:B5<>:B6)))
   3 - filter(LNNVL(:B1<>:B2) AND LNNVL(:B3<>:B4) AND LNNVL(:B5<>:B6))

20 rows selected.

it won't use merge sort join and fast full index scan,
and i just test your code before,the execute plan is different too:

SQL> show user 
USER is "UA"
SQL> create table test_file_temp (msisdn number, ua_code number, model_id 
  2  number, data varchar2(2000) );

Table created.

SQL> create table test_msisdn_ua (msisdn number, ua_code number, model_id 
  2  number, data varchar2(2000), primary key(msisdn,ua_code,model_id) );

Table created.

SQL>  exec dbms_stats.set_table_stats( 'UA', 'TEST_FILE_TEMP', numrows =>1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_table_stats( 'UA', 'TEST_MSISDN_UA', numrows => 10000000, numblks => 1000000 );

PL/SQL procedure successfully completed.

SQL> set autotrace on explain
SQL> insert into test_msisdn_ua
  2  select test_file_temp.*
  3  from test_file_temp, test_msisdn_ua
  4  where test_file_temp.msisdn   = test_msisdn_ua.msisdn (+)
  5  and test_file_temp.ua_code  = test_msisdn_ua.ua_code (+)
  6  and test_file_temp.model_id = test_msisdn_ua.model_id (+)
  7  and test_msisdn_ua.rowid is null;

0 rows created.
Execution Plan
----------------------------------------------------------
Plan hash value: 3957549117

--------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |                |  1000K|   144M| 18200   (2)| 00:03:39 |
|*  1 |  FILTER             |                |       |       |            |          |
|   2 |   NESTED LOOPS OUTER|                |  1000K|   144M| 18200   (2)| 00:03:39 |
|   3 |    TABLE ACCESS FULL| TEST_FILE_TEMP |  1000K|    95M| 18045   (1)| 00:03:37 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C00291122  |     1 |    51 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEST_MSISDN_UA".ROWID IS NULL)
   4 - access("TEST_FILE_TEMP"."MSISDN"="TEST_MSISDN_UA"."MSISDN"(+) AND
              "TEST_FILE_TEMP"."UA_CODE"="TEST_MSISDN_UA"."UA_CODE"(+) AND
              "TEST_FILE_TEMP"."MODEL_ID"="TEST_MSISDN_UA"."MODEL_ID"(+))

SQL> insert into test_msisdn_ua
  2  select * from test_file_temp
  3  where (msisdn,ua_code,model_id) not in (select msisdn, ua_code, model_id from 
  4  test_msisdn_ua);

0 rows created.


Execution Plan
----------------------------------------------------------
Plan hash value: 2894540078

-------------------------------------------------------------------------------------
| Id  | Operation          | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |                | 50000 |  4882K|   141M (82)|473:13:30 |
|*  1 |  FILTER            |                |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TEST_FILE_TEMP |  1000K|    95M| 18047   (1)| 00:03:37 |
|*  3 |   INDEX FULL SCAN  | SYS_C00291122  |  8573K|   318M|   142  (82)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TEST_MSISDN_UA"
              "TEST_MSISDN_UA" WHERE LNNVL("MSISDN"<>:B1) AND LNNVL("UA_CODE"<>:B2) AND
              LNNVL("MODEL_ID"<>:B3)))
   3 - filter(LNNVL("MSISDN"<>:B1) AND LNNVL("UA_CODE"<>:B2) AND
              LNNVL("MODEL_ID"<>:B3))

the execute plan shows that it will not use merge sort join,i don't know what would cause this,do you tell me why this happen？thanks very much.

April 30, 2012 - 8:03 am UTC

are the three columns in question defined NOT NULL (i had them as a primary key, do you?)

A reader, April 28, 2012 - 4:04 pm UTC

Hi Tom,

In your test case the explain plan for merge anti join is in step 5 oracle is doing sort? since it has done FFS on the unique index?
why not just do Full scan and avoid sorting?


Execution Plan
----------------------------------------------------------
Plan hash value: 2572901664

------------------------------------------------------------------------------------------
-------
| Id  | Operation                | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| 
Time     |
------------------------------------------------------------------------------------------
-------
|   0 | INSERT STATEMENT         |              |  1000K|   132M|       |   150K  (1)| 
00:30:05 |
|   1 |  LOAD TABLE CONVENTIONAL | MSISDN_UA    |       |       |       |            |    
      |
|   2 |   MERGE JOIN ANTI SNA    |              |  1000K|   132M|       |   150K  (1)| 
00:30:05 |
|   3 |    SORT JOIN             |              |  1000K|    95M|   217M| 49829   (1)| 
00:09:58 |
|   4 |     TABLE ACCESS FULL    | FILE_TEMP    |  1000K|    95M|       | 27123   (1)| 
00:05:26 |
|*  5 |    SORT UNIQUE           |              |    10M|   371M|   995M|   100K  (1)| 
00:20:07 |
|   6 |     INDEX FAST FULL SCAN | SYS_C0032619 |    10M|   371M|       |     9   (0)| 
00:00:01 |
------------------------------------------------------------------------------------------
-------

Thanks in advance

Regard's

April 30, 2012 - 8:14 am UTC

FFS = fast full scan - it is using multiblock IO to read the index structure in its entirety - NOT in sorted order (that would take reading the index block by block - that would be an index full scan - not an index FAST full scan)

jing wei, April 30, 2012 - 8:37 am UTC

hi,tom

in my develop environment ，the three columns are not primary key because model_id may be null and the exiting data may not be unique, otherwise , the table tmp_msisdn_ua1 is a partition table, and the index is a local partition index.

April 30, 2012 - 8:44 am UTC

well, umm, then you cannot develop the logic in development that will work the same in production.

nulls change this completely. How can you work with two ENTIRELY different data models? what kind of sense does that make???

jing wei, April 30, 2012 - 9:13 am UTC

maybe i can change my data model, to delete the duplicate data,and replace the null of model_id to a negative number，then make it to the primay key;but,i just take the test as yours ,the result is different ,i don't know why,could you tell me?thanks very much!

SQL> show user 
USER is "UA"
SQL> create table test_file_temp (msisdn number, ua_code number, model_id 
  2  number, data varchar2(2000) );

Table created.

SQL> create table test_msisdn_ua (msisdn number, ua_code number, model_id 
  2  number, data varchar2(2000), primary key(msisdn,ua_code,model_id) );

Table created.

SQL>  exec dbms_stats.set_table_stats( 'UA', 'TEST_FILE_TEMP', numrows =>1000000, numblks => 100000 
);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_table_stats( 'UA', 'TEST_MSISDN_UA', numrows => 10000000, numblks => 
1000000 );

PL/SQL procedure successfully completed.

SQL> set autotrace on explain
SQL> insert into test_msisdn_ua
  2  select test_file_temp.*
  3  from test_file_temp, test_msisdn_ua
  4  where test_file_temp.msisdn   = test_msisdn_ua.msisdn (+)
  5  and test_file_temp.ua_code  = test_msisdn_ua.ua_code (+)
  6  and test_file_temp.model_id = test_msisdn_ua.model_id (+)
  7  and test_msisdn_ua.rowid is null;

0 rows created.
Execution Plan
----------------------------------------------------------
Plan hash value: 3957549117

--------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |                |  1000K|   144M| 18200   (2)| 00:03:39 |
|*  1 |  FILTER             |                |       |       |            |          |
|   2 |   NESTED LOOPS OUTER|                |  1000K|   144M| 18200   (2)| 00:03:39 |
|   3 |    TABLE ACCESS FULL| TEST_FILE_TEMP |  1000K|    95M| 18045   (1)| 00:03:37 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C00291122  |     1 |    51 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEST_MSISDN_UA".ROWID IS NULL)
   4 - access("TEST_FILE_TEMP"."MSISDN"="TEST_MSISDN_UA"."MSISDN"(+) AND
              "TEST_FILE_TEMP"."UA_CODE"="TEST_MSISDN_UA"."UA_CODE"(+) AND
              "TEST_FILE_TEMP"."MODEL_ID"="TEST_MSISDN_UA"."MODEL_ID"(+))

SQL> insert into test_msisdn_ua
  2  select * from test_file_temp
  3  where (msisdn,ua_code,model_id) not in (select msisdn, ua_code, model_id from 
  4  test_msisdn_ua);

0 rows created.


Execution Plan
----------------------------------------------------------
Plan hash value: 2894540078

-------------------------------------------------------------------------------------
| Id  | Operation          | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |                | 50000 |  4882K|   141M (82)|473:13:30 |
|*  1 |  FILTER            |                |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TEST_FILE_TEMP |  1000K|    95M| 18047   (1)| 00:03:37 |
|*  3 |   INDEX FULL SCAN  | SYS_C00291122  |  8573K|   318M|   142  (82)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TEST_MSISDN_UA"
              "TEST_MSISDN_UA" WHERE LNNVL("MSISDN"<>:B1) AND LNNVL("UA_CODE"<>:B2) AND
              LNNVL("MODEL_ID"<>:B3)))
   3 - filter(LNNVL("MSISDN"<>:B1) AND LNNVL("UA_CODE"<>:B2) AND
              LNNVL("MODEL_ID"<>:B3))

May 01, 2012 - 3:07 pm UTC

to delete the duplicate data,and replace the
null of model_id to a negative number，

NO - do not do that.

Is model id supposed to be null or not? What is the "official model" and why do you have TWO different ones - one in development that differs from "real life"????

are those three columns a primary key OR NOT?

A reader, April 30, 2012 - 2:00 pm UTC

Hi Tom,

Thanks for your prompt reply!!

"FFS = fast full scan - it is using multiblock IO to read the index structure in its entirety - NOT in sorted order (that would take reading the index block by block - that would be an index full scan - not an index FAST full scan) "

I know that its doing FFS thats why need to do sort but my question is why doing FFS?

is doing (Fast Full Scans + sorting) outweight (Full Scan and no sorting) ?

Thank you,

Regard's

May 01, 2012 - 3:30 pm UTC

it determined that doing a fast full scan+sort was better than single block reads all over the place, yes.

Merging 1 milllion into 100 million

Nitin, May 06, 2012 - 1:27 am UTC

Tom I think the business goal of the question is to do the merge of 1 million rows into a table with 100million rows complete in the fastest/least resource intensive manner.

For this, dont you think that instead of worrying about fast full scans of indexes or other items, from a design perspective the best approach is to:
- Hash partition the 100 million table (assuming that the 1million being merged can be sitting anywhere in these has partitions)
- Have local indexes on hash partitions
- Use MERGE along with PARALLEL hint. This should allow for partition pruning and parallel partition scans.

Thoughts?

May 06, 2012 - 3:24 pm UTC

I'm assuming that partitioning fits into that already.

And you'd still be worrying about fast full scans versus index range scans - partitioning would not really change that.

And you cannot come up with a partitioning scheme for someone without a TON more information. Local indexes? What if at query time (which you do a lot more than this merge one presumes) you would be forcing them to do N index range scans (where N = 64, 128, 256, something like that - the number of partitions)??? Local indexes would be sort of suspect.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Merging 1 milllion into 100 million

More to Explore

PL/SQL demos

PL/SQL docs

how to use fast full index scan

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Merging 1 milllion into 100 million

More to Explore

PL/SQL demos

PL/SQL docs