Best approach to move Billion rows from one table to another

Question and Answer

Connor McDonald

Thanks for the question, Tushar.

Asked: November 19, 2015 - 10:08 pm UTC

Last updated: November 24, 2015 - 1:43 am UTC

Version: 11.2.0.1.0

Viewed 10K+ times

You Asked

Hi,

I have a situation where I have to move around 1 Billion rows from one table to another.

Table A - 1.2 Billion rows (Source table)
Table B - 300 million rows (Target table)
Both the tables have same schema.

I have to move all the rows from Table A which are not present in Table B. Basically, I have to move A MINUS B.

Here are the two approaches I have implemented. But both are very slow.

Step 1 ) Get delta of A MINUS B
Approach 1 : Using Left outer join
Select A.RowID from A Left outer JOIN B ON A.<List of Primary Columns> = B.<List of Primary Columns> WHERE B.ID IS NULL
Approach 2 : Using MINUS.
Select A.<List of Primary Columns> FROM A
MINUS
Select B.<List of Primary Columns> FROM B

STEP 2) Insert this Delta into a dummy table
Approach 1 : Insert ROWIDs of A from the above query
Approach 2 : Insert primary key columns of Table A from the above query.

STEP 3) JOIN Dummy Table to Table A and do bulk insert with LIMIT.
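For clarity, here is a simplified sketch of my current procedure for steps 1 and 2. The single key column ID and the dummy table DELTA_TAB are placeholders for my actual list of primary key columns and dummy table:

```sql
-- Simplified sketch: collect ROWIDs of A-not-in-B into a dummy table
-- in batches (ID and DELTA_TAB are placeholder names).
DECLARE
  CURSOR c_delta IS
    SELECT a.rowid rid
    FROM   a LEFT OUTER JOIN b ON a.id = b.id
    WHERE  b.id IS NULL;
  TYPE t_rids IS TABLE OF ROWID;
  l_rids t_rids;
BEGIN
  OPEN c_delta;
  LOOP
    FETCH c_delta BULK COLLECT INTO l_rids LIMIT 10000;
    EXIT WHEN l_rids.COUNT = 0;
    FORALL i IN 1 .. l_rids.COUNT
      INSERT INTO delta_tab (rid) VALUES (l_rids(i));
    COMMIT;
  END LOOP;
  CLOSE c_delta;
END;
/
```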

The major problem I am facing is getting the delta. Since both tables have a huge number of records, I am not sure of the proper way to get it done.

I tried creating a procedure using bulk insert with a LIMIT clause for the left outer join query (mentioned above). However, fetching the records itself is taking a long time. After 1 hour of processing I can see no records in the dummy table.


If you can suggest any better approach than those above, or any performance tips for the above approaches, it would be a great help.

PS: I am new to Oracle and this is my first post here. Please correct me if I have not followed any of the guidelines.

and Connor said...

I don't think you need dummy tables and the like (although perhaps your question is a simplified version of your true requirement).

I would just use the outer join method along with an insert. For example, here's a test (from my laptop) with 100m and 99m rows respectively.

SQL> create table T pctfree 0 as
  2  select rownum x, rownum y, rownum z
  3  from
  4    ( Select 1 from dual connect by level <= 10000),
  5    ( Select 1 from dual connect by level <= 10000)
  6  /

Table created.

SQL>
SQL>
SQL> create table T1 pctfree 0 as
  2  select rownum x, rownum y, rownum z
  3  from
  4    ( Select 1 from dual connect by level <= 10000),
  5    ( Select 1 from dual connect by level <= 9900)
  6  /

Table created.


SQL> select count(*) 
  2  from t, t1
  3  where t.x = t1.x(+)
  4  and t1.x is null
  5  /

  COUNT(*)
----------
   1000000

1 row selected.

SQL> insert /*+ APPEND */ into T1
  2  select t.x, t.y, t.z
  3  from t, t1
  4  where t.x = t1.x(+)
  5  and t1.x is null
  6  /

1000000 rows created.
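Since your two tables share the same schema, the whole job can also be a single direct-path statement using MINUS, and at this scale enabling parallel DML may help. A sketch against your A and B (the degree of parallelism of 8 is just an illustration, not a recommendation):

```sql
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(8) */ into b
select * from a
minus
select * from b;

commit;
```

A direct-path (APPEND) insert requires a commit before the session can query the table again, and it loads above the high water mark, so make sure that fits your space and recovery requirements.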




Comments

Tkprof with timings and row source operations

Rajeshwaran, Jeyabal, November 23, 2015 - 1:28 pm UTC

Connor,

if you could post Tkprof result for the Insert statement (with append hints), it could be easy for the end user to get the exact timings for the operation, along with waits associated (if any)
Connor McDonald
November 24, 2015 - 1:43 am UTC

insert /*+ APPEND */ into T1
select t.x, t.y, t.z
from t, t1
where t.x = t1.x(+)
and t1.x is null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0         27          0           0
Execute      1    135.19     153.19    1056863     568627       2870     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    135.20     153.19    1056863     568654       2870     1000000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 102  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD AS SELECT  (cr=568627 pr=1056863 pw=491112 time=153194292 us)
   1000000    1000000    1000000   OPTIMIZER STATISTICS GATHERING  (cr=568627 pr=1056863 pw=488250 time=152775388 us cost=382339 size=24000000 card=1000000)
   1000000    1000000    1000000    HASH JOIN RIGHT ANTI (cr=568627 pr=1056863 pw=488250 time=152616721 us cost=382339 size=24000000 card=1000000)
  99000000   99000000   99000000     TABLE ACCESS FULL T1 (cr=282883 pr=282876 pw=0 time=11005649 us cost=77260 size=594000000 card=99000000)
 100000000  100000000  100000000     TABLE ACCESS FULL T (cr=285744 pr=285737 pw=0 time=16032369 us cost=78144 size=1800000000 card=100000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                              133        0.23          5.02
  direct path write temp                       3543        0.21         11.91
  direct path read temp                         324        0.02          0.46
  direct path write                               1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************



Not entirely sure what benefit this is for the original poster - given it's my laptop, a different table structure and different row counts...

