How to Update millions or records in a table
Tom Kyte

Thanks for the question, Murali.

Asked: November 10, 2002 - 9:21 pm UTC

Answered by: Tom Kyte - Last updated: May 10, 2019 - 8:06 am UTC

Category: Developer - Version: 8.1.7

Viewed 100K+ times.

You Asked

Good Morning Tom.

I need your expertise in this regard. I have a table which contains millions of records. I want to update and commit every so many records (say 10,000 records), rather than doing it in one stroke, as I may end up with rollback segment issue(s). Any suggestions, please!

Murali

and we said...

If I had to update millions of records I would probably opt to NOT update.

I would more likely do:

CREATE TABLE new_table as select <do the update "here"> from old_table;

index new_table
grant on new_table
add constraints on new_table
etc on new_table

drop table old_table
rename new_table to old_table;


you can do that using parallel query, with nologging on most operations generating very little redo and no undo at all -- in a fraction of the time it would take to update the data.
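For instance, the whole flow might look like this (a sketch only -- the table and column names are invented, and the degree of parallelism is arbitrary):

```sql
-- assumption: we want to "update" big_tab.status to its uppercase value
create table big_tab_new
parallel 8
nologging
as
select /*+ parallel(t,8) */
       t.id,
       upper(t.status) status,   -- the "update" happens in the select list
       t.created_dt
  from big_tab t;

-- recreate indexes, grants and constraints on big_tab_new, then:
drop table big_tab;
rename big_tab_new to big_tab;
```

Remember to put the table back into LOGGING mode and take a backup afterwards -- NOLOGGING operations are not recoverable from the archived redo.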


Reviews

updating millions of records

November 11, 2002 - 11:09 am UTC

Reviewer: Om from India

Hi Tom,

Could you please elaborate

CREATE TABLE new_table as select <do the update "here"> from old_table;

the above statement with a suitable example please


Tom Kyte

Followup  

November 11, 2002 - 5:45 pm UTC

ok, say you wanted to update emp to set ename = lower(ename).  Instead, you could do this:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
  2  select empno, LOWER(ename) ename, JOB,
  3     MGR, HIREDATE, SAL, COMM, DEPTNO
  4   from emp;

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table emp;

Table dropped.

ops$tkyte@ORA817DEV.US.ORACLE.COM> rename new_emp to emp;

Table renamed. 

Million records update

November 11, 2002 - 1:44 pm UTC

Reviewer: Ramesh G from U.K

What if a table has over 100 million records and I only want to update 1 million? If your method still applies, could you elaborate on it?

Many thanks in advance.


Tom Kyte

Followup  

November 11, 2002 - 6:33 pm UTC

most likely -- yes. I don't have a 100-million-row table to test with for you, but the amount of work required to update 1,000,000 indexed rows is pretty large. Fortunately, you are probably using partitioning, so you can do this easily in parallel -- bit by bit.

"How to Update millions or records in a table", version 8.1.7

November 11, 2002 - 2:21 pm UTC

Reviewer: John Bittner from Hunt Valley, MD USA

This is absolutely a viable approach, and one we have used repeatedly. One of our apps updates a table of several hundred million records.

The cursor-FOR-loop approach for the update was calculated to take 53.7 years to complete!
We instituted the insert into a dummy table with APPEND and NOLOGGING, and were able to complete the "update" in under 30 minutes.

With NOLOGGING, if the system aborts, you simply re-run the 'update' again, as you still have the original data in the main table. When done, we swap the partition of original data with the 'dummy' table (the one containing the new values), rebuild indexes in parallel, and voilà! Our update is complete.

i.e, to update field2 in a table:

1) First create your dummy hold table: create table xyz_HOLD as select * from xyz where rownum < 1. Then alter table xyz_HOLD nologging.

2) insert /*+ append parallel(xyzhold,12) */ into xyz_HOLD xyzhold (field1, field2, field3) select /*+ parallel(x,12) */ x.field1, my_new_value_for_field2, x.field3 from xyz x where blah blah blah.

3) When done, either rename the table, or swap the partition if your original table is partitioned and you only updated one partition, as we do. Obviously you need to rebuild indexes, etc., as required.

Hope this helps!


updating millions of records

November 11, 2002 - 11:28 pm UTC

Reviewer: A.Ashiq from Trichy,Tamil Nadu ,India

Hi Tom,

As you suggested, instead of updating a table with millions of records, we create a new table, drop the original table, and rename the new table to the original name. But what happens to the dependent objects? Everything gets invalidated. Of course, each will recompile itself when next called, but the dependent objects have to be parsed again. Is that OK?

Tom Kyte

Followup  

November 12, 2002 - 10:11 am UTC

It is OK.

in case of delete

November 12, 2002 - 9:58 am UTC

Reviewer: A reader

We have a similar situation: we delete around 3 million records every day from a 30-million-row table.
There is no logical column on which to partition.
I guess the insert into a new table will take considerable time with 27 million records. Please let me know what the best approach is.

Tom Kyte

Followup  

November 12, 2002 - 10:35 am UTC

wait 10 days so that you are deleting 30 million records from a 60 million record table and then this will be much more efficient.

Time it some day though. 3 million records on an indexed table will take considerable time. There is a chance that an INSERT /*+ APPEND */ of the <rows to keep>, done nologging and in parallel, with the indexes then created in parallel, could best the delete.

November 12, 2002 - 12:43 pm UTC

Reviewer: A reader

Tom,

Recently I conducted an interview in which one of the DBAs mentioned that they had a table that might contain 10 million records, or might contain 1 million. He meant that they delete the records, and some time later the table gets populated again, and vice versa.

Tom, would you consider partitioning such tables, and if yes, which type of partitioning?

Thanks.




Tom Kyte

Followup  

November 13, 2002 - 12:31 pm UTC

hard to tell -- is the data deleted by something that is relatively constant (e.g., the value in that column doesn't change, so the row doesn't need to move from partition to partition)? If so -- sure, because we could just drop partitions (fast) instead of deleting the data.

Agree with John - insert append / CTAS / partition swap is the only way to fly

November 12, 2002 - 1:19 pm UTC

Reviewer: Jack Silvey from Richardson, TX

I work with John Bittner, one of the previous reviewers. I second what he said absolutely. It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql.


addendum

November 13, 2002 - 10:39 am UTC

Reviewer: A reader

This process was introduced to our environment by a master tuner and personal friend, Larry Elkins. This was a totally new paradigm for the application, and one that saved the entire mission-critical application. In-place updates would not have worked with the terabytes of data that we have in our database.

How to Update millions or records in a table

November 19, 2002 - 1:55 pm UTC

Reviewer: Boris Milrud from San Jose, CA USA

In response to the Jack Silvey (from Richardson, TX ) review, where he wrote "It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql":

Could you provide more information about that procedure and how to rebuild multiple same-table indexes concurrently using Oracle job scheduler?

Thanks,
Boris.


Tom Kyte

Followup  

November 19, 2002 - 5:29 pm UTC

instead of

begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;

you can code:

declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;


Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i and before set job_queue_interval to say 60 or so and there you go.

How to Update millions or records in a table

November 19, 2002 - 9:10 pm UTC

Reviewer: Boris Milrud from San Jose, CA USA

Thanks, Tom.
Your status said that you had a large backlog, so I decided not to wait for your response and tried it myself using dbms_job.submit() calls. At first it did not work (job_queue_processes was 0), but after I set it to 12 it started working. The only difference between your code and mine is that I issue just one commit, at the end. That should not matter, right?

I selected a 1-million-row table and rebuilt 5 non-partitioned indexes with the 'compute statistics parallel nologging' clause.
Here are the numbers I got: rebuilding the indexes sequentially consistently took 76 sec., while using dbms_job.submit() calls took around 40-42 sec.

I say "around" because the technique I used may not be perfect, though it served the purpose. I recorded the time right after the commit statement at the end of the PL/SQL block -- that's the start time. Then I kept querying the user_jobs view every 2-3 sec. until the last of the 5 jobs was gone. That was the end time.

The last question on this topic: is the user_jobs view the right place to look in order to determine that the rebuilding is done and how long it took? In the package I am writing, I do a massive delete operation, then rebuild the indexes, then start the next routine. What would be the best way to detect the end of the rebuild, in order to proceed with the next call?

Thanks.
Boris.




Tom Kyte

Followup  

November 19, 2002 - 10:14 pm UTC

You can use user_jobs or dba_jobs but -- you might just want to put some "logging" into your jobs themselves so you can monitor their progress and record their times.
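For instance, the job's "what" string can bracket the rebuild with inserts into your own log table (a sketch -- the rebuild_log table and the index name are invented):

```sql
create table rebuild_log
( index_name varchar2(30),
  start_time date,
  end_time   date );

declare
    l_job number;
begin
    dbms_job.submit( l_job,
      'begin
          insert into rebuild_log values ( ''IDX1'', sysdate, null );
          commit;
          execute immediate ''alter index idx1 rebuild'';
          update rebuild_log set end_time = sysdate
           where index_name = ''IDX1'';
          commit;
       end;' );
    commit;
end;
/
```

Then, instead of polling user_jobs, you poll rebuild_log until every index has a non-null end_time -- and you get the elapsed time per index for free.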

You can even rebuild all the indexes in a partition simultaneously

November 21, 2002 - 12:50 am UTC

Reviewer: ramakrishna from India

Thanks Tom,
In our environment, we have partitioned tables and we use:
ALTER TABLE table_name MODIFY PARTITION part_name REBUILD UNUSABLE LOCAL INDEXES

and this rebuilds all the indexes in the partition in one shot. We found this much faster than doing the indexes one by one. We will now try submitting multiple such jobs in parallel (one for each partition of the table).

regards
ramakrishna
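Combining that with the dbms_job technique above, one job per partition might be submitted like this (a sketch -- MY_PART_TABLE is an invented name; partition names come from the dictionary):

```sql
declare
    l_job number;
begin
    for p in ( select partition_name
                 from user_tab_partitions
                where table_name = 'MY_PART_TABLE' )
    loop
        dbms_job.submit( l_job,
          'execute immediate ''alter table my_part_table modify partition ' ||
          p.partition_name || ' rebuild unusable local indexes'';' );
    end loop;
    commit;   -- a single commit releases all the jobs to run concurrently
end;
/
```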



in case of deletes

November 29, 2002 - 5:17 pm UTC

Reviewer: carl from moondog, mo

Hi Tom,

Thanks so much for your web site and help. It is our
number 1 reference in times of fear and loathing.


This is what we came up with concerning mass updates:

INV    50M rows
INVINS 10M rows
INVDEL  7M rows

There are indexes on INV.KEY and INVDEL.KEY


Execution Plan ( for deletes and updates )
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'INV'
3 1 INDEX (RANGE SCAN) OF 'IX_INVDEL01' (NON-UNIQUE)




alter table INVTMP nologging;

-- INVINS contains delta inserts and updates

insert /*+ APPEND */
into INVTMP
select *
from INVINS t1;

-- INVDEL contains delta deletes and updates

insert /*+ APPEND */
into INVTMP
select *
from INV t1
where not exists ( select null
from INVDEL t2
where t2.KEY = t1.KEY );

alter table INVTMP logging;
drop table INV;
rename INVTMP to INV;

-- build indexes etc.

This is what we came up with, and it is the fastest
approach we've tested.

Any comments or suggestions are welcome and appreciated.



Tom Kyte

Followup  

November 30, 2002 - 8:49 am UTC

I might try an anti-join in place of the NOT EXISTS, and do the work in a single parallel CTAS:

create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;

add parallel /nologging as needed.

in case of deletes - many thanks

December 01, 2002 - 7:45 pm UTC

Reviewer: carl from moondog, mo

Ran the test cases at home: 500K rows in INV and 50K rows in
INVINS and INVDEL.

My way: 16.33 secs
Your way: 08.35 secs


Many thanks.



A bit confused

March 17, 2003 - 10:30 am UTC

Reviewer: Choche from England

I was just wondering that none of the reviews mentioned whether these techniques can be applied in a multi-user environment, where multiple users could be updating the same table at the same time.

Tom Kyte

Followup  

March 17, 2003 - 10:32 am UTC

sorry -- I thought it obvious that in most cases "no" is the answer. We are copying the data, or locking excessive amounts of it -- or disabling indexes and the like.

This is a "batch" process here, to update millions of records.

Replacing PK values

July 25, 2003 - 6:42 pm UTC

Reviewer: Michael from San Diego, CA, USA

We need to replace the main ID for all entries in the database; it is the PK in about 100 tables, the largest of which have about 2 million rows. New and old values for the ID are stored in a lookup table, and there are about half a million of them. The original approach was to replace the PK in all tables for one ID value, commit the changes, and move on to the next ID. This way referential integrity would be guaranteed even if the update failed at any stage, and we would not need long rollback segments and could restart the process at the point it aborted. But this update is too lengthy, and in our case would take almost a month of continuous processing. Do you have any suggestions on how to improve the performance of this critical update? Would it be better to add the new ID as an additional column to one table at a time and populate all of it at once with "create table ... as select" (with the nologging option), and then rename the table and drop the old ID column?

Tom Kyte

Followup  

July 25, 2003 - 8:42 pm UTC

sounds like I would:

create table new_table as select <join>;
create table new_table2 as select <join>;
etc etc etc etc

drop old tables
add indexes
add primary/fkeys/constraints etc....


that is -- rebuild in bulk, using nologging (no redo, no undo generated), using parallel when appropriate.
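For the ID-swap case, each bulk rebuild is just a CTAS that joins to the lookup table (a sketch -- the table and column names are invented; id_lookup maps old_id to new_id):

```sql
create table child1_new
nologging
as
select /*+ parallel(c,8) */
       l.new_id  id,        -- the new PK value comes from the lookup
       c.col1,
       c.col2
  from child1 c, id_lookup l
 where c.id = l.old_id;
```

Repeat per table, then drop and rename, and add the indexes and primary/foreign keys at the end.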

Update million Rows

July 26, 2003 - 5:41 am UTC

Reviewer: Reaz from Dhaka, Bangladesh

Dear Tom,

We have this situation: we load data from seven external files. Next we run a transformation process, which checks whether each row exists in the target table and updates the existing row with the new one; otherwise it inserts the row into the target table.

Our current process is very slow, sometimes taking a whole night to complete for, say, 200,000 rows.

The procedures discussed so far cover updating only, but in our case we need to update and insert too.

I would appreciate your suggestion.

Thanks.
Reaz.

Tom Kyte

Followup  

July 26, 2003 - 12:53 pm UTC

read about MERGE in the sql reference guide.
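For reference, the load-then-upsert pattern collapses into one statement with MERGE (a sketch -- the names are invented; note that MERGE arrived in 9i, so on 8.1.7 you would still need separate UPDATE and INSERT passes):

```sql
merge into target t
using ( select key, col1, col2 from staging ) s
on ( t.key = s.key )
when matched then
    update set t.col1 = s.col1,
               t.col2 = s.col2
when not matched then
    insert ( key, col1, col2 )
    values ( s.key, s.col1, s.col2 );
```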

A single batch?

July 26, 2003 - 9:54 am UTC

Reviewer: A reader

Tom,
Is it still okay if these steps (create a new table, drop the old one, rename the new table as the old, create indexes, etc.) are coded as the batch update program? Or would you suggest they be written up as steps to be performed under the supervision of a DBA? What I would like to know is: can we still have the steps in a single batch routine (as the update would have been), or should they be DBA instructions instead?
Thanks

Tom Kyte

Followup  

July 26, 2003 - 12:54 pm UTC

I would have these performed under the watchful eye of someone.

it is a one-time bulk manipulation. It won't be a "pretty program" with tons of graceful logging/error handling. You'll be doing it "down and dirty" (at least I would be). It needs someone to watch over it.

What about space consideration

July 28, 2003 - 1:39 am UTC

Reviewer: A reader

Hi Tom,
I had a similar problem: my table had grown big and I wanted to truncate some old data. The table was big -- more than 20 million records -- and I could not use the create-temp-table, truncate, and insert flow, as there was not enough space in the database to hold two copies of the large table. I thought of partitioning the table and then truncating one partition, but that did not work.
Can you suggest some way that caters for this space constraint as well?




Tom Kyte

Followup  

July 28, 2003 - 7:04 am UTC

why didn't the partitioning work for you?

How to update part of a Very Large Table

July 28, 2003 - 6:11 am UTC

Reviewer: Mohan K from Bangalore, India

The update operation can be made simpler by separating the query from the update operation. To do that, create another table which stores the rowids of the records in the original table that have to be updated, along with the values to be updated. Then run a PL/SQL script to update the records one by one. To test the above method, the following scripts can be used.

SQL> create table test_update(id number, name varchar2(100), description varchar2(4000)) storage (initial 48M next 4M);

SQL> begin
  for i in 1..1000000 loop
    insert into test_update(id, name, description)
    values (i, 'Test Name'||i, 'Test Name '||i||' description ');
  end loop;
end;
/

Elapsed: 00:04:277.23

The above script will insert 1 million rows.

SQL> select count(*) from test_update where description like '%5 description%';

  COUNT(*)
----------
    100000

Elapsed: 00:00:02.63

SQL> create table test_update_rowids as select rowid rid, description from test_update where description like '%5 description%';

Elapsed: 00:00:54.58

The table test_update_rowids stores the rowids and the new values that have to be updated, i.e., 100000 rows need to be updated.

SQL> begin
  for c1 in (select rid, description from test_update_rowids)
  loop
    update test_update
       set description = c1.description||' after update'
     where rowid = c1.rid;
  end loop;
end;
/

Elapsed: 00:01:82.17

The above script performs the updation. 

Mohan
 

Tom Kyte

Followup  

July 28, 2003 - 7:37 am UTC

the above script performs an "updation" in the worst possible way, consuming the most resources.

it should have been simply:

update test_update set description = description || ' after update ' where description like '%5 description%';


Never do procedurally that which can be done in a single sql statement.

Updating 100K of 1 Million rows

July 28, 2003 - 8:38 am UTC

Reviewer: Mohan K from Bangalore, India

I ran the update statement as a single SQL statement and the elapsed time was slightly more than with the above method. It has to be tested against update statements containing complex queries or join operations.

SQL> update test_update set description = description || ' after update ' where description like '%5 description%';

Elapsed: 00:01:100.13


Mohan
 

Tom Kyte

Followup  

July 28, 2003 - 8:53 am UTC

trust me -- that difference is negligible, and the amount of time you spent writing the code wasn't included.

Also, you FORGOT to add in the time it took to populate your temporary table -- no?


If you can do it in a single sql statement (and you almost always can), you SHOULD.

Let's do your benchmark a little better:

ops$tkyte@ORA920> create table test_update(id number, name varchar2(100), description varchar2(4000));

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert /*+ append */
  2  into test_update
  3  select rownum, 'Test Name' || rownum, 'Test Name ' || rownum || ' description '
  4    from big_table.big_table
  5   where rownum <= 1000000;

1000000 rows created.

ops$tkyte@ORA920> commit;

Commit complete.

ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
  2    from test_update
  3   where description like '%5 desc%' ;

  COUNT(*)
----------
    100000

ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
  2    from test_update
  3   where description like '%6 desc%' ;

  COUNT(*)
----------
    100000

ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> update test_update
  2     set description = description || ' after update'
  3   where description like '%6 description%';

100000 rows updated.

ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> create table test_update_rowids
  2  as
  3  select rowid rid, description from test_update where description like '%5 description%';

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> begin
  2     for c1 in(select rid, description from test_update_rowids)
  3     LOOP
  4        update test_update
  5           set description=c1.description||' after update'
  6         where rowid=c1.rid;
  7     END LOOP;
  8  end;
  9  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_stop
Run1 ran in 1376 hsecs
Run2 ran in 9806 hsecs
run 1 ran in 14.03% of the time

<b>so, according to me, the SINGLE (efficient, effective, correct way) update statment runs in about 14% of the time.  furthermore:</b>


Name                                  Run1        Run2        Diff
STAT...recursive cpu usage               5       4,662       4,657
STAT...CPU used by this sessio         448       5,397       4,949

<b>only 4,949 MORE cpu seconds using the row by row approach...</b>

STAT...CPU used when call star         448       5,397       4,949
STAT...free buffer inspected        10,162      17,039       6,877
STAT...db block gets               111,705     104,450      -7,255
STAT...dirty buffers inspected       9,016      16,620       7,604
STAT...calls to kcmgas               7,755          68      -7,687
STAT...Cached Commit SCN refer       7,718           0      -7,718
STAT...switch current to new b       7,719           0      -7,719
STAT...cleanouts only - consis           0       7,720       7,720
STAT...immediate (CR) block cl           0       7,720       7,720
STAT...commit txn count during           0       7,721       7,721
STAT...cleanout - number of kt           3       7,726       7,723
STAT...consistent gets - exami           3       7,760       7,757
LATCH.undo global data                 221       7,985       7,764
STAT...redo entries                100,700     108,727       8,027
LATCH.redo allocation              100,795     108,945       8,150
STAT...db block changes            202,723     210,907       8,184
STAT...Elapsed Time                  1,385       9,825       8,440
STAT...physical reads                7,894      16,476       8,582
LATCH.checkpoint queue latch        26,845      38,012      11,167
LATCH.cache buffers lru chain       35,638      52,827      17,189
STAT...no work - consistent re       7,741     100,072      92,331
STAT...table scan blocks gotte       7,719     107,721     100,002
STAT...table scan rows gotten    1,000,000   1,100,002     100,002
STAT...execute count                    27     100,092     100,065
STAT...buffer is not pinned co       7,763     107,836     100,073
STAT...calls to get snapshot s          56     100,241     100,185
LATCH.shared pool                      172     100,778     100,606
STAT...session logical reads       119,503     220,165     100,662
STAT...consistent gets               7,798     115,715     107,917
LATCH.library cache pin                233     200,865     200,632
STAT...recursive calls                 177     200,870     200,693
LATCH.library cache                    332     201,322     200,990
LATCH.cache buffers chains         587,377     835,167     247,790
STAT...redo size                31,323,804  31,836,904     513,100
STAT...session pga memory max            0     524,824     524,824
STAT...session pga memory                0   1,049,112   1,049,112

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
754,602   1,553,986     799,384     48.56%

PL/SQL procedure successfully completed.

<b>and worse, it used 2x the number of latches!  (scalability inhibitors)

There is nothing good about row at a time processing, it is something to do ONLY when a set based operation isn't feasible</b>

 

This is very good example for updating the records

July 28, 2003 - 9:00 am UTC

Reviewer: Bipin Ganar from INDIA

Hi Tom,
Yes, this is a good example of updating the records, but my problem
is that I have 1 million records in one table. I want to update and give a hike of 10% on their salaries in one table, but that table is accessed by any number of people at the same moment. I cannot drop the table and update; and second, how can I define the tablespace in this script?
Create Table a as select * from emp_sal;

Tom Kyte

Followup  

July 28, 2003 - 9:10 am UTC

create table a
tablespace foo
as
select * from emp_sal;

slightly different problem!

July 28, 2003 - 12:33 pm UTC

Reviewer: Srikanth Adiga from Pune, India

My table:
5M records are inserted each month.

Operations:
5M records inserted on the first day of the month
5M records updated on rest of the days on the month.
Note that this update is update each row with diff values.

I am using OCI and using batch operations. Can you please suggest me what else can I do within my program to improve the performance of my updates? I tried multithreading, but that did not help.

anything else?

Tom Kyte

Followup  

July 28, 2003 - 12:48 pm UTC

it only takes a couple of minutes to update 5 million rows on my desktop PC -- so there must be something more involved here.

I'd be able to do that in minutes -- why is it taking you so long? What is your bottleneck?

space considerations

July 28, 2003 - 8:56 pm UTC

Reviewer: A reader

I could not create another temp table holding the same data as the table being partitioned, as there was not enough space on the hard disk to hold 2 (or 1.5, to be precise) copies of the same huge table. Is there a way to partition without having to drop, truncate, or transfer the data to some temp location first?

Tom Kyte

Followup  

July 29, 2003 - 6:52 am UTC

disk is cheap.

time is money.


your time spent just thinking about this problem cost much more than the disk to solve it would.

penny wise, pound foolish. How big is this 20-million-row table (assuming you are the same "a reader" from above)? Maybe 1-3 gig?

slightly different problem!

July 29, 2003 - 12:30 am UTC

Reviewer: Srikanth Adiga from Pune,India

A couple of minutes!!! Oh dear, that's really too good.

Let me reiterate my case:
My table schema is something like this:
SubscriberVsQtyTable:
int client-id
int year
int month
int quantity
primary key - (client-id, year,month)

This table has 5M client ids.

Now every day I have to update the quantity field with the txns made that day. Note that I have to update one subscriber at a time, since the number of txns made by each subscriber is different.

So even now, do you believe you can do this in a couple of minutes? My app takes about 30 to 50 mins, even on bigger machines like 8 CPUs.

As I said, this is a C++ app running on Solaris + Oracle 8i/9i. We are using OCI.

Thanks,


Tom Kyte

Followup  

July 29, 2003 - 7:01 am UTC

if you are doing that work with more than a single sql statement, I could see it taking a while.

If you do that work in a single update, no, I don't see it taking more than a couple of minutes.




Time for Update

July 29, 2003 - 8:27 am UTC

Reviewer: Mohan K from Bangalore, India

It is my understanding that the time for an update depends on the following factors (there may be something else involved as well):

1. Row chaining/migration caused by the update.
2. The WHERE clause used to select the rows to be updated.
3. The percentage of rows updated per block -- the more the better. Even if only one row is updated in a block containing 10 records, the entire block has to be written by DBWR and LGWR.

So the number of records alone is not what matters. 100 million records may take days for one table and less than an hour for another table with few columns.


Mohan


Tom Kyte

Followup  

July 29, 2003 - 8:31 am UTC

1) updating a quantity (number) field is unlikely to do that.
2) correct
3) correct


no, I disagree with the last part. Updating 5 million rows -- a single unindexed column -- should not take "a really long time". I was answering their direct question: they update every row, on every block, so #1 and #3 really don't come into play. The biggest factor here is #2.



Verifing if neccesary to update, before update

July 29, 2003 - 9:09 am UTC

Reviewer: juan carlos reyes from Bolivia

Hi Tom, I have seen several times that you can increase performance by evaluating whether you really need to update -- or maybe that was only my impression.

For example
Update table set income = 500
where level = 7
AND NOT INCOME = 500

The AND NOT INCOME = 500 avoids unnecessary updates;
my point is a question:

How advisable is it -- does it really increase performance?

As you are always thinking about how to increase performance, it could be a good idea, in a newer version of Oracle, for Oracle to verify whether the value is already set to the new value and, if so, skip the update.
:)

Tom Kyte

Followup  

July 29, 2003 - 11:07 am UTC

if there are more than a handful of records where income = 500, then yes, this will definitely increase performance by

a) decreasing the latching
b) decreasing the redo
c) decreasing the undo


we cannot make that optimization for you -- it would stop triggers and other EXPECTED side effects from happening.
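One caveat with the predicate as written: if INCOME is nullable, NOT INCOME = 500 evaluates to NULL for rows where INCOME is NULL, so those rows are silently skipped. A null-safe version of the same idea (illustrative names only -- note that LEVEL is a reserved word in Oracle, so a real column would need a different name):

```sql
update t
   set income = 500
 where job_level = 7
   and ( income <> 500 or income is null );
```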

July 29, 2003 - 11:42 am UTC

Reviewer: A reader

:) Thanks,

Maybe then it could be an optional feature --
an UPDATE_ONLY_IF_DIFFERENT hint.
I think it would interest several people
who don't use triggers, constraints, etc.

July 29, 2003 - 2:28 pm UTC

Reviewer: Stu Charlton from Toronto Canada

"Note that, I have to update one subscriber at a time since the number of txns made by each subscriber if different."

That still seems doable in 1 SQL statement, even if you wanted to merge the updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?


Tom Kyte

Followup  

July 29, 2003 - 2:38 pm UTC

I agree -- most things CAN be done in a single sql statement.

easier said than done in real-life situations

July 29, 2003 - 8:59 pm UTC

Reviewer: A reader

Hi Tom,
I agree you may think getting a disk is cheap, but that is only if you are considering just going to the store and buying one.
In real-life conditions, especially in corporates, this is not as easy as it seems, nor is it less costly.
A simple purchase of a disk first has to be justified (which is quite difficult -- I don't think I have the justification as yet), then approved, then purchased by the purchasing dept., then installed or fitted by the server teams. The server team may want to take the system down, which means more dollars -- much more. These are just the things off the top of my head; when we actually initiate the purchase process, there will definitely be some more steps involved.

Coming back to my question: am I right in saying that we can't partition an existing table without copying the data to some temp location? Let me know if there is any other way.


Thanks




Tom Kyte

Followup  

July 29, 2003 - 9:27 pm UTC

justification

you pay me a lot.
disk is cheap.
think about it.

(sometimes it really can be that simple)

we are talking about a couple of gig here (like 1 to 3 -- small, tiny -- my /tmp has more room than that), something my LAPTOP would not have an issue with.

in order to partition -- think about it -- you'll need the SOURCE DATA and the PARTITIONED DATA at the same time, for a period of time. No magic there.


slightly confused !

July 30, 2003 - 12:00 am UTC

Reviewer: Srikanth Adiga from Pune, India.

Thanks, Stu and rest for your updates.

Taking this a bit further, since I am confused :)

[Sri]
"Note that, I have to update one subscriber at a time since the number of txns
made by each subscriber if different."
[Sri]

[Stu Charlton]
That still seems doable in 1 SQL statement, even if you wanted to merge the
updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?

Followup:
I agree -- most things CAN be done in a single sql statement.
[Stu Charlton]

Sorry, how would one do this?
If my table has two rows:
clientid=1, month=july, year=2003, quantity=10
clientid=2, month=july, year=2003, quantity=20
Now I have to increase clientid 1's quantity by 15 and clientid 2's quantity by 25.
How would you manage this in a single SQL statement? Like this, there would be 5M rows to be updated.

Btw, my table is indexed on clientid,year and month.

Tom Kyte

Followup  

July 30, 2003 - 7:24 am UTC

what is the OTHER table you are updating from?  you must have a detail table elsewhere from where you derive that 15 and 25.  So, assuming something like this:

ops$tkyte@ORA920> create table t ( clientid int, month int, year int, quantity int );

Table created.

ops$tkyte@ORA920> create table txns ( clientid int, month int, year int );

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into t values ( 1, 7, 2003, 10 );

1 row created.

ops$tkyte@ORA920> insert into t values ( 2, 7, 2003, 20 );

1 row created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into txns select 1, 7, 2003 from all_objects where rownum <= 15;

15 rows created.

ops$tkyte@ORA920> insert into txns select 2, 7, 2003 from all_objects where rownum <= 25;

25 rows created.

ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> select * from t;

  CLIENTID      MONTH       YEAR   QUANTITY
---------- ---------- ---------- ----------
         1          7       2003         10
         2          7       2003         20

ops$tkyte@ORA920> update t
  2     set quantity = quantity + ( select count(*)
  3                                   from txns
  4                                  where txns.clientid = t.clientId
  5                                    and txns.month = t.month
  6                                    and txns.year = t.year );

2 rows updated.

ops$tkyte@ORA920> select * from t;

  CLIENTID      MONTH       YEAR   QUANTITY
---------- ---------- ---------- ----------
         1          7       2003         25
         2          7       2003         45


there you go.


(it is a shame you are using the wrong datatypes -- only a DATE should be used to hold, well, DATES.  Using numbers to hold a year and a month isn't a good practice)
 

SCN,CHECK POINT

July 30, 2003 - 1:03 am UTC

Reviewer: AATIF from PAKISTAN

I am asking: what is the difference between the SCN (system commit number / system change number) and a checkpoint?
On which basis does Oracle sync the database -- the SCN or the checkpoint?
When a checkpoint occurs, which number is written in the control file header, the datafile headers and the log file headers?
Dear Tom, your web site tells me I cannot ask more questions right now, but I need to clear up this point.
AATIF.


Tom Kyte

Followup  

July 30, 2003 - 7:30 am UTC

er?

i remember you asking this on another page -- yes.

and I also remember answering this on that other page -- yes.


I fail to see how this relates to this page.


anyway there IS no difference between

scn
system commit number
system change number

they are the same.


as for the rest of the comment -- it doesn't "make sense" (i do suggest a read through of the server concepts guide, many questions are answered there!)

More clarification

August 01, 2003 - 12:34 am UTC

Reviewer: Srikanth Adiga from Pune,India

>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.



Tom Kyte

Followup  

August 01, 2003 - 7:49 am UTC

use bulk processing - update between 100 and 500 rows at a time in a single call using array processing.

Or, put the data where it belongs - not in files, in tables -- and process the data in the database rather than in C
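A minimal PL/SQL sketch of the array-processing suggestion (the staging table and column names are hypothetical -- in the real application the arrays would be filled by the OCI file-processing layer):

```sql
declare
  type id_list  is table of t.clientid%type index by binary_integer;
  type qty_list is table of t.quantity%type index by binary_integer;
  l_ids id_list;
  l_qty qty_list;
begin
  -- load one batch of 100-500 pending updates into the arrays
  select clientid, quantity
    bulk collect into l_ids, l_qty
    from staging_updates
   where rownum <= 500;

  -- one call updates the whole batch
  forall i in 1 .. l_ids.count
    update t
       set quantity = quantity + l_qty(i)
     where clientid = l_ids(i);

  commit;
end;
/
```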

August 06, 2003 - 10:16 am UTC

Reviewer: Srikanth Adiga from Pune, India

>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.

>Followup:
>use bulk processing - update between 100 and 500 rows at a >time in a single call using array processing.
Yes, that is what we do. It takes about 60 minutes to update 5M records on a 2-CPU machine. Is that expected?

If I do the same in multiple threads I do not see any performance improvement, i.e. 1M records updated in parallel in each of 5 threads.
Any idea why?

Tom Kyte

Followup  

August 06, 2003 - 10:55 am UTC

are you sure they are going in parallel and not blocking/locking each other?

verify that all 5 sessions are in fact not only ACTIVE (v$session) at the same time, but also that you are not blocking yourself.

and you have taken some statspack snapshots during your testing to see if you have any obvious bottlenecks, right?
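A quick way to make those checks (a sketch against the standard v$ views; MYUSER is a placeholder schema name):

```sql
-- are all 5 sessions really active at the same time?
select sid, serial#, status
  from v$session
 where username = 'MYUSER';

-- who holds a blocking lock (block = 1) and who is waiting (request > 0)?
select sid, type, id1, id2, lmode, request, block
  from v$lock
 where block = 1 or request > 0;
```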

Explicit commit !

August 07, 2003 - 5:04 am UTC

Reviewer: Nathan from London

Tom,

Why do we commit explicitly after submitting jobs ?

Thanks
Nathan
/*------------------------------------------------
Followup:
instead of

begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;

you can code:

declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
---------------------------------*/





Tom Kyte

Followup  

August 09, 2003 - 3:51 pm UTC

because if you do not YOUR session can "see" the job in the job queue but the job queue processes cannot!


How many of us have sat there for minutes going "when is the stupid thing going to run" and until we exit sqlplus it does nothing :)



Explicit Commit

August 07, 2003 - 8:51 am UTC

Reviewer: Nathan from London

Apologies,
I must withdraw my previous question... I was obviously not thinking straight. I was thinking about a similar situation where a commit was issued (before dbms_job.submit) with a comment /* do not remove */ but without any explanation... I'm still pondering why that commit should be there.

Sorry for the trouble.

Regards
Nathan




I have to take issue....

August 07, 2003 - 10:00 am UTC

Reviewer: cheaper than disk from USA

Your response about justification for buying disk space is outdated...

"justification

you pay me alot
disk is cheap.
think about it.

(sometimes it really can be that simple)"

Anybody else out there NOT paid alot ANYMORE?!

;-)


Re: Explicit Commit - To Nathan

August 07, 2003 - 3:25 pm UTC

Reviewer: A reader

Tom is out till weekend..
"I'm still pondering why the commit should be there. "
The commit after dbms_job.submit is required so that other sessions (the job queue processes) can see the job in the job queue... which typically is the
requirement... As usual, Tom can confirm.

Menon:)

Reader

August 10, 2003 - 1:27 pm UTC

Reviewer: A reader

"
create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;
"

I'd like to know why
"
where t1.key = t2.key(+)
and t2.key is null
"
better than
where t1.key != t2.key with ALWAYS_ANTI_JOIN hint

Tom Kyte

Followup  

August 10, 2003 - 2:42 pm UTC

because there is no "always anti join" hint?


you could of course use the CBO, and a NOT IN subquery in which case it'll do it itself.
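Using the same tables as the quoted query, the NOT IN form would be a sketch like this (with the CBO and NOT NULL keys, it can be executed as a hash anti-join rather than row-by-row):

```sql
create table invtmp as
select * from invins
union all
select *
  from inv
 where inv.key not in ( select invdel.key from invdel );
```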

August 14, 2003 - 5:54 pm UTC

Reviewer: Su Baba

So if I have a big table, x, that needs to be updated, I would

create table y nologging as select * from x...
retrieve index definition from x
build indexes on y
drop table x
rename y to x

If I have all these done in a PL/SQL batch job, what's the best way to retrieve index definition and form create index statement from table x? I want to create indexes on y that have the exact storage parameter, tablespace, maybe partition, etc. as the ones on table x.

By the way, the database version is 8.1.7.

thanks




Tom Kyte

Followup  

August 14, 2003 - 8:01 pm UTC

you COULD, not WOULD, it is an option.


you don't have your create indexes somewhere?

guess I would exp rows=n, imp indexfile=foo to get them -- or DBMS_METADATA.get_ddl in 9i
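In 9i, the DBMS_METADATA call can be sketched like this (the owner and table name are placeholders; the DDL returned includes storage and tablespace clauses):

```sql
set long 1000000 pagesize 0
select dbms_metadata.get_ddl( 'INDEX', index_name, owner )
  from all_indexes
 where table_name = 'X'
   and owner = 'SCOTT';
```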

How to update part of a Very Large Table

August 28, 2003 - 7:03 am UTC

Reviewer: Mohan K from Bangalore, India

please refer "How to update part of a Very Large Table July 28, 2003"

When updating a large table in a database with a comparatively small buffer cache, one thing to make sure of is that blocks are updated in a contiguous order. Suppose some records of block1 are updated first, and other records in it only after some other blocks have been updated; then block1 will be written twice by DBWR and LGWR. After the first update, block1 may get flushed out by the LRU algorithm. If all records in block1 are updated at one time, then block1 is written by LGWR and DBWR only once. If the update statement is doing a full table scan, then the blocks are updated in order; but the order can be different when it is using an index for selecting the rows.

The SQL statement that makes sure of that is

update (select id, description from test_update where rowid in(select rowid from (select rowid from test_update order by rowid))) set description=description||' new val' where description like '%5%';



Tom Kyte

Followup  

August 28, 2003 - 7:54 am UTC

not necessarily so -- first, lgwr doesn't write blocks -- it writes redo. second, dbwr may or may not write the block more than once.


I would never never never use your approach. why I would want to

a) read the entire table
b) sort the rowids
c) distinct that set (implied)
d) join to it (and you are making the assumption the table in the IN will be the driving table and that a hash join will not be used and ..... lots of other stuff)


when a simple:

update T set d = d || 'x' where d like '%5%'

will do it -- if you are worried about the index on d being used - you would simply FULL hint it -- but doubtful that it would be necessary.

the additional work you add would more than offset the work you are trying to save.

update large table

August 28, 2003 - 2:50 pm UTC

Reviewer: Vera from D.C.

Hi Tom and thank you again and again for all your work.

My problem is that I cannot simply replace the update with creating a new table. The table I need to update - I'll call it the 'pension table' - has about 1.5 million rows. I have to update the "amount" fields for 1994, 1995, 1996, etc., and the way I update these fields depends on the status of the participant in each particular year. To determine this status I built a function that looks through other tables and takes several input parameters, including the particular financial year, because it determines not just the status of a participant, but the status he/she had in that particular FY. (I call this function a "box of cockroaches".) Basically, I ask Oracle to bring me the annual monthly amount for Mr/Mrs 'X' for, let's say, 1994, and if 'X' had status 'retired' in 1994 then I want this amount to be calculated one way, if 'X' was 'deferred' another way, and if 'X' was 'dead' a third way.
As you already guessed, the updates take a long time - about an hour for each year. I wonder what approach you would use in a situation like this?
Thank you
By the way, what is the status of your new book?

Tom Kyte

Followup  

August 29, 2003 - 8:35 am UTC

I would use a CASE statement in SQL without using PLSQL to compute the value.

optionally -- i would turn it inside out. instead of sql calling plsql, this might be one of the times when plsql calling sql is more appropriate. build up procedurally an array of values to update -- forall update them -- and do it again until the rows are processed. with a little foresight, you can even parallelize this processing and run your procedure N times -- each instance of it running against a different slice of the table...
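A sketch of the first suggestion, a CASE expression instead of the PL/SQL function -- the status lookup table and the per-status formulas below are entirely made up; the real logic lives in the "box of cockroaches" function:

```sql
update pension_table p
   set amount =
       case ( select s.status
                from participant_status s        -- hypothetical lookup table
               where s.participant_id = p.participant_id
                 and s.fiscal_year    = p.fiscal_year )
         when 'retired'  then p.base_amt / 12    -- illustrative formulas only
         when 'deferred' then p.base_amt * 0.5
         when 'dead'     then 0
         else p.amount
       end
 where p.fiscal_year = 1994;
```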


book is "real soon" -- should be in trucks right now.


update large table

August 28, 2003 - 4:18 pm UTC

Reviewer: Luc Gyselinck from Belgium

For large tables, if I have to update more than 20 percent of the rows, I do it like Tom: write a query to insert the 'updated' records into a new table, using the /*+ APPEND */ hint. You have no indexes to update, no constraints to validate, and you generate almost no redo. Next, I truncate the old table after I have disabled all constraints and triggers (once again, almost no redo, very fast), and then I inject, again with the /*+ APPEND */ hint, the data from the new table into the old one (very fast, little redo). Indexes get automatically rebuilt AFTER the extents are populated. I re-enable the triggers and the constraints (with or without validation, as you wish).

If the new values for the columns being updated must come from other tables, I NEVER write something like

update t1
set c1 = (select c2 from t2 where t2.c3 = t1.c4)

In fact, you are performing NESTED LOOPS. Using functions to get the new values for the columns is much the same: NESTED LOOPS (even worse: SQL/PLSQL engine context switches, open/close cursors in PL/SQL, no read consistency).

Whenever I find myself in such a situation (typically when writing batch procedures, during data migrations or data transformations), I make sure my queries use HASH joins, I give the session more resources (higher SORT_AREA_SIZE, higher HASH_AREA_SIZE, DB_FILE_MULTIBLOCK_READ_COUNT to the max), avoid INDEX RANGE SCANS, do FULL (yes, FULL) table scans, do FAST FULL INDEX scans, thus bypassing the buffer pool which otherwise gets flushed by the data from my queries. And by the way, use LMTs with uniform size extents.
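Those session settings can be sketched as follows (8i/9i-era manual work-area parameters; the values are illustrative only and should be sized to your machine):

```sql
alter session set sort_area_size = 104857600;           -- 100 MB for sorts
alter session set hash_area_size = 104857600;           -- 100 MB for hash joins
alter session set db_file_multiblock_read_count = 128;  -- favor full scans
```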

As an example, I rewrote a batch procedure that took 3 days to complete (once, many years ago), written the old way (see the update statement above), that now does the same job in only 4 hours, on the same (old) hardware, same database, but different Oracle version: Oracle 7 / Oracle 8i.


Yes, you are right,

August 28, 2003 - 4:50 pm UTC

Reviewer: Vera from D.C.

but right now I don't see how I can perform my task without an update statement and a function. In other cases I do exactly what you do on large tables - create a new table rather than update the old one, and then rename the tables.
As for locally managed tablespaces, I have been begging our DBAs to switch to LMTs since I don't know when, and they keep promising to do it.


Does parallel insert just imply direct?

September 08, 2003 - 1:49 pm UTC

Reviewer: Dennis from Missouri, USA

Tom (or anyone),

Does a parallel insert imply direct (append) insert? I read something in the documentation (or is that forbidden territory) ;) awhile back about the append hint not being needed on a parallel insert because it was implied.

I issued the following:
INSERT /*+ append, nologging, parallel(ext, 8) */ INTO hrdmadm.ext_xpayd_pay_details ext
SELECT /*+ parallel(ext2, 8) */ *
FROM hrdmadm.ext_xpayd_pay_details@hip2 ext2

and while I saw the parallel processes spawn on hip2 (for the select), I didn't notice any spawn where I was.

I was wondering if that was because parallel insert was synonymous with direct insert, or did I mess up the syntax somehow? Is the insert in fact parallel, and it's just 'hidden' from us?

Thanks,
Dennis

Tom Kyte

Followup  

September 08, 2003 - 2:13 pm UTC

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c21dlins.htm#10629

quote:

Serial and Parallel Direct-Path INSERT

When you are inserting in parallel DML mode, direct-path INSERT is the default. In order to run in parallel DML mode, the following requirements must be met:
......


Thanks!

September 08, 2003 - 2:57 pm UTC

Reviewer: Dennis from Missouri, USA

My bad was that I didn't alter session. Thanks Tom for your valuable insight. I guess I was just too afraid to read down a few more pages when I was in the documentation, with that being forbidden territory and all ;) Well maybe not forbidden...just hard to put that 'haunted/fear-inspiring' word that would fit in there.

Thanks,
Dennis

Optimizing Query

September 11, 2003 - 8:40 am UTC

Reviewer: Mohan

BULK COLLECT and bulk insert can improve the performance. Create SQL collection types of t1%rowtype and t2%rowtype.
BULK COLLECT 100 records at a time into the collection array. Create separate collection arrays to store values that are computed or obtained after searching the database.
Do the bulk insert/update into the target table with FORALL. Search this site using the keywords "bulk insert", "bulk collect" or "bulk update" for more examples.

Other things are
1. Improve query performance by proper indexing
2. Remove fragmentation by "alter table move" command
3. Rebuild indexes
4. Allow large pctfree if the update causes row migration/row chaining
5. Size rollback segments and rollback tablespaces


Mohan


Tom Kyte

Followup  

September 11, 2003 - 9:49 am UTC

#2 isn't really "a good thing" in general.
#3 is really not "a good thing" in general.

reorgs -- so so overrated.


September 11, 2003 - 1:51 pm UTC

Reviewer: Kailash from NJ

Can you explain which of these 3 operations will be the fastest if I want to update millions of rows:
1. delete and re-insert
2. a straight update
3. the CTAS approach you suggested
Can you advise why the method you suggested would be faster than 1 and 2?

Can you explain why updating millions of rows is not a good idea? Is it based on the amount of redo generated, and what if the columns I update are not part of an index?

Which operation will be the fastest? I would appreciate it if you could explain the various DML operations and the amount of redo they generate, i.e. which DML generates the most redo, followed by the second, down to the one which generates the least redo.

Currently we have an application that checks for a condition in a table A and obtains the unique id. If a record is found, it flushes all of the dependent tables for the given id and re-inserts the records back into the dependent tables with a set of new values. Is this a good approach? If not, what would be the ideal approach?

Tom Kyte

Followup  

September 11, 2003 - 7:25 pm UTC

I can say yes to all three -- each is the fastest.

each is the slowest.



If you have a large bulk update that is done once and hits most of the rows -- you may be best off doing a CTAS, dropping the old table, and renaming the new one.

the CTAS, indexing, etc can be done in nologging mode -- no undo, no redo. you end up getting done much faster.



updating millions of records

October 03, 2003 - 4:28 pm UTC

Reviewer: b from austin TX

Hi,
The approach of updating the view instead of the table works great.
But here I want to update multiple columns of the same table based on queries against different tables.
We have around 600,000 records in the base table.
What would be the least expensive query?
We run Oracle 9i.


UPDATE user1.UserSummary
Set
Column1 = (
SELECT Count(DISTINCT WebOrderID)
FROM user1.OrderTotal ot
WHERE total>0 and
(ot.ostatus <> -1 and
ot.ostatus <> -1) and
ot.UREFITEM = user1.UserSummary.UREFITEM),
Column2 = (
SELECT NVL(Sum(Total),0) FROM
user1.OrderTotal ot
WHERE (ot.ostatus <> -1 and ot.ostatus <> -1)
and ot.UREFITEM = user1.UserSummary.UREFITEM),
Column3 = (
SELECT min(sodate) FROM orders.salesorders so left join
user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1)
and so.UREFITEM = user1.UserSummary.UREFITEM),
Column4 = (
SELECT max(sodate) FROM orders.salesorders so
left join user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1) and
so.UREFITEM = user1.UserSummary.UREFITEM)
;
the above is the sample of my query

thanks in advance
;;)


Tom Kyte

Followup  

October 04, 2003 - 9:38 am UTC

if you can come up with a single query that pulls all 4 columns together in a single result set -- we can optimize this. Left as an exercise for you -- see if you cannot merge the above 4 separate query into a single query
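One step in that direction: the first two columns come from the same table, so a single correlated subquery can set both at once (a sketch that keeps the original predicates, assuming the duplicated ostatus test was just a typo; the two date columns from salesorders could be folded in the same way):

```sql
update user1.UserSummary us
   set (Column1, Column2) =
       ( select count(distinct case when ot.total > 0
                                    then ot.WebOrderID end),
                nvl(sum(ot.Total), 0)
           from user1.OrderTotal ot
          where ot.ostatus <> -1
            and ot.UREFITEM = us.UREFITEM );
```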

great

October 17, 2003 - 3:33 pm UTC

Reviewer: b from tx

thanks
i finally made my query simple

Great

October 21, 2003 - 5:28 pm UTC

Reviewer: Raj from San Diego CA USA

Great technique never thought of it.

October 22, 2003 - 1:27 am UTC

Reviewer: Anthony Reddy from Malaysia, KL


Updating 1 million records

October 22, 2003 - 4:43 pm UTC

Reviewer: Kailash from NJ, USA

I have a situation where I have a unique key on 3 columns (c1,c2,c3) on a table T and I want to update c2 from A to B for some records that match certain criteria. There are a total of 30 million records in the table, and about 1 million of them match the criteria to be updated. The table is not partitioned.
1. Your suggestion to drop the old table and to re-create a new table and rename it to the old table may not be possible because I am updating only a subset of the table. Is there any other method to achieve this?
2. Which one of the following will be the better solution?
   a. Drop the index, update the rows, and recreate the index
   b. Delete the rows with C2 = A and insert new rows with C2 = B. My DBA says that this will create fragmentation and is not a good approach. Is that true?
   c. Do you suggest any better solution under these scenarios, since I keep getting this type of question from my manager often?

Your suggestion in this regard is highly appreciated.


Tom Kyte

Followup  

October 22, 2003 - 6:28 pm UTC

have you considered just "update set c2 = b where ...."?

1,000,000 rows out of 30,000,000 isn't a lot. if c2 is indexed, you might benefit from a set unusable, skip unusable, update, rebuild.

but it won't take that long to update 1,000,000 rows. couple of minutes on my laptop


big_table@ORA920> update big_table set created = created+1;

1833792 rows updated.

Elapsed: 00:02:13.28


not a big deal.
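The "set unusable, skip unusable, update, rebuild" sequence can be sketched as follows (the index and table names are placeholders):

```sql
alter index t_c2_idx unusable;
alter session set skip_unusable_indexes = true;

update t
   set c2 = 'B'
 where c2 = 'A';   -- plus whatever other criteria apply
commit;

alter index t_c2_idx rebuild nologging;
```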

How about rollback segment problem when updating million rows.

October 22, 2003 - 10:49 pm UTC

Reviewer: Ramasamy from NC USA

Tom ,

Thanks a lot for your service.

I have to update a single NUMBER(9) column in a table
that has more than 80 columns.
The table has 30 million rows, and I have to update
around 17 million rows each night.

As you always insisted ,
i tried with a single update like

UPDATE t1 SET batchno = (A constant )
WHERE batchno is NULL;

1.)I came across the rollback segment issue.
2.) It is taking around 3 hrs to update.

Your input is greatly appreciated to solve the above issues.

Thanking You,
Ramasamy


Tom Kyte

Followup  

October 23, 2003 - 8:03 am UTC

there is no problem -- other than that you haven't sized your system to do the things you require it to do!!!!!!!!


1) the rollback segment "issue" is a "non issue". size things for what you do and there are NO ISSUES. disk is cheap -- compared to what they pay us all to have this discussion. consider that. the amount of time you've spent thinking about this already cost your employer much more than the disk it would take to do this correctly.

2) you are changing a null to a non-null -- increasing the width of the field. make sure you are not causing massive row migration (check your pctfree, make sure it is set appropriately here). is batchno "indexed"?

MILLION RECORD INSERT/SELECT/UPDATE TAKES TOO MUCH TIME

October 23, 2003 - 3:27 am UTC

Reviewer: Santosh Das from Bangalore, INDIA.

Hi Tom,
I have a table A, and soon after creating it 1 million records are put into it. Then, being in a real-time environment, its size grows by at least half a million rows each day; however, a cleanup application also deletes half a million every day. Whenever a new record is inserted, a continuously listening application detects it and runs a select against table A, sends the selected items to some table B, and then updates table A.
There may be 3 or 4 such listening applications. However, the select and update operations are taking a lot of time, and the initial insertion of 1 million records takes 11 hours to complete -- so what could a possible architecture be?
I tried sending the rowid to a queue/pipe after inserting into table A and then using it to select and update, but it still takes a lot of time. Please suggest some method.
Thanks and regards,
Sandy..

Tom Kyte

Followup  

October 23, 2003 - 12:42 pm UTC

sounds like you should just be using AQ -- plop the record in as a message -- have as many dequeuers as you like.

not sure what else to say, not a ton of detail here.

Select and update of a table Having Millions of Records.

October 23, 2003 - 8:05 am UTC

Reviewer: Jack from NY

Hi Tom,
I have the following queries which execute against a table of a million records. That table has a PK, but it is not used in either of the queries given below. CMTIME, SAD and STATUS are all NUMBER fields. If I make an index on CMTIME then it takes longer. Presently I don't have an indexed field. Can you please tell me how I can optimize these, or where the performance bottleneck is?

update SD set CMTIME = 1064929834 where (CMTIME is NULL or 1064929834 - CMTIME >= 10) and (1064929834 - SAD >= 0) and 1064929834 not in (select CMTIME from SD where CMTIME is not NULL)

SELECT * from SD where (CMTIME is not NULL and (CMTIME = 1064929834 OR 1064929834 - CMTIME >= 10)) AND (ED - 1064929834 > 0 and (STATUS = 1 OR STATUS = 0) and 1064929834 - SAD >= 0)

Thanks and regards,
Jack.

Tom Kyte

Followup  

October 23, 2003 - 12:50 pm UTC

are you using the CBO?



Followup to your question

October 23, 2003 - 2:17 pm UTC

Reviewer: Ramasamy from NC USA

2) you are changing a null to a non-null -- increasing the width of the field.
make sure you are not causing massive row migration (check your pctfree, make
sure it is set appropriately here). is batchno "indexed"?

Yes Tom. BATCHNO is indexed, like:
index on (BATCHNO, STATUS);

Would it be worth dropping the index and recreating it?

Thanks,
Ramasamy

Tom Kyte

Followup  

October 23, 2003 - 7:16 pm UTC

could definitely help, yes.

October 23, 2003 - 9:12 pm UTC

Reviewer: George Lee from Hong Kong

I tried to update 30 million records using Tom's solution.
It works fine. Using approximately 1.5 hours to finish.

George (Broadway Photo Supply Ltd. HK)

whats CBO

October 24, 2003 - 4:29 am UTC

Reviewer: Jack from NY


Tom Kyte

Followup  

October 24, 2003 - 9:37 am UTC

cost based optimizer

No

October 27, 2003 - 1:37 am UTC

Reviewer: jack from NY

We are not using the cost-based optimizer.
It's a simple query which we are using to retrieve data from a table having millions of records, and its performance is not satisfactory.
How do we optimize it so that it runs faster?
jack

Tom Kyte

Followup  

October 27, 2003 - 7:42 am UTC



you are kidding right?

this goes right up there with "my car won't start, why not?"

there is so much missing from this "request" as to make it impossible to say anything sensible.



November 11, 2003 - 11:21 pm UTC

Reviewer: George Lee from Hong Kong

I have a fact table being partitioned by month. The indexes are built local to partition.

The data population is working fine.

But the rebuild index fails at the end.

I don't know how to rebuild indexes in the partitions.
Can you help me ?

Tom Kyte

Followup  

November 12, 2003 - 10:19 am UTC

why do you need to rebuild here at all??

but why don't you show us what you are doing -- at the very least -- define what "fails at the end" means (error codes, messages, details)

November 12, 2003 - 8:17 pm UTC

Reviewer: George Lee from Hong Kong

Dear Tom,

Yes, I should provide enough information to you. Sorry for that.

Here is the story,

I create a table as following,

drop table D_DYNRPT_SALES_FACT;

create table D_DYNRPT_SALES_FACT
(
TX_DATE DATE,
SHOP_NO VARCHAR2(20),
ITEM_NO VARCHAR2(12),
PARENT_ITEM_NO VARCHAR2(12),
BRAND_NO VARCHAR2(5),
VENDOR_NO VARCHAR2(20),
ITEM_GROUP VARCHAR2(5),
ITEM_TYPE VARCHAR2(5),
CONSIGNMENT VARCHAR2(1),
ITEM_DESC VARCHAR2(40),
BRAND_DESC VARCHAR2(30),
VENDOR_DESC VARCHAR2(30),
CATEGORY_DESC VARCHAR2(30),
QTY_SOLD NUMBER(14,2),
NET_SALES_AMT NUMBER(14,2),
GROSS_PROFIT_AMT NUMBER(14,2)
)
PARTITION BY RANGE (TX_DATE)
( PARTITION D_DYNRPT_SALES_FACT_2000_09 VALUES LESS THAN (TO_DATE('01/10/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_09,
PARTITION D_DYNRPT_SALES_FACT_2000_10 VALUES LESS THAN (TO_DATE('01/11/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_10,
PARTITION D_DYNRPT_SALES_FACT_2000_11 VALUES LESS THAN (TO_DATE('01/12/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_11,
PARTITION D_DYNRPT_SALES_FACT_2000_12 VALUES LESS THAN (TO_DATE('01/01/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_12,
PARTITION D_DYNRPT_SALES_FACT_2001_01 VALUES LESS THAN (TO_DATE('01/02/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_01,
PARTITION D_DYNRPT_SALES_FACT_2001_02 VALUES LESS THAN (TO_DATE('01/03/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_02,
PARTITION D_DYNRPT_SALES_FACT_2001_03 VALUES LESS THAN (TO_DATE('01/04/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_03,
PARTITION D_DYNRPT_SALES_FACT_2001_04 VALUES LESS THAN (TO_DATE('01/05/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_04,
PARTITION D_DYNRPT_SALES_FACT_2001_05 VALUES LESS THAN (TO_DATE('01/06/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_05,
PARTITION D_DYNRPT_SALES_FACT_2001_06 VALUES LESS THAN (TO_DATE('01/07/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_06,
PARTITION D_DYNRPT_SALES_FACT_2001_07 VALUES LESS THAN (TO_DATE('01/08/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_07,
PARTITION D_DYNRPT_SALES_FACT_2001_08 VALUES LESS THAN (TO_DATE('01/09/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_08,
PARTITION D_DYNRPT_SALES_FACT_2001_09 VALUES LESS THAN (TO_DATE('01/10/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_09,
PARTITION D_DYNRPT_SALES_FACT_2001_10 VALUES LESS THAN (TO_DATE('01/11/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_10,
PARTITION D_DYNRPT_SALES_FACT_2001_11 VALUES LESS THAN (TO_DATE('01/12/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_11,
PARTITION D_DYNRPT_SALES_FACT_2001_12 VALUES LESS THAN (TO_DATE('01/01/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_12,
PARTITION D_DYNRPT_SALES_FACT_2002_01 VALUES LESS THAN (TO_DATE('01/02/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_01,
PARTITION D_DYNRPT_SALES_FACT_2002_02 VALUES LESS THAN (TO_DATE('01/03/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_02,
PARTITION D_DYNRPT_SALES_FACT_2002_03 VALUES LESS THAN (TO_DATE('01/04/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_03,
PARTITION D_DYNRPT_SALES_FACT_2002_04 VALUES LESS THAN (TO_DATE('01/05/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_04,
.
.
.
PARTITION D_DYNRPT_SALES_FACT_2004_12 VALUES LESS THAN (TO_DATE('01/01/2005','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2004_12,
PARTITION D_DYNRPT_SALES_FACT_MAXVALUE VALUES LESS THAN (maxvalue)
TABLESPACE D_DYNRPT_SALES_FACT_MAXVALUE
)
/

CREATE INDEX D_DYNRPT_SALES_FACT_I01 ON D_DYNRPT_SALES_FACT
(TX_DATE) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I02 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_GROUP, ITEM_TYPE) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I03 ON D_DYNRPT_SALES_FACT
(TX_DATE, SHOP_NO, ITEM_NO, ITEM_GROUP) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I04 ON D_DYNRPT_SALES_FACT
(TX_DATE, BRAND_NO, ITEM_GROUP, ITEM_TYPE) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I05 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_NO) LOCAL;

CREATE OR REPLACE PUBLIC SYNONYM D_DYNRPT_SALES_FACT FOR BPSADM.D_DYNRPT_SALES_FACT;

then, populate data into the table as the following,

alter index D_DYNRPT_SALES_FACT_I01 unusable;
alter index D_DYNRPT_SALES_FACT_I02 unusable;
alter index D_DYNRPT_SALES_FACT_I03 unusable;
alter index D_DYNRPT_SALES_FACT_I04 unusable;
alter index D_DYNRPT_SALES_FACT_I05 unusable;

alter session set skip_unusable_indexes=true;

@pop_d_dynrpt_sales_fact.sql '2000/09/01' '2000/01/30';
.
.
.
@pop_d_dynrpt_sales_fact.sql '2003/11/01' '2003/11/30';

alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging;

the data population works fine, but the alter index rebuild nologging statements fail with the following errors:

alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole



(the same ORA-14086 error is raised for D_DYNRPT_SALES_FACT_I02 through D_DYNRPT_SALES_FACT_I05)

I checked the reference; it seems to be a syntax problem in my rebuild statement. It should be something like alter index D_DYNRPT_SALES_FACT_I05 rebuild partition..., but I don't know the exact statement. Can you help me? Thanks.




Tom Kyte

Followup  

November 13, 2003 - 6:54 am UTC

ops$tkyte@ORA920LAP> CREATE TABLE t
  2  (
  3    data       char(255),
  4    temp_date  date
  5  )
  6  PARTITION BY RANGE (temp_date) (
  7    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
  8    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
  9    PARTITION part3 VALUES LESS THAN (to_date('15-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part4 VALUES LESS THAN (to_date('16-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION part5 VALUES LESS THAN (to_date('17-mar-2003','dd-mon-yyyy')) ,
 12    PARTITION part6 VALUES LESS THAN (to_date('18-mar-2003','dd-mon-yyyy')) ,
 13    PARTITION junk VALUES LESS THAN (MAXVALUE)
 14  )
 15  ;

Table created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create index t_idx1 on t(temp_date) LOCAL nologging;

Index created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter index t_idx1 unusable;

Index altered.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> begin
  2          for x in ( select 'alter index ' || index_name ||
  3                            ' rebuild partition ' || partition_name stmt
  4                   from user_ind_partitions
  5                                  where index_name = 'T_IDX1' )
  6          loop
  7                  dbms_output.put_line( x.stmt );
  8                  execute immediate x.stmt;
  9          end loop;
 10  end;
 11  /
alter index T_IDX1 rebuild partition PART1
alter index T_IDX1 rebuild partition PART2
alter index T_IDX1 rebuild partition PART3
alter index T_IDX1 rebuild partition PART4
alter index T_IDX1 rebuild partition PART5
alter index T_IDX1 rebuild partition PART6
alter index T_IDX1 rebuild partition JUNK

PL/SQL procedure successfully completed.

 

December 11, 2003 - 4:56 pm UTC

Reviewer: A reader

Hi Tom,
I have been running the update statement below for the past 24 hours and it is still running.

Table S_REVN has 1.5 million rows and table S_OPTY_PROD has 1.2 million rows. It is doing a full table scan of both tables since I am using a function in the WHERE clause. We are using the rule-based optimizer on Oracle 8.1.7.4.

Here is sql statement

Update siebel.S_REVN r
set x_opty_prod_city=(select X_CITY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_opty_prod_STATE =(select X_STATE from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_PR_YEAR_QTY =(select X_PR_YEAR_QTY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)


0 UPDATE STATEMENT Optimizer=RULE
1 0 UPDATE OF 'S_REVN'
2 1 TABLE ACCESS (FULL) OF 'S_REVN'
3 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
4 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
5 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'

Please advice
Thanks



Tom Kyte

Followup  

December 11, 2003 - 6:27 pm UTC

you understand that it is doing 3 full scans of S_OPTY_PROD for EACH and EVERY row in S_REVN, yes?


how about

update t
set (a,b,c) = (select a,b,c from t2 where ... )

at the very least (reduce the full scans from 3 to 1)


at the very most, consider creating an index:

create index temp_index_to_make_this_complete_before_my_grandchildren_have_grandchildren
on p(replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$'))

and using a hint (to let Siebel use the CBO) to get the index used.

December 12, 2003 - 12:44 pm UTC

Reviewer: A reader

Hi Tom,

I am following your advice on the update; I've changed the SQL statement to:


Update /*+ INDEX(S_OPTY_PROD T_TEMP) */ siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)

created function based index on S_OPTY_PROD

create index T_TEMP
on S_OPTY_PROD(replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$'))

BUT the above SQL statement is not using the index even though I supplied the hint. (Is it because function-based indexes are not recognized by the rule-based optimizer?)

Please help
Thanks

Tom Kyte

Followup  

December 13, 2003 - 11:11 am UTC

you put the hint in the wrong place :)

and you might not have met all of the requirements for FBI's.  read:

http://asktom.oracle.com/~tkyte/article1/index.html

but it is not the RBO (the well formed hint -> cbo)....


your hint is wrong (must use the correlation name of P) and in the wrong place (should be in the subquery).

you need query_rewrite_enabled and (depending on version) might need query_rewrite_integrity (see the above link).

but here is an example:
ops$tkyte@ORA920> create table t1( x_opty_prod_city int, x_opty_prod_state int, x_pr_year_qty int, row_id varchar2(25) );
 
Table created.
 
ops$tkyte@ORA920> create table t2( x_city int, x_state int, x_pr_year_qty int, row_id varchar2(25) );
 
Table created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> create index t2_idx on t2( replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$') );
 
Index created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> delete from plan_table;
 
4 rows deleted.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> ALTER SESSION SET QUERY_REWRITE_ENABLED=true;
 
Session altered.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> explain plan for
  2  Update t1 r
  3  set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY)     =
  4      (select /*+ INDEX( t2 t2_idx ) */ X_CITY,X_STATE,X_PR_YEAR_QTY
  5             from t2 p
  6        where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
  7  /
 
Explained.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> prompt @?/rdbms/admin/utlxpls
@?/rdbms/admin/utlxpls
ops$tkyte@ORA920> set echo off
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------
| Id  | Operation                    |  Name       | Rows  | Bytes |Cost(%CPU)|
-------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |             |    82 |  4346 |   3  (34)|
|   1 |  UPDATE                      | T1          |       |       |          |
|   2 |   TABLE ACCESS FULL          | T1          |    82 |  4346 |   3  (34)|
|   3 |   TABLE ACCESS BY INDEX ROWID| T2          |     1 |    53 |   2  (50)|
|*  4 |    INDEX RANGE SCAN          | T2_IDX      |     1 |       |   2  (50)|
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access(REPLACE(REPLACE(REPLACE("P"."ROW_ID",'ce','='),'-','='),'@','$')=:B
              1)
 
16 rows selected.
 

December 12, 2003 - 9:04 pm UTC

Reviewer: A reader

Hello Tom, I load a table as follows using the parallel hint, but the cost is different:

sql>insert /*+ append parallel(demo01,4) */ into demo01 select /*+ parallel(customer,4) */ * from customer;
the cost --42s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
ERROR:
ORA-12838: cannot read/modify an object after modifying it in parallel
SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
1698 recursive calls
455 db block gets
8728 consistent gets
8075 physical reads
66492440 redo size
630 bytes sent via SQL*Net to client
525 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
390126 rows processed

sql>insert into demo02 select * from customer;
the cost --26s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=1226 Card=659568 Byt
es=680674176)

1 0 PARTITION RANGE (ALL)
2 1 TABLE ACCESS (FULL) OF 'CUSTOMER' (Cost=1226 Card=659568
Bytes=680674176)





Statistics
----------------------------------------------------------
1939 recursive calls
44876 db block gets
20918 consistent gets
1 physical reads
64954576 redo size
646 bytes sent via SQL*Net to client
467 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
390126 rows processed

The customer table has 0.4 million records, and the machine is a 2-CPU Sun E420 running the Oracle database.

Can you tell me why the physical reads increase so much when using the parallel hint in the DML SQL? Thanks.

Tom Kyte

Followup  

December 13, 2003 - 11:49 am UTC

well, for a tiny number of records like this, i would not even consider PQ at this point.


But -- to answer your questions

the cost, well the cost is affected by thousands of things. HINTS definitely affect the cost (That is in part how hints work -- by playing with costs). Ignore the cost of two queries that are not the same, they are NOT comparable.


the PIOs -- well, you ran one before the other? Also, parallel query prefers to checkpoint and do direct IO in many cases (you might find PQ doing a global checkpoint before the query begins, to get current images onto disk, so it can slam through the data as fast as it can without going through the buffer cache)


but for so few records -- it would seem that regular sql is what you want.

December 13, 2003 - 4:37 pm UTC

Reviewer: A reader

Hi Tom,
Followed your advice... WORKS LIKE A CHARM. Updated 1.5 million records in 10 minutes. EXCELLENT!!!!!!!

Thanks a LOTTTTTTTTTTT

Update siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select /*+ INDEX(S_OPTY_PROD T_TEMP) */ X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)


Online updation of very big table

March 06, 2004 - 3:37 pm UTC

Reviewer: Praveen from Delhi

Hi Tom
I want to update a table in one go on an online system. The table has 2 million (20 lakh) records and 110 columns. When I issue the update command it takes about an hour; I don't know why it takes so much time even though an index is created on that particular field.

select count(*),app_flg from test;
app_flg has only two values, APPROVED or UNAPPROVED; the default is UNAPPROVED.
approved = 170000
unapproved = 30000

update test set app_flg='APPROVED' where app_flg='UNAPPROVED';

It took an hour to update the records, processing slowed down for the other online application users, and locks started to occur on the table.

Suggest the best solution
pkr

Tom Kyte

Followup  

March 06, 2004 - 8:30 pm UTC

It would be the INDEX that slowed it down.

You want to FULL SCAN this table.
You want there to be NO index on the column you are updating.


it gives error

March 07, 2004 - 5:16 am UTC

Reviewer: praveen from india

On dropping the index it gives error ORA-00054.

thanks

Tom Kyte

Followup  

March 07, 2004 - 9:26 am UTC

that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it.
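One way to automate that "keep trying" is a small PL/SQL loop that retries the drop until it stops hitting ORA-00054. This is just a sketch -- the index name is invented, and DBMS_LOCK.SLEEP requires EXECUTE on DBMS_LOCK:

```sql
begin
    loop
        begin
            execute immediate 'drop index app_flg_idx';
            exit;  -- drop succeeded, stop retrying
        exception
            when others then
                if sqlcode = -54 then      -- resource busy
                    dbms_lock.sleep( 5 );  -- wait, then try again
                else
                    raise;
                end if;
        end;
    end loop;
end;
/
```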


thanks

March 07, 2004 - 11:16 am UTC

Reviewer: jasdeep,praveen from patiala india

I have solved that problem: a user who had locked rows on the table was no longer logged on, so I killed that session and the index was dropped immediately.
And the updates were as fast as you can imagine.

thanks


Update taking hrs of time

March 09, 2004 - 7:44 am UTC

Reviewer: Sachin from India

Hi Tom,
I have a query as:
I have two tables: table1 (around 10,000 records max) and table2 (actually the GL code combination master table, with around 600,000 records). I need to update three fields in table1 (a temporary processing table) with a unique value from table2. T1 has the ccid fields which need to be updated, and s1-s4 fields corresponding to the segment1-segment4 fields of table2.
The query is like the following:

UPDATE table1 t1
SET t1.ccid1 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'XXXXX'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.chart_of_accounts_id
and gb.set_of_books_id = 1),
t1.ccid2 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'zzzzzz'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1),
t1.ccid3 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'wwwww'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1)

But this is taking more than 6 hours to complete. When I check the table locks, the table remains locked row-exclusively. I am committing immediately after the update statement in the procedure. Could you please tell me why this is happening?

Tom Kyte

Followup  

March 09, 2004 - 12:26 pm UTC

not that i don't believe you but -- about the locking issue -- I'm not sure i believe you.

when you commit -- locks are released. You'd have to show me the info from v$lock showing that t1 is still "locked" and locked by your session.


you could

set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )

instead of running a query per column.

if you run the query


select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = some_value_value
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = some_value_value
and t2.segment1 = some_value_value
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1


how long does that take?

Updating millions of rows

March 15, 2004 - 1:46 pm UTC

Reviewer: A reader from NJ, USA

Tom,

In this discussion you mention that you would not do an UPDATE/DELETE of millions of rows, but rather would create a new table (in parallel/nologging) with the rows that needs to be retained and then create indexes in parallel, drop the old table and rename the new table to old table. If so, my understanding is :

1. Delete/UPDATE of tables with more than 1 million row would also have to maintain the index structure and also will generate considerable rollback and undo.
2. Other consideration with respect to update/delete is that the blocks will be moved in and out of the freelists. Please confirm if this is true and if so, please explain what happens during update/delete of large number of rows with regard to freelists?
3. Other consideration is that it will cause lot of table/segment fragmentation and lot of empty blocks below the HWM and all full table scans on the table will read lot of empty blocks.

Is my understanding correct and if I have missed any other things as to why DELETE/UPDATE of large tables is not recommended, please let me know.

4. If my table that needs to be updated/deleted is the primary/parent table with dependent child tables/constraints and triggers, is this approach correct
Disable all the triggers/constraints
Drop the table
Rename the new table to the old table
Reenable all constraints and triggers
What to do in such cases?

Tom Kyte

Followup  

March 15, 2004 - 3:03 pm UTC

I "might", "probably" if it was most of the records...


1) correct
2) correct, deletes would put them on, updates might put them on, might take them off.

3) it could, yes.


the delete/update can take lots longer than CTAS + reindex if you are doing MOST all of the records in a LARGE table with lots of indexes.

4) triggers are not relevant here....

but you would have to drop the fkeys
do the CTAS+drop+rename
add the constraints back with enable/rely (don't need validation)
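Just as a sketch -- the table, column, and constraint names here are invented -- that sequence might look like:

```sql
-- drop the foreign keys pointing at the big table
alter table child drop constraint child_fk;

-- the CTAS does the "update" while copying, with minimal redo and no undo
create table new_big nologging parallel as
select id, upper(col2) col2, col3 from big;

drop table big;
rename new_big to big;

-- re-create indexes and re-grant here, then add the constraints back
-- with rely/enable/novalidate so no full re-validation scan runs
alter table big add constraint big_pk primary key (id);
alter table child add constraint child_fk
    foreign key (big_id) references big
    rely enable novalidate;
```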



Deletes/Updates

March 15, 2004 - 5:44 pm UTC

Reviewer: A reader from NJ, USA

Tom,

In the above discussion you mention :
1. The deletes will put them on because the used space in the block will drop below PCTUSED. Is it correct?
2. The update will put them on if the updated column makes the used space in the block fall below PCTUSED or the updated column makes the free space available in the block less than PCTFREE. Is it correct.

If both of the above understanding is not correct, please explain. Also, please let me know how we could track the movement of the blocks on and off the freelist. Any SQL scripts/tools available that I could use?

Thanks

Tom Kyte

Followup  

March 15, 2004 - 6:24 pm UTC

1) the deletes could, yes.

2) yes.


to track individual blocks -- not without dumping blocks (and I don't go there). dbms_space and other tools (search this site for show_space for the script I use) can show you how MANY things are on the freelist.

A small correction

March 15, 2004 - 6:30 pm UTC

Reviewer: A reader from NJ, USA

Tom,

In the above scenario for question 2, there is a small correction marked in CAPITAL letters :
"The update will put them on if the updated column makes the used space in the block fall below PCTUSED AND WILL TAKE OFF of the freelist if the updated column makes the free space available in the block less than PCTFREE. "
Is it correct?

Thanks


Tom Kyte

Followup  

March 15, 2004 - 6:33 pm UTC

I'll state it simply, forget pctfree/pctused/etc -- in english:

if the update reduces the row size, it can put it back onto the freelist.

if the update increases the row size, it can take it off the freelist.

Parallel DML

March 16, 2004 - 8:22 am UTC

Reviewer: A reader

1)On page 630 of your Oracle Expert One on One, you mention that parallel DML can be used only upon partitions -- in 8i. I'm using 9.2. Has this behavior changed in the later versions ?

2)I need to do a mass delete on a 12 GB table and it's taking to much time (far too much, really). How could I speed it up ?

Thanks

Tom Kyte

Followup  

March 16, 2004 - 9:36 am UTC

1) yes, that is a 9ir2 new feature.

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96531/ch2_9ir2.htm#74432

2) did you read the original answer above?

enable PK with parallel clause

March 20, 2004 - 6:40 pm UTC

Reviewer: A reader

Hi

I am working in 8.1.7.4.

I want to delete 30 million rows of 60 million so I am doing this

1. create a copy of original table with good data
2. disable original tables´s constraints, primary keys (and child FKs) and foreign keys
3. truncate the original table
4. make all original tables indexes unusable
5. insert /*+ append */ back from the copy to the original; I have to do it this way because I am not allowed to modify constraint names
6. drop the copy
7. enable constraints. Here, when I enable the PK I cannot provide a parallel clause, right? I searched the documentation but it seems I can only specify a USING INDEX TABLESPACE clause -- or am I missing something?
8. rebuild all unusable indexes



Tom Kyte

Followup  

March 21, 2004 - 9:50 am UTC

you can create an index in parallel (unique) and then enable/rely/novalidate the pk.
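For example (names invented): pre-create the unique index in parallel, then enable the constraint so it picks up the existing index instead of building one serially:

```sql
-- build the index for the PK in parallel, with minimal redo
create unique index big_pk_idx on big(id)
    parallel 8 nologging;

-- no index build and no validation scan happens here
alter table big add constraint big_pk primary key (id)
    rely enable novalidate;
```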

How to Update millions or records in a table

March 25, 2004 - 11:55 am UTC

Reviewer: A reader

Hi Tom,
I read your response to Murali's question above and believe there will be a downtime window for the application. This is because, if I want to keep the index names the same as before, I will have to create the new table, drop the old table, rename the new table to the old table name, and then create the required indexes on it.

I am wondering whether we could instead create the indexes under some other name on the new table, and rename the indexes after dropping the old table.
As always your valuable advice helps a lot!

Tom Kyte

Followup  

March 25, 2004 - 9:13 pm UTC

you can rename the indexes.


alter index supports this.
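For example, with invented names:

```sql
-- index the new table under a temporary name while the old table exists
create index tmp_emp_ename_idx on new_emp(ename);

-- ... after: drop table emp; rename new_emp to emp; ...

-- restore the original index name
alter index tmp_emp_ename_idx rename to emp_ename_idx;
```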

how to make this restartable?

March 25, 2004 - 6:14 pm UTC

Reviewer: A reader

Hi

I want to delete 80% of the rows from a couple of 50-million-row tables. The process is divided into 10 steps:

1. Create TEMP table, CTAS from original table
2. Disable constraints
3. Truncate original table
4. Set indexes to UNUSABLE
5. Insert into original table from TEMP table
6. Drop TEMP table
7. Create PK and UK
8. Enable PK and UK
9. Enable FKs
10. Rebuild indexes

I want to make this process restartable, i.e. if it fails at step 3 and I later rerun the procedure, it will start from step 3 again. How can we achieve this? I have been thinking of using a table which stores the table name and process status, something like the following, then updating the status as the process goes:

TABLE: PROCESS_STATUS
TABLE_NAME STATUS
----------- ---------------
EMP 2
DEPT 0

Process_status is the step, 0 means that I need to start from beginning so in each process I would query this table and see which step should be run. For example if the process failed at step 3 then STATUS would be 2



select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 0
then
run process 1
end if;

select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 1
then
run process 2
end if;

select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 2
then
run process 3
end if;

select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 3
then
run process 4
end if;



and so on.

Any suggestions? What I can see is that I will have quite a lot of redundant code...

Tom Kyte

Followup  

March 25, 2004 - 9:35 pm UTC

you'd have to keep a state table and have your code query it up, much like you are suggesting. That would do it, yes.

another "inserting" idea might be to

a) insert the steps to process
b) delete them as you complete them (and commit)
c) to restart, just pick up at the step you wanted

you could insert the procedures to be called and just:

for x in ( select name from procedures order by seq )
loop
execute immediate 'begin ' || x.name || '; end;';
delete from procedures....
commit;
end loop

(just a thought, not fully baked)
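Fleshing that sketch out a little (still just a thought; the names are invented, and each NAME must be a parameterless stored procedure):

```sql
create table todo_steps
( seq   number primary key,
  name  varchar2(30) not null );

insert into todo_steps values ( 1, 'copy_to_temp' );
insert into todo_steps values ( 2, 'disable_constraints' );
-- ... one row per remaining step ...
commit;

begin
    for x in ( select seq, name from todo_steps order by seq )
    loop
        execute immediate 'begin ' || x.name || '; end;';
        delete from todo_steps where seq = x.seq;
        commit;
        -- if a later step fails, rerunning this block picks up
        -- at the first row still left in todo_steps
    end loop;
end;
/
```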

provide more information about that procedure drop indexes and rebuild with nologging

April 07, 2004 - 2:39 am UTC

Reviewer: mohan from india

Hi Tom
Could you provide more information about that procedure, and about how to drop indexes and rebuild them with NOLOGGING? We are using the Informatica ETL tool: before bulk-loading data into the target we drop the indexes (pre-session), and after the load we rebuild the indexes with NOLOGGING (post-session). It takes less time because it generates less redo.
Regards
Mohan



Tom Kyte

Followup  

April 07, 2004 - 9:09 am UTC

it is just a drop and create? not sure what you are looking for -- if it is "syntax", we document that.

Problem with Update

April 08, 2004 - 6:23 am UTC

Reviewer: Ram from Bangalore,India

Hi Tom,
This update is not working properly.How to correct this?

SQL> declare
  2   x number;
  3   y number;
  4  begin
  5   execute immediate 'update e set sal = sal*1.1 where ename in        ('SMITH','KING') returning sal into :x,:y' using x,y;
  6   dbms_output.put_line(x || chr(9)||y);
  7* end;
                                                          *
ERROR at line 5:
ORA-06550: line 5, column 65: 
PLS-00103: Encountered the symbol "SMITH" when expecting one of the following: 
. ( * @ % & = - + ; < / > at in is mod not rem return 
returning <an exponent (**)> <> or != or ~= >= <= <> and or 
like between into using || bulk 

Please do reply.
Bye!

 

Tom Kyte

Followup  

April 08, 2004 - 10:09 am UTC

..... ( ''SMITH'', ''KING'' ) .....

you need two quotes for a quote in a string (until 10g when there is another way)

Updating million records

April 08, 2004 - 7:34 am UTC

Reviewer: Himnish Narang from Chandigarh, India

Hi Tom,

I was just going through the discussion on this subject. In it you described creating a new table instead of updating millions of records, with the update performed as part of creating the new table.

Don't you think the new table will occupy the same amount of space, and that after dropping the old table we will have to reorg the tablespace?

Please comment.

Thanx.



Tom Kyte

Followup  

April 08, 2004 - 10:24 am UTC

reorgs are so overrated.


no, i would not reorg a tablespace or anything simply cause I copied some data and dropped some old stuff. not a chance.




How to Update millions or records in a table

April 08, 2004 - 9:03 am UTC

Reviewer: Ganesh from Visakhapatnam

Hi Tom,
It's very useful; I used this and saved lots of time by creating a new table.
I have another issue similar to this. We are using 9i. There is a requirement to modify a primary key's datatype from NUMBER to VARCHAR2, and the key has a lot of dependents. Is there any option that avoids rebuilding the table, since the data already exists?
Can you please suggest something?
Thanks in advance.
Ganesh

Tom Kyte

Followup  

April 08, 2004 - 10:51 am UTC

that'll be a rebuild pretty much.

Please see this

April 08, 2004 - 11:38 pm UTC

Reviewer: Ram from Bangalore,India

Hi Tom,
Thanks for your reply, but it's still not working properly.
SQL> declare
  2   x number;
  3   y number;
  4  begin
  5   execute immediate 'update e set sal = sal*1.1 where ename in (''SMITH'',''ALLEN'')
  6                      RETURNING sal into :x,:y' using x,y;
  7     dbms_output.put_line(x ||chr(9) ||y);
  8  end;
  9  /
declare
*
ERROR at line 1:
ORA-00913: too many values 
ORA-06512: at line 5 


How do I correct this? You mentioned another way in Oracle 10g; how does that work? Could you please provide an example?
Please do reply.
Bye!


 

Tom Kyte

Followup  

April 09, 2004 - 7:42 am UTC

you are returning SAL (one thing) into two things (:x,:y)

different problem altogether.

in 10g, the strings could be

q'|how's this for quoting|'


instead of
'how''s this for quoting'

a character string literal that starts with q for quote will use the next two characters as the start quote and those two characters in reverse for the end quote.

Thanks

April 09, 2004 - 9:39 am UTC

Reviewer: Ram from Bangalore,India

Hi Tom,
Thanks for your reply. I found a way of doing it as follows; do you have a better option?
SQL> create or replace type sal_t as table of number;
  2  /

Type created.

SQL> declare
  2  sals sal_t := sal_t();
  3  begin
  4   execute immediate 'update emp set sal = sal*1.1 where ename in (''SMITH'',''KING'')
  5                      returning sal into :sals' returning bulk collect into sals;
  6   for i in 1..sals.count loop
  7    dbms_output.put_line('Updated salary:'||sals(i));
  8   end loop;
  9  end;
 10  /
Updated salary:968                                                              
Updated salary:5500                                                             

PL/SQL procedure successfully completed.

Please do reply.
Bye!

 

Tom Kyte

Followup  

April 09, 2004 - 10:17 am UTC

get rid of the dynamic sql, it isn't necessary (fixes your original issue with the quotes as well).  also, would seem to make sense to pull back other columns so you would be able to see what rows were actually updated:


ops$tkyte@ORA9IR2> declare
  2     type numArray is table of number;
  3     l_sals  numArray;
  4     l_empno numArray;
  5  begin
  6     update emp
  7        set sal = sal*1.1
  8      where ename in ('SMITH','KING')
  9  returning empno, sal bulk collect into l_empno, l_sals;
 10
 11      for i in 1..l_sals.count loop
 12            dbms_output.put_line
 13          ('Updated salary for empno ' || l_empno(i) ||
 14           ' to: '||l_sals(i));
 15       end loop;
 16  end;
 17  /
Updated salary for empno 7369 to: 880
Updated salary for empno 7839 to: 5500
 
PL/SQL procedure successfully completed.
 
 

To Mr. Ram

April 09, 2004 - 10:44 am UTC

Reviewer: A reader

Hi Ram and All,
May I request you all not to pester Tom with inane posts requesting Tom to debug and program on our behalf.
This forum is not to learn coding.

update based on rowid

April 16, 2004 - 2:02 am UTC

Reviewer: john

Tom,

we use non-intelligent primary key to update the table through a stored procedure.

i.e. when the table is queried, our select procedure returns the non-intelligent key along with the details like below

select emp_key, other details from emp;

when updating the data, emp_key is passed from the front end to the stored procedure and table is updated based on emp_key.

is it a bad idea, using rowid instead of non-intelligent key(emp_key)?

Tom Kyte

Followup  

April 16, 2004 - 7:27 am UTC

non-intelligent? interesting choice of terminology.


But in any case -- once upon a time ago, rowids were IMMUTABLE. Once assigned a row would have a rowid assigned to it and that rowid would live with that row until you deleted it.

Starting in 8i with support for updates to partition keys that would cause a row to move from partition A to partition B -- that is no longer true (and then there are IOT's...).

In 10g, there are even more options for 'row movement' -- an online segment shrink for example.

So, rowids can change and are changing in more circumstances as time goes on.


Sooooo, if you have lost update detection in place using 'FOR UPDATE' -- rowids are very safe (forms uses them). What I mean is -- you

a) select a.*, rowid from T a where .... 'normally'
b) you decide to update a row, so you

select * from t where rowid = :that_rowid AND
((c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
((c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
((cN = :that_cN or (cN is null and :that_cN is null))
for update NOWAIT;

you lock it -- this prevents anything that could change a rowid from doing so and verifies the data hasn't been changed. If that returns 0 rows, someone changed the data (or reorg'ed it and moved it). You need to requery to get the current values before you let the user even think about modifying it. If that returns a row -- you got it. If that returns ora-54, something has it locked, you have to decide what you want to do about that.

c) you can then safely update that row by rowid



If you do not have lost update detection in place using FOR UPDATE locks -- then you should stick with a primary key probably, just to protect yourself in the future.
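Concretely, step (b) against EMP might look like the following (only two columns shown; a real version would compare every column fetched in step (a)):

```sql
select ename, sal
  from emp
 where rowid = :that_rowid
   and ( ename = :that_ename or (ename is null and :that_ename is null) )
   and ( sal   = :that_sal   or (sal   is null and :that_sal   is null) )
   for update nowait;

-- 0 rows    -> row changed or moved: requery before updating
-- ORA-00054 -> someone else has it locked: decide what to do
-- 1 row     -> safe to: update emp set ... where rowid = :that_rowid
```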

update based on rowid

April 19, 2004 - 12:42 am UTC

Reviewer: john

thanks a lot Tom.

To understand fully what you said, can you please tell me:

1. Why do we need the WHERE condition

((c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
((c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
((cN = :that_cN or (cN is null and :that_cN is null))

2. Is there a situation where the rowid of one row gets assigned to another row of the same table? (This could be even more dangerous, as we would end up updating the wrong row.)

thanks again.


Tom Kyte

Followup  

April 19, 2004 - 6:37 am UTC

1) you want to make sure the row hasn't changed.

c1 = :that_c1 will only match if both the column and the bind are not null, hence the rest is needed to make nulls compare "the same"

2) this is tied into 1 again. sure, we could delete from emp where empno = 7788; commit; and then re-insert that row and get the same rowid (rare, but could happen).

But - if we inserted the same exact values and they got the same exact rowid -- then #1 would make this "safe". the values of the row are the same, for all intents and purposes it IS the same row.

If the newly inserted row doesn't match column by column the values we expect, well, then we'll reject it (won't lock it) and all will be well.

update large table

April 22, 2004 - 6:04 pm UTC

Reviewer: Prasad Chittori from US

I have a very large partitioned table with a DATE column. I would like to take the time portion out of the date column. I did the following and it is taking a lot of time and failing with "unable to extend the rollback segments".
Any better ideas, without increasing the rollback segments or changing the query?

update t1
set dt = trunc(dt)

Tom Kyte

Followup  

April 23, 2004 - 10:12 am UTC

parallel dml -- each pq slave can get its own RBS, letting you use them all at the same time, not just one of them.
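For the hint to actually do parallel DML, the session must enable it first; a sketch (the degree of 4 is just an example):

```sql
-- without this, the PARALLEL hint on an UPDATE is not acted upon as parallel DML
alter session enable parallel dml;

update /*+ parallel(t1, 4) */ t1
   set dt = trunc(dt);

-- parallel DML must be committed (or rolled back) before the session
-- can query the modified table again
commit;
```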

convert delete to insert

April 26, 2004 - 11:30 am UTC

Reviewer: marvin

Hi

We want to delete several million rows from several tables. The DELETE statements themselves are simple, but to do this faster we would insert the good data into a temp table, truncate the original, and insert the good data back. We want to convert the DELETE statements into INSERT statements, but I think it's very hard.

For example

We have these DELETE statements

DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
/
COMMIT
/
DELETE
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1 FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL)
/
COMMIT

to convert to that we did

INSERT /*+ APPEND */ INTO OM_ANNEX$TMP
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM LISTA_ORDENES l,
OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
UNION
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL);

This seems incorrect. If I run the queries inside the insert separately I get correct results, but with UNION we get data we don't want, and if we do it as two inserts we get duplicate rows.

One approach is this

create a temp table using the delete statements but converted to querys for example instead of

DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)

we convert to

select t.*
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)

then simply says

select * from om_annex a
where not exists (select null from temp_table b
where a.annexid = b.annexid)

annexid is PK

the other approach is this

select * from om_annex a
where annexid not in (select annexid
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
union
select annexid
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL))

They are all very slow because involving tables have several million of rows!

Is there a better approach?

Tom Kyte

Followup  

April 26, 2004 - 2:20 pm UTC

why not:

create table new as select rows to keep;
drop table old;
rename new to old;

do the first in parallel, with nologging.
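A sketch of that approach using the names from the example above (degree and storage details are illustrative). Since the two deletes ran one after the other, "rows to keep" is simply the AND of the two negated delete predicates:

```sql
-- keep only the rows that neither DELETE would have removed
create table om_annex_new
  parallel 8 nologging
as
select t.*
  from om_annex t
 where not exists (select null
                     from lista_ordenes l, om_contract_inst c
                    where t.contract_id = c.contract_id
                      and c.order_id = l.order_id)
   and not exists (select null
                     from om_contract_inst c
                    where t.contract_id = c.contract_id
                      and c.order_id is null);

drop table om_annex;
rename om_annex_new to om_annex;
-- then re-create indexes, grants and constraints on om_annex
```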


err the problem is convert the DELETE to CTAS

April 26, 2004 - 3:03 pm UTC

Reviewer: A reader

Hi

create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.

That is exactly what I want to do. The problem is that until now we have always done it the other way round, with plain DELETEs (and it takes a week to delete everything!), and I am not sure how to convert the DELETEs to a CTAS. If I want to do the reverse of the DELETE statements (some tables have 5 DELETE statements!), is it as simple as writing the DELETE the other way round? For example, how would you change

delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Would you change it to

insert into tmp_x
select *
from tab1
where not exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)


insert into tmp_x
select *
from tab1
where not exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Is it as simple as this?



Tom Kyte

Followup  

April 27, 2004 - 4:02 am UTC

if i had this:

delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)



I would probably have this:

create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/

outer join the three tables. Negate the conditions for the where exists.

that is, after outer joining tab1 to tab2, tab3 -- remove the rows

where tab2.fid is not null and tab3.id is not null -- that is subquery one in your deletes above.

where tab2.fid is not null and tab2.id is null -- that is subquery two in your deletes above.





thank you very much for the outer join tip

April 27, 2004 - 6:16 am UTC

Reviewer: marvin

Hi

I am going to look at how to apply the outer join in order to convert the DELETEs to a CTAS.

I have a further question, I have another table which undergoes 4 DELETEs


DELETE
FROM SW_PERSON t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid AND t.swtype='ORDER')
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER' AND t.swobjectid IS NULL AND COMP_INST_ID IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS
(SELECT 1
FROM OM_COMPANY_INST c, LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id)
/
COMMIT
/


I need to convert this to a CTAS as well, however I am not sure if it can be done in a single statement. These DELETEs, for example, can't be converted into one as follows, right? (because of the commits between them)

DELETE
FROM SW_PERSON t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid
AND t.swtype='ORDER')
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND COMP_INST_ID IS NULL)
OR t.swtype IS NULL
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS (SELECT 1
FROM OM_COMPANY_INST c,
LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id));

Can this use the outer join tip as well?

TIA

Tom Kyte

Followup  

April 28, 2004 - 12:17 pm UTC

why have any commits in between.

but of course -- any four deletes against a single table can (and if you ask me, should) be done as a single delete.


the outer join was used in the CTAS, not in a delete.


why do you use outer join

April 27, 2004 - 6:57 am UTC

Reviewer: A reader

hi

Why is outer join needed for tab1, tab2 and tab3 :-?

create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/


Tom Kyte

Followup  

April 28, 2004 - 12:20 pm UTC



because we wanted to keep all rows in tab1 -- if there is NO mate in tab2/tab3

since they deleted "where exists in tab2/tab3":

delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Since the deletes removed the rows that existed in tab2/tab3, with the CTAS we need to ALWAYS keep the rows that are NOT in tab2/tab3. The outer join is mandatory for that in this example.

to marvin

April 27, 2004 - 9:33 am UTC

Reviewer: A reader

hi marvin, you can try this


select * from tab1
where PK not in (select PK
                   from tab1
                  where exists (select null
                                  from tab2, tab3
                                 where tab2.id = tab3.id
                                   and tab1.id = tab2.fid)
                 union
                 select PK
                   from tab1
                  where exists (select null
                                  from tab2
                                 where tab2.fid = tab1.id
                                   and tab2.id is null))


regarding the conditions

April 27, 2004 - 11:09 am UTC

Reviewer: A reader

Hi Tom

Can you shed some light on why

NOT ( tab2.fid is not null and tab3.id is not null )
is same as
exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

and
NOT ( tab2.fid is not null and tab2.id is null )
is same as
exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Cant see why. Thank you

Tom Kyte

Followup  

April 28, 2004 - 12:50 pm UTC

it isn't the same.

it is in fact the opposite.


if you outer join T1 to T2 to T3

and before you were looking for (where exists)

a) a match in T2 (tab1.id = tab2.fid)
b) a match in T3 for that T2 (tabe2.id = tab3.id)

then you are saying "if I outer join T1 to T2 to T3, that row would be such that:

a) tab2.fid is NOT NULL (we found a mate)
b) tab3.id is NOT NULL (we found a mate in t3 for t2)

with the where exists -- we would have deleted that row, hence with the CTAS (which is finding rows to keep) we simply NEGATE that (with NOT).

Therefore we would keep that row, IF that was "not" satisfied.


Same logic for the second part.

the second where exists says

delete the row if

a) there is a match in T2 (where tab2.fid = tab1.id)
b) the id column in t2 for that match is NULL

that would be (in an outer join)

(tab2.fid is not null) -- we joined to a row
(tab2.id is null) -- and that row is having id is null


negate it and keep it.

Updating Table having millions of records taking lot of Time..

April 28, 2004 - 3:45 am UTC

Reviewer: Anand Pandey from INDIA

Hi Tom,
I had a table having millions of records in which two of the columns are null.
I tried to update the null columns with data from another table, which is taking around 10-12 hrs for a single day's records, and I have to update 31 days of them.
Please help me get better performance on this update.

Query used is:

UPDATE /*+ nologging parallel(4) */ MASTER_tab A
   SET (null_col1, null_col2) = (SELECT /*+ PARALLEL(4) */ MIN(C1), MIN(C2)
                                   FROM Table2 B
                                  WHERE SUBSTR(A.col1_T1, 1, LENGTH(B.C1)) = B.C1
                                    AND SUBSTR(A.col2_T1, 1, 3) = B.C2)
 WHERE c_date = '01-Jan-04'

Thanks and Regards,
Anand Pandey



Nologging - how does it impact recovery?

April 30, 2004 - 11:12 am UTC

Reviewer: Naresh from Cyprus

Hi Tom,

This is a great chain of discussion. I especially liked the "outer join to replace the not exists". I am really looking forward to my copy of your first book that I ordered recently (on its way from Amazon).

One question regarding making the table nologging: Does it not have implications for recovery? What am I missing?

Tom Kyte

Followup  

April 30, 2004 - 5:16 pm UTC

you need to schedule a hot backup if you use non-logged operations, yes.

db sequential waits on UPDATE

May 14, 2004 - 9:55 am UTC

Reviewer: A reader

Dear Tom,

Due to some processing complexities I am unable to use a single SQL for Update/Insert.

I use:
LOOP
1. Bulk select (500 rows at a time) from Table A, C with rowids from C
2. Bulk insert
3. Bulk update table C
END LOOP

I am getting a very high number of "db file sequential read" waits on the update part. Please see relevant portions of Trace File:
********************************************************************************
SELECT /*+ FULL(v) FULL(c) PARALLEL(v,4) PARALLEL(c,4) HASH(v,c) */
c.rowid,
...
FROM vp v,citi c
WHERE v.idno= c.idno

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.07 10 234 0 0
Execute 1 0.32 3.38 3023 156 3 0
Fetch 155 1.96 82.82 0 0 0 77500
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 157 2.32 86.28 3033 390 3 77500

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 4 0.01 0.03
rdbms ipc reply 26 0.00 0.00
enqueue 2 0.02 0.02
PX Deq: Join ACK 7 0.00 0.00
PX Deq: Parse Reply 4 0.23 0.35
PX Deq: Execute Reply 717 2.06 735.45
PX qref latch 4 1.02 2.75
PX Deq: Table Q Normal 5218 0.35 23.95
********************************************************************************

INSERT /*+ APPEND PARALLEL(A,4) */ INTO t_v A
VALUES(:b1,
:b2)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 2 11 0 0
Execute 155 0.32 8.00 552 1116 5567 77500
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 156 0.32 8.02 554 1127 5567 77500

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 552 0.36 6.43
free buffer waits 6 1.02 1.23
********************************************************************************

UPDATE /* PARALLEL(c,4) */ citi c
SET ...
WHERE c.rowid = :b6

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 154 35.26 586.87 75251 77036 79595 77000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 155 35.26 586.87 75251 77037 79595 77000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 75578 1.04 461.09
free buffer waits 242 1.03 96.11
********************************************************************************

Is it not true that if the row has already been selected (and is there in the buffer cache) no disk IO should happen to update this row?

Please tell me a way to make this faster.

Regards

Tom Kyte

Followup  

May 15, 2004 - 10:33 am UTC

insert /*+ append parallel... */ VALUES hmmmmmm

append = no go with values, only works with "as select"
parallel -- parallel on what exactly?


update /* parallel(c,4) */ hmmmmmm

in search of the fast = true switch. thankfully that hint is malformed and hence ignored as well -- you are updating by rowid.

parallel is *not* fast=true.
append is not either really, especially with the values clause.



you can review the trace file itself (p1,p2,p3 will tell you file and block info, along with blocks read). you can use that to verify that it is the very act of reading the indexes that need to be updated and maintained that is causing this. If they are not in the cache, well, we'll need to read them into there.
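The two points about the hints can be seen side by side; a sketch (table names are illustrative, and this reflects the versions discussed here, where APPEND has no effect with a VALUES clause):

```sql
-- direct-path insert: APPEND is honored only with INSERT ... SELECT
insert /*+ append */ into t_v
select * from staging_table;

-- conventional-path insert: with VALUES, the APPEND hint is silently ignored
insert /*+ append */ into t_v
values (:b1, :b2);
```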

More Info

May 16, 2004 - 11:41 pm UTC

Reviewer: A reader

Thanks Tom,
The hint in the update was put there by a developer; it has rightly been disabled. The insert is not a bottleneck so I didn't look into it.
This is a development machine: there are no other jobs running, asynchronous IO is enabled, and the machine is on RAID 0 (no fault tolerance -- being a development one). There are NO INDEXES on the tables being inserted into and updated.

The segment on which the WAITS (db sequential read) are happening are that of the TABLE that is being UPDATED.

Please guide me next.

Tom Kyte

Followup  

May 17, 2004 - 7:29 am UTC

then you are seeing physical IO performed to read into the buffer cache the data needed to be updated. If it is not cached, we'll have to read it.


PARALLEL DML

May 28, 2004 - 2:50 pm UTC

Reviewer: Sar from NJ, USA

Tom,

I need to update a table that has 50 million rows but
the number of rows affected are only 1 million rows.

I have a single update statement to do this.

UPDATE acc_proc
SET acc_seq_num = 0
WHERE acc_seq_num > 0
/

The column acc_seq_num is not part of an index.

Can you please suggest me if there is anything better
that I can do on this SQL to run it faster.

Thanks
Sar


Tom Kyte

Followup  

May 28, 2004 - 2:56 pm UTC

nope, that is perfect.


shouldn't take very long at all. unless you are getting blocked constantly by other sessions.

do i need to commit after execute immediate ? for dml or ddl ?

June 02, 2004 - 11:31 am UTC

Reviewer: A reader

</code> http://download-east.oracle.com/docs/cd/A87861_01/NT817EE/index.htm <code>


do i need to commit after execute immediate? for dml or ddl? or does it happen automatically?

I tried to find the answer but could not find it in the docs at the above site. Please help.

Tom Kyte

Followup  

June 02, 2004 - 11:35 am UTC

DDL is processed conceptually as:


begin
commit; -- commit anything you started, even if DDL fails, you are committed
do the ddl;
commit; -- commit the ddl
exception
when others
then
rollback; -- the ddl
raise;
end;
/


You need not commit DDL, it'll commit 2 times for you.

You need to commit DML as DML does not do that.


This is regardless of the method used to execute SQL, native dynamic sql, dbms_sql, static sql, whatever -- the rules are the same in all cases.
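The implicit commit around DDL is easy to demonstrate; a sketch assuming a scratch table T you can create and drop:

```sql
create table t ( x int );

insert into t values ( 1 );            -- uncommitted DML

begin
   -- DDL via native dynamic SQL: commits the pending insert,
   -- then commits itself
   execute immediate 'create index t_idx on t(x)';
end;
/

rollback;

select * from t;                        -- still returns 1: the DDL committed it
```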

Thanks !!

June 02, 2004 - 11:50 am UTC

Reviewer: A reader


ok,

so i did
1.)

begin
begin
Execute immediate 'SET CONSTRAINTS ALL DEFERRED';
end;

begin
execute immediate 'delete from table ' || t;
end;

begin
execute immediate 'delete from table ' || t1;
end;

exception
when others then
rollback;
raise;
end;
/


i got the message that some constraint of t1 was violated (child record exists).
I did not issue a commit, post or rollback,
and did not perform any DDL, and constraints are DEFERRED --
what is the issue? can you help?



Tom Kyte

Followup  

June 02, 2004 - 11:56 am UTC

...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t;
execute immediate 'delete from ' || t1;
execute immediate 'set constraints all immediate';
end;
/

that is all the code you want -- the "all immediate" will validate the constraints.

if they fail -- then the error (exception) will be propagated back to the client and the statement (which is whatever the client sent over to the database -- a plsql block, whatever) will be rolled back.




June 02, 2004 - 12:01 pm UTC

Reviewer: A reader

...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t; -- ***** this one
-- execute immediate 'delete from ' || t1;
-- execute immediate 'set constraints all immediate';
end;
/


the question is
as soon as it executes the
'delete from ' || t;
statement, it raises the referential constriant violation exception. why ?
as
1.) constraints all deferred
2.) it NOT a ddl
3.) I am not commiting(or ending the tx) manually



Tom Kyte

Followup  

June 02, 2004 - 12:45 pm UTC

the constraint must not be deferrable.

you have to have deferrable constraints in order to defer them. the default is "not deferrable"
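To be able to defer a constraint, it must have been created DEFERRABLE; a sketch with illustrative table and constraint names:

```sql
-- a NOT DEFERRABLE constraint (the default) cannot be deferred;
-- it has to be dropped and re-created as deferrable
alter table child drop constraint child_fk;

alter table child
  add constraint child_fk
  foreign key (parent_id) references parent(id)
  deferrable initially immediate;

-- now, inside a transaction:
set constraints all deferred;
delete from parent;     -- no immediate FK check fires here
delete from child;
commit;                 -- constraints are validated at commit time
```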

7.3.4 - extracting 5 million records from 100 million record table

June 15, 2004 - 12:39 pm UTC

Reviewer: A reader

Any suggestions on how to accomplish this on 7.3.4? No concept of nologging, partitioning, Export/query clause etc in 7.3.4..

How can I extract about 5 million records (based on a condition) into a separate database/server from a 100-million-record table? What's the most efficient way? I'm having to make do with very limited disk space on both machines.

Thanks.

Tom Kyte

Followup  

June 15, 2004 - 5:30 pm UTC

5 million is pretty small.

I'd just

create table as select ( concept of nologging did in fact exist, unrecoverable);
export it
import it

or use the sqlplus copy command if they are connected via a nice network.
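A sketch of the SQL*Plus COPY command (connect strings and names are illustrative); it fetches and commits in batches, and it handles LONG columns, which CREATE TABLE AS SELECT over a link does not:

```sql
SQL> set arraysize 100
SQL> set copycommit 100
SQL> copy from scott/tiger@source_db -
>    to scott/tiger@target_db -
>    create extracted_rows -
>    using select * from big_table where some_col = 'KEEP'
```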

June 16, 2004 - 5:38 pm UTC

Reviewer: A reader

Thanks. There are some LONG columns in there, so 'create table as select' wouldn't work. I am just ending up PL/SQL looping over smaller chunks of the mega table, creating smaller target tables, then exp/imp.

I'm afraid I won't have sufficient temp space to do a SQL*Plus "copy from". Also, the documentation says COPY is not intended for Oracle-to-Oracle databases. No idea why.

Tom Kyte

Followup  

June 16, 2004 - 6:12 pm UTC

copy is ONLY for oracle to oracle db's....

June 16, 2004 - 7:40 pm UTC

Reviewer: A reader

Per 7.3.4 documentation :
</code> http://download-west.oracle.com/docs/pdf/A42562_1.pdf <code>


Excerpt from it -

Note: In general, the COPY command was designed to be used
for copying data between Oracle and non-Oracle databases. You should use SQL commands (CREATE TABLE AS and INSERT)
to copy data between Oracle databases.

Tom Kyte

Followup  

June 17, 2004 - 7:53 am UTC

but if you think about it..... doesn't matter what the doc says, it takes two connections. connections are only to oracle databases. sure you could be using a gateway -- but even there, it would be true that create table as and insert would work.

Updating and Inserting 1 Million rows Daily with Bitmap indexes

June 24, 2004 - 6:32 pm UTC

Reviewer: Sidda from Santa Clara,USA

Hi Tom,
Here we are facing a very big problem. We have a partitioned table with 180 million records (70 columns, 10 bitmap and 10 B-tree indexes). End users need the bitmap indexes 24/7, so we can't drop and recreate the table and indexes. Daily we have to update and insert 1 million records. We tried bulk updates but in vain. What is the best method to follow?

Thanks in advance
Sidda


Tom Kyte

Followup  

June 24, 2004 - 9:01 pm UTC

describe "in vain", what went wrong?

it would be best to do a SINGLE insert into and a SINGLE Update against this table (not even in bulk - just single statements)

Creating Table with aggregated data from another table

July 26, 2004 - 1:16 pm UTC

Reviewer: RB from NJ

Tom: Qn related to creating a table with data from another table -

I want to create a table with few fields and aggregated some of few columns from another table.

Here is a query:

create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum (billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by ID,LOC, SRC;

I have 128 million records in the SRC_MONTHLY table and it is taking more than 1 hr to get this table created. I have indexes on the ID, src and LOC fields.

Any faster way of getting this table created?



Tom Kyte

Followup  

July 26, 2004 - 1:32 pm UTC

so how many records have that id?


Great aproach, but is it as fast for an IOT table?

July 26, 2004 - 2:34 pm UTC

Reviewer: Peter Tran from Houston, TX USA

Hi Tom,

I'm trying the same approach with an IOT table. We have an IOT table partitioned daily. I want to recreate this table with a monthly partition. I do a CTAS parallel nologging using the new monthly partition, but it's SLOW. Then again, the table does have 200 million rows. Is the "Index Organization" part of table the slow part?

Thanks,
-Peter

Tom Kyte

Followup  

July 26, 2004 - 2:47 pm UTC

what do the inputs/outputs look like (table structure) -- remember, you are sorting 200,000,000 rows as well!

did you give it a nice juicy sort area size?

Unfortunately no.

July 26, 2004 - 2:58 pm UTC

Reviewer: Peter Tran from Houston, TX USA

"...nice juicy sort area size"  That would be a negative.  :(  Man, this is going to take forever isn't it?   Ugh...

SQL> show parameter sort_area

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
sort_area_retained_size              integer     8388608
sort_area_size                       integer     8388608

SQL> desc odfrc;
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------

 ODIFID                                                NOT NULL NUMBER(10)
 TRPORGN                                               NOT NULL VARCHAR2(5)
 TRPDSTN                                               NOT NULL VARCHAR2(5)
 POSCOUNTRYCODE                                        NOT NULL VARCHAR2(3)
 PAXTYPE                                               NOT NULL VARCHAR2(1)
 DCP                                                   NOT NULL NUMBER(2)
 ODIFDATE                                              NOT NULL DATE
 FRCDATE                                               NOT NULL DATE
 BKGMEAN                                                        NUMBER
 BKGMEANINFLUENCED                                              NUMBER
 BKGVARIANCE                                                    NUMBER
 XXLMEAN                                                        NUMBER
 XXLMEANINFLUENCED                                              NUMBER
 XXLVARIANCE                                                    NUMBER

Here's my CTAS:

CREATE TABLE ODFRC_MONTHLY (
     ODIFID,
     TRPORGN,
     TRPDSTN,
     POSCOUNTRYCODE,
     PAXTYPE,
     DCP,
     ODIFDATE,
     FRCDATE,
     BKGMEAN,
     BKGMEANINFLUENCED,
     BKGVARIANCE,
     XXLMEAN,
     XXLMEANINFLUENCED,
     XXLVARIANCE,
    CONSTRAINT ODFRC_MONTHLY_PK PRIMARY KEY
      (ODIFID, ODIFDATE, TRPORGN, TRPDSTN,POSCOUNTRYCODE, PAXTYPE, DCP, FRCDATE)
   ) ORGANIZATION INDEX nologging parallel 8
  PARTITION BY RANGE (ODIFDATE)
(PARTITION ODFRC_20021130  VALUES LESS THAN (TO_DATE('2002-12-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX01 ,
 PARTITION ODFRC_20021231  VALUES LESS THAN (TO_DATE('2003-01-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX02 ,
 PARTITION ODFRC_20030131  VALUES LESS THAN (TO_DATE('2003-02-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX03 ,
 PARTITION ODFRC_20030228  VALUES LESS THAN (TO_DATE('2003-03-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX04 ,
 PARTITION ODFRC_20030331  VALUES LESS THAN (TO_DATE('2003-04-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX05 ,
 PARTITION ODFRC_20030430  VALUES LESS THAN (TO_DATE('2003-05-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX06 ,
 PARTITION ODFRC_20030531  VALUES LESS THAN (TO_DATE('2003-06-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX07 ,
 PARTITION ODFRC_20030630  VALUES LESS THAN (TO_DATE('2003-07-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX08 ,
 PARTITION ODFRC_20030731  VALUES LESS THAN (TO_DATE('2003-08-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX09 ,
 PARTITION ODFRC_20030831  VALUES LESS THAN (TO_DATE('2003-09-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX10 ,
 PARTITION ODFRC_20030930  VALUES LESS THAN (TO_DATE('2003-10-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX11 ,
 PARTITION ODFRC_20031031  VALUES LESS THAN (TO_DATE('2003-11-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX12 , 
 PARTITION ODFRC_20031131  VALUES LESS THAN (TO_DATE('2003-12-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX13) AS SELECT
     ODIFID,
     TRPORGN,
     TRPDSTN,
     POSCOUNTRYCODE,
     PAXTYPE,
     DCP,
     ODIFDATE,
     FRCDATE,
     BKGMEAN,
     BKGMEANINFLUENCED,
     BKGVARIANCE,
     XXLMEAN,
     XXLMEANINFLUENCED,
     XXLVARIANCE
FROM odfrc nologging;

The original ODFRC table has daily partitions for one year. Is there any way I can estimate how long this will take?

-Peter 

Tom Kyte

Followup  

July 26, 2004 - 3:29 pm UTC

the nologging is in the wrong place

from odfrc nologging;

is the same thing as:

from odfrc a;



would you consider

a) create the table (empty, alter it to be nologging)
b) insert /*+ append */ into the individual partitions (instead of all at once) in parallel sessions?


Create Table with data from an aggregated sum of few fields from another table

July 26, 2004 - 3:48 pm UTC

Reviewer: RB from NJ

Tom: Qn related to creating a table with data from another table -

I want to create a table with few fields and aggregated some of few columns from
another table.

Here is a query:

create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by
ID,LOC, SRC;

I have 128Million record in the SRC_MONTHLY table and it is taking more than 1
hr to get this table created. I have index on ID, src and LOC fields.

Any faster way of getting this table created?




Followup:
so how many records have that id?

RB: Tom - This number varies - we have many IDs in the master table. If I pass one id, the query will have one equijoin with that ID; if more than one, I was planning to use an IN clause. So I do not know how many records per id there will be in the table at any given point in time.




Tom Kyte

Followup  

July 26, 2004 - 3:51 pm UTC

now I'm confused -- the predicate is variant? why create *that* table then?

what is the goal here -- to make queries of the form:

select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = :x group by id, loc, src;

faster in general? (you don't have to duplicate lots of text, it is all right here)

Great suggestion!

July 26, 2004 - 3:58 pm UTC

Reviewer: Peter Tran from Houston, TX USA

Hi Tom,

Thanks for the useful suggestion.

1) Part one done (easy).
2) Is this what you mean for part 2? When you say parallel sessions, do you mean kick off a bunch of them using execute immediate?

INSERT /*+ append */ INTO odfrc_monthly(
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE)
SELECT
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE
FROM odfrc partition(ODFRC_20021114) nologging;

3) Should I still give it a nice juicy sort_area_size? :)

Thanks,
-Peter

Tom Kyte

Followup  

July 26, 2004 - 5:13 pm UTC

2) fire up N sqlplus sessions and run the insert append in each

3) sure, let each session have a big SAS -- alter session can be used.

Create Table with data from an aggregated sum of few fields from another table

July 26, 2004 - 7:57 pm UTC

Reviewer: RB from NJ

Tom - The user can select one or more ids. If I have more than one ID, I was planning to use an IN clause in the where clause. The temp table that I am creating will be used in a later phase of the app for other joins. What I am looking for is a solution that will be much faster than my current approach. The query that I have given, against a 120M-row table, is taking more than 1 hr to create the aggregated table.


Tom Kyte

Followup  

July 26, 2004 - 8:19 pm UTC

what is the query plan in general then.

you would not create a temporary table in oracle -- that would be so sqlserver. just use that query in the "IN" statement in the first place!!!!!
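Tom's point can be sketched as follows - rather than materializing an intermediate table of ids, the driving query goes straight into the IN clause (the inner query, its table name and filter are illustrative, not from the original system):

```sql
-- instead of: create table temp_ids as select ...; and joining to it,
-- push the id-producing query directly into the IN predicate
select id, src, loc,
       sum(bytes)          total,
       sum(billable_bytes) total_billable
  from src_monthly
 where id in ( select id
                 from master_ids            -- hypothetical master table
                where region = :region )    -- hypothetical filter
 group by id, loc, src;
```

The optimizer can then merge or unnest the subquery as it sees fit, instead of being forced through a pre-built intermediate result.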

Update...

July 27, 2004 - 4:11 pm UTC

Reviewer: Peter Tran from Houston, TX USA

Hi Tom,

I wanted to give you an update on my progress. The nice thing about your approach is that I can monitor the progress, but it's not as fast as I thought it would be.

I started 6 sessions and gave each session a sort_area_size = 200M. I then executed a month's worth of inserts in each session. It's taking about 30 minutes per partition/day to insert from the source daily table to the monthly table. Each partition holds around 700K to 800K rows.

Should I expect to run this long?

Thanks,
-Peter




Tom Kyte

Followup  

July 27, 2004 - 7:01 pm UTC

sounds long -- can you run one of the sessions with a 10046 level 12 trace and see what they might be waiting on?

sort_area_size, pga_aggregate_target, insert /*+ APPEND*/

July 28, 2004 - 3:02 am UTC

Reviewer: Dusan Valasek from Czech republic

Hi Tom,

I have two questions about setting a big sort_area_size before a direct path load.
1) How does setting sort_area_size (at the session level) affect a direct path load? I thought it was only useful for sort operations and index builds.
2) If I use pga_aggregate_target, does setting sort_area_size (at the session level) override pga_aggregate_target?

Thanks,

Dushan

Tom Kyte

Followup  

July 28, 2004 - 8:14 am UTC

1) you are loading an IOT. big sort going on.

2) you would set the workarea policy to manual (in your session) to allow you to override the sort area size for this "batch operation"
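The per-session override Tom describes might look like this (9i syntax; the 200MB figure is just the value discussed earlier in the thread):

```sql
-- in the batch session only: take manual control of the sort area
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 209715200;   -- roughly 200MB
```

With workarea_size_policy set to manual, the session ignores pga_aggregate_target and uses the sort_area_size you set, for this "batch operation" only.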

July 28, 2004 - 12:10 pm UTC

Reviewer: A reader

Would you suggest to re-create the table when other users want to update it online

Tom Kyte

Followup  

July 28, 2004 - 1:28 pm UTC

huh?

How to change a DATATYPE of column.

July 30, 2004 - 7:54 am UTC

Reviewer: peru

Hi Tom,
How do I change the datatype of a particular column?
Say a table has a VARCHAR2 column holding dates, and
now I want to change its datatype to DATE.
Assume the table has many records and is referenced by
tables, procedures and triggers.
Thank you

Tom Kyte

Followup  

July 30, 2004 - 4:43 pm UTC

not really going to happen.

you can add a new column that is a date, update this new column to be to_date() of the varchar2 and set the varchar2 to null, drop the old column and rename the newly added column to the old name.
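A sketch of those steps (table name, column names and the date format mask are all illustrative; note that RENAME COLUMN needs 9i or later - on 8i you would have to carry on under the new column name):

```sql
alter table t add (hiredate_dt date);

update t
   set hiredate_dt = to_date(hiredate, 'YYYY-MM-DD'),  -- assumed format mask
       hiredate    = null;

alter table t drop column hiredate;
alter table t rename column hiredate_dt to hiredate;
```

Any procedures or triggers referencing the column will need to be reviewed afterwards, since the datatype they see has changed.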

Update to July 27, 2004

August 22, 2004 - 11:11 am UTC

Reviewer: Peter Tran from Houston, TX USA

Hi Tom,

Sorry it took awhile to get back with you on this. You wanted to see a 10046 level 12 trace. I wanted to do some research first and I had to rebuild the table to reproduce the step.

Here's the trace.

insert /*+ append */ into odfrc_monthly select * from odfrc
partition(ODFRC_20040201)



call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.01          0          0          0          0
Execute      1     15.08    5776.28      11657      15874     111333     569939
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2     15.09    5776.29      11657      15874     111333     569939

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 34

Rows Row Source Operation
------- ---------------------------------------------------
1 LOAD AS SELECT
569939 INDEX FAST FULL SCAN ODFRC_PK PARTITION: 185 185 (object id 12351)


Elapsed times include waiting on following events:
Event waited on                          Times Waited   Max. Wait  Total Waited
---------------------------------------  ------------  ----------  ------------
control file sequential read                        8        0.00          0.00
enqueue                                          1911        3.00       5726.43
db file sequential read                            38        0.03          0.20
db file scattered read                            380        0.63          9.50
direct path write                                  19        0.00          0.00
direct path read                                   10        0.00          0.00
log buffer space                                   45        0.61         12.01
log file switch completion                          2        0.28          0.33
log file sync                                       1        0.00          0.00
SQL*Net message to client                           1        0.00          0.00
SQL*Net message from client                         1        0.00          0.00


Jonathan Lewis wrote in his book that you cannot use direct-path insert using the APPEND hint into an IOT table. Can this be the reason for the large enqueue timed event? Are the 6 sessions waiting to lock the index to modify it?

Thanks,
-Peter

Tom Kyte

Followup  

August 22, 2004 - 4:55 pm UTC

Umm, you were blocked by someone else for an excessively long period of time here. Enqueue waits:

enqueue 1911 3.00 5726.43


normal "blocker, blockee" problem here! doh, it was me doing it to you.

only one session at a time can append. only one session at a time can direct path insert into a table.

sorry -- use normal insert, I goofed.

delete 2M records without dropping table.

August 23, 2004 - 9:18 pm UTC

Reviewer: Sean from NJ, USA

Hi Tom,

I have to delete 2M records from 6M records and the table has about 400 columns (Oracle 920, Solaris 9).

I understand your suggestion of creating a temp table with the records I need, then dropping the original table and renaming the temp table.

But since our table is not that big and the application uses this table all the time, we want to use the traditional delete method to accomplish this.

(1) If we have enough redo tablespace, what is a good number of rows for each delete? I tried committing every 10,000 records and every 1,000 records. Both are quite slow.
(2) Can you suggest anything to improve the speed of the delete in the following procedure?

Thanks so much for your help

------------------------------------------------------------
create or replace procedure sp_del_audit(p_total_loop_time number)
is

total_loop_time number := 1;

BEGIN
WHILE total_loop_time <= p_total_loop_time
LOOP
delete /*+ parallel */ from audit a
where rownum < 10000
and exists ( select /*+ parallel */ null from audit_temp b
where a.Pkey = b.Pkey and
a.row_type = b.row_type and
a.update_time = b.update_time);

commit;

total_loop_time := total_loop_time + 1;

END LOOP;
END;
/
------------------------------------------------------------




Tom Kyte

Followup  

August 24, 2004 - 7:19 am UTC

1) assuming you meant UNDO tablespace, there is no such thing as a REDO tablespace...

removing 2 million rows, with 400 columns (every row in this table is CHAINED, when you have more than 255 columns...) and probably many indexes -- it is not going to be what you might term "speedy".

2) do it as a SINGLE delete (deleting 10,000 records in "parallel" doesn't even begin to make sense here. PQ is for BIG BIG BIG things. this is tiny)

delete from audit
 where (pkey, row_type, update_time) in ( select pkey, row_type, update_time
                                            from audit_temp );

nothing more, nothing less. but don't expect this to be super fast if this table is indexed. removing 33% of the index entries is going to take some amount of time.

Deleting 90% of rows

October 13, 2004 - 10:54 pm UTC

Reviewer: A reader

As part of some ETL processing, I need to delete 90% of the rows in a large (6million rows) table. This table has a PK index so traditional delete takes a long time. I see the following options

1. Mark the index unusable, delete and rebuild index nologging. This is significantly faster than plain delete
2. create new table as remaining 10% of the rows, drop, rename, etc
3. Precreate a regular or GTT and do insert /*+ append */ in my batch program.

(2) is not very attractive as I don't want to be doing DDL in my code.

How would you compare (1) and (3) above? Would both of them generate similar amounts of undo/redo and take roughly the same amount of time?

Thanks

Tom Kyte

Followup  

October 14, 2004 - 9:19 am UTC

ETL is 'special', it doesn't matter 5 times a minute.

I'd go with #2 actually, CTAS a newone, drop oldone, rename newone to oldone.

#3 is "dangerous" -- you could lose everything in the event of a failure.
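As a sketch of option #2 (names and the keep-predicate are hypothetical; indexes, grants, constraints and triggers all have to be re-established on the new table):

```sql
-- keep the 10% you want, then swap the tables
create table big_table_new nologging
as
select * from big_table
 where keep_flag = 'Y';       -- hypothetical predicate: the rows to keep

drop table big_table;
rename big_table_new to big_table;
-- now recreate indexes, constraints, grants, triggers ...
```

Done nologging (and optionally in parallel), this generates almost no undo and little redo - but back up the tablespace afterwards, since nologging operations cannot be recovered from the archived logs.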

October 14, 2004 - 11:02 am UTC

Reviewer: A reader

"#3 is "dangerous" -- you could lose everything in the event of a failure"

Why would I lose everything? What kind of failure? Instance or media failure? Oracle guarantees recoverability of commited transactions, right, why do you bring that up here?

The only reason I dont want to do (2) is that I usually want to prevent DDL in my code. The only difference between (2) and (3) is that the table is already created in (3), right? Yes, (3) would generate much more redo/undo (lesser if I create the table as nologging), but I thought using a GTT would cut down on the redo/undo as well?)

Tom Kyte

Followup  

October 14, 2004 - 11:16 am UTC

i saw "use a gtt"

i could only assume you mean:

insert rows to keep into gtt
truncate table
insert rows from gtt into the table

that would be dangerous. if you fail after the truncate, before/during the insert -- gone, all gone. gtt would be dangerous here.

(truncate is ddl btw...)
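Tom's three steps, written out, make the failure window visible (the table names are illustrative):

```sql
-- 1) stash the rows to keep in a global temporary table
insert into gtt_keep select * from big_table where keep_flag = 'Y';

-- 2) truncate the real table (DDL: it commits, and cannot be rolled back)
truncate table big_table;

-- 3) the danger: if the session dies HERE, the GTT contents vanish with
--    it and big_table is already empty -- everything is gone
insert into big_table select * from gtt_keep;
commit;
```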

How can you Display multiple rows in one record

October 14, 2004 - 2:34 pm UTC

Reviewer: Mack from USA

Hi Tom,

Let's suppose in deptno 10 there are 3 to 5 employees; I want to see the emp names like
TOM, KIM, JOHN... and so on.

Is there an easy way to do it in SQL, not through PL/SQL? The number of records is unknown - it could be 10, 20 or one hundred. Please advise.

Tom Kyte

Followup  

October 14, 2004 - 7:42 pm UTC

collect() in 10g

stragg (search for it -- nasty plsql you install once and use over and over and over and over and over) in pre 10g
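As a sketch against the familiar EMP table, assuming the user-defined STRAGG aggregate from this site has already been installed:

```sql
-- one comma-separated row per department, e.g. CLARK,KING,MILLER for deptno 10
select deptno, stragg(ename) enames
  from emp
 group by deptno;
```

In 10g, collect() plus a small collection-to-string function achieves the same result without installing the custom aggregate.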

Query first and then update

October 15, 2004 - 1:23 am UTC

Reviewer: A reader

We have two tables with approximately 28 million and 35 million records respectively. These tables are joined to produce data to be displayed to users using IE browsers. Based on this information, around 10000 records get added to these tables and around the same updated daily. Our SLA is to display each screenful (100 rows) in 2-4 seconds.

While partitioning is being reviewed to improve the performance for the queries, could you let us know if there are any issues regarding partitions? For instance someone has reported that using global indexes on a partitioned table has degraded the performance.

Tom Kyte

Followup  

October 15, 2004 - 11:19 am UTC

i seriously doubt partitioning is going to be used to increase the performance of these queries.

partitioning -- great for speeding up a full scan. are you suggesting to full scan and still return in 2-4 seconds?


"someone reported using global indexes ... degraded performance"

I just about fell out of my chair on that one. geez.


If you have my book "Effective Oracle by Design" -- I go into the "physics behind partitioning".

In order to return the first 100 rows to a web based application -- you are going to be using indexes (or you are not going to be doing what you signed up to do -- funny, you have an SLA in place but no idea if you can live up to it...). whether the tables are partitioned or not probably won't have any bearing on making this faster. Partitioning could certainly make things slower if you do it wrong, but in a system like this, you would be looking at partitioning to make administration easier and hopefully not NEGATIVELY impact performance.

given two tables...
index access
get 100 rows

I personally would be shooting for well under 1 second response times for everything -- regardless of whether there was 1 row or 1 billion.


don't get the tie into "query first and then update" though.

October 18, 2004 - 1:33 am UTC

Reviewer: A reader

"Our SLA is to display each *screenful* (100 rows) in 2-4 seconds."

In our case the queries, after joining 2 tables of 28 million and 35 million records, could return 10 screens or 600 screens or many more screens based on the query parameters. Each screenful (with 100 records in each screen) should appear in 2-4 seconds.

Tom Kyte

Followup  

October 18, 2004 - 8:13 am UTC

so, you've signed up for an SLA you have no idea if you can meet.


but hey -- using indexes to retrieve 100 rows from 1, 1000, 10000000000000 should be about the same amount of time (and way under subsecond).


but -- getting to "screen 600" is not. Look to google as the gold standard for searching and web pagination

o totally estimate the number of returned rows -- don't even THINK about giving an accurate count

o don't give them the ability to go to page "600", pages 1-10 is more than sufficient

o even if there is a page 600 -- realize it doesn't make sense to go there (no human could know "what I need is on page 600 -- 6,000 rows into this result set). Google stops you at page 99

o understand that page 2 takes more time to retrieve than page 1, page 50 more than 2, and so on (as you page through google --each page takes longer)
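The pagination pattern that behaves this way - each page costs a little more than the one before, so deep pages are naturally expensive - is the classic ROWNUM top-n query (the inner query and bind names are illustrative):

```sql
select *
  from ( select a.*, rownum rnum
           from ( select t1.*          -- your query, with its ORDER BY
                    from t1
                   order by created ) a
          where rownum <= :max_row )   -- e.g. 200 for the second screenful
 where rnum >= :min_row;               -- e.g. 101
```

The inner ROWNUM predicate lets the database stop fetching as soon as it has :max_row rows, which is why page 1 is fast and page 600 is not.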


But perhaps most importantly -- laugh at people that say things like:

"For instance
someone has reported that using global indexes on a partitioned table has
degraded the performance."


and say -- "yes, it has when improperly applied it can degrade performance, there have also been sightings of systems where it didn't affect performance at all, there have also been reported instances of them massively improving performance. Now what we have to do is understand how the feature works, what it does, and how it might apply to our problem"

The neat thing about that paragraph -- it is INFINITELY reusable. You can use it with regards to any feature!


(if you have Effective Oracle by Design -- I go into the "physics" of partitioning and how -- without the judicious use of global indexes -- your system could fall apart and run really really slow as well)






Senior Statistical Analyst

October 18, 2004 - 5:58 pm UTC

Reviewer: Hank Freeman from Atlanta, GA

18 October 2004

Tom,

Thanks for the lob_replace code !!!
It worked wonders when trying to fix about 25,000 clobs with a known error. Here is my discussion, first in outline and then in detail.

1. What type of data was in the CLOB
2. What went wrong to create the error
3. What was done to correct it.
a. Stored Proc
b. Declare syntax
c. VBA code in Excel


1. What type of data was in the CLOB. The company has about 45,000 internet/intranet web pages stored in an Oracle database table column which is a CLOB. Instead of serving each site from a file server or other physical location, the entire web page/HTML source is held in this CLOB column.
2. What went wrong to create the error. An error occurred when these records were being modified to remove a specific piece of information and insert a replacement null character. The null character for some unknown reason did not work, and the web page information in the CLOB column got garbage appended after the closing </html> tag.
a. It was easy to spot the records, but a conventional update would not work because the web pages were so large, because of the special characters found within them, and because of the unique nature of working with CLOBs.
b. I contacted a friend who pointed me to your site and URL: http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1533006062995
c. Upon examination of the last changes you made for a previous person, the solution came into clear focus: the records had to be updated with a program.
3. What was done to correct the problem.
a. The last stored procedure you listed on your website was used, because it was applicable to our situation. This procedure allowed the creation of a specific version of the Declare SQL to select the bad part of the CLOB record and replace it with a " " (space character) and not a null.
i. The stored procedure syntax is not listed, since it is Tom's "lob_replace" syntax verbatim.
ii. The Declare statement listed below takes the unique record id and the starting location of the bad information in the CLOB. The SQL then passes the CLOB, the data to find, and the replacement string (a space) to the stored procedure.

 
Declare SQL syntax

declare
   l_lob    clob;
   l_substr varchar2(32767);
begin
   select KS_DESCRIPTION
     into l_lob
     from ks_longdescription
    where KS_LONGDESCRIPTIONID = 55012
      for update;

   select SUBSTR(KS_DESCRIPTION, 876, length(KS_DESCRIPTION) - 876)
     into l_substr
     from ks_longdescription
    where KS_LONGDESCRIPTIONID = 55012;

   hf_clob_replace (l_lob, l_substr, ' ');
end;


iii. It was decided to use Excel VBA to control the update process. It was contemplated to write a second procedure for the declare statement and just execute it with parameters. However, as you noticed, the maximum size of the l_substr VARCHAR2 variable is 32767. It was already known that some of the erroneous parts of the CLOB data were larger than 25,000 characters. Therefore, Excel was used to ascertain which records would be within this limit; those beyond a 25,000 character limit would require a manual fix.
iv. Approximately 6,000 records were loaded into an Excel worksheet and a macro was written to process/update the records. This was the optimum method of performing the cleanup because the VBA code was written to determine:
1. The length of the CLOB.
2. The starting location of the junk in the CLOB.
3. The unique record id was already known, and with the starting location determined by the VBA code, it was a simple task to pass these parameters into the VBA Declare SQL string and create a specific run for each qualifying record.
4. The VBA code would then execute the Declare SQL string, firing the stored procedure hf_lob_replace and cleaning up the record.
5. If a record could be processed it was; otherwise it was marked for manual cleanup.
6. During the process it was determined that any l_substr value greater than 20,000 would cause an Oracle buffer-size error. Therefore, any l_substr (p_what) value greater than 20,000 was immediately marked as a "SQL Buffer Failure", meaning it would have to be manually updated. The important part here was that the process would not stop if it did not like a specific record; it just kept going.

In summary, it took less than ONE second to fix a record, with auto-commit turned on, and we knew it was working because the process could be watched. Within four hours (6,000 per hour) using two systems, the 25,000 records were corrected with only 150 needing a manual fix - a highly acceptable result.

 
In closing, I would like to give credit to the following.
• Mr. Andy Sword, who is a subscriber to Tom's web page and submitted to me the first draft of the Declare statement we used. This was what we really needed to know! I had never worked with CLOBs, so this was a real lesson.
• Tom's web page:
http://asktom.oracle.com/pls/ask/f?p=4950:1:
• Tom Reed:
http://the-big-o.port5.com/index.shtml where some of the best Excel/VBA code for interfacing with Oracle can be found.
• I have listed the VBA code below for those who are so inclined to read it. There is nothing there that is private or confidential, so anyone who can learn from it may feel free to do so.


Respectfully

Hank Freeman
FCS Ltd
hfreeman@msn.com


updating millions of records

October 26, 2004 - 10:28 am UTC

Reviewer: Josphat from KENYA

Your approach is quite OK for a one-pass update, but how about a procedure which updates the next row based on the value of the previous row?
Kindly look at the following scripts and give suggestions on how 5 million rows could be processed in under 10 hours while capturing the business logic.
----starting here
---3 Scripts to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)
---should exec as ordered

drop table temporary_emphist;


create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/

-----values(ID ,PAYDATE,AMOUNT , PPART, IPART, INTRATE, OBAL, REF, NOOFMONTHS, ACCRUED_INT)
-----data for different ids being inserted (one id might have as many as 300 entries)
-----but i have put just a few for each id
----set for customer id 10979844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10979844','30-jun-2003',2000,2,112000);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jul-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2003',2000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-dec-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jan-2004',1950,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jun-2004',5000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2004',4000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-oct-2004',3000,2);

----set for customer id 10779844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10779844','30-jun-2003',2000,2,99380);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jul-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2003',2000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-dec-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jan-2004',1950,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jun-2004',5000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2004',4000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-oct-2004',3000,2);



----set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10679844','30-jun-2003',800,4,40390);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2003',1500,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2003',1600,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-dec-2003',2000,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jan-2004',1950,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-mar-2004',1950,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2004',1000,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2004',1000,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-sep-2004',1992,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-oct-2004',1200,4);


commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;
create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);

---script 1 (will number transactions for each customer from 1 to ..n then start from 1 for the next customer
DECLARE -----Procedure 19
V_2 NUMBER(7) :=1;
V_1 VARCHAR2(18);
v_3 VARCHAR2(18);
v_4 VARCHAR2(18) :='T';
v_5 date;
CURSOR DRECS IS
SELECT ROWID,id,paydate
FROM temporary_emphist
ORDER BY ID,paydate;
BEGIN
OPEN DRECS;
LOOP
FETCH DRECS INTO V_1,v_3,v_5;
EXIT WHEN DRECS%NOTFOUND;

if v_3 <> v_4 then
v_4 := v_3;
v_2 := 1;
end if;
UPDATE temporary_emphist
SET REF = V_2
WHERE ROWID = V_1;
V_2 := V_2+1;
END LOOP;
commit;
CLOSE DRECS;
END;
/
select * from temporary_emphist
order by id,paydate;

-----script 2

----gets the difference in months from one transaction to the next

DECLARE -----Procedure 20
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) :='0';
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 DATE;
V14 DATE;
V13 NUMBER(13,2);
V15 number(5);

CURSOR UPLO IS
SELECT REF,OBAL,amount,ID,INTRATE,last_day(PAYDATE) FROM temporary_emphist
where nvl(ref,0)>1
ORDER BY ID, REF;

BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
---V13 :=V12;
uPDATE temporary_emphist
SET NOOFMONTHS=mONTHS_BETWEEN(v12,last_day(paydate))
WHERE NVL(REF,0) = V_1-1 and id = V_7 ;


END LOOP;

commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;

---script 3
-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
---as per the numbering above(script 1) value of rows for customer
----will be manipulated to get values for the subsequent row though some input from the
----current row will be integrated with derivative of the previous to populate values into the
----current row. This is applied for each customer untill all are cleared


SELECT 'ESTABLISHING BALANCES ON REDUCING BASIS-----Procedure 21 STARTS ' FROM DUAL;
DECLARE -----Procedure 21
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) ;
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 NUMBER(15,2) :=0;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold
v16 number(15,2) :=0; ---ipart hold
V20 VARCHAR2(18) :='T';
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist

ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
if v_7<>v20 AND V_1=1 then
v20 :=v_7;
V_4 :=V_2-V_3;
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2);
v15 :=v_4;
ELSE

select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11,V12 from temporary_emphist
where NVL(ref,0) = V_1 - 1 AND ID = V_7 ;

select nvl(amount,0) into v_3 from temporary_emphist
where NVL(ref,0) = V_1 AND ID = V_7 ;

v15 := ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-v_3; -------obal
v14 :=ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-V_4; ---accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V_1 AND ID = V_7 ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;


I don't know whether there is a faster way to employ the logic... I've seen your authority... give it a shot!


Tom Kyte

Followup  

October 26, 2004 - 10:42 am UTC

one word for you:

analytics


they rock, they roll. lag/lead -- i frankly did not really read all of your code (i'm not a compiler, v_1, v_2 etc make it really truly utterly hard to follow and comprehend) but if you need to update a row based on the prior row(s) after sorting -- analytics -- lag, lead, sum, whatever.
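A sketch of what the analytic rewrite could look like for these scripts - ref becomes row_number() and the months-to-next-payment becomes lead(), all computed in a single pass over the table instead of a row-by-row cursor loop:

```sql
select id, paydate,
       row_number() over (partition by id order by paydate) ref,
       months_between(
         lead(last_day(paydate)) over (partition by id order by paydate),
         last_day(paydate) )                                 noofmonths
  from temporary_emphist;
```

The interest roll-forward in script 3 is trickier, because each balance depends on the previously computed balance, but even that reduces to one pass with the 10g MODEL clause, or to a much smaller PL/SQL loop that carries the running balance in variables rather than re-querying the table for every row.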

database

October 26, 2004 - 10:45 am UTC

Reviewer: cp

Hi Tom,

I have to appear for an interview and need your help.
1. How to decide on datafile size?
2. How to decide on redo log file size?
3. Any other factors which you, as an interviewer, would ask about?

Please help me out as it's really urgent.
Regards




Tom Kyte

Followup  

October 26, 2004 - 10:47 am UTC

how do you know they'll ask these questions -- not questions I would ask.

datafile size = pure function of amount of data you want to store. I mean, well, what else could you say?

redolog file size is a function of

o amount of redo generated by most of your transactions
o the number of these transactions you do in a period of time

predominantly (but if you are an experienced dba -- why not just fall back on your knowledge of what you've done in the past?)

problem

October 26, 2004 - 1:42 pm UTC

Reviewer: cp

So until or unless we get some opportunity, we can't learn the skills to decide the initial size of a database, right?

Please provide some thoughts on the factors to consider.
Thanks!


Tom Kyte

Followup  

October 26, 2004 - 2:43 pm UTC

experience.

you get an entry level job (you have to sort of start in the basement) and get experience.

update of millions of records/rows

October 29, 2004 - 10:29 am UTC

Reviewer: Josphat from KENYA

Hi Tom,
Thanks for the decisive debate on this site.

Kindly go through my script, because now I have elaborated on the v_1, v_2, etc. variables as you will see... SORRY for MY FIRST SUBMISSION. I will be grateful if your suggestion/code could reduce my processing time for 5 million records from 15 hrs to 1, 2 or 3.


---Script to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)

-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
-----EACH customer HAS OWN ROWS STARTING 1..N AND REPRESENTED BY REF COLUMN ON TABLE
-----MULTIPLIER FACTOR FOR EACH MONTH IS REPRESENTED BY NOOFMONTHS
-----will be manipulated to get values for the subsequent row though some input from the
-----current row will be integrated with derivative of the previous to populate values into the
-----current row. This is applied for each customer until all are cleared
-----PROCESSING IS SEQUENTIAL (I.E. FROM REF 1..N FOR EACH CUSTOMER)
-----REF =1 FOR EACH CUSTOMER HOLDS A VALUE FOR OBAL WHICH WILL BE USED TO FILL UP
-----THE SUBSEQUENT ROWS AFTER MANIPULATION WITH MULTIPLIER FACTOR


----work starts next

drop table temporary_emphist;


create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/




----DATA set for customer id 10979844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10979844','30-jun-2003',2000,2,112000,1,1);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jul-2003',2000,2,2,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2003',2000,2,3,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-dec-2003',2000,2,4,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jan-2004',1950,2,5,5);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jun-2004',5000,2,6,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2004',4000,2,7,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-oct-2004',3000,2,8,0);

----DATA set for customer id 10779844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10779844','30-jun-2003',2000,2,99380,1,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jul-2003',2000,2,2,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2003',2000,2,3,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-dec-2003',2000,2,4,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jan-2004',1950,2,5,5);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jun-2004',5000,2,6,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2004',4000,2,7,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-oct-2004',3000,2,8,0);



----DATA set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10679844','30-jun-2003',800,4,40390,1,1);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2003',1500,4,2,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2003',1600,4,3,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-dec-2003',2000,4,4,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jan-2004',1950,4,5,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-mar-2004',1950,4,6,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2004',1000,4,7,1);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2004',1000,4,8,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-sep-2004',1992,4,9,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-oct-2004',1200,4,10,0);


commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;

create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);

---script STARTS


DECLARE
V1_REF temporary_emphist.REF%TYPE;
V2_OBAL temporary_emphist.OBAL%TYPE;
V3_AMOUNT temporary_emphist.amount%TYPE;
v_4 NUMBER(15,2) :=0;
V5 NUMBER(15,2) :=0;
V6 NUMBER(15,2) :=0;
V7_ID Temporary_emphist.ID%TYPE;
V8 NUMBER(15,2) :=0;
v9 number(15,2);
v10 number(35,10) := ROUND(1/12,10); ---CONVERTS ONE MONTH INTO YEARS
V11_INTRATE temporary_emphist.INTRATE%TYPE;
V12_NOOFMONTHS temporary_emphist.NOOFMONTHS%TYPE;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold AS CALCULATED
v16 number(15,2) :=0; ---ipart holdING INTEREST AS CALCULATED
V20 VARCHAR2(18) :='T'; ----ARBITRARY KEY TO ENABLE FOCUS ON DIFFERENT IDS
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist

ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V1_REF,V2_OBAL,V3_AMOUNT,V7_ID,V11_INTRATE,V12_NOOFMONTHS;
EXIT WHEN UPLO%NOTFOUND;

----V20 IS VARIABLE KEY FOR CHECKING WHETHER WORKING ON SAME ID

if v7_ID<>v20 AND V1_REF=1 then
v20 :=v7_ID;
V_4 :=V2_OBAL-V3_AMOUNT;
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2);
v15 :=v_4;
ELSE

select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11_INTRATE,V12_NOOFMONTHS from temporary_emphist
where NVL(ref,0) = V1_REF - 1 AND ID = V7_ID ;
select nvl(amount,0) into v3_AMOUNT from temporary_emphist
where NVL(ref,0) = V1_REF AND ID = V7_ID ;
--------V10 IS ONE MONTH CONVERTED INTO YEARS
v15 := ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-v3_AMOUNT; -------CALCULATED obal
v14 :=ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-V_4; ---CALCULATED accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
spool off





Tom Kyte

Followup  

October 29, 2004 - 10:49 am UTC

this isn't the right place for "recode my application".

show me your attempt at using analytics.

(UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;

is still really obscure, comments didn't "do much" really)

Reader

November 21, 2004 - 3:35 pm UTC

Reviewer: dheeraj from UK

Excellent discussion as usual on this site.

Selecting data over a database link from 7.3.4 to 10g

December 02, 2004 - 10:54 am UTC

Reviewer: Stewart W. Bryson from Atlanta, GA

Tom:

If I use the "INSERT /*+ APPEND */ ..." statement on a 10g database, but the "SELECT..." part of the statement is over a database link to a 7.3.4 database, would the bulk-loader still be invoked, or would the hint be ignored?

Also, if I do a "MERGE... " statement with the "INSERT..." part of the statement being an "INSERT /*+ APPEND */...", would the MERGE work correctly, and would the bulk-loader be used?

Thanks for your time.

Tom Kyte

Followup  

December 02, 2004 - 11:54 am UTC

trick question

it isn't going to work -- 9ir2 and later do not interoperate with 7.3 and earlier.


if it did work

insert /*+ append */ into local_table select * from remote_table@dblink;

could be direct pathed.


insert /*+ append */ is supported only for insert as select -- not insert values.


ops$tkyte@ORA9IR2> create table t ( x varchar2(25), y varchar2(25) );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> merge into t
  2  using dual
  3  on ( t.x = dual.dummy )
  4  when matched then update set y = dummy
  5  when not matched then insert /*+ append */ values ( dummy, null );
 
1 row merged.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t;
 
X                         Y
------------------------- -------------------------
X
 
ops$tkyte@ORA9IR2> insert /*+ append */ into t values ( 'y', null );
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
 
X                         Y
------------------------- -------------------------
X
y
 
ops$tkyte@ORA9IR2> insert /*+ append */ into t select 'z', null from dual;
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
 
 
ops$tkyte@ORA9IR2>


<b>only the last insert append was a direct path...</b>
 

A follow-up question

December 02, 2004 - 1:20 pm UTC

Reviewer: Stewart W. Bryson from Atlanta, GA

Thanks for the good information Tom.

I've looked and I've looked, but I still cannot find the statement about 9iR2 and 7.3.4 database links in the documentation. As a matter of fact, the 9iR2 Migration guide states that 7.3.4 Materialized View sites can interact with 9iR2 master sites, and vice versa:

</code> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96530/migrep.htm#620714 <code>

As Advanced Replication utilizes database links, this seems a little contradictory. Of course, if you tell me once more (I'm a bit stubborn) that it won't work, I promise to believe you.

As I am currently architecting a solution to try and bring some of the data from our European sites into a central repository, and as I don't yet have access to those sites, I cannot simply test it as we have no 7.3.4 sites (thank God) in the US.

I appreciate your time and understanding.

Tom Kyte

Followup  

December 02, 2004 - 7:21 pm UTC

contact support -- they will confirm


version 7.3 was so far out of any sort of support at the time of 9ir2 being released that they never even considered testing it. and it does not work.

I believe that documentation to be erroneously left over from 9ir1.

December 03, 2004 - 10:30 am UTC

Reviewer: mg from Germany

Hi Tom,

I have a table T in tablespace TS1.

Every day there are a great many insertions into table T.

So I need to check whether a particular size threshold for the tablespace is exceeded; if it is, I need to move the older data to another tablespace.

So could you please tell me the best way to do this?

Many thanks in advance.

Tom Kyte

Followup  

December 03, 2004 - 11:01 am UTC

you sound like you really want a rolling window partitioned table -- a table partitioned by date.

December 06, 2004 - 10:20 am UTC

Reviewer: A reader

Can we use trigger to check the % of filled tablespace size ?

Tom Kyte

Followup  

December 06, 2004 - 12:35 pm UTC

yes, but -- why? just to make the code run really really slow?

What about the larger tables?

December 15, 2004 - 3:44 pm UTC

Reviewer: Steven from Austin, TX USA

I used to work at a place where the main table had 4 billion rows and growing. This tracked manufacturing details and rows for an item trickled in and then were "archived" (written to tape then deleted) as one. Finding a good partitioning scheme was difficult due to the large variable time that items could stay in the manufacturing process. That is, two rows inserted on the same day could be archived on vastly different timescales.

The system had around 40 million rows inserted and deleted on a daily basis (about 1%). Some of the items could have 1 million or more records; they needed to be archived at the same time. The table had a PK and a secondary index.

Obviously CTAS was not an option due to the 24x7 availability requirement and besides we usually didn't have enough space for a full copy of the main table.

Is this a case where you just kick off the delete and take your lumps?

(Hmm, given past growth curves, the table should by now be up to 20 billion rows and 200 million rows in/out daily.)

Tom Kyte

Followup  

December 15, 2004 - 6:25 pm UTC

or, enable row movement....

partition by a date field, the date field represents "record done as of"

when you initially insert it, it has a "done as of" way into the future (in the active partition(s))

when you are done, you update this field and it moves from active into the "you can archive me later" partitions.


It would be like the delete but the delete would be happening continously as little transactions instead of a big delete all at once.
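To make the idea concrete, here is a rough sketch of such a rolling window table; the table, column, and partition names are invented for illustration, and the partition boundaries would depend on your archiving cadence:

```sql
-- Hypothetical sketch of the "done as of" rolling window described above.
create table mfg_items (
  item_id    number,
  details    varchar2(100),
  done_as_of date           -- set far in the future while the item is active
)
partition by range (done_as_of) (
  partition p_2004_q4 values less than (to_date('01-jan-2005','dd-mon-yyyy')),
  partition p_2005_q1 values less than (to_date('01-apr-2005','dd-mon-yyyy')),
  partition p_active  values less than (maxvalue)
)
enable row movement;

-- When an item completes, updating done_as_of migrates the row out of the
-- active partition into an archivable one (this is the "little transaction"
-- that replaces the big delete):
update mfg_items
   set done_as_of = sysdate
 where item_id = :finished_item;
```

Rows stay in p_active until that update fires; a periodic job can then archive and drop the older date partitions.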

data warehouse daily load

December 16, 2004 - 3:58 am UTC

Reviewer: shgoh from Malaysia

Dear Tom,

One of my DW developers is having a performance problem with the daily data load from a legacy system. Approximately 5-8 million records are inserted into a table (append load table). The table has a lot of constraints, and none of us is familiar with the data given (it could have a constraint violation).

What is the method to speed up the data loading process besides parallel, direct load and nologging?

Appreciated if you can share your view. Since we don't know the data and the constraints well, is it advisable to disable constraints in this case? Thanks

Rgds
SHGoh

Tom Kyte

Followup  

December 16, 2004 - 8:18 am UTC

parallel
direct
nologging

they are them. how long is "long"

if you are adding 5/8 million records a day -- are you using partitioning? you can look at disabling portions of partitioned indexes and rebuilding them after the fact.
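As a rough illustration of disabling a portion of a partitioned index around a load (the index and partition names here are hypothetical):

```sql
-- Mark just the index partition covering the load window unusable...
alter index load_tab_idx modify partition p_current unusable;

-- (sessions querying meanwhile may need skip_unusable_indexes enabled)

-- ... perform the parallel / direct / nologging load here ...

-- ... then rebuild only that partition after the fact:
alter index load_tab_idx rebuild partition p_current nologging;
```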

Not a partitioned table

December 17, 2004 - 4:14 am UTC

Reviewer: shgoh from Malaysia

Dear Tom,

It is not a partitioned table. Would it help if I went for parallel, nologging, direct, and disabled a portion of the constraints? Thanks

Rgds
Shgoh

Tom Kyte

Followup  

December 17, 2004 - 2:30 pm UTC

you are already doing parallel, nologging, direct -- disabling the constraints only means "gotta enable them".

you asked for additional ideas. partitioning.

How about Bulk Update?

December 22, 2004 - 5:13 am UTC

Reviewer: Tony from India

Tom,

I have two tables PO_HDR and PO_DTL having millions of records. I want to join the two tables, and update a flag in PO_HDR table. Can I use Bulk Update based on rowid here?

Example:

DECLARE

TYPE tblrowid IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
arr_rowid tblrowid;

CURSOR po_cur
IS SELECT a.rowid FROM po_hdr a, po_dtl b
WHERE a.poseq = b.poseq
AND a.custid = b.custid
AND b.po_qty <> b.del_qty;

BEGIN

OPEN po_cur;
LOOP
FETCH po_cur BULK COLLECT INTO arr_rowid LIMIT 100000;
EXIT WHEN arr_rowid.COUNT = 0;

FORALL i IN arr_rowid.FIRST..arr_rowid.LAST
UPDATE po_hdr
SET del_flg = 'Partial'
WHERE rowid = arr_rowid(i);

COMMIT;
END LOOP;
CLOSE po_cur;

END;
/

Is a rowid-based bulk update fast and reliable? Is there any better, faster approach?

Please advise me.




Tom Kyte

Followup  

December 22, 2004 - 9:41 am UTC

update po_hdr
set del_flg = 'Partial'
where (poseq,custid) in ( select poseq,custid
from po_dtl
where po_qty <> del_qty );


period.

To update or insert by sql statement and not procedurally

December 23, 2004 - 2:44 pm UTC

Reviewer: DAV

Hi,Tom,
I have 2 questions:
The first is:
For the update SQL statement that you wrote above, say I want to execute it
against millions of records, and I want to commit every 1000 records.
How can I do it?
The second question is:
If I have to use a PL/SQL table and I want to load
a lot of records into the table, say 70,000, should I do it by:
fetch ... bulk collect ... limit 1000
or SELECT ... BULK COLLECT ...?
Why is it a problem to load many records into PL/SQL tables?
Thanks,
DAV

Tom Kyte

Followup  

December 23, 2004 - 3:08 pm UTC

you don't want to commit every 1000 records, that would take longer, consume more resources and break your transactional integrity.


if you have 70k rows to process, and want to do it 1k at a time, you'll be using fetch ... bulk collect ... limit 1000;

can you spell "memory"

Tom,sorry i didn't understand you.

December 23, 2004 - 3:49 pm UTC

Reviewer: DAV

Why is fetch better than a SQL statement in the case written above? You said before that SQL is better.
Can you please also answer the second question that I wrote above?
Thank you very much.
DAV

Tom Kyte

Followup  

December 23, 2004 - 4:30 pm UTC

not understanding you.

you asked:

If I have to use a PL/SQL table and I want to load
a lot of records into the table, say 70,000, should I do it by:
fetch ... bulk collect ... limit 1000
or SELECT ... BULK COLLECT ...?
Why is it a problem to load many records into PL/SQL tables?


and I answered??? the question wasn't "single sql vs procedural code"

you started with "if I HAVE TO USE..."

you would not select BULK COLLECT for 70k records, you would use fetch bulk collect with the limit clause.

I was unclear - let me explain

December 23, 2004 - 5:12 pm UTC

Reviewer: A reader

Sorry ,Tom,
i didn't explain myself well.
What i meant was:

First:

If I have to update a million records in a PL/SQL procedure,
is it better to do it with an update SQL statement or with procedural code like
fetch ...? I want to commit every n records because of rollback segments, so if I use an update SQL statement, how can I do it?

Second:

General question - IF I USE PROCEDURAL CODE - why is it not good to load, say, 40k records into a PL/SQL table with SELECT ... BULK COLLECT ... rather than with fetch ... bulk collect ... limit ...?
As I understand it, a PL/SQL table's records are kept in PGA memory, so what happens if there are 40k records?
Please clarify the theory.
Thank you very much.



Tom Kyte

Followup  

December 23, 2004 - 8:03 pm UTC

you need to size your rollback segment large enough to do you job. seriously -- really you do.

a million records is nothing.


big_table@ORA9IR2> update big_table set owner = lower(owner);

1000000 rows updated.

big_table@ORA9IR2> select used_ublk*8/1024 from v$transaction;


USED_UBLK*8/1024
----------------
92.4921875

big_table@ORA9IR2>

93meg, big deal.

If you commit every N records -- you do realize you have to make your code restartable somehow, so when you inflict the ora-1555 on YOURSELF by committing, or fail for whatever other reason -- you can restart where you left off. Now we are talking some code!


If you have to update tons of records and tons of columns AND they are indexed, disable indexes, consider recreating the table instead of a mass update (if you are updating millions of records -- this is going to be an offline thing anyway, you might be supporting queries but you are not going to have "high concurrency")


can you spell "memory"

and think about streaming -- keeping everyone busy. So you wait for a long time while oracle gets 40k records, jams them into ram, manages them, then you process them and do it all over. doing things in bite-sized pieces is easier for everyone - a couple hundred in, a couple hundred out.
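A minimal sketch of the fetch ... bulk collect ... limit pattern described above, reusing the emp/lower(ename) example from the top of this thread:

```sql
declare
  cursor c is select rowid rid, lower(ename) new_name from emp;
  type rid_t  is table of rowid index by binary_integer;
  type name_t is table of emp.ename%type index by binary_integer;
  l_rids  rid_t;
  l_names name_t;
begin
  open c;
  loop
    -- bite-sized pieces: a couple hundred in, a couple hundred out
    fetch c bulk collect into l_rids, l_names limit 200;
    exit when l_rids.count = 0;
    forall i in 1 .. l_rids.count
      update emp set ename = l_names(i) where rowid = l_rids(i);
  end loop;
  close c;
  commit;  -- one commit at the end, not every batch
end;
/
```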

Update 1000000 in sql statement

December 24, 2004 - 1:24 am UTC

Reviewer: A reader

Thanks, Tom!
What is the ORA-01555 message? I don't have a guide.
And I really can't find a way to commit every n rows with an update SQL statement. Can you give me some idea?
About loading, say, 80k records into a PL/SQL table: if I need procedural code to update at least 1,000,000 records, and not an update SQL statement, why can't I avoid the PL/SQL table and do it with just a regular cursor?


Tom Kyte

Followup  

December 24, 2004 - 9:01 am UTC

search this site for 1555, you'll see lots and lots. check out Expert one on one Oracle -- i wrote about it lots.


YOU DO NOT WANT TO COMMIT THE ROWS INCREMENTALLY. that is my fundamental premise here.


If you have access to my book "Expert one on one Oracle" -- you can read all about it.

Another question

December 24, 2004 - 3:58 pm UTC

Reviewer: A reader

About loading, say, 80k records into a PL/SQL table: if I need procedural code to
update at least 1,000,000 records, and not an update SQL statement, why can't I
avoid the PL/SQL table and do it with just a regular cursor?

Thank's

Tom Kyte

Followup  

December 24, 2004 - 4:06 pm UTC

slow by slow is slow by slow (oops, keep doing that don't I -- row by row is row by row)



Sorry,misunderstanding.

December 25, 2004 - 4:21 am UTC

Reviewer: A reader

I didn't explain myself well.
What I meant in my question is a general question:
If I need a cursor (and not a SQL statement) for some process that accesses a big table, why is it better to load data into a PL/SQL table and work against the PL/SQL table than to just fetch records directly from the DB table?





Tom Kyte

Followup  

December 25, 2004 - 9:10 am UTC

I understood, you didn't understand me.


slow by slow processing (a row at a time)

is slower than

o a single sql statement
o bulk collect and forall updates

doing things in BULK is

o faster
o uses less resources

index DDL in the midst of transactions

December 30, 2004 - 3:36 pm UTC

Reviewer: Nik Malenovic from Chicago, IL

Tom,

this is a followup to your March 07, 2004 followup related to ddl operations on indexes whilst there are outstanding transactions in progress on the underlying table. You said

"that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it."

We just ran into a similar production problem: the system was slow because we were missing an index on a column and there were updates running full table scans (don't ask!). Since there are users banging on the darn thing, there was no way to shut down and add the index, so we had to do it on the fly. Here's what we did:

declare
l_attempt pls_integer := 0;
l_limit pls_integer := 2000*10; -- 10 seconds
l_rindex binary_integer;
l_slno binary_integer;
l_target binary_integer;
begin
l_rindex := dbms_application_info.set_session_longops_nohint;
l_target := 42;

dbms_application_info.set_client_info('Vincent The Builder');
dbms_application_info.set_module('rebuild_index', 'between_transactions');
for l_attempt in 1 .. l_limit
loop
begin
dbms_application_info.set_session_longops
(
rindex => l_rindex,
slno => l_slno,
op_name => 'rebuilding the index',
target => l_target,
context => 0,
sofar => l_attempt,
totalwork => l_limit,
target_desc => 'attempt',
units => 'attempts'
);

execute immediate 'create index RESPONSE_QUESTIONID_INDX on response (questionid) nologging parallel 5';
dbms_output.put_line('SUCCESS - now rebuilding');
exit;
exception
when others then
null;
end;
if l_attempt = l_limit then
dbms_output.put_line('FAILED - there was not a phat chance to succeed');
end if;
end loop;
dbms_application_info.set_module(null, null);
dbms_application_info.set_client_info(null);
end;
/

and in another session we monitored the progress with

select sid, message from v$session_longops where time_remaining > 0;

that's all. This may be useful for somebody in a similar situation, hence the post.

thanks,


Nik


Tom Kyte

Followup  

December 30, 2004 - 4:10 pm UTC

you could try (for maximum "uptime")

create index .. ONLINE;

that would

a) block and wait for outstanding transactions to complete but...
b) block new transactions from starting and
c) once it started creating -- lets the new transactions proceeed and
d) at the very very end -- does a/b again to finish

reduces the amount of "downtime" by allowing people to still work in the middle (yes, they'll be full scanning, but working -- if that was something you wanted..)
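Using Nik's index DDL from above, the ONLINE variant would be:

```sql
create index response_questionid_indx
    on response (questionid)
    online nologging parallel 5;
```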

CTAS doesn't have compute statistics?

January 06, 2005 - 11:15 am UTC

Reviewer: A reader

create index .... compute statistics;

Why doesn't create table have a similar compute statistics clause?

I have a CTAS that takes 5 hours during which Oracle already knows everything about the data (similar to create index). Gathering stats on this table later takes another 5 hours!

Is there a way to gather stats during the CTAS itself?

Thanks



Tom Kyte

Followup  

January 06, 2005 - 11:31 am UTC

Oracle does not already know about everything.

Have you traced a dbms_stats.gather_table_stats to see how many queries, in how many different ways it gathers stuff?

about the only thing Oracle knows after a CTAS is number of rows and number of blocks (and so do you, you can set that)

there is no way it was computing histograms and such


I don't believe it takes 5 hours after the fact -- if it does, have you considered "estimate"

January 18, 2005 - 8:53 am UTC

Reviewer: robin

I am trying to modify your code from above:

/**********************************************/
set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )
/***************************************************/
to allow me to update one column based on several different where clauses and having no luck.

What I want to do is:

update t1

set t1.c1 = 'X' where t1.c5 and t1.c6 are null

set t1.c1 = 'A' where t1.c4 is not null and t1.c8 > sysdate

set t1.c1 = 'B' where t1.c4 is null and t1.c8 < sysdate

etc....

I've tried a case statement and decode like above, but neither seems to compile....thanks

Tom Kyte

Followup  

January 18, 2005 - 10:07 am UTC

well, in answer to the "what I want to do is"


update t1
set t1.c1 = (case when c5 is null and c6 is null
then 'X'
when c4 is not null and c8 > sysdate
then 'A'
when c4 is null and c8 < sysdate
then 'B'
end )
where (c5 is null and c6 is null )
or (c4 is not null and c8 > sysdate)
or (c4 is null and c8 < sysdate)

I cannot comment on the other one -- well, because there isn't anything to comment on :)

updating large table

January 20, 2005 - 9:50 am UTC

Reviewer: Tracy from UK

I have table_a with 50 million rows and table_b with 10 million rows.

table_a has a new column added called statid

Table_b has statid and tabaid. tabaid has a unique constraint on it and it is also a foreign key to the primary key of table_a (id)

I want to update all rows in table_a to set the statid column to match the statid column from table_b where the tabaid column in table_b = the primary key in table_a

which of the following two do you think is better, and is there some other altogether different way which would be better still?

UPDATE
(SELECT a.statid newval
,b.statid oldval
FROM table_a a
,table_b b
WHERE a.id=b.tabaid)
SET newval = oldval

or

UPDATE table_a a SET a.statid =
(SELECT b.statid
FROM table_b b
WHERE a.id=b.tabaid)

I've tested the above on a subset of the data using rownum, and both statements seem to update at a rate of about 300,000 rows per hour, but I would like to get it going faster than this.

Tom Kyte

Followup  

January 20, 2005 - 10:46 am UTC

update the join.

expect to see a nice big juicy full scan and hash join. it'll be as good as it gets.

I have a similar situation where I must update millions of records fast

February 09, 2005 - 10:58 pm UTC

Reviewer: Ryan from Canada

First I want to say thank you very much for all your help this last week Tom, it has helped me out a lot. Unfortunately Oracle is not easy so I have another question.

I am updating millions of records using insert statements.
I am updating the records using insert with values. I'm not updating one table using another table.

example:
insert into dbname (FirstName, Number) values ('Ryan', 1);

called with different records millions of times.

When I call "commit" at the end of all the insert's it takes more than 17 hours to commit. Of course this was importing records from a 1.5 gigabyte text file with 83000 records. The time to call all insert statements was less than a half an hour. Can you think of any reasons why the commit would be so slow. There is another program at the office that can import the same data (in another database format) in less than 20 minutes and that program is calling insert and commit in a similar fashion so it can't be that oracle is just that slow. I'm also disabling all triggers at the beginning of the import, and enabling them at the end. I'm turning off logging and using /*+ APPEND */ and parallel hints but they probably don't help as I'm not using direct load inserts.

Any help would be greatly appreciated.


Tom Kyte

Followup  

February 10, 2005 - 1:07 am UTC

<quote>
I am updating millions of records using insert statements.
</quote>

really?

sorry -- but i don't believe you on your comment about commit, you are mistaken.

committing is fast, regardless of the number of records. Your method of observation is somehow "flawed".


Unless of course you have an "on commit refresh" set of mv's or something.


So, turn on 10046 level 12 trace and trace it, use tkprof to report on it and see what you see.

query retrieval is slow and it affects the whole system

February 10, 2005 - 3:41 pm UTC

Reviewer: Praveen from US

hi Tom
When a developer uses an escape sequence in the selection criteria, the response time is slow, up to 15 seconds. This delay affects the whole system. Kindly suggest an appropriate way to resolve this.
select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/UNDER~_BAR/%' escape '~')
and (lower(r_object_id) <> lower('f004000000010471')))
takes about 15 secs to run

select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/NOBAR/%')
and (lower(r_object_id) <> lower('f004000000010471')))
takes less than one second.
Perhaps this is helpful ... if I add the escape '~' specifier to this query it also runs slowly even though it doesn't use the escape character.
regards
PKR

Tom Kyte

Followup  

February 11, 2005 - 7:38 pm UTC

so, why do you believe this has anything to do with escape?


(you do understand these two queries return totally different result sets and perhaps that -- that is the "difference")

have you tkprof'ed these queries and compared them?

RE : Row Migration

February 17, 2005 - 3:35 pm UTC

Reviewer: A reader from NJ, USA

Hi Tom,

You mentioned in one of the above discussion threads that while updating a null-column to a not null value we have to make sure that we have enough PCTFREE and no row migration happens.
I have a table that gets 2 million rows on a daily basis and I want to update say a column for only 50000 rows.

1. What should be my approach
A. if the column to update is indexed.
B. if the column to update is not indexed
C. How do I find out row migration that may happen
during the update

Thanks

Tom Kyte

Followup  

February 17, 2005 - 6:58 pm UTC

regardless of whether the column is indexed or not -- the approach would be "the same".  the index will make the update take longer as it does lots more work, but for just 50,000 rows -- it won't be a very big deal.

2,000,000 and 50,000 are only big if it is US dollars and it was your bonus and paycheck.  In a database, in 2005, they are very small numbers.

ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> update big_table set owner = owner||'x' where mod(id,40) = 0;
 
50000 rows updated.
 
Elapsed: 00:00:18.02
 
ops$tkyte@ORA9IR2> create index big_table_idx on big_table(owner);
 
Index created.
 
Elapsed: 00:00:27.84
ops$tkyte@ORA9IR2> update big_table set owner = owner||'y' where mod(id,40) = 0;
 
50000 rows updated.
 
Elapsed: 00:00:28.04
ops$tkyte@ORA9IR2> select count(*) from big_table;
 
  COUNT(*)
----------
   2000000
 
Elapsed: 00:00:03.02
ops$tkyte@ORA9IR2>


to discover if rows migrated, you would analyze the table (perhaps using the list chained rows, if you just analyze it, you'll get chain_cnt filled in the dictionary)

you cannot find out "before" the update what will happen. 
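For example, against the big_table used above, the after-the-fact check might look like this (utlchain.sql, shipped in $ORACLE_HOME/rdbms/admin, creates the CHAINED_ROWS table):

```sql
-- List the migrated/chained rows explicitly...
analyze table big_table list chained rows into chained_rows;

select count(*) from chained_rows where table_name = 'BIG_TABLE';

-- ... or simply analyze and read chain_cnt from the dictionary:
analyze table big_table compute statistics;

select chain_cnt from user_tables where table_name = 'BIG_TABLE';
```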

RE : Update

February 18, 2005 - 8:59 am UTC

Reviewer: A reader from NJ, USA

Tom,

I think I did not make the last statement regarding the UPDATE of 50000+ rows clear enough. Here is my requirement:

1. I have a flat-file coming in to our system around 10.00 AM in the morning with over 2 million records and has the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS
....
....
2. I have another flat file (a delta file containing only data that has changed) coming in on a daily-basis around 2.00 AM in the morning with around 50000+ records and has the following fields :
ACCOUNT_NUMBER
SECURITY_ID
.....
.....
which we directly load into a table B with the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS <== which is set to NULL during this load
.....
.....
3. Once this file is loaded, we run a process which brings over the unmodified data (from the prior run, i.e., data which did not appear in the file in step 2) so that table B always has a full load, which is the 2 million records I was talking about
4. We need to run a process that will update the COST_BASIS field only for the modified data (i.e., data that appeared in today's delta load, which will be the 50000+ records or so that I need to update. For data brought over from the previous run, we use the prior run's value itself)

In order to perform this, I thought of using the 2 million record file as an external table (say table A), joining this table with B, and updating COST_BASIS from A to B for the modified records. We have the logic to identify the changed records. But my concern is that table A has to be full-scanned for the 50000+ records from B, and no indexes can be created on A.
I kindly need your suggestion as to what would be the best approach in tackling this problem.

Tom Kyte

Followup  

February 18, 2005 - 9:24 am UTC

why would A be full scanned more than once in a join?

RE : Full Scan

February 18, 2005 - 10:13 am UTC

Reviewer: A reader from NJ, USA

Tom,

This is because I thought that if we join the two tables, it would do a nested loop join: for each row from table B, it would have to full scan A to get the matching rows. Also, B (the delta table) would be my driving table (since it has the fewer rows to start with).

If my understanding is not correct, please advise.

Tom Kyte

Followup  

February 18, 2005 - 1:59 pm UTC

hash join it would do in light of no indexes.

hash join it would do even in the presence of indexes because the CBO is smart and knows indexes are evil when most or all of the rows are going to be processed.

it would be a nice big juicy fast hash join.

Use MERGE command to combine steps 3 and 4

February 18, 2005 - 12:51 pm UTC

Reviewer: Logan Palanisamy from Santa Clara, CA USA

Combine steps 3 and 4 with the merge command. Your problem is solved. The External table with 2 million records will be scanned just once.

MERGE INTO B
USING A
ON (B.ACCOUNT_NUMBER = A.ACCOUNT_NUMBER)
WHEN MATCHED THEN UPDATE
SET B.COST_BASIS = A.COST_BASIS
WHEN NOT MATCHED THEN INSERT (B.ACCOUNT_NUMBER, B.SECURITY_ID, B.COST_BASIS, ..) VALUES (A.ACCOUNT_NUMBER, A.SECURITY_ID, A.COST_BASIS, ..)

"A" is your external table with 2 million records.
"B" is the one that initally contains 50,000+ records. After the merge, B will have 2 million records.

The INSERT clause is step number 3
The UPDATE clause is step number 4

Hope that helps


Delete Record !!!

February 28, 2005 - 6:49 am UTC

Reviewer: Anupam

What is the difference between:

1. DELETE <table> TRUNCATE WHERE <clause>;
2. DELETE <table> WHERE <clause>;

?? Both work!!!

Tom Kyte

Followup  

February 28, 2005 - 8:01 am UTC

just like

ops$tkyte@ORA9IR2> delete t this_is_a_correlation_name where 1=1;
 
27794 rows deleted.


works -- truncate is just being used as a correlation name there (a table alias) 


Deleted message

March 03, 2005 - 10:11 am UTC

Reviewer: Franc Uwa from Nigeria

the deleted was of today and trying to see if you can help me to resend those deleted message of me back to my inbox

Tom Kyte

Followup  

March 03, 2005 - 10:28 am UTC

huh?

March 10, 2005 - 10:51 am UTC

Reviewer: Brad Doczkat from Pittsburgh, PA

What do you do when the resident DBAs do not approve of the use of this method? How do I convince them that this approach is much more streamlined and requires far less processing time than doing an update of millions of rows the old-fashioned way? For my particular need that brought me to this thread, I only have to do a one-time initialization of a one-character column that I want to put a not null (with default) constraint on. There are 5.5 million rows in the table. I need to do this one time as an implementation step, yet the DBAs do not seem to buy into this approach. They would rather update so many rows a night, over the course of several nights, doing frequent commits during the update process.

Tom Kyte

Followup  

March 10, 2005 - 7:11 pm UTC

tell the dba's to do the work then?

Sorry -- I just don't know how to make your dba's do something they don't want to.

Help Please I don't Understand

March 11, 2005 - 4:20 pm UTC

Reviewer: A Reader from US

Tom:

I have a project due tomorrow that I am getting paid a boat load of money for and do not have a clue… Could you write it for me without being compensated? Oh and answer my interview questions for me as well.

Just joking ;)

As always you have the best info on your site (dare we say better than meta link?).

However I have made it through two threads in just over 9 hours. Have you ever thought of hiring a few people and extracting: best part of thread ; lessons learned; sample code (I still have not found the latest and greatest 10g show_space); and a glossary for items like “CTAS”, “(sic)”, “SLA”, “FBI”? Perhaps make a book out of it and sell it but it would have to be a searchable electronic copy. (Hmm, WIKI enable a portion of asktom (naa)?)

In this thread alone I found: q’| quoted strings |’; CTAS for speed; I really need to start using hints; I need to look into Analytics; I have to check if my DW really needs partitions…

Thanks for everything.
A Reader.

(Guess response: This is a feature request… Denied :)


Tom Kyte

Followup  

March 12, 2005 - 9:58 am UTC

I do that -- it is called "a book" :)

Effective Oracle by Design -- best of asktom, with scripts as of the time of the book.

Expert One on One Oracle -- what you need to know, with scripts -- second edition under construction (in fact I'm going to take April "off" -- from asktom -- in order to just get it going)

Expert One on One Oracle second will be the 10g version of what you are looking for.

March 12, 2005 - 10:16 am UTC

Reviewer: Alberto Dell'Era from Milan, Italy

> best of asktom, with scripts
> as of the time of the book.

Yes, but that doesn't mean that the books are just a collection of q&a from asktom (as I've found many believe); they are "regular" books - the q&a are mainly used as "Customer-driven Topic Selection", so to speak...

Tom Kyte

Followup  

March 12, 2005 - 10:20 am UTC

correct -- definitely not a collection of asktom articles, that would be disjointed at best. Rather, a flow -- a story -- with examples from the site to help it along. The things I pick to write about are driven very much by the questions and trends I see here.

March 13, 2005 - 2:45 am UTC

Reviewer: A reader

Hi Tom,
Regarding...
"in fact I'm going to take April "off" -- from asktom "

What will happen to desperate people who are looking for answers supported by technical reasons? I am sure others will share the opinion that this is one of the greatest helps one can get on Oracle, in a timely and accurate manner...

Tom Kyte

Followup  

March 13, 2005 - 9:54 am UTC

start here:
</code> http://www.jlcomp.demon.co.uk/faq/ind_faq.html

and try:

groups.google.com -> comp.databases.oracle.*

the place where I got hooked on discussing Oracle in the first place way back in October of 1994.  You might get an answer from Jonathan Lewis, Connor McDonald, or a host of others.  I still lurk there from time to time.


otn.oracle.com -> discussion forums

metalink.oracle.com -> discussion forums

the oracle-l list
http://www.freelists.org/archives/oracle-l/ <code>
Where Cary Millsap or Steve Adams or Tim Gorman to name but a very few might answer your question.


You might even find yourself answering some from time to time!

dblink

March 23, 2005 - 2:49 am UTC

Reviewer: shiva from india

We are trying to access a stored procedure which exists in another database through a dblink; in that stored procedure one update stat.. is there, and because of that the screen gets hung.

Regards,
Shivachandramouli.R


Tom Kyte

Followup  

March 23, 2005 - 8:46 am UTC

so little data... 'one update stat' what is that. and what have you done to diagnose the issue?

questions in the dependent objects after

March 27, 2005 - 10:03 pm UTC

Reviewer: A reader


Hi Tom,

As you suggested, creating a new table, then dropping the original table and renaming,
sounds like a great method: update the table by CTAS, drop the original table and then rename it to the original name.
Question: all the dependent objects such as packages, views, synonyms... will become invalid. Do these need to be recompiled? I know all the grants have to be regranted.

Thanks.

Tom Kyte

Followup  

March 28, 2005 - 7:18 am UTC

dependent objects will take care of themselves next time they are executed.


March 27, 2005 - 10:48 pm UTC

Reviewer: A reader

tom,
you said:

"
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.
"

what's the "degree of threads"?

Tom Kyte

Followup  

March 28, 2005 - 7:29 am UTC

"degree of threads" is number of concurrent job queue processes, how many dbms_job jobs can be running simultaneously.

commits or not

April 01, 2005 - 5:09 pm UTC

Reviewer: A reader

I want to insert 30 million records into a dummy table with nologging, as you advised.

insert /*+ append parallel ( ) */ into ...
select * from ...

If I have large rollback segments, it can complete without committing during the insert (for example, a commit after every one million inserts). Is it faster without the commits?

Thanks a lot

Tom Kyte

Followup  

April 01, 2005 - 6:12 pm UTC

to commit or not. Every time you commit, you will wait -- wait for log file sync.

Every time you commit, you say "i'm done". That's ok, if you are restartable!

However, with a big parallel direct path insert it'll be all or nothing (no point in doing N rows and then doing the query all over again). Each parallel process will utilize its own rbs, and if you have no indexes, a direct path insert isn't going to generate MUCH undo at all (it writes above the HWM and doesn't need to generate UNDO information).
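Tom's point about direct-path inserts writing above the high-water mark can be sketched as follows (table names here are hypothetical); note that after an APPEND insert the session must commit before it can read the table again:

```sql
alter table big_copy nologging;

-- direct-path insert: writes above the HWM, generating minimal undo
insert /*+ append */ into big_copy
select * from source_table;

-- selecting from big_copy here, before the commit, would raise ORA-12838
commit;

alter table big_copy logging;
```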

Let's Tell the CBO About Our Great Idea!

April 11, 2005 - 7:28 pm UTC

Reviewer: Doug Winage from New Orleans, LA

Given that it seems there are clear cases in which making and loading a new table is preferable to updating in place, I wonder if it would be just as well to make such an approach available to the CBO as a candidate execution plan. I'll grant that one wouldn't want to leave the choice entirely up to the optimizer; the safest thing would be to allow enabling and disabling of the plan option at the system level with a parameter called, say, "ctas_for_updates_enabled." One could also provide a pair of hints, say, /*+ USE_CTAS */ and /*+ NO_CTAS */, to allow one to override the CBO, if need be, and expressly demand or prohibit a plan involving recreation of the table.

Tom Kyte

Followup  

April 11, 2005 - 8:23 pm UTC

all really good ideas need to go via support and get filed....

CTAS instead of Deletes

April 11, 2005 - 11:02 pm UTC

Reviewer: GG from Minneapolis, MN

Tom,
Please, this is a relevant follow up to this thread.
I have also included a test case.
Instead of deleting rows from a table with 270 Million records, I have used CTAS. The delete criteria is based on a join of two fields from two tables. The delete occurs only on one table (ORDERS). The other table is INVOICE.
If the DOC_NUMBER and S_ORD_ITEM values from both tables match, then delete the matching records from ORDERS.

Test case:

drop table orders;
create table orders (
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,PROMOTION varchar2(10) not null);


drop table invoice;
create table invoice(
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,OTHER_INFO varchar2(10) not null);


begin
for i in 1..10 loop
insert into orders values(i,i+1,'ORDERS');
end loop;
commit;
end;
/


select * from orders;

DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
2 3 ORDERS
3 4 ORDERS
4 5 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS


begin
for i in 1..5 loop
if mod(i,2) = 0 then
insert into invoice values(i,i+1,'INVOICE');
end if;
end loop;
commit;
end;
/


select * from invoice order by 1;

DOC_NUMBER S_ORD_ITEM OTHER_INFO
---------- ---------- ----------
2 3 INVOICE
4 5 INVOICE

The above two records should be deleted from ORDERS:

drop TABLE ORDERS_NEW;
CREATE TABLE ORDERS_NEW
NOLOGGING
AS
SELECT /*+ parallel(ORDERS,4) */ *
from orders ord
where NOT EXISTS
(select null from invoice inv
where inv.DOC_NUMBER = ord.DOC_NUMBER and
inv.S_ORD_ITEM = ord.S_ORD_ITEM);

select * from orders_new order by 1;

DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
3 4 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS

IS THIS THE CORRECT WAY OF DOING IT? ESPECIALLY THE SQL WITH "NOT EXISTS"?



Tom Kyte

Followup  

April 12, 2005 - 7:07 am UTC

hints need to use the correlation (alias) names....

I would just use NOT IN, but NOT EXISTS with the CBO will be considered as NOT IN conceptually, so it is just harder to code but has the same net effect.

See the concepts guide for details on parallel CTAS.
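Putting Tom's two corrections together, a sketch of the CTAS with the alias used in the hint and NOT IN instead of NOT EXISTS might look like this (both join columns are declared NOT NULL in the test case above, so NOT IN is safe here):

```sql
create table orders_new nologging
as
select /*+ parallel(ord, 4) */ *
  from orders ord
 where (doc_number, s_ord_item) not in
       (select doc_number, s_ord_item from invoice);
```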

Update millions of records

April 12, 2005 - 1:33 pm UTC

Reviewer: Marsboy from USA

Hi Tom,
It helped me so much that my day's work can now be accomplished in minutes.




update partition

April 14, 2005 - 2:33 am UTC

Reviewer: phil

I am sorry for posting a similar question in another thread. The follow-up should have been here!

I am getting a little confused with the technique of updating lots of rows in a partitioned table.

I have a table partitioned by month. To do an update on a column you suggest doing a create table as select. This of course will create a new table with the updated column and the data from the table to be updated.

What I do not understand is how to handle each partition. My new table has none, just valid data. How do I now get the new table to the structure of the old, with all the partitions?

I am sure the answer is here, I just cannot see it!

regards

Tom Kyte

Followup  

April 14, 2005 - 7:43 am UTC

to do a huge bulk update of most of the rows, using a create table as select could be a way to go. I followed up on the other thread.

Partioned tables

April 14, 2005 - 7:54 am UTC

Reviewer: Greg from Toronto

a previous poster was asking about partitioned tables ...

[quote]
I am getting a little confused with the technique of updating lots of rows in a
partitioned table.
[/quote]

Feel free to ignore this question if you answered it already in the other followup (I wasn't sure which thread you posted it in, and can't find it .. *sigh*)

Anyway ... if the table is partitioned, can you not just deal with each partition separately?

That is: if you have 15 partitions (i.e. P01 through P15 ...), and your "update" only encompassed P03 and P04 ... then could you not just "re-create" those partitions in the same way you "re-create" the table .. then just "stuff" the partition(s) back in?

I can't remember all the syntax to pull this little "card-shuffle" off ... but I'm pretty sure it can be done? Can't it?


Tom Kyte

Followup  

April 14, 2005 - 8:35 am UTC

sorry, the other thread is
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:7143624535091 <code>


Yes, if the table is partitioned, you can. (and that is what I was trying to describe on the other page, I just didn't have sufficient detail the first time around to make it clear)
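The "card-shuffle" Greg describes is an ALTER TABLE ... EXCHANGE PARTITION. A minimal sketch, with hypothetical table and partition names:

```sql
-- rebuild just the rows of partition P03, applying the "update" in the select
create table p03_new nologging as
select /* transformed columns here */ *
  from big_part_table partition (p03);

-- index p03_new to match the table's local indexes, then swap it in
alter table big_part_table
  exchange partition p03 with table p03_new
  including indexes without validation;
```

The exchange is a dictionary operation, so the swap itself is nearly instantaneous regardless of partition size.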

April 21, 2005 - 12:12 pm UTC

Reviewer: A reader

"
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;


Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.

"
will the dbms job only be executed once?

TIA

Tom Kyte

Followup  

April 22, 2005 - 9:02 am UTC

yes, well, it will execute successfully ONCE.

It might fail over and over and over and over, then run successfully and go away.

How to Update millions or records in a table?

May 17, 2005 - 7:48 am UTC

Reviewer: S.Darshan from Singapore

I have to update one table which has 10 million records. The regular rollback segment won't be enough to hold this data. In this case, how can we predict how much rollback area will be used by this update?

Initially I thought of your approach "CREATE TABLE new_table as select <do the update "here"> from old_table;"

But I have to update about 30% of the columns. The rest of the 70% of columns should not be touched. Also this table has a lot of foreign key constraints. Is there any alternative method? Thanks in advance.

With best regards
S.Darshan

Tom Kyte

Followup  

May 17, 2005 - 9:16 am UTC

if I had to update 1/3 of a 10,000,000 row table -- that is the perfect candidate for create table as select, foreign keys or not (you can always disable them, do the CTAS, and put them back on -- for the columns you have not updated, enable novalidate)

but, if you cannot do that, disable indexes on the columns to be updated, do the update, and then rebuild the indexes. To get a grasp on "how much undo", create a copy of the first 100,000 rows of data for example, update it, and see how much undo you generated.
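The "copy 100,000 rows and measure" suggestion can be sketched like this (table and column names are hypothetical; V$TRANSACTION reports undo for the current, uncommitted transaction, and the sys_context call assumes a reasonably recent release):

```sql
create table sample_copy as
select * from big_table where rownum <= 100000;

update sample_copy set some_col = upper(some_col);

-- undo blocks/records used by this transaction -- run BEFORE committing
select t.used_ublk, t.used_urec
  from v$transaction t, v$session s
 where t.addr = s.taddr
   and s.sid = sys_context('userenv', 'sid');
```

Scale the USED_UBLK figure up by the ratio of total rows to sample rows for a rough estimate of the full update's undo.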




Parallel DML

May 18, 2005 - 10:05 am UTC

Reviewer: Tamilselvan from USA

I agree with what Tom said.

However, in many real-life situations CTAS may not be executable as we think. For example, if the table has a LONG column, then CTAS will not work.

Consider Parallel DML. I have successfully implemented parallel dml in a large DB.

Here is the script:

alter table scott.table_a parallel 8;
alter index scott.table_a_p1 parallel 8;
alter table scott.tamil_temp_table_b parallel 8;
alter index scott.tamil_temp_table_b_idx parallel 8;

commit;
alter session enable parallel dml ;
alter session enable parallel query ;

update /*+ parallel(a,8) */ scott.table_a a
set a.pr_addr_id = ( select addr_per_id
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID)
where exists ( select null
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID) ;
commit;
alter session disable parallel dml ;
alter session disable parallel query ;
alter table scott.table_a noparallel ;
alter index scott.table_a_p1 noparallel ;

Thanks
Tamil

Tom Kyte

Followup  

May 18, 2005 - 11:20 am UTC

<quote>
if the table has a LONG column, then CTAS will not work.
</quote>

convert long to clob and be done with it.
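The conversion can be done right in the CTAS with the TO_LOB operator (table and column names hypothetical):

```sql
-- TO_LOB is only valid in a CTAS or INSERT ... SELECT
create table new_t nologging
as
select id, to_lob(long_col) long_col, other_col
  from old_t;
```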

Excellent Explanation

May 18, 2005 - 11:18 pm UTC

Reviewer: A reader from Virgina

This is very helpful, Tom. I have a question about an issue I am facing while using the dbstart utility -- where can I ask that question?

Tom Kyte

Followup  

May 19, 2005 - 7:39 am UTC

suggest you don't use dbstart but rather write your own rc.d script to startup/shutdown that which you want.

What I do is like this:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1302619965694#12118047114111 <code>



How to Update millions or records in a table

June 08, 2005 - 5:29 am UTC

Reviewer: Debasish from New Delhi, India

You said to create the table with the update command, but I have a problem with that (in 8i).
One of my master tables contains the following fields (approx 500,000 records):

1. ID varchar2
2. Pincode Number
3. Tag char

Another three tables are bankmaster1, bankmaster2, bankmaster3; the fields are:
1. pincode number
2. branch varchar2

I have to update the tag column in the master table according to the pincode distribution in the bankmaster tables, like tag E for bank1, tag C for bank2, tag V for bank3.
When I try it through an update command it works for a small table, but not for this 500,000-record table.

Please let me know. Thanks in advance.


Tom Kyte

Followup  

June 08, 2005 - 9:04 am UTC

I told you to create a table with the update command?  Where and when -- I'll need to correct that if true, for you cannot create a table with the update command.


It works for me for a 50,000,000 row table or a 5 row table.  You'll have to tell us why it isn't "working" (and hopefully it is not using a single index)


ops$tkyte@ORA10G> create table t
  2  as
  3  select id, id pincode, 'x' tag
  4    from big_table.big_table
  5   where rownum <= 1000000;
 
Table created.
 
Elapsed: 00:00:01.36
ops$tkyte@ORA10G>
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> create table t1 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 0;
 
Table created.
 
Elapsed: 00:00:01.60
ops$tkyte@ORA10G> create table t2 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 1;
 
Table created.
 
Elapsed: 00:00:01.54
ops$tkyte@ORA10G> create table t3 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 2;
 
Table created.
 
Elapsed: 00:00:01.49
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.13
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.07
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T3', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.13
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set timing on
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t1)) set tag = 'E';
 
333333 rows updated.
 
Elapsed: 00:00:06.13
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t2)) set tag = 'C';
 
333334 rows updated.
 
Elapsed: 00:00:06.48
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t3)) set tag = 'V';
 
333333 rows updated.
 
Elapsed: 00:00:06.40
ops$tkyte@ORA10G>


Only takes a couple of seconds. 

June 10, 2005 - 8:42 am UTC

Reviewer: A reader from MI

Tom,
I have this query in which I'm trying to update the old POC_ID with the new POC_ID. However, it doesn't produce the results I would expect. The table does not get updated with the correct POC_ID. What am I missing here?

begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;

Thanks,
Randy

Tom Kyte

Followup  

June 10, 2005 - 10:34 am UTC

need example.

but that looks like it should be at most a SINGLE sql statement, no code whatsoever...

June 10, 2005 - 10:59 am UTC

Reviewer: A reader

I have a table WSAR_ARCHIVE_CS which contains a column POC_ID. POC_ID is FK to the ADAS_POC POC_ID PK. I want to update that column with the new POC_ID which is stored in WSAR_POC POC_ID PK.

Thanks,
Randy

Tom Kyte

Followup  

June 10, 2005 - 11:10 am UTC

create table .....
create table ....
create table ....
insert some data...
insert some more data...

followed by "and this is what I need to do..."

June 10, 2005 - 11:28 am UTC

Reviewer: A reader

create table wsar_poc
(POC_ID NUMBER NOT NULL,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NUMBER NOT NULL,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200))


Create table wsar_archive_cs
(FILE_ID NUMBER NOT NULL,
ONELINER_DATE DATE,
PRODUCT VARCHAR2(50),
COMPONENT VARCHAR2(400),
PRIORITY VARCHAR2(50),
POC_ID NUMBER,
SUBJECT VARCHAR2(300),
ISSUE VARCHAR2(4000))

I have the table adas_poc which contians the old poc_id that is in the WSAR_ARCHIVE_CS table.

desc adas_poc
POC_ID NOT NULL NUMBER,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NOT NULL NUMBER,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200)

I tried to update using the cursor for loop, but no luck. The column does not get updated with the correct ID.

begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;

Thanks,
Randy

Tom Kyte

Followup  

June 10, 2005 - 3:33 pm UTC

oh, so close -- you almost had inserts there with data to show

a) what happened
b) explain what you expected

That will definitely join wsar_poc to adas_poc by poc_responsible_ind

it will definitely then update wsar_archive_cs

you need to explain why this logic isn't working; it is rather straightforward. Demonstrate for us.
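For what it is worth, the row-by-row loop above could collapse into one correlated UPDATE along these lines -- a sketch only, assuming poc_responsible_ind maps each old POC_ID to exactly one new one:

```sql
update wsar_archive_cs cs
   set cs.poc_id =
       (select w.poc_id
          from wsar_poc w, ap32admin.adas_poc a
         where w.poc_responsible_ind = a.poc_responsible_ind
           and a.poc_id = cs.poc_id)
 where exists
       (select null
          from ap32admin.adas_poc a
         where a.poc_id = cs.poc_id);
```

The EXISTS guard keeps rows with no matching old POC_ID from being set to NULL.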

RE: Parallel Import

June 20, 2005 - 6:37 pm UTC

Reviewer: Bakunian

Tom,

I created a range-partitioned table with parallelization enabled, then I used the import utility to load 5Gb of data into the table; however, I don't see parallel processes spawned during the load.
Shouldn't import take advantage of this functionality?
I am using 9.2 on Solaris 8.

These are instance parameters.

Thank you for your time.

NAME_COL_PLUS_SHOW_PARAM TYPE VALUE_COL_
--------------------------------------------- ----------- ----------
fast_start_parallel_rollback string LOW
log_parallelism integer 4
parallel_adaptive_multi_user boolean TRUE
parallel_automatic_tuning boolean TRUE
parallel_execution_message_size integer 4096
parallel_instance_group string
parallel_max_servers integer 40
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean FALSE
parallel_server_instances integer 1
parallel_threads_per_cpu integer 2
recovery_parallelism integer 0

Tom Kyte

Followup  

June 21, 2005 - 7:51 am UTC

imp is a serial process, always has been - it reads a single DMP file sequentially and loads it.

10g with impdp has a parallel capability.

Tom, you are great

June 30, 2005 - 3:08 am UTC

Reviewer: shyam

I have a quick question. My statistics are as below; I am loading data.

File_Name                            Blocks_Read  Blocks_Written  Total_IOs
-----------------------------------  -----------  --------------  ---------
/oracle/u03/oradata/dev/data_01.dbf        26507            5240      31747
/oracle/u03/oradata/dev/data_09.dbf        26823            2823      29646
/oracle/u06/oradata/dev/data_08.dbf        24978            3704      28682
/oracle/u04/oradata/dev/data_10.dbf        26288            1831      28119
/oracle/u04/oradata/dev/data_02.dbf        24053            3634      27687
/oracle/u05/oradata/dev/data_11.dbf        26155            1284      27439
/oracle/u05/oradata/dev/data_07.dbf        23125            2865      25990
/oracle/u04/oradata/dev/data_06.dbf        23558            1978      25536
/oracle/u06/oradata/dev/data_25.dbf        21672            2234      23906
/oracle/u05/oradata/dev/data_03.dbf        21299            2574      23873
/oracle/u03/oradata/dev/data_05.dbf        20324            1819      22143
/oracle/u06/oradata/dev/data_04.dbf        19673            1746      21419
/oracle/u06/oradata/dev/data_12.dbf        19574            1306      20880
/oracle/u04/oradata/dev/data_14.dbf        19309             755      20064
/oracle/u03/oradata/dev/data_13.dbf        18807             848      19655
/oracle/u05/oradata/dev/data_24.dbf        17656            1316      18972
/oracle/u06/oradata/dev/data_16.dbf        16748            1730      18478
/oracle/u05/oradata/dev/data_15.dbf        17006             810      17816
/oracle/u03/oradata/dev/data_17.dbf        14810            1350      16160
/oracle/u05/oradata/dev/data_19.dbf        13552            1157      14709
/oracle/u03/oradata/dev/data_21.dbf        13622             911      14533
/oracle/u05/oradata/dev/data_23.dbf        13366             878      14244
/oracle/u04/oradata/dev/data_18.dbf        12644             908      13552
/oracle/u04/oradata/dev/data_22.dbf        12521             741      13262
/oracle/u06/oradata/dev/data_20.dbf        12137             546      12683

Tom u can observe that the total reads are more than the total writes, and as a result the performance is decreasing. Let me know how I should increase the blocks written; as a result of this my data loading is slow.

Tom Kyte

Followup  

June 30, 2005 - 9:29 am UTC

who is "u" and why is "u" observing things?


how do you know that reads are the cause of performance decreasing. All I see are a bunch of numbers.

loading data

June 30, 2005 - 3:10 am UTC

Reviewer: shyam

this is the query I used to generate the statistics

select DF.Name File_Name,
FS.Phyblkrd Blocks_Read,
FS.Phyblkwrt Blocks_Written,
FS.Phyblkrd+FS.Phyblkwrt Total_IOs
from V$FILESTAT FS, V$DATAFILE DF
where DF.File#=FS.File#
order by FS.Phyblkrd+FS.Phyblkwrt desc;


Tom Kyte

Followup  

June 30, 2005 - 9:30 am UTC

so, you've just shown the number of reads done since the database was started. All they are are a bunch of numbers with no meaning.

how do you know read IO is your bottleneck? what made you decide that?

performance while loading data

June 30, 2005 - 7:54 pm UTC

Reviewer: shyam

datafile = /oracle/u03/oradata/dev/data_01.dbf

Blocks_Read = 26507
Blocks_Written = 5240
Total_IOs = 31747

Tom Kyte

Followup  

June 30, 2005 - 9:09 pm UTC

so what? (he says again)

you've shown me that file has been read 26,507 times SINCE STARTUP and written to 5,240 times SINCE STARTUP

big deal, so what?

were they all during your load?
did they take A LONG TIME <<<===



PRECAUTIONS TO TAKE WHEN HUGE INSERTS

July 03, 2005 - 8:44 am UTC

Reviewer: shyam

tom, can you tell me the tips on how to improve the speed with which data is loading?

To improve data load speed for huge inserts: at present I have 73 tables whose data is being loaded; it takes 60 mins for one project.

Tom Kyte

Followup  

July 03, 2005 - 10:28 am UTC

depends on your needs and circumstances. To describe them all would be a small book.

So, you'd need to be more "specific". How many new rows, how many old rows, is there lots of validation going on, is this a reload or a load, do you have lots of unnecessary indexes, etc etc etc

Delete & import 6M records - Data Warehouse

July 06, 2005 - 11:54 pm UTC

Reviewer: A reader

Tom,

We currently have v8.0.6 running a data warehouse. We have 6M rows to delete every day, and at the same time we import the same amount. It currently takes us 10 hrs to do the deletion part. Data are imported every 5 mins. Is there any way we can speed up the deletion and improve the import (using Oracle imp/exp)?

Thanks so much for your help

Tom Kyte

Followup  

July 07, 2005 - 9:09 am UTC

insufficient data. why no partitioning? maybe a create table as select rows to keep and drop old table.

but insufficient data to answer properly.



and really old software.
and really archaic tools.

a lot has happened in the time between 8.0.6 and 10gR2 - you'd find many more options in 2005 than you did in 1997 when 8.0 was current.

slowly my data base

August 06, 2005 - 6:50 pm UTC

Reviewer: hameed

hi tom,
please help me.
I have a program built with Oracle 8i & Developer 6i.
The problem:
the database is slow

One large insert Vs multiple smaller inserts

August 14, 2005 - 10:55 am UTC

Reviewer: Sami

Dear Tom,

We had this issue in production during huge data migration.

Environment:-
Oracle 8.1.7.3 Sun 2.6 - 2 Node OPS. 12 CPU on each box.
The table in question here involved in M-M Oracle replication.

There is no other activity on both "12Million_Rec_table" and "1Million_Rec_table " other than my process.

Process-1
=========
exec dbms_reputil.replication_off();
commit;
insert /*+ append parallel(t,12) */ into 12Million_Rec_table t
select /*+ parallel */ * from 1Million_Rec_table;


Process-2
=========
exec dbms_reputil.replication_off();
commit;
--for loop with diff predicate value
loop
insert /*+ append parallel(t,12) */ into 12Million_Rec_table
select /*+ parallel */ * from 1Million_Rec_table
where predicate=&1; -- there is an Index on Predicate column
end loop


12Million_Rec_table has only one primary key index. There is no other constraints or LOB columns on this table. For some reason it didn't use direct-path insert, don't know why.


Process-2 executed N times in a sequence. Basically, Process-1 is one large transaction compared to Process-2 (which has N transactions in a loop).

In terms of execution time, Process-1 took longer than Process-2. Process-1 was running for more than
2 hrs without completing, so we stopped it. Process-2, with smaller chunks of records, completed very fast.


When Process-1 started the USED_UBLK value in v$rollstat increased rapidly then slowed down gradually. I have seen
this behaviour many times while running Process-1.

My questions are

1) Why is Process-1 slower than Process-2? [We didn't have any rollback segment issues while running either Process-1 or Process-2]

2) Is there any performance issue if the rollback segment goes beyond some size, say 200M or so?




Tom Kyte

Followup  

August 14, 2005 - 11:49 am UTC

you did not enable parallel DML, to do a parallel direct path insert, you must.

I cannot answer these questions, insufficient data. You could have captured information while it was running.

There is a *small* chance that what happened was

a) the big insert, because of the primary key index, generated gobs of undo.
b) this caused more undo to be generated than fit in the buffer cache
c) so it had to be flushed by dbwr
d) causing buffer waits by the insert (trying to get more buffers from the cache for undo)
e) and causing IO contention on the disk

The smaller inserts *MIGHT* have

a) filled up part of the buffer cache with undo blocks
b) committed, allowing the undo blocks in the cache to be reused
c) goto A n times

avoiding the buffer waits and the flushing of the undo to disk.

However, I would not have taken your approach for a small set of one million rows. It likely would have been much faster to

a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles



did not get rid of PK index because the table is in M-M replication

August 14, 2005 - 1:22 pm UTC

Reviewer: Sami

Dear Tom,

Thanks a lot for your immediate response. You are absolutely
right about why the "smaller chunk transactions" defeated "one large transaction".

<asktom>
a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million
rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
</asktom>

I agree with you 100%. I too prefer your approach [get rid of the primary key index, load, rebuild the index]. However, the table in question is involved in Master-Master replication.
We were not sure about getting rid of the primary key index on a table involved in replication. That is why we didn't take that approach.

Tom,thanks for your time & effort.

Additional Info

August 14, 2005 - 8:20 pm UTC

Reviewer: Sami

Our large pool is set to size 0. I am just wondering whether this value had any impact on the poor performance of our parallel process.

NAME TYPE VALUE
------------------------------------ ------- ------------------------------
large_pool_size string 0





Tom Kyte

Followup  

August 15, 2005 - 6:52 am UTC

are you sure it is really zero, if you are using automatic SGA memory management:

ops$xp10gr1\tkyte@ORA10GR1> show parameter large

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
__large_pool_size                    big integer 4M
large_pool_size                      big integer 0
ops$xp10gr1\tkyte@ORA10GR1>




I didn't set a large pool, but Oracle has set up a 4m one (that can grow/shrink as needed) 

Large pool size

August 15, 2005 - 1:45 pm UTC

Reviewer: Sami

Tom, We are still in 8.1.7.3.

Yes, I am sure the large pool size is set to zero on both instances(2 node OPS)

Tom Kyte

Followup  

August 15, 2005 - 10:22 pm UTC

it isn't "impacting" performance; it is simply using memory from elsewhere instead. But the large pool would be a better place, since the memory used for parallel execution isn't "LRU" memory like the other contents of the shared pool -- the large pool is designed for memory that is allocated and deallocated, unlike shared pool memory, which is designed to be allocated and aged out over time.
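
In 8i the large pool is sized with a static init.ora parameter (the size here is illustrative):

# init.ora fragment -- in 8i this parameter is static,
# so an instance restart is required
large_pool_size = 16M

In later releases with automatic SGA memory management, Oracle sizes the internal __large_pool_size component itself, as the show parameter output above demonstrates.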

Undo - With and without index

August 15, 2005 - 3:17 pm UTC

Reviewer: Sami

Session #1
=================================================================================
/*

create table t1 as select * from dba_objects;
create index t1_idx01 on t1(object_id);

*/
SQL> select distinct sid from v$mystat;

       SID
----------
       126

SQL> insert into t1 select * from dba_objects;

48601 rows created.

=================================================================================


Session #2 
=================================================================================
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4        568

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965AFFC          2

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4        636


SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965AFFC          2

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

no rows selected

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4        754

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4       1035

SQL>

=================================================================================

I monitored rollback activities for session#1(while inserting rows) through session#2.

While USED_UBLK is gradually going up for one transaction, I noticed some other
transaction (TADDR=1965AFFC) with USED_UBLK=2.
Is it because of index t1_idx01?

I didn't see this behavior when I tested the same test case without the index.



Sometimes I noticed "no rows selected" in session#2. I assume this is the time the transaction is
switching between TADDR 1965AFFC and 1965B9F4?


Please correct me if I am wrong? 

Tom Kyte

Followup  

August 15, 2005 - 10:35 pm UTC

ddl commits, there were two transactions there

the create table and the create index.
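
A quick way to see the implicit commit, assuming a scratch table t with a column x (statements are illustrative):

SQL> insert into t(x) values (1);         -- opens a transaction
SQL> select count(*) from v$transaction;  -- shows the open transaction
SQL> create index t_i on t(x);            -- DDL: commits the open transaction first
SQL> select count(*) from v$transaction;  -- that transaction is gone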

no ddl here

August 15, 2005 - 11:04 pm UTC

Reviewer: Sami

Tom,

Sorry for the confusion. I shouldn't have put the create table and create index statements over there (though they were commented out).

The create table and create index are outside the scope of this question. The reason I put them there is that I just wanted to show you that
1) the table is created from dba_objects
2) it has an index on the object_id column


I have tested this behavior in both 8i and 10g. The following output and "no rows selected" appeared N times (more than 2) WHILE THE INSERT STATEMENT WAS GOING ON; just for clarity I truncated the output.


TADDR USED_UBLK
-------- ----------
1965AFFC 2



Tom Kyte

Followup  

August 16, 2005 - 10:46 am UTC

then I have no clue what we are in fact looking at there.

I cannot even remotely reproduce - nor is this the way it works.

the used_ublks would in this case increase, until you commit, then they go away.

1 big INSERT with joins - OR - multiple UPDATES...

August 16, 2005 - 8:05 am UTC

Reviewer: Sandeep from London, UK

Hi Tom,

I'm trying to archive data from some big tables (5 or 6 of them). Before doing that I need to "roll up" an "amount"
column within each of those big tables (based on
a "creation date" criterion). The sums of these amounts will be grouped by date (monthly sum-up) and will
need to be put into a reporting table.

The reporting table will have columns like
"Month", "Sum_Amt_BigTable1","Sum_Amt_BigTable2",
"Sum_Amt_BigTable3" and the like.

Initially there was only one BigTable (say BigTable1), so the archiving script was simple.
Just one - INSERT INTO REPORT_TABLE... SELECT.. FROM BIGTABLE1.. WHERE (the date criteria)
GROUP BY (the date column).

Now since there are more "BigTables", I have two(?) options:

1. Modify the insert script such that there is JOIN
between all BigTables, say BigTable1..BigTable6 (eeps!)
on the date criteria, and just sum-up the required
columns. For the sake of simplicity, assume that all
the tables contain at least one record for each month
(so no need to outer-join!).

2. Use the original INSERT script to create the initial
records within the REPORT table... then run multiple
UPDATEs on the REPORT table - one for
each bigtable, like this (roughly!)-
UPDATE REPORT_TABLE
SET Sum_Amt_BigTable2 = (SELECT sum(amountxx)
FROM BigTable2 WHERE (the date criteria)
GROUP BY (the date column))
WHERE REPORT_TABLE.month = LAST_DAY(TO_DATE(TO_CHAR(
BigTable2.date_column,'DD-MON-YYYY'),'DD-MON-YYYY'))
;

... and so on - for each remaining BigTable..

Which option do you think will be better in terms of performance? I suspect option 2, as I'm a bit
apprehensive about joining 6 big tables for one big insert. I will be testing this out, but wanted
to have your opinion on it.

Regards,
Sandeep

Tom Kyte

Followup  

August 17, 2005 - 10:36 am UTC

#1 of course, do as much as you can in a single sql statement.

big hash area size, nice juicy full scans -- ready to rock and roll and have it done fast.
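
A sketch of option #1 as one statement (table and column names are illustrative; extend the pattern to however many big tables there are):

insert /*+ append */ into report_table (mth, sum_amt_bigtable1, sum_amt_bigtable2)
select t1.mth, t1.amt, t2.amt
  from (select trunc(creation_date,'MM') mth, sum(amount) amt
          from bigtable1
         where creation_date < :cutoff
         group by trunc(creation_date,'MM')) t1,
       (select trunc(creation_date,'MM') mth, sum(amount) amt
          from bigtable2
         where creation_date < :cutoff
         group by trunc(creation_date,'MM')) t2
 where t1.mth = t2.mth;

Each inline view full scans and aggregates one big table down to a handful of monthly rows before the join, so the join itself touches very little data.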

big INSERT with joins - OR - multiple UPDATES

August 17, 2005 - 11:57 am UTC

Reviewer: Sandeep from London, UK

Hi,
Was thinking you might actually suggest #2 !!
Having a big join across 6 tables with multimillion
records? hmm... not sure.. but will definitely try this
out.
Did the #2 and it works well... will try to do the
#1 and get some stats to compare... somehow joins
between 6 *huge* tables does not convince me... but...
Thanks,
Sandeep

Tom Kyte

Followup  

August 17, 2005 - 1:58 pm UTC

6 tables, 60 tables -- I don't care, make SQL work for you -- NOT the other way around.


databases were BORN TO JOIN, it is what they do

Option #3?

August 17, 2005 - 2:25 pm UTC

Reviewer: Bob B from Latham, NY

If there was the likelihood of an outer join in the above scenario, it might be worth using a union all approach. Something like:

SELECT
'Big Table 1' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE1
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
UNION ALL
SELECT
'Big Table 2' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE2
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
... etc

Essentially, write the report query for each table, union all them together and then summarize the union all query as necessary:

SELECT
SUM( WHATEVER_SUM ) WHATEVER_SUM,
SUM( WHATEVER2_SUM ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM (
QUERY ABOVE
)
GROUP BY PARAM1, PARAM2



With more clear test case

August 17, 2005 - 9:12 pm UTC

Reviewer: sami

<asktom>
then I have no clue what we are in fact looking at there.

I cannot even remotely reproduce - nor is this the way it works.  

the used_ublks would in this case increase, until you commit, then they go away. 
</asktom>


Dear Tom,

While session#1 was doing a DIRECT PATH INSERT into a table which has a PRIMARY KEY, I monitored it from session#2 as below.

While USED_UBLK was gradually going up, there were 2 occurrences where a different TADDR was reported with USED_UBLK=2.

I just wanted to know what that new TADDR is.

Session#1
#############################################################################

20:37:47 SQL> select distinct sid from v$mystat;

       SID
----------
       162

20:37:50 SQL> insert /*+ append */ into t1 select * from dba_objects;

48603 rows created.

20:38:23 SQL>
#############################################################################



Session #2

#############################################################################
SQL> desc undo_monitor
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------------------

 TADDR                                              VARCHAR2(50)
 UNDO_UBLK                                          NUMBER
 TIME                                               DATE

SQL> truncate table undo_monitor;

Table truncated.

SQL> set time on
SQL>
SQL>
SQL> declare
  2  i number :=0;
  3  begin
  4  loop
  5  for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
  6  where s.taddr=t.addr
  7  and s.sid=&session_to_be_monitored)
  8  loop
  9  insert into undo_monitor values(c0rec.taddr,c0rec.used_ublk,sysdate);
 10  end loop;
 11  i := i+1;
 12  if i=100000 then
 13  exit;
 14  end if;
 15  end loop;
 16  end;
 17  /
Enter value for session_to_be_monitored: 162
old   7: and s.sid=&session_to_be_monitored)
new   7: and s.sid=162)

PL/SQL procedure successfully completed.


20:39:49 SQL> select distinct time,taddr,undo_ublk from undo_monitor order by time;

TIME      TADDR                                               UNDO_UBLK
--------- -------------------------------------------------- ----------
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C6504D4                                                    2
17-AUG-05 4C64FADC                                                    5
17-AUG-05 4C64FADC                                                    6
17-AUG-05 4C64FADC                                                    9
17-AUG-05 4C64FADC                                                   10
17-AUG-05 4C64FADC                                                   16
17-AUG-05 4C6504D4                                                    2
17-AUG-05 4C64FADC                                                   17
17-AUG-05 4C64FADC                                                   21
17-AUG-05 4C64FADC                                                   25
17-AUG-05 4C64FADC                                                   27
17-AUG-05 4C6504D4                                                    2
17-AUG-05 4C64FADC                                                   27
17-AUG-05 4C64FADC                                                   30
17-AUG-05 4C64FADC                                                   32
17-AUG-05 4C64FADC                                                   36
17-AUG-05 4C64FADC                                                   38
17-AUG-05 4C64FADC                                                   49
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52

80 rows selected.

20:39:56 SQL>

#############################################################################

 

Tom Kyte

Followup  

August 18, 2005 - 3:34 pm UTC

as I said, I cannot even REMOTELY reproduce this at all.

a date is only valid down to the second -- perhaps these rows are scrambled. Put i in that last column instead of sysdate and see.


(select to_char(time, 'dd-mon-yyyy hh24:mi:ss' ) -- see if your times are not all in the same second there.

What if standby database were used?

August 18, 2005 - 9:25 am UTC

Reviewer: Michel from France

Hi Tom,
replacing an update with CTAS and/or INSERT /*+ append */ INTO ... is especially useful because of NOLOGGING.

Now, what should we do if we are using standby databases, so that it is not a good idea to use NOLOGGING operations?





Tom Kyte

Followup  

August 18, 2005 - 4:05 pm UTC

CTAS and /*+ append */ are done with logging enabled in ARCHIVELOG mode, unless you specify "alter <segment_name> NOLOGGING" and even then the dba can "force logging" at the database level.


LOGGING happens with this stuff!!!
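
For example, at the segment and database levels (FORCE LOGGING is available from 9iR2 on; the table name is illustrative):

alter table big_table NOLOGGING;   -- segment attribute: direct path ops skip redo

alter database FORCE LOGGING;      -- DBA override: redo is generated regardless of
                                   -- segment attributes, which is what a standby needs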

Excellent

August 18, 2005 - 12:55 pm UTC

Reviewer: Peter from Philadelphia, PA USA

Hi Tom,

You are the best of best. I visit your site everyday.

I need your expertise in this regard too.

I have a fact table T
( order_id number,
customer_id number,
order_date date,
month_id number,
prd_id number,
prd_name varchar2(30),
quantity number,
zip number,
flag smallint,
swiprd_id number)
with about 5 million records imported every month.
(NOTE: one product (prd_name) may have one or more prd_ids. There are about 200 prd_names and 2,000 different prd_ids in this table.)

Oracle SERVER: Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production

T was partitioned by month_id.
A local index is built on T(customer_id, order_date, prd_name).

table T was analyzed using
dbms_stats.GATHER_TABLE_STATS(
OWNNAME=>'USER',
TABNAME=>'T',
ESTIMATE_PERCENT=>dbms_stats.auto_sample_size,
METHOD_OPT => 'for all indexed columns',
cascade => true,
degree => 4);


I need to know the following statistics each month:

1) How many "continue" customers for each prd_id (not just prd_name) in this month.
A continue customer is defined as a customer who bought a product (with a prd_id) in this month
and also bought the same product (with the same or a different prd_id) in the past 365 days, and ((previous order_date + 30*previous quantity) >= current order_date).
SET flag = 1

2) How many "restart" customers for each prd_id (not just prd_name) in this month.
A restart customer is defined as a customer who is not a continue customer, bought a product (with a prd_id) in this month,
and also bought the same product (with the same or a different prd_id) in the past 365 days, and ((previous order_date + 30*previous quantity) < current order_date).

SET flag = 2

3) How many "switch" customers for each prd_id (not just prd_name) in this month.
A switch customer is defined as a customer who is neither a continue nor a restart customer, bought a product (with a prd_id) in this month,
and also bought a different product (not just a different prd_id) in the past 365 days;
also record which prd_id they switched from.
SET flag = 3, swiprd_id = previous prd_id

4) How many "new" customers for each prd_id (not prd_name) in this month.
A new customer is defined as a customer who bought a product but never bought any product in the past 365 days.
SET flag = 4

Current solution: use a procedure to process each order

CREATE OR REPLACE PROCEDURE get_flag(v_monthid IN NUMBER)
AS
cursor c1 IS
SELECT distinct prd_id
from T
WHERE month_id=v_monthid order by prd_id;

cursor c2 is
SELECT customer_id, order_date, prd_name
FROM T
WHERE prd_id=v_prd_id and month_id = v_monthid
order by customer_id, order_date desc, quantity desc
FOR UPDATE OF flag NOWAIT;

cursor c3 -- for all orders with same customer and same
-- product during lookback
is SELECT order_date, quantity
FROM T
WHERE customer_id = v_customer_id
and prd_name = v_prd_name
AND order_date BETWEEN (v_order_date-366)
AND (v_order_date-1)
order by order_date desc, quantity desc;

cursor c4 -- for all orders with same customer and
-- different product during lookback
is SELECT prd_id
FROM T
WHERE customer_id = v_customer_id
and prd_name <> v_prd_name
AND order_date BETWEEN (v_order_date-366)
AND (v_order_date-1)
order by order_date desc, quantity desc;


TYPE customer_array is TABLE of T.customer_id%type;
TYPE order_date_array is TABLE of T.order_date%type;
TYPE prd_name_array is TABLE of T.prd_name%type;
lv_customer_arr customer_array;
lv_order_date_arr order_date_array;
lv_prd_name_arr prd_name_array;

v_prd_id T.prd_id%type;
v_customer_id T.customer_id%type;
v_order_date T.order_date%type;
v_order_date_tmp T.order_date%type;
v_prd_name T.prd_name%type;
v_quantity T.quantity%type;
v_flag PLS_INTEGER;
v_swiprd_id T.swiprd_id%type;


BEGIN
FOR lv_prd in c1
LOOP
v_prd_id :=lv_prd.prd_id;
v_customer_id :=0 ;
OPEN c2;
LOOP
FETCH c2 BULK COLLECT INTO lv_customer_arr,
lv_order_date_arr, lv_prd_name_arr
LIMIT 100;
FOR i in 1 .. lv_customer_arr.count
LOOP
IF v_customer_id <> lv_customer_arr(i)
THEN
v_customer_id := lv_customer_arr(i);
v_order_date := lv_order_date_arr(i);
v_prd_name := lv_prd_name_arr(i);
v_flag := 0;
v_swiprd_id := 0;

OPEN c3;
FETCH c3 INTO v_order_date_tmp, v_quantity ;
IF c3%NOTFOUND
THEN
OPEN c4;
FETCH c4 INTO v_swiprd_id;
IF c4%NOTFOUND
THEN -- bought another product
v_flag :=4;
ELSE
v_flag :=3;
END IF;
CLOSE c4;
ELSE -- bought same product
IF (v_order_date_tmp + 30*v_quantity)
>=v_order_date
THEN
v_flag :=1 ;
ELSE
v_flag :=2;
END IF;
END IF;
CLOSE c3;
-- UPDATE flag and swiprd_id in table T
IF v_flag =3
THEN
UPDATE T
SET flag = v_flag,
swiprd_id = v_swiprd_id
WHERE CURRENT OF c2;
ELSE
UPDATE T
SET flag = v_flag
WHERE CURRENT OF c2;
END IF;
END IF;
END LOOP;

EXIT WHEN c2%notfound;
END LOOP;
CLOSE c2;
END LOOP;

EXCEPTION
WHEN ... THEN ....;
WHEN OTHERS THEN
dbms_output.put_line(SubStr('Error '||TO_CHAR(SQLCODE)||': '||SQLERRM, 1, 255));
RAISE;
END;


After processing, I can run some queries to get the statistics information I need.


This solution is obviously very slow
(for each record 1~2 queries are invoked; in total about 5~7 million queries are called).
It takes about a week to finish the whole process.
If I divide the prd_ids into 6 different ranges and run 6 procedures with different prd_id ranges at the same time,
it takes about 2 days. But that is still too long.

I did go through some threads on your site. You suggested:

CREATE TABLE new_table as select <do the update "here"> from old_table;
............


but it is very hard to achieve <do the update "here"> here.


My questions are:

1) Is there any better way to achieve this?


2) IF this is a way to be, how to improve the performance.


Thanks,

As you requested

August 20, 2005 - 9:54 am UTC

Reviewer: sami

Dear Tom,

As you requested, I put in the sequence number (loop#).
I am able to reproduce this consistently on both 8i Solaris and 10g Windows.

Look at the TADDR difference between
  SEQUENCE TADDR                 UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
   18714 4C656458                     27 20-AUG-05 09:26:51
   18715 4C65695C                      2 20-AUG-05 09:26:51


Session#1
==========
SQL> truncate table t1;

Table truncated.

SQL> select index_name from dba_indexes where table_name='T1';

INDEX_NAME
------------------------------
T1_PK

SQL> desc t1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

SQL> select sid from v$mystat where rownum=1;

       SID
----------
       122


----*** STARTED MONITORING THIS SESSION FROM HERE ***----

SQL> insert /*+ APPEND */ into t1 select * from dba_objects;

48526 rows created.

SQL>
----*** STOPPED MONITORING THIS SESSION HERE ***----
##########################################################


Session#2
===================
 SQL> drop table undo_monitor;
 
 Table dropped.
 
 SQL> create global temporary table undo_monitor(sequence number, taddr varchar2(20), undo_ublk number, time date)
   2  ;
 
 Table created.
 
 SQL> alter session set nls_date_format='DD-MON-YY HH24:MI:SS';
 
 Session altered.
 
 SQL>  declare
   2   i number :=0;
   3   begin
   4   loop
   5   for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
   6   where s.taddr=t.addr
   7   and s.sid=&session_to_be_monitored)
   8   loop
   9   insert into undo_monitor values(i,c0rec.taddr,c0rec.used_ublk,sysdate);
  10   end loop;
  11   i := i+1;
  12   if i=100000 then
  13   exit;
  14   end if;
  15   end loop;
  16   end;
  17   /
 Enter value for session_to_be_monitored: 122
 old   7:  and s.sid=&session_to_be_monitored)
 new   7:  and s.sid=122)
 
 PL/SQL procedure successfully completed.
 
 SQL> spool c:\temp\1.log
 SQL> set lines 120 pages 300
SQL> select min(undo_ublk),max(undo_ublk) from undo_monitor;

MIN(UNDO_UBLK) MAX(UNDO_UBLK)
-------------- --------------
             1             52

SQL>
SQL> select * from undo_monitor where undo_ublk >1 and undo_ublk <52 order by sequence;
  SEQUENCE TADDR                 UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
[truncated for display purpose]

    18696 4C656458                     27 20-AUG-05 09:26:51
    18697 4C656458                     27 20-AUG-05 09:26:51
    18698 4C656458                     27 20-AUG-05 09:26:51
    18699 4C656458                     27 20-AUG-05 09:26:51
    18700 4C656458                     27 20-AUG-05 09:26:51
    18701 4C656458                     27 20-AUG-05 09:26:51
    18702 4C656458                     27 20-AUG-05 09:26:51
    18703 4C656458                     27 20-AUG-05 09:26:51
    18704 4C656458                     27 20-AUG-05 09:26:51
    18705 4C656458                     27 20-AUG-05 09:26:51
    18706 4C656458                     27 20-AUG-05 09:26:51
    18707 4C656458                     27 20-AUG-05 09:26:51
    18708 4C656458                     27 20-AUG-05 09:26:51
    18709 4C656458                     27 20-AUG-05 09:26:51
    18710 4C656458                     27 20-AUG-05 09:26:51
    18711 4C656458                     27 20-AUG-05 09:26:51
    18712 4C656458                     27 20-AUG-05 09:26:51
    18713 4C656458                     27 20-AUG-05 09:26:51
    18714 4C656458                     27 20-AUG-05 09:26:51
    18715 4C65695C                      2 20-AUG-05 09:26:51
    18716 4C65695C                      2 20-AUG-05 09:26:51
    18717 4C65695C                      2 20-AUG-05 09:26:51
    18718 4C65695C                      2 20-AUG-05 09:26:51
    18719 4C65695C                      2 20-AUG-05 09:26:51
    18720 4C65695C                      2 20-AUG-05 09:26:51
    18721 4C65695C                      2 20-AUG-05 09:26:51
    18722 4C65695C                      2 20-AUG-05 09:26:51
    18723 4C65695C                      2 20-AUG-05 09:26:51
    18724 4C65695C                      2 20-AUG-05 09:26:51
    18725 4C65695C                      2 20-AUG-05 09:26:51
    18726 4C65695C                      2 20-AUG-05 09:26:51
    18727 4C65695C                      2 20-AUG-05 09:26:51
    18728 4C65695C                      2 20-AUG-05 09:26:51
    18729 4C65695C                      2 20-AUG-05 09:26:51
    18730 4C65695C                      2 20-AUG-05 09:26:51
    18731 4C65695C                      2 20-AUG-05 09:26:51
    18732 4C65695C                      2 20-AUG-05 09:26:51
    18733 4C65695C                      2 20-AUG-05 09:26:51
    18734 4C65695C                      2 20-AUG-05 09:26:51
    18735 4C65695C                      2 20-AUG-05 09:26:51
    18736 4C65695C                      2 20-AUG-05 09:26:51
    18737 4C65695C                      2 20-AUG-05 09:26:51
    18738 4C65695C                      2 20-AUG-05 09:26:51
    18739 4C65695C                      2 20-AUG-05 09:26:51
    18740 4C65695C                      2 20-AUG-05 09:26:51
    18741 4C65695C                      2 20-AUG-05 09:26:51
    18742 4C65695C                      2 20-AUG-05 09:26:51
    18743 4C65695C                      2 20-AUG-05 09:26:51
    18744 4C65695C                      2 20-AUG-05 09:26:51
    18745 4C65695C                      2 20-AUG-05 09:26:51
    18746 4C656458                     38 20-AUG-05 09:26:51
    18747 4C656458                     38 20-AUG-05 09:26:51
    18748 4C656458                     38 20-AUG-05 09:26:51
    18749 4C656458                     38 20-AUG-05 09:26:51
    18750 4C656458                     38 20-AUG-05 09:26:51
    18751 4C656458                     38 20-AUG-05 09:26:51
    18752 4C656458                     38 20-AUG-05 09:26:51
    18753 4C656458                     38 20-AUG-05 09:26:51
    18754 4C656458                     38 20-AUG-05 09:26:51
    18755 4C656458                     38 20-AUG-05 09:26:51
    18756 4C656458                     38 20-AUG-05 09:26:51
    18757 4C656458                     38 20-AUG-05 09:26:51
    18758 4C656458                     38 20-AUG-05 09:26:51
    18759 4C656458                     38 20-AUG-05 09:26:51
    18760 4C656458                     38 20-AUG-05 09:26:51
    18761 4C656458                     38 20-AUG-05 09:26:51
    18762 4C656458                     38 20-AUG-05 09:26:51
    18763 4C656458                     38 20-AUG-05 09:26:51
    18764 4C656458                     38 20-AUG-05 09:26:51
    18765 4C656458                     38 20-AUG-05 09:26:51
    18766 4C656458                     38 20-AUG-05 09:26:51
    18767 4C656458                     38 20-AUG-05 09:26:51
    18768 4C656458                     38 20-AUG-05 09:26:51
    18769 4C656458                     38 20-AUG-05 09:26:51
    18770 4C656458                     38 20-AUG-05 09:26:51
    18771 4C656458                     38 20-AUG-05 09:26:51
    18772 4C656458                     38 20-AUG-05 09:26:51
    18773 4C656458                     38 20-AUG-05 09:26:51
    18774 4C656458                     38 20-AUG-05 09:26:51
    18775 4C656458                     38 20-AUG-05 09:26:51
    18776 4C656458                     38 20-AUG-05 09:26:51
    18777 4C656458                     38 20-AUG-05 09:26:51
    18778 4C656458                     38 20-AUG-05 09:26:51
    18779 4C656458                     38 20-AUG-05 09:26:51
    18780 4C656458                     38 20-AUG-05 09:26:51
    18781 4C656458                     38 20-AUG-05 09:26:51
    18782 4C656458                     38 20-AUG-05 09:26:51
    18783 4C656458                     38 20-AUG-05 09:26:51
    18784 4C656458                     38 20-AUG-05 09:26:51
    18785 4C656458                     38 20-AUG-05 09:26:51
    18786 4C656458                     38 20-AUG-05 09:26:51
    18787 4C656458                     38 20-AUG-05 09:26:51
    18788 4C656458                     38 20-AUG-05 09:26:51
    18789 4C656458                     38 20-AUG-05 09:26:51
    18790 4C656458                     38 20-AUG-05 09:26:51
    18791 4C656458                     38 20-AUG-05 09:26:51
    18792 4C656458                     38 20-AUG-05 09:26:51
    18793 4C656458                     38 20-AUG-05 09:26:51
    18794 4C656458                     38 20-AUG-05 09:26:51
    18795 4C656458                     38 20-AUG-05 09:26:51
    18796 4C656458                     38 20-AUG-05 09:26:51
    18797 4C656458                     38 20-AUG-05 09:26:51
    18798 4C656458                     38 20-AUG-05 09:26:51
    18799 4C656458                     38 20-AUG-05 09:26:51
    18800 4C656458                     38 20-AUG-05 09:26:51
    18801 4C656458                     38 20-AUG-05 09:26:51
    18802 4C656458                     38 20-AUG-05 09:26:51
    18803 4C656458                     38 20-AUG-05 09:26:51

    [... 85 similar rows trimmed; the third column steps through 38, 40, 45, 46 and 51 ...]

    18889 4C656458                     51 20-AUG-05 09:26:51
    18890 4C656458                     51 20-AUG-05 09:26:51
 

Tom Kyte

Followup  

August 20, 2005 - 5:15 pm UTC

Recursive SQL.

turn on sql trace and see if you are not doing SQL behind your SQL (space management, quotas, etc)....

That was likely a recursive sql transaction to update the dictionary.
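A quick way to see that recursive SQL for yourself is to trace the session. This is a generic sketch (the exact trace file location depends on your user_dump_dest setting):

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

-- ... run the statement you are measuring ...

alter session set events '10046 trace name context off';

Then format the resulting trace file with tkprof; the recursive statements (space management, quota checks, dictionary updates) will show up beneath your own SQL.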

Which is the best way to update?

August 22, 2005 - 11:45 pm UTC

Reviewer: A Reader from India

Hi Tom,

We have a fact table with a few dimension tables. Initially, records are inserted into the fact table, which populates most of the columns. The remaining columns have to be updated by joining to other tables.
1) What will be the best way to update? We are currently using correlated queries to update, but there is no index on the join column in the fact table. The source table has an index on the column when it is a key column.

2) Should we create indexes on the joining columns in the fact table? If there are many such columns, is it a good idea to create an index on each of them before updating? Will this improve performance?

3) We are currently analyzing the fact table before the update. Is this the right approach, given that we are not creating indexes on all the required columns first?

4) Should indexes be created on the table before the update, and should the table then be analyzed?

Your inputs will be highly appreciated and very helpful.

Thanks in advance.

Tom Kyte

Followup  

August 24, 2005 - 3:44 am UTC

I like to update joins

update ( select t1.a, t2.b from t1, t2 where t1.fk = t2.pk )
set a = b;

t2 would need to have a primary key in place for this.
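As a hedged illustration of why the primary key matters (table and column names here are just a sketch): without a unique constraint on the join column, Oracle cannot prove that each row of t1 maps to at most one row of t2, so the update of the join is rejected.

create table t2 ( pk number primary key, b varchar2(10) );
create table t1 ( fk number, a varchar2(10) );

update ( select t1.a, t2.b
           from t1, t2
          where t1.fk = t2.pk )
   set a = b;

-- if t2.pk were not declared unique, this would raise
-- ORA-01779: cannot modify a column which maps to a non key-preserved table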

Which is the best way to update?

August 24, 2005 - 12:17 am UTC

Reviewer: A Reviewer from India

Tom can you please answer the queries posted in the immediately previous post?

Tom Kyte

Followup  

August 24, 2005 - 9:06 am UTC

I did, update the join by primary key. That was my answer. Use the CBO.

October 02, 2005 - 1:01 pm UTC

Reviewer: Allan Sun from IL, USA

I found your answer is very helpful and wonder how I can submit my question to you.
Thanks


Tom Kyte

Followup  

October 02, 2005 - 5:35 pm UTC

when taking them, there is a button on the home page.

Calculate redolog generation

October 04, 2005 - 1:49 pm UTC

Reviewer: A reader

Tom,
Is there any way to calculate the amount of redolog that is going to be generated when some 2 million records are updated?
I have to update 2 million records (I can't recreate a dummy table etc etc). I need to know how much redolog will be generated and OS space required for the undo datafile

Tom Kyte

Followup  

October 04, 2005 - 6:23 pm UTC

not really, not without simulating at least a portion of it on some system and measuring the amount of redo.

Table/column widths, row chaining/migration, indexes involved - they'll all play a role.

I frequently do 1,000,000 to 10,000,000 records on my laptop - I would think you can test this out on a test system someplace?

it's very good

December 07, 2005 - 5:51 am UTC

Reviewer: gopal


Delete, then Insert or Update?

January 06, 2006 - 3:35 pm UTC

Reviewer: A reader

Tom,
Quick Question. while trying to modify a row in a table from a procedure, is it better[Performance wise] to delete existing one and then insert with new values or update the existing values with new values,WHY?

Thanks,

Tom Kyte

Followup  

January 06, 2006 - 4:52 pm UTC

update is less work.

you typically update a subset of the columns.

only the columns that change values and have indexes on them will affect the indexes.

doing a delete+insert would take more resources, hit all indexes.

Large update due to consolidation

January 25, 2006 - 8:41 pm UTC

Reviewer: David Piazza from San Mateo, CA

We are consolidating five databases, where the schemas are the same, into one database. A test system was created with five schemas holding the existing data from the five databases. It isn't a clean merge: the primary keys in every schema fall in the same range, so some number x will have to be added to the primary (pseudo) keys and child keys in each schema to avoid duplicates.

After reading through this topic, I can see two ways of doing the merge into the first schema. The first: CTAS, incrementing each primary and foreign key by some value x, then either insert /*+ append */ or use the MERGE command to insert/merge the data into the first schema. The second: insert /*+ append */ directly into the first schema from the tables in each of the five schemas, adding the appropriate value to the primary/foreign keys as we go.

How would you suggest implementing this?

Tom Kyte

Followup  

January 26, 2006 - 10:16 am UTC

I would try to load the "fixed" data.

Eg: if the keys that are in need of fixing are all numbers - I might multiply each key by 10 and add 1, 2, 3, 4, or 5 to each during the load.

mass insert code package

February 13, 2006 - 4:47 pm UTC

Reviewer: Brett Clark from Des Moines, IA USA

For what it's worth, here is a package that can be used as a generic foundation for mass loads. It will disable triggers and indexes and (optionally) truncate the target table before running the insert statement that you provide. Hope this helps someone.

create or replace package table_support is

  TRUNCATE_NO  constant boolean := FALSE;
  TRUNCATE_YES constant boolean := TRUE;

  procedure load_table( G_TABLE_NAME  Varchar2,
                        G_INSERT_STMT Varchar2,
                        TRUNC_ACTION  Boolean DEFAULT TRUNCATE_NO );

end table_support;
/
create or replace package body table_support is

  ACTION_ENABLE  constant boolean := TRUE;
  ACTION_DISABLE constant boolean := FALSE;

  procedure exec( inStmt Varchar2 ) is
  begin
    EXECUTE IMMEDIATE inStmt;
  end exec;

  function exec( inStmt Varchar2 ) return Number is
  begin
    exec( inStmt );
    return( SQL%ROWCOUNT );
  end exec;

  procedure tbl_triggers( G_TABLE_NAME Varchar2, action Boolean ) is
  begin
    if ( action = ACTION_ENABLE ) then
      exec( 'ALTER TABLE ' || G_TABLE_NAME || ' ENABLE ALL TRIGGERS' );
    elsif ( action = ACTION_DISABLE ) then
      exec( 'ALTER TABLE ' || G_TABLE_NAME || ' DISABLE ALL TRIGGERS' );
    end if;
  end tbl_triggers;

  procedure tbl_indexes( G_TABLE_NAME Varchar2, action Boolean ) is
  begin
    FOR C IN ( SELECT INDEX_NAME
                 FROM USER_INDEXES
                WHERE TABLE_NAME = G_TABLE_NAME ) LOOP
      if ( action = ACTION_DISABLE ) then
        exec( 'alter index ' || C.INDEX_NAME || ' unusable' );
      elsif ( action = ACTION_ENABLE ) then
        exec( 'alter index ' || C.INDEX_NAME || ' rebuild parallel nologging' );
      end if;
    END LOOP;

    if ( action = ACTION_ENABLE ) then
      exec( 'alter session set skip_unusable_indexes=false' );
    elsif ( action = ACTION_DISABLE ) then
      exec( 'alter session set skip_unusable_indexes=true' );
    end if;
  end tbl_indexes;

  procedure load_table( G_TABLE_NAME  Varchar2,
                        G_INSERT_STMT Varchar2,
                        TRUNC_ACTION  Boolean DEFAULT TRUNCATE_NO ) is
    rowCount Number;
  begin
    -- DISABLE ALL TRIGGERS
    tbl_triggers( G_TABLE_NAME, ACTION_DISABLE );

    -- TRUNCATE TABLE
    if ( TRUNC_ACTION = TRUNCATE_YES ) then
      exec( 'truncate table ' || G_TABLE_NAME );
    end if;

    -- DISABLE INDEXES
    tbl_indexes( G_TABLE_NAME, ACTION_DISABLE );

    -- ENABLE PARALLEL DML
    exec( 'alter session enable parallel dml' );

    -- EXECUTE INSERT STATEMENT
    rowCount := exec( G_INSERT_STMT );

    DBMS_OUTPUT.PUT_LINE( 'Inserted ' || rowCount || ' records into ' || G_TABLE_NAME );

    -- ENABLE/REBUILD INDEXES
    tbl_indexes( G_TABLE_NAME, ACTION_ENABLE );

    -- ENABLE TRIGGERS
    tbl_triggers( G_TABLE_NAME, ACTION_ENABLE );

    -- TURN OFF PARALLEL DML
    exec( 'alter session disable parallel dml' );

  end load_table;

begin
  -- Initialization
  NULL;
end table_support;
/


mass update

February 17, 2006 - 5:19 pm UTC

Reviewer: Hans from Canada

I have a table with 50 columns and 10 million records, and a file with 10 million rows. I have to update 5 columns in the table from the file (matching rows on a common ID value).
Steps I am performing:
1. open the file using utl_file
2. read each record
3. perform the update in the table
The process is very slow.
I would appreciate better ideas from the group.

Tom Kyte

Followup  

February 18, 2006 - 8:16 am UTC

external table (so you can query the flat file), or sqlldr it into a temporary table.

and then merge.

If you are in 9i, you'll have to:

merge into db_table
using (select * from external_table where id in (select id from db_table))
on (db_table.id = external_table.id)
when matched then update ...
when not matched then insert (id) values (null); <<<=== never happens


in 10g, you can just

merge into db_table
using external_table
on (db_table.id = external_table.id)
when matched then update ...;
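For completeness, a minimal external table over such a flat file might look like this (the directory path, file name, delimiter and column list are all assumptions; adjust them to your file's actual layout):

create or replace directory load_dir as '/tmp';

create table external_table
( id   number,
  col1 varchar2(30),
  col2 varchar2(30)
)
organization external
( type oracle_loader
  default directory load_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
  )
  location ( 'data.txt' )
);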




February 18, 2006 - 5:37 am UTC

Reviewer: A reader


Morning TOM.

February 19, 2006 - 12:14 am UTC

Reviewer: sudip from BGALORE, india

What is archieve log mode? why we prefer recovery in Archieve log mode? Please, will u brief it?

Tom Kyte

Followup  

February 19, 2006 - 9:06 am UTC

"U" is not here
I have never met "U"

If you know how to contact them - I'd really appreciate a pointer on doing so. They get a lot of requests.


If you don't mind, I'll give you a pointer to what archivelog mode is (until we can find this "U" person)

</code> http://docs.oracle.com/docs/cd/B19306_01/backup.102/b14192/strategy003.htm#sthref127 <code>

:)

February 19, 2006 - 12:17 pm UTC

Reviewer: A reader

To sudip from BGALORE, india

Looks like you are new to this site. Just keep one thing in mind: "U" really ticks Tom off, so don't use it next time :-)

Tom Kyte

Followup  

February 19, 2006 - 5:16 pm UTC

Hey, the page he used to post that actually said "if you do this, I will do what I just did..."

Which is to poke fun at them :)

It amazes more than ticks me off.

Just like in some parts of the world, an "English Accent" adds 10 IQ points (when I hear a British accent, for whatever reason - I add default IQ points, maybe it was all of the James Bond movies when I was a kid) - using this instant message speak takes them away.

Large Update Due To Consolidation

February 21, 2006 - 4:29 pm UTC

Reviewer: David Piazza from San Mateo, CA

Tom,

Regarding my previous question: they've decided to merge one of the five clusters at a time. I've got scripts to implement the first cluster as such:

CTAS NOLOGGING setting the keys to old_key*10+1
insert /*+ append */
DROP old table
RENAME new table
GRANTs
ALTER TABLE add Primary Key
ALTER TABLE add Foreign Key
CREATE INDEX NOLOGGING parallel(degree 4)
ALTER TABLE ADD CONSTRAINT UNIQUE ENABLE
ALTER TABLE Logging

This seems pretty fast. Anything you see that will make it faster?

When I merge in the next clusters, should I:
- Do the same steps as above(*10+2,*10+3 etc.)
- Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
- Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints

The number of rows in the tables(Approximately the same in each cluster) are: 72338677,32278464,25347185,23834984,7595305,4519904,4371347,1229877,1168373,6669752,6382030,97250,4694,934,166

-Thanks


Tom Kyte

Followup  

February 22, 2006 - 8:26 am UTC

why not all of them at once to avoid having to do the disable/enable over and over.

Large Update Due To Consolidation

February 22, 2006 - 1:28 pm UTC

Reviewer: David Piazza from San Mateo, CA

That was my suggestion, but the customer wants to do one cluster at a time, especially the first one, ensuring that it works as before and all the users are happy before merging another cluster in...

Tom Kyte

Followup  

February 22, 2006 - 2:09 pm UTC

then you got it.

You don't really need to disable all constraints, and if you could get it done in two bits - that would be the least work (load cluster 1; then 2, 3, 4 and 5 after testing 1)

Large Update Due To Consolidation

February 23, 2006 - 4:23 pm UTC

Reviewer: David Piazza from San Mateo, CA

Ok. Which of the three methods would you suggest to merge in the remaining clusters, or would you recommend something else?

1) Do the same steps as above, CTAS/DROP old table/Rename
new table to old table/Insert /*+ append(*10+3,*10+4
etc.)/Create constraints,indexes.
OR
2) Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
OR
3) Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints
OR
4) Something else

I'm leaning towards 2) or 3)

-Thanks


Tom Kyte

Followup  

February 23, 2006 - 8:05 pm UTC

I would just be disabling indexes (if I wanted to - insert append does a rather efficient "merge" of data) and insert appending the other data. I would not redo the CTAS over.

Bulk Insert / Update in single SQL

February 24, 2006 - 1:49 am UTC

Reviewer: Narendra from India

Tom,

I have a target table that stores current as well as history records. Current records are identified by a field CURRENT_REC having value of 'Y'. History records have value of CURRENT_REC as 'N'. At any stage, for a PART_NO, there can be only SINGLE record with CURRENT_REC as 'Y' in target table.
My problem is I want to transfer data (in bulk) from source table to target table with CURRENT_REC flag as 'Y' and simultaneously update existing records having CURRENT_REC as 'Y'. Following is table definition:
Source table:
CREATE TABLE PART_SRC
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)
target Table:
CREATE TABLE PART_TABLE
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)

Is this possible in a single SQL ?

Tom Kyte

Followup  

February 24, 2006 - 8:25 am UTC

I don't get it.

I want to transfer data (in bulk) from source table to target table with current_rec flag as 'Y'

I understand that.

the "simultaneously update existing records have a current-rec as 'Y'"

which existing records, in which table, and update what exactly (since we are already looking only at 'Y' records - what exactly are we updating??!?!)

Bulk Insert / Update in single SQL

February 24, 2006 - 9:50 am UTC

Reviewer: Michel Cadot from France

If I understand you correctly, you want something like this (inspired by the "merge delete" or "insert all" statements):

merge into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then
update t.current_rec = 'N'
insert (t.part_no, ...) values (s.part_no, ...)
when not matched then
insert (t.part_no, ...) values (s.part_no, ...)
/

or

merge ALL into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then update t.current_rec = 'N'
when TRUE then
insert (t.part_no, ...) values (s.part_no, ...)
/

But this does not exist. ;)

Michel


More details

February 24, 2006 - 9:28 pm UTC

Reviewer: Narendra from India

Tom,

One correction to the table definitions: the source table does not contain the CURRENT_REC field.

What I am trying to achieve is to transfer data from source to target. Each time, data is appended to the target table. I may transfer updated details of a PART_NO multiple times from source to target, but I do not want existing data in the target table to be overwritten. To identify the "last loaded record", I maintain the CURRENT_REC field in the target table.

If a PART_NO (from the source table) does not exist in the target table, it should simply be added with CURRENT_REC = 'Y'. However, if the PART_NO already exists, i.e. data has been transferred earlier for that PART_NO, the target table will already have a single record with CURRENT_REC = 'Y' and may have multiple records with CURRENT_REC = 'N' for the same PART_NO. When I transfer data for that PART_NO again, it should be added with CURRENT_REC = 'Y', and at the same time the existing record with CURRENT_REC = 'Y' should be updated to 'N'.

Hope I am clear.

Michel,
MERGE will not help since CURRENT_REC will not be there in my source table.

Thanks

More details

February 25, 2006 - 8:45 am UTC

Reviewer: Michel Cadot from France

Narendra,

I understood this and the pseudo-merges don't use the current_rec of source (s) but only of target (t). It would be clearer if I wrote "insert (t.part_no, ..., t.current_rec) values (s.part_no, ..., 'YES')".

There is no way to do it in a single SQL statement but it is easy to do it in an efficient PL/SQL block.

Regards
Michel


Merging ten large tables

March 17, 2006 - 7:00 pm UTC

Reviewer: Jesse Buzzard from VA - USA

Good Evening,

This is my first foray into very large databases and I need some advice.

We have ten tables that contain blob data. The total size of the tables is approximately 1.8TB. They are stored in the same tablespace. How this came to be is a long story for another discussion thread. Our task is to merge the tables into one table. After selecting the table with the largest number of rows as the target, we came up with the code below to insert the data from another table into it.
-------- start of code -------
#: Announce start insert

echo "-------------------------------------------------------------"
echo "`date '+ %a %e-%b-%Y %X'` - $0 - started" >> $1
echo "-------------------------------------------------------------"

$ORACLE_HOME/bin/sqlplus /nolog << EOF
connect odirms/odirms

insert /* +APPEND PARALLEL (nto, 8) */ into odirms.t_ompf4 nto
select /* +APPEND PARALLEL (tompf1, 8) */ * from t_ompf1 tompf1;
commit;
exit

EOF

#: End of program
echo "`date '+ %a %e-%b-%Y %X'` - $0 - completed" >> $1
echo "-------------------------------------------------------------"
echo
------------ end of code ----------

At present we are trying to merge 4 of the tables. All logging and indexes are off. Multi threading is on. From all I have read we should have completed this in a relatively short amount of time. This job has been running for over 28 hours. I can see the space used is growing (about 300MB /minute), but my supervisors would like to know when we will have all the tables merged.
Where have we gone astray?

Thank You,
Jesse


Tom Kyte

Followup  

March 18, 2006 - 11:55 am UTC

well, you said "all indexes are off" and "we have blobs" - the two concepts do not go together.  every blob has an index associated with it that cannot possibly be disabled for any reason (short of dropping the blob column!)

So, you do have logging going on, and you do have index maintenance going on. 

but you don't give any information as to the size of the tables being read (and hence written into the new table). 

I don't think anything has "gone astray" here, but I can say that your "hint" isn't a hint - it is just text you typed in:

ops$tkyte@ORA10GR2> create table t as select * from dual;

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /* +APPEND */ into t select * from dual;

1 row created.

ops$tkyte@ORA10GR2> select * from t;

D
-
X
X

ops$tkyte@ORA10GR2> insert /*+ APPEND */ into t select * from dual;

1 row created.

ops$tkyte@ORA10GR2> select * from t;
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel



See how the /* + didn't really do anything, but /*+ does.  If that is your real script, you likely did not achieve your goal of either append nor parallel (parallel implies append by the way, if it was in fact using parallel)

 

Merging ten large tables

March 20, 2006 - 12:15 am UTC

Reviewer: Jesse Buzzard from VA - USA

Thank you for your response.
If I may, I would like to impose on your wisdom again. We are averaging 1800MB per hour. Yes, the sample I provided is one of our real scripts, and yes, the goal was to work in parallel. At present five of the tables (56% of the rows) have been merged. The total size of the data is 1.8 TB in 32,518,931 rows. As you can see from the description below, the amount of data in addition to the blob data is rather small. If we had gotten the hint correct, would we have achieved a better insert/append rate?

Name Null? Type
----------------------------------------- -------- -----------

OMPF_ID NOT NULL NUMBER
OMPF_UID NUMBER(10)
OMPF_IMG_TYPE_ID CHAR(1)
OMPF_IMG_STYPE_ID CHAR(1)
OMPF_PAGE_NUM NUMBER(5)
OMPF_ACTIVE_FLAG CHAR(1)
OMPF_ADDED_DT DATE
OMPF_INACTIVE_DT DATE
OMPF_IMAGE BLOB
OMPF_LOCK_FLAG NUMBER(5)
OMPF_USER_ID CHAR(8)




Tom Kyte

Followup  

March 20, 2006 - 7:16 am UTC

you might have, you might not have. you sort of have to figure out where you are spending your time (tracing - a level 12 10046 trace on a bit of this would go a long way)

Alternative Suggestion

March 21, 2006 - 2:40 pm UTC

Reviewer: VK from Atlanta

Here is an approach based on exchange partitions I used successfully in the past for combining tables (This will work only if the data in the individual tables is distinct and there is no overlap - A scenario in which this will work well is when people in their infinite wisdom create horizontally "partitioned" tables. An example may be where all customers with ID 1 to one million are kept in CUST1 table and one million to 5 million in CUST2 table and so on..):

4 tables to be combined:

create table test(a number,b blob);
create table test1(a number,b blob);
create table test2(a number,b blob);
create table test3(a number,b blob);

Now, let's say table test has values 1 to 10,000 in column a and test1 has 10,001 to 20,000, test2 has 20,001 to 30,000 and test3 has 30,001 to 40,000.

Create a partitioned table that looks exactly like the tables being combined - pick a partition key where the values do not overlap between tables.

create table testpart(a number,b blob)
partition by range(a)
(
partition p1 values less than (10001),
partition p2 values less than (20001),
partition p3 values less than (30001),
partition p4 values less than (maxvalue)
);

You can combine the 4 tables into the partitioned table using exchange partition:

alter table testpart exchange partition p1 with table test;
alter table testpart exchange partition p2 with table test1;
alter table testpart exchange partition p3 with table test2;
alter table testpart exchange partition p4 with table test3;

At this point, you can use the partitioned table or if you don't want the table to be partitioned, you can merge all 4 partitions into a single partition and exchange with a non-partitioned table.

alter table testpart merge partitions p1,p2 into partition p5;
alter table testpart merge partitions p5,p3 into partition p6;
alter table testpart merge partitions p6,p4 into partition p7;

Now we have just one partition (P7) in the table and you can exchange this with any of the original tables (we'll use test).

alter table testpart exchange partition p7 with table test;

Now we have all 40,000 rows in table test. As you can see, it is a lot of work to combine 4 small tables but if you have tens/hundreds of millions of rows, this will be much faster.

Merging ten large tables

March 21, 2006 - 5:03 pm UTC

Reviewer: Jesse Buzzard from VA - USA

Tom,
VK,

Thank you for your suggestions. They are educational and helpful. The merges have nearly completed. Now I am off to read about reorganizing the tablespace to reclaim some of the space formerly used by the smaller tables.




Very slow insert into table

April 17, 2006 - 12:16 am UTC

Reviewer: bayaraat from Mongolia

Hi Tom

I need your expertise in this regard.
My table is
create table CDR00
(
CALLDATETIME DATE,
TELNO VARCHAR2(20),
CALLEDNO VARCHAR2(20),
SECONDS NUMBER,
PULSE NUMBER,
PREFIXCODEA CHAR(14),
PREFIXCODEB CHAR(14),
RATECODE CHAR(28),
TOG_AMOUNT NUMBER(20,2),
USD_AMOUNT NUMBER(20,8),
NO_TOG_AMOUNT NUMBER(20,8),
NO_USD_AMOUNT NUMBER(20,2),
TOG_TAX NUMBER(20,8),
USD_TAX NUMBER(20,8),
FNAME VARCHAR2(12),
OPERATOR VARCHAR2(10),
RESERVED VARCHAR2(10),
PRINTED NUMBER(1)
)
tablespace CDR01
pctfree 20
pctused 80
initrans 1
maxtrans 255
storage
(
initial 1M
minextents 1
maxextents unlimited
);
Every day we insert about 2 million records into this table, move them into a partition table, and then truncate this one. The insert process is very slow: it takes about 12 hours, and I want to load that many records quickly. I am using RAID 5. There are no constraints or triggers on the table. Archiving runs non-stop because the database has a standby database. Any suggestions please...



Tom Kyte

Followup  

April 17, 2006 - 7:51 am UTC

if done correctly, it would only take minutes or less to bulk load 2 million records.

So, you need to describe HOW you are loading this.

and if you are loading and truncating, it is doubtful that you are updating - so pctfree=0 would be more appropriate, and perhaps even "compress" - but it depends on how you load, after all - which we don't know.
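A direct path load of such a daily feed might be sketched like this (stage_cdr is an assumed staging or external table holding the incoming rows; also note that with a standby database in place, NOLOGGING operations are not propagated, so check whether the database is in FORCE LOGGING mode before relying on this):

insert /*+ append */ into cdr00
select * from stage_cdr;

commit;  -- a commit is required before cdr00 can be queried again after APPEND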

Moving high volumes of data from table1 to table2 in an OLTP system

May 10, 2006 - 10:54 am UTC

Reviewer: A reader

Tom,

We are looking at archiving data in a table (schema1.table1) having about 40 million records. This table is populated continuously by a java application. The objective is to move all but the latest 5 million records to the archive table with minimal impact to the end user. The option I am looking at is:

create a different schema (schema2) on tablespace2.
create a new table (table2), structure similar to table1.
create a procedure(p_1) that will copy data from schema1.table1 to schema2.table2.
call procedure 1 from a job on a certain schedule.

Pseudo code for procedure p_1
----------
insert into schema2.table2
select a.*
from schema1.table1 a
where year(a.creation_date) < 2006
and rownum < 1000
and a.id not in (select id from schema2.table2);
commit;

delete from schema1.table1 aa
where aa.id in (select bb.id from schema2.table2);
commit;
----------

Would like your expert comments on
1) Do you see any problems with the pseudo code?
One of the problems I see is:
After step #1 and before step #2 in the pseudo code, a user may see duplicate records. Is there a way to avoid this situation?

2) Is there a better of implementing this change?

Thank you


Tom Kyte

Followup  

May 11, 2006 - 7:31 am UTC

I'd probably rather move the 5 million records to a new table.

(move the lesser of the two amounts)

time to look at partitioning for you? to avoid this in the future.
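A sketch of what that partitioning could look like (the table, column and partition names are assumptions): range partition by the creation date, so that "archiving" becomes a metadata operation instead of moving 35 million rows with insert+delete.

create table table1_part
( id            number,
  creation_date date,
  payload       varchar2(100)
)
partition by range (creation_date)
( partition p2004 values less than (to_date('01-01-2005','dd-mm-yyyy'))
, partition p2005 values less than (to_date('01-01-2006','dd-mm-yyyy'))
, partition pmax  values less than (maxvalue)
);

-- archive a whole year by swapping it with an empty table of the same shape:
alter table table1_part exchange partition p2004 with table archive_2004;
-- or drop it outright once it has been archived elsewhere:
alter table table1_part drop partition p2004;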



Updating a single column to 'N' in a 50 Million records table

May 15, 2006 - 5:34 pm UTC

Reviewer: A reader

Tom,

In one of our applications, we have a database conversion job that updates a column to 'N' in a table that has about 50 million rows.
We are using a rollback segment which has the capacity to hold about 8 million rows of undo at a time.
So we run 8 identical update statements, committing roughly 7 million rows at a time
(it takes 8 runs of the update to commit all 50 million changes).

The time for the conversion of these 50 million rows is taking about 20 hours.

We have already explored the following CTAS method :

1. CTAS / INSERT APPEND PARALLEL and rename the new table to the original table. But because this particular table is quite large, the DBAs say this method is not feasible due to space constraints.

So we are looking for other methods/suggestions to see if we can improve the timing.

Thank you.

Tom Kyte

Followup  

May 15, 2006 - 6:08 pm UTC

parallel update, using many rollback segments simultaneously.

I pray this column is NOT indexed.
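A minimal parallel DML sketch (the degree of 8 is an assumption; each parallel execution server runs its own transaction against its own rollback segment, which is what spreads the undo load across many segments):

alter session enable parallel dml;

update /*+ parallel(t, 8) */ big_table t
   set flag_col = 'N';

commit;  -- the commit ends the parallel transaction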



Parallel update question

May 18, 2006 - 4:22 pm UTC

Reviewer: A reader

Tom,

Per the Oracle documentation, it seems tables must be partitioned to support parallel update and delete.

So I tried a parallel update on a partitioned table as shown below, but how do we confirm it really did parallel DML? After the parallel update was done, I queried the same table and it did not generate any error.

I see the same behaviour on non-partitioned tables when a parallel update was performed.


Parallel update statement logs are given below :

Partitioned table :
===================

SQL> update /*+ parallel (T, 3) */ T set status='NV' where status='INVALID' ;

1410 rows updated.


Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=11 Card=138 Byte
          s=690)

   1    0   UPDATE OF 'T'
   2    1     PARTITION RANGE* (ALL)                                   :Q214000
   3    2       TABLE ACCESS* (FULL) OF 'T' (Cost=11 Card=138 Bytes=69 :Q214000
          0)



   2 PARALLEL_COMBINED_WITH_PARENT
   3 PARALLEL_TO_SERIAL            SELECT /*+ Q214000 NO_EXPAND ROWID(A1) */ A1
                                   .ROWID,A1."STATUS" FROM "T" PX_GRANULE(0, BL
                                   OCK_RANGE, DYNAMIC)  A1 WHERE A1."STATUS"='I
                                   NVALID'



Statistics
----------------------------------------------------------
         19  recursive calls
       1733  db block gets
        263  consistent gets
        168  physical reads
     351584  redo size
        854  bytes sent via SQL*Net to client
        597  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
       1410  rows processed

IOMDV:SQL> select count(*) from T ;

    COUNT(*)
------------
       12239

1 row selected.


Non-partitioned table :
========================

update /*+ parallel(tn 3) */ tn set valid_ind='N' ;

29250 rows updated.


Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=14 Card=29250 By
          tes=58500)

   1    0   UPDATE OF 'TN'
   2    1     TABLE ACCESS* (FULL) OF 'TN' (Cost=14 Card=29250 Bytes=5 :Q212000
          8500)



   2 PARALLEL_TO_SERIAL            SELECT /*+ Q212000 NO_EXPAND ROWID(A1) */ A1
                                   .ROWID,A1."VALID_IND" FROM "IOMDBO"."TN" PX_
                                   GRANULE(0, BLOCK_RANGE, DYNAMIC)  A1



Statistics
----------------------------------------------------------
         73  recursive calls
     143947  db block gets
        244  consistent gets
        222  physical reads
   18118440  redo size
        844  bytes sent via SQL*Net to client
        576  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
      29250  rows processed

bala@IOMDV> select count(*) from tn ;

    COUNT(*)
------------
       29250

1 row selected.

 

Tom Kyte

Followup  

May 19, 2006 - 11:42 am UTC

That used to be true in older releases; it has not been true since 9i.


http://docs.oracle.com/docs/cd/B10501_01/server.920/a96531/ch2_9ir2.htm#74432

May 19, 2006 - 3:05 pm UTC

Reviewer: A reader

Thanks for the info.



In Oracle 8.1.7, however, with parallel DML on a partitioned table, I should not be able to query the same table until I commit the parallel update.
But I was able to query the same table without any error generated.

Is this the expected behaviour?

Tom Kyte

Followup  

May 20, 2006 - 4:39 pm UTC

you did not do a parallel update.

<quote src=expert Oracle database architecture>

We can actually observe the fact that there are separate independent transactions created for the parallel execution servers. We’ll use two sessions again, as before. In the session with SID=162, we explicitly enable parallel DML. PDML differs from parallel query in that regards; unless you explicitly ask for it, you will not get it:

big_table@ORA10GR1> alter session enable parallel dml;
Session altered.

The fact that the table is “parallel” is not sufficient, as it was for parallel query. The reasoning behind the need to explicitly enable PDML in your session is the fact that PDML has certain limitations associated with it, which I list after this example.

</quote>
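After enabling PDML as the quote describes, one way to confirm the update really was parallelized is the session-level parallel statistics view (a sketch; table and hint names are hypothetical):

```sql
alter session enable parallel dml;

update /*+ parallel(t, 4) */ t
   set status = 'NV'
 where status = 'INVALID';

-- before committing, check whether the DML actually ran in parallel
select statistic, last_query
  from v$pq_sesstat
 where statistic = 'DML Parallelized';  -- last_query = 1 means it did
```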

Thank you

May 20, 2006 - 8:08 pm UTC

Reviewer: A reader


RE: Parallel update

May 23, 2006 - 8:20 am UTC

Reviewer: bakunian from OC, CA

I have a table "A" of 17 GB and 120 million rows. I needed to add a NOT NULL column, so I had to add the column, update every row to set the new column to some value, then make it NOT NULL. The update ran for almost 10 hours, so I thought to create an empty dummy table "B" as select * from "A", setting the column's default value to the required value.
To my surprise when I did
INSERT /*+ parallel (b, 4) */ INTO b SELECT /*+ parallel (a,4) */ * FROM a
Default values were not inserted but when I issued regular insert into b statement default values appeared. Why?

Thank you for your time.


Tom Kyte

Followup  

May 23, 2006 - 8:38 am UTC

show me - you are using "*" and not listing any columns - I do not see how a conventional path insert would have used defaults.

column defaults are only used when you explicitly skip the column in the insert statement.

ops$tkyte@ORA10GR2> create table t2( x varchar2(20), y int );
 
Table created.
 
ops$tkyte@ORA10GR2> insert into t2 values ( 't2', null );
 
1 row created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /*+ append */ into t (x) select 'direct path' from dual;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> insert /*+ append */ into t  select * from t2;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> select * from t;
 
X                             Y
-------------------- ----------
direct path                   0
t2
 
ops$tkyte@ORA10GR2> insert into t (x) select 'direct path' from dual;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> insert into t  select * from t2;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> select * from t;
 
X                             Y
-------------------- ----------
direct path                   0
t2
direct path                   0
t2
 
 

Parallel Update

May 23, 2006 - 10:07 am UTC

Reviewer: bakunian from OC, CA

I see, so for my trick to work all I had to do was skip the columns that were supposed to have default values. Correct?
At the very beginning of this thread you gave the example below, but that does not really work for me. What would you do in my case, where I need to populate millions of rows with default values for new columns added to a large table with a NOT NULL constraint? Was my method almost correct?

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;

Thank you,

Tom Kyte

Followup  

May 23, 2006 - 10:11 am UTC

well, since you are going to update the entire table - hence locking every row - and you in effect have a "scheduled downtime", I don't agree with you that you cannot use the create table as select. You say "it doesn't really work for you", I say "why not"

Parallel Update

May 23, 2006 - 10:34 am UTC

Reviewer: bakunian from OC, CA

Forgive me I've been up for almost 48 hours and a bit slow so using your example would it look something like below assuming every person in this table will have ename = 'Bob'

create table new_emp as
select empno, 'Bob', job, MGR, ….
From emp;

Correct?

Tom Kyte

Followup  

May 23, 2006 - 3:26 pm UTC

select empno, 'Bob' ename, job, mgr, ...
from emp


yes.
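Putting the whole exchange together, the CTAS route from the top of the thread applied to this case might look like the sketch below (hypothetical; the cast keeps the new column a VARCHAR2, and the NOT NULL constraint is added after the copy):

```sql
create table new_emp nologging parallel 4 as
select empno, cast('Bob' as varchar2(10)) ename,
       job, mgr, hiredate, sal, comm, deptno
  from emp;

alter table new_emp modify (ename not null);

-- recreate indexes, grants and remaining constraints on new_emp, then:
drop table emp;
rename new_emp to emp;
```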

Estimate time for a create table to end

May 26, 2006 - 3:56 am UTC

Reviewer: Thierry Yen Suin from Mauritius

We once had to sort a very large (non-partitioned) table in the region of 60G.
The sorted column was Type NUMBER and indexed.
So what was/is regularly done is

create table sorted_table
as select * from unsorted_table
where col_to_be_sorted>0;

The sort normally takes 3 hours to finish, using the index on col_to_be_sorted. As I understand it, this create table does not need the TEMP tablespace to do the sort.

Once the sort did not finish after 35hours.
Database was 8i. We stopped the create statement.
Rebuilt the index (out of desperation, who knows?).
Reissued the create table. -> never finished.
Did an export/import into 9i on another server.
Created index, issued same create statement, took less than 1hr.

My question:
Is there a way to estimate the time it will take for a create table to finish based on some V$ views?
A way to see the number of rows that have been inserted/created into the temp segment of the "to be created" table?
Or anything that gives us a clue that it will take forever.

Thanks,
Thierry Yen Suin

Humm V$SESSION_LONGOPS

June 01, 2006 - 1:22 am UTC

Reviewer: Thierry Yen Suin from Mauritius

Of Course the V$SESSION_LONGOPS view!
How could I have forgotten that?

Thierry Yen Suin.
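The view can be queried like this while the create table runs (a generic sketch; rows appear only for operations Oracle classifies as long-running, such as full scans and sorts):

```sql
-- watch progress and the server's own time-remaining estimate
select sid, opname, target, sofar, totalwork,
       round(100 * sofar / totalwork, 1) pct_done,
       time_remaining
  from v$session_longops
 where totalwork > 0
   and sofar < totalwork;
```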

Am I missing something?

June 13, 2006 - 4:18 am UTC

Reviewer: A reader

Tom, 

I did the following to calculate the redo generated by CTAS with and without nologging, but the amount of redo generated is nearly identical and the time taken is even longer with nologging. Am I missing something here?

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for Solaris: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production


SQL> set timing on
SQL> set time on
13:25:17 SQL> drop table big_table;

Table dropped.

Elapsed: 00:00:00.67

13:25:52 SQL> select a.value
13:26:38   2  from v$mystat a , V$STATNAME b
13:26:38   3  where a.statistic# = b.statistic#
13:26:38   4  and b.name = 'redo size'  ;

     VALUE
----------
     54144

Elapsed: 00:00:00.29

13:26:38 SQL>
13:26:46 SQL> create table big_table
13:27:01   2  as select * from dba_objects;

Table created.

Elapsed: 00:00:08.56

13:27:10 SQL> select a.value
13:27:22   2  from v$mystat a , V$STATNAME b
13:27:22   3  where a.statistic# = b.statistic#
13:27:22   4  and b.name = 'redo size'  ;

     VALUE
----------
    109040

Elapsed: 00:00:00.33

13:27:23 SQL> drop table big_table;

Table dropped.

Elapsed: 00:00:00.19

13:27:57 SQL> select a.value
13:28:02   2  from v$mystat a , V$STATNAME b
13:28:02   3  where a.statistic# = b.statistic#
13:28:02   4  and b.name = 'redo size'  ;

     VALUE
----------
    134372

Elapsed: 00:00:00.36

13:28:03 SQL> create table big_table  nologging
13:28:12   2  as select * from dba_objects;

Table created.

Elapsed: 00:00:09.04

13:28:21 SQL> select a.value
13:28:46   2  from v$mystat a , V$STATNAME b
13:28:46   3  where a.statistic# = b.statistic#
13:28:46   4  and b.name = 'redo size'  ;

     VALUE
----------
    188816

Elapsed: 00:00:00.24
 

Tom Kyte

Followup  

June 13, 2006 - 12:18 pm UTC

you are in a noarchivelog mode database aren't you.


run it again, it was a fluke that it took 9 seconds.  something else was happening on your system

nologging on a CTAS in a noarchivelog mode database is a big "no-operation", CTAS is already a direct path, non-logged operation in noarchivelog mode.

You would expect the "logged" one to have generated gobs of redo (of the size of the table).  not 50k


Here we are in ARCHIVELOG MODE:

ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size               3568320
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.53
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size               6998672        3,430,352
 
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size               7021144
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size               7088872           67,728
 
Elapsed: 00:00:00.00


<b>That yours did not generate more than 50k of redo makes me believe you are in noarchivelog mode like this:

</b>

ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.03
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size                205344
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size                273312           67,968
 
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size                295856
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.45
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size                363612           67,756
 
Elapsed: 00:00:00.03


<b>and the 9 seconds - totally because something else was happening on your system at that point that prevented you from being "fast"</b>

 

Absolutely correct as usual

June 14, 2006 - 1:18 am UTC

Reviewer: A reader

SQL> SELECT log_mode
  2  FROM v$database;

LOG_MODE
------------
NOARCHIVELOG

Thanks very much Tom. 

Merge with aggregates.

July 07, 2006 - 10:48 am UTC

Reviewer: Thiru from MT usa

Tom,

I think I posted this wrongly elsewhere!
-----------------------------------------
Will merge be a good thing to use (and if so, how) when dealing with aggregates in
multi-million record tables?

For instance:

create table source ( a varchar2(2),b number)
/
create table target ( a varchar2(2),b number,status varchar2(10))
/
insert into source values('ab',100);
insert into source values('bc',100);
insert into source values('cd',100);
insert into source values('ef',100);
insert into source values('fg',100);
insert into source values('gh',100);
insert into source values('hi',100);
insert into source values('ij',100);
insert into source values('jk',100);
insert into source values('kl',100);
insert into source values('mn',100);

insert into target (a,b) values('ab',50);
insert into target (a,b)values('ab',50);
insert into target (a,b) values('bc',50);
insert into target (a,b)values('bc',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b) values('ef',50);
insert into target (a,b)values('ef',50);
insert into target (a,b)values('gh',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
commit;

I would want to update the target STATUS column to 'MCHED' if the sum(target.b)
= source.b and target.a=source.a.
In all other cases, update the STATUS column to 'UNMCHED'. The number of records
in both the tables are over 2mill.

b. Also, can this be done in a single update statement?
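Tom answered this in the other thread; purely as a hedged sketch built only from the DDL above (not the answer given there), one single-statement formulation might be:

```sql
-- mark each target row MCHED when a source row for the same key
-- matches the per-key sum of target.b, else UNMCHED
update target t
   set t.status = case
         when exists (select 1
                        from source s
                       where s.a = t.a
                         and s.b = (select sum(t2.b)
                                      from target t2
                                     where t2.a = t.a))
         then 'MCHED'
         else 'UNMCHED'
       end;
```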



Tom Kyte

Followup  

July 08, 2006 - 10:52 am UTC

look elsewhere for the answer, I did it over there.

Large import question

July 12, 2006 - 3:57 pm UTC

Reviewer: Mike from Glen Mills, PA

I have visited this thread on and off for a while. I have a question regarding an apparent slowdown of inserts while importing a very large table. We are on 9.2.0.5 on Solaris 8.

I am importing a 60 million row table into our development database. (I just got more space for it, wooohooo!). Not long ago, I found a very nifty script written by Frank Naudé for monitoring import progress, as follows:

select substr(sql_text,instr(sql_text,'INTO "'),30) table_name, rows_processed, 
round((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60,1) minutes,
         trunc(rows_processed/((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60)) rows_per_min
  from   sys.v_$sqlarea
  where  sql_text like 'INSERT %INTO "%'
    and  command_type = 2
    and  open_versions > 0
/

When the table in question started importing, this script returned very high rows per minute values (138905), but as the import progressed, it has fallen to 33779 per minute after a few hours of inserting. I am doing the import from a command line and have the buffer set to 4m, recordlength 64k, commit=n, no statistics (will do entire schema afterward). Can you theorize as to why the performance would erode as the import of a large table progresses? Our box has 4g of memory and 4 cpus. My SGA is about 1G with a healthy cache and auto undo, other parameters would likely pass a sanity check. 


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION         524272       3.8       139805

1 row selected.

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION         676480       5.2       129263

1 row selected.


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION         744128       6.5       114775

1 row selected.

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1234576      12.9        96075


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1471344      17.3        84885

                                                         

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1877232      26.6        70484

                                                                          

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1978704      28.7        68904

1 row selected.



TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        2114000      32.2        65618

1 row selected.



TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        2824304      50.4        56019

1 row selected.

SQL> /
Wed Jul 12                                                                     
                                                                           


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        3602256      78.5        45888

1 row selected.


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        4126528      97.8        42186

1 row selected.

SQL> /
Wed Jul 12                                                                     
                                                                           

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        4329472     108.2        40001

1 row selected.

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        5597872     165.7        33779
 

Tom Kyte

Followup  

July 12, 2006 - 5:19 pm UTC

see any "cannot allocate new log" messages?

what do v$session_* views return? like session_wait?

Additional Info

July 13, 2006 - 9:08 am UTC

Reviewer: Mike from Glen Mills, PA

SELECT A.USERNAME ora_user, A.OSUSER os_user, A.PROGRAM, C.EVENT||', '|| C.P1TEXT||', '||C.P2TEXT||', '||
C.P3TEXT event,C.WAIT_TIME
FROM V$SESSION A, V$PROCESS B, V$SESSION_WAIT C
WHERE A.PADDR = B.ADDR
AND A.SID = C.SID(+)
AND A.OSUSER IS NOT NULL
and a.username is not null
and c.event not like 'SQL*Net%'
ORDER BY 1,3

(I edited the output for readability)

ORA_USER OS_USE PROGRAM
-------- ------ ------------
SYSTEM oracle imp@ods-d-pr
od1.appliedc
ard.com (TNS

EVENT WAIT_TIME
---------------------------- ---------
db file sequential read, fil 0
e#, block#, blocks
V1-V3)

There are three mirrored sets of redo logs, 150m each.

SELECT class, count FROM v$waitstat
WHERE class IN ('free list', 'system undo header', 'system undo block', 'undo header', 'undo block')
order by class

CLASS COUNT
------------------ ------------
free list 0
system undo block 0
system undo header 0
undo block 0
undo header 160

Rollback!!!
ROLLBACK SEGMENT WAITS for DEVL

waits gets Wait Ratio
71 2,131,551 .00003%

The import ran all night then failed this morning on an
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS'. Now I'm really mystified, because this table, with 59 million rows, is only about 3.5G in size. During previous imports, I have seen undo segments stretch up to a bit over 4G, then apparently wrap around with no consequence (there is nobody else reading or writing this table). The undotbs has a 4G file which can extend to 16G. In addition, I have seen the "used" size of the undotbs expand and shrink based on activity....last time I looked yesterday it had reached 14.5G and this morning when the job failed (it is rolling back right now) it had maxxed out the file.

Here are the undo settings:
transactions_per_rollback_segment 4
undo_management AUTO
undo_retention 16000
undo_suppress_errors FALSE
undo_tablespace UNDOTBS

Another thing, the table is NOT partitioned, which is because it was never supposed to grow this big, (bad metrics given to dba!) and there is no natural key we can depend on. We are working on an alternate way to hold this data.

I was just looking at v$undostat and noticed this:

UNXPBLKRELCNT UNXPBLKREUCNT EXPSTEALCNT EXPBLKRELCNT
------------- ------------- ----------- ------------
0 0 0 0
0 0 1 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
4608 0 10 0
10752 0 21 0
9728 0 19 0
8192 0 16 0
13312 0 19 0
11264 0 22 0
10752 0 12 0
9216 0 8 0
11264 0 8 0
10752 3 7 0
8704 0 10 0
10752 0 7 0
8704 0 6 0
13312 3 8 0
9216 0 4 0
9216 3 5 0
12800 0 4 0
6144 0 5 0
12800 0 4 0
4676 0 5 2048
10240 0 7 2560
0 0 2 7680
0 0 2 10240
0 0 1 4096

So did this undo failure occur due to numbers shown in the last column? Hmmmm. undo_retention is set for more than 4 hours, yet the import had been running at least 16 hours when it went belly up.

For now, I am going to do a separate export from production with a where clause and populate the table that way, but I would like to further understand both the original question, which is why import performance degraded as it went along, and what caused the import to blow up on undo (we have successfully imported other very large tables).






Tom Kyte

Followup  

July 13, 2006 - 9:25 am UTC

this is too much information

all I wanted to know is:

o are there any waits you see in the v$session tables
o do you have "cannot allocate new log" messages in your alert log.


160 waits is insignificant/meaningless from v$waitstat - tells us nothing. No "time" information from it and it is a count since the INSTANCE was started.

Results

July 13, 2006 - 9:59 am UTC

Reviewer: Mike from Glen Mills

select sid, event, p1text, p1, p2text, p2 from v$session_wait
/

SID EVENT P1TEXT P1 P2TEXT P2
-- --------------------------- ---------- ------------ --------- --------
1 pmon timer duration 300 0
3 rdbms ipc message timeout 300 0
4 rdbms ipc message timeout 300 0
6 rdbms ipc message timeout 180000 0
7 rdbms ipc message timeout 500 0
12 db file sequential read file# 12 block# 813986
2 db file parallel write requests 153 interrupt 0
5 smon timer sleep time 300 failed 0
9 SQL*Net message to client driver id 1413697536 #bytes 1
8 SQL*Net message from client driver id 1413697536 #bytes 1
11 SQL*Net message from client driver id 1413697536 #bytes 1
18 SQL*Net message from client driver id 1413697536 #bytes 1
22 SQL*Net message from client driver id 1413697536 #bytes 1
23 SQL*Net message from client driver id 1413697536 #bytes 1
25 SQL*Net message from client driver id 1413697536 #bytes 1

The answer to the second question is no.....logs were switching about every 15 - 20 minutes during the import.

Tom Kyte

Followup  

July 13, 2006 - 12:57 pm UTC

that is point in time, session_event shows what?

JUST for your session please. measure it, wait a bit, measure again - subtract. It'll show you what that session waited on during that interval (as you are measuring row counts - as the row counts decrease what wait events increase...)

(and this table is not indexed, right... that is my assumption)

How to monitor the table delete prograss?

July 19, 2006 - 12:10 am UTC

Reviewer: Jim

Tom,
I have a long run delete process:
delete tab where col1 < '01-May-06'
/
it takes a few hours to complete.
how can i check how many records have been deleted during the process?

TIA


Tom Kyte

Followup  

July 19, 2006 - 9:06 am UTC

that is a strange string to compare to isn't it? Because I know col1 cannot be a date since we never compare dates to strings and strings to numbers and so on.... (just a comment..)


as long as the delete uses a full tablescan, you can monitor it in v$session_longops, not by number of rows deleted, but rather by number of blocks scanned in the table.

Very confused

July 25, 2006 - 11:55 am UTC

Reviewer: Elaine from Atlanta, GA

I have been reading through this thread and saw where you said that updating a few million rows should be able to take place in minutes. I have 400 million rows daily to load. We use insert/append to do the insert (sorry, that's redundant insert/insert...).

I am having a disagreement with my boss. There are 9 indexes on the table. I maintain that as each row is appended, the indexes are updated. He maintains that the index updates do not take place until after the load is complete.

Which of us is correct? I have checked in both your books and in the Concepts manual. Someone pointed me to the following quote:
Indexes are created on one or more columns of a table. After it is created, an index is automatically maintained and used by Oracle. Changes to table data (such as adding new rows, updating rows, or deleting rows) are automatically incorporated into all relevant indexes with complete transparency to the users.

This would imply (I think) that I am right that the indexes are updated as the inserts are occurring, or am I misreading this?

Any links to either chapter numbers in your book or other documentation links greatly appreciated if this has already been covered.



Tom Kyte

Followup  

July 25, 2006 - 12:04 pm UTC

neither of you is entirely correct :)

both of you are partially right! but i'd say he is closer to the truth.


with a direct path load like that - we write the newly added data to the table above the high water mark for the table.

We also build a "mini index" of the newly loaded data off to the side (not in the "big, existing" index yet).

At the end of the append - we merge in BULK the mini-index with the existing index.

So, the existing index is maintained automatically.
But not row by row.
It is done in bulk - efficiently - at the end of the operation.
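The behaviour Tom describes can be seen with a direct-path insert (a minimal sketch with hypothetical table names):

```sql
insert /*+ append */ into big_table
select * from staging_table;

-- until commit, this session cannot even query big_table (ORA-12838);
-- the side "mini index" is merged into the real indexes in bulk
-- as part of completing the direct-path operation
commit;
```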

Great answer!

July 25, 2006 - 12:51 pm UTC

Reviewer: Elaine H from Atlanta, GA

I never understood how that worked. Thanks for the excellent explanation. I still have one thing I don't understand though. If there is a unique index on the table, how does oracle compensate to make sure that the constraint is not violated if it is only building the mini index to merge later?

Also, do you know of any way to speed the load of the mini-index to the main index? Is there any way to influence the manner in which the update/insert is applied the way that you can the initial insert using hints?



Tom Kyte

Followup  

July 25, 2006 - 2:31 pm UTC

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14215/ldr_modes.htm#sthref1475

when it merges - it'll check things out.


your major influence will be pga memory - sort area size if using manual memory management, pga_aggregate_target elsewise.



Parallel DML and GROUP BY

September 21, 2006 - 1:29 pm UTC

Reviewer: Thiru

I do not know if this is an oracle bug or parallel processing is not going to work in this case:

alter table t parallel;
alter session enable parallel dml;
create sequence t_seq;

update
(select rkey,id from t order by id)
set rkey=t_seq.nextval;

The table contains 50mill records
What it looks like after running the above update is that the sequence used was not consistent with the order on column id.

Though I cannot run a test case for your perusal at this point, it looks like the parallel processing ordered the records within each split individually while allocating the sequence numbers.

for eg: the data looked like:

rkey id

100 ab
101 ab
102 bc
103 bc

..
..
500 ab

The question is: Is this the way parallel dml works or is it a bug or is something wrong with my update?

Thanks and regards




Tom Kyte

Followup  

September 22, 2006 - 2:21 am UTC

I would not expect it to be. each section was ordered nicely, but that is about it. I would expect gaps to be possible and basically rows to be randomly updated.

I would say "something wrong with the thought behind doing such an update" rather than something wrong with the update. I don't see the point in assigning a number like that at all.

September 22, 2006 - 5:43 pm UTC

Reviewer: Thiru

The reason behind doing such an update was :

The flat files that contain the data are not sorted. I used an external table to load this into the table and could not use a sequence and order the records while doing the load. No idea if that was possible??

So what I understand now is that parallel DML works as expected only if no ordering or grouping is involved, because if grouping is involved, each split would be grouped separately. Right?

The other option which I thought of was:
create the table through the external table load.
And then create another table CTAS using the sequence and ordering the base table.

create table t1 as select t_seq.nextval,t.id,t.c2 from t order by id

Is this a good option?

Tom Kyte

Followup  

September 24, 2006 - 1:09 pm UTC

umm, but the data is apparently already "orderable" - you just tried to use an order by to assign yet another field to order by.

so, why do you need this second field when one already apparently exists.
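If an ordered surrogate key really is wanted, one way to sidestep the parallel-sequence interleaving entirely is to let an analytic function assign the numbers during the CTAS instead of a sequence. A sketch only, assuming the table t(id, c2) from the example above:

```sql
-- Assign rkey in id order using ROW_NUMBER instead of a sequence.
-- The analytic is evaluated over the full ordered result set, so the
-- numbering stays consistent even if the CTAS runs in parallel.
create table t1 nologging parallel as
select row_number() over (order by id) as rkey,
       id,
       c2
from   t;
```

Drop/rename afterwards as in the original answer if the new table is to replace the old one.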

Delete / CTAS

September 27, 2006 - 11:25 am UTC

Reviewer: Chris from Louisville, KY USA

Tom,
Trying to process all the great info in this thread and would like a little feedback. I have a 20 million row table that gets about 2 million rows deleted every day. The table is not partitioned (let's not get into the why). There is only one index on the table and it is the primary key. Due to the extremely critical nature of the data some folks are very nervous with a CTAS, drop/rename strategy.

The database does not use locally managed tablespaces, and the current process spawns about 70 threads to delete data and bring specific rollback segments online. For other reasons I won't go into here, the database is bounced every day and this delete process would ideally be compatible with running before and after the bounce (currently it must complete before they bounce).

I am trying to benchmark something like this and was looking for your thoughts.

1. CTAS getting only those rows to be deleted (unrecoverable/parallel).
2. Delete x number of rows at a time in parallel.

Looks like:

CREATE TABLE mytab unrecoverable parallel 8 AS
SELECT /*+ parallel(mytab_source,8) */ column1
FROM mytab_source
WHERE ...;

DELETE /*+ parallel(mytab,8) */
FROM mytab
WHERE column1 IN (SELECT column1
FROM mytab_source
WHERE rownum < 50000);

The only two reasons for limiting # of rows deleted would be rollback space and time. Would be happy to do it all at once if this became of no concern. If the whole operation could complete in < 30 minutes that would be very acceptable.

Not exactly sure how I'd loop through 50000 at a time though without deleting from mytab_source as well. In other words after I delete the first 50000, how do I get the next 50000.

Anyhow, some questions.

1. Does this seem reasonable or would you suggest another approach (I'm sure you will)?
2. Not much experience with parallel DML, does the above work as is or do I need to "alter session enable parallel dml" first? I thought I had to, but it seemed to run really quickly without doing this.
3. I don't believe the table currently has a value specified for freelists, should it, and if so what should it be?
4. Is the delete syntax with parallel correct?
5. Might I be better off with a CTAS for the 18 million rows I want to keep, then build an index in parallel (the first time I tried with logging and not in parallel on 11 million rows it took 34 minutes) and then rename/rename? To appease those that are fearful of dropping the old table I could keep the last version with a different name if there is sufficient space keeping a rolling cycle of two tables.
6. How would you suggest doing the delete x number of rows at a time?
7. What is the difference between unrecoverable and nologging on the CTAS?
8. If you use a drop/rename or rename/rename, packages will go invalid when the original table disappears. All code should recompile itself on first use, correct? Am I sure I won't get "invalid state of packages" on the first attempt, and then a successful use on the next attempt?

Tom Kyte

Followup  

September 27, 2006 - 3:51 pm UTC

laughing out loud - by limiting the number of rows deleted, you increase the runtime (which seems to be the converse of what you are trying to achieve)


have you tried "create table as select ROWS TO KEEP", "index" (all of which will require no undo and if you like, no redo), drop and rename?

how does IT perform compared to deleting 10% of the data?
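The "keep the rows, drop the table" approach Tom describes might look like the following. A sketch only - the table name mytab and key column column1 come from the question; the keep-predicate and the mytab_keep / mytab_old names are made up for illustration:

```sql
-- 1) Build a new table holding only the ~18 million rows to KEEP,
--    in parallel and without redo generation.
create table mytab_keep nologging parallel 8 as
select /*+ parallel(m,8) */ *
from   mytab m
where  ... ;   -- predicate selecting the rows to keep

-- 2) Rebuild the primary key, again without redo.
alter table mytab_keep
  add constraint mytab_keep_pk primary key (column1)
  using index (create unique index mytab_keep_pk
               on mytab_keep(column1) nologging parallel 8);

-- 3) Swap the tables; keeping the old copy around for a cycle
--    may calm those who fear the drop.
rename mytab to mytab_old;
rename mytab_keep to mytab;
```

Grants, triggers, and synonyms do not follow a rename, so they would need to be re-created on the new table, and dependent code will recompile on first use.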

September 27, 2006 - 4:54 pm UTC

Reviewer: Alexander the ok

Why does limiting the number of rows increase runtime?

Tom Kyte

Followup  

September 27, 2006 - 5:33 pm UTC

eh? not sure what you are referring to exactly here?

Your last follow up

September 27, 2006 - 6:09 pm UTC

Reviewer: Alexander the ok


But, I've got a LONG column ...

October 11, 2006 - 1:00 pm UTC

Reviewer: Matt from Lyon, FRANCE


that is used by everyone's favourite CRM application. I've got to get rid of 38 million rows of a 40 million row table. If I understand correctly the conversion from LOB to LONG is a no-no (therefore cannot do a CTAS and then convert back).

So ... how does one delete 90 odd % of a table with a LONG column?


Tom Kyte

Followup  

October 11, 2006 - 4:03 pm UTC

sorry, no good answer for you - longs prevent most things.

LONG and a direct path insert

October 16, 2006 - 5:35 am UTC

Reviewer: Matt from Lyon, FRANCE

Tom,

Apologies for pestering you still, but does having a LONG column also count out the use of a direct path insert?

I'm thinking, create another table for the data we want to keep, do the direct path insert, drop the 'old' table and then rename.


Tom Kyte

Followup  

October 16, 2006 - 7:53 am UTC

you cannot insert a long using insert as select, so the question is moot since you cannot use insert with a "as select" and a long.

Application to insert/copy?

November 01, 2006 - 2:16 pm UTC

Reviewer: Basil

I've inherited a database design where we have the need to make copies of rows in one particular table. The system more or less looks like:
drop table c;
drop table u;
drop table pt;
drop table ans;

create table c (
c_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table u (
user_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table pt (
pt_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table ans (
ans_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));

create table rp
(rp_id NUMBER PRIMARY KEY NOT NULL,
ans_id NUMBER NOT NULL,
pt_id NUMBER NOT NULL,
c_id NUMBER NOT NULL ,
date_value DATE,
text_value VARCHAR2(4000),
numeric_value NUMBER,
addon_ids VARCHAR2(4000),
cr_dt DATE DEFAULT SYSDATE NOT NULL,
cr_user_id NUMBER NOT NULL,
upd_dt DATE DEFAULT SYSDATE NOT NULL,
upd_user_id NUMBER NOT NULL,
CONSTRAINT rp_c_fk FOREIGN KEY (c_id)
REFERENCES c (c_id) ENABLE,
CONSTRAINT rp_CRUSER_FK FOREIGN KEY (CR_USER_ID)
REFERENCES u (user_id) ENABLE,
CONSTRAINT rp_PRICETBL_FK FOREIGN KEY (pt_id)
REFERENCES pt (pt_id) ENABLE,
CONSTRAINT rp_SRVYRESP_FK FOREIGN KEY (ans_id)
REFERENCES ans (ans_id) ENABLE,
CONSTRAINT rp_UPDUSER_FK FOREIGN KEY (upd_user_id)
REFERENCES U (USER_ID) ENABLE
) ;
CREATE INDEX rp_ac_id ON rp(c_id, ans_id);
CREATE INDEX rp_idx ON rp(ans_id, pt_id, c_id);
CREATE INDEX rp_crsidx ON rp (c_id, ans_id, pt_id );
-- rp_idx or rp_crsidx should probably go away, to minimize duplication
CREATE INDEX rp_pt_idx ON rp(pt_id);


The table RP has about 30 million rows at present. It is unpartitioned. The use of surrogate keys everywhere makes it difficult to find a suitable key for partitioning. When we do a copy, about 600,000 rows (in one particular bad case) need to be duplicated, adjusting the ans_id, c_id, and pt_id values to new ones.

To accomplish the mapping, I created some temporary tables:
create global temporary table c_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table pt_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table ans_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;

As I generate new entries for c, pt, and ans, I do multitable inserts, so that the old and new ids get stored in the temp tables. For example:
INSERT ALL
INTO c_map(previd, newid)
VALUES (c_id, c_seq.NEXTVAL)
INTO c(c_id, name)
VALUES (c_seq.NEXTVAL, name)
SELECT c_id, name FROM c WHERE <criteria>;

My thought was then to do something like:
insert into rp (
rp_id, ans_id, pt_id, c_id,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id )
SELECT rp_seq.NEXTVAL,
ans_map.newid, pt_map.newid, c_map.newid,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id
FROM rp
JOIN ans_map ON rp.ans_id = ans_map.previd
JOIN pt_map ON rp.pt_id = pt_map.previd
JOIN c_map ON rp.c_id = c_map.previd;

On top of the strange execution plans I'm getting for the SELECT portion (which is doing full scans of rp with its 30 million rows, in spite of indexes that should help), the actual insert is taking a REALLY long time.

Can this UPDATE approach be modified to improve the INSERT performance in my scenario?

Oh yes, this is on a 9iR2 system.

How do hints work

December 05, 2006 - 2:03 pm UTC

Reviewer: Andrew from MI, USA

From time to time, developers tell me that the hints they use to influence execution path, like INDEX, do not work and there must be a bug in Oracle. I tell them that sometimes the hint does not take because a HINT is a hint, not a directive and that they work by artificially changing the cost of the query to favor the HINTed path. The developers insist that HINTs should always be taken because they are directives. I looked through the Oracle documentation, and it consistently refers to hints as instructing or directing the optimizer to take a certain path, rather than favoring the hinted path.

I have found no documentation about how hints work internally. Can you tell me where to find information about how these 'execution path' hints work?

Tom Kyte

Followup  

December 06, 2006 - 9:29 am UTC

they can only be obeyed when they make sense.

give me an example and I'll tell you what mistake your developers made.

Well, in addition to the mistake of hinting queries at the drop of a hat that is.


Using hints is a dicey proposition - for an emergency bug fix of one or two queries - sure, as a tuning tool - I'd avoid it.



ops$tkyte%ORA10GR2> create table t ( x int, y int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 961378228

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    26 |     3  (34)| 00:00:0
|   1 |  SORT ORDER BY     |      |     1 |    26 |     3  (34)| 00:00:0
|   2 |   TABLE ACCESS FULL| T    |     1 |    26 |     2   (0)| 00:00:0
------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

<b>the index CANNOT be used in this case, X is nullable - it is not possible to use the index on X since any row with X is null would not be in that particular index</b>

ops$tkyte%ORA10GR2> alter table t modify x not null;

Table altered.

ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3778778741

------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    26 |   827   (1
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    26 |   827   (1
|   2 |   INDEX FULL SCAN           | T_IDX |     1 |       |    26   (0
------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

ops$tkyte%ORA10GR2> set autotrace off

<b>and now it can be</b>


directives can only be followed when they make sense. If I gave you a directive to levitate six feet off of the floor - you would not be able to do that.

Basic how to do simple update in minutes rather than hours

January 02, 2007 - 11:21 am UTC

Reviewer: Matthew from New Orleans. LA

Tom,
Earlier in this thread you demonstrated how you were able to update a million rows in just a few minutes on your laptop.
I am currently developing on a system where it takes over 2 hours to update a single column for 757474 rows, and I am beyond frustrated. I can do a CTAS for millions of rows with a complex SQL join of 5 tables in a couple of minutes by modifying my session to use a large sort area and hash area size, but I cannot do a simple update statement (update t set columnX = 'C') in under 2 hours.

This is unacceptable as I am doing ETL and my load needs to complete within a 4 hour window. Having an update of a single table take 2 hours is an application killer. The db is at 10.2, but unfortunately the DBAs are not.

My question is what low hanging fruit should I look for in terms of speeding this processing up, and how can I prove / debug future problems of this nature.

The following the explain plan and autotrace of the update statement (working on getting access to the trace files on the server):
idl@DEVAEETL> explain plan for
2 update idl.srvc_mbr_prmry_dim
3 set current_flag ='I';

Explained.

idl@DEVAEETL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 153652758

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 755K| 1475K| 24004 (4)| 00:04:49 |
| 1 | UPDATE | SRVC_MBR_PRMRY_DIM | | | | |
| 2 | TABLE ACCESS FULL| SRVC_MBR_PRMRY_DIM | 755K| 1475K| 24004 (4)| 00:04:49 |
-----------------------------------------------------------------------------------------

9 rows selected.

idl@DEVAEETL> set autotrace on
idl@DEVAEETL> update idl.srvc_mbr_prmry_dim
2 set current_flag ='I';

757474 rows updated.


Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=24004 Card=755486 Bytes=1510972)
1 0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
2 1 TABLE ACCESS (FULL) OF 'SRVC_MBR_PRMRY_DIM' (TABLE) (Cost=24004 Card=755486
Bytes=1510972)





Statistics
----------------------------------------------------------
960 recursive calls
922410 db block gets
865720 consistent gets
133717 physical reads
253701772 redo size
447 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
757474 rows processed

Tom Kyte

Followup  

January 04, 2007 - 9:55 am UTC

that current_flag is not indexed correct?

trace it - see what the major waits in fact are.
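On 10.2, tracing the session with wait events is straightforward with DBMS_MONITOR; one possible approach (run tkprof on the resulting trace file afterwards):

```sql
-- Enable extended SQL trace (event 10046 level 8 equivalent)
-- for the current session, including wait events.
exec dbms_monitor.session_trace_enable( waits => true, binds => false );

update idl.srvc_mbr_prmry_dim set current_flag = 'C';

exec dbms_monitor.session_trace_disable;
-- The trace file is written to user_dump_dest on the server;
-- format it with tkprof to see the wait event summary.
```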

large update for a table

January 04, 2007 - 5:46 pm UTC

Reviewer: ozjbrown from Canada

Hi Tom,

I have a student_achievement fact table (over 4 million rows) and need to update a single column in every record of this table. To determine the new value, I have a temporary student_school table (approximately 2 million rows) that contains the value the column should be updated to, based upon matching criteria. The update to the fact table takes over 6 hours to complete.

The student_achievement fact table has the following columns -

student_achievement_key integer (pk)
school_year varchar2(8)
student_key integer
course_key integer
mark varchar2(5)
school_key integer

The student_school temporary table has the following columns -

school_year varchar2(8)
student_key integer
school_key integer

The relationship of student_school to student_achievement is one-to-many. Not every student_school record has a record in the student_achievement fact table, and not every student_achievement fact record will have a record in the student_school table.

The logic used in my PL/SQL procedure to update all records in the student_achievement fact table is as follows -

cursor school_upd_curs is
select *
from student_school a
where exists (select 1 from student_achievement b
where a.student_key = b.student_key
and a.school_year = b.school_year)
order by student_key, school_year;

begin
open school_upd_curs;
loop
fetch school_upd_curs bulk collect
into col... limit 1000;
exit when school_year.count = 0;

for i in school_year.first .. school_year.last loop
if first record fetched for the student then
update student_achievement
set school_key = fetched school_key
where school_year <= fetched school_year;
elsif last record fetched for student then
update student_achievement
set school_key = fetched school_key
where school_year >= fetched school_year;
else
update student_achievement
set school_key = fetched school_key
where school_year = fetched school_year;
end if;
end loop;

commit;

end loop;
end;

Is there a way to speed the update?

Thanks.

Tom Kyte

Followup  

January 05, 2007 - 9:43 am UTC

update the join of the tables.


update (select column_to_be_updated, value_to_update_to
from t1, t2
where (join) )
set column_to_be_updated = value_to_update_to;
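Applied to the tables in the question, the update of the join might look like the sketch below. Note that Oracle requires student_school to be "key preserved" here - there must be a unique or primary key constraint on student_school(student_key, school_year) so each fact row can join to at most one school row; that constraint is an assumption, and this covers only the equality case from the procedure (the first/last-record range logic would need its own handling):

```sql
-- Assumes: unique constraint on student_school(student_key, school_year)
update ( select a.school_key as old_school_key,
                b.school_key as new_school_key
           from student_achievement a,
                student_school      b
          where a.student_key = b.student_key
            and a.school_year = b.school_year )
   set old_school_key = new_school_key;
```

Without that constraint the statement fails with ORA-01779 (cannot modify a column which maps to a non key-preserved table).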

trace for update that takes 2 hours

January 09, 2007 - 12:54 pm UTC

Reviewer: Matthew from New Orleans

Tom,
Sorry that this took so long to get to you. Had to wait for the DBA's to pull the trace file from the development system for me.

The column being updated is not indexed.

Here is the trace that you requested. Please let me know if you need more information.


update idl.srvc_mbr_prmry_dim
set current_flag ='C'

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 152.68 6057.11 240133 978321 917824 757474
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 152.69 6057.12 240133 978321 917824 757474

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 68 (IDL)

Rows Row Source Operation
------- ---------------------------------------------------
0 UPDATE SRVC_MBR_PRMRY_DIM (cr=973072 pr=232705 pw=0 time=2145490920 us)
1547044 TABLE ACCESS FULL SRVC_MBR_PRMRY_DIM (cr=978249 pr=237879 pw=0 time=3465511263 us)


Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL) OF 'SRVC_MBR_PRMRY_DIM'
(TABLE)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19
log buffer space 6 0.98 2.94
log file switch completion 12 0.39 3.54
latch: shared pool 5 0.02 0.06
latch: cache buffers lru chain 3 0.00 0.00
latch: cache buffers chains 1 0.00 0.00
free buffer waits 10 0.01 0.15
buffer exterminate 57 0.98 56.17
latch free 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.03 0.03
********************************************************************************


Any help that you can give me would be appreciated as the DBA's and I are stumped.




Tom Kyte

Followup  

January 11, 2007 - 9:38 am UTC

Elapsed times include waiting on following events:
Event waited on                  Times  Max. Wait Total Waited
----------------------------------------  Waited ---------- ------------
db file scattered read              10919    57.93    5622.86 


seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return??

optimize update millions of lobs

January 10, 2007 - 10:18 am UTC

Reviewer: Wolfgang from Austria

hi tom,
every year I have to reorganize my picture table (500GB). Generally I do it this way:
1. make everything offline
2. CREATE TABLE dbildtmp AS SELECT cid, bildtn, text FROM dbild ;
3.) ALTER TABLE DBILD SET UNUSED (text);
3. ALTER TABLE DBILD SET UNUSED (text);
4. ALTER TABLE DBILD DROP UNUSED COLUMNS;
5. ALTER TABLE dbild ADD (bildtn BLOB) LOB (bildtn)
STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );

ALTER TABLE dbild ADD (text CLOB)
LOB (text) STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );

6. update the newly created columns:

DECLARE
CURSOR cur IS SELECT * FROM dbildlobs;
v_anz NUMBER := 0;
BEGIN
FOR rec IN cur LOOP
UPDATE DBILD
SET bildtn = rec.bildtn , text = rec.text
WHERE cid = rec.cid;
IF v_anz = 1000 THEN
COMMIT;
v_anz := 0;
END IF;
v_anz := v_anz +1;
END LOOP;
COMMIT;
END;
/

7. make everything online

Please could you tell me how I can reduce the runtime of steps 2 and 4, and particularly step 6?
I tried to modify the procedure with BULK COLLECT and FORALL, although it doesn't run any faster (6 hours).

here is my modified procedure:
CREATE OR REPLACE PROCEDURE Update_Dbild2 IS

TYPE t_text IS TABLE OF DBILDLOBS.TEXT%TYPE INDEX BY BINARY_INTEGER;
TYPE t_bildtn IS TABLE OF DBILDLOBS.bildtn%TYPE INDEX BY BINARY_INTEGER;
TYPE t_cid IS TABLE OF DBILDLOBS.cid%TYPE INDEX BY BINARY_INTEGER;
l_text t_text;
l_bildtn t_bildtn;
l_cid t_cid;
CURSOR cr_rec IS SELECT cid,bildtn,TEXT FROM DBILDLOBS WHERE ROWNUM < 1000;

BEGIN
OPEN cr_rec;
FETCH cr_rec BULK COLLECT INTO l_cid,l_bildtn,l_text;
CLOSE cr_rec;

FORALL i IN l_cid.FIRST .. l_cid.LAST
UPDATE DBILD
SET TEXT = l_text(i), bildtn = l_bildtn(i)
WHERE cid = l_cid(i);
END;
/


Large Update on table

January 10, 2007 - 10:28 pm UTC

Reviewer: ozjbrown from Canada

Hi Tom,

I am lost as to what you are recommending I change in my procedure, which uses a bulk fetch and loops through to update another table with the value.

Would it be possible to use my example as to where I need to make the change?

Thanks.

Performance Issue!! Commit Strategy

January 11, 2007 - 9:41 am UTC

Reviewer: Prashant Pathak from CA,US

Hi Tom,

I have a table which contains following columns
ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0 518346
Alloy 102618 170 QA 1 0 518346
Alloy 102618 160 LAB 0 0 518346
Alloy 102618 150 TRP 1 0 518346
Alloy 102618 140 QA 1 0 518346
Alloy 102618 130 QA 1 0 518346
Alloy 102618 120 WP 5 0 518346
Alloy 102618 110 BA 2 0 518346
Alloy 102618 100 TRP 1 0 518346
Alloy 102618 90 QA 1 0 518346
Alloy 102618 80 QA 1 0 518346
Alloy 102618 70 QA 1 0 518346
Alloy 102618 60 MW2 3 0 518346
Alloy 102618 50 MW1 0 0 518346
Alloy 102618 40 M2M 0 0 518346
Alloy 102618 30 M2M 3 0 518346
Alloy 102618 20 PROC 1 0 518346
Alloy 102618 10 PROC 0 0 518346
Alloy 102963 9999 PROC 0 0 518346
Alloy 102963 90 CHP 1 0 518346
Alloy 102963 80 SG 5 0 518346
Alloy 102963 70 CHP 1 0 518346
Alloy 102963 60 PM 6 0 518346
Alloy 102963 50 PM 6 0 518346
Alloy 102963 40 TRP 1 0 518346
Alloy 102963 30 AIM 0 0 518346
Alloy 102963 20 AIM 0 0 518346
Alloy 102963 10 PROC 0 0 518346


Now I have to update the offset column:
starting with opseq 9999, deduct the days value from the previous row's offset and store the result in the offset column.

so for opseq 9999 at first step it will be 0

ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0-5=-5 518346
Alloy 102618 170 QA 1 -5-1=-6 518346
Alloy 102618 160 LAB 0 -6-0=-6 518346
Alloy 102618 150 TRP 1 -6-1=-7 518346
.
.
.


So I have to do this operation per grp_id, and the table has about 50,000 rows.
My question is: should I commit the transaction for each grp_id as I go, or should I finish all of them and commit once at the end?

Currently I am doing all the deductions and updates with only one commit at the end, and it is taking a long time.

Thanks in advance
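For 50,000 rows the commit strategy is unlikely to be the problem; committing once at the end is fine. The offset here appears to be just the negated running total of DAYS, walking opseq from 9999 downwards within each group/item, so it could be computed and applied in a single statement rather than row by row. A sketch, assuming the table is called t and that (grp_id, itm, opseq) identifies a row (both assumptions), using the 10g form of MERGE with only a WHEN MATCHED branch:

```sql
-- Compute the running total analytically and merge it back in one pass.
merge into t
using ( select grp_id, itm, opseq,
               -sum(days) over (partition by grp_id, itm
                                order by opseq desc) as new_offset
          from t ) x
on ( t.grp_id = x.grp_id and t.itm = x.itm and t.opseq = x.opseq )
when matched then update set t.offset = x.new_offset;
```

For the sample data this yields 0 for opseq 9999, -5 for 180, -6 for 170 and 160, -7 for 150, matching the worked example.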

Re: update taking over 2 hours

January 11, 2007 - 2:05 pm UTC

Reviewer: Matthew from New Orleans

"Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86


seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return?? "

I agree that they do have an IO problem; the SAN is set up as RAID 5. I do have a couple of follow-up questions in this regard and hope you can provide some insight:

On the Trace file execution plan it shows:
Rows Execution Plan
------- --------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL)
OF 'SRVC_MBR_PRMRY_DIM' (TABLE)
1. The actual update statement only updated 757474 rows. Why does the execution plan show over twice the number of rows that were actually updated?
2. There were 2 db read wait events with similar event counts:
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19

for the single update statement of the table.
Could this be caused by the width of the table forcing the full table scan into the buffer cache to make multiple passes over the disk - the first pass using sequential reads, and subsequent passes doing scattered reads to match the data in the buffer to the data on disk?
3. Would it be worth testing the following scenarios to see if I can reduce db file scattered reads:
a. Creating the very wide table in a larger block size with its own buffer cache?
b. Increasing db_file_multiblock_read_count at the session level to higher than 32?

Appreciate your input and look forward to your next book.

plsql is faster than sql

January 22, 2007 - 2:18 pm UTC

Reviewer: A reader

Hi Tom.
What's wrong with my database?
It seems that PL/SQL is faster than SQL:

SQL> create table test nologging as select * from dba_objects;

Table created.

Elapsed: 00:00:01.53
SQL>
SQL> create index test_pk on test(object_id);

Index created.

Elapsed: 00:00:00.41
SQL>
SQL>
SQL>
SQL> declare
2 l_icnt number := 0;
3 l_ucnt number := 0;
4 begin
5 for x in ( select * from test )
6 loop
7
8 update test set status = 'ok' where object_id= x.object_id;
9 l_ucnt:=l_ucnt+1;
10
11 end loop;
12 dbms_output.put_line ('rows updated :'||l_ucnt);
13 end;
14 /
rows updated :46087

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.20
SQL>
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.05
SQL>
SQL> update test set status='ok';

46087 rows updated.

Elapsed: 00:00:17.80
SQL>
SQL> commit;


January 23, 2007 - 2:27 pm UTC

Reviewer: A reader

Why do you think something is wrong with your data base? I tried your example in the same order and the SQL test ran faster than the PL/SQL test, but I would expect that because the SQL test has the benefit of the now pre-loaded buffers. Did you run the test in the order as presented? Was your database busy with other things at the time of the test, or was it dedicated to your test? Same question can be asked of the server and the disk system. There can be a lot of variables that affect results like this, especially for such a small, short test.

Updating 1M records through bulk update takes a long time

May 11, 2007 - 9:56 am UTC

Reviewer: RK from chennai,India

I have an update statement provided below which is updating 1 million records
in lead_master table joining with a customer_table having around 5 million records.

This is taking more than 2 hrs and I could not figure out the reason. There are no other sessions running in parallel.
UPDATE lead_master  
SET customer_gender_cd = (
          SELECT 
          DECODE (customer_gender_cd, 
             NULL, NULL,
             '0' ||customer_gender_cd) FROM customer_table
          WHERE customer_table.customer_id = lead_master.customer_id   
  )              
  WHERE  lead_master.customer_id  IS NOT NULL;


Tom Kyte

Followup  

May 11, 2007 - 11:44 am UTC

because you may well be executing that subselect 1,000,000 times

you don't say if this is indexed
what release
what version
what plan
what anything really.

update ( select a.customer_gender_cd a_cd,
                b.customer_gender_cd b_cd
           from lead_master a, customer_table b
          where a.customer_id = b.customer_id(+) )
  set a_cd = nvl2( b_cd, '0'||b_cd, null );


outer join if and only if necessary of course.


May 14, 2007 - 10:08 am UTC

Reviewer: RK from chennai

Oracle Release is Oracle9i Release 9.2.0.4.0 - 64bit Production
The suggested modified SQL was very useful - it executed in 6 minutes.

Thanks a lot.

May 21, 2007 - 8:01 am UTC

Reviewer: sammy from albany, ny usa

Hi Tom,

I am trying to duplicate your May 11th followup.

I am getting the following error...ORA-01779: cannot modify a column which maps to a non key-preserved table.

UPDATE (select a.oldCol a_oldCol,
b.newCcol b_newCol
from table1 partition(part1) a,
table2 b
where a.id = b.id)
SET a_oldCol = b_newCol

b.id is indexed .
a.id is indexed (local).

I also want to use the nvl2 function you used - can you please explain
nvl2( b_cd, '0'||b_cd, null )

as opposed to...

nvl2( b_cd, b_cd, null ) ??

Thanks in advance for your advice.

I am using Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production


Tom Kyte

Followup  

May 21, 2007 - 11:10 am UTC

give full example - you need a CONSTRAINT (primary key or unique) on table2(id)
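A minimal illustration of both points - the ORA-01779 fix and what NVL2 does. The table and column names here (t1, t2, old_col, new_col) are made up for the example:

```sql
-- Without a unique or primary key constraint on t2(id), the join is
-- not key-preserved and the UPDATE of the join raises ORA-01779.
alter table t2 add constraint t2_id_uk unique (id);

update ( select a.old_col a_old, b.new_col b_new
           from t1 a, t2 b
          where a.id = b.id )
   set a_old = b_new;

-- NVL2(expr, v1, v2) returns v1 when expr is NOT NULL, else v2. So:
--   nvl2( b_cd, '0'||b_cd, null )  -- prefixes '0' when b_cd is not null
--   nvl2( b_cd, b_cd, null )       -- is simply b_cd; the '0' prefix,
--                                  -- the whole point of the original
--                                  -- update, would be lost
```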

Direct-path INSERT and TEMP Tablespace

June 06, 2007 - 12:23 pm UTC

Reviewer: A reader

Hi Tom,

I am using direct-path INSERT to copy the contents of source_table to destination_table.

When I enable PDML and the INSERT runs in parallel, I observe the use of sort_usage, temp tablespace filling up and, at the same time, nothing being written to the tablespace containing the destination table. Eventually, temp tablespace runs out of space.

However, when I don't enable PDML and the INSERT runs in serial, Oracle starts writing to the temp segments in the destination table's tablespace as soon as the source result set is available. In this scenario, it never uses any sort segments in temp tablespace.

Why is temp tablespace being used during PDML?

As you can see from the query, there are no sorts or joins -- it's just a straight select *. There are no indexes on destination_table. I thought PDML would work similarly to serial DML (as far as space allocation goes). Shouldn't each parallel execution server just pick a partition in destination_table and start allocating extents in the destination_table's tablespace?

Here is what I run:

alter table destination_table nologging;
insert /*+ append */ into destination_table
select /*+ parallel(s 1) full(s) */ * from source_table@link s;
commit;


The same behavior can be observed when both tables are in the same database (and the select doesn't use the link).

Environment:
Oracle 9.2.0.6
source_table and destination_table are partitioned
source_table is ~400GB in size
temp tablespace is 40GB
there are no indexes on destination_table

Thank you in advance for your help.


Tom Kyte

Followup  

June 06, 2007 - 9:22 pm UTC

full disclosure - show the plans, show the queries you use to monitor this, show everything we need to see what you see.

and you do know that the dblink will serialize, right? So the parallel is extremely questionable - you have a parallel query getting serialized over the dblink to return the data.

Deleting 1-2 million of records from 300 millions records

June 20, 2007 - 8:27 pm UTC

Reviewer: Franklin from Aus

Hi Tom,

In one of my projects I have a table which contains more than 400 million records. Every day a procedure executes which deletes 1 million records and inserts 1 million new ones; some days it may not delete a single record but still inserts 1 million fresh records. I have tried a number of ways, but performance is very bad. Can you guide me?

My table, orders, has these columns:
ord_id,
inv_id,
inv_date
col1,
col2,
...

my delete will be based on ord_id, inv_id, inv_date

delete from orders a
 where not exists ( select 1
                      from ordertemp b
                     where a.ord_id = b.ord_id
                       and a.inv_id = b.inv_id
                       and a.inv_date between b.sdate and b.edate );

My orders table generates all its records based on ordertemp, which contains sdate and edate. Now suppose this period has changed for some order; then I have to delete that data and insert data for the new period.

Tom Kyte

Followup  

June 21, 2007 - 10:23 am UTC

what indexing strategies have YOU tried.

what is "performance is very bad" - what are you observing and what are your expectations.

have you considered MERGE instead of delete+insert (eg: update existing, insert new)
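
The MERGE Tom suggests might be sketched roughly as follows. The join keys come from the question, but ordertemp's payload columns (col1, col2) and the exact matched/not-matched logic are assumptions:

```sql
-- Hypothetical sketch: refresh existing orders and add new ones in one pass,
-- instead of delete + insert.
merge into orders o
using ordertemp t
   on (     o.ord_id = t.ord_id
        and o.inv_id = t.inv_id
        and o.inv_date between t.sdate and t.edate )
 when matched then
      update set o.col1 = t.col1, o.col2 = t.col2
 when not matched then
      insert (ord_id, inv_id, inv_date, col1, col2)
      values (t.ord_id, t.inv_id, t.sdate, t.col1, t.col2);
```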

"insert into" using a cursor

August 17, 2007 - 5:44 pm UTC

Reviewer: ht from ca

Hi Tom,
I took your advice and, instead of an update that ran for days, am using "create table as select * from".

I'm also using pl/sql with a cursor to insert additional rows into the new table and am wondering if there is a more efficient way to perform the insert. The pl/sql is trying to bring over the bare minimum of recs by first finding a "down" rec, getting only the next "up" rec and inserting it:

declare
    cursor c1 is
        select id, op, db
          from tbl1
         where status = 'down'
         order by id, op, db;
begin
    for x in c1 loop
        insert into tbl2
        select *
          from tbl1
         where id = ( select min(id)
                        from tbl1
                       where id > x.id
                         and op = x.op
                         and status = 'up'
                         and db = x.db );
    end loop;
end;
/

Tom Kyte

Followup  

August 22, 2007 - 9:48 am UTC

would need to know more, about the data.

if you are trying to get the next "up" record after each "down" record - next being assigned by ordering by ID for each op/db - there are better ways, sure
insert into tbl2
select * 
  from (
select tbl1.*, 
       lag(status) over (partition by op, db order by id) last_status 
  from tbl1
 where status in ( 'up', 'down' )
       )
 where status = 'up' and last_status = 'down'


would get the "next up record after each down record by OP and DB" for example


Works great - should an index be created?

August 22, 2007 - 3:16 pm UTC

Reviewer: ht from california

Tom,
As always, you helped tremendously. My process is taking 3 minutes instead of 2 hours. Given I'm new to "lag" and "partition" statements, would adding an index speed up the process? If yes, which columns would you index on? I know full table scans are not a bad thing - just wondering if I can tune the query further with indexes since I'll be running this 200+ times.
Tom Kyte

Followup  

August 23, 2007 - 10:44 am UTC

unless the 'up' and 'down' records represent a tiny portion of ALL records - no index, we want a nice juicy full scan.

do they?

why are you running this 200 times?

once for each set of recs

August 24, 2007 - 4:33 pm UTC

Reviewer: ht from california

Tom,
Thanks for the advice. The fts is appropriate in this case and I have to perform the operation 200+ times since I was testing only a subset of the data. In reality, I just removed a predicate to operate on all the recs and it works great!

How to Load (6M+ Records x 300 Values ) between Databases

August 30, 2007 - 12:04 pm UTC

Reviewer: Saravanan Ramalingam from Bangalore, INDIA

Hi Tom,

I have a requirement where I need to load around 6M+ records on a daily basis from a source database, where every record holds a minimum of 300 different values. (The source DB sends only 6 values; the others are permutations and combinations of them, calculated on the fly.) This will go up to 20 million records per day in the future.

The constraints are:
1. Currently I receive 300 different values to be stored (per record), which is likely to increase to 1000 and at most up to 25000 different values. This is for one record only, and the total number of records I receive is around 6M per day.

2. As per the SLA the load window is one hour

3. I need to retain 10 years of data inside the active database: (10 yrs * 365 days * 6M records) rows to be stored, and (10 yrs * 365 days * 6M records * 300 different values) values need to be held

Also I would like to know how to take care of this from the performance, storage and hardware standpoint for Oracle.

I have tried all the theoretical solutions available on the net; they speak only about partitioning and compressing. But I have not found a solution to handle 300 columns (which could be extended up to 10000 in the future).



Thanks in Advance,

Regards,

Saravanan Ramalingam
Tom Kyte

Followup  

September 04, 2007 - 5:38 pm UTC

by 300 different values, do you mean "columns", "attributes"?

loading 20 million records will take minutes.


you will want to utilize partitioning to permit you to age the data out over time.


I see you meant 300 columns, which can theoretically go to 1000 but will never go to 25,000. The maximum number of attributes in a table is 1,000.


but - why is 300 columns "a sticking point", what are you looking for to address that? so what if you have 300 columns - why is that relevant to loading data?

is this practicable?

September 01, 2007 - 7:28 am UTC

Reviewer: Karin from germany

Hello Tom,

do you think, this is still the better way in our case ? the table contains 40 million sentences, has got a lot of foreign keys and we need to update a numeric field from n to n+1 for half of the sentences. And we have got a partitioned table and a partitioned index on that table.

Thanks for your response, I already got a lot out of your answers.

Karin


Tom Kyte

Followup  

September 05, 2007 - 12:42 pm UTC

I don't know what a sentence is in this context?

sorry, language problem ;-))

September 12, 2007 - 9:56 am UTC

Reviewer: A reader

sentence = row
Tom Kyte

Followup  

September 15, 2007 - 4:41 pm UTC

go ahead and update it?

you wrote:

...
the table contains 40 million sentences,
has got a lot of foreign keys and we need to update a numeric field from n to n+1 for half of the
sentences. And we have got a partitioned table and a partitioned index on that table.
....


40 million rows - understood.

foreign keys - so what? You don't relate that to this column at all, so not sure what relevance it has.

partitioned - again, not sure of the relevance

indexes - you don't say that THIS column is indexed, so not seeing the relevance.

A delete scenario

September 19, 2007 - 2:11 pm UTC

Reviewer: A reader from India

Hi Tom,

I have a table containing 20 million rows. The table has a composite index (3 columns). The application continually inserts into the table and also deletes from the table every hour through independent delete statements (one per row). The delete operation is pretty slow and we see a lot of waits on "db file sequential read" during the delete operation. The delete is based on the indexed column values.

Please suggest a solution which will help in having better performance.
Tom Kyte

Followup  

September 20, 2007 - 9:30 pm UTC

define pretty slow, got a tkprof?

the delete must update every index, not just the one it reads; the single-block I/Os are likely due to that.

slow by slow processing should be avoided, it is always the slowest way to process data. think "sets"
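
As a hedged illustration of "think sets": assuming the rows to purge each hour can be staged in a table first (purge_keys and the column names below are hypothetical), the per-row deletes collapse into one statement:

```sql
-- Hypothetical: one set-based delete per hour instead of one delete per row.
delete from big_table t
 where (t.c1, t.c2, t.c3) in ( select k.c1, k.c2, k.c3
                                 from purge_keys k );
```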

"ENABLE PARALLEL DML" and TEMP tablespace

September 20, 2007 - 12:05 pm UTC

Reviewer: Florin from The Netherlands

Hi Tom,

Would you please explain to me why "ALTER SESSION ENABLE PARALLEL DML" is consuming TEMP tablespace?

If I'm not using "alter session enable parallel dml", Oracle will still use the parallelism from the hint (4).

Example:

--ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL ( BL1_CUSTOMER_INFO_TEST 4) APPEND */
INTO BL1_CUSTOMER_INFO_TEST
SELECT /*+ PARALLEL ( BL1_CUSTOMER_INFO 4) FULL (BL1_CUSTOMER_INFO ) */
*
FROM BL1_CUSTOMER_INFO;


In this case the table BL1_CUSTOMER_INFO has more than 200 million rows and over 50 GB.

If I leave the "alter session enable parallel dml" in, it fails on the TEMP tablespace, while if I remove the "alter session ..." it writes directly to the tablespace of BL1_CUSTOMER_INFO_TEST.

Many thanks in advance!
Tom Kyte

Followup  

September 24, 2007 - 7:32 am UTC

are you sure it is failing on TEMP or is it failing to acquire a temporary extent.

big difference. Let us see the precise error message please.

and what indexes have you on this table.

"ENABLE PARALLEL DML" and TEMP tablespace

September 26, 2007 - 9:25 am UTC

Reviewer: Florin from The Netherlands

Hi Tom,

I did another test (this time in our test database, not in Prod like the previous one).
I have a table (TEST1) with 25 million rows (1.7 GB).

AIM_DBA>create table test2 nologging tablespace pool_ix as select * from test1 where 1=2;

Table created.

AIM_DBA>alter session enable parallel dml;

Session altered.

AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;

ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

If I'm doing the same thing without "alter session enable parallel dml" :

AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;

25192932 rows created.

Elapsed: 00:03:31.74

Please note:
No indexes in both tables.
I'm using Oracle 9.2.0.5.0 - 64 bit

I'm very curious to hear your opinion.

Thanks!
Florin


Tom Kyte

Followup  

September 26, 2007 - 9:39 pm UTC

I've bookmarked this and will look at it when I get back (and have a server on which I can run parallel operations)

Update Query

October 05, 2007 - 9:09 pm UTC

Reviewer: Pete from oregon,USA

Hi Tom,

We have a query written like mentioned below

update table a set column B = some value where id = :id;

commit;

here column ID is a primary key

This query is written inside a package and is called each time a user logs into the system. At one point the system ran slow when about 100 people tried to access it, and the DBA at the client site says this is the wrong way to write the query. He says it should be written like this (see below):
---- below is DBA reply --
"Like about a 100 people, so this table gets updated frequently, so there is a need to avoid locking the index when caching. Your related rowid and using that to bypass any index locking, access is so important"

he suggest the following way :-

Set transaction read only;

Select rowid into dbky from table a where id = :id;
(best done in a package w. package variables)

Commit;

Return Success;

Then when you need to update
Update table a SET column b = some value where rowid = :dbky; <- no locking index

Commit;
-===========================

My question is: which is the better way, rowid or the primary key as written earlier, and why?

Thanks and appreciate your feedback


Upsert

October 10, 2007 - 10:16 am UTC

Reviewer: Antony

I have some issues with the merge. In my database there are different schemas where the data is manipulated by different applications. One schema, called USAGE, is used to maintain login details etc. from the different applications. Currently there is a package which fetches the values from tables in the different schemas. The procedure first deletes from the USAGE tables the values corresponding to rows deleted or updated in the source table, and then inserts the data. The biggest problem is that the insert and delete statements generate an enormous amount of redo, due to the 1-32 million rows in the source table. I reviewed many comments on your site and found two solutions: either use a MERGE statement, or drop the table, re-create it with NOLOGGING, and then insert the data so as not to grow the redo logs. Could you please suggest which I should proceed with?

When its done?

October 26, 2007 - 3:20 am UTC

Reviewer: Karthick from India

After reading this thread I figured out one thing: BIG TABLES make people crazy.

But my question is: when do you decide you are done with it, for now?

Say you had an update which took 1 hour. OK, you are clever and you made it run in 15 min. But is 15 min the optimal time for now? How would you decide that? Because we can't keep on testing and trying. Things need to go to production, right!!

Tom Kyte

Followup  

October 29, 2007 - 10:33 am UTC

flip the question around.

what are the requirements for it to complete. how fast does it NEED to execute.

If you always move half way to a wall (eg: you start at 10 feet away and get to 5 feet away. Then you get to 5/2 feet away, then 5/2/2 feet and so on) you never actually get to the wall (sure you get very very close...)

you have to know when you can stop.

Can you show how you do that.

November 05, 2007 - 7:21 am UTC

Reviewer: karthick pattabiraman from India

you said

"it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be
something more involved here.

I'd be able to do that in minutes -- why is it taking you so long -- what is your bottleneck. "

with the source table having 1,000,000 records and the target table having 5,000,000 records, where all the records in the target table are updated from the source.

Can you show how you do it in a couple of minutes? I think it will definitely take a lot of time.

Just want to see how you create your index and how you perform your update and so on..

I just want to see that being done with an UPDATE statement, not CREATE TABLE ... SELECT * FROM ...

Because I can't drop a table :-(
Tom Kyte

Followup  

November 06, 2007 - 9:04 am UTC

"i think it will definitely take lot of time"

why do you "think" that, why don't you "try that"?

what index would I create? If I am going to update 5 million out of 5 million rows, I am not going to use an index.

update t set col = 'value';

that is how the update would go? Nothing special.


I tried

November 06, 2007 - 9:19 am UTC

Reviewer: karthick pattabiraman from India

Yes, I did try. My source table has 1 million records and my target has 2 million records.

So my update was simple, like this:

update target_table
   set target_col_1 = ( select source_col_1
                          from source_table
                         where source_col_2 = target_col_2 )
 where exists ( select null
                  from source_table
                 where source_col_2 = target_col_2 );

On seeing the explain plan I was able to identify that the target table was going for a full table scan and source_table was going for an index range scan.

So I "think" the update will be executed in this way:

for i in (select * from target_table) --> full table scan
loop
    select source_col_1
      from source_table
     where source_col_2 = i.target_col_2; --> index range scan

    replace i.target_col_1 with source_col_1
end loop;

I don't have any index on target_table since, as you said, it's going to be a full table scan anyway. I have an index on source_col_2.

So when I did the update it took more than 1.5 hours (I don't know the exact time as I killed the session), so I enabled parallel DML, ran the same update, and it completed in 20 min.

Would you suggest parallel dml for such updates? Or any other things can be done before going parallel.

And after that I changed the source from 1 million to 0.1 million rows and ran the update in parallel again. It again took around 18 min.

So, because source_table goes for an index range scan, does it not really matter how many rows are in source_table?

Tom Kyte

Followup  

November 06, 2007 - 9:39 am UTC

update (select data_from_src, data_from_tgt from src, tgt where src.key = tgt.key)
set data_from_tgt = data_from_src;

update a join, don't do all of those correlated subqueries.

no indexes should be involved, although src needs to have a unique constraint on the joined-to columns.


Thanks a lot

November 07, 2007 - 2:17 am UTC

Reviewer: karthick pattabiraman from India

What to say as the world says you "ROCK"..

my 20-minute query that was running in parallel completed in just 3 minutes, without parallel, when I changed the update the way you suggested.

Can you explain how Oracle internally processes this update?

Once again thank you so much for all your help.
Tom Kyte

Followup  

November 07, 2007 - 5:13 pm UTC

it just joins.

November 08, 2007 - 6:24 am UTC

Reviewer: Stuart Horsman

Tom,

If we're doing an application release and updating multiple tables and multiple indexes with millions of rows, is it best to:

1) Put the database into no archivelog mode.
2) Alter all tables and indexes nologging.
3) Run inserts and updates.
4) Alter all table and indexes logging.
5) Put database back into archive log mode

Regards

Stuart
Tom Kyte

Followup  

November 08, 2007 - 10:15 am UTC

nologging would have no effect.

nologging works for large direct path operations such as:

o create table as select
o create index
o insert /*+ append */ (when the append hint is actually followed, can be followed) - but only for the TABLE, not for any indexes.

nologging will not affect

o insert, update, delete, merge (without append on insert or merge....)
o index maintenance


and in noarchivelog mode, nologging is not necessary, many direct path operations will not generate log in general (create table as select, insert /*+ APPEND */).
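
A brief sketch of the distinction, with hypothetical table names (big_copy, big_src): only the direct-path insert can skip redo for the table, and only when the table is nologging (or the database is in noarchivelog mode):

```sql
-- direct path: minimal redo for the TABLE segment when nologging
alter table big_copy nologging;
insert /*+ append */ into big_copy select * from big_src;
commit;  -- required before the session can query big_copy again

-- conventional path: fully logged regardless of the nologging attribute
insert into big_copy select * from big_src;
commit;
```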




If you are not updating an indexed column, just update - try to make sure you ONLY update a table once (eg: do not update col1, then col2 then col3 - just make a single pass).

If you are updating an indexed column - consider making the index unusable and rebuild (with or without logging, up to you).
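
That suggestion might look like this sketch; the table, column and index names are hypothetical:

```sql
alter index t_col1_idx unusable;
alter session set skip_unusable_indexes = true;  -- let the update bypass the unusable index

update t set col1 = upper(col1);  -- one single pass over the table

alter index t_col1_idx rebuild nologging;  -- or rebuild with logging, as appropriate
```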


If you are updating such that you think there might be lots of migrated rows (rows grow in size), you might

o create table as select <modified data>
o drop old table
o rename new table
o re-index new table


compound key

November 21, 2007 - 5:24 am UTC

Reviewer: Karthick. from India

When the column in the source is not unique, how do you do the same update?

Say source COL1+COL2 is unique. I join both columns to the target table and update the target with COL2's value. But COL2 alone is not unique. Because I have used both COL1 and COL2 in the join condition, it is guaranteed to return only one value - but I still can't do the join update. In such a case, what kind of update is better?

is it..

update t set t_c1 = (select s_c1 from s ...)
where exists (select null from s ...)
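
One hedged alternative (not from the thread): since (COL1, COL2) is unique in the source, each target row matches at most one source row, so a MERGE can express the same update without the correlated subqueries. The names below follow the question's placeholders and the target's join columns (c1, c2) are assumed:

```sql
merge into t
using s
   on (t.c1 = s.col1 and t.c2 = s.col2)  -- composite join key, unique in s
 when matched then
      update set t.t_c1 = s.s_c1;
```

Note that releases before 10g require both a WHEN MATCHED and a WHEN NOT MATCHED clause in MERGE; the update-only form shown here works from 10g onward.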

Why 'created' instead of 'Inserted'

December 11, 2007 - 3:25 pm UTC

Reviewer: chandra from NJ

Tom, 
Please see the below:
====
SQL> insert into tmp_is_oracle_supplier_match
  2       (
  3          otis_vendor_num,
  4          otis_vendor_name, 
  5          otis_tax_id,  
  6          status_code,  
  7          comments
  8       )   
  9     select attribute2,
 10      vendor_name,
 11      num_1099,
 12      0,
 13      'Valid Record' 
 14     from 
 15      ap_suppliers_int 
 16     where status is null
 17      and REJECT_CODE is null 
 18      and 1=2;

0 rows created.

===

When I try to insert with the above command, why does it say "created" instead of "inserted"?

Tom Kyte

Followup  

December 11, 2007 - 9:41 pm UTC

because sqlplus likes to say "created" when you use insert.

Locking conflict

December 12, 2007 - 8:28 pm UTC

Reviewer: Feng from CA,USA

Hi Tom,

Here is my case, which is on 9.2.0.8. On a daily basis, we need to batch-update a few million rows in a 30-million-row table (t1) using something like:

update ( select t1.a, t2.b
           from t1, t2
          where t1.fk = t2.pk )
   set a = b;

t2 is a table with tens of millions of rows as well. Usually this takes a few hours. We also have a small number of real-time updates from web users, each against a single record in t1. If a real-time update hits the same record being updated by the batch process, the real-time update will wait and eventually time out if the batch update has still not finished. To avoid/minimize such locking conflicts, we are thinking of reducing the batch update size using PL/SQL. Is there another way to solve this problem?

Thanks in advance.
Tom Kyte

Followup  

December 13, 2007 - 9:40 am UTC

don't do the update.

Why not join to retrieve the correct value?

Locking Conflict

December 13, 2007 - 12:52 pm UTC

Reviewer: feng from CA,USA

Tom,

Due to our business model, t2 is a temp table which contains periodic updates from our vendor that are used to update our permanent table t1; thus we cannot join to retrieve the value as you suggested.

Is there any alternative in this case besides reducing the size of the update statement, either in SQL or PL/SQL?

Thanks.
Tom Kyte

Followup  

December 13, 2007 - 2:24 pm UTC

well, before I even begin to describe how to do this I want you to say yes or no in no uncertain terms to the following question:

What happens if another process reads a record that was updated and committed, and another record that was not yet updated (or was updated but not yet committed) during your "batch update"? Will the fact that this other process sees data that heretofore was IMPOSSIBLE to see (it would never see two records that are inconsistent with respect to each other) be OK? The fact that data integrity is going to be compromised here - is that something your applications can actually live with, deal with, not have a problem with?

Because that is what you are suggesting - to have PART of your update be seen by everyone before the entire update itself is complete.


also, we need a flag field or the ability to delete the record in t2 as we process it - so we don't process the same record more than once in the event (which will happen) of a failure. Do we have such a field that we can update in t2 as we process it - or can we delete the record from t2.

Locking Conflict

December 13, 2007 - 4:58 pm UTC

Reviewer: feng from CA,USA

Tom,

Thanks for your prompt follow up.

1) Data integrity is not an issue here, as records are independent of each other, so they don't have to be updated at the same time.

2) We don't delete records in t2, but we can add a flag field to it if it's needed.

Please let me know if you need additional information. Thanks.

Tom Kyte

Followup  

December 14, 2007 - 12:51 pm UTC

ops$tkyte%ORA10GR2> create table t1 as select * from all_users;

Table created.

ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(user_id);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2 as select user_id, lower(username) username, 'N' processed from all_users where mod(user_id,2) = 0;

Table created.

ops$tkyte%ORA10GR2> create index t2_idx on t2(processed);

Index created.

ops$tkyte%ORA10GR2> select * from t1;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE                              58 14-DEC-05
DIP                                    19 30-JUN-05
TSMSYS                                 21 30-JUN-05
LOTTOUSER                              65 30-DEC-05
MDDATA                                 50 30-JUN-05
OPS$ORA10GR2                           56 14-DEC-05
FOO$TKYTE                              60 19-DEC-05
QUOTA                                  94 22-FEB-06
AQ                                    228 15-OCT-07
R                                      76 09-JAN-06
OPS$TKYTE                             233 16-NOV-07
SCOTT                                  84 12-FEB-06
B                                     213 08-JAN-07
RT_TEST                               231 29-OCT-07
A                                     221 04-SEP-07
TEST                                  237 16-NOV-07
DMSYS                                  35 30-JUN-05
DBSNMP                                 24 30-JUN-05
WMSYS                                  25 30-JUN-05
EXFSYS                                 34 30-JUN-05
CTXSYS                                 36 30-JUN-05
XDB                                    38 30-JUN-05
ANONYMOUS                              39 30-JUN-05
OLAPSYS                                47 30-JUN-05
ORDSYS                                 43 30-JUN-05
ORDPLUGINS                             44 30-JUN-05
SI_INFORMTN_SCHEMA                     45 30-JUN-05
MDSYS                                  46 30-JUN-05
SYSMAN                                 51 30-JUN-05
PERFSTAT                              148 31-MAR-06
SYS                                     0 30-JUN-05
SYSTEM                                  5 30-JUN-05
OUTLN                                  11 30-JUN-05
MGMT_VIEW                              53 30-JUN-05
MY_USER                               211 09-NOV-06

35 rows selected.

ops$tkyte%ORA10GR2> select * from t2;

   USER_ID USERNAME                       P
---------- ------------------------------ -
         0 sys                            N
        24 dbsnmp                         N
        34 exfsys                         N
        36 ctxsys                         N
        38 xdb                            N
        44 ordplugins                     N
        46 mdsys                          N
        50 mddata                         N
        56 ops$ora10gr2                   N
        58 big_table                      N
        60 foo$tkyte                      N
        94 quota                          N
       228 aq                             N
        76 r                              N
        84 scott                          N
       148 perfstat                       N

16 rows selected.

ops$tkyte%ORA10GR2> declare
  2      type numArray is table of t2.user_id%type;
  3      type vcArray is table of  t2.username%type;
  4      l_userids    numArray;
  5      l_usernames vcArray;
  6  begin
  7      loop
  8          update t2
  9             set processed = NULL
 10           where processed = 'N'
 11             and rownum <= 10
 12       returning user_id, username
 13            bulk collect into l_userids, l_usernames;
 14
 15          exit when l_userids.count = 0;
 16
 17          forall i in 1..l_userids.count
 18              update t1 set username = l_usernames(i) where user_id = l_userids(i);
 19          commit;
 20        exit when l_userids.count < 10;
 21      end loop;
 22  end;
 23  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select * from t1;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
big_table                              58 14-DEC-05
DIP                                    19 30-JUN-05
TSMSYS                                 21 30-JUN-05
LOTTOUSER                              65 30-DEC-05
mddata                                 50 30-JUN-05
ops$ora10gr2                           56 14-DEC-05
foo$tkyte                              60 19-DEC-05
quota                                  94 22-FEB-06
aq                                    228 15-OCT-07
r                                      76 09-JAN-06
OPS$TKYTE                             233 16-NOV-07
scott                                  84 12-FEB-06
B                                     213 08-JAN-07
RT_TEST                               231 29-OCT-07
A                                     221 04-SEP-07
TEST                                  237 16-NOV-07
DMSYS                                  35 30-JUN-05
dbsnmp                                 24 30-JUN-05
WMSYS                                  25 30-JUN-05
exfsys                                 34 30-JUN-05
ctxsys                                 36 30-JUN-05
xdb                                    38 30-JUN-05
ANONYMOUS                              39 30-JUN-05
OLAPSYS                                47 30-JUN-05
ORDSYS                                 43 30-JUN-05
ordplugins                             44 30-JUN-05
SI_INFORMTN_SCHEMA                     45 30-JUN-05
mdsys                                  46 30-JUN-05
SYSMAN                                 51 30-JUN-05
perfstat                              148 31-MAR-06
sys                                     0 30-JUN-05
SYSTEM                                  5 30-JUN-05
OUTLN                                  11 30-JUN-05
MGMT_VIEW                              53 30-JUN-05
MY_USER                               211 09-NOV-06

35 rows selected.

ops$tkyte%ORA10GR2> select * from t2;

   USER_ID USERNAME                       P
---------- ------------------------------ -
         0 sys
        24 dbsnmp
        34 exfsys
        36 ctxsys
        38 xdb
        44 ordplugins
        46 mdsys
        50 mddata
        56 ops$ora10gr2
        58 big_table
        60 foo$tkyte
        94 quota
       228 aq
        76 r
        84 scott
       148 perfstat

16 rows selected.



expect this update to take longer than the other.

10 is what I used, you would use 100 to 500.

it'll generate more redo, more undo, consume more cpu, take longer - but will do it in batches.

and it will not suffer from 1555's as it does not keep a cursor open across commits.

and it is restartable when it fails; just rerun it and it will do its job.

You will want an index on that column, even with two values.

why did I use NULL instead of "Y"? Because entirely-null entries are not made in b*tree indexes - hence this index will stay small; it only indexes the "N" records.

Thanks

December 14, 2007 - 2:48 pm UTC

Reviewer: Feng

Tom,

Thanks for the solution. We will try this out.

Performance question

January 15, 2008 - 4:26 pm UTC

Reviewer: Feng from CA,USA

Hi Tom,

During testing of your bulk select/update solution, I found that it has almost the same execution time as the following PL/SQL:

begin
    for x in ( select * from t2 )
    loop
        update t1 set a = x.b where t1.fk = x.pk;
        commit;
    end loop;
end;
/

Since this method only locks/commits one row at a time, would it be a better solution than the bulk select and update of 500 rows at a time, given that both have similar response times?

What do you think?

Feng
Tom Kyte

Followup  

January 15, 2008 - 4:57 pm UTC

*no*

you have a transaction - a logical unit of work.


You commit IF and ONLY IF and when your transaction is complete.

what if this updates N rows and then fails? what then? where is your code to restart in the middle?


the goal - especially in Oracle - is not to reduce the number of locks at the expense of PROGRAM LOGIC AND CORRECTNESS (we are not sqlserver!).

Performance question

January 15, 2008 - 6:49 pm UTC

Reviewer: Feng from CA,USA

In our application, if we fail after updating N rows, we can start over by updating everything again. This is not as efficient as restarting from where the last run left off, but it will further reduce the chance of lock conflicts, and the chance of failure is pretty low.
Tom Kyte

Followup  

January 16, 2008 - 3:13 pm UTC

do you understand the concepts of transactional consistency?

if you are worried about "locks", that means other people must be updating this information.

So, you come and update a row and commit; they update the same information and commit; you fail, and you come back and, blind as a bat, overwrite their update again???

You understand that the outside world will see a BIT of your update before it is totally complete?

Do you understand the natural ramifications of what you propose and are you sure your applications are a) expecting that b) can deal with that c) won't totally corrupt your database (logically speaking)

If you actually have what you call "lock conflict", I think the answers to the above are "we really need to commit when we are DONE, because apparently other people are trying to modify this data and we'll introduce lost update problems on a scale the size of the grand canyon"

You are right!

January 16, 2008 - 5:01 pm UTC

Reviewer: Feng


delete thousands records

January 17, 2008 - 2:49 pm UTC

Reviewer: Dawar from Los Angeles, CA USA

I have an Excel spreadsheet based on one of our internal tables.
It contains 7,000 records.

I need to delete all of these records from the Oracle database.

What is the quickest way to delete all the records?

Thanks
D
Tom Kyte

Followup  

January 19, 2008 - 10:24 pm UTC

7,000 records is

a) big for excel
b) trivial, tiny, almost non-existent for Oracle.


I would just use DELETE FROM T;

Truncate would not be much faster, if it were faster even.

7,000 is such a small number.

BULK UPDATES

January 17, 2008 - 7:57 pm UTC

Reviewer: SRINIVAS CHALLA from Corona,CA

For bulk updates/deletes, please also consider the MERGE statement. With MERGE, one can avoid the temp-table creation or the performance issues of a PL/SQL bulk update.

MERGE INTO ok a
USING (
  SELECT ID
  FROM ok1
    ) b
ON (a.id = b.id)
WHEN MATCHED THEN
update set a.flag = 'Y'
DELETE WHERE a.FLAG='Y';  


SQL> desc ok;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 FLAG                                               VARCHAR2(1)

SQL> desc ok1;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 FLAG                                               VARCHAR2(1)

Tom Kyte

Followup  

January 19, 2008 - 10:31 pm UTC

why would you even consider merge for that?

delete from ok where id in (select id from ok1);


that is all you need, why would you even consider complicating things so horribly in that case with merge?


Temp Tablespace Error

April 06, 2008 - 11:54 pm UTC

Reviewer: Nikhilesh from India

Dear Tom,
When we create a table using CTAS (create table as select) - the query returns 20 thousand rows - it works fine the first time, but then it eats up all the temp tablespace and gives an error saying that it could not allocate an extent. We are using Oracle 10.1.0.3; we have a true temporary tablespace of 9.6 GB. The extent size is 1028.

I'm the only person connected to Oracle, and yes, after it ran successfully I killed one session which was running the same query using dynamic SQL.

Am I doing something wrong, or is it a bug in 10.1.0.3?
I don't have metalink access.

Thanks in advance,
Tom Kyte

Followup  

April 07, 2008 - 9:29 am UTC

no, it is not a bug, it is however insufficiently described to say what it is.

It seems like you are saying:

we do this:

a) create table t as select ......
b) drop table t;
c) create table t as select .... <<<= and this one fails.


are you sure it is blowing up on TEMP - or are you doing something like:

a) create table T1 as select ..... <<== works
b) create table T2 as select ... <<== fails on temp


If that is the case, it could be that what is "blowing up" is not temp at all - but the tablespace where t2 is going into - when we do a CTAS - we load the new data into temporary extents in the target tablespace AND THEN at the end, convert them into permanent extents.

Many times, people say "ran out of temp space", when they really ran out of space in the target tablespace (so, check the error - maybe even consider posting a CUT AND PASTE of it)....


Or, if it is not T2's tablespace that is blowing up, but there are two different tables - I'd say "table T1 needed little to no temp and T2's query needs more temp than you have allocated"


Delete the rows that contain BLOB object as one of the column

May 13, 2008 - 11:01 am UTC

Reviewer: Ramakrishna

I am trying to delete more than 100,000 records. It is faster in tables that do not contain a BLOB column. I tried it through plain SQL*Plus,
like 'delete from table_name where indexed_column<10000'. It has been hanging there for a long time; even after two days the query is not done. How can I efficiently delete these rows, and what query can I use to delete them quickly?
Tom Kyte

Followup  

May 13, 2008 - 11:18 am UTC

two days? really? Not that I don't believe you - but I don't believe you - you must have been blocked or otherwise locked out.

although the statement "It is faster in tables that do not contain a BLOB column" should be rather obvious? If you do something that takes less work - it'll be faster - of course.


Did you monitor this process from another session - looking at the plethora of V$ tables out there that tell you what is going on in the database at all?

Set based update.

May 14, 2008 - 3:25 pm UTC

Reviewer: john hardy from Miami

What exactly is a set based update? I have been hearing about it from my colleagues. Is it better than doing an update by rowid? (I know this method is not to your liking, but it works for me: 100 million updates on a highly indexed & triggered table in 10 hours!)
Tom Kyte

Followup  

May 16, 2008 - 12:14 pm UTC

if it took me 10 hours to update 100,000,000 rows, I'd be upset. We can do better.

I'd have been looking at perhaps getting rid of indexes, looking at that trigger to see if I really needed it to fire (disabling if not) and using a single update - perhaps in parallel - to perform the operation QUICKLY - in a single sql statement.

That is a set based update, slow by slow processing written by procedural programmers = bad, set based operations - packing as much work into a single sql statement = good.

In fact, if I were updating 100,000,000 rows out of 100,000,000 (or nearly so), I might choose to create a table as "select rows as they should be", drop the old table, rename, re-index and so on - or use dbms_redefinition to do it online.
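
As a sketch of that CTAS approach (big_table and its columns are hypothetical names, and the transformation shown is just a stand-in for the real update; NOLOGGING needs your DBA's sign-off and a backup afterwards):

```sql
-- Build the "already updated" copy in one parallel, direct-path pass:
create table big_table_new
  nologging
  parallel 8
as
select pk_col,
       upper(some_col) as some_col,   -- the "update" happens right here
       other_col
  from big_table;

-- Re-create indexes, constraints and grants on the copy, then swap:
drop table big_table;
rename big_table_new to big_table;
```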

delete performance

June 03, 2008 - 4:35 am UTC

Reviewer: Resl from Lebanon

Hi Tom,

I have a crm_cust_info table with 301,000 records.
The query select count(*) from crm_cust_info where cust_type='CONT' needs less than 1 second to return the value (there is an index on cust_type),

while the delete statement
delete from crm_cust_info where cust_no='3500' (for example) needs more than 10 seconds to delete the required record, even though cust_no is a unique, indexed column.

I rebuilt the indexes and also tried the same delete with different optimizer modes after analyzing the table and its indexes, but without any performance change.

Please note that I put the delete statement inside a cursor loop in order to delete about 300,000 records.


Tom Kyte

Followup  

June 03, 2008 - 11:33 am UTC

you do realize that comparing the performance of a count(*) where indexed_column=value and a delete where indexed_column = value

is about as useful as comparing the performance of your bike with an airplane. True, your bike and airplane are both modes of traveling (as select and delete are both sql statements) but that is about it.

the select statement - what does it have to do... It has to read a couple of index blocks and count stuff (that it takes almost a second means there is LOTS of records to count, I would be hoping for 0.02 seconds maybe on something like that - for it to be near a second is huge)

the delete statement - what does it have to do....

a) read the index blocks
b) navigate to the table blocks, get the block in consistent read mode
c) get the block in current mode for each row it wants to delete
d) delete it - which requires navigating all of the OTHER indexes on this table to find the entries that point to this row and remove them)
e) generate redo for all of the above
f) generate undo for all of the above
g) generate redo for the undo
h) ultimately delete the index entry that found us the row in the first place

i) do a-h for the remaining 299,999 records


and you did it the slowest way possible!!!! slow by slow via a cursor, so you are making it take EVEN LONGER (if you did it in a single sql statement, you would find it to be faster in all probability)
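
Assuming the goal really is to remove all ~300,000 'CONT' rows (the predicate here is a guess based on the count(*) shown in the question), the cursor loop reduces to one statement:

```sql
-- One set-based delete instead of 300,000 single-row deletes:
delete from crm_cust_info
 where cust_type = 'CONT';

commit;
```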

Update statement taking more hours --

June 09, 2008 - 3:00 am UTC

Reviewer: Sipra from INDIA

Hi Tom,

I have been following your great support for the last six months - many thanks for the useful info.

I have a SQL statement which needs to update 4 million rows. I could not figure out where my query is getting stuck...

Please have the following Code -

UPDATE T1
set TYPE_ID = 141
where TYPE_ID in (
SELECT TYPE_ID
FROM T1, VIEW VW, T2@dblink T2
WHERE T1.ID = VW.ID
AND VW.NUM = T2.NUM
AND T2.NAME IS NOT NULL
AND T2.REV = 'Y'
AND T2.NUM = '&v_num'
AND TYPE_ID = 1);

Scenario -

Updating the type_id in table T1 to 141 if the NAME in T2 is not null and T2.REV is 'Y'.
Each time this will update 4-5 million records.

I have executed this through a procedure, which gave the following error. Please help me.

regards,
Sipra
www.dwforum.net

Updating millions of records-LOB column

June 16, 2008 - 10:21 am UTC

Reviewer: Praveen from Bangalore

Hi Tom,

This is w.r.t. the original question and the answer given by you. There are 1 million records in a table. And if I need to upload an image or pdf file (say of size 1MB each) into a LOB column of each record (which means there are 1 million LOB files, each of size 1MB), what would be the best approach?

I need to commit after every 1000 records. Also it will be difficult to follow the steps you suggested while dealing with LOBs.

Thanks

Tom Kyte

Followup  

June 16, 2008 - 1:32 pm UTC

follow which steps? This is a HUGE page here.

so, you have 1,000,000 1mb files.

I'd be looking at perhaps, probably using dbms_lob.loadfromfile, called in parallel in as many sessions as you want to have going - each session working on a different set of files.
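
A minimal sketch of that approach for one file - the DOC_DIR directory object, the docs table and its columns are all hypothetical, and the row is assumed to already hold an empty_blob() locator; each parallel session would loop over its own slice of the file list:

```sql
declare
  l_bfile  bfile := bfilename('DOC_DIR', 'file_000001.pdf');  -- hypothetical directory/file
  l_blob   blob;
begin
  -- lock the target row and grab its LOB locator
  select doc_content into l_blob
    from docs
   where doc_id = 1
     for update;

  dbms_lob.fileopen(l_bfile, dbms_lob.file_readonly);
  dbms_lob.loadfromfile(l_blob, l_bfile, dbms_lob.getlength(l_bfile));
  dbms_lob.fileclose(l_bfile);

  commit;  -- or commit every 1,000 files in the real loop, as required
end;
/
```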

Re:Updating millions of records-LOB column

June 17, 2008 - 1:08 am UTC

Reviewer: Praveen from Bangalore

These are the steps for updating 1 million records that you suggested for the original question in this thread. Anyway, I am going to test the solution you suggested above for uploading 1 million 1MB files.

Thank you



Update of 2 columns...

June 25, 2008 - 11:55 am UTC

Reviewer: Richard from USA

Hi Tom,

Thanks for the website.. I'm an avid reader. In reference to this thread, "How to Update millions or records in a table": recently we've run into an issue where our tables are growing very quickly and our updates aren't able to keep up. Every day we have to perform an update like the example below. When we had 20-30m records it would take a couple of hours and wasn't so bad. But now that we consistently have over 50m, and sometimes closer to 100m rows, our updates haven't scaled properly. By analogy, our method of updating is like driving a Geo Metro to the grocery store and back: if the performance is bad you don't really notice because the ride is so short. But when you have to drive that same Geo Metro across the state and back, that's when you notice the performance issues :-)

We have to update our customers' location based on their IP address. We have the IP Geo data and know the starting decimal IP address and the ending decimal IP address.

For example, let's say a row in big_table  has SRCIP = 192.168.1.100 & DSTIP = 172.1.1.1

In the lookup_table, the following is true:

* The country_code = 'LOCAL'  for the dotted decimal IP range of 192.168.0.0 - 192.168.255.255  (decimal IP 3232235520 - 3232301055). 
* The country_code = 'REMOTE'   for the dotted decimal IP range of 172.0.0.0 - 172.255.255.255 (decimal IP 2885681152 - 2902458367).

update big_Table
    set src_cc = (select country_code from lookup_table where srcip between 3232235520 and 3232301055)
       , dst_cc = (select country_code from lookup_table where dstip between 2885681152 and 2902458367)
 where src_cc is null
       or dst_cc is null;

So the Line would eventually look like

SRCIP:     192.168.1.100
DSTIP:     172.1.1.1
SRC_CC:   LOCAL
DST_CC:   REMOTE


I tried to incorporate BULK COLLECT and FORALL into the PL/SQL below, but I'm not really sure how to do so. Maybe my logic is backwards, but I'm thinking we need to collect all of the lookup_table data (~135k rows), loop through them, and then update the big_table columns. This doesn't seem like it should be too bad.. 135k iterations through to update ~60m records. Obviously some loops will update 10k records, some 30k and some only a few hundred.  I've looked at various examples of CTAS, MERGE, /+ Append hints, etc. But when it comes to updating two columns  that rely on different data, I can't seem to wrap my brain around it.

I have indexes on SRCIP & DSTIP in the big_table, and lookup_table is an IOT (ip_from & ip_to columns). Reading your comments in this thread, this probably doesn't matter since my current method requires a full table scan of lookup_table anyway... and it's only ~135k records. We're running 10.2.0.2 (with plans to upgrade to 10.2.0.4 ASAP) on Solaris 9. Big_table is in nologging mode. We don't keep this data for very long, but it's important for us to make these updates.

Thanks in advance for any support you can offer!


SQL> select count(*) from big_table;

 COUNT(*)
----------
 59,895,942

SQL> select count(*) from lookup_table;

 COUNT(*)
----------
   135,237

SQL> desc lookup_table

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 IP_FROM                                   NOT NULL NUMBER(10)
 IP_TO                                     NOT NULL NUMBER(10)
 COUNTRY_CODE                                       VARCHAR2(2)

SQL> desc big_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 QRY                                                CHAR(3)
 STARTTIME                                          DATE
 MSECS                                              RAW(3)
 RANK                                               NUMBER
 SRCIP                                              NUMBER
 DSTIP                                              NUMBER
 SRC_CC                                             CHAR(2)
 DST_CC                                             CHAR(2)
 ASSIGNEDHOSTID                                     RAW(4)
 HOSTNAME                                  NOT NULL VARCHAR2(20)


Original update statement....

update big_table
    set src_cc = (select country_code from lookup_table where srcip between ip_from and ip_to)
      ,  dst_cc = (select country_code from lookup_table where dstip between ip_from and ip_to)
/


Obviously, taking forever.... lots of UNDO space required.

Second attempt....  not much better results. Less UNDO used, but more use of DBWn:

begin
FOR x  IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
 update /*+ PARALLEL(big_table,10) */ big_table
       set src_cc = (x.country_code)
  where srcip between x.ip_from and x.ip_to
      and src_cc='  ';
 commit;
END LOOP;

FOR x  IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
 update /*+ PARALLEL(big_table,10) */ big_table
       set dst_cc = (x.country_code)
  where dstip between x.ip_from and x.ip_to
       and dst_cc='  ';
 commit;
END LOOP;
end;
/




Tom Kyte

Followup  

June 25, 2008 - 12:08 pm UTC

what are the odds of filling out the IP table (that should be really easy to do)

then we could update using no code easily

Filling out IP table...

June 25, 2008 - 3:47 pm UTC

Reviewer: Richard from USA

Thanks for the prompt response. We've considered that option and it may be our only choice.

Two drawbacks that I can identify immediately:
 * This IP data is updated ~monthly, or should be updated ~monthly. As the IP spaces change, we will have to rebuild this table... but, as I know you'll say, rebuilding that table once/month is much better than using 2 between statements on a daily basis :-)
 * The decimal IP range is 33996344 - 4278190079.

If I did my math correctly:

SQL> select 4278190079-33996344 from dual;

4278190079-33996344
-------------------
         4244193735

SQL> select to_char(4244193735,'999,999,999,999') from dual;

TO_CHAR(4244193735,'999,999,999,999')
--------------------------------------
   4,244,193,735

I've considered distinct IP decimal addresses as well. Out of ~50m rows, per day, we have approximately 3m unique srcip or dstip values.

For arguments sake, if we couldn't fill out an IP table, any suggestions on a course of action? This problem has become those song lyrics that I can't get out of my head until I know the rest of the verse. I think you're absolutely right that filling out an IP table might be the best method, but I'm curious what I could do with SQL or PL/SQL to attack the problem.

Thanks again. 



Tom Kyte

Followup  

June 25, 2008 - 3:55 pm UTC

Actually, doh, now that I think about it, why not just maintain a table of your 3 million and just JOIN

you don't need the country code in the base table, just join - so don't fill out the ip table; maintain a lookup nightly that has just the 3 million IPs you need, and update it monthly after you get a refresh. Every night you would just have to insert into this lookup table any IPs that aren't there yet and populate them with values.

Even if you "must" have the country codes in the base table (and I would disagree - just join, it is ok to join), it would be more efficient to maintain this 3-million-row table nightly (adding new rows) and only have to look up the new rows.
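
That nightly maintenance might look something like this sketch (ip_cc is a hypothetical ip-to-country lookup table; the expensive BETWEEN lookup then runs once per new IP instead of once per big_table row):

```sql
-- Add any IPs that appeared today but aren't in the lookup yet:
insert into ip_cc (ip, country_code)
select s.ip,
       (select l.country_code
          from lookup_table l
         where s.ip between l.ip_from and l.ip_to)
  from (select srcip as ip from big_table
        union                      -- UNION (not UNION ALL) de-duplicates
        select dstip from big_table) s
 where not exists (select null from ip_cc c where c.ip = s.ip);

-- Reporting then simply joins - no update of big_table at all:
--   select b.*, s.country_code src_cc, d.country_code dst_cc
--     from big_table b, ip_cc s, ip_cc d
--    where b.srcip = s.ip and b.dstip = d.ip;
```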

union & distinct columns...

June 25, 2008 - 8:36 pm UTC

Reviewer: Rich from USA

Thanks Tom. 

I am thinking this might be a good use of CTAS with /*+ APPEND */ hint, parallel & nologging, no? 

CTAS select distinct srcip as ip from big_table UNION select distinct dstip as ip from big_table... no logging, /*+ append */, parallel, etc.

Then I'll just keep inserting into that table until it also becomes unmanageable... then truncate it and start over.

We'll have ~3m unique IPs per day.. but then on a weekly basis, I would guess it'll be ~4.5m unique IPs... monthly ~10m unique IPs, etc.  So the size of this table shouldn't exceed the size of the big_table anytime soon. *fingers crossed* :-)

Sorry to go off topic, but what are your thoughts on triggers? I read in your books that triggers have less effect on redo in 10g than they did in 8i/9i. We've considered adding the src & dst country_code columns to big_table and using an on-insert trigger to update those 2 columns. This table is appended to via sqlldr (not direct load) a few dozen rows at a time... but those few dozen rows add up pretty quickly -- 50+m/day. We haven't tried it yet, and I'm sorry to be the guy who asks questions without having tkprof, a trace, etc., but I'm just looking for a gut (experience/wisdom) feeling.

Thanks again Tom.






Tom Kyte

Followup  

June 26, 2008 - 3:41 pm UTC

create table as select is ALWAYS append mode (writes above high water mark with direct path). There is no such thing as a CTAS with /*+ append */.

Parallel - up to you, do you have the resources for it, can parallel help it.

Nologging - up to you and your DBA :) they need to do special stuff if you go nologging.


... Sorry to go off topic, but what are your thoughts on triggers? ...

the title of an upcoming column by me in Oracle Magazine is:

<quote>
The Trouble with Triggers

Those of you who follow the asktom.oracle.com website know that I have an aversion to triggers. Once upon a time, a long time ago, I thought triggers were the coolest thing ever and I used (and abused) them heavily. Now, I will go very far out of my way in order to avoid a trigger - whenever possible.
</quote>


and you said:

... his table is appended to
via sqlldr (not direct load) a few dozen rows at a time... ...

but that'll skip triggers... and make /*+ append */ a "no-op"


ops$tkyte%ORA10GR2> create table t as select * from all_users where 1=0;

Table created.

ops$tkyte%ORA10GR2> create trigger t_trigger
  2  before insert or update on t for each row
  3  begin
  4          :new.username := lower(:new.username);
  5  end;
  6  /

Trigger created.

ops$tkyte%ORA10GR2> !cat t.ctl
LOAD DATA
INFILE *
INTO TABLE t
REPLACE
FIELDS TERMINATED BY '|'
(
username
,user_id
,created
)
BEGINDATA
BIG_TABLE|58|14-DEC-05
DIP|19|30-JUN-05
TSMSYS|21|30-JUN-05

ops$tkyte%ORA10GR2> !sqlldr / t.ctl direct=y

SQL*Loader: Release 10.2.0.2.0 - Production on Thu Jun 26 15:30:44 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Load completed - logical record count 3.

ops$tkyte%ORA10GR2> select * from t;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE                              58 14-DEC-05
DIP                                    19 30-JUN-05
TSMSYS                                 21 30-JUN-05

<b>no trigger fired</b>

ops$tkyte%ORA10GR2> update t set user_id = user_id;

3 rows updated.

ops$tkyte%ORA10GR2> select * from t;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
big_table                              58 14-DEC-05
dip                                    19 30-JUN-05
tsmsys                                 21 30-JUN-05

<b>that just shows trigger works, but did not fire</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ append */ into t select * from all_users where rownum < 3;

2 rows created.

ops$tkyte%ORA10GR2> select * from t;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
big_table                              58 14-DEC-05
dip                                    19 30-JUN-05
tsmsys                                 21 30-JUN-05
big_table                              58 14-DEC-05
dip                                    19 30-JUN-05

<b>that shows trigger fired and since the select did not fail, we did NOT append!</b>




add column

June 28, 2008 - 3:27 am UTC

Reviewer: jayesh from india

hi tom,

Generally when we add a new column, it is added after all existing columns. Suppose I want to add a column between two existing columns - is that possible? If so, please give me an example and the syntax.
Tom Kyte

Followup  

June 28, 2008 - 1:53 pm UTC

this is what a view is about.

How I wish we had never given "rename column" :( - views are the right approach here.

alter table t add ( x int );

create or replace view view_of_t
as
select a, x, b from t;


use the view - if need be, rename the table to "t_table" and create view T as select ....

no one will know...

Update with BETWEEN operator...

July 02, 2008 - 5:50 pm UTC

Reviewer: Rich from USA

Thanks for the response to my questions Tom -- always appreciated.

I've still been thinking about how to attack this and I'm curious how BETWEEN works. If I say:

update big_table
set dst_cc = (select country_code
from lookup_table
where dstip between x.ip_from
and x.ip_to);

Will this act similar to an IN operator or an EXISTS operator? I'm curious if BETWEEN will stop at the first result or continue searching for other values?

I created a quick test... I duplicated a row in lookup_table, but with a different country_code, executed a statement like

update big_table
set src_cc = (select country_code
from lookup_table
where srcip between ip_from and ip_to);


and it returned "ORA-01427: single-row subquery returns more than one row"

Am I correct to assume that Oracle will scan my entire lookup table when it uses the BETWEEN operator?

I'm thinking about rewriting my query to something like:

FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update big_table
set src_cc = (x.country_code)
where exists (select 1
from lookup_table
where srcip between x.ip_from and x.ip_to);
commit;
END LOOP;
end;
/

The lookup_table shouldn't have duplicates, ip_to & ip_from have UNIQUE constraints. Therefore, I want my query to select the first country_code & move on to the next update, not continue to search through the lookup_table. I'm not quite sure how to attack this. Any thoughts or suggestions?

Thanks again for your help!

Regards,
Rich




Tom Kyte

Followup  

July 07, 2008 - 9:19 am UTC

..
update big_table
set dst_cc = (select country_code
from lookup_table
where dstip between x.ip_from
and x.ip_to);

Will this act similar to an IN operator or an EXISTS operator?
.....

absolutely not. think of that like:

update big_table set dst_cc = f(x);


no where clause - EVERY row gets updated. And if your correlated subquery returns more than one row - you'll just get an error and the update will fail.


...
Am I correct to assume that Oracle will scan my entire lookup table when it
uses the BETWEEN operator?
....

absolutely not. it'll index range scan more likely if it can.


I cannot really help with the real update you want - too much missing here, this logic doesn't seem sensible:


FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update big_table
set src_cc = (x.country_code)
where exists (select 1
from lookup_table
where srcip between x.ip_from and x.ip_to);
commit;
END LOOP;
end;
/


why would you use where exists. that would tend to scan big table for every row in lookup


maybe this?

ops$tkyte%ORA11GR1> create table t ( src_cc varchar2(2), srcip number );

Table created.

ops$tkyte%ORA11GR1> insert into t values ( null, 50 );

1 row created.

ops$tkyte%ORA11GR1> insert into t values ( null, 500 );

1 row created.

ops$tkyte%ORA11GR1> insert into t values ( null, 5000 );

1 row created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create table lookup ( country_code varchar2(2), ip_from number, ip_to number );

Table created.

ops$tkyte%ORA11GR1> insert into lookup values ( 'aa', 0, 100 );

1 row created.

ops$tkyte%ORA11GR1> insert into lookup values ( 'bb', 101, 1000 );

1 row created.

ops$tkyte%ORA11GR1> insert into lookup values ( 'cc', 1001, 10000 );

1 row created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select * from t;

SR      SRCIP
-- ----------
           50
          500
         5000

ops$tkyte%ORA11GR1> merge into t
  2  using lookup l
  3  on (t.srcip between l.ip_from and l.ip_to)
  4  when matched then update set src_cc = l.country_code;

3 rows merged.

ops$tkyte%ORA11GR1> select * from t;

SR      SRCIP
-- ----------
aa         50
bb        500
cc       5000


please do NOT commit in the for loop, please....

No commit & no exists...

July 08, 2008 - 8:56 am UTC

Reviewer: A reader

Thanks Tom.

why would you use where exists. that would tend to scan big table for every row in lookup

I agree, I wouldn't want to do that. I was trying to use it as an example of how I wanted to query lookup_table and use 'exists-like logic' to stop after it finds the first entry that matches the between statement. I think your suggestion of a merge is my best bet.

please do NOT commit in the for loop, please....

That being in there was embarrassing -- sorry :-)

Thanks,
Rich


How to insert 10 million rows... into a partition

July 23, 2008 - 12:08 pm UTC

Reviewer: Suresh from INDIA

Hi Tom,

I have a table (fact_projection) currently with 26 partitions and around 10 million rows.
Every day I need to insert into all the partitions of fact_projection from stging_projection, joining some other tables.

Data is being inserted correctly into all the partitions, but one particular partition, called PH, is taking too long.

I tried inserting using BULK COLLECT but it is taking a lot of time.

Could you please suggest the best possible approach.


Shall I do it using

insert into fact_projection PARTITION (PH)
SELECT <join>

Here is the query:
INSERT INTO invsl.fact_invntry_projctn
             (mngng_org_id, org_suply_chain_item_key, suply_chain_loc_key,
              snpsht_dt_key, fscl_mth_id, actvty_typ_key, slng_mrkt_id,
              unit_msr_id, crncy_id, srce_sys_id, unit_qty, std_cost_amt,
              sku_parms_key, item_typ_id, mrkt_fsc_key, suply_mrkt_id,PROD_TYP_GRP_ID)
    SELECT   /*+ parallel(a,8) push_subq */ a.mngng_org_id, d.org_item_fg_key, a.suply_chain_loc_key,
             g.clndr_dt_key, b.fscl_mth_id, a.actvty_typ_key, a.lcl_mrkt_id,
             MIN (NVL (a.uom_id, -100)), MIN (NVL (f.crncy_id, -100)),
             MIN (a.srce_sys_id), SUM (a.unit_qty), MIN (a.std_cost_amt), -100,
             MIN (a.item_typ_id), a.mrkt_fsc_key, suply_mrkt_id,min(a.PROD_TYP_GRP_ID)
        FROM invsl_stage.stg_invntry_projctn a JOIN codi_stage.stage_fscl_mth b
             ON (    TO_CHAR (a.projctn_mth_dt, 'mm') = b.fscl_mth_nr
                 AND TO_CHAR (a.projctn_mth_dt, 'yyyy') = b.fscl_yr_nr
                )
             JOIN codi_stage.stage_org_fg d
             ON (a.suply_mrkt_id = d.org_id AND a.item_cd = d.lcl_sc_fsc_cd)
             LEFT OUTER JOIN codi_stage.stage_crncy f
             ON (a.lcl_crncy_cd = f.crncy_cd)
             JOIN codi_stage.stage_clndr_dt g ON (a.snpsht_dt = g.clndr_dt) 
     WHERE  a.mngng_org_cd = lv_mrkt
                AND a.lcl_item_typ = 'FG'
                AND a.prod_typ_grp_cd = 'FG PRODUCT'
    GROUP BY a.mngng_org_id,d.org_item_fg_key,a.suply_chain_loc_key,g.clndr_dt_key,
             b.fscl_mth_id,a.actvty_typ_key,a.lcl_mrkt_id,a.mrkt_fsc_key,a.suply_mrkt_id;

Tom Kyte

Followup  

July 24, 2008 - 10:29 am UTC

you might consider a direct path insert, insert /*+ APPEND */ and enabling parallel dml if you want to do a parallel direct path load.
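
Parallel DML must be enabled at the session level before the insert; a sketch (the degree of 8 is an assumption - size it to your hardware):

```sql
alter session enable parallel dml;

insert /*+ append parallel(f, 8) */ into invsl.fact_invntry_projctn f
select /*+ parallel(a, 8) */ ...       -- the existing SELECT goes here
  from invsl_stage.stg_invntry_projctn a, ...;

commit;   -- a direct-path insert must be committed (or rolled back)
          -- before the session can read the table again
```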

How to insert 10 million rows... into a partition

August 08, 2008 - 6:40 am UTC

Reviewer: Sureshpswamy from Hyderabad,India

Hi,

It was working fine when I added the APPEND hint.
But just before deployment I had a change request to consider a new table, mapsrpt_stage.stg_mrkt_fsc. Now the whole query is back to its old 3-hour run time.
Could you please help me.

INSERT /*+ APPEND */ INTO invsl.fact_invntry_projctn PARTITION (part_62)
            (mngng_org_id, org_suply_chain_item_key, suply_chain_loc_key,
             snpsht_dt_key, fscl_mth_id, actvty_typ_key, slng_mrkt_id,
             unit_msr_id, crncy_id, srce_sys_id, unit_qty, std_cost_amt,
             sku_parms_key, item_typ_id, mrkt_fsc_key, suply_mrkt_id,
             prod_typ_grp_id)
   SELECT   /*+ parallel(a,8) push_subq  index(pk_stg_invntry_projctn) */
            a.mngng_org_id, d.org_item_fg_key, a.suply_chain_loc_key,
            g.clndr_dt_key, b.fscl_mth_id, a.actvty_typ_key, a.lcl_mrkt_id,
            MIN (NVL (a.uom_id, -100)), MIN (NVL (f.crncy_id, -100)),
            MIN (a.srce_sys_id), SUM (a.unit_qty), MIN (a.std_cost_amt), -100,
            MIN (a.item_typ_id), a.mrkt_fsc_key, suply_mrkt_id,
            MIN (a.prod_typ_grp_id)
       FROM invsl_stage.stg_invntry_projctn PARTITION (ph) a JOIN codi_stage.stage_fscl_mth b
            ON (    TO_CHAR (a.projctn_mth_dt, 'mm') = b.fscl_mth_nr
                AND TO_CHAR (a.projctn_mth_dt, 'yyyy') = b.fscl_yr_nr
               )
            JOIN codi_stage.stage_org_fg d
            ON (a.suply_mrkt_id = d.org_id AND a.item_cd = d.lcl_sc_fsc_cd)
            LEFT OUTER JOIN codi_stage.stage_crncy f
            ON (a.lcl_crncy_cd = f.crncy_cd)
            JOIN codi_stage.stage_clndr_dt g ON (a.snpsht_dt = g.clndr_dt)
      WHERE a.mngng_org_cd = 'PH'
        AND a.lcl_item_typ = 'FG'
        AND a.prod_typ_grp_cd = 'FG PRODUCT'
        AND EXISTS (
               SELECT 1
                 FROM mapsrpt_stage.stg_mrkt_fsc m
                WHERE a.mrkt_fsc_key = m.mrkt_fsc_key AND m.catgry_id != 17
                   OR (    m.catgry_id = 17
                       AND EXISTS (
                              SELECT 1
                                FROM system_code
                               WHERE m.form_id = code
                                 AND code_type =
                                                'SALES_TOOLS_INCLUSION_FILTER'
                                 AND market = 'GLOBAL')
                      ))
   GROUP BY a.mngng_org_id,
            d.org_item_fg_key,
            a.suply_chain_loc_key,
            g.clndr_dt_key,
            b.fscl_mth_id,
            a.actvty_typ_key,
            a.lcl_mrkt_id,
            a.mrkt_fsc_key,
            a.suply_mrkt_id

Tom Kyte

Followup  

August 08, 2008 - 1:20 pm UTC

no, not really. you'd need to trace this, look at what it is doing, and see if there is a more efficient way to do it.

you'd need to understand the indexing scheme
you'd need to understand the QUESTION trying to be answered
you'd need to understand the relationships between the tables

and so on - you have that, I don't.

Update

August 19, 2008 - 12:13 am UTC

Reviewer: krishna

Hi Tom,

Can I update a table based on some criteria in the WHERE clause that involves joins?

Ex: update emp set sal=1000 where emp.emp_no = <table>.<field_name>


pls, can u provide some examples.

Thanks
Krishna
Tom Kyte

Followup  

August 20, 2008 - 10:18 am UTC

"U" is not available (I believe they are dead actually). Can I help instead?

you can update a join

ops$tkyte%ORA10GR2> create table emp as select * from scott.emp;

Table created.

ops$tkyte%ORA10GR2> create table dept as select * from scott.dept;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table emp add constraint emp_pk primary key(empno);

Table altered.

ops$tkyte%ORA10GR2> alter table emp modify deptno not null;

Table altered.

ops$tkyte%ORA10GR2> alter table dept add constraint dept_pk primary key(deptno);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> update ( select sal from emp, dept where emp.deptno = dept.deptno) set sal = 1000;

14 rows updated.


and of course a subquery also works:

ops$tkyte%ORA10GR2> update emp set sal = 1000 where deptno in ( select deptno from dept );

14 rows updated.

error logging clause for CTAS

August 29, 2008 - 10:55 am UTC

Reviewer: Oxnard from Boston Ma

ver 10.2
Anyway to do an error logging clause for CTAS?

We do partition swapping with the CTAS tables

so for example we do

create table my_ctas (
col1 not null, -- note the not null
col2,
col3) tablespace tbl nologging as
select
cast(col1 as varchar2(20)) as col1
,cast(col2 as number) as col2
,cast(col3 as number) as col3
from taba, tabb
where
taba.x = tabb.x

When there are data problems it can be a pain to figure out which rows caused them. If this isn't possible, how do I submit a new feature request?

Thanks






Tom Kyte

Followup  

August 30, 2008 - 9:48 am UTC

since you are doing a cast like that, you know the types you want, you would accomplish your goal in two statements

create table my_table( col1 varchar2(30) not null, .... );
insert /*+ append */ into my_table select ..... log errors ....;


use the direct path insert to populate the table; it'll work in much the same fashion as the create table as select, performance-wise: bypass the buffer cache, write directly to disk, can be non-logged, parallel, etc.
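For reference, the two-statement approach combined with the DML error logging clause (available from 10gR2) might look like the sketch below. Table and column names are taken loosely from Oxnard's example and are illustrative only; note also that some errors (such as unique constraint violations during a direct-path load) still fail the statement rather than being logged:

```sql
-- Target table created up front with the exact datatypes wanted (no CTAS casts needed).
create table my_ctas (
  col1 varchar2(20) not null,
  col2 number,
  col3 number
) tablespace tbl nologging;

-- Create the error log table once (defaults to ERR$_MY_CTAS).
begin
  dbms_errlog.create_error_log( dml_table_name => 'MY_CTAS' );
end;
/

-- Direct-path load; rejected rows land in the error log instead of
-- aborting the whole statement.
insert /*+ append */ into my_ctas (col1, col2, col3)
select cast(taba.col1 as varchar2(20)),
       cast(taba.col2 as number),
       cast(tabb.col3 as number)
  from taba, tabb
 where taba.x = tabb.x
log errors into err$_my_ctas ('nightly load') reject limit unlimited;
```

Afterwards, a query against ERR$_MY_CTAS shows exactly which rows failed and why, which addresses the "pain to figure it out" part of the question.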

not update but just select and insert

September 10, 2008 - 3:13 am UTC

Reviewer: A reader

I have a subset of parent table data in table A (around 4 million records) and a detail table B of around 4.5 billion records. I need to query them and insert the 4 million * X (number of matching records in B) rows into a new table C. There is an index on the matching key on both tables A and B.

The problem is, when I query table B alone without any conditions, just select count(1) from B, it takes hours to return. There are more than 400 hash partitions on table B and the values in the partitions are not uniformly distributed - does that cause the slowdown?

What would be the best approach in this situation? I am thinking of querying tables A and B and putting the result in C using CTAS. Is there a better way, or any tips to make it faster?
Tom Kyte

Followup  

September 11, 2008 - 10:42 am UTC

the values in b have nothing to do with anything. You are counting the number '1' (and one might ask - why the number '1', why not '*', if your intention is to count "how many records do i have", you would use count(*) to show to the world you know what you are doing - by counting '1's, it looks like you are confused about how to count records....)

However, the fact that there are 4.5 billion records in there and let us assume an average row width of 100 bytes - you might be full scanning (you do not say if you have indexed a NOT NULL column on this table) about 1/2 of a terabyte of storage.

And depending on your storage, that just might take a while?

I would question the need for this table C, you give no justification for it, but it doesn't seem very useful

not update but just select and insert

September 11, 2008 - 1:03 pm UTC

Reviewer: P from WA

For count(1), I read somewhere that if you just want to count the records of a table, it is better than using count(*) - please correct me if I am wrong. I don't have the table analyzed or any other way to find how many rows are in the table, so just to give you the correct number I ran select count(1). If count(*) is better, please let me know and also please explain why.

For table C, which I mentioned in the earlier question: it is the result of querying A and B, and there will be more tables used to update additional columns of C. Table C will then be used in the main process, along with other applications' result tables, to apply further business rules for the final output of some reports.

For more information on the indexes and tables: tables A and B are indexed on NOT NULL columns (a primary key on A and a NOT NULL indexed foreign key on B), and I need to query some more tables like B to update columns in C, with the same parent-child relationship as A -> B.

Sorry if I was not clear on table C, but the actual question is what approach I should take. I hope I have explained my question to you.
Tom Kyte

Followup  

September 16, 2008 - 12:38 pm UTC

... For count(1), I read somewhere that if you just want to count records of a
table, it is better than using count(*) , please correct me if I am wrong. ...

consider yourself corrected then.

When you read that information - did they

a) say why?
b) show why?
c) demonstrate some noticeable improvement in response times?

Always look for that - if someone says "X will make your life better", you should ask to see a, b, and c - do the same with all technical advice.


As for why - count(*) is the way, the documented way, the correct way to count rows. Count(*) is in fact "count rows".

Count( <expression> ) says "count non-null expressions", by saying count(1) you are not saying "count rows", you are in fact saying "count how many non-null number 1's are in this table".

We had to go and file a bug way back when to internally rewrite count(1) as count(*) in order to make things faster - this rumor was so rampant.

read this:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:1156159920245

we've been discussing this for years and years and years now.....


other than saying you have a table A and B and lots of rows in B, I don't have much information to go on here. You do not say how large C is, if C is very large - it could be that full scans + hash join = right way.

If C is not, full scan A + nested loops into B might be the right way.

But in either case, this will take a MEASURABLE amount of time, you do not say how long it is currently taking, what your (possibly unreasonable) expectations are, what the current plan is, how big (in gigabytes - row counts are useless - they only describe the smallest the table could be) A and B are, what kinds of resources you have (parallel? lots of disk? tons of memory, etc...)



and I still doubt that C is necessary, but whatever, people keep building tables like this anyway... Unless you are going to full scan C frequently - C is probably not useful.
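If table C does turn out to be necessary, the CTAS pattern from the top of this thread is the natural shape for it. A sketch with invented names, assuming a large C where full scans plus a hash join win (as Tom notes, verify the plan and sizing first):

```sql
alter session enable parallel ddl;

-- Build C in one pass: full scan A, hash join to B, direct-path write,
-- minimal redo. Check the plan with EXPLAIN PLAN before running for real.
create table c
  parallel 8
  nologging
as
select a.match_key, a.parent_attrs, b.detail_attrs   -- illustrative column list
  from a, b
 where b.match_key = a.match_key;
```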

Question on Updating a Join

September 12, 2008 - 6:24 pm UTC

Reviewer: Dan from MI

Tom,

I'm looking to rewrite a pretty commonly used procedure for getting phone numbers - it's currently written "slow by slow" and takes a long time on anything of decent size. I'm running into trouble when attempting to use the UPDATE on a JOIN that you specified earlier in this thread:

UPDATE (SELECT tmp.HOME_PHONE, phn.phn_area_cde_num, phn.PHN_NUM
FROM tmp_test1 tmp,
phone_table phn
WHERE tmp.cust_id = phn.cust_id
AND phn.phn_loc = 'H'
AND phn.phn_use = 'G'
AND phn.del = 'N')
SET HOME_PHONE = phn_area_cde_num || phn_num;

This results in a "ORA-01779: cannot modify a column which maps to a non key-preserved table".

This is in 10g. There is a primary key constraint across the phone_table fields (cust_id, phn_loc, phn_use, del), and I placed a unique constraint on tmp.cust_id after looking through your site a bit for the error. Using the below:

alter table tmp_test1 add constraint tmp_test1_unique
unique(cust_id);

But it's still happening. It seems like that should be sufficient information for Oracle to know there is only one possible result, correct? Or is there another constraint I need?

(Also, I found this interesting - I first ran the sql from another user, one that has permissions on tmp_test1 but not on phone_table, and got an "ORA-01031: insufficient privileges" error. It seems like this should work since I am only attempting to update the table I have access to, but looks like not)
Tom Kyte

Followup  

September 16, 2008 - 5:22 pm UTC

You can use a merge here - but not the join. The optimizer won't recognize the constants there - it would want tmp to have phn_loc, phn_use, del and for tmp to join to phn using 4 columns (we have to join to the primary key of the PHN table in its entirety)

so, just use a merge command.


ops$tkyte%ORA11GR1> create table tmp ( a int primary key, y int );

Table created.

ops$tkyte%ORA11GR1> create table phn ( a int, b int, y int, primary key(a,b) );

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> update ( select tmp.y tmpy, phn.y phny from tmp, phn where phn.a = tmp.a and phn.b = 'x' )
  2  set tmpy = phny;
set tmpy = phny
    *
ERROR at line 2:
ORA-01779: cannot modify a column which maps to a non key-preserved table


ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> merge into tmp
  2  using (select * from phn where b = 'x' ) phn
  3  on (tmp.a = phn.a)
  4  when matched then update set y = phn.y;

0 rows merged.





how to update millions of rows in production

October 15, 2008 - 7:50 am UTC

Reviewer: Francisco Jimenez from Colonia - Germany

Hello Tom,

I need to update one column of one table. This table has 250 million rows.

My problem is that someone may be updating rows in this table at the same time I am updating it. How can I do this update? I would like to use CTAS but our DBA thinks it is not a good idea.

thanks in advance.

Fran.
Tom Kyte

Followup  

October 15, 2008 - 5:31 pm UTC

lock table t in exclusive mode;
update t set x = whatever;
commit;



get the table, update the table, un-get the table by committing.



November 17, 2008 - 1:00 pm UTC

Reviewer: A reader

We have had a system in production for about 6 months. One process inserts records into a table, and it has inserted about 250 million rows so far. We want to start putting a retention period on this table and change the process a little. So we want to do the following in production:

1. Rename the existing table.
2. Create a new table with the same structure and let it run in production.
3. Every day, early in the morning, check the data in the table created above and, if the data is more than 3 days old, archive it to a history table.
4. Delete from the history table any data more than 90 days old.

Now for step 3, it could be about 1 to 2 million rows every day. What will be the best way to move or archive the data to the history table?

Thanks for your help.

Tom Kyte

Followup  

November 18, 2008 - 7:29 pm UTC

one wonders, one really really really wonders....

why....

this was not part of the original design from day one, this is important stuff.

You want to use partitioning - anything else would be a hugely inefficient way to approach it.
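To make "use partitioning" concrete: on 11g, interval partitioning plus the partition-for syntax reduces both the archival and the purge steps to metadata operations. A sketch with invented names (on 10g you would pre-create and name the daily partitions yourself):

```sql
-- One partition per day, created automatically as rows arrive (11g interval partitioning).
create table events (
  event_ts  date not null,
  payload   varchar2(4000)
)
partition by range (event_ts)
interval (numtodsinterval(1,'DAY'))
( partition p0 values less than (date '2009-01-01') );

-- Step 3 (archive 3-day-old data) becomes a partition exchange:
alter table events exchange partition for (date '2008-12-29')
  with table events_stage;

-- Step 4 (purge 90-day-old history) becomes a partition drop:
alter table events_hist drop partition for (date '2008-10-01')
  update global indexes;
```

Either operation touches the dictionary, not a million rows of data, which is why anything else is "hugely inefficient" by comparison.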


Change datatype of column in 400 million row table

March 20, 2009 - 5:19 pm UTC

Reviewer: A reader

Tom,
We have a table with 400 million rows. The table is partitioned. We have to change the datatype of a column from number to varchar2. We are running 10gr2.

For some arcane reason, the end users want the column order to remain unchanged.

My thought would be to:
a) Add a new column of type number
b) Update the new column with old column
c) Set old column to NULL
d) Change the datatype of old column to varchar2
e) Update the old column with new column
f) Drop the new column

Is this the right approach or is there a better way considering the data volume? If this is the right approach, how can I speed it up while minimizing the downtime to users?

Thanks...
Tom Kyte

Followup  

March 24, 2009 - 10:47 am UTC

... the end users want the column order to remain
unchanged.
...

we call that a VIEW.

alter table t add new_col varchar2(25);
update t set new_col = number_col, number_col = null;
alter table t drop column number_col;
alter table t rename to t_table;
create or replace view t
as
select ....... <in any order, with any name you like>
  from t_table;




DO NOT do the 'curly shuffle' like you suggest above, that would just be a waste.

Change datatype of column in 400 million row table

March 24, 2009 - 11:08 am UTC

Reviewer: A reader

Tom,
Thanks for your reply.

Following your example, in a test in a non-production database, updating the column took about 15 minutes for a 1 million row table. For the 400 million row table, extrapolating linearly, it will take about 100 hours. How can I speed it up?

Thanks...
Tom Kyte

Followup  

March 29, 2009 - 10:46 am UTC

parallel update.


or use dbms_redefinition, which can apply functions and conversions as it redefines the table. You'd need two copies, but it can be 100% online - your end users see no interruption in service.


actually, the fastest way to do something is NOT TO DO IT in the first place. I would go back and seriously re-evaluate your "need" to perform this operation at all...
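The dbms_redefinition route can apply the NUMBER-to-VARCHAR2 conversion in its column mapping while the table stays fully available. A minimal sketch - the schema, table and column names are invented, and the interim table must be pre-created with the target datatype:

```sql
-- Interim table: same columns, same order, new datatype for the changed column.
create table t_interim (
  id      number,
  old_col varchar2(25),   -- was NUMBER in T
  other   date
);

declare
  l_errors pls_integer;
begin
  dbms_redefinition.can_redef_table( 'SCOTT', 'T' );

  -- The col_mapping performs the datatype conversion during the copy.
  dbms_redefinition.start_redef_table(
    uname       => 'SCOTT',
    orig_table  => 'T',
    int_table   => 'T_INTERIM',
    col_mapping => 'id id, to_char(old_col) old_col, other other' );

  -- Clone indexes, constraints, triggers and grants onto the interim table.
  dbms_redefinition.copy_table_dependents(
    uname        => 'SCOTT',
    orig_table   => 'T',
    int_table    => 'T_INTERIM',
    copy_indexes => dbms_redefinition.cons_orig_params,
    num_errors   => l_errors );

  dbms_redefinition.finish_redef_table( 'SCOTT', 'T', 'T_INTERIM' );
end;
/
```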

Bulk operation

April 20, 2009 - 10:59 am UTC

Reviewer: Sam from India

Hi Tom,

We have a table with 60 million records in production. We need to do a clean-up activity based on the following logic.

For a particular FORDERUNG_ID, if the column values in the column REST_KOST_UNVERZ, REST_KAP_VERZ differs, we need to keep those rows. If the value in the following row ( order by GUELTIG_VON_DAT ) has the same value in the columns REST_KOST_UNVERZ and REST_KAP_VERZ, we need to delete that row. Here the problem for me is that I don't know how to compare rows of the same table in a particular order. Sample data is provided below

FORDERUNG_ID REST_KOST_UNVERZ REST_KAP_VERZ GUELTIG_VON_DAT     GUELTIG_BIS_DAT
1387221                     10       3769.59 31.10.2008 10:21:35 25.11.2008 13:07:01
1387221                     10       3769.59 25.11.2008 13:07:02 26.11.2008 13:07:01
1387221                      0             0 26.11.2008 13:07:02 27.11.2008 13:07:01
1387221                     10            10 27.11.2008 13:07:02 04.02.2009 23:59:59
1387221                      0             0 05.02.2009 00:00:00 06.02.2009 23:59:59
1387221                      0             0 10.02.2009 00:00:00 31.12.9999 00:00:00
1387224                      0        340.16 31.10.2008 10:21:35 25.11.2008 13:07:01
1387224                  116.4        340.16 25.11.2008 13:07:02 04.02.2009 23:59:59
1387224                  116.4        340.16 05.02.2009 00:00:00 06.02.2009 23:59:59
1387224                  116.4        340.16 10.02.2009 00:00:00 31.12.9999 00:00:00


The output data should be

    
FORDERUNG_ID REST_KOST_UNVERZ REST_KAP_VERZ GUELTIG_VON_DAT     GUELTIG_BIS_DAT
1387221                     10       3769.59 31.10.2008 10:21:35 25.11.2008 13:07:01
1387221                      0             0 26.11.2008 13:07:02 27.11.2008 13:07:01
1387221                     10            10 27.11.2008 13:07:02 04.02.2009 23:59:59
1387221                      0             0 05.02.2009 00:00:00 06.02.2009 23:59:59
1387224                      0        340.16 31.10.2008 10:21:35 25.11.2008 13:07:01
1387224                  116.4        340.16 25.11.2008 13:07:02 04.02.2009 23:59:59


If you see that I need to compare current rows and previous row to see if there is any difference after ordering by GUELTIG_VON_DAT (valid from date). For your information, GUELTIG_BIS_DAT is valid to date.

Difference between GUELTIG_VON_DAT and GUELTIG_BIS_DAT of next row is GUELTIG_VON_DAT +1 second.

I need a query that returns the result so that I can insert into second table with append mode, drop the original table and rename second table to original table. Can you please help me out?
Tom Kyte

Followup  

April 21, 2009 - 2:27 pm UTC

... For a particular FORDERUNG_ID, if the column values in the column REST_KOST_UNVERZ, REST_KAP_VERZ differs, we need to keep those rows. If the value in the following row ( order by GUELTIG_VON_DAT ) has the same value in the columns REST_KOST_UNVERZ and REST_KAP_VERZ, we need to delete that row. ....

did not make sense, make that more clear and of course provide create tables and inserts - else I cannot really look at it.


It sounds to me like if you have a row whereby REST_KOST_UNVERZ and REST_KAP_VERZ differ - keep that row (so, you only look within a row; therefore rows where decode( REST_KOST_UNVERZ, REST_KAP_VERZ, 0, 1 ) = 1 - the values differ - are always rows to KEEP).

Then it gets difficult to understand. For rows where they do not differ (so we are only interested in them I think??) we need to delete that row.


So basically, generate a set of rows whereby REST_KOST_UNVERZ, REST_KAP_VERZ are the SAME and de-duplicate using a key of FORDERUNG_ID, REST_KOST_UNVERZ, REST_KAP_VERZ - keeping only the oldest record? right?



Re: Bulk operation

April 22, 2009 - 11:12 am UTC

Reviewer: Jakub Illner from Prague

Sam, I suppose you should use the LAG/LEAD analytic functions, which give you the values of the previous or next row. You can then compare the values in the current row with those of its neighbour and decide what to do with the record - delete or keep it. The select (not tested) could look like this:

select FORDERUNG_ID
     , REST_KOST_UNVERZ
     , REST_KAP_VERZ
     , GUELTIG_VON_DAT
     , GUELTIG_BIS_DAT
     , case
         when REST_KOST_UNVERZ = next_REST_KOST_UNVERZ
          and REST_KAP_VERZ    = next_REST_KAP_VERZ
         then 'DELETE'
         else 'KEEP'
       end as action
  from (
        select FORDERUNG_ID
             , REST_KOST_UNVERZ
             , REST_KAP_VERZ
             , GUELTIG_VON_DAT
             , GUELTIG_BIS_DAT
             , lead(REST_KOST_UNVERZ,1)
                 over (partition by FORDERUNG_ID
                       order by GUELTIG_VON_DAT) as next_REST_KOST_UNVERZ
             , lead(REST_KAP_VERZ,1)
                 over (partition by FORDERUNG_ID
                       order by GUELTIG_VON_DAT) as next_REST_KAP_VERZ
          from <table>
       )


Regards,
Jakub
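One note on direction: Sam's expected output keeps the first row of each run of identical values, which suggests comparing against the previous row (LAG) rather than the next (LEAD). Feeding that into the CTAS pattern from the top of the thread might look like this sketch - the source table name is invented and this has not been run against Sam's data:

```sql
create table forderung_keep
  nologging
as
select FORDERUNG_ID, REST_KOST_UNVERZ, REST_KAP_VERZ,
       GUELTIG_VON_DAT, GUELTIG_BIS_DAT
  from (
        select t.*,
               lag(REST_KOST_UNVERZ) over
                 (partition by FORDERUNG_ID order by GUELTIG_VON_DAT) prev_kost,
               lag(REST_KAP_VERZ) over
                 (partition by FORDERUNG_ID order by GUELTIG_VON_DAT) prev_kap
          from forderung_tbl t
       )
 where prev_kost is null                       -- first row of each FORDERUNG_ID
    or REST_KOST_UNVERZ <> prev_kost
    or REST_KAP_VERZ    <> prev_kap;
```

Walking Sam's sample data through this predicate keeps exactly the rows listed in his expected output; after the CTAS, drop the original table and rename, as described at the top of the thread.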

Rewrite the update

April 24, 2009 - 8:00 am UTC

Reviewer: sriram from India

Tom,

I am trying to implement the technique (create new_table as select <update statement> from old_table) that you mentioned, on the SQL below, but I have no clue how to proceed.

Please can you suggest a good way of doing this?

UPDATE PS_PI_TAX_DETAIL TAX SET PI_STATE = (NVL((SELECT J_STATE_TAX FROM PS_J_PAYCHECK PAY WHERE PAY.EMPLID = TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2),:1)) ,PI_LOCALITY = (NVL((SELECT J_LOCAL_TAX FROM PS_J_PAYCHECK PAY WHERE PAY.EMPLID = TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2),:1)) WHERE PI_RUN_NUM = 0 AND PI_SYSTEM_ID = :2 AND PI_CONFIG_ID = :2
Tom Kyte

Followup  

April 27, 2009 - 12:18 pm UTC

wouldn't you guys hate it if I wrote all of my sql like that? Isn't that just about the most impossible thing to read? All uppercase, no sensible line breaks, totally impossible to read.


you'd have to select out every row of course (your update has a where clause... you are not updating every row)

I think you want to use merge, merge into tax using pay on (join condition) with a WHEN MATCHED clause to update.

merge into ps_pi_tax_detail tax using PS_J_PAYCHECK pay on ( PAY.EMPLID= TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2 and tax.pi_run_num = 0 and tax.pi_system_id = :b2 and tax.pi_config_id = :b2 ) when matched then update set pi_state = nvl(pay.j_state_tax,:b1), pi_locality = nvl(pay.j_local_tax,:b1)


like that. Only kidding, more like this:

ops$tkyte%ORA10GR2> merge into ps_pi_tax_detail tax
  2  using PS_J_PAYCHECK pay
  3  on ( PAY.EMPLID= TAX.PI_EMPLID
  4       AND PAY.WEEK_END_DT = TAX.PAY_END_DT
  5       AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR
  6       AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2
  7       and tax.pi_run_num = 0
  8       and tax.pi_system_id = :b2
  9       and tax.pi_config_id = :b2 )
 10  when matched then update set pi_state = nvl(pay.j_state_tax,:b1), 
                                  pi_locality =  nvl(pay.j_local_tax,:b1)
 11  /



the nvl probably isn't needed anymore since we are not using scalar subqueries.

Deleting records in table

May 27, 2009 - 7:26 am UTC

Reviewer: Siddu from UK

Hi Tom,

We are trying to delete 5 lakh records weekly. These 5 lakh records are selected based on criteria which change dynamically. We are using Perl at the front end. Deleting the records takes 20 minutes. We have been looking at various options to improve performance. One approach I thought of was to write a stored procedure to delete the records and execute it server-side instead of client-side (here, in Perl). Could you please suggest any other ways to tune the performance?

Cheers,
Siddu
Tom Kyte

Followup  

May 27, 2009 - 11:36 am UTC

so, I suppose 5 lakh is 500,000 (best to be unambiguous - acronyms, colloquialisms - should be avoided...)


No one could say if your idea of moving code from perl to plsql will have any impact on performance. Anyone that says they do is lying.

There is no information here to help you.

"These 5 lakh records are selected based on criteria which happens dynamically" is not meaningful, I have no idea what that means, what that entails.

You do not say if you issue 500,000 individual delete statements (bad).

You do not say if you run a query to find 500,000 records and then issue a delete for each one (worse)

You do not say if you run a single delete (good), but you are deleting 500,000 records out of 5,000,000,000 and using a full scan (probably bad)

In short, you don't tell us what you are doing - what you have observed.


20 minutes - that might be really good (if the table has 20 indexes on it - that might be as good as you get).


Why don't you (in PAINSTAKING DETAIL) tell us what you need to do and then (and only then - do not skip that part, the telling us what you are trying to do in specification format) tell us your current algorithm.

Then we can suggest methods for rewriting your algorithm more efficiently.
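For what it's worth, the rewrite Tom usually suggests in these cases is collapsing "query, then delete each row" into a single set-based statement. A sketch with invented names:

```sql
-- Instead of fetching candidate ids in Perl and issuing 500,000 single-row
-- deletes, push the dynamic criteria into one DELETE and let the database
-- do the work in a single pass:
delete from t
 where exists ( select null
                  from purge_criteria p
                 where p.id = t.id );
```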

May 28, 2009 - 10:01 am UTC

Reviewer: abdou from asia

hi tom
please, I want a query that shows the 3 employees with the biggest SAL from the table emp

thanks
Tom Kyte

Followup  

May 28, 2009 - 1:56 pm UTC

what if 4,000 people all make the same exact really big salaries.

be specific do you want:

a) a random set of 3 or less people that make a salary that is in the first three rows after sorting the entire set by SAL DESC

select * from (select * from emp order by sal desc) where rownum <= 3;

b) the set of employees (maybe 0 to many, many rows) that make the top 3 salaries - accounting for 'ties'. That is, if 50 people make $100, 50 people make $75, 50 people make $50 and 50 people make $25 - this answer would return only the 50 people that make $100 (since they all tied for first place, there is no 2nd, 3rd, 4th, ..., 50th place person - the first 50 all tied for that honor)

select * from (select emp.*, rank() over (order by sal desc) r from emp) where r <= 3;

c) the set of employees that make a salary that is in the top three salaries. In the above example that would return 150 people - the first three groups

select * from (select emp.*, dense_rank() over (order by sal desc) r from emp) where r <= 3;


updating some columns & 500,000 records in a table

June 29, 2009 - 9:23 am UTC

Reviewer: Prem from India

hi Tom,

I have a requirement where I need to update a table of 1.5 million records with 300 columns, based on various conditions (as many as 500 different sets of conditions, arrived at after executing predefined rules). The number of rows affected could be 0.5 million, and of the 300 columns, 20 will be updated in those 0.5 million records.

So I need to fire one update query per condition - that makes 500 update statements - and this takes more than 10 hours.

Could you please let me know the best possible way to handle this situation?

thanks,
prem

3 million rows to update with a subselect

July 17, 2009 - 12:57 pm UTC

Reviewer: Rebecca

Hi Tom,
In a data expansion program I took a set of 22 rows and burst them into 100k+ sets, resulting in
around 3 million rows.

Each set has a common key (FAMILY_ID) and each row a unique key (MEMBER_ID).

The data in the table is structured parent/child-- with the parent's MEMBER_ID being stored in
PARENT_MEMBER_ID.

There are multiple parents in each family.

When I created the 3 million rows, every row got a new unique MEMBER_ID.
So the new keys of the parent rows need to be remapped to the children they are parents of.
There are about 1 million parents and 2 million children. It's only parent/child - no grandchildren.


Here's where I started:
UPDATE FAMILIES_TBL A SET PARENT_MEMBER_ID = (SELECT X.MEMBER_ID FROM FAMILIES_TBL X
WHERE X.OLD_MEMBER_ID = A.PARENT_MEMBER_ID AND A.BRANCH = X.BRANCH AND A.FAMILY_ID =
X.FAMILY_ID) WHERE PARENT_MEMBER_ID IS NOT NULL

FAMILIES_TBL is a temp table without an index, which was great for the inserts. But we'd both
grow very old before the update completes.


After reading this excellent thread, I changed it to:
CREATE TABLE FAMILIES_TBL_HLD AS SELECT * FROM FAMILIES_TBL WHERE 1=2;
ALTER table FAMILIES_TBL nologging;
ALTER table FAMILIES_TBL_HLD nologging;

Created an index on FAMILIES_TBL("PARENT_MEMBER_ID", "OLD_MEMBER_ID", "BRANCH",
"FAMILY_ID")

insert into FAMILIES_TBL_HLD (PARENT_MEMBER_ID,<rest of fields>)
SELECT (SELECT X.MEMBER_ID FROM FAMILIES_TBL X WHERE A.PARENT_MEMBER_ID = X.OLD_MEMBER_ID AND
A.BRANCH = X.BRANCH AND A.FAMILY_ID = X.FAMILY_ID),<rest of fields> FROM FAMILIES_TBL A WHERE PARENT_MEMBER_ID IS NOT NULL;

Still pretty bad. Any suggestions? It certainly feels like a classic pattern, but I'm not 'getting it'.

<table and field names changed to protect the innocent>

Thank you in advance for your suggestions.
Rebecca

Re: 3 million rows to update with a subselect

July 20, 2009 - 12:47 pm UTC

Reviewer: Carsten from Germany

Hi Rebecca,

if we are talking about a one time issue: create an index to support the join operation.

otherwise:
I guess the fastest way to solve your problem is to not do it at all.
Basically you seem to have an ID that changes (OLD_MEMBER_ID -> MEMBER_ID) and you then update a corresponding FK field.

If your MEMBER_ID changes, you shouldn't use it as the PK; instead, generate a surrogate key and use that as the reference in FKs.
Generate a view to display PARENT_MEMBER_ID in the old fashion if you need to - it would also benefit greatly from the above index. That turns a repetitive task into one you solve once.
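Combining Carsten's index suggestion with the CTAS pattern from the top of this thread, Rebecca's scalar subquery can become an outer join, letting one pass (typically a hash join) remap every parent key. A sketch - the column list is abbreviated and untested:

```sql
create table families_tbl_new
  nologging
as
select x.member_id  as parent_member_id,   -- remapped key (stays null for parents)
       a.member_id,
       a.family_id,
       a.branch
       -- ... rest of A's columns ...
  from families_tbl a
  left outer join families_tbl x
    on  x.old_member_id = a.parent_member_id
    and x.branch        = a.branch
    and x.family_id     = a.family_id;
```

Because the join is outer, parent rows (whose PARENT_MEMBER_ID is null) survive unchanged, so the new table can simply replace the old one with a drop and rename.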

Mazhar

August 17, 2009 - 3:03 am UTC

Reviewer: Mazhar from Pakistan

Great Stuff

Update 40 million rows

August 24, 2009 - 6:52 am UTC

Reviewer: A reader from INDIA

Hi Tom,

We are working on a warehouse and want to apply the Type 1 and Type 2 changes from a given day's delta (about 40 million rows) to a table which has 3.5 billion rows.

Some of the approaches we could think of are:
1) Create a copy of the 3.5 billion row table, run an update statement against this copy, build indexes and then do an exchange. This takes around 14 hours in total to copy the 3.5 billion rows and build the 13 indexes.
2) We are thinking of another approach: directly update the dimension online. Will that approach work?
3) The delta is sitting in 64 hash subpartitions and the dimension is also hash partitioned 64 ways. Can we take advantage of a partition-wise join?
4) Should we try DIY parallelism, as suggested in your book?
5) We are thinking of using an updateable join view.

We could use some guidance on which way to try this.

Thanks

Tom Kyte

Followup  

August 25, 2009 - 9:41 am UTC

since you are only touching 40mill out of 3.5billion - why not just update the data?

Update 40 million

August 25, 2009 - 11:28 am UTC

Reviewer: A reader from india

Hi Tom,

Thanks for your reply. So, as I understand it, we should update the data in place, online.

The reason I was confused is that we have 12 indexes on this dimension, and if we update the rows we will be doing index maintenance - and most of the indexes are BITMAP.

Also, can you please suggest whether we should use:

1) Nested loops as the join method (so there would be 40 million index lookups before we touch the data blocks), or a hash join to put the 40 million rows in a hash and scan the bigger dimension (3.5 billion rows) via rowid range scans.

2) I feel a partition-wise join will scale better, since we have 64 hash partitions and each partition of the delta table will join only with the corresponding hash partition of the dimension - so the join happens between 3.5 billion/64 rows and 40 million/64 rows, as 64 independent joins.

3) Will a nested-loops partition-wise join work? I was reading the docs but could not find a definitive answer.

Please help and guide us.

Tom Kyte

Followup  

August 25, 2009 - 8:30 pm UTC

... The reason I was confused was we have 12 indexes on this dimension and if we do
update the data blocks we will be doing index maintenance and most of the
indexes are BITMAP.
...

do it as a single statement - in bulk - one sweep.

do NOT do it 40,000,000 times, a slow by slow approach
do it as a single sql


1) write the sql, let the optimizer decide.

2) see #1

3) see #2
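That single SQL could take the shape of a MERGE with parallel DML enabled - with both sides hash partitioned on the join key, the optimizer is free to choose a partition-wise join on its own. A sketch with invented table and column names:

```sql
alter session enable parallel dml;

-- One set-based sweep applying the 40M-row delta to the 3.5B-row dimension;
-- index (including bitmap) maintenance is deferred and done in bulk.
merge /*+ parallel(dim, 16) */ into big_dimension dim
using delta_stage d
   on (dim.nk = d.nk)   -- both sides hash partitioned on NK
when matched then update
  set attr1 = d.attr1,
      attr2 = d.attr2;

commit;
```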

Slow Bulk Collect

August 27, 2009 - 3:12 pm UTC

Reviewer: A reader

Tom:

1. I have a two-partition table with row movement enabled.
Partition 2 has unprocessed records, partition 1 processed records.
2. I have to update around 2 million rows, which will make them move from partition 2 to partition 1.
3. Updating them with a normal update takes around 40 minutes.
4. I rewrote the procedure to do a BULK update rather than a simple update.
5. The BULK update takes around 60 minutes.

The SQL trace shows the bulk update happening:

UPDATE TABLE A
SET COL1 = :B4 ,
COL2 = :B3 ,
COL3 = :B2
WHERE
COL4 = :B1


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.00       0.00          0          0          0          0
Execute   1708   1203.22    2235.05     203830    7981227   99254416    1707253
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total     1709   1203.22    2235.05     203830    7981227   99254416    1707253


Is my BULK update - which should in turn do a bulk insert and bulk delete for the partition row movement - actually happening? How do I troubleshoot this?

please provide your valuable input to me to troubleshoot this.

Thanks in Advance.
Tom Kyte

Followup  

August 28, 2009 - 5:02 pm UTC

why would you introduce slow procedural code when you have a fast single sql statement?


there is nothing to trouble shoot, I've written a billion times by now:

<quote from every book I've ever written and many postings/articles>

I have a pretty simple philosophy when it comes to developing database software, and it is one that has not changed over many years:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL statement, then do it in PL/SQL (but as little PL/SQL as possible!).
* If you cannot do it in PL/SQL (due to some missing feature like listing the files in a directory), try a Java stored procedure. This is an extremely rare need today with Oracle9i and above.
* If you cannot do it in Java, do it in a C external procedure. This is most frequently the approach when raw speed or the use of a third-party API written in C is needed.
* If you cannot do it in a C external routine, you might want to think seriously about why exactly you need to do it.

</quote>


procedural code is going to be slower than a set based operation to do the same thing in almost every case (exceptions? sure, there are strange edge cases). In this case - a simple update is the only thing to consider.
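
For the partitioned-table case above, the whole procedural bulk loop collapses to one statement. A minimal sketch with hypothetical names (the question says row movement is already enabled, so updating the partitioning column simply migrates the rows from partition 2 to partition 1):

```sql
-- one set-based sweep; row movement carries each updated row
-- from the "unprocessed" partition to the "processed" one
update part_tab
   set status         = 'PROCESSED',   -- hypothetical partition key column
       processed_date = sysdate
 where status = 'UNPROCESSED';

commit;
```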


additional information to prev review

August 27, 2009 - 4:56 pm UTC

Reviewer: A reader

Here are the wait events associated:

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 28 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 203828 0.54 1017.05
latch: cache buffers chains 95 0.00 0.02
read by other session 27 0.04 0.26
buffer deadlock 1 0.00 0.00
latch free 20 0.00 0.00
buffer busy waits 26 0.10 0.21
log buffer space 283 0.42 15.33
log file switch completion 27 0.08 0.46
latch: object queue header operation 5 0.00 0.00


So basically a single straight update is taking the same amount of time as the BULK update.

Thanks in advance...

slow bulk update

August 27, 2009 - 5:58 pm UTC

Reviewer: A reader

I have gone through another series of questions which you answered. I think a simple straight update SQL performs best.

Thanks,

PL/SQL vs SQL

August 29, 2009 - 3:36 pm UTC

Reviewer: A reader from London, UK

Hi Tom,

I have a question for your regarding SQL and Performance.
...I have a summary table that maintains counts based on a File ID and Region that's used for reporting. The summary table is a GTT (Global Temporary Table).

In this table I have about 250 fields that represent when a field has changed - whether its data is number, char, date and so on - and if it's changed I update this table.

e.g update tmp_counts
set field1 = field1 + decode(nvl(x(1).field1, 9999),
nvl(x(1).field1, 9999),0,
9999,1,1),
...etc

example above is to monitor if a numeric field has changed. If so increment the counter for that field.

Now...this update to the table is called in a proc with array parameters (the result from a bulk collect using NOCOPY). The procedure is called millions of times to monitor changes.

Now, if I write the above code updating an array instead of a GTT procedurally using dbms_sql.execute - i.e. checking each individual field using IF..END IF statements - it takes much, much, much longer. I would have thought updating a small summary table millions of times would have been slower than updating an array in memory? Can you explain this behaviour? Is it quicker doing things in "a set" using an update rather than doing stuff procedurally in PL/SQL? Thanks!


Tom Kyte

Followup  

August 29, 2009 - 7:29 pm UTC

... procedurally
using dbms_sql.execute - i.e. checking each individual field using IF..END IF
statements. It takes much, much, much longer. ...

you thought that compiling code and dynamically executing it over and over would be faster? why?


... Is it quicker doing things in
"a set" using an update rather than doing stuff procedurally in PL/SQL? Thanks!
...

almost always - search this site for the word


mantra


and read about my mantra

PL/SQL vs SQL

August 29, 2009 - 9:55 pm UTC

Reviewer: A reader from London, UK

Hi Tom,

I understand what you mean by "mantra" and I have all your books - it's just that "intuitively" you'd think doing something in memory would be quicker than doing I/O (from datafiles to buffer cache)? ... I process millions of records (bulk collect) from a transaction table - are there possibilities to do the processing in parallel? I think if you want to use the parallel hint/degree you need to put the table in "parallel mode" ...
Tom Kyte

Followup  

August 31, 2009 - 7:13 am UTC

... it's just
that "intuitively" you'd think doing something in memory would be quicker than
doing I/O (from datafiles to buffer cache)?...

Not if the procedural code you write to modify memory is a billion times less efficient than the code we wrote to modify memory.



As for your last bit - you have a serial bit of code that updates rows slow by slow (sequentially). How would parallel SQL help you - you have a serial process.


If you want something to run orders of magnitude faster - the typical approach is

a) forget the existing code, put it out of your head for a minute
b) write down in GORY detail what it is you have to do
c) design a set based algorithm to achieve that goal (from scratch)


don't buy into 'sunk cost' (the code you have isn't a sunk cost, it is a failed first implementation). Now that you know what you need to do - do it over with SQL set based operations at the core of your thought process.

Insert 50 million rows to cluster table from another table

September 16, 2009 - 4:46 am UTC

Reviewer: Hopkins from USA

Hi Tom

I need to add more than 50 million rows from one table to another. Both are clustered table

I use the following command

insert /*+ append parallel 16 */ into second_tab
select * from first_tab;

But it takes more than 10 hours and hangs

Could you please show me the best way how to achieve it and some information about parallel hint. SHould I give more than 16 or less?

Thanks
Tom Kyte

Followup  

September 16, 2009 - 4:05 pm UTC

... But it takes more than 10 hours and hangs ...

what does it "hang" on - I hate that word, very rarely is it ever used correctly. Define what you mean by hang.


you cannot direct path load a clustered table, the append hint, not useful. In a cluster - data has a LOCATION, a PLACE where it needs to be. For every row going into the cluster, before we put it away, we have to find out where it needs to go.

A direct path load presumes "data is streaming in, we'll format a block, fill a block, write a block". If the data streaming in has to go onto a SPECIFIC block, we cannot do that.

So, the append hint - not meaningful.

The parallel hint - not specified in any documented format (i.e. not correct). http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements006.htm#SQLRF50801 Also, you do not say whether you even tried to enable parallel dml (a pre-requisite for parallel dml is enabling it).

Also, there is the problem that clustered tables cannot be created or loaded in parallel using parallel dml - for the same reason direct path doesn't work. The data has a location - multiple loaders would all need access to the same exact locations at the same time.
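
For contrast, if the target were a regular heap table (not a cluster), the documented way to attempt a parallel direct-path load would look like the sketch below - degree 16 is just the value from the question, and second_tab/first_tab are the questioner's table names:

```sql
-- parallel DML must be enabled explicitly, per session
alter session enable parallel dml;

-- note the documented hint syntax: parallel(alias, degree), not "parallel 16"
insert /*+ append parallel(t, 16) */ into second_tab t
select /*+ parallel(s, 16) */ *
  from first_tab s;

commit;   -- required before the session can query the direct-path-loaded table
```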

you do not give us anything to go on here.

a) is the source table in a cluster with other tables, if so, then you realize a full scan will "not be fast", as it has to full scan not only the source table but EVERY OTHER table in there.

b) is the targeted table a btree cluster? hash cluster?

c) are the size and keys settings set correctly?

d) is the target table indexed?



what you should do

a) enable tracing
b) insert some of the data
c) review the work performed, see what the waits were, see if any can be 'removed' (any of the waits)

Here is me doing 50,000,000 rows on a rather old piece of hardware. 10hours would be outrageous.

it also demonstrates that append doesn't work (by showing what append would do if it did - cause a subsequent query to fail).

so, no append
no parallel load

I presume you have lots of indexes - that'll contribute to long load times.

You do have to realize that every row that goes into the cluster will either scan a btree cluster index OR run a hash function (you didn't give us cluster details). That entails quite a bit of work. Clusters take longer to load into (by definition) - their goal is to make data retrieval faster, but loads will necessarily take longer.

You might be able to do a bit of parallel query processing if you do "DIY parallel processing"

http://asktom.oracle.com/pls/ask/search?p_string=diy+parallelism



ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table stage;
ops$tkyte%ORA10GR2> create table stage
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select level id, rpad('*',120,'*') data from dual connect by level <= 50000000;
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create cluster btree_cluster ( id number ) size 125
  2  /

Cluster created.

ops$tkyte%ORA10GR2> create index btree_cluster_idx on cluster btree_cluster
  2  /

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
  2  as
  3  select level id, rpad('*',120,'*') data from dual connect by level <= 0;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ append */ into t select 1, 'x' from dual;

1 row created.

ops$tkyte%ORA10GR2> select * from t;
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t;

Table dropped.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
  2  cluster btree_cluster( id )
  3  as
  4  select level id, rpad('*',120,'*') data from dual connect by level <= 0;

Table created.

ops$tkyte%ORA10GR2> select * from t;

        ID
----------
DATA
-------------------------------------------------------------------------------
         1
*******************************************************************************
*****************************************


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> rollback;

Rollback complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set timing on
ops$tkyte%ORA10GR2> insert into t select * from stage;

50000000 rows created.

Elapsed: 02:14:43.63
ops$tkyte%ORA10GR2> set timing off

Updating existing rows with sequence

February 11, 2010 - 9:42 am UTC

Reviewer: SAI from INDIA

Hi Tom,

I have two tables, 'CLIENT_DETAIL' and 'CLIENT_STATUS_HIST', with the same structure:
ID, TimeModified, XXX, Version
Version is a newly added column in both tables.
Here ID is unique in CLIENT_DETAIL, but not in CLIENT_STATUS_HIST.
Whenever CLIENT_DETAIL is updated, the details are stored in CLIENT_STATUS_HIST.
I have millions of records in CLIENT_STATUS_HIST. Now I want to update the version column sequentially.

CREATE OR REPLACE PROCEDURE client_status_version_update
AS
    counter      NUMBER;
    v_client_id  client_status_hist.client_detail_id%TYPE;

    CURSOR client_hist IS
        SELECT client_detail_id
          FROM client_status_hist
         GROUP BY client_detail_id;

    CURSOR client_hist_for_update (client_id IN NUMBER) IS
        SELECT *
          FROM client_status_hist
         WHERE client_detail_id = client_id
         ORDER BY last_modified_tmstmp ASC
           FOR UPDATE;
BEGIN
    OPEN client_hist;
    LOOP
        FETCH client_hist INTO v_client_id;
        EXIT WHEN client_hist%NOTFOUND;

        counter := 0;
        FOR v_row IN client_hist_for_update (v_client_id)
        LOOP
            counter := counter + 1;
            UPDATE client_status_hist
               SET version = counter
             WHERE CURRENT OF client_hist_for_update;
        END LOOP;

        UPDATE client_detail
           SET version = counter
         WHERE client_detail_id = v_client_id;
    END LOOP;
    CLOSE client_hist;
END client_status_version_update;

This procedure takes nearly 3 hours to complete. Is there any other way to do it in less time?
Tom Kyte

Followup  

February 16, 2010 - 9:51 am UTC

ops$tkyte%ORA10GR2> create table t as select * from all_users;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t add version number;

Table altered.
<b>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> merge into t
  2  using (select rowid rid, row_number() over (order by created asc) rn from t) x
  3  on (t.rowid =  x.rid)
  4  when matched then update set version = x.rn;

46 rows merged.
</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from (
  2  select * from t order by version ) where version <= 5;

USERNAME                          USER_ID CREATED                 VERSION
------------------------------ ---------- -------------------- ----------
SYS                                     0 30-jun-2005 19:10:18          1
SYSTEM                                  5 30-jun-2005 19:10:18          2
OUTLN                                  11 30-jun-2005 19:10:24          3
DIP                                    19 30-jun-2005 19:14:45          4
TSMSYS                                 21 30-jun-2005 19:20:34          5

Updating existing rows with sequence

February 17, 2010 - 3:46 am UTC

Reviewer: SAI from INDIA

Tom,

Thank you for the follow up.

my table t looks some thing like this.

USERNAME                          USER_ID CREATED                 VERSION
------------------------------ ---------- -------------------- ----------
SYS                                     0 30-jun-2005 19:10:18          1
SYS                                     0 30-jun-2005 19:11:18          2
SYSTEM                                  5 30-jun-2005 19:10:18          1
OUTLN                                  11 30-jun-2005 19:10:24          1
OUTLN                                  11 30-jun-2005 19:11:24          2
OUTLN                                  11 30-jun-2005 19:13:24          3
DIP                                    19 30-jun-2005 19:14:45          1
TSMSYS                                 21 30-jun-2005 19:20:34          1




Tom Kyte

Followup  

February 17, 2010 - 9:15 am UTC

just use partition by if you need to break up and reset by username.

ops$tkyte%ORA11GR2> create table t as select * from all_users where username like 'C%' or username like 'S%' or username like 'M%';

Table created.

ops$tkyte%ORA11GR2> insert into t select * from all_users where username like 'C%' or username like 'S%';

9 rows created.

ops$tkyte%ORA11GR2> insert into t select * from all_users where username like 'S%';

8 rows created.

ops$tkyte%ORA11GR2> alter table t add version number;

Table altered.

ops$tkyte%ORA11GR2> merge into t
  2  using (select rowid rid, <b>row_number() over (PARTITION BY USERNAME order by created asc) rn</b> from t) x
  3  on (t.rowid =  x.rid)
  4  when matched then update set version = x.rn;

29 rows merged.

ops$tkyte%ORA11GR2> select * from t order by username, version;

USERNAME                          USER_ID CREATED                VERSION
------------------------------ ---------- ------------------- ----------
CTXSYS                                 43 2009-08-13 23:09:45          1
CTXSYS                                 43 2009-08-13 23:09:45          2
MDDATA                                 65 2009-08-13 23:19:11          1
MDSYS                                  57 2009-08-13 23:12:05          1
MGMT_VIEW                              74 2009-08-13 23:24:58          1
SCOTT                                  84 2009-08-13 23:35:44          1
SCOTT                                  84 2009-08-13 23:35:44          2
SCOTT                                  84 2009-08-13 23:35:44          3
SH                                     88 2009-09-14 15:09:16          1
SH                                     88 2009-09-14 15:09:16          2
SH                                     88 2009-09-14 15:09:16          3
SI_INFORMTN_SCHEMA                     56 2009-08-13 23:12:05          1
SI_INFORMTN_SCHEMA                     56 2009-08-13 23:12:05          2
SI_INFORMTN_SCHEMA                     56 2009-08-13 23:12:05          3
SPATIAL_CSW_ADMIN_USR                  70 2009-08-13 23:22:20          1
SPATIAL_CSW_ADMIN_USR                  70 2009-08-13 23:22:20          2
SPATIAL_CSW_ADMIN_USR                  70 2009-08-13 23:22:20          3
SPATIAL_WFS_ADMIN_USR                  67 2009-08-13 23:22:15          1
SPATIAL_WFS_ADMIN_USR                  67 2009-08-13 23:22:15          2
SPATIAL_WFS_ADMIN_USR                  67 2009-08-13 23:22:15          3
SYS                                     0 2009-08-13 23:00:59          1
SYS                                     0 2009-08-13 23:00:59          2
SYS                                     0 2009-08-13 23:00:59          3
SYSMAN                                 72 2009-08-13 23:22:32          1
SYSMAN                                 72 2009-08-13 23:22:32          2
SYSMAN                                 72 2009-08-13 23:22:32          3
SYSTEM                                  5 2009-08-13 23:00:59          1
SYSTEM                                  5 2009-08-13 23:00:59          2
SYSTEM                                  5 2009-08-13 23:00:59          3

29 rows selected.


Updating existing rows with sequence

February 18, 2010 - 3:51 am UTC

Reviewer: SAI from INDIA

Thank you tom. Thank you very much.

Update huge table

March 03, 2010 - 3:45 pm UTC

Reviewer: Andrey from Russia, St.Petersburg

Hello!
I've been using the following method for the last 3 years and it works well
(but I'm not working with clustered tables...):
1) Open cursor for the query
SELECT O.DATA_OBJECT_ID
,E.RELATIVE_FNO AS FILE_ID
,E.BLOCK_ID
,E.BLOCKS
FROM ALL_OBJECTS O
,DBA_EXTENTS E
WHERE O.OWNER = :PR_SCHEMA_NAME
AND O.OBJECT_NAME = :PR_TABLE_NAME
AND O.DATA_OBJECT_ID IS NOT NULL
AND E.OWNER = O.OWNER
AND E.SEGMENT_NAME = O.OBJECT_NAME
AND NVL(E.PARTITION_NAME, E.SEGMENT_NAME) = NVL(O.SUBOBJECT_NAME, O.OBJECT_NAME)
ORDER BY E.FILE_ID
,E.BLOCK_ID
2) Split the segment by this cursor into many contiguous regions
of blocks (I am using regions from 8,000 to 12,000 blocks) and construct the start and finish rowids of the regions with dbms_rowid
3) Save the region parameters to a queue
4) Start scheduler processes which read regions from the queue and update data only between the start_rowid and end_rowid of the region, committing each region's update.

P.S.> I've updated 525,000,000 records on the production DB in 2:40:15. 24 CPUs, 16 job processes, 24GB memory + high-performance storage.

Tom Kyte

Followup  

March 04, 2010 - 8:27 am UTC

we call that "do it yourself parallelism", written about many times.

http://asktom.oracle.com/pls/asktom/f?p=100:11:4101082379852146::::P11_QUESTION_ID:10498431232211


but you have to remember that it completely and totally changes the transaction semantics - you have to understand the ramifications of doing it that way.

11g Release 2 actually formalizes this approach using the new dbms_parallel_execute package.
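
The basic shape of that package, on 11g Release 2 or later, is sketched below; the task name, chunk size, degree, and the hypothetical table T and its column are illustrative only:

```sql
begin
  -- create a named task and carve table T into rowid ranges
  dbms_parallel_execute.create_task( 'update_t' );
  dbms_parallel_execute.create_chunks_by_rowid(
      task_name   => 'update_t',
      table_owner => user,
      table_name  => 'T',
      by_row      => false,
      chunk_size  => 10000 );   -- blocks per chunk

  -- run the update across chunks with 16 job processes;
  -- :start_id / :end_id are bound to each chunk's rowid range
  dbms_parallel_execute.run_task(
      task_name      => 'update_t',
      sql_stmt       => 'update t set col1 = upper(col1)
                          where rowid between :start_id and :end_id',
      language_flag  => dbms_sql.native,
      parallel_level => 16 );

  dbms_parallel_execute.drop_task( 'update_t' );
end;
/
```

As with the do-it-yourself version, each chunk commits independently, so the transaction semantics differ from a single UPDATE.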

Merging 3 very huge tables.

March 11, 2010 - 2:47 am UTC

Reviewer: Prats

Tom,

In one of the legacy databases we have inherited, there are 3 existing tables with millions of records. Most of their columns are redundant and now we are planning to merge their data together into a new table which would contain all columns from these tables.

For this I will essentially have to perform the following -
Step 1) insert table_A into new_table
Step 2) Traverse through table_B and update the respective columns in new_table [if the corresponding record has been inserted from table_A]; else insert the new record from table_B.
Step 3) Repeat step 2 with table_C to update existing records and insert extra records from table_C

Though this is a one-time job, it would be done on production systems. What would be the optimal approach to go about it?
Please guide...
Tom Kyte

Followup  

March 11, 2010 - 8:31 am UTC

why not insert a join of the three????

RE: Merge 3 huge

March 11, 2010 - 3:35 pm UTC

Reviewer: Duke Ganote from Amelia, OH USA

Like this?

create table A 
as ( select level AS key_value, level as attribute_value
     from dual connect by level < 5 )
;
create table B 
as select level-1 AS key_value, level as attribute_value
     from dual connect by level < 5
;
create table C
as select level-2 AS key_value, level as attribute_value
     from dual connect by level < 5
;

create table combined_table as
select coalesce(c.key_value,b.key_value,a.key_value) as key_value
     , coalesce(c.attribute_value,b.attribute_value,a.attribute_value) as key_attribute
  from A
  full 
  join b
    on b.key_value = A.key_value
  full 
  join c
    on c.key_value = coalesce(b.key_value,a.key_value)
;
select key_value as "KEY", key_attribute AS "ATTR"
     , case when key_value = key_attribute then 'A'
            when key_value = key_attribute-1 then 'B'
            when key_value = key_attribute-2 then 'C'
        END as SRC
  from combined_table order by key_value
;
KEY ATTR SRC
--- ---- ---
 -1    1  C
  0    2  C
  1    3  C
  2    4  C
  3    4  B
  4    4  A

Updating a filed with a subquery for 94 million row table

March 12, 2010 - 11:39 am UTC

Reviewer: CT from GA, USA

Hi Tom,
Hello, how are you? Based on your recommendation I tried updating a 94 million row table T1 as follows,
where T1's structure is
mpi varchar2(25)
tid number primary key
ttime date

and structure of x is
oldmpi varchar2(25)
newmpi varchar2(25)
name varchar2(50)

create table t2 nologging as select (select newmpi from x where x.oldmpi = t1.mpi) mpi, tid, ttime from T1;

When table t2 was created, the mpi field did not have any value. In short, the subquery that I used to populate the field does not seem to have worked. The timing for the above was about 40 minutes or so, which was great! I really need help with this issue, as I have not been able to find any method by which to update this huge table. Your help with this is greatly appreciated.
Tom Kyte

Followup  

March 12, 2010 - 3:56 pm UTC

you would have to give me an example, with data - as far as I can see, it would have worked correctly as typed in.

However, I would not have done it that way, you should use:

select x.newmpi, t1.tid, t1.ttime from t1, x where t1.mpi = x.oldmpi(+);
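
Plugged back into the original create-table-as-select (same t1 and x as in the question), that outer join gives:

```sql
create table t2 nologging
as
select x.newmpi mpi, t1.tid, t1.ttime
  from t1, x
 where t1.mpi = x.oldmpi(+);
```

The (+) keeps every t1 row even when no x.oldmpi matches (mpi is then null), just as the scalar subquery would, but with a single hash join instead of 94 million subquery executions.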


RE: "the mpi field did not have any value"

March 12, 2010 - 12:31 pm UTC

Reviewer: Duke Ganote from Amelia, OH USA

Sounds like basic debugging...what do you get with this query?
select count(*) 
  from x 
  join t1
    on x.oldmpi = t1.mpi

If the count result is zero, then it's unsurprising that "the mpi field did not have any value". If it's non-zero, then it is surprising.

The count was non zero

March 12, 2010 - 1:14 pm UTC

Reviewer: CT from GA, USA

I ran the query as suggested by you and the count was non-zero. There are matching records in T1 on the mpi column between x.oldmpi and T1.mpi. Is there any reason why the subquery may not have worked - maybe it is returning more than one row? Thanks
Tom Kyte

Followup  

March 12, 2010 - 4:10 pm UTC

if it returned more than one row, it would fail.

You would need to provide a full example, but again - you want to outer join, NOT use a scalar subquery.

But until we have a FULL EXAMPLE that shows exactly what you did - we cannot comment on how it might have gone wrong for you.

creating a new table after joining two large tables

April 12, 2010 - 7:29 am UTC

Reviewer: Somjit from India

Hi Tom,

I read the solutions you gave in this thread. I have two tables:

A => 1200Million records
B => 70Million records


The tables are not partitioned and we have a plan in place to implement that now. The issue is that we are creating a new table which would contain data starting from 01-Jan-2010 only, and for this purpose I want to create a new table A_1 which has the same columns as A but with an additional column eventdate.

This is how I proceeded:

create table a_1 as select * from a where 1=2;

alter table a_1 add (eventdate date);

alter table a_1 nologging;

INSERT /*+ append parallel (C,12) */
into a_1 C
SELECT /*+ parallel (A,12) */
A.TRANS_ID, ..<all columns from table>..,
To_Timestamp(to_char(B.EVENTDATE,'dd.mm.yyyy')||' '||B.Eventtime,'dd.mm.yyyy hh24:mi:ss.FF')
from A, B
where B.EventDate >= to_date('2010-01-01','YYYY-MM-DD')
and A.trans_Id = B.Trans_Id;


The columns used in the predicate are indexed. B.Trans_Id will be unique and A.Trans_Id will have a normal index, as it has duplicate values. The concern is that this join is doing a heavy sort operation, because of which the TEMP tablespace is getting filled up. How do I get this working as efficiently as possible?
Tom Kyte

Followup  

April 13, 2010 - 9:17 am UTC

are you sure temp is getting filled - or is it saying "unable to extend temporary segment in tablespace XXX" where xxx is the target tablespace for this table?

This won't be sorting, this'll be HASHING (I would hope). I would expect B to be full scanned first, the event date predicate evaluated and the rows that are > 2010-01-01 to be hashed into memory and spilling on to disk if necessary. Then the join to A.


If the hash table is spilling to disk you can increase your pga settings, or if you are using old manual memory management the hash area size.
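
A quick way to see whether work areas are in fact spilling, and - under automatic PGA memory management - to give them more room (the 4G value is illustrative only and requires appropriate privileges):

```sql
-- optimal = in memory; onepass/multipass = spilled to temp
select name, value
  from v$sysstat
 where name in ( 'workarea executions - optimal',
                 'workarea executions - onepass',
                 'workarea executions - multipass' );

-- illustrative: raise the automatic PGA target instance-wide
alter system set pga_aggregate_target = 4G;
```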


New method

April 19, 2010 - 11:22 pm UTC

Reviewer: Michel from UK

Hi Tom,

This link helped me, but I am still confused.

A- 170Milion rows
B-80 milion rows

I am doing something like this:

merge /*+ parallel(TableA,4) */ into TableA
using TableB K
on(TableA.Col1=K.col1)
when matched then
Update set
TableA.col2=K.col2,
TableA.col3=K.col3,
TableA.col4='p';

The explain plan is taking more time - how can I write this using a nologging method?
Tom Kyte

Followup  

April 20, 2010 - 8:13 am UTC

why do you think nologging would make it go faster - it probably would not - the redo is written out in the background, you are not really 'waiting' for it.

You cannot merge "nologging", the insert part of the merge can be done nologging (use the append hint), but NOT the update part, NOT the index maintenance part.

the update part will always be done logging
the index maintenance part will always be done logging

rename

April 20, 2010 - 12:44 pm UTC

Reviewer: Michel

Tom,

My question was how to write the same requirement without using merge,
by using the table rename method.
Tom Kyte

Followup  

April 20, 2010 - 12:56 pm UTC

wow, that is about as far away from what I thought your question was as possible. I sure didn't see that from the above...


To do the "create a new table, and rename" trick, you would have to full outer join OLD with NEW (you want to keep all OLD records even if not in NEW, you want all NEW records even if not in OLD)

It'll be like this:


create table new_table
as
select coalesce( old.key, new.key ) key,
       ... repeat this for every column except join key ...
       case when new.key is null then old.column else new.column end, 
       ... above repeated .....
  from old full outer join new on (old.key=new.key);



if old.key is null - record is in new but not old.
if new.key is null - record is in old but not new.
if neither is null - record is in both.

coalesce will return "the key" if old is null, returns the new key (which must not be null)

the case will return the old column if the new key is null - else return the new column (if the new.key is not null - then the new.column overrides anything in the old column)
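
A minimal concrete run of that template, with a one-key, one-column pair of tables (old_t, new_t, and the column names are made up for illustration):

```sql
create table merged_t
as
select coalesce( o.key_col, n.key_col ) key_col,
       case when n.key_col is null then o.val else n.val end val
  from old_t o
  full outer join new_t n
    on ( o.key_col = n.key_col );
```

Rows only in old_t keep their old val, rows only in new_t arrive with the new val, and rows in both take the new val - the same effect as the merge, done as one create-and-rename.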




creating a new table after joining two large tables contd

April 21, 2010 - 7:23 am UTC

Reviewer: Somjit from India

Hello Tom,

This is continuation from my earlier post.

I tried to run an explain plan on my query

SQL> explain plan for
  2  INSERT /*+ append parallel (C, 8) */
  3  into SEALSNET_STATPER_INST_P C
  4  SELECT /*+ parallel (A,8) parallel (b,8) */
  5  A.TRANS_ID, A.INST_ID, A.INST_LOGGERID, A.INST_CREATIONDATE, A.INST_CREATIONTIME, A.INST_COMPCLASS,
  6    A.INST_COMPTYPE, A.INST_COMPNAME, A.INST_TAG, A.INST_UOM, A.INST_VALUE, A.INST_PLATFORM, A.INST_PRIORITY,
  7  To_Timestamp(to_char(B.EVENTDATE,'dd.mm.yyyy')||' '||B.Eventtime,'dd.mm.yyyy hh24:mi:ss.FF')
  8  from SEALSNET_STATPER_INST_BCK A, sealsnet_statper B
  9  where A.trans_Id = B.Trans_Id
 10  and B.EventDate >= to_date('2010-01-01','YYYY-MM-DD');

Explained.

Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2340029746

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                           |   186M|    39G|   239K  (1)| 00:48:00 |        |      |            |
|   1 |  LOAD AS SELECT          | SEALSNET_STATPER_INST_P   |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR         |                           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)   | :TQ10002                  |   186M|    39G|   239K  (1)| 00:48:00 |  Q1,02 | P->S | QC (RAND)  |
|*  4 |     HASH JOIN BUFFERED   |                           |   186M|    39G|   239K  (1)| 00:48:00 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE          |                           |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH       | :TQ10000                  |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR |                           |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL| SEALSNET_STATPER          |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,00 | PCWP |            |
|   9 |      PX RECEIVE          |                           |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,02 | PCWP |            |
|  10 |       PX SEND HASH       | :TQ10001                  |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,01 | P->P | HASH       |
|  11 |        PX BLOCK ITERATOR |                           |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,01 | PCWC |            |
|  12 |         TABLE ACCESS FULL| SEALSNET_STATPER_INST_BCK |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."TRANS_ID"="B"."TRANS_ID")
   8 - filter("B"."EVENTDATE">=TO_DATE('2010-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

25 rows selected.


Is there some way I can reduce the huge space required in "HASH JOIN BUFFERED" area? 39G required
Tom Kyte

Followup  

April 21, 2010 - 9:03 am UTC

you could use indexes and nested loops - but then you would be asking me how to make this run in less than 100 days.



40GB is pretty small these days... given the volume of data we work with.

how to audit ddl command from a user in oracle 9i

April 22, 2010 - 10:46 am UTC

Reviewer: Soumya Samanta from INDIA

Hi, I am Soumya Samanta. I want to audit DDL commands issued by a user in an Oracle 9i environment.
How do I do that?

Tom Kyte

Followup  

April 22, 2010 - 12:53 pm UTC

the AUDIT command would work.
http://otn.oracle.com/pls/db92/search?remark=&word=audit&book=&preference=

you could write your own using a system event trigger as well.

http://otn.oracle.com/pls/db92/search?word=system+event+trigger
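
A minimal sketch of the trigger approach (the audit table and trigger names are made up; creating an ON DATABASE trigger requires the ADMINISTER DATABASE TRIGGER privilege):

```sql
create table ddl_audit
( username   varchar2(30),
  ddl_date   date,
  ddl_type   varchar2(30),
  obj_owner  varchar2(30),
  obj_name   varchar2(30) );

create or replace trigger audit_ddl_trg
after ddl on database
begin
  -- event attribute functions supply who did what to which object
  insert into ddl_audit
  values ( ora_login_user, sysdate, ora_sysevent,
           ora_dict_obj_owner, ora_dict_obj_name );
end;
/
```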

Slow update

April 29, 2010 - 6:16 am UTC

Reviewer: Philippe Gaudin

Hi Tom,

I have a query that has to update 23,000,000 rows in a table. Here are the query and the plan.

SQL> EXPLAIN PLAN
  SET STATEMENT_ID = 'PGA_TST1'
  FOR
UPDATE (SELECT current_pol_cov_trans.pct_cov_subscription_value AS cur_cov_subscription_value,
               current_pol_cov_trans.pct_cov_in_force_value AS cur_cov_in_force_value,
               current_pol_cov_trans.pct_cov_gross_amount AS cur_cov_gross_amount,
               current_pol_cov_trans.pct_cov_net_amount AS cur_cov_net_amount,
               current_pol_cov_trans.pct_cov_tariff_amount AS cur_cov_tariff_amount,
               previous_pol_cov_trans.pct_cov_subscription_value AS prev_cov_subscription_value,
               previous_pol_cov_trans.pct_cov_in_force_value AS prev_cov_in_force_value,
               previous_pol_cov_trans.pct_cov_gross_amount AS prev_cov_gross_amount,
               previous_pol_cov_trans.pct_cov_net_amount AS prev_cov_net_amount,
               previous_pol_cov_trans.pct_cov_tariff_amount AS prev_cov_tariff_amount,
               current_pol_cov_trans.pct_subscription_counter,
               current_pol_cov_trans.pct_in_force_counter,
               current_pol_cov_trans.pct_cov_delta_net_amount,
               current_pol_cov_trans.pct_cov_delta_tariff_amount
          FROM fact_policy_cov_trans current_pol_cov_trans
               JOIN fact_policy_cov_trans previous_pol_cov_trans
                 ON (previous_pol_cov_trans.pct_policy_version_skey = current_pol_cov_trans.pct_prev_version_skey
                     AND previous_pol_cov_trans.pct_coverage_skey = current_pol_cov_trans.pct_coverage_skey))
   SET pct_subscription_counter     =
         (CASE
            WHEN cur_cov_subscription_value = 'N' AND prev_cov_subscription_value <> 'N' THEN -1
            WHEN cur_cov_subscription_value <> 'N' AND prev_cov_subscription_value = 'N' THEN +1
            ELSE 0
          END),
       pct_in_force_counter         =
         (CASE
            WHEN cur_cov_in_force_value = 'N' AND prev_cov_in_force_value <> 'N' THEN -1
            WHEN cur_cov_in_force_value <> 'N' AND prev_cov_in_force_value = 'N' THEN +1
            ELSE 0
          END),
       pct_cov_delta_net_amount     = cur_cov_net_amount - prev_cov_net_amount,
       pct_cov_delta_tariff_amount  = cur_cov_tariff_amount - prev_cov_tariff_amount
Explain complete.
SQL> SELECT * FROM table (DBMS_XPLAN.display (NULL, 'PGA_TST1', 'BASIC PARALLEL ROWS'))

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3795694554

-----------------------------------------------------------------------------------------------
| Id  | Operation                | Name                  | Rows  |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT         |                       |    14M|        |      |            |
|   1 |  UPDATE                  | FACT_POLICY_COV_TRANS |       |        |      |            |
|   2 |   PX COORDINATOR         |                       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)   | :TQ10002              |    14M|  Q1,02 | P->S | QC (RAND)  |
|   4 |     HASH JOIN BUFFERED   |                       |    14M|  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE          |                       |    22M|  Q1,02 | PCWP |            |
|   6 |       PX SEND HAS