How to Update millions of records in a table

Question and Answer

Tom Kyte

Thanks for the question, Murali.

Asked: November 10, 2002 - 9:21 pm UTC

Last updated: August 12, 2022 - 3:05 am UTC

Version: 8.1.7

Viewed 100K+ times!

You Asked

Good Morning Tom.

I need your expertise in this regard. I have a table which contains millions of records. I want to update and commit every so many records (say 10,000 records). I don't want to do it in one stroke as I may end up with rollback segment issue(s). Any suggestions please!

Murali

and Tom said...

If I had to update millions of records I would probably opt to NOT update.

I would more likely do:

CREATE TABLE new_table as select <do the update "here"> from old_table;

index new_table
grant on new table
add constraints on new_table
etc on new_table

drop table old_table
rename new_table to old_table;


you can do that using parallel query, with nologging on most operations generating very little redo and no undo at all -- in a fraction of the time it would take to update the data.
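
A rough sketch of that flow (the table, column names and degree of parallelism below are illustrative, not from the question):

create table new_table
nologging
parallel 4
as
select id,
       'PROCESSED' status,      -- the "update" happens in the select list
       created_dt,
       payload
  from old_table;

create index new_table_idx on new_table(id) nologging parallel 4;
-- grants, constraints, etc. on new_table

drop table old_table;
rename new_table to old_table;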


Comments

updating millions of records

Om, November 11, 2002 - 11:09 am UTC

Hi Tom,

Could you please elaborate

CREATE TABLE new_table as select <do the update "here"> from old_table;

the above statement with a suitable example please


Tom Kyte
November 11, 2002 - 5:45 pm UTC

ok, say you wanted to update emp to set ename = lower(ename).  Instead, you could do this:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
  2  select empno, LOWER(ename) ename, JOB,
  3     MGR, HIREDATE, SAL, COMM, DEPTNO
  4   from emp;

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table emp;

Table dropped.

ops$tkyte@ORA817DEV.US.ORACLE.COM> rename new_emp to emp;

Table renamed. 

Million records update

Ramesh G, November 11, 2002 - 1:44 pm UTC

What if a table has over 100 million records and I only want to update 1 million? If your method is still applicable, could you elaborate?

Many thanks in advance.


Tom Kyte
November 11, 2002 - 6:33 pm UTC

most likely -- yes. I don't have a 100 million row table to test with for you but -- the amount of work required to update 1,000,000 indexed rows is pretty large. Fortunately, you are probably using partitioning so you can do this easily in parallel -- bit by bit.

"How to Update millions or records in a table", version 8.1.7

John Bittner, November 11, 2002 - 2:21 pm UTC

This is absolutely a viable approach, and one we have used repeatedly. One of our apps updates a table of several hundred million records.

The cursor..For loop approach for the update was calculated to take 53.7 years to complete!
We instituted the insert into a dummy table append with nologging, and were able to complete the "update" in under 30 minutes.

With nologging, if the system aborts, you simply re-run the 'update' again, as you have the original data in the main table. When done, we swap the partition of original data with the 'dummy' table (the one containing new values), rebuild indexes in parallel, and voila! Our update is complete.

i.e, to update field2 in a table:

1) First create your dummy hold table: create table xyz_HOLD as select * from xyz where rownum<1. Alter table xyz_hold nologging.

2) insert /*+ append parallel (xyzhold,12) */ into xyz_hold xyzhold (field1, field2, field3) select /*+ parallel (x,12) */ xyz.field1, my_new_value_for_field2, xyz.field3 from xyz x where blah blah blah.

3) when done, either rename the table, or swap the partition if your original table is partitioned, and you only updated one partition as we do. Obviously you need to rebuild indexes, etc. as required.

Hope this helps!
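
A sketch of those three steps, keeping John's illustrative names (the filter and the new value are placeholders):

-- 1) empty "hold" table with the same shape as xyz
create table xyz_hold as select * from xyz where rownum < 1;
alter table xyz_hold nologging;

-- 2) direct-path, parallel insert of the rows with their new values
alter session enable parallel dml;
insert /*+ append parallel(h,12) */ into xyz_hold h (field1, field2, field3)
select /*+ parallel(x,12) */ x.field1, 'my new value for field2', x.field3
  from xyz x
 where x.field1 is not null;          -- the "blah blah blah" predicate

-- 3) rename (or exchange the partition) and rebuild indexes
-- drop table xyz;
-- rename xyz_hold to xyz;
-- create index ... nologging parallel ...;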


updating millions of records

A.Ashiq, November 11, 2002 - 11:28 pm UTC

Hi Tom,

As you suggested, instead of updating a table with millions of records we create a new table, then drop the original table and rename the new table to the original name. But what happens to dependent objects? Everything will get invalidated. Of course, they will recompile themselves when called the next time, but the dependent objects have to be parsed again. Is that OK?

Tom Kyte
November 12, 2002 - 10:11 am UTC

It is OK.

in case of delete

A reader, November 12, 2002 - 9:58 am UTC

We have a similar situation. We delete around 3 million records from a 30 million row table every day.
There is no logical column to partition on.
I guess the insert into a new table will take considerable time with 27 million records. Please let me know what the best approach is.

Tom Kyte
November 12, 2002 - 10:35 am UTC

wait 10 days so that you are deleting 30 million records from a 60 million record table and then this will be much more efficient.

Time it some day though. 3 million records on an indexed table will take considerable time. There is a chance that INSERT /*+ append */ select <rows to keep> done nologging, parallel with create indexes in parallel could best the delete.

A reader, November 12, 2002 - 12:43 pm UTC

Tom,

Recently I conducted an interview in which one of the DBAs mentioned that they had a table that might contain 10 million records or might contain 1 million. He meant that they delete the records, and some time later the table is populated again, and vice versa.

Tom, in your view, would you consider partitioning for such tables, and if so, which type of partitioning?

Thanks.




Tom Kyte
November 13, 2002 - 12:31 pm UTC

hard to tell -- is the data deleted by something that is relatively constant (eg: the value in that column doesn't change - so the row doesn't need to move from partition to partition). If so -- sure, cause we could just drop partitions (fast) instead of deleting the data.

Agree with John - insert append / CTAS / partition swap is the only way to fly

Jack Silvey, November 12, 2002 - 1:19 pm UTC

I work with John Bittner, one of the previous reviewers. I second what he said absolutely. It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql.


addendum

A reader, November 13, 2002 - 10:39 am UTC

This process was introduced to our environment by a master tuner and personal friend, Larry Elkins. This was a totally new paradigm for the application, and one that saved the entire mission-critical application. The in-place updates would not have worked with the terabytes of data that we have in our database.

How to Update millions of records in a table

Boris Milrud, November 19, 2002 - 1:55 pm UTC

In response to the Jack Silvey (from Richardson, TX ) review, where he wrote "It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql":

Could you provide more information about that procedure and how to rebuild multiple same-table indexes concurrently using Oracle job scheduler?

Thanks,
Boris.


Tom Kyte
November 19, 2002 - 5:29 pm UTC

instead of

begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;

you can code:

declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;


Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i and before set job_queue_interval to say 60 or so and there you go.

How to Update millions of records in a table

Boris Milrud, November 19, 2002 - 9:10 pm UTC

Thanks, Tom.
Your status said that you had a large backlog, so I decided not to wait for your response and tried it myself using dbms_job.submit() calls. At first it did not work (job_queue_processes was 0), but after I set it to 12 it started working. The only difference between your code and mine is that I issue just one commit at the end. It should not matter, right?

I selected a 1 million row table and rebuilt 5 non-partitioned indexes with the 'compute statistics parallel nologging' clause.
Here are the numbers I got: rebuilding the indexes sequentially consistently took 76 sec., while using dbms_job.submit() calls took around 40 - 42 sec.

I said "around" because the technique I used may not be perfect, though it served the purpose. I recorded the time right after the commit statement at the end of the PL/SQL block - that's the start time. Then I kept querying the user_jobs view every 2 - 3 sec. until the last of the 5 jobs was gone. That was the end time.

The last question on this topic: is the user_jobs view the right place to look in order to determine that rebuilding is done and how long it took? In the package I am writing, I do a massive delete operation, then rebuild the indexes, then start the next routine. What would be the best way to detect the end of the rebuilding, in order to proceed with the next call?

Thanks.
Boris.




Tom Kyte
November 19, 2002 - 10:14 pm UTC

You can use user_jobs or dba_jobs but -- you might just want to put some "logging" into your jobs themselves so you can monitor their progress and record their times.
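
A sketch of that kind of monitoring, polling user_jobs until the submitted jobs are gone (this assumes execute privilege on dbms_lock; also note a failed job stays in user_jobs, so production code would check the failures/broken columns as well):

declare
    l_job1       number;
    l_job2       number;
    l_remaining  number;
begin
    dbms_job.submit( l_job1, 'execute immediate ''alter index idx1 rebuild'';' );
    dbms_job.submit( l_job2, 'execute immediate ''alter index idx2 rebuild'';' );
    commit;   -- make the jobs visible to the job queue processes

    loop
        select count(*)
          into l_remaining
          from user_jobs
         where job in ( l_job1, l_job2 );
        exit when l_remaining = 0;    -- successful jobs are removed from the queue
        dbms_lock.sleep( 5 );         -- poll every few seconds
    end loop;

    -- safe to start the next routine here
end;
/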

You can even rebuild all the indexes in a partition simultaneously

ramakrishna, November 21, 2002 - 12:50 am UTC

Thanks Tom,
In our environment, we have partitioned tables and we use:
ALTER TABLE table_name MODIFY PARTITION part_name REBUILD UNUSABLE LOCAL INDEXES

and this rebuilds all the indexes in the partition at one shot. We found this much faster than doing the indexes one by one. We will now try out if we can submit multiple such jobs in parallel (one for each partition of the table).

regards
ramakrishna



in case of deletes

carl, November 29, 2002 - 5:17 pm UTC

Hi Tom,

Thanks so much for your web site and help. It is our
number 1 reference in times of fear and loathing.


This is what we came up with concerning mass updates

INV 50M
INVINS 10M
INVDEL 7M

There are indexes on INV.KEY and INVDEL.KEY


Execution Plan ( for deletes and updates )
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'INV'
3 1 INDEX (RANGE SCAN) OF 'IX_INVDEL01' (NON-UNIQUE)




alter table INVTMP nologging;

-- INVINS contains delta inserts and updates

insert /*+ APPEND */
into INVTMP
select *
from INVINS t1;

-- INVDEL contains delta deletes and updates

insert /*+ APPEND */
into INVTMP
select *
from INV t1
where not exists ( select null
from INVDEL t2
where t2.KEY = t1.KEY );

alter table INVTMP logging;
drop table INV;
rename INVTMP to INV;

-- build indexes etc

This is what we came up with, and it is the fastest
approach we've tested.

Any comments or suggestions are welcome and appreciated.



Tom Kyte
November 30, 2002 - 8:49 am UTC

I might try an anti-join on the NOT EXISTS and doing the work in a single parallel CTAS

create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;

add parallel /nologging as needed.
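
For example, the same CTAS with those clauses added (the degree of parallelism is just an illustration):

create table invtmp
nologging
parallel 8
as
select * from invins
UNION ALL
select t1.*
  from inv t1, invdel t2
 where t1.key = t2.key(+)
   and t2.key is null;

-- then: create index ... nologging parallel 8; and rename as before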

in case of deletes - many thanks

carl, December 01, 2002 - 7:45 pm UTC

Ran the test cases at home with 500K rows in INV and 50K rows in
INVINS and INVDEL

My way: 16.33 secs
Your way: 08.35 secs


Many thanks.



A bit confused

Choche, March 17, 2003 - 10:30 am UTC

I was just wondering that none of the reviews made mention of whether these techniques could be applied in a multi-user environment where multiple users could be updating the same table at the same time.

Tom Kyte
March 17, 2003 - 10:32 am UTC

sorry -- I thought it obvious that in most cases "no" is the answer. we are copying the data or locking excessive amounts of it -- or disabling indexes and the like.

This is a "batch" process here, to update millions of records.

Replacing PK values

Michael, July 25, 2003 - 6:42 pm UTC

We need to replace the main ID for all entries in the database. It is the PK in about 100 tables; the largest of them have about 2 million rows. New and old values for the ID are stored in a lookup table, and there are about half a million of them. The original approach was to replace the PK in all tables for one ID value, commit the changes and move on to the next ID. This way referential integrity would be guaranteed even if the update fails at any stage, and we do not need long rollback segments and can restart the process at the point it aborted. But this update is too lengthy, and in our case would take almost a month of continuous processing. Do you have any suggestions on how to improve the performance of this critical update? Would it be better to add the new ID as an additional column to one table at a time and populate all of it at once with "create table ... as select" (with the nologging option), and then rename the table and drop the old ID column?

Tom Kyte
July 25, 2003 - 8:42 pm UTC

sounds like I would:

create table new_table as select <join>;
create table new_table2 as select <join>;
etc etc etc etc

drop old tables
add indexes
add primary/fkeys/constraints etc....


that is -- rebuild in bulk, using nologging (no redo, no undo generated), using parallel when appropriate.

Update million Rows

Reaz, July 26, 2003 - 5:41 am UTC

Dear Tom,

We have this situation where we load data from 7 external files. Next we run a transformation process which checks if the row exists in the target table and updates the existing row with the new one; otherwise it inserts the row into the target table.

Our current process is very slow, sometimes taking a whole night to complete for, say, 200,000 rows.

The procedures discussed so far are meant for updating only, but in our case we need to update and insert too.

I would appreciate your suggestion.

Thanks.
Reaz.

Tom Kyte
July 26, 2003 - 12:53 pm UTC

read about MERGE in the sql reference guide.
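
A minimal sketch of what that MERGE could look like (the staging and target table/column names here are made up):

merge into target_table t
using staging_table s
   on ( t.id = s.id )
 when matched then
      update set t.qty   = s.qty,
                 t.descr = s.descr
 when not matched then
      insert ( id, qty, descr )
      values ( s.id, s.qty, s.descr );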

A single batch?

A reader, July 26, 2003 - 9:54 am UTC

Tom,
Is it still okay if these steps (create a new table, drop the old one, rename the new table as the old, create indexes etc.) are coded as the batch update program? Or would you suggest these should be detailed as steps to be performed under the supervision of a DBA? What I would like to know is: can we still have the steps in a single batch routine (as the update would have been), or should these be DBA instructions instead?
Thanks

Tom Kyte
July 26, 2003 - 12:54 pm UTC

I would have these performed under the watchful eye of someone.

it is a one time bulk manipulation. it won't be a "pretty program" with tons of graceful logging/error handling. You'll be doing it "down and dirty" (at least I would be). It would need someone to watch over it.

What about space consideration

A reader, July 28, 2003 - 1:39 am UTC

Hi Tom,
I had a similar problem where my table had grown big and I wanted to truncate some old data. My table was big - more than 20 million records - and I could not use the create-temp-table, truncate and insert flow as there was not enough space in the database to hold 2 copies of the large table. I thought of partitioning the table and then truncating one partition of it, but that does not work.
Can you suggest some way which caters for this space constraint as well?




Tom Kyte
July 28, 2003 - 7:04 am UTC

why didn't the partitioning work for you.

How to update part of a Very Large Table

Mohan K, July 28, 2003 - 6:11 am UTC

The update operation can be made simpler by separating out the query and the update operation. To do that, create another table which stores the rowids of the records in the original table which have to be updated, along with the value to be updated. Then run the PL/SQL script to update the records one by one. To test the above method the following scripts can be used.

SQL> create table test_update(id number, name varchar2(100), description varchar2(4000)) storage (initial 48M next 4M);

SQL> declare
v_n number;
v_name number;
v_desc number;
i number;
begin
for i in 1..1000000 LOOP
insert into test_update(id, name, description) values(i, 'Test Name'||i, 'Test Name '||i||' description ');
END LOOP;
end;
/

Elapsed: 00:04:277.23

The above script will insert 1 million rows.

SQL> select count(*) from test_update where description like '%5 description%';

  COUNT(*)
----------
    100000

Elapsed: 00:00:02.63

SQL> create table test_update_rowids as select rowid rid, description from test_update where description like '%5 description%';

Elapsed: 00:00:54.58

The table test_update_rowids stores the rowids and the new values that have to be updated, i.e. 100000 rows need to be updated.

SQL>  declare
begin
for c1 in(select rid, description from test_update_rowids)
LOOP
update test_update set description=c1.description||' after update' where rowid=c1.rid;
END LOOP;
end;
/

Elapsed: 00:01:82.17

The above script performs the updation. 

Mohan
 

Tom Kyte
July 28, 2003 - 7:37 am UTC

the above script performs an "updation" in the worst possible way, consuming the most resources.

it should have been simply:

update test_update set description = description || ' after update ' where description like '%5 description%';


Never do procedurally that which can be done in a single sql statement.

Updating 100K of 1 Million rows

Mohan K, July 28, 2003 - 8:38 am UTC

I ran the update statement as a single SQL statement and the time elapsed is slightly more than the above method. It has to be tested against update statements containing complex queries or join operations.

SQL> update test_update set description = description || ' after update ' where description like '%5 description%';

Elapsed: 00:01:100.13


Mohan
 

Tom Kyte
July 28, 2003 - 8:53 am UTC

trust me -- that difference is negligible and the amount of time you spent writing code wasn't included.

Also, you FORGOT to add in the time to populate your temporary table -- no?


if you can do it in a single sql statement (and you almost always can) you SHOULD. 

Let's do your benchmark a little better:

ops$tkyte@ORA920> create table test_update(id number, name varchar2(100), description varchar2(4000));

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert /*+ append */
  2  into test_update
  3  select rownum, 'Test Name' || rownum, 'Test Name ' || rownum || ' description '
  4    from big_table.big_table
  5   where rownum <= 1000000;

1000000 rows created.

ops$tkyte@ORA920> commit;

Commit complete.

ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
  2    from test_update
  3   where description like '%5 desc%' ;

  COUNT(*)
----------
    100000

ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
  2    from test_update
  3   where description like '%6 desc%' ;

  COUNT(*)
----------
    100000

ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> update test_update
  2     set description = description || ' after update'
  3   where description like '%6 description%';

100000 rows updated.

ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> create table test_update_rowids
  2  as
  3  select rowid rid, description from test_update where description like '%5 description%';

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> begin
  2     for c1 in(select rid, description from test_update_rowids)
  3     LOOP
  4        update test_update
  5           set description=c1.description||' after update'
  6         where rowid=c1.rid;
  7     END LOOP;
  8  end;
  9  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_stop
Run1 ran in 1376 hsecs
Run2 ran in 9806 hsecs
run 1 ran in 14.03% of the time

<b>so, according to me, the SINGLE (efficient, effective, correct way) update statement runs in about 14% of the time.  furthermore:</b>


Name                                  Run1        Run2        Diff
STAT...recursive cpu usage               5       4,662       4,657
STAT...CPU used by this sessio         448       5,397       4,949

<b>only 4,949 MORE cpu seconds using the row by row approach...</b>

STAT...CPU used when call star         448       5,397       4,949
STAT...free buffer inspected        10,162      17,039       6,877
STAT...db block gets               111,705     104,450      -7,255
STAT...dirty buffers inspected       9,016      16,620       7,604
STAT...calls to kcmgas               7,755          68      -7,687
STAT...Cached Commit SCN refer       7,718           0      -7,718
STAT...switch current to new b       7,719           0      -7,719
STAT...cleanouts only - consis           0       7,720       7,720
STAT...immediate (CR) block cl           0       7,720       7,720
STAT...commit txn count during           0       7,721       7,721
STAT...cleanout - number of kt           3       7,726       7,723
STAT...consistent gets - exami           3       7,760       7,757
LATCH.undo global data                 221       7,985       7,764
STAT...redo entries                100,700     108,727       8,027
LATCH.redo allocation              100,795     108,945       8,150
STAT...db block changes            202,723     210,907       8,184
STAT...Elapsed Time                  1,385       9,825       8,440
STAT...physical reads                7,894      16,476       8,582
LATCH.checkpoint queue latch        26,845      38,012      11,167
LATCH.cache buffers lru chain       35,638      52,827      17,189
STAT...no work - consistent re       7,741     100,072      92,331
STAT...table scan blocks gotte       7,719     107,721     100,002
STAT...table scan rows gotten    1,000,000   1,100,002     100,002
STAT...execute count                    27     100,092     100,065
STAT...buffer is not pinned co       7,763     107,836     100,073
STAT...calls to get snapshot s          56     100,241     100,185
LATCH.shared pool                      172     100,778     100,606
STAT...session logical reads       119,503     220,165     100,662
STAT...consistent gets               7,798     115,715     107,917
LATCH.library cache pin                233     200,865     200,632
STAT...recursive calls                 177     200,870     200,693
LATCH.library cache                    332     201,322     200,990
LATCH.cache buffers chains         587,377     835,167     247,790
STAT...redo size                31,323,804  31,836,904     513,100
STAT...session pga memory max            0     524,824     524,824
STAT...session pga memory                0   1,049,112   1,049,112

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
754,602   1,553,986     799,384     48.56%

PL/SQL procedure successfully completed.

<b>and worse, it used 2x the number of latches!  (scalability inhibitors)

There is nothing good about row at a time processing, it is something to do ONLY when a set based operation isn't feasible</b>

 

This is very good example for updating the records

Bipin Ganar, July 28, 2003 - 9:00 am UTC

Hi Tom,
Yes, this is a good example of updating the records, but my problem
is that I have 1 million records in one table. I want to update and give a hike of 10% on their salaries in one table, but that table is accessed by any number of people at the same moment. I cannot drop the table and update. And second, how can I define the tablespace in this script?
Create Table a as select * from emp_sal;

Tom Kyte
July 28, 2003 - 9:10 am UTC

create table a
tablespace foo
as
select * from emp_sal;

slightly different problem!

Srikanth Adiga, July 28, 2003 - 12:33 pm UTC

My table:
5M records are inserted each month.

Operations:
5M records inserted on the first day of the month
5M records updated on the rest of the days of the month.
Note that this update updates each row with different values.

I am using OCI with batch operations. Can you please suggest what else I can do within my program to improve the performance of my updates? I tried multithreading, but that did not help.

anything else?

Tom Kyte
July 28, 2003 - 12:48 pm UTC

it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be something more involved here.

I'd be able to do that in minutes -- why is it taking you so long -- what is your bottleneck.

space considerations

A reader, July 28, 2003 - 8:56 pm UTC

I could not create another temp table holding the same data as the table being partitioned, as there was not enough space on the hard disk to hold 2 (or 1.5 to be precise) copies of the same huge table. Is there a way to partition without having to drop or truncate or transfer data to some temp location and then partition?

Tom Kyte
July 29, 2003 - 6:52 am UTC

disk is cheap.

time is money.


your time spent just thinking about this problem cost much more than the disk to solve it would.

penny wise, pound foolish. How big is this 20 million row table (assuming you are the same "a reader" from above). Maybe 1-3 gig?

slightly different problem!

Srikanth Adiga, July 29, 2003 - 12:30 am UTC

couple of minutes!!! Oh dear, that's really too good.

Let me reiterate my case:
My table schema is something like this:
SubscriberVsQtyTable:
int client-id
int year
int month
int quantity
primary key - (client-id, year,month)

This table has 5M client ids.

Now every day, I have to update the quantity field with the txns made for that day. Note that I have to update one subscriber at a time since the number of txns made by each subscriber is different.

So, even now do you believe you can do this in a couple of minutes? My app takes about 30 to 50 mins even on bigger machines with 8 CPUs.

As I said, this is a C++ app, running Solaris + Oracle 8i/9i. We are using OCI.

Thanks,


Tom Kyte
July 29, 2003 - 7:01 am UTC

if you are doing that work with more than a single sql statement, I could see it taking a while.

If you do that work in a single update, no, I don't see it taking more than a couple of minutes.




Time for Update

Mohan K, July 29, 2003 - 8:27 am UTC

It is my understanding that the time for an update depends on the following factors. There may be something else also involved.

1. Row chaining/migration caused by the update
2. The where clause used to select the rows to be updated.
3. The percentage of rows updated in a block. The more the better. Even if only one row is updated in a block containing 10 records, the entire block has to be written by DBWR and LGWR.

So just the number of records is not all that matters. 100 million records may take days for one table and may take less than an hour for another table with few columns.


Mohan


Tom Kyte
July 29, 2003 - 8:31 am UTC

1) updating a quantity (number) field is unlikely to do that.
2) correct
3) correct


no, i disagree with the last part. to update 5million rows, a single unindexed column, should not take "a really long time". I was answering their direct question -- they update every row, on every block. so #1 and #3 really don't come into play. The biggest factor is #2 here



Verifing if neccesary to update, before update

juan carlos reyes, July 29, 2003 - 9:09 am UTC

Hi Tom, I have seen several times that you can
increase performance by evaluating whether you really need to update, or maybe it was only my impression.

For example
Update table set income = 500
where level = 7
AND NOT INCOME = 500

The AND NOT INCOME = 500 allows you to avoid unnecessary updates.
My point is a question:

How advisable is it? Does it really increase performance?

As you are always thinking about how to increase performance, it could be a good idea in a newer version of Oracle that if Oracle verifies that the value is already set to that value, it does not update
:)

Tom Kyte
July 29, 2003 - 11:07 am UTC

if there are more than a handful of records where income = 500, then yes, this will definitely increase performance by

a) decreasing the latching
b) decreasing the redo
c) decreasing the undo


we cannot make that optimization for you -- it would stop triggers and other EXPECTED side effects from happening.

A reader, July 29, 2003 - 11:42 am UTC

:) Thanks,

Maybe then it could be an optional feature
UPDATE_ONLY_IF_DIFFERENT hint
I think it could interest several people
who don't use triggers, nor constraints, etc.

Stu Charlton, July 29, 2003 - 2:28 pm UTC

"Note that, I have to update one subscriber at a time since the number of txns made by each subscriber if different."

That still seems doable in 1 SQL statement, even if you wanted to merge the updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?


Tom Kyte
July 29, 2003 - 2:38 pm UTC

I agree -- most things CAN be done in a single sql statement.

easier said than done in real life situations

A reader, July 29, 2003 - 8:59 pm UTC

Hi Tom,
I agree you may think getting a disk is cheaper, but that is only when you are considering getting up or just going to the store and getting a disk.
In real life conditions, especially in corporates, this is not as easy as it seems, nor is it less costly.
A simple purchase of a disk first has to be justified
(which is quite difficult - I don't think I have a justification as yet), then approved, then purchased by the purchasing department, then installed or fixed by the server teams. The server team may want to take the system down, which means more dollars - much more. These are just things off the top of my head; when we actually initiate the purchase process, there will definitely be some more steps and processes.

Coming back to my question: am I right in saying that we can't partition an existing table without copying the data to some temp location? Let me know if there is any other way.


Thanks




Tom Kyte
July 29, 2003 - 9:27 pm UTC

justification

you pay me a lot
disk is cheap.
think about it.

(sometimes it really can be that simple)

we are talking about a couple of gig (like 1 to 3 -- small, tiny -- my /tmp has more room than that) here, something my LAPTOP would not have an issue with.

in order to partition -- think about it -- you'll need the SOURCE DATA and the PARTITIONED DATA -- at the same time, for a period of time. no magic there.


slightly confused !

Srikanth Adiga, July 30, 2003 - 12:00 am UTC

Thanks, Stu and rest for your updates.

Taking this a bit further, since I am confused :)

[Sri]
"Note that, I have to update one subscriber at a time since the number of txns
made by each subscriber is different."
[Sri]

[Stu Charlton]
That still seems doable in 1 SQL statement, even if you wanted to merge the
updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?

Followup:
I agree -- most things CAN be done in a single sql statement.
[Stu Charlton]

Sorry, how would one do this?
If my table has two rows:
clientid=1, month=july, year=2003, quantity=10
clientid=2, month=july, year=2003, quantity=20
Now I have to update (clientid=1)'s quantity by 15 and (clientid=2)'s quantity by 25.
How would you manage this in a single SQL? Like this, there would be 5M rows to be updated.

Btw, my table is indexed on clientid,year and month.

Tom Kyte
July 30, 2003 - 7:24 am UTC

what is the OTHER table you are updating from?  you must have a detail table elsewhere from where you derive that 15 and 25.  So, assuming something like this:

ops$tkyte@ORA920> create table t ( clientid int, month int, year int, quantity int );

Table created.

ops$tkyte@ORA920> create table txns ( clientid int, month int, year int );

Table created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into t values ( 1, 7, 2003, 10 );

1 row created.

ops$tkyte@ORA920> insert into t values ( 2, 7, 2003, 20 );

1 row created.

ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into txns select 1, 7, 2003 from all_objects where rownum <= 15;

15 rows created.

ops$tkyte@ORA920> insert into txns select 2, 7, 2003 from all_objects where rownum <= 25;

25 rows created.

ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> select * from t;

  CLIENTID      MONTH       YEAR   QUANTITY
---------- ---------- ---------- ----------
         1          7       2003         10
         2          7       2003         20

ops$tkyte@ORA920> update t
  2     set quantity = quantity + ( select count(*)
  3                                   from txns
  4                                  where txns.clientid = t.clientId
  5                                    and txns.month = t.month
  6                                    and txns.year = t.year );

2 rows updated.

ops$tkyte@ORA920> select * from t;

  CLIENTID      MONTH       YEAR   QUANTITY
---------- ---------- ---------- ----------
         1          7       2003         25
         2          7       2003         45


there you go.


(it is a shame you are using the wrong datatypes -- only a DATE should be used to hold, well, DATES.  Using numbers to hold a year and a month isn't a good practice)
 

SCN,CHECK POINT

AATIF, July 30, 2003 - 1:03 am UTC

I am asking: what is the difference between the SCN (system commit number / system change number) and a checkpoint?
On which does Oracle base database synchronization - the SCN or the checkpoint?
When a checkpoint occurs, which number is written to the control file header, the datafile headers and the log file headers?
Dear Tom, your web site tells me not to ask more questions now, but I need to clear up my point.
AATIF.


Tom Kyte
July 30, 2003 - 7:30 am UTC

er?

i remember you asking this on another page -- yes.

and I also remember answering this on that other page -- yes.


I fail to see how this relates to this page.


anyway there IS no difference between

scn
system commit number
system change number

they are the same.


as for the rest of the comment -- it doesn't "make sense" (i do suggest a read through of the server concepts guide, many questions are answered there!)

More clarification

Srikanth Adiga, August 01, 2003 - 12:34 am UTC

>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.



Tom Kyte
August 01, 2003 - 7:49 am UTC

use bulk processing - update between 100 and 500 rows at a time in a single call using array processing.

Or, put the data where it belongs - not in files, in tables -- and process the data in the database rather than in C
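
If the file contents were loaded into a staging table first, a sketch of that array processing in PL/SQL could look like this (all table and column names here are hypothetical):

declare
    cursor c is
        select clientid, count(*) cnt
          from txn_staging
         group by clientid;

    type num_tab is table of number;
    l_ids  num_tab;
    l_cnts num_tab;
begin
    open c;
    loop
        fetch c bulk collect into l_ids, l_cnts limit 500;   -- a few hundred rows per call

        forall i in 1 .. l_ids.count
            update subscriber_qty
               set quantity = quantity + l_cnts(i)
             where clientid = l_ids(i)
               and year     = 2003
               and month    = 7;

        exit when c%notfound;
    end loop;
    close c;
    commit;
end;
/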

Srikanth Adiga, August 06, 2003 - 10:16 am UTC

>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.

>Followup:
>use bulk processing - update between 100 and 500 rows at a >time in a single call using array processing.
Yes, that is what we do. It takes about 60 mins to update 5M records on a 2 CPU machine. Is this much to be expected?

If I do the same in multiple threads I do not see any performance improvement, i.e. 1M records updated in parallel in each of 5 threads.
Any idea why?

Tom Kyte
August 06, 2003 - 10:55 am UTC

are you sure they are going in parallel and not blocking/locking each other.

verify that all 5 sessions are in fact not only ACTIVE (v$session) at the same time but that you are not blocking yourself.

and you have taken some statspacks during your testing to see if you have any obvious bottlenecks right?

Explicit commit !

Nathan, August 07, 2003 - 5:04 am UTC

Tom,

Why do we commit explicitly after submitting jobs ?

Thanks
Nathan
/*------------------------------------------------
Followup:
instead of

begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;

you can code:

declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
---------------------------------*/





Tom Kyte
August 09, 2003 - 3:51 pm UTC

because if you do not YOUR session can "see" the job in the job queue but the job queue processes cannot!


How many of us have sat there for minutes going "when is the stupid thing going to run" and until we exit sqlplus it does nothing :)



Explicit Commit

Nathan, August 07, 2003 - 8:51 am UTC

Apologies,
I must withdraw my previous question... I was obviously not thinking straight. I was thinking about a similar situation where a commit was issued (before dbms_job.submit) with a comment /* do not remove */ without any explanation... I'm still pondering why the commit should be there.

Sorry for the trouble.

Regards
Nathan




I have to take issue....

cheaper than disk, August 07, 2003 - 10:00 am UTC

Your response about justification for buying disk space is outdated...

"justification

you pay me alot
disk is cheap.
think about it.

(sometimes it really can be that simple)"

Anybody else out there NOT paid alot ANYMORE?!

;-)


Re: Explicit Commit - To Nathan

A reader, August 07, 2003 - 3:25 pm UTC

Tom is out till weekend..
"I'm still pondering why the commit should be there. "
The commit after dbms_job is required so that other sessions can see the job queue... which typically is the
requirement... As usual, Tom can confirm..

Menon:)

Reader

A reader, August 10, 2003 - 1:27 pm UTC

"
create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;
"

I'd like to know why
"
where t1.key = t2.key(+)
and t2.key is null
"
better than
where t1.key != t2.key with ALWAYS_ANTI_JOIN hint

Tom Kyte
August 10, 2003 - 2:42 pm UTC

because there is no "always anti join" hint?


you could of course use the CBO, and a NOT IN subquery in which case it'll do it itself.

Su Baba, August 14, 2003 - 5:54 pm UTC

So if I have a big table, x, that needs to be updated, I would

create table y nologging as select * from x...
retrieve index definition from x
build indexes on y
drop table x
rename y to x

If I have all these done in a PL/SQL batch job, what's the best way to retrieve the index definitions and form the create index statements for table x? I want to create indexes on y that have exactly the same storage parameters, tablespace, maybe partitioning, etc. as the ones on table x.

By the way, the database version is 8.1.7.

thanks




Tom Kyte
August 14, 2003 - 8:01 pm UTC

you COULD, not WOULD, it is an option.


you don't have your create indexes somewhere?

guess I would exp rows=n, imp indexfile=foo to get them -- or DBMS_METADATA.get_ddl in 9i
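
In 9i, a sketch using dbms_metadata (the schema and table names are placeholders):

set long 100000
set pagesize 0

select dbms_metadata.get_ddl( 'INDEX', index_name, owner )
  from dba_indexes
 where table_owner = 'SCOTT'
   and table_name  = 'X';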

How to update part of a Very Large Table

Mohan K, August 28, 2003 - 7:03 am UTC

please refer "How to update part of a Very Large Table July 28, 2003"

When updating a large table in a database having a comparatively small buffer cache, one thing to make sure of is that blocks are updated in a continuous order. Suppose some records of block1 are updated first in the beginning and then again after updating some other blocks; then block1 will be written twice by DBWR and LGWR. After the first update block1 may get flushed out by the LRU algorithm. If all records in block1 are updated at one time then block1 is written by LGWR and DBWR only once. If the update statement is doing a full table scan then blocks are updated in order. But the order can be different when it is using an index for selecting the rows.

The SQL statement that makes sure of that is

update (select id, description from test_update where rowid in(select rowid from (select rowid from test_update order by rowid))) set description=description||' new val' where description like '%5%';



Tom Kyte
August 28, 2003 - 7:54 am UTC

not necessarily so -- first lgwr doesn't write blocks -- it writes redo. second, dbwr may or may not write the block more than once.


I would never never never use your approach. Why would I want to

a) read the entire table
b) sort the rowids
c) distinct that set (implied)
d) join to it (and you are making the assumption that the table in the IN will be the driving table and that a hash join will not be used and ..... lots of other stuff)


when a simple:

update T set d = d || 'x' where d like '%5%'

will do it -- if you are worried about the index on d being used - you would simply FULL hint it -- but doubtful that it would be necessary.

the additional work you add would more than offset the work you are trying to save.

update large table

Vera, August 28, 2003 - 2:50 pm UTC

Hi Tom and thank you again and again for all your work.

My problem is that I cannot simply replace the update with creating a new table. The table I need to update - I'll call it the 'pension table' - has about 1.5 million rows. I have to update "amount" fields for the 1994, 1995, 1996 etc. years, and the way I update these fields depends on the status of the participant in each particular year. To determine this status I built a function that looks through other tables and takes several input parameters, including each particular financial year, because it determines not just the status of a participant, but the status he/she had in that particular FY. (This function I call a "box of cockroaches".) Basically, I ask Oracle to bring me the annual monthly amount for Mr/Mrs 'X' for, let's say, 1994, and if 'X' had status 'retired' in 1994 then I want this amount to be calculated one way, if 'X' was 'deferred' - another way, and if 'X' was 'dead' - a third way.
As you already guessed, updates take a long time - about an hour for each year. I wonder what approach you would use in a situation like this?
Thank you
By the way, what is the status of your new book?

Tom Kyte
August 29, 2003 - 8:35 am UTC

I would use a CASE statement in SQL without using PLSQL to compute the value.

optionally -- i would turn it inside out. instead of sql calling plsql, this might be one of the times when plsql calling sql is more appropriate. build up procedurally an array of values to update -- forall update them -- and do it again until the rows are processed. with a little foresight, you can even parallelize this processing and run your procedure N times -- each instance of it running against a different slice of the table...
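
A sketch of the CASE approach, with a made-up status table and made-up calculation rules standing in for the function:

update pension_table p
   set amount_1994 =
       ( select case s.status
                  when 'RETIRED'  then p.amount_1994 * 12
                  when 'DEFERRED' then p.amount_1994 * 1.05
                  when 'DEAD'     then 0
                  else                 p.amount_1994
                end
           from participant_status s
          where s.participant_id = p.participant_id
            and s.fiscal_year    = 1994 )
 where exists ( select null
                  from participant_status s
                 where s.participant_id = p.participant_id
                   and s.fiscal_year    = 1994 );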


book is "real soon" -- should be in trucks right now.


update large table

Luc Gyselinck, August 28, 2003 - 4:18 pm UTC

For large tables, if I have to update more than 20 percent of the rows, I do it like Tom: write a query to insert the 'updated' records into a new table, using the /*+ APPEND */ hint. You have no indexes to update, no constraints to validate and you generate almost no redo. Next, I truncate the old table, after I have disabled all constraints and triggers (once again, almost no redo, very fast), and then I inject, again with the /*+ APPEND */ hint, the data from the new table into the old (very fast, little redo). Indexes get automatically rebuilt AFTER the extents are populated. I re-enable the triggers and the constraints (with or without validation, as you wish).

If the new values for the columns being updated must come from other tables, I NEVER write something like

update t1
set c1 = (select c2 from t2 where t2.c3 = t1.c4)

In fact, you are performing NESTED LOOPs. Using functions to get the new values for the columns is much the same : NESTED LOOPS (even worse: SQL / PLSQL engine context switches, open/close cursors in PL/SQL, NO read consistency).

Whenever I find myself in such a situation (typicaly when writing batch procedures, during data migrations, data transformations), I make sure my queries use HASH joins, I give the session more resources (higher SORT_AREA_SIZE, higher HASH_AREA_SIZE, DB_MULTIBLOCK_READ_COUNT to the max), avoid the INDEX RANGE SCANS, do FULL (yes FULL) table scans, do FAST FULL INDEX scans, thus bypassing the buffer pool which otherwise gets flushed by the data from my queries. And by the way, use LMTs with uniform size extents.

As an example, I rewrote a batch procedure that took 3 days to complete (once, many years ago), written the old way (see the update statement above), that now does the same job in only 4 hours, on the same (old) hardware, same database, but different Oracle version: Oracle 7 / Oracle 8i.


Yes, you are right,

Vera, August 28, 2003 - 4:50 pm UTC

but right now I don't see how I can perform my task without an update statement and a function. In other cases I do exactly the same thing you do on large tables - create a new table rather than update the old one, and then rename the tables.
As for LM tablespaces, I have been begging our DBAs to switch on LMT since I don't know when, and they keep promising to do it.


Does parallel insert just imply direct?

Dennis, September 08, 2003 - 1:49 pm UTC

Tom (or anyone),

Does a parallel insert imply direct (append) insert? I read something in the documentation (or is that forbidden territory) ;) awhile back about the append hint not being needed on a parallel insert because it was implied.

I issued the following:
INSERT /*+ append, nologging, parallel(ext, 8) */ INTO hrdmadm.ext_xpayd_pay_details ext
SELECT /*+ parallel(ext2, 8) */ *
FROM hrdmadm.ext_xpayd_pay_details@hip2 ext2

and while I saw the parallel processes spawn on hip2 (for the select), I didn't notice any spawn where I was.

I was wondering if that was because parallel insert was synonymous with direct insert, or did I mess up the syntax somehow? Is the insert in fact parallel, and it's just 'hidden' from us?

Thanks,
Dennis

Tom Kyte
September 08, 2003 - 2:13 pm UTC

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c21dlins.htm#10629

quote:

Serial and Parallel Direct-Path INSERT

When you are inserting in parallel DML mode, direct-path INSERT is the default. In order to run in parallel DML mode, the following requirements must be met:
......


Thanks!

Dennis, September 08, 2003 - 2:57 pm UTC

My bad was that I didn't alter session. Thanks Tom for your valuable insight. I guess I was just too afraid to read down a few more pages when I was in the documentation, with that being forbidden territory and all ;) Well maybe not forbidden...just hard to put that 'haunted/fear-inspiring' word that would fit in there.

Thanks,
Dennis

Optimizing Query

Mohan, September 11, 2003 - 8:40 am UTC

BULK collect and BULK insert can improve the performance. Create SQL objects of type t1%rowtype and t2%rowtype.
BULK collect 100 records at a time into the collection array. Create separate collection arrays to store values that are computed or obtained after searching the database.
Do BULK insert/update into the target table. Search this site using key words "bulk insert" or "bulk collect" or "bulk update" for more examples.

Other things are
1. Improve query performance by proper indexing
2. Remove fragmentation by "alter table move" command
3. Rebuild indexes
4. Allow large pctfree if the update causes row migration/row chaining
5. Size rollback segments and rollback tablespaces


Mohan


Tom Kyte
September 11, 2003 - 9:49 am UTC

#2 isn't really "a good thing" in general.
#3 is really not "a good thing" in general.

reorgs -- so so overrated.


Kailash, September 11, 2003 - 1:51 pm UTC

Can you explain which of the 3 operations will be faster. If I want to update millions of rows,
1. then would delete/reinsert be faster or
2. mere update will be faster
3. the one you suggested will be faster.
Can you advise as to why the method you had suggested will be faster than 1 and 2.

Can you explain why updating millions of rows is not a good idea. Is it based on the amount of redo generated and what if the columns I update are not part of the index?

Which operation will be the fastest and I request you to explain the various DML operations and the amount of redo they generate. ie. I would appreciate if you could explain me which DML generates most redo, followed by the second followed by the 3rd which generates the least redo.

Currently we have an application that checks for a condition in a table A and obtains the unique id. If a record is found, it flushes all of the dependent tables for the given id and re-inserts these records back into the dependent tables with a set of new values. Is this a good approach? If not, what would be the ideal approach?

Tom Kyte
September 11, 2003 - 7:25 pm UTC

I can say yes to all three -- each is the fastest.

each is the slowest.



If you have a large bulk update that is done once and hits most of the of rows -- you may be best off by doing a CTAS, drop old and rename new.

the CTAS, indexing, etc can be done in nologging mode -- no undo, no redo. you end up getting done much faster.



updating millions of records

b, October 03, 2003 - 4:28 pm UTC

Hi,
The approach of updating the view instead of a table works great.
But here I want to update multiple columns of the same table based on different table queries.
We have around 600,000 records in the base table.
What could be the least expensive query?
We run Oracle 9i.


UPDATE user1.UserSummary
Set
Column1 = (
SELECT Count(DISTINCT WebOrderID)
FROM user1.OrderTotal ot
WHERE total>0 and
(ot.ostatus <> -1 and
ot.ostatus <> -1) and
ot.UREFITEM = user1.UserSummary.UREFITEM),
Column2 = (
SELECT NVL(Sum(Total),0) FROM
user1.OrderTotal ot
WHERE (ot.ostatus <> -1 and ot.ostatus <> -1)
and ot.UREFITEM = user1.UserSummary.UREFITEM),
Column3 = (
SELECT min(sodate) FROM orders.salesorders so left join
user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1)
and so.UREFITEM = user1.UserSummary.UREFITEM),
Column4 = (
SELECT max(sodate) FROM orders.salesorders so
left join user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1) and
so.UREFITEM = user1.UserSummary.UREFITEM)
;
the above is the sample of my query

thanks in advance
;;)


Tom Kyte
October 04, 2003 - 9:38 am UTC

if you can come up with a single query that pulls all 4 columns together in a single result set -- we can optimize this. Left as an exercise for you -- see if you cannot merge the above 4 separate queries into a single query
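
For instance, Column1 and Column2 come from the same table with almost the same predicate, so they could already be collapsed into one correlated subquery using a multi-column SET (the remaining two columns can be folded in the same way); a sketch only, based on the query as posted:

update user1.UserSummary us
   set ( Column1, Column2 ) =
       ( select count( distinct case when ot.total > 0 then ot.WebOrderID end ),
                nvl( sum(ot.Total), 0 )
           from user1.OrderTotal ot
          where ot.ostatus <> -1
            and ot.UREFITEM = us.UREFITEM );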

great

b, October 17, 2003 - 3:33 pm UTC

thanks
i finally made my query simple

Great

Raj, October 21, 2003 - 5:28 pm UTC

Great technique never thought of it.

Anthony Reddy, October 22, 2003 - 1:27 am UTC


Updating 1 million records

Kailash, October 22, 2003 - 4:43 pm UTC

I have a situation where I have a unique key on 3 columns (c1,c2,c3) on a table T and I want to update c2 from A to B for some records that match certain criteria. There are a total of 30 million records in the table and there will be about 1 million records in the table that match the criteria to be updated. The table is not partitioned.
1. Your suggestion to drop the old table, re-create a new table and rename it to the old table may not be possible because I am updating only a subset of the table. Is there any other method to achieve this?
2. Which one of the following will be a better solution?
   a. Drop the index, update the rows and recreate the index
   b. Delete the rows with C2 = A and insert new rows with C2 = B. My DBA says that this will create fragmentation and is not a better approach. Is that true?
   c. Do you suggest any better solution under these scenarios, since I keep getting these types of questions from my manager often?

Your suggestion in this regard is highly appreciated.


Tom Kyte
October 22, 2003 - 6:28 pm UTC

have you considered just "update set c2 = b where ...."?

1,000,000 rows out of 30,000,000 isn't a lot. if c2 is indexed, you might benefit from a set unusable, skip unusable, update, rebuild.

but it won't take that long to update 1,000,000 rows. couple of minutes on my laptop


big_table@ORA920> update big_table set created = created+1;

1833792 rows updated.

Elapsed: 00:02:13.28


not a big deal.
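
A sketch of that "set unusable, skip unusable, update, rebuild" sequence (the index name and predicate are illustrative; note that a UNIQUE index enforcing the key cannot simply be skipped this way):

alter index t_c2_idx unusable;
alter session set skip_unusable_indexes = true;   -- let the update ignore the unusable index

update t
   set c2 = 'B'
 where c2 = 'A'
   and c1 = :some_criteria;

alter index t_c2_idx rebuild nologging parallel 4;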

How about rollback segment problem when updating million rows.

Ramasamy, October 22, 2003 - 10:49 pm UTC

Tom ,

Thanks lot for your service.

I have to update a single NUMBER(9) column where
the table has more than 80 columns.
This table has 30 million rows and I have to update
around 17 million rows each night.

As you always insist,
I tried a single update like

UPDATE t1 SET batchno = (A constant )
WHERE batchno is NULL;

1.)I came across the rollback segment issue.
2.) It is taking around 3 hrs to update.

Your input is greatly appreciated to solve the above issues.

Thanking You,
Ramasamy


Tom Kyte
October 23, 2003 - 8:03 am UTC

there is no problem -- other than you haven't sized your system to do the things you require it to do!!!!!!!!


1) the rollback segment "issue" is a "non issue". size things for what you do and there are NO ISSUES. disk is cheap -- compared to what they pay us all to have this discussion. consider that. the amount of time you've spent thinking about this already cost your employer much more than the disk it would take to do this correctly.

2) you are changing a null to a non-null -- increasing the width of the field. make sure you are not causing massive row migration (check your pctfree, make sure it is set appropriately here). is batchno "indexed"?

MILLION RECORD INSERT/SELECT/UPDATE TAKES TOO MUCH TIME

Santosh Das, October 23, 2003 - 3:27 am UTC

Hi Tom,
I have a table A, and soon after creating it 1 million records are put into it. Then, being in a real-time environment, its size keeps growing by at least half a million rows each day. However, a cleanup application cleans half a million every day. After inserting the 1 million at the creation of the table, it is put into the real-time environment. Whenever a new record is inserted, a continuously listening application detects it and applies a select operation on table A. Then it sends the selected items to some table B and then updates table A.
The listening applications may be 3 or 4 in number. However, the select and update operations are taking a lot of time, and the initial insertion of 1 million records is taking 11 hours to complete. So what can be a possible architecture?
I tried sending the rowid to a queue/pipe after inserting into table A and then used it to select and update. But it is still taking a lot of time. Please suggest some method.
Thanks and regards,
Sandy..

Tom Kyte
October 23, 2003 - 12:42 pm UTC

sounds like you should just be using AQ -- plop the record in as a message -- have as many dequeuers as you like.

not sure what else to say, not a ton of detail here.

Select and update of a table Having Millions of Records.

Jack, October 23, 2003 - 8:05 am UTC

Hi Tom,
I have the following queries which execute on a table of a million records. That table has a PK but it's not used in either of the queries given below. CMTIME, SAD and STATUS are all NUMBER fields. If I make an index on cmtime then it takes longer. Presently I don't have an indexed field. Can you please tell me how I can optimize these, or where the performance bottleneck is?

update SD set CMTIME = 1064929834 where (CMTIME is NULL or 1064929834 - CMTIME >= 10) and (1064929834 - SAD >= 0) and 1064929834 not in (select CMTIME from SD where CMTIME is not NULL)

SELECT * from SD where (CMTIME is not NULL and (CMTIME = 1064929834 OR 1064929834 - CMTIME >= 10)) AND (ED - 1064929834 > 0 and (STATUS = 1 OR STATUS = 0) and 1064929834 - SAD >= 0)

Thanks and regards,
Jack.

Tom Kyte
October 23, 2003 - 12:50 pm UTC

are you using the CBO.



Followup to your question

Ramasamy, October 23, 2003 - 2:17 pm UTC

2) you are changing a null to a non-null -- increasing the width of the field.
make sure you are not causing massive row migration (check your pctfree, make
sure it is set appropriately here). is batchno "indexed"?

Yes Tom. The Batchno is indexed like
index on (BATCHno,STATUS);

Will it be worth dropping the index and recreating it?

Thanks,
Ramasamy

Tom Kyte
October 23, 2003 - 7:16 pm UTC

could definitely help, yes.

George Lee, October 23, 2003 - 9:12 pm UTC

I tried to update 30 million records using Tom's solution.
It works fine, taking approximately 1.5 hours to finish.

George (Broadway Photo Supply Ltd. HK)

what's CBO?

Jack, October 24, 2003 - 4:29 am UTC


Tom Kyte
October 24, 2003 - 9:37 am UTC

cost based optimizer

No

jack, October 27, 2003 - 1:37 am UTC

We are not using any cost based optimizers.
It's a simple query which we are using to retrieve data from a table having millions of records, and its performance is not satisfactory.
How do we optimize it so that it runs faster?
jack

Tom Kyte
October 27, 2003 - 7:42 am UTC



you are kidding right?

this goes right up there with "my car won't start, why not?"

there is so much missing from this "request" as to make it impossible to say anything sensible.



George Lee, November 11, 2003 - 11:21 pm UTC

I have a fact table being partitioned by month. The indexes are built local to partition.

The data population is working fine.

But the rebuild index fails at the end.

I don't know how to rebuild indexes in the partitions.
Can you help me ?

Tom Kyte
November 12, 2003 - 10:19 am UTC

why do you need to rebuild here at all??

but why don't you show us what you are doing -- at the very least -- define what "fails at the end" means (error codes, messages, details)

George Lee, November 12, 2003 - 8:17 pm UTC

Dear Tom,

Yes, I should provide enough information to you. Sorry for that.

Here is the story,

I create a table as following,

drop table D_DYNRPT_SALES_FACT;

create table D_DYNRPT_SALES_FACT
(
TX_DATE DATE,
SHOP_NO VARCHAR2(20),
ITEM_NO VARCHAR2(12),
PARENT_ITEM_NO VARCHAR2(12),
BRAND_NO VARCHAR2(5),
VENDOR_NO VARCHAR2(20),
ITEM_GROUP VARCHAR2(5),
ITEM_TYPE VARCHAR2(5),
CONSIGNMENT VARCHAR2(1),
ITEM_DESC VARCHAR2(40),
BRAND_DESC VARCHAR2(30),
VENDOR_DESC VARCHAR2(30),
CATEGORY_DESC VARCHAR2(30),
QTY_SOLD NUMBER(14,2),
NET_SALES_AMT NUMBER(14,2),
GROSS_PROFIT_AMT NUMBER(14,2)
)
PARTITION BY RANGE (TX_DATE)
( PARTITION D_DYNRPT_SALES_FACT_2000_09 VALUES LESS THAN (TO_DATE('01/10/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_09,
PARTITION D_DYNRPT_SALES_FACT_2000_10 VALUES LESS THAN (TO_DATE('01/11/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_10,
PARTITION D_DYNRPT_SALES_FACT_2000_11 VALUES LESS THAN (TO_DATE('01/12/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_11,
PARTITION D_DYNRPT_SALES_FACT_2000_12 VALUES LESS THAN (TO_DATE('01/01/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_12,
PARTITION D_DYNRPT_SALES_FACT_2001_01 VALUES LESS THAN (TO_DATE('01/02/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_01,
PARTITION D_DYNRPT_SALES_FACT_2001_02 VALUES LESS THAN (TO_DATE('01/03/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_02,
PARTITION D_DYNRPT_SALES_FACT_2001_03 VALUES LESS THAN (TO_DATE('01/04/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_03,
PARTITION D_DYNRPT_SALES_FACT_2001_04 VALUES LESS THAN (TO_DATE('01/05/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_04,
PARTITION D_DYNRPT_SALES_FACT_2001_05 VALUES LESS THAN (TO_DATE('01/06/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_05,
PARTITION D_DYNRPT_SALES_FACT_2001_06 VALUES LESS THAN (TO_DATE('01/07/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_06,
PARTITION D_DYNRPT_SALES_FACT_2001_07 VALUES LESS THAN (TO_DATE('01/08/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_07,
PARTITION D_DYNRPT_SALES_FACT_2001_08 VALUES LESS THAN (TO_DATE('01/09/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_08,
PARTITION D_DYNRPT_SALES_FACT_2001_09 VALUES LESS THAN (TO_DATE('01/10/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_09,
PARTITION D_DYNRPT_SALES_FACT_2001_10 VALUES LESS THAN (TO_DATE('01/11/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_10,
PARTITION D_DYNRPT_SALES_FACT_2001_11 VALUES LESS THAN (TO_DATE('01/12/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_11,
PARTITION D_DYNRPT_SALES_FACT_2001_12 VALUES LESS THAN (TO_DATE('01/01/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_12,
PARTITION D_DYNRPT_SALES_FACT_2002_01 VALUES LESS THAN (TO_DATE('01/02/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_01,
PARTITION D_DYNRPT_SALES_FACT_2002_02 VALUES LESS THAN (TO_DATE('01/03/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_02,
PARTITION D_DYNRPT_SALES_FACT_2002_03 VALUES LESS THAN (TO_DATE('01/04/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_03,
PARTITION D_DYNRPT_SALES_FACT_2002_04 VALUES LESS THAN (TO_DATE('01/05/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_04,
.
.
.
PARTITION D_DYNRPT_SALES_FACT_2004_12 VALUES LESS THAN (TO_DATE('01/01/2005','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2004_12,
PARTITION D_DYNRPT_SALES_FACT_MAXVALUE VALUES LESS THAN (maxvalue)
TABLESPACE D_DYNRPT_SALES_FACT_MAXVALUE
)
/

CREATE INDEX D_DYNRPT_SALES_FACT_I01 ON D_DYNRPT_SALES_FACT
(TX_DATE) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I02 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_GROUP, ITEM_TYPE) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I03 ON D_DYNRPT_SALES_FACT
(TX_DATE, SHOP_NO, ITEM_NO, ITEM_GROUP) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I04 ON D_DYNRPT_SALES_FACT
(TX_DATE, BRAND_NO, ITEM_GROUP, ITEM_TYPE) LOCAL;

CREATE INDEX D_DYNRPT_SALES_FACT_I05 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_NO) LOCAL;

CREATE OR REPLACE PUBLIC SYNONYM D_DYNRPT_SALES_FACT FOR BPSADM.D_DYNRPT_SALES_FACT;

then, populate data into the table as the following,

alter index D_DYNRPT_SALES_FACT_I01 unusable;
alter index D_DYNRPT_SALES_FACT_I02 unusable;
alter index D_DYNRPT_SALES_FACT_I03 unusable;
alter index D_DYNRPT_SALES_FACT_I04 unusable;
alter index D_DYNRPT_SALES_FACT_I05 unusable;

alter session set skip_unusable_indexes=true;

@pop_d_dynrpt_sales_fact.sql '2000/09/01' '2000/01/30';
.
.
.
@pop_d_dynrpt_sales_fact.sql '2003/11/01' '2003/11/30';

alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging;

The data population works fine, but the alter index ... rebuild nologging statements fail with the following errors:

alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole



alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole



alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole



alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole



alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole

I checked the reference; it seems to be a syntax problem with my rebuild statement -- it should be something like ALTER INDEX D_DYNRPT_SALES_FACT_I05 REBUILD PARTITION ... -- but I don't know the exact statement. Can you help me? Thanks.




Tom Kyte
November 13, 2003 - 6:54 am UTC

ops$tkyte@ORA920LAP> CREATE TABLE t
  2  (
  3    data       char(255),
  4    temp_date  date
  5  )
  6  PARTITION BY RANGE (temp_date) (
  7    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
  8    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
  9    PARTITION part3 VALUES LESS THAN (to_date('15-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part4 VALUES LESS THAN (to_date('16-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION part5 VALUES LESS THAN (to_date('17-mar-2003','dd-mon-yyyy')) ,
 12    PARTITION part6 VALUES LESS THAN (to_date('18-mar-2003','dd-mon-yyyy')) ,
 13    PARTITION junk VALUES LESS THAN (MAXVALUE)
 14  )
 15  ;

Table created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create index t_idx1 on t(temp_date) LOCAL nologging;

Index created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter index t_idx1 unusable;

Index altered.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> begin
  2          for x in ( select 'alter index ' || index_name ||
  3                            ' rebuild partition ' || partition_name stmt
  4                   from user_ind_partitions
  5                                  where index_name = 'T_IDX1' )
  6          loop
  7                  dbms_output.put_line( x.stmt );
  8                  execute immediate x.stmt;
  9          end loop;
 10  end;
 11  /
alter index T_IDX1 rebuild partition PART1
alter index T_IDX1 rebuild partition PART2
alter index T_IDX1 rebuild partition PART3
alter index T_IDX1 rebuild partition PART4
alter index T_IDX1 rebuild partition PART5
alter index T_IDX1 rebuild partition PART6
alter index T_IDX1 rebuild partition JUNK

PL/SQL procedure successfully completed.

 

A reader, December 11, 2003 - 4:56 pm UTC

Hi Tom,
I have been running the update statement below for the past 24 hours and it's still running.

Table S_REVN has 1.5 million records and table S_OPTY_PROD has 1.2 million. It's doing a full table scan of both tables since I am using a function in the where clause. We are using the RULE-based optimizer and Oracle 8.1.7.4.

Here is sql statement

Update siebel.S_REVN r
set x_opty_prod_city=(select X_CITY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_opty_prod_STATE =(select X_STATE from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_PR_YEAR_QTY =(select X_PR_YEAR_QTY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)


0 UPDATE STATEMENT Optimizer=RULE
1 0 UPDATE OF 'S_REVN'
2 1 TABLE ACCESS (FULL) OF 'S_REVN'
3 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
4 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
5 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'

Please advice
Thanks



Tom Kyte
December 11, 2003 - 6:27 pm UTC

you understand that it is doing 3 full scans of S_OPTY_PROD for EACH and EVERY row in S_REVN, yes???


how about

update t
set (a,b,c) = (select a,b,c from t2 where ... )

at the very least (reduce the full scans from 3 to 1)


at the very most, consider creating an index:

create index temp_index_to_make_this_complete_before_my_grandchildren_have_grandchildren
on p(replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$'))

and using a hint (to let sighbul use the cbo) to get the index used.

A reader, December 12, 2003 - 12:44 pm UTC

Hi Tom,

I am following your advice on update I've changed sql statement


Update /*+ INDEX(S_OPTY_PROD T_TEMP) */ siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)

created function based index on S_OPTY_PROD

create index T_TEMP
on S_OPTY_PROD(replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$'))

BUT the above sql statement is not using the index even though I supply a hint (is it because a function-based index is not recognized by the rule-based optimizer?)

Please help
Thanks

Tom Kyte
December 13, 2003 - 11:11 am UTC

you put the hint in the wrong place :)

and you might not have met all of the requirements for FBI's.  read:

http://asktom.oracle.com/~tkyte/article1/index.html

but it is not the RBO (the well formed hint -> cbo)....


your hint is wrong (must use the correlation name of P) and in the wrong place (should be in the subquery).

you need query_rewrite_enabled and (depending on version) might need query_rewrite_integrity (see the above link).

but here is an example:
ops$tkyte@ORA920> create table t1( x_opty_prod_city int, x_opty_prod_state int, x_pr_year_qty int, row_id varchar2(25) );
 
Table created.
 
ops$tkyte@ORA920> create table t2( x_city int, x_state int, x_pr_year_qty int, row_id varchar2(25) );
 
Table created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> create index t2_idx on t2( replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$') );
 
Index created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> delete from plan_table;
 
4 rows deleted.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> ALTER SESSION SET QUERY_REWRITE_ENABLED=true;
 
Session altered.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> explain plan for
  2  Update t1 r
  3  set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY)     =
  4      (select /*+ INDEX( t2 t2_idx ) */ X_CITY,X_STATE,X_PR_YEAR_QTY
  5             from t2 p
  6        where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
  7  /
 
Explained.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> prompt @?/rdbms/admin/utlxpls
@?/rdbms/admin/utlxpls
ops$tkyte@ORA920> set echo off
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------
| Id  | Operation                    |  Name       | Rows  | Bytes |Cost(%CPU)|
-------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |             |    82 |  4346 |   3  (34)|
|   1 |  UPDATE                      | T1          |       |       |          |
|   2 |   TABLE ACCESS FULL          | T1          |    82 |  4346 |   3  (34)|
|   3 |   TABLE ACCESS BY INDEX ROWID| T2          |     1 |    53 |   2  (50)|
|*  4 |    INDEX RANGE SCAN          | T2_IDX      |     1 |       |   2  (50)|
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access(REPLACE(REPLACE(REPLACE("P"."ROW_ID",'ce','='),'-','='),'@','$')=:B
              1)
 
16 rows selected.
 

A reader, December 12, 2003 - 9:04 pm UTC

Hello Tom, I load a table as follows using the parallel hint, but the cost is different:

sql>insert /*+ append parallel(demo01,4) */ into demo01 select /*+ parallel(customer,4) */ * from customer;
the cost --42s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
ERROR:
ORA-12838: cannot read/modify an object after modifying it in parallel
SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
1698 recursive calls
455 db block gets
8728 consistent gets
8075 physical reads
66492440 redo size
630 bytes sent via SQL*Net to client
525 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
390126 rows processed

sql>insert into demo02 select * from customer;
cost --26s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=1226 Card=659568 Byt
es=680674176)

1 0 PARTITION RANGE (ALL)
2 1 TABLE ACCESS (FULL) OF 'CUSTOMER' (Cost=1226 Card=659568
Bytes=680674176)





Statistics
----------------------------------------------------------
1939 recursive calls
44876 db block gets
20918 consistent gets
1 physical reads
64954576 redo size
646 bytes sent via SQL*Net to client
467 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
390126 rows processed

The customer table has 0.4 million records, and the machine is a 2-CPU Sun E420 running the Oracle database.

Can you tell me why the physical reads increase so much when using the parallel hint in the DML SQL? Thanks.

Tom Kyte
December 13, 2003 - 11:49 am UTC

well, for a tiny number of records like this, i would not even consider PQ at this point.


But -- to answer your questions

the cost, well the cost is affected by thousands of things. HINTS definitely affect the cost (That is in part how hints work -- by playing with costs). Ignore the cost of two queries that are not the same, they are NOT comparable.


the PIO's -- well, you ran one before the other? That, and parallel query prefers to checkpoint -- and do direct IO many times (you might find PQ doing a global checkpoint before the query begins, to get current images onto disk, in order to just slam the data as fast as it can without messing with the buffer cache)


but for so few records -- it would seem that regular sql is what you want.

A reader, December 13, 2003 - 4:37 pm UTC

Hi Tom,
Followed your advice... WORKS LIKE A CHARM... updated 1.5 million records in 10 minutes. EXCELLENT!!!!!!!

Thanks a LOTTTTTTTTTTT

Update siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select /*+ INDEX(S_OPTY_PROD T_TEMP) */ X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)


Online updation of very big table

Praveen, March 06, 2004 - 3:37 pm UTC

Hi Tom
I want to update a table in one go on an online system. The table has 20 lakh (2 million) records and 110 columns. When I issue the update command it takes about one hour. I don't know why it takes so much time even though an index is created on that particular field.

select count(*),app_flg from test;
app_flg has only two values, APPROVED or UNAPPROVED; the default is UNAPPROVED.
approved =170000
unapproved=30000
update test set app_flg='APPROVED' where app_flg='UNAPPROVED'
It took an hour to update the records, other online users' processing slowed down, and locks started to occur on the table.

Suggest the best solution
pkr

Tom Kyte
March 06, 2004 - 8:30 pm UTC

It would be the INDEX that slowed it down.

You want to FULL SCAN this table.
You want there to be NO index on the column you are updating.
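For instance (index name made up), the pattern would be: drop the index, do the one big update with a full scan, then put the index back only if it is really needed:

drop index test_app_flg_idx;

update test
   set app_flg = 'APPROVED'
 where app_flg = 'UNAPPROVED';

commit;

create index test_app_flg_idx on test (app_flg) nologging;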


it gives error

praveen, March 07, 2004 - 5:16 am UTC

on dropping the index it gives error ora-00054

thanks

Tom Kyte
March 07, 2004 - 9:26 am UTC

that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it.


thanks

jasdeep,praveen, March 07, 2004 - 11:16 am UTC

I have solved that problem: a user who was no longer logged on had locked rows on the table, so I killed that session and the index was dropped immediately.
The updates were then as fast as you can imagine.

thanks


Update taking hrs of time

Sachin, March 09, 2004 - 7:44 am UTC

Hi Tom,
I have a question:
I have two tables: table1 (around 10,000 records max) and table2 (the GL code combination master table, with around 600,000 records). I need to update three fields in table1 (a temp processing table) with a unique value from table2. T1 has the ccid fields that need to be updated, and s1-s4 fields corresponding to the segment1-segment4 fields of table2.
The query looks like this:

UPDATE table1 t1
SET t1.ccid1 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'XXXXX'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.chart_of_accounts_id
and gb.set_of_books_id = 1),
t1.ccid2 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'zzzzzz'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1),
t1.ccid3 = (SELECT t2.ccid
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'wwwww'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1

But this is taking more than 6 hours to complete. When I check the table locks, the table remains locked in row-exclusive mode. I am committing immediately after the update statement in the procedure. Could you please tell me why this is happening?

Tom Kyte
March 09, 2004 - 12:26 pm UTC

not that i don't believe you but -- about the locking issue -- I'm not sure i believe you.

when you commit -- locks are released. You'd have to show me the info from v$lock showing that t1 is still "locked" and locked by your session.


you could

set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )

instead of running a query per column.

if you run the query


select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = some_value_value
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = some_value_value
and t2.segment1 = some_value_value
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1


how long does that take?

Updating millions of rows

A reader, March 15, 2004 - 1:46 pm UTC

Tom,

In this discussion you mention that you would not do an UPDATE/DELETE of millions of rows, but rather would create a new table (in parallel/nologging) with the rows that need to be retained, then create indexes in parallel, drop the old table and rename the new table to the old name. If so, my understanding is:

1. A DELETE/UPDATE of a table with more than 1 million rows would also have to maintain the index structures and will generate considerable undo.
2. Another consideration with respect to update/delete is that blocks will be moved on and off the freelists. Please confirm whether this is true and, if so, explain what happens to the freelists during an update/delete of a large number of rows.
3. Another consideration is that it will cause a lot of table/segment fragmentation and many empty blocks below the HWM, so all full table scans on the table will read a lot of empty blocks.

Is my understanding correct and if I have missed any other things as to why DELETE/UPDATE of large tables is not recommended, please let me know.

4. If my table that needs to be updated/deleted is the primary/parent table, with dependent child tables/constraints and triggers, is this approach correct?
Disable all the triggers/constraints
Drop the table
Rename the new table to the old table
Reenable all constraints and triggers
What to do in such cases?

Tom Kyte
March 15, 2004 - 3:03 pm UTC

I "might", "probably" if it was most of the records...


1) correct
2) correct, deletes would put them on, updates might put them on, might take them off.

3) it could, yes.


the delete/update can take lots longer than CTAS + reindex if you are doing MOST all of the records in a LARGE table with lots of indexes.

4) triggers are not relevant here....

but you would have to drop the fkeys
do the CTAS+drop+rename
add the constraints back with enable/rely (don't need validation)
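For example (table and constraint names made up), the foreign keys can go back on without re-checking every row:

alter table child_table
  add constraint child_parent_fk
  foreign key (parent_id) references parent_table (parent_id)
  rely enable novalidate;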



Deletes/Updates

A reader, March 15, 2004 - 5:44 pm UTC

Tom,

In the above discussion you mention :
1. The deletes will put them on because the used space in the block will drop below PCTUSED. Is that correct?
2. The update will put them on if the updated column makes the used space in the block fall below PCTUSED, or makes the free space available in the block less than PCTFREE. Is that correct?

If both of the above understanding is not correct, please explain. Also, please let me know how we could track the movement of the blocks on and off the freelist. Any SQL scripts/tools available that I could use?

Thanks

Tom Kyte
March 15, 2004 - 6:24 pm UTC

1) the deletes could, yes.

2) yes.


to track individual blocks -- not without dumping blocks (and I don't go there). dbms_space and other tools (search this site for show_space for the script I use) can show you how MANY things are on the freelist.

A small correction

A reader, March 15, 2004 - 6:30 pm UTC

Tom,

In the above scenario for question 2, there is a small correction marked in CAPITAL letters :
"The update will put them on if the updated column makes the used space in the block fall below PCTUSED AND WILL TAKE OFF of the freelist if the updated column makes the free space available in the block less than PCTFREE. "
Is it correct?

Thanks


Tom Kyte
March 15, 2004 - 6:33 pm UTC

I'll state it simply, forget pctfree/pctused/etc -- in english:

if the update reduces the row size, it can put it back onto the freelist.

if the update increases the row size, it can take it off the freelist.

Parallel DML

A reader, March 16, 2004 - 8:22 am UTC

1)On page 630 of your Oracle Expert One on One, you mention that parallel DML can be used only upon partitions -- in 8i. I'm using 9.2. Has this behavior changed in the later versions ?

2)I need to do a mass delete on a 12 GB table and it's taking to much time (far too much, really). How could I speed it up ?

Thanks

Tom Kyte
March 16, 2004 - 9:36 am UTC

1) yes, that is a 9ir2 new feature.

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96531/ch2_9ir2.htm#74432

2) did you read the original answer above?

enable PK with parallel clause

A reader, March 20, 2004 - 6:40 pm UTC

Hi

I am working in 8.1.7.4.

I want to delete 30 million rows of 60 million so I am doing this

1. create a copy of original table with good data
2. disable the original table's constraints: primary keys (and child FKs) and foreign keys
3. truncate the original table
4. make all original tables indexes unusable
5. insert append back from copy to original, I have to do this way because I am not allowed to modify constraint names
6. drop the copy
7. enable constraints. Here, when I enable the PK I cannot provide a parallel clause, right? I searched the docs but it seems I can only specify a USING INDEX tablespace clause -- or am I missing something?
8. rebuild all unusable indexes



Tom Kyte
March 21, 2004 - 9:50 am UTC

you can create an index in parallel (unique) and then enable/rely/novalidate the pk.
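A sketch of that for step 7, with made-up names:

-- build the unique index yourself, in parallel and with minimal redo
create unique index big_tab_pk on big_tab (acc_id) parallel 8 nologging;

-- then enable the existing primary key without a full serial re-check;
-- it will use the unique index that is already there
alter table big_tab enable novalidate constraint big_tab_pk;
alter table big_tab modify constraint big_tab_pk rely;

-- optionally reset the index's default degree afterwards
alter index big_tab_pk noparallel;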

How to Update millions or records in a table

A reader, March 25, 2004 - 11:55 am UTC

Hi Tom,
I read your response to Murali's question above and believe there will be a downtime window for the application. This is because, if I want to keep the index names the same as before, I will have to create the new table, drop the old table, rename the new table to the old table name, and then create the required indexes on it.

I am wondering whether we could instead create the indexes under different names on the new table, and rename them after dropping the old table.

As always your valuable advice helps a lot!

Tom Kyte
March 25, 2004 - 9:13 pm UTC

you can rename the indexes.


alter index supports this.
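For example (names made up):

alter index new_table_idx1 rename to old_table_idx1;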

how to make this restartble?

A reader, March 25, 2004 - 6:14 pm UTC

Hi

I want to delete 80% of the rows from a couple of 50-million-row tables. The process is divided into 10 steps:

1. Create TEMP table, CTAS from original table
2. Disable constraints
3. Truncate original table
4. Set indexes to UNUSABLE
5. Insert into original table from TEMP table
6. Drop TEMP table
7. Create PK and UK
8. Enable PK and UK
9. Enable FKs
10. Rebuild indexes

I want to make this process restartable, i.e. if it fails in step 3 and I later rerun the procedure, it will start from step 3 again. How can we achieve this? I have been thinking of using a table which stores the table name and process status, something like the following, and then updating the status as the process goes along:

TABLE: PROCESS_STATUS
TABLE_NAME STATUS
----------- ---------------
EMP 2
DEPT 0

PROCESS_STATUS is the step; 0 means I need to start from the beginning, so before each step I would query this table to see which step should be run. For example, if the process failed at step 3 then STATUS would be 2.



select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 0
then
run process 1
end if;

select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 1
then
run process 2
end if;

select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 2
then
run process 3
end if;

select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 2
then
run process 3
end if;



and so on.

Any suggestions :-? What I see is that I will have quite a bit of redundant code...

Tom Kyte
March 25, 2004 - 9:35 pm UTC

you'd have to keep a state table and have your code query it, much like you are suggesting. That would do it, yes.

another "inserting" idea might be to

a) insert the steps to process
b) delete them as you complete them (and commit)
c) to restart, just pick up at the step you wanted

you could insert the procedures to be called and just:

for x in ( select name from procedures order by seq )
loop
execute immediate 'begin ' || x.name || '; end;';
delete from procedures....
commit;
end loop

(just a thought, not fully baked)
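Fleshing that thought out a little -- a rough sketch of the "insert the steps, delete them as you go" idea, assuming each step is wrapped in its own stored procedure (all names made up):

create table process_steps
( seq  number primary key,
  name varchar2(30) not null );

insert into process_steps values ( 1, 'copy_to_temp' );
insert into process_steps values ( 2, 'disable_constraints' );
insert into process_steps values ( 3, 'truncate_original' );
-- ... one row per remaining step ...
commit;

begin
    for x in ( select seq, name from process_steps order by seq )
    loop
        execute immediate 'begin ' || x.name || '; end;';
        delete from process_steps where seq = x.seq;
        commit;
    end loop;
end;
/

If the block fails at step N, the rows for steps N onwards are still in PROCESS_STEPS, so simply re-running the block resumes where it left off.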

provide more information about that procedure drop indexes and rebuild with nologging

mohan, April 07, 2004 - 2:39 am UTC

Hi Tom
Could you provide more information about that procedure and
how to drop indexes and rebuild with nologging .We are using informatica etl tool
before loading bulk data into target drop the indexes ( pre-session) and after load the data rebuilt the indexes with nologging (post-session ), it takes less amount of time because its generates less undo
Regards
Mohan



Tom Kyte
April 07, 2004 - 9:09 am UTC

it is just a drop and create? not sure what you are looking for -- if it is "syntax", we document that.

Problem with Update

Ram, April 08, 2004 - 6:23 am UTC

Hi Tom,
This update is not working properly.How to correct this?

SQL> declare
  2   x number;
  3   y number;
  4  begin
  5   execute immediate 'update e set sal = sal*1.1 where ename in        ('SMITH','KING') returning sal into :x,:y' using x,y;
  6   dbms_output.put_line(x || chr(9)||y);
  7* end;
                                                          *
ERROR at line 5:
ORA-06550: line 5, column 65: 
PLS-00103: Encountered the symbol "SMITH" when expecting one of the following: 
. ( * @ % & = - + ; < / > at in is mod not rem return 
returning <an exponent (**)> <> or != or ~= >= <= <> and or 
like between into using || bulk 

Please do reply.
Bye!

 

Tom Kyte
April 08, 2004 - 10:09 am UTC

..... ( ''SMITH'', ''KING'' ) .....

you need two quotes for a quote in a string (until 10g when there is another way)

Updating million records

Himnish Narang, April 08, 2004 - 7:34 am UTC

Hi Tom,

I was just going through the discussion on this subject. In it you describe creating a new table instead of updating millions of records, with the update performed as part of creating the new table.

Don't you think that the new table will also occupy the same amount of space, and that after dropping the old table we will have to reorg that tablespace?

please comment.

Thanx.



Tom Kyte
April 08, 2004 - 10:24 am UTC

reorgs are so overrated.


no, i would not reorg a tablespace or anything simply because I copied some data and dropped some old stuff. not a chance.




How to Update millions or records in a table

Ganesh, April 08, 2004 - 9:03 am UTC

Hi Tom,
This is very useful; I used it and saved a lot of time by creating a new table.
I have another issue similar to this. We are using 9i. There is a requirement to change a primary key's datatype from NUMBER to VARCHAR2, and it has a lot of dependents. Is there any option that avoids rebuilding the table, since the data already exists?
Can you please advise on this?
Thanks in Advance.
Ganesh

Tom Kyte
April 08, 2004 - 10:51 am UTC

that'll be a rebuild pretty much.

Please see this

Ram, April 08, 2004 - 11:38 pm UTC

Hi Tom,
Thanks for your reply, but it's still not working properly.
SQL> declare
  2   x number;
  3   y number;
  4  begin
  5   execute immediate 'update e set sal = sal*1.1 where ename in (''SMITH'',''ALLEN'')
  6                      RETURNING sal into :x,:y' using x,y;
  7     dbms_output.put_line(x ||chr(9) ||y);
  8  end;
  9  /
declare
*
ERROR at line 1:
ORA-00913: too many values 
ORA-06512: at line 5 


How can I correct this? You mention another way in Oracle 10g. How does that work? Could you please provide an example?
Please do reply.
Bye!


 

Tom Kyte
April 09, 2004 - 7:42 am UTC

you are returning SAL (one thing) into two things (:x,:y)

different problem altogether.

in 10g, the strings could be

q'|how's this for quoting|'


instead of
'how''s this for quoting'

a character string literal that starts with q for quote will use the next two characters as the start quote and those two characters in reverse for the end quote.

Thanks

Ram, April 09, 2004 - 9:39 am UTC

Hi Tom,
Thanks for your reply. I found a way of doing it, as follows. Do you have a better option?
SQL> create or replace type sal_t as table of number;
  2  /

Type created.

SQL> declare
  2  sals sal_t := sal_t();
  3  begin
  4   execute immediate 'update emp set sal = sal*1.1 where ename in (''SMITH'',''KING'')
  5                      returning sal into :sals' returning bulk collect into sals;
  6   for i in 1..sals.count loop
  7    dbms_output.put_line('Updated salary:'||sals(i));
  8   end loop;
  9  end;
 10  /
Updated salary:968                                                              
Updated salary:5500                                                             

PL/SQL procedure successfully completed.

Please do reply.
Bye!

 

Tom Kyte
April 09, 2004 - 10:17 am UTC

get rid of the dynamic sql, it isn't necessary (fixes your original issue with the quotes as well).  also, would seem to make sense to pull back other columns so you would be able to see what rows were actually updated:


ops$tkyte@ORA9IR2> declare
  2     type numArray is table of number;
  3     l_sals  numArray;
  4     l_empno numArray;
  5  begin
  6     update emp
  7        set sal = sal*1.1
  8      where ename in ('SMITH','KING')
  9  returning empno, sal bulk collect into l_empno, l_sals;
 10
 11      for i in 1..l_sals.count loop
 12            dbms_output.put_line
 13          ('Updated salary for empno ' || l_empno(i) ||
 14           ' to: '||l_sals(i));
 15       end loop;
 16  end;
 17  /
Updated salary for empno 7369 to: 880
Updated salary for empno 7839 to: 5500
 
PL/SQL procedure successfully completed.
 
 

To Mr. Ram

A reader, April 09, 2004 - 10:44 am UTC

Hi Ram and All,
May I request you all not to pester Tom with inane posts requesting Tom to debug and program on our behalf.
This forum is not to learn coding.

update based on rowid

john, April 16, 2004 - 2:02 am UTC

Tom,

we use non-intelligent primary key to update the table through a stored procedure.

i.e. when the table is queried, our select procedure returns the non-intelligent key along with the details like below

select emp_key, other details from emp;

when updating the data, emp_key is passed from the front end to the stored procedure and table is updated based on emp_key.

Is it a bad idea to use rowid instead of the non-intelligent key (emp_key)?

Tom Kyte
April 16, 2004 - 7:27 am UTC

non-intelligent? interesting choice of terminology.


But in any case -- once upon a time ago, rowids were IMMUTABLE. Once assigned a row would have a rowid assigned to it and that rowid would live with that row until you deleted it.

Starting in 8i with support for updates to partition keys that would cause a row to move from partition A to partition B -- that is no longer true (and then there are IOT's...).

In 10g, there are even more options for 'row movement' -- an online segment shrink for example.

So, rowids can change and are changing in more circumstances as time goes on.


Sooooo, if you have lost update detection in place using 'FOR UPDATE' -- rowids are very safe (forms uses them). What I mean is -- you

a) select a.*, rowid from T a where .... 'normally'
b) you decide to update a row, so you

select * from t where rowid = :that_rowid AND
((c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
((c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
((cN = :that_cN or (cN is null and :that_cN is null))
for update NOWAIT;

you lock it -- this prevents anything that could change a rowid from doing so and verifies the data hasn't been changed. If that returns 0 rows, someone changed the data (or reorg'ed it and moved it). You need to requery to get the current values before you let the user even think about modifying it. If that returns a row -- you got it. If that returns ora-54, something has it locked, you have to decide what you want to do about that.

c) you can then safely update that row by rowid



If you do not have lost update detection in place using FOR UPDATE locks -- then you should stick with a primary key probably, just to protect yourself in the future.

update based on rowid

john, April 19, 2004 - 12:42 am UTC

thanks a lot Tom.

to understand fully what you said, can you please tell

1. Why do we need the where condition below?

((c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
((c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
((cN = :that_cN or (cN is null and :that_cN is null))


2. Another question: is there a situation where the rowid of a row could be assigned to another row of the same table? (This could be more dangerous, as we might end up updating another row.)

thanks again.


Tom Kyte
April 19, 2004 - 6:37 am UTC

1) you want to make sure the row hasn't changed.

c1 = :that_c1 will only work if both the column and the bind are not null; hence the rest is needed to make nulls compare "the same"

2) this is tied into 1 again. sure, we could delete from emp where empno = 7788; commit; and then re-insert that row and get the same rowid (rare, but could happen).

But - if we inserted the same exact values and they got the same exact rowid -- then #1 would make this "safe". the values of the row are the same, for all intents and purposes it IS the same row.

If the newly inserted row doesn't match column by column the values we expect, well, then we'll reject it (won't lock it) and all will be well.

update large table

Prasad Chittori, April 22, 2004 - 6:04 pm UTC

I have a very large partitioned table with a DATE column, and I would like to remove the time portion from that column.
I did the following, and it is taking a lot of time and failing with 'unable to extend rollback segment'.
Any better ideas, without increasing the rollback segments or changing the query?

update t1
set dt = trunc(dt)

Tom Kyte
April 23, 2004 - 10:12 am UTC

parallel dml -- each pq slave can get its own RBS, letting you use them all at the same time, not just one of them.
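A minimal sketch of that (the degree of 8 is arbitrary; in 8i a parallel UPDATE also requires the table to be partitioned, which it is here):

alter session enable parallel dml;

update /*+ parallel(t1, 8) */ t1
   set dt = trunc(dt);

commit;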

convert delete to insert

marvin, April 26, 2004 - 11:30 am UTC

Hi

We want to delete from several tables with several million rows each. The delete statements are quite simple, but to make this faster we would insert the good data into a temp table, truncate the original, and insert the good data back. We want to convert the deletes into insert statements, but I think it's quite hard.

For example

We have these DELETE statements

DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
/
COMMIT
/
DELETE
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1 FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL)
/
COMMIT

to convert to that we did

INSERT /*+ APPEND */ INTO OM_ANNEX$TMP
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM LISTA_ORDENES l,
OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
UNION
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL);

This seems incorrect: if I run the queries inside the insert separately I get correct results, but if we use UNION we get data we don't want, and if we do it in two inserts we get duplicate rows.

One approach is this

create a temp table using the delete statements but converted to querys for example instead of

DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)

we convert to

select t.*
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)

then simply says

select * from om_annex a
where not exists (select null from temp_table b
where a.annexid = b.annexid)

annexid is PK

the other approach is this

select * from om_annex a
where annexid not in (select annexid
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
union
select annexid
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL))

They are all very slow because the tables involved have several million rows each!

Is there a better approach?

Tom Kyte
April 26, 2004 - 2:20 pm UTC

why not:

create table new as select rows to keep;
drop table old;
rename new to old;

do the first in parallel, with nologging.
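Something along these lines, where the keep-predicate and the degree are only placeholders for your own:

create table big_tab_new
  nologging
  parallel 8
as
select /*+ parallel(t, 8) */ t.*
  from big_tab t
 where t.status = 'KEEP';    -- whatever identifies the rows to retain

drop table big_tab;
rename big_tab_new to big_tab;
-- then recreate indexes, grants and constraints on the renamed table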


err the problem is convert the DELETE to CTAS

A reader, April 26, 2004 - 3:03 pm UTC

Hi

create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.

That is exactly what I want to do. The problem is that until now we have always done it the other way round, with plain DELETEs (and it takes a week to delete everything!), and I am not sure how to convert the DELETEs to a CTAS. If I want to do the reverse of the DELETE statements (some tables have 5 DELETE statements!), is it as simple as writing the DELETE the other way round? For example, how would you change

delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Would you change it to

insert into tmp_x
select *
from tab1
where not exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)


insert into tmp_x
select *
from tab1
where not exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Is it as simple as this?



Tom Kyte
April 27, 2004 - 4:02 am UTC

if i had this:

delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)



I would probably have this:

create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/

outer join the three tables. Negate the conditions for the where exists.

that is, after outer joining tab1 to tab2, tab3 -- remove the rows

where tab2.fid is not null and tab3.id is not null -- that is subquery one in your deletes above.

where tab2.fid is not null and tab2.id is null -- that is subquery two in your deletes above.



thank you very much for the outer join tip

marvin, April 27, 2004 - 6:16 am UTC

Hi

I am going to have a look how to apply the outer join in order to convert DELETE to CTAS.

I have a further question, I have another table which undergoes 4 DELETEs


DELETE
FROM SW_PERSON t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid AND t.swtype='ORDER')
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER' AND t.swobjectid IS NULL AND COMP_INST_ID IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS
(SELECT 1
FROM OM_COMPANY_INST c, LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id)
/
COMMIT
/


I need to convert this to a CTAS as well; however, I am not sure if this can be done in a single statement. These DELETEs, for example, can't be converted into one as follows, right? (Because of the commits between them.)

DELETE
FROM SW_PERSON t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid
AND t.swtype='ORDER')
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND COMP_INST_ID IS NULL)
OR t.swtype IS NULL
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS (SELECT 1
FROM OM_COMPANY_INST c,
LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id));

Can this use the outer join tip as well?

TIA

Tom Kyte
April 28, 2004 - 12:17 pm UTC

why have any commits in between?

but of course -- any four deletes against a single table can (and if you ask me, should) be done as a single delete.


the outer join was used in the CTAS, not in a delete.


why do you use outer join

A reader, April 27, 2004 - 6:57 am UTC

hi

Why is outer join needed for tab1, tab2 and tab3 :-?

create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/


Tom Kyte
April 28, 2004 - 12:20 pm UTC



because we wanted to keep all rows in tab1 -- if there is NO mate in tab2/tab3

since they deleted "where exists in tab2/tab3":

delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

Since they deleted rows with mates in tab2/tab3, with the CTAS we need to ALWAYS keep the rows that are NOT in tab2/tab3; the outer join is mandatory for that in this example.

to marvin

A reader, April 27, 2004 - 9:33 am UTC

hi marvin, you can try this


select * from tab1
where PK not in (select PK
from tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
union
select PK
from tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null))


regarding the conditions

A reader, April 27, 2004 - 11:09 am UTC

Hi Tom

Can you shed some light on why

NOT ( tab2.fid is not null and tab3.id is not null )
is same as
exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)

and
NOT ( tab2.fid is not null and tab2.id is null )
is same as
exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)

I can't see why. Thank you.

Tom Kyte
April 28, 2004 - 12:50 pm UTC

it isn't the same.

it is in fact the opposite.


if you outer join T1 to T2 to T3

and before you were looking for (where exists)

a) a match in T2 (tab1.id = tab2.fid)
b) a match in T3 for that T2 (tabe2.id = tab3.id)

then you are saying "if I outer join T1 to T2 to T3, that row would be such that:

a) tab2.fid is NOT NULL (we found a mate)
b) tab3.id is NOT NULL (we found a mate in t3 for t2)

with the where exists -- we would have deleted that row, hence with the CTAS (which is finding rows to keep) we simply NEGATE that (with NOT).

Therefore we would keep that row, IF that was "not" satisfied.


Same logic for the second part.

the second where exists says

delete the row if

a) there is a match in T2 (where tab2.fid = tab1.id)
b) the id column in t2 for that match is NULL

that would be (in an outer join)

(tab2.fid is not null) -- we joined to a row
(tab2.id is null) -- and that row is having id is null


negate it and keep it.

Updating Table having millions of records taking lot of Time..

Anand Pandey, April 28, 2004 - 3:45 am UTC

Hi Tom,
I have a table with millions of records in which two of the columns are null.
I tried to update those null columns with data from another table, and it is taking around 10-12 hours for a single day's records -- and I have to update 31 days' worth.
Please help me get better performance on this update.

Query used is:

UPDATE /*+nologging parallel(4) */ MASTER_tab A SET (null_col1,null_col2)=(SELECT /*+PARALLEL(4) */ MIN(C1),MIN(C2)
FROM Table2 B WHERE SUBSTR (A.col1_T1, 1, LENGTH (B.C1)) = B.C1
AND SUBSTR(A.col2_T1,1,3)=B.C2)
WHERE c_date='01-Jan-04'

Thanks and Regards,
Anand Pandey



Nologging - how does it impact recovery?

Naresh, April 30, 2004 - 11:12 am UTC

Hi Tom,

This is a great chain of discussion. I especially liked the "outer join to replace the not exists". I am really looking forward to my copy of your first book, which I ordered recently (on its way from Amazon).

One question regarding making the table nologging: Does it not have implications for recovery? What am I missing?

Tom Kyte
April 30, 2004 - 5:16 pm UTC

you need to schedule a hot backup if you use non-logged operations, yes.
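For example, right after the nologging operation you could refresh the backup of the affected tablespace (tablespace name made up) so the unlogged blocks become recoverable again:

alter tablespace big_data_ts begin backup;
-- copy that tablespace's datafiles at the OS level
alter tablespace big_data_ts end backup;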

db sequential waits on UPDATE

A reader, May 14, 2004 - 9:55 am UTC

Dear Tom,

Due to some processing complexities I am unable to use a single SQL for Update/Insert.

I use:
LOOP
1. Bulk select (500 rows at a time) from Table A, C with rowids from C
2. Bulk insert
3. Bulk update table C
END LOOP

I am getting a very high number of "db file sequential read" waits on the update part. Please see relevant portions of Trace File:
********************************************************************************
SELECT /*+ FULL(v) FULL(c) PARALLEL(v,4) PARALLEL(c,4) HASH(v,c) */
c.rowid,
...
FROM vp v,citi c
WHERE v.idno= c.idno

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.07 10 234 0 0
Execute 1 0.32 3.38 3023 156 3 0
Fetch 155 1.96 82.82 0 0 0 77500
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 157 2.32 86.28 3033 390 3 77500

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 4 0.01 0.03
rdbms ipc reply 26 0.00 0.00
enqueue 2 0.02 0.02
PX Deq: Join ACK 7 0.00 0.00
PX Deq: Parse Reply 4 0.23 0.35
PX Deq: Execute Reply 717 2.06 735.45
PX qref latch 4 1.02 2.75
PX Deq: Table Q Normal 5218 0.35 23.95
********************************************************************************

INSERT /*+ APPEND PARALLEL(A,4) */ INTO t_v A
VALUES(:b1,
:b2)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 2 11 0 0
Execute 155 0.32 8.00 552 1116 5567 77500
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 156 0.32 8.02 554 1127 5567 77500

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 552 0.36 6.43
free buffer waits 6 1.02 1.23
********************************************************************************

UPDATE /* PARALLEL(c,4) */ citi c
SET ...
WHERE c.rowid = :b6

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 154 35.26 586.87 75251 77036 79595 77000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 155 35.26 586.87 75251 77037 79595 77000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 75578 1.04 461.09
free buffer waits 242 1.03 96.11
********************************************************************************

Is it not true that if the row has already been selected (and is there in the buffer cache) no disk IO should happen to update this row?

Please tell me a way to make this faster.

Regards

Tom Kyte
May 15, 2004 - 10:33 am UTC

insert /*+ append parallel... */ VALUES hmmmmmm

append = no go with values, only works with "as select"
parallel -- parallel on what exactly?


update /* parallel(c,4) */ hmmmmmm

in search of the fast = true switch. thankfully that hint is malformed and hence ignored as well -- you are updating by rowid.

parallel is *not* fast=true.
append is not either really, especially with the values clause.



you can review the trace file itself (p1,p2,p3 will tell you file and block info, along with blocks read). you can use that to verify that it is the very act of reading the indexes that need to be updated and maintained that is causing this. If they are not in the cache, well, we'll need to read them into there.

More Info

A reader, May 16, 2004 - 11:41 pm UTC

Thanks Tom,
The hint in the update was there by a Developer, it has been rightly disabled. The insert is not a bottleneck so didn't look into it.
This is a development machine, there are no other jobs running, asynchronous IO is enabled, the machine is on RAID 0 (no fault tolerance - being a development one). There are NO INDEXES on Tables being inserted and updated.

The segment on which the WAITS (db sequential read) are happening are that of the TABLE that is being UPDATED.

Please guide me next.

Tom Kyte
May 17, 2004 - 7:29 am UTC

then you are seeing physical IO performed to read into the buffer cache the data needed to be updated. If it is not cached, we'll have to read it.


PARALLEL DML

Sar, May 28, 2004 - 2:50 pm UTC

Tom,

I need to update a table that has 50 million rows but
the number of rows affected are only 1 million rows.

I have a single update statement to do this.

UPDATE acc_proc
SET acc_seq_num = 0
WHERE acc_seq_num > 0
/

The column acc_seq_num is not part of an index.

Can you please suggest anything better I can do to make this SQL run faster?

Thanks
Sar


Tom Kyte
May 28, 2004 - 2:56 pm UTC

nope, that is perfect.


shouldn't take very long at all. unless you are getting blocked constantly by other sessions.

do i need to commit after execute immediate ? for dml or ddl ?

A reader, June 02, 2004 - 11:31 am UTC

http://download-east.oracle.com/docs/cd/A87861_01/NT817EE/index.htm


Do I need to commit after EXECUTE IMMEDIATE, for DML or DDL? Or does it happen automatically?

I tried to find the answer in the documentation at the above site but could not. Please help.

Tom Kyte
June 02, 2004 - 11:35 am UTC

DDL is processed, conceptually, as:


begin
commit; -- commit anything you started, even if DDL fails, you are committed
do the ddl;
commit; -- commit the ddl
exception
when others
then
rollback; -- the ddl
raise;
end;
/


You need not commit DDL, it'll commit 2 times for you.

You need to commit DML as DML does not do that.


This is regardless of the method used to execute SQL, native dynamic sql, dbms_sql, static sql, whatever -- the rules are the same in all cases.
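A tiny illustration (table and column names invented):

begin
    execute immediate 'create table t_backup as select * from t';  -- DDL: committed automatically
    execute immediate 'update t set processed = ''Y''';            -- DML: transaction still open
    commit;                                                         -- needed for the update
end;
/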

Thanks !!

A reader, June 02, 2004 - 11:50 am UTC


ok,

so i did
1.)

begin
begin
Execute immediate 'SET CONSTRAINTS ALL DEFERRED';
end;

begin
execute immediate 'delete from table ' || t;
end;

begin
execute immediate 'delete from table ' || t1;
end;

exception
when others then
rollback;
raise;
end;
/


I got a message that a constraint of t1 was violated (child record exists).
I did not issue a commit, post, or rollback, and did not perform any DDL, and the constraints are DEFERRED.
What is the issue? Can you help?



Tom Kyte
June 02, 2004 - 11:56 am UTC

...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t;
execute immediate 'delete from ' || t1;
execute immediate 'set constraints all immediate';
end;
/

that is all the code you want -- the 'set constraints all immediate' will validate the constraints.

if they fail -- then the error (exception) will be propagated back to the client, and the statement (which is whatever the client sent over to the database -- a plsql block, whatever) will be rolled back.




A reader, June 02, 2004 - 12:01 pm UTC

...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t; -- ***** this one
-- execute immediate 'delete from ' || t1;
-- execute immediate 'set constraints all immediate';
end;
/


the question is:
as soon as it executes the
'delete from ' || t;
statement, it raises the referential constraint violation exception. Why? Given that
1.) constraints are all deferred
2.) it is NOT DDL
3.) I am not committing (or ending the transaction) manually



Tom Kyte
June 02, 2004 - 12:45 pm UTC

the constraint must not be deferrable.

you have to have deferrable constraints in order to defer them. the default is "not deferrable"
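So the constraint would have to be recreated as deferrable first -- a hypothetical example -- before SET CONSTRAINTS ALL DEFERRED can affect it:

alter table child_tab drop constraint child_tab_fk;

alter table child_tab
  add constraint child_tab_fk
  foreign key (parent_id) references parent_tab (id)
  deferrable initially immediate;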

7.3.4 - extracting 5 million records from 100 million record table

A reader, June 15, 2004 - 12:39 pm UTC

Any suggestions on how to accomplish this on 7.3.4? There is no concept of nologging, partitioning, the export query clause etc. in 7.3.4.

How can I extract about 5 million records (based on a condition) into a separate database/server from a 100 million record table? What's the most efficient way? I'm having to make do with very limited disk space on both machines.

Thanks.

Tom Kyte
June 15, 2004 - 5:30 pm UTC

5 million is pretty small.

I'd just

create table as select (the concept of nologging did in fact exist -- the UNRECOVERABLE keyword);
export it
import it

or use the sqlplus copy command if they are connected via a nice network.
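
For reference, a rough sketch of the SQL*Plus COPY approach (the connect string, table name and predicate are placeholders; COPYCOMMIT and ARRAYSIZE control how often it commits while copying, which keeps rollback segment usage small on a 7.3 system):

set arraysize 1000
set copycommit 10
copy from scott/tiger@source_db -
  create extract_5m -
  using select * from big_table where <your condition>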

A reader, June 16, 2004 - 5:38 pm UTC

Thanks. Had some LONG columns in there, so 'create table as select' wouldn't work. I'm just ending up PL/SQL looping and hitting smaller chunks of the mega table, then creating smaller target tables... then exp/imp.

I'm afraid I won't have sufficient temp space to do a SQL*Plus "copy from". Also, the documentation talks about COPY not being intended for Oracle-to-Oracle DBs. No idea why.

Tom Kyte
June 16, 2004 - 6:12 pm UTC

copy is ONLY for oracle to oracle db's....

A reader, June 16, 2004 - 7:40 pm UTC

Per 7.3.4 documentation :
http://download-west.oracle.com/docs/pdf/A42562_1.pdf


Excerpt from it -

Note: In general, the COPY command was designed to be used
for copying data between Oracle and non-Oracle databases. You should use SQL commands (CREATE TABLE AS and INSERT)
to copy data between Oracle databases.

Tom Kyte
June 17, 2004 - 7:53 am UTC

but if you think about it..... doesn't matter what the doc says, it takes two connections. connections are only to oracle databases. sure you could be using a gateway -- but even there, it would be true that create table as and insert would work.

Updating and Inserting 1 Million rows Daily with Bitmap indexes

Sidda, June 24, 2004 - 6:32 pm UTC

Hi Tom,
Here we are facing a very big problem. We have a partitioned table with 180 million records (70 columns, 10 bitmap and 10 B-tree indexes). End users need the bitmap indexes 24/7, so we can't drop and recreate the table and indexes. Daily we have to update and insert 1 million records. We tried bulk updates but in vain. What is the best method to follow?

Thanks in advance
Sidda


Tom Kyte
June 24, 2004 - 9:01 pm UTC

describe "in vain", what went wrong?

it would be best to do a SINGLE insert into and a SINGLE Update against this table (not even in bulk - just single statements)

Creating Table with aggregated data from another table

RB, July 26, 2004 - 1:16 pm UTC

Tom: Qn related to creating a table with data from another table -

I want to create a table with a few fields and aggregate sums of a few columns from another table.

Here is a query:

create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum (billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by ID,LOC, SRC;

I have 128 million records in the SRC_MONTHLY table and it is taking more than 1 hour to get this table created. I have indexes on the ID, SRC and LOC fields.

Any faster way of getting this table created?



Tom Kyte
July 26, 2004 - 1:32 pm UTC

so how many records have that id?


Great approach, but is it as fast for an IOT table?

Peter Tran, July 26, 2004 - 2:34 pm UTC

Hi Tom,

I'm trying the same approach with an IOT table. We have an IOT table partitioned daily. I want to recreate this table with monthly partitions. I do a CTAS parallel nologging using the new monthly partitions, but it's SLOW. Then again, the table does have 200 million rows. Is the "index organized" part of the table the slow part?

Thanks,
-Peter

Tom Kyte
July 26, 2004 - 2:47 pm UTC

what do the inputs/outputs look like (table structure) -- remember, you are sorting 200,000,000 rows as well!

did you give it a nice juicy sort area size?

Unfortunately no.

Peter Tran, July 26, 2004 - 2:58 pm UTC

"...nice juicy sort area size"  That would be a negative.  :(  Man, this is going to take forever isn't it?   Ugh...

SQL> show parameter sort_area

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
sort_area_retained_size              integer     8388608
sort_area_size                       integer     8388608

SQL> desc odfrc;
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------

 ODIFID                                                NOT NULL NUMBER(10)
 TRPORGN                                               NOT NULL VARCHAR2(5)
 TRPDSTN                                               NOT NULL VARCHAR2(5)
 POSCOUNTRYCODE                                        NOT NULL VARCHAR2(3)
 PAXTYPE                                               NOT NULL VARCHAR2(1)
 DCP                                                   NOT NULL NUMBER(2)
 ODIFDATE                                              NOT NULL DATE
 FRCDATE                                               NOT NULL DATE
 BKGMEAN                                                        NUMBER
 BKGMEANINFLUENCED                                              NUMBER
 BKGVARIANCE                                                    NUMBER
 XXLMEAN                                                        NUMBER
 XXLMEANINFLUENCED                                              NUMBER
 XXLVARIANCE                                                    NUMBER

Here's my CTAS:

CREATE TABLE ODFRC_MONTHLY (
     ODIFID,
     TRPORGN,
     TRPDSTN,
     POSCOUNTRYCODE,
     PAXTYPE,
     DCP,
     ODIFDATE,
     FRCDATE,
     BKGMEAN,
     BKGMEANINFLUENCED,
     BKGVARIANCE,
     XXLMEAN,
     XXLMEANINFLUENCED,
     XXLVARIANCE,
    CONSTRAINT ODFRC_MONTHLY_PK PRIMARY KEY
      (ODIFID, ODIFDATE, TRPORGN, TRPDSTN,POSCOUNTRYCODE, PAXTYPE, DCP, FRCDATE)
   ) ORGANIZATION INDEX nologging parallel 8
  PARTITION BY RANGE (ODIFDATE)
(PARTITION ODFRC_20021130  VALUES LESS THAN (TO_DATE('2002-12-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX01 ,
 PARTITION ODFRC_20021231  VALUES LESS THAN (TO_DATE('2003-01-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX02 ,
 PARTITION ODFRC_20030131  VALUES LESS THAN (TO_DATE('2003-02-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX03 ,
 PARTITION ODFRC_20030228  VALUES LESS THAN (TO_DATE('2003-03-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX04 ,
 PARTITION ODFRC_20030331  VALUES LESS THAN (TO_DATE('2003-04-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX05 ,
 PARTITION ODFRC_20030430  VALUES LESS THAN (TO_DATE('2003-05-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX06 ,
 PARTITION ODFRC_20030531  VALUES LESS THAN (TO_DATE('2003-06-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX07 ,
 PARTITION ODFRC_20030630  VALUES LESS THAN (TO_DATE('2003-07-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX08 ,
 PARTITION ODFRC_20030731  VALUES LESS THAN (TO_DATE('2003-08-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX09 ,
 PARTITION ODFRC_20030831  VALUES LESS THAN (TO_DATE('2003-09-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX10 ,
 PARTITION ODFRC_20030930  VALUES LESS THAN (TO_DATE('2003-10-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX11 ,
 PARTITION ODFRC_20031031  VALUES LESS THAN (TO_DATE('2003-11-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX12 , 
 PARTITION ODFRC_20031131  VALUES LESS THAN (TO_DATE('2003-12-01', 'YYYY-MM-DD'))
  TABLESPACE ODFRC_IDX13) AS SELECT
     ODIFID,
     TRPORGN,
     TRPDSTN,
     POSCOUNTRYCODE,
     PAXTYPE,
     DCP,
     ODIFDATE,
     FRCDATE,
     BKGMEAN,
     BKGMEANINFLUENCED,
     BKGVARIANCE,
     XXLMEAN,
     XXLMEANINFLUENCED,
     XXLVARIANCE
FROM odfrc nologging;

The original ODFRC table has daily partition for one year.  Anyway, I can estimate how long this will take?  

-Peter 

Tom Kyte
July 26, 2004 - 3:29 pm UTC

the nologging is in the wrong place

from odfrc nologging;

is the same thing as:

from odfrc a;



would you consider

a) create the table (empty, alter it to be nologging)
b) insert /*+ append */ into the individual partitions (instead of all at once) in parallel sessions?


Create Table with data from an aggregated sum of few fields from another table

RB, July 26, 2004 - 3:48 pm UTC

Tom: Qn related to creating a table with data from another table -

I want to create a table with a few fields and aggregate sums of a few columns from
another table.

Here is a query:

create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by
ID,LOC, SRC;

I have 128 million records in the SRC_MONTHLY table and it is taking more than 1
hour to get this table created. I have indexes on the ID, SRC and LOC fields.

Any faster way of getting this table created?




Followup:
so how many records have that id?

RB: Tom - This number varies - we have many IDs in the master table. If I pass one id, then the query will have an equality predicate on that ID; if more than one, I was planning to use an IN clause. So I do not know how many records per id I will have in the table at any given point in time.




Tom Kyte
July 26, 2004 - 3:51 pm UTC

now I'm confused -- the predicate is variant? why create *that* table then?

what is the goal here -- to make queries of the form:

select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = :x group by id, loc, src;

faster in general? (you don't have to duplicate lots of text, it is all right here)

Great suggestion!

Peter Tran, July 26, 2004 - 3:58 pm UTC

Hi Tom,

Thanks for the useful suggestion.

1) Part one done (easy).
2) Is this what you mean for part 2? When you say parallel sessions, do you mean kick off a bunch of them using execute immediate?

INSERT /*+ append */ INTO odfrc_monthly(
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE)
SELECT
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE
FROM odfrc partition(ODFRC_20021114) nologging;

3) Should I still give it a nice juicy sort_area_size? :)

Thanks,
-Peter

Tom Kyte
July 26, 2004 - 5:13 pm UTC

2) fire up N sqlplus sessions and run the insert append in each

3) sure, let each session have a big SAS -- alter session can be used.

Create Table with data from an aggregated sum of few fields from another table

RB, July 26, 2004 - 7:57 pm UTC

Tom - The user can select one or more ids. If I have more than one ID then I was planning to use an IN clause in the where clause. The temp table that I am creating will be used in a later phase of the app for other joins. What I am looking for is a solution which will be much faster than my current approach. The query that I have given, against a 120M row table, is taking more than 1 hour to create the aggregated table.


Tom Kyte
July 26, 2004 - 8:19 pm UTC

what is the query plan in general then.

you would not create a temporary table in oracle -- that would be so sqlserver. just use that query in the "IN" statement in the first place!!!!!

Update...

Peter Tran, July 27, 2004 - 4:11 pm UTC

Hi Tom,

I wanted to give you an update on progress. The nice thing about your approach is I can monitor the progress, but it's not as fast as I thought it would be.

I started 6 sessions and gave each session a sort_area_size = 200M. I then executed a month's worth of inserts in each session. It's taking about 30 minutes per partition/day to insert from the source daily table to the monthly table. Each partition holds around 700K to 800K rows.

Should I expect to run this long?

Thanks,
-Peter




Tom Kyte
July 27, 2004 - 7:01 pm UTC

sounds long -- can you run one of the sessions with a 10046 level 12 trace and see what they might be waiting on?
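
For reference, one common way to turn that trace on in the loading session (a sketch; level 12 captures both bind values and wait events):

alter session set timed_statistics = true;
alter session set max_dump_file_size = unlimited;
alter session set events '10046 trace name context forever, level 12';

-- run the insert /*+ append */ here ...

alter session set events '10046 trace name context off';

and then run tkprof against the trace file written to user_dump_dest.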

sort_area_size, pga_aggregate_target, insert /*+ APPEND*/

Dusan Valasek, July 28, 2004 - 3:02 am UTC

Hi Tom,

I have two questions about setting of big sort_area_size before direct path load.
1) How does setting of sort_area_size (from session) affect direct path load? I thought that this is useful for sort operation and building of indexes.
2) If I use pga_aggregate_target, does setting of sort_area_size (from session) override pga_aggregate_target?

Thanks,

Dushan

Tom Kyte
July 28, 2004 - 8:14 am UTC

1) you are loading an IOT. big sort going on.

2) you would set the workarea policy to manual (in your session) to allow you to override the sort area size for this "batch operation"
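
A sketch of what that session-level override might look like (the sizes are only illustrative):

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 209715200;            -- roughly 200 MB for this batch session
alter session set sort_area_retained_size = 209715200;

-- now run the direct path load of the IOT in this session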

A reader, July 28, 2004 - 12:10 pm UTC

Would you suggest to re-create the table when other users want to update it online

Tom Kyte
July 28, 2004 - 1:28 pm UTC

huh?

How to change a DATATYPE of column.

peru, July 30, 2004 - 7:54 am UTC

Hi Tom,
How to change a datatype of a particular column.
say a table with column VARCHAR2 for date.
Now i want to change the datatype DATE.
Assume that table has many records and referenced with
tables , procedures,triggers.
Thank you

Tom Kyte
July 30, 2004 - 4:43 pm UTC

not really going to happen.

you can add a new column that is a date, update this new column to be to_date() of the varchar2 and set the varchar2 to null, drop the old column and rename the newly added column to the old name.
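
A minimal sketch of that sequence, assuming a table T whose VARCHAR2 column DT_STR holds 'YYYY-MM-DD' strings (RENAME COLUMN needs 9i or later; dependent procedures, triggers and views will need to be reviewed/recompiled afterwards):

alter table t add ( dt_new date );

update t
   set dt_new = to_date( dt_str, 'yyyy-mm-dd' ),
       dt_str = null;

alter table t drop column dt_str;
alter table t rename column dt_new to dt_str;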

Update to July 27, 2004

Peter Tran, August 22, 2004 - 11:11 am UTC

Hi Tom,

Sorry it took awhile to get back with you on this. You wanted to see a 10046 level 12 trace. I wanted to do some research first and I had to rebuild the table to reproduce the step.

Here's the trace.

insert /*+ append */ into odfrc_monthly select * from odfrc
partition(ODFRC_20040201)



call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1     15.08    5776.28      11657      15874     111333      569939
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     15.09    5776.29      11657      15874     111333      569939

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 34

Rows Row Source Operation
------- ---------------------------------------------------
1 LOAD AS SELECT
569939 INDEX FAST FULL SCAN ODFRC_PK PARTITION: 185 185 (object id 12351)


Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
control file sequential read                    8        0.00          0.00
enqueue                                      1911        3.00       5726.43
db file sequential read                        38        0.03          0.20
db file scattered read                        380        0.63          9.50
direct path write                              19        0.00          0.00
direct path read                               10        0.00          0.00
log buffer space                               45        0.61         12.01
log file switch completion                      2        0.28          0.33
log file sync                                   1        0.00          0.00
SQL*Net message to client                       1        0.00          0.00
SQL*Net message from client                     1        0.00          0.00


Jonathan Lewis wrote in his book that you cannot use direct-path insert using the APPEND hint into an IOT table. Can this be the reason for the large enqueue timed event? Are the 6 sessions waiting to lock the index to modify it?

Thanks,
-Peter

Tom Kyte
August 22, 2004 - 4:55 pm UTC

Umm, you were blocked by someone else for an excessively long period of time here. Enqueue waits:

enqueue 1911 3.00 5726.43


normal "blocker, blockee" problem here! doh, it was me doing it to you.

only one session at a time can append. only one session at a time can direct path insert into a table.

sorry -- use normal insert, I goofed.

delete 2M records without dropping table.

Sean, August 23, 2004 - 9:18 pm UTC

Hi Tom,

I have to delete 2M records from a table of 6M records, and the table has about 400 columns (Oracle 920, Solaris 9).

I understand your suggestion of creating a temp table with the records I need, then dropping the original table and renaming the temp table.

But since our table is not that big and the application is using this table all the time, we are trying to use the traditional delete method to accomplish this.

(1) If we have enough redo tablespace, what is a good number of rows for each delete? I tried committing every 10000 records and every 1000 records. Both are quite slow.
(2) Anything you can suggest to improve the speed of the delete in the following procedure?

Thanks so much for your help

------------------------------------------------------------
create or replace procedure sp_del_audit(p_total_loop_time number)
is

total_loop_time number := 1;

BEGIN
WHILE total_loop_time <= p_total_loop_time
LOOP
delete /*+ parallel */ from audit a
where rownum < 10000
and exists ( select /*+ parallel */ null from audit_temp b
where a.Pkey = b.Pkey and
a.row_type = b.row_type and
a.update_time = b.update_time);

commit;

total_loop_time := total_loop_time + 1;

END LOOP;
END;
/
------------------------------------------------------------




Tom Kyte
August 24, 2004 - 7:19 am UTC

1) assuming you meant UNDO tablespace, there is no such thing as a REDO tablespace...

removing 2 million rows, with 400 columns (every row in this table is CHAINED, when you have more than 255 columns...) and probably many indexes -- it is not going to be what you might term "speedy".

2) do it as a SINGLE delete (deleting 10,000 records in "parallel" doesn't even begin to make sense here. PQ is for BIG BIG BIG things. this is tiny)

delete from audit
 where (pkey, row_type, update_time) in ( select pkey, row_type, update_time
                                            from audit_temp );

nothing more, nothing less. but don't expect this to be super fast if this table is indexed. removing 33% of the index entries is going to take some amount of time.

Deleting 90% of rows

A reader, October 13, 2004 - 10:54 pm UTC

As part of some ETL processing, I need to delete 90% of the rows in a large (6million rows) table. This table has a PK index so traditional delete takes a long time. I see the following options

1. Mark the index unusable, delete and rebuild index nologging. This is significantly faster than plain delete
2. create new table as remaining 10% of the rows, drop, rename, etc
3. Precreate a regular or GTT and do insert /*+ append */ in my batch program.

(2) is not very attractive as I don't want to be doing DDL in my code.

How would you compare (1) and (3) above? Both of them would generate similar amounts of undo/redo, take roughly same amount of time?

Thanks

Tom Kyte
October 14, 2004 - 9:19 am UTC

ETL is 'special' -- it isn't something that runs 5 times a minute.

I'd go with #2 actually, CTAS a newone, drop oldone, rename newone to oldone.

#3 is "dangerous" -- you could lose everything in the event of a failure.

A reader, October 14, 2004 - 11:02 am UTC

"#3 is "dangerous" -- you could lose everything in the event of a failure"

Why would I lose everything? What kind of failure? Instance or media failure? Oracle guarantees recoverability of commited transactions, right, why do you bring that up here?

The only reason I don't want to do (2) is that I usually want to prevent DDL in my code. The only difference between (2) and (3) is that the table is already created in (3), right? Yes, (3) would generate much more redo/undo (less if I create the table as nologging), but I thought using a GTT would cut down on the redo/undo as well?

Tom Kyte
October 14, 2004 - 11:16 am UTC

i saw "use a gtt"

i could only assume you mean:

insert rows to keep into gtt
truncate table
insert rows from gtt into the table

that would be dangerous. if you fail after the truncate, before/during the insert -- gone, all gone. gtt would be dangerous here.

(truncate is ddl btw...)

How can you Display multiple rows in one record

Mack, October 14, 2004 - 2:34 pm UTC

Hi Tom,

Let's suppose in deptno 10, there are 3 to 5 employees, I want to see the emp names like
TOM, KIM, JOHN... and so on.

Is there an easy way to do it in SQL? Not through PL/SQL. The number of records is unknown; it could be 10, 20 or one hundred. Please advise.

Tom Kyte
October 14, 2004 - 7:42 pm UTC

collect() in 10g

stragg (search for it -- nasty plsql you install once and use over and over and over and over and over) in pre 10g
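
For illustration, a sketch of the 10g collect() route against an EMP-like table (the collection type name is made up; the CAST is what makes the aggregated set displayable):

create type vc2_list as table of varchar2(30);
/

select deptno,
       cast( collect(ename) as vc2_list ) enames
  from emp
 group by deptno;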

Query first and then update

A reader, October 15, 2004 - 1:23 am UTC

We have two tables with approximately 28 million and 35 million records respectively. These tables are joined to produce data to be displayed to users using IE browsers. Based on this information, around 10000 records get added to these tables and around the same updated daily. Our SLA is to display each screenful (100 rows) in 2-4 seconds.

While partitioning is being reviewed to improve the performance for the queries, could you let us know if there are any issues regarding partitions? For instance someone has reported that using global indexes on a partitioned table has degraded the performance.

Tom Kyte
October 15, 2004 - 11:19 am UTC

i seriously doubt partitioning is going to be used to increase the performance of these queries.

partitioning -- great for speeding up a full scan. are you suggesting to full scan and still return in 2-4 seconds?


"someone reported using global indexes ... degraded performance"

I just about fell out of my chair on that one. geez.


If you have my book "Effective Oracle by Design" -- I go into the "physics behind partitioning".

In order to return the first 100 rows to a web based application -- you are going to be using indexes (or you are not going to be doing what you signed up to do -- funny, you have an SLA in place but no idea if you can live up to it...). whether the tables are partitioned or not probably won't have any bearing on making this faster. Partitioning could certainly make things slower if you do it wrong, but in a system like this, you would be looking at partitioning to make administration easier and hopefully not NEGATIVELY impact performance.

given two tables...
index access
get 100 rows

I personally would be shooting for well under 1 second response times for everything -- regardless of whether there was 1 row or 1 billion.


don't get the tie into "query first and then update" though.

A reader, October 18, 2004 - 1:33 am UTC

"Our SLA is to display each *screenful* (100 rows) in 2-4 seconds."

In our case the queries, after joining 2 tables of 28 million and 35 million records, could return 10 screens or 600 screens or many more screens based on the query parameters. Each screenful (with 100 records in each screen) should appear in 2-4 seconds.

Tom Kyte
October 18, 2004 - 8:13 am UTC

so, you've signed up for an SLA you have no idea if you can meet.


but hey -- using indexes to retrieve 100 rows from 1, 1000, 10000000000000 should be about the same amount of time (and way under subsecond).


but -- getting to "screen 600" is not. Look to google as the gold standard for searching and web pagination

o totally estimate the number of returned rows -- don't even THINK about giving an accurate count

o don't give them the ability to go to page "600", pages 1-10 is more than sufficient

o even if there is a page 600 -- realize it doesn't make sense to go there (no human could know "what I need is on page 600 -- 6,000 rows into this result set). Google stops you at page 99

o understand that page 2 takes more time to retrieve than page 1, page 50 more than 2, and so on (as you page through google --each page takes longer)


But perhaps most importantly -- laugh at people that say things like:

"For instance
someone has reported that using global indexes on a partitioned table has
degraded the performance."


and say -- "yes, it has when improperly applied it can degrade performance, there have also been sightings of systems where it didn't affect performance at all, there have also been reported instances of them massively improving performance. Now what we have to do is understand how the feature works, what it does, and how it might apply to our problem"

The neat thing about that paragraph -- it is INFINITELY reusable. You can use it with regards to any feature!


(if you have Effective Oracle by Design -- I go into the "physics" of partitioning and how -- without the judicious use of global indexes -- your system could fall apart and run really really slow as well)
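
Tying the pagination bullets above together, a sketch of the usual rownum pagination pattern (the inner query and ORDER BY are placeholders; :min_row and :max_row are the page boundaries):

select *
  from ( select a.*, rownum rnum
           from ( select /* your indexed, ordered join of the two tables */ *
                    from t
                   order by some_indexed_columns ) a
          where rownum <= :max_row )
 where rnum >= :min_row;

The inner ROWNUM predicate lets the database stop fetching as soon as the page is filled, which is what keeps page 1 fast.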






Senior Statistical Analyst

Hank Freeman, October 18, 2004 - 5:58 pm UTC

18 October 2004

Tom,

Thanks for the lob_replace code !!!
It worked wonders when trying to fix about 25,000 CLOBs with a known error. Here is my detailed discussion, first in outline and then in detail.

1. What type of data is in the CLOB
2. What went wrong to create the error
3. What was done to correct it.
a. Stored Proc
b. Declare syntax
c. VBA code in Excel


1. What type of data is in the CLOB. The company has about 45,000 internet/intranet web pages stored in an Oracle database table field which is a CLOB. Meaning, instead of serving the site from a file server or a physical location, the entire website's web-page/HTML source code is held in the table in this CLOB field.
2. What went wrong to create the error. An error occurred when these records were being modified to remove a specific piece of information and a replacement null character was inserted. The null character for some unknown reason did not work and the web-page information in the CLOB field got garbage appended to the end of the web-page after the closing </html> tag.
a. It was easy to spot the records, but a conventional update would not work because of the size of the web pages, the special characters found within them, and the unique nature of working with CLOBs.
b. I contacted a friend who pointed me to your site and URL: http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1533006062995
c.    Upon examination of the last changes you made for a previous person, the solution on how to update the records came into clear focus, and it had to be done with a program.
3.    What was done to correct the problem. 
a.    The last stored procedure you listed on your website was used, because it was applicable to our situation. This procedure allowed the creation of a specific version of the Declare sql to select the bad part of the CLOB record and replace it with a “ ” (space character) and not a null.
i.    The stored proc syntax is not listed since it is Tom's "lob_replace" syntax verbatim.
ii.    The Declare statement listed below takes the unique record id and the starting location of the bad information in the CLOB. The SQL then passes the CLOB, the data to find, and the replacement string (a space) to the stored procedure.

 
Declare Sql syntax

declare   
l_lob clob;   
l_substr  varchar2(32767);   
begin   
   select    KS_DESCRIPTION into l_lob   
   from      ks_longdescription   
   WHERE     KS_LONGDESCRIPTIONID = 55012 for  update;   
     select    SUBSTR(KS_DESCRIPTION, 876, length(KS_DESCRIPTION) - 876) into l_substr   
     from      ks_longdescription   
     WHERE     KS_LONGDESCRIPTIONID = 55012;   
     hf_clob_replace (l_lob, l_substr, ' ');   
end;


iii.    It was decided to use Excel VBA to control the update process. It was contemplated to write a second procedure for the declare statement and just execute it with parameters. However, as you noticed, the maximum value for the l_substr VARCHAR2 field was declared as 32767.  It was already known that some of the erroneous parts of the CLOB data were larger than 25,000 characters. Therefore, Excel was used to ascertain which records were going to be within this limit; those outside a 25,000 character limit would require a manual fix.
iv.    Approximately 6,000 records were loaded into an Excel worksheet and the following Macro code was written to process/update the records. This was the optimum method of performing the cleanup because the VBA code was written to decide:
1.    Length of clob
2.    Starting location of the junk in the clob
3.    The unique record id was already known, and with the aforementioned starting location being determined by the VBA code, it was a simple task to just pass these parameters to the VBA Declare SQL string and create a specific run for each qualifying record.
4.    The VBA code then would execute the Declare SQL string, firing up the stored procedure hf_lob_replace, thus cleaning up the record.
5.    If a record could be processed it was, otherwise it was marked for manual cleanup. 
6.    During the process it was determined that any l_Substr value greater than 20,000 would cause an Oracle error on the buffer size. Therefore, upon seeing that the error would occur often, any l_Substr (p_what) value greater than 20,000 was immediately marked as a "SQL Buffer Failure", meaning it would have to be manually updated. The important part here was that the process would not stop if it did not like a specific record; it just kept going.

In summary, it took less than ONE second to fix a record, with auto-commit turned on, and we knew it was working because the process could be watched. Within four hours (6,000 per hour) using two systems, the 25,000 records were corrected with only 150 needing a manual fix -- a highly acceptable result.

 
In closing, I would like to give credit to the following.
•    Mr. Andy Sword, who is/was a subscriber to Tom's web page and submitted to me the first draft of the Declare statement we used. This was what we really needed to know!!!  I had never worked with CLOBs, so this was a real lesson.
•    Tom's web page:
http://asktom.oracle.com/pls/ask/f?p=4950:1:
•    Tom Reed:
http://the-big-o.port5.com/index.shtml where some of the best Excel/VBA code for interfacing with Oracle can be found.
• I have listed the VBA code below for those who are so inclined to read it. There is nothing there that is private and confidential. So anyone who can learn from it may feel free to do so.


Respectfully

Hank Freeman
FCS Ltd
hfreeman@msn.com


updating millions of records

Josphat, October 26, 2004 - 10:28 am UTC

Your approach is quite OK for a one-statement update, but how about a procedure which updates the next row based on the value of the previous row?
Kindly look at the following scripts and give firm suggestions on how 5 million rows could be processed in under 10 hours while capturing the business logic.
----starting here
---3 Scripts to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)
---should exec as ordered

drop table temporary_emphist;


create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/

-----values(ID ,PAYDATE,AMOUNT , PPART, IPART, INTRATE, OBAL, REF, NOOFMONTHS, ACCRUED_INT)
-----data for different ids being inserted (one id might have as many as 300 entries)
-----but i have put just a few for each id
----set for customer id 10979844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10979844','30-jun-2003',2000,2,112000);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jul-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2003',2000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-dec-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jan-2004',1950,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jun-2004',5000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2004',4000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-oct-2004',3000,2);

----set for customer id 10779844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10779844','30-jun-2003',2000,2,99380);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jul-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2003',2000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-dec-2003',2000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jan-2004',1950,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jun-2004',5000,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2004',4000,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-oct-2004',3000,2);



----set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10679844','30-jun-2003',800,4,40390);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2003',1500,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2003',1600,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-dec-2003',2000,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jan-2004',1950,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-mar-2004',1950,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2004',1000,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2004',1000,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-sep-2004',1992,4);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-oct-2004',1200,4);


commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;
create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);

---script 1 (will number transactions for each customer from 1 to ..n then start from 1 for the next customer
DECLARE -----Procedure 19
V_2 NUMBER(7) :=1;
V_1 VARCHAR2(18);
v_3 VARCHAR2(18);
v_4 VARCHAR2(18) :='T';
v_5 date;
CURSOR DRECS IS
SELECT ROWID,id,paydate
FROM temporary_emphist
ORDER BY ID,paydate;
BEGIN
OPEN DRECS;
LOOP
FETCH DRECS INTO V_1,v_3,v_5;
EXIT WHEN DRECS%NOTFOUND;

if v_3 <> v_4 then
v_4 := v_3;
v_2 := 1;
end if;
UPDATE temporary_emphist
SET REF = V_2
WHERE ROWID = V_1;
V_2 := V_2+1;
END LOOP;
commit;
CLOSE DRECS;
END;
/
select * from temporary_emphist
order by id,paydate;

-----script 2

----gets the difference in months from one transaction to the next

DECLARE -----Procedure 20
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) :='0';
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 DATE;
V14 DATE;
V13 NUMBER(13,2);
V15 number(5);

CURSOR UPLO IS
SELECT REF,OBAL,amount,ID,INTRATE,last_day(PAYDATE) FROM temporary_emphist
where nvl(ref,0)>1
ORDER BY ID, REF;

BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
---V13 :=V12;
uPDATE temporary_emphist
SET NOOFMONTHS=mONTHS_BETWEEN(v12,last_day(paydate))
WHERE NVL(REF,0) = V_1-1 and id = V_7 ;


END LOOP;

commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;

---script 3
-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
---as per the numbering above(script 1) value of rows for customer
----will be manipulated to get values for the subsequent row though some input from the
----current row will be integrated with derivative of the previous to populate values into the
----current row. This is applied for each customer untill all are cleared


SELECT 'ESTABLISHING BALANCES ON REDUCING BASIS-----Procedure 21 STARTS ' FROM DUAL;
DECLARE -----Procedure 21
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) ;
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 NUMBER(15,2) :=0;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold
v16 number(15,2) :=0; ---ipart hold
V20 VARCHAR2(18) :='T';
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist

ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
if v_7<>v20 AND V_1=1 then
v20 :=v_7;
V_4 :=V_2-V_3;
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2);
v15 :=v_4;
ELSE

select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11,V12 from temporary_emphist
where NVL(ref,0) = V_1 - 1 AND ID = V_7 ;

select nvl(amount,0) into v_3 from temporary_emphist
where NVL(ref,0) = V_1 AND ID = V_7 ;

v15 := ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-v_3; -------obal
v14 :=ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-V_4; ---accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V_1 AND ID = V_7 ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;


I don't know whether there is a faster way to employ the logic... I've seen your authority -- give it a shot.


Tom Kyte
October 26, 2004 - 10:42 am UTC

one word for you:

analytics


they rock, they roll. lag/lead -- i frankly did not really read all of your code (i'm not a compiler, v_1, v_2 etc make it really truly utterly hard to follow and comprehend) but if you need to update a row based on the prior row(s) after sorting -- analytics -- lag, lead, sum, whatever.
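
To make that pointer concrete, a sketch of lag() over the poster's temporary_emphist table (the derived columns are only illustrative, not the exact business rules):

select id,
       ref,
       paydate,
       amount,
       obal,
       lag(obal)    over (partition by id order by ref) prev_obal,
       lag(paydate) over (partition by id order by ref) prev_paydate
  from temporary_emphist
 order by id, ref;

Each row can see its predecessor's values in a single pass, so the per-row lookups and updates in the cursor loops above become unnecessary.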

database

cp, October 26, 2004 - 10:45 am UTC

Hi Tom,

I have to appear for interview and need your help.
1. Hows to decide for datafile size?
2. How to decide for redolog file size?
3. Any other factors which you feel as interviewer to ask?

Please help me out as its really urgent
Regards




Tom Kyte
October 26, 2004 - 10:47 am UTC

how do you know they'll ask these questions -- not questions I would ask.

datafile size = pure function of amount of data you want to store. I mean, well, what else could you say?

redolog file size is a function of

o amount of redo generated by most of your transactions
o the number of these transactions you do in a period of time

predominantly (but if you are an experienced dba -- why not just fall back on your knowledge of what you've done in the past?)

problem

cp, October 26, 2004 - 1:42 pm UTC

So untill or unless we get some opportunity we cant learn the skills to decide for initial size of database right?

Please provide words about or facotrs to think on this

Thanks!


Tom Kyte
October 26, 2004 - 2:43 pm UTC

experience.

you get an entry level job (you have to sort of start in the basement) and get experience.

update of millions of records/rows

Josphat, October 29, 2004 - 10:29 am UTC

Hi Tom,
Thanks for the decisive debate on this site.

Kindly go through my script because now I have elaborated on
the v_1, v_2, etc. variables as you will see... SORRY for MY FIRST SUBMISSION. I will be grateful if your suggestion/code
could reduce my processing time for 5 million records from
15 hrs to 1, 2 or 3.


---Script to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)

-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
-----EACH customer HAS OWN ROWS STARTING 1..N AND REPRESENTED BY REF COLUMN ON TABLE
-----MULTIPLIER FACTOR FOR EACH MONTH IS REPRESENTED BY NOOFMONTHS
-----will be manipulated to get values for the subsequent row though some input from the
-----current row will be integrated with derivative of the previous to populate values into the
-----current row. This is applied for each customer untill all are cleared
-----PROCESSING IS SEQUENCIAL (I.E FROM REF 1 ..N FOR EACH CUSTOMER)
-----REF =1 FOR EACH CUSTOMER HOLDS A VALUE FOR OBAL WHICH WILL BE USED TO FILL UP
-----THE SUBSEQUENT ROWS AFTER MANIPULATION WITH MULTIPLIER FACTOR


----work starts next

drop table temporary_emphist;


create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/




----DATA set for customer id 10979844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10979844','30-jun-2003',2000,2,112000,1,1);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jul-2003',2000,2,2,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2003',2000,2,3,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-dec-2003',2000,2,4,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jan-2004',1950,2,5,5);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jun-2004',5000,2,6,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2004',4000,2,7,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-oct-2004',3000,2,8,0);

----DATA set for customer id 10779844

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10779844','30-jun-2003',2000,2,99380,1,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jul-2003',2000,2,2,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2003',2000,2,3,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-dec-2003',2000,2,4,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jan-2004',1950,2,5,5);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jun-2004',5000,2,6,2);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2004',4000,2,7,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-oct-2004',3000,2,8,0);



----DATA set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10679844','30-jun-2003',800,4,40390,1,1);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2003',1500,4,2,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2003',1600,4,3,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-dec-2003',2000,4,4,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jan-2004',1950,4,5,2);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-mar-2004',1950,4,6,4);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2004',1000,4,7,1);


insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2004',1000,4,8,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-sep-2004',1992,4,9,1);

insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-oct-2004',1200,4,10,0);


commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;

create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);

---script STARTS


DECLARE
V1_REF temporary_emphist.REF%TYPE;
V2_OBAL temporary_emphist.OBAL%TYPE;
V3_AMOUNT temporary_emphist.amount%TYPE;
v_4 NUMBER(15,2) :=0;
V5 NUMBER(15,2) :=0;
V6 NUMBER(15,2) :=0;
V7_ID Temporary_emphist.ID%TYPE;
V8 NUMBER(15,2) :=0;
v9 number(15,2);
v10 number(35,10) := ROUND(1/12,10); ---CONVERTS ONE MONTH INTO YEARS
V11_INTRATE temporary_emphist.INTRATE%TYPE;
V12_NOOFMONTHS temporary_emphist.NOOFMONTHS%TYPE;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold AS CALCULATED
v16 number(15,2) :=0; ---ipart holdING INTEREST AS CALCULATED
V20 VARCHAR2(18) :='T'; ----ARBITRALY KEY TO ENABLE FOCUS ON DIFFERENT IDS
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist

ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V1_REF,V2_OBAL,V3_AMOUNT,V7_ID,V11_INTRATE,V12_NOOFMONTHS;
EXIT WHEN UPLO%NOTFOUND;

----V20 IS VARIABLE KEY FOR CHECKING WHETHER WORKING ON SAME ID

if v7_ID<>v20 AND V1_REF=1 then
v20 :=v7_ID;
V_4 :=V2_OBAL-V3_AMOUNT;
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2);
v15 :=v_4;
ELSE

select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11_INTRATE,V12_NOOFMONTHS from temporary_emphist
where NVL(ref,0) = V1_REF - 1 AND ID = V7_ID ;
select nvl(amount,0) into v3_AMOUNT from temporary_emphist
where NVL(ref,0) = V1_REF AND ID = V7_ID ;
--------V10 IS ONE MONTH COVERTED INTO YEARS
v15 := ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-v3_AMOUNT; -------CALCULATED obal
v14 :=ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-V_4; ---CALCULATED accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
spool off





Tom Kyte
October 29, 2004 - 10:49 am UTC

this isn't the right place for "recode my application".

show me your attempt at using analytics.

(UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;

is still really obscure, comments didn't "do much" really)

Reader

dheeraj, November 21, 2004 - 3:35 pm UTC

Excellent discussion as usual on this site.

Selecting data over a database link from 7.3.4 to 10g

Stewart W. Bryson, December 02, 2004 - 10:54 am UTC

Tom:

If I use the "INSERT /*+ APPEND */ ..." statement on a 10g database, but the "SELECT..." part of the statement is over a database link to a 7.3.4 database, would the bulk-loader still be invoked, or would the hint be ignored?

Also, if I do a "MERGE... " statement with the "INSERT..." part of the statement being an "INSERT /*+ APPEND */...", would the MERGE work correctly, and would the bulk-loader be used?

Thanks for your time.

Tom Kyte
December 02, 2004 - 11:54 am UTC

trick question

it isn't going to work -- 9ir2 and later do not interoperate with 7.3 and earlier.


if it did work

insert /*+ append */ into local_table select * from remote_table@dblink;

could be direct pathed.


insert /*+ append */ is supported only for insert as select -- not insert values.


ops$tkyte@ORA9IR2> create table t ( x varchar2(25), y varchar2(25) );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> merge into t
  2  using dual
  3  on ( t.x = dual.dummy )
  4  when matched then update set y = dummy
  5  when not matched then insert /*+ append */ values ( dummy, null );
 
1 row merged.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t;
 
X                         Y
------------------------- -------------------------
X
 
ops$tkyte@ORA9IR2> insert /*+ append */ into t values ( 'y', null );
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
 
X                         Y
------------------------- -------------------------
X
y
 
ops$tkyte@ORA9IR2> insert /*+ append */ into t select 'z', null from dual;
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
 
 
ops$tkyte@ORA9IR2>


<b>only the last insert append was a direct path...</b>
 

A follow-up question

Stewart W. Bryson, December 02, 2004 - 1:20 pm UTC

Thanks for the good information Tom.

I've looked and I've looked, but I still cannot find the information about 9iR2 and 7.3.4 database links stated in the documentation. As a matter of fact, the 9iR2 Migration guide lists that 7.3.4 Materialized View sites can interact with 9iR2 master sites, and vice versa:

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96530/migrep.htm#620714

As Advanced Replication utilizes database links, this seems a little contradictory. Of course, if you tell me once more (I'm a bit stubborn) that it won't work, I promise to believe you.

As I am currently architecting a solution to try and bring some of the data from our European sites into a central repository, and as I don't yet have access to those sites, I cannot simply test it as we have no 7.3.4 sites (thank God) in the US.

I appreciate your time and understanding.

Tom Kyte
December 02, 2004 - 7:21 pm UTC

contact support -- they will confirm


version 7.3 was so far out of any sort of support at the time of 9ir2 being released that they never even considered testing it. and it does not work.

I believe that documentation to be erroneously left over from 9ir1.

mg, December 03, 2004 - 10:30 am UTC

Hi Tom,

I have a table T in tablespace TS1.

Every day there are a lot of insertions into table T.

So I need to check whether a particular tablespace size has been exceeded, and if so, I need to move the older data to another tablespace.

So could you please tell me the best way to do this?

Many thanks in advance.

Tom Kyte
December 03, 2004 - 11:01 am UTC

you sound like you really want a rolling window partitioned table -- a table partitioned by date.
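
A sketch of what such a rolling window might look like (names and dates are purely illustrative):

create table t
( created  date,
  payload  varchar2(100)
)
partition by range (created)
( partition p_2004_11 values less than (to_date('2004-12-01','yyyy-mm-dd')),
  partition p_2004_12 values less than (to_date('2005-01-01','yyyy-mm-dd'))
);

-- each period: add a partition for the incoming data ...
alter table t add partition p_2005_01
  values less than (to_date('2005-02-01','yyyy-mm-dd'));

-- ... and age out the oldest one, e.g. drop it, or first exchange it with a
-- standalone table in another tablespace if the old data must be kept
alter table t drop partition p_2004_11;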

A reader, December 06, 2004 - 10:20 am UTC

Can we use a trigger to check the percentage of the tablespace that is filled?

Tom Kyte
December 06, 2004 - 12:35 pm UTC

yes, but -- why? just to make the code run really really slow?

What about the larger tables?

Steven, December 15, 2004 - 3:44 pm UTC

I used to work at a place where the main table had 4 billion rows and growing. This tracked manufacturing details and rows for an item trickled in and then were "archived" (written to tape then deleted) as one. Finding a good partitioning scheme was difficult due to the large variable time that items could stay in the manufacturing process. That is, two rows inserted on the same day could be archived on vastly different timescales.

The system had around 40 million rows inserted and deleted on a daily basis (about 1%). Some of the items could have 1 million or more records; they needed to be archived at the same time. The table had a PK and a secondary index.

Obviously CTAS was not an option due to the 24x7 availability requirement and besides we usually didn't have enough space for a full copy of the main table.

Is this a case where you just kick off the delete and take your lumps?

(Hmm, given past growth curves, the table should by now be up to 20 billion rows and 200 million rows in/out daily.)

Tom Kyte
December 15, 2004 - 6:25 pm UTC

or, enable row movement....

partition by a date field, the date field represents "record done as of"

when you initially insert it, it has a "done as of" way into the future (in the active partition(s))

when you are done, you update this field and it moves from active into the "you can archive me later" partitions.


It would be like the delete, but the delete would be happening continuously as little transactions instead of one big delete all at once.
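
A tiny sketch of that idea (all names and dates are made up):

create table work_items
( item_id     number,
  done_as_of  date default to_date('4000-01-01','yyyy-mm-dd'),
  detail      varchar2(100)
)
partition by range (done_as_of)
( partition p_2004_12 values less than (to_date('2005-01-01','yyyy-mm-dd')),
  partition p_2005_01 values less than (to_date('2005-02-01','yyyy-mm-dd')),
  partition p_active  values less than (maxvalue)
)
enable row movement;

-- in-flight rows sit in p_active because done_as_of defaults far into the future;
-- finishing an item moves its rows into the dated partition for its completion month,
-- which can later be exchanged out or dropped
update work_items
   set done_as_of = sysdate
 where item_id = :id;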

data warehouse daily load

shgoh, December 16, 2004 - 3:58 am UTC

Dear Tom,

One of my DW developers is having a performance problem with their daily data load from a legacy system. Approximately 5-8 million records are inserted into a table (append load table). The table has a lot of constraints and none of us are familiar with the data given (it could have constraint violations).

What methods are there to speed up the data loading process besides parallel, direct load and nologging?

Appreciated if you can share your view. Since we don't know the data and the constraints well, is it advisable to disable the constraints in this case? Thanks

Rgds
SHGoh

Tom Kyte
December 16, 2004 - 8:18 am UTC

parallel
direct
nologging

they are them. how long is "long"

if you are adding 5/8 million records a day -- are you using partitioning? you can look at disabling portions of partitioned indexes and rebuilding them after the fact.

Not a partitioned table

shgoh, December 17, 2004 - 4:14 am UTC

Dear Tom,

It is not a partitioned table. Would it help if I go for parallel, nologging, direct and disable a portion of the constraints? Thanks

Rgds
Shgoh

Tom Kyte
December 17, 2004 - 2:30 pm UTC

you are already doing parallel, nologging, direct -- disabling the constraints only means "gotta enable them".

you asked for additional ideas. partitioning.

How about Bulk Update?

Tony, December 22, 2004 - 5:13 am UTC

Tom,

I have two tables PO_HDR and PO_DTL having millions of records. I want to join the two tables, and update a flag in PO_HDR table. Can I use Bulk Update based on rowid here?

Example:

DECLARE

TYPE tblrowid IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
arr_rowid tblrowid;

CURSOR po_cur
IS SELECT a.rowid FROM po_hdr a, po_dtl b
WHERE a.poseq = b.poseq
AND a.custid = b.custid
AND b.po_qty <> b.del_qty;

BEGIN

OPEN po_cur;
LOOP
FETCH po_cur BULK COLLECT INTO arr_rowid LIMIT 100000;
EXIT WHEN arr_rowid.COUNT = 0;

FORALL i IN arr_rowid.FIRST..arr_rowid.LAST
UPDATE po_hdr
SET del_flg = 'Partial'
WHERE rowid = arr_rowid(i);

COMMIT;
END LOOP;
CLOSE po_cur;

END;
/

Is a rowid-based bulk update fast and reliable? Is there any better and faster approach?

Please advise me.




Tom Kyte
December 22, 2004 - 9:41 am UTC

update po_hdr
   set del_flg = 'Partial'
 where (poseq, custid) in ( select poseq, custid
                              from po_dtl
                             where po_qty <> del_qty );


period.

To update or insert by sql statement and not procedurally

DAV, December 23, 2004 - 2:44 pm UTC

Hi Tom,
I have 2 questions.
The first is:
with the update SQL statement that you wrote above,
say I want to execute it on millions of records and I want to commit every 1000 records --
how can I do it?
The second question is:
if I have to use a PL/SQL table and I want to load
a lot of records into the table, say 70,000 -- do I do it by:
fetch ... bulk bind ... limit 1000
or SELECT ... BULK BIND ...?
Why is it a problem to load many records into PL/SQL tables?
Thanks
DAV

Tom Kyte
December 23, 2004 - 3:08 pm UTC

you don't want to commit every 1000 records, that would take longer, consume more resources and break your transactional integrity.


if you have 70k rows to process, and want to do it 1k at a time, you'll be using fetch ... bulk collect ... limit 1000;

can you spell "memory"

Tom,sorry i didn't understand you.

DAV, December 23, 2004 - 3:49 pm UTC

Why is fetch better than a SQL statement in the case written above? You said before that SQL is better.
Can you please answer the second question that I wrote above?
Thank you very much.
DAV

Tom Kyte
December 23, 2004 - 4:30 pm UTC

not understanding you.

you asked:

if I have to use a PL/SQL table and I want to load
a lot of records into the table, say 70,000 -- do I do it by:
fetch ... bulk bind ... limit 1000
or SELECT ... BULK BIND ...?
Why is it a problem to load many records into PL/SQL tables?


and I answered??? the question wasn't "single sql vs procedural code"

you started with "if I HAVE TO USE..."

you would not select BULK COLLECT for 70k records, you would use fetch bulk collect with the limit clause.
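
A sketch of that pattern, borrowing the same big_table/owner example Tom uses a little further down (otherwise the names are illustrative; note the single commit at the end, as advised above):

declare
    cursor c is select rowid rid, owner from big_table;

    type rid_array   is table of rowid index by binary_integer;
    type owner_array is table of big_table.owner%type index by binary_integer;

    l_rids   rid_array;
    l_owners owner_array;
begin
    open c;
    loop
        fetch c bulk collect into l_rids, l_owners limit 1000;
        exit when l_rids.count = 0;

        forall i in 1 .. l_rids.count
            update big_table
               set owner = lower( l_owners(i) )
             where rowid = l_rids(i);
    end loop;
    close c;
    commit;    -- once, at the end -- not every N rows
end;
/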

I was unclear - let me explain

A reader, December 23, 2004 - 5:12 pm UTC

Sorry, Tom,
I didn't explain myself well.
What I meant was:

First:

if I have to update a million records in a PL/SQL procedure,
is it better to do it with an update SQL statement or with procedural code like
fetch...? I want to commit every n records because of the rollback segment, so if I use an update SQL statement how can I do it?

Second:

A general question - IF I USE PROCEDURAL CODE - why is it not good to load, say, 40k records into a PL/SQL table by SELECT ... BULK BIND ... rather than by fetch ... bulk bind ... limit ...?
As I know, a PL/SQL table's records are kept in PGA memory, so what happens if there are 40k records?
Please clarify the theory.
Thank you very much.



Tom Kyte
December 23, 2004 - 8:03 pm UTC

you need to size your rollback segment large enough to do you job. seriously -- really you do.

a million records is nothing.


big_table@ORA9IR2> update big_table set owner = lower(owner);

1000000 rows updated.

big_table@ORA9IR2> select used_ublk*8/1024 from v$transaction;


USED_UBLK*8/1024
----------------
92.4921875

big_table@ORA9IR2>

93meg, big deal.

If you commit every N records -- you do realize you have to make your code restartable somehow, so when you inflict the ora-1555 on YOURSELF by committing, or fail for whatever other reason -- you can restart where you left off. Now we are talking some code!
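
For illustration only, here is a minimal sketch of one way to make such an incremental-commit update restartable -- assuming a hypothetical table T with a hypothetical PROCESSED flag column (neither is from the question above). Because the predicate itself records the progress, the block can simply be re-run after any failure:

begin
    loop
        -- each pass touches at most 10,000 not-yet-processed rows and marks them done
        update t
           set some_col  = 'new value',
               processed = 'Y'
         where processed = 'N'
           and rownum <= 10000;

        exit when sql%rowcount = 0;   -- nothing left to do
        commit;                       -- this commit is exactly what makes the restart logic necessary
    end loop;
    commit;
end;
/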


If you have to update tons of records and tons of columns AND they are indexed, disable indexes, consider recreating the table instead of a mass update (if you are updating millions of records -- this is going to be an offline thing anyway, you might be supporting queries but you are not going to have "high concurrency")


can you spell "memory"

and think about streaming -- keeping everyone busy. Otherwise you wait a long time while Oracle gets 40k records, jams them into RAM and manages them, then you process them, and then you do it all over again. Doing things in bite-sized pieces is easier for everyone -- a couple hundred in, a couple hundred out.

Update 1000000 in sql statement

A reader, December 24, 2004 - 1:24 am UTC

Thanks, Tom!
What is the message ORA-1555? I don't have a guide.
And I really can't find a way to commit every n rows with a single UPDATE statement. Could you give me some idea?
About loading into a PL/SQL table, for example 80k records: if I need procedural code to update at least 1,000,000 records rather than a single UPDATE statement, why can't I avoid the PL/SQL table and do it with a regular cursor?


Tom Kyte
December 24, 2004 - 9:01 am UTC

search this site for 1555, you'll see lots and lots. check out Expert one on one Oracle -- i wrote about it lots.


YOU DO NOT WANT TO COMMIT THE ROWS INCREMENTALLY. That is my fundamental premise here.


If you have access to my book "Expert one on one Oracle" -- you can read all about it.

Another question

A reader, December 24, 2004 - 3:58 pm UTC

About loading into a PL/SQL table, for example 80k records: if I need procedural code to update at least 1,000,000 records rather than a single UPDATE statement, why can't I avoid the PL/SQL table and do it with a regular cursor?

Thank's

Tom Kyte
December 24, 2004 - 4:06 pm UTC

slow by slow is slow by slow (oops, I keep doing that, don't I -- row by row is row by row)



Sorry,misunderstanding.

A reader, December 25, 2004 - 4:21 am UTC

I didn't explain myself well.
What I meant in my question is a general question:
If I need a cursor (and not a single SQL statement) for some process that accesses a big table, why is it better to load data into a PL/SQL table and work on the PL/SQL table than to just fetch records directly from the database table?





Tom Kyte
December 25, 2004 - 9:10 am UTC

I understood, you didn't understand me.


slow by slow processing (a row at a time)

is slower than

o a single sql statement
o bulk collect and forall updates

doing things in BULK is

o faster
o uses less resources

index DDL in the midst of transactions

Nik Malenovic, December 30, 2004 - 3:36 pm UTC

Tom,

this is a followup to your March 07, 2004 followup related to DDL operations on indices whilst there are outstanding transactions in progress on the underlying table. You said

"that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it."

We just ran into a similar production problem: the system was slow because we were missing an index on a column and there were updates running full table scans (don't ask!). Since there are users banging on the darn thing, there was no way to shut down and add the index, so we had to do it on the fly. Here's what we did:

declare
    l_limit   pls_integer := 2000*10;   -- maximum number of attempts to make
    l_rindex  binary_integer;
    l_slno    binary_integer;
    l_target  binary_integer;
begin
    l_rindex := dbms_application_info.set_session_longops_nohint;
    l_target := 42;

    dbms_application_info.set_client_info('Vincent The Builder');
    dbms_application_info.set_module('rebuild_index', 'between_transactions');
    for l_attempt in 1 .. l_limit
    loop
        begin
            dbms_application_info.set_session_longops
            (
                rindex      => l_rindex,
                slno        => l_slno,
                op_name     => 'rebuilding the index',
                target      => l_target,
                context     => 0,
                sofar       => l_attempt,
                totalwork   => l_limit,
                target_desc => 'attempt',
                units       => 'attempts'
            );

            execute immediate 'create index RESPONSE_QUESTIONID_INDX on response (questionid) nologging parallel 5';
            dbms_output.put_line('SUCCESS - now rebuilding');
            exit;
        exception
            when others then
                null;   -- typically ORA-00054 (resource busy); just try again
        end;
        if l_attempt = l_limit then
            dbms_output.put_line('FAILED - there was not a phat chance to succeed');
        end if;
    end loop;
    dbms_application_info.set_module(null, null);
    dbms_application_info.set_client_info(null);
end;
/

and in another session we monitored the progress with

select sid, message from v$session_longops where time_remaining > 0;

that's all. This may be useful for somebody in a similar situation, hence the post.

thanks,


Nik


Tom Kyte
December 30, 2004 - 4:10 pm UTC

you could try (for maximum "uptime")

create index .. ONLINE;

that would

a) block and wait for outstanding transactions to complete but...
b) block new transactions from starting and
c) once it started creating -- lets the new transactions proceed and
d) at the very very end -- does a/b again to finish

reduces the amount of "downtime" by allowing people to still work in the middle (yes, they'll be full scanning, but working -- if that was something you wanted..)
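
Using the index from the follow-up above as an example, an ONLINE build would look something like this (a sketch only; the NOLOGGING and PARALLEL options are carried over from the original post, not requirements of ONLINE):

create index response_questionid_indx
    on response (questionid)
    online nologging parallel 5;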

CTAS doesn't have compute statistics?

A reader, January 06, 2005 - 11:15 am UTC

create index .... compute statistics;

Why doesn't create table have a similar compute statistics clause?

I have a CTAS that takes 5 hours during which Oracle already knows everything about the data (similar to create index). Gathering stats on this table later takes another 5 hours!

Is there a way to gather stats during the CTAS itself?

Thanks



Tom Kyte
January 06, 2005 - 11:31 am UTC

Oracle does not already know about everything.

Have you traced a dbms_stats.gather_table_stats to see how many queries, in how many different ways it gathers stuff?

about the only thing Oracle knows after a CTAS is number of rows and number of blocks (and so do you, you can set that)

there is no way it was computing histograms and such


I don't believe it takes 5 hours after the fact -- if it does, have you considered "estimate"

robin, January 18, 2005 - 8:53 am UTC

I am trying to modify your code from above:

/**********************************************/
set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )
/***************************************************/
to allow me to update one column based on several different where clauses, but I am having no luck.

What I want to do is:

update table t1

set t1.c1 = 'X' where t1.c5 and t1.c6 are null

set t1.c1 = 'A' where t1.c4 is not null and t1.c8 > sysdate

set t1.c1 = 'B' where t1.c4 is null and t1.c8 < sysdate

etc....

I've tried a case statement and decode like above, but neither seems to compile....thanks

Tom Kyte
January 18, 2005 - 10:07 am UTC

well, in answer to the "what I want to do is"


update t1
set t1.c1 = (case when c5 is null and c6 is null
                  then 'X'
                  when c4 is not null and c8 > sysdate
                  then 'A'
                  when c4 is null and c8 < sysdate
                  then 'B'
             end )
where (c5 is null and c6 is null)
   or (c4 is not null and c8 > sysdate)
   or (c4 is null and c8 < sysdate)

I cannot comment on the other one -- well, because there isn't anything to comment on :)

updating large table

Tracy, January 20, 2005 - 9:50 am UTC

I have table_a with 50 million rows and table_b with 10 million rows.

table_a has a new column added called statid

Table_b has statid and tabaid. tabaid has a unique constraint on it and it is also a foreign key to the primary key of table_a (id)

I want to update all rows in table_a to set the statid column to match the statid column from table_b where the tabaid column in table_b = the primary key in table_a

which of the following two do you think is better, and is there some other altogether different way which would be better still?

UPDATE
(SELECT a.statid newval
,b.statid oldval
FROM table_a a
,table_b b
WHERE a.id=b.tabaid)
SET newval = oldval

or

UPDATE table_a a SET a.statid =
(SELECT b.statid
FROM table_b b
WHERE a.id=b.tabaid)

I've tested the above on a subset of the data using rownum, and both statements seem to update at a rate of about 300,000 rows per hour, but I would like to get it going faster than this.

Tom Kyte
January 20, 2005 - 10:46 am UTC

update the join.

expect to see a nice big juicy full scan and hash join. it'll be as good as it gets.

I have a similar situation where I must update millions of records fast

Ryan, February 09, 2005 - 10:58 pm UTC

First I want to say thank you very much for all your help this last week Tom, it has helped me out a lot. Unfortunately Oracle is not easy so I have another question.

I am updating millions of records using insert statements.
I am updating the records using insert with values. I'm not updating one table using another table.

example:
insert into dbname (FirstName, Number) values ('Ryan', 1);

called with different records millions of times.

When I call "commit" at the end of all the insert's it takes more than 17 hours to commit. Of course this was importing records from a 1.5 gigabyte text file with 83000 records. The time to call all insert statements was less than a half an hour. Can you think of any reasons why the commit would be so slow. There is another program at the office that can import the same data (in another database format) in less than 20 minutes and that program is calling insert and commit in a similar fashion so it can't be that oracle is just that slow. I'm also disabling all triggers at the beginning of the import, and enabling them at the end. I'm turning off logging and using /*+ APPEND */ and parallel hints but they probably don't help as I'm not using direct load inserts.

Any help would be greatly appreciated.


Tom Kyte
February 10, 2005 - 1:07 am UTC

<quote>
I am updating millions of records using insert statements.
</quote>

really?

sorry -- but I don't believe your comment about commit; you are mistaken.

committing is fast, regardless of the number of records. Your method of observation is somehow "flawed".


Unless of course you have an "on commit refresh" set of mv's or something.


So, turn on 10046 level 12 trace and trace it, use tkprof to report on it and see what you see.
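
For reference, one common way to do that from SQL*Plus is sketched below (the trace file name is a placeholder -- the real one will be in user_dump_dest):

alter session set events '10046 trace name context forever, level 12';

-- run the inserts and the commit you want to measure here

alter session set events '10046 trace name context off';

-- then format the trace file on the server, for example:
-- tkprof ora_12345.trc report.txt sys=no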

Query retrieval is slow and it affects the whole system

Praveen, February 10, 2005 - 3:41 pm UTC

Hi Tom,
When a developer uses an escape sequence in the selection criteria, the response time slows to about 15 seconds, and this delay affects the whole system. Kindly suggest the appropriate way to resolve this.
select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/UNDER~_BAR/%' escape '~')
and (lower(r_object_id) <> lower('f004000000010471')))
takes about 15 secs to run

select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/NOBAR/%')
and (lower(r_object_id) <> lower('f004000000010471')))
takes less then one second.
Perhaps this is helpful ... if I add the escape '~' specifier to this query it also runs slowly even though it doesn't use the escape character.
regards
PKR

Tom Kyte
February 11, 2005 - 7:38 pm UTC

so, why do you believe this has anything to do with escape?


(you do understand these two queries return totally different result sets and perhaps that -- that is the "difference")

have you tkprof'ed these queries and compared them?

RE : Row Migration

A reader, February 17, 2005 - 3:35 pm UTC

Hi Tom,

You mentioned in one of the above discussion threads that while updating a null-column to a not null value we have to make sure that we have enough PCTFREE and no row migration happens.
I have a table that gets 2 million rows on a daily basis and I want to update say a column for only 50000 rows.

1. What should be my approach
A. if the column to update is indexed.
B. if the column to update is not indexed
C. How do I find out row migration that may happen
during the update

Thanks

Tom Kyte
February 17, 2005 - 6:58 pm UTC

regardless of whether the column is indexed or not -- the approach would be "the same".  the index will make the update take longer as it does lots more work, but for just 50,000 rows -- it won't be a very big deal.

2,000,000 and 50,000 are only big if it is US dollars and it was your bonus and paycheck.  In a database, in 2005, they are very small numbers.

ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> update big_table set owner = owner||'x' where mod(id,40) = 0;
 
50000 rows updated.
 
Elapsed: 00:00:18.02
 
ops$tkyte@ORA9IR2> create index big_table_idx on big_table(owner);
 
Index created.
 
Elapsed: 00:00:27.84
ops$tkyte@ORA9IR2> update big_table set owner = owner||'y' where mod(id,40) = 0;
 
50000 rows updated.
 
Elapsed: 00:00:28.04
ops$tkyte@ORA9IR2> select count(*) from big_table;
 
  COUNT(*)
----------
   2000000
 
Elapsed: 00:00:03.02
ops$tkyte@ORA9IR2>


to discover if rows migrated, you would analyze the table (perhaps using the list chained rows, if you just analyze it, you'll get chain_cnt filled in the dictionary)

you cannot find out "before" the update what will happen. 
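
A quick sketch of both options, assuming a hypothetical table BIG_TABLE and assuming the CHAINED_ROWS table has already been created with the utlchain.sql script:

analyze table big_table list chained rows into chained_rows;

select count(*) from chained_rows;

-- or simply analyze and then read the dictionary:
analyze table big_table compute statistics;

select chain_cnt
  from user_tables
 where table_name = 'BIG_TABLE';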

RE : Update

A reader, February 18, 2005 - 8:59 am UTC

Tom,

I think I did not make my last statement regarding the UPDATE of 50000+ rows clear. Here is my requirement:

1. I have a flat-file coming in to our system around 10.00 AM in the morning with over 2 million records and has the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS
....
....
2. I have another flat file (a delta file containing only data that has changed) coming in on a daily-basis around 2.00 AM in the morning with around 50000+ records and has the following fields :
ACCOUNT_NUMBER
SECURITY_ID
.....
.....
which we directly load into a table B with the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS <== which is set to NULL during this load
.....
.....
3. Once this file is loaded, we run a process which brings over the unmodified data (from the prior run, i.e. data which did not appear in the file in step 2), so that table B always has the full load, which is the 2 million records I was talking about.
4. We need to run a process that will update the COST_BASIS field only for the modified data (i.e. data that appeared in today's delta load, which will be the 50000+ records or so that I need to update. For data brought over from the previous run, we use the prior run's value itself.)

In order to do this, I thought of using the 2-million-row file as an external table (say table A), joining it with B, and updating COST_BASIS from A to B for the modified records. We have the logic to identify the changed records. But my concern is that table A would have to be full-scanned for the 50000+ records from B, and no indexes can be created on A.
I kindly need your suggestion as to what would be the best approach in tackling this problem.

Tom Kyte
February 18, 2005 - 9:24 am UTC

why would A be full scanned more than once in a join?

RE : Full Scan

A reader, February 18, 2005 - 10:13 am UTC

Tom,

This is because I thought that if we join the two tables, it would do a nested loop join. For each row fetched from table B, it would have to full-scan A to get the matching rows, and B (the delta table) would be my driving table (since it has fewer rows to start with).

If my understanding is not correct, please advise.

Tom Kyte
February 18, 2005 - 1:59 pm UTC

hash join it would do in light of no indexes.

hash join it would do even in the presence of indexes because the CBO is smart and knows indexes are evil when most or all of the rows are going to be processed.

it would be a nice big juicy fast hash join.

Use MERGE command to combine steps 3 and 4

Logan Palanisamy, February 18, 2005 - 12:51 pm UTC

Combine steps 3 and 4 with the merge command. Your problem is solved. The External table with 2 million records will be scanned just once.

MERGE INTO B
USING A
ON (B.ACCOUNT_NUMBER = A.ACCOUNT_NUMBER)
WHEN MATCHED THEN UPDATE
SET B.COST_BASIS = A.COST_BASIS
WHEN NOT MATCHED THEN INSERT (B.ACCOUNT_NUMBER, B.SECURITY_ID, B.COST_BASIS, ..) VALUES (A.ACCOUNT_NUMBER, A.SECURITY_ID, A.COST_BASIS, ..)

"A" is your external table with 2 million records.
"B" is the one that initally contains 50,000+ records. After the merge, B will have 2 million records.

The INSERT clause is step number 3
The UPDATE clause is step number 4

Hope that helps


Delete Record !!!

Anupam, February 28, 2005 - 6:49 am UTC

What is the difference between:

1. DELETE <table> TRUNCATE WHERE <clause>;
2. DELETE <table> WHERE <clause>;

?? Both work!!!

Tom Kyte
February 28, 2005 - 8:01 am UTC

just like

ops$tkyte@ORA9IR2> delete t this_is_a_correlation_name where 1=1;
 
27794 rows deleted.


works -- truncate is just being used as a correlation name there (a table alias) 


Deleted message

Franc Uwa, March 03, 2005 - 10:11 am UTC

the deleted was of today and trying to see if you can help me to resend those deleted message of me back to my inbox

Tom Kyte
March 03, 2005 - 10:28 am UTC

huh?

Brad Doczkat, March 10, 2005 - 10:51 am UTC

What do you do when the resident DBAs do not approve of the use of this method? How do I convince them that this approach is much more streamlined and requires far less processing time than updating millions of rows the old-fashioned way? For the particular need that brought me to this thread, I only have to do a one-time initialization of a one-character column that I want to put a NOT NULL (with default) constraint on. There are 5.5 million rows in the table. I need to do this once as an implementation step, yet the DBAs do not seem to buy into this approach. They would rather update so many rows a night, over the course of several nights, doing frequent commits during the update process.

Tom Kyte
March 10, 2005 - 7:11 pm UTC

tell the dba's to do the work then?

Sorry -- I just don't know how to make your dba's do something they don't want to.

Help Please I don't Understand

A Reader, March 11, 2005 - 4:20 pm UTC

Tom:

I have a project due tomorrow that I am getting paid a boat load of money for and do not have a clue... Could you write it for me without being compensated? Oh, and answer my interview questions for me as well.

Just joking ;)

As always you have the best info on your site (dare we say better than meta link?).

However I have made it through two threads in just over 9 hours. Have you ever thought of hiring a few people and extracting: best part of thread; lessons learned; sample code (I still have not found the latest and greatest 10g show_space); and a glossary for items like "CTAS", "(sic)", "SLA", "FBI"? Perhaps make a book out of it and sell it, but it would have to be a searchable electronic copy. (Hmm, WIKI enable a portion of asktom (naa)?)

In this thread alone I found: q'| quoted strings |'; CTAS for speed; I really need to start using hints; I need to look into Analytics; I have to check if my DW really needs partitions...

Thanks for everything.
A Reader.

(Guess response: This is a feature request... Denied :)


Tom Kyte
March 12, 2005 - 9:58 am UTC

I do that -- it is called "a book" :)

Effective Oracle by Design -- best of asktom, with scripts as of the time of the book.

Expert One on One Oracle -- what you need to know, with scripts -- second edition under construction (in fact I'm going to take April "off" -- from asktom -- in order to just get it going)

Expert One on One Oracle second will be the 10g version of what you are looking for.

Alberto Dell'Era, March 12, 2005 - 10:16 am UTC

> best of asktom, with scripts
> as of the time of the book.

Yes, but that doesn't mean that the books are just a collection of q&a from asktom (as I've found many people believe); they are "regular" books - the q&a are mainly used as "Customer-driven Topic Selection", so to speak...

Tom Kyte
March 12, 2005 - 10:20 am UTC

correct -- definitely not a collection of asktom articles, that would be disjointed at best. Rather, a flow -- a story -- with examples from the site to help it along. The things I pick to write about are driven very much by the questions and trends I see here.

A reader, March 13, 2005 - 2:45 am UTC

Hi Tom,
Regarding...
"in fact I'm going to take April "off" -- from asktom "

What will happen to desperate people who are looking for answers supported by technical reasons? I am sure others will agree that this is one of the greatest sources of timely and accurate Oracle help one can get...

Tom Kyte
March 13, 2005 - 9:54 am UTC

start here:
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

and try:

groups.google.com -> comp.databases.oracle.*

the place where I got hooked on discussing Oracle in the first place way back in October of 1994.  You might get an answer from Jonathan Lewis, Connor McDonald, or a host of others.  I still lurk there from time to time.


otn.oracle.com -> discussion forums

metalink.oracle.com -> discussion forums

the oracle-l list
http://www.freelists.org/archives/oracle-l/
Where Cary Millsap or Steve Adams or Tim Gorman to name but a very few might answer your question.


You might even find yourself answering some from time to time!

dblink

shiva, March 23, 2005 - 2:49 am UTC

We are trying to access a stored procedure that exists in another database through a dblink. In that stored procedure there is one update stat.. and because of that the screen hangs.

Regards,
Shivachandramouli.R


Tom Kyte
March 23, 2005 - 8:46 am UTC

so little data... 'one update stat' what is that. and what have you done to diagnose the issue?

questions in the dependent objects after

A reader, March 27, 2005 - 10:03 pm UTC


Hi Tom,

As you suggested, creating a new table, dropping the original table and renaming
sounds like a great method -- update the table with CTAS, drop the original table and then rename the new table to the original name.
Question: all the dependent objects such as packages, views, synonyms, etc. will become invalid. Do these need to be recompiled? I know all the grants have to be re-granted.

Thanks.

Tom Kyte
March 28, 2005 - 7:18 am UTC

dependent objects will take care of themselves next time they are executed.



A reader, March 27, 2005 - 10:48 pm UTC

tom,
you said:

"
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.
"

what's the "degree of threads"?

Tom Kyte
March 28, 2005 - 7:29 am UTC

"degree of threads" is number of concurrent job queue processes, how many dbms_job jobs can be running simultaneously.

commits or not

A reader, April 01, 2005 - 5:09 pm UTC

I want to insert 30 million records into a dummy table with nologging, as you advised:

insert /*+ append parallel( ) */ into ...
select * from ...

If I have large enough rollback segments, can it complete without commits during the insert (for example, committing every one million inserts)? Is it faster without the commits?

Thanks a lot

Tom Kyte
April 01, 2005 - 6:12 pm UTC

to commit or not. Everytime you commit, you will wait, wait for log file sync.

Every time you commit, you say "i'm done". That's ok, if you are restartable!

However, with a big parallel direct path insert it'll be all or nothing (no point in doing N rows and then running the query all over again). Each parallel process will utilize its own rollback segment, and if you have no indexes, a direct path insert isn't going to generate MUCH undo at all (it writes above the HWM and doesn't need to generate UNDO information).
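
As a side note, here is a sketch of what the session setup for a parallel direct path insert typically looks like (the table names and the degree of 8 are placeholders, not recommendations):

alter session enable parallel dml;

insert /*+ append parallel(t,8) */ into big_table t
select /*+ parallel(s,8) */ *
  from source_table s;

commit;   -- the newly loaded data cannot even be queried in this session until the direct path insert is committed

alter session disable parallel dml;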


Let's Tell the CBO About Our Great Idea!

Doug Winage, April 11, 2005 - 7:28 pm UTC

Given that it seems there are clear cases in which making and loading a new table is preferable to updating in place, I wonder if it would be just as well to make such an approach available to the CBO as a candidate execution plan. I'll grant that one wouldn't want to leave the choice entirely up to the optimizer; the safest thing would be to allow enabling and disabling of the plan option at the system level with a parameter called, say, "ctas_for_updates_enabled." One could also provide a pair of hints, say, /*+ USE_CTAS */ and /*+ NO_CTAS */, to allow one to override the CBO, if need be, and expressly demand or prohibit a plan involving recreation of the table.

Tom Kyte
April 11, 2005 - 8:23 pm UTC

all really good ideas need to go via support and get filed....

CTAS instead of Deletes

GG, April 11, 2005 - 11:02 pm UTC

Tom,
Please, this is a relevant follow up to this thread.
I have also included a test case.
Instead of deleting rows from a table with 270 Million records, I have used CTAS. The delete criteria is based on a join of two fields from two tables. The delete occurs only on one table (ORDERS). The other table is INVOICE.
If DOC_NUMBER and ORD_ITEM values from both tables match, then delete the matching records from ORDERS.

Test case:

drop table orders;
create table orders (
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,PROMOTION varchar2(10) not null);


drop table invoice;
create table invoice(
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,OTHER_INFO varchar2(10) not null);


begin
for i in 1..10 loop
insert into orders values(i,i+1,'ORDERS');
end loop;
commit;
end;
/


select * from orders;

DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
2 3 ORDERS
3 4 ORDERS
4 5 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS


begin
for i in 1..5 loop
if mod(i,2) = 0 then
insert into invoice values(i,i+1,'INVOICE');
end if;
end loop;
commit;
end;
/


select * from invoice order by 1;

DOC_NUMBER S_ORD_ITEM OTHER_INFO
---------- ---------- ----------
2 3 INVOICE
4 5 INVOICE

The above two records should be deleted from ORDERS:

drop TABLE ORDERS_NEW;
CREATE TABLE ORDERS_NEW
NOLOGGING
AS
SELECT /*+ parallel(ORDERS,4) */ *
from orders ord
where NOT EXISTS
(select null from invoice inv
where inv.DOC_NUMBER = ord.DOC_NUMBER and
inv.S_ORD_ITEM = ord.S_ORD_ITEM);

select * from orders_new order by 1;

DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
3 4 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS

IS THIS THE CORRECT WAY OF DOING IT? ESPECIALLY THE SQL WITH "NOT EXISTS"?



Tom Kyte
April 12, 2005 - 7:07 am UTC

hints need to use the correlation (alias) names....

I would just use NOT IN, but NOT EXISTS with the CBO will be treated as NOT IN conceptually, so it is just harder to code but has the same net effect.

See the concepts guide for details on parallel CTAS.
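
In other words, since the CTAS query aliases ORDERS as ORD, the hint has to reference that alias -- roughly like this (a sketch only, based on the SQL posted above):

CREATE TABLE ORDERS_NEW
NOLOGGING
AS
SELECT /*+ parallel(ord,4) */ *
  from orders ord
 where NOT EXISTS
       (select null
          from invoice inv
         where inv.DOC_NUMBER = ord.DOC_NUMBER
           and inv.S_ORD_ITEM = ord.S_ORD_ITEM);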

Update millions of records

Marsboy, April 12, 2005 - 1:33 pm UTC

Hi Tom,
It helped me so much that a day's work can now be accomplished in minutes.




update partition

phil, April 14, 2005 - 2:33 am UTC

I am sorry for posting a similar question in another thread. The follow-up should have been here!

I am getting a little confused with the technique of updating lots of rows in a partitioned table.

I have a table partitioned by month. To do an update on a column you suggest doing a create table as select. This of course will create a new table with the updated column and the data from the table to be updated.

What I do not understand is how to handle each partition. My new table has no partitions, just the valid data. How do I now get the new table into the structure of the old one, with all the partitions?

I am sure the answer is here, I just cannot see it !

regards

Tom Kyte
April 14, 2005 - 7:43 am UTC

to do a huge bulk update of most of the rows, using a create table as select could be a way to go. I followed up on the other thread.

Partitioned tables

Greg, April 14, 2005 - 7:54 am UTC

a previous poster was asking about partitioned tables ..

[quote]
I am getting a little confused with the technique of updating lots of rows in a partitioned table.
[/quote]

Feel free to ignore this question if you answered it already in the other followup (I wasn't sure which thread you posted it in, and can't find it .. *sigh*)

Anyway ... if the table is partitioned, can you not just deal with each partition separately?

That is: if you have 15 partitions (i.e. P01 through P15 ...), and your "update" only encompassed P03 and P04 ... then could you not just "re-create" those partitions in the same way you "re-create" the table .. then just "stuff" the partition(s) back in?

I can't remember all the syntax to pull this little "card-shuffle" off ... but I'm pretty sure it can be done? can't it?


Tom Kyte
April 14, 2005 - 8:35 am UTC

sorry, the other thread is
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:7143624535091


Yes, if the table is partitioned, you can. (and that is what I was trying to describe on the other page, I just didn't have sufficient detail the first time around to make it clear)
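
For what it's worth, here is a minimal sketch of that partition "card shuffle" (the table T, partition P03 and scratch table T_P03 are made-up names): build a scratch table containing the "updated" rows of one partition, index it to match the local indexes, and then swap it in:

create table t_p03 nologging
as
select /* do the "update" here */ *
  from t partition (p03);

-- create indexes on t_p03 matching T's local indexes, then:

alter table t
  exchange partition p03 with table t_p03
  including indexes
  without validation;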

A reader, April 21, 2005 - 12:12 pm UTC

"
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;


Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.

"
will the dbms job only be executed once?

TIA

Tom Kyte
April 22, 2005 - 9:02 am UTC

yes, well, it will execute successfully ONCE.

It might fail over and over and over and over, then run successfully and go away.

How to Update millions or records in a table?

S.Darshan, May 17, 2005 - 7:48 am UTC

I have to update one table which has 10 million records. The regular rollback segment won't be enough to hold this. In this case, how can we predict how much rollback area will be used by this update?

Initially I thought of your approach "CREATE TABLE new_table as select <do the update "here"> from old_table;"

But I have to update only about 30% of the columns; the other 70% should not be touched. Also, this table has a lot of foreign key constraints. Is there any alternative method? Thanks in advance.

With best regards
S.Darshan

Tom Kyte
May 17, 2005 - 9:16 am UTC

if I had to update 1/3 of a 10,000,000 row table -- that is the perfect candidate for create table as select. foreign keys or not (you can always disable, ctas and for the columns you have not updated -- enable novalidate, just put them back on)

but, if you cannot do that, disable indexes on columns to be updated and update and then rebuild the indexes. to get a grasp on "how much undo", create a copy of the first 100,000 rows of data for example, update it and see how much undo you generated.
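
A rough sketch of that sizing test, reusing the v$transaction query shown earlier in this thread (the table and column names are placeholders, and an 8k block size is assumed):

create table t_sample as
select * from big_table where rownum <= 100000;

update t_sample set some_col = 'new value';

-- before committing, in the same session (assuming no other open transactions):
select used_ublk * 8 / 1024 undo_mb
  from v$transaction;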




Parallel DML

Tamilselvan, May 18, 2005 - 10:05 am UTC

I agree with what Tom said.

However, in many real-life situations CTAS may not be executed as we think. For example, if the table has a LONG column, then CTAS will not work.

Consider Parallel DML. I have successfully implemented parallel dml in a large DB.

Here is the script:

alter table scott.table_a parallel 8;
alter index scott.table_a_p1 parallel 8;
alter table scott.tamil_temp_table_b parallel 8;
alter index scott.tamil_temp_table_b_idx parallel 8;

commit;
alter session enable parallel dml ;
alter session enable parallel query ;

update /*+ parallel(a,8) */ scott.table_a a
set a.pr_addr_id = ( select addr_per_id
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID)
where exists ( select null
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID) ;
commit;
alter session disable parallel dml ;
alter session disable parallel query ;
alter table scott.table_a noparallel ;
alter index scott.table_a_p1 noparallel ;

Thanks
Tamil

Tom Kyte
May 18, 2005 - 11:20 am UTC

<quote>
if the table has a LONG column, then CTAS will not work.
</quote>

convert long to clob and be done with it.

Excellent Explanation

A reader, May 18, 2005 - 11:18 pm UTC

This is very helpful, Tom. I have a question about an issue I am facing while using the dbstart utility -- where can I ask that question?

Tom Kyte
May 19, 2005 - 7:39 am UTC

suggest you don't use dbstart but rather write your own rc.d script to startup/shutdown that which you want.

What I do is like this:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1302619965694#12118047114111



How to Update millions or records in a table

Debasish, June 08, 2005 - 5:29 am UTC

You said to create a table with the update command, but I have a problem with that (in 8i).
One of my master tables contains the following fields (approx 500000 records):

1. ID varchar2
2. Pincode number
3. Tag char

Another three tables -- bankmaster1, bankmaster2, bankmaster3 -- have the fields:
1. pincode number
2. branch varchar2

I have to update the tag column in the master table according to the pincode distribution in the bankmaster tables: tag E for bank1, tag C for bank2, tag V for bank3.
When I try it with an update command it works for a small table, but not for this 500000-record table.

Please let me know.


Tom Kyte
June 08, 2005 - 9:04 am UTC

I told you to create a table with the update command?  Where and when -- I'll need to correct that if true, for you cannot create a table with the update command.


It works for me for a 50,000,000 row table or a 5 row table.  You'll have to tell us why it isn't "working" (and hopefully it is not using a single index)


ops$tkyte@ORA10G> create table t
  2  as
  3  select id, id pincode, 'x' tag
  4    from big_table.big_table
  5   where rownum <= 1000000;
 
Table created.
 
Elapsed: 00:00:01.36
ops$tkyte@ORA10G>
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> create table t1 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 0;
 
Table created.
 
Elapsed: 00:00:01.60
ops$tkyte@ORA10G> create table t2 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 1;
 
Table created.
 
Elapsed: 00:00:01.54
ops$tkyte@ORA10G> create table t3 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 2;
 
Table created.
 
Elapsed: 00:00:01.49
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.13
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.07
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T3', method_opt=> 'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.13
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set timing on
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t1)) set tag = 'E';
 
333333 rows updated.
 
Elapsed: 00:00:06.13
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t2)) set tag = 'C';
 
333334 rows updated.
 
Elapsed: 00:00:06.48
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t3)) set tag = 'V';
 
333333 rows updated.
 
Elapsed: 00:00:06.40
ops$tkyte@ORA10G>


Only takes a couple of seconds. 

A reader, June 10, 2005 - 8:42 am UTC

Tom,
I have this query in which I'm trying to update the old POC_ID with the new POC_ID. However, it doesn't produce the results I would expect. The table does not get updated with the correct POC_ID. What am I missing here?

begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;

Thanks,
Randy

Tom Kyte
June 10, 2005 - 10:34 am UTC

need example.

but that looks like it should be at most a SINGLE sql statement, no code whatsoever...

A reader, June 10, 2005 - 10:59 am UTC

I have a table WSAR_ARCHIVE_CS which contains a column POC_ID. POC_ID is FK to the ADAS_POC POC_ID PK. I want to update that column with the new POC_ID which is stored in WSAR_POC POC_ID PK.

Thanks,
Randy

Tom Kyte
June 10, 2005 - 11:10 am UTC

create table .....
create table ....
create table ....
insert some data...
insert some more data...

followed by "and this is what I need to do..."

A reader, June 10, 2005 - 11:28 am UTC

create table wsar_poc
(POC_ID              NUMBER NOT NULL,
 POC_RESPONSIBLE_IND VARCHAR2(50),
 DIVISION_ID         NUMBER NOT NULL,
 OUSER               VARCHAR2(30),
 ACCESS_ROLE         VARCHAR2(50),
 PHONE_NUMBER        VARCHAR2(20),
 EMAIL               VARCHAR2(200));


Create table wsar_archive_cs
(FILE_ID       NUMBER NOT NULL,
 ONELINER_DATE DATE,
 PRODUCT       VARCHAR2(50),
 COMPONENT     VARCHAR2(400),
 PRIORITY      VARCHAR2(50),
 POC_ID        NUMBER,
 SUBJECT       VARCHAR2(300),
 ISSUE         VARCHAR2(4000));

I have the table adas_poc which contains the old poc_id that is in the WSAR_ARCHIVE_CS table.

desc adas_poc
POC_ID NOT NULL NUMBER,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NOT NULL NUMBER,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200)

I tried to update using the cursor for loop, but no luck. The column does not get updated with the correct ID.

begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;

Thanks,
Randy

Tom Kyte
June 10, 2005 - 3:33 pm UTC

oh, so close -- you almost had inserts there with data to show

a) what happened
b) explain what you expected

That will definitely join wsap_poc to adas_poc by poc_responsible_ind

it will definitely then update wsar_archive_cs

you need to explain why this logic isn't working, it is rather straight forward. Demonstrate for us.

RE: Parallel Import

Bakunian, June 20, 2005 - 6:37 pm UTC

Tom,

I created a range-partitioned table with parallelism enabled, then used the import utility to load 5GB of data into the table; however, I don't see parallel processes spawned during the load.
Shouldn't import take advantage of this functionality?
I am using 9.2 on Solaris 8.

These are instance parameters.

Thank you for your time.

NAME_COL_PLUS_SHOW_PARAM TYPE VALUE_COL_
--------------------------------------------- ----------- ----------
fast_start_parallel_rollback string LOW
log_parallelism integer 4
parallel_adaptive_multi_user boolean TRUE
parallel_automatic_tuning boolean TRUE
parallel_execution_message_size integer 4096
parallel_instance_group string
parallel_max_servers integer 40
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean FALSE
parallel_server_instances integer 1
parallel_threads_per_cpu integer 2
recovery_parallelism integer 0

Tom Kyte
June 21, 2005 - 7:51 am UTC

imp is a serial process, always has been - it reads a single DMP file sequentially and loads it.

10g with impdp (Data Pump) has a parallel capability.
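
For example, with Data Pump the dump can be written and loaded in parallel pieces (the directory, credentials, file and table names below are placeholders):

expdp scott/tiger directory=dp_dir dumpfile=exp%U.dmp parallel=4 tables=big_table

impdp scott/tiger directory=dp_dir dumpfile=exp%U.dmp parallel=4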

Tom u r great

shyam, June 30, 2005 - 3:08 am UTC

I have a quick question. My statistics while loading data are as below:

Datafile                               Blocks_Read  Blocks_Written  Total_IOs
-------------------------------------  -----------  --------------  ---------
/oracle/u03/oradata/dev/data_01.dbf          26507            5240      31747
/oracle/u03/oradata/dev/data_09.dbf          26823            2823      29646
/oracle/u06/oradata/dev/data_08.dbf          24978            3704      28682
/oracle/u04/oradata/dev/data_10.dbf          26288            1831      28119
/oracle/u04/oradata/dev/data_02.dbf          24053            3634      27687
/oracle/u05/oradata/dev/data_11.dbf          26155            1284      27439
/oracle/u05/oradata/dev/data_07.dbf          23125            2865      25990
/oracle/u04/oradata/dev/data_06.dbf          23558            1978      25536
/oracle/u06/oradata/dev/data_25.dbf          21672            2234      23906
/oracle/u05/oradata/dev/data_03.dbf          21299            2574      23873
/oracle/u03/oradata/dev/data_05.dbf          20324            1819      22143
/oracle/u06/oradata/dev/data_04.dbf          19673            1746      21419
/oracle/u06/oradata/dev/data_12.dbf          19574            1306      20880
/oracle/u04/oradata/dev/data_14.dbf          19309             755      20064
/oracle/u03/oradata/dev/data_13.dbf          18807             848      19655
/oracle/u05/oradata/dev/data_24.dbf          17656            1316      18972
/oracle/u06/oradata/dev/data_16.dbf          16748            1730      18478
/oracle/u05/oradata/dev/data_15.dbf          17006             810      17816
/oracle/u03/oradata/dev/data_17.dbf          14810            1350      16160
/oracle/u05/oradata/dev/data_19.dbf          13552            1157      14709
/oracle/u03/oradata/dev/data_21.dbf          13622             911      14533
/oracle/u05/oradata/dev/data_23.dbf          13366             878      14244
/oracle/u04/oradata/dev/data_18.dbf          12644             908      13552
/oracle/u04/oradata/dev/data_22.dbf          12521             741      13262
/oracle/u06/oradata/dev/data_20.dbf          12137             546      12683

Tom, u can observe that total reads are much higher than total writes, and as a result the performance is decreasing. Let me know how I should increase the blocks written; because of this my data loading is slow.

Tom Kyte
June 30, 2005 - 9:29 am UTC

who is "u" and why is "u" observing things?


how do you know that reads are the cause of performance decreasing. All I see are a bunch of numbers.

loading data

shyam, June 30, 2005 - 3:10 am UTC

This is the query I used to generate the statistics:

select DF.Name File_Name,
FS.Phyblkrd Blocks_Read,
FS.Phyblkwrt Blocks_Written,
FS.Phyblkrd+FS.Phyblkwrt Total_IOs
from V$FILESTAT FS, V$DATAFILE DF
where DF.File#=FS.File#
order by FS.Phyblkrd+FS.Phyblkwrt desc;


Tom Kyte
June 30, 2005 - 9:30 am UTC

so, you've just shown the number of reads done since the database was started. All they are are a bunch of numbers with no meaning.

how do you know read IO is your bottleneck? what made you decide that?

performance while loading data

shyam, June 30, 2005 - 7:54 pm UTC

datafile = /oracle/u03/oradata/dev/data_01.dbf

Blocks_Read    = 26507
Blocks_Written = 5240
Total_IOs      = 31747

Tom Kyte
June 30, 2005 - 9:09 pm UTC

so what? (he says again)

you've shown me that file has been read 26,507 times SINCE STARTUP and written to 5,240 time SINCE STARTUP

big deal, so what?

were they all during your load?
did they take A LONG TIME <<<===



Precautions to take for huge inserts

shyam, July 03, 2005 - 8:44 am UTC

Tom, can you give me some tips on how to improve the speed with which data is loading?

To improve data load speed for huge inserts: at present I have 73 tables being loaded, and it takes 60 minutes for one project.

Tom Kyte
July 03, 2005 - 10:28 am UTC

depends on your needs and circumstances. To describe them all would be a small book.

So, you'd need to be more "specific". How many new rows, how many old rows, is there lots of validation going on, is this a reload or a load, have lots of un-necessary indexes, etc etc etc

Delete & import 6M records - Data Warehouse

A reader, July 06, 2005 - 11:54 pm UTC

Tom,

We currently have v8.0.6 running a data warehouse. We have 6M rows to delete every day and at the same time import the same amount. It currently takes us 10 hours to do the deletion part. Data are imported every 5 minutes. Is there any way we can speed up the deletion and improve the import (using Oracle imp/exp)?

Thanks so much for your help

Tom Kyte
July 07, 2005 - 9:09 am UTC

insufficient data. why no partitioning? maybe a create table as select rows to keep and drop old table.

but insufficient data to answer properly.



and really old software.
and really archaic tools.

a lot has happened in the time between 8.0.6 and 10gR2 - you'd find many more options in 2005 than you did in 1997 when 8.0 was current.

My database is slow

hameed, August 06, 2005 - 6:50 pm UTC

Hi Tom,
please help me.
I have a program built with Oracle 8i and Developer 6i.
The problem:
the database is slow.

One large insert Vs multiple smaller inserts

Sami, August 14, 2005 - 10:55 am UTC

Dear Tom,

We had this issue in production during a huge data migration.

Environment:-
Oracle 8.1.7.3 Sun 2.6 - 2 Node OPS. 12 CPU on each box.
The table in question here is involved in master-to-master (M-M) Oracle replication.

There is no other activity on both "12Million_Rec_table" and "1Million_Rec_table " other than my process.

Process-1
=========
exec dbms_reputil.replication_off();
commit;
insert /*+ append parallel(t,12) */ into 12Million_Rec_table t
select /*+ parallel */ * from 1Million_Rec_table;


Process-2
=========
exec dbms_reputil.replication_off();
commit;
--for loop with diff predicate value
loop
insert /*+ append parallel(t,12) */ into 12Million_Rec_table
select /*+ parallel */ * from 1Million_Rec_table
where predicate=&1; -- there is an Index on Predicate column
end loop


12Million_Rec_table has only one primary key index. There are no other constraints or LOB columns on this table. For some reason it didn't use a direct-path insert; I don't know why.


Process-2 executed N times in a sequence. Basically, Process-1 is one large transaction compared to Process-2 (which has N transactions in a loop).

In terms of execution time, Process-1 took longer than Process-2. Process-1 was running for more than 2 hours without completing, so we stopped it. Process-2, with its smaller chunks of records, completed very fast.


When Process-1 started, the USED_UBLK value in v$rollstat increased rapidly and then slowed down gradually. I have seen this behaviour many times while running Process-1.

My questions are

1) Why is Process-1 slower than Process-2? [We didn't have any rollback segment issues while running either Process-1 or Process-2.]

2) Is there any performance issue if the rollback segment goes beyond some size, say 200M?




Tom Kyte
August 14, 2005 - 11:49 am UTC

you did not enable parallel DML, to do a parallel direct path insert, you must.

I cannot answer these questions, insufficient data. You could have captured information while it was running.

There is a *small* chance that what happened was

a) the big insert, because of the primary key index, generated gobs of undo.
b) this caused more undo to be generated than fit in the buffer cache
c) so it had to be flushed by dbwr
d) causing buffer waits by the insert (trying to get more buffers from the cache for undo)
e) and causing IO contention on the disk

The smaller inserts *MIGHT* have

a) filled up part of the buffer cache with undo blocks
b) committed, allowing the undo blocks in the cache to be reused
c) goto A n times

avoiding the buffer waits and the flushing of the undo to disk.

However, I would not have taken your approach for a small set of one million rows. It likely would have been much faster to

a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
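
A rough sketch of those steps in SQL -- all object names here are placeholders, the exact constraint handling depends on how the primary key was created, and this ignores the replication aspect discussed below:

alter table big_table nologging;

alter table big_table disable constraint big_table_pk;   -- drops the underlying unique index

insert /*+ append */ into big_table
select * from small_table;

commit;

create unique index big_table_pk on big_table(id)
  parallel 8 nologging;

alter table big_table enable constraint big_table_pk;    -- picks up the existing unique index

alter index big_table_pk noparallel;
alter index big_table_pk logging;
alter table big_table logging;

-- and then back up the affected datafiles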



did not get rid of PK index because the table is in M-M replication

Sami, August 14, 2005 - 1:22 pm UTC

Dear Tom,

Thanks a lot for your immediate response. You are absolutely
right about why the "smaller chunk transactions" defeated "one large transaction".

<asktom>
a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million
rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
</asktom>

I agree with you 100%. I too prefer your approach [get rid of the primary key index, load, rebuild the index]. However, the table in question is involved in master-master replication.
We are not sure about getting rid of the primary key index for a table which is involved in replication. That is why we didn't take this approach.

Tom, thanks for your time and effort.

Additional Info

Sami, August 14, 2005 - 8:20 pm UTC

Our large pool is set to size 0. I am just wondering whether this value had any impact on our parallel process's poor performance.

NAME TYPE VALUE
------------------------------------ ------- ------------------------------
large_pool_size string 0





Tom Kyte
August 15, 2005 - 6:52 am UTC

are you sure it is really zero, if you are using automatic SGA memory management:

ops$xp10gr1\tkyte@ORA10GR1> show parameter large

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
__large_pool_size                    big integer 4M
large_pool_size                      big integer 0
ops$xp10gr1\tkyte@ORA10GR1>




I didn't set a large pool, but Oracle has set up a 4m one (that can grow/shrink as needed) 

Large pool size

Sami, August 15, 2005 - 1:45 pm UTC

Tom, We are still in 8.1.7.3.

Yes, I am sure the large pool size is set to zero on both instances(2 node OPS)

Tom Kyte
August 15, 2005 - 10:22 pm UTC

it isn't "impacting it" performance wise, it is using memory from elsewhere if, but the large pool would be a better place since this memory for parallel execution isn't "LRU" memory like the other stuff in the shared pool -- the large pool is designed for memory that is allocated and deallocated, unlike shared pool memory which is designed to be allocated and aged out over time.

Undo - With and without index

Sami, August 15, 2005 - 3:17 pm UTC

Session #1
=================================================================================
/*

create table t1 as select * from dba_objects;
create index t1_idx01 on t1(object_id);

*/
SQL> select distinct sid from v$mystat;

       SID
----------
       126

SQL> insert into t1 select * from dba_objects;

48601 rows created.

=================================================================================


Session #2 
=================================================================================
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4        568

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r wheres.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965AFFC          2

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4        636


SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965AFFC          2

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

no rows selected

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4        754

SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;

TADDR     USED_UBLK
-------- ----------
1965B9F4       1035

SQL>

=================================================================================

I monitored rollback activities for session #1 (while inserting rows) through session #2.

While the USED_UBLK was gradually going up for one transaction, I noticed some other
transaction (TADDR=1965AFFC) with USED_UBLK=2.
Is it because of index t1_idx01?

I didn't see this behavior when I tested the same test case without the index.



Sometimes I noticed "no rows selected" in session #2. I assume this is when the transaction is
switching between TADDR 1965AFFC and 1965B9F4?


Please correct me if I am wrong.

Tom Kyte
August 15, 2005 - 10:35 pm UTC

ddl commits, there were two transactions there

the create table and the create index.

no ddl here

Sami, August 15, 2005 - 11:04 pm UTC

Tom,

Sorry for the confusion. I shouldn't have put the create table and create index statements over there (though they were commented out).

create table and create index are outside the scope of this question. The reason I included them is that I just wanted to show you that
1) the table is created from dba_objects
2) it has an index on the object_id column


I have tested this behavior in both 8i and 10g. The following output and "no rows selected" appeared N times (more than 2) WHILE THE INSERT STATEMENT WAS GOING ON; just for clarity I truncated the output.


TADDR USED_UBLK
-------- ----------
1965AFFC 2



Tom Kyte
August 16, 2005 - 10:46 am UTC

then I have no clue what we are in fact looking at there.

I cannot even remotely reproduce - nor is this the way it works.

the used_ublks would in this case increase, until you commit, then they go away.

1 big INSERT with joins - OR - multiple UPDATES...

Sandeep, August 16, 2005 - 8:05 am UTC

Hi Tom,

I'm trying to archive data from some big tables (5 or 6 of them). Before doing that I need to "rollup" an "amount"
column within each of those big tables (based on
a "creation date" criteria). The summations of these amount will be grouped by date (monthly sum-up) and will
need to be put into a reporting table.

The reporting table will have columns like
"Month", "Sum_Amt_BigTable1","Sum_Amt_BigTable2",
"Sum_Amt_BigTable3" and the like.

Initially there was only one BigTable (say BigTable1), so the archiving script was simple.
Just one - INSERT INTO REPORT_TABLE... SELECT.. FROM BIGTABLE1.. WHERE (the date criteria)
GROUP BY (the date column).

Now since there are more "BigTables", I have two(?) options:

1. Modify the insert script such that there is JOIN
between all BigTables, say BigTable1..BigTable6 (eeps!)
on the date criteria, and just sum-up the required
columns. For the sake of simplicity, assume that all
the tables contain at least one record for each month
(so no need to outer-join!).

2. Use the original INSERT script to create the initial
records within the REPORT table... then run multiple
UPDATEs on the REPORT table - one for
each bigtable, like this (roughly!)-
UPDATE REPORT_TABLE
SET Sum_Amt_BigTable2 = (SELECT sum(amountxx)
FROM BigTable2 WHERE (the date criteria)
GROUP BY (the date column))
WHERE REPORT_TABLE.month = LAST_DAY(TO_DATE(TO_CHAR(
BigTable2.date_column,'DD-MON-YYYY'),'DD-MON-YYYY'))
;

... and so on - for each remaining BigTable..

Which option do you think will be better in terms of performance? I suspect option 2, as I'm a bit
apprehensive about joining 6 big tables for one big insert. Will be testing this out, but wanted
to have your opinion about the same.

Regards,
Sandeep

Tom Kyte
August 17, 2005 - 10:36 am UTC

#1 of course, do as much as you can in a single sql statement.

big hash area size, nice juicy full scans -- ready to rock and roll and have it done fast.
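
Something along these lines, perhaps -- a sketch only, with invented table, column and predicate names since the real schema wasn't posted: pre-aggregate each big table in an inline view and join the inline views on the month (and so on for the remaining big tables):

insert into report_table (month, sum_amt_bigtable1, sum_amt_bigtable2)
select b1.mon, b1.amt_sum, b2.amt_sum
  from (select trunc(creation_date,'MM') mon, sum(amount) amt_sum
          from bigtable1
         where creation_date < :cutoff
         group by trunc(creation_date,'MM')) b1,
       (select trunc(creation_date,'MM') mon, sum(amount) amt_sum
          from bigtable2
         where creation_date < :cutoff
         group by trunc(creation_date,'MM')) b2
 where b1.mon = b2.mon;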

big INSERT with joins - OR - multiple UPDATES

Sandeep, August 17, 2005 - 11:57 am UTC

Hi,
Was thinking you might actually suggest #2 !!
Having a big join across 6 tables with multimillion
records? hmm... not sure.. but will definitely try this
out.
Did the #2 and it works well... will try to do the
#1 and get some stats to compare... somehow joins
between 6 *huge* tables does not convince me... but...
Thanks,
Sandeep

Tom Kyte
August 17, 2005 - 1:58 pm UTC

6 tables, 60 tables -- I don't care, make SQL work for you -- NOT the other way around.


databases were BORN TO JOIN, it is what they do

Option #3?

Bob B, August 17, 2005 - 2:25 pm UTC

If there was the likelihood of an outer join in the above scenario, it might be worth using a union all approach. Something like:

SELECT
'Big Table 1' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE1
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
UNION ALL
SELECT
'Big Table 2' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE2
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
... etc

Essentially, write the report query for each table, union all them together and then summarize the union all query as necessary:

SELECT
SUM( WHATEVER_SUM ) WHATEVER_SUM,
SUM( WHATEVER2_SUM ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM (
QUERY ABOVE
)
GROUP BY PARAM1, PARAM2



With more clear test case

sami, August 17, 2005 - 9:12 pm UTC

<asktom>
then I have no clue what we are in fact looking at there.

I cannot even remotely reproduce - nor is this the way it works.  

the used_ublks would in this case increase, until you commit, then they go away. 
</asktom>


Dear Tom,

While session#1 was doing a DIRECT PATH INSERT into a table which has a PRIMARY KEY, I monitored it from SESSION#2 as below.

While USED_UBLK was gradually going up, there were 2 occurrences where it reported a different TADDR with USED_UBLK=2.

I just wanted to know what the new TADDR is.

Session#1
#############################################################################

20:37:47 SQL> select distinct sid from v$mystat;

       SID
----------
       162

20:37:50 SQL> insert /*+ append */ into t1 select * from dba_objects;

48603 rows created.

20:38:23 SQL>
#############################################################################



Session #2

#############################################################################
SQL> desc undo_monitor
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------------------

 TADDR                                              VARCHAR2(50)
 UNDO_UBLK                                          NUMBER
 TIME                                               DATE

SQL> truncate table undo_monitor;

Table truncated.

SQL> set time on
SQL>
SQL>
SQL> declare
  2  i number :=0;
  3  begin
  4  loop
  5  for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
  6  where s.taddr=t.addr
  7  and s.sid=&session_to_be_monitored)
  8  loop
  9  insert into undo_monitor values(c0rec.taddr,c0rec.used_ublk,sysdate);
 10  end loop;
 11  i := i+1;
 12  if i=100000 then
 13  exit;
 14  end if;
 15  end loop;
 16  end;
 17  /
Enter value for session_to_be_monitored: 162
old   7: and s.sid=&session_to_be_monitored)
new   7: and s.sid=162)

PL/SQL procedure successfully completed.


20:39:49 SQL> select distinct time,taddr,undo_ublk from undo_monitor  order by t
ime;

TIME      TADDR                                               UNDO_UBLK
--------- -------------------------------------------------- ----------
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C64FADC                                                    1
17-AUG-05 4C6504D4                                                    2
17-AUG-05 4C64FADC                                                    5
17-AUG-05 4C64FADC                                                    6
17-AUG-05 4C64FADC                                                    9
17-AUG-05 4C64FADC                                                   10
17-AUG-05 4C64FADC                                                   16
17-AUG-05 4C6504D4                                                    2
17-AUG-05 4C64FADC                                                   17
17-AUG-05 4C64FADC                                                   21
17-AUG-05 4C64FADC                                                   25
17-AUG-05 4C64FADC                                                   27
17-AUG-05 4C6504D4                                                    2
17-AUG-05 4C64FADC                                                   27
17-AUG-05 4C64FADC                                                   30
17-AUG-05 4C64FADC                                                   32
17-AUG-05 4C64FADC                                                   36
17-AUG-05 4C64FADC                                                   38
17-AUG-05 4C64FADC                                                   49
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52
17-AUG-05 4C64FADC                                                   52

80 rows selected.

20:39:56 SQL>

#############################################################################

 

Tom Kyte
August 18, 2005 - 3:34 pm UTC

as I said, I cannot even REMOTELY reproduce this at all.

date is only valid down to the second, perhaps -- these rows are scrambled. put I in that last column, not sysdate and see.


(select to_char(time, 'dd-mon-yyyy hh24:mi:ss' ) -- see if your times are not all in the same second there.

What if standby database were used?

Michel, August 18, 2005 - 9:25 am UTC

Hi Tom,
Replacing UPDATE with CTAS and/or INSERT /*+ append */ INTO ... is especially useful because of NOLOGGING.

Now, what should we do if we are using standby databases, so that it is not a good idea to use operations with NOLOGGING?





Tom Kyte
August 18, 2005 - 4:05 pm UTC

CTAS and /*+ append */ are done with logging enabled in ARCHIVELOG mode, unless you specify "alter <segment_name> NOLOGGING" and even then the dba can "force logging" at the database level.


LOGGING happens with this stuff!!!

Excellent

Peter, August 18, 2005 - 12:55 pm UTC

Hi Tom,

You are the best of best. I visit your site everyday.

I need your expertise in this regard too.

I have a fact table T
( order_id number,
customer_id number,
order_date date,
month_id number,
prd_id number,
prd_name varchar2(30),
quantity number,
zip number,
flag smallint,
swiprd_id number)
with about 5 million records imported every month.
NOTE: one product (prd_name) may have one or more prd_ids. (There are about 200 prd_names and 2,000 different prd_ids in this table.)

Oracle SERVER: Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production

T was partitioned by month_id.
A local index is built on T(customer_id, order_date, prd_name).

table T was analyzed using
dbms_stats.GATHER_TABLE_STATS(
OWNNAME=>'USER',
TABNAME=>'T',
ESTIMATE_PERCENT=>dbms_stats.auto_sample_size,
METHOD_OPT => 'for all indexed columns',
cascade => true,
degree => 4);


I need to know the following statistics each month:

1) How many "continue" customers there are for each prd_id (not just prd_name) in this month.
A continue customer is defined as a customer who bought a product (with a prd_id) in this month
and also bought the same product (with the same or a different prd_id) in the past 365 days,
and ((previous order_date + 30*previous quantity) >= current order_date).
SET flag = 1

2) How many "restart" customers there are for each prd_id (not just prd_name) in this month.
A restart customer is defined as a customer who is not a continue customer, bought a product (with a prd_id) in this month
and also bought the same product (with the same or a different prd_id) in the past 365 days,
and ((previous order_date + 30*previous quantity) < current order_date).

SET flag = 2

3) How many "switch" customers there are for each prd_id (not just prd_name) in this month.
A switch customer is defined as a customer who is neither a continue nor a restart customer, bought a product (with a prd_id) in this month
and also bought a different product (not just a different prd_id) in the past 365 days.
Also record which prd_id they switched from.
SET flag = 3, swiprd_id = previous prd_id

4) How many "new" customers there are for each prd_id (not prd_name) in this month.
A new customer is defined as a customer who bought a product but never bought any product in the past 365 days.
SET flag = 4

Current solution: use a procedure to process each order.

CREATE OR REPLACE PROCEDURE get_flag(v_monthid IN NUMBER)
AS
    CURSOR c1 IS
        SELECT DISTINCT prd_id
          FROM T
         WHERE month_id = v_monthid
         ORDER BY prd_id;

    CURSOR c2(p_prd_id T.prd_id%TYPE) IS
        SELECT customer_id, order_date, prd_name, rowid
          FROM T
         WHERE prd_id = p_prd_id
           AND month_id = v_monthid
         ORDER BY customer_id, order_date DESC, quantity DESC
           FOR UPDATE OF flag NOWAIT;

    -- all orders for the same customer and the same product during the lookback period
    CURSOR c3(p_customer_id T.customer_id%TYPE,
              p_prd_name    T.prd_name%TYPE,
              p_order_date  T.order_date%TYPE) IS
        SELECT order_date, quantity
          FROM T
         WHERE customer_id = p_customer_id
           AND prd_name = p_prd_name
           AND order_date BETWEEN (p_order_date - 366) AND (p_order_date - 1)
         ORDER BY order_date DESC, quantity DESC;

    -- all orders for the same customer and a different product during the lookback period
    CURSOR c4(p_customer_id T.customer_id%TYPE,
              p_prd_name    T.prd_name%TYPE,
              p_order_date  T.order_date%TYPE) IS
        SELECT prd_id
          FROM T
         WHERE customer_id = p_customer_id
           AND prd_name <> p_prd_name
           AND order_date BETWEEN (p_order_date - 366) AND (p_order_date - 1)
         ORDER BY order_date DESC, quantity DESC;

    TYPE customer_array   IS TABLE OF T.customer_id%TYPE;
    TYPE order_date_array IS TABLE OF T.order_date%TYPE;
    TYPE prd_name_array   IS TABLE OF T.prd_name%TYPE;
    TYPE rowid_array      IS TABLE OF ROWID;

    lv_customer_arr   customer_array;
    lv_order_date_arr order_date_array;
    lv_prd_name_arr   prd_name_array;
    lv_rowid_arr      rowid_array;

    v_customer_id    T.customer_id%TYPE;
    v_order_date     T.order_date%TYPE;
    v_order_date_tmp T.order_date%TYPE;
    v_prd_name       T.prd_name%TYPE;
    v_quantity       T.quantity%TYPE;
    v_flag           PLS_INTEGER;
    v_swiprd_id      T.swiprd_id%TYPE;

BEGIN
    FOR lv_prd IN c1
    LOOP
        v_customer_id := 0;
        OPEN c2(lv_prd.prd_id);
        LOOP
            FETCH c2 BULK COLLECT
             INTO lv_customer_arr, lv_order_date_arr, lv_prd_name_arr, lv_rowid_arr
            LIMIT 100;

            FOR i IN 1 .. lv_customer_arr.COUNT
            LOOP
                -- only classify the first (latest) order per customer
                IF v_customer_id <> lv_customer_arr(i) THEN
                    v_customer_id := lv_customer_arr(i);
                    v_order_date  := lv_order_date_arr(i);
                    v_prd_name    := lv_prd_name_arr(i);
                    v_flag        := 0;
                    v_swiprd_id   := 0;

                    OPEN c3(v_customer_id, v_prd_name, v_order_date);
                    FETCH c3 INTO v_order_date_tmp, v_quantity;
                    IF c3%NOTFOUND THEN
                        -- did not buy the same product during the lookback period
                        OPEN c4(v_customer_id, v_prd_name, v_order_date);
                        FETCH c4 INTO v_swiprd_id;
                        IF c4%NOTFOUND THEN
                            v_flag := 4;    -- new customer
                        ELSE
                            v_flag := 3;    -- switch customer
                        END IF;
                        CLOSE c4;
                    ELSE
                        -- bought the same product during the lookback period
                        IF (v_order_date_tmp + 30 * v_quantity) >= v_order_date THEN
                            v_flag := 1;    -- continue customer
                        ELSE
                            v_flag := 2;    -- restart customer
                        END IF;
                    END IF;
                    CLOSE c3;

                    -- update flag and swiprd_id in table T (by rowid, since the rows were
                    -- bulk fetched and WHERE CURRENT OF would only see the last row fetched)
                    IF v_flag = 3 THEN
                        UPDATE T
                           SET flag = v_flag,
                               swiprd_id = v_swiprd_id
                         WHERE rowid = lv_rowid_arr(i);
                    ELSE
                        UPDATE T
                           SET flag = v_flag
                         WHERE rowid = lv_rowid_arr(i);
                    END IF;
                END IF;
            END LOOP;

            EXIT WHEN c2%NOTFOUND;
        END LOOP;
        CLOSE c2;
    END LOOP;

EXCEPTION
    -- WHEN <specific exceptions> THEN <handle them>;
    WHEN OTHERS THEN
        dbms_output.put_line(SUBSTR('Error ' || TO_CHAR(SQLCODE) || ': ' || SQLERRM, 1, 255));
        RAISE;
END;


After processing, I can run some queries to get the statistics information I need.


This solution is obviously very slow
(for each record 1~2 queries are invoked, about 5~7 million queries in total).
It will take about a week to finish the whole process.
If I divide the prd_ids into 6 different ranges and run 6 procedures with different prd_id ranges at the same time,
it will take about 2 days. But that is still too long.

I did go thru. some threads in your site. you suggested that:

CREATE TABLE new_table as select <do the update "here"> from old_table;
............


but it is very hard to achieve <do the update "here"> here.


My questions are:

1) Is there any better way to achieve this?


2) If this is the way to go, how can I improve the performance?


Thanks,

As you requested

sami, August 20, 2005 - 9:54 am UTC

Dear Tom,

As you requested, I put in the sequence number (loop#).
I am able to reproduce this consistently on both 8i on Solaris and 10g on Windows.

Look at the TADDR difference between
  SEQUENCE TADDR                 UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
   18714 4C656458                     27 20-AUG-05 09:26:51
   18715 4C65695C                      2 20-AUG-05 09:26:51


Session#1
==========
SQL> truncate table t1;

Table truncated.

SQL> select index_name from dba_indexes where table_name='T1';

INDEX_NAME
------------------------------
T1_PK

SQL> desc t1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

SQL> select sid from v$mystat where rownum=1;

       SID
----------
       122


----*** STARTED MONITORING THIS SESSION FROM HERE ***----

SQL> insert /*+ APPEND */ into t1 select * from dba_objects;

48526 rows created.

SQL>
----*** STOPPED MONITORING THIS SESSION FROM HERE ***----
##########################################################


Session#2
===================
 SQL> drop table undo_monitor;
 
 Table dropped.
 
 SQL> create global temporary table undo_monitor(sequence number,taddr varchar2(2
 0),undo_ublk number,time date)
   2  ;
 
 Table created.
 
 SQL> alter session set nls_date_format='DD-MON-YY HH24:MI:SS';
 
 Session altered.
 
 SQL>  declare
   2   i number :=0;
   3   begin
   4   loop
   5   for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
   6   where s.taddr=t.addr
   7   and s.sid=&session_to_be_monitored)
   8   loop
   9   insert into undo_monitor values(i,c0rec.taddr,c0rec.used_ublk,sysdate);
  10   end loop;
  11   i := i+1;
  12   if i=100000 then
  13   exit;
  14   end if;
  15   end loop;
  16   end;
  17   /
 Enter value for session_to_be_monitored: 122
 old   7:  and s.sid=&session_to_be_monitored)
 new   7:  and s.sid=122)
 
 PL/SQL procedure successfully completed.
 
 SQL> spool c:\temp\1.log
 SQL> set lines 120 pages 300
SQL> select min(undo_ublk),max(undo_ublk) from undo_monitor;

MIN(UNDO_UBLK) MAX(UNDO_UBLK)
-------------- --------------
             1             52

SQL>
SQL> select * from undo_monitor where undo_ublk >1 and undo_ublk <52 order by se
quence;
  SEQUENCE TADDR                 UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
[truncated for display purpose]

    18696 4C656458                     27 20-AUG-05 09:26:51
    18697 4C656458                     27 20-AUG-05 09:26:51
    18698 4C656458                     27 20-AUG-05 09:26:51
    18699 4C656458                     27 20-AUG-05 09:26:51
    18700 4C656458                     27 20-AUG-05 09:26:51
    18701 4C656458                     27 20-AUG-05 09:26:51
    18702 4C656458                     27 20-AUG-05 09:26:51
    18703 4C656458                     27 20-AUG-05 09:26:51
    18704 4C656458                     27 20-AUG-05 09:26:51
    18705 4C656458                     27 20-AUG-05 09:26:51
    18706 4C656458                     27 20-AUG-05 09:26:51
    18707 4C656458                     27 20-AUG-05 09:26:51
    18708 4C656458                     27 20-AUG-05 09:26:51
    18709 4C656458                     27 20-AUG-05 09:26:51
    18710 4C656458                     27 20-AUG-05 09:26:51
    18711 4C656458                     27 20-AUG-05 09:26:51
    18712 4C656458                     27 20-AUG-05 09:26:51
    18713 4C656458                     27 20-AUG-05 09:26:51
    18714 4C656458                     27 20-AUG-05 09:26:51
    18715 4C65695C                      2 20-AUG-05 09:26:51
    18716 4C65695C                      2 20-AUG-05 09:26:51
    18717 4C65695C                      2 20-AUG-05 09:26:51
    18718 4C65695C                      2 20-AUG-05 09:26:51
    18719 4C65695C                      2 20-AUG-05 09:26:51
    18720 4C65695C                      2 20-AUG-05 09:26:51
    18721 4C65695C                      2 20-AUG-05 09:26:51
    18722 4C65695C                      2 20-AUG-05 09:26:51
    18723 4C65695C                      2 20-AUG-05 09:26:51
    18724 4C65695C                      2 20-AUG-05 09:26:51
    18725 4C65695C                      2 20-AUG-05 09:26:51
    18726 4C65695C                      2 20-AUG-05 09:26:51
    18727 4C65695C                      2 20-AUG-05 09:26:51
    18728 4C65695C                      2 20-AUG-05 09:26:51
    18729 4C65695C                      2 20-AUG-05 09:26:51
    18730 4C65695C                      2 20-AUG-05 09:26:51
    18731 4C65695C                      2 20-AUG-05 09:26:51
    18732 4C65695C                      2 20-AUG-05 09:26:51
    18733 4C65695C                      2 20-AUG-05 09:26:51
    18734 4C65695C                      2 20-AUG-05 09:26:51
    18735 4C65695C                      2 20-AUG-05 09:26:51
    18736 4C65695C                      2 20-AUG-05 09:26:51
    18737 4C65695C                      2 20-AUG-05 09:26:51
    18738 4C65695C                      2 20-AUG-05 09:26:51
    18739 4C65695C                      2 20-AUG-05 09:26:51
    18740 4C65695C                      2 20-AUG-05 09:26:51
    18741 4C65695C                      2 20-AUG-05 09:26:51
    18742 4C65695C                      2 20-AUG-05 09:26:51
    18743 4C65695C                      2 20-AUG-05 09:26:51
    18744 4C65695C                      2 20-AUG-05 09:26:51
    18745 4C65695C                      2 20-AUG-05 09:26:51
    18746 4C656458                     38 20-AUG-05 09:26:51
    18747 4C656458                     38 20-AUG-05 09:26:51
    18748 4C656458                     38 20-AUG-05 09:26:51
    18749 4C656458                     38 20-AUG-05 09:26:51
    18750 4C656458                     38 20-AUG-05 09:26:51
    18751 4C656458                     38 20-AUG-05 09:26:51
    18752 4C656458                     38 20-AUG-05 09:26:51
    18753 4C656458                     38 20-AUG-05 09:26:51
    18754 4C656458                     38 20-AUG-05 09:26:51
    18755 4C656458                     38 20-AUG-05 09:26:51
    18756 4C656458                     38 20-AUG-05 09:26:51
    18757 4C656458                     38 20-AUG-05 09:26:51
    18758 4C656458                     38 20-AUG-05 09:26:51
    18759 4C656458                     38 20-AUG-05 09:26:51
    18760 4C656458                     38 20-AUG-05 09:26:51
    18761 4C656458                     38 20-AUG-05 09:26:51
    18762 4C656458                     38 20-AUG-05 09:26:51
    18763 4C656458                     38 20-AUG-05 09:26:51
    18764 4C656458                     38 20-AUG-05 09:26:51
    18765 4C656458                     38 20-AUG-05 09:26:51
    18766 4C656458                     38 20-AUG-05 09:26:51
    18767 4C656458                     38 20-AUG-05 09:26:51
    18768 4C656458                     38 20-AUG-05 09:26:51
    18769 4C656458                     38 20-AUG-05 09:26:51
    18770 4C656458                     38 20-AUG-05 09:26:51
    18771 4C656458                     38 20-AUG-05 09:26:51
    18772 4C656458                     38 20-AUG-05 09:26:51
    18773 4C656458                     38 20-AUG-05 09:26:51
    18774 4C656458                     38 20-AUG-05 09:26:51
    18775 4C656458                     38 20-AUG-05 09:26:51
    18776 4C656458                     38 20-AUG-05 09:26:51
    18777 4C656458                     38 20-AUG-05 09:26:51
    18778 4C656458                     38 20-AUG-05 09:26:51
    18779 4C656458                     38 20-AUG-05 09:26:51
    18780 4C656458                     38 20-AUG-05 09:26:51
    18781 4C656458                     38 20-AUG-05 09:26:51
    18782 4C656458                     38 20-AUG-05 09:26:51
    18783 4C656458                     38 20-AUG-05 09:26:51
    18784 4C656458                     38 20-AUG-05 09:26:51
    18785 4C656458                     38 20-AUG-05 09:26:51
    18786 4C656458                     38 20-AUG-05 09:26:51
    18787 4C656458                     38 20-AUG-05 09:26:51
    18788 4C656458                     38 20-AUG-05 09:26:51
    18789 4C656458                     38 20-AUG-05 09:26:51
    18790 4C656458                     38 20-AUG-05 09:26:51
    18791 4C656458                     38 20-AUG-05 09:26:51
    18792 4C656458                     38 20-AUG-05 09:26:51
    18793 4C656458                     38 20-AUG-05 09:26:51
    18794 4C656458                     38 20-AUG-05 09:26:51
    18795 4C656458                     38 20-AUG-05 09:26:51
    18796 4C656458                     38 20-AUG-05 09:26:51
    18797 4C656458                     38 20-AUG-05 09:26:51
    18798 4C656458                     38 20-AUG-05 09:26:51
    18799 4C656458                     38 20-AUG-05 09:26:51
    18800 4C656458                     38 20-AUG-05 09:26:51
    18801 4C656458                     38 20-AUG-05 09:26:51
    18802 4C656458                     38 20-AUG-05 09:26:51
    18803 4C656458                     38 20-AUG-05 09:26:51
    18804 4C656458                     38 20-AUG-05 09:26:51
    18805 4C656458                     38 20-AUG-05 09:26:51
    18806 4C656458                     38 20-AUG-05 09:26:51
    18807 4C656458                     38 20-AUG-05 09:26:51
    18808 4C656458                     38 20-AUG-05 09:26:51
    18809 4C656458                     38 20-AUG-05 09:26:51
    18810 4C656458                     38 20-AUG-05 09:26:51
    18811 4C656458                     38 20-AUG-05 09:26:51
    18812 4C656458                     38 20-AUG-05 09:26:51
    18813 4C656458                     38 20-AUG-05 09:26:51
    18814 4C656458                     38 20-AUG-05 09:26:51
    18815 4C656458                     38 20-AUG-05 09:26:51
    18816 4C656458                     38 20-AUG-05 09:26:51
    18817 4C656458                     38 20-AUG-05 09:26:51
    18818 4C656458                     38 20-AUG-05 09:26:51
    18819 4C656458                     38 20-AUG-05 09:26:51
    18820 4C656458                     38 20-AUG-05 09:26:51
    18821 4C656458                     38 20-AUG-05 09:26:51
    18822 4C656458                     38 20-AUG-05 09:26:51
    18823 4C656458                     38 20-AUG-05 09:26:51
    18824 4C656458                     38 20-AUG-05 09:26:51
    18825 4C656458                     38 20-AUG-05 09:26:51
    18826 4C656458                     38 20-AUG-05 09:26:51
    18827 4C656458                     38 20-AUG-05 09:26:51
    18828 4C656458                     38 20-AUG-05 09:26:51
    18829 4C656458                     38 20-AUG-05 09:26:51
    18830 4C656458                     38 20-AUG-05 09:26:51
    18831 4C656458                     38 20-AUG-05 09:26:51
    18832 4C656458                     38 20-AUG-05 09:26:51
    18833 4C656458                     38 20-AUG-05 09:26:51
    18834 4C656458                     38 20-AUG-05 09:26:51
    18835 4C656458                     38 20-AUG-05 09:26:51
    18836 4C656458                     38 20-AUG-05 09:26:51
    18837 4C656458                     38 20-AUG-05 09:26:51
    18838 4C656458                     40 20-AUG-05 09:26:51
    18839 4C656458                     40 20-AUG-05 09:26:51
    18840 4C656458                     40 20-AUG-05 09:26:51
    18841 4C656458                     40 20-AUG-05 09:26:51
    18842 4C656458                     40 20-AUG-05 09:26:51
    18843 4C656458                     40 20-AUG-05 09:26:51
    18844 4C656458                     40 20-AUG-05 09:26:51
    18845 4C656458                     40 20-AUG-05 09:26:51
    18846 4C656458                     40 20-AUG-05 09:26:51
    18847 4C656458                     40 20-AUG-05 09:26:51
    18848 4C656458                     40 20-AUG-05 09:26:51
    18849 4C656458                     40 20-AUG-05 09:26:51
    18850 4C656458                     40 20-AUG-05 09:26:51
    18851 4C656458                     40 20-AUG-05 09:26:51
    18852 4C656458                     40 20-AUG-05 09:26:51
    18853 4C656458                     40 20-AUG-05 09:26:51
    18854 4C656458                     40 20-AUG-05 09:26:51
    18855 4C656458                     40 20-AUG-05 09:26:51
    18856 4C656458                     40 20-AUG-05 09:26:51
    18857 4C656458                     40 20-AUG-05 09:26:51
    18858 4C656458                     40 20-AUG-05 09:26:51
    18859 4C656458                     40 20-AUG-05 09:26:51
    18860 4C656458                     40 20-AUG-05 09:26:51
    18861 4C656458                     40 20-AUG-05 09:26:51
    18862 4C656458                     40 20-AUG-05 09:26:51
    18863 4C656458                     40 20-AUG-05 09:26:51
    18864 4C656458                     40 20-AUG-05 09:26:51
    18865 4C656458                     40 20-AUG-05 09:26:51
    18866 4C656458                     45 20-AUG-05 09:26:51
    18867 4C656458                     45 20-AUG-05 09:26:51
    18868 4C656458                     45 20-AUG-05 09:26:51
    18869 4C656458                     45 20-AUG-05 09:26:51
    18870 4C656458                     45 20-AUG-05 09:26:51
    18871 4C656458                     45 20-AUG-05 09:26:51
    18872 4C656458                     45 20-AUG-05 09:26:51
    18873 4C656458                     45 20-AUG-05 09:26:51
    18874 4C656458                     45 20-AUG-05 09:26:51
    18875 4C656458                     45 20-AUG-05 09:26:51
    18876 4C656458                     45 20-AUG-05 09:26:51
    18877 4C656458                     45 20-AUG-05 09:26:51
    18878 4C656458                     45 20-AUG-05 09:26:51
    18879 4C656458                     46 20-AUG-05 09:26:51
    18880 4C656458                     51 20-AUG-05 09:26:51
    18881 4C656458                     51 20-AUG-05 09:26:51
    18882 4C656458                     51 20-AUG-05 09:26:51
    18883 4C656458                     51 20-AUG-05 09:26:51
    18884 4C656458                     51 20-AUG-05 09:26:51
    18885 4C656458                     51 20-AUG-05 09:26:51
    18886 4C656458                     51 20-AUG-05 09:26:51
    18887 4C656458                     51 20-AUG-05 09:26:51
    18888 4C656458                     51 20-AUG-05 09:26:51
    18889 4C656458                     51 20-AUG-05 09:26:51
    18890 4C656458                     51 20-AUG-05 09:26:51
 

Tom Kyte
August 20, 2005 - 5:15 pm UTC

Recursive SQL.

turn on sql trace and see if you are not doing SQL behind your SQL (space management, quotas, etc)....

That was likely a recursive sql transaction to update the dictionary.

Which is the best way to update?

A Reader, August 22, 2005 - 11:45 pm UTC

Hi Tom,

We have a fact table with a few dimension tables. Initially the records are inserted into the fact table. This way most of the columns in the table will be populated with values. For the remaining columns we have to update the table by joining with other tables.
1) What is the best way to update? We are currently using correlated subqueries to update, but there is no index on the join column in the fact table. The source table has an index if it is a key column.

2) Should we create indexes on the join columns in the fact table? If there are many such columns, is it a good idea to create an index on each of them before updating? Will this improve performance?

3) We are currently analyzing the fact table before the update. Is this the right approach even though we are not creating indexes on all the required columns before the update?

4) Should the indexes be created and the table analyzed before the update?

Your inputs will be highly appreciated and very helpful.

Thanks in advance.

Tom Kyte
August 24, 2005 - 3:44 am UTC

I like to update joins

update ( select t1.a, t2.b from t1, t2 where t1.fk = t2.pk )
set a = b;

t2 would need to have a primary key in place for this.
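
A minimal sketch of that idea, using hypothetical fact/dim names. The inline-view form needs the dimension's join column declared unique or primary key so the view is key-preserved; MERGE is an alternative when it is not (as of 10g the when-not-matched branch can be omitted, as shown further down this page):

update ( select f.dim_attr, d.attr
           from fact f, dim d
          where f.dim_id = d.dim_id )
   set dim_attr = attr;

merge into fact f
using dim d
on (f.dim_id = d.dim_id)
when matched then update set f.dim_attr = d.attr;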

Which is the best way to update?

A Reviewer, August 24, 2005 - 12:17 am UTC

Tom can you please answer the queries posted in the immediately previous post?

Tom Kyte
August 24, 2005 - 9:06 am UTC

I did, update the join by primary key. That was my answer. Use the CBO.

Allan Sun, October 02, 2005 - 1:01 pm UTC

I found your answer very helpful and wonder how I can submit my question to you.
Thanks


Tom Kyte
October 02, 2005 - 5:35 pm UTC

when taking them, there is a button on the home page.

Calculate redolog generation

A reader, October 04, 2005 - 1:49 pm UTC

Tom,
Is there any way to calculate the amount of redo log that is going to be generated when some 2 million records are updated?
I have to update 2 million records (I can't recreate a dummy table, etc.). I need to know how much redo will be generated and the OS space required for the undo datafile.

Tom Kyte
October 04, 2005 - 6:23 pm UTC

not really, not without simulating at least a portion of it on some system and measuring the amount of redo.

Table/column widths, row chaining/migration, indexes involved - they'll all play a role.

I frequently do 1,000,000 to 10,000,000 records on my laptop - I would think you can test this out on a test system someplace?
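
One way to measure it on that test system: update a representative slice, read the 'redo size' statistic for your own session before and after, and scale up. A sketch (big_table and the column name are placeholders):

select s.value redo_size
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'redo size';

update big_table set some_col = 'N' where rownum <= 100000;

select s.value redo_size
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'redo size';

rollback;

The difference between the two values, multiplied by 2,000,000/100,000, gives a rough estimate. Undo usage can be watched the same way via v$transaction.used_ublk during the test.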

it's very good

gopal, December 07, 2005 - 5:51 am UTC


Delete, then Insert or Update?

A reader, January 06, 2006 - 3:35 pm UTC

Tom,
Quick question: while trying to modify a row in a table from a procedure, is it better (performance-wise) to delete the existing row and then insert it with the new values, or to update the existing values with the new values? Why?

Thanks,

Tom Kyte
January 06, 2006 - 4:52 pm UTC

update is less work.

you typically update a subset of the columns.

only the columns that change values and have indexes on them will affect the indexes.

doing a delete+insert would take more resources, hit all indexes.

Large update due to consolidation

David Piazza, January 25, 2006 - 8:41 pm UTC

We are consolidating five databases where the schemas are the same into one database. Currently a test system was created with five schemas that hold the previous data from the five databases. It isn't a clean merge, as all primary keys are in the same range, so some number x will have to be added to the primary (pseudo) keys and child keys in each schema to avoid duplicates. After reading through this topic, I can see two ways of doing the merge and updating the first schema: CTAS, incrementing each primary and foreign key by some value x, then either insert /*+ append */ or use the MERGE command to insert/merge the data into the first schema. The second way would be to directly insert /*+ append */ into the first schema from the tables in each of the five schemas, adding the appropriate value to the primary/foreign keys. How would you suggest implementing this?

Tom Kyte
January 26, 2006 - 10:16 am UTC

I would try to load the "fixed" data.

Eg: if the keys that are in need of fixing are all numbers - I might multiply each key by 10 and add 1, 2, 3, 4, or 5 to each during the load.
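
A sketch of that load for the first schema (the table, key and child-key names are placeholders; repeat with +2, +3, +4, +5 for the other schemas):

insert /*+ append */ into merged.t (id, parent_id, other_col)
select id * 10 + 1,
       parent_id * 10 + 1,
       other_col
  from schema1.t;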

mass insert code package

Brett Clark, February 13, 2006 - 4:47 pm UTC

For what it's worth, here is a package that can be used as a generic foundation for mass loads. It will disable triggers and indexes and (optionally) truncate the main table before running the insert statement that you provide. Hope this helps someone.

create or replace package table_support is

    TRUNCATE_NO  constant boolean := FALSE;
    TRUNCATE_YES constant boolean := TRUE;

    procedure load_table( G_TABLE_NAME  Varchar2,
                          G_INSERT_STMT Varchar2,
                          TRUNC_ACTION  Boolean DEFAULT TRUNCATE_NO );

end table_support;
/
create or replace package body table_support is

    ACTION_ENABLE  constant boolean := TRUE;
    ACTION_DISABLE constant boolean := FALSE;

    procedure exec( inStmt Varchar2 ) is
    begin
        EXECUTE IMMEDIATE inStmt;
    end exec;

    function exec( inStmt Varchar2 ) return Number is
    begin
        exec( inStmt );
        return( SQL%ROWCOUNT );
    end exec;

    procedure tbl_triggers( G_TABLE_NAME Varchar2, action Boolean ) is
    begin
        if ( action = ACTION_ENABLE ) then
            exec( 'ALTER TABLE ' || G_TABLE_NAME || ' ENABLE ALL TRIGGERS' );
        elsif ( action = ACTION_DISABLE ) then
            exec( 'ALTER TABLE ' || G_TABLE_NAME || ' DISABLE ALL TRIGGERS' );
        end if;
    end tbl_triggers;

    procedure tbl_indexes( G_TABLE_NAME Varchar2, action Boolean ) is
    begin
        for C in ( SELECT INDEX_NAME
                     FROM USER_INDEXES
                    WHERE TABLE_NAME = G_TABLE_NAME )
        loop
            if ( action = ACTION_DISABLE ) then
                exec( 'alter index ' || C.INDEX_NAME || ' unusable' );
            elsif ( action = ACTION_ENABLE ) then
                exec( 'alter index ' || C.INDEX_NAME || ' rebuild parallel nologging' );
            end if;
        end loop;

        if ( action = ACTION_ENABLE ) then
            exec( 'alter session set skip_unusable_indexes=false' );
        elsif ( action = ACTION_DISABLE ) then
            exec( 'alter session set skip_unusable_indexes=true' );
        end if;
    end tbl_indexes;

    procedure load_table( G_TABLE_NAME  Varchar2,
                          G_INSERT_STMT Varchar2,
                          TRUNC_ACTION  Boolean DEFAULT TRUNCATE_NO ) is
        rowCount Number;
    begin
        -- DISABLE ALL TRIGGERS
        tbl_triggers( G_TABLE_NAME, ACTION_DISABLE );

        -- TRUNCATE TABLE
        if ( TRUNC_ACTION = TRUNCATE_YES ) then
            exec( 'truncate table ' || G_TABLE_NAME );
        end if;

        -- DISABLE INDEXES
        tbl_indexes( G_TABLE_NAME, ACTION_DISABLE );

        -- ENABLE PARALLEL DML
        exec( 'alter session enable parallel dml' );

        -- EXECUTE INSERT STATEMENT
        rowCount := exec( G_INSERT_STMT );

        DBMS_OUTPUT.PUT_LINE( 'Inserted ' || rowCount || ' records into ' || G_TABLE_NAME );

        -- ENABLE/REBUILD INDEXES
        tbl_indexes( G_TABLE_NAME, ACTION_ENABLE );

        -- ENABLE TRIGGERS
        tbl_triggers( G_TABLE_NAME, ACTION_ENABLE );

        -- TURN OFF PARALLEL DML
        exec( 'alter session disable parallel dml' );

    end load_table;

begin
    -- Initialization
    NULL;
end table_support;
/


mass update

Hans, February 17, 2006 - 5:19 pm UTC

I have a table with 50 columns and 10 million records, and I have a file with 10 million rows. I have to update 5 columns in the table from the file (the rows are matched on a common ID value).
Steps I am performing:
1. open the file using utl_file
2. read the record
3. perform the update in the table
The process is very slow.
I would appreciate better ideas from the group.

Tom Kyte
February 18, 2006 - 8:16 am UTC

external table (so you can query the flat file), or sqlldr it into a temporary table.

and then merge.

If you are in 9i, you'll have to:

merge into db_table
using (select * from external_table where id in (select id from db_table))
on (db_table.id = external_table.id)
when matched then update ...
when not matched then insert (id) values (null); <<<=== never happens


in 10g, you can just

merge into db_table
using external_table
on (db_table.id = external_table.id)
when matched then update ...;
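
For completeness, a sketch of the external table piece, assuming a hypothetical comma-delimited file data.txt in a directory you have access to (column list abbreviated):

create or replace directory load_dir as '/some/path';

create table external_table
( id   number,
  col1 varchar2(30),
  col2 varchar2(30)
)
organization external
( type oracle_loader
  default directory load_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
    missing field values are null )
  location ('data.txt')
)
reject limit unlimited;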




A reader, February 18, 2006 - 5:37 am UTC


Morning TOM.

sudip, February 19, 2006 - 12:14 am UTC

What is archive log mode? Why do we prefer recovery in archive log mode? Please, will u brief it?

Tom Kyte
February 19, 2006 - 9:06 am UTC

"U" is not here
I have never met "U"

If you know how to contact them - I'd really appreciate a pointer on doing so. They get a lot of requests.


If you don't mind, I'll give you a pointer to what archivelog mode is (until we can find this "U" person)

</code> http://docs.oracle.com/docs/cd/B19306_01/backup.102/b14192/strategy003.htm#sthref127 <code>

:)

A reader, February 19, 2006 - 12:17 pm UTC

To sudip from BGALORE, india

Looks like you are new to this site. Just keep one thing in mind: Tom really gets ticked off by the "U", so don't use it next time :-)

Tom Kyte
February 19, 2006 - 5:16 pm UTC

Hey, the page he used to post that actually said "if you do this, I will do what I just did..."

Which is to poke fun at :)

It amazes more than ticks me off.

Just like in some parts of the world, an "English Accent" adds 10 IQ points (when I hear a British accent, for whatever reason - I add default IQ points, maybe it was all of the James Bond movies when I was a kid) - using this instant message speak takes them away.

Large Update Due To Consolidation

David Piazza, February 21, 2006 - 4:29 pm UTC

Tom,

Regarding my previous question. They've decided to merge one of the five clusters at a time. I've got scripts to implement the first cluster as such:

CTAS NOLOGGING setting the keys to old_key*10+1
insert /*+ append */
DROP old table
RENAME new table
GRANTs
ALTER TABLE add Primary Key
ALTER TABLE add Foreign Key
CREATE INDEX NOLOGGING parallel(degree 4)
ALTER TABLE ADD CONSTRAINT UNIQUE ENABLE
ALTER TABLE Logging

This seems pretty fast. Anything you see that will make it faster?

When I merge in the next clusters, should I:
- Do the same steps as above(*10+2,*10+3 etc.)
- Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
- Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints

The number of rows in the tables(Approximately the same in each cluster) are: 72338677,32278464,25347185,23834984,7595305,4519904,4371347,1229877,1168373,6669752,6382030,97250,4694,934,166

-Thanks


Tom Kyte
February 22, 2006 - 8:26 am UTC

why not all of them at once to avoid having to do the disable/enable over and over.

Large Update Due To Consolidation

David Piazza, February 22, 2006 - 1:28 pm UTC

That was my suggestion, but the customer wants to do one cluster at a time, especially the first one, ensuring that it works as before and all the users are happy before merging another cluster in...

Tom Kyte
February 22, 2006 - 2:09 pm UTC

then you got it.

You don't really need to disable all constraints, and if you could get it done in two bits - that would be the least work (load cluster 1, then 2, 3, 4 and 5 after testing 1).

Large Update Due To Consolidation

David Piazza, February 23, 2006 - 4:23 pm UTC

Ok. Which of the three methods would you suggest to merge in the remaining clusters, or would you recommend something else?

1) Do the same steps as above, CTAS/DROP old table/Rename
new table to old table/Insert /*+ append(*10+3,*10+4
etc.)/Create constraints,indexes.
OR
2) Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
OR
3) Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints
OR
4) Something else

I'm leaning towards 2) or 3)

-Thanks


Tom Kyte
February 23, 2006 - 8:05 pm UTC

I would just be disabling indexes (if I wanted to - insert append does a rather efficient "merge" of data) and insert appending the other data. I would not redo the CTAS over.
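
A sketch of that (object names are placeholders; any foreign keys pointing at the primary key would have to be disabled first, and a unique index cannot simply be skipped, which is why the constraint is disabled rather than its index marked unusable):

alter table t disable constraint t_pk;
alter index t_ix1 unusable;
alter session set skip_unusable_indexes = true;

insert /*+ append */ into t
select /* keys remapped as above */ * from schema2.t;
commit;

alter index t_ix1 rebuild parallel 4 nologging;
alter table t enable constraint t_pk;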

Bulk Insert / Update in sigle SQL

Narendra, February 24, 2006 - 1:49 am UTC

Tom,

I have a target table that stores current as well as history records. Current records are identified by a field CURRENT_REC having value of 'Y'. History records have value of CURRENT_REC as 'N'. At any stage, for a PART_NO, there can be only SINGLE record with CURRENT_REC as 'Y' in target table.
My problem is I want to transfer data (in bulk) from source table to target table with CURRENT_REC flag as 'Y' and simultaneously update existing records having CURRENT_REC as 'Y'. Following is table definition:
Source table:
CREATE TABLE PART_SRC
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)
target Table:
CREATE TABLE PART_TABLE
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)

Is this possible in a single SQL ?

Tom Kyte
February 24, 2006 - 8:25 am UTC

I don't get it.

I want to transfer data (in bulk) from source table to target table with current_rec flag as 'Y'

I understand that.

the "simultaneously update existing records have a current-rec as 'Y'"

which existing records, in which table, and update what exactly (since we are already looking only at 'Y' records - what exactly are we updating??!?!)

Bulk Insert / Update in sigle SQL

Michel Cadot, February 24, 2006 - 9:50 am UTC

If I understand you correctly, you want something like this (inspired by the "merge delete" or "insert all" statements):

merge into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then
update t.current_rec = 'N'
insert (t.part_no, ...) values (s.part_no, ...)
when not matched then
insert (t.part_no, ...) values (s.part_no, ...)
/

or

merge ALL into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then update t.current_rec = 'N'
when TRUE then
insert (t.part_no, ...) values (s.part_no, ...)
/

But this does not exist. ;)

Michel


More details

Narendra, February 24, 2006 - 9:28 pm UTC

Tom,

One correction to table definitions.
Source table does not contain CURRENT_REC field.
What I am trying to achieve is transfer data from source to target. Each time data is appended to target table. I may transfer updated details of PART_NO multiple times from source to target. However, I do not want data in target table to be overwritten. In order to identify the "last loaded record", I am maintaining CURRENT_REC field in target table. If a PART_NO (from source table) does not exist in target table, it should simply add that PART_NO to target table with value of CURRENT_REC = 'Y'. However, if PART_NO already exists i.e. data has been transferred earlier from source to target for that PART_NO, target table should already have a single record with CURRENT_REC as 'Y' and may have multiple records with CURRENT_REC as 'N' for same PART_NO. Now, when I transfer data for same PART_NO again from source to target, it should be added with CURRENT_REC as 'Y'. However, at the same time, existing record for same PART_NO, having CURRENT_REC as 'Y' should be updated with 'N'.
Hope I am clear.

Michel,
MERGE will not help since CURRENT_REC will not be there in my source table.

Thanks

More details

Michel Cadot, February 25, 2006 - 8:45 am UTC

Narendra,

I understood this and the pseudo-merges don't use the current_rec of source (s) but only of target (t). It would be clearer if I wrote "insert (t.part_no, ..., t.current_rec) values (s.part_no, ..., 'YES')".

There is no way to do it in a single SQL statement but it is easy to do it in an efficient PL/SQL block.

Regards
Michel


Merging ten large tables

Jesse Buzzard, March 17, 2006 - 7:00 pm UTC

Good Evening,

This is my first foray into very large databases and I need some advice.

We have ten tables that contain blob data. The total size of the tables is approximately 1.8TB. They are stored in the same tablespace. How this came to be is a long story for another discussion thread. Our task is to merge the tables into one table. After selecting the table with the largest number of rows as the target, we came up with the code below to insert the data from another table into it.
-------- start of code -------
#: Announce start insert

echo "-------------------------------------------------------------"
echo "`date '+ %a %e-%b-%Y %X'` - $0 - started" >> $1
echo "-------------------------------------------------------------"

$ORACLE_HOME/bin/sqlplus /nolog << EOF
connect odirms/odirms

insert /* +APPEND PARALLEL (nto, 8) */ into odirms.t_ompf4 nto
select /* +APPEND PARALLEL (tompf1, 8) */ * from t_ompf1 tompf1;
commit;
exit

EOF

#: End of program
echo "`date '+ %a %e-%b-%Y %X'` - $0 - completed" >> $1
echo "-------------------------------------------------------------"
echo
------------ end of code ----------

At present we are trying to merge 4 of the tables. All logging and indexes are off. Multi-threading is on. From all I have read, we should have completed this in a relatively short amount of time. This job has been running for over 28 hours. I can see the space used is growing (about 300MB/minute), but my supervisors would like to know when we will have all the tables merged.
Where have we gone astray?

Thank You,
Jesse


Tom Kyte
March 18, 2006 - 11:55 am UTC

well, you said "all indexes are off" and "we have blobs" - the two concepts do not go together.  every blob has an index associated with it that cannot possibly be disabled for any reason (short of dropping the blob column!)

So, you do have logging going on, and you do have index maintenance going on.

but you don't give any information as to the size of the tables being read (and hence written into the new table). 

I don't think anything has "gone astray" here, I can say that your "hint" isn't a hint, it is just text you typed in:

ops$tkyte@ORA10GR2> create table t as select * from dual;

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /* +APPEND */ into t select * from dual;

1 row created.

ops$tkyte@ORA10GR2> select * from t;

D
-
X
X

ops$tkyte@ORA10GR2> insert /*+ APPEND */ into t select * from dual;

1 row created.

ops$tkyte@ORA10GR2> select * from t;
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel



See how the /* + didn't really do anything, but /*+ does.  If that is your real script, you likely did not achieve your goal of either append or parallel (parallel implies append by the way, if it was in fact using parallel)

 

Merging ten large tables

Jesse Buzzard, March 20, 2006 - 12:15 am UTC

Thank you for your response.
If I may, I would like to impose on your wisdom again. We are averaging 1800MB per hour. Yes, the sample I provided is one of our real scripts, and yes, the goal was to work in parallel. At present five of the tables (56% of the rows) have been merged. The total size of the data is 1.8 TB in 32,518,931 rows. As you can see from the description below, the amount of data in addition to the blob data is rather small. If we had gotten the hint correct, would we have achieved a better insert/append rate?

Name Null? Type
----------------------------------------- -------- -----------

OMPF_ID NOT NULL NUMBER
OMPF_UID NUMBER(10)
OMPF_IMG_TYPE_ID CHAR(1)
OMPF_IMG_STYPE_ID CHAR(1)
OMPF_PAGE_NUM NUMBER(5)
OMPF_ACTIVE_FLAG CHAR(1)
OMPF_ADDED_DT DATE
OMPF_INACTIVE_DT DATE
OMPF_IMAGE BLOB
OMPF_LOCK_FLAG NUMBER(5)
OMPF_USER_ID CHAR(8)




Tom Kyte
March 20, 2006 - 7:16 am UTC

you might have, you might not have. you sort of have to figure out where you are spending your time (tracing - a level 12 10046 trace on a bit of this would go a long way)
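
A sketch of gathering that trace on a session running a slice of the load (these are standard event commands; run tkprof on the resulting trace file afterwards):

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

-- run a representative portion of the insert here

alter session set events '10046 trace name context off';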

Alternative Suggestion

VK, March 21, 2006 - 2:40 pm UTC

Here is an approach based on exchange partitions I used successfully in the past for combining tables (This will work only if the data in the individual tables is distinct and there is no overlap - A scenario in which this will work well is when people in their infinite wisdom create horizontally "partitioned" tables. An example may be where all customers with ID 1 to one million are kept in CUST1 table and one million to 5 million in CUST2 table and so on..):

4 tables to be combined:

create table test(a number,b blob);
create table test1(a number,b blob);
create table test2(a number,b blob);
create table test3(a number,b blob);

Now, let's say table test has values 1 to 10,000 in column a and test1 has 10,001 to 20,000, test2 has 20,001 to 30,000 and test3 has 30,001 to 40,000.

Create a partitioned table that looks exactly like the tables being combined - pick a partition key where the values do not overlap between tables.

create table testpart(a number,b blob)
partition by range(a)
(
partition p1 values less than (10001),
partition p2 values less than (20001),
partition p3 values less than (30001),
partition p4 values less than (maxvalue)
);

You can combine the 4 tables into the partitioned table using exchange partition:

alter table testpart exchange partition p1 with table test;
alter table testpart exchange partition p2 with table test1;
alter table testpart exchange partition p3 with table test2;
alter table testpart exchange partition p4 with table test3;

At this point, you can use the partitioned table or if you don't want the table to be partitioned, you can merge all 4 partitions into a single partition and exchange with a non-partitioned table.

alter table testpart merge partitions p1,p2 into partition p5;
alter table testpart merge partitions p5,p3 into partition p6;
alter table testpart merge partitions p6,p4 into partition p7;

Now we have just one partition (P7) in the table and you can exchange this with any of the original tables (we'll use test).

alter table testpart exchange partition p7 with table test;

Now we have all 40,000 rows in table test. As you can see, it is a lot of work to combine 4 small tables but if you have tens/hundreds of millions of rows, this will be much faster.

Merging ten large tables

Jesse Buzzard, March 21, 2006 - 5:03 pm UTC

Tom,
VK,

Thank you for your suggestions. They are educational and helpful. The merges have nearly completed. Now I am off to read about reorganizing the tablespace to reclaim some of the space formerly used by the smaller tables.




Very slow insert into table

bayaraat, April 17, 2006 - 12:16 am UTC

Hi Tom

I need your expertise in this regard.
My table is
create table CDR00
(
CALLDATETIME DATE,
TELNO VARCHAR2(20),
CALLEDNO VARCHAR2(20),
SECONDS NUMBER,
PULSE NUMBER,
PREFIXCODEA CHAR(14),
PREFIXCODEB CHAR(14),
RATECODE CHAR(28),
TOG_AMOUNT NUMBER(20,2),
USD_AMOUNT NUMBER(20,8),
NO_TOG_AMOUNT NUMBER(20,8),
NO_USD_AMOUNT NUMBER(20,2),
TOG_TAX NUMBER(20,8),
USD_TAX NUMBER(20,8),
FNAME VARCHAR2(12),
OPERATOR VARCHAR2(10),
RESERVED VARCHAR2(10),
PRINTED NUMBER(1)
)
tablespace CDR01
pctfree 20
pctused 80
initrans 1
maxtrans 255
storage
(
initial 1M
minextents 1
maxextents unlimited
);
Every day we insert about 2 million records into this table, move them into a partitioned table, and then truncate it.
The insert process is very slow; the response time is about 12 hours.
I want to insert these records quickly. I am using RAID 5.
There are no constraints or triggers on the table. Archive logging runs non-stop, because the Oracle database server has a standby database. Any suggestions please...



Tom Kyte
April 17, 2006 - 7:51 am UTC

if done correctly, it would only take minutes or less to bulk load 2 million records.

So, you need to describe HOW you are loading this.

and if you are loading and truncating, doubtful that you are updating - so pctfree=0 would be more appropriate - and perhaps even "compress", but it depends on how you load after all - which we don't know.
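
For example, assuming the daily file is exposed as an external table (cdr00_ext here is a placeholder), a direct path load would look something like the sketch below. Note that with a standby in place the database may be in FORCE LOGGING mode, in which case nologging will not reduce redo:

alter table cdr00 pctfree 0;
alter table cdr00 nologging;

insert /*+ append */ into cdr00
select * from cdr00_ext;

commit;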

Very slow insert into table

Bayaraat, April 17, 2006 - 12:32 am UTC

Hi Tom,

I need your expertise in this regard.

My create table is
create table CDR00
(
CALLDATETIME DATE,
TELNO VARCHAR2(20),
CALLEDNO VARCHAR2(20),
SECONDS NUMBER,
PULSE NUMBER,
PREFIXCODEA CHAR(14),
PREFIXCODEB CHAR(14),
RATECODE CHAR(28),
TOG_AMOUNT NUMBER(20,2),
USD_AMOUNT NUMBER(20,8),
NO_TOG_AMOUNT NUMBER(20,8),
NO_USD_AMOUNT NUMBER(20,2),
TOG_TAX NUMBER(20,8),
USD_TAX NUMBER(20,8),
FNAME VARCHAR2(12),
OPERATOR VARCHAR2(10),
RESERVED VARCHAR2(10),
PRINTED NUMBER(1)
)
tablespace CDR01
pctfree 20
pctused 80
initrans 1
maxtrans 255
storage
(
initial 1M
minextents 1
maxextents unlimited
);

Every day we insert about 2 million records into this table. The response time is about 12 hours. Afterwards the data is inserted into another, partitioned table and this table is truncated. I want to insert these millions of records very quickly. Do you need more information? My Oracle database server has a standby database. This table has no constraints or triggers.
Any suggestions please...

Moving high volumes of data from table1 to table2 in an OLTP system

A reader, May 10, 2006 - 10:54 am UTC

Tom,

We are looking at archiving data from a table (schema1.table1) having about 40 million records. This table is populated continuously by a Java application. The objective is to move all but 5 million records (the latest records) to the archive table with minimal impact to the end user. The option I am looking at is:

create a different schema (schema2) on tablespace2.
create a new table (table2), structure similar to table1.
create a procedure(p_1) that will copy data from schema1.table1 to schema2.table2.
call procedure 1 from a job on a certain schedule.

Pseudo code for procedure p_1
----------
insert into schema2.table2
select a.*
from schema1.table1 a
where extract(year from a.creation_date) < 2006
and rownum < 1000
and a.id not in (select id from schema2.table2);
commit;

delete from schema1.table1 aa
where aa.id in (select bb.id from schema2.table2);
commit;
----------

Would like your expert comments on
1) Do you see any problems with the pseudo code?
One of the problems I see is that after step #1 and before step #2, a user may see duplicate records. Is there a way to avoid this situation?

2) Is there a better way of implementing this change?

Thank you


Tom Kyte
May 11, 2006 - 7:31 am UTC

I'd probably rather move the 5 million records to a new table.

(move the lesser of the two amounts)

time to look at partitioning for you? to avoid this in the future.
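
A sketch of moving the smaller set, assuming a quiet window while the Java application is paused and a hypothetical cutoff bind; indexes, grants and constraints have to be re-created on the new table afterwards:

-- connected as schema1
create table table1_keep nologging as
select * from table1
 where creation_date >= :cutoff_date;   -- the ~5 million rows to keep

alter table table1 rename to table1_archive;
alter table table1_keep rename to table1;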



Updating a single column to 'N' in a 50 Million records table

A reader, May 15, 2006 - 5:34 pm UTC

Tom,

In one of our applications, we have a database conversion job that updates a column to 'N' in a table that has about 50 million rows.
We are using a rollback segment which has the capacity to handle about 8 million rows at a time.
So we are running 8 identical update statements, each updating and committing about 7 million rows
(this requires us to run the update 8 times to commit all 50 million updates).

The conversion of these 50 million rows takes about 20 hours.

We have already explored the following CTAS method:

1. CTAS/INSERT/APPEND/PARALLEL and rename the new table to the original table, but because this particular table is quite large, the DBAs say this method is not feasible due to space constraints.

So we are looking for other methods/suggestions to see if we can improve the timing.

Thank you.

Tom Kyte
May 15, 2006 - 6:08 pm UTC

parallel update, using many rollback segments simultaneously.

I pray this column is NOT indexed.
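
A sketch of the parallel flavour (the table, column and degree are placeholders; each parallel execution server runs its own transaction, which is what spreads the work across rollback segments):

alter session enable parallel dml;

update /*+ parallel(t, 8) */ t
   set some_col = 'N';

commit;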



Parallel update question

A reader, May 18, 2006 - 4:22 pm UTC

Tom,

Per the Oracle documentation, it seems tables must be partitioned to support parallel update and delete.

So I tried a parallel update on a partitioned table as shown below, but how do we confirm it really did parallel DML? After the parallel update was done, I tried to query the same table and it did not generate any error.

I see the same behaviour on non-partitioned tables when a parallel update is performed.


Parallel update statement logs are given below :

Partitioned table :
===================

SQL> update /*+ parallel (T, 3) */ T set status='NV' where status='INVALID' ;

1410 rows updated.


Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=11 Card=138 Byte
          s=690)

   1    0   UPDATE OF 'T'
   2    1     PARTITION RANGE* (ALL)                                   :Q214000
   3    2       TABLE ACCESS* (FULL) OF 'T' (Cost=11 Card=138 Bytes=69 :Q214000
          0)



   2 PARALLEL_COMBINED_WITH_PARENT
   3 PARALLEL_TO_SERIAL            SELECT /*+ Q214000 NO_EXPAND ROWID(A1) */ A1
                                   .ROWID,A1."STATUS" FROM "T" PX_GRANULE(0, BL
                                   OCK_RANGE, DYNAMIC)  A1 WHERE A1."STATUS"='I
                                   NVALID'



Statistics
----------------------------------------------------------
         19  recursive calls
       1733  db block gets
        263  consistent gets
        168  physical reads
     351584  redo size
        854  bytes sent via SQL*Net to client
        597  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
       1410  rows processed

IOMDV:SQL> select count(*) from T ;

    COUNT(*)
------------
       12239

1 row selected.


Non-partitioned table :
========================

update /*+ parallel(tn 3) */ tn set valid_ind='N' ;

29250 rows updated.


Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=14 Card=29250 By
          tes=58500)

   1    0   UPDATE OF 'TN'
   2    1     TABLE ACCESS* (FULL) OF 'TN' (Cost=14 Card=29250 Bytes=5 :Q212000
          8500)



   2 PARALLEL_TO_SERIAL            SELECT /*+ Q212000 NO_EXPAND ROWID(A1) */ A1
                                   .ROWID,A1."VALID_IND" FROM "IOMDBO"."TN" PX_
                                   GRANULE(0, BLOCK_RANGE, DYNAMIC)  A1



Statistics
----------------------------------------------------------
         73  recursive calls
     143947  db block gets
        244  consistent gets
        222  physical reads
   18118440  redo size
        844  bytes sent via SQL*Net to client
        576  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
      29250  rows processed

bala@IOMDV> select count(*) from tn ;

    COUNT(*)
------------
       29250

1 row selected.

 

Tom Kyte
May 19, 2006 - 11:42 am UTC

that used to be true in older releases; it has not been true since 9i.


http://docs.oracle.com/docs/cd/B10501_01/server.920/a96531/ch2_9ir2.htm#74432

A reader, May 19, 2006 - 3:05 pm UTC

Thanks for the info.



In Oracle 8.1.7, however, with parallel DML on a partitioned table, I should not be able to query the same table until I commit the parallel update.
But I was able to query the same table without any error being generated.

Is this the expected behaviour?

Tom Kyte
May 20, 2006 - 4:39 pm UTC

you did not do a parallel update.

<quote src=expert Oracle database architecture>

We can actually observe the fact that there are separate independent transactions created for the parallel execution servers. We'll use two sessions again, as before. In the session with SID=162, we explicitly enable parallel DML. PDML differs from parallel query in that regard; unless you explicitly ask for it, you will not get it:

big_table@ORA10GR1> alter session enable parallel dml;
Session altered.

The fact that the table is “parallel” is not sufficient, as it was for parallel query. The reasoning behind the need to explicitly enable PDML in your session is the fact that PDML has certain limitations associated with it, which I list after this example.

</quote>
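
A minimal sketch of confirming that PDML really happened (hypothetical table T; behaviour as described in the quote above, 9i and later): a genuine parallel update leaves the table unreadable by the same session until commit, and V$PQ_SESSTAT records the parallelized DML.

alter session enable parallel dml;

update /*+ parallel(t, 4) */ t set status = 'NV' where status = 'INVALID';

select count(*) from t;
-- expect: ORA-12838: cannot read/modify an object after modifying it in parallel

select statistic, last_query
  from v$pq_sesstat
 where statistic = 'DML Parallelized';

commit;

If the select after the update succeeds without ORA-12838, the update was not done with parallel DML.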

Thank you

A reader, May 20, 2006 - 8:08 pm UTC


RE: Parallel update

bakunian, May 23, 2006 - 8:20 am UTC

I have table "A": 17 GB and 120 million rows. I needed to add a column with NOT NULL, so I had to add the column, update every row to set the new column to some value, and then make it NOT NULL. The update ran for almost 10 hours, so I thought I would create an empty dummy table "B" as select * from "A", setting the column's default value to the required value.
To my surprise, when I did
INSERT /*+ parallel (b, 4) */ INTO b SELECT /*+ parallel (a,4) */ * FROM a
the default values were not inserted, but when I issued a regular insert into b statement the default values appeared. Why?

Thank you for your time.


Tom Kyte
May 23, 2006 - 8:38 am UTC

show me - you are using "*" and not listing any columns - I do not see how a conventional path insert would have used defaults.

column defaults are only used when you explicitly skip the column in the insert statement.

ops$tkyte@ORA10GR2> create table t2( x varchar2(20), y int );
 
Table created.
 
ops$tkyte@ORA10GR2> insert into t2 values ( 't2', null );
 
1 row created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /*+ append */ into t (x) select 'direct path' from dual;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> insert /*+ append */ into t  select * from t2;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> select * from t;
 
X                             Y
-------------------- ----------
direct path                   0
t2
 
ops$tkyte@ORA10GR2> insert into t (x) select 'direct path' from dual;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> insert into t  select * from t2;
 
1 row created.
 
ops$tkyte@ORA10GR2> commit;
 
Commit complete.
 
ops$tkyte@ORA10GR2> select * from t;
 
X                             Y
-------------------- ----------
direct path                   0
t2
direct path                   0
t2
 
 

Parallel Update

bakunian, May 23, 2006 - 10:07 am UTC

I see, so for my trick to work all I had to do was skip the columns that were supposed to have default values. Correct?
At the very beginning of this thread you gave the example below, but that does not really work for me. So what would you do in my case, where I need to update millions of rows with default values because new columns are being added to a large table with a NOT NULL constraint? Was my method almost correct?

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;

Thank you,

Tom Kyte
May 23, 2006 - 10:11 am UTC

well, since you are going to update the entire table - hence locking every row - and you in effect have a "scheduled downtime", I don't agree with you that you cannot use the create table as select. You say "it doesn't really work for you", I say "why not"

Parallel Update

bakunian, May 23, 2006 - 10:34 am UTC

Forgive me, I've been up for almost 48 hours and am a bit slow. Using your example, would it look something like the below, assuming every person in this table will have ename = 'Bob'?

create table new_emp as
select empno, 'Bob', job, MGR, ...
from emp;

Correct?

Tom Kyte
May 23, 2006 - 3:26 pm UTC

select empno, 'Bob' ename, job, mgr, ...
from emp


yes.
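
A minimal sketch of the whole swap for that case (hypothetical names; grants, other constraints and indexes would be re-created as described at the top of this thread):

create table new_emp
nologging
as
select empno, cast('Bob' as varchar2(10)) ename,
       job, mgr, hiredate, sal, comm, deptno
  from emp;

-- enforce the NOT NULL on the new/changed column after the load
alter table new_emp modify ename not null;

drop table emp;
rename new_emp to emp;

The cast() is only there so the literal ends up as a VARCHAR2 column rather than a fixed-width CHAR.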

Estimate time for a create table to end

Thierry Yen Suin, May 26, 2006 - 3:56 am UTC

We once had to sort a very large (non-partitioned) table, in the region of 60 GB.
The sorted column was of type NUMBER and indexed.
So what was/is regularly done is

create table sorted_table
as select * from unsorted_table
where col_to_be_sorted>0;

The sort normally takes 3 hours to finish, using the index on col_to_be_sorted. As I understand it, this create table does not need the TEMP tablespace to do the sort.

Once, the sort did not finish after 35 hours.
The database was 8i. We stopped the create statement.
Rebuilt the index (out of desperation, who knows?).
Reissued the create table -> it never finished.
Did an export/import into 9i on another server.
Created the index, issued the same create statement, and it took less than 1 hour.

My question:
Is there a way to estimate the time it will take for a create table to finish, based on some V$ views?
A way to see the number of rows that have been inserted/created into the temp segment of the "to be created" table?
Or anything that gives us a clue that it will take forever.

Thanks,
Thierry Yen Suin

Humm V$SESSION_LONGOPS

Thierry Yen Suin, June 01, 2006 - 1:22 am UTC

Of course, the V$SESSION_LONGOPS view!
How could I have forgotten that?

Thierry Yen Suin.

Am i missing something?

A reader, June 13, 2006 - 4:18 am UTC

Tom, 

I did the following to calculate the redo generated by CTAS with and without nologging. But the amount of redo generated is virtually identical, and the time taken is even greater with nologging. Am I missing something here?

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for Solaris: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production


SQL> set timing on
SQL> set time on
13:25:17 SQL> drop table big_table;

Table dropped.

Elapsed: 00:00:00.67

13:25:52 SQL> select a.value
13:26:38   2  from v$mystat a , V$STATNAME b
13:26:38   3  where a.statistic# = b.statistic#
13:26:38   4  and b.name = 'redo size'  ;

     VALUE
----------
     54144

Elapsed: 00:00:00.29

13:26:38 SQL>
13:26:46 SQL> create table big_table
13:27:01   2  as select * from dba_objects;

Table created.

Elapsed: 00:00:08.56

13:27:10 SQL> select a.value
13:27:22   2  from v$mystat a , V$STATNAME b
13:27:22   3  where a.statistic# = b.statistic#
13:27:22   4  and b.name = 'redo size'  ;

     VALUE
----------
    109040

Elapsed: 00:00:00.33

13:27:23 SQL> drop table big_table;

Table dropped.

Elapsed: 00:00:00.19

13:27:57 SQL> select a.value
13:28:02   2  from v$mystat a , V$STATNAME b
13:28:02   3  where a.statistic# = b.statistic#
13:28:02   4  and b.name = 'redo size'  ;

     VALUE
----------
    134372

Elapsed: 00:00:00.36

13:28:03 SQL> create table big_table  nologging
13:28:12   2  as select * from dba_objects;

Table created.

Elapsed: 00:00:09.04

13:28:21 SQL> select a.value
13:28:46   2  from v$mystat a , V$STATNAME b
13:28:46   3  where a.statistic# = b.statistic#
13:28:46   4  and b.name = 'redo size'  ;

     VALUE
----------
    188816

Elapsed: 00:00:00.24
 

Tom Kyte
June 13, 2006 - 12:18 pm UTC

you are in a noarchivelog mode database aren't you.


run it again, it was a fluke that it took 9 seconds.  something else was happening on your system

nologging on a CTAS in a noarchivelog mode database is a big "no-operation", CTAS is already a direct path, non-logged operation in noarchivelog mode.

You would expect the "logged" one to have generated gobs of redo (of the size of the table).  not 50k


Here we are in ARCHIVELOG MODE:

ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size               3568320
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.53
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size               6998672        3,430,352
 
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size               7021144
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size               7088872           67,728
 
Elapsed: 00:00:00.00


That yours did not generate more than 50k of redo makes me believe you are in noarchivelog mode, like this:


ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.03
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size                205344
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size                273312           67,968
 
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE
-------------------- ----------
redo size                295856
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
 
Table created.
 
Elapsed: 00:00:00.45
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
 
NAME                      VALUE DIFF
-------------------- ---------- ------------------
redo size                363612           67,756
 
Elapsed: 00:00:00.03


and the 9 seconds - totally because something else was happening on your system at that point that prevented you from being "fast"

 

Absolutely correct as usual

A reader, June 14, 2006 - 1:18 am UTC

SQL> SELECT log_mode
  2  FROM v$database;

LOG_MODE
------------
NOARCHIVELOG

Thanks very much Tom. 

Merge with aggregates.

Thiru, July 07, 2006 - 10:48 am UTC

Tom,

I think I posted this wrongly elsewhere!
-----------------------------------------
Will merge be a good thing to use (and if so, how) when dealing with aggregates in multi-million-record tables?

For instance:

create table source ( a varchar2(2),b number)
/
create table target ( a varchar2(2),b number,status varchar2(10))
/
insert into source values('ab',100);
insert into source values('bc',100);
insert into source values('cd',100);
insert into source values('ef',100);
insert into source values('fg',100);
insert into source values('gh',100);
insert into source values('hi',100);
insert into source values('ij',100);
insert into source values('jk',100);
insert into source values('kl',100);
insert into source values('mn',100);

insert into target (a,b) values('ab',50);
insert into target (a,b)values('ab',50);
insert into target (a,b) values('bc',50);
insert into target (a,b)values('bc',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b) values('ef',50);
insert into target (a,b)values('ef',50);
insert into target (a,b)values('gh',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
commit;

I would want to update the target STATUS column to 'MCHED' if sum(target.b) = source.b and target.a = source.a.
In all other cases, update the STATUS column to 'UNMCHED'. The number of records in both tables is over 2 million.

b. Also, can this be done in a single update statement?



Tom Kyte
July 08, 2006 - 10:52 am UTC

look elsewhere for the answer, I did it over there.

Large import question

Mike, July 12, 2006 - 3:57 pm UTC

I have visited this thread on and off for a while. I have a question regarding an apparent slowdown of inserts while importing a very large table. We are on 9.2.0.5 on Solaris 8.

I am importing a 60 million row table into our development database. (I just got more space for it, wooohooo!). Not long ago, I found a very nifty script written by Frank Naude` for monitoring import progress, as follows:

select substr(sql_text,instr(sql_text,'INTO "'),30) table_name, rows_processed, 
round((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60,1) minutes,
         trunc(rows_processed/((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60)) rows_per_min
  from   sys.v_$sqlarea
  where  sql_text like 'INSERT %INTO "%'
    and  command_type = 2
    and  open_versions > 0
/

When the table in question started importing, this script returned very high rows-per-minute values (139805), but as the import progressed, it has fallen to 33779 per minute after a few hours of inserting. I am doing the import from a command line and have the buffer set to 4m, recordlength 64k, commit=n, no statistics (will do the entire schema afterward). Can you theorize as to why the performance would erode as the import of a large table progresses? Our box has 4 GB of memory and 4 CPUs. My SGA is about 1 GB with a healthy cache and auto undo; other parameters would likely pass a sanity check.


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION         524272       3.8       139805

1 row selected.

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION         676480       5.2       129263

1 row selected.


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION         744128       6.5       114775

1 row selected.

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1234576      12.9        96075


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1471344      17.3        84885

                                                         

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1877232      26.6        70484

                                                                          

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        1978704      28.7        68904

1 row selected.



TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        2114000      32.2        65618

1 row selected.



TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        2824304      50.4        56019

1 row selected.

SQL> /
Wed Jul 12                                                                     
                                                                           


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        3602256      78.5        45888

1 row selected.


TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        4126528      97.8        42186

1 row selected.

SQL> /
Wed Jul 12                                                                     
                                                                           

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        4329472     108.2        40001

1 row selected.

TABLE_NAME                     ROWS_PROCESSED   MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION        5597872     165.7        33779
 

Tom Kyte
July 12, 2006 - 5:19 pm UTC

see any "cannot allocate new log" messages?

what do v$session_* views return? like session_wait?

Additional Info

Mike, July 13, 2006 - 9:08 am UTC

SELECT A.USERNAME ora_user, A.OSUSER os_user, A.PROGRAM, C.EVENT||', '|| C.P1TEXT||', '||C.P2TEXT||', '||
C.P3TEXT event,C.WAIT_TIME
FROM V$SESSION A, V$PROCESS B, V$SESSION_WAIT C
WHERE A.PADDR = B.ADDR
AND A.SID = C.SID(+)
AND A.OSUSER IS NOT NULL
and a.username is not null
and c.event not like 'SQL*Net%'
ORDER BY 1,3

(I edited the output for readability)

ORA_USER OS_USER PROGRAM
-------- ------- --------------------------------------------
SYSTEM   oracle  imp@ods-d-prod1.appliedcard.com (TNS V1-V3)

EVENT                                          WAIT_TIME
---------------------------------------------- ---------
db file sequential read, file#, block#, blocks         0

There are three mirrored sets of redo logs, 150M each.

SELECT class, count FROM v$waitstat
WHERE class IN ('free list', 'system undo header', 'system undo block', 'undo header', 'undo block')
order by class

CLASS                     COUNT
------------------ ------------
free list                     0
system undo block             0
system undo header            0
undo block                    0
undo header                 160

Rollback!!!
ROLLBACK SEGMENT WAITS for DEVL

  waits        gets   Wait Ratio
     71   2,131,551      .00003%

The import ran all night then failed this morning on an
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS'. Now I'm really mystified, because this table, with 59 million rows, is only about 3.5G in size. During previous imports, I have seen undo segments stretch up to a bit over 4G, then apparently wrap around with no consequence (there is nobody else reading or writing this table). The undotbs has a 4G file which can extend to 16G. In addition, I have seen the "used" size of the undotbs expand and shrink based on activity....last time I looked yesterday it had reached 14.5G and this morning when the job failed (it is rolling back right now) it had maxxed out the file.

Here are the undo settings:
transactions_per_rollback_segment 4
undo_management AUTO
undo_retention 16000
undo_suppress_errors FALSE
undo_tablespace UNDOTBS

Another thing, the table is NOT partitioned, which is because it was never supposed to grow this big, (bad metrics given to dba!) and there is no natural key we can depend on. We are working on an alternate way to hold this data.

I was just looking at v$undostat and noticed this:

UNXPBLKRELCNT UNXPBLKREUCNT EXPSTEALCNT EXPBLKRELCNT
------------- ------------- ----------- ------------
0 0 0 0
0 0 1 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
4608 0 10 0
10752 0 21 0
9728 0 19 0
8192 0 16 0
13312 0 19 0
11264 0 22 0
10752 0 12 0
9216 0 8 0
11264 0 8 0
10752 3 7 0
8704 0 10 0
10752 0 7 0
8704 0 6 0
13312 3 8 0
9216 0 4 0
9216 3 5 0
12800 0 4 0
6144 0 5 0
12800 0 4 0
4676 0 5 2048
10240 0 7 2560
0 0 2 7680
0 0 2 10240
0 0 1 4096

So did this undo failure occur due to numbers shown in the last column? Hmmmm. undo_retention is set for more than 4 hours, yet the import had been running at least 16 hours when it went belly up.

For now, I am going to do a separate export from production with a where clause and populate the table that way, but I would like to further understand both the original question, which is why import performance degraded as it went along, and what caused the import to blow up on undo (we have successfully imported other very large tables).






Tom Kyte
July 13, 2006 - 9:25 am UTC

this is too much information

all I wanted to know is:

o are there any waits you see in the v$session tables
o do you have "cannot allocate new log" messages in your alert log.


160 waits is insignificant/meaningless from v$waitstat - tells us nothing. No "time" information from it and it is a count since the INSTANCE was started.

Results

Mike, July 13, 2006 - 9:59 am UTC

select sid, event, p1text, p1, p2text, p2 from v$session_wait
/

 SID EVENT                       P1TEXT     P1          P2TEXT    P2
---- --------------------------- ---------- ----------- --------- --------
   1 pmon timer                  duration   300                   0
   3 rdbms ipc message           timeout    300                   0
   4 rdbms ipc message           timeout    300                   0
   6 rdbms ipc message           timeout    180000                0
   7 rdbms ipc message           timeout    500                   0
  12 db file sequential read     file#      12          block#    813986
   2 db file parallel write      requests   153         interrupt 0
   5 smon timer                  sleep time 300         failed    0
   9 SQL*Net message to client   driver id  1413697536  #bytes    1
   8 SQL*Net message from client driver id  1413697536  #bytes    1
  11 SQL*Net message from client driver id  1413697536  #bytes    1
  18 SQL*Net message from client driver id  1413697536  #bytes    1
  22 SQL*Net message from client driver id  1413697536  #bytes    1
  23 SQL*Net message from client driver id  1413697536  #bytes    1
  25 SQL*Net message from client driver id  1413697536  #bytes    1

The answer to the second question is no.....logs were switching about every 15 - 20 minutes during the import.

Tom Kyte
July 13, 2006 - 12:57 pm UTC

that is point in time, session_event shows what?

JUST for your session please. measure it, wait a bit, measure again - subtract. It'll show you what that session waited on during that interval (as you are measuring row counts - as the row counts decrease what wait events increase...)

(and this table is not indexed, right? that is my assumption)
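
A minimal sketch of that measurement (hypothetical SID 12 for the import session; v$session_event times are in centiseconds):

create table sess_evt_snap as
select event, total_waits, time_waited
  from v$session_event
 where sid = 12;

-- ... let the import run for a while, then diff against the snapshot:

select e.event,
       e.total_waits - nvl(s.total_waits, 0) waits,
       e.time_waited - nvl(s.time_waited, 0) time_waited_cs
  from v$session_event e, sess_evt_snap s
 where e.sid = 12
   and s.event (+) = e.event
 order by 3 desc;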

How to monitor the table delete prograss?

Jim, July 19, 2006 - 12:10 am UTC

Tom,
I have a long run delete process:
delete tab where col1 < '01-May-06'
/
it takes a few hours to complete.
how can i check how many records have been deleted during the process?

TIA


Tom Kyte
July 19, 2006 - 9:06 am UTC

that is a strange string to compare to isn't it? Because I know col1 cannot be a date since we never compare dates to strings and strings to numbers and so on.... (just a comment..)


as long as the delete uses a full tablescan, you can monitor it in v$session_longops, not by number of rows deleted, but rather by number of blocks scanned in the table.
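
A minimal sketch of that monitoring query (column names as in v$session_longops; filter on the deleting session's SID if you know it):

select sid, opname, target, sofar, totalwork,
       round(100 * sofar / totalwork, 1) pct_done,
       time_remaining
  from v$session_longops
 where totalwork > 0
   and sofar <> totalwork
 order by time_remaining;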

Very confused

Elaine, July 25, 2006 - 11:55 am UTC

I have been reading through this thread and saw where you said that updating a few million rows should be able to take place in minutes. I have 400 million rows daily to load. We use insert/append to do the insert (sorry, that's redundant insert/insert...).

I am having a disagreement with my boss. There are 9 indexes on the table. I maintain that as each row is appended, the indexes are updated. He maintains that the index updates do not take place until after the load is complete.

Which of us is correct? I have checked in both your books and in the Concepts manual. Someone pointed me to the following quote:
Indexes are created on one or more columns of a table. After it is created, an index is automatically maintained and used by Oracle. Changes to table data (such as adding new rows, updating rows, or deleting rows) are automatically incorporated into all relevant indexes with complete transparency to the users.

This would imply (I think) that I am right that the indexes are updated as the inserts are occurring, or am I misreading this?

Any links to either chapter numbers in your book or other documentation links greatly appreciated if this has already been covered.



Tom Kyte
July 25, 2006 - 12:04 pm UTC

neither of you is entirely correct :)

both of you are partially right! but I'd say he is closer to the truth.


with a direct path load like that - we write the newly added data to the table above the high water mark for the table.

We also build a "mini index" of the newly loaded data off to the side (not in the "big, existing" index yet).

At the end of the append - we merge in BULK the mini-index with the existing index.

So, the existing index is maintained automatically.
But not row by row.
It is done in bulk - efficiently - at the end of the operation.

Great answer!

Elaine H, July 25, 2006 - 12:51 pm UTC

I never understood how that worked. Thanks for the excellent explanation. I still have one thing I don't understand though. If there is a unique index on the table, how does oracle compensate to make sure that the constraint is not violated if it is only building the mini index to merge later?

Also, do you know of any way to speed up the merge of the mini-index into the main index? Is there any way to influence the manner in which that merge is applied, the way you can influence the initial insert using hints?



Tom Kyte
July 25, 2006 - 2:31 pm UTC

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14215/ldr_modes.htm#sthref1475

when it merges - it'll check things out.


your major influence will be pga memory - sort area size if using manual memory management, pga_aggregate_target elsewise.



Parallel DML and GROUP BY

Thiru, September 21, 2006 - 1:29 pm UTC

I do not know if this is an oracle bug or parallel processing is not going to work in this case:

alter table t parallel;
alter session enable parallel dml;
create sequence t_seq;

update
(select rkey,id from t order by id)
set rkey=t_seq.nextval;

The table contains 50 million records.
What it looks like, after running the above update, is that the sequence values used were not consistent with the order on column id.

Though I cannot put together a test case for your perusal at this point, it looks like the parallel processing ordered the individual records within each split and then allocated the sequence numbers.

for eg: the data looked like:

rkey id

100 ab
101 ab
102 bc
103 bc

..
..
500 ab

The question is: Is this the way parallel dml works or is it a bug or is something wrong with my update?

Thanks and regards




Tom Kyte
September 22, 2006 - 2:21 am UTC

I would not expect it to be. each section was ordered nicely, but that is about it. I would expect gaps to be possible and basically rows to be randomly updated.

I would say "something wrong with the thought behind doing such an update" rather than something wrong with the update. I don't see the point in assigning a number like that at all.

Thiru, September 22, 2006 - 5:43 pm UTC

The reason behind doing such an update was:

The flat files that contain the data are not sorted. I used an external table to load this into the table and could not use a sequence and order the records while doing the load. No idea if that was possible??

So what I understand now is that parallel DML would work if no ordering or grouping is involved. Because if grouping is involved, it would be the same situation of each split being grouped separately. Right?

The other option I thought of was:
create the table through the external table load,
and then create another table with CTAS, using the sequence and ordering the base table:

create table t1 as select t_seq.nextval, t.id, t.c2 from t order by id

Is this a good option?

Tom Kyte
September 24, 2006 - 1:09 pm UTC

umm, but the data is apparently already "orderable" - you just tried to use an order by to assign yet another field to order by.

so, why do you need this second field when one already apparently exists.

Delete / CTAS

Chris, September 27, 2006 - 11:25 am UTC

Tom,
Trying to process all the great info in this thread and would like a little feedback. I have a 20 million row table that gets about 2 million rows deleted every day. The table is not partitioned (let's not get into the why). There is only one index on the table and it is the primary key. Due to the extremely critical nature of the data some folks are very nervous with a CTAS, drop/rename strategy. The database does not use locally managed tablespaces and the current process spawns about 70 threads to delete data and bring specific rollback segments online. For other reasons I won't go into here, the database is bounced every day and this delete process would ideally be compatible with running before and after the bounce (currently it must complete before they bounce). I am trying to benchmark something like this and was looking for your thoughts.

1. CTAS getting only those rows to be deleted (unrecoverable/parallel).
2. Delete x number of rows at a time in parallel.

Looks like:

CREATE TABLE mytab unrecoverable parallel 8 AS
SELECT /*+ parallel(mytab_source,8) */ column1
FROM mytab_source
WHERE ...;

DELETE /*+ parallel(mytab,8) */
FROM mytab
WHERE column1 IN (SELECT column1
FROM mytab_source
WHERE rownum < 50000);

The only two reasons for limiting # of rows deleted would be rollback space and time. Would be happy to do it all at once if this became of no concern. If the whole operation could complete in < 30 minutes that would be very acceptable.

Not exactly sure how I'd loop through 50000 at a time though without deleting from mytab_source as well. In other words after I delete the first 50000, how do I get the next 50000.

Anyhow, some questions.

1. Does this seem reasonable or would you suggest another approach (I'm sure you will)?
2. Not much experience with parallel DML; does the above work as is, or do I need to "alter session enable parallel dml" first? I thought I had to, but it seemed to run really quickly without doing this.
3. I don't believe the table currently has a value specified for freelists, should it, and if so what should it be?
4. Is the delete syntax with parallel correct?
5. Might I be better off with a CTAS for the 18 million rows I want to keep, then build an index in parallel (the first time I tried with logging and not in parallel on 11 million rows it took 34 minutes) and then rename/rename? To appease those that are fearful of dropping the old table I could keep the last version with a different name if there is sufficient space keeping a rolling cycle of two tables.
6. How would you suggest doing the delete x number of rows at a time?
7. What is the difference between unrecoverable and nologging on the CTAS?
8. If you use a drop/rename or rename/rename, packages will go invalid when the original table disappears. All code should recompile itself on first use, correct? Can I be sure I won't get "invalid state of packages" on the first attempt, followed by a successful use on the next attempt?

Tom Kyte
September 27, 2006 - 3:51 pm UTC

laughing out loud - by limiting the number of rows deleted per statement, you increase the runtime (which seems to be the converse of what you are trying to achieve)


have you tried "create table as select ROWS TO KEEP", "index" (all of which will require no undo and if you like, no redo), drop and rename?

how does IT perform compared to deleting 10% of the data?
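
A minimal sketch of that "keep the rows" approach (the predicate, column names and degree of parallelism are hypothetical; grants, triggers and statistics would also need to be carried over):

create table mytab_keep
nologging parallel 8
as
select /*+ parallel(t, 8) */ *
  from mytab t
 where create_date >= sysdate - 30;   -- i.e. the ~18 million rows to KEEP

create unique index mytab_keep_pk on mytab_keep(id)
  nologging parallel 8;

alter table mytab_keep
  add constraint mytab_keep_pk primary key (id) using index;

drop table mytab;
rename mytab_keep to mytab;
-- optionally reset afterwards: alter table mytab noparallel;  alter table mytab logging;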

Alexander the ok, September 27, 2006 - 4:54 pm UTC

Why does limiting the number of rows increase runtime?

Tom Kyte
September 27, 2006 - 5:33 pm UTC

eh? not sure what you are refering to exactly here?

Your last follow up

Alexander the ok, September 27, 2006 - 6:09 pm UTC


But, I've got a LONG column ...

Matt, October 11, 2006 - 1:00 pm UTC


that is used by everyone's favourite CRM application. I've got to get rid of 38 million rows of a 40 million row table. If I understand correctly the conversion from LOB to LONG is a no-no (therefore cannot do a CTAS and then convert back).

So ... how does one delete 90 odd % of a table with a LONG column?


Tom Kyte
October 11, 2006 - 4:03 pm UTC

sorry, no good answer for you - longs prevent most things.

LONG and a direct path insert

Matt, October 16, 2006 - 5:35 am UTC

Tom,

Apologies for pestering you still, but does having a LONG column also count out the use of a direct path insert?

I'm thinking, create another table for the data we want to keep, do the direct path insert, drop the 'old' table and then rename.


Tom Kyte
October 16, 2006 - 7:53 am UTC

you cannot insert a long using insert as select, so the question is moot since you cannot use insert with a "as select" and a long.

Application to insert/copy?

Basil, November 01, 2006 - 2:16 pm UTC

I've inherited a database design where we have the need to make copies of rows in one particular table. The system more or less looks like:
drop table c;
drop table u;
drop table pt;
drop table ans;

create table c (
c_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table u (
user_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table pt (
pt_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table ans (
ans_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));

create table rp
(rp_id NUMBER PRIMARY KEY NOT NULL,
ans_id NUMBER NOT NULL,
pt_id NUMBER NOT NULL,
c_id NUMBER NOT NULL ,
date_value DATE,
text_value VARCHAR2(4000),
numeric_value NUMBER,
addon_ids VARCHAR2(4000),
cr_dt DATE DEFAULT SYSDATE NOT NULL,
cr_user_id NUMBER NOT NULL,
upd_dt DATE DEFAULT SYSDATE NOT NULL,
upd_user_id NUMBER NOT NULL,
CONSTRAINT rp_c_fk FOREIGN KEY (c_id)
REFERENCES c (c_id) ENABLE,
CONSTRAINT rp_CRUSER_FK FOREIGN KEY (CR_USER_ID)
REFERENCES u (user_id) ENABLE,
CONSTRAINT rp_PRICETBL_FK FOREIGN KEY (pt_id)
REFERENCES pt (pt_id) ENABLE,
CONSTRAINT rp_SRVYRESP_FK FOREIGN KEY (ans_id)
REFERENCES ans (ans_id) ENABLE,
CONSTRAINT rp_UPDUSER_FK FOREIGN KEY (upd_user_id)
REFERENCES U (USER_ID) ENABLE
) ;
CREATE INDEX rp_ac_id ON rp(c_id, ans_id);
CREATE INDEX rp_idx ON rp(ans_id, pt_id, c_id);
CREATE INDEX rp_crsidx ON rp (c_id, ans_id, pt_id );
-- rp_idx or rp_crsidx should probably go away, to minimize duplication
CREATE INDEX rp_pt_idx ON rp(pt_id);


The table RP has about 30 million rows at present. It is unpartitioned. The use of surrogate keys everywhere makes it difficult to find a suitable key for partitioning. When we do a copy, about 600,000 rows (in one particular bad case) need to be duplicated, adjusting the ans_id, c_id, and pt_id values to new ones.

To accomplish the mapping, I created some temporary tables:
create global temporary table c_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table pt_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table ans_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;

As I generate new entries for c, pt, and ans, I do multitable inserts, so that the old and new ids get stored in the temp tables. For example:
INSERT ALL
INTO c_map(previd, newid)
VALUES (c_id, c_seq.NEXTVAL)
INTO c(c_id, name)
VALUES (c_seq.NEXTVAL, name)
SELECT c_id, name FROM c WHERE <criteria>;

My thought was then to do something like:
insert into rp (
rp_id, ans_id, pt_id, c_id,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id )
SELECT rp_seq.NEXTVAL,
ans_map.newid, pt_map.newid, c_map.newid,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id
FROM rp
JOIN ans_map ON rp.ans_id = ans_map.previd
JOIN pt_map ON rp.pt_id = pt_map.previd
JOIN c_map ON rp.c_id = c_map.previd;

On top of the strange execution plans I'm getting for the SELECT portion (which is doing full scans of rp with its 30 million rows, in spite of indexes that should help), the actual insert is taking a REALLY long time.

Can this UPDATE approach be modified to improve the INSERT performance in my scenario?

Oh yes, this is on a 9iR2 system.

How do hints work

Andrew, December 05, 2006 - 2:03 pm UTC

From time to time, developers tell me that the hints they use to influence execution path, like INDEX, do not work and there must be a bug in Oracle. I tell them that sometimes the hint does not take because a HINT is a hint, not a directive and that they work by artificially changing the cost of the query to favor the HINTed path. The developers insist that HINTs should always be taken because they are directives. I looked through the Oracle documentation, and it consistently refers to hints as instructing or directing the optimizer to take a certain path, rather than favoring the hinted path.

I have found no documentation about how hints work internally. Can you tell me where to find information about how these 'execution path' hints work?

Tom Kyte
December 06, 2006 - 9:29 am UTC

they can only be obeyed when they make sense.

give me an example and I'll tell you what mistake your developers made.

Well, in addition to the mistake of hinting queries at the drop of a hat that is.


Using hints is a dicey proposition - for an emergency bug fix of one or two queries - sure, as a tuning tool - I'd avoid it.



ops$tkyte%ORA10GR2> create table t ( x int, y int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 961378228

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    26 |     3  (34)| 00:00:0
|   1 |  SORT ORDER BY     |      |     1 |    26 |     3  (34)| 00:00:0
|   2 |   TABLE ACCESS FULL| T    |     1 |    26 |     2   (0)| 00:00:0
------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

the index CANNOT be used in this case: X is nullable - it is not possible to use the index on X, since any row where X is null would not be in that particular index

ops$tkyte%ORA10GR2> alter table t modify x not null;

Table altered.

ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3778778741

------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    26 |   827   (1
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    26 |   827   (1
|   2 |   INDEX FULL SCAN           | T_IDX |     1 |       |    26   (0
------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

ops$tkyte%ORA10GR2> set autotrace off

and now it can be


directives can only be followed when they make sense.  If I gave you a directive to levitate six feet off of the floor - you would not be able to do that 

Basic how to do simple update in minutes rather then hours

Matthew, January 02, 2007 - 11:21 am UTC

Tom,
Earlier in this thread you demonstrated how you were able to update a million rows in just a few minutes on your laptop.
I am currently developing on a system where it takes over 2 hours to update a single column for 757474 rows, and I am beyond frustrated. I can do a CTAS for millions of rows with a complex SQL join of 5 tables in a couple of minutes by modifying my session to use a large sort area and hash area size, but I cannot do a simple update statement (update t set columnX = 'C') in under 2 hours.

This is unacceptable, as I am doing ETL and my load needs to fit within a 4-hour window. Having an update of a single table take 2 hours is an application killer. The db is at 10.2, but unfortunately the DBAs are not.

My question is what low hanging fruit should I look for in terms of speeding this processing up, and how can I prove / debug future problems of this nature.

The following is the explain plan and autotrace output of the update statement (I am working on getting access to the trace files on the server):
idl@DEVAEETL> explain plan for
2 update idl.srvc_mbr_prmry_dim
3 set current_flag ='I';

Explained.

idl@DEVAEETL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 153652758

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 755K| 1475K| 24004 (4)| 00:04:49 |
| 1 | UPDATE | SRVC_MBR_PRMRY_DIM | | | | |
| 2 | TABLE ACCESS FULL| SRVC_MBR_PRMRY_DIM | 755K| 1475K| 24004 (4)| 00:04:49 |
-----------------------------------------------------------------------------------------

9 rows selected.

idl@DEVAEETL> set autotrace on
idl@DEVAEETL> update idl.srvc_mbr_prmry_dim
2 set current_flag ='I';

757474 rows updated.


Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=24004 Card=755486 Bytes=1510972)
1 0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
2 1 TABLE ACCESS (FULL) OF 'SRVC_MBR_PRMRY_DIM' (TABLE) (Cost=24004 Card=755486
Bytes=1510972)





Statistics
----------------------------------------------------------
960 recursive calls
922410 db block gets
865720 consistent gets
133717 physical reads
253701772 redo size
447 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
757474 rows processed

Tom Kyte
January 04, 2007 - 9:55 am UTC

that current_flag is not indexed correct?

trace it - see what the major waits in fact are.
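
A minimal sketch of capturing that trace on 10.2 from the session doing the update (then run tkprof on the resulting trace file):

exec dbms_monitor.session_trace_enable( waits => true, binds => false )

update idl.srvc_mbr_prmry_dim set current_flag = 'I';

exec dbms_monitor.session_trace_disable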

large update for a table

ozjbrown, January 04, 2007 - 5:46 pm UTC

Hi Tom,

I have a student_achievement fact table (over 4 million rows) and need to update a single column in every record contained in this table. In order to determine the new value, I have a temporary student_school table (approximately 2 million rows) that contains the value the column should be updated to, based upon matching criteria. The update to the fact table takes over 6 hours to complete.

The student_achievement fact table has the following columns -

student_achievement_key integer (pk)
school_year varchar2(8)
student_key integer
course_key integer
mark varchar2(5)
school_key integer

The student_school temporary table has the following columns -

school_year varchar2(8)
student_key integer
school_key integer

The relationship of student_school to student_achievement is one-to-many. Not every student_school record has a record in the student_achievement fact table, and not every student_achievement fact record will have a record in the student_school table.

The logic used in my PL/SQL procedure to update all records in the student_achievement fact table is as follows -

cursor school_upd_curs is
select *
from student_school a
where exists (select 1 from student_achievement b
where a.student_key = b.student_key
and a.school_year = b.school_year)
order by student_key, school_year;

begin
open school_upd_curs;
loop
fetch school_upd_curs bulk collect
into col... limit 1000;
exit when school_year.count = 0;

for i in school_year.first .. school_year.last loop
if first record fetched for the student then
update student_achievement
set school_key = fetched school_key
where school_year <= fetched school_year;
elsif last record fetched for student then
update student_achievement
set school_key = fetched school_key
where school_year >= fetched school_year;
else
update student_achievement
set school_key = fetched school_key
where school_year = fetched school_year;
end if;
end loop;

commit;

end loop;
end;

Is there a way to speed the update?

Thanks.

Tom Kyte
January 05, 2007 - 9:43 am UTC

update the join of the tables.


update (select column_to_be_updated, value_to_update_to
from t1, t2
where (join) )
set column_to_be_updated = value_to_update_to;
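
A minimal sketch of that join update for these two tables, assuming student_school has (or can be given) a declared unique constraint on (student_key, school_year) so the join is key-preserved:

alter table student_school
  add constraint student_school_uk unique (student_key, school_year);

update ( select a.school_key old_key, b.school_key new_key
           from student_achievement a, student_school b
          where a.student_key = b.student_key
            and a.school_year = b.school_year )
   set old_key = new_key;

This covers the exact-year matches in one pass; the "first/last record per student" special cases in the original procedure would still need their own, much smaller, updates.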

trace for update that takes 2 hours

Matthew, January 09, 2007 - 12:54 pm UTC

Tom,
Sorry that this took so long to get to you. I had to wait for the DBAs to pull the trace file from the development system for me.

The column being updated is not indexed.

Here is the trace that you requested. Please let me know if you need more information.


update idl.srvc_mbr_prmry_dim
set current_flag ='C'

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 152.68 6057.11 240133 978321 917824 757474
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 152.69 6057.12 240133 978321 917824 757474

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 68 (IDL)

Rows Row Source Operation
------- ---------------------------------------------------
0 UPDATE SRVC_MBR_PRMRY_DIM (cr=973072 pr=232705 pw=0 time=2145490920 us)
1547044 TABLE ACCESS FULL SRVC_MBR_PRMRY_DIM (cr=978249 pr=237879 pw=0 time=3465511263 us)


Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL) OF 'SRVC_MBR_PRMRY_DIM'
(TABLE)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19
log buffer space 6 0.98 2.94
log file switch completion 12 0.39 3.54
latch: shared pool 5 0.02 0.06
latch: cache buffers lru chain 3 0.00 0.00
latch: cache buffers chains 1 0.00 0.00
free buffer waits 10 0.01 0.15
buffer exterminate 57 0.98 56.17
latch free 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.03 0.03
********************************************************************************


Any help that you can give me would be appreciated, as the DBAs and I are stumped.




Tom Kyte
January 11, 2007 - 9:38 am UTC

Elapsed times include waiting on following events:
Event waited on                  Times  Max. Wait Total Waited
----------------------------------------  Waited ---------- ------------
db file scattered read              10919    57.93    5622.86 


seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return??

optimize update millions of lobs

Wolfgang, January 10, 2007 - 10:18 am UTC

hi tom,
every year I have to reorganize my picture table (500GB). Generally I do it this way:
1. make everything offline
2. CREATE TABLE dbildtmp AS SELECT cid, bildtn, text FROM dbild ;
3.) ALTER TABLE DBILD SET UNUSED (text);
ALTER TABLE DBILD SET UNUSED (bildtn);
4. ALTER TABLE DBILD DROP UNUSED COLUMNS;
5. ALTER TABLE dbild ADD (bildtn BLOB) LOB (bildtn)
STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );

ALTER TABLE dbild ADD (text CLOB)
LOB (text) STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );

6. update the new created collumns:

DECLARE
CURSOR cur IS SELECT * FROM dbildlobs;
v_anz NUMBER := 0;
BEGIN
FOR rec IN cur LOOP
UPDATE DBILD
SET bildtn = rec.bildtn , text = rec.text
WHERE cid = rec.cid;
IF v_anz = 1000 THEN
COMMIT;
v_anz := 0;
END IF;
v_anz := v_anz +1;
END LOOP;
COMMIT;
END;
/

7. make everything online

please could you tell me how I can reduce the runtime of steps 2 and 4, and particularly step 6?
I tried to modify the procedure with bulk collect and forall, but it doesn't run any faster (6 hours).

here is my modified procedure:
CREATE OR REPLACE PROCEDURE Update_Dbild2 IS

TYPE t_text IS TABLE OF DBILDLOBS.TEXT%TYPE INDEX BY BINARY_INTEGER;
TYPE t_bildtn IS TABLE OF DBILDLOBS.bildtn%TYPE INDEX BY BINARY_INTEGER;
TYPE t_cid IS TABLE OF DBILDLOBS.cid%TYPE INDEX BY BINARY_INTEGER;
l_text t_text;
l_bildtn t_bildtn;
l_cid t_cid;
CURSOR cr_rec IS SELECT cid,bildtn,TEXT FROM DBILDLOBS WHERE ROWNUM < 1000;

BEGIN
OPEN cr_rec;
FETCH cr_rec BULK COLLECT INTO l_cid,l_bildtn,l_text;
CLOSE cr_rec;

FORALL i IN l_cid.FIRST .. l_cid.LAST
UPDATE DBILD
SET TEXT = l_text(i), bildtn = l_bildtn(i)
WHERE cid = l_cid(i);
END;
/


Large Update on table

ozjbrown, January 10, 2007 - 10:28 pm UTC

Hi Tom,

I am lost as to what you are recommending I change in my procedure, which uses the bulk fetch and loops through to update another table with the value.

Would it be possible to use my example to show where I need to make the change?

Thanks.

Performance Issue!! Commit Strategy

Prashant Pathak, January 11, 2007 - 9:41 am UTC

Hi Tom,

I have a table which contains the following columns:
ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0 518346
Alloy 102618 170 QA 1 0 518346
Alloy 102618 160 LAB 0 0 518346
Alloy 102618 150 TRP 1 0 518346
Alloy 102618 140 QA 1 0 518346
Alloy 102618 130 QA 1 0 518346
Alloy 102618 120 WP 5 0 518346
Alloy 102618 110 BA 2 0 518346
Alloy 102618 100 TRP 1 0 518346
Alloy 102618 90 QA 1 0 518346
Alloy 102618 80 QA 1 0 518346
Alloy 102618 70 QA 1 0 518346
Alloy 102618 60 MW2 3 0 518346
Alloy 102618 50 MW1 0 0 518346
Alloy 102618 40 M2M 0 0 518346
Alloy 102618 30 M2M 3 0 518346
Alloy 102618 20 PROC 1 0 518346
Alloy 102618 10 PROC 0 0 518346
Alloy 102963 9999 PROC 0 0 518346
Alloy 102963 90 CHP 1 0 518346
Alloy 102963 80 SG 5 0 518346
Alloy 102963 70 CHP 1 0 518346
Alloy 102963 60 PM 6 0 518346
Alloy 102963 50 PM 6 0 518346
Alloy 102963 40 TRP 1 0 518346
Alloy 102963 30 AIM 0 0 518346
Alloy 102963 20 AIM 0 0 518346
Alloy 102963 10 PROC 0 0 518346


Now I have to update the table's offset column:
starting with opseq 9999, deduct each row's days from the previous row's offset and store the result in the offset column.

So for opseq 9999, at the first step, it will be 0:

ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0-5=-5 518346
Alloy 102618 170 QA 1 -5-1=-6 518346
Alloy 102618 160 LAB 0 -6-0=-6 518346
Alloy 102618 150 TRP 1 -6-1=-7 518346
.
.
.


I have to do this operation per grp_id, and this table has about 50,000 rows.
So my question is: should I commit the transaction for each grp_id as I go, or should I finish them all and commit once at the end?

Currently I am doing all the deductions and updates with only one commit at the end, and it is taking a long time.

Thanks in advance
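
Purely as a sketch (hypothetical table name OP_ROUTING, and based only on the sample data above, where each row's offset is the negated running total of DAYS from opseq 9999 down to that row within a grp_id/itm), the whole calculation can be expressed in one statement, which also makes a single commit at the end the natural choice:

update op_routing t
   set t.offset = -( select sum(s.days)
                       from op_routing s
                      where s.grp_id = t.grp_id
                        and s.itm    = t.itm
                        and s.opseq >= t.opseq );

commit;

An analytic SUM(days) OVER (PARTITION BY grp_id, itm ORDER BY opseq DESC) in a MERGE would express the same running total in a single pass over the table.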

Re: update taking over 2 hours

Matthew, January 11, 2007 - 2:05 pm UTC

"Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86


seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return?? "

I agree that they do have an IO problem; the SAN is set up as RAID 5. I do have a couple of follow-up questions in this regard and hope you can provide some insight:

On the Trace file execution plan it shows:
Rows Execution Plan
------- --------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL)
OF 'SRVC_MBR_PRMRY_DIM' (TABLE)
1. The actual update statement only updated 757474 rows. Why does the execution plan show over twice the number of rows that were actually updated?
2. There where 2 db read wait events with similar number of event counts:
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19

for the single update statement of the table.
Could this be caused by the width of the table, making the full table scan into the buffer cache take multiple passes over the disk - the first pass using sequential reads and the second or subsequent passes doing scattered reads to match the data in the buffer to the data on disk?
3. Would it be worth testing the following scenarios to see if I can reduce db file scattered read:
a. Creating the very wide table in a larger block size with its own buffer cache?
b. Increasing db_file_multiblock_read_count at the session level to higher than 32?

Appreciate your input and look forward to your next book.

plsql is faster than sql

A reader, January 22, 2007 - 2:18 pm UTC

Hi Tom.
what's wrong with my database?
it seems that PL/SQL is faster than SQL:

SQL> create table test nologging as select * from dba_objects;

Table created.

Elapsed: 00:00:01.53
SQL>
SQL> create index test_pk on test(object_id);

Index created.

Elapsed: 00:00:00.41
SQL>
SQL>
SQL>
SQL> declare
2 l_icnt number := 0;
3 l_ucnt number := 0;
4 begin
5 for x in ( select * from test )
6 loop
7
8 update test set status = 'ok' where object_id= x.object_id;
9 l_ucnt:=l_ucnt+1;
10
11 end loop;
12 dbms_output.put_line ('rows updated :'||l_ucnt);
13 end;
14 /
rows updated :46087

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.20
SQL>
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.05
SQL>
SQL> update test set status='ok';

46087 rows updated.

Elapsed: 00:00:17.80
SQL>
SQL> commit;


A reader, January 23, 2007 - 2:27 pm UTC

Why do you think something is wrong with your database? I tried your example in the same order and the SQL test ran faster than the PL/SQL test, but I would expect that, because the SQL test has the benefit of the now pre-loaded buffers. Did you run the test in the order presented? Was your database busy with other things at the time of the test, or was it dedicated to your test? The same question can be asked of the server and the disk system. There can be a lot of variables that affect results like this, especially for such a small, short test.

Updating 1M records through bulk update takes a long time

RK, May 11, 2007 - 9:56 am UTC

I have an update statement, provided below, which updates 1 million records
in the lead_master table, joining with a customer_table having around 5 million records.

This is taking more than 2 hrs and I could not figure out the reason. There are no other sessions running in parallel.
UPDATE lead_master  
SET customer_gender_cd = (
          SELECT 
          DECODE (customer_gender_cd, 
             NULL, NULL,
             '0' ||customer_gender_cd) FROM customer_table
          WHERE customer_table.customer_id = lead_master.customer_id   
  )              
  WHERE  lead_master.customer_id  IS NOT NULL;


Tom Kyte
May 11, 2007 - 11:44 am UTC

because you may well be executing that subselect 1,000,000 times

you don't say if this is indexed
what release
what version
what plan
what anything really.

update ( select a.customer_gender_cd a_cd,
                b.customer_gender_cd b_cd
           from lead_master a, customer_table b
          where a.customer_id = b.customer_id(+) )
  set a_cd = nvl2( b_cd, '0'||b_cd, null );


outer join if and only if necessary of course.


RK, May 14, 2007 - 10:08 am UTC

The Oracle release is Oracle9i Release 9.2.0.4.0 - 64bit Production.
The suggested modified SQL was very useful.
It executed in 6 minutes.

Thanks a lot.

sammy, May 21, 2007 - 8:01 am UTC

Hi Tom,

I am trying to duplicate your May 11th followup.

I am getting the following error...ORA-01779: cannot modify a column which maps to a non key-preserved table.

UPDATE (select a.oldCol a_oldCol,
               b.newCol b_newCol
          from table1 partition(part1) a,
               table2 b
         where a.id = b.id)
   SET a_oldCol = b_newCol

b.id is indexed .
a.id is indexed (local).

I also want to use the nvl2 function you use - can you please explain
nvl2( b_cd, '0'||b_cd, null )

as opposed to...

nvl2( b_cd, b_cd, null ) ??

Thanks in advance for your advice.

I am using Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production


Tom Kyte
May 21, 2007 - 11:10 am UTC

give full example - you need a CONSTRAINT (primary key or unique) on table2(id)
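
For illustration only - a minimal sketch using the poster's table names (the constraint name is made up). Once table2(id) is declared unique, table1 is key-preserved in the join and the update is allowed:

alter table table2 add constraint table2_id_uk unique (id);

update ( select a.oldCol a_oldCol,
                b.newCol b_newCol
           from table1 a, table2 b
          where a.id = b.id )
   set a_oldCol = b_newCol;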

Direct-path INSERT and TEMP Tablespace

A reader, June 06, 2007 - 12:23 pm UTC

Hi Tom,

I am using direct-path INSERT to copy the contents of source_table to destination_table.

When I enable PDML and the INSERT runs in parallel, I observe the use of sort_usage, temp tablespace filling up and, at the same time, nothing being written to the tablespace containing the destination table. Eventually, temp tablespace runs out of space.

However, when I don't enable PDML and the INSERT runs in serial, Oracle starts writing to the temp segments in the destination table's tablespace as soon as the source result set is available. In this scenario, it never uses any sort segments in temp tablespace.

Why is temp tablespace being used during PDML?

As you can see from the query, there are no sorts or joins -- it's just a straight select *. There are no indexes on the destination_table. I thought PDML would work similarly to serial DML (as far as space allocation goes). Shouldn't each parallel execution server just pick a partition in destination_table and start allocating extents in the destination_table's tablespace?

Here is what I run:

alter table destination_table nologging;
insert /*+ append */ into destination_table
select /*+ parallel(s 1) full(s) */ * from source_table@link s;
commit;


The same behavior can be observed when both tables are in the same database (and the select doesn't use the link).

Environment:
Oracle 9.2.0.6
source_table and destination_table are partitioned
source_table is ~400GB in size
temp tablespace is 40GB
there are no indexes on destination_table

Thank you in advance for your help.


Tom Kyte
June 06, 2007 - 9:22 pm UTC

full disclosure - show the plans, show the queries you use to monitor this, show everything we need to see what you see.

and you do know that the dblink will serialize, right? So the parallel is extremely questionable - you have a parallel query getting serialized over the dblink to return the data.

Deleting 1-2 million of records from 300 millions records

Franklin, June 20, 2007 - 8:27 pm UTC

Hi Tom,

In one of my projects I have a table which contains more than 400 million records. Every day one of my procedures executes; it deletes 1 million records and inserts 1 million new records. Sometimes it may not delete any records at all, but it always inserts 1 million fresh records. I have tried a number of ways but performance is very bad; can you guide me?

My table : orders having columns
ord_id,
inv_id,
inv_date
col1,
col2,
...

my delete will be based on ord_id, inv_id, inv_date

delete from orders a
 WHERE NOT EXISTS (SELECT 1 FROM ordertemp b
                    WHERE a.ord_id = b.ord_id
                      and a.inv_id = b.inv_id
                      and a.inv_date between b.sdate and b.edate)

My orders table generates all records based on ordertemp, which contains sdate and edate. Now suppose this period has changed for some order; then I have to delete that data and insert data with the new period.

Tom Kyte
June 21, 2007 - 10:23 am UTC

what indexing strategies have YOU tried.

what is "performance is very bad" - what are you observing and what are your expectations.

have you considered MERGE instead of delete+insert (eg: update existing, insert new)
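
For what it's worth, a rough sketch of the MERGE alternative - the join key and the column list are assumptions, since only ord_id, inv_id and inv_date are named in the question:

merge into orders o
using ordertemp t
   on (o.ord_id = t.ord_id and o.inv_id = t.inv_id)
 when matched then
      update set o.inv_date = t.sdate
 when not matched then
      insert (ord_id, inv_id, inv_date)
      values (t.ord_id, t.inv_id, t.sdate);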

"insert into" using a cursor

ht, August 17, 2007 - 5:44 pm UTC

Hi Tom,
I took your advice and, instead of an update that ran for days, am using "create table as select * from".

I'm also using pl/sql with a cursor to insert additional rows into the new table and am wondering if there is a more efficient way to perform the insert. The pl/sql is trying to bring over the bare minimum of recs by first finding a "down" rec, getting only the next "up" rec and inserting it:

declare
    cursor c1 is
        select id, op, db
          from tbl1
         where status = 'down'
         order by id, op, db;
begin
    for x in c1 loop

        insert into tbl2
        ( select *
            from tbl1
           where id = ( select min(id)
                          from tbl1
                         where id > x.id
                           and op = x.op
                           and status = 'up'
                           and db = x.db )
        );

    end loop;
end;
/

Tom Kyte
August 22, 2007 - 9:48 am UTC

would need to know more about the data.

if you are trying to get the next "up" record after each "down" record - next being assigned by ordering by ID for each op/db - there are better ways, sure
insert into tbl2
select * 
  from (
select tbl1.*, 
       lag(status) over (partition by op, db order by id) last_status 
  from tbl1
 where status in ( 'up', 'down' )
       )
 where status = 'up' and last_status = 'down'


would get the "next up record after each down record by OP and DB" for example


Works great - should an index be created?

ht, August 22, 2007 - 3:16 pm UTC

Tom,
As always, you helped tremendously. My process is taking 3 minutes instead of 2 hours. Given I'm new to "lag" and "partition" clauses, would adding an index speed up the process? If yes, which columns would you index on? I know full table scans are not a bad thing - just wondering if I can tune the query further with indexes since I'll be running this 200+ times.
Tom Kyte
August 23, 2007 - 10:44 am UTC

unless the 'up' and 'down' records represent a tiny portion of ALL records - no index, we want a nice juicy full scan.

do they?

why are you running this 200 times?

once for each set of recs

ht, August 24, 2007 - 4:33 pm UTC

Tom,
Thanks for the advice. The full table scan is appropriate in this case. I only had to perform the operation 200+ times because I was testing a subset of the data; in reality, I just removed a predicate to operate on all the recs at once and it works great!

How to Load (6M+ Records x 300 Values ) between Databases

Saravanan Ramalingam, August 30, 2007 - 12:04 pm UTC

Hi Tom,

I have a requirement to load around 6M+ records on a daily basis from a source database, where every record holds a minimum of 300 different values. (The source DB supplies only 6 values; the others are permutations and combinations of them, calculated on the fly.) This will go up to 20 million records per day in the future.

The constraints are:
1. Currently I receive 300 different values to be stored (per record), which is likely to increase to 1000 and at most up to 25000 different values. That is for a single record, and the total number of records I receive is around 6M per day.

2. As per the SLA the load window is one hour

3. I need to retain 10 years of data in the active database: (10 yrs * 365 days * 6M records) rows to be stored, and (10 yrs * 365 days * 6M records * 300 different values) values to be held.

Also I would like to know how to take care of this from the performance, storage and hardware standpoint for Oracle.

I have tried all the theoretical solutions available on the net; they speak only about partitioning and compression. But I have not found a solution to handle 300 columns (which can be extended up to 10000 in the future).



Thanks in Advance,

Regards,

Saravanan Ramalingam
Tom Kyte
September 04, 2007 - 5:38 pm UTC

by 300 different values, do you mean "columns", "attributes"?

loading 20 million records will take minutes.


you will want to utilize partitioning to permit you to age the data out over time.


I see you meant 300 columns, which can theoretically go to 1000 but will never go to 25,000. The maximum number of attributes in a table is 1,000.


but - why is 300 columns "a sticking point", what are you looking for to address that? so what if you have 300 columns - why is that relevant to loading data?

is this practicable?

Karin, September 01, 2007 - 7:28 am UTC

Hallo Tom,

do you think this is still the better way in our case? The table contains 40 million sentences, has a lot of foreign keys, and we need to update a numeric field from n to n+1 for half of the sentences. And we have a partitioned table and a partitioned index on that table.

Thanks for your response; I have already gotten a lot out of your answers.

Karin


Tom Kyte
September 05, 2007 - 12:42 pm UTC

I don't know what a sentence is in this context?

sorry, language problem ;-))

A reader, September 12, 2007 - 9:56 am UTC

sentence = row
Tom Kyte
September 15, 2007 - 4:41 pm UTC

go ahead and update it?

you wrote:

...
the table contains 40 million sentences,
has got a lot of foreign keys and we need to update a numeric field from n to n+1 for half of the
sentences. And we have got a partitioned table and a partitioned index on that table.
....


40 million rows - understood.

foreign keys - so what? You don't relate that to this column at all, so not sure what relevance it has.

partitioned - again, not sure of the relevance

indexes - you don't say that THIS column is indexed, so not seeing the relevance.

A delete scenario

A reader, September 19, 2007 - 2:11 pm UTC

Hi Tom,

I have a table containing 20 million rows. The table has a composite index (3 columns). The application continually inserts into the table and also deletes from the table every hour through independent delete statements (1 per row). The delete operation is pretty slow and we see a lot of waits on "db file sequential read" during the delete operation. The delete is based on the indexed column values.

Please suggest a solution which will help in having better performance.
Tom Kyte
September 20, 2007 - 9:30 pm UTC

define pretty slow, got a tkprof?

the delete must update every index, not just the one it reads; the single block IO's are likely due to that.

slow by slow processing should be avoided, it is always the slowest way to process data. think "sets"

ENABLE PARALLEL DML" and TEMP tablespace

Florin, September 20, 2007 - 12:05 pm UTC

Hi Tom,

Would you please explain to me why the "ALTER SESSION ENABLE PARALLEL DML" is consuming TEMP tablespace?

If I don't use "alter session enable parallel dml", Oracle will still use the parallelism from the hint (4).

Example:

--ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL ( BL1_CUSTOMER_INFO_TEST 4) APPEND */
INTO BL1_CUSTOMER_INFO_TEST
SELECT /*+ PARALLEL ( BL1_CUSTOMER_INFO 4) FULL (BL1_CUSTOMER_INFO ) */
*
FROM BL1_CUSTOMER_INFO;


In this case the table BL1_CUSTOMER_INFO has more than 200 million rows and over 50 GB.

If I leave the "alter session enable parallel dml" in, it fails on the TEMP tablespace, while if I remove the "alter session ..." it goes directly to the tablespace of BL1_CUSTOMER_INFO_TEST.

Many thanks in advance!
Tom Kyte
September 24, 2007 - 7:32 am UTC

are you sure it is failing on TEMP or is it failing to acquire a temporary extent.

big difference. Let us see the precise error message please.

and what indexes have you on this table.

ENABLE PARALLEL DML" and TEMP tablespace

Florin, September 26, 2007 - 9:25 am UTC

Hi Tom,

I did another test (this time in our test database, unlike the previous one which was in Prod).
I have a table (TEST1) with 25 million rows (1.7 GB).

AIM_DBA>create table test2 nologging tablespace pool_ix as select * from test1 where 1=2;

Table created.

AIM_DBA>alter session enable parallel dml;

Session altered.

AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;

ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

If I'm doing the same thing without "alter session enable parallel dml" :

AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;

25192932 rows created.

Elapsed: 00:03:31.74

Please note:
No indexes in both tables.
I'm using Oracle 9.2.0.5.0 - 64 bit

I'm very curious to hear your opinion.

Thanks!
Florin


Tom Kyte
September 26, 2007 - 9:39 pm UTC

i've bookmarked and will look at this when I get back (and have a server whereby I can do parallel on)

Update Query

Pete, October 05, 2007 - 9:09 pm UTC

Hi Tom,

We have a query written like mentioned below

update table a set column B = some value where id = :id;

commit;

here column ID is a primary key

This query is written inside a package and is called each time a user logs into the system. At one point the system ran slowly when about 100 people tried to access it, and the DBA at the client site says this is the wrong way to write the query. It should be written like this (see below):
---- below is the DBA's reply --
"Like about a 100 people, so this table gets updated frequently, so there is a need to avoid locking the index when caching. Your related rowid and using that to bypass any index locking, access is so important"

he suggest the following way :-

Set transaction read only;

Select rowid into dbky from table a where id = :id;
(best done in a package w. package variables)

Commit;

Return Success;

Then when you need to update
Update table a SET column b = some value where rowid = :dbky; <- no locking index

Commit;
-===========================

My question is: which is the better way, rowid or using the primary key as written earlier, and why?

Thanks and appreciate your feedback


Upsert

Antony, October 10, 2007 - 10:16 am UTC

I have some issues with MERGE. Currently in my database there are different schemas where the data is manipulated by different applications. There is one schema, called USAGE, which is used to maintain the login details etc. from the different applications. Currently there is a package which fetches the values from different tables in different schemas. The procedure basically first deletes from the USAGE tables the values corresponding to rows deleted or updated in the source table, and then it tries to insert the data. The biggest problem is that the insert and delete statements generate a huge amount of redo, due to the 1-32 million rows of data in the source table. I reviewed many comments on your site and found there are 2 solutions: either use a MERGE statement, or drop the table, create a table with nologging and then insert the data, which won't generate as much redo. Could you please suggest which way I should proceed?

When its done?

Karthick, October 26, 2007 - 3:20 am UTC

After reading this thread I figured out one thing. BIG TABLE makes people crazy.

But my question is: when do you decide you are done with it, for now?

Say you had an update which took 1 hour. OK, you are clever and you made it run in 15 minutes. But is 15 minutes the optimal time for now? How would you decide that? Because we can't keep on testing and trying. Things need to go to production, right!!

Tom Kyte
October 29, 2007 - 10:33 am UTC

flip the question around.

what are the requirements for it to complete. how fast does it NEED to execute.

If you always move half way to a wall (eg: you start at 10 feet away and get to 5 feet away. Then you get to 5/2 feet away, then 5/2/2 feet and so on) you never actually get to the wall (sure you get very very close...)

you have to know when you can stop.

Can you show how you do that?

karthick pattabiraman, November 05, 2007 - 7:21 am UTC

you said

"it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be
something more involved here.

I'd be able to do that in minutes -- why is it taking you so long -- what is your bottleneck. "

with source table having 1000000 records and target table having 5000000 records where all the records in target table are updated by source.

Can you show how you do it in a couple of minutes? I think it will definitely take a lot of time.

I just want to see how you create your index, how you perform your update, and so on.

I just want to see that being done with UPDATE statement not CREATE TABLE... SELECT * FROM..

Because I can't drop a table :-(
Tom Kyte
November 06, 2007 - 9:04 am UTC

"i think it will definitely take lot of time"

why do you "think" that, why don't you "try that"?

what index would I create? If I am going to update 5 million out of 5 million rows, I am not going to use an index.

update t set col = 'value';

That is how the update would go. Nothing special.


I tried

karthick pattabiraman, November 06, 2007 - 9:19 am UTC

Yes I did try. My source table has 1 million records and target has 2 million records.

So my update was simple, like this:

update target_table
   set target_col_1 = (select source_col_1
                         from source_table
                        where source_col_2 = target_col_2)
 where exists (select null
                 from source_table
                where source_col_2 = target_col_2)

Looking at the explain plan I was able to identify that the target table gets a full table scan and source_table gets an index range scan.

So I "think" the update will be executed this way:

for i in (select * from target_table)   --> full table scan
loop
    select source_col_1
      from source_table
     where source_col_2 = i.target_col_2   --> index range scan

    replace i.target_col_1 with source_col_1
end loop

I don't have any index on target_table since, as you said, it's going to be a full table scan anyway. I have an index on source_col_2.

So when I did the update it took more than 1.5 hours (I don't know the exact time as I killed the session), so I enabled parallel DML, ran the same update, and it completed in 20 minutes.

Would you suggest parallel DML for such updates? Or are there other things that can be done before going parallel?

And after that I changed the source from 1 million to 0.1 million rows and ran the update in parallel again. Again it took around 18 minutes.

So is it because source_table gets an index range scan that it doesn't really matter how many rows are in source_table?

Tom Kyte
November 06, 2007 - 9:39 am UTC

update (select data_from_src, data_from_tgt from src, tgt where src.key = tgt.key)
set data_from_tgt = data_from_src;

update a join, don't do all of those correlated subqueries.

no indexes should be involved, although src needs to have a unique constraint on the joined-to columns.
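
Applied to the (hypothetical) table and column names from the post above, that would look something like this - the constraint name is invented, and source_col_2 must genuinely be unique for the join view to be updatable:

alter table source_table
  add constraint source_table_c2_uk unique (source_col_2);

update ( select t.target_col_1, s.source_col_1
           from target_table t, source_table s
          where t.target_col_2 = s.source_col_2 )
   set target_col_1 = source_col_1;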


Thanks a lot

karthick pattabiraman, November 07, 2007 - 2:17 am UTC

What can I say - as the world says, you "ROCK".

My 20-minute query that was running in parallel completed in 3 minutes without parallel when I changed the update the way you suggested.

Can you explain how Oracle internally processes this update?

Once again thank you so much for all your help.
Tom Kyte
November 07, 2007 - 5:13 pm UTC

it just joins.

Stuart Horsman, November 08, 2007 - 6:24 am UTC

Tom,

If we're doing an application release and updating multiple tables and multiple indexes with millions of rows, is it best to:

1) Put the database into no archivelog mode.
2) Alter all tables and indexes nologging.
3) Run inserts and updates.
4) Alter all table and indexes logging.
5) Put database back into archive log mode

Regards

Stuart
Tom Kyte
November 08, 2007 - 10:15 am UTC

nologging would have no effect.

nologging works for large direct path operations such as:

o create table as select
o create index
o insert /*+ append */ (when the append hint is actually followed, can be followed) - but only for the TABLE, not for any indexes.

nologging will not affect

o insert, update, delete, merge (without append on insert or merge....)
o index maintenance


and in noarchivelog mode, nologging is not necessary, many direct path operations will not generate log in general (create table as select, insert /*+ APPEND */).




If you are not updating an indexed column, just update - try to make sure you ONLY update a table once (eg: do not update col1, then col2 then col3 - just make a single pass).

If you are updating an indexed column - consider making the index unusable and rebuild (with or without logging, up to you).
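
A minimal sketch of the unusable-index approach, with made-up index, table and column names (note: this applies to non-unique indexes; DML against an unusable UNIQUE index will still error):

alter index t_col1_idx unusable;
alter session set skip_unusable_indexes = true;    -- let the update proceed without maintaining the index

update t set col1 = upper(col1);                    -- single pass over the table

alter index t_col1_idx rebuild nologging parallel 4;
alter index t_col1_idx noparallel;                  -- reset the degree afterwards if desired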


If you are updating such that you think there might be lots of migrated rows (rows grow in size), you might

o create table as select <modified data>
o drop old table
o rename new table
o re-index new table


compound key

Karthick., November 21, 2007 - 5:24 am UTC

When the column in the source is not unique, how do I do the same update?

Say source COL1+COL2 is unique. I join both columns to the target table and update the target with COL2. But COL2 alone is not unique. And because I have used both COL1 and COL2 in the join condition, it is guaranteed to return only one value. But I still can't do the join update. In such a case, what kind of update is better?

Is it...

update t set t_c1 = (select s_c1 from s ...)
 where exists (select null from s ...)
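
(A side note, sketched with made-up names: the join-view update from the followups above still works with a composite key, provided the source's join columns are declared unique so the target remains key-preserved.)

alter table s add constraint s_c1_c2_uk unique (col1, col2);

update ( select t.t_c1, s.s_c1
           from t, s
          where t.col1 = s.col1
            and t.col2 = s.col2 )
   set t_c1 = s_c1;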

Why 'created' instead of 'Inserted'

chandra, December 11, 2007 - 3:25 pm UTC

Tom, 
Please see the below:
====
SQL> insert into tmp_is_oracle_supplier_match
  2       (
  3          otis_vendor_num,
  4          otis_vendor_name, 
  5          otis_tax_id,  
  6          status_code,  
  7          comments
  8       )   
  9     select attribute2,
 10      vendor_name,
 11      num_1099,
 12      0,
 13      'Valid Record' 
 14     from 
 15      ap_suppliers_int 
 16     where status is null
 17      and REJECT_CODE is null 
 18      and 1=2;

0 rows created.

===

When I try to insert with the above command, why does it say "created" instead of "inserted"??

Tom Kyte
December 11, 2007 - 9:41 pm UTC

because sqlplus likes to say "created" when you use insert.

Locking conflict

Feng, December 12, 2007 - 8:28 pm UTC

Hi Tom,

Here is my case, which is on 9.2.0.8. On a daily basis, we need to batch update a few million rows in a 30-million row table (t1) using something like:

update ( select t1.a, t2.b
           from t1, t2
          where t1.fk = t2.pk )
   set a = b;

t2 is a table with tens of millions of rows as well. Usually this takes a few hours. We also have a small number of real-time updates against single records in t1 from web users. If a real-time user update hits the same record being updated by the batch update process, the real-time update will wait and eventually time out if the batch update still has not finished. In order to avoid/minimize such locking conflicts, we are thinking of reducing the size of the batch update using PL/SQL. Is there another way to solve this problem?

Thanks in advance.
Tom Kyte
December 13, 2007 - 9:40 am UTC

don't do the update.

Why not just join to retrieve the correct value?

Locking Conflict

feng, December 13, 2007 - 12:52 pm UTC

Tom,

Due to our business model, t2 is a temp table containing periodic updates from our vendor that are used to update our permanent table t1, so we cannot just join to retrieve the value as you suggested.

Is there any alternative in this case besides reducing the size of the update statement, either in SQL or PL/SQL?

Thanks.
Tom Kyte
December 13, 2007 - 2:24 pm UTC

well, before I even begin to describe how to do this I want you to say yes or no in no uncertain terms to the following question:

What happens if another process reads a record that was updated and committed, and another record that was not yet updated (or was updated but has not yet been committed) during your "batch update"? Will the fact that this other process will be seeing data that heretofore was IMPOSSIBLE to see (it would never see two records that are inconsistent with respect to each other) be OK? The fact that data integrity is going to be compromised here - is that something your applications can actually live with, deal with, not have a problem with?

Because that is what you are suggesting - to have PART of your update be seen by everyone before the entire update itself is complete.


also, we need a flag field or the ability to delete the record in t2 as we process it - so we don't process the same record more than once in the event (which will happen) of a failure. Do we have such a field that we can update in t2 as we process it - or can we delete the record from t2?

Locking Conflict

feng, December 13, 2007 - 4:58 pm UTC

Tom,

Thanks for your prompt follow up.

1) Data integrity is not an issue here, as the records are independent of each other, so they don't have to be updated at the same time.

2) We don't delete records in t2, but we can add a flag field to it if it's needed.

Please let me know if you need additional information. Thanks.

Tom Kyte
December 14, 2007 - 12:51 pm UTC

ops$tkyte%ORA10GR2> create table t1 as select * from all_users;

Table created.

ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(user_id);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2 as select user_id, lower(username) username, 'N' processed from all_users where mod(user_id,2) = 0;

Table created.

ops$tkyte%ORA10GR2> create index t2_idx on t2(processed);

Index created.

ops$tkyte%ORA10GR2> select * from t1;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE                              58 14-DEC-05
DIP                                    19 30-JUN-05
TSMSYS                                 21 30-JUN-05
LOTTOUSER                              65 30-DEC-05
MDDATA                                 50 30-JUN-05
OPS$ORA10GR2                           56 14-DEC-05
FOO$TKYTE                              60 19-DEC-05
QUOTA                                  94 22-FEB-06
AQ                                    228 15-OCT-07
R                                      76 09-JAN-06
OPS$TKYTE                             233 16-NOV-07
SCOTT                                  84 12-FEB-06
B                                     213 08-JAN-07
RT_TEST                               231 29-OCT-07
A                                     221 04-SEP-07
TEST                                  237 16-NOV-07
DMSYS                                  35 30-JUN-05
DBSNMP                                 24 30-JUN-05
WMSYS                                  25 30-JUN-05
EXFSYS                                 34 30-JUN-05
CTXSYS                                 36 30-JUN-05
XDB                                    38 30-JUN-05
ANONYMOUS                              39 30-JUN-05
OLAPSYS                                47 30-JUN-05
ORDSYS                                 43 30-JUN-05
ORDPLUGINS                             44 30-JUN-05
SI_INFORMTN_SCHEMA                     45 30-JUN-05
MDSYS                                  46 30-JUN-05
SYSMAN                                 51 30-JUN-05
PERFSTAT                              148 31-MAR-06
SYS                                     0 30-JUN-05
SYSTEM                                  5 30-JUN-05
OUTLN                                  11 30-JUN-05
MGMT_VIEW                              53 30-JUN-05
MY_USER                               211 09-NOV-06

35 rows selected.

ops$tkyte%ORA10GR2> select * from t2;

   USER_ID USERNAME                       P
---------- ------------------------------ -
         0 sys                            N
        24 dbsnmp                         N
        34 exfsys                         N
        36 ctxsys                         N
        38 xdb                            N
        44 ordplugins                     N
        46 mdsys                          N
        50 mddata                         N
        56 ops$ora10gr2                   N
        58 big_table                      N
        60 foo$tkyte                      N
        94 quota                          N
       228 aq                             N
        76 r                              N
        84 scott                          N
       148 perfstat                       N

16 rows selected.

ops$tkyte%ORA10GR2> declare
  2      type numArray is table of t2.user_id%type;
  3      type vcArray is table of  t2.username%type;
  4      l_userids    numArray;
  5      l_usernames vcArray;
  6  begin
  7      loop
  8          update t2
  9             set processed = NULL
 10           where processed = 'N'
 11             and rownum <= 10
 12       returning user_id, username
 13            bulk collect into l_userids, l_usernames;
 14
 15          exit when l_userids.count = 0;
 16
 17          forall i in 1..l_userids.count
 18              update t1 set username = l_usernames(i) where user_id = l_userids(i);
 19          commit;
 20        exit when l_userids.count < 10;
 21      end loop;
 22  end;
 23  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select * from t1;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
big_table                              58 14-DEC-05
DIP                                    19 30-JUN-05
TSMSYS                                 21 30-JUN-05
LOTTOUSER                              65 30-DEC-05
mddata                                 50 30-JUN-05
ops$ora10gr2                           56 14-DEC-05
foo$tkyte                              60 19-DEC-05
quota                                  94 22-FEB-06
aq                                    228 15-OCT-07
r                                      76 09-JAN-06
OPS$TKYTE                             233 16-NOV-07
scott                                  84 12-FEB-06
B                                     213 08-JAN-07
RT_TEST                               231 29-OCT-07
A                                     221 04-SEP-07
TEST                                  237 16-NOV-07
DMSYS                                  35 30-JUN-05
dbsnmp                                 24 30-JUN-05
WMSYS                                  25 30-JUN-05
exfsys                                 34 30-JUN-05
ctxsys                                 36 30-JUN-05
xdb                                    38 30-JUN-05
ANONYMOUS                              39 30-JUN-05
OLAPSYS                                47 30-JUN-05
ORDSYS                                 43 30-JUN-05
ordplugins                             44 30-JUN-05
SI_INFORMTN_SCHEMA                     45 30-JUN-05
mdsys                                  46 30-JUN-05
SYSMAN                                 51 30-JUN-05
perfstat                              148 31-MAR-06
sys                                     0 30-JUN-05
SYSTEM                                  5 30-JUN-05
OUTLN                                  11 30-JUN-05
MGMT_VIEW                              53 30-JUN-05
MY_USER                               211 09-NOV-06

35 rows selected.

ops$tkyte%ORA10GR2> select * from t2;

   USER_ID USERNAME                       P
---------- ------------------------------ -
         0 sys
        24 dbsnmp
        34 exfsys
        36 ctxsys
        38 xdb
        44 ordplugins
        46 mdsys
        50 mddata
        56 ops$ora10gr2
        58 big_table
        60 foo$tkyte
        94 quota
       228 aq
        76 r
        84 scott
       148 perfstat

16 rows selected.



expect this update to take longer than the other.

10 is what I used, you would use 100 to 500.

it'll generate more redo, more undo, consume more cpu, and take longer - but it will do it in batches.

and it will not suffer from 1555's as it does not keep a cursor open across commits.

and it is restartable when it fails; just rerun it and it will do its job.

You will want an index on that column, even with two values.

why did I use NULL instead of "Y"? Because entirely null entries are not made in b*tree indexes - hence this index will stay small; it only indexes the "N" records.

Thanks

Feng, December 14, 2007 - 2:48 pm UTC

Tom,

Thanks for the solution. We will try this out.

Performance question

Feng, January 15, 2008 - 4:26 pm UTC

Hi Tom,

During testing of your bulk select/update solution, I found out that it has almost the same execution time as the following PL/SQL.

begin
    for x in ( select * from t2 )
    loop
        update t1 set a = x.b where t1.fk = x.pk;
        commit;
    end loop;
end;
/

Since this method only locks/commits one row at a time, would this be a better solution than the bulk select and update of 500 rows at a time, given that both have similar response times?

What do you think?

Feng
Tom Kyte
January 15, 2008 - 4:57 pm UTC

*no*

you have a transaction - a logical unit of work.


You commit IF and ONLY IF and when your transaction is complete.

what if this updates N rows and then fails? what then? where is your code to restart in the middle?


the goal - especially in Oracle - is not to reduce the number of locks at the expense of PROGRAM LOGIC AND CORRECTNESS (we are not sqlserver!).

Performance question

Feng, January 15, 2008 - 6:49 pm UTC

In our application, if we fail after updating N rows, we can start over by updating everything again. This is not as efficient as starting from where the last run left off, but it will further reduce the chance of lock conflict, since the chance of failure is pretty low.
Tom Kyte
January 16, 2008 - 3:13 pm UTC

do you understand the concepts of transactional consistency?

if you are worried about "locks", that means other people must be updating this information.

So, you come and update a row, commit, they update the same information, commit, you fail and you come back and just blind as a bat overwrite their updates again???

You understand that the outside world will see a BIT of your update before it is totally complete?

Do you understand the natural ramifications of what you propose and are you sure your applications are a) expecting that b) can deal with that c) won't totally corrupt your database (logically speaking)

If you actually have what you call "lock conflict", I think the answers to the above are "we really need to commit when we are DONE, because apparently other people are trying to modify this data and we'll introduce lost update problems on a scale the size of the grand canyon"

You are right!

Feng, January 16, 2008 - 5:01 pm UTC


delete thousands records

Dawar, January 17, 2008 - 2:49 pm UTC

I have an Excel spreadsheet based on one of our internal tables.
It contains 7000 records.

I need to delete all of these records from the Oracle database.

What is the quickest way to delete all the records?

Thanks
D
Tom Kyte
January 19, 2008 - 10:24 pm UTC

7,000 records is

a) big for excel
b) trivial, tiny, almost non-existent for Oracle.


I would just use DELETE FROM T;

Truncate would not be much faster, if it were faster even.

7,000 is such a small number.

BULK UPDATES

SRINIVAS CHALLA, January 17, 2008 - 7:57 pm UTC

For bulk updates/deletes, please also consider the MERGE statement. With MERGE, one can avoid the temp table creation and the performance issues of a PL/SQL bulk update.

MERGE INTO ok a
USING (
  SELECT ID
  FROM ok1
    ) b
ON (a.id = b.id)
WHEN MATCHED THEN
update set a.flag = 'Y'
DELETE WHERE a.FLAG='Y';  


desc ok;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 FLAG                                               VARCHAR2(1)

SQL> desc ok1;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 FLAG                                               VARCHAR2(1)

Tom Kyte
January 19, 2008 - 10:31 pm UTC

why would you even consider merge for that?

delete from ok where id in (select id from ok1);


that is all you need, why would you even consider complicating things so horribly in that case with merge?


Temp Tablespace Error

Nikhilesh, April 06, 2008 - 11:54 pm UTC

Dear Tom,
When we create a table using CTAS (create table as select) (the query returns 20 thousand rows), it works fine the first time, but then it eats up all the temp tablespace and gives an error saying that it could not allocate an extent. We are using Oracle 10.1.0.3; we have a true temporary tablespace of 9.6 GB and the extent size is 1028.

I'm the only person connected to Oracle and, yes, after it ran successfully I killed 1 session which was running the same query using dynamic SQL.

Am I doing something wrong or is it a bug in 10.1.0.3?
I don't have metalink access.

Thanks in advance,
Tom Kyte
April 07, 2008 - 9:29 am UTC

no, it is not a bug, it is however insufficiently described to say what it is.

It seems like you are saying:

we do this:

a) create table t as select ......
b) drop table t;
c) create table t as select .... <<<= and this one fails.


are you sure it is blowing up on TEMP - or are you doing something like:

a) create table T1 as select ..... <<== works
b) create table T2 as select ... <<== fails on temp


If that is the case, it could be that what is "blowing up" is not temp at all - but the tablespace where t2 is going into - when we do a CTAS - we load the new data into temporary extents in the target tablespace AND THEN at the end, convert them into permanent extents.

Many times, people say "ran out of temp space", when they really ran out of space in the target tablespace (so, check the error - maybe even consider posting a CUT AND PASTE of it)....


Or, if it is not T2's tablespace that is blowing up, but there are two different tables - I'd say "table T1 needed little to no temp and T2's query needs more temp than you have allocated"


Delete the rows that contain BLOB object as one of the column

Ramakrishna, May 13, 2008 - 11:01 am UTC

I am trying to delete more than 100,000 records. It is faster in tables that do not contain a BLOB column. I tried it through simple SQL*Plus, like 'delete from table_name where indexed_column < 10000'. It hangs there for a long time; even after two days the query is not done. How can I efficiently delete this, and what query can I use to delete it quickly?
Tom Kyte
May 13, 2008 - 11:18 am UTC

two days? really? Not that I don't believe you - but I don't believe you - you must have been blocked or otherwise locked out.

although the statement "It is faster in tables that do not contain a BLOB column" should be rather obvious? If you do something that takes less work - it'll be faster - of course.


Did you monitor this process from another session - looking at the plethora of V$ tables out there that tell you what is going on in the database at all?
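
For instance, from a second session something along these lines shows what the delete is waiting on (the bind :sid is the SID of the session doing the delete; on 9i you would join via sql_address/sql_hash_value instead of sql_id):

select sid, event, state, seconds_in_wait
  from v$session_wait
 where sid = :sid;

select sql_text
  from v$sql
 where sql_id = ( select sql_id from v$session where sid = :sid );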

Set based update.

john hardy, May 14, 2008 - 3:25 pm UTC

What exactly is a set-based update? I have been hearing about it from my colleagues. Is it better than doing an update by rowid? (I know this method is not to your liking, but it works for me: 100 million updates on a highly indexed & triggered table in 10 hours!)
Tom Kyte
May 16, 2008 - 12:14 pm UTC

if it took me 10 hours to update 100,000,000 rows, I'd be upset. We can do better.

I'd have been looking at perhaps getting rid of indexes, looking at that trigger to see if I really needed it to fire (disabling if not) and using a single update - perhaps in parallel - to perform the operation QUICKLY - in a single sql statement.

That is a set based update, slow by slow processing written by procedural programmers = bad, set based operations - packing as much work into a single sql statement = good.

In fact, if I were updating 100,000,000 out of 100,000,000 (or nearly so), I might choose to create table as "select rows as they should be", drop the old table, rename, re-index and so on - or use dbms_redefinition to do it online.
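
A bare-bones sketch of the dbms_redefinition route (schema, table and column names here are hypothetical; T_INT is a pre-created interim table with the desired structure, and the column mapping applies the "update" as the rows are copied):

declare
    l_errors pls_integer;
begin
    -- assumes T has a primary key (the default redefinition method)
    dbms_redefinition.can_redef_table( 'SCOTT', 'T' );

    dbms_redefinition.start_redef_table(
        uname       => 'SCOTT',
        orig_table  => 'T',
        int_table   => 'T_INT',
        col_mapping => 'id id, upper(status) status' );

    -- copy indexes, triggers, constraints and grants onto the interim table
    dbms_redefinition.copy_table_dependents( 'SCOTT', 'T', 'T_INT',
                                             num_errors => l_errors );

    dbms_redefinition.finish_redef_table( 'SCOTT', 'T', 'T_INT' );
end;
/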

update

A reader, May 29, 2008 - 8:25 pm UTC


delete performance

Resl, June 03, 2008 - 4:35 am UTC

Hi Tom,

I have a crm_cust_info table with 301,000 records.
The select count(*) from crm_cust_info where cust_type='CONT' needs less than 1 second to return the value (there is an index on cust_type),

while the delete statement
delete from crm_cust_info where cust_no='3500' (for example) needs more than 10 seconds to delete the required record, knowing that cust_no is a unique and indexed column.

I rebuilt the indexes and also tried the same delete with a different optimizer mode, after analyzing the table and its indexes, but without any performance change.

Please note that I included the delete statement in a cursor in order to delete about 300,000 records.


Tom Kyte
June 03, 2008 - 11:33 am UTC

you do realize that comparing the performance of a count(*) where indexed_column=value and a delete where indexed_column = value

is about as useful as comparing the performance of your bike with an airplane. True, your bike and airplane are both modes of traveling (as select and delete are both sql statements) but that is about it.

the select statement - what does it have to do... It has to read a couple of index blocks and count stuff (that it takes almost a second means there are LOTS of records to count; I would be hoping for maybe 0.02 seconds on something like that - for it to be near a second is huge)

the delete statement - what does it have to do....

a) read the index blocks
b) navigate to the table blocks, get the block in consistent read mode
c) get the block in current mode for each row it wants to delete
d) delete it (which requires navigating all of the OTHER indexes on this table to find the entries that point to this row and remove them)
e) generate redo for all of the above
f) generate undo for all of the above
g) generate redo for the undo
h) ultimately delete the index entry that found us the row in the first place

i) do a-h for the remaining 299,999 records


and you did it the slowest way possible!!!! slow by slow via a cursor, so you are making it take EVEN LONGER (if you did it in a single sql statement, you would find it to be faster in all probability)
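
As a sketch of the set-based alternative (cust_to_delete is a hypothetical driving table, standing in for however the 300,000 cust_no values are identified):

delete from crm_cust_info
 where cust_no in ( select cust_no from cust_to_delete );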

Update statement taking more hours --

Sipra, June 09, 2008 - 3:00 am UTC

Hi Tom,

Have been observing the great support from last six months. Many thanks for the useful info -

I have a SQL statement which needs to update 4 million rows. I could not figure out where my query is getting stuck...

Please see the following code -

UPDATE T1
   set TYPE_ID = 141
 where TYPE_ID in (
       SELECT TYPE_ID
         FROM T1, VIEW VW, T2@dblink T2
        WHERE T1.ID = VW.ID
          AND VW.NUM = T2.NUM
          AND T2.NAME IS NOT NULL
          AND T2.REV = 'Y'
          AND T2.NUM = '&v_num'
          AND TYPE_ID = 1);

Scenario -

Update the type_id in table T1 to 141 if the NAME in T2 is not null and T2.REV is 'Y'.
At a time this will update 4-5 million records.

I have executed this through a procedure, which has given the following error. Please help me.

regards,
Sipra
www.dwforum.net

Updating millions of records-LOB column

Praveen, June 16, 2008 - 10:21 am UTC

Hi Tom,

This is w.r.t. the original question and the answer given by you. There are 1 million records in a table. And if I need to upload an image or pdf file (say of size 1MB each) into a LOB column of each record (which means there are 1 million LOB files, each of size 1MB), what would be the best approach?

I need to commit after every 1000 records. Also it will be difficult to follow the steps you suggested while dealing with LOBs.

Thanks

Tom Kyte
June 16, 2008 - 1:32 pm UTC

follow which steps? This is a HUGE page here.

so, you have 1,000,000 1mb files.

I'd be looking at perhaps, probably using dbms_lob.loadfromfile, called in parallel in as many sessions as you want to have going - each session working on a different set of files.
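
A rough sketch of that, under assumptions that are not in the original post: a directory object IMG_DIR and a table docs(id, filename, content blob) where each row names its file. Each parallel session would run this against its own slice of the rows:

declare
    l_bfile  bfile;
    l_blob   blob;
    l_cnt    pls_integer := 0;
begin
    for r in ( select id, filename from docs where content is null )
    loop
        l_bfile := bfilename( 'IMG_DIR', r.filename );
        dbms_lob.fileopen( l_bfile, dbms_lob.file_readonly );

        update docs
           set content = empty_blob()
         where id = r.id
        returning content into l_blob;

        dbms_lob.loadfromfile( l_blob, l_bfile, dbms_lob.getlength(l_bfile) );
        dbms_lob.fileclose( l_bfile );

        l_cnt := l_cnt + 1;
        if mod( l_cnt, 1000 ) = 0 then
            commit;   -- the poster wanted a commit every 1,000 records
        end if;
    end loop;
    commit;
end;
/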

Re:Updating millions of records-LOB column

Praveen, June 17, 2008 - 1:08 am UTC

I meant the steps for updating 1 million records that you suggested for the original question in this thread. Anyway, I am going to test the solution you suggested above for uploading 1 million 1MB files.

Thank you



Update of 2 columns...

Richard, June 25, 2008 - 11:55 am UTC

Hi Tom,

Thanks for the website; I'm an avid reader.  In reference to this thread, "How to Update millions or records in a table": recently we've run into an issue where our tables are growing very quickly and our updates aren't able to keep up. Every day we have to perform an update like the example below. When we had 20-30m records it would take a couple of hours and wouldn't be so bad.  But now that we have over 50m consistently, and sometimes closer to 100m rows, our updates haven't scaled properly.  To use an analogy, our method of updating is like driving a Geo Metro to the grocery store and back: if the performance is bad you don't really notice because the ride is so short. But when you have to drive that same Geo Metro across the state and back, that's when you notice the performance issues :-)

We have to update our customers' location based on their IP address. We have the IP Geo data and know the starting decimal IP address and the ending decimal IP address.

For example, let's say a row in big_table  has SRCIP = 192.168.1.100 & DSTIP = 172.1.1.1

In the lookup_table, the following is true:

* The country_code = 'LOCAL'  for the dotted decimal IP range of 192.168.0.0 - 192.168.255.255  (decimal IP 3232235520 - 3232301055). 
* The country_code = 'REMOTE'   for the dotted decimal IP range of 172.0.0.0 - 172.255.255.255 (decimal IP 2885681152 - 2902458367).

update big_Table
    set src_cc = (select country_code from lookup_table where srcip between 3232235520 and 3232301055)
       , dst_cc = (select country_code from lookup_table where dstip between 2885681152 and 2902458367)
 where src_cc is null
       or dst_cc is null;

So the Line would eventually look like

SRCIP:     192.168.1.100
DSTIP:     172.1.1.1
SRC_CC:   LOCAL
DST_CC:   REMOTE


I tried to incorporate BULK COLLECT and FORALL into the PL/SQL below, but I'm not really sure how to do so. Maybe my logic is backwards, but I'm thinking we need to collect all of the lookup_table data (~135k rows), loop through them, and then update the big_table columns. This doesn't seem like it should be too bad.. 135k iterations through to update ~60m records. Obviously some loops will update 10k records, some 30k and some only a few hundred.  I've looked at various examples of CTAS, MERGE, /+ Append hints, etc. But when it comes to updating two columns  that rely on different data, I can't seem to wrap my brain around it.

I have indexes on SRCIP & DSTIP in the big_table, and lookup_table is an IOT (ip_from & ip_to columns). Reading your comments in this thread, this probably doesn't matter since my current method requires a full table scan of lookup_table anyway... and it's only ~135k records.  We're running 10.2.0.2 (with plans to upgrade to 10.2.0.4 ASAP) on Solaris 9.  Big_table is in nologging mode. We don't keep this data for very long but it's important for us to make these updates.

Thanks in advance for any support you can offer!


SQL> select count(*) from big_table;

 COUNT(*)
----------
 59,895,942

SQL> select count(*) from lookup_table;

 COUNT(*)
----------
   135,237

SQL> desc lookup_table

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 IP_FROM                                   NOT NULL NUMBER(10)
 IP_TO                                     NOT NULL NUMBER(10)
 COUNTRY_CODE                                       VARCHAR2(2)

SQL> desc big_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 QRY                                                CHAR(3)
 STARTTIME                                          DATE
 MSECS                                              RAW(3)
 RANK                                               NUMBER
 SRCIP                                              NUMBER
 DSTIP                                              NUMBER
 SRC_CC                                             CHAR(2)
 DST_CC                                             CHAR(2)
 ASSIGNEDHOSTID                                     RAW(4)
 HOSTNAME                                  NOT NULL VARCHAR2(20)


Original update statement....

update big_table
    set src_cc = (select country_code from lookup_table where srcip between ip_from and ip_to)
      ,  dst_cc = (select country_code from lookup_table where dstip between ip_from and ip_to)
/


Obviously, taking forever.... lots of UNDO space required.

Second attempt....  not much better results. Less UNDO used, but more use of DBWn:

begin
FOR x  IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
 update /*+ PARALLEL(big_table,10) */ big_table
       set src_cc = (x.country_code)
  where srcip between x.ip_from and x.ip_to
      and src_cc='  ';
 commit;
END LOOP;

FOR x  IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
 update /*+ PARALLEL(big_table,10) */ big_table
       set dst_cc = (x.country_code)
  where dstip between x.ip_from and x.ip_to
       and dst_cc='  ';
 commit;
END LOOP;
end;
/




Tom Kyte
June 25, 2008 - 12:08 pm UTC

what are the odds of filling out the IP table (that should be really easy to do)

then we could update using no code easily

Filling out IP table...

Richard, June 25, 2008 - 3:47 pm UTC

Thanks for the prompt response. We've considered that option and it may be our only choice.

Two drawbacks that I can identify immediately:
 * This IP data is updated ~monthly, or should be updated ~monthly. As the IP spaces change, we will have to rebuild this table... but, as I know you'll say, rebuilding that table once/month is much better than using 2 between statements on a daily basis :-)
 * The decimal IP range is 33996344 - 4278190079.

If I did my math correctly:

SQL> select 4278190079-33996344 from dual;

4278190079-33996344
-------------------
         4244193735

SQL> select to_char(4244193735,'999,999,999,999') from dual;

TO_CHAR(4244193735,'999,999,99')
-------------------------------
   4,244,193,735

I've considered distinct IP decimal addresses as well. Out of ~50m rows, per day, we have approximately 3m unique srcip or dstip values.

For arguments sake, if we couldn't fill out an IP table, any suggestions on a course of action? This problem has become those song lyrics that I can't get out of my head until I know the rest of the verse. I think you're absolutely right that filling out an IP table might be the best method, but I'm curious what I could do with SQL or PL/SQL to attack the problem.

Thanks again. 



Tom Kyte
June 25, 2008 - 3:55 pm UTC

Actually, doh, now that I think about it, why not just maintain a table of your 3 million IPs and just JOIN?

you don't need the country code in the base table, just join - so, don't fill out the ip table. Maintain a lookup nightly that has just the 3 million ip's you need, and update it monthly after you get a refresh. Every night you would just have to insert into this lookup table any ip's that aren't there yet and populate them with values.

Even if you "must" have the IP's in the base table (and I would disagree; just join, it is ok to join), it would be more efficient to maintain this 3 million row table nightly (adding new rows) and only have to look up the new rows.
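
A sketch of that nightly refresh, with hypothetical names (ip_cc_lookup is the 3 million row IP-to-country table being maintained; lookup_table is the vendor range table from the post):

insert into ip_cc_lookup ( ip, country_code )
select ips.ip,
       ( select l.country_code
           from lookup_table l
          where ips.ip between l.ip_from and l.ip_to )
  from ( select srcip ip from big_table
         union
         select dstip ip from big_table ) ips
 where not exists ( select null
                      from ip_cc_lookup c
                     where c.ip = ips.ip );

Reports would then just join big_table to ip_cc_lookup on the IP columns instead of carrying src_cc/dst_cc in the base table.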

union & distinct columns...

Rich, June 25, 2008 - 8:36 pm UTC

Thanks Tom. 

I am thinking this might be a good use of CTAS with /*+ APPEND */ hint, parallel & nologging, no? 

CTAS select distinct srcip as ip from big_table UNION select distinct dstip as ip from big_table... no logging, /*+ append */, parallel, etc.

Then I'll just keep inserting into that table until it also becomes unmanageable... truncate it and start over.

We'll have ~3m unique IPs per day.. but then on a weekly basis, I would guess it'll be ~4.5m unique IPs... monthly ~10m unique IPs, etc.  So the size of this table shouldn't exceed the size of the big_table anytime soon. *fingers crossed* :-)

Sorry to go off topic, but what are your thoughts on triggers? I read in your books that triggers have less effect on redo in 10g than they did in 8i/9i. We've considered adding the src & dst country_code columns onto big_table and using an on insert trigger to update those 2 columns. This table is appended to via sqlldr (not direct load) a few dozen rows at a time... but those few dozen times add up pretty quickly -- 50+m/day. We haven't tried it yet, and I'm sorry to be the guy who asks you the questions without having tkprof, trace, etc, but I'm just looking for a gut (experience/wisdom) feeling.

Thanks again Tom.






Tom Kyte
June 26, 2008 - 3:41 pm UTC

create table as select is ALWAYS append mode (writes above high water mark with direct path). There is no such thing as a CTAS with /*+ append */.

Parallel - up to you, do you have the resources for it, can parallel help it.

Nologging - up to you and your DBA :) they need to do special stuff if you go nologging.


... Sorry to go off topic, but what are your thoughts on triggers? ...

the title of an upcoming column by me in Oracle Magazine is:

<quote>
The Trouble with Triggers

Those of you who follow the asktom.oracle.com website know that I have an aversion to triggers. Once upon a time, a long time ago, I thought triggers were the coolest thing ever and I used (and abused) them heavily. Now, I will go very far out of my way in order to avoid a trigger - whenever possible.
</quote>


and you said:

... this table is appended to
via sqlldr (not direct load) a few dozen rows at a time... ...

but that'll skip triggers... and make /*+ append */ a "no-op"


ops$tkyte%ORA10GR2> create table t as select * from all_users where 1=0;

Table created.

ops$tkyte%ORA10GR2> create trigger t_trigger
  2  before insert or update on t for each row
  3  begin
  4          :new.username := lower(:new.username);
  5  end;
  6  /

Trigger created.

ops$tkyte%ORA10GR2> !cat t.ctl
LOAD DATA
INFILE *
INTO TABLE t
REPLACE
FIELDS TERMINATED BY '|'
(
username
,user_id
,created
)
BEGINDATA
BIG_TABLE|58|14-DEC-05
DIP|19|30-JUN-05
TSMSYS|21|30-JUN-05

ops$tkyte%ORA10GR2> !sqlldr / t.ctl direct=y

SQL*Loader: Release 10.2.0.2.0 - Production on Thu Jun 26 15:30:44 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Load completed - logical record count 3.

ops$tkyte%ORA10GR2> select * from t;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE                              58 14-DEC-05
DIP                                    19 30-JUN-05
TSMSYS                                 21 30-JUN-05

<b>no trigger fired</b>

ops$tkyte%ORA10GR2> update t set user_id = user_id;

3 rows updated.

ops$tkyte%ORA10GR2> select * from t;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
big_table                              58 14-DEC-05
dip                                    19 30-JUN-05
tsmsys                                 21 30-JUN-05

<b>that just shows trigger works, but did not fire</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ append */ into t select * from all_users where rownum < 3;

2 rows created.

ops$tkyte%ORA10GR2> select * from t;

USERNAME                          USER_ID CREATED
------------------------------ ---------- ---------
big_table                              58 14-DEC-05
dip                                    19 30-JUN-05
tsmsys                                 21 30-JUN-05
big_table                              58 14-DEC-05
dip                                    19 30-JUN-05

<b>that shows trigger fired and since the select did not fail, we did NOT append!</b>




add column

jayesh, June 28, 2008 - 3:27 am UTC

hi tom,

Generally when we add a new column it is added after all the existing columns. Suppose I want to add a column between two columns; if it is possible, please give me an example and the syntax.
Tom Kyte
June 28, 2008 - 1:53 pm UTC

this is what a view is about.

How I wish we had never given "rename column" :( - views are the right approach here.

alter table t add ( x int );

create or replace view view_of_t
as
select a, x, b from t;


use the view - if need be, rename the table to "t_table" and create view T as select ....

no one will know...
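
In other words, a sketch (names made up):

alter table t rename to t_table;

create or replace view t
as
select a, x, b
  from t_table;

Code that selects from T keeps working and sees the columns in the desired order.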

Update with BETWEEN operator...

Rich, July 02, 2008 - 5:50 pm UTC

Thanks for the response to my questions Tom -- always appreciated.

I've still been thinking about how to attack this and I'm curious how BETWEEN works. If I say:

update big_table
   set dst_cc = (select country_code
                   from lookup_table
                  where dstip between x.ip_from
                                  and x.ip_to);

Will this act similar to an IN operator or an EXISTS operator? I'm curious if BETWEEN will stop at the first result or continue searching for other values?

I created a quick test... I duplicated a row in lookup_table, but with a different country_code, executed a statement like

update big_table
set src_cc = (select country_code
from lookup_table
where srcip between ip_from and ip_to);


and it returned "ORA-01427: single-row subquery returns more than one row"

Am I correct to assume that Oracle will scan my entire lookup table when it uses the BETWEEN operator?

I'm thinking about rewriting my query to something like:

FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update big_table
set src_cc = (x.country_code)
where exists (select 1
from lookup_table
where srcip between x.ip_from and x.ip_to);
commit;
END LOOP;
end;
/

The lookup_table shouldn't have duplicates, ip_to & ip_from have UNIQUE constraints. Therefore, I want my query to select the first country_code & move on to the next update, not continue to search through the lookup_table. I'm not quite sure how to attack this. Any thoughts or suggestions?

Thanks again for your help!

Regards,
Rich




Tom Kyte
July 07, 2008 - 9:19 am UTC

..
update big_table
set dst_cc = (select country_code
from lookup_table
where dstip between x.ip_from
and x.ip_to);

Will this act similar to an IN operator or an EXISTS operator?
.....

absolutely not. think of that like:

update big_table set dst_cc = f(x);


no where clause - EVERY row gets updated. And if your correlated subquery returns more than one row - you'll just get an error and the update will fail.


...
Am I correct to assume that Oracle will scan my entire lookup table when it
uses the BETWEEN operator?
....

absolutely not. it'll index range scan more likely if it can.


I cannot really help with the real update you want - too much missing here, this logic doesn't seem sensible:


FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update big_table
set src_cc = (x.country_code)
where exists (select 1
from lookup_table
where srcip between x.ip_from and x.ip_to);
commit;
END LOOP;
end;
/


why would you use where exists. that would tend to scan big table for every row in lookup


maybe this?

ops$tkyte%ORA11GR1> create table t ( src_cc varchar2(2), srcip number );

Table created.

ops$tkyte%ORA11GR1> insert into t values ( null, 50 );

1 row created.

ops$tkyte%ORA11GR1> insert into t values ( null, 500 );

1 row created.

ops$tkyte%ORA11GR1> insert into t values ( null, 5000 );

1 row created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create table lookup ( country_code varchar2(2), ip_from number, ip_to number );

Table created.

ops$tkyte%ORA11GR1> insert into lookup values ( 'aa', 0, 100 );

1 row created.

ops$tkyte%ORA11GR1> insert into lookup values ( 'bb', 101, 1000 );

1 row created.

ops$tkyte%ORA11GR1> insert into lookup values ( 'cc', 1001, 10000 );

1 row created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select * from t;

SR      SRCIP
-- ----------
           50
          500
         5000

ops$tkyte%ORA11GR1> merge into t
  2  using lookup l
  3  on (t.srcip between l.ip_from and l.ip_to)
  4  when matched then update set src_cc = l.country_code;

3 rows merged.

ops$tkyte%ORA11GR1> select * from t;

SR      SRCIP
-- ----------
aa         50
bb        500
cc       5000


please do NOT commit in the for loop, please....

No commit & no exists...

A reader, July 08, 2008 - 8:56 am UTC

Thanks Tom.

why would you use where exists. that would tend to scan big table for every row in lookup

I agree, I wouldn't want to do that. I was trying to use it as an example of how I wanted to query lookup_table and use 'exists-like logic' to stop after it finds the first entry that matches the between statement. I think your suggestion of a merge is my best bet.

please do NOT commit in the for loop, please....

That being in there was embarrassing -- sorry :-)

Thanks,
Rich


How to insert 10 million rows... into a partition

Suresh, July 23, 2008 - 12:08 pm UTC

Hi Tom,

I have a table (fact_projection), currently with 26 partitions and around 10 million rows.
Every day I need to insert into all the partitions of fact_projection from stging_projection, joining some other tables.

Data is being inserted into all the partitions correctly, but the insert for one particular partition, called PH, is taking too long.

I tried inserting using BULK COLLECT, but that also takes a lot of time.

Could you please suggest the best possible approach.


Shall I do it using

insert into fact_projection PARTITION (PH)
SELECT <join>

Here is the query:
INSERT INTO invsl.fact_invntry_projctn
             (mngng_org_id, org_suply_chain_item_key, suply_chain_loc_key,
              snpsht_dt_key, fscl_mth_id, actvty_typ_key, slng_mrkt_id,
              unit_msr_id, crncy_id, srce_sys_id, unit_qty, std_cost_amt,
              sku_parms_key, item_typ_id, mrkt_fsc_key, suply_mrkt_id,PROD_TYP_GRP_ID)
    SELECT   /*+ parallel(a,8) push_subq */ a.mngng_org_id, d.org_item_fg_key, a.suply_chain_loc_key,
             g.clndr_dt_key, b.fscl_mth_id, a.actvty_typ_key, a.lcl_mrkt_id,
             MIN (NVL (a.uom_id, -100)), MIN (NVL (f.crncy_id, -100)),
             MIN (a.srce_sys_id), SUM (a.unit_qty), MIN (a.std_cost_amt), -100,
             MIN (a.item_typ_id), a.mrkt_fsc_key, suply_mrkt_id,min(a.PROD_TYP_GRP_ID)
        FROM invsl_stage.stg_invntry_projctn a JOIN codi_stage.stage_fscl_mth b
             ON (    TO_CHAR (a.projctn_mth_dt, 'mm') = b.fscl_mth_nr
                 AND TO_CHAR (a.projctn_mth_dt, 'yyyy') = b.fscl_yr_nr
                )
             JOIN codi_stage.stage_org_fg d
             ON (a.suply_mrkt_id = d.org_id AND a.item_cd = d.lcl_sc_fsc_cd)
             LEFT OUTER JOIN codi_stage.stage_crncy f
             ON (a.lcl_crncy_cd = f.crncy_cd)
             JOIN codi_stage.stage_clndr_dt g ON (a.snpsht_dt = g.clndr_dt) 
     WHERE  a.mngng_org_cd = lv_mrkt
                AND a.lcl_item_typ = 'FG'
                AND a.prod_typ_grp_cd = 'FG PRODUCT'
    GROUP BY a.mngng_org_id,d.org_item_fg_key,a.suply_chain_loc_key,g.clndr_dt_key,
             b.fscl_mth_id,a.actvty_typ_key,a.lcl_mrkt_id,a.mrkt_fsc_key,a.suply_mrkt_id;

Tom Kyte
July 24, 2008 - 10:29 am UTC

you might consider a direct path insert, insert /*+ APPEND */ and enabling parallel dml if you want to do a parallel direct path load.
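
For illustration only, a minimal sketch of that (the degree of 8 is arbitrary and the column/join lists are elided - this is not your exact statement):

alter session enable parallel dml;   -- parallel DML must be enabled before it will be used

insert /*+ append parallel(f, 8) */ into invsl.fact_invntry_projctn f
       ( mngng_org_id, org_suply_chain_item_key, ... )
select /*+ parallel(a, 8) */ ...
  from invsl_stage.stg_invntry_projctn a
  join ...
 group by ...;

commit;   -- after a direct path insert you must commit before that session can query the table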

How to insert 10 million rows... into a partition

Sureshpswamy, August 08, 2008 - 6:40 am UTC

Hi,

It was working fine when I added the APPEND hint.
But just before deployment I had a change request to consider the new table mapsrpt_stage.stg_mrkt_fsc. Now the whole query is back to its old 3-hour run time.
Could you please help me?

INSERT INTO /*+ APPEND  */ invsl.fact_invntry_projctn PARTITION (part_62) NOLOGGING
            (mngng_org_id, org_suply_chain_item_key, suply_chain_loc_key,
             snpsht_dt_key, fscl_mth_id, actvty_typ_key, slng_mrkt_id,
             unit_msr_id, crncy_id, srce_sys_id, unit_qty, std_cost_amt,
             sku_parms_key, item_typ_id, mrkt_fsc_key, suply_mrkt_id,
             prod_typ_grp_id)
   SELECT   /*+ parallel(a,8) push_subq  index(pk_stg_invntry_projctn) */
            a.mngng_org_id, d.org_item_fg_key, a.suply_chain_loc_key,
            g.clndr_dt_key, b.fscl_mth_id, a.actvty_typ_key, a.lcl_mrkt_id,
            MIN (NVL (a.uom_id, -100)), MIN (NVL (f.crncy_id, -100)),
            MIN (a.srce_sys_id), SUM (a.unit_qty), MIN (a.std_cost_amt), -100,
            MIN (a.item_typ_id), a.mrkt_fsc_key, suply_mrkt_id,
            MIN (a.prod_typ_grp_id)
       FROM invsl_stage.stg_invntry_projctn PARTITION (ph) a JOIN codi_stage.stage_fscl_mth b
            ON (    TO_CHAR (a.projctn_mth_dt, 'mm') = b.fscl_mth_nr
                AND TO_CHAR (a.projctn_mth_dt, 'yyyy') = b.fscl_yr_nr
               )
            JOIN codi_stage.stage_org_fg d
            ON (a.suply_mrkt_id = d.org_id AND a.item_cd = d.lcl_sc_fsc_cd)
            LEFT OUTER JOIN codi_stage.stage_crncy f
            ON (a.lcl_crncy_cd = f.crncy_cd)
            JOIN codi_stage.stage_clndr_dt g ON (a.snpsht_dt = g.clndr_dt)
      WHERE a.mngng_org_cd = 'PH'
        AND a.lcl_item_typ = 'FG'
        AND a.prod_typ_grp_cd = 'FG PRODUCT'
        AND EXISTS (
               SELECT 1
                 FROM mapsrpt_stage.stg_mrkt_fsc m
                WHERE a.mrkt_fsc_key = m.mrkt_fsc_key AND m.catgry_id != 17
                   OR (    m.catgry_id = 17
                       AND EXISTS (
                              SELECT 1
                                FROM system_code
                               WHERE m.form_id = code
                                 AND code_type =
                                                'SALES_TOOLS_INCLUSION_FILTER'
                                 AND market = 'GLOBAL')
                      ))
   GROUP BY a.mngng_org_id,
            d.org_item_fg_key,
            a.suply_chain_loc_key,
            g.clndr_dt_key,
            b.fscl_mth_id,
            a.actvty_typ_key,
            a.lcl_mrkt_id,
            a.mrkt_fsc_key,
            a.suply_mrkt_id

Tom Kyte
August 08, 2008 - 1:20 pm UTC

no, not really. you'd need to trace this, look at what it is doing, and see if there is a more efficient way to do it.

you'd need to understand the indexing scheme
you'd need to understand the QUESTION trying to be answered
you'd need to understand the relationships between the tables

and so on - you have that, I don't.

Update

krishna, August 19, 2008 - 12:13 am UTC

Hi Tom,

Can I update a table based on some criteria in the where clause which involves joins?

Ex: update emp set sal=1000 where emp.emp_no = <table>.<field_name>


pls, can u provide some examples.

Thanks
Krishna
Tom Kyte
August 20, 2008 - 10:18 am UTC

"U" is not available (I believe they are dead actually). Can I help instead?

you can update a join

ops$tkyte%ORA10GR2> create table emp as select * from scott.emp;

Table created.

ops$tkyte%ORA10GR2> create table dept as select * from scott.dept;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table emp add constraint emp_pk primary key(empno);

Table altered.

ops$tkyte%ORA10GR2> alter table emp modify deptno not null;

Table altered.

ops$tkyte%ORA10GR2> alter table dept add constraint dept_pk primary key(deptno);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> update ( select sal from emp, dept where emp.deptno = dept.deptno) set sal = 1000;

14 rows updated.


and of course a subquery also works:

ops$tkyte%ORA10GR2> update emp set sal = 1000 where deptno in ( select deptno from dept );

14 rows updated.

error logging clause for CTAS

Oxnard, August 29, 2008 - 10:55 am UTC

ver 10.2
Is there any way to do an error logging clause for CTAS?

We do partition swapping with the CTAS tables

so for example we do

create table my_ctas (
col1 not null, -- note the not null
col2,
col3) tablespace tbl nologging
as select
cast(col1 as varchar2(20)) as col1
,cast(col2 as number) as col2
,cast(col3 as number) as col3
from taba, tabb
where
taba.x = tabb.x

when there are some data problems it can be a pain to figure out. If it's not possible, how do I get this onto the new features request list?

Thanks






Tom Kyte
August 30, 2008 - 9:48 am UTC

since you are doing a cast like that, you know the types you want, you would accomplish your goal in two statements

create table my_table( col1 varchar2(30) not null, .... );
insert /*+ append */ into my_table select ..... log errors ....;


use the direct path insert to populate the table, it'll work very much in the same fashion as the create table as select performance wise. Bypass the buffer cache, write directly to disk, can be non-logged, parallel, etc.
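
Fleshed out just slightly - a sketch only (DBMS_ERRLOG creates the error logging table that the LOG ERRORS clause writes to; the names follow the example above):

exec dbms_errlog.create_error_log( dml_table_name => 'MY_TABLE' )   -- creates ERR$_MY_TABLE

insert /*+ append */ into my_table( col1, col2, col3 )
select cast(col1 as varchar2(20)), cast(col2 as number), cast(col3 as number)
  from taba, tabb
 where taba.x = tabb.x
   log errors into err$_my_table ('ctas replacement load') reject limit unlimited;

The rows that fail are captured in ERR$_MY_TABLE along with the Oracle error, so figuring out the data problems becomes a simple query.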

not update but just select and insert

A reader, September 10, 2008 - 3:13 am UTC

I have a subset of parent table data in table A (around 4 million records) and a detail table B of around 4.5 billion records. I need to query them and insert the 4 million * X (number of matching records in B) rows into a new table C. There is an index on the matching key on both table A and table B.

The problem is, when I query table B alone without any conditions, just select count(1) from B, it takes hours to return. There are more than 400 hash partitions on table B and the values in the partitions are not uniformly distributed - does that cause the slowdown?

What would be the best approach in this situation? I am thinking of querying tables A and B and putting the result in C using CTAS - is there a better way, or any tips to make it faster?
Tom Kyte
September 11, 2008 - 10:42 am UTC

the values in b have nothing to do with anything. You are counting the number '1' (and one might ask - why the number '1', why not '*', if your intention is to count "how many records do i have", you would use count(*) to show to the world you know what you are doing - by counting '1's, it looks like you are confused about how to count records....)

However, the fact that there are 4.5 billion records in there and let us assume an average row width of 100 bytes - you might be full scanning (you do not say if you have indexed a NOT NULL column on this table) about 1/2 of a terabyte of storage.

And depending on your storage, that just might take a while?

I would question the need for this table C, you give no justification for it, but it doesn't seem very useful

not update but just select and insert

P, September 11, 2008 - 1:03 pm UTC

For count(1), I read somewhere that if you just want to count the records of a table, it is better than using count(*) - please correct me if I am wrong. I don't have the table analyzed or any other way to find how many rows are in the table, so just to tell you the correct number, I queried select count(1). If count(*) is better, please let me know and also please explain why.

For table C, which I mentioned in the earlier question: it is the result table from querying A and B. There would be more tables used to update some more columns in C, and then table C will be used in the main process, along with other applications' result tables, to apply some other business rules for the final output of some reports.

For more information on indexes and tables: table A and table B are indexed on NOT NULL columns (primary key on A and a not null indexed foreign key on B), and I need to query some more tables like B to update columns in C, which would have the same parent-child relationship as A --> B.


sorry, if I was not clear on table C, but the actual question is what approach I should take ? I hope I am able to explain my question to you.
Tom Kyte
September 16, 2008 - 12:38 pm UTC

... For count(1), I read somewhere that if you just want to count records of a
table, it is better than using count(*) , please correct me if I am wrong. ...

consider yourself corrected then.

When you read that information - did they

a) say why?
b) show why?
c) demonstrate some noticeable improvement in response times?

Always look for that - if someone says "X will make your life better", you should ask to see a, b, and c - do the same with all technical advice.


As for why - count(*) is the way, the documented way, the correct way to count rows. Count(*) is in fact "count rows".

Count( <expression> ) says "count non-null expressions", by saying count(1) you are not saying "count rows", you are in fact saying "count how many non-null number 1's are in this table".

We had to go and file a bug way back when to internally rewrite count(1) as count(*) in order to make things faster - this rumor was so rampant.

read this:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:1156159920245

we've been discussing this for years and years and years now.....


other than saying you have a table A and B and lots of rows in B, I don't have much information to go on here. You do not say how large C is, if C is very large - it could be that full scans + hash join = right way.

If C is not, full scan A + nested loops into B might be the right way.

But in either case, this will take a MEASURABLE amount of time, you do not say how long it is currently taking, what your (possibly unreasonable) expectations are, what the current plan is, how big (in gigabytes - row counts are useless - they only describe the smallest the table could be) A and B are, what kinds of resources you have (parallel? lots of disk? tons of memory, etc...)



and I still doubt that C is necessary, but whatever, people keep building tables like this anyway... Unless you are going to full scan C frequently - C is probably not useful.

Question on Updating a Join

Dan, September 12, 2008 - 6:24 pm UTC

Tom,

I'm looking to rewrite a pretty commonly used procedure for getting phone numbers - it's currently written "slow by slow" and takes a long time on anything of decent size. I'm running into trouble when attempting to use the UPDATE on a JOIN that you specified earlier in this thread:

UPDATE (SELECT tmp.HOME_PHONE, phn.phn_area_cde_num, phn.PHN_NUM
FROM tmp_test1 tmp,
phone_table phn
WHERE tmp.cust_id = phn.cust_id
AND phn.phn_loc = 'H'
AND phn.phn_use = 'G'
AND phn.del = 'N')
SET HOME_PHONE = phn_area_cde_num || phn_num;

This results in a "ORA-01779: cannot modify a column which maps to a non key-preserved table".

This is in 10g. There are primary key constraints on each phone_table field (cust_id, phn_loc, phn_use, del) and I placed a unique constraint on tmp.cust_id after looking through your site a bit for the error. Using the below:

alter table tmp_test1 add constraint tmp_test1_unique
unique(cust_id);

But it's still happening. It seems like that should be sufficient information for Oracle to know there is only one possible result, correct? Or is there another constraint I need?

(Also, I found this interesting - I first ran the sql from another user, one that has permissions on tmp_test1 but not on phone_table, and got an "ORA-01031: insufficient privileges" error. It seems like this should work since I am only attempting to update the table I have access to, but looks like not)
Tom Kyte
September 16, 2008 - 5:22 pm UTC

You can use a merge here - but not the join. The optimizer won't recognize the constants there - it would want tmp to have phn_loc, phn_use, del and for tmp to join to phn using 4 columns (we have to join to the primary key of the PHN table in its entirety)

so, just use a merge command.


ops$tkyte%ORA11GR1> create table tmp ( a int primary key, y int );

Table created.

ops$tkyte%ORA11GR1> create table phn ( a int, b int, y int, primary key(a,b) );

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> update ( select tmp.y tmpy, phn.y phny from tmp, phn where phn.a = tmp.a and phn.b = 'x' )
  2  set tmpy = phny;
set tmpy = phny
    *
ERROR at line 2:
ORA-01779: cannot modify a column which maps to a non key-preserved table


ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> merge into tmp
  2  using (select * from phn where b = 'x' ) phn
  3  on (tmp.a = phn.a)
  4  when matched then update set y = phn.y;

0 rows merged.





how to update millions of rows in production

Francisco Jimenez, October 15, 2008 - 7:50 am UTC

Hello Tom,

I need to update one column in one table. This table has 250 million rows.

My problem is that someone could be updating rows in this table at the same time while I am updating it. How can I do this update? I would like to use CTAS, but our DBA thinks that is not a good idea.

thanks in advance.

Fran.
Tom Kyte
October 15, 2008 - 5:31 pm UTC

lock table t in exclusive mode;
update t set x = whatever;
commit;



get the table, update the table, un-get the table by committing.



A reader, November 17, 2008 - 1:00 pm UTC

We have had a system in production for about 6 months. One process inserts records into a table and right now it has inserted about 250 million rows into the table. We want to start putting a retention period on this table and change the process a little. So we want to do the following in production:

1. Rename the existing table.
2. Create a new table with the same structure and let it run in production.
3. Every day early morning we want to do the following - check the data in the table created above and if the data is more than 3 days old, archive it to a history table.
4. Delete from the history table any data more than 90 days old.

Now for step 3, it could be about one to two million rows every day. What would be the best way to move the data, or archive it to the history table?

Thanks for your help.

Tom Kyte
November 18, 2008 - 7:29 pm UTC

one wonders, one really really really wonders....

why....

this was not part of the original design from day one, this is important stuff.

You want to use partitioning - anything else would be a hugely inefficient way to approach it.
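
For illustration only - a rough sketch of the shape that takes (the table name, columns and daily range partitions below are invented for the example):

create table txn_data
( txn_id      number,
  created_dt  date,
  payload     varchar2(100)
)
partition by range (created_dt)
( partition p20081115 values less than (to_date('16-nov-2008','dd-mon-yyyy')),
  partition p20081116 values less than (to_date('17-nov-2008','dd-mon-yyyy')),
  partition p20081117 values less than (to_date('18-nov-2008','dd-mon-yyyy'))
);

With one partition per day, "archive data more than 3 days old" becomes a partition exchange into the history table, and "delete history more than 90 days old" becomes alter table ... drop partition - no row-by-row deletes or million-row copies at all.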


Change datatype of column in 400 million row table

A reader, March 20, 2009 - 5:19 pm UTC

Tom,
We have a table with 400 million rows. The table is partitioned. We have to change the datatype of a column from number to varchar2. We are running 10gr2.

For some arcane reason, the end users want the column order to remain unchanged.

My thought would be to:
a) Add a new column of type number
b) Update the new column with old column
c) Set old column to NULL
d) Change the datatype of old column to varchar2
e) Update the old column with new column
f) Drop the new column

Is this the right approach or is there a better way considering the data volume? If this is the right approach, how can I speed it up while minimizing the downtime to users?

Thanks...
Tom Kyte
March 24, 2009 - 10:47 am UTC

... the end users want the column order to remain
unchanged.
...

we call that a VIEW.

alter table t add new_col varchar2(25);
update t set new_col = number_col, number_col = null;
alter table t drop column number_col;
alter table t rename to t_table;
create or replace view t
as
select ....... <in any order, with any name you like>
  from t_table;




DO NOT do the 'curly shuffle' like you suggest above, that would just be a waste.

Change datatype of column in 400 million row table

A reader, March 24, 2009 - 11:08 am UTC

Tom,
Thanks for your reply.

Following your example, from a test in a non-production database, updating the column took about 15 minutes for a 1 million row table. For the 400 million row table, if I linearly extrapolate, it will take about 100 hours. How can I speed it up?

Thanks...
Tom Kyte
March 29, 2009 - 10:46 am UTC

parallel update.
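
For the parallel update route, a minimal sketch (the degree of 8 is arbitrary; the column names follow the example above):

alter session enable parallel dml;

update /*+ parallel(t, 8) */ t
   set new_col = number_col,
       number_col = null;

commit;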


or, use dbms_redefinition which can apply functions and conversions as you redefine the table, you'd need two copies, but it can be 100% online, your end users don't have to have any interrupted service.


actually, the fastest way to do something is NOT TO DO IT in the first place. I would go back and seriously re-evaluate your "need" to perform this operation at all...

Bulk operation

Sam, April 20, 2009 - 10:59 am UTC

Hi Tom,

We have a table with 60 million records in production. We need to do a clean-up activity based on the following logic.

For a particular FORDERUNG_ID, if the column values in the column REST_KOST_UNVERZ, REST_KAP_VERZ differs, we need to keep those rows. If the value in the following row ( order by GUELTIG_VON_DAT ) has the same value in the columns REST_KOST_UNVERZ and REST_KAP_VERZ, we need to delete that row. Here the problem for me is that I don't know how to compare rows of the same table in a particular order. Sample data is provided below

FORDERUNG_ID REST_KOST_UNVERZ REST_KAP_VERZ GUELTIG_VON_DAT     GUELTIG_BIS_DAT
1387221                     10       3769.59 31.10.2008 10:21:35 25.11.2008 13:07:01
1387221                     10       3769.59 25.11.2008 13:07:02 26.11.2008 13:07:01
1387221                      0             0 26.11.2008 13:07:02 27.11.2008 13:07:01
1387221                     10            10 27.11.2008 13:07:02 04.02.2009 23:59:59
1387221                      0             0 05.02.2009 00:00:00 06.02.2009 23:59:59
1387221                      0             0 10.02.2009 00:00:00 31.12.9999 00:00:00
1387224                      0        340.16 31.10.2008 10:21:35 25.11.2008 13:07:01
1387224                  116.4        340.16 25.11.2008 13:07:02 04.02.2009 23:59:59
1387224                  116.4        340.16 05.02.2009 00:00:00 06.02.2009 23:59:59
1387224                  116.4        340.16 10.02.2009 00:00:00 31.12.9999 00:00:00


The output data should be

    
FORDERUNG_ID REST_KOST_UNVERZ REST_KAP_VERZ GUELTIG_VON_DAT     GUELTIG_BIS_DAT
1387221                     10       3769.59 31.10.2008 10:21:35 25.11.2008 13:07:01
1387221                      0             0 26.11.2008 13:07:02 27.11.2008 13:07:01
1387221                     10            10 27.11.2008 13:07:02 04.02.2009 23:59:59
1387221                      0             0 05.02.2009 00:00:00 06.02.2009 23:59:59
1387224                      0        340.16 31.10.2008 10:21:35 25.11.2008 13:07:01
1387224                  116.4        340.16 25.11.2008 13:07:02 04.02.2009 23:59:59


As you can see, I need to compare the current row and the previous row to see if there is any difference, after ordering by GUELTIG_VON_DAT (valid-from date). For your information, GUELTIG_BIS_DAT is the valid-to date.

Difference between GUELTIG_VON_DAT and GUELTIG_BIS_DAT of next row is GUELTIG_VON_DAT +1 second.

I need a query that returns the result so that I can insert into second table with append mode, drop the original table and rename second table to original table. Can you please help me out?
Tom Kyte
April 21, 2009 - 2:27 pm UTC

... For a particular FORDERUNG_ID, if the column values in the column REST_KOST_UNVERZ, REST_KAP_VERZ differs, we need to keep those rows. If the value in the following row ( order by GUELTIG_VON_DAT ) has the same value in the columns REST_KOST_UNVERZ and REST_KAP_VERZ, we need to delete that row. ....

did not make sense, make that more clear and of course provide create tables and inserts - else I cannot really look at it.


It sounds like to me that if you have a row whereby REST_KOST_UNVERZ, REST_KAP_VERZ differs - keep that row (so, you only look within a row, therefore decode( REST_KOST_UNVERZ, REST_KAP_VERZ, 0, 1 ) = 1 are rows to KEEP (they do not differ) always.

Then it gets difficult to understand. For rows where they do not differ (so we are only interested in them I think??) we need to delete that row.


So basically, generate a set of rows whereby REST_KOST_UNVERZ, REST_KAP_VERZ are the SAME and de-duplicate using a key of FORDERUNG_ID, REST_KOST_UNVERZ, REST_KAP_VERZ - keeping only the oldest record? right?



Re: Bulk operation

Jakub Illner, April 22, 2009 - 11:12 am UTC

Sam, I suppose you should use the LAG/LEAD analytical function which will give you the values of the previous or next row. In the next step you can compare the values in the current versus previous row and decide what to do with the record - delete or keep it. The select (not tested) could look like this:

select FORDERUNG_ID
     , REST_KOST_UNVERZ
     , REST_KAP_VERZ
     , GUELTIG_VON_DAT
     , GUELTIG_BIS_DAT
     , case
         when REST_KOST_UNVERZ = prev_REST_KOST_UNVERZ
          and REST_KAP_VERZ    = prev_REST_KAP_VERZ
         then 'DELETE'
         else 'KEEP'
       end as action
  from (
        select FORDERUNG_ID
             , REST_KOST_UNVERZ
             , REST_KAP_VERZ
             , GUELTIG_VON_DAT
             , GUELTIG_BIS_DAT
             , lag(REST_KOST_UNVERZ,1)
                 over (partition by FORDERUNG_ID
                       order by GUELTIG_VON_DAT
                      ) as prev_REST_KOST_UNVERZ
             , lag(REST_KAP_VERZ,1)
                 over (partition by FORDERUNG_ID
                       order by GUELTIG_VON_DAT
                      ) as prev_REST_KAP_VERZ
          from <table>
       )


Regards,
Jakub

Rewrite the update

sriram, April 24, 2009 - 8:00 am UTC

Tom,

I am trying to implement the technique(create new_table as select <update statement> from old_table) that you mentioned on the below sql , but no clues on proceeding.

Please can you suggest a good way of doing this?

UPDATE PS_PI_TAX_DETAIL TAX SET PI_STATE = (NVL((SELECT J_STATE_TAX FROM PS_J_PAYCHECK PAY WHERE PAY.EMPLID = TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2),:1)) ,PI_LOCALITY = (NVL((SELECT J_LOCAL_TAX FROM PS_J_PAYCHECK PAY WHERE PAY.EMPLID = TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2),:1)) WHERE PI_RUN_NUM = 0 AND PI_SYSTEM_ID = :2 AND PI_CONFIG_ID = :2
Tom Kyte
April 27, 2009 - 12:18 pm UTC

wouldn't you guys hate it if I wrote all of my sql like that? Isn't that just about the most impossible thing to read? All uppercase, no sensible line breaks, totally impossible to read.


you'd have to select out every row of course (your update has a where clause... you are not updating every row)

I think you want to use merge, merge into tax using pay on (join condition) with a WHEN MATCHED clause to update.

merge into ps_pi_tax_detail tax using PS_J_PAYCHECK pay on ( PAY.EMPLID= TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2 and tax.pi_run_num = 0 and tax.pi_system_id = :b2 and tax.pi_config_id = :b2 ) when matched then update set pi_state = nvl(pay.j_state_tax,:b1), pi_locality = nvl(pay.j_local_tax,:b1)


like that. Only kidding, more like this:

ops$tkyte%ORA10GR2> merge into ps_pi_tax_detail tax
  2  using PS_J_PAYCHECK pay
  3  on ( PAY.EMPLID= TAX.PI_EMPLID
  4       AND PAY.WEEK_END_DT = TAX.PAY_END_DT
  5       AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR
  6       AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2
  7       and tax.pi_run_num = 0
  8       and tax.pi_system_id = :b2
  9       and tax.pi_config_id = :b2 )
 10  when matched then update set pi_state = nvl(pay.j_state_tax,:b1), 
                                  pi_locality =  nvl(pay.j_local_tax,:b1)
 11  /



the nvl probably isn't needed anymore since we are not using scalar subqueries.

Deleting records in table

Siddu, May 27, 2009 - 7:26 am UTC

Hi Tom,

We are trying to delete 5 lakh records weekly. These 5 lakh records are selected based on criteria which happens dynamically. We are using Perl at the front end. Deletion of records is taking 20 minutes. We were looking at various options to increase the performance. One approach I thought of was to write a stored procedure to delete the records and do a server-side execution instead of client side (here it is Perl). Could you please suggest any other ways we can tune the performance?

Cheers,
Siddu
Tom Kyte
May 27, 2009 - 11:36 am UTC

so, I suppose 5 lakh is 500,000 (best to be unambiguous - acronyms, colloquialisms - should be avoided...)


No one could say if your idea of moving code from perl to plsql will have any impact on performance. Anyone that says they do is lying.

There is no information here to help you.

"These 5 lakh records are selected based on criteria which happens dynamically" is not meaningful, I have no idea what that means, what that entails.

You do not say if you issue 500,000 individual delete statements (bad).

You do not say if you run a query to find 500,000 records and then issue a delete for each one (worse)

You do not say if you run a single delete (good), but you are deleting 500,000 records out of 5,000,000,000 and using a full scan (probably bad)

In short, you don't tell us what you are doing - what you have observed.


20 minutes - that might be really good (if the table has 20 indexes on it - that might be as good as you get).


Why don't you (in PAINSTAKING DETAIL) tell us what you need to do and then (and only then, do not skip that part, the telling us what you are trying to do in specification format) tell us your current algorithm.

Then we can suggest methods for rewriting your algorithm more efficiently.

abdou, May 28, 2009 - 10:01 am UTC

hi tom
please, I want a query to show the 3 employees with the biggest SAL from table emp

thank
Tom Kyte
May 28, 2009 - 1:56 pm UTC

what if 4,000 people all make the same exact really big salaries.

be specific do you want:

a) a random set of 3 or less people that make a salary that is in the first three rows after sorting the entire set by SAL DESC

select * from (select * from emp order by sal desc) where rownum <= 3;

b) the set of employees (maybe 0 to many many rows) that make the top 3 salaries - accounting for 'ties'. That is, if 50 people make $100, and 50 people make 75$ and 50 people make $50 and 50 people make $25 - this answer would return only the 50 people that make $100 (since they all tied for first place, there is no 2nd, 3rd, 4th, .... 50th place person since the first 50 all tied for that honor)

select * from (select emp.*, rank() over (order by sal desc) r from emp) where r <= 3;

c) the set of employees that make a salary that is in the top three salaries. In the above example that would return 150 people - the first three groups

select * from (select emp.*, dense_rank() over (order by sal desc) r from emp) where r <= 3;


updating some columns & 500,000 records in a table

Prem, June 29, 2009 - 9:23 am UTC

hi Tom,

I have a requirement where I need to update a table of 1.5 million records with 300 columns, based on various conditions (as many as 500 different sets of conditions, arrived at after executing predefined rules). The number of rows affected could be 0.5 million, and out of the 300 columns, 20 columns will be updated in those 0.5 million records.

So I need to fire one update query per condition, which makes 500 update statements to be fired, and this takes more than 10 hours.

Could you please let me know the best possible solution to handle this situation.

thanks,
prem

3 million rows to update with a subselect

Rebecca, July 17, 2009 - 12:57 pm UTC

Hi Tom,
In a data expansion program I took a set of 22 rows and burst them into 100k+ sets, resulting in
around 3 million rows.

Each set has a common key (FAMILY_ID) and each row a unique key (MEMBER_ID).

The data in the table is structured parent/child-- with the parent's MEMBER_ID being stored in
PARENT_MEMBER_ID.

There are multiple parents in each family.

When I created the 3 million rows- every row gets a new unique MEMBER_ID.
So the new keys to the parent rows need to be remapped to the children they are parents of.
There are about 1 million parents and 2 million children. Its only parent/child- no grandchildren.


Here's where I started:
UPDATE FAMILIES_TBL A SET PARENT_MEMBER_ID = (SELECT X.MEMBER_ID FROM FAMILIES_TBL X
WHERE X.OLD_MEMBER_ID = A.PARENT_MEMBER_ID AND A.BRANCH = X.BRANCH AND A.FAMILY_ID =
X.FAMILY_ID) WHERE PARENT_MEMBER_ID IS NOT NULL

FAMILIES_TBL is a temp table without an index, which was great for the inserts. But we'd both
grow very old before the update completes.


After reading this excellent thread, I changed it to:
CREATE TABLE FAMILIES_TBL_HLD AS SELECT * FROM FAMILIES_TBL WHERE 1=2;
ALTER table FAMILIES_TBL nologging;
ALTER table FAMILIES_TBL_HLD nologging;

Created an index on FAMILIES_TBL("PARENT_MEMBER_ID", "OLD_MEMBER_ID", "BRANCH",
"FAMILY_ID")

insert into FAMILIES_TBL_HLD (PARENT_MEMBER_ID,<rest of fields>)
SELECT (SELECT X.MEMBER_ID FROM FAMILIES_TBL X WHERE A.PARENT_MEMBER_ID = X.OLD_MEMBER_ID AND
A.BRANCH = X.BRANCH AND A.FAMILY_ID = X.FAMILY_ID),<rest of fields> FROM FAMILIES_TBL A WHERE PARENT_MEMBER_ID IS NOT NULL;

Still pretty bad. Any suggestions? It certainly feels like a classic pattern, but I'm not 'getting it'.

<table and field names changed to protect the innocent>

Thank you in advance for your suggestions.
Rebecca

Re: 3 million rows to update with a subselect

Carsten, July 20, 2009 - 12:47 pm UTC

Hi Rebecca,

If we are talking about a one-time issue: create an index to support the join operation.

Otherwise:
I guess the fastest way to solve your problem is to not do it at all.
Basically you seem to have an ID that changes (OLD_MEMBER_ID -> MEMBER_ID) and you then update a corresponding FK field.

If your MEMBER_ID changes, you shouldn't use it as the PK; instead generate a surrogate key and use that as the reference in FKs.
Generate a view to display the PARENT_MEMBER_ID the old way, if you need to, which would greatly benefit from the above index as well. That way, instead of solving a repetitive task again and again, you solve it once.
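
For the one-time route, the supporting index for the correlated lookup might look something like this (just a sketch, using the column names from the question; the index name is made up):

create index families_tbl_old_key_idx
    on families_tbl ( old_member_id, branch, family_id, member_id );

With MEMBER_ID as the trailing column, the lookup of the new key can be satisfied entirely from the index, without visiting the table.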

Mazhar

Mazhar, August 17, 2009 - 3:03 am UTC

Great Stuff

Update 40 million rows

A reader, August 24, 2009 - 6:52 am UTC

Hi Tom,

We are working on a warehouse and want to update Type 1 and type 2 records from the delta on a given day (which is 40 million) to a table which has 3.5 billion rows.

Some of the approaches which we could think are:
1) Create a dummy copy of the 3.5 billion row table. Run an update statement against this copy. Build indexes and then do an exchange. This effort takes around 14 hrs total to build 13 indexes along with creating a copy of the 3.5 billion rows.
2) We are thinking of another approach: directly update the dimension on-line. Will that approach work?
3) The delta is sitting in 64 hash sub-partitions and the dimension is also hash partitioned into 64. Can we take advantage of a partition-wise join?
4) Should we try DIY parallelism, as suggested in your book.
5) We are thinking of using Updateable Join view.

We could use some guidance on which way to try this out.

Thanks

Tom Kyte
August 25, 2009 - 9:41 am UTC

since you are only touching 40mill out of 3.5billion - why not just update the data?

Update 40 million

A reader, August 25, 2009 - 11:28 am UTC

Hi Tom,

Thanks for your reply. So as I understand we should update the data blocks on-line.

The reason I was confused was we have 12 indexes on this dimension and if we do update the data blocks we will be doing index maintenance and most of the indexes are BITMAP.

Also can you plesae suggest if we should use:

1) Nested Loops as the Join Method (so there will be 40 million index lookups before it touches the data blocks) and updates them OR we should use HASH JOIN to bring the 40 million in hash and scan the bigger dimension (3.5 billion rows) via ROWID Range Scan.

2) I feel the partition-wise join will scale better since we have 64 hash partitions and each partition of the delta table will join with the corresponding hash partition of the dimension, so the join will happen with 3.5 billion/64 rows and 40 million/64 rows as 64 independent joins.

3) Will an NL partition-wise join work? I was reading the docs but could not find a definitive answer to this.

Please help and guide us.

Tom Kyte
August 25, 2009 - 8:30 pm UTC

... The reason I was confused was we have 12 indexes on this dimension and if we do
update the data blocks we will be doing index maintenance and most of the
indexes are BITMAP.
...

do it as a single statement - in bulk - one sweep.

do NOT do it 40,000,000 times, a slow by slow approach
do it as a single sql


1) write the sql, let the optimizer decide (a sketch of the general shape follows below)

2) see #1

3) see #2
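
Purely as an illustration of the general shape of that single statement - every name below is invented (big_dimension, delta_table and the columns are not your actual schema), and the degree of 16 is arbitrary:

alter session enable parallel dml;

merge /*+ parallel(dim, 16) */ into big_dimension dim
using delta_table d
on (dim.dim_key = d.dim_key)
when matched then
  update set dim.attr1 = d.attr1,
             dim.attr2 = d.attr2;

commit;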

Slow Bulk Collect

A reader, August 27, 2009 - 3:12 pm UTC

Tom:

1. I have a two-partition table with row movement enabled.
partition 2 has unprocessed records, partition 1 has processed records
2. I have to update around 2 mill rows which will make them move from partition 2 to partition 1.
3. I update them with normal update and it takes around 40 minutes.
4. I rewrote a procedure to do BULK UPDATE rather than simple update.
5. It takes around 60 minutes to do BULK update.

The SQL trace shows Bulk update happening:

UPDATE TABLE A
SET COL1 = :B4 ,
COL2 = :B3 ,
COL3 = :B2
WHERE
COL4 = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute   1708   1203.22    2235.05     203830    7981227   99254416    1707253
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     1709   1203.22    2235.05     203830    7981227   99254416    1707253


Is my BULK update - which should in turn do a bulk insert and a bulk delete for the partition row movement - actually happening in bulk? How do I troubleshoot this?

please provide your valuable input to me to troubleshoot this.

Thanks in Advance.
Tom Kyte
August 28, 2009 - 5:02 pm UTC

why would you introduce slow procedural code when you have a fast single sql statement?


there is nothing to troubleshoot, I've written this a billion times by now:

<quote from every book I've ever written and many postings/articles>

I have a pretty simple philosophy when it comes to developing database software, and it is one that has not changed over many years:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL statement, then do it in PL/SQL (but as little PL/SQL as possible!).
* If you cannot do it in PL/SQL (due to some missing feature like listing the files in a directory), try a Java stored procedure. This is an extremely rare need today with Oracle9i and above.
* If you cannot do it in Java, do it in a C external procedure. This is most frequently the approach when raw speed or the use of a third-party API written in C is needed.
* If you cannot do it in a C external routine, you might want to think seriously about why exactly you need to do it.

</quote>


procedural code is going to be slower than a set based operation to do the same thing in almost every case (exceptions? sure, there are strange edge cases). In this case - a simple update is the only thing to consider.


additional information to prev review

A reader, August 27, 2009 - 4:56 pm UTC

Here are the wait events associated:

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 28 (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    203828        0.54       1017.05
  latch: cache buffers chains                    95        0.00          0.02
  read by other session                          27        0.04          0.26
  buffer deadlock                                 1        0.00          0.00
  latch free                                     20        0.00          0.00
  buffer busy waits                              26        0.10          0.21
  log buffer space                              283        0.42         15.33
  log file switch completion                     27        0.08          0.46
  latch: object queue header operation            5        0.00          0.00


So basically a single straight update is taking the same amount of time as the BULK update.

Thanks in advance...

slow bulk update

A reader, August 27, 2009 - 5:58 pm UTC

I have gone through another series of questions which you answered. I think simple straight update SQL performs best.

Thanks,

PL/SQL vs SQL

A reader, August 29, 2009 - 3:36 pm UTC

Hi Tom,

I have a question for your regarding SQL and Performance.
...I have a summary table that maintains counts based on a File ID and Region that's used for reporting. The summary table is a GTT(Global Temp Table).

In this table I have about 250 fields that represent when a field has changed - whether its data is number, char, date and so on - and if it has changed, I update this table.

e.g update tmp_counts
set field1 = field1 + decode(nvl(x(1).field1, 9999),
nvl(x(1).field1, 9999),0,
9999,1,1),
...etc

The example above is to monitor whether a numeric field has changed. If so, increment the counter for that field.

Now...this update to the table is called in a proc with array parameters (the result from a bulk collect using NOCOPY). The procedure is called millions of times to monitor changes.

Now, if I write the above code updating an array (instead of a GTT) procedurally using dbms_sql.execute - i.e. checking each individual field using IF..ENDIF statements - it takes much, much, much longer. I would have thought updating a small summary table millions of times would have been slower than updating an array in memory? Can you explain this behaviour? Is it quicker doing things in "a set" using an update rather than doing stuff procedurally in PL/SQL? Thanks!


Tom Kyte
August 29, 2009 - 7:29 pm UTC

... procedurally using dbms_sql.execute - i.e. checking each individual field using IF..ENDIF
statements - it takes much, much, much longer. ...

you thought that compiling code and dynamically executing it over and over would be faster? why?


... Is it quicker doing things in
"a set" using an update rather than doing stuff procedurally in PL/SQL? Thanks!
...

almost always - search this site for the word


mantra


and read about my mantra

PL/SQL vs SQL

A reader, August 29, 2009 - 9:55 pm UTC

Hi Tom,

I understand what you mean by "mantra" and I have all your books - it's just that "intuitively" you'd think doing something in memory would be quicker than doing I/O (from datafiles to the buffer cache)? ... I process millions of records (bulk collect) from a transaction table - are there possibilities to do the processing in parallel? I think if you want to use the parallel hint/degree you need to put the table in "parallel mode" ...
Tom Kyte
August 31, 2009 - 7:13 am UTC

... it's just that "intuitively" you'd think doing something in memory would be quicker than
doing I/O (from datafiles to the buffer cache)?...

Not if the procedural code you write to modify memory is a billion times less efficient than the code we wrote to modify memory.



As for your last bit - you have a serial bit of code that updates rows slow by slow (sequentially). How would parallel SQL help you - you have a serial process.


If you want something to run orders of magnitude faster - the typical approach is

a) forget the existing code, put it out of your head for a minute
b) write down in GORY detail what it is you have to do
c) design a set based algorithm to achieve that goal (from scratch)


don't buy into 'sunk cost' (the code you have isn't a sunk cost, it is a failed first implementation). Now that you know what you need to do - do it over with SQL set based operations at the core of your thought process.

Insert 50 million rows to cluster table from another table

Hopkins, September 16, 2009 - 4:46 am UTC

Hi Tom

I need to add more than 50 million rows from one table to another. Both are clustered tables.

I use the following command

insert /*+ append parallel 16 */ into second_tab
select * from first_tab;

But it takes more than 10 hours and hangs

Could you please show me the best way to achieve this, and some information about the parallel hint? Should I use a degree higher than 16, or lower?

Thanks
Tom Kyte
September 16, 2009 - 4:05 pm UTC

... But it takes more than 10 hours and hangs ...

what does it "hang" on - I hate that word, very rarely is it every used correctly. define what you mean by hang.


you cannot direct path load a clustered table, the append hint, not useful. In a cluster - data has a LOCATION, a PLACE where it needs to be. For every row going into the cluster, before we put it away, we have to find out where it needs to go.

A direct path load presumes "data is streaming in, we'll format a block, fill a block, write a block". If the data streaming in has to go onto a SPECIFIC block, we cannot do that.

So, the append hint - not meaningful.

The parallel hint - not specified in any documented format (eg: not correct). http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements006.htm#SQLRF50801 Also, you do not say if you tried even to enable parallel dml (pre-req for parallel dml is to enable it).

Also, there is the problem that clustered tables cannot be created or loaded in parallel using parallel dml - for the same reason direct path doesn't work. The data has a location - multiple loaders would all need access to the same exact locations at the same time.

you do not give us anything to go on here.

a) is the source table in a cluster with other tables, if so, then you realize a full scan will "not be fast", as it has to full scan not only the source table but EVERY OTHER table in there.

b) is the targeted table a btree cluster? hash cluster?

c) are the size and keys settings set correctly?

d) is the target table indexed?



what you should do

a) enable tracing (a quick sketch follows this list)
b) insert some of the data
c) review the work performed, see what the waits were, see if any can be 'removed' (any of the waits)
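
A minimal way to do (a) and (b) for such a test - dbms_monitor is one way to turn tracing on for your session (alter session set sql_trace=true works as well); T and STAGE here are just the demo tables used below:

exec dbms_monitor.session_trace_enable( waits => true, binds => false )

insert into t select * from stage where rownum <= 1000000;

exec dbms_monitor.session_trace_disable

Then run tkprof against the resulting trace file and look at the wait events and the row source plan.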

Here is me doing 50,000,000 rows on a rather old piece of hardware. 10hours would be outrageous.

it also demonstrates that append doesn't work (by showing what append would do if it did - cause a subsequent query to fail).

so, no append
no parallel load

I presume you have lots of indexes - that'll contribute to long load times.

You do have to realize that every row that goes into the cluster will either scan a btree cluster index OR run a hash function (you didn't give us cluster details). That entails quite a bit of work. Clusters take longer to load into (by definition) - their goal is to make data retrieval faster, but loads will necessarily take longer.

You might be able to do a bit of parallel query processing if you do "DIY parallel processing"

http://asktom.oracle.com/pls/ask/search?p_string=diy+parallelism



ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table stage;
ops$tkyte%ORA10GR2> create table stage
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select level id, rpad('*',120,'*') data from dual connect by level <= 50000000;
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create cluster btree_cluster ( id number ) size 125
  2  /

Cluster created.

ops$tkyte%ORA10GR2> create index btree_cluster_idx on cluster btree_cluster
  2  /

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
  2  as
  3  select level id, rpad('*',120,'*') data from dual connect by level <= 0;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ append */ into t select 1, 'x' from dual;

1 row created.

ops$tkyte%ORA10GR2> select * from t;
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t;

Table dropped.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
  2  cluster btree_cluster( id )
  3  as
  4  select level id, rpad('*',120,'*') data from dual connect by level <= 0;

Table created.

ops$tkyte%ORA10GR2> select * from t;

        ID
----------
DATA
-------------------------------------------------------------------------------
         1
*******************************************************************************
*****************************************


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> rollback;

Rollback complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set timing on
ops$tkyte%ORA10GR2> insert into t select * from stage;

50000000 rows created.

Elapsed: 02:14:43.63
ops$tkyte%ORA10GR2> set timing off

Updating existing rows with sequence

SAI, February 11, 2010 - 9:42 am UTC

Hi Tom,

I have two tables, CLIENT_DETAIL and CLIENT_STATUS_HIST, with the same table structure:
ID, TimeModified, XXX, Version
Version is a newly added column in both tables.
Here ID is unique in CLIENT_DETAIL, but not in CLIENT_STATUS_HIST.
Whenever CLIENT_DETAIL is updated, the details are stored in CLIENT_STATUS_HIST.
I have millions of records in CLIENT_STATUS_HIST. Now I want to update the version column sequentially.

CREATE OR REPLACE
PROCEDURE CLIENT_STATUS_VERSION_UPDATE AS

counter NUMBER;
v_client_id CLIENT_STATUS_HIST.CLIENT_DETAIL_ID%TYPE;
CURSOR CLIENT_HIST IS
SELECT CLIENT_DETAIL_ID FROM client_status_hist
GROUP BY client_detail_id;

CURSOR CLIENT_HIST_FOR_UDATE (client_id in NUMBER) IS
SELECT * FROM client_status_hist WHERE client_detail_id=v_client_id
ORDER BY LAST_MODIFIED_TMSTMP ASC FOR UPDATE;


BEGIN
OPEN CLIENT_HIST;

LOOP
FETCH CLIENT_HIST INTO v_client_id;
counter:= 0;
FOR v_row IN CLIENT_HIST_FOR_UDATE (v_client_id)

LOOP

UPDATE CLIENT_STATUS_HIST SET VERSION= counter where current OF CLIENT_HIST_FOR_UDATE;
counter:= counter + 1;

EXIT WHEN CLIENT_HIST_FOR_UDATE%NOTFOUND;
END LOOP;
CLOSE CLIENT_HIST_FOR_UDATE;

UPDATE CLIENT_DETAIL SET VERSION= counter where client_detail_id= v_client_id;

EXIT WHEN CLIENT_HIST%NOTFOUND;
END LOOP;
CLOSE CLIENT_HIST;

END CLIENT_STATUS_VERSION_UPDATE;

This Procedure takes nearly 3 hours to complete the task. Is there any other way to do it in less time?
Tom Kyte
February 16, 2010 - 9:51 am UTC

ops$tkyte%ORA10GR2> create table t as select * from all_users;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t add version number;

Table altered.
<b>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> merge into t
  2  using (select rowid rid, row_number() over (order by created asc) rn from t) x
  3  on (t.rowid =  x.rid)
  4  when matched then update set version = x.rn;

46 rows merged.
</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from (
  2  select * from t order by version ) where version <= 5;

USERNAME                          USER_ID CREATED                 VERSION
------------------------------ ---------- -------------------- ----------
SYS                                     0 30-jun-2005 19:10:18          1
SYSTEM                                  5 30-jun-2005 19:10:18          2
OUTLN                                  11 30-jun-2005 19:10:24          3
DIP                                    19 30-jun-2005 19:14:45          4
TSMSYS                                 21 30-jun-2005 19:20:34          5

Updating existing rows with sequence

SAI, February 17, 2010 - 3:46 am UTC

Tom,

Thank you for the follow up.

my table t looks something like this.

USERNAME                          USER_ID CREATED                 VERSION
------------------------------ ---------- -------------------- ----------
SYS                                     0 30-jun-2005 19:10:18          1
SYS                                     0 30-jun-2005 19:11:18          2
SYSTEM                                  5 30-jun-2005 19:10:18          1
OUTLN                                  11 30-jun-2005 19:10:24          1
OUTLN                                  11 30-jun-2005 19:11:24          2
OUTLN                                  11 30-jun-2005 19:13:24          3
DIP                                    19 30-jun-2005 19:14:45          1
TSMSYS                                 21 30-jun-2005 19:20:34          1




Tom Kyte
February 17, 2010 - 9:15 am UTC

just use partition by if you need to break up and reset by username.

ops$tkyte%ORA11GR2> create table t as select * from all_users where username like 'C%' or username like 'S%' or username like 'M%';

Table created.

ops$tkyte%ORA11GR2> insert into t select * from all_users where username like 'C%' or username like 'S%';

9 rows created.

ops$tkyte%ORA11GR2> insert into t select * from all_users where username like 'S%';

8 rows created.

ops$tkyte%ORA11GR2> alter table t add version number;

Table altered.

ops$tkyte%ORA11GR2> merge into t
  2  using (select rowid rid, <b>row_number() over (PARTITION BY USERNAME order by created asc) rn</b> from t) x
  3  on (t.rowid =  x.rid)
  4  when matched then update set version = x.rn;

29 rows merged.

ops$tkyte%ORA11GR2> select * from t order by username, version;

USERNAME                          USER_ID CREATED                VERSION
------------------------------ ---------- ------------------- ----------
CTXSYS                                 43 2009-08-13 23:09:45          1
CTXSYS                                 43 2009-08-13 23:09:45          2
MDDATA                                 65 2009-08-13 23:19:11          1
MDSYS                                  57 2009-08-13 23:12:05          1
MGMT_VIEW                              74 2009-08-13 23:24:58          1
SCOTT                                  84 2009-08-13 23:35:44          1
SCOTT                                  84 2009-08-13 23:35:44          2
SCOTT                                  84 2009-08-13 23:35:44          3
SH                                     88 2009-09-14 15:09:16          1
SH                                     88 2009-09-14 15:09:16          2
SH                                     88 2009-09-14 15:09:16          3
SI_INFORMTN_SCHEMA                     56 2009-08-13 23:12:05          1
SI_INFORMTN_SCHEMA                     56 2009-08-13 23:12:05          2
SI_INFORMTN_SCHEMA                     56 2009-08-13 23:12:05          3
SPATIAL_CSW_ADMIN_USR                  70 2009-08-13 23:22:20          1
SPATIAL_CSW_ADMIN_USR                  70 2009-08-13 23:22:20          2
SPATIAL_CSW_ADMIN_USR                  70 2009-08-13 23:22:20          3
SPATIAL_WFS_ADMIN_USR                  67 2009-08-13 23:22:15          1
SPATIAL_WFS_ADMIN_USR                  67 2009-08-13 23:22:15          2
SPATIAL_WFS_ADMIN_USR                  67 2009-08-13 23:22:15          3
SYS                                     0 2009-08-13 23:00:59          1
SYS                                     0 2009-08-13 23:00:59          2
SYS                                     0 2009-08-13 23:00:59          3
SYSMAN                                 72 2009-08-13 23:22:32          1
SYSMAN                                 72 2009-08-13 23:22:32          2
SYSMAN                                 72 2009-08-13 23:22:32          3
SYSTEM                                  5 2009-08-13 23:00:59          1
SYSTEM                                  5 2009-08-13 23:00:59          2
SYSTEM                                  5 2009-08-13 23:00:59          3

29 rows selected.


Updating existing rows with sequence

SAI, February 18, 2010 - 3:51 am UTC

Thank you tom. Thank you very much.

Update huge table

Andrey, March 03, 2010 - 3:45 pm UTC

Hello!
I've been using the following method for the last 3 years and it works well
(but I'm not working with clustered tables...):
1) Open a cursor for the query
SELECT O.DATA_OBJECT_ID
,E.RELATIVE_FNO AS FILE_ID
,E.BLOCK_ID
,E.BLOCKS
FROM ALL_OBJECTS O
,DBA_EXTENTS E
WHERE O.OWNER = :PR_SCHEMA_NAME
AND O.OBJECT_NAME = :PR_TABLE_NAME
AND O.DATA_OBJECT_ID IS NOT NULL
AND E.OWNER = O.OWNER
AND E.SEGMENT_NAME = O.OBJECT_NAME
AND NVL(E.PARTITION_NAME, E.SEGMENT_NAME) = NVL(O.SUBOBJECT_NAME, O.OBJECT_NAME)
ORDER BY E.FILE_ID
,E.BLOCK_ID
2) Split the segment, via this cursor, into many contiguous regions
of blocks (I use regions of 8,000 to 12,000 blocks) and construct the start and end rowids of each region with dbms_rowid
3) Save the region parameters to a queue
4) Start scheduler processes which read regions from the queue and update only the data between the start_rowid and end_rowid of the region, committing after each one.

P.S. I've updated 525,000,000 records on the production DB in 2:40:15 (24 CPUs, 16 job processes, 24GB of memory + high-performance storage).

Tom Kyte
March 04, 2010 - 8:27 am UTC

we call that "do it yourself parallelism", written about many times.

http://asktom.oracle.com/pls/asktom/f?p=100:11:4101082379852146::::P11_QUESTION_ID:10498431232211


but you have to remember that it completely and totally changes the transaction semantics - you have to understand the ramifications of doing it that way.

11g Release 2 actually formalizes this approach using the new dbms_parallel_execute package.
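
For what it's worth, a minimal sketch of that dbms_parallel_execute approach (the owner SCOTT, table BIG_TAB, the update itself, the chunk size and the parallel level are all made-up placeholders):

begin
    dbms_parallel_execute.create_task( task_name => 'upd_big_tab' );

    -- carve the table into rowid-range chunks of roughly 10,000 rows each
    dbms_parallel_execute.create_chunks_by_rowid(
        task_name   => 'upd_big_tab',
        table_owner => 'SCOTT',
        table_name  => 'BIG_TAB',
        by_row      => true,
        chunk_size  => 10000 );

    -- each chunk gets its own :start_id/:end_id rowid range and its own commit,
    -- so the same change in transaction semantics noted above applies here too
    dbms_parallel_execute.run_task(
        task_name      => 'upd_big_tab',
        sql_stmt       => 'update big_tab set col2 = upper(col2)
                            where rowid between :start_id and :end_id',
        language_flag  => dbms_sql.native,
        parallel_level => 8 );

    dbms_parallel_execute.drop_task( task_name => 'upd_big_tab' );
end;
/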

Merging 3 very huge tables.

Prats, March 11, 2010 - 2:47 am UTC

Tom,

In one of the legacy databases we have inherited, there are 3 existing tables with millions of records. Most of their columns are redundant, and we are now planning to merge their data together into a new table which would contain all the columns from these tables.

For this I will essentially have to perform the following:
Step 1) Insert table_A into new_table.
Step 2) Traverse table_B and update the respective columns in new_table [if the corresponding record has been inserted from table_A].
Otherwise, insert a new record from table_B.
Step 3) Repeat step 2 with table_C to update existing records and insert the extra records from table_C.

Though this is a one-time job, it would be done on production systems. What would be the optimal approach to go about it?
Please guide...
Tom Kyte
March 11, 2010 - 8:31 am UTC

why not insert a join of the three????

RE: Merge 3 huge

Duke Ganote, March 11, 2010 - 3:35 pm UTC

Like this?

create table A 
as ( select level AS key_value, level as attribute_value
     from dual connect by level < 5 )
;
create table B 
as select level-1 AS key_value, level as attribute_value
     from dual connect by level < 5
;
create table C
as select level-2 AS key_value, level as attribute_value
     from dual connect by level < 5
;

create table combined_table as
select coalesce(c.key_value,b.key_value,a.key_value) as key_value
     , coalesce(c.attribute_value,b.attribute_value,a.attribute_value) as key_attribute
  from A
  full 
  join b
    on b.key_value = A.key_value
  full 
  join c
    on c.key_value = coalesce(b.key_value,a.key_value)
;
select key_value as "KEY", key_attribute AS "ATTR"
     , case when key_value = key_attribute then 'A'
            when key_value = key_attribute-1 then 'B'
            when key_value = key_attribute-2 then 'C'
        END as SRC
  from combined_table order by key_value
;
KEY ATTR SRC
 -1    1 "C"
  0    2 "C"
  1    3 "C"
  2    4 "C"
  3    4 "B"
  4    4 "A"

Updating a field with a subquery for a 94 million row table

CT, March 12, 2010 - 11:39 am UTC

Hi Tom,
Hello, how are you? Based on your recommendation I tried updating a 94 million row table T1 as follows,
where T1's structure is
mpi varchar2(25)
tid number (primary key)
ttime date

and structure of x is
oldmpi varchar2(25)
newmpi varchar2(25)
name varchar2(50)

create table t2 nologging as select (select newmpi from x where x.oldmpi = t1.mpi) mpi, tid, ttime from T1;

When table t2 was created, the mpi field did not have any value. In short, the subquery that I used to populate the field does not seem to have worked. The timing for the above was about 40 minutes or so, which was great! I really need help with this issue, as I have not been able to find any method by which to update this huge table. Your help with this is greatly appreciated.
Tom Kyte
March 12, 2010 - 3:56 pm UTC

you would have to give me an example, with data - as far as I can see, it would have worked correctly as typed in.

However, I would not have done it that way, you should use:

select x.newmpi, t1.tid, t1.ttime from t1, x where t1.mpi = x.oldmpi(+);


RE: "the mpi field did not have any value"

Duke Ganote, March 12, 2010 - 12:31 pm UTC

Sounds like basic debugging...what do you get with this query?
select count(*) 
  from x 
  join t1
    on x.oldmpi = t1.mpi

If the count result is zero, then it's unsurprising that "the mpi field did not have any value". If it's non-zero, then it is surprising.

The count was non zero

CT, March 12, 2010 - 1:14 pm UTC

I ran the query as you suggested and the count was non-zero. There are matching records between x.oldmpi and T1.mpi. Is there any reason why the subquery may not have worked - maybe it is returning more than one row? Thanks
Tom Kyte
March 12, 2010 - 4:10 pm UTC

if it returned more than one row, it would fail.

You would need to provide a full example, but again - you want to outer join, NOT use a scalar subquery.

But until we have a FULL EXAMPLE that shows exactly what you did - we cannot comment on how it might have gone wrong for you.

creating a new table after joining two large tables

Somjit, April 12, 2010 - 7:29 am UTC

Hi Tom,

I read the solutions you gave in this thread. I have two tables:

A => 1200Million records
B => 70Million records


The tables are not partitioned and we have a plan in place to implement that now. The issue is that we are creating a new table which would contain data starting from 01.Jan.2010 only, and for this purpose we want to create a new table A_1 which has the same columns as A but with an additional column, eventdate:

This is how I proceeded:

create table a_1 as select * from a where 1=2;

alter table a_1 add (eventdate date);

alter table a_1 nologging;

INSERT /*+ append parallel (C,12) */
into a_1 C
SELECT /*+ parallel (A,12) */
A.TRANS_ID, ..<all columns from table>..,
To_Timestamp(to_char(B.EVENTDATE,'dd.mm.yyyy')||' '||B.Eventtime,'dd.mm.yyyy hh24:mi:ss.FF')
from A, B
where A.trans_Id = B.Trans_Id
and B.EventDate >= to_date('2010-01-01','YYYY-MM-DD')


The columns used in the predicates are indexed. B.Trans_Id is unique and A.Trans_Id has a normal (non-unique) index, as it has duplicate values. The concern is that this join is doing a heavy sort operation, because of which the TEMP tablespace is getting filled up. How do I get this working as efficiently as possible?
Tom Kyte
April 13, 2010 - 9:17 am UTC

are you sure temp is getting filled - or is it saying "unable to extend temporary segment in tablespace XXX" where xxx is the target tablespace for this table?

This won't be sorting, this'll be HASHING (I would hope). I would expect B to be full scanned first, the event date predicate evaluated and the rows that are > 2010-01-01 to be hashed into memory and spilling on to disk if necessary. Then the join to A.


If the hash table is spilling to disk you can increase your PGA settings - or, if you are using the old manual memory management, the hash area size.
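
For example (a sketch only - the values below are purely illustrative, not recommendations, and the alter system needs the appropriate privileges):

-- automatic PGA memory management: check and, if needed, raise the target
show parameter pga_aggregate_target
alter system set pga_aggregate_target = 4G scope=both;

-- or, under the old manual memory management, per session:
alter session set workarea_size_policy = manual;
alter session set hash_area_size = 524288000;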


New method

Michel, April 19, 2010 - 11:22 pm UTC

Hi Tom,

This link helped me, but I'm still confused.

A - 170 million rows
B - 80 million rows

This is roughly what I am doing:

merge /*+ parallel(TableA,4) */ into TableA
using TableB K
on(TableA.Col1=K.col1)
when matched then
Update set
TableA.col2=K.col2,
TableA.col3=K.col3,
TableA.col4='p';

The explain plan shows this taking a long time - how can I write it using a nologging method?
Tom Kyte
April 20, 2010 - 8:13 am UTC

why do you think nologging would make it go faster - it probably would not - the redo is written out in the background, you are not really 'waiting' for it.

You cannot merge "nologging", the insert part of the merge can be done nologging (use the append hint), but NOT the update part, NOT the index maintenance part.

the update part will always be done logging
the index maintenance part will always be done logging

rename

Michel, April 20, 2010 - 12:44 pm UTC

Tom,

My question was how to satisfy the same requirement without using merge,
by using the table rename method.
Tom Kyte
April 20, 2010 - 12:56 pm UTC

wow, that is about as far away from what I thought your question was as possible. I sure didn't see that from the above...


To do the "create a new table, and rename" trick, you would have to full outer join OLD with NEW (you want to keep all OLD records even if not in NEW, you want all NEW records even if not in OLD)

It'll be like this:


create table new_table
as
select coalesce( old.key, new.key ) key,
       ... repeat this for every column except join key ...
       case when new.key is null then old.column else new.column end, 
       ... above repeated .....
  from old full outer join new on (old.key=new.key);



if old.key is null - record is in new but not old.
if new.key is null - record is in old but not new.
if neither is null - record is in both.

coalesce will return the old key if it is not null, otherwise it returns the new key (which must not be null in that case)

the case will return the old column if the new key is null - else return the new column (if the new.key is not null - then the new.column overrides anything in the old column)
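
To make that concrete, a hypothetical version of the template where OLD and NEW each have columns key, c1 and c2 would be:

create table new_table
as
select coalesce( old.key, new.key ) key,
       case when new.key is null then old.c1 else new.c1 end c1,
       case when new.key is null then old.c2 else new.c2 end c2
  from old full outer join new on (old.key = new.key);

-- then index new_table, add grants/constraints, drop OLD and rename new_table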




creating a new table after joining two large tables contd

Somjit, April 21, 2010 - 7:23 am UTC

Hello Tom,

This is a continuation of my earlier post.

I tried to run an explain plan on my query

SQL> explain plan for
  2  INSERT /*+ append parallel (C, 8) */
  3  into SEALSNET_STATPER_INST_P C
  4  SELECT /*+ parallel (A,8) parallel (b,8) */
  5  A.TRANS_ID, A.INST_ID, A.INST_LOGGERID, A.INST_CREATIONDATE, A.INST_CREATIONTIME, A.INST_COMPCLASS,
  6    A.INST_COMPTYPE, A.INST_COMPNAME, A.INST_TAG, A.INST_UOM, A.INST_VALUE, A.INST_PLATFORM, A.INST_PRIORITY,
  7  To_Timestamp(to_char(B.EVENTDATE,'dd.mm.yyyy')||' '||B.Eventtime,'dd.mm.yyyy hh24:mi:ss.FF')
  8  from SEALSNET_STATPER_INST_BCK A, sealsnet_statper B
  9  where A.trans_Id = B.Trans_Id
 10  and B.EventDate >= to_date('2010-01-01','YYYY-MM-DD');

Explained.

Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2340029746

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                           |   186M|    39G|   239K  (1)| 00:48:00 |        |      |            |
|   1 |  LOAD AS SELECT          | SEALSNET_STATPER_INST_P   |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR         |                           |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)   | :TQ10002                  |   186M|    39G|   239K  (1)| 00:48:00 |  Q1,02 | P->S | QC (RAND)  |
|*  4 |     HASH JOIN BUFFERED   |                           |   186M|    39G|   239K  (1)| 00:48:00 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE          |                           |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH       | :TQ10000                  |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR |                           |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL| SEALSNET_STATPER          |    29M|  1327M| 58068   (1)| 00:11:37 |  Q1,00 | PCWP |            |
|   9 |      PX RECEIVE          |                           |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,02 | PCWP |            |
|  10 |       PX SEND HASH       | :TQ10001                  |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,01 | P->P | HASH       |
|  11 |        PX BLOCK ITERATOR |                           |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,01 | PCWC |            |
|  12 |         TABLE ACCESS FULL| SEALSNET_STATPER_INST_BCK |   289M|    48G|   181K  (1)| 00:36:18 |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."TRANS_ID"="B"."TRANS_ID")
   8 - filter("B"."EVENTDATE">=TO_DATE('2010-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

25 rows selected.


Is there some way I can reduce the huge space required by the "HASH JOIN BUFFERED" operation? 39G is required.
Tom Kyte
April 21, 2010 - 9:03 am UTC

you could use indexes and nested loops - but then you would be asking me how to make this run in less than 100 days.



40gb is pretty small these days... given the volume of data we work with.

how to audit ddl command from a user in oracle 9i

Soumya Samanta, April 22, 2010 - 10:46 am UTC

Hi, I am Soumya Samanta. I want to audit DDL commands issued by a user in an Oracle 9i environment.
How do I do that?

Tom Kyte
April 22, 2010 - 12:53 pm UTC

the AUDIT command would work.
http://otn.oracle.com/pls/db92/search?remark=&word=audit&book=&preference=

you could write your own using a system event trigger as well.

http://otn.oracle.com/pls/db92/search?word=system+event+trigger
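
For example (a minimal sketch only - the user name SCOTT and the log table are made up; the AUDIT route needs the audit_trail init parameter enabled, and the trigger route needs the ADMINISTER DATABASE TRIGGER privilege):

-- option 1: built-in auditing (records land in DBA_AUDIT_TRAIL)
audit table, index, procedure by scott;

-- option 2: roll your own with a DDL system event trigger
create table ddl_log
( who        varchar2(30),
  operation  varchar2(30),
  obj_owner  varchar2(30),
  obj_name   varchar2(128),
  ts         date );

create or replace trigger capture_ddl
after ddl on database
begin
  if ora_login_user = 'SCOTT'
  then
     insert into ddl_log
     values ( ora_login_user, ora_sysevent,
              ora_dict_obj_owner, ora_dict_obj_name, sysdate );
  end if;
end;
/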

Slow update

Philippe Gaudin, April 29, 2010 - 6:16 am UTC

Hi Tom,

I have a query that has to update 23,000,000 rows in a table. Here are the query and the plan.

SQL> EXPLAIN PLAN
  SET STATEMENT_ID = 'PGA_TST1'
  FOR
UPDATE (SELECT current_pol_cov_trans.pct_cov_subscription_value AS cur_cov_subscription_value,
               current_pol_cov_trans.pct_cov_in_force_value AS cur_cov_in_force_value,
               current_pol_cov_trans.pct_cov_gross_amount AS cur_cov_gross_amount,
               current_pol_cov_trans.pct_cov_net_amount AS cur_cov_net_amount,
               current_pol_cov_trans.pct_cov_tariff_amount AS cur_cov_tariff_amount,
               previous_pol_cov_trans.pct_cov_subscription_value AS prev_cov_subscription_value,
               previous_pol_cov_trans.pct_cov_in_force_value AS prev_cov_in_force_value,
               previous_pol_cov_trans.pct_cov_gross_amount AS prev_cov_gross_amount,
               previous_pol_cov_trans.pct_cov_net_amount AS prev_cov_net_amount,
               previous_pol_cov_trans.pct_cov_tariff_amount AS prev_cov_tariff_amount,
               current_pol_cov_trans.pct_subscription_counter,
               current_pol_cov_trans.pct_in_force_counter,
               current_pol_cov_trans.pct_cov_delta_net_amount,
               current_pol_cov_trans.pct_cov_delta_tariff_amount
          FROM fact_policy_cov_trans current_pol_cov_trans
               JOIN fact_policy_cov_trans previous_pol_cov_trans
                 ON (previous_pol_cov_trans.pct_policy_version_skey = current_pol_cov_trans.pct_prev_version_skey
                     AND previous_pol_cov_trans.pct_coverage_skey = current_pol_cov_trans.pct_coverage_skey))
   SET pct_subscription_counter     =
         (CASE
            WHEN cur_cov_subscription_value = 'N' AND prev_cov_subscription_value <> 'N' THEN -1
            WHEN cur_cov_subscription_value <> 'N' AND prev_cov_subscription_value = 'N' THEN +1
            ELSE 0
          END),
       pct_in_force_counter         =
         (CASE
            WHEN cur_cov_in_force_value = 'N' AND prev_cov_in_force_value <> 'N' THEN -1
            WHEN cur_cov_in_force_value <> 'N' AND prev_cov_in_force_value = 'N' THEN +1
            ELSE 0
          END),
       pct_cov_delta_net_amount     = cur_cov_net_amount - prev_cov_net_amount,
       pct_cov_delta_tariff_amount  = cur_cov_tariff_amount - prev_cov_tariff_amount
Explain complete.
SQL> SELECT * FROM table (DBMS_XPLAN.display (NULL, 'PGA_TST1', 'BASIC PARALLEL ROWS'))

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3795694554

-----------------------------------------------------------------------------------------------
| Id  | Operation                | Name                  | Rows  |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT         |                       |    14M|        |      |            |
|   1 |  UPDATE                  | FACT_POLICY_COV_TRANS |       |        |      |            |
|   2 |   PX COORDINATOR         |                       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)   | :TQ10002              |    14M|  Q1,02 | P->S | QC (RAND)  |
|   4 |     HASH JOIN BUFFERED   |                       |    14M|  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE          |                       |    22M|  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH       | :TQ10000              |    22M|  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR |                       |    22M|  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| FACT_POLICY_COV_TRANS |    22M|  Q1,00 | PCWP |            |
|   9 |      PX RECEIVE          |                       |    17M|  Q1,02 | PCWP |            |
|  10 |       PX SEND HASH       | :TQ10001              |    17M|  Q1,01 | P->P | HASH       |
|  11 |        PX BLOCK ITERATOR |                       |    17M|  Q1,01 | PCWC |            |
|  12 |         TABLE ACCESS FULL| FACT_POLICY_COV_TRANS |    17M|  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------

19 rows selected.


The plan seems OK to me: hash join with full table scans (as most of the rows of the table have to be read).

My problem is that it is slow and it generates a lot of "db file sequential read" waits. I guess that these waits are for the blocks that have to be updated...

What could I do to solve this problem?

Thanks in advance,

Philippe
Tom Kyte
April 29, 2010 - 7:48 am UTC

what is "slow"

I would start there - in order to see if I thought it was slow or not first.

23,000,000 is not totally useful, it doesn't tell us if there is 23mb, 23gb or 23tb to update.

Philippe Gaudin, April 29, 2010 - 7:55 am UTC

The fact_policy_cov_trans table is 4.5 GB.

The execution of the query took 45 minutes... and in the past, with approximately the same data, the same plan, it took 6 minutes.

Tom Kyte
April 29, 2010 - 9:05 am UTC

the db file sequential reads are from disk IO to get the blocks from disk into memory during a full scan operation.

These are waits for the blocks that identify the rows you want to update and hold those same rows you want to update.


If these waits are "new" (you never saw them before) then the 'problem' is that IO has suddenly slowed down - a table of this size would be mostly on disk and the IO was always being done, so the conclusion would be likely that "the IO is taking longer than it did before"


what has changed? pga_aggregate_target (are you using that)? new application that also modifies this table (even if just ONE record) running at the same time? do you have performance metrics from before (using ASH or AWR)? Anything to go on? New index that never existed before? More users logged in than previously (just logged in, don't even have to be active)?

I'd be looking for what changed - because we know something must have..

Philippe Gaudin, April 29, 2010 - 10:02 am UTC

Thanks Tom for your answer.

The UPDATE is part of a data warehouse FIRST load.

I was thinking that FULL TABLE SCAN waits were db file scattered reads... and not sequential reads.

I will investigate the memory (PGA_AGGREGATE_TARGET,...) parameters.
Tom Kyte
April 29, 2010 - 10:26 am UTC

sounds like index maintenance - did you recently have one added to the columns you are modifying?

Philippe Gaudin, April 30, 2010 - 1:48 am UTC

No index on the updated columns.

The db file sequential reads are done by the parallel slaves. That's why I was thinking that Oracle had to get each block that needs to be updated after the hash join.
Tom Kyte
April 30, 2010 - 7:05 am UTC

that is likely true as well, the hash join would employ multiblock read (probably direct path read), the result set is built and then lots of index like access to the blocks as we discover we need to update them.

You might try a merge and/or PDML (parallel dml)

the merge would be a bulky process, the PDML would let the p->s you have possibly stay parallel.


merge into fact_policy_cov_trans Tgt
using
(SELECT current_pol_cov_trans.rowid rid,
               current_pol_cov_trans.pct_cov_subscription_value AS cur_cov_subscription_value,
               current_pol_cov_trans.pct_cov_in_force_value AS cur_cov_in_force_value,
               current_pol_cov_trans.pct_cov_gross_amount AS cur_cov_gross_amount,
               current_pol_cov_trans.pct_cov_net_amount AS cur_cov_net_amount,
               current_pol_cov_trans.pct_cov_tariff_amount AS cur_cov_tariff_amount,
               previous_pol_cov_trans.pct_cov_subscription_value AS prev_cov_subscription_value,
               previous_pol_cov_trans.pct_cov_in_force_value AS prev_cov_in_force_value,
               previous_pol_cov_trans.pct_cov_gross_amount AS prev_cov_gross_amount,
               previous_pol_cov_trans.pct_cov_net_amount AS prev_cov_net_amount,
               previous_pol_cov_trans.pct_cov_tariff_amount AS prev_cov_tariff_amount,
               current_pol_cov_trans.pct_subscription_counter,
               current_pol_cov_trans.pct_in_force_counter,
               current_pol_cov_trans.pct_cov_delta_net_amount,
               current_pol_cov_trans.pct_cov_delta_tariff_amount
          FROM fact_policy_cov_trans current_pol_cov_trans
               JOIN fact_policy_cov_trans previous_pol_cov_trans
                 ON (previous_pol_cov_trans.pct_policy_version_skey =
current_pol_cov_trans.pct_prev_version_skey
                     AND previous_pol_cov_trans.pct_coverage_skey =
current_pol_cov_trans.pct_coverage_skey)) src
on (src.rid = tgt.rowid)
when matched then
update
   SET pct_subscription_counter     =
         (CASE
            WHEN cur_cov_subscription_value = 'N' AND prev_cov_subscription_value <> 'N' THEN -1
            WHEN cur_cov_subscription_value <> 'N' AND prev_cov_subscription_value = 'N' THEN +1
            ELSE 0
          END),
       pct_in_force_counter         =
         (CASE
            WHEN cur_cov_in_force_value = 'N' AND prev_cov_in_force_value <> 'N' THEN -1
            WHEN cur_cov_in_force_value <> 'N' AND prev_cov_in_force_value = 'N' THEN +1
            ELSE 0
          END),
       pct_cov_delta_net_amount     = cur_cov_net_amount - prev_cov_net_amount,
       pct_cov_delta_tariff_amount  = cur_cov_tariff_amount - prev_cov_tariff_amount
/


it would tend to build the src result and bulk merge into the tgt - might be more "io" friendly

disabling table lock

Soumya Samanta, May 13, 2010 - 9:44 am UTC

Hi Tom,
Please suggest a solution for the problem mentioned below:
I have issued a command to prevent DDL, 'Alter table table_name disable table lock', but that creates a problem when deleting a record from a particular table which has a foreign key to another table and also has two indexes. What is the solution, given that I have to prevent the DDL?

Update 150 Million rows

Jeff, August 31, 2010 - 9:38 am UTC

Hi Tom,
I have a situation where I need to update around 45 tables, and each table has around 150M rows.

Need to update only one field from 500 to 510 in all tables.

Here is what I have done.

--set parallel to 8
--drop indexes
--increased memory parameters
update table set field = '510';
commit;

Right now each table takes 1 hour on average.

Is there any way I can make it run faster?

Really appreciate your help,
Thanks

Tom Kyte
September 09, 2010 - 11:47 am UTC

why drop indexes? do not do that.

you would not have an index on field - since it contains just one value (the only index you would drop would be an index on field - I presume you don't have any).


it seems "silly" to me to update all to '510', what version are you on?

Update 150 Million rows

A reader, September 13, 2010 - 1:33 pm UTC

Thanks Tom for the reply.

Yes, the field is part of pretty much all the indexes. I ended up dropping the unique indexes and marking the rest unusable.

It's an SAP application and this field is a client number.

Version of the database is 1024 on SuSe 10.

Thanks in advance.
Jeff
Tom Kyte
September 13, 2010 - 2:54 pm UTC

SAP - means we cannot do much since SAP is a 3rd party app.


In 11g, we could have dropped (set unused) the column and re-added it with a default value. In 11g, newly added NOT NULL columns with default values don't have to update the table.

You are already doing it in the most effective way possible I guess - short of using more parallel operations if you have the resources.
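
Once on 11g, that would be along the lines of the following sketch (the table and column names and the datatype are made up; note the fast, metadata-only add requires the column to be NOT NULL with a default in 11g):

alter table big_sap_tab set unused column client_no;

alter table big_sap_tab add client_no varchar2(3) default '510' not null;
-- in 11g this add is a dictionary-only change; the existing rows are not rewritten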

Update 150 Million rows

Jeff, September 13, 2010 - 3:10 pm UTC

Thanks Tom.

That's a very good idea. We will be upgrading to 11g soon. This process is a client copy process where production data is copied over to the other system for testing purposes every now and then.

Thanks again.
Jeff

Update from sequence

Dan, September 23, 2010 - 11:15 pm UTC

We have about 40 million rows in a table big_book(chapter number, lineno number, line_text varchar2(80)).
We currently have lineno 1..n for each chapter.
We need to replace lineno with values from a sequence ordered by (chapter, lineno) to result in a long running lineno 1 through 4 million.
Can this be done with a single sql statement?
If not, are bulk binds with forall the fastest alternative?
Tom Kyte
September 27, 2010 - 10:11 am UTC

merge into T
using (select rowid rid, row_number() over (order by ......) rn from t) x
on (t.rowid = x.rid)
when matched then update set lineno = x.rn;


and then create your sequence (for future lines) starting with the max value plus one.

And realize that of course sequences are not gap free and never ever will be - so all you can be sure of in the future is that the lineno will be unique - but NOT necessarily sequential. And unless you lock the table before generating the sequence, you cannot be sure it will be "ordered" by your columns anymore either.
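
In other words, something like this after the merge (the start-with value shown is illustrative only; plug in your real max(lineno)+1):

select max(lineno) + 1 from big_book;   -- use this result as the START WITH value

create sequence big_book_seq
  start with 40000001                   -- illustrative value only
  cache 1000;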

Update from sequence

Dan, September 23, 2010 - 11:27 pm UTC

I may have missed a detail in the previous post.
The sequence is global - used by other books. So there may be holes in the lineno generated for a given book - and that is OK.
(The book is just an analogy to our actual schema).

Update 200M Records

Rajeshwaran, Jeyabal, November 16, 2010 - 11:34 am UTC

Hi Tom:

I have a table with nearly 200 million records (41 columns), hash partitioned into 8 partitions, with 21 indexes (3 indexes are globally partitioned, the other 18 are not partitioned). I need to update the column C3 (presently 2 non-unique composite indexes are built on this column) with some padded values, so I used the scenario below.

Create Table T2
(
 c1 ,
 c2 ,
 c3
 ...
 c41
)
nologging
parallel 4 tablespace tools 
as
select  c1,
  c2,
  case when length(c3) = 7 and replace(translate(c3,'0123456789','@'),'@') is null 
   then lpad( substr(c3,1,2),4,'0') || lpad( substr(c3,3,5),7,'0') 
   else c3 
  end as c3,
  .....
  c41
from T partition (p1);


alter table T exchange partition p1 with table T2 without validation;
drop table T2 purge;

(Similarly, repeat for the other partitions in table T)

But this will put the non-local indexes into an UNUSABLE state, hence I need to rebuild all 21 indexes with the NOLOGGING and PARALLEL options.

Is there any other option better than this?
Tom Kyte
November 17, 2010 - 5:40 am UTC

In this case, you might consider

a) marking the two indexes on c3 unusable
b) updating (using parallel dml perhaps) c3
c) rebuilding the two indexes

that seems like it would be less work. Hopefully, the update would not cause too many row migrations, as you are only adding 4 bytes to each row - if there is sufficient free space on the block, most rows should stay where they are.
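
A sketch of that a/b/c sequence, with made-up index names (it relies on skip_unusable_indexes being TRUE, the default in 10g and later, so the update can proceed while the two indexes are unusable):

alter index t_c3_idx1 unusable;
alter index t_c3_idx2 unusable;

alter session enable parallel dml;

update /*+ parallel(t,4) */ t
   set c3 = lpad(substr(c3,1,2),4,'0') || lpad(substr(c3,3,5),7,'0')
 where length(c3) = 7
   and replace(translate(c3,'0123456789','@'),'@') is null;
commit;

alter index t_c3_idx1 rebuild parallel 4 nologging;
alter index t_c3_idx2 rebuild parallel 4 nologging;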


adaptive direct path updates

Amir Riaz, November 17, 2010 - 1:09 pm UTC

Hi Tom,

http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html

In Oracle 11g, Oracle uses direct path reads for serial full table scans. Oracle uses adaptive direct path reads when the full table scan size is more than 2% of the cache size.

If that is possible with reads, then adaptive direct path should also kick in when doing updates or deletes.

If update and delete are also made adaptive, it will be really cool.

I tried to produce a test but could not reproduce both cases. Please help, especially in the update case, if you think that sort of modification has been done in Oracle 11g.
Tom Kyte
November 18, 2010 - 3:20 am UTC

think about it - a direct path read bypasses the buffer cache. If the block isn't in the cache and someone else attempts to go after it - how will they know you have it in current mode for your update?

when you modify a block, you get it in current mode. Only one thing at a time can have it in current mode. If you did the work in your PGA (direct reads go there), how would anyone KNOW you are doing that. It needs to be in the buffer cache so we can put it in current mode (preventing others from doing that at the same time) - modify it - release it from current mode and have it marked "dirty".

If update and delete did it - it wouldn't be "cool".


Just like a direct path write causes the entire segment to be locked, so would this so called direct path read with a modification. Not so cool :)

got your point.

Amir Riaz, November 18, 2010 - 6:36 am UTC

Hi Tom,

Thanks for the reply; yes, I got your point. Adaptive direct path reads are possible because Oracle does not lock for reading. For a serial update we need the block in current mode. With a direct path update (if it were implemented) Oracle would have to lock the whole segment, which affects concurrency.

I think direct path updates are possible with Oracle parallel updates in data warehouse environments. I tried to find out how this works after what you told me above, but could not find much about this topic. However, here is my theory about parallel updates in Oracle:

"Oracle uses parallel updates if parallel DML mode is enabled and a hint exists in the update statement. Oracle can also use a parallel update statement if the tables are enabled for parallel DML. Under these conditions, if the size of the table is more than 2% of the database cache size, Oracle will go for a parallel update using direct path. Oracle will lock the whole segment, perform the update in the PGA and, after updating the block in the PGA, write it back to the database files in the same place it was read from."

If the above is true, then:
1. How does Oracle populate the rollback segments?
2. If I have a big cache and the table size is less than 2% of the database cache size, but the optimizer plan finds that a serial update is still costly, will Oracle consider an in-memory parallel update? (I am trying to understand under which conditions in-memory parallel processing will be performed.)
3. Multiversioning does not come into play here, because the whole segment is locked and the environment is a data warehouse. I hope I am right here.
4. Does Oracle generate redo with parallel updates?
Tom Kyte
November 19, 2010 - 7:26 am UTC

1) no differently than at any other time
2) yes
3) yes it does, it always does. why wouldn't it?
4) yes

direct path reads and updates.

A read, November 19, 2010 - 1:53 am UTC

I don't see how direct path reads or parallel direct path updates are of much use; if we don't cache data in the buffer cache, we have to do physical I/O for every query the database executes. Why did Oracle introduce direct path reads and updates?
Tom Kyte
November 19, 2010 - 9:36 am UTC

say we didn't do direct io



say you need to full scan a table of 1,000,000 blocks.

it cannot fit into memory - no chance all 1,000,000 blocks will be there.

so, we start full scanning - say we'll read 64 blocks at a time (db file multi-block read count)

now before we can read blocks 1-64, we have to check in the cache to see if any of them are there. that is 64 LOOKUPS - ONE AFTER THE OTHER. Say we found 2 blocks - block 20 and block 34.

Now we need to read blocks 1..19 into the cache, then blocks 21..33 into the cache, then blocks 35..64 into the cache.

No more 64 block IO's :(

we did 64 buffer cache lookups (for two entire blocks :( )

we put 62 blocks in the cache - probably flushing out some blocks in the meanwhile - probably blocks from the table we're reading - causing us to find even fewer blocks over time (but we still have to look!) :(

the cache in short GETS IN THE WAY. Makes the IO we have to do less efficient, adds 1,000,000 lookups that probably won't find anything, and making us move the blocks into the cache (1,000,000 times) for NOTHING (because they won't be there later when I go to look again)



a cache is not always a good thing.

oracle 11g some more optimizations

A reader, November 21, 2010 - 1:40 pm UTC

Hi Tom,

http://jonathanlewis.wordpress.com/2009/01/16/concurrency/

Check this out
consistent gets from cache (fastpath) 2,301

A consultant is insisting that it's a direct path read. While I agree with Jonathan, I think Oracle copies the database blocks to the PGA in bulk. But I don't know the conditions under which the fastpath takes place. Would you shed some light on the mechanism and the conditions under which this phenomenon occurs?


Tom Kyte
November 23, 2010 - 12:30 pm UTC

You cannot get a *concurrent* and *consistent* read without the cache.

It is not a direct read. It is a consistent get from the buffer cache that can use a shorter code path for repeated reads of the same block in a fetch call.

tell the consultant that if it were a direct read, it would be using physical IO, that is part of the definition of a direct path read.

some interesting update behaviours.

AMIR RIAZ, November 22, 2010 - 12:05 am UTC

Hi Tom,

1. How will the following statement execute?

UPDATE
( SELECT src.x src_x, src.y src_y
, tgt.x tgt_x, tgt.y tgt_y
FROM src INNER JOIN tgt
ON ( src.id = tgt.id )
)
SET tgt_x = src_x
, tgt_y = src_y
/

2 rows updated

src and tgt are big tables. If both do serial direct path reads in the select query and the update touches 2 rows, how will the update execute here? I think the select will bypass the cache and, while the blocks are in the PGA, they are updated and written back directly as modified blocks. Both src and tgt will lock the whole segment during the select (no problem, since Oracle locks the read in shared mode during a direct path read). When updating, Oracle will transform the shared read lock into an exclusive lock on these rows and a shared lock on the table, and perform the update, so concurrency suffers. But it is just a thought; I think we need Tom's opinion on this interesting topic.
Tom Kyte
November 23, 2010 - 12:39 pm UTC

they need to do consistent gets to identify the rows to update and then current mode gets to update. No direct path reads here - you need the buffer cache (else this update would have to lock both tables - no concurrency).

src and tgt are NOT both locked by an update - even if the update updates every row - someone else could still insert into the table.


AMIR RIAZ, November 22, 2010 - 12:48 am UTC

Oops! Typing error above.

"so concurrency suffers" should read "so concurrency does not suffer".

runtime update

A reader, December 09, 2010 - 5:40 am UTC

I need to update a table (say t) based on entries in another table (say u) by matching
the key columns.
My update would be something like the one below:
update t set (column_a,
column_b,
....
.....
column_n) = (
select
nvl(updt_a, column_a),
nvl(updt_b, column_b),
...
nvl(updt_n, column_n)
from u
where t.k1 = u.k1 -- primary key column
)
where exists (select null
from u
where t.k1 = u.k1)

Is the above advisable for large tables? Any alternatives?


The above SQL would be run dynamically in a procedure; the names of the table and of the corresponding update table
will be available at runtime.
Tom Kyte
December 09, 2010 - 11:02 am UTC


merge into t using u on (t.key = u.key) when matched then update set a=u.a, b=u.b, .... 


just merge it in bulk.
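
If you need the NVL behaviour from your pseudo-code (only overwrite a column when the staging value is not null), the merge can carry that too - a sketch using the hypothetical column names from your post:

merge into t
using u
   on (t.k1 = u.k1)
 when matched then update
  set column_a = nvl(u.updt_a, t.column_a),
      column_b = nvl(u.updt_b, t.column_b),
      column_n = nvl(u.updt_n, t.column_n);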

80 gig table

Arunraj, December 13, 2010 - 5:41 am UTC

Hi Tom,
The table table1_back is 80 GB and I have tried the following:

SQL> alter table table1_back nologging;

Table altered.

Elapsed: 00:00:00.01
SQL> alter table table1_back parallel 8;

Table altered.

Elapsed: 00:00:00.00
SQL> truncate table table1;

Table truncated.

Elapsed: 00:00:02.36
SQL> alter table table1 nologging;

Table altered.

Elapsed: 00:00:00.01
SQL> alter table table1 parallel 8;

Table altered.

Elapsed: 00:00:00.00


SQL> insert /*+APPEND parallel(b,8) */ into table1 b select /*+ parallel(a,8) */ * from table1_back a;


It is taking an eternity to complete. Can you suggest any other method for performing this?
I basically want to copy the data from the backup table table1_back to the table table1.

Thanks,
Arunraj

Tom Kyte
December 14, 2010 - 2:40 am UTC

what kind of machine are you using.

define "eternity"

do you need a copy or would you just like to replace table1 with table1_back - the fastest way to do that would be:

drop table table1;
rename table1_back to table1;


80 gig table

Arunraj, December 14, 2010 - 4:42 am UTC

Dropping the table isn't an option because I would lose the backup table then.
I ran that query for 17 hours and in the end it gave me an "unable to extend segment 8" error.
I have checked with our DBAs and they said that they can't extend the tablespace any more.
So I guess the error which I am facing is this space error, and I need to find a way to solve it.
I found a method of committing after every x inserts. Will that serve the purpose?
Or do you suggest any other solutions?

Thanks,
Arunraj

Tom Kyte
December 14, 2010 - 7:22 am UTC

the unable to extend means the tablespace you were trying to copy into is out of space - you don't have free space in your target tablespace.

if your DBA's said "no more space for you" then you are not going to get very far, think about it.


committing isn't going to make it take less space. It could make it take more.

You could try using compression (create table t compress as select * from t_backup)


'define "eternity"'

Sokrates, December 14, 2010 - 5:55 am UTC

select NUMTOYMINTERVAL(BINARY_DOUBLE_INFINITY,'year') from dual
?
Tom Kyte
December 14, 2010 - 7:28 am UTC

hah, funny :)

update compress table

A reader, February 06, 2011 - 4:39 pm UTC

I have a compressed table with 9,000,000 rows.

An update (merge, parallel 8 DML) of 800,000 rows of the compressed table takes 20 minutes, generating about 450,000 undo records per session,
but an update (merge, parallel 8 DML) of 800,000 rows of the uncompressed table takes 2 minutes, generating about 100,000 undo records per session.

Same plan and same data!

The only thing that is different is compressed vs. uncompressed.


Tom Kyte
February 07, 2011 - 6:23 am UTC

you do not mention the version or any other relevant conditions, but I'll assume 10g or before.

An update cannot be done in direct path. Therefore, your update of all 800,000 rows will involve decompressing them, allocating new space for them, and moving them to this new space - whereas when you do the uncompressed update, it just updates them and probably doesn't have to move them.

Saying the "only different thing is compress or uncompress" is like saying "the only difference between the racers was that one was on a bike and the other in a car". It is a pretty huge difference.

In 10g and before, basic compression was for direct path loaded data only and is only appropriate for read only/read mostly data. While it supports updates - it is certainly not recommended. It would be appropriate in a warehouse where you load in direct path mode a ton of data - and then query it. It would be appropriate in an OLTP system under some circumstances like an audit trail - where you can take last months audit trail partition and alter move it - to compress it. So your audit trail would have old months partitions compressed (never to be modified of course) and the current month uncompressed.

In 11g and above, with the advanced compression option - you can preserve the compression of data even while performing 'normal' inserts, updates and so on.
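
For illustration (all names are made up; the first form is the 10g "compress the closed-out partition" idea, the second is 11g OLTP compression, which needs the separately licensed Advanced Compression option):

-- 10g and before: compress last month's partition once it stops changing
alter table audit_trail move partition p_2011_01 compress;
alter index audit_trail_ts_idx rebuild partition p_2011_01;

-- 11g with Advanced Compression: data stays compressed through normal DML
create table audit_trail_new
( audit_id  number,
  audit_ts  date,
  audit_msg varchar2(4000) )
compress for oltp;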

A reader, February 07, 2011 - 4:41 pm UTC

thanks.

your assumption is right. database version is 10.2.0.4.

I understand your answer.



insert/updae with plsql

KG, March 12, 2011 - 9:39 am UTC

Hi Tom
I have tables t1 and t2 with the same structure
(
c1 number,
c2 number)

t1 has a smaller number of rows, around 10,000,
and t2 around 200,000.

I need a PL/SQL program that will take the data in the T1 table and
1. insert rows not currently in the T2 table, and
2. update T2 rows having a matching c1 value with the data in T1

I think MERGE cannot be used here, as the join condition for the insert and the update varies.

Thanks in advance.

Tom Kyte
March 12, 2011 - 5:05 pm UTC

Well, you have a slight ambiguity.

Do you want it to happen in that order? Or could you update and then insert (possibly inserting fewer records than before)?

What if T1 has:

C1=1, C2=1


and T2 has:

C1=1, C2=2

If you insert first:

C1=1,C2=1
C1=1,C2=2

and then update:

C1=1,C2=2
C1=1,C2=2


if you update first, you'll just have

C1=1,C2=2


so, please be a tad more precise in your specification - as it stands now - it is not clear.

Sorry for Confusions

KG, March 12, 2011 - 11:19 pm UTC

Hi Tom
Yes, I was a bit wrong; col1 is a primary key.
Below is the task I have been asked to achieve:

A PL/SQL program is to be written that will take the data in the T1 table and
1. insert rows not currently in the T2 table, and
2. update T2 rows with a matching col1 value with the data in T1.

Please write the program so as to maximize overall performance and efficiency.

I was able to achieve this with MERGE, but I need PL/SQL as well.
Can PL/SQL be better than MERGE performance-wise?
Tom Kyte
March 14, 2011 - 7:49 am UTC

I was able to achieve this with MERGE but i need a PLSQL as well.


no you don't and it would be an utter waste of time to write it.

MERGE is the right answer here

merge into t2 using t1 on (t1.col1 = t2.col1) 
when matched then update set col2 = t1.col2
when not matched then insert(col1,col2) values (t1.col1, t1.col2);


If you need it in plsql, it would look like this:

begin

merge into t2 using t1 on (t1.col1 = t2.col1) 
when matched then update set col2 = t1.col2
when not matched then insert(col1,col2) values (t1.col1, t1.col2);

end;



that is the way to write the program so as to maximize overall performance and efficiency.


I refuse to write procedural code as this solution not only does not require it, it would be less efficient and performance would suffer.

Thanks Tom

kg, March 14, 2011 - 3:14 pm UTC

Tom,
Merge does a full table scan of the tables.
col1 has a primary key index.
Is it not possible to use the index to access the tables and then do the DML on them?
Tom Kyte
March 14, 2011 - 3:30 pm UTC

Merge does full table scan of the tables.


EXCELLENT, I would be very very upset if it did anything else in most cases!!!

If the tables are of any size - then it is doing the absolute RIGHT THING.

And you will NOT break that perfection by writing your own "do it yourself nested loops joins". I won't let you - I've seen it happen far too often.

"Oh no, it isn't using my indexes!!! I have to 'fix' that!!!"

That is the most wrong way to think about it.



But also bear in mind that when it makes SENSE, it will not full scan both tables:




ops$tkyte%ORA11GR2> create table t1 as select object_id col1, object_name col2 from all_objects where rownum <= 10;

Table created.

ops$tkyte%ORA11GR2> alter table t1 add constraint t1_pk primary key(col1);

Table altered.

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T1' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> create table t2 as select object_id col1, object_name col2 from all_objects;

Table created.

ops$tkyte%ORA11GR2> alter table t2 add constraint t2 primary key(col1);

Table altered.

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> set linesize 1000
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> merge into t2 using t1 on (t1.col1 = t2.col1)
  2  when matched then update set col2 = t1.col2
  3  when not matched then insert(col1,col2) values (t1.col1, t1.col2);

10 rows merged.


Execution Plan
----------------------------------------------------------
Plan hash value: 1890694634

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT                |      |    10 |   590 |    13   (0)| 00:00:01 |
|   1 |  MERGE                         | T2   |       |       |            |          |
|   2 |   VIEW                         |      |       |       |            |          |
|   3 |    NESTED LOOPS OUTER          |      |    10 |   400 |    13   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL          | T1   |    10 |   110 |     3   (0)| 00:00:01 |
|   5 |     TABLE ACCESS BY INDEX ROWID| T2   |     1 |    29 |     1   (0)| 00:00:01 |
|*  6 |      INDEX UNIQUE SCAN         | T2   |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("T1"."COL1"="T2"."COL1"(+))

ops$tkyte%ORA11GR2> set autotrace off



here it full scans the tiny 10 row table and applies the changes to T2.


If the "tiny table" grows, then we'll stop using the INEFFICIENT, BAD indexes and get back to nice big juicy efficient full scans:

ops$tkyte%ORA11GR2> delete from t1;

10 rows deleted.

ops$tkyte%ORA11GR2> insert into t1 select * from t2 where rownum <= 10000;

10000 rows created.

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T1' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> set linesize 1000
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> merge into t2 using t1 on (t1.col1 = t2.col1)
  2  when matched then update set col2 = t1.col2
  3  when not matched then insert(col1,col2) values (t1.col1, t1.col2);

10000 rows merged.


Execution Plan
----------------------------------------------------------
Plan hash value: 2414655244

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      | 10000 |   576K|   115   (2)| 00:00:02 |
|   1 |  MERGE               | T2   |       |       |            |          |
|   2 |   VIEW               |      |       |       |            |          |
|*  3 |    HASH JOIN OUTER   |      | 10000 |   498K|   115   (2)| 00:00:02 |
|   4 |     TABLE ACCESS FULL| T1   | 10000 |   214K|    14   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| T2   | 72719 |  2059K|   100   (1)| 00:00:02 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."COL1"="T2"."COL1"(+))

ops$tkyte%ORA11GR2> set autotrace off






So you see - if you do it in a merge statement - it will use "for loops" when it makes sense, and it will use big full scans when it makes sense - meaning you don't need N versions of code for:

T1            T2
-------       --------
big           big
small         small
big           small
small         big


you just have merge and we figure out the way to do it.

Inconsistent Database for Improper Shutdown

Shantanu Nandy, March 15, 2011 - 5:53 am UTC

hi Tom,
It's been a pleasure to read your comments on the various aspects of Oracle and to learn so much about them under the same roof.

I just had a thought about your suggestion for updating a large number of rows, where we (1) create a new table, (2) do an insert, then (3) drop the old table and (4) rename the new table to the old one. Now, these steps are defined by the programmer and are not, in general, explained in detail to the DBA team. If, by any chance, the database shuts down between steps (3) and (4), won't the DBA team have to act interactively in executing the remaining steps? Otherwise it would leave the database inconsistent. Does this approach then involve some kind of threat of leaving the database inconsistent?

Thanks again for all your explanations and answers, given with utmost clarity.
Tom Kyte
March 15, 2011 - 8:54 am UTC

Well, it doesn't even take this scenario - most of the time developers commit WAY too early - they do something, commit, do something, commit and so on - even though all of the do-somethings are part of the same flow, the same thing - and we must pick up where they left off (the process is restartable).

They (the developers) can actually use dbms_scheduler to set up these work flows. Their process would not be to do 1, 2, 3, 4 - but rather schedule a job chain that does 1, 2, 3, 4 in order (or uses whatever external job/workflow tool they want).
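
A minimal sketch of such a chain (all names are invented, and each prg_* is assumed to be a previously created dbms_scheduler PROGRAM wrapping one of the steps):

begin
  dbms_scheduler.create_chain( chain_name => 'rebuild_chain' );

  dbms_scheduler.define_chain_step( 'rebuild_chain', 'step1_ctas',   'prg_create_new_table' );
  dbms_scheduler.define_chain_step( 'rebuild_chain', 'step2_index',  'prg_index_grant_constrain' );
  dbms_scheduler.define_chain_step( 'rebuild_chain', 'step3_drop',   'prg_drop_old_table' );
  dbms_scheduler.define_chain_step( 'rebuild_chain', 'step4_rename', 'prg_rename_new_table' );

  -- the rules enforce the 1-2-3-4 order; a failure stops the chain at that step,
  -- and it can be picked up from there rather than leaving things half done
  dbms_scheduler.define_chain_rule( 'rebuild_chain', 'TRUE',                   'START step1_ctas' );
  dbms_scheduler.define_chain_rule( 'rebuild_chain', 'step1_ctas SUCCEEDED',   'START step2_index' );
  dbms_scheduler.define_chain_rule( 'rebuild_chain', 'step2_index SUCCEEDED',  'START step3_drop' );
  dbms_scheduler.define_chain_rule( 'rebuild_chain', 'step3_drop SUCCEEDED',   'START step4_rename' );
  dbms_scheduler.define_chain_rule( 'rebuild_chain', 'step4_rename SUCCEEDED', 'END' );

  dbms_scheduler.enable( 'rebuild_chain' );
end;
/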


big table update

Indranil Das, April 16, 2011 - 8:56 am UTC

hi Tom,
Suppose we need to do an update like this:

rem --- Update Credit Card Record ---

update prod.contract_payment
set CREDIT_CARD_NUMBER =
case credit_card_type_code
when 'VISA' then
'DD07A801793B63825E3B23A5A2622A95E1924E940939D6461169DCE01F6150A7G425'
when 'MC' then
'C217DC8C813A77562F9A2979302CBD32E1924E940939D6461169DCE01F6150A7G425'
when 'AMEX' then
'AC2CA9F4A1C5B223EE870778E1F26753G425'
when 'DSC' then
'162077E116204E6F1B91C1D5ED26B9A0E1924E940939D6461169DCE01F6150A7G425'
else
CREDIT_CARD_NUMBER
end ,
bank_account_number = NULL ,
last_modified = sysdate ,
last_modified_by = user
where credit_card_type_code in ('VISA','MC','AMEX','DSC')
and TRANSACTION_TYPE_CODE ='CC' ;

COMMIT;

rem --- Update Bank Account Record ---
update prod.contract_payment
set CREDIT_CARD_NUMBER = NULL
bank_account_number = 'A48F3F9A0010D1167F9E8201A9BE1A90G425' ,
last_modified = sysdate ,
last_modified_by = user
where TRANSACTION_TYPE_CODE ='ACH' ;


COMMIT;

Now, the Contract_payment table has 5,00,00,000 records and nothing has worked - I mean bulk update, plain update, update using rowid - nothing worked... Now, my question is: without using the "create a new table, drop the old table" method, is there any alternative? Could you please give an example? Thanks a lot in advance... Regards
Indranil
Tom Kyte
April 18, 2011 - 10:04 am UTC

why do you have a commit in between? Are these really two transactions - it would be ok for one to succeed and one to fail?



What do you mean by "nothing worked"??? Of course they worked, or were working... I don't know what number 5,00,00,000 is - it is either

500,000,000
or
5 billion

but in any case - assuming you have the hardware and time - it'll work.


You don't tell us how many records this will update - that would be very very important to know. As would the proper size of the table - both in RECORDS and GIGABYTES

Alex, April 18, 2011 - 10:25 am UTC

That post is pretty scary, you'd like to think experts were writing code that's touching credit card numbers and bank accounts.....Let's hope that's a bogus account # in there...
Tom Kyte
April 18, 2011 - 10:52 am UTC

it is too big to be a credit card number, it looks to be encrypted (raw) representations of something.

Update trigger issue

banerjee, April 18, 2011 - 12:35 pm UTC

Today I faced a typical issue and want to share it with you.

In our load test environment we found that one update was taking almost 15 minutes to update 18 million rows, but the query is pretty simple. After some analysis we found that there is an update trigger associated with the table, and that the update trigger fires for each row. So even a simple bulk collect with a limit of 100 was taking a lot of time because, for every update, the trigger is being called. After some further analysis it was found that the trigger does not do anything related to this update. So we
1. Disabled the trigger in the PL/SQL code.
2. Did the update.
3. Enabled it again in the PL/SQL code.

For the time being this issue is resolved, but we can't use the same technique in all cases.

Is there any way to handle this kind of update?
Tom Kyte
April 18, 2011 - 12:49 pm UTC

avoid triggers - think about it - what else could one say????

You have a row trigger
It'll fire per row
It'll do things slow by slow
And that is unavoidable, entirely.
Even if you moved the processing to the AFTER trigger (using a compound trigger in 11g or using a plsql package to capture a row by row state in plsql tables) - it would still be slow since the row by row, slow by slow, trigger would still have to fire.

see
http://www.oracle.com/technetwork/issue-archive/2008/08-sep/o58asktom-101055.html
for more reasons to avoid triggers.

update on generic columns

DAN FISHER, April 20, 2011 - 7:08 am UTC

We have a generic entity / attribute based table. I know this is a mistake :)

However we need to update 4 value varchar2 columns to 4 corresponding numeric columns ie

UPDATE t_clinical_record
SET value_varchar_23 = value_numeric_3,
value_varchar_24 = value_numeric_5,
value_varchar_25 = value_numeric_6,
value_varchar_26 = value_numeric_7
WHERE entity_id = 857;

This needs to be done for around 300 different entities, where the numeric column differs based on the entity id. The table in question has approximately 100 million records and is not partitioned (this would have been handy, as I could have performed a partition exchange).

Running an update in this manner is going to take days, weeks...

Could we exploit the CTAS approach as highlighted, and do you have an example?

Dan

Disclaimer: I inherited the generic design and disliked this type of modelling before; now I loathe it.
Tom Kyte
April 20, 2011 - 8:12 am UTC

why would it take days? weeks? It sounds like a very tiny update??????


You are going to update a couple hundred records as far as I can tell?? why would this take a long time?

guess

A reader, April 20, 2011 - 12:32 pm UTC

Tom,
Can you guess who I am ?

Thanks

update on generic columns

DAN, April 21, 2011 - 5:39 am UTC

It will be slow because the attribution varies across entities; the basic statement below exhibits variation across the generic numeric columns based on the entity id.

UPDATE t_clinical_record
SET value_varchar_23 = value_numeric_3,
value_varchar_24 = value_numeric_5,
value_varchar_25 = value_numeric_6,
value_varchar_26 = value_numeric_7
WHERE entity_id = 857;

So this needs to be executed a few times.

In addition, running a count(*) from t_clinical_record
where entity_id = 857

returns approximately 25,000,000 records. The table itself has several hundred million records, not 100 million.

Thanks in advance
Dan
Tom Kyte
April 25, 2011 - 8:05 am UTC

why would you need to run it a couple of times??


Updating millions of records

A reader, May 02, 2011 - 5:45 am UTC

Hi Tom,

I have a similar situation.
My requirement is related to an archiving process where I need to move 10 million records from a main table to a history table.
Any pointers on how this can be done faster? (We work on Oracle 10g.)

Regards,
Neha
Tom Kyte
May 04, 2011 - 12:13 pm UTC

re-read the page, I've described what you might consider doing.

Using DDL instead of DML, create tables as select.
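
Applied to an archiving job, that would be something along these lines (table names and the cutoff predicate are invented; the indexes, grants and constraints on the new table have to be re-created before the rename):

-- 1) copy the rows to be archived into the history table with a direct path insert
insert /*+ append */ into main_tab_hist
select * from main_tab
 where created_dt < add_months(sysdate, -36);
commit;

-- 2) rebuild the main table with only the rows you are keeping, then swap it in
create table main_tab_new nologging
as
select * from main_tab
 where created_dt >= add_months(sysdate, -36);

-- index / grant / constrain main_tab_new here, then:
drop table main_tab;
rename main_tab_new to main_tab;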

Adjusting Primary Keys and Foreign Keys in a table

NimitZ, May 12, 2011 - 6:30 am UTC

Hi Tom,

I have a requirement where I need to merge selected data from one database instance to another. I have scripts that handle staging the data.

However, before I try to merge the data into the original tables, I need to adjust the primary keys of all the tables and update the foreign key references accordingly. There are around 130 tables. Currently what I do is: I have added additional columns to hold the new and old primary keys. Updating the references on the tables takes a really long time, and there is nothing more that I can think of for the migration part of it. All of this is done by disabling constraints and then applying the constraints separately after the updates are complete.

1. Create views around source (Not all data and not all tables are to be merged)
2. Move data from views (which is now my source) to staging
3. Staging tables have old_id and new_id in addition to the source table data
4. Update new_id with max(primary_key value at destination) + rownum – For all staging tables
5. Update old_id with current primary key value – For all staging tables
6. Get all foreign key references for each table and based on the referred table and column update the references.

Step 6 is what takes like forever. All the queries are dynamically created and executed with an execute immediate.

Can you please suggest if there is any better way of doing it?

Thank you.

PS: I tried to post a new question but the site was not open. And this is a little urgent.


Tom Kyte
May 12, 2011 - 7:59 am UTC

you seem to be missing steps? I don't see any disabling of constraints?

What about the indexes - do you do anything with them?

How much data are we talking about?

What is your definition of "forever"? Mine is 1,000 days. What is yours? (it might be expectations that need to be reset...)

Are you doing this sequentially? Do you have resources to do this in a more parallel fashion?

Updating 1M records deriving the value from a DB link

NPaul, May 19, 2011 - 3:35 pm UTC

Hi Tom,
I have a reporting table consisting of 62378579 rows.
I will have to update approx 1M records in the table for a given field.
This field value will have to be derived from the ERP database.
I followed your approach from this thread and created a new table
using a query over the DB links; it has been running for more than 8 hours now.
For simple updates where a reference table
(meaning straightforward cases of what has to be updated to what)
is given, updating 1M records usually takes 5 hours.
What is the right approach for these kinds of data fixes?
Is it okay to connect via DB links to derive the data? Please advise.

Thanks,
NPaul
Tom Kyte
May 23, 2011 - 11:10 am UTC

you tell us nothing about the size of the data in the remote database - that would be the important bit... Also, "62378579 Rows" tells us nothing of the volume of data - is that 62mb or 62gb or 62tb or something else?

Updating 1M records deriving the value from a DB link

A reader, May 25, 2011 - 9:43 am UTC

Hi Tom,
Sorry for the incomplete facts, as this is my first question on this site.
Reporting table size = 49G, ERP table size = 9.4G. The update will be done to a single field based on data from the ERP table.

Thanks,
NPaul
Tom Kyte
May 25, 2011 - 11:35 am UTC

is it possible to identify the 1,000,000 rows from the ERP table easily? Or are we going to have to basically bring that entire table back over the database link?

help on this please

venkata, August 03, 2011 - 4:43 am UTC

Tom,

I have to build a local table by copying some 10 million records from another table which i am accessing via dblink. The data in remote table is partitioned based on month and year i.e 'Mar-2010', 'Apr-2010', 'May-2010' and so on.

What I am doing now:

I am building the table locally and then inserting data like this:

insert into local_table select * from table@remotelocation where mon_year = 'Mar-2010';
commit;
insert into local_table select * from table@remotelocation where mon_year = 'Apr-2010';
commit;
insert into local_table select * from table@remotelocation where mon_year = 'May-2010';
commit;

What my assumption is:

my assumption is that loading the data into the local table as said above will not create any redo log files

What I want to know:

I want to know whether, instead of the above process, I can load the data like this:
insert into local_table select * from table@remote_location where mon_year in ('May-2010','Apr-2010','Mar-2010');
commit;

and which process is better.

I can't give you a working example here (create table, etc.).
Tom Kyte
August 03, 2011 - 7:44 am UTC

your assumption is false.


those are conventional path inserts - they always generate UNDO and REDO.

if you use insert /*+ append */ (and meet all of the requirements for a direct path load - no foreign keys, no triggers...) - then we will skip UNDO generation for the table (but not any indexes)

further if you are in archivelog mode, AND if you use insert /*+ append */ (and meet the requirements) AND the table was set to NOLOGGING AND the database was not set to FORCE LOGGING - then we can skip REDO and UNDO for the table (but *not* for any indexes)

or if you are in noarchivelog mode and you use insert /*+ append */ (and meet the requirements) we would skip UNDO and REDO generation for the table (but not any indexes...)



Yes, you can and probably should use the mon_year in (.....)


is mon_year a date or a varchar2? If it is a date, please use to_date( 'may-2010', 'mon-yyyy' ) - not just 'May-2010'.
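
A minimal sketch of that load, assuming mon_year is a DATE, the local table is set to NOLOGGING and the database is not in force logging mode (table names adapted from the post above):

insert /*+ append */ into local_table
select *
  from remote_table@remotelocation
 where mon_year in ( to_date('Mar-2010','Mon-YYYY'),
                     to_date('Apr-2010','Mon-YYYY'),
                     to_date('May-2010','Mon-YYYY') );
commit;   -- required before this session can query local_table again after a direct path insert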

can i do this

Venkata, August 03, 2011 - 12:15 pm UTC

Hi,

Can I do a bulk collect over a db link to avoid multiple db reads? Will this approach also avoid redo and undo generation?

hmmm

I will post the exact query I am using in my next post; sorry, I don't have the query in hand right now...
Tom Kyte
August 03, 2011 - 1:27 pm UTC

how would bulk collect avoid redo and undo in general? You are very mistaken/ confused about how this works.


You can skip undo generation on a table using DIRECT PATH operations (insert /*+ append */, sqlldr direct=y, create table as select, alter table t move - for example)

You MIGHT be able to skip redo generation if you are

a) in noarchivelog mode
b) in archivelog mode and the table is set to nologging and the database IS NOT set to force logging


But that is it - that is all.


(Note: in 11gr2, using the append_values hint, you can direct path load using arrays in plsql as well)



If you want to skip undo and perhaps redo generation, you would use the append hint, period.
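
And a minimal 11gR2 sketch of the append_values variant mentioned above (the table t(x number) and the array are made up for illustration):

declare
  type num_tab is table of number index by pls_integer;
  l_vals num_tab;
begin
  for i in 1 .. 1000 loop
    l_vals(i) := i;
  end loop;

  -- one bulk, direct path insert from a PL/SQL array (11gR2 and later)
  forall i in 1 .. l_vals.count
    insert /*+ APPEND_VALUES */ into t (x) values (l_vals(i));

  commit;
end;
/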

update two columns of same table

A reader, September 02, 2011 - 6:47 am UTC

hi tom,

my environment:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
"CORE 11.2.0.2.0 Production"
TNS for Solaris: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

I have a table t1 with the following fields and values:
i j k name1 name2 date
1 1 1 c d 03-SEP-11
1 1 1 z x 06-SEP-11
1 1 2 b c 04-SEP-11
1 1 2 a a 07-SEP-11
1 2 2 b b 01-AUG-11
1 2 2 v b 04-SEP-11

In the RB database i,j,k,name1,name2 is used as the primary key, but while
migrating to Oracle, for some purpose, we are designing it in such a way that
i,j,k is the primary key; however, we will get duplicates for this combination,
as you can see in the above example.
To make it distinct we apply the following logic: say for combination 1,1,1
we get 2 rows, so we will be taking only one name1 and name2 combination.

logic is: for combination of i,j we take latest of name 1
( using latest load_date value) and for i,k name2(using latest load_date value)
so for 1,1,1 combination we should have the latest name 1 as 'a' and
name2 as x and load _date is latest of 1,1,1 combination and
insert into a new table as 1,1,1,z,x,06-sep-11 for the combination of 1,1,1. and so on.
Please help me; I am trying many approaches and making it complex. I have used ref cursors and have not succeeded.

This is the code I tried, which was not successful:
declare
TYPE rec_type IS RECORD(i NUMBER,j NUMBER);
rec_c1 rec_type;
type rec_type4 IS RECORD(i NUMBER,k NUMBER);
rec_c2 rec_type4;
TYPE cv_typ IS REF CURSOR;
cur1 cv_typ;


type rec_type1 is record ( rn number, i number, j number, nm1 varchar2(10), load_date date);
rec_c3 rec_type1;
type rec_type2 is record ( rn number, k number,nm2 varchar2(10), load_date date);
rec_c4 rec_type2;
begin

select distinct i,j into rec_c1 from temp1;
select distinct i,k into rec_c2 from temp1;
--where load_date in (select max(load_date) load_date from temp1 group by i||j )

open cur1 for 'select * from (select rownum,i,j,k,nm1,nm2, load_date from temp1 where i=1 and j=1 order by load_date desc) where rownum =1';
loop
fetch cur1 into rec_c3;
insert into temp2 ( i, j, nm1,load_date) values( rec_c3.i, rec_c3.j,rec_c3.nm1,rec_c3.load_date);
end loop;

open cur1 for 'select * from (select rownum,i,j,k,nm1, nm2, load_date from temp1 where i=1 and k=1 order by load_date desc) where rownum =1';
loop
fetch cur1 into rec_c4;
update temp2
set nm2 = rec_c4.nm2 ,
k =rec_c4.k
where i = temp2.i and j =temp2.j and nm1 = temp2.nm1 and load_date = temp2.load_date;
end loop;

end;


please let me know if any additional information is required

Tom Kyte
September 02, 2011 - 8:48 am UTC

logic is: for combination of i,j we take latest of name 1
( using latest load_date value) and for i,k name2(using latest load_date value)
so for 1,1,1 combination we should have the latest name 1 as 'a' and
name2 as x and load _date is latest of 1,1,1 combination and
insert into a new table as 1,1,1,z,x,06-sep-11 for the combination of 1,1,1.
and so on.


you lost me there. You said "i,j - take the latest of name 1 and that is 'A'", "i,k - take the latest of name2 and that is 'X'", and then we insert "z,x" ??????

why not a,x - and what is the rule for the date column - how do you decide what that should be????


you really also need to supply a create table and inserts if you want me to create a query for you.

update 2 columns in a table

A reader, September 02, 2011 - 6:51 am UTC

hi tom, a small correction in the above example: the result output should be 1,1,1,a,x,06-sep-11; in the above I wrongly mentioned it as 1,1,1,z,x,06-sep-11
Tom Kyte
September 02, 2011 - 8:49 am UTC

so, tell me, why sep-6th?? what is the rule for that

and you need a create table and inserts

A reader, September 05, 2011 - 5:35 am UTC

hi tom,

6th Sep follows the rule - it is the latest date for the combination of 1,1,1.

please let me know if any more inputs are needed.

also, I tried one more update statement.

Update temp2 t2
set
temp2.nm1 = (select nm1
from
temp1 T1 where
T1.i = t2.i and
T1.j = t2.j )
where exists
(select
i,j,max(load_Date) FROM t1
group by i,j)

but I am not sure it is correct. Also, if it is right, will it take less time for updating billions of records, since I am using exists?
Tom Kyte
September 06, 2011 - 8:38 am UTC

you will have to, in text - just words, spell this out precisely. sep-6th is still confusing for me - I have to look at two different rule sets to get that - you have one rule for i,j. You have another rule involving i,k. Is the third - yet to be specified rule for i and (j or k)?

Also, read the last response above more closely, you are missing the second most important thing after a clear and detailed specification. Actually, you can read either of my posts since I clearly told you what you MUST provide if you want me to even look at this.

informatica

ram, September 20, 2011 - 6:12 am UTC

I have an issue here. In my source table I have 1000 records, and in my target table it is showing 1500 records. I want only 1000 records in my target. Can you tell me how to resolve that?
Tom Kyte
September 20, 2011 - 6:43 pm UTC

seriously - you are asking me an informatica question?

I don't know what you did wrong in informatica to get this transformation.

Table update with 900m rows

Rahul, November 09, 2011 - 8:44 am UTC

Hi Tom,
I have not managed to nail the above suggestions. I have a table with 900m rows and need to add a new column and then update its value based on some existing value in the table. I have created a dummy table with the NOLOGGING clause and use Insert /*+ append parallel (EPOS_STORE_LEVEL_HOLD,12) */ to insert with Select /*+ parallel (EPOS_STORE_LEVEL,12) */, and it is taking forever. Am I missing something, or should I just be patient and wait for the update to take its own time? This is in DEV only, to see the effect and the downtime needed in our LIVE database. Based on sample data of 1m rows I have calculated that it would take 22 hours. Do you have any suggestions to bring it down to a more realistic time frame, please? Your help will be highly appreciated. Thanks in advance.
Tom Kyte
November 09, 2011 - 11:24 am UTC

a) are you sure it went parallel, did you check
b) do you have the resources to go parallel (cpu, memory, DISK, IO BANDWIDTH)
c) how big is 900m rows? Could be anywhere between 1gb and infinity
d) are you archivelog mode, if so, with or without logging set on the table.
e) what does the sql look like
f) what are you waiting on (v$session_event, v$session_wait)

#2 is probably one of the most important things

Table update with 900m rows

Rahul, November 10, 2011 - 5:10 am UTC

Thanks for your quick response. Couple of things first:
Update worked with NOLOGGING clause on all the partitions in 9 hours which is a lot faster than anticipated 22. Also, I don't think I had "all" the resources to go in parallel, as there are 4 processors on our server, so I am not sure if it went in parallel. All other resources are there. There is decent disk space and all the tablespaces have been set to auto-extend to avoid failures. 900M rows in terms of size is about 56G, and this is the query I was using:
Insert /*+ append parallel (XXX_HOLD,12) */
into XXX_HOLD ( EPOS_DATA_PROVIDER_CODE ,
MAGAZINE_CODE ,
SALE_DATE ,
STORE_CODE ,
SUM_OF_UNITS ,
LAST_STORE_POLLED_DATE ,
LAST_UPDATED ,
LAST_UPDATED_BY ,
SUM_OF_COST ,
WHSHS_SKU ,
TITLE_ID
) Select /*+ parallel (XXX,12) */ EPOS_DATA_PROVIDER_CODE
,MAGAZINE_CODE
,SALE_DATE
,STORE_CODE
,SUM_OF_UNITS
,LAST_STORE_POLLED_DATE
,LAST_UPDATED
,LAST_UPDATED_BY
,SUM_OF_COST
,WHSHS_SKU
,ALL_MAGAZINE_PKG.GET_TITLE_ID_FROM_MAG_CODE('EPOS_MAGAZINE_CODE',MAGAZINE_CODE) Title_ID
from XXX ;
where ALL_MAGAZINE_PKG.GET_TITLE_ID_FROM_MAG_CODE is a function which returns the required title_id and uses an index.

Also explain plan of select statement returns as :

Plan
SELECT STATEMENT ALL_ROWSCost: 2 Bytes: 117 Cardinality: 1
4 PX COORDINATOR
3 PX SEND QC (RANDOM) PARALLEL_TO_SERIAL SYS.:TQ10000 :Q1000Cost: 2 Bytes: 117 Cardinality: 1
2 PX BLOCK ITERATOR PARALLEL_COMBINED_WITH_CHILD :Q1000Cost: 2 Bytes: 117 Cardinality: 1 Partition #: 3 Partitions accessed #1 - #89
1 TABLE ACCESS FULL TABLE PARALLEL_COMBINED_WITH_PARENT EPOS.EPOS_STORE_LEVEL :Q1000Cost: 2 Bytes: 117 Cardinality: 1 Partition #: 3 Partitions accessed #1 - #89


Index creation (on 4 columns) took about 30 mins for the same data which is what I was hoping for. I am not in archive log mode.
Tom Kyte
November 10, 2011 - 1:33 pm UTC

Update worked with NOLOGGING clause on all the partitions in 9 hours which is a
lot faster than anticipated 22


just for your information - the nologging had no effect on this, updates are ALWAYS logged, always - no way around it

disk space isn't relevant (there had to have been space, that is obvious - serial or not). number of disks, amount of bandwidth to/from disk is important.



Table update with 900m rows

Rahul, November 10, 2011 - 9:18 am UTC

Hi Tom,
Thanks for your update. It is working fine now and I have reduced it to 1.5 hours from 9 hours. As you pointed out, NOLOGGING wasn't having any effect; changing the function and joining with that table also helped.

Thanks once again.

update performance

Venkat, March 30, 2012 - 5:41 pm UTC

Hi Tom,

Thank you very much for spending your valuable time in giving the responses.

I have the below update statement (code changed for security) that's taking a long time (90 minutes) to update 1500 records.
I was able to tune the select part of the statement down to 3 minutes (from 70 minutes) using parallel hints etc. However, when I replace the select statement in the update with the tuned code, it takes around 10 minutes to execute the select statement itself (observed from sql_monitor). There aren't any changes in the plan.

Update table set col1=<value>
where exists(select subquery);

oracle version: 11.2.0.2

Sorry if I missed providing more data.

Thanks once again
Tom Kyte
March 31, 2012 - 8:09 am UTC


I feel like the computer from Star Trek:

insufficient data.

merging tables,

A reader, April 16, 2012 - 4:02 pm UTC

The development team have 4 tables, each having more than a billion rows. The data in each table is unique. The team wants to merge all 4 of them into one table. However, the data after merging can contain duplicates based on the primary key columns they have defined. They want a solution from us that can enable the constraint (after merging) and capture the duplicate rows in a separate temp table.

My approach was, merge it (insert into A select * from B,C and D with parallel append nologging to speed up the inserts), delete the duplicate rows and also store them in temp table (to speed this up, create a non-unique index on the primary key columns) and then drop the non-unique index and enable the primary key.

What would be your approach?

Thanks,

Tom Kyte
April 17, 2012 - 11:57 am UTC

how do they define which row to keep and which to discard?

how many duplicates do they anticipate - that is relevant to the solution - just a rough estimate.

follow up,

A reader, April 17, 2012 - 9:10 am UTC

They determine the duplicate rows based on a few attributes. The attributes are already the primary key in their respective tables, but when they merge all the tables there can be duplicates, and they want to capture them.

They don't have a very clear picture of how many of those 3 billion rows (1 billion in each table) are duplicates, but their guess is pretty small (maybe 5%).

Thanks,

Tom Kyte
April 18, 2012 - 2:51 am UTC

that is not what I meant.

say a key is in all three tables.

which one do they keep
which ones do they record as duplicate
or do they keep none of them and record all as duplicate

5% of 3,000,000,000 is a big number by the way.

how big is this table, 3,000,000,000 doesn't tell us anything other than - it is at least 3,000,000,000 bytes big.

I might be tempted to use a multi-table insert
INSERT first
  when rn = 1 then into good_table values ( ... )
  else into duplicates_table values ( ... )
select row_number() over (partition by key1, key2, key3 order by <whatever>) rn,
       x.*
  from (select * from t1 union all select * from t2 union all select* from t3) x
/



yes, that will not use any indexes
yes, you do not want to use a single index
yes, that will require gobs of temp
yes, you want to use temp
yes, that will take a while
yes, it is going to take a while
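
A fleshed-out sketch of that multi-table insert, with hypothetical column names (key1, key2, key3, val); the order by inside row_number() is what decides which copy of a duplicated key is treated as the "good" row:

insert first
  when rn = 1 then
    into good_table       (key1, key2, key3, val) values (key1, key2, key3, val)
  else
    into duplicates_table (key1, key2, key3, val) values (key1, key2, key3, val)
select key1, key2, key3, val,
       row_number() over (partition by key1, key2, key3 order by val) rn
  from (select key1, key2, key3, val from t1
        union all
        select key1, key2, key3, val from t2
        union all
        select key1, key2, key3, val from t3);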

get a particular record from millions of records in a table

swapna, April 23, 2012 - 2:14 pm UTC

Hi,

Suppose a table has millions of records and I want a particular record from that table. Which is the best way to do this?
Tom Kyte
April 24, 2012 - 7:29 am UTC

probably an index on the columns that constitute the where clause that selects this particular record.

large deletes

A reader, May 03, 2012 - 5:37 am UTC

We need to delete a large portion of our table and would like to go with the
CTAS method. However, we are running short of space in all of our tablespaces.

Do we have any other method? Basically we want to replace the table with a trimmed
version of itself.
For performance reasons I would like to go with CTAS, but others prefer the
slow-by-slow (batch) deletes.

So the only way out is to extend the tablespaces to do this. Is there any alternative?
Tom Kyte
May 03, 2012 - 9:03 am UTC

disk is cheap

your time is expensive

if you go with the deletes and you are short on space - ask them 'where is the TON of redo going to go??????????'

when they answer - say "ok, instead of using that disk for redo - give it to me for a few minutes and we'll be good to go - I can CTAS using it"


you need the space for the copy of the table - yes.
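
A minimal sketch of that trimmed-copy CTAS (the predicate is hypothetical; it selects only the rows you intend to keep):

create table t_keep parallel nologging as
select *
  from t
 where order_date >= date '2011-01-01';   -- rows to keep

-- then: drop table t; rename t_keep to t;
-- and re-create indexes, constraints and grants on the renamed table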

big update

Ravi B, July 26, 2012 - 5:20 pm UTC

Hi Tom,

Could you please let me know if there is an efficient way of doing this?

This update is taking about 5hrs to finish.

select count(*) from TEST_A_TABLE;
--17608743

select count(*) from TEST_A_TABLE where associated = 0;
--17608743

select count(*) from ADDREMOVEDATA;
--1964698

CREATE INDEX IX_ADDREMOVEDATA_7 ON ADDREMOVEDATA
    (
      UPPER("NAME"),
      UPPER("PUB"),
      UPPER("VERSION")
    )


explain plan for
update TEST_A_TABLE src
       set (ASSOCIATED,
           IGNORE,
           PROD_RID,
           EDITIONS_RID,
           VERSIONS_RID,
           RELS_RID) = 
       ( SELECT 1, dst.IGNORE,dst.PROD_RID,dst.EDITIONS_RID,dst.VERSIONS_RID,dst.RELS_RID
           FROM ADDREMOVEDATA dst
          WHERE 1 = 1
            AND (upper(dst.VERSION) = upper(src.VERSION))
            AND (upper(dst.PUB)     = upper(src.PUB))
            AND (upper(dst.NAME)    = upper(src.NAME))
            AND src.ASSOCIATED = 0
        )
    WHERE EXISTS (SELECT 1
           FROM ADDREMOVEDATA dst
          WHERE 1 = 1
            AND (upper(dst.VERSION) = upper(src.VERSION))
            AND (upper(dst.PUB)     = upper(src.PUB))
            AND (upper(dst.NAME)    = upper(src.NAME))
            AND src.ASSOCIATED = 0
          );

Plan hash value: 2294094399
 
----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |                    |   880K|   100M|    52M  (1)|205:46:30 |
|   1 |  UPDATE                       | TEST_A_TABLE       |       |       |            |          |
|*  2 |   FILTER                      |                    |       |       |            |          |
|   3 |    TABLE ACCESS FULL          | TEST_A_TABLE       |    17M|  2015M| 60067   (1)| 00:14:01 |
|*  4 |    FILTER                     |                    |       |       |            |          |
|*  5 |     INDEX RANGE SCAN          | IX_ADDREMOVEDATA_7 |     1 |    63 |     3   (0)| 00:00:01 |
|*  6 |   FILTER                      |                    |       |       |            |          |
|   7 |    TABLE ACCESS BY INDEX ROWID| ADDREMOVEDATA      |     1 |   149 |     4   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN          | IX_ADDREMOVEDATA_7 |     1 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter( EXISTS (SELECT 0 FROM "ADDREMOVEDATA" "DST" WHERE :B1=0 AND 
              UPPER("VERSION")=UPPER(:B2) AND UPPER("PUB")=UPPER(:B3) AND UPPER("NAME")=UPPER(:B4)))
   4 - filter(:B1=0)
   5 - access(UPPER("NAME")=UPPER(:B1) AND UPPER("PUB")=UPPER(:B2) AND 
              UPPER("VERSION")=UPPER(:B3))
   6 - filter(:B1=0)
   8 - access(UPPER("NAME")=UPPER(:B1) AND UPPER("PUB")=UPPER(:B2) AND 
              UPPER("VERSION")=UPPER(:B3))

Tom Kyte
July 30, 2012 - 9:48 am UTC

looks like you are updating just about every row in that table - using an index look up TWICE!!!!!


rewrite as a merge


merge into test_a_table tgt
using addremovedata src
on ( upper(tgt.version) = upper(src.version) and .... )
when matched then update....

and hope for two full scans and a hash join. If you see any indexes being used, fix that.


and consider rewriting this as DDL: create a new table as an outer join of these two tables, index it, constrain it, grant it, etc - and then drop the old and rename the new. Updating every row in a big table is a bad idea.
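
For reference, a sketch of the full merge assembled from the columns in the original update (it assumes name/pub/version uniquely identify a row in addremovedata - otherwise the merge raises ORA-30926):

merge into test_a_table tgt
using addremovedata src
   on (    upper(tgt.version) = upper(src.version)
       and upper(tgt.pub)     = upper(src.pub)
       and upper(tgt.name)    = upper(src.name) )
when matched then update
   set tgt.associated   = 1,
       tgt.ignore       = src.ignore,
       tgt.prod_rid     = src.prod_rid,
       tgt.editions_rid = src.editions_rid,
       tgt.versions_rid = src.versions_rid,
       tgt.rels_rid     = src.rels_rid
 where tgt.associated = 0;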

big update

Ravi B, July 30, 2012 - 11:44 am UTC

Hi Tom,

There were several indexes on the TEST_A_TABLE table (which are not necessary); the execution time was drastically reduced to 12 minutes after I dropped those indexes (with the query I provided). In fact, I tried using MERGE with a HASH JOIN, which takes about 17 mins.

We have a product which runs both on SQL Server and Oracle. SQL Server takes about 7 mins to finish the same query (same hardware configuration). My management (SQL Server favored) wanted to know why Oracle is taking that much time.

Here is the plan with MERGE.

Plan hash value: 265507897
 
----------------------------------------------------------------------------------------------
| Id  | Operation            | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |               |     1 |   336 |       |   182K  (1)| 00:42:40 |
|   1 |  MERGE               | TEST_A_TABLE  |       |       |       |            |          |
|   2 |   VIEW               |               |       |       |       |            |          |
|*  3 |    HASH JOIN         |               |     1 |   353 |   459M|   182K  (1)| 00:42:40 |
|   4 |     TABLE ACCESS FULL| ADDREMOVEDATA |  1964K|   436M|       |  8325   (2)| 00:01:57 |
|*  5 |     TABLE ACCESS FULL| TEST_A_TABLE  |    17M|  2014M|       | 60095   (1)| 00:14:02 |
----------------------------------------------------------------------------------------------

Tom Kyte
July 30, 2012 - 4:04 pm UTC

trace it and tkprof it

Slow by Slow wins?!?

Matthew McPeak, August 29, 2012 - 1:41 pm UTC

Tom,

Due to a surprise sprung on us by our parent company, my team and I have several updates that we have to accomplish and we'd like to be able to do them all in one Sunday afternoon.

Ordinarily, for a large update, I would consider:

*) Build a new, partitioned table with the same structure as the target table, then
*) INSERT /*+ APPEND */ the converted data into the new, partitioned table, then
*) Exchange the partition in the new table with the target table, then
*) Rebuild the indexes on the target table.

However, in this case, the client does not have the partitioning option licensed.

Also, the tables in question are part of a 3rd party package, so they are not comfortable with a similar strategy involving just renaming the tables at the end, since that would leave our home-built copy in place as the "real" table.

So, the best pure-SQL approach I could find was:

UPDATE ( SELECT t.col1, v.new_val FROM large_table t
INNER JOIN cnv_master_data v ON v.old_val = t.col1 )
SET col1 = new_val;

That has a plan of:
UPDATE STATEMENT ALL_ROWS
UPDATE LARGE_TABLE
HASH JOIN
TABLE ACCESS FULL CNV_MASTER_DATA
TABLE ACCESS FULL LARGE_TABLE

I didn't actually run it to completion, but I started it and watched v$session_longops. After spending almost 30 minutes doing FTS of the two tables, the progress on the HASH JOIN indicated it was going to take another 4 or 5 hours.

So, I tried the following approach:
1) Create a table having the row_ids and new value for each row_id.
2) Create an index on that table, so I could quickly select the data, sorted by row_id.
3) Open a cursor on the table, bulk collect into arrays, and FORALL UPDATE to the new value, based on the row ids.

It took just over 2 hours.

Now, the manager wants me to type this up as the "approach" for all the other developers who have similar updates to do as part of this initiative.

I really hate typing up a "slow-by-slow" approach as the "right" way to do this, but I cannot deny that it seems faster.

Am I crazy?

As always, I appreciate your insights and comments.

Best regards,
Matt



Tom Kyte
August 29, 2012 - 1:56 pm UTC

I don't know how big these tables are nor how many rows are to be updated.

I would have used

CTAS
drop old
rename new
index, constrain, grant

myself.

Yes, but...

Matthew McPeak, August 29, 2012 - 2:04 pm UTC

I know. Me too. But they're not comfortable doing those operations on important tables of a 3rd party application.

There is the fear that the new table would be somehow different from the original (a missed trigger, rowdependencies, etc).

It's not that those fears cannot be overcome -- it's a "F.U.D." issue, not a technical issue.

But what about the PL/SQL approach with the pre-built rowid table? Is it possible for such an approach to be faster than a pure-SQL update?

P.S. The tables vary in size -- mostly a few million blocks. We're updating maybe 80% of the rows in each table.

Thanks again,
Matt



Tom Kyte
August 29, 2012 - 2:26 pm UTC

wasn't the plan for the create table with rowids just


hash join
   full scan t1
   full scan t2


????

what was different about that plan that the creation of the table went instantly fast?

(you are already updating a 3rd party applications table, you've already crossed the line... the FUD should already be there...)

Yes

Matthew McPeak, August 29, 2012 - 2:35 pm UTC

Yes, that was the plan. It wasn't instantly fast, but it was just 20 minutes or so. A better question (that I cannot answer) is why does that HASH JOIN step go so much slower when it is inside the update statement?

>> you've already crossed the line... the FUD should already be there...

We're in a bind :(... We're just trying not to make the situation worse than it needs to be...

Anyway, I am putting together a more focused test on a 1 million row dummy table. If that continues to be faster, I'll post it.

Thanks,
Matt


Tom Kyte
August 29, 2012 - 2:52 pm UTC

the hash join was actually doing the update at that point in time as well. The hash join had:

a) read the one table and hashed it
b) started reading the table to update and as it hashed joined it to (a), did the update


The update hash join did everything in one step.

did your steps 1, 2 and 3 take 2 hours or was it just step 3 that took 2 hours.

Comparing

Matthew McPeak, August 29, 2012 - 2:58 pm UTC

Building the rowid table took about 20 minutes, using it to update the base table took another 2 hours or so.

Doing the straight update was on course to take 4-5 hours, all told.

insert select VS CTAS which is faster?

A reader, August 31, 2012 - 3:05 am UTC

Hi tom,

I heard that in Teradata, 'create table as' is 20% slower than 'create table' followed by 'insert select'.

I would like to know how Oracle behaves if we do not consider parallel or nologging - just the very common case.

My rough testing shows 'insert select' is slower than 'create table as'; could you please elaborate on the processing logic behind their implementations?
Tom Kyte
September 10, 2012 - 6:30 pm UTC

in Oracle, the opposite is usually true - the create table as select will in some cases marginally outperform an insert as select (we can bypass a final processing step sometimes).


Posting example

Matthew McPeak, August 31, 2012 - 9:01 am UTC

Tom,

I am actually posting my example with timings.

I am wondering if maybe something is not tuned correctly that is making a big "update" statement so slow? Because I wrote a focused test case with a million rows and found the PL/SQL approach almost as fast as pure-SQL, but not faster.

But, when I ran the two approaches on the real (much bigger) table, the difference was dramatic.

The only difference I see is that the PL/SQL approach can tune the work by rowid, so all the updates to a single block happen together. Will the pure update do that? Would it even make a difference?

I'd much rather believe that there is something de-tuned in the database. The single UPDATE statement obviously has to write and read from TEMP segments... maybe something there I should look at?

Thanks, as always! (Code example below)

Matt


-- OE_ORDER_LINES_ALL is the table to be converted
-- CNV_MASTER_VINS is pre-existing -- tells us the old values and new values for the conversion

-- Step 1 -- create a list of row_ids from OE_ORDER_LINES_ALL and the new value for each record
-- Run time in STRDV4: 1020 seconds
CREATE TABLE cnv_oel_rowids AS
SELECT oel.rowid row_id, v.vin17
FROM oe_order_lines_all oel INNER JOIN cnv_master_vins v ON v.vin8 = oel.ordered_item
WHERE oel.item_type_code = 'CONFIG';

-- Step 2 -- create an index on the rowid table so that we can quickly sort by rowid
-- (maybe isn't necessary, but helps in testing with less than the full # of rows, and
-- doesn't take too much time to build.)
-- Run time in STRDV4: 19 seconds
CREATE UNIQUE INDEX cnv_oel_rowids_u1 on cnv_oel_rowids ( row_id, vin17 )

-- Step 3 -- Lock the table in exclusive mode so we don't get hung up waiting for anyone's locks...
LOCK TABLE oe_order_lines_all IN EXCLUSIVE MODE;

-- Step 4 -- Loop through the rowid table, and use the information to update the conversion table
-- Run time in STRDV4: 873 seconds
DECLARE
CURSOR c IS SELECT * FROM ( SELECT row_id, vin17 FROM cnv_oel_rowids ORDER BY row_id ); -- WHERE ROWNUM <= 10000;
TYPE t_row_id_tab IS TABLE OF rowid;
TYPE t_vin17_tab IS TABLE OF VARCHAR2(17);
l_row_ids t_row_id_tab;
l_vin17s t_vin17_tab;

-- For longops progress...
rindex BINARY_INTEGER;
slno BINARY_INTEGER;
totalwork NUMBER;
sofar NUMBER;
obj BINARY_INTEGER;

BEGIN
rindex := DBMS_APPLICATION_INFO.set_session_longops_nohint;
sofar := 0;
totalwork := 1737827;

OPEN c;
LOOP
FETCH c BULK COLLECT INTO l_row_ids, l_vin17s LIMIT 1000;
FORALL i IN 1..l_row_ids.COUNT
UPDATE oe_order_lines_all
SET ordered_item = l_vin17s(i)
WHERE rowid = l_row_ids(i);

sofar := sofar + l_row_ids.COUNT;
DBMS_APPLICATION_INFO.set_session_longops (rindex,
slno,
'OEOL Conversion 3',
obj,
0,
sofar,
totalwork,
'conversion',
'items');


EXIT WHEN c%notfound;
END LOOP;
CLOSE C;
END;


-- Total runtime of bulk SQL approach in STRDV4: 1020+19+873 = 1912 seconds
-- Alternative "pure-sql Approach":
-- Run-time in STRDV4: 15911 seconds
UPDATE (SELECT oel.ordered_item, v.vin17
FROM oe_order_lines_all oel INNER JOIN cnv_master_vins v ON v.vin8 = oel.ordered_item
WHERE oel.item_type_code = 'CONFIG')
SET ordered_item = vin17 ;


Tom Kyte
September 10, 2012 - 6:33 pm UTC

The only difference I see is that the PL/SQL approach can tune the work by
rowid, so all the updates to a single block happen together. Will the pure
update do that? Would it even make a difference?


that could definitely make a big difference - but one would hope an update like that would not use an index, so it will full scan and hit the blocks in order too

what is the plan for the update?

Tom, you have to accept a problem exists with Oracle

A reader, September 05, 2012 - 5:25 am UTC

Hi Tom,

I have been a long time reader of your posts and give them quite a bit of importance, but I have never found your CTAS suggestion to be a convincing replacement for update.

Oracle has to accept that their serial update or parallel update operations are not optimized.

Oracle's parallel processing capabilities are very very bad. Oracle has to put some effort into this area.
Tom Kyte
September 10, 2012 - 7:42 pm UTC

they are optimized.

they are optimized for OLTP, they always have been, they always will be.

and for you to say "parallel processing ... are very very bad" - wow. Flabbergasted. Ok, you are "entitled" to your opinion - but it is wrong.

You are a software *engineer* correct?

as an *engineer* do you accept the fact that if a plane flies slowly - say at 500-600mph - the wings should be more or less straight out and long? I hope so, it is the current design of most of the planes I fly on.

as an *engineer* do you accept the fact that if a plane wants to fly above the speed of sound - you need a *different* approach, a *different* design, and entirely *different* thing altogether. Those wings, they better be short, they better be stubby - otherwise you'll rip them RIGHT OFF - now won't you.

Does that mean the maker of the 777 "has bad wings"? No, not at all, it means their wings are designed for one set of parameters. Does it mean the makers of the wings for fighter jets got it all wrong? Have a bad implementation? No, not at all - that would be a stupid silly statement wouldn't it?


If you want to update millions of rows, I'm convinced (based on watching it over and over and OVER and OVER again) that you want to use PARALLEL (yes, that bad parallel stuff we got) direct path (create table as select / insert /*+ APPEND */) operations, which can bypass the undo and redo that are *necessary* for OLTP - but not necessary for a big bulk operation.


If you are a software engineer and you truly believe that there is a single best way for everything - you are not yet a software engineer, not even close :)

You have to put some effort into learning the fact that a warehouse type bulk operation and an OLTP operation is like comparing a 777 to a fighter jet. *silly* is the first word that pops into my mind.

It's the rowid sorting!

Matthew McPeak, September 06, 2012 - 1:54 pm UTC

Tom,

It seems to be the rowid sorting in the PL/SQL that is saving all the time.

I build a biggish table "MATT1" -- 5.1 million rows using the following:

<code>

create table matt1 ( a number not null, b varchar2(17) not null, block_filler varchar2(400));

insert into matt1
select rownum a, dbms_random.string('u',17), dbms_random.string('a',400) block_filler 
from dual
connect by rownum <= 100000;

insert into matt1
select rownum + 100000, dbms_random.string('u',17), dbms_random.string('a',400) block_filler 
from   matt1, ( SELECT null FROM dual CONNECT BY ROWNUM <= 50 );

commit;

create table matt1_cnv ( b varchar2(17) not null, new_b varchar2(8) not null );

insert into matt1_cnv 
select b, substr(b,-8) from matt1
order by dbms_random.value;

create unique index matt1_cnv_u1 on matt1_cnv (b);



Then, ran a pure SQL update both with rowid sorting and without:

update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b)
set b = new_b

vs...

update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b ORDER BY t.rowid)
set b = new_b


The update with no rowid sorting took 3.93 hours. The update with ORDER BY t.rowid took 21 minutes.


My question for you -- is there some statistic for updates that I could look at to prove one is doing less work than the other? Like, for SELECT statements, I would compare "consistent gets". Is there something analogous for tracking how much work updates do?

Thanks!

Matt
</code>
Tom Kyte
September 10, 2012 - 7:58 pm UTC

let's see the plan for the update.

Here are the plans

Matt McPeak, September 11, 2012 - 9:35 am UTC

The first plan, with the order by, ran about 11 times faster.  

<code>
update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b ORDER BY t.rowid)
set b = new_b
 
UPDATE STATEMENT  ALL_ROWS Cost: 150,330  Bytes: 163,072,896  Cardinality: 4,291,392  Partition #: 0                                                                 
   5 UPDATE APPS.MATT1 Cost: 0  Bytes: 0  Cardinality: 0  Partition #: 0                                       
       4 SORT ORDER BY  Cost: 150,330  Bytes: 163,072,896  Cardinality: 4,291,392  Partition #: 0                                
         3 HASH JOIN  Cost: 107,408  Bytes: 163,072,896  Cardinality: 4,291,392  Partition #: 0         
            1 TABLE ACCESS FULL APPS.MATT1 Cost: 86,835  Bytes: 94,410,602  Cardinality: 4,291,391  Partition #: 0 
            2 TABLE ACCESS FULL APPS.MATT1_CNV Cost: 6,296  Bytes: 85,605,536  Cardinality: 5,350,346  Partition #: 0 
 
 
 
 
update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b)
set b = new_b
 
UPDATE STATEMENT  ALL_ROWS Cost: 104,970  Bytes: 111,576,192  Cardinality: 4,291,392  Partition #: 0                                                 
   4 UPDATE APPS.MATT1 Cost: 0  Bytes: 0  Cardinality: 0  Partition #: 0                       
      3 HASH JOIN  Cost: 104,970  Bytes: 111,576,192  Cardinality: 4,291,392  Partition #: 0         
         1 TABLE ACCESS FULL APPS.MATT1 Cost: 86,835  Bytes: 42,913,910  Cardinality: 4,291,391  Partition #: 0 
         2 TABLE ACCESS FULL APPS.MATT1_CNV Cost: 6,296  Bytes: 85,605,536  Cardinality: 5,350,346  Partition #: 0 













</code>
Tom Kyte
September 14, 2012 - 4:25 pm UTC

yeah, it is the data access pattern: the hashing puts the data all over in memory (not in block order) - your order by put it back in block order.

I would say that is the cause - your order by happens to put the rows to be updated back in block order and minimized the physical IO probably by a great deal (we didn't have to read and re-read blocks)

Thanks

Matt McPeak, September 14, 2012 - 5:33 pm UTC

Thanks, Tom! That was really bugging me. I thought that was probably it, but I find it hard to be sure sometimes. Your help is much appreciated, as always!

One clarification...

Matt McPeak, September 15, 2012 - 10:12 am UTC

Thanks again for your help. One more follow up though please...

You said that the order by "happens" to put the updates back in block order, as though this is an unusual and unique situation.

After this, it seems to me that any sort of large update against an updatable view should be ordered by rowid of the target table. Maybe even merge statements too, if the target table is in the USING query and the ON clause doesn't access the target table via index.

In your view, is this just "one of those things" or is it something to remember to do going forward?

Thanks!

Tom Kyte
September 16, 2012 - 4:13 am UTC

it depends on the plan, only if the plan messed up the natural order.

if the plan was:

nested loops
    full scan of table to be updated
    index range scan
       table access by index rowid of table with new values


then the table to be updated would be in block order - we'd read each block once and update it.


it depends on the plan taken, the data access pattern dictated by the plan.

A Reader, September 21, 2012 - 2:01 pm UTC

Hi Tom,

If I do an insert into...select x,y,z from, and if this is inserting a substantial number of records, can another user enter the same record and commit before this completes? In that case the original insert will fail with dup_val_on_index right?

A Reader
Tom Kyte
September 26, 2012 - 12:56 pm UTC

yes

Oracle Parallel processing is flawed.

A reader, September 26, 2012 - 10:21 pm UTC

Tom,

By reading this thread, I am extremely disappointed with the Oracle software. It is now an established fact that Oracle is just for OLTP processing.

For Oracle parallel processing, even with Exadata, update statements pose problems. Oracle is still not able to assess the DOP of parallelism correctly for parallel queries. Exadata still has many architectural flaws.

If you buy a one million USD machine, then to update a huge table completely you have to follow the impractical techniques described in this thread. I would really suggest that clients look for another database, one designed for data warehousing.


Tom Kyte
September 27, 2012 - 8:56 am UTC

update statements pose problems.


no duh, brilliant. that is true for any database. update was invented for..... hold onto your hats..... OLTP, transactions!!!!

if you buy a 1,000,000$ machine to update a huge table - you are doing it wrong, anyone would tell you that.

you don't update a huge table, that is silly (you don't want to update millions or billions of rows and generate redo, undo (oltp things!!!))

you would use create table as select, period. massively parallel, no undo, no redo, no buffer cache, direct path reads/writes, easily compressable, etc.
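
A minimal sketch of that kind of parallel, direct path rewrite (t_new, the columns and the degree are made up for illustration):

create table t_new parallel 8 nologging as
select /*+ parallel(t, 8) */
       pk, col1, upper(col2) col2, col3   -- the "update" is computed right here
  from t;

-- then drop t, rename t_new to t, and re-create indexes, constraints and grants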


I've already written this to you before - it is getting boring.


If something is an established fact, you'd be able to point us to the proof. I can point you to surveys that show MOST data warehouses run on Oracle. That Oracle is well positioned for warehousing. Now what?

http://www.oracle.com/us/products/database/dw-software-leadership-1691879.html

http://www.sybase.com/files/White_Papers/Gartner_MagicQuad_forDataWarehouseDMS.pdf


if you recommend to clients to use update to modify a huge table, I would suggest your clients consult another consultant.

established fact just for Oracle

A reader, September 27, 2012 - 11:32 pm UTC

Hi Tom,

I don't mean to hurt you, nor do I think it is a boring topic. Try update statements in Netezza or Teradata; you would find this thread reads more like a workaround for a problem which Oracle has failed to address. The links you provided don't hold any significance, since they are from Oracle or Sybase, which are already not considered warehouse databases.

Just look at the parallel processing capabilities of Oracle; they are extremely poor compared to Netezza or Teradata. Further, it is now an established fact that the Oracle architecture has failed to understand what is needed for parallel processing.

Just take your example. You have been advocating for a long, long time that a shared everything architecture is best compared to the Teradata shared nothing architecture. Asktom is full of such FUD information. Yet now Exadata uses a sort of combined shared nothing and shared everything architecture.
Tom Kyte
September 28, 2012 - 6:59 am UTC

ummm, the link to sybase is to a gartner article. gartner is not a database vendor, they are an independent group that evaluates database vendors.

the link to oracle.com is a link to IDC, again, another independent body outside of the vendor.
http://www.idc.com/

now what? still say "no significance"?????



Just look at the parallel processing capabilities of Oracle; they are extremely poor
compared to Netezza or Teradata.


explain, don't just spew junk out, explain. If it is so so so obviously clear, it should be simple for you.

You have been advocating for a long, long time that a shared everything architecture
is best compared to the Teradata shared nothing architecture.


i still do, 100%.

exadata combines a bit of shared nothing (the storage cells and query offloading) with the flexibility and ease of implementation of shared everything (the database server in the middle, the up to eight nodes of RAC cluster).

We do not make you reorganize your database as you add capacity, you don't partition for the ease and convenience of the computer. You partition the way you want to partition for administration (purging of data for example), you do not partition for your shared nothing implementation.

And you knew that, you just wrote it. It is not shared nothing - some of the IO is that way, and that is totally 100% transparent to the database designer, not stuck right in your face. It combines some shared nothing with a completely shared everything approach.



further

A reader, September 27, 2012 - 11:39 pm UTC

The problem with most Oracle resources is that they have treated the Oracle database as the final word and have not experimented with other databases. From a warehouse perspective, most Oracle Corporation resources don't even qualify as warehouse experts; they have limited experience in this field. Exadata is just one such example. You spend millions of dollars just because you are not ready to experiment with new systems.

I challenge TOM to have a BENCHMARK OF NETEZZA VS ORACLE JUST ON THIS UPDATE. IF NETEZZA WINS, CLOSE ASK TOM; IF TOM WINS, I WILL LEAVE THE NETEZZA DATABASE.

HOPE TOM IS MAN ENOUGH TO ACCEPT THE CHALLENGE AND TOM PLEASE STOP SPREADING FUD INFORMATION. THE INFORMATION HE SPREADS JUST HOLDS FOR ORACLE, NOT FOR OTHER DATABASES.
Tom Kyte
September 28, 2012 - 7:04 am UTC

there will never be a benchmark on an update on a data warehouse for the simple reason:

you don't do it that way. that would not be a smart approach.


I tell people how to properly use the Oracle database - yes, that is 100% true. You can tell them how to use the others, that isn't my job. But if you try to tell us that the approaches you use on database X are good for database Y, then you need to rethink your position.

If all databases were exactly the same, did everything the same way, there wouldn't be other databases now would there. We are different, you do things differently on the different databases.

for example, on Oracle a DBA partitions for their ease and convenience, we don't worry about the shared nothing aspect - that doesn't ever enter into our design. Now if you tried that on the others - that wouldn't work so swell would it.

for example, on Oracle, you can actually trickle feed your reporting/warehouse system since we can actually do OLTP in your warehouse - we can support fully concurrent reads and writes. You wouldn't want to be doing that in some other databases.

and of course the information I "spread" holds for Oracle - that is pretty obvious isn't it? I've never tried to hide where I'm from (ahem, like "a reader")

Maintaining Referential Integrity during CTAS

Ankit, January 29, 2013 - 4:14 am UTC

Hi Tom,

Can you explain to me how one can use the CTAS (create table as select) strategy on a table with concurrent reads that contains referential constraints?

Please direct me to the link if you have already answered.


Tom Kyte
January 31, 2013 - 9:16 am UTC

if you are updating millions of records, this is going to be an offline operation (no modifications - you are locking pretty much the entire table after all).

so, during the create table as select, you will be the only one accessing the table in a write (modify) mode.

anyone that reads the old table will just read the old table - nothing to worry about there.

as for referential integrity -- you'll be dropping and adding those constraints after you drop and rename the table, they'll take care of themselves (again, an offline operation)
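
A hedged sketch of that offline sequence, with made-up table, column and constraint names:

create table t_new parallel nologging as
select id, dept_id, upper(name) name      -- the "millions of updates" happen in this select
  from t;

drop table t cascade constraints;          -- cascade in case other tables reference t
rename t_new to t;

-- re-create what the CTAS did not carry over
alter table t add constraint t_pk primary key (id);
alter table t add constraint t_dept_fk foreign key (dept_id) references dept (id);
grant select on t to some_role;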



Bulk update order

Rajesh Voruganti, February 15, 2013 - 4:32 am UTC

Hi Tom,

I have one question. When we update a set of records with a bulk update, in which order does the Oracle engine apply the updates? Is it by the table's primary key column, or something else?

Thanks
Rajesh.
Tom Kyte
February 15, 2013 - 9:31 am UTC

if you do a bulk update, like this:


forall i in 1 .. some_table_variable.count
    update t set x = some_table_variable.x(i) where key = some_table_variable.key(i);



then the updates would take place in the order of the elements in some_table_variable.

how can I decide when to use partitioned tables?

mohamed Aboazm, March 07, 2013 - 8:53 am UTC

Dear Tom, I have a certain table containing 3 million records, and the business workflow is to update and insert every minute, totalling about 1000 records every day. So, I want to use a certain scenario (partitioning) to deal with this table.
Tom Kyte
March 07, 2013 - 9:05 am UTC

it is a tiny table. with very little work done to it.

what are you trying to accomplish with partitioning in this case? why are you applying partitioning - what do you want to achieve with it. You need to start there.

problem

asdf, March 14, 2013 - 3:25 am UTC

dear tom,
please solve this problem:
Error in ==> C:\MATLAB6p5\work\Untitled3.m
On line 7 ==> Diffs(row, col) = Image(row, col)-100 ;

ramya, April 29, 2013 - 7:08 pm UTC

Hello TOM,

I have a select statement in a cursor through which the data is selected; the result is looped over and used to update a table having 64 million records (OVRRD_COST). The select statement in the cursor is taking time, as the driving table is a view (CRE_WORK_SET_TEST_VW) and the data in that view changes for each run. We have 2,000,000 records in the RAS_HIER table, 20,000,000 records in OVRRD_BOM, and the other table, OVRRD_COST, has 64 million records. The filter (where clause) of the select statement uses bind variables. The values of costper01 to costper18 from the select statement are used in the update's SET columns.


I want to optimize the below select statement. Any suggestions please ...

Please find the select statement below; I would like to mention that for each run the row count to be updated in the target table will vary between 4,000 and 1,000,000.
Original select:

SELECT CB.matnr,CB.ORGUNTY,CB.ORGUNCD,DECODE(CR.CTO_FLAG,'N',CR.PTCSTY,DECODE(PRODTY,:V_OPTION_IND,CR.PTCSTY,DECODE(CR.PTCSTY,'LO','M',CR.PTCSTY))) PTCSTY,
sum(CMPNTQT1 * (COST_MTH1 + NVL(AGGRG_COST_MTH1,0))) COSTPER01,
sum(CMPNTQT2 * (COST_MTH2 + NVL(AGGRG_COST_MTH2,0))) COSTPER02,
sum(CMPNTQT3 * (COST_MTH3 + NVL(AGGRG_COST_MTH3,0))) COSTPER03,
sum(CMPNTQT4 * (COST_MTH4 + NVL(AGGRG_COST_MTH4,0))) COSTPER04,
sum(CMPNTQT5 * (COST_MTH5 + NVL(AGGRG_COST_MTH5,0))) COSTPER05,
sum(CMPNTQT6 * (COST_MTH6 + NVL(AGGRG_COST_MTH6,0))) COSTPER06,
sum(CMPNTQT7 * (COST_MTH7 + NVL(AGGRG_COST_MTH7,0))) COSTPER07,
sum(CMPNTQT8 * (COST_MTH8 + NVL(AGGRG_COST_MTH8,0))) COSTPER08,
sum(CMPNTQT9 * (COST_MTH9 + NVL(AGGRG_COST_MTH9,0))) COSTPER09,
sum(CMPNTQT10 * (COST_MTH10 + NVL(AGGRG_COST_MTH10,0))) COSTPER10,
sum(CMPNTQT11 * (COST_MTH11 + NVL(AGGRG_COST_MTH11,0))) COSTPER11,
sum(CMPNTQT12 * (COST_MTH12 + NVL(AGGRG_COST_MTH12,0))) COSTPER12,
sum(CMPNTQT13 * (COST_MTH13 + NVL(AGGRG_COST_MTH13,0))) COSTPER13,
sum(CMPNTQT14 * (COST_MTH14 + NVL(AGGRG_COST_MTH14,0))) COSTPER14,
sum(CMPNTQT15 * (COST_MTH15 + NVL(AGGRG_COST_MTH15,0))) COSTPER15,
sum(CMPNTQT16 * (COST_MTH16 + NVL(AGGRG_COST_MTH16,0))) COSTPER16,
sum(CMPNTQT17 * (COST_MTH17 + NVL(AGGRG_COST_MTH17,0))) COSTPER17,
sum(CMPNTQT18 * (COST_MTH18 + NVL(AGGRG_COST_MTH18,0))) COSTPER18,
sum (CMPNTQT1 * (STD_COST + NVL(AGGRG_STD_COST,0))) STDCOST,
DECODE(PRODTY,'UN','Y','N') OPTION_FLAG,
WS.FLOOR_SOURCE
from CRE_WORK_SET_TEST_VW WS,
RAS_HIER RH,
OVRRD_BOM CB,
OVRRD_COST_TEST CR
WHERE CR.ORGUNTY = CB.ORGUNTY AND
CR.ORGUNCD = CB.ALT_ORGUNCD AND
CR.MATNR = CB.IDNRK AND
CB.MATNR = WS.MATNR AND
WS.MATNR = RH.SKU_NO(+) AND
WS.ORGUNTY =CB.ORGUNTY AND
WS.ORGUNCD = CB.ORGUNCD AND
WS.LEVELNO = :N_LEVELNO AND
WS.PTCSTY = :V_PTCSTY AND
WS.ORGUNCD = :ORGUNCD AND
WS.ORGUNTY = :V_ORGUNTY AND
WS.BATCH_ID = :TASK_LOG AND
DECODE(PRODTY,:V_OPTION_IND,CR.PTCSTY,DECODE(CR.CTO_FLAG,'N',CR.PTCSTY,DECODE(CR.PTCSTY,'LO','M',CR.PTCSTY))) =:V_PTCSTY AND
WS.SETNO = :N_SETNO
GROUP BY CB.matnr,
CB.ORGUNTY,
CB.ORGUNCD,
DECODE(CR.CTO_FLAG,'N',CR.PTCSTY,DECODE(PRODTY,:V_OPTION_IND,CR.PTCSTY,DECODE(CR.PTCSTY,'LO','M',CR.PTCSTY))),
DECODE(PRODTY,'UN','Y','N'),
WS.FLOOR_SOURCE



Tom Kyte
April 30, 2013 - 2:35 pm UTC

do not select, read slow by slow and update small numbers of rows millions and millions of times.


write one single select statement - this select statement will produce the "updated set". Then put create table as select in front of this select and create a new table, drop the old, rename the new.

no undo
no redo (if you want)
all direct path writes, no inefficient caching happening


so, write a single select that joins the data to be updated with the data that does the updating and just create a new table

watch this:
http://tinyurl.com/RWP-DW-PART3

the interesting bit for you is about half way in - but the entire video is relevant to anyone doing some sort of 'ETL' processing like you.

A reader, April 30, 2013 - 4:19 pm UTC

Hello TOM,

I would like to thank you for answering my question.

Could you please show me an example of CTAS?

"write a single select that joins the data to be updated with the data that does the updating and just create a new table" -- correct me if I am wrong regarding this sentence: do you want me to use merge here? If not, could you please elaborate?


Answer for example of CTAS

Siby, April 30, 2013 - 7:51 pm UTC

Hi,

In response to the previous question, I think this is what Tom is referring to (he has shown it earlier on this page):
<quote>

ok, say you wanted to update emp to set ename = lower(ename). Instead, you could do this:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table emp;

Table dropped.

ops$tkyte@ORA817DEV.US.ORACLE.COM> rename new_emp to emp;

Table renamed.

</quote>

Instead of the simple select, you'd probably have a join depending on the number of tables and the new (updated) values that you want.
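
A hedged sketch of that join-based CTAS (t, upd, pk, val and new_val are placeholders; upd holds the new values keyed the same way as t):

create table t_new parallel nologging as
select t.pk,
       nvl(u.new_val, t.val) val    -- new value where one exists, old value otherwise
  from t
  left join upd u
    on u.pk = t.pk;

-- then drop t, rename t_new to t, and re-create indexes, constraints and grants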

how to join these two merge statements

A reader, May 12, 2013 - 6:27 pm UTC

Hi Tom,

I have developed the below 2 merge statements for updating parent/child records from the same table. However, I am not sure if there is a way we can combine these two merge statements into a single merge (or update) statement. Can you please help out here? I have included comments in the 2nd merge statement (where it differs from the 1st merge statement). For security purposes I scrambled the data.

merge into test1 t1_o
using (select t1.col1, t1.col2, t1.col3, sum(t3.qty) parent_qty
from test1 t1, test2 t2, test3 t3
where t1.col3 = 'TEST' and t2.col1 = t1.col1 and
t2.col2 = t1.col2 and t2.col4=t1.col4 and t4.st_date <= sysdate-4
and t3.col1(+) = t1.parent_col /* Parent */
and t3.col5(+) = t1.col5 and t3.col6 (+) = t1.col6
and t3.col7(+) <= sysdate-2
group by t1.col, t1.col2, t1.col3) i
on (t1_o.col1 = i.col1 and t1_o.col2 = i.col2 and t1_o.col3 = i.col3)
when matched then update set t1_o.parent_qty = i.parent_qty, t1_o.col9='TEST2', t1_o.col10='TEST3';

merge into test1 t1_o
using (select t1.parent_col, t1.col1, t1.col2, t1.col3, sum(t3.qty) child_qty
from test1 t1, test2 t2, test3 t3
where t1.col3 = 'TEST' and t2.col1 = t1.col1 and
t2.col2 = t1.col2 and t2.col4=t1.col4 and t4.st_date <= sysdate-4
and t3.col1(+) = t1.col1 /* col1 instead of parent_col */
and t3.col5(+) = t1.col5 and t3.col6 (+) = t1.col6
and t3.col7(+) <= sysdate-2
group by t1.parent_col, t1.col1, t1.col2, t1.col3) i
on (t1_o.parent_col = i.parent_col and t1_o.col1 = i.col1 and t1_o.col2 = i.col2 and t1_o.col3 = i.col3)
when matched then update set t1_o.child_qty = i.child_qty, /* child_qty instead of parent_qty */
t1_o.col9='TEST2', t1_o.col10='TEST3';
Tom Kyte
May 13, 2013 - 12:57 pm UTC

In general, I'd have to say "no"

what if the first merge updates record X *and then* the second merge updates record X? That works fine as two separate steps, but it could raise an error as a single merge (a merge cannot update the same row twice).

that, and you are merging on different columns - you are joining to t1_o using entirely different criteria in the ON clause. While we could *maybe* union together the two USING queries, it would do us no good since you join using three columns in one of them and four columns in the other.

update help

A reader, May 13, 2013 - 8:56 pm UTC

Hi Tom,

Can you please check the below update and provide your valuable suggestions on how it could be done efficiently.

update test_1 o set col1='test1', col2='test2' where exists (
select 1 from test_1 i where col3 = :B1 and col2 = :B2
and col4 = o.col4 and col5 = o.col5 and col6 = o.col6
group by col4, col5, col6 having count(*) > 1)
and col3 = :B1
and col2 = :B2;

Thank you
Tom Kyte
May 13, 2013 - 11:43 pm UTC

no create
no inserts
no look

also, give information about the size of the table, the probable number of rows modified, and how many rows you expect the count(*) to represent when it is updated

Alexander, June 06, 2013 - 3:59 pm UTC

Hi Tom,

I have a pretty large table that developers want to add a new column to, and back fill the new column with default values. Is the fastest way to do this:

create table t_temp nologging as
select cols, 1 as new_col_value
from t;

drop table t;
rename t_temp to t;
put indexes and stuff back...?
Tom Kyte
June 06, 2013 - 4:26 pm UTC

in 11.2, it would be:

alter table t add (x number default 42 NOT NULL);

that is called a fast add column.

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2575782100346318728

(in the next release after 11gr2 - the NOT NULL part won't be mandatory anymore!)

INSERT 200000 ROWS

Luis, August 13, 2013 - 11:28 pm UTC

Hi Tom,

I need to insert 200,000 rows (the result of a select) into a table every month. What's the best practice to do that?

INSERT APPEND INTO in_table SELECT * FROM out_table;

Thanks in advance.
Tom Kyte
August 14, 2013 - 2:55 pm UTC

append would be a hint - so it would be insert /*+ APPEND */

do you delete from this table? or just insert? 200,000 rows is pretty small - if you do deletes and want to reuse that space - you'll want to drop the append hint.
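For reference, a minimal sketch of the hinted direct-path load (using the same table names as the question):

insert /*+ APPEND */ into in_table
select * from out_table;

-- after a direct-path insert, the session cannot read or modify the table
-- until the transaction is committed
commit;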

post load update - two times

waran, August 20, 2013 - 2:17 am UTC

Hi Tom, I am learning a lot from your blog, but this is my question; I need your expertise on it. I am loading a table with 12 million records every day. After loading, I need to do a post-load update on one column, two times. What is the best way to do this? To load the table I am using a stored procedure which automatically loads the table on a daily basis.
Version 11.2
Tom Kyte
August 28, 2013 - 5:29 pm UTC

the best way:

do not update, perform the update logic during the load. Just do a single load with all of the update logic embedded.

why do you feel the need to update twice? why wouldn't you just insert the data you want????

Update Methods

Todor, August 22, 2013 - 12:10 pm UTC

Hi Tom,

I found an article where 8 update methods are compared:
http://www.orafaq.com/node/2450

Can we trust the tests that were made?

In your opinion, which method should be used if it's necessary to use dynamic SQL for the updates? (11g)

Thank you!
Tom Kyte
August 28, 2013 - 6:11 pm UTC

they missed the best

do not UPDATE!!!!!! use create table as select to create the modified data. can skip redo, undo, run in parallel, bypass the inefficient buffer cache and do it all in a single statement.


but always bulk up

a single sql statement is almost always the way to go

parallel can benefit sometimes (but not always)


DDL should be considered when you are going to hit most of the blocks anyway (eg: if you are going to update 1% of the rows in a table, and that table has 100 rows per block, and your data is uniformly scattered through the table - you will update every single block - you might as well use DDL for that!!!!)
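A rough sketch of that DDL approach (table and column names are invented; pick a degree of parallelism appropriate for your system):

alter session enable parallel ddl;

create table big_table_new
  parallel 8
  nologging
as
select /*+ parallel(t, 8) */
       t.id,
       upper(t.status) status,   -- the "update" logic goes here
       t.created_dt
  from big_table t;

-- then build indexes, add constraints and grants, drop big_table and rename big_table_new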

Any better suggestion for the below code?

Karthik, September 06, 2013 - 8:01 am UTC

Hi Tom

I request you to kindly let me know if there is a better way to write the code below. This code basically updates about 350 million records using BULK COLLECT and FORALL statements. It takes about 32-35 hours for this code to complete the update. Kindly suggest if there are any better ways to update. We are using 20 threads/jobs in parallel to do the update and it still takes the above time.
----------------------
FUNCTION INTF_MAINFL1_UPDATE_T(ID_FROM IN NUMBER, ID_TO IN NUMBER)
RETURN BOOLEAN AS
CURSOR cur_intf_mainfl1 IS
SELECT im.rowid i_rowid, im.im1_sys_id i_seqno
FROM intf_mainfl1 im
where im.im1_sys_id between ID_FROM AND ID_TO;

varid intf_mainfl1.im1_sys_id%TYPE;
varcount NUMBER;

TYPE typ_cur_intf_mainfl1 IS TABLE OF cur_intf_mainfl1%ROWTYPE;
lv_typ_cur_intf_mainfl1 typ_cur_intf_mainfl1;

BEGIN
varcount := 0;
OPEN cur_intf_mainfl1;
LOOP
FETCH cur_intf_mainfl1 BULK COLLECT
INTO lv_typ_cur_intf_mainfl1 LIMIT 2500;

EXIT WHEN lv_typ_cur_intf_mainfl1.COUNT = 0;

FORALL i IN lv_typ_cur_intf_mainfl1.first .. lv_typ_cur_intf_mainfl1.last
-- LOOP
-- BEGIN
-- varid := lv_typ_cur_intf_mainfl1(i).i_seqno;
UPDATE dedup_precook_data dpd
SET dpd.dpd_rowid = lv_typ_cur_intf_mainfl1(i).i_rowid
WHERE dpd.dpd_table_id = 'M'
AND dpd.dpd_uploadid = lv_typ_cur_intf_mainfl1(i).i_seqno;

FORALL i IN lv_typ_cur_intf_mainfl1.first .. lv_typ_cur_intf_mainfl1.last
UPDATE dedup_result dr
SET dr.drm_rowid = lv_typ_cur_intf_mainfl1(i).i_rowid
WHERE dr.drm_table_id = 'M'
AND dr.drm_uploadid = lv_typ_cur_intf_mainfl1(i).i_seqno;


--END LOOP;
COMMIT;
END LOOP;
CLOSE cur_intf_mainfl1;
COMMIT;
RETURN TRUE;

EXCEPTION
WHEN OTHERS THEN
RETURN FALSE;
END;
----------------------
Tom Kyte
September 09, 2013 - 11:37 am UTC

 EXCEPTION
    WHEN OTHERS THEN
      RETURN FALSE;
  END;


I hate your code

search this site for "i hate your code" to find out what I hate about your code...


in general, I would never even consider updating 350,000,000 rows this way. I would be looking to use DDL to convert the data (no redo, no undo, parallel easily, fast - big - bulk - NO PROCEDURAL CODE)


committing inside the loop!!!!!!!! ugh, what is that about??? do you want to get ora-1555's, what happens when this code fails? How do you restart?????????????



this should be two CREATE TABLE AS SELECT statements - one to create a new dedup_result and another to create a new dedup_precook_data
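A rough sketch of one of those two CTAS statements, assuming dpd_rowid is stored as a ROWID and that every other column is simply carried through unchanged (an illustration only, not the exact column list):

create table dedup_precook_data_new
  parallel 8
  nologging
as
select /*+ parallel(8) */
       dpd.dpd_table_id,
       dpd.dpd_uploadid,
       case
         when dpd.dpd_table_id = 'M' and im.rowid is not null
         then im.rowid
         else dpd.dpd_rowid
       end dpd_rowid
       -- ... plus the remaining columns of dedup_precook_data, unchanged
  from dedup_precook_data dpd
  left join intf_mainfl1 im
    on dpd.dpd_table_id = 'M'
   and im.im1_sys_id = dpd.dpd_uploadid;

The same pattern would produce a new dedup_result; then index, drop and rename as before.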

Updateable Join view deprecated ?

Philippe, September 09, 2013 - 12:31 pm UTC

In this orafaq article http://www.orafaq.com/node/2450

They say that the updateable join view is deprecated and that MERGE is the new method.

Is that true ?
Tom Kyte
September 10, 2013 - 9:44 am UTC

it is not deprecated, it is ANSI standard SQL syntax. they (the authors of that page) are calling it deprecated because in their opinion, merge is the easy way to express what you want to do (and I sort of agree, but I won't call an updateable join 'deprecated')
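To make the comparison concrete, a small hedged sketch, assuming a hypothetical new_sals table with a primary key on empno so that emp is key-preserved in the join:

-- updateable join view (not deprecated):
update (select e.sal, n.sal new_sal
          from emp e join new_sals n on n.empno = e.empno)
   set sal = new_sal;

-- the same change expressed as a merge:
merge into emp e
using new_sals n
   on (n.empno = e.empno)
when matched then update set e.sal = n.sal;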

Partition exchange problem with primary key constraint

Milind, January 16, 2014 - 6:52 am UTC

Hi Tom,

I am trying a partition exchange as follows, which fails with an error (Oracle version 11.2.0.1.0).

SQL> CREATE TABLE pemp
  2  (empno NUMBER(4) CONSTRAINT pk_pemp PRIMARY KEY,
  3     ename VARCHAR2(10),
  4     job VARCHAR2(20),
  5     mgr NUMBER(4),
  6     hiredate DATE,
  7     sal NUMBER(7,2),
  8     comm NUMBER(7,2),
  9     deptno NUMBER(2) )
 10   PARTITION BY RANGE (sal) (
 11   PARTITION p1_1250 VALUES LESS THAN (1251),
 12   PARTITION p2_3000 VALUES LESS THAN (3001),
 13   PARTITION p3_restsal VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO pemp SELECT * FROM emp;

14 rows created.

SQL>
SQL> CREATE TABLE emp_t AS SELECT * FROM pemp WHERE empno IS NULL;

Table created.

SQL>
SQL> INSERT INTO emp_t SELECT empno, ename, job || 'abcd' AS job, mgr, hiredate, sal, comm, deptno FROM pemp PARTITION
  2  (p2_3000);

8 rows created.

SQL>
SQL> ALTER TABLE pemp EXCHANGE PARTITION p2_3000
  2  WITH TABLE emp_t INCLUDING INDEXES WITHOUT VALIDATION;
ALTER TABLE pemp EXCHANGE PARTITION p2_3000
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

The above works if I do not have the primary key constraint specified on table pemp. Why doesn't it work when the primary key is there? What is the remedy for this?

I also referred to MOS note 1418545.1; however, it seems none of the situations described there apply here.

Thanks in advance.

Tom Kyte
January 16, 2014 - 5:44 pm UTC

you are missing the primary key constraint on EMP_T

ops$tkyte%ORA11GR2> CREATE TABLE emp_t AS SELECT * FROM pemp WHERE empno IS NULL;

Table created.
<b>
ops$tkyte%ORA11GR2> alter table emp_t add constraint emp_t_pk primary key(empno);

Table altered.
</b>
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> INSERT INTO emp_t SELECT empno, ename, job || 'abcd' AS job, mgr,
  2  hiredate, sal, comm, deptno FROM pemp PARTITION (p2_3000);

8 rows created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> alter table pemp exchange partition p2_3000 with table emp_t with validation update indexes;

Table altered.



Note that when you have an enabled constraint such as a primary key, WITHOUT VALIDATION is effectively ignored:

http://docs.oracle.com/cd/E11882_01/server.112/e25523/part_admin002.htm#i1107555

When you specify WITHOUT VALIDATION for the exchange partition operation, this is normally a fast operation because it involves only data dictionary updates. However, if the table or partitioned table involved in the exchange operation has a primary key or unique constraint enabled, then the exchange operation is performed as if WITH VALIDATION were specified to maintain the integrity of the constraints.


also, the index that enforces your primary key on the PEMP table is going to be a global index (since you are not partitioning the table by the primary key, the index used to enforce it must be globally partitioned or not partitioned at all, which is just a special case of a globally partitioned index). So, using UPDATE INDEXES in 11g and before (before 12c's asynchronous global index maintenance) might be appropriate unless you want to rebuild the global index afterwards.

Why CTAS is giving problem?

Milind, January 17, 2014 - 4:11 am UTC

Hi Tom,

Thanks a lot for prompt reply.

Now instead of creating an empty table, adding the primary key constraint and inserting the data, I tried a second approach as follows, i.e. using CTAS and then adding the primary key. Even though I have the primary key constraint with status ENABLED and VALIDATED, why does it give the error?

SQL> drop table emp_t purge;

Table dropped.

SQL> create table emp_t as SELECT empno, ename, job || 'abcd' AS job, mgr, hiredate, sal,
  2  comm, deptno FROM pemp PARTITION(p2_3000);

Table created.

SQL> alter table emp_t add CONSTRAINT pk_emp_t_empno PRIMARY KEY (empno);

Table altered.

SQL> select CONSTRAINT_NAME, STATUS, VALIDATED from user_constraints where table_name='EMP_T';

CONSTRAINT_NAME                STATUS   VALIDATED
------------------------------ -------- -------------
PK_EMP_T_EMPNO                 ENABLED  VALIDATED

SQL> ALTER TABLE pemp EXCHANGE PARTITION p2_3000
  2  WITH TABLE emp_t WITH VALIDATION UPDATE INDEXES;
ALTER TABLE pemp EXCHANGE PARTITION p2_3000
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

Thanks in advance

Why CTAS is giving problem?

Milind, January 17, 2014 - 4:17 am UTC

Hi Tom,

Sorry to bother you. I got it. In the emp_t table created with CTAS, the job column size became varchar2(24) because of job || 'abcd'. In the pemp table it is varchar2(20).

Once the column size was corrected, it worked.

SQL> alter table emp_t modify job varchar2(20);

Table altered.

SQL> ALTER TABLE pemp EXCHANGE PARTITION p2_3000
  2  WITH TABLE emp_t WITH VALIDATION UPDATE INDEXES;

Table altered.

Warm Regards.
 

Issue while Updating 100k records in a 10 Million table

Pa1, February 03, 2014 - 10:17 am UTC

Hi Tom,

Need your expertise to help resolve this problem we are facing. We have a table with 10 million records and need to update it with two new column values in batches of 100k records. While doing the initial 100k we didn't see any issues, but as we try to process further batches of 100k records, it's taking a considerable amount of time. We have indexes on this table which are used by the update query. Is that causing the problem? Can we disable the indexes on this table during the bulk update and rebuild them afterwards? Will this help?

A reader, February 14, 2014 - 8:20 pm UTC

Drop the index. Load the data and then create the index. That should be faster.
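A hedged sketch of that idea, using the mark-unusable/rebuild variation instead of a literal drop and re-create (the index name is illustrative, and note that an index enforcing an enabled primary or unique key cannot be marked unusable this way):

alter index t_batch_ix unusable;
alter session set skip_unusable_indexes = true;

-- ... run the batched updates / load here ...

alter index t_batch_ix rebuild parallel 8 nologging;
alter index t_batch_ix noparallel;   -- reset the degree after the rebuild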

Oracle Magazine July/August 2014

Sokrates, June 25, 2014 - 3:25 pm UTC

just saw your article in Oracle Magazine July/August 2014 regarding the topic "large updates with CTAS".

One should probably add that the algorithm given there has to be extended in case the table has virtual columns (and that this might only work in 12c), as sketched below:

1a. CTAS the new table without virtual columns (because CTAS does not support them)
1b. add virtual columns
1c. (only works in 12c) if necessary, re-order virtual columns by setting columns invisible/visible again
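A small sketch of those steps (column names are invented; the visibility toggle in 1c is 12c-only):

-- original table t has columns (c1, c3 as (c1 * 2) virtual, c2)

-- 1a. CTAS without the virtual column
create table t_new as
select c1, c2
  from t;

-- 1b. re-add the virtual column (it lands at the end: c1, c2, c3)
alter table t_new add (c3 number generated always as (c1 * 2) virtual);

-- 1c. (12c only) restore the original order by toggling visibility:
--     a column made invisible and then visible again moves to the end
alter table t_new modify (c2 invisible);
alter table t_new modify (c2 visible);   -- order is now c1, c3, c2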

records of atable

y.navya, July 10, 2014 - 9:21 am UTC

Hi Tom,

How do you update all the records of a table in a database?

Shell script used to delete

Shuja, September 11, 2014 - 3:10 pm UTC

Hi Tom,

I was faced with a similar problem and considered the create-a-new-table-and-drop-the-existing approach. However, I had already put a shell script in place, cron'd to run every 3 hours.

It seems to be working well, but I don't know if this is the most efficient method for future purging of data.

Script tbl_clean.sh :-

#!/usr/bin/sh

clear

#-- Set Month and Row limit to delete
month="MAY"
limitRow=6

#-- Function to get total count for each table thats imported from the "table" file
getCount() {
sqlplus -S /nolog << EOF
connect / as sysdba
SELECT count(*) FROM $dataline
WHERE importdate LIKE '%$month%';
exit
EOF
}

#-- Function to delete X rows from the imported table list and commit the change
deleteRows() {
sqlplus -S /nolog << EOF
connect / as sysdba
DELETE FROM $dataline
WHERE IMPORTDATE LIKE '%$month%'
AND rownum < $limitRow;
commit;
EOF
}


cat tables | while read dataline
do
rows="$(getCount | sed 's/[^0-9]*//g')"
echo "Table " $dataline " has " $rows "rows"

if [ "$rows" = 0 ]
then
echo "No rows to delete\n"
else
echo $(deleteRows) "\n"
fi
done


Rgds,
Shuja

Intermediate Commit in direct load

Samby, November 18, 2014 - 6:46 pm UTC

Hi Tom,

I am working on a data warehousing project in which we insert 30-40 million records into a table using the append hint (direct load) from a select query,
like

insert /*+ append */ into temp
select *
from all_objects;

I want to commit every 10k records. Could you please suggest whether there is any way to do that without using a cursor, loop, or bulk collect?

Many Thanks in Advance

Samby

bulk update using merge and analytic functions

santosh, January 06, 2015 - 5:22 pm UTC

Is it advisable to use MERGE to update a table from another table that stores values which need to be extracted using LAST_VALUE ordered by a date column?
Is there a faster way to do this?

How to Update millions of records in a table

Shweta, February 25, 2015 - 6:17 pm UTC

Hi Tom,
What if there was an "identity column" in the old table and I created the new table with CTAS instead of updating the same table? After the new table is created, would the identity column behave the same when new records are added?


Update 5 millions rows using info in another table

Lang, March 12, 2015 - 2:07 pm UTC

Hi Tom,
I have table1 with 5 million rows, and I need to update one column with info from table2, which has 700 rows. For each record in table2, I need to go through all rows in table1 to find a match. What's the best way to do it? I am doing it in SQL Developer, Oracle 11g. It has been running for 40 hours now with no end in sight:
BEGIN
For i in (select id, inspectiondate, eventid from inspections)
LOOP
Update readings set inspectionid = i.eventid, modifiedby=''
where inspectionid is null and stationid= i.id and readingdate=i.inspectiondate;
END LOOP;
END;

We really need to get this done very soon!
Thanks a lot!

Jenny, April 28, 2015 - 9:26 am UTC

Is there a difference between doing a CTAS vs an IAS (insert as select) for this use case?

Doing a CTAS requires me to re-create all indexes, triggers, constraints etc. Instead, can we just:
1) make a copy of the original table
2) disable all indexes, triggers, constraints on original table
3) truncate original table, do IAS from temp table with modified columns to original table
4) enable all indexes, triggers etc on original table

Thanks Tom

Sunil Kumar Choubey, October 09, 2016 - 7:36 pm UTC

Thanks for this wonderful conversation
Chris Saxon
October 10, 2016 - 5:32 am UTC

Thanks for the feedback

Oracle Script

Prashant, May 06, 2019 - 9:17 am UTC

How do I create a script to drop tables older than 45 days from a schema?
Connor McDonald
May 06, 2019 - 4:38 pm UTC

What do you mean by "older" ?

Data? Created ? Other?

Oracle Script

A reader, May 07, 2019 - 2:40 pm UTC

The tables' created date.
Chris Saxon
May 10, 2019 - 8:06 am UTC

The CREATED column in *_objects tells you when the object was made:

select * from all_objects
where  object_type = 'TABLE'
and    created < sysdate - 45
and    owner = 'THE_SCHEMA'
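Building on that query, a hedged sketch that generates the drop statements rather than running anything directly (review the output before executing it):

select 'drop table "' || owner || '"."' || object_name || '" purge;' ddl
from   all_objects
where  object_type = 'TABLE'
and    created < sysdate - 45
and    owner = 'THE_SCHEMA';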



Thanks for the Excellent Briefing.

Khalid Mehmood, June 25, 2019 - 4:29 am UTC

Thanks for the Excellent Briefing.

Although I read this post when I was already in the middle of merging a huge table and was regretting it, this post will surely help me with future tasks.

Rizwan Admani

Rizwan Admani, December 07, 2020 - 12:38 pm UTC

DECLARE
  CURSOR one IS
    SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno FROM emp
    MINUS
    SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno FROM emp2;

  emp  NUMBER;
  emp1 NUMBER;
BEGIN
  GO_BLOCK('BUTTON');
  FIRST_RECORD;

  FOR x IN one
  LOOP
    SELECT COUNT(empno) INTO emp  FROM emp2;
    SELECT COUNT(empno) INTO emp1 FROM emp;

    FETCH one INTO
      :empno, :ename, :job, :mgr, :hiredate, :sal, :comm, :deptno;

    IF emp = emp1 THEN
      NULL;
    ELSE
      INSERT INTO emp2 (empno, ename, job, mgr, hiredate, sal, comm, deptno)
      VALUES (:empno, :ename, :job, :mgr, :hiredate, :sal, :comm, :deptno);
      COMMIT;
      :per := NVL(:per, 0) + 1;
    END IF;

    EXIT WHEN one%NOTFOUND;
  END LOOP;
END;

Connor McDonald
December 08, 2020 - 5:11 am UTC

That is some nice Forms code there.... I'm not sure what it's for, but thanks for posting?

2022 Update

Georgi, February 23, 2022 - 11:54 am UTC

Hello,
Since it is now 2022 and times and technology have changed, I am wondering what the up-to-date answer to the original question is.

To be more specific:
In our DWH, we need to update a column in approx. 400 tables split across different schemas, containing anywhere from 1,000,000 to 10,000,000,000 records (yes, 10 billion...).

We have 10 tables with more than 1 billion rows, 40 with less than 1 billion but more than 100 million, and around a hundred with less than 100 million but more than 1 million.

And table recreation is not an option...

The initial idea was to use dbms_parallel_execute and run the update one schema at a time. I did some reading and saw that in newer versions of Oracle, the parallel hint can be used even on a table that has no parallel in its definition or that has no partitions.

We have a sufficient amount of resources, so would it be a good idea to use the built-in parallelism, and in general which is the best approach?

Thanks !
Chris Saxon
May 23, 2022 - 10:20 am UTC

If you're updating a significant fraction of the rows in a large table, the CTAS method will still be the fastest. The thresholds for what counts as "significant" and "large" are something you'll need to test in your environment!

Yes, you can use a parallel hint to run an update in parallel like this:

update /*+ parallel */ t
set ...


(Ensure that parallel DML is enabled though)
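For instance, a minimal sketch (the column names are illustrative and the degree is just an example):

alter session enable parallel dml;

update /*+ parallel(t, 8) */ t
   set insert_date = last_modified_date
 where insert_date is null;

commit;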

The big benefit of this is it's easy to write. The downsides are:

- You're locking all the affected rows for the duration of the statement. This could be a problem if other sessions need to update the rows at the same time.
- It's all-or-nothing. So if the process is 90%+ complete and hits an error, the whole thing rolls back. You can use DML error logging to mitigate this.

In contrast, there's more code to write with DIY parallelism, making it more fiddly to use. The upside is you're processing rows in chunks. You can limit how many chunks are processed at the same time. So you're not locking the whole data set the whole time.

That said, there is an easier way - DBMS_REDEFINITION.EXECUTE_UPDATE. This is essentially a wrapper for an online CTAS method.

All you need is:
exec dbms_redefinition.execute_update ( 'update ... set ... ' );


I discussed this and other update tuning methods in this Office Hours session:


Further on 2022

Georgi, February 24, 2022 - 8:23 am UTC

Thanks for the quick and detailed reply !

I've watched your video and today I will do some testing with EXECUTE_UPDATE, so thanks !

The task is to add a new column for the insert date of the record to those ~400 tables, which has to be populated (updated) from another column in the same table. That source column is different for most of the tables and for some it does not even exist, in which case we will use SYSDATE. All records should have it, so everything should be updated.

This is needed because a regular purging process will be implemented later based on that insert date column. And we will have a maintenance window of around 4 hours, so locking rows is not an issue.

We do not want to use CTAS, because we have to do it for ~400 tables and there are lots of indexes, constraints, partitions, dependent objects etc.

For our tests we used dbms_parallel_execute and it works fine in our test environment, but it has significantly less amount of data.

Any additional comments based on the above are appreciated. Thanks!
Chris Saxon
February 24, 2022 - 4:09 pm UTC

For the tables without an existing column to copy use:

alter table ...
  add insert_date date default sysdate not null;


This will do a "fast column add", instantly setting its value for all existing rows.

For the tables where you are copying from an existing column - my big question is:

Why bother?

If you need/want a consistent column name across all tables, there are a couple of tricks you could use to avoid all the updates.

First is to rename the existing column:

alter table ...
  rename column old_name to new_name;


If this will cause lots of code change headaches, another is to "clone" the column using views or virtual columns, e.g.:

create view ... as
  select t.*, old_name as new_name
  from ... t


Finally if you want to minimize downtime, you could look at using Edition-based Redefinition to do the change fully online.

Thanks

Georgi, February 25, 2022 - 2:11 pm UTC

Thank you so much, Chris !
The fast column add is really helpful! I read about it and tried it, and it will save a considerable amount of time for those tables without any date column. Really appreciated!

To your question... For the other tables, the value in the date column can change, so it does not guarantee the age of the record. In this case we cannot create a view, rename the column, or add a virtual one based on it. This is why we have to add a new column, update it with whatever other date we have at that point in time and consider it as the creation date of the record, so that we can later delete based on it.

I guess that's life working with legacy applications :)
Connor McDonald
February 28, 2022 - 4:10 am UTC

Just be aware that the fast add will add a hidden column to your table. This is *unlikely* to cause any issues, but you might need to take some care with things like 'alter table shrink' or partition maintenance, should that need ever arise.

A reader, August 12, 2022 - 12:09 am UTC

What if I cannot drop and recreate the table (for example, it has lots of dependencies), but I still want to update a table with millions of rows?
Connor McDonald
August 12, 2022 - 3:05 am UTC

check out dbms_parallel_execute for that
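A hedged sketch of what that can look like (the table, task and column names are made up; the :start_id/:end_id binds are required by the rowid-chunking approach):

begin
  dbms_parallel_execute.create_task('upd_t');

  -- split the table into rowid ranges of roughly 10,000 rows each
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'upd_t',
    table_owner => user,
    table_name  => 'T',
    by_row      => true,
    chunk_size  => 10000 );

  -- run the update chunk by chunk, up to 8 jobs at a time,
  -- committing as each chunk completes
  dbms_parallel_execute.run_task(
    task_name      => 'upd_t',
    sql_stmt       => 'update t set last_updated = sysdate
                        where rowid between :start_id and :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 8 );

  dbms_parallel_execute.drop_task('upd_t');
end;
/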