updating millions of records
Om, November 11, 2002 - 11:09 am UTC
Hi Tom,
Could you please elaborate
CREATE TABLE new_table as select <do the update "here"> from old_table;
the above statement with a suitable example please
November 11, 2002 - 5:45 pm UTC
ok, say you wanted to update emp to set ename = lower(ename). Instead, you could do this:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table emp;
Table dropped.
ops$tkyte@ORA817DEV.US.ORACLE.COM> rename new_emp to emp;
Table renamed.
Million records update
Ramesh G, November 11, 2002 - 1:44 pm UTC
What if a table has over 100 million records and if i only want to update 1 million? If your method is still applicable could you elaborate it.
Many thanks in advance.
November 11, 2002 - 6:33 pm UTC
most likely -- yes. I don't have a 100million row table to test with for you but -- the amount of work required to update 1,000,000 indexed rows is pretty large. Fortunately, you are probably using partitioning so you can do this easily in parallel -- bit by bit.
"How to Update millions or records in a table", version 8.1.7
John Bittner, November 11, 2002 - 2:21 pm UTC
This is absolutely a viable approach, and one we have used repeatedly. One of our apps updates a table of several hundred million records.
The cursor..For loop approach for the update was calculated to take 53.7 years to complete!
We institued the Insert into a dummy table append with nologging, and were able to complete the "update" in under 30 minutes.
With nologging, if the system aborts, you simply re-run the 'update' again, as you have the original data in the main table. When done, we swap the partition of original data with the 'dummy' table (the one containing new values), rebuild indexes in parallel, and wha-la! Our update is complete.
i.e, to update field2 in a table:
1) First create your dummy hold table: create table xyz_HOLD as select * from xyz where rownum<1. Alter tablexyz nologging.
2) insert /*+ append parallel (xyzhold,12) */ into xyz_hold xyzhold (field1, field2, field3) select /*+ parallel (x,12) */ xyz.field1, my_new_value_for_field2, xyz.field3 from xyz x where blah blah blah.
3) when done, either rename the table, or swap the partition if your original table is partitioned, and you only updated one partition as we do. Obviously you need to rebuild indecies, etc as required.
Hope this helps!
updating millions of records
A.Ashiq, November 11, 2002 - 11:28 pm UTC
Hi Tom,
As u suggested us to create a new table ,then drop the original table and rename the new table to original table instead of updating a table with millions of records.But what happen to dependent objects ,everything will get invalidated.Yeah ,of course it'll recompile itself when it called next time.But again it(dependent objects) has to do parsing.Is it Ok.
November 12, 2002 - 10:11 am UTC
It is OK.
in case of delete
A reader, November 12, 2002 - 9:58 am UTC
We've a similar situation., We delete around 3 million records from 30 million rows table everyday.
There is no logical column to do partition.,
I guess the insert into a new table will take considerable time with 27 mil records.. Please let me know what is the best approach.
November 12, 2002 - 10:35 am UTC
wait 10 days so that you are deleting 30 million records from a 60 million record table and then this will be much more efficient.
Time it some day though. 3 million records on an indexed table will take considerable time. There is a chance that INSERT /*+ append */ select <rows to keep> done nologging, parallel with create indexes in parallel could best the delete.
A reader, November 12, 2002 - 12:43 pm UTC
Tom,
Recently I had conducted a interview in which one the dba mentioned that they had a table that might conatin 10 million records or might be 1 million. He meant to say they delete the records and some time later the table will be populated again and viceversa.
Tom according to you do you consider partitions for such tables and if yes which type of partition..
Thanks.
November 13, 2002 - 12:31 pm UTC
hard to tell -- is the data deleted by something that is relatively constant (eg: the value in that column doesn't change - so the row doesn't need to move from partition to partition). If so -- sure, cause we could just drop partitions (fast) instead of deleting the data.
Agree with John - insert append / CTAS / partition swap is the only way to fly
Jack Silvey, November 12, 2002 - 1:19 pm UTC
I work with John Bittner, one of the previous reviewers. I second what he said absolutely. It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql.
addendum
A reader, November 13, 2002 - 10:39 am UTC
This process was introduced to our environment by a master tuner and personal friend, Larry Elkins. This was a totally new paradigm for the application, and one that saved the entire mission-critical application. The in-place updates would not have worked with the terrabytes of data that we have in our database.
How to Update millions or records in a table
Boris Milrud, November 19, 2002 - 1:55 pm UTC
In response to the Jack Silvey (from Richardson, TX ) review, where he wrote "It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql":
Could you provide more information about that procedure and how to rebuild multiple same-table indexes concurrently using Oracle job scheduler?
Thanks,
Boris.
November 19, 2002 - 5:29 pm UTC
instead of
begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;
you can code:
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i and before set job_queue_interval to say 60 or so and there you go.
How to Update millions or records in a table
Boris Milrud, November 19, 2002 - 9:10 pm UTC
Thanks, Tom.
Your status said that you had a large backlog, so I decided not to wait for your response and tried myself using dbms_job.submit() calls. At first, it did not work (job_queue_processes was 0), but after I set it to 12 it started working. The only one difference between your code and mine is that I issue just one commit at the end. It should not matter, right?
I selected 1 mln. rows table and rebuild 5 non-partitioned indexes with 'compute statistics parallel nologging' clause.
Here is the numbers I've got: rebuilding indexes sequentually consistently took 76 sec., while using dbms_job.submit() calls took around 40 - 42 sec.
I said "around", because the technique I used may not be perfect, though it served the purpose. I recorded the time right after the commit statement at the end of PL/SQL block - that's the start time. Then I kept querying user_jobs view every 2 - 3 sec, until the last of the 5 jobs were gone. That's was the end time.
The last question on this topic: is user_jobs view is the right place to look in order to determine that's rebuilding is done and how long it took? In package I am writing, I do massive delete operation, then rebuilding indexes, then starting the next routine. What would be the best way to detect the end of rebuiding, in order to proceed with the next call?
Thanks.
Boris.
November 19, 2002 - 10:14 pm UTC
You can use user_jobs or dba_jobs but -- you might just want to put some "logging" into your jobs themselves so you can monitor their progress and record their times.
You can even rebuild all the indexes in a partition simultaneously
ramakrishna, November 21, 2002 - 12:50 am UTC
Thanks Tom,
In our environment, we have partitioned tables and we use:
ALTER TABLE table_name MODIFY PARTITION part_name REBUILD UNUSABLE LOCAL INDEXES
and this rebuilds all the indexes in the partition at one shot. We found this much faster than doing the indexes one by one. We will now try out if we can submit multiple such jobs in parallel (one for each partition of the table).
regards
ramakrishna
in case of deletes
carl, November 29, 2002 - 5:17 pm UTC
Hi Tom,
Thanks so much for your web site and help. It is our
number 1 reference in times of fear and loathing.
This is what we came up with concerning mass updates
INV 50M
INVINS 10M
INVDEL 7M
There are indexes on INV.KEY and INVDEL.KEY
Execution Plan ( for deletes and updates )
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'INV'
3 1 INDEX (RANGE SCAN) OF 'IX_INVDEL01' (NON-UNIQUE)
alter table INVTMP nologging;
-- INVINS contains delta inserts and updates
insert /*+ APPEND */
into INVTMP
select *
from INVINS t1;
-- INVDEL contains delta deletes and updates
insert /*+ APPEND */
into INVTMP
select *
from INV t1
where not exists ( select null
from INVDEL t2
where t2.KEY = t1.KEY );
alter table INVTMP logging;
drop table INV;
rename INVTMP to INV
-- build indexs etc
This is what we came up with and is to the fastest
approach we've tested.
Any comments or suggestions are welcome and appreciated.
November 30, 2002 - 8:49 am UTC
I might try an anti-join on the NOT EXISTS and doing the work in a single parallel CTAS
create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;
add parallel /nologging as needed.
in case of deletes - many thanks
carl, December 01, 2002 - 7:45 pm UTC
Ran the test cases at home 500K rows in INV and 50K rows in
INVINS and INVDEL
My way: 16.33 secs
Your way: 08.35 secs
Many thanks.
A bit confused
Choche, March 17, 2003 - 10:30 am UTC
I was just wandering that none of the reviews made mention if these techniques could be applied in a multi-user environment where multiple users could be updating the same table at the same time.
March 17, 2003 - 10:32 am UTC
sorry -- I though it obvious that in most cases "no" is the answer. we are copying the data or locking excessive amounts of it -- or disabling indexes and the like.
This is a "batch" process here, to update millions of records.
Replacing PK values
Michael, July 25, 2003 - 6:42 pm UTC
We need to replace the main ID for all entries in the database which is the PK in about 100 tables, largest of them have about 2 mln rows. New and old values for ID are stored in the lookup table, and there about half a million of them. The original approach was to replace PK in all tables for one ID value, commit the changes and move on to the next ID. This way referential integrity would be guaranteed even if update fails at any stage, and we do not need a long rollback segments and can restart the process at the point it aborted. But this update is too lengthy, and in our case would take almost a month of continuous processing. Do you have any suggestions on how to improve the performance of this critical update? Would it be better to add new ID as an additional column into one table at a time and populate all of it at once as "create table ... as select" (with nologging option), and then rename table and drop old ID column?
July 25, 2003 - 8:42 pm UTC
sounds like I would:
create table new_table as select <join>;
create table new_table2 as select <join>;
etc etc etc etc
drop old tables
add indexes
add primary/fkeys/constraints etc....
that is -- rebuild in bulk, using nologging (no redo, no undo generated), using parallel when appropriate.
Update million Rows
Reaz, July 26, 2003 - 5:41 am UTC
Dear Tom,
We have this situation, where we load data from external files numbering 7. Next we had to run a transformation process, which checks if the row exists in the target table and updates the existing row with the new one. Otherwise inserts the row to the target table.
Our current process is very slow, sometimes taking a night long to complete for say 200,000 rows.
The procedures discussed so far meant of updating only, but in our case we need to update and insert too.
I would appreciate your suggestion.
Thanks.
Reaz.
July 26, 2003 - 12:53 pm UTC
read about MERGE in the sql reference guide.
A single batch?
A reader, July 26, 2003 - 9:54 am UTC
Tom,
Is it still okay, if these steps (create a new table, drop the old one, and them rename the new table as old, create indexes etc.) are coded as the batch update program? OR would you suggests these should be detailed as the Steps that should be performed under the supervision of a DBA? What I would like to know is that can we still have the steps in a single batch routine (as update would have had) or instead of the single batch these should be DBA instructions instead?
Thanks
July 26, 2003 - 12:54 pm UTC
I would have these performed under the watchful eye of someone.
it is a one time bulk manipulation. it won't be a "pretty program" with tons of gracefully logging/error handling. You'll be doing it "down and dirty" (at least I would be). It would need someone to watch over it.
What about space consideration
A reader, July 28, 2003 - 1:39 am UTC
Hi Tom,
I had a similar problem where my table had gone big and i wanted to truncate some old data, my table was big more then 20 million records, i could not use the create temp table truncate and insert flow as the space of the database was not enough to hold 2 copies of the large table, I thought of doing a partition on table and then truncate one partition of table but that does not work.
Can you suggest some way which caters for this space constraint also
July 28, 2003 - 7:04 am UTC
why didn't the partitioning work for you.
How to update part of a Very Large Table
Mohan K, July 28, 2003 - 6:11 am UTC
The update operation can be made simple by sepearting out the query and update operation. To do that create another table which stores the rowids of the records in the original table which has to be updated along with the the value to be updated. Then run the PL/SQL script to update the records one by one. To test the above method the following scripts can be used.
SQL> create table test_update(id number, name varchar2(100), description varchar2(4000)) storage initial(48M next 4m);
SQL> declare
v_n number;
v_name number;
v_desc number;
i number;
begin
for i in 1..1000000 LOOP
insert into test_update(id, name, description) values(i, 'Test Name'||i, 'Test Name '||i||' description ');
END LOOP;
end;
/
Elapsed: 00:04:277.23
The above script will insert 1 million rows.
SQL> select count(*) from test_update where description like '%5 description%';
COUNT(*)
----------
100000
Elapsed: 00:00:02.63
SQL> create table test_update_rowids as select rowid rid, description from test_update where description like '%5 description%';
Elapsed: 00:00:54.58
The table test_update_rowids stores the rowids and the new values that has to be updated. ie 100000 rows needs to be updated.
SQL> declare
begin
for c1 in(select rid, description from test_update_rowids)
LOOP
update test_update set description=c1.description||' after update' where rowid=c1.rid;
END LOOP;
end;
/
Elapsed: 00:01:82.17
The above script performs the updation.
Mohan
July 28, 2003 - 7:37 am UTC
the above script performs an "updation" in the worst possible way, consuming the most resources.
it should have been simply:
update test_udpate set description = description || ' after update ' where description like '%5 description%';
Never do procedurally that which can be done in a single sql statement.
Updating 100K of 1 Million rows
Mohan K, July 28, 2003 - 8:38 am UTC
I ran the update statement as a single SQL statement and the time elapsed is slightly more than the above method. It has to be tested against update statements containing complex queries or join operations.
SQL> update test_update set description = description || ' after update ' where description like '%5 description%';
Elapsed: 00:01:100.13
Mohan
July 28, 2003 - 8:53 am UTC
trust me -- that difference is neglible and the amount of time you spent writing code wasn't included.
Also, you FORGOT to add in the time to populate your temporary table -- no?
if you can do it in a single sql statment (and you almost always can) you SHOULD.
Lets do you benchmark a little better:
ops$tkyte@ORA920> create table test_update(id number, name varchar2(100), description varchar2(4000));
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> insert /*+ append */
2 into test_update
3 select rownum, 'Test Name' || rownum, 'Test Name ' || rownum || ' description '
4 from big_table.big_table
5 where rownum <= 1000000;
1000000 rows created.
ops$tkyte@ORA920> commit;
Commit complete.
ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
2 from test_update
3 where description like '%5 desc%' ;
COUNT(*)
----------
100000
ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
2 from test_update
3 where description like '%6 desc%' ;
COUNT(*)
----------
100000
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_start
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920> update test_update
2 set description = description || ' after update'
3 where description like '%6 description%';
100000 rows updated.
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_middle
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920> create table test_update_rowids
2 as
3 select rowid rid, description from test_update where description like '%5 description%';
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> begin
2 for c1 in(select rid, description from test_update_rowids)
3 LOOP
4 update test_update
5 set description=c1.description||' after update'
6 where rowid=c1.rid;
7 END LOOP;
8 end;
9 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_stop
Run1 ran in 1376 hsecs
Run2 ran in 9806 hsecs
run 1 ran in 14.03% of the time
<b>so, according to me, the SINGLE (efficient, effective, correct way) update statment runs in about 14% of the time. furthermore:</b>
Name Run1 Run2 Diff
STAT...recursive cpu usage 5 4,662 4,657
STAT...CPU used by this sessio 448 5,397 4,949
<b>only 4,949 MORE cpu seconds using the row by row approach...</b>
STAT...CPU used when call star 448 5,397 4,949
STAT...free buffer inspected 10,162 17,039 6,877
STAT...db block gets 111,705 104,450 -7,255
STAT...dirty buffers inspected 9,016 16,620 7,604
STAT...calls to kcmgas 7,755 68 -7,687
STAT...Cached Commit SCN refer 7,718 0 -7,718
STAT...switch current to new b 7,719 0 -7,719
STAT...cleanouts only - consis 0 7,720 7,720
STAT...immediate (CR) block cl 0 7,720 7,720
STAT...commit txn count during 0 7,721 7,721
STAT...cleanout - number of kt 3 7,726 7,723
STAT...consistent gets - exami 3 7,760 7,757
LATCH.undo global data 221 7,985 7,764
STAT...redo entries 100,700 108,727 8,027
LATCH.redo allocation 100,795 108,945 8,150
STAT...db block changes 202,723 210,907 8,184
STAT...Elapsed Time 1,385 9,825 8,440
STAT...physical reads 7,894 16,476 8,582
LATCH.checkpoint queue latch 26,845 38,012 11,167
LATCH.cache buffers lru chain 35,638 52,827 17,189
STAT...no work - consistent re 7,741 100,072 92,331
STAT...table scan blocks gotte 7,719 107,721 100,002
STAT...table scan rows gotten 1,000,000 1,100,002 100,002
STAT...execute count 27 100,092 100,065
STAT...buffer is not pinned co 7,763 107,836 100,073
STAT...calls to get snapshot s 56 100,241 100,185
LATCH.shared pool 172 100,778 100,606
STAT...session logical reads 119,503 220,165 100,662
STAT...consistent gets 7,798 115,715 107,917
LATCH.library cache pin 233 200,865 200,632
STAT...recursive calls 177 200,870 200,693
LATCH.library cache 332 201,322 200,990
LATCH.cache buffers chains 587,377 835,167 247,790
STAT...redo size 31,323,804 31,836,904 513,100
STAT...session pga memory max 0 524,824 524,824
STAT...session pga memory 0 1,049,112 1,049,112
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
754,602 1,553,986 799,384 48.56%
PL/SQL procedure successfully completed.
<b>and worse, it used 2x the number of latches! (scalability inhibitors)
There is nothing good about row at a time processing, it is something to do ONLY when a set based operation isn't feasible</b>
This is very good example for updating the records
Bipin Ganar, July 28, 2003 - 9:00 am UTC
Hi Tom,
Yes this is good example to update the record but my problem
is i have 1Million records in one table. I want update and give hike of 10% of their salaries in one table but that table is accessed by n no of people at the same moment. I can not drop the table and update and second how can i define tablespace in this script
Create Table a as select * from emp_sal;
July 28, 2003 - 9:10 am UTC
create table a
tablespace foo
as
select * from emp_sal;
slightly different problem!
Srikanth Adiga, July 28, 2003 - 12:33 pm UTC
My table:
5M records are inserted each month.
Operations:
5M records inserted on the first day of the month
5M records updated on rest of the days on the month.
Note that this update is update each row with diff values.
I am using OCI and using batch operations. Can you please suggest me what else can I do within my program to improve the performance of my updates? I tried multithreading, but that did not help.
anything else?
July 28, 2003 - 12:48 pm UTC
it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be something more involved here.
I'd be able to do that in minutes -- why is it taking you so long -- what is your bottleneck.
space considerations
A reader, July 28, 2003 - 8:56 pm UTC
could not create another temp table which holds the data same as teh table being partitioned as there was not enuff space in the harddisk to hold 2 or 1.5 to be precise copies of the same huge table,is there a way to partition without having to drop of truncate or transfer data to some temp location and then partition
July 29, 2003 - 6:52 am UTC
disk is cheap.
time is money.
your time spent just thinking about this problem cost much more then the disk to solve it would.
penny wise, pound foolish. How big is this 20million row table (assuming you are the same "a reader" from above). Maybe 1-3 gig?
slightly different problem!
Srikanth Adiga, July 29, 2003 - 12:30 am UTC
couple of minutes!!! Oh dear, thats really too good.
Let me reitterate my case:
My table schema is something like this:
SubscriberVsQtyTable:
int client-id
int year
int month
int quantity
primary key - (client-id, year,month)
This table has 5M client ids.
Now every day, I have update the quantity field with the txns made for that day. Note that, I have to update one subscriber at a time since the number of txns made by each subscriber if different.
So, even now do you believe you can do this in couple of minutes? My app takes about 30 to 50 mins even on higher machines like 8 CPU.
As I said, this is a C++ app, running Solaris + Oracle 8i/9i. We are using OCI.
Thanks,
July 29, 2003 - 7:01 am UTC
if you are doing that work with more then a single sql statement, I could see it taking a while.
If you do that work in a single update, no, I don't see it taking more then a couple of minutes.
Time for Update
Mohan K, July 29, 2003 - 8:27 am UTC
It is my understanding that the time for update depends on the following factors. There may be something else also involved.
1. Row Chaining/Migration caused by update
2. The where cluase is used to select the rows to be updated.
3. Percentage rows updated in a block. The more the better. Even if one row is updated in a block containing 10 records the entire block has to written by DBWR and LGWR.
So just the number of records is not that matters. 100 million recorsd may takes days for one table and may take less than a hour for another table with few columns.
Mohan
July 29, 2003 - 8:31 am UTC
1) updating a quantity (number) field is unlikely to do that.
2) correct
3) correct
no, i disagree with the last part. to update 5million rows, a single unindexed column, should not take "a really long time". I was answering their direct question -- they update every row, on every block. so #1 and #3 really don't come into play. The biggest factor is #2 here
Verifing if neccesary to update, before update
juan carlos reyes, July 29, 2003 - 9:09 am UTC
Hi Tom, I had seen several times that you can
increase the performance by evaluating if you really need to update, or maybe it was only my impression.
For example
Update table set income = 500
where level = 7
AND NOT INCOME = 500
The AND NOT INCOME = 500, allows to avoid unncesary updates,
my point is a question
How advisable is it, does it really increase performance?
As you are always thinking how to increase performance, tt could be a good idea in newer version of Oracle, that if Oracle verifies that the value was set to that value, not update
:)
July 29, 2003 - 11:07 am UTC
if there are more then a handful of records where income = 500, then yes, this will definitely increase performance by
a) decreasing the latching
b) decreasing the redo
c) decreasing the undo
we cannot make that optimization for you -- it would stop triggers and other EXPECTED side effects from happening.
A reader, July 29, 2003 - 11:42 am UTC
:) Thanks,
Maybe then it could be an optional feature
UPDATE_ONLY_IF_DIFFERENT hint
I think it could interest several people
who don't use trigger, neither constraints ,etc.
Stu Charlton, July 29, 2003 - 2:28 pm UTC
"Note that, I have to update one subscriber at a time since the number of txns made by each subscriber if different."
That still seems doable in 1 SQL statement, even if you wanted to merge the updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?
July 29, 2003 - 2:38 pm UTC
I agree -- most things CAN be done in a single sql statement.
easier said then done in real life situations
A reader, July 29, 2003 - 8:59 pm UTC
Hi Tom,
I agree you may think getting a disk is cheaper, that is only when u are considering getting up or just going to the store and getting a disk,
In real life conditions specially corporates this is not as easy as it seems nor is it less costly.
A Simple purchase of disk has to first be justified
which is quite difficuult, i dont think i have justification as yet)then be approved then be purchased by purchase dept then be installed or fixed by the server teams, server team may want to take the system down which means more dollars much more.These are just things off my head, when we acutually inititate the purchase process , there will be definitly some more steps and processes.
Coming back to my question, am i right in saying that we can't partion an existing table without copy the data to some temp location,Let me know if there is any other way.
Thanks
July 29, 2003 - 9:27 pm UTC
justification
you pay me alot
disk is cheap.
think about it.
(sometimes it really can be that simple)
we are talking about a couple of gig (like 1 to 3 -- small, tiny -- my /tmp has more room then that) here, something my LAPTOP would not have an issue with.
in order to partition -- think about it -- you'll need the SOURCE DATA and the PARTITIONED DATA -- at the same time, for a period of time. no magic there.
slightly confused !
Srikanth Adiga, July 30, 2003 - 12:00 am UTC
Thanks, Stu and rest for your updates.
Taking this buit more further, since I am confused :)
[Sri]
"Note that, I have to update one subscriber at a time since the number of txns
made by each subscriber if different."
[Sri]
[Stu Charlton]
That still seems doable in 1 SQL statement, even if you wanted to merge the
updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?
Followup:
I agree -- most things CAN be done in a single sql statement.
[Stu Charlton]
Sorry, how would one do this?
If my table has two rows;
clientid =1,year=july,year=2003,quantity=10
clientid =2,year=july,year=2003,quantity=20
Now I have to update (clientid=1)'s quantity by 15 and (clientid=2)'s quantity by 25.
How would you manage this in a single SQL? Like this there would be 5M rows to be updated.
Btw, my table is indexed on clientid,year and month.
July 30, 2003 - 7:24 am UTC
what is the OTHER table you are updating from? you must have a detail table elsewhere from where you derive that 15 and 25. So, assuming something like this:
ops$tkyte@ORA920> create table t ( clientid int, month int, year int, quantity int );
Table created.
ops$tkyte@ORA920> create table txns ( clientid int, month int, year int );
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into t values ( 1, 7, 2003, 10 );
1 row created.
ops$tkyte@ORA920> insert into t values ( 2, 7, 2003, 20 );
1 row created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into txns select 1, 7, 2003 from all_objects where rownum <= 15;
15 rows created.
ops$tkyte@ORA920> insert into txns select 2, 7, 2003 from all_objects where rownum <= 25;
25 rows created.
ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> select * from t;
CLIENTID MONTH YEAR QUANTITY
---------- ---------- ---------- ----------
1 7 2003 10
2 7 2003 20
ops$tkyte@ORA920> update t
2 set quantity = quantity + ( select count(*)
3 from txns
4 where txns.clientid = t.clientId
5 and txns.month = t.month
6 and txns.year = t.year );
2 rows updated.
ops$tkyte@ORA920> select * from t;
CLIENTID MONTH YEAR QUANTITY
---------- ---------- ---------- ----------
1 7 2003 25
2 7 2003 45
there you go.
(it is a shame you are using the wrong datatypes -- only a DATE should be used to hold, well, DATES. Using numbers to hold a year and a month isn't a good practice)
SCN,CHECK POINT
AATIF, July 30, 2003 - 1:03 am UTC
i ask about whta is difference SCN system commit number,systemt change number and check point.
Which base oracle sync the database SCN or CHECKPOINT ,
when check point ocur who number write in control file headre,datafile and log file header.
dear tom.
your web site say me not ask more question now but i need clear my point.
AATIF.
July 30, 2003 - 7:30 am UTC
er?
i remember you asking this on another page -- yes.
and I also remember answering this on that other page -- yes.
I fail to see how this relates to this page.
anyway there IS no difference between
scn
system commit number
system change number
they are the same.
as for the rest of the comment -- it doesn't "make sense" (i do suggest a read through of the server concepts guide, many questions are answered there!)
More clarification
Srikanth Adiga, August 01, 2003 - 12:34 am UTC
>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.
August 01, 2003 - 7:49 am UTC
use bulk processing - update between 100 and 500 rows at a time in a single call using array processing.
Or, put the data where it belongs - not in files, in tables -- and process the data in the database rather then in C
Srikanth Adiga, August 06, 2003 - 10:16 am UTC
>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.
>Followup:
>use bulk processing - update between 100 and 500 rows at a >time in a single call using array processing.
Yes, that is what we do. It takes about 60 mins to update 5M records on a 2CPU machine. Is this much expected?
If I do the same in multiple threads I do not see any performance improvement i.e. 1M records parallely updated in 5 threads.
Any idea why?
August 06, 2003 - 10:55 am UTC
are you sure they are going in parallel and not blocking/locking each other.
verify that all 5 sessions are in fact not only ACTIVE (v$session) at the same time but that you are not blocking yourself.
and you have taken some statspacks during your testing to see if you have any obvious bottlenecks right?
Explicit commit !
Nathan, August 07, 2003 - 5:04 am UTC
Tom,
Why do we commit explicitly after submitting jobs ?
Thanks
Nathan
/*------------------------------------------------
Followup:
instead of
begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;
you can code:
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
---------------------------------*/
August 09, 2003 - 3:51 pm UTC
because if you do not YOUR session can "see" the job in the job queue but the job queue processes cannot!
How many of us have sat there for minutes going "when is the stupid thing going to run" and until we exit sqlplus it does nothing :)
Explicit Commit
Nathan, August 07, 2003 - 8:51 am UTC
Apologies,
I must withdraw my previous question... i was obviously not thinking straight. I was thinking about a similar sitaution where a commit was issued ( before dbms_job.submit ) with a comment /* do not remove */ without any explanations... I'm still pondering why the commit should be there.
Sorry for the trouble.
Regards
Nathan
I have to take issue....
cheaper than disk, August 07, 2003 - 10:00 am UTC
Your response about justification for buying disk space is outdated...
"justification
you pay me alot
disk is cheap.
think about it.
(sometimes it really can be that simple)"
Anybody else out there NOT paid alot ANYMORE?!
;-)
Re: Explicit Commit - To Nathan
A reader, August 07, 2003 - 3:25 pm UTC
Tom is out till weekend..
"I'm still pondering why the commit should be there. "
The commit after dbms_job is requierd so that other sessions can see the job_queue...which typically is the
requirement...As usual, Tom can confirm..
Menon:)
Reader
A reader, August 10, 2003 - 1:27 pm UTC
"
create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;
"
Like to know why
"
where t1.key = t2.key(+)
and t2.key is null
"
better than
where t1.key != t2.key with ALWAYS_ANTI_JOIN hint
August 10, 2003 - 2:42 pm UTC
because there is no "always anti join" hint?
you could of course use the CBO, and a NOT IN subquery in which case it'll do it itself.
Su Baba, August 14, 2003 - 5:54 pm UTC
So if I have a big table, x, that needs to be updated, I would
create table y nologging as select * from x...
retrieve index definition from x
build indexes on y
drop table x
rename y to x
If I have all these done in a PL/SQL batch job, what's the best way to retrieve index definition and form create index statement from table x? I want to create indexes on y that have the exact storage parameter, tablespace, maybe partition, etc. as the ones on table x.
By the way, the database version is 8.1.7.
thanks
August 14, 2003 - 8:01 pm UTC
you COULD, not WOULD, it is an option.
you don't have your create indexes somewhere?
guess I would exp rows=n, imp indexfile=foo to get them -- or DBMS_METADATA.get_ddl in 9i
How to update part of a Very Large Table
Mohan K, August 28, 2003 - 7:03 am UTC
please refer "How to update part of a Very Large Table July 28, 2003"
When updating a large table in a database having comparatively less buffer cache one thing to make sure is that blocks are updated in a continuous order. Suppose some records of block1 is updated first in the beginning and next after updating some other blocks, then block1 will be written twice by DBWR and LGWR. After the first update block1 may get flushed out by LRU algorithm. If all records in block1 is updated at a time then block1 is wriiten by LGWR and DBWR only once. If the update statement is doing a full table scan then bloacks are updated in order. But the order can be different when the is using an index for selecting the rows.
The SQL statement that makes sure of that is
update (select id, description from test_update where rowid in(select rowid from (select rowid from test_update order by rowid))) set description=description||' new val' where description like '%5%';
August 28, 2003 - 7:54 am UTC
not necessarily so -- first lgwr doesn't write blocks -- it writes redo. second, dbwr may or may not write the block more then once.
I would never never never use your approach. why I would want to
a) read the entire table
b) sort the rowids
c) distinct that set (implied)
d) join to it (and you are making the assumption the table in the IN will be the driving table and that a hash join will not be used and ..... lots of other stuff)
when a simple:
update T set d = d || 'x' where d like '%5%'
will do it -- if you are worried about the index on d being used - you would simply FULL hint it -- but doubtful that it would be necessary.
the additional work you add would more then offset the work you are trying to save.
update large table
Vera, August 28, 2003 - 2:50 pm UTC
Hi Tom and thank you again and again for all your work.
My problem is that I cannot simply replace update with creating a new table. The table I need to update - I'll call it 'pension table' - has about 1.5 mln rows. I have to update "amount" fields for 1994, 1995,1995 etc. years, and the way how I update these fields depends on status of the participant at each particular year. To determine this status I built a function that looks through another tables and takes several input parameneters including each particular financial year because it determines not just status of a participant, but status he/she had in particular FY. (This function I call a "box of cockroaches"). Basically, I ask Oracle to bring me annual monthly amount for Mr/Mrs 'X' for, let's say, 1994 and if 'X' had status 'retired' in 1994 then I want this amount to be calculated one way, if 'X' was 'deferred' -another way and if 'X' was 'dead' third way.
As you alredy guessed, updates take long time - about an hour for each year. I wonder what approach would you use in a situation like this?
Thank you
By the way, what is a status of your new book?
August 29, 2003 - 8:35 am UTC
I would use a CASE statement in SQL without using PLSQL to compute the value.
optionally -- i would turn it inside out. instead of sql calling plsql, this might be one of the times when plsql calling sql is more appropriate. build up procedurally an array of values to update -- forall update them -- and do it again until the rows are processed. with a little foresight, you can even parallelize this processing and run your procedure N times -- each instance of it running against a different slice of the table...
book is "real soon" -- should be in trucks right now.
update large table
Luc Gyselinck, August 28, 2003 - 4:18 pm UTC
For large tables, if I have to update more than 20 percent of the rows, I do it like Tom: write a query to insert the 'updated' records in a new table, using the /*+ APPEND */ hint. You have no indexes to update, no constraints to validate and you generate almost no redo. Next, I truncate the old table, after I have disabled all constraints and triggers (once again, almost no redo, very fast), and then I inject, again with the /*+ APPEND */ hint, the data from the new table into the old (very fast, little redo). Indexes get automaticaly rebuild AFTER the extents are populated. I reenable the triggers, the constraints (with or without validation, as you wish).
If the new values for the columns being updated must come from other tables, I NEVER write something like
update t1
set c1 = (select c2 from t2 where t2.c3 = t1.c4)
In fact, you are performing NESTED LOOPs. Using functions to get the new values for the columns is much the same : NESTED LOOPS (even worse: SQL / PLSQL engine context switches, open/close cursors in PL/SQL, NO read consistency).
Whenever I find myself in such a situation (typicaly when writing batch procedures, during data migrations, data transformations), I make sure my queries use HASH joins, I give the session more resources (higher SORT_AREA_SIZE, higher HASH_AREA_SIZE, DB_MULTIBLOCK_READ_COUNT to the max), avoid the INDEX RANGE SCANS, do FULL (yes FULL) table scans, do FAST FULL INDEX scans, thus bypassing the buffer pool which otherwise gets flushed by the data from my queries. And by the way, use LMTs with uniform size extents.
As an example, I rewrote a batch procedure that took 3 days to complete (once, many years ago), written the old way (see the update statement above), that now does the same job in only 4 hours, on the same (old) hardware, same database, but different Oracle version: Oracle 7 / Oracle 8i.
Yes, you are right,
Vera, August 28, 2003 - 4:50 pm UTC
but right now I don't see how can I perform my task without update statement and a function. In other cases I do exactly the same thing you do on large tables - create new table rather than update old one and then rename tables.
As for LM tablespaces I was begging our DBAs to switch on LMT since I don't know when and they keep promising to do it.
Does parallel insert just imply direct?
Dennis, September 08, 2003 - 1:49 pm UTC
Tom (or anyone),
Does a parallel insert imply direct (append) insert? I read something in the documentation (or is that forbidden territory) ;) awhile back about the append hint not being needed on a parallel insert because it was implied.
I issued the following:
INSERT /*+ append, nologging, parallel(ext, 8) */ INTO hrdmadm.ext_xpayd_pay_details ext
SELECT /*+ parallel(ext2, 8) */ *
FROM hrdmadm.ext_xpayd_pay_details@hip2 ext2
and while I saw the parallel processes spawn on hip2 (for the select), I didn't notice any spawn where I was.
I was wondering if that was because parallel insert was synonymous with direct insert, or did I mess up the syntax somehow? Is the insert in fact parallel, and it's just 'hidden' from us?
Thanks,
Dennis
Thanks!
Dennis, September 08, 2003 - 2:57 pm UTC
My bad was that I didn't alter session. Thanks Tom for your valuable insight. I guess I was just too afraid to read down a few more pages when I was in the documentation, with that being forbidden territory and all ;) Well maybe not forbidden...just hard to put that 'haunted/fear-inspiring' word that would fit in there.
Thanks,
Dennis
Optimizing Query
Mohan, September 11, 2003 - 8:40 am UTC
BULK collect and BULK insert can improve the performance. Create SQL objects of type t1%rowtype and t2%rowtype.
BULK collect 100 records at a time to the collection array. Create seperate collection arrays to store values that are computed or obtained after searching the database.
Do BULK insert/update into the target table. Search this site using key words "bulk insert" or "bulk collect" or "bulk update" for more examples.
Other things are
1. Improve query performance by proper indexing
2. Remove fragmentation by "alter table move" command
3. Rebuild indexes
4. Allow large pctfree if the update causes row migration/row chaining
5. Size rollback segments and rollback tablespaces
Mohan
September 11, 2003 - 9:49 am UTC
#2 isn't really "a good thing" in general.
#3 is really not "a good thing" in general.
reorgs -- so so overrated.
Kailash, September 11, 2003 - 1:51 pm UTC
Can you explain which of the 3 operations will be faster. If I want to update millions of rows,
1. then would delete/reinsert be faster or
2. mere update will be faster
3. the one you suggested will be faster.
Can you advise as to why the method you had suggested will be faster than 1 and 2.
Can you explain why updating millions of rows is not a good idea. Is it based on the amount of redo generated and what if the columns I update are not part of the index?
Which operation will be the fastest and I request you to explain the various DML operations and the amount of redo they generate. ie. I would appreciate if you could explain me which DML generates most redo, followed by the second followed by the 3rd which generates the least redo.
Currently we have an application that checks for a condition in a table A and obtains the unique id. if a record is found, then flushes all of the dependent tables for the given id and re-inserts these records back into the dependent tables with a set of new values. Is this a good approach? If so, what would be the ideal approach?
September 11, 2003 - 7:25 pm UTC
I can say yes to all three -- each is the fastest.
each is the slowest.
If you have a large bulk update that is done once and hits most of the of rows -- you may be best off by doing a CTAS, drop old and rename new.
the CTAS, indexing, etc can be done in nologging mode -- no undo, no redo. you end up getting done much faster.
updating millions of records
b, October 03, 2003 - 4:28 pm UTC
Hi,
The approach of updating the view instead of a table works great.
But here i want to update multiple colums of the same table based on different table queries.
we have around 600,000 records in the base table.
what could be the most inexpensive query.
We run oralce 9i.
UPDATE user1.UserSummary
Set
Column1 = (
SELECT Count(DISTINCT WebOrderID)
FROM user1.OrderTotal ot
WHERE total>0 and
(ot.ostatus <> -1 and
ot.ostatus <> -1) and
ot.UREFITEM = user1.UserSummary.UREFITEM)),
Column2 = (
SELECT NVL(Sum(Total),0) FROM
user1.OrderTotal ot
WHERE (ot.ostatus <> -1 and ot.ostatus <> -1)
and ot.UREFITEM = user1.UserSummary.UREFITEM),
Column3 = (
SELECT min(sodate) FROM orders.salesorders so left join
user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1)
and so.UREFITEM = user1.UserSummary.UREFITEM),
Column4 = (
SELECT max(sodate) FROM orders.salesorders so
left join user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1) and
so.UREFITEM = user1.UserSummary.UREFITEM)
;
the above is the sample of my query
thanks in advance
;;)
October 04, 2003 - 9:38 am UTC
if you can come up with a single query that pulls all 4 columns together in a single result set -- we can optimize this. Left as an exercise for you -- see if you cannot merge the above 4 separate query into a single query
great
b, October 17, 2003 - 3:33 pm UTC
thanks
i finally made my query simple
Great
Raj, October 21, 2003 - 5:28 pm UTC
Great technique never thought of it.
Anthony Reddy, October 22, 2003 - 1:27 am UTC
Updating 1 million records
Kailash, October 22, 2003 - 4:43 pm UTC
I have a situation where I have a unique key on 3 columns (c1,c2,c3) on a table T and I want to update c2 from A to B for some records that match certain criteria. There are a total of 30 million records in the table and there will be about 1 million records in the table that match the criteria to be updated. The table is not partitioned.
1. Your suggestion to drop the old table and to re-create a new table and re-naming it to the old table may not be possible because I am updating only a subset of the table. Is there any other method to acheive this ?
2. Which one of the following will be a better solution?
a. Drop index and update the rows and recreate the
index
b. Delete the rows with C2 = A and insert new rows
with C2 = B. My DBA says that this will create
fragmentation and is not a better approach. Is
that true?
c. Do you suggest any better solution under these
scenarios since I keep getting these type of
questions from my manager often.
Your suggestion in this regard is highly appreciated.
October 22, 2003 - 6:28 pm UTC
have you considered just "update set c2 = b where ...."?
1,000,000 rows out of 30,000,000 isn't alot. if c2 is indexed, you might benefit from a set unusable, skip unusable, update, rebuild.
but it won't take that long to update 1,000,000 rows. couple of minutes on my laptop
big_table@ORA920> update big_table set created = created+1;
1833792 rows updated.
Elapsed: 00:02:13.28
not a big deal.
How about rollback segment problem when updating million rows.
Ramasamy, October 22, 2003 - 10:49 pm UTC
Tom ,
Thanks lot for your service.
I have to update a single column NUMBER(9) where
the table has got more than 80 columns.
This table has got 30 million rows and i have to update
around 17 million rows each night.
As you always insisted ,
i tried with a single update like
UPDATE t1 SET batchno = (A constant )
WHERE batchno is NULL;
1.)I came across the rollback segment issue.
2.) It is taking around 3 hrs to update.
Your input is greatly appreciated to solve the above issues.
Thanking You,
Ramasamy
October 23, 2003 - 8:03 am UTC
there is no problem -- other then you haven't sized your system to do the things you require it to do!!!!!!!!
1) the rollback segment "issue" is a "non issue". size things for what you do and there are NO ISSUES. disk is cheap -- compared to what they pay us all to have this discussion. consider that. the amount of time you've spent thinking about this already cost your employer much more then the disk it would take to do this correctly.
2) you are changing a null to a non-null -- increasing the width of the field. make sure you are not causing massive row migration (check your pctfree, make sure it is set appropriately here). is batchno "indexed"?
MILLION RECORD INSERT/SELECT/UPDATE TAKES TOO MUCH TIME
Santosh Das, October 23, 2003 - 3:27 am UTC
Hi Tom,
I have a table A and soon after creating it 1million records are put into it. Then it being in a real time environment its size goes on increasing atleast half million each day.However each day a cleanup application cleans half million every day.After inserting 1 million at the creation of table its put in the realtime environment.When ever a new record is inserted a contuniously listening application detects it and applies a select operation on table A. Then it send the selected items to some table B and then updates the table A.
The listening operation may be 3 or 4 in number.How ever the select and update operations are taking a lot of time and the initial insertion of 1 million records is taking 11 hours to complete so what can be the possible architecture.
I tried sending the rowid to a queue/pipe after inserting in table A and then used it to select and update.But still its taking a lot of time. Plz suggest some method.
Thanks and regards,
Sandy..
October 23, 2003 - 12:42 pm UTC
sounds like you should just be using AQ -- plop the record in as a message -- have as many dequeuers as you like.
not sure what else to say, not a ton of detail here.
Select and update of a table Having Millions of Records.
Jack, October 23, 2003 - 8:05 am UTC
Hi Tom,
I have the following queries which execute on a table of million records.That table has a PK but its not used in both the queries given below.CMTIME,SAD and STATUS all are NUMBER fields.If I make index on cmtime then it takes longer. Presently i dont have a index field.Can you plz tellme how i can optimize these or where the performance bottleneck is ?
update SD set CMTIME = 1064929834 where (CMTIME is NULL or 1064929834 - CMTIME >= 10) and (1064929834 - SAD >= 0) and 1064929834 not in (select CMTIME from SD where CMTIME is not NULL)
SELECT * from SD where (CMTIME is not NULL and (CMTIME = 1064929834 OR 1064929834 - CMTIME >= 10)) AND (ED - 1064929834 > 0 and (STATUS = 1 OR STATUS = 0) and 1064929834 - SAD >= 0)
Thanks and regards,
Jack.
October 23, 2003 - 12:50 pm UTC
are you using the CBO.
Followup to your question
Ramasamy, October 23, 2003 - 2:17 pm UTC
2) you are changing a null to a non-null -- increasing the width of the field.
make sure you are not causing massive row migration (check your pctfree, make
sure it is set appropriately here). is batchno "indexed"?
Yes Tom. The Batchno is indexed like
index on (BATCHno,STATUS);
Will it be worth Droping the index and recreating it?.
Thanks,
Ramasamy
October 23, 2003 - 7:16 pm UTC
could definitely help, yes.
George Lee, October 23, 2003 - 9:12 pm UTC
I tried to update 30 million records using Tom's solution.
It works fine. Using approximately 1.5 hours to finish.
George (Broadway Photo Supply Ltd. HK)
whats CBO
Jack, October 24, 2003 - 4:29 am UTC
October 24, 2003 - 9:37 am UTC
cost based optimizer
No
jack, October 27, 2003 - 1:37 am UTC
We are not using any cost based optimizers.
It's a simple query which we are using to retrieve data from a table having millions of records and it's performance is not satisfactory.
How to optimize it so that it runs faster ?
jack
October 27, 2003 - 7:42 am UTC
you are kidding right?
this goes right up there with "my car won't start, why not?"
there is so much missing from this "request" as to make it impossible to say anything sensible.
George Lee, November 11, 2003 - 11:21 pm UTC
I have a fact table being partitioned by month. The indexes are built local to partition.
The data population is working fine.
But the rebuild index fails at the end.
I don't know how to rebuild indexes in the partitions.
Can you help me ?
November 12, 2003 - 10:19 am UTC
why do you need to rebuild here at all??
but why don't you show us what you are doing -- at the very least -- define what "fails at the end" means (error codes, messages, details)
George Lee, November 12, 2003 - 8:17 pm UTC
Dear Tom,
Yes, I should provide enough information to you. Sorry for that.
Here is the story,
I create a table as following,
drop table D_DYNRPT_SALES_FACT;
create table D_DYNRPT_SALES_FACT
(
TX_DATE DATE,
SHOP_NO VARCHAR2(20),
ITEM_NO VARCHAR2(12),
PARENT_ITEM_NO VARCHAR2(12),
BRAND_NO VARCHAR2(5),
VENDOR_NO VARCHAR2(20),
ITEM_GROUP VARCHAR2(5),
ITEM_TYPE VARCHAR2(5),
CONSIGNMENT VARCHAR2(1),
ITEM_DESC VARCHAR2(40),
BRAND_DESC VARCHAR2(30),
VENDOR_DESC VARCHAR2(30),
CATEGORY_DESC VARCHAR2(30),
QTY_SOLD NUMBER(14,2),
NET_SALES_AMT NUMBER(14,2),
GROSS_PROFIT_AMT NUMBER(14,2)
)
PARTITION BY RANGE (TX_DATE)
( PARTITION D_DYNRPT_SALES_FACT_2000_09 VALUES LESS THAN (TO_DATE('01/10/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_09,
PARTITION D_DYNRPT_SALES_FACT_2000_10 VALUES LESS THAN (TO_DATE('01/11/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_10,
PARTITION D_DYNRPT_SALES_FACT_2000_11 VALUES LESS THAN (TO_DATE('01/12/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_11,
PARTITION D_DYNRPT_SALES_FACT_2000_12 VALUES LESS THAN (TO_DATE('01/01/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_12,
PARTITION D_DYNRPT_SALES_FACT_2001_01 VALUES LESS THAN (TO_DATE('01/02/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_01,
PARTITION D_DYNRPT_SALES_FACT_2001_02 VALUES LESS THAN (TO_DATE('01/03/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_02,
PARTITION D_DYNRPT_SALES_FACT_2001_03 VALUES LESS THAN (TO_DATE('01/04/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_03,
PARTITION D_DYNRPT_SALES_FACT_2001_04 VALUES LESS THAN (TO_DATE('01/05/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_04,
PARTITION D_DYNRPT_SALES_FACT_2001_05 VALUES LESS THAN (TO_DATE('01/06/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_05,
PARTITION D_DYNRPT_SALES_FACT_2001_06 VALUES LESS THAN (TO_DATE('01/07/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_06,
PARTITION D_DYNRPT_SALES_FACT_2001_07 VALUES LESS THAN (TO_DATE('01/08/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_07,
PARTITION D_DYNRPT_SALES_FACT_2001_08 VALUES LESS THAN (TO_DATE('01/09/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_08,
PARTITION D_DYNRPT_SALES_FACT_2001_09 VALUES LESS THAN (TO_DATE('01/10/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_09,
PARTITION D_DYNRPT_SALES_FACT_2001_10 VALUES LESS THAN (TO_DATE('01/11/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_10,
PARTITION D_DYNRPT_SALES_FACT_2001_11 VALUES LESS THAN (TO_DATE('01/12/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_11,
PARTITION D_DYNRPT_SALES_FACT_2001_12 VALUES LESS THAN (TO_DATE('01/01/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_12,
PARTITION D_DYNRPT_SALES_FACT_2002_01 VALUES LESS THAN (TO_DATE('01/02/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_01,
PARTITION D_DYNRPT_SALES_FACT_2002_02 VALUES LESS THAN (TO_DATE('01/03/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_02,
PARTITION D_DYNRPT_SALES_FACT_2002_03 VALUES LESS THAN (TO_DATE('01/04/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_03,
PARTITION D_DYNRPT_SALES_FACT_2002_04 VALUES LESS THAN (TO_DATE('01/05/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_04,
.
.
.
PARTITION D_DYNRPT_SALES_FACT_2004_12 VALUES LESS THAN (TO_DATE('01/01/2005','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2004_12,
PARTITION D_DYNRPT_SALES_FACT_MAXVALUE VALUES LESS THAN (maxvalue)
TABLESPACE D_DYNRPT_SALES_FACT_MAXVALUE
)
/
CREATE INDEX D_DYNRPT_SALES_FACT_I01 ON D_DYNRPT_SALES_FACT
(TX_DATE) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I02 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_GROUP, ITEM_TYPE) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I03 ON D_DYNRPT_SALES_FACT
(TX_DATE, SHOP_NO, ITEM_NO, ITEM_GROUP) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I04 ON D_DYNRPT_SALES_FACT
(TX_DATE, BRAND_NO, ITEM_GROUP, ITEM_TYPE) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I05 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_NO) LOCAL;
CREATE OR REPLACE PUBLIC SYNONYM D_DYNRPT_SALES_FACT FOR BPSADM.D_DYNRPT_SALES_FACT;
then, populate data into the table as the following,
alter index D_DYNRPT_SALES_FACT_I01 unusable;
alter index D_DYNRPT_SALES_FACT_I02 unusable;
alter index D_DYNRPT_SALES_FACT_I03 unusable;
alter index D_DYNRPT_SALES_FACT_I04 unusable;
alter index D_DYNRPT_SALES_FACT_I05 unusable;
alter session set skip_unusable_indexes=true;
@pop_d_dynrpt_sales_fact.sql '2000/09/01' '2000/01/30';
.
.
.
@pop_d_dynrpt_sales_fact.sql '2003/11/01' '2003/11/30';
alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging;
the data_population is working fine, but alter index rebuild nologging fail with the error as following,
alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
I checked the reference, it should be something syntax error on my rebuild index statement, it is like alter index D_DYNRPT_SALES_FACT_I05 rebuild partition..., I don't know the exact statement, can you help me, thanks.
November 13, 2003 - 6:54 am UTC
ops$tkyte@ORA920LAP> CREATE TABLE t
2 (
3 data char(255),
4 temp_date date
5 )
6 PARTITION BY RANGE (temp_date) (
7 PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
8 PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
9 PARTITION part3 VALUES LESS THAN (to_date('15-mar-2003','dd-mon-yyyy')) ,
10 PARTITION part4 VALUES LESS THAN (to_date('16-mar-2003','dd-mon-yyyy')) ,
11 PARTITION part5 VALUES LESS THAN (to_date('17-mar-2003','dd-mon-yyyy')) ,
12 PARTITION part6 VALUES LESS THAN (to_date('18-mar-2003','dd-mon-yyyy')) ,
13 PARTITION junk VALUES LESS THAN (MAXVALUE)
14 )
15 ;
Table created.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create index t_idx1 on t(temp_date) LOCAL nologging;
Index created.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter index t_idx1 unusable;
Index altered.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> begin
2 for x in ( select 'alter index ' || index_name ||
3 ' rebuild partition ' || partition_name stmt
4 from user_ind_partitions
5 where index_name = 'T_IDX1' )
6 loop
7 dbms_output.put_line( x.stmt );
8 execute immediate x.stmt;
9 end loop;
10 end;
11 /
alter index T_IDX1 rebuild partition PART1
alter index T_IDX1 rebuild partition PART2
alter index T_IDX1 rebuild partition PART3
alter index T_IDX1 rebuild partition PART4
alter index T_IDX1 rebuild partition PART5
alter index T_IDX1 rebuild partition PART6
alter index T_IDX1 rebuild partition JUNK
PL/SQL procedure successfully completed.
A reader, December 11, 2003 - 4:56 pm UTC
Hi Tom,
I am running below update statement from past 24 hr and it's still running..
Table S_REVN has 1.5 Million and Table S_OPTY_PROD has 1.2 Million Record.. It's doing full table scan of both table sinace I am using function in where clause.. We using RULE base optimizer and oracle 8.1.7.4
Here is sql statement
Update siebel.S_REVN r
set x_opty_prod_city=(select X_CITY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_opty_prod_STATE =(select X_STATE from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_PR_YEAR_QTY =(select X_PR_YEAR_QTY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
0 UPDATE STATEMENT Optimizer=RULE
1 0 UPDATE OF 'S_REVN'
2 1 TABLE ACCESS (FULL) OF 'S_REVN'
3 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
4 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
5 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
Please advice
Thanks
December 11, 2003 - 6:27 pm UTC
you understand that it is doing 3 full scans of S_OPTY_PROD for EACH and EVERY row in S_REAVN yes???
how about
update t
set (a,b,c) = (select a,b,c from t2 where ... )
at the very least (reduce the full scans from 3 to 1)
at the very most, consider creating an index:
create index temp_index_to_make_this_complete_before_my_grandchildren_have_grandchildren
on p(replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$'))
and useing a hint (to let sighbul use the cbo) to get the index used.
A reader, December 12, 2003 - 12:44 pm UTC
Hi Tom,
I am following your advice on update I've changed sql statement
Update /*+ INDEX(S_OPTY_PROD T_TEMP) */ siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
created function based index on S_OPTY_PROD
create index T_TEMP
on S_OPTY_PROD(replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$'))
BUT above sql statement in not using index eventhough I supply hint (Is it b/c function based index not reconized by Rule based optimizer?)
Please help
Thanks
December 13, 2003 - 11:11 am UTC
you put the hint in the wrong place :)
and you might not have met all of the requirements for FBI's. read:
http://asktom.oracle.com/~tkyte/article1/index.html
but it is not the RBO (the well formed hint -> cbo)....
your hint is wrong (must use the correlation name of P) and in the wrong place (should be in the subquery).
you need query_rewrite_enabled and (depending on version) might need query_rewrite_integrity (see the above link).
but here is an example:
ops$tkyte@ORA920> create table t1( x_opty_prod_city int, x_opty_prod_state int, x_pr_year_qty int, row_id varchar2(25) );
Table created.
ops$tkyte@ORA920> create table t2( x_city int, x_state int, x_pr_year_qty int, row_id varchar2(25) );
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> create index t2_idx on t2( replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$') );
Index created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> delete from plan_table;
4 rows deleted.
ops$tkyte@ORA920>
ops$tkyte@ORA920> ALTER SESSION SET QUERY_REWRITE_ENABLED=true;
Session altered.
ops$tkyte@ORA920>
ops$tkyte@ORA920> explain plan for
2 Update t1 r
3 set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
4 (select /*+ INDEX( t2 t2_idx ) */ X_CITY,X_STATE,X_PR_YEAR_QTY
5 from t2 p
6 where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
7 /
Explained.
ops$tkyte@ORA920>
ops$tkyte@ORA920> prompt @?/rdbms/admin/utlxpls
@?/rdbms/admin/utlxpls
ops$tkyte@ORA920> set echo off
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost(%CPU)|
-------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 82 | 4346 | 3 (34)|
| 1 | UPDATE | T1 | | | |
| 2 | TABLE ACCESS FULL | T1 | 82 | 4346 | 3 (34)|
| 3 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 53 | 2 (50)|
|* 4 | INDEX RANGE SCAN | T2_IDX | 1 | | 2 (50)|
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access(REPLACE(REPLACE(REPLACE("P"."ROW_ID",'ce','='),'-','='),'@','$')=:B
1)
16 rows selected.
A reader, December 12, 2003 - 9:04 pm UTC
hello Tom,I create table as following using the parallel hint,but the cost is different:
sql>insert /*+ append parallel(demo01,4) */ into demo01 select /*+ parallel(customer,4) */ * from customer;
the cost --42s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
ERROR:
ORA-12838: 无法在并行模式下修改之后读/修改对象
SP2-0612: 生成AUTOTRACE EXPLAIN报告时出现错误
Statistics
----------------------------------------------------------
1698 recursive calls
455 db block gets
8728 consistent gets
8075 physical reads
66492440 redo size
630 bytes sent via SQL*Net to client
525 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
390126 rows processed
sql>insert * from demo02 select * from customer;
cost --26s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=1226 Card=659568 Byt
es=680674176)
1 0 PARTITION RANGE (ALL)
2 1 TABLE ACCESS (FULL) OF 'CUSTOMER' (Cost=1226 Card=659568
Bytes=680674176)
Statistics
----------------------------------------------------------
1939 recursive calls
44876 db block gets
20918 consistent gets
1 physical reads
64954576 redo size
646 bytes sent via SQL*Net to client
467 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
390126 rows processed
The customer table have 0.4 million records and the machine is 2 CPU's SUN E420,which runiing Oracle database.
Can you tell me why the physical reads increase so much using parallel hint in the DML sql,thanks
December 13, 2003 - 11:49 am UTC
well, for a tiny number of records like this, i would not even consider PQ at this point.
But -- to answer your questions
the cost, well the cost is affected by thousands of things. HINTS definitely affect the cost (That is in part how hints work -- by playing with costs). Ignore the cost of two queries that are not the same, they are NOT comparable.
the PIO's -- well, you ran one before the other? That and parallel query prefers to checkpoint -- and do direct io many times (you might find PQ doing a global checkpoint before the query begins to get current images onto disk in order to just all hit slam the data as fast as it can without messing with the buffer cache)
but for so few records -- it would seem that regular sql is what you want.
A reader, December 13, 2003 - 4:37 pm UTC
Hi Tom,
Followed you advice...WORKS LIKE A CHARM..updated 1.5 Million recod in 10 Min. EXCELLENT!!!!!!!
Thanks a LOTTTTTTTTTTT
Update siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select /*+ INDEX(S_OPTY_PROD T_TEMP) */ X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
Online updation of very big table
Praveen, March 06, 2004 - 3:37 pm UTC
Hi Tom
Iwant to update a table with one go on online system.A table has 20 lakh records and has 110 columns. when i give the update command it takes ~ one hrs.I don't know why it is taking so much time even index is created on that particular field.
select count(*),app_flg from test;
app_flag has only two values Approved or unapproved.By defualt is unapproved.
approved =170000
unapproved=30000
update test set app_flg='APPROVED' where app_flg='UNAPPROVED'
it took 1hrs to update the records and other application online users processing got slow down and locks started to occure on the table.
Suggest the best solution
pkr
March 06, 2004 - 8:30 pm UTC
It would be the INDEX that slowed it down.
You want to FULL SCAN this table.
You want there to be NO index on the column you are updating.
it gives error
praveen, March 07, 2004 - 5:16 am UTC
on dropping the index it gives error ora-00054
thanks
March 07, 2004 - 9:26 am UTC
that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it.
thanks
jasdeep,praveen, March 07, 2004 - 11:16 am UTC
i have solved that problem as an user has locked rows on the but at present not logged on i killed that session and index was dropped immediately.
and updates were as fast as u can think.
thanks
Update taking hrs of time
Sachin, March 09, 2004 - 7:44 am UTC
Hi Tom,
I have a query as:
I have two tables Table1(with around 10000 records max) and table2(actually the GL code combination master table with around 600,000 records). I need to update three fields in table1 (which is a temp processing table) with a unique value from table2. T1 has ccid fields which need to be updated, and s1-s4 fields corresponding to segment1-segment4 fields of table2
Following is the query like:
UPDATE table1 t1
SET t1.ccid1 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'XXXXX'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.chart_of_accounts_id
and gb.set_of_books_id = 1),
t1.ccid2 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'zzzzzz'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1),
t1.ccid3 = (SELECT t2.ccid
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'wwwww'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1
But this is taking more than 6 hrs to complete. When I check the table locks the table remains locked Row Xclusively. I am committing immediately after the update statement in the procedure. Cud u pls tell me why is this happening?
March 09, 2004 - 12:26 pm UTC
not that i don't believe you but -- about the locking issue -- I'm not sure i believe you.
when you commit -- locks are released. You'd have to show me the info from v$lock showing that t1 is still "locked" and locked by your session.
you could
set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )
instead of running a query per column.
if you run the query
select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = some_value_value
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = some_value_value
and t2.segment1 = some_value_value
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1
how long does that take?
Updating millions of rows
A reader, March 15, 2004 - 1:46 pm UTC
Tom,
In this discussion you mention that you would not do an UPDATE/DELETE of millions of rows, but rather would create a new table (in parallel/nologging) with the rows that needs to be retained and then create indexes in parallel, drop the old table and rename the new table to old table. If so, my understanding is :
1. Delete/UPDATE of tables with more than 1 million row would also have to maintain the index structure and also will generate considerable rollback and undo.
2. Other consideration with respect to update/delete is that the blocks will be moved in and out of the freelists. Please confirm if this is true and if so, please explain what happens during update/delete of large number of rows with regard to freelists?
3. Other consideration is that it will cause lot of table/segment fragmentation and lot of empty blocks below the HWM and all full table scans on the table will read lot of empty blocks.
Is my understanding correct and if I have missed any other things as to why DELETE/UPDATE of large tables is not recommended, please let me know.
4. If my table that needs to be updated/deleted is the primary/parent table with dependent child tables/constraints and triggers, is this approach correct
Disable all the triggers/constraints
Drop the table
Rename the new table to the old table
Reenable all constraints and triggers
What to do in such cases?
March 15, 2004 - 3:03 pm UTC
I "might", "probably" if it was most of the records...
1) correct
2) correct, deletes would put them on, updates might put them on, might take them off.
3) it could, yes.
the delete/update can take lots longer than CTAS + reindex if you are doing MOST all of the records in a LARGE table with lots of indexes.
4) triggers are not relevant here....
but you would have to drop the fkeys
do the CTAS+drop+rename
add the constraints back with enable/rely (don't need validation)
Deletes/Updates
A reader, March 15, 2004 - 5:44 pm UTC
Tom,
In the above discussion you mention :
1. The deletes will put them on because the used space in the block will drop below PCTUSED. Is it correct?
2. The update will put them on if the updated column makes the used space in the block fall below PCTUSED or the updated column makes the free space available in the block less than PCTFREE. Is it correct.
If both of the above understanding is not correct, please explain. Also, please let me know how we could track the movement of the blocks on and off the freelist. Any SQL scripts/tools available that I could use?
Thanks
March 15, 2004 - 6:24 pm UTC
1) the deletes could, yes.
2) yes.
to track individual blocks -- not without dumping blocks (and I don't go there). dbms_space and other tools (search this site for show_space for the script I use) can show you how MANY things are on the freelist.
A small correction
A reader, March 15, 2004 - 6:30 pm UTC
Tom,
In the above scenario for question 2, there is a small correction marked in CAPITAL letters :
"The update will put them on if the updated column makes the used space in the block fall below PCTUSED AND WILL TAKE OFF of the freelist if the updated column makes the free space available in the block less than PCTFREE. "
Is it correct?
Thanks
March 15, 2004 - 6:33 pm UTC
I'll state it simply, forget pctfree/pctused/etc -- in english:
if the update reduces the row size, it can put it back onto the freelist.
if the update increases the row size, it can take it off the freelist.
Parallel DML
A reader, March 16, 2004 - 8:22 am UTC
1)On page 630 of your Oracle Expert One on One, you mention that parallel DML can be used only upon partitions -- in 8i. I'm using 9.2. Has this behavior changed in the later versions ?
2)I need to do a mass delete on a 12 GB table and it's taking to much time (far too much, really). How could I speed it up ?
Thanks
enable PK with parallel clause
A reader, March 20, 2004 - 6:40 pm UTC
Hi
I am working in 8.1.7.4.
I want to delete 30 million rows of 60 million so I am doing this
1. create a copy of original table with good data
2. disable original tables´s constraints, primary keys (and child FKs) and foreign keys
3. truncate the original table
4. make all original tables indexes unusable
5. insert append back from copy to original, I have to do this way because I am not allowed to modify constraint names
6. drop the copy
7. enable constraints. here when I enable PK I cannot provide paralle clause right? I searched the doco but it seems that I can only specify using index tablespace clause or I am missing something?
8. rebuild all unusable indexes
March 21, 2004 - 9:50 am UTC
you can create an index in parallel (unique) and then enable/rely/novalidate the pk.
How to Update millions or records in a table
A reader, March 25, 2004 - 11:55 am UTC
Hi Tom,
I read your response to Murali's question above and believe, there will be a downtime window for the application. This is because if I want to keep the index names same as before, I will have to create the new table, drop the old table, rename the new table to old table name and then create the required indexes on the same.
Wondering, how if we can create the indexes by some other name on the new table and rename the indexes after dropping the old table.
As always your valuable advice helps a lot!
March 25, 2004 - 9:13 pm UTC
you can rename the indexes.
alter index supports this.
how to make this restartble?
A reader, March 25, 2004 - 6:14 pm UTC
Hi
I want to delete 80% of rows from a couple of 50 million rows table. The process is divided in 10 steps
1. Create TEMP table, CTAS from original table
2. Disable constraints
3. Truncate original table
4. Set indexes to UNUSABLE
5. Insert into original table from TEMP table
6. Drop TEMP table
7. Create PK and UK
8. Enable PK and UK
9. Enable FKs
10. Rebuild indexes
I want to make this process restartble, i.e if it fails in step 3 if I later rerun the procedure it will start from step 3 again. How can we achiveve this? I have been thinking using a table which stores the two table name and process status something like following table then update the status as the process is going
TABLE: PROCESS_STATUS
TABLE_NAME STATUS
----------- ---------------
EMP 2
DEPT 0
Process_status is the step, 0 means that I need to start from beginning so in each process I would query this table and see which step should be run. For example if the process failed at step 3 then STATUS would be 2
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 0
then
run process 1
end if;
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 1
then
run process 2
end if;
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 2
then
run process 3
end if;
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 2
then
run process 3
end if;
and so on.
Any suggestions :-? What I see is that I will hav quite a few redundant codes...
March 25, 2004 - 9:35 pm UTC
you'd have to keep a state table and have you code query it up, much like you are suggesting. That would do it yes.
another "inserting" idea might be to
a) insert the steps to process
b) delete them as you complete them (and commit)
c) to restart, just pick up at the step you wanted
you could insert the procedures to be called and just:
for x in ( select name from procedures order by seq )
loop
execute immediate 'begin ' || x.name || '; end;';
delete from procedures....
commit;
end loop
(just a thought, not fully baked)
provide more information about that procedure drop indexes and rebuild with nologging
mohan, April 07, 2004 - 2:39 am UTC
Hi Tom
Could you provide more information about that procedure and
how to drop indexes and rebuild with nologging .We are using informatica etl tool
before loading bulk data into target drop the indexes ( pre-session) and after load the data rebuilt the indexes with nologging (post-session ), it takes less amount of time because its generates less undo
Regards
Mohan
April 07, 2004 - 9:09 am UTC
it is just a drop and create? not sure what you are looking for -- if it is "syntax", we document that.
Problem with Update
Ram, April 08, 2004 - 6:23 am UTC
Hi Tom,
This update is not working properly.How to correct this?
SQL> declare
2 x number;
3 y number;
4 begin
5 execute immediate 'update e set sal = sal*1.1 where ename in ('SMITH','KING') returning sal into :x,:y' using x,y;
6 dbms_output.put_line(x || chr(9)||y);
7* end;
*
ERROR at line 5:
ORA-06550: line 5, column 65:
PLS-00103: Encountered the symbol "SMITH" when expecting one of the following:
. ( * @ % & = - + ; < / > at in is mod not rem return
returning <an exponent (**)> <> or != or ~= >= <= <> and or
like between into using || bulk
Please do reply.
Bye!
April 08, 2004 - 10:09 am UTC
..... ( ''SMITH'', ''KING'' ) .....
you need two quotes for a quote in a string (until 10g when there is another way)
Updating million records
Himnish Narang, April 08, 2004 - 7:34 am UTC
Hi Tom,
i was just going through the discussion in this subject. In this you have described that you will create a table instead of going for updates of million records and this update will be peroformed as part of creation of new table.
Don't you think that the new table will also occupy the same amount of space and after dropping the table we will have to do the reorg for that tablespace.
please comment.
Thanx.
April 08, 2004 - 10:24 am UTC
reorgs are so overrated.
no, i would not reorge a tablespace or anything simply cause I copied some data and dropped some old stuff. not a chance.
How to Update millions or records in a table
Ganesh, April 08, 2004 - 9:03 am UTC
Hi Tom,
Its very useful and I used this saved lots of time by creating new table.
I got another issue similar to this. We are using 9i. there is requirement of needs to modify primary key datatype number to varchar which is having lot of dependents. Is there any option with out rebuilding table as data already exist.
Can you please suggest on this.
Thanks in Advance.
Ganesh
April 08, 2004 - 10:51 am UTC
that'll be a rebuild pretty much.
Please see this
Ram, April 08, 2004 - 11:38 pm UTC
Hi Tom,
Thanks for your reply but Still it's not working properly.
SQL> declare
2 x number;
3 y number;
4 begin
5 execute immediate 'update e set sal = sal*1.1 where ename in (''SMITH'',''ALLEN'')
6 RETURNING sal into :x,:y' using x,y;
7 dbms_output.put_line(x ||chr(9) ||y);
8 end;
9 /
declare
*
ERROR at line 1:
ORA-00913: too many values
ORA-06512: at line 5
How to correct this?You say about another way in Oracle 10G.How does that work?Could you please provide an example.
Please do reply.
Bye!
April 09, 2004 - 7:42 am UTC
you are returning SAL (one thing) into two things (:x,:y)
different problem all together.
in 10g, the strings could be
q'|how's this for quoting|'
instead of
'how''s this for quoting'
a character string literal that starts with q for quote will use the next two characters as the start quote and those two characters in reverse for the end quote.
Thanks
Ram, April 09, 2004 - 9:39 am UTC
Hi Tom,
Thanks for your reply.I found out the way of doing it as follows and Do you have any other option to do it in a better way?
SQL> create or replace type sal_t as table of number;
2 /
Type created.
SQL> declare
2 sals sal_t := sal_t();
3 begin
4 execute immediate 'update emp set sal = sal*1.1 where ename in (''SMITH'',''KING'')
5 returning sal into :sals' returning bulk collect into sals;
6 for i in 1..sals.count loop
7 dbms_output.put_line('Updated salary:'||sals(i));
8 end loop;
9 end;
10 /
Updated salary:968
Updated salary:5500
PL/SQL procedure successfully completed.
Please do reply.
Bye!
April 09, 2004 - 10:17 am UTC
get rid of the dynamic sql, it isn't necessary (fixes your original issue with the quotes as well). also, would seem to make sense to pull back other columns so you would be able to see what rows were actually updated:
ops$tkyte@ORA9IR2> declare
2 type numArray is table of number;
3 l_sals numArray;
4 l_empno numArray;
5 begin
6 update emp
7 set sal = sal*1.1
8 where ename in ('SMITH','KING')
9 returning empno, sal bulk collect into l_empno, l_sals;
10
11 for i in 1..l_sals.count loop
12 dbms_output.put_line
13 ('Updated salary for empno ' || l_empno(i) ||
14 ' to: '||l_sals(i));
15 end loop;
16 end;
17 /
Updated salary for empno 7369 to: 880
Updated salary for empno 7839 to: 5500
PL/SQL procedure successfully completed.
To Mr. Ram
A reader, April 09, 2004 - 10:44 am UTC
Hi Ram and All,
May I request you all not to pester Tom with inane posts requesting Tom to debug and program on our behalf.
This forum is not to learn coding.
update based on rowid
john, April 16, 2004 - 2:02 am UTC
Tom,
we use non-intelligent primary key to update the table through a stored procedure.
i.e. when the table is queried, our select procedure returns the non-intelligent key along with the details like below
select emp_key, other details from emp;
when updating the data, emp_key is passed from the front end to the stored procedure and table is updated based on emp_key.
is it a bad idea, using rowid instead of non-intelligent key(emp_key)?
April 16, 2004 - 7:27 am UTC
non-intelligent? interesting choice of terminology.
But in any case -- once upon a time ago, rowids were IMMUTABLE. Once assigned a row would have a rowid assigned to it and that rowid would live with that row until you deleted it.
Starting in 8i with support for updates to partition keys that would cause a row to move from partition A to partition B -- that is no longer true (and then there are IOT's...).
In 10g, there are even more options for 'row movement' -- an online segment shrink for example.
So, rowids can change and are changing in more circumstances as time goes on.
Sooooo, if you have lost update detection in place using 'FOR UPDATE' -- rowids are very safe (forms uses them). What I mean is -- you
a) select a.*, rowid from T a where .... 'normally'
b) you decide to update a row, so you
select * from t where rowid = :that_rowid AND
((c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
((c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
((cN = :that_cN or (cN is null and :that_cN is null))
for update NOWAIT;
you lock it -- this prevents anything that could change a rowid from doing so and verifies the data hasn't been changed. If that returns 0 rows, someone changed the data (or reorg'ed it and moved it). You need to requery to get the current values before you let the user even think about modifying it. If that returns a row -- you got it. If that returns ora-54, something has it locked, you have to decide what you want to do about that.
c) you can then safely update that row by rowid
If you do not have lost update detection in place using FOR UPDATE locks -- then you should stick with a primary key probably, just to protect yourself in the future.
update based on rowid
john, April 19, 2004 - 12:42 am UTC
thanks a lot Tom.
to understand fully what you said, can you please tell
1.why do we have to where condition?
((c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
((c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
((cN = :that_cN or (cN is null and :that_cN is null))
2.another question is, is there a situation where a rowid of a row being assigned another row of the same table?(because this may cause more danger as we end up updating another row)?
thanks again.
April 19, 2004 - 6:37 am UTC
1) you want to make sure the row hasn't changed.
c1 = :that_c1 will only with if both the column and the bind are not null, hence the rest is needed to make it so null's compare "the same"
2) this is tied into 1 again. sure, we could delete from emp where empno = 7788; commit; and then re-insert that row and get the same rowid (rare, but could happen).
But - if we inserted the same exact values and they got the same exact rowid -- then #1 would make this "safe". the values of the row are the same, for all intents and purposes it IS the same row.
If the newly inserted row doesn't match column by column the values we expect, well, then we'll reject it (won't lock it) and all will be well.
update large table
Prasad Chittori, April 22, 2004 - 6:04 pm UTC
I have a very large partitioned table with DATE column, I would like to take out the time portion from the date column
I did the following and it is taking lot of time and failing with unable to extend the rollback segments.
Any better ideas without increasing the rollback segments or changing the query
update t1
set dt = trunc(dt)
April 23, 2004 - 10:12 am UTC
parallel dml -- each pq slave can get its own RBS, letting you use them all at the same time, not just one of them.
convert delete to insert
marvin, April 26, 2004 - 11:30 am UTC
Hi
We want to delete several tables of several million of rows. The problem is, the delete statement is quite simple but if we want to this faster we would insert to temp table, truncate original and insert back the good data. We want to convert delete into insert statements but I think it´s very hard
For example
We have these DELETE statements
DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
/
COMMIT
/
DELETE
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1 FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL)
/
COMMIT
to convert to that we did
INSERT /*+ APPEND */ INTO OM_ANNEX$TMP
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM LISTA_ORDENES l,
OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
UNION
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL);
Seems incorrect, if I run the queries inside insert seperately I get correct results but if we use UNIONs we get data which we dont want, if we do it in two inserts we get duplicate rows
One approach is this
create a temp table using the delete statements but converted to querys for example instead of
DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
we convert to
select t.*
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
then simply says
select * from om_annex a
where not exists (select null from temp_table b
where a.annexid = b.annexid)
annexid is PK
the other approach is this
select * from om_annex a
where annexid not in (select annexid
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
union
select annexid
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL))
They are all very slow because involving tables have several million of rows!
Is there a better approach?
April 26, 2004 - 2:20 pm UTC
why not:
create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.
err the problem is convert the DELETE to CTAS
A reader, April 26, 2004 - 3:03 pm UTC
Hi
create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.
That is exactly I want to do, the problem is until now we have always done the other way round, use plain DELETE (and it takes a week to delete everything!), the problem I am not sure how to convert DELETE to CTAS. If I want to do the reverse of DELETE statements (some table has 5 DELETE statements!) it is not as simple as write the DELETE the other way round? For example how would you change
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Would you change it to
insert into tmp_x
select *
from tab1
where not exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
insert into tmp_x
select *
from tab1
where not exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Is simple as this?
April 27, 2004 - 4:02 am UTC
if i had this:
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
I would probably have this:
create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/
outer join the three tables. Negate the conditions for the where exists.
that is, after outer joining tab1 to tab2, tab3 -- remove the rows
where tab2.fid is not null and tab3.id is not null -- that is subquery one in your deletes above.
where tab2.fid is not null and tab2.id is null -- that is subquery two in your deletes above.
err the problem is convert the DELETE to CTAS
marvin, April 26, 2004 - 3:04 pm UTC
Hi
create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.
That is exactly I want to do, the problem is until now we have always done the other way round, use plain DELETE (and it takes a week to delete everything!), the problem I am not sure how to convert DELETE to CTAS. If I want to do the reverse of DELETE statements (some table has 5 DELETE statements!) it is not as simple as write the DELETE the other way round? For example how would you change
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Would you change it to
insert into tmp_x
select *
from tab1
where not exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
insert into tmp_x
select *
from tab1
where not exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Is simple as this?
thank you very much for the outer join tip
marvin, April 27, 2004 - 6:16 am UTC
Hi
I am going to have a look how to apply the outer join in order to convert DELETE to CTAS.
I have a further question, I have another table which undergoes 4 DELETEs
DELETE
FROM SW_PERSON t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid AND t.swtype='ORDER')
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER' AND t.swobjectid IS NULL AND COMP_INST_ID IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS
(SELECT 1
FROM OM_COMPANY_INST c, LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id)
/
COMMIT
/
I need to convert this to CTAS as well, however I am not sure if this can be done in a single statement. These DELETE for example cant be converted into one as follows right? (because of commit between them)
DELETE
FROM SW_PERSON t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid
AND t.swtype='ORDER')
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND COMP_INST_ID IS NULL)
OR t.swtype IS NULL
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS (SELECT 1
FROM OM_COMPANY_INST c,
LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id));
Can this use the outer join tip as well?
TIA
April 28, 2004 - 12:17 pm UTC
why have any commits in between.
but of course -- any four deletes against a single table can (and if you ask me, should) be done as a single delete.
the outer join was used in the CTAS, not in a delete.
why do you use outer join
A reader, April 27, 2004 - 6:57 am UTC
hi
Why is outer join needed for tab1, tab2 and tab3 :-?
create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/
April 28, 2004 - 12:20 pm UTC
because we wanted to keep all rows in tab1 -- if there is NO mate in tab2/tab3
since they deleted "where exists in tab2/tab3":
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
rows in tab2/tab3, we need to ALWAYS keep rows that are NOT in tab2/tab3 using CTAS. outer join is mandatory for that in this example.
to marvin
A reader, April 27, 2004 - 9:33 am UTC
hi marvin, you can try this
select * from tab1
where PK not in (select PK
from tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
union
select *
from tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)))
regarding the conditions
A reader, April 27, 2004 - 11:09 am UTC
Hi Tom
May you show some light why
NOT ( tab2.fid is not null and tab3.id is not null )
is same as
exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
and
NOT ( tab2.fid is not null and tab2.id is null )
is same as
exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Cant see why. Thank you
April 28, 2004 - 12:50 pm UTC
it isn't the same.
it is in fact the opposite.
if you outer join T1 to T2 to T3
and before you were looking for (where exists)
a) a match in T2 (tab1.id = tab2.fid)
b) a match in T3 for that T2 (tabe2.id = tab3.id)
then you are saying "if I outer join T1 to T2 to T3, that row would be such that:
a) tab2.fid is NOT NULL (we found a mate)
b) tab3.id is NOT NULL (we found a mate in t3 for t2)
with the where exists -- we would have deleted that row, hence with the CTAS (which is finding rows to keep) we simply NEGATE that (with NOT).
Therefore we would keep that row, IF that was "not" satisfied.
Same logic for the second part.
the second where exists says
delete the row if
a) there is a match in T2 (where tab2.fid = tab1.id)
b) the id column in t2 for that match is NULL
that would be (in an outer join)
(tab2.fid is not null) -- we joined to a row
(tab2.id is null) -- and that row is having id is null
negate it and keep it.
Updating Table having millions of records taking lot of Time..
Anand Pandey, April 28, 2004 - 3:45 am UTC
Hi Tom,
I hd a table having millions of record in which two of its cols are Null.
i just tried to update the null colls with the data from other table, which is taking taking around 10-12 hrs for a single day record, and I've to update it for 31 days.
pls help Me in getting the high perf. on updation.
Query used is:
UPDATE /*+nologging parallel(4) */ MASTER_tab A SET (null_col1,null_col2)=(SELECT /*+PARALLEL(4) */ MIN(C1),MIN(C2)
FROM Table2 B WHERE SUBSTR (A.col1_T1, 1, LENGTH (B.C1)) = B.C1
AND SUBSTR(A.col2_T1,1,3)=B.C2)
WHERE c_date='01-Jan-04'
Thanks and Regards,
Anand Pandey
Nologging - how does it impact recovery?
Naresh, April 30, 2004 - 11:12 am UTC
Hi Tom,
This is a great chain of discussion. I especially liked the "outer join to replace the not exists". I am really looking forward to my copy of your first book that I oredered recently (on it's way from amazon).
One question regarding making the table nologging: Does it not have implications for recovery? What am I missing?
April 30, 2004 - 5:16 pm UTC
you need to schedule a hot backup if you use non-logged operations, yes.
db sequential waits on UPDATE
A reader, May 14, 2004 - 9:55 am UTC
Dear Tom,
Due to some processing complexities I am unable to use a single SQL for Update/Insert.
I use:
LOOP
1. Bulk select (500 rows at a time) from Table A, C with rowids from C
2. Bulk insert
3. Bulk update table C
END LOOP
I am getting a very high number of "db file sequential read" waits on the update part. Please see relevant portions of Trace File:
********************************************************************************
SELECT /*+ FULL(v) FULL(c) PARALLEL(v,4) PARALLEL(c,4) HASH(v,c) */
c.rowid,
...
FROM vp v,citi c
WHERE v.idno= c.idno
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.07 10 234 0 0
Execute 1 0.32 3.38 3023 156 3 0
Fetch 155 1.96 82.82 0 0 0 77500
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 157 2.32 86.28 3033 390 3 77500
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 4 0.01 0.03
rdbms ipc reply 26 0.00 0.00
enqueue 2 0.02 0.02
PX Deq: Join ACK 7 0.00 0.00
PX Deq: Parse Reply 4 0.23 0.35
PX Deq: Execute Reply 717 2.06 735.45
PX qref latch 4 1.02 2.75
PX Deq: Table Q Normal 5218 0.35 23.95
********************************************************************************
INSERT /*+ APPEND PARALLEL(A,4) */ INTO t_v A
VALUES(:b1,
:b2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 2 11 0 0
Execute 155 0.32 8.00 552 1116 5567 77500
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 156 0.32 8.02 554 1127 5567 77500
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 552 0.36 6.43
free buffer waits 6 1.02 1.23
********************************************************************************
UPDATE /* PARALLEL(c,4) */ citi c
SET ...
WHERE c.rowid = :b6
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 154 35.26 586.87 75251 77036 79595 77000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 155 35.26 586.87 75251 77037 79595 77000
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 75578 1.04 461.09
free buffer waits 242 1.03 96.11
********************************************************************************
Is it not true that if the row has already been selected (and is there in the buffer cache) no disk IO should happen to update this row?
Please tell me a way to make this faster.
Regards
May 15, 2004 - 10:33 am UTC
insert /*+ append parallel... */ VALUES hmmmmmm
append = no go with values, only works with "as select"
parallel -- parallel on what exactly?
update /* parallel(c,4) */ hmmmmmm
in search of the fast = true switch. thankfully that hint is malformed and hence ignored as well -- you are updating by rowid.
parallel is *not* fast=true.
append is not either really, especially with the values clause.
you can review the trace file itself (p1,p2,p3 will tell you file and block info, along with blocks read). you can use that to verify that it is the very act of reading the indexes that need to be updated and maintained that is causing this. If they are not in the case, well, we'll need to read them into there.
More Info
A reader, May 16, 2004 - 11:41 pm UTC
Thanks Tom,
The hint in the update was there by a Developer, it has been rightly disabled. The insert is not a bottleneck so didn't look into it.
This is a development machine, there are no other jobs running, asynchronous IO is enabled, the machine is on RAID 0 (no fault tolerance - being a development one). There are NO INDEXES on Tables being inserted and updated.
The segment on which the WAITS (db sequential read) are happening are that of the TABLE that is being UPDATED.
Please guide me next.
May 17, 2004 - 7:29 am UTC
then you are seeing physical IO performed to read into the buffer cache the data needed to be updated. If it is not cached, we'll have to read it.
PARALLEL DML
Sar, May 28, 2004 - 2:50 pm UTC
Tom,
I need to update a table that has 50 million rows but
the number of rows affected are only 1 million rows.
I have a single update statement to do this.
UPDATE acc_proc
SET acc_seq_num = 0
WHERE acc_seq_num > 0
/
The column acc_seq_num is not part of an index.
Can you please suggest me if there is anything better
that I can do on this SQL to run it faster.
Thanks
Sar
May 28, 2004 - 2:56 pm UTC
nope, that is perfect.
shouldn't take very long at all. unless you are getting blocked constantly by other sessions.
do i need to commit after execute immediate ? for dml or ddl ?
A reader, June 02, 2004 - 11:31 am UTC
June 02, 2004 - 11:35 am UTC
DDL is processed *conceptually as:
begin
commit; -- commit anything you started, even if DDL fails, you are committed
do the ddl;
commit; -- commit the ddl
exception
when others
then
rollback; -- the ddl
raise;
end;
/
You need not commit DDL, it'll commit 2 times for you.
You need to commit DML as DML does not do that.
This is regardless of the method used to execute SQL, native dynamic sql, dbms_sql, static sql, whatever -- the rules are the same in all cases.
Thanks !!
A reader, June 02, 2004 - 11:50 am UTC
ok,
so i did
1.)
begin
begin
Execute immediate 'SET CONSTRAINTS ALL DEFERRED';
end;
begin
execute immediate 'delete from table ' || t;
end;
begin
execute immediate 'delete from table ' || t1;
end;
exception
when others then
rollback;
raise;
end;
/
i got the message some constraint of t1 violated. (child record exists)
if i did not issue the commit,post or rollback,
and did not perform any DDL,constraints are DEFERRED
what is the issue ? can u help ?
June 02, 2004 - 11:56 am UTC
...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t;
execute immediate 'delete from ' || t1;
execute immediate 'set constraints all immediate';
end;
/
that is all the code you want -- the all immediate will validate the contraints
if they fail -- then the error (exception) will be propagated back to the client and the rollback of the statement (which is whatever the client sent over to the database -- a plsql block, whatever) will be undone.
A reader, June 02, 2004 - 12:01 pm UTC
...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t; -- ***** this one
-- execute immediate 'delete from ' || t1;
-- execute immediate 'set constraints all immediate';
end;
/
the question is
as soon as it executes the
'delete from ' || t;
statement, it raises the referential constriant violation exception. why ?
as
1.) constraints all deferred
2.) it NOT a ddl
3.) I am not commiting(or ending the tx) manually
June 02, 2004 - 12:45 pm UTC
the constraint must not be deferrable.
you have to have deferrable constraints in order to defer them. the default is "not deferrable"
7.3.4 - extracting 5 million records from 100 million record table
A reader, June 15, 2004 - 12:39 pm UTC
Any suggestions on how to accomplish this on 7.3.4? No concept of nologging, partitioning, Export/query clause etc in 7.3.4..
How I can extract about 5 million records (based on a condition) into a separate database/server from a 100million record table? What's the most efficient way? I'm having to make do with very limited disk/space on both machines.
Thanks.
June 15, 2004 - 5:30 pm UTC
5 million is pretty small.
I'd just
create table as select ( concept of nologging did in fact exist, unrecoverable);
export it
import it
or use the sqlplus copy command if they are connected via a nice network.
A reader, June 16, 2004 - 5:38 pm UTC
Thanks. Had some long columns in there. So 'Create table as select' wouldn't work. Am just ending up plsql looping and hitting smaller chunks of the mega table. Then creating smaller target tables..then exp/imp.
Am afraid I won't have sufficient temp space to do a sqlplus "Copy from". Also, documentation talks about Copy not being intended for Oracle to Oracle DBs. No idea why.
June 16, 2004 - 6:12 pm UTC
copy is ONLY for oracle to oracle db's....
A reader, June 16, 2004 - 7:40 pm UTC
Per 7.3.4 documentation :
</code>
http://download-west.oracle.com/docs/pdf/A42562_1.pdf <code>
Excerpt from it -
Note: In general, the COPY command was designed to be used
for copying data between Oracle and non-Oracle databases. You should use SQL commands (CREATE TABLE AS and INSERT)
to copy data between Oracle databases.
June 17, 2004 - 7:53 am UTC
but if you think about it..... doesn't matter what the doc says, it takes two connections. connections are only to oracle databases. sure you could be using a gateway -- but even there, it would be true that create table as and insert would work.
Updating and Inserting 1 Million rows Daily with Bitmap indexes
Sidda, June 24, 2004 - 6:32 pm UTC
Hi Tom,
Here we are facing very big problem. We have a Partitioned table with 180Million records(with 70 columns and 10 bitmaps,10 B-tree indexes).End users need Bitmap indexes 24/7 so we can't drop and recreate table and indexes.Daily we have to update and insert 1Million records. We tried with bulk updates but invain. What is the best method to follow up?
Thanks in advance
Sidda
June 24, 2004 - 9:01 pm UTC
describe "in vain", what went wrong?
it would be best to do a SINGLE insert into and a SINGLE Update against this table (not even in bulk - just single statements)
Creating Table with aggregated data from another table
RB, July 26, 2004 - 1:16 pm UTC
Tom: Qn related to creating a table with data from another table -
I want to create a table with few fields and aggregated some of few columns from another table.
Here is a query:
create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum (billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by ID,LOC, SRC;
I have 128Million record in the SRC_MONTHLY table and it is taking more than 1 hr to get this table created. I have index on ID, src and LOC fields.
Any faster way of getting this table created?
July 26, 2004 - 1:32 pm UTC
so how many records have that id?
Great aproach, but is it as fast for an IOT table?
Peter Tran, July 26, 2004 - 2:34 pm UTC
Hi Tom,
I'm trying the same approach with an IOT table. We have an IOT table partitioned daily. I want to recreate this table with a monthly partition. I do a CTAS parallel nologging using the new monthly partition, but it's SLOW. Then again, the table does have 200 million rows. Is the "Index Organization" part of table the slow part?
Thanks,
-Peter
July 26, 2004 - 2:47 pm UTC
what do the inputs/outputs look like (table structure) -- remember, you are sorting 200,000,000 rows as well!
did you give it a nice juicy sort area size?
Unfortunately no.
Peter Tran, July 26, 2004 - 2:58 pm UTC
"...nice juicy sort area size" That would be a negative. :( Man, this is going to take forever isn't it? Ugh...
SQL> show parameter sort_area
NAME TYPE VALUE
------------------------------------ ----------- --------
sort_area_retained_size integer 8388608
sort_area_size integer 8388608
SQL> desc odfrc;
Name Null? Type
----------------------------------------------------- -------- ------------
ODIFID NOT NULL NUMBER(10)
TRPORGN NOT NULL VARCHAR2(5)
TRPDSTN NOT NULL VARCHAR2(5)
POSCOUNTRYCODE NOT NULL VARCHAR2(3)
PAXTYPE NOT NULL VARCHAR2(1)
DCP NOT NULL NUMBER(2)
ODIFDATE NOT NULL DATE
FRCDATE NOT NULL DATE
BKGMEAN NUMBER
BKGMEANINFLUENCED NUMBER
BKGVARIANCE NUMBER
XXLMEAN NUMBER
XXLMEANINFLUENCED NUMBER
XXLVARIANCE NUMBER
Here's my CTAS:
CREATE TABLE ODFRC_MONTHLY (
ODIFID,
TRPORGN,
TRPDSTN,
POSCOUNTRYCODE,
PAXTYPE,
DCP,
ODIFDATE,
FRCDATE,
BKGMEAN,
BKGMEANINFLUENCED,
BKGVARIANCE,
XXLMEAN,
XXLMEANINFLUENCED,
XXLVARIANCE,
CONSTRAINT ODFRC_MONTHLY_PK PRIMARY KEY
(ODIFID, ODIFDATE, TRPORGN, TRPDSTN,POSCOUNTRYCODE, PAXTYPE, DCP, FRCDATE)
) ORGANIZATION INDEX nologging parallel 8
PARTITION BY RANGE (ODIFDATE)
(PARTITION ODFRC_20021130 VALUES LESS THAN (TO_DATE('2002-12-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX01 ,
PARTITION ODFRC_20021231 VALUES LESS THAN (TO_DATE('2003-01-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX02 ,
PARTITION ODFRC_20030131 VALUES LESS THAN (TO_DATE('2003-02-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX03 ,
PARTITION ODFRC_20030228 VALUES LESS THAN (TO_DATE('2003-03-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX04 ,
PARTITION ODFRC_20030331 VALUES LESS THAN (TO_DATE('2003-04-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX05 ,
PARTITION ODFRC_20030430 VALUES LESS THAN (TO_DATE('2003-05-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX06 ,
PARTITION ODFRC_20030531 VALUES LESS THAN (TO_DATE('2003-06-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX07 ,
PARTITION ODFRC_20030630 VALUES LESS THAN (TO_DATE('2003-07-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX08 ,
PARTITION ODFRC_20030731 VALUES LESS THAN (TO_DATE('2003-08-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX09 ,
PARTITION ODFRC_20030831 VALUES LESS THAN (TO_DATE('2003-09-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX10 ,
PARTITION ODFRC_20030930 VALUES LESS THAN (TO_DATE('2003-10-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX11 ,
PARTITION ODFRC_20031031 VALUES LESS THAN (TO_DATE('2003-11-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX12 ,
PARTITION ODFRC_20031131 VALUES LESS THAN (TO_DATE('2003-12-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX13) AS SELECT
ODIFID,
TRPORGN,
TRPDSTN,
POSCOUNTRYCODE,
PAXTYPE,
DCP,
ODIFDATE,
FRCDATE,
BKGMEAN,
BKGMEANINFLUENCED,
BKGVARIANCE,
XXLMEAN,
XXLMEANINFLUENCED,
XXLVARIANCE
FROM odfrc nologging;
The original ODFRC table has daily partition for one year. Anyway, I can estimate how long this will take?
-Peter
July 26, 2004 - 3:29 pm UTC
the nologging is in the wrong place
from odfrc nologging;
is the same thing as:
from odfrc a;
would you consider
a) create the table (empty, alter it to be nologging)
b) insert /*+ append */ into the individual partitions (instead of all at once) in parallel sessions?
Create Table with data from an aggregated sum of few fields from another table
RB, July 26, 2004 - 3:48 pm UTC
Tom: Qn related to creating a table with data from another table -
I want to create a table with few fields and aggregated some of few columns from
another table.
Here is a query:
create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by
ID,LOC, SRC;
I have 128Million record in the SRC_MONTHLY table and it is taking more than 1
hr to get this table created. I have index on ID, src and LOC fields.
Any faster way of getting this table created?
Followup:
so how many records have that id?
RB: Tom - This number varies - we have so many IDs in the master table. If I pass one id then the query will have one equi join with that ID if more than one I was planning to use IN clause. So I do not know how many records per id I will have it in the table at any given point of time.
July 26, 2004 - 3:51 pm UTC
now I'm confused -- the predicate is variant? why create *that* table then?
what is the goal here -- to make queries of the form:
select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = :x group by id, loc, src;
faster in general? (you don't have to duplicate lots of text, it is all right here)
Great suggestion!
Peter Tran, July 26, 2004 - 3:58 pm UTC
Hi Tom,
Thanks for the useful suggestion.
1) Part one done (easy).
2) Is this what you mean for part 2? When you say parallel sessions, do you mean kick off a bunch of them using execute immediate?
INSERT /*+ append */ INTO odfrc_monthly(
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE)
SELECT
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE
FROM odfrc partition(ODFRC_20021114) nologging;
3) Should I still give it a nice juicy sort_area_size? :)
Thanks,
-Peter
July 26, 2004 - 5:13 pm UTC
2) fire up N sqlplus sessions and run the insert append in each
3) sure, let each session have a big SAS -- alter session can be used.
Create Table with data from an aggregated sum of few fields from another table
RB, July 26, 2004 - 7:57 pm UTC
Tom - If the user can select one or more ids. If I have more than one ID then I was planning to use an IN clause in the where clause. The temp table that I am creating will be used in a later phase of the app for other joins. What I am looking for a soln which will be must fater than my current approach. The Query that I have given with a 120M table is taking more than 1 hr to create the aggregated table.
July 26, 2004 - 8:19 pm UTC
what is the query plan in general then.
you would not create a temporary table in oracle -- that would be so sqlserver. just use that query in the "IN" statment in the first place!!!!!
Update...
Peter Tran, July 27, 2004 - 4:11 pm UTC
Hi Tom,
I wanted to give you an update on progress. The nice thing about your approach is I can monitor the progress, but it's not as fast I as thought it would be.
I started 6 sessions and gave each session a sort_area_size = 200M. I then executed a month's worth of insert each session. It's taking about 30 minutes per partition/day to insert from the source daily table to the monthly table. Each partition holds around 700K to 800K rows.
Should I expect to run this long?
Thanks,
-Peter
July 27, 2004 - 7:01 pm UTC
sounds long -- can you run one of the sessions with a 10046 level 12 trace and see what they might be waiting on?
sort_area_size, pga_aggregate_target, insert /*+ APPEND*/
Dusan Valasek, July 28, 2004 - 3:02 am UTC
Hi Tom,
I have two questions about setting of big sort_area_size before direct path load.
1) How does setting of sort_area_size (from session) affect direct path load? I thought that this is useful for sort operation and building of indexes.
2) If I use pga_aggregate_target, does setting of sort_area_size (from session) override pga_aggregate_target?
Thanks,
Dushan
July 28, 2004 - 8:14 am UTC
1) you are loading an IOT. big sort going on.
2) you would set the workarea policy to manual (in your session) to allow you to override the sort area size for this "batch operation"
A reader, July 28, 2004 - 12:10 pm UTC
Would you suggest to re-create the table when other users want to update it online
July 28, 2004 - 1:28 pm UTC
huh?
How to change a DATATYPE of column.
peru, July 30, 2004 - 7:54 am UTC
Hi Tom,
How to change a datatype of a particular column.
say a table with column VARCHAR2 for date.
Now i want to change the datatype DATE.
Assume that table has many records and referenced with
tables , procedures,triggers.
Thank you
July 30, 2004 - 4:43 pm UTC
not really going to happen.
you can add a new column that is a date, update this new column to be to_date() of the varchar2 and set the varchar2 to null, drop the old column and rename the newly added column to the old name.
Update to July 27, 2004
Peter Tran, August 22, 2004 - 11:11 am UTC
Hi Tom,
Sorry it took awhile to get back with you on this. You wanted to see a 10046 level 12 trace. I wanted to do some research first and I had to rebuild the table to reproduce the step.
Here's the trace.
insert /*+ append */ into odfrc_monthly select * from odfrc
partition(ODFRC_20040201)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 15.08 5776.28 11657 15874 111333 569939
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 15.09 5776.29 11657 15874 111333 569939
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 34
Rows Row Source Operation
------- ---------------------------------------------------
1 LOAD AS SELECT
569939 INDEX FAST FULL SCAN ODFRC_PK PARTITION: 185 185 (object id 12351)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
control file sequential read 8 0.00 0.00
enqueue 1911 3.00 5726.43
db file sequential read 38 0.03 0.20
db file scattered read 380 0.63 9.50
direct path write 19 0.00 0.00
direct path read 10 0.00 0.00
log buffer space 45 0.61 12.01
log file switch completion 2 0.28 0.33
log file sync 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00
Jonathan Lewis wrote in his book that you cannot use direct-path insert using the APPEND hint into an IOT table. Can this be the reason for the large enqueue timed event? Are the 6 sessions waiting to lock the index to modify it?
Thanks,
-Peter
August 22, 2004 - 4:55 pm UTC
Umm, you were blocked by someone else for an excessively long period of time here. Enqueue waits:
enqueue 1911 3.00 5726.43
normal "blocker, blockee" problem here! doh, it was me doing it to you.
only one session at a time can append. only one session at a time can direct path insert into a table.
sorry -- use normal insert, I goofed.
delete 2M records without dropping table.
Sean, August 23, 2004 - 9:18 pm UTC
Hi Tom,
I have to delete 2M records from 6M records and the table has about 400 columns (Oracle 920, Solaris 9).
I understand that your suggestion of creating temp table with the records I needed, then drop the original table and change the temp table name.
But since our table is not that big and the application is using this table all the time, we try to use traditional delete method to accomplish this.
(1)If we have enough redo tablespace, what is the good number of rows for each delete. I tried each commit for 10000 records or 1000 records. Both are quite slow.
(2)Anything you can suggest to improve the speed of the delete in the following procedure.
Thanks so much for your help
------------------------------------------------------------
create or replace procedure sp_del_audit(p_total_loop_time number)
is
total_loop_time number := 1;
BEGIN
WHILE total_loop_time <= p_total_loop_time
LOOP
delete /*+ parallel */ from audit a
where rownum < 10000
and exists ( select /*+ parallel */ null from audit_temp b
where a.Pkey = b.Pkey and
a.row_type = b.row_type and
a.update_time = b.update_time);
commit;
total_loop_time := total_loop_time + 1;
END LOOP;
END;
/
------------------------------------------------------------
August 24, 2004 - 7:19 am UTC
1) assuming you meant UNDO tablespace, there is no such thing as a REDO tablespace...
removing 2 million rows, with 400 columns (every row in this table is CHAINED, when you have more than 255 columns...) and probably many indexes -- it is not going to be what you might term "speedy".
2) do it as a SINGLE delete (deleting 10,000 records in "parallel" doesn't even begin to make sense here. PQ is for BIG BIG BIG things. this is tiny)
delete from audit
where (pkey, rowi_type, update_time) in ( select pkey, row_type, update_time
from audit_temp);
nothing more, nothing less. but don't expect this to be super fast if this table is indexed. removing 33% of the index entries is going to take some amount of time.
Deleting 90% of rows
A reader, October 13, 2004 - 10:54 pm UTC
As part of some ETL processing, I need to delete 90% of the rows in a large (6million rows) table. This table has a PK index so traditional delete takes a long time. I see the following options
1. Mark the index unusable, delete and rebuild index nologging. This is significantly faster than plain delete
2. create new table as remaining 10% of the rows, drop, rename, etc
3. Precreate a regular or GTT and do insert /*+ append */ in my batch program.
(2) is not very attractive as I dont want to be doing DDL in my code.
How would you compare (1) and (3) above? Both of them would generate similar amounts of undo/redo, take roughly same amount of time?
Thanks
October 14, 2004 - 9:19 am UTC
ETL is 'special', it doesn't matter 5 times a minute.
I'd go with #2 actually, CTAS a newone, drop oldone, rename newone to oldone.
#3 is "dangerous" -- you could lose everything in the event of a failure.
A reader, October 14, 2004 - 11:02 am UTC
"#3 is "dangerous" -- you could lose everything in the event of a failure"
Why would I lose everything? What kind of failure? Instance or media failure? Oracle guarantees recoverability of commited transactions, right, why do you bring that up here?
The only reason I dont want to do (2) is that I usually want to prevent DDL in my code. The only difference between (2) and (3) is that the table is already created in (3), right? Yes, (3) would generate much more redo/undo (lesser if I create the table as nologging), but I thought using a GTT would cut down on the redo/undo as well?)
October 14, 2004 - 11:16 am UTC
i saw "use a gtt"
i could only assume you mean:
insert rows to keep into gtt
truncate table
insert rows from gtt into the table
that would be dangerous. if you fail after the truncate, before/during the insert -- gone, all gone. gtt would be dangerous here.
(truncate is ddl btw...)
How can you Display multiple rows in one record
Mack, October 14, 2004 - 2:34 pm UTC
Hi Tom,
Let's suppose in deptno 10, there are 3 to 5 employees, I want to see the emp names like
TOM, KIM, JOHN... and so on.
Is there an easy way to do it in SQL? Not through PL/SQL. The number of records are unknown, it could have 10, 20 or one hundred. Please advise.
October 14, 2004 - 7:42 pm UTC
collect() in 10g
stragg (search for it -- nasty plsql you install once and use over and over and over and over and over) in pre 10g
Query first and then update
A reader, October 15, 2004 - 1:23 am UTC
We have two tables with approximately 28 million and 35 million records respectively. These tables are joined to produce data to be displayed to users using IE browsers. Based on this information, around 10000 records get added to these tables and around the same updated daily. Our SLA is to display each screenful (100 rows) in 2-4 seconds.
While partitioning is being reviewed to improve the performance for the queries, could you let us know if there are any issues regarding partitions? For instance someone has reported that using global indexes on a partitioned table has degraded the performance.
October 15, 2004 - 11:19 am UTC
i seriously doubt partitioning is going to be used to increase the performance of these queries.
partitioning -- great for speeding up a full scan. are you suggesting to full scan and still return in 2-4 seconds?
"someone reported using global indexes ... degraded performance"
I just about fell out of my chair on that one. geez.
If you have my book "Effective Oracle by Design" -- I go into the "physics behind partitioning".
In order to return the first 100 rows to a web based application -- you are doing to be using indexes (or you are not going to be doing what you signed up to do -- funny, you have an SLA in place but no idea if you can live up to it...). whether the tables are partitioned or not probably won't have any bearing on making this faster. Partitioning could certainly make things slower if you do it wrong, but in a system like this, you would be looking at partitioning to make administration easier and hopefully not NEGATIVELY impact performance.
given two tables...
index access
get 100 rows
I personally would be shooting for well under 1 second response times for everything -- regardless of whether there was 1 row or 1 billion.
don't get the tie into "query first and then update" though.
A reader, October 18, 2004 - 1:33 am UTC
"Our SLA is to display each *screenful* (100 rows) in 2-4 seconds."
In our case the queries, after joining 2 tables of 28 million and 35 million records, could return 10 screens or 600 screens or many more screens based on the query parameters. Each screenful (with 100 records in each screen) should appear in 2-4 seconds.
October 18, 2004 - 8:13 am UTC
so, you've signed up for an SLA you have no idea if you can meet.
but hey -- using indexes to retrieve 100 rows from 1, 1000, 10000000000000 should be about the same amount of time (and way under subsecond).
but -- getting to "screen 600" is not. Look to google as the gold standard for searching and web pagination
o totally estimate the number of returned rows -- don't even THINK about giving an accurate count
o don't give them the ability to go to page "600", pages 1-10 is more than sufficient
o even if there is a page 600 -- realize it doesn't make sense to go there (no human could know "what I need is on page 600 -- 6,000 rows into this result set). Google stops you at page 99
o understand that page 2 takes more time to retrieve than page 1, page 50 more than 2, and so on (as you page through google --each page takes longer)
But perhaps most importantly -- laugh at people that say things like:
"For instance
someone has reported that using global indexes on a partitioned table has
degraded the performance."
and say -- "yes, it has when improperly applied it can degrade performance, there have also been sightings of systems where it didn't affect performance at all, there have also been reported instances of them massively improving performance. Now what we have to do is understand how the feature works, what it does, and how it might apply to our problem"
The neat thing about that paragraph -- it is INFINITELY reusable. You can use it with regards to any feature!
(if you have effective Oracle by design -- i go into the "physics" of partitioning and who -- without the judicious use of global indexes, your system could fall apart and run really really slow as well)
Senior Statistical Analyst
Hank Freeman, October 18, 2004 - 5:58 pm UTC
18 October 2004
Tom,
Thanks for the lob_replace code !!!
It worked wonders when trying to fix about 25,000 clobs with a known error. Here my detailed discussion in outline and then in detail.
1. What type of data was is in the CLOB
2. What went wrong to create the error
3. What was done to correct it.
a. Stored Proc
b. Declare syntax
c. VBA code in Excel
1. What type of data was is in the CLOB. The company has about 45,000 internet/intranet web pages stored in an Oracle database table field which is a CLOB. Meaning instead of calling the website from a file server or physical location the site, the entire website-webpages/html source code is held in the table at this CLOB field.
2. What went wrong to create the error. An error occurred when these records were being modified to remove a specific piece of information and a replacement null character was inserted. The null character for some unknown reason did not work and the web-page information in the CLOB field got garbage appended to the end of the web-page after the closing </html> tag.
a. It was easy to spot the records but a conventional update within would not work because of all the size of the web-pages being so large and the special characters that were found within and the unique nature of working with clobs.
b. I contacted a friend who pointed me to your site and URL: </code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1533006062995 c. Upon examination of the last changes you made for a previous person, the solution came into focus clear on how to update the records and, it had to be done with a program.
3. What was done to correct the problem.
a. The last stored procedure you listed on your website was used, because it was applicable to our situation. This procedure allowed the creation of a specific version of the Declare sql to select the bad part of the CLOB record and replace it with a “ ” (space character) and not a null.
i. The Stored Proc syntax, is not listed since it is the Tom’s “lob_replace” syntax verbatim.
ii. The Declare statement is listed below would take the unique record id and the starting location of the bad information in the CLOB. The sql would then pass the CLOB, the data to find, and the replacing string (a space) to the stored procedure.
Declare Sql syntax
declare
l_lob clob;
l_substr varchar2(32767);
begin
select KS_DESCRIPTION into l_lob
from ks_longdescription
WHERE KS_LONGDESCRIPTIONID = 55012 for update;
select SUBSTR(KS_DESCRIPTION, 876, length(KS_DESCRIPTION) - 876) into l_substr
from ks_longdescription
WHERE KS_LONGDESCRIPTIONID = 55012;
hf_clob_replace (l_lob, l_substr, ' ');
end;
iii. It was decided to use Excel VBA to control the update process. It was contemplated to write a second procedure with for the declare statement and just call execute it with parameters. However, as you noticed the maximum value for the l_substr Varchar2 field was declared as 32767. It was already known that some of the erroneous parts of the CLOB data were larger than 25,000 characters. Therefore, Excel was used to ascertain which records were going to be within this limit and those outside a 25,000 character limit would require a manual fix.
iv. Approximately 6,000 records were loaded into an Excel worksheet and the following Macro code was written to process/update the records. This was the optimum method of performing the cleanup because the VBA code was written to decide:
1. Length of clob
2. Starting location of the junk in the clob
3. The unique record id was already known and with the aforementioned starting location being determined by the VBA code, it was a simple task to just pass there parameters so the VBA Declare SQL string and create a specific run for each qualifying record.
4. The VBA code then would execute the Declare SQL string, firing up the stored procedure hf_lob_replace, thus cleaning up the record.
5. If a record could be processed it was, otherwise it was marked for manual cleanup.
6. During the process it was determined that any l_Substr value greater than 20,000 would cause an Oracle error on the Buffer size. Therefore, upon see that the error would occur often any l_Substr(p_what) value greater then 20,000 was immediately marked for a “SQL Buffer Failure”, meaning it would have to be manually updated.. The important part here was that the process would not stop if did not like a specific record, it just kept going.
In Summary, it took less than ONE second to fix a record ,with an auto-commit turned on and we knew it was working for the process could be watched. Within four hours (6,000 per hour) using two systems, the 25,000 records were corrected with only 150 needing a manual fix, a highly acceptable result.
In closing, I would like to give credit to the following.
• Mr. Andy Sword, who is/was a subscriber to Tom’s web page and submitted to me the first draft of the Declare statement we used. This was what we really needed to know !!! I had never worked with CLOB, so this was a real leason.
• Tom’ Web page:
http://asktom.oracle.com/pls/ask/f?p=4950:1: • Tom Reed:
http://the-big-o.port5.com/index.shtml <code>where some of the best Excel/VBA code for interfacing with Oracle can be found.
• I have listed the VBA code below for those who are so inclined to read it. There is nothing there that is private and confidential. So anyone who can learn from it may feel free to do so.
Respectfully
Hank Freeman
FCS Ltd
hfreeman@msn.com
updating millions of records
Josphat, October 26, 2004 - 10:28 am UTC
You approach is quite ok for one line update but how about procedure which updates the next row based on value of previous row.
Kindly look at the following scripts and give hard suggestions on how 5 million rows could be processed in under 10 hours...capturing the business logic
----starting here
---3 Scripts to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)
---should exec as ordered
drop table temporary_emphist;
create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/
-----values(ID ,PAYDATE,AMOUNT , PPART, IPART, INTRATE, OBAL, REF, NOOFMONTHS, ACCRUED_INT)
-----data for different ids being inserted (one id might have as many as 300 entries)
-----but i have put just a few for each id
----set for customer id 10979844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10979844','30-jun-2003',2000,2,112000);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jul-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-dec-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jan-2004',1950,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jun-2004',5000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2004',4000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-oct-2004',3000,2);
----set for customer id 10779844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10779844','30-jun-2003',2000,2,99380);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jul-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-dec-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jan-2004',1950,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jun-2004',5000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2004',4000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-oct-2004',3000,2);
----set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10679844','30-jun-2003',800,4,40390);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2003',1500,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2003',1600,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-dec-2003',2000,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jan-2004',1950,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-mar-2004',1950,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2004',1000,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2004',1000,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-sep-2004',1992,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-oct-2004',1200,4);
commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;
create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);
---script 1 (will number transactions for each customer from 1 to ..n then start from 1 for the next customer
DECLARE -----Procedure 19
V_2 NUMBER(7) :=1;
V_1 VARCHAR2(18);
v_3 VARCHAR2(18);
v_4 VARCHAR2(18) :='T';
v_5 date;
CURSOR DRECS IS
SELECT ROWID,id,paydate
FROM temporary_emphist
ORDER BY ID,paydate;
BEGIN
OPEN DRECS;
LOOP
FETCH DRECS INTO V_1,v_3,v_5;
EXIT WHEN DRECS%NOTFOUND;
if v_3 <> v_4 then
v_4 := v_3;
v_2 := 1;
end if;
UPDATE temporary_emphist
SET REF = V_2
WHERE ROWID = V_1;
V_2 := V_2+1;
END LOOP;
commit;
CLOSE DRECS;
END;
/
select * from temporary_emphist
order by id,paydate;
-----script 2
----gets the difference in months from one transaction to the next
DECLARE -----Procedure 20
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) :='0';
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 DATE;
V14 DATE;
V13 NUMBER(13,2);
V15 number(5);
CURSOR UPLO IS
SELECT REF,OBAL,amount,ID,INTRATE,last_day(PAYDATE) FROM temporary_emphist
where nvl(ref,0)>1
ORDER BY ID, REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
---V13 :=V12;
uPDATE temporary_emphist
SET NOOFMONTHS=mONTHS_BETWEEN(v12,last_day(paydate))
WHERE NVL(REF,0) = V_1-1 and id = V_7 ;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
---script 3
-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
---as per the numbering above(script 1) value of rows for customer
----will be manipulated to get values for the subsequent row though some input from the
----current row will be integrated with derivative of the previous to populate values into the
----current row. This is applied for each customer untill all are cleared
SELECT 'ESTABLISHING BALANCES ON REDUCING BASIS-----Procedure 21 STARTS ' FROM DUAL;
DECLARE -----Procedure 21
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) ;
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 NUMBER(15,2) :=0;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold
v16 number(15,2) :=0; ---ipart hold
V20 VARCHAR2(18) :='T';
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist
ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
if v_7<>v20 AND V_1=1 then
v20 :=v_7;
V_4 :=V_2-V_3;
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2);
v15 :=v_4;
ELSE
select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11,V12 from temporary_emphist
where NVL(ref,0) = V_1 - 1 AND ID = V_7 ;
select nvl(amount,0) into v_3 from temporary_emphist
where NVL(ref,0) = V_1 AND ID = V_7 ;
v15 := ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-v_3; -------obal
v14 :=ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-V_4; ---accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V_1 AND ID = V_7 ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
I dont know whether there is a faster way to employ the logic...I've seen your authority..give it a short
October 26, 2004 - 10:42 am UTC
one word for you:
analytics
they rock, they roll. lag/lead -- i frankly did not really read all of your code (i'm not a compiler, v_1, v_2 etc make it really truly utterly hard to follow and comprehend) but if you need to update a row based on the prior row(s) after sorting -- analytics -- lag, lead, sum, whatever.
database
cp, October 26, 2004 - 10:45 am UTC
Hi Tom,
I have to appear for interview and need your help.
1. Hows to decide for datafile size?
2. How to decide for redolog file size?
3. Any other factors which you feel as interviewer to ask?
Please help me out as its really urgent
Regards
October 26, 2004 - 10:47 am UTC
how do you know they'll ask these questions -- not questions I would ask.
datafile size = pure function of amount of data you want to store. I mean, well, what else could you say?
redolog file size is a function of
o amount of redo generated by most of your transactions
o the number of these transactions you do in a period of time
predominantly (but if you are an experienced dba -- why not just fall back on your knowledge of what you've done in the past?)
problem
cp, October 26, 2004 - 1:42 pm UTC
So untill or unless we get some opportunity we cant learn the skills to decide for initial size of database right?
Please provide words about or facotrs to think on this
Thanks!
October 26, 2004 - 2:43 pm UTC
experience.
you get an entry level job (you have to sort of start in the basement) and get experience.
update of millions of records/rows
Josphat, October 29, 2004 - 10:29 am UTC
Hi Tom,
Thanks for decisive debate in this site.
Kindly go through my script cos now I have elaborated on
the v_1,v_2..etc just variables as you will see...SORRY for MY FIRST SUBMISSION. I will be grateful if ya suggestion /code
could reduce my processing time for 5 million records from
15 hrs to 1, 2, 3? .
---Script to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)
-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
-----EACH customer HAS OWN ROWS STARTING 1..N AND REPRESENTED BY REF COLUMN ON TABLE
-----MULTIPLIER FACTOR FOR EACH MONTH IS REPRESENTED BY NOOFMONTHS
-----will be manipulated to get values for the subsequent row though some input from the
-----current row will be integrated with derivative of the previous to populate values into the
-----current row. This is applied for each customer untill all are cleared
-----PROCESSING IS SEQUENCIAL (I.E FROM REF 1 ..N FOR EACH CUSTOMER)
-----REF =1 FOR EACH CUSTOMER HOLDS A VALUE FOR OBAL WHICH WILL BE USED TO FILL UP
-----THE SUBSEQUENT ROWS AFTER MANIPULATION WITH MULTIPLIER FACTOR
----work starts next
drop table temporary_emphist;
create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/
----DATA set for customer id 10979844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10979844','30-jun-2003',2000,2,112000,1,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jul-2003',2000,2,2,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2003',2000,2,3,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-dec-2003',2000,2,4,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jan-2004',1950,2,5,5);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jun-2004',5000,2,6,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2004',4000,2,7,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-oct-2004',3000,2,8,0);
----DATA set for customer id 10779844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10779844','30-jun-2003',2000,2,99380,1,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jul-2003',2000,2,2,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2003',2000,2,3,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-dec-2003',2000,2,4,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jan-2004',1950,2,5,5);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jun-2004',5000,2,6,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2004',4000,2,7,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-oct-2004',3000,2,8,0);
----DATA set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10679844','30-jun-2003',800,4,40390,1,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2003',1500,4,2,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2003',1600,4,3,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-dec-2003',2000,4,4,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jan-2004',1950,4,5,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-mar-2004',1950,4,6,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2004',1000,4,7,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2004',1000,4,8,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-sep-2004',1992,4,9,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-oct-2004',1200,4,10,0);
commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;
create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);
---script STARTS
DECLARE
V1_REF temporary_emphist.REF%TYPE;
V2_OBAL temporary_emphist.OBAL%TYPE;
V3_AMOUNT temporary_emphist.amount%TYPE;
v_4 NUMBER(15,2) :=0;
V5 NUMBER(15,2) :=0;
V6 NUMBER(15,2) :=0;
V7_ID Temporary_emphist.ID%TYPE;
V8 NUMBER(15,2) :=0;
v9 number(15,2);
v10 number(35,10) := ROUND(1/12,10); ---CONVERTS ONE MONTH INTO YEARS
V11_INTRATE temporary_emphist.INTRATE%TYPE;
V12_NOOFMONTHS temporary_emphist.NOOFMONTHS%TYPE;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold AS CALCULATED
v16 number(15,2) :=0; ---ipart holdING INTEREST AS CALCULATED
V20 VARCHAR2(18) :='T'; ----ARBITRALY KEY TO ENABLE FOCUS ON DIFFERENT IDS
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist
ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V1_REF,V2_OBAL,V3_AMOUNT,V7_ID,V11_INTRATE,V12_NOOFMONTHS;
EXIT WHEN UPLO%NOTFOUND;
----V20 IS VARIABLE KEY FOR CHECKING WHETHER WORKING ON SAME ID
if v7_ID<>v20 AND V1_REF=1 then
v20 :=v7_ID;
V_4 :=V2_OBAL-V3_AMOUNT;
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2);
v15 :=v_4;
ELSE
select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11_INTRATE,V12_NOOFMONTHS from temporary_emphist
where NVL(ref,0) = V1_REF - 1 AND ID = V7_ID ;
select nvl(amount,0) into v3_AMOUNT from temporary_emphist
where NVL(ref,0) = V1_REF AND ID = V7_ID ;
--------V10 IS ONE MONTH COVERTED INTO YEARS
v15 := ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-v3_AMOUNT; -------CALCULATED obal
v14 :=ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-V_4; ---CALCULATED accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
spool off
October 29, 2004 - 10:49 am UTC
this isn't the right place for "recode my application".
show me your attempt at using analytics.
(UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;
is still really obscure, comments didn't "do much" really)
Reader
dheeraj, November 21, 2004 - 3:35 pm UTC
Excellent discussion as usual on this site.
Selecting data over a database link from 7.3.4 to 10g
Stewart W. Bryson, December 02, 2004 - 10:54 am UTC
Tom:
If I use the "INSERT /*+ APPEND */ ..." statement on a 10g database, but the "SELECT..." part of the statement is over a database link to a 7.3.4 database, would the bulk-loader still be invoked, or would the hint be ignored?
Also, if I do a "MERGE... " statement with the "INSERT..." part of the statement being an "INSERT /*+ APPEND */...", would the MERGE work correctly, and would the bulk-loader be used?
Thanks for your time.
December 02, 2004 - 11:54 am UTC
trick question
it isn't going to work -- 9ir2 and later do not interoperate with 7.3 and earlier.
if it did work
insert /*+ append */ into local_table select * from remote_table@dblink;
could be direct pathed.
insert /*+ append */ is supported only for insert as select -- not insert values.
ops$tkyte@ORA9IR2> create table t ( x varchar2(25), y varchar2(25) );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> merge into t
2 using dual
3 on ( t.x = dual.dummy )
4 when matched then update set y = dummy
5 when not matched then insert /*+ append */ values ( dummy, null );
1 row merged.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t;
X Y
------------------------- -------------------------
X
ops$tkyte@ORA9IR2> insert /*+ append */ into t values ( 'y', null );
1 row created.
ops$tkyte@ORA9IR2> select * from t;
X Y
------------------------- -------------------------
X
y
ops$tkyte@ORA9IR2> insert /*+ append */ into t select 'z', null from dual;
1 row created.
ops$tkyte@ORA9IR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ops$tkyte@ORA9IR2>
<b>only the last insert append was a direct path...</b>
A follow-up question
Stewart W. Bryson, December 02, 2004 - 1:20 pm UTC
Thanks for the good information Tom.
I've looked and I've looked, but I still cannot find the information about 9iR2 and 7.3.4 database links stated in the documentation. As a matter of fact, the 9iR2 Migration guide lists that 7.3.4 Materialized View sites can interact with 9iR2 master sites, and vice versa:
</code>
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96530/migrep.htm#620714 <code>
As Advanced Replication utilizes database links, this seems a little contradictory. Of course, if you tell me once more (I'm a bit stuborn) that it won't work, I promise to believe you.
As I am currently architecting a solution to try and bring some of the data from our European sites into a central repository, and as I don't yet have access to those sites, I cannot simply test it as we have no 7.3.4 sites (thank God) in the US.
I appreciate your time and understanding.
December 02, 2004 - 7:21 pm UTC
contact support -- they will confirm
version 7.3 was so far out of any sort of support at the time of 9ir2 being released that they never even considered testing it. and it does not work.
I believe that documentation to be erroneously left over from 9ir1.
mg, December 03, 2004 - 10:30 am UTC
Hi Tom,
I have a table T in tablespace TS1.
Every day so much of insertions on table T.
So I needs to check the a perticular size of TS is exceeds, if then I needs to move the older data to another tablespace.
So could you please tell me the best way to do this?
meny thanks in advance.
December 03, 2004 - 11:01 am UTC
you sound like you really want a rolling window partitioned table -- a table partitioned by date.
A reader, December 06, 2004 - 10:20 am UTC
Can we use trigger to check the % of filled tablespace size ?
December 06, 2004 - 12:35 pm UTC
yes, but -- why? just to make the code run really really slow?
What about the larger tables?
Steven, December 15, 2004 - 3:44 pm UTC
I used to work at a place where the main table had 4 billion rows and growing. This tracked manufacturing details and rows for an item trickled in and then were "archived" (written to tape then deleted) as one. Finding a good partitioning scheme was difficult due to the large variable time that items could stay in the manufacturing process. That is, two rows inserted on the same day could be archived on vastly different timescales.
The system had around 40 million rows inserted and deleted on a daily basis (about 1%). Some of the items could have 1 million or more records; they needed to be archived at the same time. The table had a PK and a secondary index.
Obviously CTAS was not an option due to the 24x7 availability requirement and besides we usually didn't have enough space for a full copy of the main table.
Is this a case where you just kick off the delete and take your lumps?
(Hmm, given past growth curves, the table should by now be up to 20 billion rows and 200 million rows in/out daily.)
December 15, 2004 - 6:25 pm UTC
or, enable row movement....
partition by a date field, the date field represents "record done as of"
when you initially insert it, it has a "done as of" way into the future (in the active partition(s))
when you are done, you update this field and it moves from active into the "you can archive me later" partitions.
It would be like the delete but the delete would be happening continously as little transactions instead of a big delete all at once.
data warehouse daily load
shgoh, December 16, 2004 - 3:58 am UTC
Dear Tom,
One of my dw developer is having performance problem with their daily data load from legacy system. Approximately 5-8 million records inserted into a table (append load table). The table has a lots of constraints and none of us are familiar with the data given (it could have a constraint voilation).
What is the method to speed up the data loading process beside parallel, direct load and nologging ?
Appreciated if you can share your view. Since we don't know well about the data and the constraints, is it advisable to disbale constraint in this case. Thanks
Rgds
SHGoh
December 16, 2004 - 8:18 am UTC
parallel
direct
nologging
they are them. how long is "long"
if you are adding 5/8 million records a day -- are you using partitioning? you can look at disabling portions of partitioned indexes and rebuilding them after the fact.
Not a partionioning table
shgoh, December 17, 2004 - 4:14 am UTC
Dear Tom,
It is not a partioning table. Would it help if I go for parallel , nologging, direct and disable portion of the constraints? Thanks
Rgds
Shgoh
December 17, 2004 - 2:30 pm UTC
you are already doing parallel, nologging, direct -- disabling the constraints only means "gotta enable them".
you asked for additional ideas. partitioning.
How about Bulk Update?
Tony, December 22, 2004 - 5:13 am UTC
Tom,
I have two tables PO_HDR and PO_DTL having millions of records. I want to join the two tables, and update a flag in PO_HDR table. Can I use Bulk Update based on rowid here?
Example:
DECLARE
TYPE tblrowid IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
arr_rowid tblrowid;
CURSOR po_cur
IS SELECT a.rowid FROM po_hdr a, po_dtl b
WHERE a.poseq = b.poseq
AND a.custid = b.custid
AND b.po_qty <> b.del_qty;
BEGIN
OPEN po_cur;
LOOP
FETCH po_cur BULK COLLECT INTO arr_rowid LIMIT 100000;
EXIT WHEN arr_rowid.COUNT = 0;
FORALL i IN arr_rowid.FIRST..arr_rowid.LAST
UPDATE po_hdr
SET del_flg = 'Partial'
WHERE rowid = arr_rowid(i);
COMMIT;
END LOOP;
CLOSE po_cur;
END;
/
Is rowid based bulk update is fast, and reliable? Is there any better, and faster approach?
Please advice me.
December 22, 2004 - 9:41 am UTC
update po_hdr
set del_flg = 'Partial'
where (poseq,custid) in ( select poseq,custid
from po_dtl
where po_qty <> del_qty );
period.
To update or insert by sql statement and not procedurally
DAV, December 23, 2004 - 2:44 pm UTC
Hi,Tom,
I have 2 questions:
The one is :
By the update - sql statement that you wrote above
say i want to execute on millions records, and i want to make commit every 1000 records
how can i do it?
The second question is:
If i have to use pl/sql table and i want to load
alot of records to the table say 70,000- have i do it by:
fetch bulk bind limit 1000
or SELECT ... BULK BIND ....?
Why is it a problem to load many records to pl/sql tables?
Thank's
DAV
December 23, 2004 - 3:08 pm UTC
you don't want to commit every 1000 records, that would take longer, consume more resources and break your transactional integrity.
if you have 70k rows to process, and want to do it 1k at a time, you'll be using fetch ... bulk collect ... limit 1000;
can you spell "memory"
Tom,sorry i didn't understand you.
DAV, December 23, 2004 - 3:49 pm UTC
Why fetch is better than sql statement at the case that written above. You told before that sql is better.
Can you answer me please to the second question thar i wrote above?
Tnank you very much.
DAV
December 23, 2004 - 4:30 pm UTC
not understanding you.
you asked:
If i have to use pl/sql table and i want to load
alot of records to the table say 70,000- have i do it by:
fetch bulk bind limit 1000
or SELECT ... BULK BIND ....?
Why is it a problem to load many records to pl/sql tables?
and I answered??? the question wasn't "single sql vs procedural code"
you started with "if I HAVE TO USE..."
you would not select BULK COLLECT for 70k records, you would use fetch bulk collect with the limit clause.
I was unclear - let me explain
A reader, December 23, 2004 - 5:12 pm UTC
Sorry ,Tom,
i didn't explain myself well.
What i meant was:
First:
if i have to update million records in the plsql procedure
is it better to do it in update sql statement or procedural kode like
fetch.....? i want to commit every n records because of rollback segment so if i use sql update statement how can i do it?
Second:
General question - IF I USE PROCEDURAL CODE - why it is not good to load say 40k of records to plsql table by SELECT...BULK BIND... but by fetch bulk bind limit ...
As i know plsql table's records are kept in pga memory,so what happend if there will be 40k of records?
Please,clarify the theory.
Thank you very much.
December 23, 2004 - 8:03 pm UTC
you need to size your rollback segment large enough to do you job. seriously -- really you do.
a million records is nothing.
big_table@ORA9IR2> update big_table set owner = lower(owner);
1000000 rows updated.
big_table@ORA9IR2> select used_ublk*8/1024 from v$transaction;
USED_UBLK*8/1024
----------------
92.4921875
big_table@ORA9IR2>
93meg, big deal.
If you commit every N records -- you do realize you have to make your code restartable somehow, so when you inflict the ora-1555 on YOURSELF by committing, or fail for whatever other reason -- you can restart where you left off. Now we are talking some code!
If you have to update tons of records and tons of columns AND they are indexed, disable indexes, consider recreating the table instead of a mass update (if you are updating millions of records -- this is going to be an offline thing anyway, you might be supporting queries but you are not going to have "high concurrency")
can you spell "memory"
and think about streaming -- keeping everyone busy. So you wait for a long time while oracle gets 40k records, jams them into ram, manages them, then you process them and do it all over. doing things in bite sized pieces is easier for everyone - couple hundred in , couple hundred out.
Update 1000000 in sql statement
A reader, December 24, 2004 - 1:24 am UTC
Thank's, Tom!
What is the message ora-1555?I haven't a guide.
And i really can't findthe way to mmit n rows in update sql statement.May you give me some idea?
About loading to PL/SQL table for example 80k, if i need procedural code of update at least 1000000 records and not update sql statement so why can'not i avoid pl/sql table and do it just by regular cursor?
December 24, 2004 - 9:01 am UTC
search this site for 1555, you'll see lots and lots. check out Expert one on one Oracle -- i wrote about it lots.
YOU DO NOT WANT TO COMMIT THE ROWS INCREMENTALLY. that is my fundemental premise here.
If you have access to my book "Expert one on one Oracle" -- you can read all about it.
Another question
A reader, December 24, 2004 - 3:58 pm UTC
About loading to PL/SQL table for example 80k, if i need procedural code of
update at least 1000000 records and not update sql statement so why can'not i
avoid pl/sql table and do it just by regular cursor?
Thank's
December 24, 2004 - 4:06 pm UTC
slow by slow is slow by slow (opps, keep doing that don't I, row by row is row by row)
Sorry,misunderstanding.
A reader, December 25, 2004 - 4:21 am UTC
I didn't explain myself well.
What i meant in my question is a generall question:
If a need a cursor ( and not sql statement) for some process that acceses to a big table,why is it better to load data to pl/sql table and run on plsql table than just fetch records directly from the db table ?
December 25, 2004 - 9:10 am UTC
I understood, you didn't understand me.
slow by slow processing (a row at a time)
is slower than
o a single sql statement
o bulk collect and forall updates
doing things in BULK is
o faster
o uses less resources
index DDL in the midst of transactions
Nik Malenovic, December 30, 2004 - 3:36 pm UTC
Tom,
this is a followup to your March 07, 2004 followup related to ddl operations on indecies whilst there are outstanding transactions in progress on the underlying table. You said
"that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it."
We just ran into a similar production problem: the system was slow because we were missing an index on a column and there were updates running full table scans (don't ask!). Since there are users banging on the darn thing, there was no way to shutdown and add index so we had to do it on the fly. Here's what we did:
declare
l_attempt pls_integer := 0;
l_limit pls_integer := 2000*10; -- 10 seconds
l_rindex binary_integer;
l_slno binary_integer;
l_target binary_integer;
begin
l_rindex := dbms_application_info.set_session_longops_nohint;
l_target := 42;
dbms_application_info.set_client_info('Vincent The Builder');
dbms_application_info.set_module('rebuild_index', 'between_transactions');
for l_attempt in 1 .. l_limit
loop
begin
dbms_application_info.set_session_longops
(
rindex => l_rindex,
slno => l_slno,
op_name => 'rebuilding the index',
target => l_target,
context => 0,
sofar => l_attempt,
totalwork => l_limit,
target_desc => 'attempt',
units => 'attempts'
);
execute immediate 'create index RESPONSE_QUESTIONID_INDX on response (questionid) nologging parallel 5';
dbms_output.put_line('SUCCESS - now rebuilding');
exit;
exception
when others then
null;
end;
if l_attempt = l_limit then
dbms_output.put_line('FAILED - there was not a phat chance to succeed');
end if;
end loop;
dbms_application_info.set_module(null, null);
dbms_application_info.set_client_info(null);
end;
/
and in another session we monitored the progress with
select sid, message from v$session_longops where time_remaining > 0;
that's all. This may be useful for somebody in a similar situation, hence the post.
thanks,
Nik
December 30, 2004 - 4:10 pm UTC
you could try (for maximum "uptime")
create index .. ONLINE;
that would
a) block and wait for outstanding transactions to complete but...
b) block new transactions from starting and
c) once it started creating -- lets the new transactions proceeed and
d) at the very very end -- does a/b again to finish
reduces the amount of "downtime" by allowing people to still work in the middle (yes, they'll be full scanning, but working -- if that was something you wanted..)
CTAS doesnt have compute statistics?
A reader, January 06, 2005 - 11:15 am UTC
create index .... compute statistics;
Why doesnt create table have a similar compute statistics clause?
I have a CTAS that takes 5 hours during which Oracle already knows everything about the data (similar to create index). Gathering stats on this table later takes another 5 hours!
Is there a way to gather stats during the CTAS itself?
Thanks
January 06, 2005 - 11:31 am UTC
Oracle does not already know about everything.
Have you traced a dbms_stats.gather_table_stats to see how many queries, in how many different ways it gathers stuff?
about the only thing Oracle knows after a CTAS is number of rows and number of blocks (and so do you, you can set that)
there is no way it was computing histograms and such
I don't believe it takes 5 hours after the fact -- if it does, have you considered "estimate"
robin, January 18, 2005 - 8:53 am UTC
I am trying to modify your code from above:
/**********************************************/
set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )
/***************************************************/
to allow me to update one column based on several different where clauses and having no luck.
What I want to do is:
update table t1
set t1.c1 = 'X' where t1.c5 and t1.c6 are null
set t1.c1 = 'A' where t1.4 is not null and t1.c8 > sysdate
set t1.c1 = 'B' where t1.4 is null and t1.c8 < sysdate
etc....
I've tried a case statement and decode like above, but neither seems to compile....thanks
January 18, 2005 - 10:07 am UTC
well, in answer to the "what I want to do is"
update table t1
set t1.c1 = (case when c5 is null and c6 is null
then 'X'
when c4 is not null and c8 > sysdate
then 'Y'
when c4 is null and c8 < sysdate
then 'B'
end )
where (c5 is null and c6 is null )
or (c4 is not null anc c8 > sysdate)
or (c4 is null and c8 < sysdate)
I cannot comment on the other one -- well, because there isn't anything to comment on :)
updating large table
Tracy, January 20, 2005 - 9:50 am UTC
I have table_a with 50 million rows and table_b with 10 million rows.
table_a has a new column added called statid
Table_b has statid and tabaid. tabaid has a unique constraint on it and it is also a foreign key to the primary key of table_a (id)
I want to update all rows in table_a to set the statid column to match the statid column from table_b where the tabaid column in table_b = the primary key in table_a
which of the following two do you think is better, and is there some other altogether different way which would be better still?
UPDATE
(SELECT a.statid newval
,b.statid oldval
FROM table_a a
,table_b b
WHERE a.id=b.tabaid)
SET newval = oldval
or
UPDATE table_a a SET a.statid =
(SELECT b.statid
FROM table_b b
WHERE a.id=b.tabaid)
I've tested the above on a subset of the data using rownum, and both statements seem to update at a rate of about 300,000 rows per hour, but I would like to get it going faster than this.
January 20, 2005 - 10:46 am UTC
update the join.
expect to see a nice big juicy full scan and hash join. it'll be as good as it gets.
I have a similar situation where I must update millions of records fast
Ryan, February 09, 2005 - 10:58 pm UTC
First I want to say thank you very much for all your help this last week Tom, it has helped me out a lot. Unfortunately Oracle is not easy so I have another question.
I am updating millions of records using insert statements.
I am updating the records using insert with values. I'm not updating one table using another table.
example:
insert into dbname (FirstName, Number) values ('Ryan', 1);
called with different records millions of times.
When I call "commit" at the end of all the insert's it takes more than 17 hours to commit. Of course this was importing records from a 1.5 gigabyte text file with 83000 records. The time to call all insert statements was less than a half an hour. Can you think of any reasons why the commit would be so slow. There is another program at the office that can import the same data (in another database format) in less than 20 minutes and that program is calling insert and commit in a similar fashion so it can't be that oracle is just that slow. I'm also disabling all triggers at the beginning of the import, and enabling them at the end. I'm turning off logging and using /*+ APPEND */ and parallel hints but they probably don't help as I'm not using direct load inserts.
Any help would be greatly appreciated.
February 10, 2005 - 1:07 am UTC
<quote>
I am updating millions of records using insert statements.
</quote>
really?
sorry-- but i don't believe you on your commment about commit, you are mistaken.
committing is fast, regardless of the number of records. Your method of observation is somehow "flawed".
Unless of course you have an "on commit refresh" set of mv's or something.
So, turn on 10046 level 12 trace and trace it, use tkprof to report on it and see what you see.
query retrival slow it effects in the whole system
Praveen, February 10, 2005 - 3:41 pm UTC
hi Tom
When developer using the escape sequence in the selection criteria the response time is slow upto 15sec. this delay time effect the whole system. Kindly suggest the appropriate way to resolve this.
select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/UNDER~_BAR/%' escape '~')
and (lower(r_object_id) <> lower('f004000000010471')))
takes about 15 secs to run
select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/NOBAR/%')
and (lower(r_object_id) <> lower('f004000000010471')))
takes less then one second.
Perhaps this is helpful ... if I add the escape '~' specifier to this query it also runs slowly even though it doesn't use the escape character.
regards
PKR
February 11, 2005 - 7:38 pm UTC
so, why do you believe this has anything to do with escape?
(you do understand these two queries return totally different result sets and perhaps that -- that is the "difference")
have you tkprof'ed these queries and compared them?
RE : Row Migration
A reader, February 17, 2005 - 3:35 pm UTC
Hi Tom,
You mentioned in one of the above discussion threads that while updating a null-column to a not null value we have to make sure that we have enough PCTFREE and no row migration happens.
I have a table that gets 2 million rows on a daily basis and I want to update say a column for only 50000 rows.
1. What should be my approach
A. if the column to update is indexed.
B. if the column to update is not indexed
C. How do I find out row migration that may happen
during the update
Thanks
February 17, 2005 - 6:58 pm UTC
regardless of whether the column is indexed or not -- the approach would be "the same". the index will make the update take longer as it does lots more work, but for just 50,000 rows -- it won't be a very big deal.
2,000,000 and 50,000 are only big if it is US dollars and it was your bonus and paycheck. In a database, in 2005, they are very small numbers.
ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> update big_table set owner = owner||'x' where mod(id,40) = 0;
50000 rows updated.
Elapsed: 00:00:18.02
ops$tkyte@ORA9IR2> create index big_table_idx on big_table(owner);
Index created.
Elapsed: 00:00:27.84
ops$tkyte@ORA9IR2> update big_table set owner = owner||'y' where mod(id,40) = 0;
50000 rows updated.
Elapsed: 00:00:28.04
ops$tkyte@ORA9IR2> select count(*) from big_table;
COUNT(*)
----------
2000000
Elapsed: 00:00:03.02
ops$tkyte@ORA9IR2>
to discover if rows migrated, you would analyze the table (perhaps using the list chained rows, if you just analyze it, you'll get chain_cnt filled in the dictionary)
you cannot find out "before" the update what will happen.
RE : Update
A reader, February 18, 2005 - 8:59 am UTC
Tom,
I think I did not make the last statement regarding the UPDATE of 50000+ rows more clear. Here is my requirement :
1. I have a flat-file coming in to our system around 10.00 AM in the morning with over 2 million records and has the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS
....
....
2. I have another flat file (a delta file containing only data that has changed) coming in on a daily-basis around 2.00 AM in the morning with around 50000+ records and has the following fields :
ACCOUNT_NUMBER
SECURITY_ID
.....
.....
which we directly load into a table B with the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS <== which is set to NULL during this load
.....
.....
3. Once this file is loaded, we run a process which takes unmodified data (from prior run ie. which did not appear in file in step 2) so that the table B always has full-load which is the 2 million records I was talking about
4. We need to run a process that will update the COST_BASIS field only for modified data (ie. that appeared in today's delta load which will be the 50000+ records or so that I need to update. For data is brought over from previous run, we use prior run's value itself)
In order to perform this, I thought of using the 2 million file as an external table (say table A) and join this table with B and update the COST_BASIS from A to B for the modified records. We have the logic to identify the changed records however. But my concern is that the table A has to be full-scanned for 50000+ records from B and no indexes can be created on A.
I kindly need your suggestion as to what would be the best approach in tackling this problem.
February 18, 2005 - 9:24 am UTC
why would A be full scanned more than once in a join?
RE : Full Scan
A reader, February 18, 2005 - 10:13 am UTC
Tom,
This is because I thought if we join two tables, then it will do a Nested Loop join. For each row got from table B, it would have to full scan A to get matching rows and also that B (delta table) would be my driving table (since it has fewer number of rows to start with)
If my understanding is not correct, please advise.
February 18, 2005 - 1:59 pm UTC
hash join it would do in light of no indexes.
hash join it would do even in the presence of indexes because the CBO is smart and knows indexes are evil when most or all of the rows are going to be processed.
it would be a nice big juicy fast hash join.
Use MERGE command to combine steps 3 and 4
Logan Palanisamy, February 18, 2005 - 12:51 pm UTC
Combine steps 3 and 4 with the merge command. Your problem is solved. The External table with 2 million records will be scanned just once.
MERGE INTO B
USING A
ON (B.ACCOUNT_NUMBER = A.ACCOUNT_NUMBER)
WHEN MATCHED THEN UPDATE
SET B.COST_BASIS = A.COST_BASIS
WHEN NOT MATCHED THEN INSERT (B.ACCOUNT_NUMBER, B.SECURITY_ID, B.COST_BASIS, ..) VALUES (A.ACCOUNT_NUMBER, A.SECURITY_ID, A.COST_BASIS, ..)
"A" is your external table with 2 million records.
"B" is the one that initally contains 50,000+ records. After the merge, B will have 2 million records.
The INSERT clause is step number 3
The UPDATE clause is step number 4
Hope that helps
Delete Record !!!
Anupam, February 28, 2005 - 6:49 am UTC
What is the diffence between :
1. DELETE <table> TRUNCATE WHERE <clause>;
2. DELETE <table> WHERE <clause>;
?? Both works!!!
February 28, 2005 - 8:01 am UTC
just like
ops$tkyte@ORA9IR2> delete t this_is_a_correlation_name where 1=1;
27794 rows deleted.
works -- truncate is just being used as a correlation name there (a table alias)
Delete record
Anupam, February 28, 2005 - 6:50 am UTC
What is the diffence between :
1. DELETE <table> TRUNCATE WHERE <clause>;
2. DELETE <table> WHERE <clause>;
?? Both works!!!
Deleted message
Franc Uwa, March 03, 2005 - 10:11 am UTC
the deleted was of today and trying to see if you can help me to resend those deleted message of me back to my inbox
March 03, 2005 - 10:28 am UTC
huh?
Brad Doczkat, March 10, 2005 - 10:51 am UTC
What do you do when the resident DBAs do not approve of the use of this method? How do I convince them that this approach is much more streamlined and requires far less processing time than doing an update of millions of rows the old fashioned way? For my particular need that brought me to this thread, I only have to do a one time initialization of a one character column that I want to put a not null (with default) contraint on. There are 5.5 million rows in the table. I need to do this one-time as an implementation step, yet the DBAs do not seem to buy into this approach. They would rather update so many rows a night, over the course of several nights, doing frequent commits during the update process.
March 10, 2005 - 7:11 pm UTC
tell the dba's to do the work then?
Sorry -- I just don't know how to make your dba's do something they don't want to.
Help Please I don't Understand
A Reader, March 11, 2005 - 4:20 pm UTC
Tom:
I have a project due tomorrow that I am getting paid a boat load of money for and do not have a clueÂ… Could you write it for me without being compensated? Oh and answer my interview questions for me as well.
Just joking ;)
As always you have the best info on your site (dare we say better than meta link?).
However I have made it through two threads in just over 9 hours. Have you ever thought of hiring a few people and extracting: best part of thread ; lessons learned; sample code (I still have not found the latest and greatest 10g show_space); and a glossary for items like “CTAS”, “(sic)”, “SLA”, “FBI”? Perhaps make a book out of it and sell it but it would have to be a searchable electronic copy. (Hmm, WIKI enable a portion of asktom (naa)?)
In this thread alone I found: qÂ’| quoted strings |Â’; CTAS for speed; I really need to start using hints; I need to look into Analytics; I have to check if my DW really needs partitionsÂ…
Thanks for everything.
A Reader.
(Guess response: This is a feature requestÂ… Denied :)
March 12, 2005 - 9:58 am UTC
I do that -- it is called "a book" :)
Effective Oracle by Design -- best of asktom, with scripts as of the time of the book.
Expert One on One Oracle -- what you need to know, with scripts -- second edition under construction (in fact I'm going to take April "off" -- from asktom -- in order to just get it going)
Expert One on One Oracle second will be the 10g version of what you are looking for.
Alberto Dell'Era, March 12, 2005 - 10:16 am UTC
> best of asktom, with scripts
> as of the time of the book.
Yes, but that doesn't mean that the books are just a collection of q&a from asktom (as i've found many ones believe); they are "regular" books - the q&a are mainly used as "Customer-driven Topic Selection", so to speak...
March 12, 2005 - 10:20 am UTC
correct -- definitely not a collection of asktom articles, that would be disjointed at best. Rather, a flow -- a story -- with examples from the site to help it along. The things I pick to write about are driven very much by the questions and trends I see here.
A reader, March 13, 2005 - 2:45 am UTC
Hi Tom,
Regarding...
"in fact I'm going to take April "off" -- from asktom "
What will happen to desparate people, who are looking for answers supported by technical reasons? I am sure that others will be of the same opinion that this is one of greatest help ever one can get in Oracle in timely and accurate manner...
March 13, 2005 - 9:54 am UTC
start here:
</code>
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
and try:
groups.google.com -> comp.databases.oracle.*
the place where I got hooked on discussing Oracle in the first place way back in October of 1994. You might get an answer from Jonathan Lewis, Connor McDonald, or a host of others. I still lurk there from time to time.
otn.oracle.com -> discussion forums
metalink.oracle.com -> discussion forums
the oracle-l list
http://www.freelists.org/archives/oracle-l/ <code>
Where Cary Millsap or Steve Adams or Tim Gorman to name but a very few might answer your question.
You might even find yourself answering some from time to time!
dblink
shiva, March 23, 2005 - 2:49 am UTC
We are trying to access a stored procedure which is existing in another database thro dblink, in that stored procedure one update stat.. is there because of that screen gets hanged
Regards,
Shivachandramouli.R
March 23, 2005 - 8:46 am UTC
so little data... 'one update stat' what is that. and what have you done to diagnose the issue?
questions in the dependent objects after
A reader, March 27, 2005 - 10:03 pm UTC
Hi Tom,
As u suggested us to create a new table ,then drop the original table and rename
sounds the great method update the table by CTAS, drop original table and then rename it to the original table.
questions all the dependent objects such as packages, views, synonynums... all will because invalid. do these need to be recompiled? I know all the grants have to be regrants.
Thanks.
March 28, 2005 - 7:18 am UTC
dependent objects will take care of themselves next time they are executed.
A reader, March 27, 2005 - 10:06 pm UTC
Hi Tom,
Sounds the great method to update the table by CTAS, drop original table and then
rename it to the original table.
Question is all the dependent objects such as packages, views, synonynums... all
will become invalid. Do these need to be recompiled? I know all the grants have
to be regrants.
Thanks.
A reader, March 27, 2005 - 10:48 pm UTC
tom,
you said:
"
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.
"
what's the "degree of threads"?
March 28, 2005 - 7:29 am UTC
"degree of threads" is number of concurrent job queue processes, how many dbms_job jobs can be running simultaneously.
commits or not
A reader, April 01, 2005 - 5:09 pm UTC
want to insert 30 million records to a dummy table with nologging as you adivsed.
insert into /*+ append parallel ( ) */
select * from ...
if i have the large rollback segs, it can be complete without doing the commits during the insert into (for example comment every one millions inserts), is it faster without the commits?
Thanks alot
April 01, 2005 - 6:12 pm UTC
to commit or not. Everytime you commit, you will wait, wait for log file sync.
Every time you commit, you say "i'm done". That's ok, if you are restartable!
However, with a big parallel direct path insert it'll be all or nothing (no point to do N rows, and do the query all over again). Each parallel process will utilize it's own rbs and if you have no indexes, a direct path insert isn't going to generate MUCH undo at all (it writes above the HWM and doesn't need to generate UNDO information)
A reader, April 05, 2005 - 5:08 am UTC
Let's Tell the CBO About Our Great Idea!
Doug Winage, April 11, 2005 - 7:28 pm UTC
Given that it seems there are clear cases in which making and loading a new table is preferable to updating in place, I wonder if it would be just as well to make such an approach available to the CBO as a candidate execution plan. I'll grant that one wouldn't want to leave the choice entirely up to the optimizer; the safest thing would be to allow enabling and disabling of the plan option at the system level with a parameter called, say, "ctas_for_updates_enabled." One could also provide a pair of hints, say, /*+ USE_CTAS */ and /*+ NO_CTAS */, to allow one to override the CBO, if need be, and expressly demand or prohibit a plan involving recreation of the table.
April 11, 2005 - 8:23 pm UTC
all really good ideas need to go via support and get filed....
CTAS instead of Deletes
GG, April 11, 2005 - 11:02 pm UTC
Tom,
Please, this is a relevant follow up to this thread.
I have also included a test case.
Instead of deleting rows from a table with 270 Million records, I have used CTAS. The delete criteria is based on a join of two fields from two tables. The delete occurs only on one table (ORDERS). The other table is INVOICE.
If DOC_NUMBER and ORD_ITEM values from both tables match, then delete the matching records from ORDERS.
Test case:
drop table orders;
create table orders (
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,PROMOTION varchar2(10) not null);
drop table invoice;
create table invoice(
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,OTHER_INFO varchar2(10) not null);
begin
for i in 1..10 loop
insert into orders values(i,i+1,'ORDERS');
end loop;
commit;
end;
/
select * from orders;
DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
2 3 ORDERS
3 4 ORDERS
4 5 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS
begin
for i in 1..5 loop
if mod(i,2) = 0 then
insert into invoice values(i,i+1,'INVOICE');
end if;
end loop;
commit;
end;
/
select * from invoice order by 1;
DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
2 3 INVOICE
4 5 INVOICE
The above two rcords should be deleted from ORDERS:
drop TABLE ORDERS_NEW;
CREATE TABLE ORDERS_NEW
NOLOGGING
AS
SELECT /*+ parallel(ORDERS,4) */ *
from orders ord
where NOT EXISTS
(select null from invoice inv
where inv.DOC_NUMBER = ord.DOC_NUMBER and
inv.S_ORD_ITEM = ord.S_ORD_ITEM);
select * from orders_new order by 1;
DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
3 4 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS
IS THIS THE CORRECT WAY OF DOING IT? ESPECIALLY THE SQL WITH "NOT EXISTS"?
April 12, 2005 - 7:07 am UTC
hints need to use the correlation (alias) names....
I would just use NOT IN but not exists with the CBO will be considered as not in conceptually so it is just harder to code, but has the same net effect.
See the concepts guide for details on parallel CTAS.
Update millions of records
Marsboy, April 12, 2005 - 1:33 pm UTC
Hi Tom,
It helped me so much that my days work is now can be accomplished in minutes.
update partition
phil, April 14, 2005 - 2:33 am UTC
I am sorry for posting aq similar question in another thread. The follow up should have been here !
I am getting a little confused with the technique of updated lots of rows in a partitioned table.
I have a table partioned by month. To do an update on a column you suggest doing a create table as select. This of course will create a new table with the updated colum and the data from the table to be updated.
What I do not understand is how to handle the each partition. My new table has none, just valid data. How do I now get the new table to the structure of the old with all the partitions ?
I am sure the answer is here, I just cannot see it !
regards
April 14, 2005 - 7:43 am UTC
to do a huge bulk update of most of the rows, using a create table as select could be a way to go. I followed up on the other thread.
Partioned tables
Greg, April 14, 2005 - 7:54 am UTC
a previous poster was asking about partioned tables ..
[quote]
I am getting a little confused with the technique of updated lots of rows in a
partitioned table.
[/quote]
Feel free to ignore this question if you answered it already in the other followup (I wasn't sure which thread you posted it in, and can't see find it .. *sigh*)
Anyway ... if the table is partitioned, can you not just deal with each partition seperately ?
That is: if you have 15 partitions (ie P01 through P15 ...), and you "update" only encompased P03 and P04 ... then could you not just "re-create" that partition in the same way you "re-create" the table .. then just "stuff" the partition(s) back in?
I can't remember all the syntax to pull this little "card-shuffle" off ... but I'm pretty sure it can be done? can't it?
A reader, April 21, 2005 - 12:12 pm UTC
"
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.
"
will the dbms job only be executed once?
TIA
April 22, 2005 - 9:02 am UTC
yes, well, it will execute successfully ONCE.
It might fail over and over and over and over, then run successfully and go away.
How to Update millions or records in a table?
S.Darshan, May 17, 2005 - 7:48 am UTC
I have to update one table which has 10 million records. The regular rollback segment won't enough to hold this data, In this case, how can we predict, how much rollback area will be used by this update.
Initially I thought of your approach "CREATE TABLE new_table as select <do the update "here"> from old_table;"
But I have to update about 30% of columns. Rest of the 70% columns should be not touched. Also this table have lot foreign key Constraints. Is any alternative method? Thanks in Advance.
With best regards
S.Darshan
May 17, 2005 - 9:16 am UTC
if I had to update 1/3 of a 10,000,000 row table -- that is the perfect candidate for create table as select. foreign keys or not (you can always disable, ctas and for the columns you have not updated -- enable novalidate, just put them back on)
but, if you cannot do that, disable indexes on columns to be updated and update and then rebuild the indexes. to get a grasp on "how much undo", create a copy of the first 100,000 rows of data for example, update it and see how much undo you generated.
Parallel DML
Tamilselvan, May 18, 2005 - 10:05 am UTC
I agree with what Tom said.
How ever, in many real life situation CTAS may not be executed as we think. For example, if the table has a LONG column, then CTAS will not work.
Consider Parallel DML. I have successfully implemented parallel dml in a large DB.
Here is the script:
alter table scott.table_a parallel 8;
alter index scott.table_a_p1 parallel 8;
alter table scott.tamil_temp_table_b parallel 8;
alter index scott.tamil_temp_table_b_idx parallel 8;
commit;
alter session enable parallel dml ;
alter session enable parallel query ;
update /*+ parallel(a,8) */ scott.table_a a
set a.pr_addr_id = ( select addr_per_id
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID)
where exists ( select null
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID) ;
commit;
alter session disable parallel dml ;
alter session disable parallel query ;
alter table scott.table_a noparallel ;
alter index scott.table_a_p1 noparallel ;
Thanks
Tamil
May 18, 2005 - 11:20 am UTC
<quote>
if the table has a LONG column, then CTAS will not work.
</quote>
convert long to clob and be done with it.
Ecellent Explanation
A reader, May 18, 2005 - 11:18 pm UTC
this is very helpful nice Toma i have question i am facing while using dbstart utility where can i ask that question
How to Update millions or records in a table
Debasish, June 08, 2005 - 5:29 am UTC
you told that create tble with update command. but I have a problem that. ( in 8i)
One of my Master table is contain following fields ( approx 500000 records)
1. ID varchar2
2. Pincode Nuimber
3. Tag char
Another three table is bankmaster1,bankmaster2,bankmaster3 fields are
1. pincode number
2. branch varchar2
I have to update tag column in master table according the pincode distribution in bankmaster table, like tag E for bank1, tag C form bank2, tag V for bank3.
when I try it through update command then it works for a small table but not for this 500000 record table.
Plese let me know in advance.
June 08, 2005 - 9:04 am UTC
I told you to create a table with the update command? Where and when -- I'll need to correct that if true, for you cannot create a table with the update command.
It works for me for a 50,000,000 row table or a 5 row table. You'll have to tell us why it isn't "working" (and hopefully it is not using a single index)
ops$tkyte@ORA10G> create table t
2 as
3 select id, id pincode, 'x' tag
4 from big_table.big_table
5 where rownum <= 1000000;
Table created.
Elapsed: 00:00:01.36
ops$tkyte@ORA10G>
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> create table t1 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 0;
Table created.
Elapsed: 00:00:01.60
ops$tkyte@ORA10G> create table t2 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 1;
Table created.
Elapsed: 00:00:01.54
ops$tkyte@ORA10G> create table t3 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 2;
Table created.
Elapsed: 00:00:01.49
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.06
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.13
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.07
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T3', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.13
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set timing on
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t1)) set tag = 'E';
333333 rows updated.
Elapsed: 00:00:06.13
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t2)) set tag = 'C';
333334 rows updated.
Elapsed: 00:00:06.48
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t3)) set tag = 'V';
333333 rows updated.
Elapsed: 00:00:06.40
ops$tkyte@ORA10G>
Only takes a couple of seconds.
A reader, June 10, 2005 - 8:42 am UTC
Tom,
I have this query in which I'm trying update the old POC_ID with the new POC_ID. However, it doesn't produce the results I would expect. The table does not get updated with the correct POC_ID. What am I missing here?
begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;
Thanks,
Randy
June 10, 2005 - 10:34 am UTC
need example.
but that looks like it should be at most a SINGLE sql statement, no code whatsoever...
A reader, June 10, 2005 - 10:59 am UTC
I have a table WSAR_ARCHIVE_CS which contains a column POC_ID. POC_ID is FK to the ADAS_POC POC_ID PK. I want to update that column with the new POC_ID which is stored in WSAR_POC POC_ID PK.
Thanks,
Randy
June 10, 2005 - 11:10 am UTC
create table .....
create table ....
create table ....
insert some data...
insert some more data...
followed by "and this is what I need to do..."
A reader, June 10, 2005 - 11:28 am UTC
create table wsar_poc
(POC_ID NOT NULL NUMBER,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NOT NULL NUMBER,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200))
Create table wsar_archive_cs
(FILE_ID NOT NULL NUMBER,
ONELINER_DATE DATE,
PRODUCT VARCHAR2(50),
COMPONENT VARCHAR2(400),
PRIORITY VARCHAR2(50),
POC_ID NUMBER,
SUBJECT VARCHAR2(300),
ISSUE VARCHAR2(4000))
I have the table adas_poc which contians the old poc_id that is in the WSAR_ARCHIVE_CS table.
desc adas_poc
POC_ID NOT NULL NUMBER,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NOT NULL NUMBER,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200)
I tried to update using the cursor for loop, but no luck. The column does not get updated with the correct ID.
begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;
Thanks,
Randy
June 10, 2005 - 3:33 pm UTC
oh, so close -- you almost had inserts there with data to show
a) what happened
b) explain what you expected
That will definitely join wsap_poc to adas_poc by poc_responsible_ind
it will definitely then update wsar_archive_cs
you need to explain why this logic isn't working, it is rather straight forward. Demonstrate for us.
RE: Parallel Import
Bakunian, June 20, 2005 - 6:37 pm UTC
Tom,
I created range partitioned table with parallelization enabled, then I use import utility to load 5Gb of data into table however I don't see parallel processes spawned during load.
Shouldn't import take advantage of this functionality?
I am using 9.2 on Solaris 8.
These are instance parameters.
Thank you for your time.
NAME_COL_PLUS_SHOW_PARAM TYPE VALUE_COL_
--------------------------------------------- ----------- ----------
fast_start_parallel_rollback string LOW
log_parallelism integer 4
parallel_adaptive_multi_user boolean TRUE
parallel_automatic_tuning boolean TRUE
parallel_execution_message_size integer 4096
parallel_instance_group string
parallel_max_servers integer 40
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean FALSE
parallel_server_instances integer 1
parallel_threads_per_cpu integer 2
recovery_parallelism integer 0
June 21, 2005 - 7:51 am UTC
imp is a serial process, always has been - it reads a single DMP file sequentially and loads it.
10g with impdb has a parallel capability.
Tome u r great
shyam, June 30, 2005 - 3:08 am UTC
I have a quick question my statistics are as below i am loading data
datafile Blocks_Read FS.Phyblkwrt Blocks_Written Total_IOs
/oracle/u03/oradata/dev/data_01.dbf
26507 5240 31747
/oracle/u03/oradata/dev/data_09.dbf
26823 2823 29646
/oracle/u06/oradata/dev/data_08.dbf
24978 3704 28682
/oracle/u04/oradata/dev/data_10.dbf
26288 1831 28119
/oracle/u04/oradata/dev/data_02.dbf
24053 3634 27687
/oracle/u05/oradata/dev/data_11.dbf
26155 1284 27439
/oracle/u05/oradata/dev/data_07.dbf
23125 2865 25990
/oracle/u04/oradata/dev/data_06.dbf
23558 1978 25536
/oracle/u06/oradata/dev/data_25.dbf
21672 2234 23906
/oracle/u05/oradata/dev/data_03.dbf
21299 2574 23873
/oracle/u03/oradata/dev/data_05.dbf
20324 1819 22143
/oracle/u06/oradata/dev/data_04.dbf
19673 1746 21419
/oracle/u06/oradata/dev/data_12.dbf
19574 1306 20880
/oracle/u04/oradata/dev/data_14.dbf
19309 755 20064
/oracle/u03/oradata/dev/data_13.dbf
18807 848 19655
/oracle/u05/oradata/dev/data_24.dbf
17656 1316 18972
/oracle/u06/oradata/dev/data_16.dbf
16748 1730 18478
/oracle/u05/oradata/dev/data_15.dbf
17006 810 17816
/oracle/u03/oradata/dev/data_17.dbf
14810 1350 16160
/oracle/u05/oradata/dev/data_19.dbf
13552 1157 14709
/oracle/u03/oradata/dev/data_21.dbf
13622 911 14533
/oracle/u05/oradata/dev/data_23.dbf
13366 878 14244
/oracle/u04/oradata/dev/data_18.dbf
12644 908 13552
/oracle/u04/oradata/dev/data_22.dbf
12521 741 13262
/oracle/u06/oradata/dev/data_20.dbf
12137 546 12683
Tom u can observe that total reads is more than total writes as a result the performance is decreasing. let me know how should i increse the block written. as a result o this my data loading is slow
June 30, 2005 - 9:29 am UTC
who is "u" and why is "u" observing things?
how do you know that reads are the cause of performance decreasing. All I see are a bunch of numbers.
loading data
shyam, June 30, 2005 - 3:10 am UTC
this is the quesry i used to generate statis stics
select DF.Name File_Name,
FS.Phyblkrd Blocks_Read,
FS.Phyblkwrt Blocks_Written,
FS.Phyblkrd+FS.Phyblkwrt Total_IOs
from V$FILESTAT FS, V$DATAFILE DF
where DF.File#=FS.File#
order by FS.Phyblkrd+FS.Phyblkwrt desc;
June 30, 2005 - 9:30 am UTC
so, you've just shown the number of reads done since the database was started. All they are are a bunch of numbers with no meaning.
how do you know read IO is your bottleneck? what made you decide that?
performance while loading data
shyam, June 30, 2005 - 7:54 pm UTC
FS.Phyblkwrt
datafile = /oracle/u03/oradata/dev/data_01.dbf
Blocks_Read = 26507
Blocks_Written =5240
Total_IOs= 31747
June 30, 2005 - 9:09 pm UTC
so what? (he says again)
you've shown me that file has been read 26,507 times SINCE STARTUP and written to 5,240 time SINCE STARTUP
big deal, so what?
where they all during your load.
did they take A LONG TIME <<<===
PRECAUTIONS TO TAKE WHEN HUGE INSERTS
shyam, July 03, 2005 - 8:44 am UTC
tom can u tell me the tips and how to improv the speed with which data is loading.
to to improve data load speed. for huge inserts at present i have 73 tables data is being loaded it takes for one project 60 mins
July 03, 2005 - 10:28 am UTC
depends on your needs and circumstances. To describe them all would be a small book.
So, you'd need to be more "specific". How many new rows, how many old rows, is there lots of validation going on, is this a reload or a load, have lots of un-necessary indexes, etc etc etc
Delete & import 6M records - Data Warehouse
A reader, July 06, 2005 - 11:54 pm UTC
Tom,
We currently have v8.0.6 running data warehouse. We have 6M rows to delete everyday & same time import same amount. It currently take us 10hrs to do deletion part. Data are import every 5mins. Is there any way we can speed up the deletion & improve the import(using oracle imp/exp)
Thanx so much for your help
July 07, 2005 - 9:09 am UTC
insufficient data. why no partitioning? maybe a create table as select rows to keep and drop old table.
but insufficient data to answer properly.
and really old software.
and really archaic tools.
a lot has happened in the time between 8.0.6 and 10gR2 - you'd find many more options in 2005 than you did in 1997 when 8.0 was current.
slowly my data base
hameed, August 06, 2005 - 6:50 pm UTC
hi tom
pleas help me
i have program by oracle 8i& dev 6i
the problem:
data base is slowly
One large insert Vs multiple smaller inserts
Sami, August 14, 2005 - 10:55 am UTC
Dear Tom,
We had this issue in production during huge data migration.
Environment:-
Oracle 8.1.7.3 Sun 2.6 - 2 Node OPS. 12 CPU on each box.
The table in question here involved in M-M Oracle replication.
There is no other activity on both "12Million_Rec_table" and "1Million_Rec_table " other than my process.
Process-1
=========
exec dbms_reputil.replication_off();
commit;
insert /*+ append parallel(t,12) */ into 12Million_Rec_table t
select /*+ parallel */ * from 1Million_Rec_table;
Process-2
=========
exec dbms_reputil.replication_off();
commit;
--for loop with diff predicate value
loop
insert /*+ append parallel(t,12) */ into 12Million_Rec_table
select /*+ parallel */ * from 1Million_Rec_table
where predicate=&1; -- there is an Index on Predicate column
end loop
12Million_Rec_table has only one primary key index. There is no other constraints or LOB columns on this table. For some reason it didn't use direct-path insert, don't know why.
Process-2 executed N times in a sequence.Basically Process-1 is one large transaction compared to Process-2 (which has N number of transactions in a loop).
In terms of execution time the Process-1 took longer than Process-2. Process-1 was running for more than
2 hrs without completion so we stopped it. Process-2 with smaller chunk of records completed very fast.
When Process-1 started the USED_UBLK value in v$rollstat increased rapidly then slowed down gradually. I have seen
this behaviour many times while running Process-1.
My questions are
1)Why Process-1 is slower than Process-2. [We didn't have any rollback segment issue while running both Process-1 and Process-2]
2)Is there any performnce issue if the rollback segment goes beyond some size, say 200M or something..?
August 14, 2005 - 11:49 am UTC
you did not enable parallel DML, to do a parallel direct path insert, you must.
I cannot answer these questions, insufficient data. You could have captured information while it was running.
There is a *small* chance that what happened was
a) the big insert, because of the primary key index, generated gobs of undo.
b) this caused more undo to be generated than fit in the buffer cache
c) so it had to be flushed by dbwr
d) causing buffer waits by the insert (trying to get more buffers from the cache for undo)
e) and causing IO contention on the disk
The smaller inserts *MIGHT* have
a) filled up part of the buffer cache with undo blocks
b) committed, allowing the undo blocks in the cache to be reused
c) goto A n times
avoiding the buffer waits and the flushing of the undo to disk.
However, I would not have taken your approach for a small set of one million rows. It likely would have been much faster to
a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
did not get rid of PK index because the table is in M-M replication
Sami, August 14, 2005 - 1:22 pm UTC
Dear Tom,
Thanks a lot for your immediate response. You are absolutely
right about why the "smaller chunk transactions" defeated "one large transaction".
<asktom>
a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million
rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
</asktom>
I agree with you 100%. Even I prefer your approach [get rid of the primary key index,delete,rebuild index]. However the table in question is involved in Master-Master replication.
We are not sure about getting rid of primary key index for the table which involved in replication. That is why we didn't take this approach.
Tom,thanks for your time & effort.
Additional Info
Sami, August 14, 2005 - 8:20 pm UTC
Our large pool is set to 0 size. I am just wondering whether this value had any impact on our parallel process poor performance.
NAME TYPE VALUE
------------------------------------ ------- ------------------------------
large_pool_size string 0
August 15, 2005 - 6:52 am UTC
are you sure it is really zero, if you are using automatic SGA memory management:
ops$xp10gr1\tkyte@ORA10GR1> show parameter large
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
__large_pool_size big integer 4M
large_pool_size big integer 0
ops$xp10gr1\tkyte@ORA10GR1>
I didn't set a large pool, but Oracle has set up a 4m one (that can grow/shrink as needed)
Large pool size
Sami, August 15, 2005 - 1:45 pm UTC
Tom, We are still in 8.1.7.3.
Yes, I am sure the large pool size is set to zero on both instances(2 node OPS)
August 15, 2005 - 10:22 pm UTC
it isn't "impacting it" performance wise, it is using memory from elsewhere if, but the large pool would be a better place since this memory for parallel execution isn't "LRU" memory like the other stuff in the shared pool -- the large pool is designed for memory that is allocated and deallocated, unlike shared pool memory which is designed to be allocated and aged out over time.
Undo - With and without index
Sami, August 15, 2005 - 3:17 pm UTC
Session #1
=================================================================================
/*
create table t1 as select * from dba_objects;
create index t1_idx01 on t1(object_id);
*/
SQL> select distinct sid from v$mystat;
SID
----------
126
SQL> insert into t1 select * from dba_objects;
48601 rows created.
=================================================================================
Session #2
=================================================================================
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 568
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r wheres.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965AFFC 2
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 636
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965AFFC 2
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
no rows selected
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 754
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 1035
SQL>
=================================================================================
I monitored rollback activities for session#1(while inserting rows) through session#2.
While the USED_UBLK is gradually going up for one transaction , I noticed some other
transaction(TADDR=1965AFFC) with USED_UBLK=2.
Is it because of Index t1_idx01?
I didn't see this behavior when I tested the same test case without Index.
Sometimes I noticed "no rows selected" in session#2. I assume this is the time the transaction is
switching between TADDR 1965AFFC and 1965B9F4?
Please correct me if I am wrong?
August 15, 2005 - 10:35 pm UTC
ddl commits, there were two transactions there
the create table and the create index.
no ddl here
Sami, August 15, 2005 - 11:04 pm UTC
Tom,
Sorry for the confusion. I shouldn't have put the create table and create index statement over there (though it was commented)
create table and create index is outside of this scope. The reason I put those because I just wanted to show you that
1) the table is created from dba_objects
2) it has index on object_id column
I have tested this behavior in both 8i and 10g. The following output and "no rows selected" appeared N times (more than 2) WHILE THE INSERT STATEMENT IS GOING ON, just for clarity I just truncated the output.
TADDR USED_UBLK
-------- ----------
1965AFFC 2
August 16, 2005 - 10:46 am UTC
then I have no clue what we are in fact looking at there.
I cannot even remotely reproduce - nor is this the way it works.
the used_ublks would in this case increase, until you commit, then they go away.
1 big INSERT with joins - OR - multiple UPDATES...
Sandeep, August 16, 2005 - 8:05 am UTC
Hi Tom,
I'm trying to archive data from some big tables (5 or 6 of them). Before doing that I need to "rollup" an "amount"
column within each of those big tables (based on
a "creation date" criteria). The summations of these amount will be grouped by date (monthly sum-up) and will
need to be put into a reporting table.
The reporting table will have columns like
"Month", "Sum_Amt_BigTable1","Sum_Amt_BigTable2",
"Sum_Amt_BigTable3" and the like.
Initially there was only one BigTable (say BigTable1), so the archiving script was simple.
Just one - INSERT INTO REPORT_TABLE... SELECT.. FROM BIGTABLE1.. WHERE (the date criteria)
GROUP BY (the date column).
Now since there are more "BigTables", I have two(?) options:
1. Modify the insert script such that there is JOIN
between all BigTables, say BigTable1..BigTable6 (eeps!)
on the date criteria, and just sum-up the required
columns. For the sake of simplicity, assume that all
the tables contain at least one record for each month
(so no need to outer-join!).
2. Use the original INSERT script to create the initial
records within the REPORT table... then run multiple
UPDATEs on the REPORT table - one for
each bigtable, like this (roughly!)-
UPDATE REPORT_TABLE
SET Sum_Amt_BigTable2 = (SELECT sum(amountxx)
FROM BigTable2 WHERE (the date criteria)
GROUP BY (the date column))
WHERE REPORT_TABLE.month = LAST_DAY(TO_DATE(TO_CHAR(
BigTable2.date_column,'DD-MON-YYYY'),'DD-MON-YYYY'))
;
... and so on - for each remaining BigTable..
Which option do you thing will be better in terms of performance? - I suspect option2, as I'm a bit
apprehensive about joining 6 big tables for one big insert. Will be testing this out, but wanted
to have your opinion about the same.
Regards,
Sandeep
August 17, 2005 - 10:36 am UTC
#1 of course, do as much as you can in a single sql statement.
big hash area size, nice juicy full scans -- ready to rock and roll and have it done fast.
big INSERT with joins - OR - multiple UPDATES
Sandeep, August 17, 2005 - 11:57 am UTC
Hi,
Was thinking you might actually suggest #2 !!
Having a big join across 6 tables with multimillion
records? hmm... not sure.. but will definitely try this
out.
Did the #2 and it works well... will try to do the
#1 and get some stats to compare... somehow joins
between 6 *huge* tables does not convince me... but...
Thanks,
Sandeep
August 17, 2005 - 1:58 pm UTC
6 tables, 60 tables -- I don't care, make SQL work for you -- NOT the other way around.
databases were BORN TO JOIN, it is what they do
Option #3?
Bob B, August 17, 2005 - 2:25 pm UTC
If there was the likelihood of an outer join in the above scenario, it might be worth using a union all approach. Something like:
SELECT
'Big Table 1' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE1
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
UNION ALL
SELECT
'Big Table 2' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE2
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
... etc
Essentially, write the report query for each table, union all them together and then summarize the union all query as necessary:
SELECT
SUM( SUM_WHATEVER ) SUM_WHATEVER,
SUM( SUM_WHATEVER2 ) SUM_WHATEVER2,
PARAM1,
PARAM2
FROM (
QUERY ABOVE
)
GROUP BY PARAM1, PARAM2
With more clear test case
sami, August 17, 2005 - 9:12 pm UTC
</asktom>
then I have no clue what we are in fact looking at there.
I cannot even remotely reproduce - nor is this the way it works.
the used_ublks would in this case increase, until you commit, then they go away.
</asktom>
Dear Tom,
While the session#1 is doing DIRECT PATH INSERT on TABLE which has PRIMAR KEY,I Monitored SESSION#2 as below.
While the USED_UBLK is gradually going up, there are 2 occurances it reported different TADDR with USED_UBLK=2.
I just wanted to know what is the new TADDR.
Session#1
#############################################################################
20:37:47 SQL> select distinct sid from v$mystat;
SID
----------
162
20:37:50 SQL> insert /*+ append */ into t1 select * from dba_objects;
48603 rows created.
20:38:23 SQL>
#############################################################################
Session #2
#############################################################################
SQL> desc undo_monitor
Name Null? Type
----------------------------------------- -------- -------------------------
TADDR VARCHAR2(50)
UNDO_UBLK NUMBER
TIME DATE
SQL> truncate table undo_monitor;
Table truncated.
SQL> set time on
SQL>
SQL>
SQL> declare
2 i number :=0;
3 begin
4 loop
5 for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
6 where s.taddr=t.addr
7 and s.sid=&session_to_be_monitored)
8 loop
9 insert into undo_monitor values(c0rec.taddr,c0rec.used_ublk,sysdate);
10 end loop;
11 i := i+1;
12 if i=100000 then
13 exit;
14 end if;
15 end loop;
16 end;
17 /
Enter value for session_to_be_monitored: 162
old 7: and s.sid=&session_to_be_monitored)
new 7: and s.sid=162)
PL/SQL procedure successfully completed.
20:39:49 SQL> select distinct time,taddr,undo_ublk from undo_monitor order by t
ime;
TIME TADDR UNDO_UBLK
--------- -------------------------------------------------- ----------
17-AUG-05 4C64FADC 1
17-AUG-05 4C64FADC 1
17-AUG-05 4C64FADC 1
17-AUG-05 4C64FADC 1
17-AUG-05 4C6504D4 2
17-AUG-05 4C64FADC 5
17-AUG-05 4C64FADC 6
17-AUG-05 4C64FADC 9
17-AUG-05 4C64FADC 10
17-AUG-05 4C64FADC 16
17-AUG-05 4C6504D4 2
17-AUG-05 4C64FADC 17
17-AUG-05 4C64FADC 21
17-AUG-05 4C64FADC 25
17-AUG-05 4C64FADC 27
17-AUG-05 4C6504D4 2
17-AUG-05 4C64FADC 27
17-AUG-05 4C64FADC 30
17-AUG-05 4C64FADC 32
17-AUG-05 4C64FADC 36
17-AUG-05 4C64FADC 38
17-AUG-05 4C64FADC 49
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
80 rows selected.
20:39:56 SQL>
#############################################################################
August 18, 2005 - 3:34 pm UTC
as I said, I cannot even REMOTELY reproduce this at all.
date is only valid down to the second, perhaps -- these rows are scrambled. put I in that last column, not sysdate and see.
(select to_char(time, 'dd-mon-yyyy hh24:mi:ss' ) -- see if your times are not all in the same second there.
What if standby database were used?
Michel, August 18, 2005 - 9:25 am UTC
Hi Tom,
replacing update with CTAS and/or INSERT /*+ append */ INTO ... is especially usefull because of NOLOGGING
Now, what to do if we are using stnadby databases and so it is not a good idea to use operations with NOLOGGING
August 18, 2005 - 4:05 pm UTC
CTAS and /*+ append */ are done with logging enabled in ARCHIVELOG mode, unless you specify "alter <segment_name> NOLOGGING" and even then the dba can "force logging" at the database level.
LOGGING happens with this stuff!!!
Excellent
Peter, August 18, 2005 - 12:55 pm UTC
Hi Tom,
You are the best of best. I visit your site everyday.
I need your expertise in this regard too.
I have a fact table T
( order_id number,
customer_id number,
order_date date,
month_id number,
prd_id number,
prd_name varchar2(30),
quantity number,
zip number,
flag smallint,
swiprd_id number)
with about 5 million records imported every month.
NOTE: one product (prd_name) may have one or more prd_ids. There are about 200 prd_name and 2000 different prd_id in this table)
Oracle SERVER: Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
T was partitioned by month_id.
A local index is builded on T(customer_id, order_date, prd_name).
table T was analyzed using
dbms_stats.GATHER_TABLE_STATS(
OWNNAME=>'USER',
TABNAME=>'T',
ESTIMATE_PERCENT=>dbms_stats.auto_sample_size,
METHOD_OPT => 'for all indexed columns',
cascade => true,
degree => 4);
I need to know following statitics informations each month
1) how many continue customers for each prd_id (not just prd_name) in this month.
the continue customer is defined as A customer who bought a product (with a prd_id ) in this month
and also bought same product (with same or different prd_id) in past 365 days and ((previous order_date+ 30*previous quantity) >= current order_date)
SET flag =1
2) how many restart customers for each prd_id (not just prd_name) in this month.
the restart customer is defined as A customer who is not a continue customer and bought a product (with a prd_id ) in this month
and also bought same product (with same or different prd_id) in past 365 days and ((previous order_date+ 30*previous quantity) < current order_date)
SET flag =2
3) how many switch customers for each prd_id (not just prd_name) in this month.
the restart customer is defined as A customer who is not a continue nor restart customer and bought a product (with prd_id) in this month
and also bought different product (not just different prd_id) in past 365 days.
and also record the what prd_id switched from
SET flag =3, swiprd_id = previous prd_id
4) how many new customers for each prd_id (not prd_name) in this month.
the new customer is defined as A customer who bouht a product but never bought any product in past 365 days
SET flag =4
Current solutions: use a procedure to process each order
CREATE OR REPLACE PROCEDURE get_flag(v_monthid IN NUMBER)
AS
cursor c1 IS
SELECT distinct prd_id
from T
WHERE month_id=v_monthid order by prd_id;
cursor c2 is
SELECT customer_id, order_date, prd_name
FROM T
WHERE prd_id=v_prd_id and month_id = v_monthid
order by customer_id, order_date desc, quantity desc
FOR UPDATE of flag NOWAITE ;
cursor c3 -- for all orders with same customer and same
-- product during lookback
is SELECT order_date, quantity
FROM T
WHERE customer_id = v_customer_id
and prd_name = v_prd_name
AND order_date BETIEN (v_order_date-366)
AND (v_order_date-1)
order by order_date desc, quantity desc;
cursor c4 -- for all orders with same customer and
-- different product during lookback
is SELECT prd_id
FROM T
WHERE customer_id = v_customer_id
and prd_name <> v_prd_name
AND order_date BETIEN (v_order_date-366)
AND (v_order_date-1)
order by order_date desc, quantity desc;
TYPE customer_array is TABLE of T.customer_id%type;
TYPE order_date_array is TABLE of T.order_date%type;
TYPE prd_name_array is TABLE of T.prd_name%type;
lv_customer_arr customer_array;
lv_order_date_arr order_date_array;
lv_prd_name_arr prd_name_array;
v_prd_id T.prd_id%type;
v_customer_id T.customer_id%type;
v_order_date T.order_date%type;
v_order_date_tmp T.order_date%type;
v_prd_name T.prd_name%type;
v_quantity T.quantity%type;
v_flag PLS_INTEGER;
v_swiprd_id T.swiprd_id%type;
BEGIN
FOR lv_prd in c1
LOOP
v_prd_id :=lv_prd.prd_id;
v_customer_id :=0 ;
OPEN c2;
LOOP
FETCH BULK COLLECT INTO lv_customer_arr,
lv_order_date_arr, lv_prd_name_arr
LIMIT 100;
FOR i in 1 .. lv_customer_arr.count
LOOP
IF v_customer_id <> lv_customer_arr(i)
v_customer_id := lv_customer_arr(i);
v_order_date := lv_order_date_arr(i)
v_prd_name := lv_prd_name_arr(i);
v_flag := 0;
v_swiprd_id := 0;
OPEN c3;
FETCH c3 INTO v_order_date_tmp, v_quantity ;
IF c3%NOTFOUND
THEN
OEPN c4
FETCH INTO v_swiprd_id ;
IF c4%NOTFOUND
THEN -- bought another product
v_flag :=4;
ELSE
v_flag :=3;
END IF;
CLOSE c4;
ELSE -- bought same product
IF (v_order_date_tmp + 30*v_quantity)
>=v_order_date
THEN
v_flag :=1 ;
ELSE
v_flag :=2;
END IF;
END IF;
CLOSE c3
-- UPDATE flag and swiprd_id in table T
IF v_flag =3
THEN
UPDATE T
SET flag = v_flag,
swiprd_id = v_swiprd_id
WHERE CURRENT OF c2;
ELSE
UPDATE T
SET flag = v_flag
WHERE CURRENT OF c2;
END IF;
END IF;
END LOOP;
EXIT WHEN c2%notfound;
END LOOP;
CLOSE c2;
END LOOP;
EXCEPTION
WHEN ... THEN ....;
WHEN OTHERS THEN
dbms_output.put_line(SubStr('Error '||TO_CHAR(SQLCODE)||': '||SQLERRM, 1, 255));
RAISE;
END;
After processing I can run some query to get statitics information I need.
This solution is obviously very slow.
(for each record 1~2 querys invoke, total about 5~7 million querys called)
It will take about a week to finish the whole process.
IF I divide the prd_id into 6 different ranges and run 6 procedures with diffent prd_id ranges at same time.
It will take about 2 days. But it is still too long.
I did go thru. some threads in your site. you suggested that:
CREATE TABLE new_table as select <do the update "here"> from old_table;
............
but it is very hard to achieve <do the update "here"> here.
My questions are:
1) Is there any better way to achieve this?
2) IF this is a way to be, how to improve the performance.
Thanks,
As you requested
sami, August 20, 2005 - 9:54 am UTC
Dear Tom,
As you requested I put the sequence number(loop#).
I am able to reproduce consistenly in both 8i solaris and 10 windows.
Look at the TADDR difference between
SEQUENCE TADDR UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
18714 4C656458 27 20-AUG-05 09:26:51
18715 4C65695C 2 20-AUG-05 09:26:51
Session#1
==========
SQL> truncate table t1;
Table truncated.
SQL> select index_name from dba_indexes where table_name='T1';
INDEX_NAME
------------------------------
T1_PK
SQL> desc t1
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> select sid from v$mystat where rownum=1;
SID
----------
122
----*** STARTED MONITONING THIS SESSION FROM HERE ***----
SQL> insert /*+ APPEND */ into t1 select * from dba_objects;
48526 rows created.
SQL>
----*** STOPPED MONITONING THIS SESSION FROM HERE ***----
##########################################################
Session#2
===================
SQL> drop table undo_monitor;
Table dropped.
SQL> create global temporary table undo_monitor(sequence number,taddr varchar2(2
0),undo_ublk number,time date)
2 ;
Table created.
SQL> alter session set nls_date_format='DD-MON-YY HH24:MI:SS';
Session altered.
SQL> declare
2 i number :=0;
3 begin
4 loop
5 for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
6 where s.taddr=t.addr
7 and s.sid=&session_to_be_monitored)
8 loop
9 insert into undo_monitor values(i,c0rec.taddr,c0rec.used_ublk,sysdate);
10 end loop;
11 i := i+1;
12 if i=100000 then
13 exit;
14 end if;
15 end loop;
16 end;
17 /
Enter value for session_to_be_monitored: 122
old 7: and s.sid=&session_to_be_monitored)
new 7: and s.sid=122)
PL/SQL procedure successfully completed.
SQL> spool c:\temp\1.log
SQL> set lines 120 pages 300
SQL> select min(undo_ublk),max(undo_ublk) from undo_monitor;
MIN(UNDO_UBLK) MAX(UNDO_UBLK)
-------------- --------------
1 52
SQL>
SQL> select * from undo_monitor where undo_ublk >1 and undo_ublk <52 order by se
quence;
SEQUENCE TADDR UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
[truncated for display purpose]
18696 4C656458 27 20-AUG-05 09:26:51
18697 4C656458 27 20-AUG-05 09:26:51
18698 4C656458 27 20-AUG-05 09:26:51
18699 4C656458 27 20-AUG-05 09:26:51
18700 4C656458 27 20-AUG-05 09:26:51
18701 4C656458 27 20-AUG-05 09:26:51
18702 4C656458 27 20-AUG-05 09:26:51
18703 4C656458 27 20-AUG-05 09:26:51
18704 4C656458 27 20-AUG-05 09:26:51
18705 4C656458 27 20-AUG-05 09:26:51
18706 4C656458 27 20-AUG-05 09:26:51
18707 4C656458 27 20-AUG-05 09:26:51
18708 4C656458 27 20-AUG-05 09:26:51
18709 4C656458 27 20-AUG-05 09:26:51
18710 4C656458 27 20-AUG-05 09:26:51
18711 4C656458 27 20-AUG-05 09:26:51
18712 4C656458 27 20-AUG-05 09:26:51
18713 4C656458 27 20-AUG-05 09:26:51
18714 4C656458 27 20-AUG-05 09:26:51
18715 4C65695C 2 20-AUG-05 09:26:51
18716 4C65695C 2 20-AUG-05 09:26:51
18717 4C65695C 2 20-AUG-05 09:26:51
18718 4C65695C 2 20-AUG-05 09:26:51
18719 4C65695C 2 20-AUG-05 09:26:51
18720 4C65695C 2 20-AUG-05 09:26:51
18721 4C65695C 2 20-AUG-05 09:26:51
18722 4C65695C 2 20-AUG-05 09:26:51
18723 4C65695C 2 20-AUG-05 09:26:51
18724 4C65695C 2 20-AUG-05 09:26:51
18725 4C65695C 2 20-AUG-05 09:26:51
18726 4C65695C 2 20-AUG-05 09:26:51
18727 4C65695C 2 20-AUG-05 09:26:51
18728 4C65695C 2 20-AUG-05 09:26:51
18729 4C65695C 2 20-AUG-05 09:26:51
18730 4C65695C 2 20-AUG-05 09:26:51
18731 4C65695C 2 20-AUG-05 09:26:51
18732 4C65695C 2 20-AUG-05 09:26:51
18733 4C65695C 2 20-AUG-05 09:26:51
18734 4C65695C 2 20-AUG-05 09:26:51
18735 4C65695C 2 20-AUG-05 09:26:51
18736 4C65695C 2 20-AUG-05 09:26:51
18737 4C65695C 2 20-AUG-05 09:26:51
18738 4C65695C 2 20-AUG-05 09:26:51
18739 4C65695C 2 20-AUG-05 09:26:51
18740 4C65695C 2 20-AUG-05 09:26:51
18741 4C65695C 2 20-AUG-05 09:26:51
18742 4C65695C 2 20-AUG-05 09:26:51
18743 4C65695C 2 20-AUG-05 09:26:51
18744 4C65695C 2 20-AUG-05 09:26:51
18745 4C65695C 2 20-AUG-05 09:26:51
18746 4C656458 38 20-AUG-05 09:26:51
18747 4C656458 38 20-AUG-05 09:26:51
18748 4C656458 38 20-AUG-05 09:26:51
18749 4C656458 38 20-AUG-05 09:26:51
18750 4C656458 38 20-AUG-05 09:26:51
18751 4C656458 38 20-AUG-05 09:26:51
18752 4C656458 38 20-AUG-05 09:26:51
18753 4C656458 38 20-AUG-05 09:26:51
18754 4C656458 38 20-AUG-05 09:26:51
18755 4C656458 38 20-AUG-05 09:26:51
18756 4C656458 38 20-AUG-05 09:26:51
18757 4C656458 38 20-AUG-05 09:26:51
18758 4C656458 38 20-AUG-05 09:26:51
18759 4C656458 38 20-AUG-05 09:26:51
18760 4C656458 38 20-AUG-05 09:26:51
18761 4C656458 38 20-AUG-05 09:26:51
18762 4C656458 38 20-AUG-05 09:26:51
18763 4C656458 38 20-AUG-05 09:26:51
18764 4C656458 38 20-AUG-05 09:26:51
18765 4C656458 38 20-AUG-05 09:26:51
18766 4C656458 38 20-AUG-05 09:26:51
18767 4C656458 38 20-AUG-05 09:26:51
18768 4C656458 38 20-AUG-05 09:26:51
18769 4C656458 38 20-AUG-05 09:26:51
18770 4C656458 38 20-AUG-05 09:26:51
18771 4C656458 38 20-AUG-05 09:26:51
18772 4C656458 38 20-AUG-05 09:26:51
18773 4C656458 38 20-AUG-05 09:26:51
18774 4C656458 38 20-AUG-05 09:26:51
18775 4C656458 38 20-AUG-05 09:26:51
18776 4C656458 38 20-AUG-05 09:26:51
18777 4C656458 38 20-AUG-05 09:26:51
18778 4C656458 38 20-AUG-05 09:26:51
18779 4C656458 38 20-AUG-05 09:26:51
18780 4C656458 38 20-AUG-05 09:26:51
18781 4C656458 38 20-AUG-05 09:26:51
18782 4C656458 38 20-AUG-05 09:26:51
18783 4C656458 38 20-AUG-05 09:26:51
18784 4C656458 38 20-AUG-05 09:26:51
18785 4C656458 38 20-AUG-05 09:26:51
18786 4C656458 38 20-AUG-05 09:26:51
18787 4C656458 38 20-AUG-05 09:26:51
18788 4C656458 38 20-AUG-05 09:26:51
18789 4C656458 38 20-AUG-05 09:26:51
18790 4C656458 38 20-AUG-05 09:26:51
18791 4C656458 38 20-AUG-05 09:26:51
18792 4C656458 38 20-AUG-05 09:26:51
18793 4C656458 38 20-AUG-05 09:26:51
18794 4C656458 38 20-AUG-05 09:26:51
18795 4C656458 38 20-AUG-05 09:26:51
18796 4C656458 38 20-AUG-05 09:26:51
18797 4C656458 38 20-AUG-05 09:26:51
18798 4C656458 38 20-AUG-05 09:26:51
18799 4C656458 38 20-AUG-05 09:26:51
18800 4C656458 38 20-AUG-05 09:26:51
18801 4C656458 38 20-AUG-05 09:26:51
18802 4C656458 38 20-AUG-05 09:26:51
18803 4C656458 38 20-AUG-05 09:26:51
18804 4C656458 38 20-AUG-05 09:26:51
18805 4C656458 38 20-AUG-05 09:26:51
18806 4C656458 38 20-AUG-05 09:26:51
18807 4C656458 38 20-AUG-05 09:26:51
18808 4C656458 38 20-AUG-05 09:26:51
18809 4C656458 38 20-AUG-05 09:26:51
18810 4C656458 38 20-AUG-05 09:26:51
18811 4C656458 38 20-AUG-05 09:26:51
18812 4C656458 38 20-AUG-05 09:26:51
18813 4C656458 38 20-AUG-05 09:26:51
18814 4C656458 38 20-AUG-05 09:26:51
18815 4C656458 38 20-AUG-05 09:26:51
18816 4C656458 38 20-AUG-05 09:26:51
18817 4C656458 38 20-AUG-05 09:26:51
18818 4C656458 38 20-AUG-05 09:26:51
18819 4C656458 38 20-AUG-05 09:26:51
18820 4C656458 38 20-AUG-05 09:26:51
18821 4C656458 38 20-AUG-05 09:26:51
18822 4C656458 38 20-AUG-05 09:26:51
18823 4C656458 38 20-AUG-05 09:26:51
18824 4C656458 38 20-AUG-05 09:26:51
18825 4C656458 38 20-AUG-05 09:26:51
18826 4C656458 38 20-AUG-05 09:26:51
18827 4C656458 38 20-AUG-05 09:26:51
18828 4C656458 38 20-AUG-05 09:26:51
18829 4C656458 38 20-AUG-05 09:26:51
18830 4C656458 38 20-AUG-05 09:26:51
18831 4C656458 38 20-AUG-05 09:26:51
18832 4C656458 38 20-AUG-05 09:26:51
18833 4C656458 38 20-AUG-05 09:26:51
18834 4C656458 38 20-AUG-05 09:26:51
18835 4C656458 38 20-AUG-05 09:26:51
18836 4C656458 38 20-AUG-05 09:26:51
18837 4C656458 38 20-AUG-05 09:26:51
18838 4C656458 40 20-AUG-05 09:26:51
18839 4C656458 40 20-AUG-05 09:26:51
18840 4C656458 40 20-AUG-05 09:26:51
18841 4C656458 40 20-AUG-05 09:26:51
18842 4C656458 40 20-AUG-05 09:26:51
18843 4C656458 40 20-AUG-05 09:26:51
18844 4C656458 40 20-AUG-05 09:26:51
18845 4C656458 40 20-AUG-05 09:26:51
18846 4C656458 40 20-AUG-05 09:26:51
18847 4C656458 40 20-AUG-05 09:26:51
18848 4C656458 40 20-AUG-05 09:26:51
18849 4C656458 40 20-AUG-05 09:26:51
18850 4C656458 40 20-AUG-05 09:26:51
18851 4C656458 40 20-AUG-05 09:26:51
18852 4C656458 40 20-AUG-05 09:26:51
18853 4C656458 40 20-AUG-05 09:26:51
18854 4C656458 40 20-AUG-05 09:26:51
18855 4C656458 40 20-AUG-05 09:26:51
18856 4C656458 40 20-AUG-05 09:26:51
18857 4C656458 40 20-AUG-05 09:26:51
18858 4C656458 40 20-AUG-05 09:26:51
18859 4C656458 40 20-AUG-05 09:26:51
18860 4C656458 40 20-AUG-05 09:26:51
18861 4C656458 40 20-AUG-05 09:26:51
18862 4C656458 40 20-AUG-05 09:26:51
18863 4C656458 40 20-AUG-05 09:26:51
18864 4C656458 40 20-AUG-05 09:26:51
18865 4C656458 40 20-AUG-05 09:26:51
18866 4C656458 45 20-AUG-05 09:26:51
18867 4C656458 45 20-AUG-05 09:26:51
18868 4C656458 45 20-AUG-05 09:26:51
18869 4C656458 45 20-AUG-05 09:26:51
18870 4C656458 45 20-AUG-05 09:26:51
18871 4C656458 45 20-AUG-05 09:26:51
18872 4C656458 45 20-AUG-05 09:26:51
18873 4C656458 45 20-AUG-05 09:26:51
18874 4C656458 45 20-AUG-05 09:26:51
18875 4C656458 45 20-AUG-05 09:26:51
18876 4C656458 45 20-AUG-05 09:26:51
18877 4C656458 45 20-AUG-05 09:26:51
18878 4C656458 45 20-AUG-05 09:26:51
18879 4C656458 46 20-AUG-05 09:26:51
18880 4C656458 51 20-AUG-05 09:26:51
18881 4C656458 51 20-AUG-05 09:26:51
18882 4C656458 51 20-AUG-05 09:26:51
18883 4C656458 51 20-AUG-05 09:26:51
18884 4C656458 51 20-AUG-05 09:26:51
18885 4C656458 51 20-AUG-05 09:26:51
18886 4C656458 51 20-AUG-05 09:26:51
18887 4C656458 51 20-AUG-05 09:26:51
18888 4C656458 51 20-AUG-05 09:26:51
18889 4C656458 51 20-AUG-05 09:26:51
18890 4C656458 51 20-AUG-05 09:26:51
August 20, 2005 - 5:15 pm UTC
Recursive SQL.
turn on sql trace and see if you are not doing SQL behind your SQL (space management, quotas, etc)....
That was likely a recursive sql transaction to update the dictionary.
Which is the best way to update?
A Reader, August 22, 2005 - 11:45 pm UTC
Hi Tom,
We have a fact table with few dimension tables. Initially the records are inserted into the fact table. This way most of the columns in the table will be populated with values. For the remaining columns we have to update the table by joining with other tables.
1) What will be the best way to update? We are currently using correlated queries to update, but there is no index on the column being joined in the fact table. The source table has index if its a key column.
2)Should we create index on the joining columns in the fact table? If there are many such columns, is it a good idea to create index on each of them before updating? Will this improve performance?
3)We are currently analyzing the fact table before update. Is this the right way though we are not creating index on all required columns before update?
4) Should index be created on table before update and the table be analyzed?
Your inputs will be highly appreciated and very helpful.
Thanks in advance.
August 24, 2005 - 3:44 am UTC
I like to update joins
update ( select t1.a, t2.b from t1, t2 where t1.fk = t2.pk )
set a = b;
t2 would need to have a primary key in place for this.
Which is the best way to update?
A Reviewer, August 24, 2005 - 12:17 am UTC
Tom can you please answer the queries posted in the immediately previous post?
August 24, 2005 - 9:06 am UTC
I did, update the join by primary key. That was my answer. Use the CBO.
Allan Sun, October 02, 2005 - 1:01 pm UTC
I found your answer is very helpful and wonder how I can submit my question to you.
Thanks
October 02, 2005 - 5:35 pm UTC
when taking them, there is a button on the home page.
Calculate redolog generation
A reader, October 04, 2005 - 1:49 pm UTC
Tom,
Is there any way to calculate the amount of redolog that is going to be generated when some 2 million records are updated?
I have to update 2 million records (I can't recreate a dummy table etc etc). I need to know how much redolog will be generated and OS space required for the undo datafile
October 04, 2005 - 6:23 pm UTC
not really, not without simulating at least a portion of it on some system and measuring the amount of redo.
Table/column widths, row chaining/migration, indexes involved - they'll all play a role.
I frequently do 1,000,000 to 10,000,000 records on my laptop - I would think you can test this out on a test system someplace?
it's very good
gopal, December 07, 2005 - 5:51 am UTC
Delete, then Insert or Update?
A reader, January 06, 2006 - 3:35 pm UTC
Tom,
Quick Question. while trying to modify a row in a table from a procedure, is it better[Performance wise] to delete existing one and then insert with new values or update the existing values with new values,WHY?
Thanks,
January 06, 2006 - 4:52 pm UTC
update is less work.
you typically update a subset of the columns.
only the columns that change values and have indexes on them will affect the indexes.
doing a delete+insert would take more resources, hit all indexes.
Large update due to consolidation
David Piazza, January 25, 2006 - 8:41 pm UTC
We are consolidating five databases where the schemas are the same into one database. Currently a test system was created with five schemas that hold the previous data from the five databases. It isn't a clean update as all primary keys are in the same range, so some number x will have to be added to the primary(Psuedo key) and child keys in each schema to avoid dups. After reading through this topic, I can see two ways of doing the merge and updating the first schema: CTAS incrementing each primary and foreign key by some x value. Then either insert /*+ append or use the merge command to insert/merge the data into the first schema. The second way would be to directly insert /*+ append into the first schema from tables in each of the five schemas adding the appropriate value to the primary/foreign keys. How would you suggest implementing this?
January 26, 2006 - 10:16 am UTC
I would try to load the "fixed" data.
Eg: if the keys that are in need of fixing are all numbers - I might multiply each key by 10 and add 1, 2, 3, 4, or 5 to each during the load.
mass insert code package
Brett Clark, February 13, 2006 - 4:47 pm UTC
For what its worth, here is a package that can be used as a generic foundation for mass loads. It will disable triggers and indexes and (optionally) truncate the main table before running the insert statement that you provide. Hope this helps someone.
create or replace package table_support is
TRUNCATE_NO constant boolean := FALSE;
TRUNCATE_YES constant boolean := TRUE;
procedure load_table(G_TABLE_NAME Varchar2, G_INSERT_STMT Varchar2, TRUNC_ACTION Boolean DEFAULT TRUNCATE_NO );
end table_support;
/
create or replace package body table_support is
ACTION_ENABLE constant boolean := TRUE;
ACTION_DISABLE constant boolean := FALSE;
procedure exec( inStmt Varchar2 ) is
begin
EXECUTE IMMEDIATE inStmt;
end exec;
function exec( inStmt Varchar2 ) return Number is
begin
exec( inStmt );
return( SQL%ROWCOUNT );
end exec;
procedure tbl_triggers(G_TABLE_NAME Varchar2, action Boolean) is
begin
if( action = ACTION_ENABLE ) then
exec( 'ALTER TABLE ' || G_TABLE_NAME || ' ENABLE ALL TRIGGERS' );
elsif ( action = ACTION_DISABLE ) then
exec( 'ALTER TABLE ' || G_TABLE_NAME || ' DISABLE ALL TRIGGERS' );
end if;
end tbl_triggers;
procedure tbl_indexes( G_TABLE_NAME Varchar2, action Boolean ) is
begin
FOR C IN (SELECT INDEX_NAME
FROM USER_INDEXES
WHERE TABLE_NAME = G_TABLE_NAME) LOOP
if( action = ACTION_DISABLE ) then
exec( 'alter index ' || C.INDEX_NAME || ' unusable' );
elsif( action = ACTION_ENABLE ) then
exec( 'alter index ' || C.INDEX_NAME || ' rebuild parallel nologging' );
end if;
END LOOP;
if( action = ACTION_ENABLE ) then
exec( 'alter session set skip_unusable_indexes=false' );
elsif ( action = ACTION_DISABLE ) then
exec( 'alter session set skip_unusable_indexes=true' );
end if;
end tbl_indexes;
procedure load_table(G_TABLE_NAME Varchar2, G_INSERT_STMT Varchar2, TRUNC_ACTION Boolean DEFAULT TRUNCATE_NO ) is
rowCount Number;
begin
-- DISABLE ALL TRIGGERS
tbl_triggers( G_TABLE_NAME, ACTION_DISABLE );
-- TRUNCATE TABLE
if( TRUNC_ACTION = TRUNCATE_YES ) then
exec( 'truncate table ' || G_TABLE_NAME );
end if;
-- DISABLE INDEXES
tbl_indexes( G_TABLE_NAME, ACTION_DISABLE );
-- ENABLE PARALLEL DML
exec( 'alter session enable parallel dml' );
-- EXECUTE INSERT STATEMENT
rowCount := exec( G_INSERT_STMT );
DBMS_OUTPUT.PUT_LINE('Inserted ' || rowCount || ' records into ' || G_TABLE_NAME);
-- ENABLE/REBUILD INDEXES
tbl_indexes( G_TABLE_NAME, ACTION_ENABLE );
-- ENABLE TRIGGERS
tbl_triggers( G_TABLE_NAME, ACTION_ENABLE );
-- TURN OFF PARALLEL DML
exec( 'alter session disable parallel dml' );
end load_table;
begin
-- Initialization
NULL;
end table_support;
/
mass update
Hans, February 17, 2006 - 5:19 pm UTC
I have a table with 50 columns and 10 million records and I have a file with 10 million rows and I have to update 5 columns in the table from the file( the update is performed by getting the common ID value).
Steps I am performing
1.open the file using utl_file
2. read the record
3. perform the update in the table
The process is very slow.
I will appreciate better ideas from the group
February 18, 2006 - 8:16 am UTC
external table (so you can query the flat file), or sqlldr it into a temporary table.
and then merge.
If you are in 9i, you'll have to:
merge into db_table
using (select * from external_table where id in (select id from db_table))
on (db_table.id = external_table.id)
when matched then update ...
when not matched then insert (id) values (null); <<<=== never happens
in 10g, you can just
merge into db_table
using external_table
on (db_table.id = external_table.id)
when matched then update ...;
A reader, February 18, 2006 - 5:37 am UTC
Morning TOM.
sudip, February 19, 2006 - 12:14 am UTC
What is archieve log mode? why we prefer recovery in Archieve log mode? Please, will u brief it?
A reader, February 19, 2006 - 12:17 pm UTC
To sudip from BGALORE, india
Looks like you are new to this site just put one thing in your mind Tom really ticks off for the "U" so dont use it next time :-)
February 19, 2006 - 5:16 pm UTC
Hey, the page he used to post that actually said "if you do this, I will do what I just did..."
Which is to poke fun at at :)
It amazes more than ticks me off.
Just like in some parts of the world, an "English Accent" adds 10 IQ points (when I hear a British accent, for whatever reason - I add default IQ points, maybe it was all of the James Bond movies when I was a kid) - using this instant message speak takes them away.
Large Update Due To Consolidation
David Piazza, February 21, 2006 - 4:29 pm UTC
Tom,
Regarding my previous question. They've decided to do merge one of the five clusters at a time. I've got scripts to implement the first cluster as such:
CTAS NOLOGGING setting the keys to old_key*10+1
insert /*+ append */
DROP old table
RENAME new table
GRANTs
ALTER TABLE add Primary Key
ALTER TABLE add Foreign Key
CREATE INDEX NOLOGGING parallel(degree 4)
ALTER TABLE ADD CONSTRAINT UNIQUE ENABLE
ALTER TABLE Logging
This seems pretty fast. Anything you see that will make it faster?
When I merge in the next clusters, should I:
- Do the same steps as above(*10+2,*10+3 etc.)
- Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
- Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints
The number of rows in the tables(Approximately the same in each cluster) are: 72338677,32278464,25347185,23834984,7595305,4519904,4371347,1229877,1168373,6669752,6382030,97250,4694,934,166
-Thanks
February 22, 2006 - 8:26 am UTC
why not all of them at once to avoid having to do the disable/enable over and over.
Large Update Due To Consolidation
David Piazza, February 22, 2006 - 1:28 pm UTC
That was my suggestion, but the customer wants to do one cluster at a time, especially the first one, ensuring that it works as before and all the users are happy before merging another cluster in...
February 22, 2006 - 2:09 pm UTC
then you got it.
You don't really need to disable all contraints and if you could get it done in two bits - that would be the least work (load cluster 1, then 2, 3, 4 and 5 after testing 1)
Large Update Due To Consolidation
David Piazza, February 23, 2006 - 4:23 pm UTC
Ok. Which of the three methods would you suggest to merge in the remaining clusters, or would you recommend something else?
1) Do the same steps as above, CTAS/DROP old table/Rename
new table to old table/Insert /*+ append(*10+3,*10+4
etc.)/Create constraints,indexes.
OR
2) Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
OR
3) Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints
OR
4) Something else
I'm leaning towards 2) or 3)
-Thanks
February 23, 2006 - 8:05 pm UTC
I would just be disabling indexes (if I wanted to - insert append does a rather efficient "merge" of data) and insert appending the other data. I would not redo the CTAS over.
Bulk Insert / Update in sigle SQL
Narendra, February 24, 2006 - 1:49 am UTC
Tom,
I have a target table that stores current as well as history records. Current records are identified by a field CURRENT_REC having value of 'Y'. History records have value of CURRENT_REC as 'N'. At any stage, for a PART_NO, there can be only SINGLE record with CURRENT_REC as 'Y' in target table.
My problem is I want to transfer data (in bulk) from source table to target table with CURRENT_REC flag as 'Y' and simultaneously update existing records having CURRENT_REC as 'Y'. Following is table definition:
Source table:
CREATE TABLE PART_SRC
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)
target Table:
CREATE TABLE PART_TABLE
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)
Is this possible in a single SQL ?
February 24, 2006 - 8:25 am UTC
I don't get it.
I want to transfer data (in bulk) from source table to target table with current_rec flag as 'Y'
I understand that.
the "simultaneously update existing records have a current-rec as 'Y'"
which existing records, in which table, and update what exactly (since we are already looking only at 'Y' records - what exactly are we updating??!?!)
Bulk Insert / Update in sigle SQL
Michel Cadot, February 24, 2006 - 9:50 am UTC
If i clearly understand you, you want something like (inspired by "merge delete" or "insert all" statements):
merge into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then
update t.current_rec = 'N'
insert (t.part_no, ...) values (s.part_no, ...)
when not matched then
insert (t.part_no, ...) values (s.part_no, ...)
/
or
merge ALL into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then update t.current_rec = 'N'
when TRUE then
insert (t.part_no, ...) values (s.part_no, ...)
/
But this does not exist. ;)
Michel
More details
Narendra, February 24, 2006 - 9:28 pm UTC
Tom,
One correction to table definitions.
Source table does not contain CURRENT_REC field.
What I am trying to achieve is transfer data from source to target. Each time data is appended to target table. I may transfer updated details of PART_NO multiple times from source to target. However, I do not want data in target table to be overwritten. In order to identify the "last loaded record", I am maintaining CURRENT_REC field in target table. If a PART_NO (from source table) does not exist in target table, it should simply add that PART_NO to target table with value of CURRENT_REC = 'Y'. However, if PART_NO already exists i.e. data has been transferred earlier from source to target for that PART_NO, target table should already have a single record with CURRENT_REC as 'Y' and may have multiple records with CURRENT_REC as 'N' for same PART_NO. Now, when I transfer data for same PART_NO again from source to target, it should be added with CURRENT_REC as 'Y'. However, at the same time, existing record for same PART_NO, having CURRENT_REC as 'Y' should be updated with 'N'.
Hope I am clear.
Michel,
MERGE will not help since CURRENT_REC will not be there in my source table.
Thanks
More details
Michel Cadot, February 25, 2006 - 8:45 am UTC
Narendra,
I understood this and the pseudo-merges don't use the current_rec of source (s) but only of target (t). It would be clearer if I wrote "insert (t.part_no, ..., t.current_rec) values (s.part_no, ..., 'YES')".
There is no way to do it in a single SQL statement but it is easy to do it in an efficient PL/SQL block.
Regards
Michel
Merging ten large tables
Jesse Buzzard, March 17, 2006 - 7:00 pm UTC
Good Evening,
This is my first foray into very large databases and I need some advice.
We have ten tables that contain blob data. The total size of the tables is approximately 1.8TB. They are stored in the same table space. How this came to be is a long story for another discussion thread. Our task is merge the tables into one table. After selecting the table with the largest number of rows as the target we came up with the below code to insert the data from another table into it.
-------- start of code -------
#: Announce start insert
echo "-------------------------------------------------------------"
echo "`date '+ %a %e-%b-%Y %X'` - $0 - started" >> $1
echo "-------------------------------------------------------------"
$ORACLE_HOME/bin/sqlplus /nolog << EOF
connect odirms/odirms
insert /* +APPEND PARALLEL (nto, 8) */ into odirms.t_ompf4 nto
select /* +APPEND PARALLEL (tompf1, 8) */ * from t_ompf1 tompf1;
commit;
exit
EOF
#: End of program
echo "`date '+ %a %e-%b-%Y %X'` - $0 - completed" >> $1
echo "-------------------------------------------------------------"
echo
------------ end of code ----------
At present we are trying to merge 4 of the tables. All logging and indexes are off. Multi threading is on. From all I have read we should have completed this in a relatively short amount of time. This job has been running for over 28 hours. I can see the space used is growing (about 300MB /minute), but my supervisors would like to know when we will have all the tables merged.
Where have gone astray?
Thank You,
Jesse
March 18, 2006 - 11:55 am UTC
well, you said "all indexes are off" and "we have blobs" - the two concepts do not go together. every blob has an index associated with it that cannot possibly be disabled for any reason (short of dropping the blob column!)
So, you do having logging going on, you do have index maintenance going on.
but you don't give any information as to the size of the tables being read (and hence written into the new table).
I don't think anything has "gone astray" here, I can say that your "hint" isn't a hint, it is just text you typed in:
ops$tkyte@ORA10GR2> create table t as select * from dual;
Table created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /* +APPEND */ into t select * from dual;
1 row created.
ops$tkyte@ORA10GR2> select * from t;
D
-
X
X
ops$tkyte@ORA10GR2> insert /*+ APPEND */ into t select * from dual;
1 row created.
ops$tkyte@ORA10GR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
See how the /* + didn't really do anything, but /*+ does. If that is your real script, you likely did not achieve your goal of either append nor parallel (parallel implies append by the way, if it was in fact using parallel)
Merging ten large tables
Jesse Buzzard, March 20, 2006 - 12:15 am UTC
Thank you for your response.
If I may I would like to impose on your wisdom again. We are averaging 1800MB per hour. Yes, the sample I provied is one of our real scripts, and yes the goal was work in parallel. At present five of the tables (56% of the rows) have been merged. The total size of the data, is 1.8 TB in 32,518,931 rows. As you can see from the description below the amount of data in addition to the blob data is rather small. If we have gotten the hint correct would we have achieve a better insert/append rate?
Name Null? Type
----------------------------------------- -------- -----------
OMPF_ID NOT NULL NUMBER
OMPF_UID NUMBER(10)
OMPF_IMG_TYPE_ID CHAR(1)
OMPF_IMG_STYPE_ID CHAR(1)
OMPF_PAGE_NUM NUMBER(5)
OMPF_ACTIVE_FLAG CHAR(1)
OMPF_ADDED_DT DATE
OMPF_INACTIVE_DT DATE
OMPF_IMAGE BLOB
OMPF_LOCK_FLAG NUMBER(5)
OMPF_USER_ID CHAR(8)
March 20, 2006 - 7:16 am UTC
you might have, you might not have. you sort of have to figure out where you are spending your time (tracing - a level 12 10046 trace on a bit of this would go a long way)
Alternative Suggestion
VK, March 21, 2006 - 2:40 pm UTC
Here is an approach based on exchange partitions I used successfully in the past for combining tables (This will work only if the data in the individual tables is distinct and there is no overlap - A scenario in which this will work well is when people in their infinite wisdom create horizontally "partitioned" tables. An example may be where all customers with ID 1 to one million are kept in CUST1 table and one million to 5 million in CUST2 table and so on..):
4 tables to be combined:
create table test(a number,b blob);
create table test1(a number,b blob);
create table test2(a number,b blob);
create table test3(a number,b blob);
Now, let's say table test has values 1 to 10,000 in column a and test1 has 10,001 to 20,000, test2 has 20,001 to 30,000 and test3 has 30,001 to 40,000.
Create a partitioned table that looks exactly like the tables being combined - pick a partition key where the values do not overlap between tables.
create table testp(a number,b blob)
partition by range(a)
(
partition p1 values less than (10001),
partition p2 values less than (20001),
partition p3 values less than (30001),
partition p4 values less than (maxvalue)
);
You can combine the 4 tables into the partitioned table using exchange partition:
alter table testpart exchange partition p1 with table test;
alter table testpart exchange partition p1 with table test1;
alter table testpart exchange partition p3 with table test2;
alter table testpart exchange partition p4 with table test3;
At this point, you can use the partitioned table or if you don't want the table to be partitioned, you can merge all 4 partitions into a single partition and exchange with a non-partitioned table.
alter table testpart merge partitions p1,p2 into partition p5;
alter table testpart merge partitions p5,p3 into partition p6;
alter table testpart merge partitions p6,p4 into partition p7;
Now we have just one partition (P7) in the table and you can exchange this with any of the original tables (we'll use test).
alter table testpart exchange partition p7 with table test;
Now we have all 40,000 rows in table test. As you can see, it is a lot of work to combine 4 small tables but if you have tens/hundreds of millions of rows, this will be much faster.
Merging ten large tables
Jesse Buzzard, March 21, 2006 - 5:03 pm UTC
Tom,
VK,
Thank you for your suggestions. They are educational and helpful. The merges have nearly completed. Now I am off to read about reorganizing the tablespace to reclaim some of the space formerluy used by the smaller tables.
Merging ten large tables
Jesse Buzzard, March 21, 2006 - 5:48 pm UTC
Tom,
VK,
Thank you for your suggestions. They are educational and helpful. The merges have nearly completed. Now I am off to read about reorganizing the tablespace to reclaim some of the space formerluy used by the smaller tables.
Very slow insert into table
bayaraat, April 17, 2006 - 12:16 am UTC
Hi Tom
I need your expertise in this regard.
My table is
create table CDR00
(
CALLDATETIME DATE,
TELNO VARCHAR2(20),
CALLEDNO VARCHAR2(20),
SECONDS NUMBER,
PULSE NUMBER,
PREFIXCODEA CHAR(14),
PREFIXCODEB CHAR(14),
RATECODE CHAR(28),
TOG_AMOUNT NUMBER(20,2),
USD_AMOUNT NUMBER(20,8),
NO_TOG_AMOUNT NUMBER(20,8),
NO_USD_AMOUNT NUMBER(20,2),
TOG_TAX NUMBER(20,8),
USD_TAX NUMBER(20,8),
FNAME VARCHAR2(12),
OPERATOR VARCHAR2(10),
RESERVED VARCHAR2(10),
PRINTED NUMBER(1)
)
tablespace CDR01
pctfree 20
pctused 80
initrans 1
maxtrans 255
storage
(
initial 1M
minextents 1
maxextents unlimited
);
Every day insert into this table 2 million records and truncate table. This contains million or records insert into partition table.
Insert process is very slow, respond times 12:00:00.
I want to quickly insert many records. I using RAID 5.
Not use constraint and trigger. Non stop archive log process. Because Oracle database server is Standby database server. Any suggestions please...
April 17, 2006 - 7:51 am UTC
if done correctly, it would only take minutes or less to bulk load 2 million records.
So, you need to describe HOW you are loading this.
and if you are loading and truncating, doubtful that you are updating - so pctfree=0 would be more appropriate -and perhaps even "compress", but it depends on how you load after all - which we don't know.
Very slow insert into table
Bayaraat, April 17, 2006 - 12:32 am UTC
Hi Tom,
I need your expertise in this regard.
My create table is
create table CDR00
(
CALLDATETIME DATE,
TELNO VARCHAR2(20),
CALLEDNO VARCHAR2(20),
SECONDS NUMBER,
PULSE NUMBER,
PREFIXCODEA CHAR(14),
PREFIXCODEB CHAR(14),
RATECODE CHAR(28),
TOG_AMOUNT NUMBER(20,2),
USD_AMOUNT NUMBER(20,8),
NO_TOG_AMOUNT NUMBER(20,8),
NO_USD_AMOUNT NUMBER(20,2),
TOG_TAX NUMBER(20,8),
USD_TAX NUMBER(20,8),
FNAME VARCHAR2(12),
OPERATOR VARCHAR2(10),
RESERVED VARCHAR2(10),
PRINTED NUMBER(1)
)
tablespace CDR01
pctfree 20
pctused 80
initrans 1
maxtrans 255
storage
(
initial 1M
minextents 1
maxextents unlimited
);
Every day insert into this table for 2 million and records. Response time is 12:00:00. After insert into another partition table and truncate table. I want very quickly insert million and records. Are you need some information? My oracle database server is Standby database server. This table is not use constraint and trigger.
Any suggestions please...
Moving high volumes of data from table1 to table2 in an OLTP system
A reader, May 10, 2006 - 10:54 am UTC
Tom,
We are looking at archiving data in a table (schema1.table1) having about 40 Million records. This table is populated continously by a java application. The objective is to move all but 5 million records(latest records) to the archive table with minimal impact to the end user. The option I am looking at is
create a different schema (schema2) on tablespace2.
create a new table (table2), structure similar to table1.
create a procedure(p_1) that will copy data from schema1.table1 to schema2.table2.
call procedure 1 from a job on a certain schedule.
Psedo code for procedure p_1
----------
insert into schema2.table2
select a.*
from schema1.table1 a
where year(a.creation_date) < 2006
and rownum < 1000
and a.id not in (select id from schema2.table2);
commit;
delete from schema1.table1 aa
where aa.id in (select bb.id from schema2.table2);
commit;
----------
Would like your expert comments on
1) Do you see any problems with the psedo code?
One of the problems I see is
After step #1 and before step #2 in the psedo code, a user may see duplicate records. Is there a way to avoid this situation?
2) Is there a better of implementing this change?
Thank you
May 11, 2006 - 7:31 am UTC
I'd probably rather move the 5 million records to a new table.
(move the lesser of the two amounts)
time to look at partitioning for you? to avoid this in the future.
Updating a single column to 'N' in a 50 Million records table
A reader, May 15, 2006 - 5:34 pm UTC
Tom,
In one of our application, we have a database conversion job to update a column with a 'N' for a table that has about 50 million rows.
We are using a rollback segment which has the capacity to commit about 8 million rows at a time.
So we are running update statements to commit 7 million rows at a time between each of the 8 identical update statements.
( requires us to run the update 8 times to commit all the 50 million updates)
The time for the conversion of these 50 million rows is taking about 20 hours.
We have already explored the following CTAS method :
1. CTAS/INSERT/APPEND/PARALLEL and rename the new table to orig table but due to space constraints and this particular table is bigger in size, the DBAs say this method is not feasible due to space constraints.
So we are looking for other methods/suggestions to see if we can improve the timing.
Thank you.
May 15, 2006 - 6:08 pm UTC
parallel update, using many rollback segments simultaneously.
I pray this column is NOT indexed.
Parallel update question
A reader, May 18, 2006 - 4:22 pm UTC
Tom,
Per Oracle documentation, it seems Tables must be partitioned to support parallel update and delete.
So, i tried parallel update on partitioned table as shown below but how do we confirm it really did parallel dml because after parallel update is done.. i tried to query the same table and it did not generate any error.
I see the same behaviour on non-partitioned tables when parallel update was performed.
Parallel update statement logs are given below :
Partitioned table :
===================
SQL> update /*+ parallel (T, 3) */ T set status='NV' where status='INVALID' ;
1410 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=11 Card=138 Byte
s=690)
1 0 UPDATE OF 'T'
2 1 PARTITION RANGE* (ALL) :Q214000
3 2 TABLE ACCESS* (FULL) OF 'T' (Cost=11 Card=138 Bytes=69 :Q214000
0)
2 PARALLEL_COMBINED_WITH_PARENT
3 PARALLEL_TO_SERIAL SELECT /*+ Q214000 NO_EXPAND ROWID(A1) */ A1
.ROWID,A1."STATUS" FROM "T" PX_GRANULE(0, BL
OCK_RANGE, DYNAMIC) A1 WHERE A1."STATUS"='I
NVALID'
Statistics
----------------------------------------------------------
19 recursive calls
1733 db block gets
263 consistent gets
168 physical reads
351584 redo size
854 bytes sent via SQL*Net to client
597 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1410 rows processed
IOMDV:SQL> select count(*) from T ;
COUNT(*)
------------
12239
1 row selected.
Non-partitioned table :
========================
update /*+ parallel(tn 3) */ tn set valid_ind='N' ;
29250 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=14 Card=29250 By
tes=58500)
1 0 UPDATE OF 'TN'
2 1 TABLE ACCESS* (FULL) OF 'TN' (Cost=14 Card=29250 Bytes=5 :Q212000
8500)
2 PARALLEL_TO_SERIAL SELECT /*+ Q212000 NO_EXPAND ROWID(A1) */ A1
.ROWID,A1."VALID_IND" FROM "IOMDBO"."TN" PX_
GRANULE(0, BLOCK_RANGE, DYNAMIC) A1
Statistics
----------------------------------------------------------
73 recursive calls
143947 db block gets
244 consistent gets
222 physical reads
18118440 redo size
844 bytes sent via SQL*Net to client
576 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
29250 rows processed
bala@IOMDV> select count(*) from tn ;
COUNT(*)
------------
29250
1 row selected.
A reader, May 19, 2006 - 3:05 pm UTC
Thanks for the info.
in oracle 8.1.7, However, with parallel DML on partitioned table, i should not be able to query the same table until i commit the parallel update.
But i was able to query the same table with our any error generated.
Is it the expected behaviour.
May 20, 2006 - 4:39 pm UTC
you did not do a parallel update.
<quote src=expert Oracle database architecture>
We can actually observe the fact that there are separate independent transactions created for the parallel execution servers. WeÂ’ll use two sessions again, as before. In the session with SID=162, we explicitly enable parallel DML. PDML differs from parallel query in that regards; unless you explicitly ask for it, you will not get it:
big_table@ORA10GR1> alter session enable parallel dml;
Session altered.
The fact that the table is “parallel” is not sufficient, as it was for parallel query. The reasoning behind the need to explicitly enable PDML in your session is the fact that PDML has certain limitations associated with it, which I list after this example.
</quote>
Thank you
A reader, May 20, 2006 - 8:08 pm UTC
RE: Parallel update
bakunian, May 23, 2006 - 8:20 am UTC
I have table "A" 17 GB and 120 mil. rows. I needed to add columns with NOT NULL therefore I had to add column, update every row setting new column to some value, then make it NOT NULL. When I ran update it ran for almost 10 hours, then I thought to create empty dummy table "B" as select * from "A" and setting columnÂ’s default value to required value.
To my surprise when I did
INSERT /*+ parallel (b, 4) INTO b SELECT /*+ parallel (a,4) */ * FROM a
Default values were not inserted but when I issued regular insert into b statement default values appeared. Why?
Thank you for your time.
May 23, 2006 - 8:38 am UTC
show me - you are using "*" and not listing any columns - I do not see how a conventional path insert would have used defaults.
column defaults are only used when you explicitly skip the column in the insert statement.
ops$tkyte@ORA10GR2> create table t2( x varchar2(20), y int );
Table created.
ops$tkyte@ORA10GR2> insert into t2 values ( 't2', null );
1 row created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /*+ append */ into t (x) select 'direct path' from dual;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> insert /*+ append */ into t select * from t2;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> select * from t;
X Y
-------------------- ----------
direct path 0
t2
ops$tkyte@ORA10GR2> insert into t (x) select 'direct path' from dual;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> insert into t select * from t2;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> select * from t;
X Y
-------------------- ----------
direct path 0
t2
direct path 0
t2
Parellel Update
bakunian, May 23, 2006 - 10:07 am UTC
I see so for my trick to work all I had to do is to skip columns that were supposed to have default values. Correct?
At the very beginning of this thread you gave below example but that does not really work for me so what would you do in my case where I need to update millions of rows with default values and new columns are added to a large table with NOT NULL constraint. Was my method almost correct?
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;
Thank you,
May 23, 2006 - 10:11 am UTC
well, since you are going to update the entire table - hence locking every row - and you in effect have a "scheduled downtime", I don't agree with you that you cannot use the create table as select. You say "it doesn't really work for you", I say "why not"
Parallel Update
bakunian, May 23, 2006 - 10:34 am UTC
Forgive me I've been up for almost 48 hours and a bit slow so using your example would it look something like below assuming every person in this table will have ename = 'Bob'
create table new_emp as
select empno, 'Bob', job, MGR, Â….
From emp;
Correct?
May 23, 2006 - 3:26 pm UTC
select empno, 'Bob' ename, job, mgr, ...
from emp
yes.
Estimate time for a create table to end
Thierry Yen Suin, May 26, 2006 - 3:56 am UTC
We once had to sort a very large (non-partitioned) table in the region of 60G.
The sorted column was Type NUMBER and indexed.
So what was/is regularly done is
create table sorted_table
as select * from unsorted_table
where col_to_be_sorted>0;
The sorts normally takes 3 hours to finish using the index on col_to_be_sorted. As I understand this create table does not need TEMP tablespace to do the sort.
Once the sort did not finish after 35hours.
Database was 8i. We stopped the create statement.
Rebuild the index (out of desperation, who knows?)
Reissued the create table. -> never finished.
Did an export/import into 9i on another server.
Created index, issued same create statement, took less than 1hr.
My question:
Is there a way to estimate the time it will take for a create table to finish based on some V$ views?
A way to see the number of rows that have been inserted/created into the temp segment of the "to be created" table?
Or anything that gives us a clue that it will take forever.
Thanks,
Thierry Yen Suin
Humm V$SESSION_LONGOPS
Thierry Yen Suin, June 01, 2006 - 1:22 am UTC
Of Course the V$SESSION_LONGOPS view!
How could I have forgotten than?
Thierry Yen Suin.
Am i missing something?
A reader, June 13, 2006 - 4:18 am UTC
Tom,
I did following to calculate the redo generated with CTAS with & without nologging. But the amount of redo generated is identical & time taken is even more with nologging. Am I missing here something?
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for Solaris: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
SQL> set timing on
SQL> set time on
13:25:17 SQL> drop table big_table;
Table dropped.
Elapsed: 00:00:00.67
13:25:52 SQL> select a.value
13:26:38 2 from v$mystat a , V$STATNAME b
13:26:38 3 where a.statistic# = b.statistic#
13:26:38 4 and b.name = 'redo size' ;
VALUE
----------
54144
Elapsed: 00:00:00.29
13:26:38 SQL>
13:26:46 SQL> create table big_table
13:27:01 2 as select * from dba_objects;
Table created.
Elapsed: 00:00:08.56
13:27:10 SQL> select a.value
13:27:22 2 from v$mystat a , V$STATNAME b
13:27:22 3 where a.statistic# = b.statistic#
13:27:22 4 and b.name = 'redo size' ;
VALUE
----------
109040
Elapsed: 00:00:00.33
13:27:23 SQL> drop table big_table;
Table dropped.
Elapsed: 00:00:00.19
13:27:57 SQL> select a.value
13:28:02 2 from v$mystat a , V$STATNAME b
13:28:02 3 where a.statistic# = b.statistic#
13:28:02 4 and b.name = 'redo size' ;
VALUE
----------
134372
Elapsed: 00:00:00.36
13:28:03 SQL> create table big_table nologging
13:28:12 2 as select * from dba_objects;
Table created.
Elapsed: 00:00:09.04
13:28:21 SQL> select a.value
13:28:46 2 from v$mystat a , V$STATNAME b
13:28:46 3 where a.statistic# = b.statistic#
13:28:46 4 and b.name = 'redo size' ;
VALUE
----------
188816
Elapsed: 00:00:00.24
June 13, 2006 - 12:18 pm UTC
you are in a noarchivelog mode database aren't you.
run it again, it was a fluke that it took 9 seconds. something else was happening on your system
nologging on a CTAS in a noarchivelog mode database is a big "no-operation", CTAS is already a direct path, non-logged operation in noarchivelog mode.
You would expect the "logged" one to have generated gobs of redo (of the size of the table). not 50k
Here we are in ARCHIVELOG MODE:
ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 3568320
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
Table created.
Elapsed: 00:00:00.53
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 6998672 3,430,352
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 7021144
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
Table created.
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 7088872 67,728
Elapsed: 00:00:00.00
<b>That yours did not generate more than 50k of redo makes me believe you are in noarchivelog mode like this:
</b>
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.03
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 205344
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
Table created.
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 273312 67,968
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 295856
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
Table created.
Elapsed: 00:00:00.45
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 363612 67,756
Elapsed: 00:00:00.03
<b>and the 9 seconds - totally because something else was happening on your system at that point that prevented you from being "fast"</b>
Absolutely correct as usual
A reader, June 14, 2006 - 1:18 am UTC
SQL> SELECT log_mode
2 FROM v$database;
LOG_MODE
------------
NOARCHIVELOG
Thanks very much Tom.
Merge with aggregates.
Thiru, July 07, 2006 - 10:48 am UTC
Tom,
I think I posted this wrongly elsewhere!
-----------------------------------------
Will merge be a good thing to use (if so, how)when dealing with aggregates in a
multi-million records tables.
For instance:
create table source ( a varchar2(2),b number)
/
create table target ( a varchar2(2),b number,status varchar2(10))
/
insert into source values('ab',100);
insert into source values('bc',100);
insert into source values('cd',100);
insert into source values('ef',100);
insert into source values('fg',100);
insert into source values('gh',100);
insert into source values('hi',100);
insert into source values('ij',100);
insert into source values('jk',100);
insert into source values('kl',100);
insert into source values('mn',100);
insert into target (a,b) values('ab',50);
insert into target (a,b)values('ab',50);
insert into target (a,b) values('bc',50);
insert into target (a,b)values('bc',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b) values('ef',50);
insert into target (a,b)values('ef',50);
insert into target (a,b)values('gh',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
commit;
I would want to update the target STATUS column to 'MCHED' if the sum(target.b)
= source.b and target.a=source.a.
In all other cases, update the STATUS column to 'UNMCHED'. The number of records
in both the tables are over 2mill.
b. Also, can this be done in a single update statment?
July 08, 2006 - 10:52 am UTC
look elsewhere for the answer, I did it over there.
Large import question
Mike, July 12, 2006 - 3:57 pm UTC
I have visited this thread on and off for a while. I have a question regarding an apparent slowdown of inserts while importing a very large table. We are on 9.2.0.5 on Solaris 8.
I am importing a 60 million row table into our development database. (I just got more space for it, wooohooo!). Not long ago, I found a very nifty script written by Frank Naude` for monitoring import progress, as follows:
select substr(sql_text,instr(sql_text,'INTO "'),30) table_name, rows_processed,
round((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60,1) minutes,
trunc(rows_processed/((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60)) rows_per_min
from sys.v_$sqlarea
where sql_text like 'INSERT %INTO "%'
and command_type = 2
and open_versions > 0
/
When the table in question started importing, this script returned very high rows per minute values (138905), but as the import progressed, it has fallen to 33779 per minute after a few hours of inserting. I am doing the import from a command line and have the buffer set to 4m, recordlength 64k, commit=n, no statistics (will do entire schema afterward). Can you theorize as to why the performance would erode as the import of a large table progresses? Our box has 4g of memory and 4 cpus. My SGA is about 1G with a healthy cache and auto undo, other parameters would likely pass a sanity check.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 524272 3.8 139805
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 676480 5.2 129263
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 744128 6.5 114775
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1234576 12.9 96075
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1471344 17.3 84885
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1877232 26.6 70484
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1978704 28.7 68904
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 2114000 32.2 65618
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 2824304 50.4 56019
1 row selected.
SQL> /
Wed Jul 12
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 3602256 78.5 45888
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 4126528 97.8 42186
1 row selected.
SQL> /
Wed Jul 12
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 4329472 108.2 40001
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 5597872 165.7 33779
July 12, 2006 - 5:19 pm UTC
see any "cannot allocate new log" messages?
what do v$session_* views return? like session_wait?
Additional Info
Mike, July 13, 2006 - 9:08 am UTC
SELECT A.USERNAME ora_user, A.OSUSER os_user, A.PROGRAM, C.EVENT||', '|| C.P1TEXT||', '||C.P2TEXT||', '||
C.P3TEXT event,C.WAIT_TIME
FROM V$SESSION A, V$PROCESS B, V$SESSION_WAIT C
WHERE A.PADDR = B.ADDR
AND A.SID = C.SID(+)
AND A.OSUSER IS NOT NULL
and a.username is not null
and c.event not like 'SQL*Net%'
ORDER BY 1,3
(I edited the output for readability)
ORA_USER OS_USE PROGRAM
-------- ------ ------------
SYSTEM oracle imp@ods-d-pr
od1.appliedc
ard.com (TNS
EVENT WAIT_TIME
---------------------------- ---------
db file sequential read, fil 0
e#, block#, blocks
V1-V3)
There are three mirrored sets of redo logs, 15om each
SELECT class, count FROM v$waitstat
WHERE class IN ('free list', 'system undo header', 'system undo block', 'undo header', 'undo block')
order by class
CLASS COUNT
------------------ ------------
free list 0
system undo block 0
system undo header 0
undo block 0
undo header 160
Rollback!!!
ROLLBACK SEGMENT WAITS for DEVL
waits gets Wait Ratio
71 2,131,551 .00003%
The import ran all night then failed this morning on an
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS'. Now I'm really mystified, because this table, with 59 million rows, is only about 3.5G in size. During previous imports, I have seen undo segments stretch up to a bit over 4G, then apparently wrap around with no consequence (there is nobody else reading or writing this table). The undotbs has a 4G file which can extend to 16G. In addition, I have seen the "used" size of the undotbs expand and shrink based on activity....last time I looked yesterday it had reached 14.5G and this morning when the job failed (it is rolling back right now) it had maxxed out the file.
Here are the undo settings:
transactions_per_rollback_segment 4
undo_management AUTO
undo_retention 16000
undo_suppress_errors FALSE
undo_tablespace UNDOTBS
Another thing, the table is NOT partitioned, which is because it was never supposed to grow this big, (bad metrics given to dba!) and there is no natural key we can depend on. We are working on an alternate way to hold this data.
I was just looking at v$undostat and noticed this:
UNXPBLKRELCNT UNXPBLKREUCNT EXPSTEALCNT EXPBLKRELCNT
------------- ------------- ----------- ------------
0 0 0 0
0 0 1 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
4608 0 10 0
10752 0 21 0
9728 0 19 0
8192 0 16 0
13312 0 19 0
11264 0 22 0
10752 0 12 0
9216 0 8 0
11264 0 8 0
10752 3 7 0
8704 0 10 0
10752 0 7 0
8704 0 6 0
13312 3 8 0
9216 0 4 0
9216 3 5 0
12800 0 4 0
6144 0 5 0
12800 0 4 0
4676 0 5 2048
10240 0 7 2560
0 0 2 7680
0 0 2 10240
0 0 1 4096
So did this undo failure occur due to numbers shown in the last column? Hmmmm. undo_retention is set for more than 4 hours, yet the import had been running at least 16 hours when it went belly up.
For now, I am going to do a separate export from production with a where clause and populate the table that way, but I would like to further understand both the original question, which is why import performance degraded as it went along, and what caused the import to blow up on undo (we have successfully imported other very large tables).
July 13, 2006 - 9:25 am UTC
this is too much information
all I wanted to know is:
o are there any waits you see in the v$session tables
o do you have "cannot allocate new log" messages in your alert log.
160 waits is insignificant/meaningless from v$waitstat - tells us nothing. No "time" information from it and it is a count since the INSTANCE was started.
Results
Mike, July 13, 2006 - 9:59 am UTC
select sid, event, p1text, p1, p2text, p2 from v$session_wait
/
SID EVENT P1TEXT P1 P2TEXT P2
-- --------------------------- ---------- ------------ --------- --------
1 pmon timer duration 300 0
3 rdbms ipc message timeout 300 0
4 rdbms ipc message timeout 300 0
6 rdbms ipc message timeout 180000 0
7 rdbms ipc message timeout 500 0
12 db file sequential read file# 12 block# 813986
2 db file parallel write requests 153 interrupt 0
5 smon timer sleep time 300 failed 0
9 SQL*Net message to client driver id 1413697536 #bytes 1
8 SQL*Net message from client driver id 1413697536 #bytes 1
11 SQL*Net message from client driver id 1413697536 #bytes 1
18 SQL*Net message from client driver id 1413697536 #bytes 1
22 SQL*Net message from client driver id 1413697536 #bytes 1
23 SQL*Net message from client driver id 1413697536 #bytes 1
25 SQL*Net message from client driver id 1413697536 #bytes 1
The answer to the second question is no.....logs were switching about every 15 - 20 minutes during the import.
July 13, 2006 - 12:57 pm UTC
that is point in time, session_event shows what?
JUST for your session please. measure it, wait a bit, measure again - subtract. It'll show you what that session waited on during that interval (as you are measuring row counts - as the row counts decrease what wait events increase...)
(and this table is not indexed right... that is my assupmption)
How to monitor the table delete prograss?
Jim, July 19, 2006 - 12:10 am UTC
Tom,
I have a long run delete process:
delete tab where col1 < '01-May-06'
/
it takes a few hours to complete.
how can i check how many records have been deleted during the process?
TIA
July 19, 2006 - 9:06 am UTC
that is a strange string to compare to isn't it? Because I know col1 cannot be a date since we never compare dates to strings and strings to numbers and so on.... (just a comment..)
as long as the delete uses a full tablescan, you can monitor it in v$session_longops, not by number of rows deleted, but rather by number of blocks scanned in the table.
Very confused
Elaine, July 25, 2006 - 11:55 am UTC
I have been reading through this thread and saw where you said that updating a few million rows should be able to take place in minutes. I have 400 million rows daily to load. We use insert/append to do the insert (sorry, that's redundant insert/insert...).
I am having a disagreement with my boss. There are 9 indexes on the table. I maintain that as each row is appended, the indexes are updated. He maintains that the index updates do not take place until after the load is complete.
Which of us is correct? I have checked in both your books and in the Concepts manual. Someone pointed me to the following quote:
Indexes are created on one or more columns of a table. After it is created, an index is automatically maintained and used by Oracle. Changes to table data (such as adding new rows, updating rows, or deleting rows) are automatically incorporated into all relevant indexes with complete transparency to the users.
This would imply (I think) that I am right that the indexes are updated as the inserts are occurring, or am I misreading this?
Any links to either chapter numbers in your book or other documentation links greatly appreciated if this has already been covered.
July 25, 2006 - 12:04 pm UTC
neither of you is entirely correct :)
both of you are partitially right! but i'd say he is closer to the truth.
with a direct path load like that - we write the newly added data to the table above the high water mark for the table.
We also build a "mini index" of the newly loaded data off to the side (not in the "big, existing" index yet).
At the end of the append - we merge in BULK the mini-index with the existing index.
So, the existing index is maintained automatically.
But not row by row.
It is done in bulk - efficiently - at the end of the operation.
Great answer!
Elaine H, July 25, 2006 - 12:51 pm UTC
I never understood how that worked. Thanks for the excellent explanation. I still have one thing I don't understand though. If there is a unique index on the table, how does oracle compensate to make sure that the constraint is not violated if it is only building the mini index to merge later?
Also, do you know of any way to speed the load of the mini-index to the main index? Is there any way to influence the manner in which the update/insert is applied the way that you can the initial insert using hints?
Parallel DML and GROUP BY
Thiru, September 21, 2006 - 1:29 pm UTC
I do not know if this is an oracle bug or parallel processing is not going to work in this case:
alter table t parallel;
alter session enable parallel dml;
create sequence t_seq;
update
(select rkey,id from t order by id)
set rkey=t_seq.nextval;
The table contains 50mill records
What it looks like after running the above update is that the sequence used was not consistent with the order on column id.
Though I cannot run a test case at this point for your perusal, the parallel procssing looks like ordered the individual records in the splits and was allocating the sequence numbers.
for eg: the data looked like:
rkey id
100 ab
101 ab
102 bc
103 bc
..
..
500 ab
The question is: Is this the way parallel dml works or is it a bug or is something wrong with my update?
Thanks and regards
September 22, 2006 - 2:21 am UTC
I would not expect it to be. each section was ordered nicely, but that is about it. I would expect gaps to be possible and basically rows to be randomly updated.
I would say "something wrong with the thought behind doing such an update" rather than something wrong with the update. I don't see the point in assigning a number like that at all.
Thiru, September 22, 2006 - 5:43 pm UTC
The reason behind doing such an update was :
The flat files that contain data is not sorted. I used external table to load this into the table and could not use a sequence and order the records while doing the load. No idea if that was possible ??
So what I understand now is that parallel dml would work if no ordering or grouping is involved. Because if grouping is involved, then it would be the same case of grouping each splits separatlely. Right?
The other option which I thought of was:
create the table through the external table load.
And then create another table CTAS using the sequence and ordering the base table.
create table t1 as select t_seq.nextval,t.id,t.c2 from t order by id
Is this a good option?
September 24, 2006 - 1:09 pm UTC
umm, but the data is apparently already "orderable" - you just tried to use an order by to assign yet another field to order by.
so, why do you need this second field when one already apparently exists.
Delete / CTAS
Chris, September 27, 2006 - 11:25 am UTC
Tom,
Trying to process all the great info in this thread and would like a little feedback. I have a 20 million row table that gets about 2 million rows deleted every day. The table is not partitioned (let's not get into the why). There is only one index on the table and it is the primary key. Due to the extremely critical nature of the data some folks are very nervous with a CTAS, drop/rename strategy. The database does not use locally managed tablespaces and the current process spawns about 70 threads to delete data and bring specific rollback segments online. For other reasons I won't go into here, the database is bounced every day and this delete process would ideally be compatible with running before and after the bounce (currently it must complete before they bounce). I am trying to benchmark something like this and was looking for your thoughts.
1. CTAS getting only those rows to be deleted (unrecoverable/parallel).
2. Delete x number of rows at a time in parallel.
Looks like:
CREATE TABLE mytab unrecoverable parallel 8 AS
SELECT /*+ parallel(mytab_source,8) */ column1
FROM mytab_source
WHERE ...;
DELETE /*+ parallel(mytab,8) */
FROM mytab
WHERE column1 IN (SELECT column1
FROM mytab_source
WHERE rownum < 50000);
The only two reasons for limiting # of rows deleted would be rollback space and time. Would be happy to do it all at once if this became of no concern. If the whole operation could complete in < 30 minutes that would be very acceptable.
Not exactly sure how I'd loop through 50000 at a time though without deleting from mytab_source as well. In other words after I delete the first 50000, how do I get the next 50000.
Anyhow, some questions.
1. Does this seem reasonable or would you suggest another approach (I'm sure you will)?
2. Not much experience with parallel DML, does the above work as is or do I need to "alter session enable parallel dml" first? I though I had to, but it seemed to run really quick without doing this.
3. I don't believe the table currently has a value specified for freelists, should it, and if so what should it be?
4. Is the delete syntax with parallel correct?
5. Might I be better off with a CTAS for the 18 million rows I want to keep, then build an index in parallel (the first time I tried with logging and not in parallel on 11 million rows it took 34 minutes) and then rename/rename? To appease those that are fearful of dropping the old table I could keep the last version with a different name if there is sufficient space keeping a rolling cycle of two tables.
6. How would you suggest doing the delete x number of rows at a time?
7. What is the difference between unrecoverable and nologging on the CTAS?
8. If you use a drop/rename or rename/rename, packages will go invalid when the original table disappears. All code should recompile itself on first use, correct? Am I sure I won't get "invalid state of packages" on the first attempt, and then a successful use on the next attempt?
September 27, 2006 - 3:51 pm UTC
laughing out loud, but limiting the number of rows deleted - you increase the runtime (which seems to be the converse of what you are trying to achieve)
have you tried "create table as select ROWS TO KEEP", "index" (all of which will require no undo and if you like, no redo), drop and rename?
how does IT perform compared to deleting 10% of the data?
Alexander the ok, September 27, 2006 - 4:54 pm UTC
Why does limiting the number of rows increase runtime?
September 27, 2006 - 5:33 pm UTC
eh? not sure what you are refering to exactly here?
Your last follow up
Alexander the ok, September 27, 2006 - 6:09 pm UTC
But, I've got a LONG column ...
Matt, October 11, 2006 - 1:00 pm UTC
that is used by everyone's favourite CRM application. I've got to get rid of 38 million rows of a 40 million row table. If I understand correctly the conversion from LOB to LONG is a no-no (therefore cannot do a CTAS and then convert back).
So ... how does one delete 90 odd % of a table with a LONG column?
October 11, 2006 - 4:03 pm UTC
sorry, no good answer for you - longs prevent most things.
LONG and a direct path insert
Matt, October 16, 2006 - 5:35 am UTC
Tom,
Apologies for pestering you still, but does having a LONG column also count out the use of a direct path insert?
I'm thinking, create another table for the data we want to keep, do the direct path insert, drop the 'old' table and then rename.
October 16, 2006 - 7:53 am UTC
you cannot insert a long using insert as select, so the question is moot since you cannot use insert with a "as select" and a long.
Application to insert/copy?
Basil, November 01, 2006 - 2:16 pm UTC
I've inherited a database design where we have the need to make copies of rows in one particular table. The system more or less looks like:
drop table c;
drop table u;
drop table pt;
drop table ans;
create table c (
c_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table u (
user_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table pt (
pt_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table ans (
ans_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table rp
(rp_id NUMBER PRIMARY KEY NOT NULL,
ans_id NUMBER NOT NULL,
pt_id NUMBER NOT NULL,
c_id NUMBER NOT NULL ,
date_value DATE,
text_value VARCHAR2(4000),
numeric_value NUMBER,
addon_ids VARCHAR2(4000),
cr_dt DATE DEFAULT SYSDATE NOT NULL,
cr_user_id NUMBER NOT NULL,
upd_dt DATE DEFAULT SYSDATE NOT NULL,
upd_user_id NUMBER NOT NULL,
CONSTRAINT rp_c_fk FOREIGN KEY (c_id)
REFERENCES c (c_id) ENABLE,
CONSTRAINT rp_CRUSER_FK FOREIGN KEY (CR_USER_ID)
REFERENCES u (user_id) ENABLE,
CONSTRAINT rp_PRICETBL_FK FOREIGN KEY (pt_id)
REFERENCES pt (pt_id) ENABLE,
CONSTRAINT rp_SRVYRESP_FK FOREIGN KEY (ans_id)
REFERENCES ans (ans_id) ENABLE,
CONSTRAINT rp_UPDUSER_FK FOREIGN KEY (upd_user_id)
REFERENCES U (USER_ID) ENABLE
) ;
CREATE INDEX rp_ac_id ON rp(c_id, ans_id);
CREATE INDEX rp_idx ON rp(ans_id, pt_id, c_id);
CREATE INDEX rp_crsidx ON rp (c_id, ans_id, pt_id );
-- rp_idx or rp_crsidx should probably go away, to minimize duplication
CREATE INDEX rp_pt_idx ON rp(pt_id);
The table RP has about 30 million rows at present. It is unpartitioned. The use of surrogate keys everywhere makes it difficult to find a suitable key for partitioning. When we do a copy, about 600,000 rows (in one particular bad case) need to be duplicated, adjusting the ans_id, c_id, and pt_id values to new ones.
To accomplish the mapping, I created some temporary tables:
create global temporary table c_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table pt_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table ans_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
As I generate new entries for c, pt, and ans, I do multitable inserts, so that the old and new ids get stored in the temp tables. For example:
INSERT ALL
INTO c_map(previd, newid)
VALUES (c_id, c_seq.NEXTVAL)
INTO c(c_id, name)
VALUES (c_seq.NEXTVAL, name)
SELECT c_id, name FROM c WHERE <criteria>;
My thought was then to do something like:
insert into rp (
rp_id, ans_id, pt_id, c_id,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id )
SELECT rp_seq.NEXTVAL,
ans_map.newid, pt_map.newid, c_map.newid,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id )
FROM rp
JOIN ans_map ON rp.ans_id = ans_map.previd
JOIN pt_map ON rp.pt_id = pt_map.previd
JOIN c_map ON rp.c_id = c_map.previd;
On top of the strange execution plans I'm getting for the SELECT portion (which is doing full scans of rp with its 30 million rows, in spite of indexes that should help), the actual insert is taking a REALLY long time.
Can this UPDATE approach be modified to improve the INSERT performance in my scenario?
Oh yes, this is on a 9iR2 system.
How do hints work
Andrew, December 05, 2006 - 2:03 pm UTC
From time to time, developers tell me that the hints they use to influence execution path, like INDEX, do not work and there must be a bug in Oracle. I tell them that sometimes the hint does not take because a HINT is a hint, not a directive and that they work by artificially changing the cost of the query to favor the HINTed path. The developers insist that HINTs should always be taken because they are directives. I looked through the Oracle documentation, and it consistently refers to hints as instructing or directing the optimizer to take a certain path, rather than favoring the hinted path.
I have found no documentation about how hints work internally. Can you tell me where to find information about how these 'execution path' hints work?
December 06, 2006 - 9:29 am UTC
they can only be obeyed when they make sense.
give me an example and I'll tell you what mistake your developers made.
Well, in addition to the mistake of hinting queries at the drop of a hat that is.
Using hints is a dicey proposition - for an emergency bug fix of one or two queries - sure, as a tuning tool - I'd avoid it.
ops$tkyte%ORA10GR2> create table t ( x int, y int );
Table created.
ops$tkyte%ORA10GR2> create index t_idx on t(x);
Index created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;
Execution Plan
----------------------------------------------------------
Plan hash value: 961378228
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (34)| 00:00:0
| 1 | SORT ORDER BY | | 1 | 26 | 3 (34)| 00:00:0
| 2 | TABLE ACCESS FULL| T | 1 | 26 | 2 (0)| 00:00:0
------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
<b>the index CANNOT be used in this case, X is nullable - it is not possible to use the index on X since any row with X is null would not be in that particular index</b>
ops$tkyte%ORA10GR2> alter table t modify x not null;
Table altered.
ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;
Execution Plan
----------------------------------------------------------
Plan hash value: 3778778741
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 827 (1
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 26 | 827 (1
| 2 | INDEX FULL SCAN | T_IDX | 1 | | 26 (0
------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
ops$tkyte%ORA10GR2> set autotrace off
<b>and now it can be</b>
directives can only be followed when they make sense. If I gave you a directive to levitate six feet off of the floor - you would not be able to do that
Basic how to do simple update in minutes rather then hours
Matthew, January 02, 2007 - 11:21 am UTC
Tom,
Earlier in this thread you demonstrated how you were able to update a million rows in just a few minutes on your laptop.
I am currently developing on a system where it takes over 2 hours to update a single column for 757474 rows, and I am beyond frustrated. I can do a CTAS for millions of rows with a complex sql join of 5 tables in a couple of minutes by modifying my session to use a large sort area and hash area size, but I cannot do a simple update statement (update t set colunmX = 'C') in under a 2 hours.
This is unacceptable as I am doing ETL and my load window needs to be within a 4 hour window. Having an update of a single table take 2 hours is a application killer. The db is at 10.2, but unfortuantely the DBA's are not.
My question is what low hanging fruit should I look for in terms of speeding this processing up, and how can I prove / debug future problems of this nature.
The following the explain plan and autotrace of the update statement (working on getting access to the trace files on the server):
idl@DEVAEETL> explain plan for
2 update idl.srvc_mbr_prmry_dim
3 set current_flag ='I';
Explained.
idl@DEVAEETL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 153652758
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 755K| 1475K| 24004 (4)| 00:04:49 |
| 1 | UPDATE | SRVC_MBR_PRMRY_DIM | | | | |
| 2 | TABLE ACCESS FULL| SRVC_MBR_PRMRY_DIM | 755K| 1475K| 24004 (4)| 00:04:49 |
-----------------------------------------------------------------------------------------
9 rows selected.
idl@DEVAEETL> set autotrace on
idl@DEVAEETL> update idl.srvc_mbr_prmry_dim
2 set current_flag ='I';
757474 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=24004 Card=755486 Bytes=1510972)
1 0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
2 1 TABLE ACCESS (FULL) OF 'SRVC_MBR_PRMRY_DIM' (TABLE) (Cost=24004 Card=755486
Bytes=1510972)
Statistics
----------------------------------------------------------
960 recursive calls
922410 db block gets
865720 consistent gets
133717 physical reads
253701772 redo size
447 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
757474 rows processed
January 04, 2007 - 9:55 am UTC
that current_flag is not indexed correct?
trace it - see what the major waits in fact are.
large update for a table
ozjbrown, January 04, 2007 - 5:46 pm UTC
Hi Tom,
I have a student_achievement fact table (over 4 million rows) and need to update a single column in the record for every record contained in this table. In order to determine the value updated, I have a temporary student_school table (approximately 2 million rows) that contains the value for that the column should be updated based upon matching criteria. The update to the fact table takes over 6 hours to complete.
The student_achievement fact table has the following columns -
student_achievement_key integer (pk)
school_year varchar2(8)
student_key integer
course_key integer
mark varchar2(5)
school_key integer
The student_school temporary table has the following columns -
school_year varchar2(8)
student_key integer
school_key integer
A relationship to the student_school to student_achievement is one-to-many. Not every student_school record has a record in the student_achievement fact table. And not every student_achievement fact record will have a record in the student_school table.
The logic used in my PL/SQL procedure to update all records in the student_achievement fact table is as follows -
cursor school_upd_curs is
select *
from student_school a
where exists (select 1 from student_achievement b
where a.student_key = b.student_key
and a.school_year = b.school_year)
order by student_key, school_year;
begin
open school_upd_curs;
loop
fetch school_upd_curs bulk collect
into col... limit 1000;
exit when school_year.count = 0;
for i in school_year.first .. school_year.last loop
if first record fetched for the student then
update student_achievement
set school_key = fetched school_key
where school_year <= fetched school_year;
elsif last record fetched for student then
update student_achievement
set school_key = fetched school_key
where school_year >= fetched school_year;
else
update student_achievement
set school_key = fetched school_key
where school_year = fetched school_year;
end if;
end loop;
commit;
end loop;
end;
Is there a way to speed the update?
Thanks.
January 05, 2007 - 9:43 am UTC
update the join of the tables.
update (select column_to_be_updated, value_to_update_to
from t1, t2
where (join) )
set column_to_be_updated = value_to_update_to;
trace for update that takes 2 hours
Matthew, January 09, 2007 - 12:54 pm UTC
Tom,
Sorry that this took so long to get to you. Had to wait for the DBA's to pull the trace file from the development system for me.
The column being updated is not indexed.
Here is the trace that you requested. Please let me know if you need more information.
update idl.srvc_mbr_prmry_dim
set current_flag ='C'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 152.68 6057.11 240133 978321 917824 757474
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 152.69 6057.12 240133 978321 917824 757474
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 68 (IDL)
Rows Row Source Operation
------- ---------------------------------------------------
0 UPDATE SRVC_MBR_PRMRY_DIM (cr=973072 pr=232705 pw=0 time=2145490920 us)
1547044 TABLE ACCESS FULL SRVC_MBR_PRMRY_DIM (cr=978249 pr=237879 pw=0 time=3465511263 us)
Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL) OF 'SRVC_MBR_PRMRY_DIM'
(TABLE)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19
log buffer space 6 0.98 2.94
log file switch completion 12 0.39 3.54
latch: shared pool 5 0.02 0.06
latch: cache buffers lru chain 3 0.00 0.00
latch: cache buffers chains 1 0.00 0.00
free buffer waits 10 0.01 0.15
buffer exterminate 57 0.98 56.17
latch free 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.03 0.03
********************************************************************************
Any help that you can give me would be appreciated as the DBA's and I are stumped.
January 11, 2007 - 9:38 am UTC
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return??
optimize update millions of lobs
Wolfgang, January 10, 2007 - 10:18 am UTC
hi tom,
every year i have to reorganize my picture table (500GB). generealy i do this like this way:
1. make everything offline
2. CREATE TABLE dbildtmp AS SELECT cid, bildtn, text FROM dbild ;
3.) ALTER TABLE DBILD SET UNUSED (text);
ALTER TABLE DBILD SET UNUSED (bildtn);
4. ALTER TABLE DBILD DROP UNUSED COLUMNS;
5. ALTER TABLE dbild ADD (bildtn BLOB) LOB (bildtn)
STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );
ALTER TABLE dbild ADD (text CLOB)
LOB (text) STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );
6. update the new created collumns:
DECLARE
CURSOR cur IS SELECT * FROM dbildlobs;
v_anz NUMBER := 0;
BEGIN
FOR rec IN cur LOOP
UPDATE DBILD
SET bildtn = rec.bildtn , text = rec.text
WHERE cid = rec.cid;
IF v_anz = 1000 THEN
COMMIT;
v_anz := 0;
END IF;
v_anz := v_anz +1;
END LOOP;
COMMIT;
END;
/
7. make everything online
please could you tell me how i can reduce the runtime from point 2 and 4 and particularly point 6 ?
i tryed to modify the procedure with bulk collect and forall. although it doesnt work faster ( 6 hours )
here is my modified procedure:
CREATE OR REPLACE PROCEDURE Update_Dbild2 IS
TYPE t_text IS TABLE OF DBILDLOBS.TEXT%TYPE INDEX BY BINARY_INTEGER;
TYPE t_bildtn IS TABLE OF DBILDLOBS.bildtn%TYPE INDEX BY BINARY_INTEGER;
TYPE t_cid IS TABLE OF DBILDLOBS.cid%TYPE INDEX BY BINARY_INTEGER;
l_text t_text;
l_bildtn t_bildtn;
l_cid t_cid;
CURSOR cr_rec IS SELECT cid,bildtn,TEXT FROM DBILDLOBS WHERE ROWNUM < 1000;
BEGIN
OPEN cr_rec;
FETCH cr_rec BULK COLLECT INTO l_cid,l_bildtn,l_text;
CLOSE cr_rec;
FORALL i IN l_cid.FIRST .. l_cid.LAST
UPDATE DBILD
SET TEXT = l_text(i), bildtn = l_bildtn(i)
WHERE cid = l_cid(i);
END;
/
Large Update on table
ozjbrown, January 10, 2007 - 10:28 pm UTC
Hi Tom,
I am lost as to what you are recommending I am changing in my procedure that uses the bulk fetch and loops through to update another table with the value.
Would it be possible to use my example as to where I need to make the change?
Thanks.
Performance Issue!! Commit Strategy
Prashant Pathak, January 11, 2007 - 9:41 am UTC
Hi Tom,
I have a table which contains following columns
ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0 518346
Alloy 102618 170 QA 1 0 518346
Alloy 102618 160 LAB 0 0 518346
Alloy 102618 150 TRP 1 0 518346
Alloy 102618 140 QA 1 0 518346
Alloy 102618 130 QA 1 0 518346
Alloy 102618 120 WP 5 0 518346
Alloy 102618 110 BA 2 0 518346
Alloy 102618 100 TRP 1 0 518346
Alloy 102618 90 QA 1 0 518346
Alloy 102618 80 QA 1 0 518346
Alloy 102618 70 QA 1 0 518346
Alloy 102618 60 MW2 3 0 518346
Alloy 102618 50 MW1 0 0 518346
Alloy 102618 40 M2M 0 0 518346
Alloy 102618 30 M2M 3 0 518346
Alloy 102618 20 PROC 1 0 518346
Alloy 102618 10 PROC 0 0 518346
Alloy 102963 9999 PROC 0 0 518346
Alloy 102963 90 CHP 1 0 518346
Alloy 102963 80 SG 5 0 518346
Alloy 102963 70 CHP 1 0 518346
Alloy 102963 60 PM 6 0 518346
Alloy 102963 50 PM 6 0 518346
Alloy 102963 40 TRP 1 0 518346
Alloy 102963 30 AIM 0 0 518346
Alloy 102963 20 AIM 0 0 518346
Alloy 102963 10 PROC 0 0 518346
Now i have to update the table for offset column.
starting with opseq 9999 and deduct days from previous row and updated offset column
so for opseq 9999 at first step it will be 0
ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0-5=-5 518346
Alloy 102618 170 QA 1 -5-1=-6 518346
Alloy 102618 160 LAB 0 -6-0=-6 518346
Alloy 102618 150 TRP 1 -6-1=-7 518346
.
.
.
so i have to do this operation based on grp_id and this table has about 50,000 rows.
So my question is sholud i commit the transaction for each grp_id and go forward or should finish for all and at the end commit.
becase currently i am doing all deduction and update and after that only one commit and which is taking long time.
Thanks in advance
Re: update taking over 2 hours
Matthew, January 11, 2007 - 2:05 pm UTC
"Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return?? "
I agreee that they do have an IO problem, the SAN is set as Raid 5. I do have a couple of follow up questions in regards to this and hope you can provide some insight:
On the Trace file execution plan it shows:
Rows Execution Plan
------- --------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL)
OF 'SRVC_MBR_PRMRY_DIM' (TABLE)
1. The actual update statement only updated 757474 rows. Why does the execution plan show over twice the number of rows then what were actually updated?
2. There where 2 db read wait events with similar number of event counts:
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19
for the single update statement of the table.
Could this be caused by the width of the table making the actual full table scan into the buffer cache making multiple passes over the disc. The first pass using a sequential read and the second or subsequent passes doing a scattered read to match the data in the buffer to the data on the disc?
3. Would it be worth testing the following scenerios to see if I can reduce db scatter read:
a. Creating the very wide table with a larger block size and own SGA buffer size?
b. Increasing the db_file_multiblock_read_count at the session level to higher then 32?
Appreciate your input and look forward to your next book.
plsql is faster then sql
A reader, January 22, 2007 - 2:18 pm UTC
Hi Tom.
whats wrong with my database.
it seems that pl/sql is faster then sql
SQL> create table test nologging as select * from dba_objects;
Table created.
Elapsed: 00:00:01.53
SQL>
SQL> create index test_pk on test(object_id);
Index created.
Elapsed: 00:00:00.41
SQL>
SQL>
SQL>
SQL> declare
2 l_icnt number := 0;
3 l_ucnt number := 0;
4 begin
5 for x in ( select * from test )
6 loop
7
8 update test set status = 'ok' where object_id= x.object_id;
9 l_ucnt:=l_ucnt+1;
10
11 end loop;
12 dbms_output.put_line ('rows updated :'||l_ucnt);
13 end;
14 /
rows updated :46087
PL/SQL procedure successfully completed.
Elapsed: 00:00:11.20
SQL>
SQL>
SQL> commit;
Commit complete.
Elapsed: 00:00:00.05
SQL>
SQL> update test set status='ok';
46087 rows updated.
Elapsed: 00:00:17.80
SQL>
SQL> commit;
A reader, January 23, 2007 - 2:27 pm UTC
Why do you think something is wrong with your data base? I tried your example in the same order and the SQL test ran faster than the PL/SQL test, but I would expect that because the SQL test has the benefit of the now pre-loaded buffers. Did you run the test in the order as presented? Was your database busy with other things at the time of the test, or was it dedicated to your test? Same question can be asked of the server and the disk system. There can be a lot of variables that affect results like this, especially for such a small, short test.
Updating 1M records through bulk update take long time
RK, May 11, 2007 - 9:56 am UTC
I have an update statement provided below which is updating 1 million records
in lead_master table joining with a customer_table having around 5 million records.
This is taking more than 2 hrs. I could not figure out the reason.There are no sessions running parallely.
UPDATE lead_master
SET customer_gender_cd = (
SELECT
DECODE (customer_gender_cd,
NULL, NULL,
'0' ||customer_gender_cd) FROM customer_table
WHERE customer_table.customer_id = lead_master.customer_id
)
WHERE lead_master.customer_id IS NOT NULL;
May 11, 2007 - 11:44 am UTC
because you may well be executing that subselect 1,000,000 times
you don't say if this is indexed
what release
what version
what plan
what anything really.
update ( select a.customer_gender_cd a_cd,
b.customer_gender_cd b_cd
from lead_master a, customer_table b
where a.customer_id = b.customer_id(+) )
set a_cd = nvl2( b_cd, '0'||b_cd, null );
outer join if and only if necessary of course.
Updating 1M records through bulk update take long time
RK, May 11, 2007 - 9:56 am UTC
I have an update statement provided below which is updating 1 million records
in lead_master table joining with a customer_table having around 5 million records.
This is taking more than 2 hrs. I could not figure out the reason.There are no sessions running parallely.
UPDATE lead_master
SET customer_gender_cd = (
SELECT
DECODE (customer_gender_cd,
NULL, NULL,
'0' ||customer_gender_cd) FROM customer_table
WHERE customer_table.customer_id = lead_master.customer_id
)
WHERE lead_master.customer_id IS NOT NULL;
RK, May 14, 2007 - 10:08 am UTC
Oracle Release is Oracle9i Release 9.2.0.4.0 - 64bit Production
The suggested modified SQL was much useful.
It got executed in 6 minutes.
Thanks a lot.
sammy, May 21, 2007 - 8:01 am UTC
Hi Tom,
I am trying to duplicate your May 11th followup.
I am getting the following error...ORA-01779: cannot modify a column which maps to a non key-preserved table.
UPDATE (select a.oldCol a_oldCol,
b.newCcol b_newCol
from table1 partition(part1) a,
table2 b
where a.id = b.id)
SET a_oldCol = b_newCol
b.id is indexed .
a.id is indexed (local).
I also want to use the nvl2 function you use¿can you please explain
nvl2( b_cd, '0'||b_cd, null )
as opposed to...
nvl2( b_cd, b_cd, null ) ??
Thanks in advance for you advice.
I am using Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
May 21, 2007 - 11:10 am UTC
give full example - you need a CONSTRAINT (primary key or unique) on table2(id)
Direct-path INSERT and TEMP Tablespace
A reader, June 06, 2007 - 12:23 pm UTC
Hi Tom,
I am using direct-path INSERT to copy the contents of source_table to destination_table.
When I enable PDML and the INSERT runs in parallel, I observe the use of sort_usage, temp tablespace filling up and, at the same time, nothing being written to the tablespace containing the destination table. Eventually, temp tablespace runs out of space.
However, when I don't enable PDML and the INSERT runs in serial, Oracle starts writing to the temp segments in the destination table's tablespace as soon as the source result set is available. In this scenario, it never uses any sort segments in temp tablespace.
Why is temp tablespace being used during PDML?
As you can see from the query, there are no sorts or joins -- it's just a straight select *. There are no indexes on the destination_table. I thought PDML would work similarly to serial DML (as far as space allocation goes). Shouldn't each parallel execution sever just pick a partition in destination_table start allocating extents in the destination_table's tablespace.
Here is what I run:
alter table destination_table nologging;
insert /*+ append */ into destination_table
select /*+ parallel(s 1) full(s) */ * from source_table@link s;
commit;
The same behavior can be observed when both tables are in the same database (and the select doesn't use the link).
Environment:
Oracle 9.2.0.6
source_table and destination_table are partitioned
source_table is ~400GB in size
temp tablespace is 40GB
there are no indexes on destination_table
Thank you in advance for your help.
June 06, 2007 - 9:22 pm UTC
full disclosure - show the plans, show the queries you use to monitor this, show everything we need to see what you see.
and you do know that the dblink will serialize right, so the parallel is extremely questionable - you have a parallel query getting serialized over the dblink to return the data.
Deleting 1-2 million of records from 300 millions records
Franklin, June 20, 2007 - 8:27 pm UTC
Hi Tom,
In one of my project I have a table which contains more than 400 million records, everyday one of my procedure will execute which will delete 1 million records and insert new 1 million records, some time it may no delete any single records, but insert fresh 1 million records, I have tryied no. of way but performance is vary bad, can u guide me.
My table : orders having columns
ord_id,
inv_id,
inv_date
col1,
col2,
...
my delete will be based on ord_id, inv_id, inv_date
delete from orders a
wHERE NOT EXISTS (SELECT 1 FROM ordertemp b
WHERE a.ord_id = b.ord_id
and a.inv_id = b.inv_id
and a.inv_date between b.sdate and b.edate)
My orders tables generated all records based on ordtemp which contains sdate and edate, now suppose this period has changed for any order then i have to delete this data and insert data with new period..
June 21, 2007 - 10:23 am UTC
what indexing strategies have YOU tried.
what is "performance is very bad" - what are you observing and what are your expectations.
have you considered MERGE instead of delete+insert (eg: update existing, insert new)
"insert into" using a cursor
ht, August 17, 2007 - 5:44 pm UTC
Hi Tom,
I took your advice and, instead of an update that ran for days, am using "create table as select * from".
I'm also using pl/sql with a cursor to insert additional rows into the new table and am wondering if there is a more efficient way to perform the insert. The pl/sql is trying to bring over the bare minimum of recs by first finding a "down" rec, getting only the next "up" rec and inserting it:
>declare
2
3 cursor c1 is
4 select
5 id,op,db
6 from
7 tbl1
8 where
9 status='down'
10 order by
11 id,
12 op,
13 db;
14
15 begin
16
17
18
19 for x in c1 loop
20
21 insert into tbl2
22 (
23 select
24 *
25 from
26 tbl1
27 where
28 id=
29 (
30 select min(id)
31 from
32 tbl1
33 where
34 id>x.id
35 and op=x.op
36 and status='up'
37 and db=x.db
38 )
39 );
40
41
42
43 end loop;
44
45 end;
46
47 /
August 22, 2007 - 9:48 am UTC
would need to know more, about the data.
if you are trying to get the next "up" record after each "down" record - next being assigned by ordering by ID for each op/db - there are better ways, sure
insert into tbl2
select *
from (
select tbl1.*,
lag(status) over (partition by op, db order by id) last_status
from tbl1
where status in ( 'up', 'down' )
)
where status = 'up' and last_status = 'down'
would get the "next up record after each down record by OP and DB" for example
Works great - should an index be created?
ht, August 22, 2007 - 3:16 pm UTC
Tom,
As always, you helped tremendously. My process is taking 3 minutes instead of 2 hours. Given I'm new to "lag" and "partition" statements, would adding an index speed up the process? If yes, which columns would you index on? I know full table scans are not a bad thing - just wondering if I can get tune the query further with indexes since I'll be running this 200+ times.
August 23, 2007 - 10:44 am UTC
unless the 'up' and 'down' records represent a tiny portion of ALL records - no index, we want a nice juicy full scan.
do they?
why are you running this 200 times?
once for each set of recs
ht, August 24, 2007 - 4:33 pm UTC
Tom,
Thanks for the advice. The fts is appropriate in this case and I have to perform the operation 200+ times since I was testing only a subset of the data. In reality, I just removed a predicate to operate on all the recs and it works great!
How to Load (6M+ Records x 300 Values ) between Databases
Saravanan Ramalingam, August 30, 2007 - 12:04 pm UTC
Hi Tom,
I have a requirement where I need to load around 6M+ records on daily basis from source database on which every record will hold a minimum of 300 different values. (Source DB will throw only 6 values and the others are permutation and combination of them which will be calculated on the fly). This will go up to 20 Million records per day in future.
The constraints are:
1. Currently I receive 300 different values to be stored (Per Record), which is likely to be increase to 1000 and max up to 25000 different values. This value is for only one record and the total number of records which I receive is around 6M per day.
2. As per the SLA the load window is one hour
3. I need to retain 10 years of data inside active database (10Yrs * 365 Days * 6M Records) to be stored and (10 Yrs * 365 Days * 6M Records * 300 Different Values) needs to be holded
Also I would like to know how to take care of this from the Performance, Storage and Hardware stand point for Oracle.
I have tried all theoretical solutions which are available on the net, they speak only about partitioning, compressing. But I have not found a solution to handle 300 columns (which can be extended up to 10000 in future).
Thanks in Advance,
Regards,
Saravanan Ramalingam
September 04, 2007 - 5:38 pm UTC
by 300 different values, do you mean "columns", "attributes"?
loading 20 million records will take minutes.
you will want to utilize partitioning to permit you to age the data out over time.
I see you meant 300 columns, which can theoretically go to 1000 but will never go to 25,000. The maximum number of attributes in a table is 1,000.
but - why is 300 columns "a sticking point", what are you looking for to address that? so what if you have 300 columns - why is that relevant to loading data?
is this practible ?
Karin, September 01, 2007 - 7:28 am UTC
Hallo Tom,
do you think, this is still the better way in our case ? the table contains 40 million sentences, has got a lot of foreign keys and we need to update a numeric field from n to n+1 for half of the sentences. And we have got a partitioned table and a partitioned index on that table.
Thanks for your respond, I already got a lot out of your answers.
Karin
September 05, 2007 - 12:42 pm UTC
I don't know what a sentence is in this context?
sorry, language problem ;-))
A reader, September 12, 2007 - 9:56 am UTC
sentence = row
September 15, 2007 - 4:41 pm UTC
go ahead and update it?
you wrote:
...
the table contains 40 million sentences,
has got a lot of foreign keys and we need to update a numeric field from n to n+1 for half of the
sentences. And we have got a partitioned table and a partitioned index on that table.
....
40 million rows - understood.
foreign keys - so what? You don't relate that to this column at all, so not sure what relevance it has.
partitioned - again, not sure of the relevance
indexes - you don't say that THIS column is indexed, so not seeing the relevance.
A delete scenario
A reader, September 19, 2007 - 2:11 pm UTC
Hi Tom,
I have a table containing 20 Million rows. The table has a composite index (3 columns). The application continually inserts into the table and also does delete from the table every 1 hour through independent delete statements(1 per row). The delete operation is pretty slow and we see a lot of waits on "db file sequencial access" during the delete operation. The delete is based on the indexed column values.
Please suggest a solution which will help in having better performance.
September 20, 2007 - 9:30 pm UTC
define pretty slow, got a tkprof?
the delete must update ever index, not just the one it reads, the single block IO's are likely due to that.
slow by slow processing should be avoided, it is always the slowest way to process data. think "sets"
ENABLE PARALLEL DML" and TEMP tablespace
Florin, September 20, 2007 - 12:05 pm UTC
Hi Tom,
Would you please explain me why the "ALTER SESSION ENABLE PARALLEL DML" is consuming TEMP tablespace?
If I'm not using "alter the session enable parallel dml" Oracle will use anyway the paralellism from the hint (4)
Example:
--ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL ( BL1_CUSTOMER_INFO_TEST 4) APPEND */
INTO BL1_CUSTOMER_INFO_TEST
SELECT /*+ PARALLEL ( BL1_CUSTOMER_INFO 4) FULL (BL1_CUSTOMER_INFO ) */
*
FROM BL1_CUSTOMER_INFO;
In this case the table BL1_CUSTOMER_INFO has more than 200 million rows and over 50 GB.
If I'm leaving the "alter session enable parallel dml" it will fail on TEMP tablespace, while if I'm disabling the "alter session .." it will go direct to the tablespace of the BL1_CUSTOMER_INFO_TEST.
Many thanks in advance!
September 24, 2007 - 7:32 am UTC
are you sure it is failing on TEMP or is it failing to acquire a temporary extent.
big difference. Let us see the precise error message please.
and what indexes have you on this table.
ENABLE PARALLEL DML" and TEMP tablespace
Florin, September 26, 2007 - 9:25 am UTC
Hi Tom,
I did another test (this time in our test database not like previous one in Prod).
I have a table (TEST1) with 25 million rows (1.7 GB).
AIM_DBA>create table test2 nologging tablespace pool_ix as select * from test1 where 1=2;
Table created.
AIM_DBA>alter session enable parallel dml;
Session altered.
AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;
ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
If I'm doing the same thing without "alter session enable parallel dml" :
AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;
25192932 rows created.
Elapsed: 00:03:31.74
Please note:
No indexes in both tables.
I'm using Oracle 9.2.0.5.0 - 64 bit
I'm very curious to hear your opinion.
Thanks!
Florin
September 26, 2007 - 9:39 pm UTC
i've bookmarked and will look at this when I get back (and have a server whereby I can do parallel on)
Update Query
Pete, October 05, 2007 - 9:09 pm UTC
Hi Tom,
We have a query written like mentioned below
update table a set column B = some value where id = :id;
commit;
here column ID is a primary key
This query is written inside a package and is called each time when a user logs into a system. At one time the system ran slow when about 100 persons tried to access the system and the DBA at the client side says this is the wrong way to write the query. It should be written like this (see below);
---- below is DBA reply --
"Like about a 100 people,so this table gets updated frequently so there is a need to avoid locking the index when caching.Your related rowid and using that to bypass any index locking ,access is so important"
he suggest the following way :-
Set transaction read only;
Select rowid into dbky from table a where id = :id;
(best done in a package w. package variables)
Commit;
Return Success;
Then when you need to update
Update table a SET column b = some value where rowid = :dbky; <- no locking index
Commit;
-===========================
My questions is which is the better way ? Rowid or using the primary key as written earlier and why ?
Thanks and appreciate your feedback
Upsert
Antony, October 10, 2007 - 10:16 am UTC
I do have some issues with the merge. Currently in my database there are different schemas where the datas are manipulated with differenet applications. There is one schema which called USAGE is used to maintain the login details etc from differnet application. Currently there is a package for manipulating which is used to fetch the values from different tables in diferenet schema. The procedure basically first deletes the value from tables in USAGE corresponding to the values in source table deleted and updated then it tries to insert the data. The most problem is that during insert and delete statment it takes a hell lot of redo log files due to 1- 32 millions of data in the source table. I reviewed so many comments in u r site and found there are 2 solutions either use Merge statment or drop the table and create a table with nologging then insert the statement that won't be increasing the redolog files. Could you please suggest to which i should proceed
When its done?
Karthick, October 26, 2007 - 3:20 am UTC
After reading this thread I figured out one thing. BIG TABLE makes people crazy.
But my question is; when you decide you are done with it for now?
Say you got an update which took 1 hour. Ok you are clever and you made it to run in 15 min. But is 15 min is the optimal time for now? How would you decide that? Because we cant keep on testing and trying. Things need to go to production right!!
October 29, 2007 - 10:33 am UTC
flip the question around.
what are the requirements for it to complete. how fast does it NEED to execute.
If you always move half way to a wall (eg: you start at 10 feet away and get to 5 feet away. Then you get to 5/2 feet away, then 5/2/2 feet and so on) you never actually get to the wall (sure you get very very close...)
you have to know when you can stop.
Can you show how you do that.
karthick pattabiraman, November 05, 2007 - 7:21 am UTC
you said
"it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be
something more involved here.
I'd be able to do that in minutes -- why is it taking you so long -- what is your bottleneck. "
with source table having 1000000 records and target table having 5000000 records where all the records in target table are updated by source.
Can you show how you do it in couple of min. I think it will definitly take lot of time.
Just want to see how you create your index and how you perform your update and so on..
I just want to see that being done with UPDATE statement not CREATE TABLE... SELECT * FROM..
Because i cant drop a table :-(
November 06, 2007 - 9:04 am UTC
"i think it will definitely take lot of time"
why do you "think" that, why don't you "try that"?
what index would I create? If I am going to update 5 million out of 5 million rows, I am not going to use an index.
update t set col = 'value';
that is how the update would go? Nothing special.
I tried
karthick pattabiraman, November 06, 2007 - 9:19 am UTC
Yes I did try. My source table has 1 million records and target has 2 million records.
So my update was like this simple
update target_table
set target_col_1 = (select source_col_1
from source_table
where source_col_2 = target_col_2)
where exists ( select null
from source_table
where source_col_2 = target_col_2)
On seeing the explain plan I was able to identify that
Target table was going for a full table scan and source_table was going for an index range scan.
So I "think" the update will be executed in this way.
for i in (select * from target_table) --> full table scan
loop
select source_col_1
from source_table
where source_col_2 = i.taget_col_2 --> index range scan
replace i.target_col_i with source_col_1
end loop
I don¿t have any index on target_table as you said any way its going to be a full table scan. I have an index on source_col_2.
So when I did the update I took more than 1.5 hour (I don¿t know the exact time as I killed the session) so I enabled parallel dml and ran the same and it completed in 20 min.
Would you suggest parallel dml for such updates? Or any other things can be done before going parallel.
And after that I changed the source from 1 million to .1 million and ran the update in parallel again. Again it took around 18 min.
So is it because source_table is going for index range scan it doesn¿t really matter how many rows are there in source_table.
November 06, 2007 - 9:39 am UTC
update (select data_from_src, data_from_tgt from src, tgt where src.key = tgt.key)
set data_from_tgt = data_from_src;
update a join, don't do all of those correlated subqueries.
no indexes should be involved although src needs to have a unique constraint on the joined to columns.
Thanks a lot
karthick pattabiraman, November 07, 2007 - 2:17 am UTC
What to say as the world says you "ROCK"..
my 20 min query that was running parallel just completed in 3 min without parallel when i changed the update the way you suggested.
can you explain us how oracle internally processes this update.
Once again thank you so much for all your help.
November 07, 2007 - 5:13 pm UTC
it just joins.
Stuart Horsman, November 08, 2007 - 6:24 am UTC
Tom,
If we're doing an application release and updating multiple tables and multiple indexes with millions of rows, is it best to:
1) Put the database into no archivelog mode.
2) Alter all tables and indexes nologging.
3) Run inserts and updates.
4) Alter all table and indexes logging.
5) Put database back into archive log mode
Regards
Stuart
November 08, 2007 - 10:15 am UTC
nologging would have no effect.
nologging works for large direct path operations such as:
o create table as select
o create index
o insert /*+ append */ (when the append hint is actually followed, can be followed) - but only for the TABLE, not for any indexes.
nologging will not affect
o insert, update, delete, merge (without append on insert or merge....)
o index maintenance
and in noarchivelog mode, nologging is not necessary, many direct path operations will not generate log in general (create table as select, insert /*+ APPEND */).
If you are not updating an indexed column, just update - try to make sure you ONLY update a table once (eg: do not update col1, then col2 then col3 - just make a single pass).
If you are updating an indexed column - consider making the index unusable and rebuild (with or without logging, up to you).
If you are updating such that you think there might be lots of migrated rows (rows grow in size), you might
o create table as select <modified data>
o drop old table
o rename new table
o re-index new table
compound key
Karthick., November 21, 2007 - 5:24 am UTC
When the column in the source is not unique how to do the same update.
say source COL1+COL2 is unique. i join both column to target table and update target with COL2. But COL2 alone is not unique. And because i have used both COL1 and COL2 in the join condition it ensures to return only one value. But still i cant do join update. In such case what kind of update is better..
is it..
update t set t_c1 = (select s_c1 from s ...)
where exists(select null from s...)
Why 'created' instead of 'Inserted'
chandra, December 11, 2007 - 3:25 pm UTC
Tom,
Please the below..
====
SQL> insert into tmp_is_oracle_supplier_match
2 (
3 otis_vendor_num,
4 otis_vendor_name,
5 otis_tax_id,
6 status_code,
7 comments
8 )
9 select attribute2,
10 vendor_name,
11 num_1099,
12 0,
13 'Valid Record'
14 from
15 ap_suppliers_int
16 where status is null
17 and REJECT_CODE is null
18 and 1=2;
0 rows created.
===
When I try to insert, with the above command, why it says "Created" instead of "Inserted" ??
December 11, 2007 - 9:41 pm UTC
because sqlplus likes to say "created" when you use insert.
Locking conflict
Feng, December 12, 2007 - 8:28 pm UTC
Hi Tom,
Here is my case which is on 9.2.0.8. On daily basis, we need to batch update few million rows in a 30-million row table (t1) using something like:
update
( select t1.a, t2.b
from t1,t2
where t1.fk=t2.pk
)
set a=b;
t2 is a table with tens of millions rows as well. Usually this takes few hours. We also have small amount real time updates against a single record in t1 from web user. If real time user update hit the same record being updated by batch update process, the real time user update process will wait and eventually time out if batch update still not finish. In order to avoid/minimize such locking conflict, we are thinking of reducing the size of batch update using PL/SQL. Is there other way to solve this problem?
Thanks in advance.
December 13, 2007 - 9:40 am UTC
don't do the update.
Why no join to retrieve the correct value
Locking Conflict
feng, December 13, 2007 - 12:52 pm UTC
Tom,
Due to our business model, t2 is a temp table which contains periodic update from our vendor that will be used to update our permanent table t1, thus we can not join to retrieve value as you suggested.
Any alternative in this case besides reducing size of update statement either in SQL or PL/SQL?
Thanks.
December 13, 2007 - 2:24 pm UTC
well, before I even begin to describe how to do this I want you to say yes or no in no uncertain terms to the following question:
What happens if another process reads a record that was updated and committed, and another record that was not yet updated (or was updated but has not yet been committed) during your "batch update". Will the fact this other process will be seeing data that heretofore was IMPOSSIBLE to see (they would never see two records that are inconsistent with respect to each other) be ok. The fact that data integrity is going to be compromised here - is that something your applications can actually live with, deal with, not have a problem with.
Because that is what you are suggesting to do - to have PART of your update be seen by every one before the entire update itself is complete.
also, we need a flag field or the ability to delete the record in t2 as we process it - so we don't process the same record more than once in the event (which will happen) of a failure. Do we have such a field that we can update in t2 as we process it - or can we delete the record from t2.
Locking Conflict
feng, December 13, 2007 - 4:58 pm UTC
Tom,
Thanks for your prompt follow up.
1) Data integrity is not a issue here as records are independent from each other so they don't have to be updated at same time.
2) We don't delete records in t2, but we can add a flag field to it if it's needed.
Please let me know if you need additional information. Thanks.
December 14, 2007 - 12:51 pm UTC
ops$tkyte%ORA10GR2> create table t1 as select * from all_users;
Table created.
ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(user_id);
Table altered.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2 as select user_id, lower(username) username, 'N' processed from all_users where mod(user_id,2) = 0;
Table created.
ops$tkyte%ORA10GR2> create index t2_idx on t2(processed);
Index created.
ops$tkyte%ORA10GR2> select * from t1;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE 58 14-DEC-05
DIP 19 30-JUN-05
TSMSYS 21 30-JUN-05
LOTTOUSER 65 30-DEC-05
MDDATA 50 30-JUN-05
OPS$ORA10GR2 56 14-DEC-05
FOO$TKYTE 60 19-DEC-05
QUOTA 94 22-FEB-06
AQ 228 15-OCT-07
R 76 09-JAN-06
OPS$TKYTE 233 16-NOV-07
SCOTT 84 12-FEB-06
B 213 08-JAN-07
RT_TEST 231 29-OCT-07
A 221 04-SEP-07
TEST 237 16-NOV-07
DMSYS 35 30-JUN-05
DBSNMP 24 30-JUN-05
WMSYS 25 30-JUN-05
EXFSYS 34 30-JUN-05
CTXSYS 36 30-JUN-05
XDB 38 30-JUN-05
ANONYMOUS 39 30-JUN-05
OLAPSYS 47 30-JUN-05
ORDSYS 43 30-JUN-05
ORDPLUGINS 44 30-JUN-05
SI_INFORMTN_SCHEMA 45 30-JUN-05
MDSYS 46 30-JUN-05
SYSMAN 51 30-JUN-05
PERFSTAT 148 31-MAR-06
SYS 0 30-JUN-05
SYSTEM 5 30-JUN-05
OUTLN 11 30-JUN-05
MGMT_VIEW 53 30-JUN-05
MY_USER 211 09-NOV-06
35 rows selected.
ops$tkyte%ORA10GR2> select * from t2;
USER_ID USERNAME P
---------- ------------------------------ -
0 sys N
24 dbsnmp N
34 exfsys N
36 ctxsys N
38 xdb N
44 ordplugins N
46 mdsys N
50 mddata N
56 ops$ora10gr2 N
58 big_table N
60 foo$tkyte N
94 quota N
228 aq N
76 r N
84 scott N
148 perfstat N
16 rows selected.
ops$tkyte%ORA10GR2> declare
2 type numArray is table of t2.user_id%type;
3 type vcArray is table of t2.username%type;
4 l_userids numArray;
5 l_usernames vcArray;
6 begin
7 loop
8 update t2
9 set processed = NULL
10 where processed = 'N'
11 and rownum <= 10
12 returning user_id, username
13 bulk collect into l_userids, l_usernames;
14
15 exit when l_userids.count = 0;
16
17 forall i in 1..l_userids.count
18 update t1 set username = l_usernames(i) where user_id = l_userids(i);
19 commit;
20 exit when l_userids.count < 10;
21 end loop;
22 end;
23 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> select * from t1;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
big_table 58 14-DEC-05
DIP 19 30-JUN-05
TSMSYS 21 30-JUN-05
LOTTOUSER 65 30-DEC-05
mddata 50 30-JUN-05
ops$ora10gr2 56 14-DEC-05
foo$tkyte 60 19-DEC-05
quota 94 22-FEB-06
aq 228 15-OCT-07
r 76 09-JAN-06
OPS$TKYTE 233 16-NOV-07
scott 84 12-FEB-06
B 213 08-JAN-07
RT_TEST 231 29-OCT-07
A 221 04-SEP-07
TEST 237 16-NOV-07
DMSYS 35 30-JUN-05
dbsnmp 24 30-JUN-05
WMSYS 25 30-JUN-05
exfsys 34 30-JUN-05
ctxsys 36 30-JUN-05
xdb 38 30-JUN-05
ANONYMOUS 39 30-JUN-05
OLAPSYS 47 30-JUN-05
ORDSYS 43 30-JUN-05
ordplugins 44 30-JUN-05
SI_INFORMTN_SCHEMA 45 30-JUN-05
mdsys 46 30-JUN-05
SYSMAN 51 30-JUN-05
perfstat 148 31-MAR-06
sys 0 30-JUN-05
SYSTEM 5 30-JUN-05
OUTLN 11 30-JUN-05
MGMT_VIEW 53 30-JUN-05
MY_USER 211 09-NOV-06
35 rows selected.
ops$tkyte%ORA10GR2> select * from t2;
USER_ID USERNAME P
---------- ------------------------------ -
0 sys
24 dbsnmp
34 exfsys
36 ctxsys
38 xdb
44 ordplugins
46 mdsys
50 mddata
56 ops$ora10gr2
58 big_table
60 foo$tkyte
94 quota
228 aq
76 r
84 scott
148 perfstat
16 rows selected.
expect this update to take longer than the other.
10 is what I used, you would use 100 to 500.
it'll generate more redo, more undo, consume more cpu, take long - but will do it in batches.
and it will not suffer from 1555's as it does not keep a cursor open across commits.
and it is restartable when it fails, just rerun it and it will do it's job.
You will want an index on that column, even with two values.
why did I use NULL instead of "Y" - because entire null entries are not made in b*tree indexes - hence this index will stay small, it only indexes the "N" records.
Thanks
Feng, December 14, 2007 - 2:48 pm UTC
Tom,
Thanks for the solution. We will try this out.
Performance question
Feng, January 15, 2008 - 4:26 pm UTC
Hi Tom,
During testing of your bulk select/update solution, I found out that it has almost same execution time as following PL/SQL.
begin
for x in ( select * from t2 )
loop
update t1 set a=x.b where t1.fk=x.pk;
commit;
end loop;
end;
/
Since this method only lock/commit one row at a time, would this be a better solution than the bulk select and update of 500 rows at a time given that both have similar response time?
What do you think?
Feng
January 15, 2008 - 4:57 pm UTC
*no*
you have a transaction - a logical unit of work.
You commit IF and ONLY IF and when your transaction is complete.
what if this updates N rows and then fails? what then? where is your code to restart in the middle?
the goal - especially in Oracle - is not to reduce the number of locks at the expense of PROGRAM LOGIC AND CORRECTNESS (we are not sqlserver!).
Performance question
Feng, January 15, 2008 - 6:49 pm UTC
In our application, if we fail after updating N rows, we can start over by updating everything again. This is not as efficient as starting from what was left from last run, but it will reduce chance of lock conflict further since the chance of failure is pretty low.
January 16, 2008 - 3:13 pm UTC
do you understand the concepts of transactional consistency?
if you are worried about "locks", that means other people must be updating this information.
So, you come and update a row, commit, they update the same information, commit, you fail and you come back and just blind as a bat overwrite their updates again???
You understand that the outside world will see a BIT of your update before it is totally complete?
Do you understand the natural ramifications of what you propose and are you sure your applications are a) expecting that b) can deal with that c) won't totally corrupt your database (logically speaking)
If you actually have what you call "lock conflict", I think the answers to the above are "we really need to commit when we are DONE, because apparently other people are trying to modify this data and we'll introduce lost update problems on a scale the size of the grand canyon"
You are right!
Feng, January 16, 2008 - 5:01 pm UTC
delete thousands records
Dawar, January 17, 2008 - 2:49 pm UTC
I have a excel spread sheet base on one our internal table.
It contains 7000 records.
I need to delete all records from Oracle Database.
What is the quickies way to delete all records?
Thanks
D
January 19, 2008 - 10:24 pm UTC
7,000 records is
a) big for excel
b) trivial, tiny, almost non-existent for Oracle.
I would just use DELETE FROM T;
Truncate would not be much faster, if it were faster even.
7,000 is such a small number.
BULK UPDATES
SRINIVAS CHALLA, January 17, 2008 - 7:57 pm UTC
For bulk updates/deletes, Please also consider Merge Statmement. With the usage of Merge statement, one can overcome the temp table creation or performance issues of Pl/SQL BULK update statement.
MERGE INTO ok a
USING (
SELECT ID
FROM ok1
) b
ON (a.id = b.id)
WHEN MATCHED THEN
update set a.flag = 'Y'
DELETE WHERE a.FLAG='Y';
desc ok;
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER
FLAG VARCHAR2(1)
SQL> desc ok1;
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER
FLAG VARCHAR2(1)
January 19, 2008 - 10:31 pm UTC
why would you even consider merge for that?
delete from ok where id in (select id from ok1);
that is all you need, why would you even consider complicating things so horribly in that case with merge?
Temp Tablespace Error
Nikhilesh, April 06, 2008 - 11:54 pm UTC
Dear Tom,
When we create a table using CTAS (Create table as select) (query retunrs 20 thousand rows) it works fine for first time but then it eats up all temp tablespace and gives an error saying that it could not allot an extent. We are using Oracle 10.1.0.3, we have true temporary tablespace of 9.6 GB. extent size is 1028.
I'm the only person connected to Oracle and yes after running successfully i killed 1 session which was running the same query using dynamic sql.
Am I doing something wrong or Is it a bug in 10.1.0.3?
I don't have metalink access.
Thanks in advance,
April 07, 2008 - 9:29 am UTC
no, it is not a bug, it is however insufficiently described to say what it is.
It seems like you are saying:
we do this:
a) create table t as select ......
b) drop table t;
c) create table t as select .... <<<= and this one fails.
are you sure it is blowing up on TEMP - or are you doing something like:
a) create table T1 as select ..... <<== works
b) create table T2 as select ... <<== fails on temp
If that is the case, it could be that what is "blowing up" is not temp at all - but the tablespace where t2 is going into - when we do a CTAS - we load the new data into temporary extents in the target tablespace AND THEN at the end, convert them into permanent extents.
Many times, people say "ran out of temp space", when they really ran out of space in the target tablespace (so, check the error - maybe even consider posting a CUT AND PASTE of it)....
Or, if it is not T2's tablespace that is blowing up, but there are two different tables - I'd say "table T1 needed little to no temp and T2's query needs more temp than you have allocated"
Delete the rows that contain BLOB object as one of the column
Ramakrishna, May 13, 2008 - 11:01 am UTC
I am trying to delete more than 100,000 records. It is faster in the tables that does not contain BLOB object as one of the column.I tried it through simple SQL plus
like 'delete from table_name where indexed_column<10000'. It is hanging there for a long time even for two days , the query is not done. How to effeciently delete this and what is the query I cna use to delete it quickly.
May 13, 2008 - 11:18 am UTC
two days? really? Not that I don't believe you - but I don't believe you - you must have been blocked or otherwise locked out.
although the statement "It is faster in the tables
that does not contain BLOB object as one of the column." should be rather obvious? If you do something that takes less work - it'll be faster - of course.
Did you monitor this process from another session - looking at the plethora of V$ tables out there that tell you what is going on in the database at all?
Set based update.
john hardy, May 14, 2008 - 3:25 pm UTC
What exactly is a set based update. I have been hearing it from my colleagues. Is it better than doing an update by rowid.( I know this method is not to your liking,..but it works for me. 100 million updates on a highly indexed &triggered table in 10 hours!)
May 16, 2008 - 12:14 pm UTC
if it took me 10 hours to update 100,000,000 rows, I'd be upset. We can do better.
I'd have been looking at perhaps getting rid of indexes, looking at that trigger to see if I really needed it to fire (disabling if not) and using a single update - perhaps in parallel - to perform the operation QUICKLY - in a single sql statement.
That is a set based update, slow by slow processing written by procedural programmers = bad, set based operations - packing as much work into a single sql statement = good.
In fact, if I were updating 100,000,000 out of 100,000,000 (or nearly so), I might choose to create table as "select rows as they should be", drop old table, rename-index and so on - or use dbms_redefinition to do it online.
update
A reader, May 29, 2008 - 8:25 pm UTC
delete performance
Resl, June 03, 2008 - 4:35 am UTC
Hi Tom,
I have crm_cust_info table with 301,000 records.
the select count(*) from crm_cust_info where cust_type='CONT' needs less than 1 second to return teh value (there is an index on cust_type).
while the delete statement
delete from crm_cust_info where cust_no='3500' (for example) needs more than 10 seconds to delete the required record knowing than cust_no is a unique an indexed column.
i rebuild indexes also try the same delete with different optimizer mode after analyzing the mentioned table and its indexes but without any performance change.
Please note that i included the delete statment in a cursor in order to delete about 300,000 records.
June 03, 2008 - 11:33 am UTC
you do realize that comparing the performance of a count(*) where indexed_column=value and a delete where indexed_column = value
is about as useful as comparing the performance of your bike with an airplane. True, your bike and airplane are both modes of traveling (as select and delete are both sql statements) but that is about it.
the select statement - what does it have to do... It has to read a couple of index blocks and count stuff (that it takes almost a second means there is LOTS of records to count, I would be hoping for 0.02 seconds maybe on something like that - for it to be near a second is huge)
the delete statement - what does it have to do....
a) read the index blocks
b) navigate to the table blocks, get the block in consistent read mode
c) get the block in current mode for each row it wants to delete
d) delete it - which requires navigating all of the OTHER indexes on this table to find the entries that point to this row and remove them)
e) generate redo for all of the above
f) generate undo for all of the above
g) generate redo for the undo
h) ultimately delete the index entry that found us the row in the first place
i) do a-h for the remaining 299,999 records
and you did it the slowest way possible!!!! slow by slow via a cursor, so you are making it take EVEN LONGER (if you did it in a single sql statement, you would find it to be faster in all probability)
Update statement taking more hours --
Sipra, June 09, 2008 - 3:00 am UTC
Hi Tom,
Have been observing the great support from last six months. Many thanks for the useful info -
I am having a SQL statement whic needs to be update 4 million of rows. I couldnot figure where my query is getting strucked...
Please have the following Code -
UPDATE T1
set TYPE_ID = 141
where TYPE_ID in (
SELECT TYPE_ID
FROM T1, VIEW, T2@dblink T2
WHERE T1.ID = VW.ID
AND VW.NUM = T2.NUM
AND T2.NAME IS NOT NULL
AND T2.REV = 'Y'
AND T2.NUM = '&v_num'
AND TYPE_ID = 1);
Scenario -
Updating the type_id in table T1 with 141 if the NAME of T2 is Not null and T2.REV is 'Y'
At ta time this will update 4-5 million reocrds.
I ahve executed this thru a procedure which has given the following error. Please help me.
regards,
Sipra
www.dwforum.net
Updating millions of records-LOB column
Praveen, June 16, 2008 - 10:21 am UTC
Hi Tom,
This is w.r.t. the original question and the answer given by you. There are 1 million records in a table. And if I need to upload an image or pdf file (say of size 1MB each) into a LOB column of each record (which means there are 1 million LOB files, each of size 1MB), what would be the best approach?
I need to commit after every 1000 records. Also it will be difficult to follow the steps you suggested while dealing with LOBs.
Thanks
June 16, 2008 - 1:32 pm UTC
follow which steps? This is a HUGE page here.
so, you have 1,000,000 1mb files.
I'd be looking at perhaps, probably using dbms_lob.loadfromfile, called in parallel in as many sessions as you want to have going - each session working on a different set of files.
Re:Updating millions of records-LOB column
Praveen, June 17, 2008 - 1:08 am UTC
The steps for updating 1 million records you had suggested for the original question in this thread. Anyway, I am going to test with the solution you suggested above, for uploading 1 million 1mb files.
Thank you
Update of 2 columns...
Richard, June 25, 2008 - 11:55 am UTC
Hi Tom,
Thanks for the website.. I'm an avid reader. In reference to this thread, "How to Update millions or records in a table," recently, we've run into an issue where our tables are growing very quickly and our updates aren't able to keep up. Every day, we have to perform an update like the below example. When we had 20-30m records it would take a couple hours and wouldn't be so bad. But now, that we have over 50m consistently, and sometimes closer to 100m rows, our updates haven't scaled properly. I use the example, our method of updating is like driving a Geo Metro to the grocery store and back.. if the performance is bad you don't really notice because the ride is so short. But, when you have to drive this same Geo Metro across the state and back ,thats when you notice the performance issues :-)
We have to update our customers' location based on their IP address. We have the IP Geo data and know the starting decimal IP address and the ending decimal IP address.
For example, let's say a row in big_table has SRCIP = 192.168.1.100 & DSTIP = 172.1.1.1
In the lookup_table, the following is true:
* The country_code = 'LOCAL' for the dotted decimal IP range of 192.168.0.0 - 192.168.255.255 (decimal IP 3232235520 - 3232301055).
* The country_code = 'REMOTE' for the dotted decimal IP range of 172.0.0.0 - 172.255.255.255 (decimal IP 2885681152 - 2902458367).
update big_Table
set src_cc = (select country_code from lookup_table where srcip between 3232235520 and 3232301055)
, dst_cc = (select country_code from lookup_table where dstip between 2885681152 and 2902458367)
where src_cc is null
or dst_cc is null;
So the Line would eventually look like
SRCIP: 192.168.1.100
DSTIP: 172.1.1.1
SRC_CC: LOCAL
DST_CC: REMOTE
I tried to incorporate BULK COLLECT and FORALL into the PL/SQL below, but I'm not really sure how to do so. Maybe my logic is backwards, but I'm thinking we need to collect all of the lookup_table data (~135k rows), loop through them, and then update the big_table columns. This doesn't seem like it should be too bad.. 135k iterations through to update ~60m records. Obviously some loops will update 10k records, some 30k and some only a few hundred. I've looked at various examples of CTAS, MERGE, /+ Append hints, etc. But when it comes to updating two columns that rely on different data, I can't seem to wrap my brain around it.
I have indexes on SRCIP & DSTIP in the big_table, and lookup_table is an IOT (ip_from & ip_to columns). Reading your comments in this threat, this probably doesn't matter since my current method requires a full table scan of lookup_table anyway... and it's only ~135k records. We're running 10.2.0.2 (with plans to upgrade to 10.2.0.4 ASAP) on Solaris 9. Big_table is in nologging mode. We don't keep this data for very long but it's important for us to make these updates.
Thanks in advance for any support you can offer!
SQL> select count(*) from big_table;
COUNT(*)
----------
59,895,942
SQL> select count(*) from lookup_table;
COUNT(*)
----------
135,237
SQL> desc lookup_table
Name Null? Type
----------------------------------------- -------- ----------------------------
IP_FROM NOT NULL NUMBER(10)
IP_TO NOT NULL NUMBER(10)
COUNTRY_CODE VARCHAR2(2)
SQL> desc big_table
Name Null? Type
----------------------------------------- -------- ----------------------------
QRY CHAR(3)
STARTTIME DATE
MSECS RAW(3)
RANK NUMBER
SRCIP NUMBER
DSTIP NUMBER
SRC_CC CHAR(2)
DST_CC CHAR(2)
ASSIGNEDHOSTID RAW(4)
HOSTNAME NOT NULL VARCHAR2(20)
Original update statement....
update big_table
set src_cc = (select country_code from lookup_table where srcip between ip_from and ip_to)
, dst_cc = (select country_code from lookup_table where dstip between ip_from and ip_to)
/
Obviously, taking forever.... lots of UNDO space required.
Second attempt.... not much better results. Less UNDO used, but more use of DBWn:
begin
FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update /*+ PARALLEL(big_table,10) */ big_table
set src_cc = (loop_counter.country_code)
where srcip between x.ip_from and x.ip_to
and src_cc=' ';
commit;
END LOOP;
FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update /*+ PARALLEL(big_table,10) */ big_table
set dst_cc = (x.country_code)
where dstip between x.ip_from and x.ip_to
and dst_cc=' ';
commit;
END LOOP;
end;
/
June 25, 2008 - 12:08 pm UTC
what are the odds of filling out the IP table (that should be really easy to do)
then we could update using no code easily
Filling out IP table...
Richard, June 25, 2008 - 3:47 pm UTC
Thanks for the prompt response. We've considered that option and it may be our only choice.
Two drawbacks that I can identify immediately:
* This IP data is updated ~monthly, or should be updated ~monthly. As the IP spaces change, we will have to rebuild this table... but, as I know you'll say, rebuilding that table once/month is much better than using 2 between statements on a daily basis :-)
* The decimal IP range is 33996344 - 4278190079.
If I did my math correctly:
SQL> select 4278190079-33996344 from dual;
4278190079-33996344
-------------------
4244193735
SQL> select to_char(4244193735,'999,999,999,999') from dual;
TO_CHAR(4244193735,'999,999,99')
-------------------------------
4,244,193,735
I've considered distinct IP decimal addresses as well. Out of ~50m rows, per day, we have approximately 3m unique srcip or dstip values.
For arguments sake, if we couldn't fill out an IP table, any suggestions on a course of action? This problem has become those song lyrics that I can't get out of my head until I know the rest of the verse. I think you're absolutely right that filling out an IP table might be the best method, but I'm curious what I could do with SQL or PL/SQL to attack the problem.
Thanks again.
June 25, 2008 - 3:55 pm UTC
Actually, doh, now that I think about it, why not just maintain a table of your 3milion and just JOIN
you don't need the country code in the base table, just join - so, don't fill out the ip table, maintain a lookup nightly that has just the 3million ip's you need and monthly wise update that after you get a refresh. Every night you would just have to insert into this lookup table any ip's that aren't there yet, populate them with values.
Even if you "must" have the IP's in the base table (and I would disagree, just join, it is ok to join), it would be more efficient to maintain this 3million row table nightly (adding new rows) and only having to lookup the new rows.
union & distinct columns...
Rich, June 25, 2008 - 8:36 pm UTC
Thanks Tom.
I am thinking this might be a good use of CTAS with /*+ APPEND */ hint, parallel & nologging, no?
CTAS select distinct srcip as ip from big_table UNION select distinct dstip as ip from big_table... no logging, /*+ append */, parallel, etc.
Then, i'll just keep inserting into that table until it also becomes unmanagable... truncate it and start over.
We'll have ~3m unique IPs per day.. but then on a weekly basis, I would guess it'll be ~4.5m unique IPs... monthly ~10m unique IPs, etc. So the size of this table shouldn't exceed the size of the big_table anytime soon. *fingers crossed* :-)
Sorry to go off topic, but what are your thoughts on triggers? I read in your books that triggers have less effect on redo in 10g than they did in 8i/9i. We've considered adding the src & dst country_code columns onto big_table and using an on insert trigger to update those 2 columns. This table is appended to via sqlldr (not direct load) a few dozen rows at a time... but those few dozen times add up pretty quickly -- 50+m/day. We haven't tried it yet, and I'm sorry to be the guy who asks you the questions without having tkprof, trace, etc, but I'm just looking for a gut (experience/wisdom) feeling.
Thanks again Tom.
June 26, 2008 - 3:41 pm UTC
create table as select is ALWAYS append mode (writes above high water mark with direct path). There is no such thing as a CTAS with /*+ append */.
Parallel - up to you, do you have the resources for it, can parallel help it.
Nologging - up to you and your DBA :) they need to do special stuff if you go nologging.
... Sorry to go off topic, but what are your thoughts on triggers? ...
the title of an upcoming column by me in Oracle Magazine is:
<quote>
The Trouble with Triggers¿
Those of you who follow the asktom.oracle.com website know that I have an aversion to triggers. Once upon a time, a long time ago, I thought triggers were the coolest thing ever and I used (and abused) them heavily. Now, I will go very far out of my way in order to avoid a trigger ¿ whenever possible.
</quote>
and you said:
... his table is appended to
via sqlldr (not direct load) a few dozen rows at a time... ...
but that'll skip triggers... and make /*+ append */ a "no-op"
ops$tkyte%ORA10GR2> create table t as select * from all_users where 1=0;
Table created.
ops$tkyte%ORA10GR2> create trigger t_trigger
2 before insert or update on t for each row
3 begin
4 :new.username := lower(:new.username);
5 end;
6 /
Trigger created.
ops$tkyte%ORA10GR2> !cat t.ctl
LOAD DATA
INFILE *
INTO TABLE t
REPLACE
FIELDS TERMINATED BY '|'
(
username
,user_id
,created
)
BEGINDATA
BIG_TABLE|58|14-DEC-05
DIP|19|30-JUN-05
TSMSYS|21|30-JUN-05
ops$tkyte%ORA10GR2> !sqlldr / t.ctl direct=y
SQL*Loader: Release 10.2.0.2.0 - Production on Thu Jun 26 15:30:44 2008
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 3.
ops$tkyte%ORA10GR2> select * from t;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE 58 14-DEC-05
DIP 19 30-JUN-05
TSMSYS 21 30-JUN-05
<b>no trigger fired</b>
ops$tkyte%ORA10GR2> update t set user_id = user_id;
3 rows updated.
ops$tkyte%ORA10GR2> select * from t;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
big_table 58 14-DEC-05
dip 19 30-JUN-05
tsmsys 21 30-JUN-05
<b>that just shows trigger works, but did not fire</b>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ append */ into t select * from all_users where rownum < 3;
2 rows created.
ops$tkyte%ORA10GR2> select * from t;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
big_table 58 14-DEC-05
dip 19 30-JUN-05
tsmsys 21 30-JUN-05
big_table 58 14-DEC-05
dip 19 30-JUN-05
<b>that shows trigger fired and since the select did not fail, we did NOT append!</b>
add column
jayesh, June 28, 2008 - 3:27 am UTC
hi tom,
generally we add a new column it added at the end of all column suppose i want to add a column between two column if it is possible so plese give me a examples & syntex
June 28, 2008 - 1:53 pm UTC
this is what a view is about.
How I wish we never gave "rename column :(" views are the right approach here.
alter table t add ( x int );
create or replace view view_of_t
as
select a, x, b from t;
use the view - if need be, rename the table to "t_table" and create view T as select ....
no one will know...
Update with BETWEEN operator...
Rich, July 02, 2008 - 5:50 pm UTC
Thanks for the response to my questions Tom -- always appreciated.
I've still been thinking about how to attack this and I'm curious how BETWEEN works.If I say:
update big_table
set dst_cc = (select country_code
from lookup_table
where dstip between x.ip_from
and x.ip_to);
Will this act similar to an IN operator or an EXISTS operator? I'm curious if BETWEEN will stop at the first result or continue searching for other values?
I created a quick test... I duplicated a row in lookup_table, but with a different country_code, executed a statement like
update big_table
set src_cc = (select country_code
from lookup_table
where srcip between ip_from and ip_to);
and it returned "ORA-01427: single-row subquery returns more than one row"
Am I correct to assume that Oracle will scan my entire lookup table when it uses the BETWEEN operator?
I'm thinking about rewriting my query to something like:
FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update big_table
set src_cc = (x.country_code)
where exists (select 1
from lookup_table
where srcip between x.ip_from and x.ip_to);
commit;
END LOOP;
end;
/
The lookup_table shouldn't have duplicates, ip_to & ip_from have UNIQUE constraints. Therefore, I want my query to select the first country_code & move on to the next update, not continue to search through the lookup_table. I'm not quite sure how to attack this. Any thoughts or suggestions?
Thanks again for your help!
Regards,
Rich
July 07, 2008 - 9:19 am UTC
..
update big_table
set dst_cc = (select country_code
from lookup_table
where dstip between x.ip_from
and x.ip_to);
Will this act similar to an IN operator or an EXISTS operator?
.....
absolutely not. think of that like:
update big_table set dst_cc = f(x);
no where clause - EVERY row gets updated. And if your correlated subquery returns more than one row - you'll just get an error and the update will fail.
...
Am I correct to assume that Oracle will scan my entire lookup table when it
uses the BETWEEN operator?
....
absolutely not. it'll index range scan more likely if it can.
I cannot really help with the real update you want - too much missing here, this logic doesn't seem sensible:
FOR x IN (select country_code, ip_from, ip_to from lookup_table)
LOOP
update big_table
set src_cc = (x.country_code)
where exists (select 1
from lookup_table
where srcip between x.ip_from and x.ip_to);
commit;
END LOOP;
end;
/
why would you use where exists. that would tend to scan big table for every row in lookup
maybe this?
ops$tkyte%ORA11GR1> create table t ( src_cc varchar2(2), srcip number );
Table created.
ops$tkyte%ORA11GR1> insert into t values ( null, 50 );
1 row created.
ops$tkyte%ORA11GR1> insert into t values ( null, 500 );
1 row created.
ops$tkyte%ORA11GR1> insert into t values ( null, 5000 );
1 row created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create table lookup ( country_code varchar2(2), ip_from number, ip_to number );
Table created.
ops$tkyte%ORA11GR1> insert into lookup values ( 'aa', 0, 100 );
1 row created.
ops$tkyte%ORA11GR1> insert into lookup values ( 'bb', 101, 1000 );
1 row created.
ops$tkyte%ORA11GR1> insert into lookup values ( 'cc', 1001, 10000 );
1 row created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select * from t;
SR SRCIP
-- ----------
50
500
5000
ops$tkyte%ORA11GR1> merge into t
2 using lookup l
3 on (t.srcip between l.ip_from and l.ip_to)
4 when matched then update set src_cc = l.country_code;
3 rows merged.
ops$tkyte%ORA11GR1> select * from t;
SR SRCIP
-- ----------
aa 50
bb 500
cc 5000
please do NOT commit in the for loop, please....
No commit & no exists...
A reader, July 08, 2008 - 8:56 am UTC
Thanks Tom.
why would you use where exists. that would tend to scan big table for every row in lookup
I agree, I wouldn't want to do that. I was trying to use it as an example of how I wanted to query lookup_table and use 'exists-like logic' to stop after it finds the first entry that matches the between statement. I think your suggestion of a merge is my best bet.
please do NOT commit in the for loop, please....
That being in there was embarrassing -- sorry :-)
Thanks,
Rich
How to insert 10 million rows...into a prtition
Suresh, July 23, 2008 - 12:08 pm UTC
Hi Tom,
I have a table (
fact_projection) with currently with 26 partitions and having around 10 million rows.
Every day i need to insert into fact_projection all the partitions of table from
stging_projection joining some other tables.
Data to all the partitions are being inserted correctly but for a paricular partition called PH is getting too late.
I tried inserting using BULK COLLECT but it is taking bulk of time.
Could you please suggest the best possible approach.
Shall i do it using the
insert into fact_projection PARITION PH
SELECT <join>
Here is the query .-->
INSERT INTO invsl.fact_invntry_projctn
(mngng_org_id, org_suply_chain_item_key, suply_chain_loc_key,
snpsht_dt_key, fscl_mth_id, actvty_typ_key, slng_mrkt_id,
unit_msr_id, crncy_id, srce_sys_id, unit_qty, std_cost_amt,
sku_parms_key, item_typ_id, mrkt_fsc_key, suply_mrkt_id,PROD_TYP_GRP_ID)
SELECT /*+ parallel(a,8) push_subq */ a.mngng_org_id, d.org_item_fg_key, a.suply_chain_loc_key,
g.clndr_dt_key, b.fscl_mth_id, a.actvty_typ_key, a.lcl_mrkt_id,
MIN (NVL (a.uom_id, -100)), MIN (NVL (f.crncy_id, -100)),
MIN (a.srce_sys_id), SUM (a.unit_qty), MIN (a.std_cost_amt), -100,
MIN (a.item_typ_id), a.mrkt_fsc_key, suply_mrkt_id,min(a.PROD_TYP_GRP_ID)
FROM invsl_stage.stg_invntry_projctn a JOIN codi_stage.stage_fscl_mth b
ON ( TO_CHAR (a.projctn_mth_dt, 'mm') = b.fscl_mth_nr
AND TO_CHAR (a.projctn_mth_dt, 'yyyy') = b.fscl_yr_nr
)
JOIN codi_stage.stage_org_fg d
ON (a.suply_mrkt_id = d.org_id AND a.item_cd = d.lcl_sc_fsc_cd)
LEFT OUTER JOIN codi_stage.stage_crncy f
ON (a.lcl_crncy_cd = f.crncy_cd)
JOIN codi_stage.stage_clndr_dt g ON (a.snpsht_dt = g.clndr_dt)
WHERE a.mngng_org_cd = lv_mrkt
AND a.lcl_item_typ = 'FG'
AND a.prod_typ_grp_cd = 'FG PRODUCT'
GROUP BY a.mngng_org_id,d.org_item_fg_key,a.suply_chain_loc_key,g.clndr_dt_key,
b.fscl_mth_id,a.actvty_typ_key,a.lcl_mrkt_id,a.mrkt_fsc_key,a.suply_mrkt_id;
July 24, 2008 - 10:29 am UTC
you might consider a direct path insert, insert /*+ APPEND */ and enabling parallel dml if you want to do a parallel direct path load.
How to insert 10 million rows...into a prtition
Sureshpswamy, August 08, 2008 - 6:40 am UTC
Hi,
It was working fine when I added the APPEND hint.
But just before the deployment I had a Change request to consider new table
mapsrpt_stage.stg_mrkt_fsc.But now the whole query went to old 3hr status.
Cuold you please help me .
INSERT INTO /*+ APPEND */ invsl.fact_invntry_projctn PARTITION (part_62) NOLOGGING
(mngng_org_id, org_suply_chain_item_key, suply_chain_loc_key,
snpsht_dt_key, fscl_mth_id, actvty_typ_key, slng_mrkt_id,
unit_msr_id, crncy_id, srce_sys_id, unit_qty, std_cost_amt,
sku_parms_key, item_typ_id, mrkt_fsc_key, suply_mrkt_id,
prod_typ_grp_id)
SELECT /*+ parallel(a,8) push_subq index(pk_stg_invntry_projctn) */
a.mngng_org_id, d.org_item_fg_key, a.suply_chain_loc_key,
g.clndr_dt_key, b.fscl_mth_id, a.actvty_typ_key, a.lcl_mrkt_id,
MIN (NVL (a.uom_id, -100)), MIN (NVL (f.crncy_id, -100)),
MIN (a.srce_sys_id), SUM (a.unit_qty), MIN (a.std_cost_amt), -100,
MIN (a.item_typ_id), a.mrkt_fsc_key, suply_mrkt_id,
MIN (a.prod_typ_grp_id)
FROM invsl_stage.stg_invntry_projctn PARTITION (ph) a JOIN codi_stage.stage_fscl_mth b
ON ( TO_CHAR (a.projctn_mth_dt, 'mm') = b.fscl_mth_nr
AND TO_CHAR (a.projctn_mth_dt, 'yyyy') = b.fscl_yr_nr
)
JOIN codi_stage.stage_org_fg d
ON (a.suply_mrkt_id = d.org_id AND a.item_cd = d.lcl_sc_fsc_cd)
LEFT OUTER JOIN codi_stage.stage_crncy f
ON (a.lcl_crncy_cd = f.crncy_cd)
JOIN codi_stage.stage_clndr_dt g ON (a.snpsht_dt = g.clndr_dt)
WHERE a.mngng_org_cd = 'PH'
AND a.lcl_item_typ = 'FG'
AND a.prod_typ_grp_cd = 'FG PRODUCT'
AND EXISTS (
SELECT 1
FROM mapsrpt_stage.stg_mrkt_fsc m
WHERE a.mrkt_fsc_key = m.mrkt_fsc_key AND m.catgry_id != 17
OR ( m.catgry_id = 17
AND EXISTS (
SELECT 1
FROM system_code
WHERE m.form_id = code
AND code_type =
'SALES_TOOLS_INCLUSION_FILTER'
AND market = 'GLOBAL')
))
GROUP BY a.mngng_org_id,
d.org_item_fg_key,
a.suply_chain_loc_key,
g.clndr_dt_key,
b.fscl_mth_id,
a.actvty_typ_key,
a.lcl_mrkt_id,
a.mrkt_fsc_key,
a.suply_mrkt_id
August 08, 2008 - 1:20 pm UTC
no, not really. you'd need to trace this, look at what it is doing, and see if there is a more efficient way to do it.
you'd need to understand the indexing scheme
you'd need to understand the QUESTION trying to be answered
you'd need to understand the relationships between the tables
and so on - you have that, I don't.
Update
krishna, August 19, 2008 - 12:13 am UTC
Hi Tom,
Can I update a table based on some criteria in the where clause which is having joins.
Ex: update emp set sal=1000 where emp.emp_no = <table>.<field_name>
pls, can u provide some examples.
Thanks
Krishna
August 20, 2008 - 10:18 am UTC
"U" is not available (I believe they are dead actually). Can I help instead?
you can update a join
ops$tkyte%ORA10GR2> create table emp as select * from scott.emp;
Table created.
ops$tkyte%ORA10GR2> create table dept as select * from scott.dept;
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table emp add constraint emp_pk primary key(empno);
Table altered.
ops$tkyte%ORA10GR2> alter table emp modify deptno not null;
Table altered.
ops$tkyte%ORA10GR2> alter table dept add constraint dept_pk primary key(deptno);
Table altered.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> update ( select sal from emp, dept where emp.deptno = dept.deptno) set sal = 1000;
14 rows updated.
and of course a subquery also works:
ops$tkyte%ORA10GR2> update emp set sal = 1000 where deptno in ( select deptno from dept );
14 rows updated.
error logging clause for CTAS
Oxnard, August 29, 2008 - 10:55 am UTC
ver 10.2
Anyway to do an error logging clause for CTAS?
We do partition swaping with the CTAS tables
so for example we do
create table my_ctas (
col1 not null, -- note the not null
col2,
col3) tablespace tbl nologging
select
cast(col1 as varchar2(20)) as col1
,cast(col2 as number) as col2
,cast(col3 as number) as col3
from taba, tabb
where
taba.x = tabb.x
when there are some data problems it can be a pain to figure it out. If not how do I get a request onto the new features request.
Thanks
August 30, 2008 - 9:48 am UTC
since you are doing a cast like that, you know the types you want, you would accomplish your goal in two statements
create table my_table( col1 varchar2(30) not null, .... );
insert /*+ append */ into my_table select ..... log errors ....;
use the direct path insert to populate the table, it'll work very much in the same fashion as the create table as select performance wise. By pass buffer cache, write directly to disk, can be non-logged, parallel, etc.
not update but just select and insert
A reader, September 10, 2008 - 3:13 am UTC
I have a subset of parent table data in table A (around 4 Millions records) and detail table B of around 4.5 billion records. I need to query them and insert the 4 Million * X (number of matchin records in B) in new table C. There is index on both table A and B on matching key.
The problem is, when I query table B alone without any conditions, just select count(1) from B, it takes hours to return. There are more than 400 partitions (hash) on the table B and the values in partions are not uniformly distributed, does that cause slowing down ?
What should be the best approach in this condition ? I am thinking of querying table A and B and put the result in C using CTAS, is there another better way ? or any tips to make it faster ?
September 11, 2008 - 10:42 am UTC
the values in b have nothing to do with anything. You are counting the number '1' (and one might ask - why the number '1', why not '*', if your intention is to count "how many records do i have", you would use count(*) to show to the world you know what you are doing - by counting '1's, it looks like you are confused about how to count records....)
However, the fact that there are 4.5 billion records in there and let us assume an average row width of 100 bytes - you might be full scanning (you do not say if you have indexed a NOT NULL column on this table) about 1/2 of a terabyte of storage.
And depending on your storage, that just might take a while?
I would question the need for this table C, you give no justification for it, but it doesn't seem very useful
not update but just select and insert
P, September 11, 2008 - 1:03 pm UTC
For count(1), I read somewhere that if you just want to count records of a table, it is better than using count(*) , please correct me if I am wrong. I dont have the table analyzed or any other way to find how many rows in the table so just to tell you correct number, I queried select count(1). If count(*) is better, please let me know and also please explain why.
For table C which I mentioned in earlier question is result table from querying A and B and there would be more tables to update table C for some more columns and then table C will be used in main process with other applications's result tables to apply some other business rules for final output of some reports.
For more information on index and tables , table A and table B are indexed on NOT NULL columns (primary keys on A and not null indexed foreign key on B) and I need to query some more tables like B to update columns in C which would have same parent child relationship like A --> B.
sorry, if I was not clear on table C, but the actual question is what approach I should take ? I hope I am able to explain my question to you.
September 16, 2008 - 12:38 pm UTC
... For count(1), I read somewhere that if you just want to count records of a
table, it is better than using count(*) , please correct me if I am wrong. ...
consider yourself corrected then.
When you read that information - did they
a) say why?
b) show why?
c) demonstrate some noticeable improvement in response times?
Always look for that - if someone says "X will make your life better", you would as to see a, b, and c - do the same with all technical advice.
As for why - count(*) is the way, the documented way, the correct way to count rows. Count(*) is in fact "count rows".
Count( <expression> ) says "count non-null expressions", by saying count(1) you are not saying "count rows", you are in fact saying "count how many non-null number 1's are in this table".
We had to go and file a bug way back when to internally rewrite count(1) as count(*) in order to make things faster - this rumor was so rampant.
read this:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:1156159920245 we've been discussing this for years and years and years now.....
other than saying you have a table A and B and lots of rows in B, I don't have much information to go on here. You do not say how large C is, if C is very large - it could be that full scans + hash join = right way.
If C is not, full scan A + nested loops into B might be the right way.
But in either case, this will take a MEASURABLE amount of time, you do not say how long it is currently taking, what your (possibly unreasonable) expectations are, what the current plan is, how big (in gigabytes - row counts are useless - they only describe the smallest the table could be) A and B are, what kinds of resources you have (parallel? lots of disk? tons of memory, etc...)
and I still doubt that C is necessary, but whatever, people keep building tables like this anyway... Unless you are going to full scan C frequently - C is probably not useful.
Question on Updating a Join
Dan, September 12, 2008 - 6:24 pm UTC
Tom,
I'm looking to rewrite a pretty commonly used procedure for getting phone numbers - it's currently written "slow by slow" and takes a long time on anything of decent size. I'm running into trouble when attempting to use the UPDATE on a JOIN that you specified earlier in this thread:
UPDATE (SELECT tmp.HOME_PHONE, phn.phn_area_cde_num, phn.PHN_NUM
FROM tmp_test1 tmp,
phone_table phn
WHERE tmp.cust_id = phn.cust_id
AND phn.phn_loc = 'H'
AND phn.phn_use = 'G'
AND phn.del = 'N')
SET HOME_PHONE = phn_area_cde_num || phn_num;
This results in a "ORA-01779: cannot modify a column which maps to a non key-preserved table".
This is in 10g. There are primary key restraints on each phone_table field (cust_id, phn_loc, phn_use, del) and I placed a unique constraint on tmp.cust_id after looking through your site a bit for the error. Using the below:
alter table tmp_test1 add constraint tmp_test1_unique
unique(cust_id);
But it's still happening. It seems like that should be sufficient information for Oracle to know there is only one possible result, correct? Or is there another constraint I need?
(Also, I found this interesting - I first ran the sql from another user, one that has permissions on tmp_test1 but not on phone_table, and got an "ORA-01031: insufficient privileges" error. It seems like this should work since I am only attempting to update the table I have access to, but looks like not)
September 16, 2008 - 5:22 pm UTC
You can use a merge here - but not the join. The optimizer won't recognize the constants there - it would want tmp to have phn_loc, phn_use, del and for tmp to join to phn using 4 columns (we have to join to the primary key of the PHN table in its entirety)
so, just use a merge command.
ops$tkyte%ORA11GR1> create table tmp ( a int primary key, y int );
Table created.
ops$tkyte%ORA11GR1> create table phn ( a int, b int, y int, primary key(a,b) );
Table created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> update ( select tmp.y tmpy, phn.y phny from tmp, phn where phn.a = tmp.a and phn.b = 'x' )
2 set tmpy = phny;
set tmpy = phny
*
ERROR at line 2:
ORA-01779: cannot modify a column which maps to a non key-preserved table
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> merge into tmp
2 using (select * from phn where b = 'x' ) phn
3 on (tmp.a = phn.a)
4 when matched then update set y = phn.y;
0 rows merged.
how to update millions row in produktion
Francisco Jimenez, October 15, 2008 - 7:50 am UTC
Hello Tom,
i need to update one column from one table. This Table have 250 millions rows.
My Problem is that someone can updating rows from this table at the same time while i am updating the table. How can i do this update? I would like to use CTAs but our DBA think that it is not a good idea.
thanks in advance.
Fran.
October 15, 2008 - 5:31 pm UTC
lock table t in exclusive mode;
update t set x = whatever;
commit;
get the table, update the table, un-get the table by committing.
A reader, November 17, 2008 - 1:00 pm UTC
We have a system in production for about 6 months. One process inserts records into a table and right now it has inserted about 250 million rows in the table. We want to start putting a retention period on this table and change the process a little more. So we want to do the following in production
1. Rename the existing table.
2. Create a new table with the same structure and let it run in production.
3. Every day early morning we want to do the following - check the data in the table created above and if the data is more then 3 days old, archive it to a history table.
4. Delete from the history table any data more than 90 days old.
Now for the step 3, it could be about a million - 2 million rows every day. What will be the best way to move the data or archive it to the history table ?
Thanks for your help.
November 18, 2008 - 7:29 pm UTC
one wonders, one really really really wonders....
why....
this was not part of the original design from day one, this is important stuff.
You want to use partitioning - anything else would be a hugely inefficient way to approach it.
Change datatype of column in 400 million row table
A reader, March 20, 2009 - 5:19 pm UTC
Tom,
We have a table with 400 million rows. The table is partitioned. We have to change the datatype of a column from number to varchar2. We are running 10gr2.
For some arcane reason, the end users want the column order to remain unchanged.
My thought would be to:
a) Add a new column of type number
b) Update the new column with old column
c) Set old column to NULL
d) Change the datatype of old column to varchar2
e) Update the old column with new column
f) Drop the new column
Is this the right approach or is there a better way considering the data volume? If this is the right approach, how can I speed it up while minimizing the downtime to users?
Thanks...
March 24, 2009 - 10:47 am UTC
... the end users want the column order to remain
unchanged.
...
we call that a VIEW.
alter table t add new_col varchar2(25);
update t set new_col = number_col, number_col = null;
alter table t drop column number_col;
alter table t rename to t_table;
create or replace view t
as
select ....... <in any order, with any name you like>
from t_table;
DO NOT do the 'curly shuffle' like you suggest above, that would just be a waste.
Change datatype of column in 400 million row table
A reader, March 24, 2009 - 11:08 am UTC
Tom,
Thanks for your reply.
Following your example, from a test in non-production database, updating the column took about 15 minutes for 1 million row table. For 400 million row table, if I linearly extrapolate, it will take about 100 hours. How can I speed it up?
Thanks...
March 29, 2009 - 10:46 am UTC
parallel update.
or, use dbms_redefinition which can apply functions and conversions as you redefine the table, you'd need two copies, but it can be 100% online, your end users don't have to have any interrupted service.
actually, the fastest way to do something is NOT TO DO IT in the first place. I would go back and seriously re-evaluate your "need" to perform this operation at all...
Bulk operation
Sam, April 20, 2009 - 10:59 am UTC
Hi Tom,
We have a table with 60 million record in production. We need to do a clean up activity based on the following logic.
For a particular FORDERUNG_ID, if the column values in the column REST_KOST_UNVERZ, REST_KAP_VERZ differs, we need to keep those rows. If the value in the following row ( order by GUELTIG_VON_DAT ) has the same value in the columns REST_KOST_UNVERZ and REST_KAP_VERZ, we need to delete that row. Here the problem for me is that I don't know how to compare rows of the same table in a particular order. Sample data is provided below
FORDERUNG_ID REST_KOST_UNVERZ REST_KAP_VERZ GUELTIG_VON_DAT GUELTIG_BIS_DAT
1387221 10 3769.59 31.10.2008 10:21:35 25.11.2008 13:07:01
1387221 10 3769.59 25.11.2008 13:07:02 26.11.2008 13:07:01
1387221 0 0 26.11.2008 13:07:02 27.11.2008 13:07:01
1387221 10 10 27.11.2008 13:07:02 04.02.2009 23:59:59
1387221 0 0 05.02.2009 00:00:00 06.02.2009 23:59:59
1387221 0 0 10.02.2009 00:00:00 31.12.9999 00:00:00
1387224 0 340.16 31.10.2008 10:21:35 25.11.2008 13:07:01
1387224 116.4 340.16 25.11.2008 13:07:02 04.02.2009 23:59:59
1387224 116.4 340.16 05.02.2009 00:00:00 06.02.2009 23:59:59
1387224 116.4 340.16 10.02.2009 00:00:00 31.12.9999 00:00:00
The output data should be
FORDERUNG_ID REST_KOST_UNVERZ REST_KAP_VERZ GUELTIG_VON_DAT GUELTIG_BIS_DAT
1387221 10 3769.59 31.10.2008 10:21:35 25.11.2008 13:07:01
1387221 0 0 26.11.2008 13:07:02 27.11.2008 13:07:01
1387221 10 10 27.11.2008 13:07:02 04.02.2009 23:59:59
1387221 0 0 05.02.2009 00:00:00 06.02.2009 23:59:59
1387224 0 340.16 31.10.2008 10:21:35 25.11.2008 13:07:01
1387224 116.4 340.16 25.11.2008 13:07:02 04.02.2009 23:59:59
If you see that I need to compare current rows and previous row to see if there is any difference after ordering by GUELTIG_VON_DAT (valid from date). For your information, GUELTIG_BIS_DAT is valid to date.
Difference between GUELTIG_VON_DAT and GUELTIG_BIS_DAT of next row is GUELTIG_VON_DAT +1 second.
I need a query that returns the result so that I can insert into second table with append mode, drop the original table and rename second table to original table. Can you please help me out?
April 21, 2009 - 2:27 pm UTC
... For a particular FORDERUNG_ID, if the column values in the column REST_KOST_UNVERZ, REST_KAP_VERZ differs, we need to keep those rows. If the value in the following row ( order by GUELTIG_VON_DAT ) has the same value in the columns REST_KOST_UNVERZ and REST_KAP_VERZ, we need to delete that row. ....
did not make sense, make that more clear and of course provide create tables and inserts - else I cannot really look at it.
It sounds like to me that if you have a row whereby REST_KOST_UNVERZ, REST_KAP_VERZ differs - keep that row (so, you only look within a row, therefore decode( REST_KOST_UNVERZ, REST_KAP_VERZ, 0, 1 ) = 1 are rows to KEEP (they do not differ) always.
Then it gets difficult to understand. For rows where they do not differ (so we are only interested in them I think??) we need to delete that row.
So basically, generate a set of rows whereby REST_KOST_UNVERZ, REST_KAP_VERZ are the SAME and de-duplicate using a key of FORDERUNG_ID, REST_KOST_UNVERZ, REST_KAP_VERZ - keeping only the oldest record? right?
Re: Bulk operation
Jakub Illner, April 22, 2009 - 11:12 am UTC
Sam, I suppose you should use the LAG/LEAD analytical function which will give you the values of the previous or next row. In the next step you can compare the values in the current versus previous row and decide what to do with the record - delete or keep it. The select (not tested) could look like this:
select FORDERUNG_ID
, REST_KOST_UNVERZ
, REST_KAP_VERZ
, GUELTIG_VON_DAT
, GUELTIG_BIS_DAT
, case
when REST_KOST_UNVERZ = next_REST_KOST_UNVERZ
and REST_KAP_VERZ = next_REST_KAP_VERZ
then 'DELETE'
else 'KEEP'
end as action
from (
select FORDERUNG_ID
, REST_KOST_UNVERZ
, REST_KAP_VERZ
, GUELTIG_VON_DAT
, GUELTIG_BIS_DAT
, lead(REST_KOST_UNVERZ,1)
over (partition by FORDERUNG_ID
order by GUELTIG_VON_DAT
) as next_REST_KOST_UNVERZ
, lead(REST_KAP_VERZ,1)
over (partition by FORDERUNG_ID
order by GUELTIG_VON_DAT
) as next_REST_KAP_VERZ
prev_REST_KAP_VERZ
from <table>
)
Regards,
Jakub
Rewrite the update
sriram, April 24, 2009 - 8:00 am UTC
Tom,
I am trying to implement the technique(create new_table as select <update statement> from old_table) that you mentioned on the below sql , but no clues on proceeding.
Please can you suggest a good way of doing this?
UPDATE PS_PI_TAX_DETAIL TAX SET PI_STATE = (NVL((SELECT J_STATE_TAX FROM PS_J_PAYCHECK PAY WHERE PAY.EMPLID = TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2),:1)) ,PI_LOCALITY = (NVL((SELECT J_LOCAL_TAX FROM PS_J_PAYCHECK PAY WHERE PAY.EMPLID = TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2),:1)) WHERE PI_RUN_NUM = 0 AND PI_SYSTEM_ID = :2 AND PI_CONFIG_ID = :2
April 27, 2009 - 12:18 pm UTC
wouldn't you guys hate it if I wrote all of my sql like that? Isn't that just about the most impossible thing to read? All uppercase, no sensible line breaks, totally impossible to read.
you'd have to select out every row of course (your update has a where clause... you are not updating every row)
I think you want to use merge, merge into tax using pay on (join condition) with a WHEN MATCHED clause to update.
merge into ps_pi_tax_detail tax using PS_J_PAYCHECK pay on ( PAY.EMPLID= TAX.PI_EMPLID AND PAY.WEEK_END_DT = TAX.PAY_END_DT AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2 and tax.pi_run_num = 0 and tax.pi_system_id = :b2 and tax.pi_config_id = :b2 ) when matched then update set pi_state = nvl(pay.j_state_tax,:b1), pi_locality = nvl(pay.j_local_tax,:b1)
like that. Only kidding, more like this:
ops$tkyte%ORA10GR2> merge into ps_pi_tax_detail tax
2 using PS_J_PAYCHECK pay
3 on ( PAY.EMPLID= TAX.PI_EMPLID
4 AND PAY.WEEK_END_DT = TAX.PAY_END_DT
5 AND PAY.J_PAYCHECK_NBR = TAX.PI_CHECK_NBR
6 AND SUBSTR(PAY.J_CONTROL_NUMB, 1, 6) = TAX.PI_PAY_GROUP2
7 and tax.pi_run_num = 0
8 and tax.pi_system_id = :b2
9 and tax.pi_config_id = :b2 )
10 when matched then update set pi_state = nvl(pay.j_state_tax,:b1),
pi_locality = nvl(pay.j_local_tax,:b1)
11 /
the nvl probably isn't needed anymore since we are not using scalar subqueries.
Deleting records in table
Siddu, May 27, 2009 - 7:26 am UTC
Hi Tom,
We are trying to delete 5 lakh records weekly. These 5 lakh records are selected based on criteria which happens dynamically. We are using Perl at the front end. Deletion of records is taking 20 minutes. We were looking at variuos options to increase the performance. One approach I thought was to write a stored procedure to delete the records and do a service side execution instead of client side (here it is perl). Could you please suggest any other ways we can tune the performance.
Cheers,
Siddu
May 27, 2009 - 11:36 am UTC
so, I suppose 5 lakh is 500,000 (best to be unambiguous - acronyms, colloquialisms - should be avoided...)
No one could say if your idea of moving code from perl to plsql will have any impact on performance. Anyone that says they do is lying.
There is no information here to help you.
"These 5 lakh records are selected based on criteria which happens dynamically" is not meaningful, I have no idea what that means, what that entails.
You do not say if you issue 500,000 individual delete statements (bad).
You do not say if you run a query to find 500,000 records and then issue a delete for each one (worse)
You do not say if you run a single delete (good), but you are deleting 500,000 records out of 5,000,000,000 and using a full scan (probably bad)
In short, you don't tell us what you are doing - what you have observed.
20 minutes - that might be really good (if the table has 20 indexes on it - that might be as good as you get).
What don't you (in PAINSTAKING DETAIL) tell us what you need to do and then (and only then, do not skip that part, the telling us what you are trying to do in specification format) tell us your current algorithm.
Then we can suggest methods for rewriting your algorithm more efficiently.
abdou, May 28, 2009 - 10:01 am UTC
hi tom
please i want query show 3 big sal employee from table emp
thank
May 28, 2009 - 1:56 pm UTC
what if 4,000 people all make the same exact really big salaries.
be specific do you want:
a) a random set of 3 or less people that make a salary that is in the first three rows after sorting the entire set by SAL DESC
select * from (select * from emp order by sal desc) where rownum <= 3;
b) the set of employees (maybe 0 to many many rows) that make the top 3 salaries - accounting for 'ties'. That is, if 50 people make $100, and 50 people make 75$ and 50 people make $50 and 50 people make $25 - this answer would return only the 50 people that make $100 (since they all tied for first place, there is no 2nd, 3rd, 4th, .... 50th place person since the first 50 all tied for that honor)
select * from (select emp.*, rank() over (order by sal desc) r from emp) where r <= 3;
c) the set of employees that make a salary that is in the top three salaries. In the above example that would return 150 people - the first three groups
select * from (select emp.*, dense_rank() over (order by sal desc) r from emp) where r <= 3;
updating some columns & 500,000 records in a table
Prem, June 29, 2009 - 9:23 am UTC
hi Tom,
i have a requirement where i need to update a table of 1.5 million records with 300 columns, based on various conditions(as much as 500 different sets of conditions arrived after executing predefined rules)
and the number of rows getting affected could be .5 million and out of 300 columns 20 columns will be updated in the .5 million records
So I need to fire one update query per condition so that makes 500 update statment to be fired and this takes more than 10 hours.
Could you please let me know the best possible solution to handle this situation.
thanks,
prem
3 million rows to update with a subselect
Rebecca, July 17, 2009 - 12:57 pm UTC
Hi Tom,
In a data expansion program I took a set of 22 rows and burst them into 100k+ sets, resulting in
around 3 million rows.
Each set has a common key (FAMILY_ID) and each row a unique key (MEMBER_ID).
The data in the table is structured parent/child-- with the parent's MEMBER_ID being stored in
PARENT_MEMBER_ID.
There are multiple parents in each family.
When I created the 3 million rows- every row gets a new unique MEMBER_ID.
So the new keys to the parent rows need to be remapped to the children they are parents of.
There are about 1 million parents and 2 million children. Its only parent/child- no grandchildren.
Here's where I started:
UPDATE FAMILIES_TBL A SET PARENT_MEMBER_ID = (SELECT X.MEMBER_ID FROM FAMILIES_TBL X
WHERE X.OLD_MEMBER_ID = A.PARENT_MEMBER_ID AND A.BRANCH = X.BRANCH AND A.FAMILY_ID =
X.FAMILY_ID) WHERE PARENT_MEMBER_ID IS NOT NULL
FAMILIES_TBL is a temp table without an index, which was great for the inserts. But we'd both
grow very old before the update completes.
After reading this excellent thread, I changed it to:
CREATE TABLE FAMILIES_TBL_HLD AS SELECT * FROM FAMILIES_TBL WHERE 1=2;
ALTER table FAMILIES_TBL nologging;
ALTER table FAMILIES_TBL_HLD nologging;
Created an index on FAMILIES_TBL("PARENT_MEMBER_ID", "OLD_MEMBER_ID", "BRANCH",
"FAMILY_ID")
insert into FAMILIES_TBL_HLD (PARENT_MEMBER_ID,<rest of fields>)
SELECT (SELECT X.MEMBER_ID FROM FAMILIES_TBL X WHERE A.PARENT_MEMBER_ID = X.OLD_MEMBER_ID AND
A.BRANCH = X.BRANCH AND A.FAMILY_ID = X.FAMILY_ID),<rest of fields> FROM FAMILIES_TBL A WHERE PARENT_MEMBER_ID IS NOT NULL;
Still pretty bad. Any suggestions? It certainly feels like a classic pattern, but Im not 'getting it'.
<table and field names changed to protect the innocent>
Thank you in advance for your suggestions.
Rebecca
Re: 3 million rows to update with a subselect
Carsten, July 20, 2009 - 12:47 pm UTC
Hi Rebecca,
if we are talking about a one time issue: create an index to support the join operation.
otherwise:
I guess the fastest way to solve your problem is to not do it at all.
Basically you seem to have an ID that changes (OLD_MEMBER_ID -> MEMBER_ID) and then update a corresponding FK-field.
If your Member_ID changes, you shouldn't use it as PK, but instead generate one and use this as reference in FKs.
Generate a view to display the parent_member_id in your old fashion, if you need to, which would greatly benefit from above index as well. So from solving a repetitive task you got to solving it once.
Mazhar
Mazhar, August 17, 2009 - 3:03 am UTC
Great Stuff
Update 40 million rows
A reader, August 24, 2009 - 6:52 am UTC
Hi Tom,
We are working on a warehouse and want to update Type 1 and type 2 records from the delta on a given day (which is 40 million) to a table which has 3.5 billion rows.
Some of the approaches which we could think are:
1) Create a dummy copy of the 3.5 billion row table.Run an update statement to this copy. Build indexes and then do an exchamge. this effort takes around 14 hrs total to build 13 indexes along with creating a copy of 3.5 billion rows.
2) We are thinking of another approach to directly update the Dimension on-line. Will that approach work.
3) The delta is sitting in 64 hash sub partitions and the dimension is also hashed partition 64. Can we take advantage of Partition wise Join.
4) Should we try DIY parallelism, as suggested in your book.
5) We are thinking of using Updateable Join view.
Kind of need some guidance which way to try this out.
Thanks
August 25, 2009 - 9:41 am UTC
since you are only touching 40mill out of 3.5billion - why not just update the data?
Update 40 million
A reader, August 25, 2009 - 11:28 am UTC
Hi Tom,
Thanks for your reply. So as I understand we should update the data blocks on-line.
The reason I was confused was we have 12 indexes on this dimension and if we do update the data blocks we will be doing index maintenance and most of the indexes are BITMAP.
Also can you plesae suggest if we should use:
1) Nested Loops as the Join Method (so there will be 40 million index lookups before it touches the data blocks) and updates them OR we should use HASH JOIN to bring the 40 million in hash and scan the bigger dimension (3.5 billion rows) via ROWID Range Scan.
2) I feel the Partition wise Join will scale better since we have 64 hash partitions and each partition of the delta table will own join with the corresponding hash partion of the dimension, so join will happen with 3.5/64 million rows and 40/64 million rows as 64 indepemdent joins.
3) Will NL Partition wise join work, i was eading the docs but could not find any perfect answer to this.
Please help and guide us.
August 25, 2009 - 8:30 pm UTC
... The reason I was confused was we have 12 indexes on this dimension and if we do
update the data blocks we will be doing index maintenance and most of the
indexes are BITMAP.
...
do it as a single statement - in bulk - one sweep.
do NOT do it 40,000,000 times, a slow by slow approach
do it as a single sql
1) write the sql, let the optimizer decide.
2) see #1
3) see #2
Slow Bulk Collect
A reader, August 27, 2009 - 3:12 pm UTC
Tom:
1. I have a two partitioned table with enable row movement.
partition 2 has unprocessed records , partition 1 processed records
2. I have to update around 2 mill rows which will make them move from partition 2 to partition 1.
3. I update them with normal update and it takes around 40 minutes.
4. I rewrote a procedure to do BULK UPDATE rather than simple update.
5. It takes around 60 minutes to do BULK update.
The SQL trace shows Bulk update happening:
UPDATE TABLE A
SET COL1 = :B4 ,
COL2 = :B3 ,
COL3 = :B2
WHERE
COL4 = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1708 1203.22 2235.05 203830 7981227 99254416 1707253
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1709 1203.22 2235.05 203830 7981227 99254416 1707253
Is my BULK update which should in turn do a BULK - INSERT, BULK - DELETE for partition row movement happening? How do I trouble shoot this.
please provide your valuable input to me to troubleshoot this.
Thanks in Advance.
August 28, 2009 - 5:02 pm UTC
why would you introduce slow procedural code what you have a fast single sql statement?
there is nothing to trouble shoot, I've written a billion times by now:
<quote from every book I've ever written and many postings/articles>
I have a pretty simple philosophy when it comes to developing database software, and it is one that has not changed over many years:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL statement, then do it in PL/SQL (but as little PL/SQL as possible!).
* If you cannot do it in PL/SQL (due to some missing feature like listing the files in a directory), try a Java stored procedure. This is an extremely rare need today with Oracle9i and above.
* If you cannot do it in Java, do it in a C external procedure. This is most frequently the approach when raw speed or the use of a third-party API written in C is needed.
* If you cannot do it in a C external routine, you might want to think seriously about why exactly you need to do it.
</quote>
procedural code is going to be slower than a set based operation to do the same thing in almost every case (exceptions? sure, there are strange edge cases). In this case - a simple update is the only thing to consider.
additional information to prev review
A reader, August 27, 2009 - 4:56 pm UTC
Here are the wait events associated:
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 28 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 203828 0.54 1017.05
latch: cache buffers chains 95 0.00 0.02
read by other session 27 0.04 0.26
buffer deadlock 1 0.00 0.00
latch free 20 0.00 0.00
buffer busy waits 26 0.10 0.21
log buffer space 283 0.42 15.33
log file switch completion 27 0.08 0.46
latch: object queue header operation 5 0.00 0.00
So Basically single straight update is taking same amount of time as the BULK update.
Thanks in advance...
slow bulk update
A reader, August 27, 2009 - 5:58 pm UTC
I have gone through another series of questions which you answered. I think simple straight update SQL performs best.
Thanks,
PL/SQL vs SQL
A reader, August 29, 2009 - 3:36 pm UTC
Hi Tom,
I have a question for your regarding SQL and Performance.
...I have a summary table that maintains counts based on a File ID and Region that's used for reporting. The summary table is a GTT(Global Temp Table).
In this table - I have about 250 fields - that represent when a field has changed - so if it's data is number, char date and so on .. and if's changed I update this table.
e.g update tmp_counts
set field1 = field1 + decode(nvl(x(1).field1, 9999),
nvl(x(1).field1, 9999),0,
9999,1,1),
...etc
example above is to monitor if a numeric field has changed. If so increment the counter for that field.
Now...this update to the table is called in a proc with array parameters (the result from a bulk collect using NOCOPY). The procedure is called millions of times to monitor changes.
Now, if I write the above code updating an array instead of a GTT) procedurely using dbms_sql.execute - i.e checking each individual field using IF..ENDIF statements. It takes much, much, much longer. I would have thought updating a small summary table millions of times would have been slower than updating an array in memory? Can you explain this behaviour? Is it quicker doing things in "a set" using an update rather than doing stuff procedurely in PL/SQL. Thanks!
August 29, 2009 - 7:29 pm UTC
... procedurely
using dbms_sql.execute - i.e checking each individual field using IF..ENDIF
statements. It takes much, much, much longer. ...
you thought that compiling code and dynamically executing it over and over would be faster? why?
... Is it quicker doing things in
"a set" using an update rather than doing stuff procedurely in PL/SQL. Thanks!
...
almost always - search this site for the word
mantra
and read about my mantra
PL/SQL vs SQL
A reader, August 29, 2009 - 9:55 pm UTC
Hi Tom,
I understand what you mean by "mantra" and I have all your books - it's just that "intuitively" you'd think doing something in memory would be quicker than doing I/O (from datafiles to buffer cache) is quicker? ... I process millions of records (bulk collect)from a transaction table - are there possibilities to do processes in parallel? I think if you want to use the parallel hint/degree you need to put the table in "parallel mode" ...
August 31, 2009 - 7:13 am UTC
... it's just
that "intuitively" you'd think doing something in memory would be quicker than
doing I/O (from datafiles to buffer cache) is quicker?...
Not if your procedural code you write to modify memory is a billion times less efficient than the code we wrote to modify memory is.
As for your last bit - you have a serial bit of code that updates rows slow by slow (sequentially). How would parallel SQL help you - you have a serial process.
If you want something to run orders of magnitude faster - the typical approach is
a) forget the existing code, put it out of your head for a minute
b) write down in GORY detail what it is you have to do
c) design a set based algorithm to achieve that goal (from scratch)
don't buy into 'sunk cost' (the code you have isn't a sunk cost, it is a failed first implementation). Now that you know what you need to do - do it over with SQL set based operations at the core of your thought process.
Insert 50 million rows to cluster table from another table
Hopkins, September 16, 2009 - 4:46 am UTC
Hi Tom
I need to add more than 50 million rows from one table to another. Both are clustered table
I use the following command
insert /*+ append parallel 16 */ into second_tab
select * from first_tab;
But it takes more than 10 hours and hangs
Could you please show me the best way how to achieve it and some information about parallel hint. SHould I give more than 16 or less?
Thanks
September 16, 2009 - 4:05 pm UTC
... But it takes more than 10 hours and hangs ...
what does it "hang" on - I hate that word, very rarely is it every used correctly. define what you mean by hang.
you cannot direct path load a clustered table, the append hint, not useful. In a cluster - data has a LOCATION, a PLACE where it needs to be. For every row going into the cluster, before we put it away, we have to find out where it needs to go.
A direct path load presumes "data is streaming in, we'll format a block, fill a block, write a block". If the data streaming in has to go onto a SPECIFIC block, we cannot do that.
So, the append hint - not meaningful.
The parallel hint - not specified in any documented format (eg: not correct).
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements006.htm#SQLRF50801 Also, you do not say if you tried even to enable parallel dml (pre-req for parallel dml is to enable it).
Also, there is the problem that clustered tables cannot be created or loaded in parallel using parallel dml - for the same reason direct path doesn't work. The data has a location - multiple loaders would all need access to the same exact locations at the same time.
you do not give us anything to go on here.
a) is the source table in a cluster with other tables, if so, then you realize a full scan will "not be fast", as it has to full scan not only the source table but EVERY OTHER table in there.
b) is the targeted table a btree cluster? hash cluster?
c) are the size and keys settings set correctly?
d) is the target table indexed?
what you should do
a) enable tracing
b) insert some of the data
c) review the work performed, see what the waits were, see if any can be 'removed' (any of the waits)
Here is me doing 50,000,000 rows on a rather old piece of hardware. 10hours would be outrageous.
it also demonstrates that append doesn't work (by showing what append would do if it did - cause a subsequent query to fail).
so, no append
no parallel load
I presume you have lots of indexes - that'll contribute to long load times.
You do have to realize that every row that goes into the cluster will either scan a btree cluster index OR run a hash function (you didn't give us cluster details). That entails quite a bit of work. Clusters take longer to load into (by definition) - their goal is to make data retrieval faster, but loads will necessarily take longer.
You might be able to do a bit of parallel query processing if you do "DIY parallel processing"
http://asktom.oracle.com/pls/ask/search?p_string=diy+parallelism ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table stage;
ops$tkyte%ORA10GR2> create table stage
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select level id, rpad('*',120,'*') data from dual connect by level <= 50000000;
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create cluster btree_cluster ( id number ) size 125
2 /
Cluster created.
ops$tkyte%ORA10GR2> create index btree_cluster_idx on cluster btree_cluster
2 /
Index created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
2 as
3 select level id, rpad('*',120,'*') data from dual connect by level <= 0;
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ append */ into t select 1, 'x' from dual;
1 row created.
ops$tkyte%ORA10GR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t;
Table dropped.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
2 cluster btree_cluster( id )
3 as
4 select level id, rpad('*',120,'*') data from dual connect by level <= 0;
Table created.
ops$tkyte%ORA10GR2> select * from t;
ID
----------
DATA
-------------------------------------------------------------------------------
1
*******************************************************************************
*****************************************
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> rollback;
Rollback complete.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set timing on
ops$tkyte%ORA10GR2> insert into t select * from stage;
50000000 rows created.
Elapsed: 02:14:43.63
ops$tkyte%ORA10GR2> set timing off
Updating existing rows with sequence
SAI, February 11, 2010 - 9:42 am UTC
Hi Tom,
I have Two tables 'CLIENT_DETAIL' and 'CLIENT_STAUS_HIST' with same table structure.
ID, TimeModified, XXX, Versioin
Version is newly added column in both records.
Here ID is unique in CLIENT_DETAIL, but not in CLIENT_STATUS_HIST.
When ever CLIENT_DETAIL is updated the details are stored in CLIENT_STATUS_HIST.
I have millions of records in CLIENT_STATUS_HIST. Now i want to update the version column sequencially.
CREATE OR REPLACE
PROCEDURE CLIENT_STATUS_VERSION_UPDATE AS
counter NUMBER;
v_client_id CLIENT_STATUS_HIST.CLIENT_DETAIL_ID%TYPE;
CURSOR CLIENT_HIST IS
SELECT CLIENT_DETAIL_ID FROM client_status_hist
GROUP BY client_detail_id;
CURSOR CLIENT_HIST_FOR_UDATE (client_id in NUMBER) IS
SELECT * FROM client_status_hist WHERE client_detail_id=v_client_id
ORDER BY LAST_MODIFIED_TMSTMP ASC FOR UPDATE;
BEGIN
OPEN CLIENT_HIST;
LOOP
FETCH CLIENT_HIST INTO v_client_id;
counter:= 0;
FOR v_row IN CLIENT_HIST_FOR_UDATE (v_client_id)
LOOP
UPDATE CLIENT_STATUS_HIST SET VERSION= counter where current OF CLIENT_HIST_FOR_UDATE;
counter:= counter + 1;
EXIT WHEN CLIENT_HIST_FOR_UDATE%NOTFOUND;
END LOOP;
CLOSE CLIENT_HIST_FOR_UDATE;
UPDATE CLIENT_DETAIL SET VERSION= counter where client_detail_id= v_client_id;
EXIT WHEN CLIENT_HIST%NOTFOUND;
END LOOP;
CLOSE CLIENT_HIST;
END CLIENT_STATUS_VERSION_UPDATE;
This Procedure takes nearly 3 hours to complete the task. Is there any other way to do it in less time?
February 16, 2010 - 9:51 am UTC
ops$tkyte%ORA10GR2> create table t as select * from all_users;
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t add version number;
Table altered.
<b>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> merge into t
2 using (select rowid rid, row_number() over (order by created asc) rn from t) x
3 on (t.rowid = x.rid)
4 when matched then update set version = x.rn;
46 rows merged.
</b>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from (
2 select * from t order by version ) where version <= 5;
USERNAME USER_ID CREATED VERSION
------------------------------ ---------- -------------------- ----------
SYS 0 30-jun-2005 19:10:18 1
SYSTEM 5 30-jun-2005 19:10:18 2
OUTLN 11 30-jun-2005 19:10:24 3
DIP 19 30-jun-2005 19:14:45 4
TSMSYS 21 30-jun-2005 19:20:34 5
Updating existing rows with sequence
SAI, February 17, 2010 - 3:46 am UTC
Tom,
Thank you for the follow up.
my table t looks some thing like this.
USERNAME USER_ID CREATED VERSION
------------------------------ ---------- -------------------- ----------
SYS 0 30-jun-2005 19:10:18 1
SYS 0 30-jun-2005 19:11:18 2
SYSTEM 5 30-jun-2005 19:10:18 1
OUTLN 11 30-jun-2005 19:10:24 1
OUTLN 11 30-jun-2005 19:11:24 2
OUTLN 11 30-jun-2005 19:13:24 3
DIP 19 30-jun-2005 19:14:45 1
TSMSYS 21 30-jun-2005 19:20:34 1
February 17, 2010 - 9:15 am UTC
just use partition by if you need to break up and reset by username.
ops$tkyte%ORA11GR2> create table t as select * from all_users where username like 'C%' or username like 'S%' or username like 'M%';
Table created.
ops$tkyte%ORA11GR2> insert into t select * from all_users where username like 'C%' or username like 'S%';
9 rows created.
ops$tkyte%ORA11GR2> insert into t select * from all_users where username like 'S%';
8 rows created.
ops$tkyte%ORA11GR2> alter table t add version number;
Table altered.
ops$tkyte%ORA11GR2> merge into t
2 using (select rowid rid, <b>row_number() over (PARTITION BY USERNAME order by created asc) rn</b> from t) x
3 on (t.rowid = x.rid)
4 when matched then update set version = x.rn;
29 rows merged.
ops$tkyte%ORA11GR2> select * from t order by username, version;
USERNAME USER_ID CREATED VERSION
------------------------------ ---------- ------------------- ----------
CTXSYS 43 2009-08-13 23:09:45 1
CTXSYS 43 2009-08-13 23:09:45 2
MDDATA 65 2009-08-13 23:19:11 1
MDSYS 57 2009-08-13 23:12:05 1
MGMT_VIEW 74 2009-08-13 23:24:58 1
SCOTT 84 2009-08-13 23:35:44 1
SCOTT 84 2009-08-13 23:35:44 2
SCOTT 84 2009-08-13 23:35:44 3
SH 88 2009-09-14 15:09:16 1
SH 88 2009-09-14 15:09:16 2
SH 88 2009-09-14 15:09:16 3
SI_INFORMTN_SCHEMA 56 2009-08-13 23:12:05 1
SI_INFORMTN_SCHEMA 56 2009-08-13 23:12:05 2
SI_INFORMTN_SCHEMA 56 2009-08-13 23:12:05 3
SPATIAL_CSW_ADMIN_USR 70 2009-08-13 23:22:20 1
SPATIAL_CSW_ADMIN_USR 70 2009-08-13 23:22:20 2
SPATIAL_CSW_ADMIN_USR 70 2009-08-13 23:22:20 3
SPATIAL_WFS_ADMIN_USR 67 2009-08-13 23:22:15 1
SPATIAL_WFS_ADMIN_USR 67 2009-08-13 23:22:15 2
SPATIAL_WFS_ADMIN_USR 67 2009-08-13 23:22:15 3
SYS 0 2009-08-13 23:00:59 1
SYS 0 2009-08-13 23:00:59 2
SYS 0 2009-08-13 23:00:59 3
SYSMAN 72 2009-08-13 23:22:32 1
SYSMAN 72 2009-08-13 23:22:32 2
SYSMAN 72 2009-08-13 23:22:32 3
SYSTEM 5 2009-08-13 23:00:59 1
SYSTEM 5 2009-08-13 23:00:59 2
SYSTEM 5 2009-08-13 23:00:59 3
29 rows selected.
Updating existing rows with sequence
SAI, February 18, 2010 - 3:51 am UTC
Thank you tom. Thank you very much.
Update huge table
Andrey, March 03, 2010 - 3:45 pm UTC
Hello!
I'm using next method last 3 years & it works well
(But i'm not working with clustered tables...):
1) Open cursor for the query
SELECT O.DATA_OBJECT_ID
,E.RELATIVE_FNO AS FILE_ID
,E.BLOCK_ID
,E.BLOCKS
FROM ALL_OBJECTS O
,DBA_EXTENTS E
WHERE O.OWNER = :PR_SCHEMA_NAME
AND O.OBJECT_NAME = :PR_TABLE_NAME
AND O.DATA_OBJECT_ID IS NOT NULL
AND E.OWNER = O.OWNER
AND E.SEGMENT_NAME = O.OBJECT_NAME
AND NVL(E.PARTITION_NAME, E.SEGMENT_NAME) = NVL(O.SUBOBJECT_NAME, O.OBJECT_NAME)
ORDER BY E.FILE_ID
,E.BLOCK_ID
2) Split by this cursor segment to many continuous parts
of blocks (i am using regions from 8000 to 12000 blocks) & construct start and finish rowids of regions by dbms_rowid
3) Save regions parameters to a queue
4) Start scheduler processes which read regions from queue and update data only between start_rowid and end_rowid of the region with commit for any operation.
P.S.> I've update 525 000 000 records on the production DB for an 2:40:15. 24CPU, 16 Job process, 24Gb mem + high-perf. storage.
Merging 3 very huge tables.
Prats, March 11, 2010 - 2:47 am UTC
Tom,
In one of the legacy database we have inherited, there are 3 existing tables with millions of records. Most of their columns are redundant and now we are planning to merge their data together in a new table which would contain all columns from these tables.
For this I will essentially have to perform following-
Step 1) insert table_A into new_table
Step 2) Traverse through table_B update respective columns in new_table [if corresponding record has been inserted from table_A].
Else insert new record from table_B.
Step 3) Repeat step 2 with table_C to update existing records and insert extra records from table_C
Though this is one time job, it would be done at production systems.What would be the optimized approach to go about it.
Please guide...
March 11, 2010 - 8:31 am UTC
why not insert a join of the three????
RE: Merge 3 huge
Duke Ganote, March 11, 2010 - 3:35 pm UTC
Like this?
create table A
as ( select level AS key_value, level as attribute_value
from dual connect by level < 5 )
;
create table B
as select level-1 AS key_value, level as attribute_value
from dual connect by level < 5
;
create table C
as select level-2 AS key_value, level as attribute_value
from dual connect by level < 5
;
create table combined_table as
select coalesce(c.key_value,b.key_value,a.key_value) as key_value
, coalesce(c.attribute_value,b.attribute_value,a.attribute_value) as key_attribute
from A
full
join b
on b.key_value = A.key_value
full
join c
on c.key_value = coalesce(b.key_value,a.key_value)
;
select key_value as "KEY", key_attribute AS "ATTR"
, case when key_value = key_attribute then 'A'
when key_value = key_attribute-1 then 'B'
when key_value = key_attribute-2 then 'C'
END as SRC
from combined_table order by key_value
;
KEY ATTR SRC
-1 1 "C"
0 2 "C"
1 3 "C"
2 4 "C"
3 4 "B"
4 4 "A"
Updating a filed with a subquery for 94 million row table
CT, March 12, 2010 - 11:39 am UTC
Hi Tom,
Hello, how are you? Based on your recocmmendation I tried updating a 94 million row table T1 as follows
where T1's structure is
mpi varchar2(25)
tid Primarykey number
ttime date
and structure of x is
oldmpi varchar2(25)
newmpi varchar2(25)
name varchar2(50)
create table t2 nologging as select (select newmpi from x where x.oldmpi = t1.mpi) mpi, tid, ttime from T1;
when table t2 was created the mpi field did not have any value. In short the subquery that I used to populate the field does not seem to have worked. The timing for the above was about 40 mins or so, which was great! I really need help with this issue as I have not been able to find any method by which to update this huge table. Your help with this is greatly appreciated
March 12, 2010 - 3:56 pm UTC
you would have to give me an example, with data - as far as I can see, it would have worked correctly as typed in.
However, I would not have done it that way, you should use:
select x.newmpi, t1.tid, t1.ttime from t1, x where t1.mpi = x.oldmpi(+);
RE: "the mpi field did not have any value"
Duke Ganote, March 12, 2010 - 12:31 pm UTC
Sounds like basic debugging...what do you get with this query?
select count(*)
from x
join t1
on x.oldmpi = t1.mpi
If the count result is zero, then it's unsurprising that "the mpi field did not have any value". If it's non-zero, then it is surprising.
The count was non zero
CT, March 12, 2010 - 1:14 pm UTC
I ran the query as suggested by you and the count was non zero. There are matching records in T1 on mpi column between x.oldmpi and T1.mpi. Is there any reason why the subquery may not have worked maybe it is returning more than one row? Thanks
March 12, 2010 - 4:10 pm UTC
if it returned more than one row, it would fail.
You would need to provide a full example, but again - you want to outer join, NOT use a scalar subquery.
But until we have a FULL EXAMPLE that shows exactly what you did - we cannot comment on how it might have gone wrong for you.
creating a new table after joining two large tables
Somjit, April 12, 2010 - 7:29 am UTC
Hi Tom,
I read the solutions you gave in this thread. I have two tables:
A => 1200Million records
B => 70Million records
The tables are not partitioned and we have a plan in place to implement that now. The issue is that we are creating a new table which would contain data starting from 01.Jan.2010 only and for this purpose want to create a new table A_1 which has the same cols as A but with an additional column eventdate:
This is how I proceeded:
create table b_1 as select * from b where 1=2;
alter table b_1 add (eventdate date);
alter table b_1 nologging;
INSERT /*+ append parallel (C,12) */
into a_1 C
SELECT /*+ parallel (A,12) */
A.TRANS_ID, ..<all columns from table>..,
To_Timestamp(to_char(B.EVENTDATE,'dd.mm.yyyy')||' '||B.Eventtime,'dd.mm.yyyy hh24:mi:ss.FF')
from A, B
where B.EventDate >= to_date('2010-01-01','YYYY-MM-DD')
where A.trans_Id = B.Trans_Id
columns used in predicate are indexed. B.Trans_Id will be unique and A.Trans_Id will be a normal index as it has duplicate values. The concern is that this join condition is doing a heavy sort operation because of which the TEMP tablespace is getting filled up. How do I get this working as efficiently as possible?
April 13, 2010 - 9:17 am UTC
are you sure temp is getting filled - or is it saying "unable to extend temporary segment in tablespace XXX" where xxx is the target tablespace for this table?
This won't be sorting, this'll be HASHING (I would hope). I would expect B to be full scanned first, the event date predicate evaluated and the rows that are > 2010-01-01 to be hashed into memory and spilling on to disk if necessary. Then the join to A.
If the hash table is spilling to disk you can increase your pga settings, or if you are using old manual memory management the hash area size.
New method
Michel, April 19, 2010 - 11:22 pm UTC
Hi Tom,
This link helped me , But still confusing
A- 170Milion rows
B-80 milion rows
some what like this i am doing
merge /*+ parallel(TableA,4) */ into TableA
using TableB K
on(TableA.Col1=K.col1)
when matched then
Update set
TableA.col2=K.col2,
TableA.col3=K.col3,
TableA.col4='p';
explain plan is taking more time , hw ca i write by using nologging method ...
April 20, 2010 - 8:13 am UTC
why do you think nologging would make it go faster - it probably would not - the redo is written out in the background, you are not really 'waiting' for it.
You cannot merge "nologging", the insert part of the merge can be done nologging (use the append hint), but NOT the update part, NOT the index maintenance part.
the update part will always be done logging
the index maintenance part will always be done logging
rename
Michel, April 20, 2010 - 12:44 pm UTC
Tom,
My qn was how to write the same requirement not using merge
by using table rename method,
April 20, 2010 - 12:56 pm UTC
wow, that is about as far away from what I thought your question was as possible. I sure didn't see that from the above...
To do the "create a new table, and rename" trick, you would have to full outer join OLD with NEW (you want to keep all OLD records even if not in NEW, you want all NEW records even if not in OLD)
It'll be like this:
create table new_table
as
select coalesce( old.key, new.key ) key,
... repeat this for every column except join key ...
case when new.key is null then old.column else new.column end,
... above repeated .....
from old full outer join new on (old.key=new.key);
if old.key is null - record is in new but not old.
if new.key is null - record is in old but not new.
if neither is null - record is in both.
coalesce will return "the key" if old is null, returns the new key (which must not be null)
the case will return the old column if the new key is null - else return the new column (if the new.key is not null - then the new.column overrides anything in the old column)
creating a new table after joining two large tables contd
Somjit, April 21, 2010 - 7:23 am UTC
Hello Tom,
This is continuation from my earlier post.
I tried to run an explain plan on my query
SQL> explain plan for
2 INSERT /*+ append parallel (C, 8) */
3 into SEALSNET_STATPER_INST_P C
4 SELECT /*+ parallel (A,8) parallel (b,8) */
5 A.TRANS_ID, A.INST_ID, A.INST_LOGGERID, A.INST_CREATIONDATE, A.INST_CREATIONTIME, A.INST_COMPCLASS,
6 A.INST_COMPTYPE, A.INST_COMPNAME, A.INST_TAG, A.INST_UOM, A.INST_VALUE, A.INST_PLATFORM, A.INST_PRIORITY,
7 To_Timestamp(to_char(B.EVENTDATE,'dd.mm.yyyy')||' '||B.Eventtime,'dd.mm.yyyy hh24:mi:ss.FF')
8 from SEALSNET_STATPER_INST_BCK A, sealsnet_statper B
9 where A.trans_Id = B.Trans_Id
10 and B.EventDate >= to_date('2010-01-01','YYYY-MM-DD');
Explained.
Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2340029746
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 186M| 39G| 239K (1)| 00:48:00 | | | |
| 1 | LOAD AS SELECT | SEALSNET_STATPER_INST_P | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 186M| 39G| 239K (1)| 00:48:00 | Q1,02 | P->S | QC (RAND) |
|* 4 | HASH JOIN BUFFERED | | 186M| 39G| 239K (1)| 00:48:00 | Q1,02 | PCWP | |
| 5 | PX RECEIVE | | 29M| 1327M| 58068 (1)| 00:11:37 | Q1,02 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 29M| 1327M| 58068 (1)| 00:11:37 | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 29M| 1327M| 58068 (1)| 00:11:37 | Q1,00 | PCWC | |
|* 8 | TABLE ACCESS FULL| SEALSNET_STATPER | 29M| 1327M| 58068 (1)| 00:11:37 | Q1,00 | PCWP | |
| 9 | PX RECEIVE | | 289M| 48G| 181K (1)| 00:36:18 | Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10001 | 289M| 48G| 181K (1)| 00:36:18 | Q1,01 | P->P | HASH |
| 11 | PX BLOCK ITERATOR | | 289M| 48G| 181K (1)| 00:36:18 | Q1,01 | PCWC | |
| 12 | TABLE ACCESS FULL| SEALSNET_STATPER_INST_BCK | 289M| 48G| 181K (1)| 00:36:18 | Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("A"."TRANS_ID"="B"."TRANS_ID")
8 - filter("B"."EVENTDATE">=TO_DATE('2010-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
25 rows selected.
Is there some way I can reduce the huge space required in "HASH JOIN BUFFERED" area? 39G required
April 21, 2010 - 9:03 am UTC
you could use indexes and nested loops - but then you would be asking me how to make this run in less then 100 days.
40gb is pretty small these days... given the volume of data we work with.
how to audit ddl command from a user in oracle 9i
Soumya Samanta, April 22, 2010 - 10:46 am UTC
hi, I am Soumya Samanta. I want to audit ddl commands issued from a user in oracle 9i environment.
How do i do that?
April 22, 2010 - 12:53 pm UTC
Slow update
Philippe Gaudin, April 29, 2010 - 6:16 am UTC
Hi Tom,
I have a query that has to update 23.000.000 rows in a table. Here are the query and the plan.
SQL> EXPLAIN PLAN
SET STATEMENT_ID = 'PGA_TST1'
FOR
UPDATE (SELECT current_pol_cov_trans.pct_cov_subscription_value AS cur_cov_subscription_value,
current_pol_cov_trans.pct_cov_in_force_value AS cur_cov_in_force_value,
current_pol_cov_trans.pct_cov_gross_amount AS cur_cov_gross_amount,
current_pol_cov_trans.pct_cov_net_amount AS cur_cov_net_amount,
current_pol_cov_trans.pct_cov_tariff_amount AS cur_cov_tariff_amount,
previous_pol_cov_trans.pct_cov_subscription_value AS prev_cov_subscription_value,
previous_pol_cov_trans.pct_cov_in_force_value AS prev_cov_in_force_value,
previous_pol_cov_trans.pct_cov_gross_amount AS prev_cov_gross_amount,
previous_pol_cov_trans.pct_cov_net_amount AS prev_cov_net_amount,
previous_pol_cov_trans.pct_cov_tariff_amount AS prev_cov_tariff_amount,
current_pol_cov_trans.pct_subscription_counter,
current_pol_cov_trans.pct_in_force_counter,
current_pol_cov_trans.pct_cov_delta_net_amount,
current_pol_cov_trans.pct_cov_delta_tariff_amount
FROM fact_policy_cov_trans current_pol_cov_trans
JOIN fact_policy_cov_trans previous_pol_cov_trans
ON (previous_pol_cov_trans.pct_policy_version_skey = current_pol_cov_trans.pct_prev_version_skey
AND previous_pol_cov_trans.pct_coverage_skey = current_pol_cov_trans.pct_coverage_skey))
SET pct_subscription_counter =
(CASE
WHEN cur_cov_subscription_value = 'N' AND prev_cov_subscription_value <> 'N' THEN -1
WHEN cur_cov_subscription_value <> 'N' AND prev_cov_subscription_value = 'N' THEN +1
ELSE 0
END),
pct_in_force_counter =
(CASE
WHEN cur_cov_in_force_value = 'N' AND prev_cov_in_force_value <> 'N' THEN -1
WHEN cur_cov_in_force_value <> 'N' AND prev_cov_in_force_value = 'N' THEN +1
ELSE 0
END),
pct_cov_delta_net_amount = cur_cov_net_amount - prev_cov_net_amount,
pct_cov_delta_tariff_amount = cur_cov_tariff_amount - prev_cov_tariff_amount
Explain complete.
SQL> SELECT * FROM table (DBMS_XPLAN.display (NULL, 'PGA_TST1', 'BASIC PARALLEL ROWS'))
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3795694554
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 14M| | | |
| 1 | UPDATE | FACT_POLICY_COV_TRANS | | | | |
| 2 | PX COORDINATOR | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 14M| Q1,02 | P->S | QC (RAND) |
| 4 | HASH JOIN BUFFERED | | 14M| Q1,02 | PCWP | |
| 5 | PX RECEIVE | | 22M| Q1,02 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 22M| Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 22M| Q1,00 | PCWC | |
| 8 | TABLE ACCESS FULL| FACT_POLICY_COV_TRANS | 22M| Q1,00 | PCWP | |
| 9 | PX RECEIVE | | 17M| Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10001 | 17M| Q1,01 | P->P | HASH |
| 11 | PX BLOCK ITERATOR | | 17M| Q1,01 | PCWC | |
| 12 | TABLE ACCESS FULL| FACT_POLICY_COV_TRANS | 17M| Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------------
19 rows selected.
The plan seems OK to me : hash join with full tablescan (as most of the rows of the table has to be read).
My problem is that it is slow and it generates a lot of "db file sequential read" waits. I guess that these waits are for the blocks that have to be updated...
What could I do to solve this problem ?
Thanks in advance,
Philippe
April 29, 2010 - 7:48 am UTC
what is "slow"
I would start there - in order to see if I thought it was slow or not first.
23,000,000 is not totally useful, it doesn't tell us if there is 23mb, 23gb or 23tb to update.
Philippe Gaudin, April 29, 2010 - 7:55 am UTC
The fact_policy_cov_trans table is 4.5 GB.
The execution of the query took 45 minutes... and in the past, with approximately the same data, the same plan, it took 6 minutes.
April 29, 2010 - 9:05 am UTC
the db file sequential reads are from disk IO to get the blocks from disk into memory during a full scan operation.
These are waits for the blocks that identify the rows you want to update and hold those same rows you want to update.
If these waits are "new" (you never saw them before) then the 'problem' is that IO has suddenly slowed down - a table of this size would be mostly on disk and the IO was always being done, so the conclusion would be likely that "the IO is taking longer than it did before"
what has changed? pga_aggregate_target (are you using that)? new application that also modifies this table (even if just ONE record) running at the same time? do you have performance metrics from before (using ASH or AWR)? Anything to go on? New index that never existed before? More users logged in than previously (just logged in, don't even have to be active)?
I'd be looking for what changed - because we know something must have..
Philippe Gaudin, April 29, 2010 - 10:02 am UTC
Thanks Tom for your answer.
The UPDATE is part of a data warehouse FIRST load.
I was thinking that FULL TABLE SCAN waits were db file scattered reads... and not sequential reads.
I will investigate the memory (PGA_AGGREGATE_TARGET,...) parameters.
April 29, 2010 - 10:26 am UTC
sounds like index maintenance - did you recently have one added to the columns you are modifying?
Philippe Gaudin, April 30, 2010 - 1:48 am UTC
No index on the updated columns.
The db file sequential read are done by the parallel slaves. That's why I was thinking that Oracle had to get each block that needs to be updated after the hash join.
April 30, 2010 - 7:05 am UTC
that is likely true as well, the hash join would employ multiblock read (probably direct path read), the result set is built and then lots of index like access to the blocks as we discover we need to update them.
You might try a merge and/or PDML (parallel dml)
the merge would be a bulky process, the PDML would let the p->s you have possibly stay parallel.
merge into fact_policy_cov_trans Tgt
using
(SELECT current_pol_cov_trans.rowid rid,
current_pol_cov_trans.pct_cov_subscription_value AS cur_cov_subscription_value,
current_pol_cov_trans.pct_cov_in_force_value AS cur_cov_in_force_value,
current_pol_cov_trans.pct_cov_gross_amount AS cur_cov_gross_amount,
current_pol_cov_trans.pct_cov_net_amount AS cur_cov_net_amount,
current_pol_cov_trans.pct_cov_tariff_amount AS cur_cov_tariff_amount,
previous_pol_cov_trans.pct_cov_subscription_value AS prev_cov_subscription_value,
previous_pol_cov_trans.pct_cov_in_force_value AS prev_cov_in_force_value,
previous_pol_cov_trans.pct_cov_gross_amount AS prev_cov_gross_amount,
previous_pol_cov_trans.pct_cov_net_amount AS prev_cov_net_amount,
previous_pol_cov_trans.pct_cov_tariff_amount AS prev_cov_tariff_amount,
current_pol_cov_trans.pct_subscription_counter,
current_pol_cov_trans.pct_in_force_counter,
current_pol_cov_trans.pct_cov_delta_net_amount,
current_pol_cov_trans.pct_cov_delta_tariff_amount
FROM fact_policy_cov_trans current_pol_cov_trans
JOIN fact_policy_cov_trans previous_pol_cov_trans
ON (previous_pol_cov_trans.pct_policy_version_skey =
current_pol_cov_trans.pct_prev_version_skey
AND previous_pol_cov_trans.pct_coverage_skey =
current_pol_cov_trans.pct_coverage_skey)) src
on (src.rid = tgt.rowid)
when matched then
update
SET pct_subscription_counter =
(CASE
WHEN cur_cov_subscription_value = 'N' AND prev_cov_subscription_value <> 'N' THEN -1
WHEN cur_cov_subscription_value <> 'N' AND prev_cov_subscription_value = 'N' THEN +1
ELSE 0
END),
pct_in_force_counter =
(CASE
WHEN cur_cov_in_force_value = 'N' AND prev_cov_in_force_value <> 'N' THEN -1
WHEN cur_cov_in_force_value <> 'N' AND prev_cov_in_force_value = 'N' THEN +1
ELSE 0
END),
pct_cov_delta_net_amount = cur_cov_net_amount - prev_cov_net_amount,
pct_cov_delta_tariff_amount = cur_cov_tariff_amount - prev_cov_tariff_amount
/
it would tend to build the src result and bulk merge into the tgt - might be more "io" friendly
disabling table lock
Soumya Samanta, May 13, 2010 - 9:44 am UTC
Hi Tom,
Please make a solution for the problem mentioned below:
I have made a command to prevent ddl like 'Alter table table_name disable table lock' but that would create a problem to delete a record from a particular table which has a foreign key to other table and also have two indexes. what is the solution because i have to prevent the ddl
Update 150 Million rows
Jeff, August 31, 2010 - 9:38 am UTC
Hi Tom,
I have a situation where I need to update around 45 tables, and each table has around 150M rows.
Need to update only one field from 500 to 510 in all tables.
Here is what I have done.
--set parallel to 8
--drop indexes
--increased memory parameters
update table set field = '510';
commit;
Right now each table takes 1 hour on an average.
Is there any way I can make it run faster?
Really appreciate your help,
Thanks
September 09, 2010 - 11:47 am UTC
why drop indexes? do not do that.
you would not have an index on field - since it contains just one value (the only index you would drop would be an index on field - I presume you don't have any).
it seems "silly" to me to update all to '510', what version are you on?
Update 150 Million rows
A reader, September 13, 2010 - 1:33 pm UTC
Thanks Tom for the reply.
Yes the field is part of pretty much all the indexes. I ended up dropping the unique indexes and marking unusable the rest.
Its an SAP application and this field is a client number.
Version of the database is 1024 on SuSe 10.
Thanks in advance.
Jeff
September 13, 2010 - 2:54 pm UTC
SAP - means we cannot do much since SAP is a 3rd party app.
In 11g, we could have dropped (set unused) the column and readded it with a default value. In 11g - newly added columns with defaults don't have to update the table.
You are already doing it in the most effective way possible I guess - short of using more parallel operations if you have the resources.
Update 150 Million rows
Jeff, September 13, 2010 - 3:10 pm UTC
Thanks Tom.
That's a very good idea. We will be upgrading to 11g soon. This process is a client copy process where production data is copied over to the other system for testing purposes every now and then.
Thanks again.
Jeff
Update from sequence
Dan, September 23, 2010 - 11:15 pm UTC
We have about 40 million rows in a table big_book(chapter number, lineno number, line_text varchar2(80)).
We currently have lineno 1..n for each chapter.
We need to replace lineno with values from a sequence ordered by (chapter, lineno) to result in a long running lineno 1 through 4 million.
Can this be done with a single sql statement?
If not is bulk binds with forall the fastest alternative?
September 27, 2010 - 10:11 am UTC
merge into T
using (select rowid rid, row_number() over (order by ......) rn from t) x
on (t.rowid = x.rid)
when matched then update set lineno = x.rn;
and then create your sequence (for future lines) starting with the max value plus one.
And realize that of course sequences are not gap free and never ever will be - so all you can be sure of in the future is that the lineno will be unique - but NOT necessarily sequential. And unless you lock the table before generating the sequence, you cannot be sure it will be "ordered" by your columns anymore either.
Update from sequence
Dan, September 23, 2010 - 11:27 pm UTC
I may have missed a detail in the previous post.
The sequence is global - used by other books. So there may be holes in the lineno generated for a give book - and that is ok.
(The book is just an analogy to our actual schema).
Update 200M Records
Rajeshwaran, Jeyabal, November 16, 2010 - 11:34 am UTC
Hi Tom:
I have a table with nearly 200 million records ( 41 columns ) Hash partition by 8, with 21 indexes ( 3 index Global partition other 18 index is not partitioned).I need to update the column C3 ( presently 2 non-unique composite index is build on this columns) with some padded values. so i used the below scenario.
Create Table T2
(
c1 ,
c2 ,
c3
...
c41
)
nologging
parallel 4 tablespace tools
as
select c1,
c2,
case when length(c3) = 7 and replace(translate(c3,'0123456789','@'),'@') is null
then lpad( substr(c3,1,2),4,'0') || lpad( substr(c3,3,5),7,'0')
else c3
end as c3,
.....
c41
from T partition (p1);
alter table T exchange partition p1 with table T2 without validation;
drop table T2 purge;
( Simillary repeat for other partitions available in table T )
But this will make non-local index to UNUSABLE state. Hence i need to rebuild all 21 indexes in NOLOGGING and parallel option.
Is there is any other option available better than this?
November 17, 2010 - 5:40 am UTC
In this case, you might consider
a) setting unused the two indexes on c3
b) updating (using parallel dml perhaps) c3
c) rebuilding the two indexes
that seems like it would be less work. Hopefully, the update would not cause too many row migrations as you are only adding 4bytes to each row - if there is sufficient free space on the block, most rows should stay where they are.
adaptive direct path updates
Amir Riaz, November 17, 2010 - 1:09 pm UTC
Hi Tom,
http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html in Oracle 11g, oracle uses direct path reads for serial full table scan. oracle uses adaptive direct path reads when full table scan size is more the 2% of cache size.
If that is possible with reads, then adaptive direct path should also kick in when doing updates or deletes.
if, update and delete are also made adaptive, it will be really cool.
I tried to produce a test but cannot produce both the cases?. Please help, especially in update case, if you consider that sort of modification has been done in oracle 11g.
November 18, 2010 - 3:20 am UTC
think about it - a direct path read bypasses the buffer cache. If the block isn't in the cache and someone else attempts to go after it - how will they know you have it in current mode for your update?
when you modify a block, you get it in current mode. Only one thing at a time can have it in current mode. If you did the work in your PGA (direct reads go there), how would anyone KNOW you are doing that. It needs to be in the buffer cache so we can put it in current mode (preventing others from doing that at the same time) - modify it - release it from current mode and have it marked "dirty".
If update and delete did it - it wouldn't be "cool".
Just like a direct path write causes the entire segment to be locked, so would this so called direct path read with a modification. Not so cool :)
got your point.
Amir Riaz, November 18, 2010 - 6:36 am UTC
Hi Tom,
Thanks for reply, yes, i got your point. adaptive direct path reads were possible because oracle does not lock for reading. for serial update we need the block in current mode. With direct path update oracle(if implemented) oracle has to lock the whole segament which effects concurrency.
I think direct path updates are possible with oracle parallel updates in data warehouse environments. I tried to find out its functionality after what you have told me above. Cannot found much about this topic. However here is my theory about parallel updates in oracle.
"Oracle uses parallel updates if parallel dml mode is enabled and a hint exist in update statement. Oracle can also use parallel update statement if tables are enabled for parallel update. under these condition if size of table is more than 2% of database cache size oracle will go for parallel update using direct path. Oracle will lock the whole segament and perform update in PGA and after updating the block in PGA, write it to database files on the same place where it has read it"
If above is true. than.
1. How does oracle populate rollback segaments.
2. if, i have big cache and table size is less than 2% of database cache size, but optimizer plan finds out that serial update is still costly, can oracle will consider In memory parallel update (trying to understand under which condition in memory parallel processing will be performed)
3. multiversioning does not come into play here. because the whole segament is lock and environment is data warehouse. I hope i am right here.
4. does oracle generate redo with parallel updates.
November 19, 2010 - 7:26 am UTC
1) no differently than at any other time
2) yes
3) yes it does, it always does. why wouldn't it?
4) yes
direct path reads and updates.
A read, November 19, 2010 - 1:53 am UTC
I dont see direct path reads or parallel direct path updates are much useful, if we dont cache data in buffer cache we have to do physical I/O for each query which database will execute. Why oracle introduce direct path reads or updates.
November 19, 2010 - 9:36 am UTC
say we didn't do direct io
say you need to full scan a table of 1,000,000 blocks.
it cannot fit into memory - no chance all 1,000,000 blocks will be there.
so, we start full scanning - say we'll read 64 blocks at a time (db file multi-block read count)
now before we can read blocks 1-64, we have to check in the cache to see if any of them are there. that is 64 LOOKUPS - ONE AFTER THE OTHER. Say we found 2 blocks - block 20 and block 34.
Now we need to read blocks 1..19 into the cache, then blocks 21..33 into the cache, then blocks 35..64 into the cache.
No more 64 block IO's :(
we did 64 buffer cache lookups (for two entire blocks :( )
we put 62 blocks in the cache - probably flushing out some blocks in the meanwhile - probably blocks from the table were reading - causing us to find even less blocks over time (but we still have to look!) :(
the cache in short GETS IN THE WAY. Makes the IO we have to do less efficient, adds 1,000,000 lookups that probably won't find anything, and making us move the blocks into the cache (1,000,000 times) for NOTHING (because they won't be there later when I go to look again)
a cache is not always a good thing.
oracle 11g some more optimizations
A reader, November 21, 2010 - 1:40 pm UTC
Hi Tom,
http://jonathanlewis.wordpress.com/2009/01/16/concurrency/ Check this out
consistent gets from cache (fastpath) 2,301
A consultant is insisting that its a direct path read. While i agree with Jonathan, i think oracle copy the database blocks to PGA in bulk. But i dont know the conditions under which FastPath take place. Would to put some light on the mechanism and conditions under which this phenomenon occur
November 23, 2010 - 12:30 pm UTC
You cannot get a *concurrent* and *consistent* read without the cache.
It is not a direct read. It is a consistent get from the buffer cache that can use a shorter code path for repeated reads of the same block in a fetch call.
tell the consultant that if it were a direct read, it would be using physical IO, that is part of the definition of a direct path read.
some interesting update behaviours.
AMIR RIAZ, November 22, 2010 - 12:05 am UTC
Hi Tom,
1. how will the following statement executes.
UPDATE
( SELECT src.x src_x, src.y src_y
, tgt.x tgt_x, tgt.y tgt_y
FROM src INNER JOIN tgt
ON ( src.id = tgt.id )
)
SET tgt_x = src_x
, tgt_y = src_y
/
2 rows updated
src and tgt are big tables. if both do serial direct path reads in select query and update 2 rows. How will update execute here. I think select will bypass cache and when the blocks were in PGA they are update and written directly to modified blocks. Both scr and tgt will lock the whole segament during select (no problem since oracle locks the read in shared mode during direct path read). When updating oracle will transforms the share read lock to exclusive lock on these rows and shared lock on table and perform update. so concurrency suffers. But it just a thoughts, i think we need Toms opinion on this interesting topic.
November 23, 2010 - 12:39 pm UTC
they need to do consistent gets to identify the rows to update and then current mode gets to update. No direct path reads here - you need the buffer cache (else this update would have to lock both tables - no concurrency).
src and tgt are NOT both locked by an update - even if the update updates every row - someone else could still insert into the table.
AMIR RIAZ, November 22, 2010 - 12:48 am UTC
oops! typing error above.
so concurrency suffers = so conncurrency does not suffer.
runtime update
A reader, December 09, 2010 - 5:40 am UTC
I need to update a table (say t) based on entries in another table (say u) by matching
the key columns.
My update would be something like below
update t set (column_a,
column_b,
....
.....
column_n)=(
select
nvl(updt_a,column_a),
nvl(updt_b,column_b),
...
null(updt_n,column_n)
from
where t.k1 =s.k1 --primary column)
where exists (select null
from s
where t.k1=s.k1)
Is the above advisable for large tables ?Any alternatives.
Above sql would be run dynamically in a procedure and name of table and corresp update table
will be available runtime.
December 09, 2010 - 11:02 am UTC
merge into t using u on (t.key = u.key) when matched then update set a=u.a, b=u.b, ....
just merge it in bulk.
80 gig table
Arunraj, December 13, 2010 - 5:41 am UTC
Hi Tom,
The table1_back is 80 gig and i have tried the following queries-
SQL> alter table table1_back nologging;
Table altered.
Elapsed: 00:00:00.01
SQL> alter table table1_back parallel 8;
Table altered.
Elapsed: 00:00:00.00
SQL> truncate table table1;
Table truncated.
Elapsed: 00:00:02.36
SQL> alter table table1 nologging;
Table altered.
Elapsed: 00:00:00.01
SQL> alter table table1 parallel 8;
Table altered.
Elapsed: 00:00:00.00
SQL> insert /*+APPEND parallel(b,8) */ into table1 b select /*+ parallel(a,8) */ * from table1_back a;
It is taking an eternity for completion. Can you suggest any other method for performing this?
I want to basically copy the data from backup table table1_back to the table table1.
Thanks,
Arunraj
December 14, 2010 - 2:40 am UTC
what kind of machine are you using.
define "eternity"
do you need a copy or would you just like to replace table1 with table1_back - the fastest way to do that would be:
drop table table1;
rename table1_back to table1;
80 gig table
Arunraj, December 14, 2010 - 4:42 am UTC
dropping the table isnt an option because i would lose the back up table then.
I ran that query for 17 hours and in the end it gave me a "unable to extend segment 8" error.
I have checked with our DBAs and they said that they cant extend the table space any more.
So, I guess error which I am facing is the segmentation error and need to find ways to solve it.
I found a method of committing the table after x number of inserts. Will it serve the purpose?
Or do you suggest any other solutions?
Thanks,
Arunraj
December 14, 2010 - 7:22 am UTC
the unable to extend means the tablespace you were trying to copy into is out of space - you don't have free space in your target tablespace.
if your DBA's said "no more space for you" then you are not going to get very far, think about it.
committing isn't going to make it take less space. It could make it take more.
You could try using compression (create table t compress as select * from t_backup)
'define "eternity"'
Sokrates, December 14, 2010 - 5:55 am UTC
select NUMTOYMINTERVAL(BINARY_DOUBLE_INFINITY,'year') from dual
?
December 14, 2010 - 7:28 am UTC
hah, funny :)
update compress table
A reader, February 06, 2011 - 4:39 pm UTC
I have compress table having 9,000,000 rows.
update(merge and parallel 8 dml) of compress table's 800,000 rows takes 20 minutes generating about 450,000 undo records each session,
but update(merge and parallel 8 dml) of uncompress table's 800,000 rows takes 2 minutes generating about 100,000 undo records each session.
same plan and same data!
only different thing is compress or uncompress
February 07, 2011 - 6:23 am UTC
you do not mention the version or any other relevant conditision, but I'll assume 10g or before.
An update cannot be done in direct path. Therefore, your update of all 800,000 rows will involve decompressing them, allocating new space for them, and moving them to this new space. whereas when you do the uncompressed update - it just updates them and probably doesn't have to move them.
Saying the "only different thing is compress or uncompress" is like saying "the only difference between the racers was that one was on a bike and the other in a car". It is a pretty huge difference.
In 10g and before, basic compression was for direct path loaded data only and is only appropriate for read only/read mostly data. While it supports updates - it is certainly not recommended. It would be appropriate in a warehouse where you load in direct path mode a ton of data - and then query it. It would be appropriate in an OLTP system under some circumstances like an audit trail - where you can take last months audit trail partition and alter move it - to compress it. So your audit trail would have old months partitions compressed (never to be modified of course) and the current month uncompressed.
In 11g and above, with the advanced compression option - you can preserve the compression of data even while performing 'normal' inserts, updates and so on.
A reader, February 07, 2011 - 4:41 pm UTC
thanks.
your assumption is right. database version is 10.2.0.4.
I understand your answer.
insert/updae with plsql
KG, March 12, 2011 - 9:39 am UTC
Hi Tom
Got a t1 and t2 tables with same table structure
(
c1 number,
c2 number)
t1 has less number of rows around 10000
and t2 around 200000
Need a plsql will take the data in the T1 table and
1. insert rows not currently in the T2 table and
2. update T2 rows with a matching c1 value with the data in T1
I think MERGE cannot be use here as join condition for Insert and update varies.
Thanks in advance.
March 12, 2011 - 5:05 pm UTC
Well, you have a slight small ambiguity.
Do you want it to happen in that order? Or could you update and then insert (possibly inserting less records then before).
What if T1 has:
C1=1, C2=1
and T2 has:
C1=1, C2=2
If you insert first:
C1=1,C2=1
C1=1,C2=2
and then update:
C1=1,C2=2
C1=1,C2=2
if you update first, you'll just have
C1=1,C2=2
so, please be a tad more precise in your specification - as it stands now - it is not clear.
Sorry for Confusions
KG, March 12, 2011 - 11:19 pm UTC
Hi Tom
Yes i was bit wrong ,The col1 is primary key .
Below is task i have been asked to achieve
PL/SQL program to be written that will take the data in the T1 table and
1. insert rows not currently in the T2 table and
2. update T2 rows with a matching col1 value with the data in T1.
Please write the program so as to maximize overall performance and efficiency.
I was able to achieve this with MERGE but i need a PLSQL as well.
Whether plsql can be better than MERGE performance wise?
March 14, 2011 - 7:49 am UTC
I was able to achieve this with MERGE but i need a PLSQL as well.
no you don't and it would be an utter waste of time to write it.
MERGE is the right answer here
merge into t2 using t1 on (t1.col1 = t2.col1)
when matched then update set col2 = t1.col2
when not matched then insert(col1,col2) values (t1.col1, t1.col2);
If you need it in plsql, it would look like this:
begin
merge into t2 using t1 on (t1.col1 = t2.col1)
when matched then update set col2 = t1.col2
when not matched then insert(col1,col2) values (t1.col1, t1.col2);
end;
that is the way to write the program so as to maximize overall performance and efficiency.
I refuse to write procedural code as this solution not only does not require it, it would be less efficient and performance would suffer.
Thanks Tom
kg, March 14, 2011 - 3:14 pm UTC
Tom
Merge does full table scan of the tables.
The col1 has primary key index .
Is it not possible to use index to access tables and then do DMLS on them
March 14, 2011 - 3:30 pm UTC
Merge does full table scan of the tables.
EXCELLENT, I would be very very upset if it did anything else in most cases!!!
If the tables are of any size - then it is doing the absolute RIGHT THING.
And you will NOT break that perfection by writing your own "do it yourself nested loops joins". I won't let you - I've seen it happen far too often.
"Oh no, it isn't using my indexes!!! I have to 'fix' that!!!"
That is the most wrong way to think about it.
But also bear in mind that when it makes SENSE, it will not full scan both tables:
ops$tkyte%ORA11GR2> create table t1 as select object_id col1, object_name col2 from all_objects where rownum <= 10;
Table created.
ops$tkyte%ORA11GR2> alter table t1 add constraint t1_pk primary key(col1);
Table altered.
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T1' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> create table t2 as select object_id col1, object_name col2 from all_objects;
Table created.
ops$tkyte%ORA11GR2> alter table t2 add constraint t2 primary key(col1);
Table altered.
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> set linesize 1000
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> merge into t2 using t1 on (t1.col1 = t2.col1)
2 when matched then update set col2 = t1.col2
3 when not matched then insert(col1,col2) values (t1.col1, t1.col2);
10 rows merged.
Execution Plan
----------------------------------------------------------
Plan hash value: 1890694634
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 10 | 590 | 13 (0)| 00:00:01 |
| 1 | MERGE | T2 | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS OUTER | | 10 | 400 | 13 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T1 | 10 | 110 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 29 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | T2 | 1 | | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("T1"."COL1"="T2"."COL1"(+))
ops$tkyte%ORA11GR2> set autotrace off
here it full scans the tiny 10 row table and applies the changes to T2.
If the "tiny table" grows, then we'll stop using the INEFFICIENT, BAD indexes and get back to nice big juicy efficient full scans:
ops$tkyte%ORA11GR2> delete from t1;
10 rows deleted.
ops$tkyte%ORA11GR2> insert into t1 select * from t2 where rownum <= 10000;
10000 rows created.
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T1' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> set linesize 1000
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> merge into t2 using t1 on (t1.col1 = t2.col1)
2 when matched then update set col2 = t1.col2
3 when not matched then insert(col1,col2) values (t1.col1, t1.col2);
10000 rows merged.
Execution Plan
----------------------------------------------------------
Plan hash value: 2414655244
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 10000 | 576K| 115 (2)| 00:00:02 |
| 1 | MERGE | T2 | | | | |
| 2 | VIEW | | | | | |
|* 3 | HASH JOIN OUTER | | 10000 | 498K| 115 (2)| 00:00:02 |
| 4 | TABLE ACCESS FULL| T1 | 10000 | 214K| 14 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T2 | 72719 | 2059K| 100 (1)| 00:00:02 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T1"."COL1"="T2"."COL1"(+))
ops$tkyte%ORA11GR2> set autotrace off
So you see - if you do it in a merge statement - it will use "for loops" when it makes sense, and it will use big full scans when it makes sense - meaning you don't need N versions of code for:
T1 T2
------- --------
big big
small small
big small
small big
you just have merge and we figure out the way to do it.
Inconsistent Database for Improper Shutdown
Shantanu Nandy, March 15, 2011 - 5:53 am UTC
hi Tom,
its been a pleasure to read your comments on the various aspects of oracle and knowing so much about this under the same roof.
just had a thought in my mind about your suggestion while updating large number of rows that we need to (1) create a new table, (2) do an insert, then (3) drop the old table and (4) rename the new table to old one.. now, these steps are defined by the programmer and are not in general explained in details to the dba team.. if, by any chance, the database shuts down in the middle of steps (3) and (4), then will not the dba team has to act more interactively in executing the remaining steps? otherwise it will leave the database inconsistent. does this approach then involves some kind of threat of causing the database inconsistent?
thanks again for all your explanations and answers with utmost clarity.
March 15, 2011 - 8:54 am UTC
Well, it doesn't even take this scenario - most of the times developers commit WAY too early - the do something, commit, do something, commit and so on - even though all of the do somethings are part of the same flow, the same thing - we must pick up where they left off (the process is restartable).
They (the developers) can actually use dbms_scheduler to set up these work flows. Their process would not be to do 1, 2, 3, 4 - but rather schedule a job chain that does 1, 2, 3, 4 in order (or uses whatever external job/workflow tool they want).
big table update
Indranil Das, April 16, 2011 - 8:56 am UTC
hi Tom,
Suppose, we need to an update like this-
rem --- Update Credit Card Record ---
update prod.contract_payment
set CREDIT_CARD_NUMBER =
case credit_card_type_code
when 'VISA' then
'DD07A801793B63825E3B23A5A2622A95E1924E940939D6461169DCE01F6150A7G425'
when 'MC' then
'C217DC8C813A77562F9A2979302CBD32E1924E940939D6461169DCE01F6150A7G425'
when 'AMEX' then
'AC2CA9F4A1C5B223EE870778E1F26753G425'
when 'DSC' then
'162077E116204E6F1B91C1D5ED26B9A0E1924E940939D6461169DCE01F6150A7G425'
else
CREDIT_CARD_NUMBER
end ,
bank_account_number = NULL ,
last_modified = sysdate ,
last_modified_by = user
where credit_card_type_code in ('VISA','MC','AMEX','DSC')
and TRANSACTION_TYPE_CODE ='CC' ;
COMMIT;
rem --- Update Bank Account Record ---
update prod.contract_payment
set CREDIT_CARD_NUMBER = NULL
bank_account_number = 'A48F3F9A0010D1167F9E8201A9BE1A90G425' ,
last_modified = sysdate ,
last_modified_by = user
where TRANSACTION_TYPE_CODE ='ACH' ;
COMMIT;
Now, Contract_payment Table is having 5,00,00,000 records and nothing is worked - I mean bulk update ,plain update,using rowid update - nothing worked... Now, my query is, without using the "create a new table,drop the new table " method , is there any alter nattive ? Could you please give an example,.... Thanks a lot in avance... Regards
Indranil
April 18, 2011 - 10:04 am UTC
why do you have a commit in between? Are these really two transactions - it would be ok for one to succeed and one to fail?
What do you mean by "nothing worked"??? Of course they worked, or were working... I don't know what number 5,00,00,000 is - it is either
500,000,000
or
5 billion
but in any case - assuming you have the hardware and time - it'll work.
You don't tell us how many records this will update - that would be very very important to know. As would the proper size of the table - both in RECORDS and GIGABYTES
Alex, April 18, 2011 - 10:25 am UTC
That post is pretty scary, you'd like to think experts were writing code that's touching credit card numbers and bank accounts.....Let's hope that's a bogus account # in there...
April 18, 2011 - 10:52 am UTC
it is too big to be a credit card number, it looks to be encrypted (raw) representations of something.
Update trigger issue
banerjee, April 18, 2011 - 12:35 pm UTC
today i faced one typical issue. want to share with you.
In our load test environment we found one update is taking almost 15 min to update 18 million data. but the query is pretty simple. After some analysis we found that their is an update trigger associated with the table. And the update trigger is for each row. So simple bulk collect with limit 100 itself is taking much time because, for every update the trigger is being called. So after some analysis it is found that trigger is not doing any thing related to this update. So we
1. Disabled the trigger in pl/sql code.
2. Do the update
3. Enable it again in pl/sql code.
For time being this issue is resolved. But we cant use the same technique for all the cases.
Is their any way to handle this kind of update.
April 18, 2011 - 12:49 pm UTC
avoid triggers - think about it - what else could one say????
You have a row trigger
It'll fire per row
It'll do things slow by slow
And that is unavoidable, entirely.
Even if you moved the processing to the AFTER trigger (using a compound trigger in 11g or using a plsql package to capture a row by row state in plsql tables) - it would still be slow since the row by row, slow by slow, trigger would still have to fire.
see
http://www.oracle.com/technetwork/issue-archive/2008/08-sep/o58asktom-101055.html for more reasons to avoid triggers.
update on generic columns
DAN FISHER, April 20, 2011 - 7:08 am UTC
We have a generic entity / attribute based table. I know this is a mistake :)
However we need to update 4 value varchar2 columns to 4 corresponding numeric columns ie
UPDATE t_clinical_record
SET value_varchar_23 = value_numeric_3,
value_varchar_24 = value_numeric_5,
value_varchar_25 = value_numeric_6,
value_varchar_26 = value_numeric_7
WHERE entity_id = 857;
This needs to be done for around 300 different entities, where the numeric column differs based on entity id. The table in question has approximately 100 million record and is not partitioned (this would have been handy as I could have performed a partition exchange).
Running an update in this manner is going to takes days. weeks..
Could we exploit the CTAS as highlighted and do you have an example.
Dan
Disclaimer I inherited the generic design and disliked this type of modelling before now I loathe it.
April 20, 2011 - 8:12 am UTC
why would it take days? weeks? It sounds like a very tiny update??????
You are going to update a couple hundred records as far as I can tell?? why would this take a long time?
guess
A reader, April 20, 2011 - 12:32 pm UTC
Tom,
Can you guess who I am ?
Thanks
update on generic columns
DAN, April 21, 2011 - 5:39 am UTC
It will be slow because the attribution varies across entities the basic statement below, exhibits variation across the generic numeric columns based on the entity id.
UPDATE t_clinical_record
SET value_varchar_23 = value_numeric_3,
value_varchar_24 = value_numeric_5,
value_varchar_25 = value_numeric_6,
value_varchar_26 = value_numeric_7
WHERE entity_id = 857;
So this needs to be executed a few times.
In addition running a count (*) from t_clinical_record
where entity_id = 857
Returns aproximately 25,000,000 records. The table it self has several hundred million records, not 100.
Thanks in advance
Dan
April 25, 2011 - 8:05 am UTC
why would you need to run it a couple of times??
Updating millions of records
A reader, May 02, 2011 - 5:45 am UTC
Hi Tom,
I have a similar situation.
My requirement is related to a process of archiving where I need to move 10 million records from main table to history table.
Any pointers on how can this be done faster? (We work on Oracle 10g.)
Regards,
Neha
May 04, 2011 - 12:13 pm UTC
re-read the page, I've described what you might consider doing.
Using DDL instead of DML, create tables as select.
Adjusting Primary Keys and Foreign Keys in a table
NimitZ, May 12, 2011 - 6:30 am UTC
Hi Tom,
I have a requirement where I need to merge selective data from one database instance to another. I have scripts that did the part to stage the data.
However, when before I try to merge the data to the original tables, I would need to adjust the primary keys of all the tables and update the foreign key references accordingly. There are around 130 tables. Currently what I do is, I have added additional columns to hold the new and old primary keys. Updating references on the tables takes a real long time and there is nothing that I can think of on the migration part of it. All of this is done by disabling constraints and then applying constraints separately after the updates are complete.
1. Create views around source (Not all data and not all tables are to be merged)
2. Move data from views (which is now my source) to staging
3. Staging tables have old_id and new_id in addition to the source table data
4. Update new_id with max(primary_key value at destination) + rownum – For all staging tables
5. Update old_id with current primary key value – For all staging tables
6. Get all foreign key references for each table and based on the referred table and column update the references.
Step 6 is what takes like forever. All the queries are dynamically created and executed with an execute immediate.
Can you please suggest if there is any better way of doing it?
Thank you.
PS: I tried to post a new question but the site was not open. And this is a little urgent.
May 12, 2011 - 7:59 am UTC
you seem to be missing steps? I don't see any disabling of constraints?
What about the indexes - do you do anything with them?
How much data are we talking about?
What is your definition of "forever"? Mine is 1,000 days. What is yours? (it might be expectations that need to be reset...)
Are you doing this sequentially? Do you have resources to do this in a more parallel fashion?
Updating 1M records deriving the value from a DB link
NPaul, May 19, 2011 - 3:35 pm UTC
Hi Tom,
I have a reporting table consisiting of 62378579 Rows.
I will have to update approx 1 M records in the table for a given field.
This field value will have to be derived from the erp database.
I followed your approach from this thread to
create a new table
using an update query having the db links ,
it has been running for more than 8 hours now.
For Simple updates where a reference table
<Meaning - straight forward cases- of wat has to be updated to what>
is given to update 1M records it usually takes 5 hours.
What is the right approach for these kind of data fixes,
is it okay to connect via DB links to derive the data?Please advise
Thanks,
NPaul
May 23, 2011 - 11:10 am UTC
you tell us nothing about the size of the data in the remote database - that would be the important bit... Also, "62378579 Rows." tells us nothing of the volume of data - is that 62mg or 62gb or 62tb or something else?
Updating 1M records deriving the value from a DB link
A reader, May 25, 2011 - 9:43 am UTC
Hi Tom,
Sorry for the incomplete facts as this being my first question in this site.
Reporting table size = 49G,Erp table size =9.4G.Update will be done to a single field based on data from erp table.
Thanks,
NPaul
May 25, 2011 - 11:35 am UTC
is it possible to identify the 1,000,000 rows from the ERP table easily? Or are we going to have to basically bring that entire back over the database link?
help on this please
venkata, August 03, 2011 - 4:43 am UTC
Tom,
I have to build a local table by copying some 10 million records from another table which i am accessing via dblink. The data in remote table is partitioned based on month and year i.e 'Mar-2010', 'Apr-2010', 'May-2010' and so on.
What i am doing now !
I am building table locally and then inserting data like this
insert into local_table select * from table@remotelocation where mon_year = 'Mar-2010';
commit;
insert into local_table select * from table@remotelocation where mon_year = 'Apr-2010';
commit;
insert into local_table select * from table@remotelocation where mon_year = 'May-2010';
commit;
what is my assumption
my assumption is loading data into local table as said above will not create any relog files
what i want to know
i want to know instead of the above process can i load data like this
insert into local_table select * from table@remote_location where mon_year in ('May-2010','Apr-2010','Mar-2010');
commit;
and which process is better.
i can't give you a working example here like create table etc..
August 03, 2011 - 7:44 am UTC
your assumption is false.
those are conventional path inserts - they always generate UNDO and REDO.
if you use insert /*+ append */ (and met all of the requirements for a direct path load - no foreign keys, no triggers...) - then we will skip UNDO generation for the table (but not any indexes)
further if you are in archivelog mode, AND if you use insert /*+ append */ (and met the requirements) AND the table was set to NOLOGGING AND the database was not set to FORCE LOGGING - then we can skip REDO and UNDO for the table (but *not* for any indexes)
or if you are in noarchivelog mode and you use insert /*+ append */ (and met the requirements) we would skip UNDO and REDO generation for the table (but not any indexes...)
Yes, you can and probably should use the mon_year in (.....)
is mon_year a date or a varchar2? If it is a date, please use to_date( 'may-2010', 'mon-yyyy' ) - not just 'May-2010'.
can i do this
Venkata, August 03, 2011 - 12:15 pm UTC
Hi,
can i do bulk collect over a db link to avoid multiple db reads will this approach also avoid redo and undo logs.
hmmm
i will post the exact query what i am doing in my next post sorry i don't have the query in hand now .....
August 03, 2011 - 1:27 pm UTC
how would bulk collect avoid redo and undo in general? You are very mistaken/ confused about how this works.
You can skip undo generation on a table using DIRECT PATH operations (insert /*+ append */, sqlldr direct=y, create table as select, alter table t move - for example)
You MIGHT be able to skip redo generation if you are
a) in noarchivelog mode
b) in archivelog mode and the table is set to nologging and the database IS NOT set to force logging
But that is it - that is all.
(Note: in 11gr2, using the append_values hint, you can direct path load using arrays in plsql as well)
If you want to skip undo and perhaps redo generation, you would use the append hint, period.
update two columns of same table
A reader, September 02, 2011 - 6:47 am UTC
hi tom,
my environment:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
"CORE 11.2.0.2.0 Production"
TNS for Solaris: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
i have a table t1 with the following fileds and values;
i j k name1 name2 date
1 1 1 c d 03-SEP-11
1 1 1 z x 06-SEP-11
1 1 2 b c 04-SEP-11
1 1 2 a a 07-SEP-11
1 2 2 b b 01-AUG-11
1 2 2 v b 04-SEP-11
in RB database i,j,k,name1,name2 is used as a primary key but while
migrating to oracle for some prupose we desiging it in such a way that
i,j,k is the primary key but we will get duplicates for this combination.
as you refer in the above example.
to make it distinct we apply the logic say for combination 1,1,1
we get 2 rows , so we will be taking only one name1 and name2 combination.
logic is: for combination of i,j we take latest of name 1
( using latest load_date value) and for i,k name2(using latest load_date value)
so for 1,1,1 combination we should have the latest name 1 as 'a' and
name2 as x and load _date is latest of 1,1,1 combination and
insert into a new table as 1,1,1,z,x,06-sep-11 for the combination of 1,1,1. and so on.
please help me am trying it using many logics and making it complex. i have used ref cursors and have not succeedded.
this is my code which i tried and not successful:
declare
TYPE rec_type IS RECORD(i NUMBER,j NUMBER);
rec_c1 rec_type;
type rec_type4 IS RECORD(i NUMBER,k NUMBER);
rec_c2 rec_type4;
TYPE cv_typ IS REF CURSOR;
cur1 cv_typ;
type rec_type1 is record ( rn number, i number, j number, nm1 varchar2(10), load_date date);
rec_c3 rec_type1;
type rec_type2 is record ( rn number, k number,nm2 varchar2(10), load_date date);
rec_c4 rec_type2;
begin
select distinct i,j into rec_c1 from temp1;
select distinct i,k into rec_c2 from temp1;
--where load_date in (select max(load_date) load_date from temp1 group by i||j )
open cur1 for 'select * from (select rownum,i,j,k,nm1,nm2, load_date from temp1 where i=1 and j=1 order by load_date desc) where rownum =1';
loop
fetch cur1 into rec_c3;
insert into temp2 ( i, j, nm1,load_date) values( rec_c3.i, rec_c3.j,rec_c3.nm1,rec_c3.load_date);
end loop;
open cur1 for 'select * from (select rownum,i,j,k,nm1, nm2, load_date from temp1 where i=1 and k=1 order by load_date desc) where rownum =1';
loop
fetch cur1 into rec_c4;
update temp2
set nm2 = rec_c4.nm2 ,
k =rec_c4.k
where i = temp2.i and j =temp2.j and nm1 = temp2.nm1 and load_date = temp2.load_date;
end loop;
end;
please let me know if any additional information is required
September 02, 2011 - 8:48 am UTC
logic is: for combination of i,j we take latest of name 1
( using latest load_date value) and for i,k name2(using latest load_date value)
so for 1,1,1 combination we should have the latest name 1 as 'a' and
name2 as x and load _date is latest of 1,1,1 combination and
insert into a new table as 1,1,1,z,x,06-sep-11 for the combination of 1,1,1.
and so on.
you lost me there. You said "i,j - take the latest of name 1 and that is 'A'", "i,k table the latest of name2 and that is 'X'", and then we insert "z,x" ??????
why not a,x - and what is the rule for the date column - how do you decide what that should be????
you really also need to supply a create table and inserts if you want me to create a query for you.
update 2 columns in a table
A reader, September 02, 2011 - 6:51 am UTC
hi tom small correction in tha above example the result output should be 1,1,1,a,x,06-sep-11 in the above i wrongly mentioned as 1,1,1,z,x,06-sep-11
September 02, 2011 - 8:49 am UTC
so, tell me, why sep-6th?? what is the rule for that
and you need a create table and inserts
A reader, September 05, 2011 - 5:35 am UTC
hi tom,
6th sep follows the rule - it takes latest date for the combination of 1,1,1
please let me know any more inputs needed.
also i tried one more update statement.
Update temp2 t2
set
temp2.nm1 = (select nm1
from
temp1 T1 where
T1.i = t2.i and
T1.j = t2.j )
where exists
(select
i,j,max(load_Date) FROM t1
group by i,j)
but not sure it is correct. also if its rite, will it take less time for updating billion of records. since am using exists.
September 06, 2011 - 8:38 am UTC
you will have to, in text - just words, spell this out precisely. sep-6th is still confusing for me - I have to look at two different rule sets to get that - you have one rule for i,j. You have another rule involving i,k. Is the third - yet to be specified rule for i and (j or k)?
Also, read the last response above more closely, you are missing the second most important thing after a clear and detailed specification. Actually, you can read either of my posts since I clearly told you what you MUST provide if you want me to even look at this.
informatica
ram, September 20, 2011 - 6:12 am UTC
i have an issue here.in my source table i have 1000 records and in my target table it is shoing me as 1500 records.I want only 1000 records in my target.can you tell me how to clear that.
September 20, 2011 - 6:43 pm UTC
seriously - you are asking me an informatica question?
I don't know what you did wrong in informatica to get this transformation.
informatica
ram, September 20, 2011 - 6:14 am UTC
i have an issue here.in my source table i have 1000 records and in my target table it is shoing me as 1500 records.I want only 1000 records in my target.can you tell me how to clear that.
Table update with 900m rows
Rahul, November 09, 2011 - 8:44 am UTC
Hi Tom,
I have not managed to nail the above suggestions. I have a table with 900m rows and need to add a new column and then update the value based on some existing value in the table. I have created a dummy table with NOLOGGING clause and use Insert /*+ append parallel (EPOS_STORE_LEVEL_HOLD,12) */ to insert with Select /*+ parallel (EPOS_STORE_LEVEL,12) */ and it is taking forever. Am I missing something or should I be just patience and wait for the update to take its own time. This is in DEV only and to see the affect or downtime needed in our LIVE database. Based on sample data of 1m rows I have calculated that it would take 22 hours. Do you have any suggestions to bring it down to more realistic time frame please? Your help will be highly appreciated. Thanks in advance.
November 09, 2011 - 11:24 am UTC
a) are you sure it went parallel, did you check
b) do you have the resources to go parallel (cpu, memory, DISK, IO BANDWIDTH)
c) how big is 900m rows? Could be anywhere between 1gb and infinity
d) are you archivelog mode, if so, with or without logging set on the table.
e) what does the sql look like
f) what are you waiting on (v$session_event, v$session_wait)
#2 is probably one of the most important things
Table update with 900m rows
Rahul, November 10, 2011 - 5:10 am UTC
Thanks for your quick response. Couple of things first:
Update worked with NOLOGGING clause on all the partitions in 9 hours which is a lot faster than anticipated 22. Also, I don't think I had "all" the resources to go in parallel as there are 4 processors on our server so I am not sure if it went in parallel. All other resources are there. There is a decent disk space and all the tables-paces have been auto extended to avoid failures. 900M rows in terms of size is about 56G in space and this is the query I was using:
Insert /*+ append parallel (XXX_HOLD,12) */
into XXX_HOLD ( EPOS_DATA_PROVIDER_CODE ,
MAGAZINE_CODE ,
SALE_DATE ,
STORE_CODE ,
SUM_OF_UNITS ,
LAST_STORE_POLLED_DATE ,
LAST_UPDATED ,
LAST_UPDATED_BY ,
SUM_OF_COST ,
WHSHS_SKU ,
TITLE_ID
) Select /*+ parallel (XXX,12) */ EPOS_DATA_PROVIDER_CODE
,MAGAZINE_CODE
,SALE_DATE
,STORE_CODE
,SUM_OF_UNITS
,LAST_STORE_POLLED_DATE
,LAST_UPDATED
,LAST_UPDATED_BY
,SUM_OF_COST
,WHSHS_SKU
,ALL_MAGAZINE_PKG.GET_TITLE_ID_FROM_MAG_CODE('EPOS_MAGAZINE_CODE',MAGAZINE_CODE) Title_ID
from XXX ;
where ALL_MAGAZINE_PKG.GET_TITLE_ID_FROM_MAG_CODE is a function which returns required title_id and is using Index.
Also explain plan of select statement returns as :
Plan
SELECT STATEMENT ALL_ROWSCost: 2 Bytes: 117 Cardinality: 1
4 PX COORDINATOR
3 PX SEND QC (RANDOM) PARALLEL_TO_SERIAL SYS.:TQ10000 :Q1000Cost: 2 Bytes: 117 Cardinality: 1
2 PX BLOCK ITERATOR PARALLEL_COMBINED_WITH_CHILD :Q1000Cost: 2 Bytes: 117 Cardinality: 1 Partition #: 3 Partitions accessed #1 - #89
1 TABLE ACCESS FULL TABLE PARALLEL_COMBINED_WITH_PARENT EPOS.EPOS_STORE_LEVEL :Q1000Cost: 2 Bytes: 117 Cardinality: 1 Partition #: 3 Partitions accessed #1 - #89
Index creation (on 4 columns) took about 30 mins for the same data which is what I was hoping for. I am not in archive log mode.
November 10, 2011 - 1:33 pm UTC
Update worked with NOLOGGING clause on all the partitions in 9 hours which is a
lot faster than anticipated 22
just for your information - the nologging had no effect on this, updates are ALWAYS logged, always - no way around it
disk space isn't relevant (there had to have been space, that is obvious - serial or not). number of disks, amount of bandwidth to/from disk is important.
Table update with 900m rows
Rahul, November 10, 2011 - 9:18 am UTC
Hi Tom,
Thanks for your update. It si working fine now and I have reduced it to 1.5 hours from 9 hours. As pointed out NOLOGGING wasn't working fine and change of function and join with that table also helped.
Thanks once again.
update performance
Venkat, March 30, 2012 - 5:41 pm UTC
Hi Tom,
Thank you very much for spending your valuable time in giving the responses.
I have the below update statement(code changed for security) that's taking long time(90minutes) for updating 1500 records.
I was able to tune the select part of the statement to 3minutes(from 70minutes) using parallel hints etc. However when I replace the select statement in update with the tuned code it's taking around 10minutes to execute the select statement itself(observed from sql_monitor). There aren't any changes in the plan.
Update table set col1=<value>
where exists(select subquery);
oracle version: 11.2.0.2
Sorry if I missed providing more data.
Thanks once again
March 31, 2012 - 8:09 am UTC
I feel like the computer from Star Trek:
insufficient data.
merging tables,
A reader, April 16, 2012 - 4:02 pm UTC
The development team have 4 tables each having more than billion rows. They data in each table are unique. The team wants to merge all the 4 of them into one table. However, the data after merging can be duplicate based on the primary key columns they have defined. They want a solution from us that can enable the constraint (after merging) and capture the duplicate rows in a separate temp table.
My approach was, merge it (insert into A select * from B,C and D with parallel append nologging to speed up the inserts), delete the duplicate rows and also store them in temp table (to speed this up, create a non-unique index on the primary key columns) and then drop the non-unique index and enable the primary key.
What would be your approach?
Thanks,
April 17, 2012 - 11:57 am UTC
how do they define which row to keep and which to discard?
how many duplicates do they anticipate - that is relevant to the solution - just a rough estimate.x
follow up,
A reader, April 17, 2012 - 9:10 am UTC
They determine the duplicate rows based on few attributes. The attributes are already primary key in their respective tables but when they merge all the 3 tables there can be duplicate and they want to capture them.
They don't have a very clear picture of how many of those 3 billion rows (1 billion in each table) are duplicate but their guess is pretty small (may be 5%).
Thanks,
April 18, 2012 - 2:51 am UTC
that is not what I meant.
say a key is in all three tables.
which one do they keep
which ones do they record as duplicate
or do they keep none of them and record all as duplicate
5% of 3,000,000,000 is a big number by the way.
how big is this table, 3,000,000,000 doesn't tell us anything other than - it is at least 3,000,000,000 bytes big.
I might be tempted to use a multi-table insert
INSERT first
when rn = 1 then into good_table values ( ... )
else into duplicates_table values ( ... )
select row_number() over (partition by key1, key2, key3 order by <whatever>) rn,
x.*
from (select * from t1 union all select * from t2 union all select* from t3) x
/
yes, that will not use any indexes
yes, you do not want to use a single index
yes, that will require gobs of temp
yes, you want to use temp
yes, that will take a while
yes, it is going to take a while
get particular recors from million of records in a table
swapna, April 23, 2012 - 2:14 pm UTC
Hi,
Suppose if a Table has millions of records. Now, I want particular record from that table? which is the best way to do this?
April 24, 2012 - 7:29 am UTC
probably an index on the columns that constitute the where clause that selects this particular record.
large deletes
A reader, May 03, 2012 - 5:37 am UTC
We need to delete a large portion of our table and would like to go by
CTAS method. However, we are running short of space in any of our tablespace.
Do we have any method as basically we want to replace with trimmed version of
table
For performance reasons,I would like to go by CTAS but others prefer the
slow by slow (batch deletes).
So only way out is to extend tablespaces to do this .is there any alternative ?
May 03, 2012 - 9:03 am UTC
disk is cheap
your time is expensive
if you go with the deletes and you are short on space - ask them 'where is the TON of redo going to go??????????'
when they answer - say "ok, instead of using that disk for redo - give it to me for a few minutes and we'll be good to go - I can CTAS using it"
you need the space for the copy of the table - yes.
big update
Ravi B, July 26, 2012 - 5:20 pm UTC
Hi Tom,
Could you please let me know if there is an efficient way of doing this?
This update is taking about 5hrs to finish.
select count(*) from TEST_A_TABLE;
--17608743
select count(*) from TEST_A_TABLE where associated = 0;
--17608743
select count(*) from ADDREMOVEDATA;
--1964698
CREATE INDEX IX_ADDREMOVEDATA_7 ON ADDREMOVEDATA
(
UPPER("NAME"),
UPPER("PUB"),
UPPER("VERSION")
)
explain plan for
update TEST_A_TABLE src
set (ASSOCIATED,
IGNORE,
PROD_RID,
EDITIONS_RID,
VERSIONS_RID,
RELS_RID) =
( SELECT 1, dst.IGNORE,dst.PROD_RID,dst.EDITIONS_RID,dst.VERSIONS_RID,dst.RELS_RID
FROM ADDREMOVEDATA dst
WHERE 1 = 1
AND (upper(dst.VERSION) = upper(src.VERSION))
AND (upper(dst.PUB) = upper(src.PUB))
AND (upper(dst.NAME) = upper(src.NAME))
AND src.ASSOCIATED = 0
)
WHERE EXISTS (SELECT 1
FROM ADDREMOVEDATA dst
WHERE 1 = 1
AND (upper(dst.VERSION) = upper(src.VERSION))
AND (upper(dst.PUB) = upper(src.PUB))
AND (upper(dst.NAME) = upper(src.NAME))
AND src.ASSOCIATED = 0
);
Plan hash value: 2294094399
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 880K| 100M| 52M (1)|205:46:30 |
| 1 | UPDATE | TEST_A_TABLE | | | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | TEST_A_TABLE | 17M| 2015M| 60067 (1)| 00:14:01 |
|* 4 | FILTER | | | | | |
|* 5 | INDEX RANGE SCAN | IX_ADDREMOVEDATA_7 | 1 | 63 | 3 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| ADDREMOVEDATA | 1 | 149 | 4 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | IX_ADDREMOVEDATA_7 | 1 | | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "ADDREMOVEDATA" "DST" WHERE :B1=0 AND
UPPER("VERSION")=UPPER(:B2) AND UPPER("PUB")=UPPER(:B3) AND UPPER("NAME")=UPPER(:B4)))
4 - filter(:B1=0)
5 - access(UPPER("NAME")=UPPER(:B1) AND UPPER("PUB")=UPPER(:B2) AND
UPPER("VERSION")=UPPER(:B3))
6 - filter(:B1=0)
8 - access(UPPER("NAME")=UPPER(:B1) AND UPPER("PUB")=UPPER(:B2) AND
UPPER("VERSION")=UPPER(:B3))
July 30, 2012 - 9:48 am UTC
looks like you are updating just about every row in that table - using an index look up TWICE!!!!!
rewrite as a merge
merge into test_a_table tgt
using addremovedata src
on ( upper(tgt.version) = upper(src.version) and .... )
when matched then update....
and hope for two full scans and a hash join. If you see any indexes being used, fix that.
and consider rewriting this as DDL, create a new table as an outer of these two tables, index it, constrain it, grant it, etc - and then drop the old and rename the new. updating ever row in a big table is a bad idea.
big update
Ravi B, July 30, 2012 - 11:44 am UTC
Hi Tom,
There were several indexes on the TEST_A_TABLE table(which are not necessary); the execution time was drastically reduced to 12 minutes, after i dropped those indexes(with the query i provided). Infact, i tried using MERGE with HASH JOIN which is taking about 17 mins.
We have a product which runs both on SQL Server and Oracle. The SQL Server takes about 7 mins to finish the same query (same hardware configuration). My management (SQL server favored) wanted to know why oracle is taking that much time.
Here is the plan with MERGE.
Plan hash value: 265507897
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | 336 | | 182K (1)| 00:42:40 |
| 1 | MERGE | TEST_A_TABLE | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 1 | 353 | 459M| 182K (1)| 00:42:40 |
| 4 | TABLE ACCESS FULL| ADDREMOVEDATA | 1964K| 436M| | 8325 (2)| 00:01:57 |
|* 5 | TABLE ACCESS FULL| TEST_A_TABLE | 17M| 2014M| | 60095 (1)| 00:14:02 |
----------------------------------------------------------------------------------------------
July 30, 2012 - 4:04 pm UTC
trace it and tkprof it
Slow by Slow wins?!?
Matthew McPeak, August 29, 2012 - 1:41 pm UTC
Tom,
Due to a surprise sprung on us by our parent company, my team and I have several updates that we have to accomplish and we'd like to be able to do them all in one Sunday afternoon.
Ordinarily, for a large update, I would consider:
*) Build a new, partitioned table with the same structure as the target table, then
*) INSERT /*+ APPEND */ the converted data into the new, partitioned table, then
*) Exchange the partition in the new table with the target table, then
*) Rebuild the indexes on the target table.
However, in this case, the client does not have the partitioning option licensed.
Also, the tables in question are part of a 3rd party package, so they are not comfortable with a similar strategy involving just renaming the tables at the end, since that would leave our home-built copy in place as the "real" table.
So, the best pure-SQL approach I could find was:
UPDATE ( SELECT t.col1, v.new_val FROM large_table t
INNER JOIN cnv_master_data v ON v.old_val = t.col1 )
SET col1 = new_val;
That has a plan of:
UPDATE STATEMENT ALL_ROWS
UPDATE LARGE_TABLE
HASH JOIN
TABLE ACCESS FULL CNV_MASTER_DATA
TABLE ACCESS FULL LARGE_TABLE
I didn't actually run it to completion, but I started it and watched v$session_longops. After spending almost 30 minutes doing FTS of the two tables, the progress on the HASH JOIN indicated it was going to take another 4 or 5 hours.
So, I tried the following approach:
1) Create a table having the row_ids and new value for each row_id.
2) Create an index on that table, so I could quickly select the data, sorted by row_id.
3) Open a cursor on the table, bulk collect into arrays, and FORALL UPDATE to the new value, based on the row ids.
It took just over 2 hours.
Now, the manager wants me to type this up as the "approach" for all the other developers who have similar updates to do as part of this initiative.
I really hate typing up a "slow-by-slow" approach as the "right" way to do this, but I cannot deny that it seems faster.
Am I crazy?
As always, I appreciate your insights and comments.
Best regards,
Matt
August 29, 2012 - 1:56 pm UTC
I don't know how big these tables are nor how many rows are to be updated.
I would have used
CTAS
drop old
rename new
index, constrain, grant
myself.
Yes, but...
Matthew McPeak, August 29, 2012 - 2:04 pm UTC
I know. Me too. But they're not comfortable doing those operation on important tables of a 3rd party application.
There is the fear the the new table would be somehow different from the original (a missed trigger, rowdependencies, etc).
It's not that those fears cannot be overcome -- it's a "F.U.D." issue, not a technical issue.
But what about the PL/SQL approach with the pre-built rowid table? Is it possible for such an approach to be faster than a pure-SQL update?
P.S. The tables vary in size -- mostly a few million blocks. We're updating maybe 80% of the rows in each table.
Thanks again,
Matt
August 29, 2012 - 2:26 pm UTC
wasn't the plan for the create table with rowids just
hash join
full scan t1
full scan t2
????
what was different about that plan that the creation of the table went instantly fast?
(you are already updating a 3rd party applications table, you've already crossed the line... the FUD should already be there...)
Yes
Matthew McPeak, August 29, 2012 - 2:35 pm UTC
Yes, that was the plan. It wasn't instantly fast, but it was just 20 minutes or so. A better question (that I cannot answer) is why does that HASH JOIN step go so much slower when it is inside the update statement?
>> you've already crossed the line... the FUD should already be there...
We're in a bind :(... We're just trying not to make the situation worse than it needs to be...
Anyway, I am putting together a more focused test on a 1 million row dummy table. If that continues to be faster, I'll post it.
Thanks,
Matt
August 29, 2012 - 2:52 pm UTC
the hash join was actually doing the update at that point in time as well. The hash join had:
a) read the one table and hashed it
b) started reading the table to update and as it hashed joined it to (a), did the update
The update hash join did everything in one step.
did your steps 1, 2 and 3 take 2 hours or was it just step 3 that took 2 hours.
Comparing
Matthew McPeak, August 29, 2012 - 2:58 pm UTC
Building the rowid table took about 20 minutes, using it to update the base table took another 2 hours or so.
Doing the straight update was on course to take 4-5 hours, all told.
insert select VS CAST which is faster?
A reader, August 31, 2012 - 3:05 am UTC
Hi tom,
I heared that in Teradata, 'create table as' is 20% slower than 'create table, insert select'.
Would like to know how is Oracle if we do not consider parallel, nologging? Just a very common case.
My rough testing shows, 'insert select' is slower than 'create as', could you please elaborate more on the processing logic behind their implementation?
September 10, 2012 - 6:30 pm UTC
in Oracle, the opposite is usually true - the create table as select will in some cases marginally outperform an insert as select (we can bypass a final processing step sometimes).
Posting example
Matthew McPeak, August 31, 2012 - 9:01 am UTC
Tom,
I am actually posting my example with timings.
I am wondering if maybe something is not tuned correctly that is making a big "update" statement so slow? Because I wrote a focused test case with a million rows and found the PL/SQL approach almost as fast as pure-SQL, but not faster.
But, when I ran the two approaches on the real (much bigger) table, the difference was dramatic.
The only difference I see is that the PL/SQL approach can tune the work by rowid, so all the updates to a single block happen together. With the pure update do that? Would it even make a difference?
I'd much rather believe that there is something de-tuned in the database. The single UPDATE statement obviously has to write and read from TEMP segments... maybe something there I should look at?
Thanks, as always! (Code example below)
Matt
-- OE_ORDER_LINES_ALL is the table to be converted
-- CNV_MASTER_VINS is pre-existing -- tells us the old values and new values for the conversion
-- Step 1 -- create a list of row_ids from OE_ORDER_LINES_ALL and the new value for each record
-- Run time in STRDV4: 1020 seconds
CREATE TABLE cnv_oel_rowids AS
SELECT oel.rowid row_id, v.vin17
FROM oe_order_lines_all oel INNER JOIN cnv_master_vins v ON v.vin8 = oel.ordered_item
WHERE oel.item_type_code = 'CONFIG';
-- Step 2 -- create an index on the rowid table so that we can quickly sort by rowid
-- (maybe isn't necessary, but helps in testing with less than the full # of rows, and
-- doesn't take too much time to build.)
-- Run time in STRDV4: 19 seconds
CREATE UNIQUE INDEX cnv_oel_rowids_u1 on cnv_oel_rowids ( row_id, vin17 )
-- Step 3 -- Lock the table in exclusive mode so we don't get hung up waiting for anyone's locks...
LOCK TABLE oe_order_lines_all IN EXCLUSIVE MODE;
-- Step 4 -- Loop through the rowid table, and use the information to update the conversion table
-- Run time in STRDV4: 873 seconds
DECLARE
CURSOR c IS SELECT * FROM ( SELECT row_id, vin17 FROM cnv_oel_rowids ORDER BY row_id ); -- WHERE ROWNUM <= 10000;
TYPE t_row_id_tab IS TABLE OF rowid;
TYPE t_vin17_tab IS TABLE OF VARCHAR2(17);
l_row_ids t_row_id_tab;
l_vin17s t_vin17_tab;
-- For longops progress...
rindex BINARY_INTEGER;
slno BINARY_INTEGER;
totalwork NUMBER;
sofar NUMBER;
obj BINARY_INTEGER;
BEGIN
rindex := DBMS_APPLICATION_INFO.set_session_longops_nohint;
sofar := 0;
totalwork := 1737827;
OPEN c;
LOOP
FETCH c BULK COLLECT INTO l_row_ids, l_vin17s LIMIT 1000;
FORALL i IN 1..l_row_ids.COUNT
UPDATE oe_order_lines_all
SET ordered_item = l_vin17s(i)
WHERE rowid = l_row_ids(i);
sofar := sofar + l_row_ids.COUNT;
DBMS_APPLICATION_INFO.set_session_longops (rindex,
slno,
'OEOL Conversion 3',
obj,
0,
sofar,
totalwork,
'conversion',
'items');
EXIT WHEN c%notfound;
END LOOP;
CLOSE C;
END;
-- Total runtime of bulk SQL approach in STRDV4: 1020+19+873 = 1912 seconds
-- Alternative "pure-sql Approach":
-- Run-time in STRDV4: 15911 seconds
UPDATE (SELECT oel.ordered_item, v.vin17
FROM oe_order_lines_all oel INNER JOIN cnv_master_vins v ON v.vin8 = oel.ordered_item
WHERE oel.item_type_code = 'CONFIG')
SET ordered_item = vin17 ;
September 10, 2012 - 6:33 pm UTC
The only difference I see is that the PL/SQL approach can tune the work by
rowid, so all the updates to a single block happen together. With the pure
update do that? Would it even make a difference?
that could definitely have a big difference - but one would hope an update like that would not use an index so it will full scan and hit the blocks in order too
what is the plan for the update?
Tom You have to accept problem exists with Oracle
A reader, September 05, 2012 - 5:25 am UTC
Hi Tom,
I have been a long time reader of your posts and give them quite a bit of importance. but I never found your CTAS suggestion to replace update.
Oracle has to accept that their serial update or parallel update operations are not optimized.
Oracle parallel processing capabilities are very very bad. Oracle have to put some effort in this area.
September 10, 2012 - 7:42 pm UTC
they are optimized.
they are optimized for OLTP, they always have been, they always will be.
and for you to say "parallel processing ... are very very bad" - wow. Flabergasted. Ok, you are "entitled" to your opinion - but it is wrong.
You are a software *engineer* correct?
as an *engineer* do you accept the fact that if a plane flies slowly - say at 500-600mph - the wings should be more or less straight out and long? I hope so, it is the current design of most of the planes I fly on.
as an *engineer* do you accept the fact that if a plane wants to fly above the speed of sound - you need a *different* approach, a *different* design, and entirely *different* thing altogether. Those wings, they better be short, they better be stubby - otherwise you'll rip them RIGHT OFF - now won't you.
Does that meant the maker of the 777 "has bad wings"? No, not at all, it means their wings are designed for one set of parameters. Does it mean the makers of the wings for fighter jets got it all wrong? has a bad implementation? No, not at all - that would be a stupid silly statement wouldn't it?
If you want to update millions of rows, I'm convinced (based on watching it over and over and OVER and OVER again) that you want to use PARALLEL (yes, that bad parallel stuff we got) direct path (create table as select/insert /*+ APPEND */) operations which can bypass the undo and redo *necessary* (necessary) for OLTP - but not for a big bulk operation)
If you are a software engineer and you truly believe that there is a single best way for everything - you are not yet a software engineer, not even close :)
You have to put some effort into learning the fact that a warehouse type bulk operation and an OLTP operation is like comparing a 777 to a fighter jet. *silly* is the first word that pops into my mind.
It's the rowid sorting!
Matthew McPeak, September 06, 2012 - 1:54 pm UTC
Tom,
It seems to be the rowid sorting in the PL/SQL that is saving all the time.
I build a biggish table "MATT1" -- 5.1 million rows using the following:
<code>
create table matt1 ( a number not null, b varchar2(17) not null, block_filler varchar2(400));
insert into matt1
select rownum a, dbms_random.string('u',17), dbms_random.string('a',400) block_filler
from dual
connect by rownum <= 100000;
insert into matt1
select rownum + 100000, dbms_random.string('u',17), dbms_random.string('a',400) block_filler
from matt1, ( SELECT null FROM dual CONNECT BY ROWNUM <= 50 );
commit;
create table matt1_cnv ( b varchar2(17) not null, new_b varchar2(8) not null );
insert into matt1_cnv
select b, substr(b,-8) from matt1
order by dbms_random.value;
create unique index matt1_cnv_u1 on matt1_cnv (b);
Then, ran a pure SQL update both with rowid sorting and without:
update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b)
set b = new_b
vs...
update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b ORDER BY t.rowid)
set b = new_b
The update with no rowid sorting took 3.93 hours. The update with ORDER BY t.rowid took 21 minutes.
My question for you -- is there some statistic for updates that I could look at to prove one is doing less work than the other. Like, for SELECT statements, I would compare "consistent gets". Is there something analagous for tracking how much work updates do?
Thanks!
Matt
</code>
September 10, 2012 - 7:58 pm UTC
let's see the plan for the update.
Here are the plans
Matt McPeak, September 11, 2012 - 9:35 am UTC
The first plan, with the order by, ran about 11 times faster.
<code>
update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b ORDER BY t.rowid)
set b = new_b
UPDATE STATEMENT ALL_ROWS Cost: 150,330 Bytes: 163,072,896 Cardinality: 4,291,392 Partition #: 0
5 UPDATE APPS.MATT1 Cost: 0 Bytes: 0 Cardinality: 0 Partition #: 0
4 SORT ORDER BY Cost: 150,330 Bytes: 163,072,896 Cardinality: 4,291,392 Partition #: 0
3 HASH JOIN Cost: 107,408 Bytes: 163,072,896 Cardinality: 4,291,392 Partition #: 0
1 TABLE ACCESS FULL APPS.MATT1 Cost: 86,835 Bytes: 94,410,602 Cardinality: 4,291,391 Partition #: 0
2 TABLE ACCESS FULL APPS.MATT1_CNV Cost: 6,296 Bytes: 85,605,536 Cardinality: 5,350,346 Partition #: 0
update ( SELECT t.a, t.b, c.new_b FROM matt1 t inner join matt1_cnv c on c.b = t.b)
set b = new_b
UPDATE STATEMENT ALL_ROWS Cost: 104,970 Bytes: 111,576,192 Cardinality: 4,291,392 Partition #: 0
4 UPDATE APPS.MATT1 Cost: 0 Bytes: 0 Cardinality: 0 Partition #: 0
3 HASH JOIN Cost: 104,970 Bytes: 111,576,192 Cardinality: 4,291,392 Partition #: 0
1 TABLE ACCESS FULL APPS.MATT1 Cost: 86,835 Bytes: 42,913,910 Cardinality: 4,291,391 Partition #: 0
2 TABLE ACCESS FULL APPS.MATT1_CNV Cost: 6,296 Bytes: 85,605,536 Cardinality: 5,350,346 Partition #: 0
</code>
September 14, 2012 - 4:25 pm UTC
yeah, it is the data access patterns the hashing puts the data all over in memory (not in block order) - your order by put it back in block order.
I would say that is the cause - your order by happens to put the rows to be updated back in block order and minimized the physical IO probably by a great deal (we didn't have to read and re-read blocks)
Thanks
Matt McPeak, September 14, 2012 - 5:33 pm UTC
Thanks, Tom! That was really bugging me. I thought that was probably it, but I find it hard to be sure sometimes. Your help is much appreciated, as always!
One clarification...
Matt McPeak, September 15, 2012 - 10:12 am UTC
Thanks again for your help. One more follow up though please...
You said that the order by "happens" to put the updates back in block order, as though this is an unusual and unique situation.
After this, it seems to me that any sort of large update against an updatable view should be ordered by rowid of the target table. Maybe even merge statements too, if the target table is in the USING query and the ON clause doesn't access the target table via index.
In your view, is this just "one of those things" or is it something to remember to do going forward?
Thanks!
September 16, 2012 - 4:13 am UTC
it depends on the plan, only if the plan messed up the natural order.
if the plan was:
nested loops
full scan of table to be updated
index range scan
table access by index rowid of table with new values
then the table to be updated would be in block order - we'd read each block once and update it.
it depends on the plan taken, the data access pattern dictated by the plan.
A Reader, September 21, 2012 - 2:01 pm UTC
Hi Tom,
If I do an insert into...select x,y,z from, and if this is inserting a substantial number of records, can another user enter the same record and commit before this completes? In that case the original insert will fail with dup_val_on_index right?
A Reader
September 26, 2012 - 12:56 pm UTC
yes
Oracle Parallel processing is flawed.
A reader, September 26, 2012 - 10:21 pm UTC
Tom,
By reading this thread, I am extremely disappointed from Oraclel software. Its now an establish fact that Oracle is just for OLTP processing.
For Oracle parallel processing even with Exadata, update statements poses problems. Oracle still not able to access the DOP of parallelism correctly for parallel queries. Exadata still have many architectural flaws.
If you buy a one million USD machine and to update a huge table completely you have to follow the non practical techniques as describe in this thread. I would really suggest the clients to see for another database a database design for data warehouse.
September 27, 2012 - 8:56 am UTC
update statements poses
problems.no duh, brilliant. that is true for any database. update was invented for..... hold onto your hats..... OLTP, transactions!!!!
if you buy a 1,000,000$ machine to update a huge table - you are doing it wrong, anyone would tell you that.
you don't update a huge table, that is silly (you don't want to update millions or billions of rows and generate redo, undo (oltp things!!!))
you would use create table as select, period. massively parallel, no undo, no redo, no buffer cache, direct path reads/writes, easily compressable, etc.
I've already written this to you before - it is getting boring.
If something is an established fact, you'd be able to point us to the proof. I can point you to surveys that show MOST data warehouses run on Oracle. That Oracle is well positioned for warehousing. Now what?
http://www.oracle.com/us/products/database/dw-software-leadership-1691879.html http://www.sybase.com/files/White_Papers/Gartner_MagicQuad_forDataWarehouseDMS.pdf if you recommend to clients to use update to modify a huge table, I wold suggest your clients consult another consultant.
established fact just for Oracle
A reader, September 27, 2012 - 11:32 pm UTC
Hi Tom,
I dont mean to hurt you, neither I think its a bored topic. Try update statements in Netezza or Teradata You would find this thread more as a work around of a problem which Oracle has failed to address. The links you provided dont hold any significance since they are from oracle or sybase which already dont considered a warehouse database.
Just look at the parallel processing capabilities of Oracle, its extremely poor compared netezza or teradata. Further now its an establish fact that Oracle architecture have failed to understand what needed for Parallel processing.
Just take your example, You have advocating for a long long time that share everything architecture is best compare to Teradata share nothng architecture. Asktom is full of such FUD information. While now Exadata uses a sort of shared nothing and share everything architecture.
September 28, 2012 - 6:59 am UTC
ummm, the link to sybase is to a gartner article. gartner is not a database vendor, they are an independent group that evaluates database vendors.
the link to oracle.com is a link to IDC, again, another independent body outside of the vendor.
http://www.idc.com/ now what? still say "no significance"?????
Just look at the parallel processing capabilities of Oracle, its extremely
poor compared netezza or teradata.explain, don't just spew junk out, explain. If it is so so so obviously clear, it should be simple for you.
You have advocating for a long long time that share
everything architecture is best compare to Teradata share nothng architecture. i still do, 100%.
exadata combines a bit of shared nothing (the storage cells and query offloading) with the flexibility and ease of implementation of shared everything (the database server in the middle, the up to eight nodes of RAC cluster).
We do not make you reorganize your database as you add capacity, you don't partition for the ease and convenience of the computer. You partition the way you want to partition for administration (purging of data for example), you do not partition for your shared nothing implementation.
And you knew that, you just wrote it. It is not a shared nothing, some of the IO is that way - and that totally 100% transparent to the database designer, not stuck right in your face. It combines some shared something with a completely shared everything approach.
further
A reader, September 27, 2012 - 11:39 pm UTC
The problem with most Oracle resouces is, they have considered the oracle database as final thing and did not experiemented with other database. For a warehouse prespective most of the Oracle corporation resouces dont qualify even as warehouse expert, they have limited experience in this field. Exadata is just one such example. You spend millions of dollars just because you are not ready to experiment new systems.
I challenge TOM to have a BENCH MARK WITH NETEZZA VS ORACLE JUST ON THIS UPDATE. IF NETEZZA WON, CLOSE ASK TOM, IF TOM WINS I WOULD LEAVE NETEZZA DATABASE.
HOPE TOM IS MAN ENOUGH TO ACCEPT THE CHALLENGE AND TOM PLEASE STOP SPREADING FUD INFORMATION. THE INFORMATION HE SPREADS JUST HOLDS FOR ORACLE, NOT FOR OTHER DATABASE.
September 28, 2012 - 7:04 am UTC
there will never be a benchmark on an update on a data warehouse for the simple reason:
you don't do it that way. that would not be a smart approach.
I tell people how to properly use the Oracle database - yes, that is 100% true. You can tell them how to use the others, that isn't my job. But if you try to tell us that the approaches you use on database X are good for database Y, then you need to rethink your position.
If all databases were exactly the same, did everything the same way, there wouldn't be other databases now would there. We are different, you do things differently on the different databases.
for example, on Oracle a DBA partitions for their ease and convenience, we don't worry about the shared nothing aspect - that doesn't ever enter into our design. Now if you tried that on the others - that wouldn't work so swell would it.
for example, on Oracle, you can actually trickle feed your reporting/warehouse system since we can actually do OLTP in your warehouse - we can support fully concurrent reads and writes. You wouldn't want to be doing that in some other databases.
and of course the information I "spread" holds for Oracle - that is pretty obvious isn't it? I've never tried to hide where I'm from (ahem, like "a reader")
Maintaining Referential Integrity during CTAS
Ankit, January 29, 2013 - 4:14 am UTC
Hi Tom,
Can you explain to me how can one use CTAS (create table as select) strategy on a table with concurrent reads and contains referential constraints?
Please direct me to the link if you have already answered.
January 31, 2013 - 9:16 am UTC
if you are updating millions of records, this is going to be an offline operation (no modifications - you are locking pretty much the entire table after all).
so, during the create table as select, you will be the only one accessing the table in a write (modify) mode.
anyone that reads the old table will just read the old table - nothing to worry about there.
as for referential integrity -- you'll be dropping and adding those constraints after you drop and rename the table, they'll take care of themselves (again, an offline operation)
Bulk update order
Rajesh Voruganti, February 15, 2013 - 4:32 am UTC
Hi Tom,
I have one quetion. When we update the set of records using with bulk update query. So In this case Oracle engine taking which order?. Is Taking by that table of the primary column or else anything?
Thanks
Rajesh.
February 15, 2013 - 9:31 am UTC
if you do a bulk update, like this:
forall i in 1 .. some_table_variable.count
update t set x = some_table_variable.x(i) where key = some_table_variable.key(i);
then the updates would take place in the order of the elements in some_table_variable.
how can i decide when i use partitioning tables?
mohamed Aboazm, March 07, 2013 - 8:53 am UTC
Dear Tom, I have acertain table contaning 3 milion records and the bussiness workflow is to update and insert every minute as totally about 1000 recors every day. So, i want to use certain scenario to deal with this table.
March 07, 2013 - 9:05 am UTC
it is a tiny table. with very little work done to it.
what are you trying to accomplish with partitioning in this case? why are you applying partitioning - what do you want to achieve with it. You need to start there.
problem
asdf, March 14, 2013 - 3:25 am UTC
dear tom,
plz solve this problem
rror in ==> C:\MATLAB6p5\work\Untitled3.m
On line 7 ==> Diffs(row, col) = Image(row, col)-100 ;
ramya, April 29, 2013 - 7:08 pm UTC
Hello TOM,
I have a select statement in a cursor through which the data is being selected and the data is for looped and it is update in a table having 64 million records(OVRRD_COST).The select statement in the cursor is taking time as the driving table which is a view(CRE_WORK_SET_TEST_VW) and the data in that view will change for each run. we do have 2,000,000 records in RAS_HIER table , 20,000,000 records in OVRRD_BOM and the another table OVRRD_COST is having 64 million records. And in the filter where clause it is having the bind variables in the select statement. the values of costper01 to costper18 from select statement are being used in the update set columns.
I want to optimize the below select statement. Any suggestions please ...
Please find the below select statement and I would like to mention that for each run the row coun to be updated in the target table will vary b/n 4000 to 1,000,000 .
Orginal select :
SELECT CB.matnr,CB.ORGUNTY,CB.ORGUNCD,DECODE(CR.CTO_FLAG,'N',CR.PTCSTY,DECODE(PRODTY,:V_OPTION_IND,CR.PTCSTY,DECODE(CR.PTCSTY,'LO','M',CR.PTCSTY))) PTCSTY,
sum(CMPNTQT1 * (COST_MTH1 + NVL(AGGRG_COST_MTH1,0))) COSTPER01,
sum(CMPNTQT2 * (COST_MTH2 + NVL(AGGRG_COST_MTH2,0))) COSTPER02,
sum(CMPNTQT3 * (COST_MTH3 + NVL(AGGRG_COST_MTH3,0))) COSTPER03,
sum(CMPNTQT4 * (COST_MTH4 + NVL(AGGRG_COST_MTH4,0))) COSTPER04,
sum(CMPNTQT5 * (COST_MTH5 + NVL(AGGRG_COST_MTH5,0))) COSTPER05,
sum(CMPNTQT6 * (COST_MTH6 + NVL(AGGRG_COST_MTH6,0))) COSTPER06,
sum(CMPNTQT7 * (COST_MTH7 + NVL(AGGRG_COST_MTH7,0))) COSTPER07,
sum(CMPNTQT8 * (COST_MTH8 + NVL(AGGRG_COST_MTH8,0))) COSTPER08,
sum(CMPNTQT9 * (COST_MTH9 + NVL(AGGRG_COST_MTH9,0))) COSTPER09,
sum(CMPNTQT10 * (COST_MTH10 + NVL(AGGRG_COST_MTH10,0))) COSTPER10,
sum(CMPNTQT11 * (COST_MTH11 + NVL(AGGRG_COST_MTH11,0))) COSTPER11,
sum(CMPNTQT12 * (COST_MTH12 + NVL(AGGRG_COST_MTH12,0))) COSTPER12,
sum(CMPNTQT13 * (COST_MTH13 + NVL(AGGRG_COST_MTH13,0))) COSTPER13,
sum(CMPNTQT14 * (COST_MTH14 + NVL(AGGRG_COST_MTH14,0))) COSTPER14,
sum(CMPNTQT15 * (COST_MTH15 + NVL(AGGRG_COST_MTH15,0))) COSTPER15,
sum(CMPNTQT16 * (COST_MTH16 + NVL(AGGRG_COST_MTH16,0))) COSTPER16,
sum(CMPNTQT17 * (COST_MTH17 + NVL(AGGRG_COST_MTH17,0))) COSTPER17,
sum(CMPNTQT18 * (COST_MTH18 + NVL(AGGRG_COST_MTH18,0))) COSTPER18,
sum (CMPNTQT1 * (STD_COST + NVL(AGGRG_STD_COST,0))) STDCOST,
DECODE(PRODTY,'UN','Y','N') OPTION_FLAG,
WS.FLOOR_SOURCE
from CRE_WORK_SET_TEST_VW WS,
RAS_HIER RH,
OVRRD_BOM CB,
OVRRD_COST_TEST CR
WHERE CR.ORGUNTY = CB.ORGUNTY AND
CR.ORGUNCD = CB.ALT_ORGUNCD AND
CR.MATNR = CB.IDNRK AND
CB.MATNR = WS.MATNR AND
WS.MATNR = RH.SKU_NO(+) AND
WS.ORGUNTY =CB.ORGUNTY AND
WS.ORGUNCD = CB.ORGUNCD AND
WS.LEVELNO = :N_LEVELNO AND
WS.PTCSTY = :V_PTCSTY AND
WS.ORGUNCD = :ORGUNCD AND
WS.ORGUNTY = :V_ORGUNTY AND
WS.BATCH_ID = :TASK_LOG AND
DECODE(PRODTY,:V_OPTION_IND,CR.PTCSTY,DECODE(CR.CTO_FLAG,'N',CR.PTCSTY,DECODE(CR.PTCSTY,'LO','M',CR.PTCSTY))) =:V_PTCSTY AND
WS.SETNO = :N_SETNO
GROUP BY CB.matnr,
CB.ORGUNTY,
CB.ORGUNCD,
DECODE(CR.CTO_FLAG,'N',CR.PTCSTY,DECODE(PRODTY,:V_OPTION_IND,CR.PTCSTY,DECODE(CR.PTCSTY,'LO','M',CR.PTCSTY))),
DECODE(PRODTY,:V_OPTION_IND'UN','Y','N'),
WS.FLOOR_SOURCE
April 30, 2013 - 2:35 pm UTC
do not select, read slow by slow and update small numbers of rows millions and millions of times.
write one single select statement - this select statement will produce the "updated set". Then put create table as select on front of this select and create a new table, drop the old, rename the new.
no undo
no redo (if you want)
all direct path writings, no inefficient caching happening
so, write a single select that joins the data to be updated with the data that does the updating and just create a new table
watch this:
http://tinyurl.com/RWP-DW-PART3 the interesting bit for you is about half way in - but the entire video is relevant to anyone doing some sort of 'ETL' processing like you.
A reader, April 30, 2013 - 4:19 pm UTC
Hello TOM,
I would like to thank you for answering my question.
Could you please show me an example for CTAS .
"write a single select that joins the data to be updated with the data that does the updating and just create a new table " -- correct me if I am wrong regarding this sentence, you want me to use merge here, if not could please elaborate it !!
Answer for example of CTAS
Siby, April 30, 2013 - 7:51 pm UTC
Hi,
In response to the previous question, I think this want Tom is referring to (he has shown this earlier on this page):
<quote>
ok, say you wanted to update emp to set ename = lower(ename). Instead, you could do this:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table emp;
Table dropped.
ops$tkyte@ORA817DEV.US.ORACLE.COM> rename new_emp to emp;
Table renamed.
</quote>
Instead of the simple select, you'd probably have a join depending on the number of tables and the new (updated) values that you want.
how to join these two merge statements
A reader, May 12, 2013 - 6:27 pm UTC
Hi Tom,
I have developed below 2 merge statements for updating parent/child records from the same table. However, I am not sure if there is a way we can join these two merge statements into single merge(or update) statement. Can you please help out here? I have included comments in the 2nd merge statement(where there is a difference from 1st merge statement). For security purposes I scrambled the data.
merge into test1 t1_o
using (select t1.col1, t1.col2, t1.col3, sum(t3.qty) parent_qty
from test1 t1, test2 t2, test3 t3
where t1.col3 = 'TEST' and t2.col1 = t1.col1 and
t2.col2 = t1.col2 and t2.col4=t1.col4 and t4.st_date <= sysdate-4
and t3.col1(+) = t1.parent_col /* Parent */
and t3.col5(+) = t1.col5 and t3.col6 (+) = t1.col6
and t3.col7(+) <= sysdate-2
group by t1.col, t1.col2, t1.col3) i
on (t1_o.col1 = i.col1 and t1_o.col2 = i.col2 and t1_o.col3 = i.col3)
when matched then update set t1_o.parent_qty = i.parent_qty, t1_o.col9='TEST2', t1_o.col10='TEST3';
merge into test1 t1_o
using (select t1.parent_col, t1.col1, t1.col2, t1.col3, sum(t3.qty) child_qty
from test1 t1, test2 t2, test3 t3
where t1.col3 = 'TEST' and t2.col1 = t1.col1 and
t2.col2 = t1.col2 and t2.col4=t1.col4 and t4.st_date <= sysdate-4
and t3.col1(+) = t1.col1 /* col1 instead of parent_col */
and t3.col5(+) = t1.col5 and t3.col6 (+) = t1.col6
and t3.col7(+) <= sysdate-2
group by t1.col, t1.col2, t1.col3) i
on (t1_o.parent_col = i.parent_col and t1_o.col1 = i.col1 t1_o.col2 = i.col2 and t1_o.col3 = i.col3)
when matched then update set t1_o.child_qty = i.child_qty, /* child_qty instead of parent_qty */
t1_o.col9='TEST2', t1_o.col10='TEST3';
May 13, 2013 - 12:57 pm UTC
In general, I'd have to say "no"
what if the first merge updates record X *and then* the second merge updates record X. That would work fine as a two step, it could raise an error as a single merge.
that and you are merging in on different columns, you are joining to t1_o using entirely different criteria on the ON clause - while we could union together the two using statements *maybe* - it would do us no good since you join using three columns in one of them and four columns in the other.
update help
A reader, May 13, 2013 - 8:56 pm UTC
Hi Tom,
Can you please check the below update and provide your valuable suggestions on how it could be done efficiently.
update test_1 o set col1='test1', col2='test2' where exists (
select 1 from test_1 i where col3 = :B1 and col2 = :B2
and col4 = o.col4 and col5 = o.col5 and col6 = o.col6
group by col4, col5, col6 having count(*) > 1)
and col3 = :B1
and col2 = :B2;
Thank you
May 13, 2013 - 11:43 pm UTC
no create
no inserts
no look
also, give information about size of table, probable number of rows modified, how many rows you expect the count(*) to represent when it is updated
Alexander, June 06, 2013 - 3:59 pm UTC
Hi Tom,
I have a pretty large table that developers want to add a new column to, and back fill the new column with default values. Is the fastest way to do this:
create table t_temp as select cols, 1 as "new_col_value"
from t nologging
drop t
rename t_temp to t
put indexes and stuff back...?
INSERT 200000 ROWS
Luis, August 13, 2013 - 11:28 pm UTC
Hi Tom,
I need to insert 200000 rows (result a select) to a table every month, what's the best practics to do that??
INSERT APPEND INTO in_table SELECT * FROM out_table;
Thanks in advance.
August 14, 2013 - 2:55 pm UTC
append would be a hint - so it would be insert /*+ APPEND */
do you delete from this table? or just insert? 200,000 rows is pretty small - if you do deletes and want to reuse that space - you'll want to drop the append hint.
post load update - two times
waran, August 20, 2013 - 2:17 am UTC
hi Tom i am learning from your blog lot. but this is my question. i need your expertise on this. i am loading one table which contain 12 million records every day. after loading i need to do post load update in one column two times. what is the best way to do this. to load the table i am using SP which will automatically load the table daily basis.
Version 11.2
August 28, 2013 - 5:29 pm UTC
the best way:
do not update, perform the update logic during the load. Just do a single load with all of the update logic embedded.
why do you feel the need to update twice? why wouldn't you just insert the data you want????
Update Methods
Todor, August 22, 2013 - 12:10 pm UTC
Hi Tom,
I found an articale where 8 update methods are compared:
http://www.orafaq.com/node/2450 Can we trust the test that are made?
In your opinion, which method schould be used if it's necessary to use dynamic sql for the updates? (11g)
Thank you!
August 28, 2013 - 6:11 pm UTC
they missed the best
do not UPDATE!!!!!! use create table as select to create the modified data. can skip redo, undo, run in parallel, bypass the inefficient buffer cache and do it all in a single statement.
but always bulk up
a single sql statement is almost always the way to go
parallel can benefit sometimes (but not always)
DDL should be considered when you are going to hit most of the blocks anyway (eg: if you are going to update 1% of the rows in a table, and that table has 100 rows per block, and your data is uniformally scattered through the table - you will update every single block - you might as well use DDL for that!!!!)
Any better suggestion for the below code?
Karthik, September 06, 2013 - 8:01 am UTC
Hi Tom
I request you to kindly let me know if there is a better way to write the below code. This code basically updates about 350 million records using BULK UPDATE and FORALL statements. It takes about 32-35 hours for this code to complete the updation. Kindly suggest if there any better ways to update. We are using 20 threads / jobs in parallel to do the updation and still takes the above time.
----------------------
FUNCTION INTF_MAINFL1_UPDATE_T(ID_FROM IN NUMBER, ID_TO IN NUMBER)
RETURN BOOLEAN AS
CURSOR cur_intf_mainfl1 IS
SELECT im.rowid i_rowid, im.im1_sys_id i_seqno
FROM intf_mainfl1 im
where im.im1_sys_id between ID_FROM AND ID_TO;
varid intf_mainfl1.im1_sys_id%TYPE;
varcount NUMBER;
TYPE typ_cur_intf_mainfl1 IS TABLE OF cur_intf_mainfl1%ROWTYPE;
lv_typ_cur_intf_mainfl1 typ_cur_intf_mainfl1;
BEGIN
varcount := 0;
OPEN cur_intf_mainfl1;
LOOP
FETCH cur_intf_mainfl1 BULK COLLECT
INTO lv_typ_cur_intf_mainfl1 LIMIT 2500;
EXIT WHEN lv_typ_cur_intf_mainfl1.COUNT = 0;
FORALL i IN lv_typ_cur_intf_mainfl1.first .. lv_typ_cur_intf_mainfl1.last
-- LOOP
-- BEGIN
-- varid := lv_typ_cur_intf_mainfl1(i).i_seqno;
UPDATE dedup_precook_data dpd
SET dpd.dpd_rowid = lv_typ_cur_intf_mainfl1(i).i_rowid
WHERE dpd.dpd_table_id = 'M'
AND dpd.dpd_uploadid = lv_typ_cur_intf_mainfl1(i).i_seqno;
FORALL i IN lv_typ_cur_intf_mainfl1.first .. lv_typ_cur_intf_mainfl1.last
UPDATE dedup_result dr
SET dr.drm_rowid = lv_typ_cur_intf_mainfl1(i).i_rowid
WHERE dr.drm_table_id = 'M'
AND dr.drm_uploadid = lv_typ_cur_intf_mainfl1(i).i_seqno;
--END LOOP;
COMMIT;
END LOOP;
CLOSE cur_intf_mainfl1;
COMMIT;
RETURN TRUE;
EXCEPTION
WHEN OTHERS THEN
RETURN FALSE;
END;
----------------------
September 09, 2013 - 11:37 am UTC
EXCEPTION
WHEN OTHERS THEN
RETURN FALSE;
END;
I hate your code
search this site for "i hate your code" to find out what I hate about your code...
in general, i would never even consider updating 350,000 rows. I would be looking to use DDL to convert the data (no redo, no undo, parallel easily, fast - big - bulk - NO PROCEDURAL CODE)
committing inside the loop!!!!!!!! ugh, what is that about??? do you want to get ora-1555's, what happens when this code fails? How do you restart?????????????
this should be two create tables as select - one to create a new dedup_result and another to create a new dedup_precook_data
Updateable Join view deprecated ?
Philippe, September 09, 2013 - 12:31 pm UTC
September 10, 2013 - 9:44 am UTC
it is not deprecated, it is ANSI standard SQL syntax. they (the authors of that page) are calling it deprecated because in their opinion, merge is the easy way to express what you want to do (and I sort of agree, but I won't call an updateable join 'deprecated')
Partition exchange problem with primary key constraint
Milind, January 16, 2014 - 6:52 am UTC
Hi Tom,
I am trying for partition exchange as follows which fails with error. (Oracle version 11.2.0.1.0)
SQL> CREATE TABLE pemp
2 (empno NUMBER(4) CONSTRAINT pk_pemp PRIMARY KEY,
3 ename VARCHAR2(10),
4 job VARCHAR2(20),
5 mgr NUMBER(4),
6 hiredate DATE,
7 sal NUMBER(7,2),
8 comm NUMBER(7,2),
9 deptno NUMBER(2) )
10 PARTITION BY RANGE (sal) (
11 PARTITION p1_1250 VALUES LESS THAN (1251),
12 PARTITION p2_3000 VALUES LESS THAN (3001),
13 PARTITION p3_restsal VALUES LESS THAN (MAXVALUE));
Table created.
SQL> INSERT INTO pemp SELECT * FROM emp;
14 rows created.
SQL>
SQL> CREATE TABLE emp_t AS SELECT * FROM pemp WHERE empno IS NULL;
Table created.
SQL>
SQL> INSERT INTO emp_t SELECT empno, ename, job || 'abcd' AS job, mgr, hiredate, sal, comm, deptno FROM pemp PARTITION
2 (p2_3000);
8 rows created.
SQL>
SQL> ALTER TABLE pemp EXCHANGE PARTITION p2_3000
2 WITH TABLE emp_t INCLUDING INDEXES WITHOUT VALIDATION;
ALTER TABLE pemp EXCHANGE PARTITION p2_3000
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
Above thing works if I do not have Primary Key constraint specified on table pemp. Why it doesn't work when Primary key is there? What is remedy for this?
I also referred MOS 1418545.1 however it seems none of the situation is applicable here.
Thanks in advance.
January 16, 2014 - 5:44 pm UTC
you are missing the primary key constraint on EMP_T
ops$tkyte%ORA11GR2> CREATE TABLE emp_t AS SELECT * FROM pemp WHERE empno IS NULL;
Table created.
<b>
ops$tkyte%ORA11GR2> alter table emp_t add constraint emp_t_pk primary key(empno);
Table altered.
</b>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> INSERT INTO emp_t SELECT empno, ename, job || 'abcd' AS job, mgr,
2 hiredate, sal, comm, deptno FROM pemp PARTITION (p2_3000);
8 rows created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> alter table pemp exchange partition p2_3000 with table emp_t with validation update indexes;
Table altered.
Note that when you have enabled constraints like a primary key - without validation "is not valid"
http://docs.oracle.com/cd/E11882_01/server.112/e25523/part_admin002.htm#i1107555 When you specify WITHOUT VALIDATION for the exchange partition operation, this is normally a fast operation because it involves only data dictionary updates. However, if the table or partitioned table involved in the exchange operation has a primary key or unique constraint enabled, then the exchange operation is performed as if WITH VALIDATION were specified to maintain the integrity of the constraints.also, the index to enforce your primary on the PEMP table is going to be a global index (since you are not partitioning the table by the primary key - the index used to enforce the primary key must be globally partitioned or not partitioned at all which is just a special case of a globally partitioned index. So, using update indexes in 11g and before (before 12c's asynchronous global index maintenance) might be appropriate unless you want to rebuild the global index afterwards.
Why CTAS is giving problem?
Milind, January 17, 2014 - 4:11 am UTC
Hi Tom,
Thanks a lot for prompt reply.
Now instaed of creating empty table, adding Primary Key constraint and inserting data, I tried second approach as following i.e. using CTAS and adding Primary key. Now even I have Primary key constraint with status enabled and validated, why it gives the error.
SQL> drop table emp_t purge;
Table dropped.
SQL> create table emp_t as SELECT empno, ename, job || 'abcd' AS job, mgr, hiredate, sal,
2 comm, deptno FROM pemp PARTITION(p2_3000);
Table created.
SQL> alter table emp_t add CONSTRAINT pk_emp_t_empno PRIMARY KEY (empno);
Table altered.
SQL> select CONSTRAINT_NAME, STATUS, VALIDATED from user_constraints where table_name='EMP_T';
CONSTRAINT_NAME STATUS VALIDATED
------------------------------ -------- -------------
PK_EMP_T_EMPNO ENABLED VALIDATED
SQL> ALTER TABLE pemp EXCHANGE PARTITION p2_3000
2 WITH TABLE emp_t WITH VALIDATION UPDATE INDEXES;
ALTER TABLE pemp EXCHANGE PARTITION p2_3000
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION
Thanks in advance
Why CTAS is giving problem?
Milind, January 17, 2014 - 4:17 am UTC
Hi Tom,
Sorry to bother you. I got it. In emp_t table with CTAS, job column size become varchar2(24) because of job || 'abcd' . In pemp table it is varchar2(20).
When column size is corrected, it worked.
SQL> alter table emp_t modify job varchar2(20);
Table altered.
SQL> ALTER TABLE pemp EXCHANGE PARTITION p2_3000
2 WITH TABLE emp_t WITH VALIDATION UPDATE INDEXES;
Table altered.
Warm Regards.
Issue while Updating 100k records in a 10 Million table
Pa1, February 03, 2014 - 10:17 am UTC
Hi Tom,
Need your expertise in help resolve this problem we are facing. We have a table with 10 millions records and need to update this table with two new column values in a batch of 100k records. While doing the initial 100k we didnt see any issues but as we are trying to process further batches of 100k records, its taking considerable amount of time. We have indexes on this table which is used by the Update query. Is it causing the problem. Can we halt the indexing on this table during the bulk update and rebuild it again. Will this help?
A reader, February 14, 2014 - 8:20 pm UTC
Drop the index. Load the data and then create the index. That should be faster.
Oracle Magazine July/August 2014
Sokrates, June 25, 2014 - 3:25 pm UTC
just saw your article in Oracle Magzine July/August 2014 regarding the topic "large updates with CTAS".
One should probably add, that the algorithm given there has to be extended in case the table has virtual columns ( and that this might only work in 12c ):
1a. CTAS the new table without virtual columns ( because CTAS does not support them )
1b. add virtual columns
1c. ( only works in 12c ) if necessary, re-order virtual columns by setting columns invisible/visible again
records of atable
y.navya, July 10, 2014 - 9:21 am UTC
Hiiii tom
how to update all the records of a table in database
Shell script used to delete
Shuja, September 11, 2014 - 3:10 pm UTC
Hi Tom,
I was faced with a similar problem and considered using the create a new table and drop the existing. However I had already put a shell script in place and cron'd to run every 3 hours.
It seems to be working well, but I dont know if this is the most efficient method to tackle this in future purging of data.
Script tbl_clean.sh :-
#!/usr/bin/sh
clear
#-- Set Month and Row limit to delete
month="MAY"
limitRow=6
#-- Function to get total count for each table thats imported from the "table" file
getCount() {
sqlplus -S /nolog << EOF
connect / as sysdba
SELECT count(*) FROM $dataline
WHERE importdate LIKE '%$month%';
exit
EOF
}
#-- Function to delete X rows from the imported table list and commit the change
deleteRows() {
sqlplus -S /nolog << EOF
connect / as sysdba
DELETE FROM $dataline
WHERE IMPORTDATE LIKE '%$month%'
AND rownum < $limitRow;
commit;
EOF
}
cat tables | while read dataline
do
rows="$(getCount | sed 's/[^0-9]*//g')"
echo "Table " $dataline " has " $rows "rows"
if [ "$rows" = 0 ]
then
echo "No rows to delete\n"
else
echo $(deleteRows) "\n"
fi
done
Rgds,
Shuja
Intermediate Commit in direct load
Samby, November 18, 2014 - 6:46 pm UTC
Hi Tom,
I am working in Dataware housing project in which we insert 30-40 millions of record in a table using append hints (Direct load) from a select query
like
insert /*+ append */ into temp
select *
from all_objects;
I want to commit after 10k records, could you please suggest is there any way to do that without using cursor,loop,bulk collect.
Many Thanks in Advance
Samby
bulk update using merge and analytic functions
santosh, January 06, 2015 - 5:22 pm UTC
Is it advisable to merge update a table from another which stores some values which need to be extracted using last_value ordered by a date column ?
Is there a faster way to do this.
How to Update millions of records in a table
Shweta, February 25, 2015 - 6:17 pm UTC
Hi Tom,
What if there was an "Identity Column" in old table and I created the table "AS" instead of updating the same table? After the new table is created, would the identity column behave same when new records are added?
Update 5 millions rows using info in another table
Lang, March 12, 2015 - 2:07 pm UTC
Hi Tom,
I have table1 with 5 million rows that I need to update one column with info from table2 with 700 rows. For each record in table2, I need to go through all rows in table2 to find a match. What's the best way to do it. I am doing it in SQL Developer, Oracle 11g. Running for 40 hours now, no end in sight:
BEGIN
For i in (select id, inspectiondate, eventid from inspections)
LOOP
Update readings set inspectionid = i.eventid, modifiedby=''
where inspectionid is null and stationid= i.id and readingdate=i.inspectiondate;
END LOOP;
END;
We really need to get this done very soon!
Thanks a lot!
Jenny, April 28, 2015 - 9:26 am UTC
Is there a difference between doing a CTAS vs IAS for this usecase ?
Doing a CTAS requires me to re-create all indexes, triggers, constraints etc, instead can we just
1) make a copy of the original table
2) disable all indexes, triggers, constraints on original table
3) truncate original table, do IAS from temp table with modified columns to original table
4) enable all indexes, triggers etc on original table
Thanks Tom
Sunil Kumar Choubey, October 09, 2016 - 7:36 pm UTC
Thanks for this wonderful conversation
October 10, 2016 - 5:32 am UTC
Thanks for the feedback
Oracle Script
Prashant, May 06, 2019 - 9:17 am UTC
How to create script for drop tables older than 45 days from schema?
May 06, 2019 - 4:38 pm UTC
What do you mean by "older" ?
Data? Created ? Other?
Oracle Script
A reader, May 07, 2019 - 2:40 pm UTC
tables created date
May 10, 2019 - 8:06 am UTC
The CREATED column in *_objects tells you when the object was made:
select * from all_objects
where object_type = 'TABLE'
and created < sysdate - 45
and owner = 'THE_SCHEMA'
Thanks for the Excellent Briefing.
Khalid Mehmood, June 25, 2019 - 4:29 am UTC
Thanks for the Excellent Briefing.
Although I read this post after I was in the middle of Merging a huge table and was regretting but this post will surely help me for future tasks.
Rizwan Admani
Rizwan Admani, December 07, 2020 - 12:38 pm UTC
DECLARE
CURSOR ONE IS
SELECT EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
FROM EMP
minus
SELECT EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
FROM EMP2;
emp number;
emp1 number;
BEGIN
GO_BLOCK('BUTTON');
FIRST_RECORD;
for x in one
LOOP
select count(empno)
into emp
from emp2;
select count(empno)
into emp1
from emp;
fetch one into
:EMPNO,:ENAME,:JOB,:MGR,:HIREDATE,:SAL,:COMM,:DEPTNO;
if emp = emp1 then
null;
else
insert into emp2(EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
values(:EMPNO,:ENAME,:JOB,:MGR,:HIREDATE,:SAL,:COMM,:DEPTNO);
commit;
:per:=nvl(:Per,0)+1;
end if;
EXIT WHEN one%NOTFOUND;
END LOOP;
END;
December 08, 2020 - 5:11 am UTC
That is some nice Forms code there.... I'm not sure what its for, but thanks for posting ?
2022 Update
Georgi, February 23, 2022 - 11:54 am UTC
Hello,
Since we are in the 2022 now and times and technology has changed, I am wondering what is the up to date answer to the original question ?
To be more specific:
In our DWH, we need to update a column in approx. 400 tables split in different schemas, containing records ranging from 1,000,000 to 10,000,000,000 (yes, 10 billion...).
We have 10 tables with more than 1 billion, 40 with less than 1 billion and more than 100 million. And around a hundred with less 100 million and more than 1 million.
And table recreation is not an option...
The initial idea was to use dbms_parallel_execute and run the update one schema at a time. i did some reading and saw that in newer versions of Oracle, the parallel hint can be used even on a table that has no parallel in its definition or that has no partitions.
We have sufficient amount of resources, so would it be a good idea to use the integrated parallelism and in general which is the best approach ?
Thanks !
May 23, 2022 - 10:20 am UTC
If you're updating a significant fraction of the rows in a large table, the CTAS method will still be the fastest. The thresholds for what counts as "significant" and "large" are something you'll need to test in your environment!
Yes, you can use a parallel hint to run an update in parallel like this:
update /*+ parallel */t
set ...
(Ensure that parallel DML is enabled though)
The big benefit of this is it's easy to write. The downsides are:
- You're locking all the affected rows for the duration of the statement. This could be a problem if other sessions need to update the rows at the same time.
- It's all-or-nothing. So if the process is 90%+ complete and hits an error, the whole thing rolls back. You can use DML error logging to mitigate this.
In contrast, there's more code to write with DIY parallelism, making it more fiddly to use. The upside is you're processing rows in chunks. You can limit how many chunks are processed at the same time. So you're not locking the whole data set the whole time.
That said, there is an easier way - DBMS_REDEFINITION.EXECUTE_UPDATE. This is essentially a wrapper for an online CTAS method.
All you need is:
exec dbms_redefinition.execute_update ( 'update ... set ... ' );
I discussed this and other update tuning methods in this Office Hours session:
Further on 2022
Georgi, February 24, 2022 - 8:23 am UTC
Thanks for the quick and detailed reply !
I've watched your video and today I will do some testing with EXECUTE_UPDATE, so thanks !
The task is to add a new column for the insert date of the record to those ~400 tables, which have to be taken (updated) from another column in the same table. This column is different for most of the tables and for some it does not even exist, so in this case we will use SYSDATE. All records should have it, so everything should be updated.
This is needed because a regular purging process will be implemented later based on that insert date column. And we will have a maintenance window of around 4 hours, so locking rows is not an issue.
We do not want to use CTAS, because we have to do it for ~400 tables and there are lots of indexes, constraints, partitions, dependent objects etc.
For our tests we used dbms_parallel_execute and it works fine in our test environment, but it has significantly less amount of data.
Any additional comments based on the above is appreciated. Thanks !
February 24, 2022 - 4:09 pm UTC
For the tables without an existing column to copy use:
alter table ...
add insert_date date default sysdate not null;
This will do a "fast column add", instantly setting its value for all existing rows.
For the tables where you are copying from an existing column - my big question is:
Why bother?
If you need/want a consistent column name across all tables, there are a couple of tricks you could use to avoid all the updates.
First is to rename the existing column:
alter table ...
rename column old_name to new_name;
If this will cause lots of code change headaches, another is to "clone" the column using views or virtual columns, e.g.:
create view ... as
select t.*, old_name as new_name
from ... t
Finally if you want to minimize downtime, you could look at using Edition-based Redefinition to do the change fully online.
Thanks
Georgi, February 25, 2022 - 2:11 pm UTC
Thank you so much, Chris !
The fast column add is really helpful ! I read about it and tried it, and it will save sufficient amount of time for those tables without any date column. Really appreciated !
To your question... For the other tables, the value in the date column can change, so it does not guarantee the age of the record. In this case we cannot create a view, rename the column, or add a virtual one based on it. This is why we have to add a new column, update it with whatever other date we have at that point in time and consider it as the creation date of the record, so that we can later delete based on it.
I guess that's life working with legacy applications :)
February 28, 2022 - 4:10 am UTC
Just be aware that to enable fast add with add a hidden column to your table. This is *unlikely* to cause any issues, but you might some care with thinks like 'alter table shrink' or partition maintenance should that need ever arise
A reader, August 12, 2022 - 12:09 am UTC
What if i cannot delete the table, like it has lots of dependencies. I want to update the table with millions of rows.
August 12, 2022 - 3:05 am UTC
check out dbms_parallel_execute for that