updating millions of records
Om, November 11, 2002 - 11:09 am UTC
Hi Tom,
Could you please elaborate on the statement
CREATE TABLE new_table as select <do the update "here"> from old_table;
with a suitable example?
November 11, 2002 - 5:45 pm UTC
ok, say you wanted to update emp to set ename = lower(ename). Instead, you could do this:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table emp;
Table dropped.
ops$tkyte@ORA817DEV.US.ORACLE.COM> rename new_emp to emp;
Table renamed.
Million records update
Ramesh G, November 11, 2002 - 1:44 pm UTC
What if a table has over 100 million records and I only want to update 1 million? If your method is still applicable, could you elaborate on it?
Many thanks in advance.
November 11, 2002 - 6:33 pm UTC
most likely -- yes. I don't have a 100 million row table to test with for you but -- the amount of work required to update 1,000,000 indexed rows is pretty large. Fortunately, you are probably using partitioning so you can do this easily in parallel -- bit by bit.
"How to Update millions or records in a table", version 8.1.7
John Bittner, November 11, 2002 - 2:21 pm UTC
This is absolutely a viable approach, and one we have used repeatedly. One of our apps updates a table of several hundred million records.
The cursor..For loop approach for the update was calculated to take 53.7 years to complete!
We instituted the insert into a dummy table with APPEND and nologging, and were able to complete the "update" in under 30 minutes.
With nologging, if the system aborts, you simply re-run the 'update' again, as you have the original data in the main table. When done, we swap the partition of original data with the 'dummy' table (the one containing new values), rebuild indexes in parallel, and voila! Our update is complete.
i.e, to update field2 in a table:
1) First create your dummy hold table: create table xyz_HOLD as select * from xyz where rownum<1. Alter table xyz_hold nologging.
2) insert /*+ append parallel (xyzhold,12) */ into xyz_hold xyzhold (field1, field2, field3) select /*+ parallel (x,12) */ x.field1, my_new_value_for_field2, x.field3 from xyz x where blah blah blah.
3) when done, either rename the table, or swap the partition if your original table is partitioned and you only updated one partition, as we do. Obviously you need to rebuild indexes, etc. as required.
Hope this helps!
updating millions of records
A.Ashiq, November 11, 2002 - 11:28 pm UTC
Hi Tom,
As you suggested, instead of updating a table with millions of records we create a new table, then drop the original table and rename the new table to the original name. But what happens to dependent objects -- everything will get invalidated. Of course, they will recompile themselves when called the next time, but then they (the dependent objects) have to be parsed again. Is that OK?
November 12, 2002 - 10:11 am UTC
It is OK.
in case of delete
A reader, November 12, 2002 - 9:58 am UTC
We have a similar situation. We delete around 3 million records from a 30 million row table every day.
There is no logical column to partition on.
I guess the insert into a new table will take considerable time with 27 million records. Please let me know what the best approach is.
November 12, 2002 - 10:35 am UTC
wait 10 days so that you are deleting 30 million records from a 60 million record table and then this will be much more efficient.
Time it some day though. 3 million records on an indexed table will take considerable time. There is a chance that INSERT /*+ append */ select <rows to keep> done nologging, parallel with create indexes in parallel could best the delete.
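A rough sketch of that alternative (the table, index and predicate names here are placeholders, not from the post above):

create table t_keep nologging as select * from t where 1=0;

insert /*+ append */ into t_keep
select /*+ parallel(t,4) */ * from t where <rows to keep>;

create index t_keep_idx on t_keep(some_col) nologging parallel 4;

-- then drop t, rename t_keep to t, and re-add grants/constraints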
A reader, November 12, 2002 - 12:43 pm UTC
Tom,
Recently I conducted an interview in which one of the DBAs mentioned that they had a table that might contain 10 million records or might contain 1 million. He meant that they delete the records and some time later the table gets populated again, and vice versa.
Tom, in your opinion, would you consider partitioning such tables, and if so, which type of partitioning?
Thanks.
November 13, 2002 - 12:31 pm UTC
hard to tell -- is the data deleted by something that is relatively constant (eg: the value in that column doesn't change -- so the row doesn't need to move from partition to partition)? If so -- sure, because we could just drop partitions (fast) instead of deleting the data.
Agree with John - insert append / CTAS / partition swap is the only way to fly
Jack Silvey, November 12, 2002 - 1:19 pm UTC
I work with John Bittner, one of the previous reviewers. I second what he said absolutely. It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql.
addendum
A reader, November 13, 2002 - 10:39 am UTC
This process was introduced to our environment by a master tuner and personal friend, Larry Elkins. This was a totally new paradigm for the application, and one that saved the entire mission-critical application. The in-place updates would not have worked with the terabytes of data that we have in our database.
How to Update millions of records in a table
Boris Milrud, November 19, 2002 - 1:55 pm UTC
In response to the Jack Silvey (from Richardson, TX ) review, where he wrote "It is the only way to fly. We also have an absolutely incredible stored procedure that rebuilds all of our indexes concurrently after the load, using the Oracle job scheduler as the mechanism of allowing separate threads in pl/sql":
Could you provide more information about that procedure and how to rebuild multiple same-table indexes concurrently using Oracle job scheduler?
Thanks,
Boris.
November 19, 2002 - 5:29 pm UTC
instead of
begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;
you can code:
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i and before set job_queue_interval to say 60 or so and there you go.
How to Update millions of records in a table
Boris Milrud, November 19, 2002 - 9:10 pm UTC
Thanks, Tom.
Your status said that you had a large backlog, so I decided not to wait for your response and tried it myself using dbms_job.submit() calls. At first it did not work (job_queue_processes was 0), but after I set it to 12 it started working. The only difference between your code and mine is that I issue just one commit at the end. It should not matter, right?
I selected a 1 mln. row table and rebuilt 5 non-partitioned indexes with the 'compute statistics parallel nologging' clause.
Here are the numbers I've got: rebuilding the indexes sequentially consistently took 76 sec., while using dbms_job.submit() calls took around 40 - 42 sec.
I said "around" because the technique I used may not be perfect, though it served the purpose. I recorded the time right after the commit statement at the end of the PL/SQL block - that's the start time. Then I kept querying the user_jobs view every 2 - 3 sec. until the last of the 5 jobs was gone. That was the end time.
The last question on this topic: is the user_jobs view the right place to look in order to determine that the rebuilding is done and how long it took? In the package I am writing, I do a massive delete operation, then rebuild the indexes, then start the next routine. What would be the best way to detect the end of the rebuilding, in order to proceed with the next call?
Thanks.
Boris.
November 19, 2002 - 10:14 pm UTC
You can use user_jobs or dba_jobs but -- you might just want to put some "logging" into your jobs themselves so you can monitor their progress and record their times.
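One possible shape for that kind of logging (just a sketch; the log table and index names are made up, this is not code from the thread):

create table rebuild_log( index_name varchar2(30), start_time date, end_time date );

declare
    l_job number;
begin
    dbms_job.submit( l_job,
        'declare l_start date := sysdate;
         begin
             execute immediate ''alter index idx1 rebuild'';
             insert into rebuild_log values ( ''IDX1'', l_start, sysdate );
             commit;
         end;' );
    commit;
end;
/

user_jobs (or dba_jobs) then shows what is still queued or running, and the log table shows what finished and how long it took.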
You can even rebuild all the indexes in a partition simultaneously
ramakrishna, November 21, 2002 - 12:50 am UTC
Thanks Tom,
In our environment, we have partitioned tables and we use:
ALTER TABLE table_name MODIFY PARTITION part_name REBUILD UNUSABLE LOCAL INDEXES
and this rebuilds all the indexes in the partition at one shot. We found this much faster than doing the indexes one by one. We will now try out if we can submit multiple such jobs in parallel (one for each partition of the table).
regards
ramakrishna
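A sketch of submitting one such job per partition (the table name is hypothetical):

declare
    l_job number;
begin
    for p in ( select partition_name
                 from user_tab_partitions
                where table_name = 'MY_FACT_TABLE' )
    loop
        dbms_job.submit( l_job,
            'execute immediate ''alter table my_fact_table modify partition ' ||
            p.partition_name || ' rebuild unusable local indexes'';' );
    end loop;
    commit;
end;
/

As above, job_queue_processes controls how many of those jobs actually run at the same time.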
in case of deletes
carl, November 29, 2002 - 5:17 pm UTC
Hi Tom,
Thanks so much for your web site and help. It is our
number 1 reference in times of fear and loathing.
This is what we came up with concerning mass updates. Table sizes:
INV     50M rows
INVINS  10M rows
INVDEL   7M rows
There are indexes on INV.KEY and INVDEL.KEY
Execution Plan ( for deletes and updates )
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'INV'
3 1 INDEX (RANGE SCAN) OF 'IX_INVDEL01' (NON-UNIQUE)
alter table INVTMP nologging;
-- INVINS contains delta inserts and updates
insert /*+ APPEND */
into INVTMP
select *
from INVINS t1;
-- INVDEL contains delta deletes and updates
insert /*+ APPEND */
into INVTMP
select *
from INV t1
where not exists ( select null
from INVDEL t2
where t2.KEY = t1.KEY );
alter table INVTMP logging;
drop table INV;
rename INVTMP to INV
-- build indexes etc
This is what we came up with and it is the fastest approach we've tested.
Any comments or suggestions are welcome and appreciated.
November 30, 2002 - 8:49 am UTC
I might try an anti-join on the NOT EXISTS and do the work in a single parallel CTAS:
create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;
add parallel /nologging as needed.
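For example (degree 8 is only a placeholder):

create table invtmp parallel 8 nologging as
select /*+ parallel(invins,8) */ *
  from invins
UNION ALL
select /*+ parallel(t1,8) */ t1.*
  from inv t1, invdel t2
 where t1.key = t2.key(+)
   and t2.key is null;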
in case of deletes - many thanks
carl, December 01, 2002 - 7:45 pm UTC
Ran the test cases at home: 500K rows in INV and 50K rows in INVINS and INVDEL.
My way: 16.33 secs
Your way: 08.35 secs
Many thanks.
A bit confused
Choche, March 17, 2003 - 10:30 am UTC
I was just wondering why none of the reviews mentioned whether these techniques could be applied in a multi-user environment where multiple users could be updating the same table at the same time.
March 17, 2003 - 10:32 am UTC
sorry -- I thought it obvious that in most cases "no" is the answer. We are copying the data or locking excessive amounts of it -- or disabling indexes and the like.
This is a "batch" process here, to update millions of records.
Replacing PK values
Michael, July 25, 2003 - 6:42 pm UTC
We need to replace the main ID for all entries in the database; it is the PK in about 100 tables, the largest of which have about 2 mln rows. New and old values for the ID are stored in a lookup table, and there are about half a million of them. The original approach was to replace the PK in all tables for one ID value, commit the changes and move on to the next ID. This way referential integrity would be guaranteed even if the update fails at any stage, and we would not need long rollback segments and could restart the process at the point it aborted. But this update is too lengthy, and in our case would take almost a month of continuous processing. Do you have any suggestions on how to improve the performance of this critical update? Would it be better to add the new ID as an additional column to one table at a time and populate all of it at once with "create table ... as select" (with the nologging option), and then rename the table and drop the old ID column?
July 25, 2003 - 8:42 pm UTC
sounds like I would:
create table new_table as select <join>;
create table new_table2 as select <join>;
etc etc etc etc
drop old tables
add indexes
add primary/fkeys/constraints etc....
that is -- rebuild in bulk, using nologging (no redo, no undo generated), using parallel when appropriate.
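For one of the tables, that rebuild might look something like this (only a sketch -- the lookup table id_map(old_id, new_id) and the column names are assumptions):

create table child1_new nologging parallel 4 as
select m.new_id id,        -- swap in the new key value
       c.col1, c.col2      -- carry the remaining columns along unchanged
  from child1 c, id_map m
 where c.id = m.old_id;

-- then drop child1, rename child1_new to child1,
-- and re-create the indexes, primary/foreign keys, constraints and grants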
Update million Rows
Reaz, July 26, 2003 - 5:41 am UTC
Dear Tom,
We have a situation where we load data from 7 external files. Next we have to run a transformation process, which checks if the row exists in the target table and updates the existing row with the new one; otherwise it inserts the row into the target table.
Our current process is very slow, sometimes taking a whole night to complete for, say, 200,000 rows.
The procedures discussed so far are meant for updating only, but in our case we need to update and insert too.
I would appreciate your suggestion.
Thanks.
Reaz.
July 26, 2003 - 12:53 pm UTC
read about MERGE in the sql reference guide.
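A skeletal MERGE for that kind of upsert (the table and column names here are made up, not from the question):

merge into target_t t
using staging_t s
on ( t.id = s.id )
when matched then
    update set t.col1 = s.col1, t.col2 = s.col2
when not matched then
    insert ( id, col1, col2 ) values ( s.id, s.col1, s.col2 );

Load the 7 flat files into the staging table (sqlldr or an external table) and a single MERGE replaces the row-by-row check/update/insert logic.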
A single batch?
A reader, July 26, 2003 - 9:54 am UTC
Tom,
Is it still okay if these steps (create a new table, drop the old one, then rename the new table as the old, create indexes etc.) are coded as the batch update program? Or would you suggest they should be detailed as steps to be performed under the supervision of a DBA? What I would like to know is: can we still have the steps in a single batch routine (as an update would have been), or instead of a single batch should these be DBA instructions?
Thanks
July 26, 2003 - 12:54 pm UTC
I would have these performed under the watchful eye of someone.
it is a one time bulk manipulation. It won't be a "pretty program" with tons of graceful logging/error handling. You'll be doing it "down and dirty" (at least I would be). It would need someone to watch over it.
What about space considerations
A reader, July 28, 2003 - 1:39 am UTC
Hi Tom,
I had a similar problem where my table had grown big and I wanted to truncate some old data. The table had more than 20 million records, and I could not use the create-new-table/truncate/insert flow as there was not enough space in the database to hold 2 copies of the large table. I thought of partitioning the table and then truncating one partition of it, but that does not work.
Can you suggest some way which caters for this space constraint as well?
July 28, 2003 - 7:04 am UTC
why didn't the partitioning work for you?
How to update part of a Very Large Table
Mohan K, July 28, 2003 - 6:11 am UTC
The update operation can be made simple by separating out the query and update operations. To do that, create another table which stores the rowids of the records in the original table that have to be updated, along with the value to be updated. Then run a PL/SQL script to update the records one by one. To test the above method the following scripts can be used.
SQL> create table test_update(id number, name varchar2(100), description varchar2(4000)) storage (initial 48M next 4M);
SQL> declare
v_n number;
v_name number;
v_desc number;
i number;
begin
for i in 1..1000000 LOOP
insert into test_update(id, name, description) values(i, 'Test Name'||i, 'Test Name '||i||' description ');
END LOOP;
end;
/
Elapsed: 00:04:277.23
The above script will insert 1 million rows.
SQL> select count(*) from test_update where description like '%5 description%';
COUNT(*)
----------
100000
Elapsed: 00:00:02.63
SQL> create table test_update_rowids as select rowid rid, description from test_update where description like '%5 description%';
Elapsed: 00:00:54.58
The table test_update_rowids stores the rowids and the new values that have to be updated, i.e. 100000 rows need to be updated.
SQL> declare
begin
for c1 in(select rid, description from test_update_rowids)
LOOP
update test_update set description=c1.description||' after update' where rowid=c1.rid;
END LOOP;
end;
/
Elapsed: 00:01:82.17
The above script performs the updation.
Mohan
July 28, 2003 - 7:37 am UTC
the above script performs an "updation" in the worst possible way, consuming the most resources.
it should have been simply:
update test_update set description = description || ' after update ' where description like '%5 description%';
Never do procedurally that which can be done in a single sql statement.
Updating 100K of 1 Million rows
Mohan K, July 28, 2003 - 8:38 am UTC
I ran the update statement as a single SQL statement and the elapsed time is slightly more than with the above method. It has to be tested against update statements containing complex queries or join operations.
SQL> update test_update set description = description || ' after update ' where description like '%5 description%';
Elapsed: 00:01:100.13
Mohan
July 28, 2003 - 8:53 am UTC
trust me -- that difference is negligible and the amount of time you spent writing code wasn't included.
Also, you FORGOT to add in the time to populate your temporary table -- no?
if you can do it in a single sql statement (and you almost always can) you SHOULD.
Let's do your benchmark a little better:
ops$tkyte@ORA920> create table test_update(id number, name varchar2(100), description varchar2(4000));
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> insert /*+ append */
2 into test_update
3 select rownum, 'Test Name' || rownum, 'Test Name ' || rownum || ' description '
4 from big_table.big_table
5 where rownum <= 1000000;
1000000 rows created.
ops$tkyte@ORA920> commit;
Commit complete.
ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
2 from test_update
3 where description like '%5 desc%' ;
COUNT(*)
----------
100000
ops$tkyte@ORA920>
ops$tkyte@ORA920> select count(*)
2 from test_update
3 where description like '%6 desc%' ;
COUNT(*)
----------
100000
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_start
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920> update test_update
2 set description = description || ' after update'
3 where description like '%6 description%';
100000 rows updated.
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_middle
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920> create table test_update_rowids
2 as
3 select rowid rid, description from test_update where description like '%5 description%';
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> begin
2 for c1 in(select rid, description from test_update_rowids)
3 LOOP
4 update test_update
5 set description=c1.description||' after update'
6 where rowid=c1.rid;
7 END LOOP;
8 end;
9 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> exec runstats_pkg.rs_stop
Run1 ran in 1376 hsecs
Run2 ran in 9806 hsecs
run 1 ran in 14.03% of the time
so, according to me, the SINGLE (efficient, effective, correct way) update statement runs in about 14% of the time. Furthermore:
Name Run1 Run2 Diff
STAT...recursive cpu usage 5 4,662 4,657
STAT...CPU used by this sessio 448 5,397 4,949
only 4,949 MORE cpu seconds using the row by row approach...
STAT...CPU used when call star 448 5,397 4,949
STAT...free buffer inspected 10,162 17,039 6,877
STAT...db block gets 111,705 104,450 -7,255
STAT...dirty buffers inspected 9,016 16,620 7,604
STAT...calls to kcmgas 7,755 68 -7,687
STAT...Cached Commit SCN refer 7,718 0 -7,718
STAT...switch current to new b 7,719 0 -7,719
STAT...cleanouts only - consis 0 7,720 7,720
STAT...immediate (CR) block cl 0 7,720 7,720
STAT...commit txn count during 0 7,721 7,721
STAT...cleanout - number of kt 3 7,726 7,723
STAT...consistent gets - exami 3 7,760 7,757
LATCH.undo global data 221 7,985 7,764
STAT...redo entries 100,700 108,727 8,027
LATCH.redo allocation 100,795 108,945 8,150
STAT...db block changes 202,723 210,907 8,184
STAT...Elapsed Time 1,385 9,825 8,440
STAT...physical reads 7,894 16,476 8,582
LATCH.checkpoint queue latch 26,845 38,012 11,167
LATCH.cache buffers lru chain 35,638 52,827 17,189
STAT...no work - consistent re 7,741 100,072 92,331
STAT...table scan blocks gotte 7,719 107,721 100,002
STAT...table scan rows gotten 1,000,000 1,100,002 100,002
STAT...execute count 27 100,092 100,065
STAT...buffer is not pinned co 7,763 107,836 100,073
STAT...calls to get snapshot s 56 100,241 100,185
LATCH.shared pool 172 100,778 100,606
STAT...session logical reads 119,503 220,165 100,662
STAT...consistent gets 7,798 115,715 107,917
LATCH.library cache pin 233 200,865 200,632
STAT...recursive calls 177 200,870 200,693
LATCH.library cache 332 201,322 200,990
LATCH.cache buffers chains 587,377 835,167 247,790
STAT...redo size 31,323,804 31,836,904 513,100
STAT...session pga memory max 0 524,824 524,824
STAT...session pga memory 0 1,049,112 1,049,112
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
754,602 1,553,986 799,384 48.56%
PL/SQL procedure successfully completed.
and worse, it used 2x the number of latches! (scalability inhibitors)
There is nothing good about row-at-a-time processing; it is something to do ONLY when a set-based operation isn't feasible.
This is a very good example of updating records
Bipin Ganar, July 28, 2003 - 9:00 am UTC
Hi Tom,
Yes, this is a good example of updating records, but my problem
is that I have 1 million records in one table. I want to update and give a hike of 10% to their salaries, but that table is accessed by any number of people at the same moment. I cannot drop the table and update, and second, how can I define the tablespace in this script?
Create Table a as select * from emp_sal;
July 28, 2003 - 9:10 am UTC
create table a
tablespace foo
as
select * from emp_sal;
slightly different problem!
Srikanth Adiga, July 28, 2003 - 12:33 pm UTC
My table:
5M records are inserted each month.
Operations:
5M records inserted on the first day of the month
5M records updated on the rest of the days of the month.
Note that this update updates each row with different values.
I am using OCI with batch (array) operations. Can you please suggest what else I can do within my program to improve the performance of my updates? I tried multithreading, but that did not help.
anything else?
July 28, 2003 - 12:48 pm UTC
it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be something more involved here.
I'd be able to do that in minutes -- why is it taking you so long? What is your bottleneck?
space considerations
A reader, July 28, 2003 - 8:56 pm UTC
I could not create another temp table holding the same data as the table being partitioned, as there was not enough space on the hard disk to hold 2 (or 1.5, to be precise) copies of the same huge table. Is there a way to partition without having to drop or truncate or transfer the data to some temp location and then partition?
July 29, 2003 - 6:52 am UTC
disk is cheap.
time is money.
your time spent just thinking about this problem cost much more than the disk to solve it would.
penny wise, pound foolish. How big is this 20 million row table (assuming you are the same "a reader" from above)? Maybe 1-3 gig?
slightly different problem!
Srikanth Adiga, July 29, 2003 - 12:30 am UTC
couple of minutes!!! Oh dear, that's really too good.
Let me reiterate my case:
My table schema is something like this:
SubscriberVsQtyTable:
int client-id
int year
int month
int quantity
primary key - (client-id, year,month)
This table has 5M client ids.
Now every day, I have to update the quantity field with the txns made for that day. Note that I have to update one subscriber at a time since the number of txns made by each subscriber is different.
So, even now do you believe you can do this in a couple of minutes? My app takes about 30 to 50 mins even on bigger machines like an 8 CPU box.
As I said, this is a C++ app, running Solaris + Oracle 8i/9i. We are using OCI.
Thanks,
July 29, 2003 - 7:01 am UTC
if you are doing that work with more than a single sql statement, I could see it taking a while.
If you do that work in a single update, no, I don't see it taking more than a couple of minutes.
Time for Update
Mohan K, July 29, 2003 - 8:27 am UTC
It is my understanding that the time for an update depends on the following factors. There may be other things involved as well.
1. Row chaining/migration caused by the update
2. The where clause used to select the rows to be updated.
3. The percentage of rows updated in a block. The more the better. Even if one row is updated in a block containing 10 records, the entire block has to be written by DBWR and LGWR.
So the number of records alone is not what matters. 100 million records may take days for one table and may take less than an hour for another table with few columns.
Mohan
July 29, 2003 - 8:31 am UTC
1) updating a quantity (number) field is unlikely to do that.
2) correct
3) correct
no, I disagree with the last part. To update 5 million rows, a single unindexed column, should not take "a really long time". I was answering their direct question -- they update every row, on every block, so #1 and #3 really don't come into play. The biggest factor is #2 here.
Verifying if necessary to update, before update
juan carlos reyes, July 29, 2003 - 9:09 am UTC
Hi Tom, I have seen several times that you can
increase performance by evaluating whether you really need to update -- or maybe it was only my impression.
For example
Update table set income = 500
where level = 7
AND NOT INCOME = 500
The AND NOT INCOME = 500 allows you to avoid unnecessary updates;
my point is a question:
How advisable is it -- does it really increase performance?
As you are always thinking about how to increase performance, it could be a good idea in a newer version of Oracle that if Oracle verifies the value is already set to that value, it does not update it.
:)
July 29, 2003 - 11:07 am UTC
if there are more than a handful of records where income = 500, then yes, this will definitely increase performance by
a) decreasing the latching
b) decreasing the redo
c) decreasing the undo
we cannot make that optimization for you -- it would stop triggers and other EXPECTED side effects from happening.
A reader, July 29, 2003 - 11:42 am UTC
:) Thanks,
Maybe then it could be an optional feature,
an UPDATE_ONLY_IF_DIFFERENT hint.
I think it could interest several people
who don't use triggers, constraints, etc.
Stu Charlton, July 29, 2003 - 2:28 pm UTC
"Note that, I have to update one subscriber at a time since the number of txns made by each subscriber if different."
That still seems doable in 1 SQL statement, even if you wanted to merge the updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?
July 29, 2003 - 2:38 pm UTC
I agree -- most things CAN be done in a single sql statement.
easier said than done in real life situations
A reader, July 29, 2003 - 8:59 pm UTC
Hi Tom,
I agree you may think getting a disk is cheap -- but that is only when you are considering just going to the store and getting a disk.
In real life, especially in corporations, this is not as easy as it seems, nor is it less costly.
A simple purchase of disk has first to be justified (which is quite difficult -- I don't think I have a justification as yet), then approved, then purchased by the purchasing department, then installed or fitted by the server teams; the server team may want to take the system down, which means more dollars, much more. These are just things off the top of my head; when we actually initiate the purchase process, there will definitely be some more steps and processes.
Coming back to my question, am I right in saying that we can't partition an existing table without copying the data to some temp location? Let me know if there is any other way.
Thanks
July 29, 2003 - 9:27 pm UTC
justification
you pay me a lot
disk is cheap.
think about it.
(sometimes it really can be that simple)
we are talking about a couple of gig (like 1 to 3 -- small, tiny -- my /tmp has more room than that) here, something my LAPTOP would not have an issue with.
in order to partition -- think about it -- you'll need the SOURCE DATA and the PARTITIONED DATA -- at the same time, for a period of time. no magic there.
slightly confused !
Srikanth Adiga, July 30, 2003 - 12:00 am UTC
Thanks, Stu and rest for your updates.
Taking this a bit further, since I am confused :)
[Sri]
"Note that, I have to update one subscriber at a time since the number of txns
made by each subscriber is different."
[Sri]
[Stu Charlton]
That still seems doable in 1 SQL statement, even if you wanted to merge the
updates. Couldn't you just do a COUNT(*) grouped by client_id, year, and month?
Followup:
I agree -- most things CAN be done in a single sql statement.
[Stu Charlton]
Sorry, how would one do this?
If my table has two rows:
clientid=1, month=july, year=2003, quantity=10
clientid=2, month=july, year=2003, quantity=20
Now I have to update (clientid=1)'s quantity by 15 and (clientid=2)'s quantity by 25.
How would you manage this in a single SQL statement? Like this, there would be 5M rows to be updated.
Btw, my table is indexed on clientid,year and month.
July 30, 2003 - 7:24 am UTC
what is the OTHER table you are updating from? you must have a detail table elsewhere from where you derive that 15 and 25. So, assuming something like this:
ops$tkyte@ORA920> create table t ( clientid int, month int, year int, quantity int );
Table created.
ops$tkyte@ORA920> create table txns ( clientid int, month int, year int );
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into t values ( 1, 7, 2003, 10 );
1 row created.
ops$tkyte@ORA920> insert into t values ( 2, 7, 2003, 20 );
1 row created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> insert into txns select 1, 7, 2003 from all_objects where rownum <= 15;
15 rows created.
ops$tkyte@ORA920> insert into txns select 2, 7, 2003 from all_objects where rownum <= 25;
25 rows created.
ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> select * from t;
CLIENTID MONTH YEAR QUANTITY
---------- ---------- ---------- ----------
1 7 2003 10
2 7 2003 20
ops$tkyte@ORA920> update t
2 set quantity = quantity + ( select count(*)
3 from txns
4 where txns.clientid = t.clientId
5 and txns.month = t.month
6 and txns.year = t.year );
2 rows updated.
ops$tkyte@ORA920> select * from t;
CLIENTID MONTH YEAR QUANTITY
---------- ---------- ---------- ----------
1 7 2003 25
2 7 2003 45
there you go.
(it is a shame you are using the wrong datatypes -- only a DATE should be used to hold, well, DATES. Using numbers to hold a year and a month isn't a good practice)
SCN, CHECKPOINT
AATIF, July 30, 2003 - 1:03 am UTC
I want to ask: what is the difference between the SCN (system commit number / system change number) and a checkpoint?
On which does Oracle base database synchronization -- the SCN or the checkpoint?
When a checkpoint occurs, which number is written to the control file header, the datafile headers and the log file headers?
Dear Tom,
your web site tells me not to ask more questions right now, but I need to clear up my point.
AATIF.
July 30, 2003 - 7:30 am UTC
er?
i remember you asking this on another page -- yes.
and I also remember answering this on that other page -- yes.
I fail to see how this relates to this page.
anyway there IS no difference between
scn
system commit number
system change number
they are the same.
as for the rest of the comment -- it doesn't "make sense" (i do suggest a read through of the server concepts guide, many questions are answered there!)
More clarification
Srikanth Adiga, August 01, 2003 - 12:34 am UTC
>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.
August 01, 2003 - 7:49 am UTC
use bulk processing - update between 100 and 500 rows at a time in a single call using array processing.
Or, put the data where it belongs - not in files, in tables -- and process the data in the database rather than in C
Srikanth Adiga, August 06, 2003 - 10:16 am UTC
>what is the OTHER table you are updating from? you must
>have a detail table elsewhere from where you derive that
>15 and 25.
It is coming from an OCI application after processing some files.
>Followup:
>use bulk processing - update between 100 and 500 rows at a >time in a single call using array processing.
Yes, that is what we do. It takes about 60 mins to update 5M records on a 2 CPU machine. Is this much expected?
If I do the same in multiple threads I do not see any performance improvement, i.e. 1M records updated in parallel in each of 5 threads.
Any idea why?
August 06, 2003 - 10:55 am UTC
are you sure they are going in parallel and not blocking/locking each other.
verify that all 5 sessions are in fact not only ACTIVE (v$session) at the same time but that you are not blocking yourself.
and you have taken some statspacks during your testing to see if you have any obvious bottlenecks right?
Explicit commit !
Nathan, August 07, 2003 - 5:04 am UTC
Tom,
Why do we commit explicitly after submitting jobs ?
Thanks
Nathan
/*------------------------------------------------
Followup:
instead of
begin
execute immediate 'alter index idx1 rebuild';
execute immediate 'alter index idx2 rebuild';
end;
you can code:
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
---------------------------------*/
August 09, 2003 - 3:51 pm UTC
because if you do not YOUR session can "see" the job in the job queue but the job queue processes cannot!
How many of us have sat there for minutes going "when is the stupid thing going to run" and until we exit sqlplus it does nothing :)
Explicit Commit
Nathan, August 07, 2003 - 8:51 am UTC
Apologies,
I must withdraw my previous question... I was obviously not thinking straight. I was thinking about a similar situation where a commit was issued (before dbms_job.submit) with a comment /* do not remove */ without any explanation... I'm still pondering why the commit should be there.
Sorry for the trouble.
Regards
Nathan
I have to take issue....
cheaper than disk, August 07, 2003 - 10:00 am UTC
Your response about justification for buying disk space is outdated...
"justification
you pay me alot
disk is cheap.
think about it.
(sometimes it really can be that simple)"
Anybody else out there NOT paid a lot ANYMORE?!
;-)
Re: Explicit Commit - To Nathan
A reader, August 07, 2003 - 3:25 pm UTC
Tom is out till weekend..
"I'm still pondering why the commit should be there. "
The commit after dbms_job.submit is required so that other sessions can see the job in the job queue...which typically is the
requirement...As usual, Tom can confirm.
Menon:)
Reader
A reader, August 10, 2003 - 1:27 pm UTC
"
create table invtmp as
select * from invins
UNION ALL
select t1.*
from inv t1, invdel t2
where t1.key = t2.key(+)
and t2.key is null;
"
I'd like to know why
"
where t1.key = t2.key(+)
and t2.key is null
"
is better than
where t1.key != t2.key with an ALWAYS_ANTI_JOIN hint
August 10, 2003 - 2:42 pm UTC
because there is no "always anti join" hint?
you could of course use the CBO, and a NOT IN subquery in which case it'll do it itself.
Su Baba, August 14, 2003 - 5:54 pm UTC
So if I have a big table, x, that needs to be updated, I would
create table y nologging as select * from x...
retrieve index definition from x
build indexes on y
drop table x
rename y to x
If I have all these done in a PL/SQL batch job, what's the best way to retrieve the index definitions and form the create index statements for table x? I want to create indexes on y that have the exact same storage parameters, tablespace, maybe partitioning, etc. as the ones on table x.
By the way, the database version is 8.1.7.
thanks
August 14, 2003 - 8:01 pm UTC
you COULD, not WOULD, it is an option.
you don't have your create indexes somewhere?
guess I would exp rows=n, imp indexfile=foo to get them -- or DBMS_METADATA.get_ddl in 9i
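In 9i that could be as simple as (a sketch):

set long 100000
select dbms_metadata.get_ddl( 'INDEX', index_name, user )
  from user_indexes
 where table_name = 'X';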
How to update part of a Very Large Table
Mohan K, August 28, 2003 - 7:03 am UTC
please refer "How to update part of a Very Large Table July 28, 2003"
When updating a large table in a database with a comparatively small buffer cache, one thing to make sure of is that blocks are updated in a contiguous order. Suppose some records of block1 are updated first in the beginning and then, after other blocks have been updated, block1 is updated again: block1 will be written twice by DBWR and LGWR. After the first update block1 may get flushed out by the LRU algorithm. If all records in block1 are updated at one time, then block1 is written by LGWR and DBWR only once. If the update statement is doing a full table scan then blocks are updated in order, but the order can be different when it is using an index to select the rows.
The SQL statement that makes sure of that is
update (select id, description from test_update where rowid in(select rowid from (select rowid from test_update order by rowid))) set description=description||' new val' where description like '%5%';
August 28, 2003 - 7:54 am UTC
not necessarily so -- first, lgwr doesn't write blocks -- it writes redo. Second, dbwr may or may not write the block more than once.
I would never never never use your approach. Why would I want to
a) read the entire table
b) sort the rowids
c) distinct that set (implied)
d) join to it (and you are making the assumption the table in the IN will be the driving table and that a hash join will not be used and ..... lots of other stuff)
when a simple:
update T set d = d || 'x' where d like '%5%'
will do it -- if you are worried about the index on d being used - you would simply FULL hint it -- but doubtful that it would be necessary.
the additional work you add would more than offset the work you are trying to save.
update large table
Vera, August 28, 2003 - 2:50 pm UTC
Hi Tom and thank you again and again for all your work.
My problem is that I cannot simply replace the update with creating a new table. The table I need to update - I'll call it the 'pension table' - has about 1.5 mln rows. I have to update the "amount" fields for 1994, 1995, etc., and the way I update these fields depends on the status of the participant in each particular year. To determine this status I built a function that looks through other tables and takes several input parameters, including the particular financial year, because it determines not just the status of a participant, but the status he/she had in that particular FY. (This function I call a "box of cockroaches".) Basically, I ask Oracle to bring me the annual monthly amount for Mr/Mrs 'X' for, let's say, 1994, and if 'X' had status 'retired' in 1994 then I want this amount to be calculated one way, if 'X' was 'deferred' - another way, and if 'X' was 'dead' - a third way.
As you already guessed, the updates take a long time - about an hour for each year. I wonder what approach you would use in a situation like this?
Thank you
By the way, what is the status of your new book?
August 29, 2003 - 8:35 am UTC
I would use a CASE statement in SQL without using PLSQL to compute the value.
optionally -- i would turn it inside out. instead of sql calling plsql, this might be one of the times when plsql calling sql is more appropriate. build up procedurally an array of values to update -- forall update them -- and do it again until the rows are processed. with a little foresight, you can even parallelize this processing and run your procedure N times -- each instance of it running against a different slice of the table...
book is "real soon" -- should be in trucks right now.
update large table
Luc Gyselinck, August 28, 2003 - 4:18 pm UTC
For large tables, if I have to update more than 20 percent of the rows, I do it like Tom: write a query to insert the 'updated' records into a new table, using the /*+ APPEND */ hint. You have no indexes to update, no constraints to validate and you generate almost no redo. Next, I truncate the old table, after I have disabled all constraints and triggers (once again, almost no redo, very fast), and then I inject, again with the /*+ APPEND */ hint, the data from the new table into the old one (very fast, little redo). Indexes get automatically rebuilt AFTER the extents are populated. I re-enable the triggers and the constraints (with or without validation, as you wish).
If the new values for the columns being updated must come from other tables, I NEVER write something like
update t1
set c1 = (select c2 from t2 where t2.c3 = t1.c4)
In fact, you are performing NESTED LOOPs. Using functions to get the new values for the columns is much the same : NESTED LOOPS (even worse: SQL / PLSQL engine context switches, open/close cursors in PL/SQL, NO read consistency).
Whenever I find myself in such a situation (typically when writing batch procedures, during data migrations, data transformations), I make sure my queries use HASH joins, I give the session more resources (higher SORT_AREA_SIZE, higher HASH_AREA_SIZE, DB_FILE_MULTIBLOCK_READ_COUNT to the max), avoid INDEX RANGE SCANS, do FULL (yes FULL) table scans, do FAST FULL INDEX scans, thus bypassing the buffer pool which otherwise gets flushed by the data from my queries. And by the way, use LMTs with uniform size extents.
As an example, I rewrote a batch procedure that took 3 days to complete (once, many years ago), written the old way (see the update statement above), that now does the same job in only 4 hours, on the same (old) hardware, same database, but different Oracle version: Oracle 7 / Oracle 8i.
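In outline, that disable/truncate/reload sequence might look like this (all names are placeholders, not the poster's actual code):

alter trigger t_trg disable;
alter table t disable constraint t_fk;

truncate table t;

insert /*+ append */ into t
select * from t_new;

commit;

alter table t enable novalidate constraint t_fk;
alter trigger t_trg enable;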
Yes, you are right,
Vera, August 28, 2003 - 4:50 pm UTC
but right now I don't see how I can perform my task without an update statement and a function. In other cases I do exactly the same thing you do with large tables - create a new table rather than update the old one and then rename the tables.
As for LM tablespaces, I have been begging our DBAs to switch to LMTs since I don't know when, and they keep promising to do it.
Does parallel insert just imply direct?
Dennis, September 08, 2003 - 1:49 pm UTC
Tom (or anyone),
Does a parallel insert imply a direct (append) insert? I read something in the documentation (or is that forbidden territory) ;) a while back about the append hint not being needed on a parallel insert because it was implied.
I issued the following:
INSERT /*+ append, nologging, parallel(ext, 8) */ INTO hrdmadm.ext_xpayd_pay_details ext
SELECT /*+ parallel(ext2, 8) */ *
FROM hrdmadm.ext_xpayd_pay_details@hip2 ext2
and while I saw the parallel processes spawn on hip2 (for the select), I didn't notice any spawn where I was.
I was wondering if that was because parallel insert was synonymous with direct insert, or did I mess up the syntax somehow? Is the insert in fact parallel, and it's just 'hidden' from us?
Thanks,
Dennis
Thanks!
Dennis, September 08, 2003 - 2:57 pm UTC
My bad was that I didn't alter session. Thanks Tom for your valuable insight. I guess I was just too afraid to read down a few more pages when I was in the documentation, with that being forbidden territory and all ;) Well maybe not forbidden...just hard to put that 'haunted/fear-inspiring' word that would fit in there.
Thanks,
Dennis
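For reference, the piece that makes the local insert side eligible to run in parallel is enabling parallel DML for the session (presumably the alter session in question), e.g.:

alter session enable parallel dml;

insert /*+ append parallel(ext, 8) */ into hrdmadm.ext_xpayd_pay_details ext
select /*+ parallel(ext2, 8) */ *
  from hrdmadm.ext_xpayd_pay_details@hip2 ext2;

commit;

(NOLOGGING is a table attribute rather than a hint, so it is not included in the hint text here.)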
Optimizing Query
Mohan, September 11, 2003 - 8:40 am UTC
BULK collect and BULK insert can improve performance. Create SQL objects of type t1%rowtype and t2%rowtype.
BULK collect 100 records at a time into the collection array. Create separate collection arrays to store values that are computed or obtained after searching the database.
Do a BULK insert/update into the target table. Search this site using the key words "bulk insert", "bulk collect" or "bulk update" for more examples.
Other things are
1. Improve query performance by proper indexing
2. Remove fragmentation by "alter table move" command
3. Rebuild indexes
4. Allow large pctfree if the update causes row migration/row chaining
5. Size rollback segments and rollback tablespaces
Mohan
September 11, 2003 - 9:49 am UTC
#2 isn't really "a good thing" in general.
#3 is really not "a good thing" in general.
reorgs -- so so overrated.
Kailash, September 11, 2003 - 1:51 pm UTC
Can you explain which of the 3 operations will be faster if I want to update millions of rows:
1. would delete/re-insert be faster, or
2. would a mere update be faster, or
3. would the one you suggested be faster?
Can you advise as to why the method you suggested would be faster than 1 and 2?
Can you explain why updating millions of rows is not a good idea. Is it based on the amount of redo generated, and what if the columns I update are not part of an index?
Which operation will be the fastest? I request you to explain the various DML operations and the amount of redo they generate, i.e. I would appreciate it if you could explain which DML generates the most redo, followed by the second, followed by the third which generates the least redo.
Currently we have an application that checks for a condition in table A and obtains the unique id. If a record is found, it flushes all of the dependent tables for the given id and re-inserts these records back into the dependent tables with a set of new values. Is this a good approach? If not, what would be the ideal approach?
September 11, 2003 - 7:25 pm UTC
I can say yes to all three -- each is the fastest.
each is the slowest.
If you have a large bulk update that is done once and hits most of the rows -- you may be best off by doing a CTAS, drop old and rename new.
the CTAS, indexing, etc can be done in nologging mode -- no undo, no redo. you end up getting done much faster.
updating millions of records
b, October 03, 2003 - 4:28 pm UTC
Hi,
The approach of updating the view instead of a table works great.
But here I want to update multiple columns of the same table based on different table queries.
We have around 600,000 records in the base table.
What would be the least expensive query?
We run Oracle 9i.
UPDATE user1.UserSummary
Set
Column1 = (
SELECT Count(DISTINCT WebOrderID)
FROM user1.OrderTotal ot
WHERE total>0 and
(ot.ostatus <> -1 and
ot.ostatus <> -1) and
ot.UREFITEM = user1.UserSummary.UREFITEM),
Column2 = (
SELECT NVL(Sum(Total),0) FROM
user1.OrderTotal ot
WHERE (ot.ostatus <> -1 and ot.ostatus <> -1)
and ot.UREFITEM = user1.UserSummary.UREFITEM),
Column3 = (
SELECT min(sodate) FROM orders.salesorders so left join
user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1)
and so.UREFITEM = user1.UserSummary.UREFITEM),
Column4 = (
SELECT max(sodate) FROM orders.salesorders so
left join user1.ordertotal ot on ot.sorefitem = so.sorefitem
WHERE total>0 and (so.ostatus <> -1 and so.ostatus <> -1) and
so.UREFITEM = user1.UserSummary.UREFITEM)
;
the above is a sample of my query
thanks in advance
;;)
October 04, 2003 - 9:38 am UTC
if you can come up with a single query that pulls all 4 columns together in a single result set -- we can optimize this. Left as an exercise for you -- see if you cannot merge the above 4 separate queries into a single query.
great
b, October 17, 2003 - 3:33 pm UTC
thanks
i finally made my query simple
Great
Raj, October 21, 2003 - 5:28 pm UTC
Great technique never thought of it.
Anthony Reddy, October 22, 2003 - 1:27 am UTC
Updating 1 million records
Kailash, October 22, 2003 - 4:43 pm UTC
I have a situation where I have a unique key on 3 columns (c1,c2,c3) on a table T and I want to update c2 from A to B for some records that match certain criteria. There are a total of 30 million records in the table and there will be about 1 million records in the table that match the criteria to be updated. The table is not partitioned.
1. Your suggestion to drop the old table, re-create a new table and rename it to the old name may not be possible because I am updating only a subset of the table. Is there any other method to achieve this?
2. Which one of the following will be the better solution?
a. Drop the index, update the rows and recreate the index.
b. Delete the rows with C2 = A and insert new rows with C2 = B. My DBA says that this will create fragmentation and is not a good approach. Is that true?
c. Do you suggest any better solution under these scenarios, since I keep getting these types of questions from my manager often?
Your suggestion in this regard is highly appreciated.
October 22, 2003 - 6:28 pm UTC
have you considered just "update set c2 = b where ...."?
1,000,000 rows out of 30,000,000 isn't a lot. If c2 is indexed, you might benefit from a set unusable, skip unusable, update, rebuild.
but it won't take that long to update 1,000,000 rows. couple of minutes on my laptop
big_table@ORA920> update big_table set created = created+1;
1833792 rows updated.
Elapsed: 00:02:13.28
not a big deal.
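If the index on c2 does get in the way, that unusable/skip/rebuild sequence looks roughly like this (the index name is made up, and this assumes a non-unique index on c2 -- an unusable index that enforces the unique key would still block the DML):

alter index t_c2_idx unusable;
alter session set skip_unusable_indexes = true;

update t set c2 = 'B' where <your criteria>;
commit;

alter index t_c2_idx rebuild nologging;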
How about the rollback segment problem when updating millions of rows?
Ramasamy, October 22, 2003 - 10:49 pm UTC
Tom ,
Thanks a lot for your service.
I have to update a single NUMBER(9) column in a table that has more than 80 columns.
This table has 30 million rows and I have to update around 17 million rows each night.
As you always insist, I tried a single update like:
UPDATE t1 SET batchno = (a constant)
WHERE batchno IS NULL;
1.) I came across the rollback segment issue.
2.) It is taking around 3 hrs to update.
Your input is greatly appreciated to solve the above issues.
Thanking You,
Ramasamy
October 23, 2003 - 8:03 am UTC
there is no problem -- other than that you haven't sized your system to do the things you require it to do!!!!!!!!
1) the rollback segment "issue" is a "non issue". Size things for what you do and there are NO ISSUES. Disk is cheap -- compared to what they pay us all to have this discussion. Consider that. The amount of time you've spent thinking about this already cost your employer much more than the disk it would take to do this correctly.
2) you are changing a null to a non-null -- increasing the width of the field. make sure you are not causing massive row migration (check your pctfree, make sure it is set appropriately here). is batchno "indexed"?
MILLION RECORD INSERT/SELECT/UPDATE TAKES TOO MUCH TIME
Santosh Das, October 23, 2003 - 3:27 am UTC
Hi Tom,
I have a table A, and soon after creating it 1 million records are put into it. Then, being in a real time environment, its size goes on increasing by at least half a million rows each day. However, a cleanup application also removes half a million rows every day. After the initial 1 million rows are inserted at creation, the table is put into the realtime environment. Whenever a new record is inserted, a continuously listening application detects it and applies a select operation on table A, then sends the selected items to some table B and then updates table A.
There may be 3 or 4 such listening operations. However, the select and update operations are taking a lot of time, and the initial insertion of 1 million records is taking 11 hours to complete, so what could a possible architecture be?
I tried sending the rowid to a queue/pipe after inserting into table A and then used it to select and update, but it is still taking a lot of time. Please suggest some method.
Thanks and regards,
Sandy..
October 23, 2003 - 12:42 pm UTC
sounds like you should just be using AQ -- plop the record in as a message -- have as many dequeuers as you like.
not sure what else to say, not a ton of detail here.
Select and update of a table Having Millions of Records.
Jack, October 23, 2003 - 8:05 am UTC
Hi Tom,
I have the following queries which execute on a table of a million records. That table has a PK but it is not used in either of the queries given below. CMTIME, SAD and STATUS are all NUMBER fields. If I make an index on CMTIME then it takes longer. Presently I don't have an indexed field. Can you please tell me how I can optimize these, or where the performance bottleneck is?
update SD set CMTIME = 1064929834 where (CMTIME is NULL or 1064929834 - CMTIME >= 10) and (1064929834 - SAD >= 0) and 1064929834 not in (select CMTIME from SD where CMTIME is not NULL)
SELECT * from SD where (CMTIME is not NULL and (CMTIME = 1064929834 OR 1064929834 - CMTIME >= 10)) AND (ED - 1064929834 > 0 and (STATUS = 1 OR STATUS = 0) and 1064929834 - SAD >= 0)
Thanks and regards,
Jack.
October 23, 2003 - 12:50 pm UTC
are you using the CBO?
Followup to your question
Ramasamy, October 23, 2003 - 2:17 pm UTC
2) you are changing a null to a non-null -- increasing the width of the field.
make sure you are not causing massive row migration (check your pctfree, make
sure it is set appropriately here). is batchno "indexed"?
Yes Tom. The Batchno is indexed like
index on (BATCHno,STATUS);
Will it be worth dropping the index and recreating it?
Thanks,
Ramasamy
October 23, 2003 - 7:16 pm UTC
could definitely help, yes.
George Lee, October 23, 2003 - 9:12 pm UTC
I tried to update 30 million records using Tom's solution.
It works fine. It took approximately 1.5 hours to finish.
George (Broadway Photo Supply Ltd. HK)
What's the CBO?
Jack, October 24, 2003 - 4:29 am UTC
October 24, 2003 - 9:37 am UTC
cost based optimizer
No
jack, October 27, 2003 - 1:37 am UTC
We are not using any cost based optimizer.
It's a simple query which we are using to retrieve data from a table having millions of records, and its performance is not satisfactory.
How do we optimize it so that it runs faster?
jack
October 27, 2003 - 7:42 am UTC
you are kidding right?
this goes right up there with "my car won't start, why not?"
there is so much missing from this "request" as to make it impossible to say anything sensible.
George Lee, November 11, 2003 - 11:21 pm UTC
I have a fact table partitioned by month. The indexes are built local to the partitions.
The data population is working fine.
But the index rebuild fails at the end.
I don't know how to rebuild the indexes in the partitions.
Can you help me?
November 12, 2003 - 10:19 am UTC
why do you need to rebuild here at all??
but why don't you show us what you are doing -- at the very least -- define what "fails at the end" means (error codes, messages, details)
George Lee, November 12, 2003 - 8:17 pm UTC
Dear Tom,
Yes, I should provide enough information to you. Sorry for that.
Here is the story,
I create the table as follows:
drop table D_DYNRPT_SALES_FACT;
create table D_DYNRPT_SALES_FACT
(
TX_DATE DATE,
SHOP_NO VARCHAR2(20),
ITEM_NO VARCHAR2(12),
PARENT_ITEM_NO VARCHAR2(12),
BRAND_NO VARCHAR2(5),
VENDOR_NO VARCHAR2(20),
ITEM_GROUP VARCHAR2(5),
ITEM_TYPE VARCHAR2(5),
CONSIGNMENT VARCHAR2(1),
ITEM_DESC VARCHAR2(40),
BRAND_DESC VARCHAR2(30),
VENDOR_DESC VARCHAR2(30),
CATEGORY_DESC VARCHAR2(30),
QTY_SOLD NUMBER(14,2),
NET_SALES_AMT NUMBER(14,2),
GROSS_PROFIT_AMT NUMBER(14,2)
)
PARTITION BY RANGE (TX_DATE)
( PARTITION D_DYNRPT_SALES_FACT_2000_09 VALUES LESS THAN (TO_DATE('01/10/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_09,
PARTITION D_DYNRPT_SALES_FACT_2000_10 VALUES LESS THAN (TO_DATE('01/11/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_10,
PARTITION D_DYNRPT_SALES_FACT_2000_11 VALUES LESS THAN (TO_DATE('01/12/2000','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_11,
PARTITION D_DYNRPT_SALES_FACT_2000_12 VALUES LESS THAN (TO_DATE('01/01/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2000_12,
PARTITION D_DYNRPT_SALES_FACT_2001_01 VALUES LESS THAN (TO_DATE('01/02/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_01,
PARTITION D_DYNRPT_SALES_FACT_2001_02 VALUES LESS THAN (TO_DATE('01/03/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_02,
PARTITION D_DYNRPT_SALES_FACT_2001_03 VALUES LESS THAN (TO_DATE('01/04/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_03,
PARTITION D_DYNRPT_SALES_FACT_2001_04 VALUES LESS THAN (TO_DATE('01/05/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_04,
PARTITION D_DYNRPT_SALES_FACT_2001_05 VALUES LESS THAN (TO_DATE('01/06/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_05,
PARTITION D_DYNRPT_SALES_FACT_2001_06 VALUES LESS THAN (TO_DATE('01/07/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_06,
PARTITION D_DYNRPT_SALES_FACT_2001_07 VALUES LESS THAN (TO_DATE('01/08/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_07,
PARTITION D_DYNRPT_SALES_FACT_2001_08 VALUES LESS THAN (TO_DATE('01/09/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_08,
PARTITION D_DYNRPT_SALES_FACT_2001_09 VALUES LESS THAN (TO_DATE('01/10/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_09,
PARTITION D_DYNRPT_SALES_FACT_2001_10 VALUES LESS THAN (TO_DATE('01/11/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_10,
PARTITION D_DYNRPT_SALES_FACT_2001_11 VALUES LESS THAN (TO_DATE('01/12/2001','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_11,
PARTITION D_DYNRPT_SALES_FACT_2001_12 VALUES LESS THAN (TO_DATE('01/01/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2001_12,
PARTITION D_DYNRPT_SALES_FACT_2002_01 VALUES LESS THAN (TO_DATE('01/02/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_01,
PARTITION D_DYNRPT_SALES_FACT_2002_02 VALUES LESS THAN (TO_DATE('01/03/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_02,
PARTITION D_DYNRPT_SALES_FACT_2002_03 VALUES LESS THAN (TO_DATE('01/04/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_03,
PARTITION D_DYNRPT_SALES_FACT_2002_04 VALUES LESS THAN (TO_DATE('01/05/2002','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2002_04,
.
.
.
PARTITION D_DYNRPT_SALES_FACT_2004_12 VALUES LESS THAN (TO_DATE('01/01/2005','DD/MM/YYYY'))
TABLESPACE D_DYNRPT_SALES_FACT_2004_12,
PARTITION D_DYNRPT_SALES_FACT_MAXVALUE VALUES LESS THAN (maxvalue)
TABLESPACE D_DYNRPT_SALES_FACT_MAXVALUE
)
/
CREATE INDEX D_DYNRPT_SALES_FACT_I01 ON D_DYNRPT_SALES_FACT
(TX_DATE) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I02 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_GROUP, ITEM_TYPE) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I03 ON D_DYNRPT_SALES_FACT
(TX_DATE, SHOP_NO, ITEM_NO, ITEM_GROUP) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I04 ON D_DYNRPT_SALES_FACT
(TX_DATE, BRAND_NO, ITEM_GROUP, ITEM_TYPE) LOCAL;
CREATE INDEX D_DYNRPT_SALES_FACT_I05 ON D_DYNRPT_SALES_FACT
(TX_DATE, ITEM_NO) LOCAL;
CREATE OR REPLACE PUBLIC SYNONYM D_DYNRPT_SALES_FACT FOR BPSADM.D_DYNRPT_SALES_FACT;
then populate the data into the table as follows:
alter index D_DYNRPT_SALES_FACT_I01 unusable;
alter index D_DYNRPT_SALES_FACT_I02 unusable;
alter index D_DYNRPT_SALES_FACT_I03 unusable;
alter index D_DYNRPT_SALES_FACT_I04 unusable;
alter index D_DYNRPT_SALES_FACT_I05 unusable;
alter session set skip_unusable_indexes=true;
@pop_d_dynrpt_sales_fact.sql '2000/09/01' '2000/01/30';
.
.
.
@pop_d_dynrpt_sales_fact.sql '2003/11/01' '2003/11/30';
alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging;
alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging;
The data population works fine, but the ALTER INDEX ... REBUILD NOLOGGING statements fail with the following errors:
alter index D_DYNRPT_SALES_FACT_I01 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I02 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I03 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I04 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
alter index D_DYNRPT_SALES_FACT_I05 rebuild nologging
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
I checked the reference and it looks like a syntax problem with my rebuild statements -- it should be something like alter index D_DYNRPT_SALES_FACT_I05 rebuild partition ..., but I don't know the exact statement. Can you help me? Thanks.
November 13, 2003 - 6:54 am UTC
ops$tkyte@ORA920LAP> CREATE TABLE t
2 (
3 data char(255),
4 temp_date date
5 )
6 PARTITION BY RANGE (temp_date) (
7 PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
8 PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
9 PARTITION part3 VALUES LESS THAN (to_date('15-mar-2003','dd-mon-yyyy')) ,
10 PARTITION part4 VALUES LESS THAN (to_date('16-mar-2003','dd-mon-yyyy')) ,
11 PARTITION part5 VALUES LESS THAN (to_date('17-mar-2003','dd-mon-yyyy')) ,
12 PARTITION part6 VALUES LESS THAN (to_date('18-mar-2003','dd-mon-yyyy')) ,
13 PARTITION junk VALUES LESS THAN (MAXVALUE)
14 )
15 ;
Table created.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create index t_idx1 on t(temp_date) LOCAL nologging;
Index created.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter index t_idx1 unusable;
Index altered.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> begin
2 for x in ( select 'alter index ' || index_name ||
3 ' rebuild partition ' || partition_name stmt
4 from user_ind_partitions
5 where index_name = 'T_IDX1' )
6 loop
7 dbms_output.put_line( x.stmt );
8 execute immediate x.stmt;
9 end loop;
10 end;
11 /
alter index T_IDX1 rebuild partition PART1
alter index T_IDX1 rebuild partition PART2
alter index T_IDX1 rebuild partition PART3
alter index T_IDX1 rebuild partition PART4
alter index T_IDX1 rebuild partition PART5
alter index T_IDX1 rebuild partition PART6
alter index T_IDX1 rebuild partition JUNK
PL/SQL procedure successfully completed.
A reader, December 11, 2003 - 4:56 pm UTC
Hi Tom,
I have been running the update statement below for the past 24 hours and it is still running.
Table S_REVN has 1.5 million rows and table S_OPTY_PROD has 1.2 million rows. It is doing a full table scan of both tables since I am using a function in the where clause. We are using the RULE-based optimizer and Oracle 8.1.7.4.
Here is sql statement
Update siebel.S_REVN r
set x_opty_prod_city=(select X_CITY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_opty_prod_STATE =(select X_STATE from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id),
X_PR_YEAR_QTY =(select X_PR_YEAR_QTY from siebel.S_OPTY_PROD p where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
0 UPDATE STATEMENT Optimizer=RULE
1 0 UPDATE OF 'S_REVN'
2 1 TABLE ACCESS (FULL) OF 'S_REVN'
3 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
4 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
5 0 TABLE ACCESS (FULL) OF 'S_OPTY_PROD'
Please advice
Thanks
December 11, 2003 - 6:27 pm UTC
you understand that it is doing 3 full scans of S_OPTY_PROD for EACH and EVERY row in S_REVN yes???
how about
update t
set (a,b,c) = (select a,b,c from t2 where ... )
at the very least (reduce the full scans from 3 to 1)
at the very most, consider creating an index:
create index temp_index_to_make_this_complete_before_my_grandchildren_have_grandchildren
on p(replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$'))
and using a hint (to let sighbul use the cbo) to get the index used.
A reader, December 12, 2003 - 12:44 pm UTC
Hi Tom,
I am following your advice on the update; I've changed the sql statement:
Update /*+ INDEX(S_OPTY_PROD T_TEMP) */ siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
created function based index on S_OPTY_PROD
create index T_TEMP
on S_OPTY_PROD(replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$'))
BUT the above sql statement is not using the index even though I supply the hint (is it because a function-based index is not recognized by the rule-based optimizer?)
Please help
Thanks
December 13, 2003 - 11:11 am UTC
you put the hint in the wrong place :)
and you might not have met all of the requirements for FBI's. read:
http://asktom.oracle.com/~tkyte/article1/index.html
but it is not the RBO (the well formed hint -> cbo)....
your hint is wrong (must use the correlation name of P) and in the wrong place (should be in the subquery).
you need query_rewrite_enabled and (depending on version) might need query_rewrite_integrity (see the above link).
but here is an example:
ops$tkyte@ORA920> create table t1( x_opty_prod_city int, x_opty_prod_state int, x_pr_year_qty int, row_id varchar2(25) );
Table created.
ops$tkyte@ORA920> create table t2( x_city int, x_state int, x_pr_year_qty int, row_id varchar2(25) );
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> create index t2_idx on t2( replace(replace(replace(ROW_ID,'ce','='),'-','='), '@', '$') );
Index created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> delete from plan_table;
4 rows deleted.
ops$tkyte@ORA920>
ops$tkyte@ORA920> ALTER SESSION SET QUERY_REWRITE_ENABLED=true;
Session altered.
ops$tkyte@ORA920>
ops$tkyte@ORA920> explain plan for
2 Update t1 r
3 set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
4 (select /*+ INDEX( t2 t2_idx ) */ X_CITY,X_STATE,X_PR_YEAR_QTY
5 from t2 p
6 where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
7 /
Explained.
ops$tkyte@ORA920>
ops$tkyte@ORA920> prompt @?/rdbms/admin/utlxpls
@?/rdbms/admin/utlxpls
ops$tkyte@ORA920> set echo off
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost(%CPU)|
-------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 82 | 4346 | 3 (34)|
| 1 | UPDATE | T1 | | | |
| 2 | TABLE ACCESS FULL | T1 | 82 | 4346 | 3 (34)|
| 3 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 53 | 2 (50)|
|* 4 | INDEX RANGE SCAN | T2_IDX | 1 | | 2 (50)|
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access(REPLACE(REPLACE(REPLACE("P"."ROW_ID",'ce','='),'-','='),'@','$')=:B1)
16 rows selected.
A reader, December 12, 2003 - 9:04 pm UTC
hello Tom, I populate a table in the following two ways, one using the parallel hint, but the cost is different:
sql>insert /*+ append parallel(demo01,4) */ into demo01 select /*+ parallel(customer,4) */ * from customer;
the cost --42s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
ERROR:
ORA-12838: cannot read/modify an object after modifying it in parallel
SP2-0612: Error generating AUTOTRACE EXPLAIN report
Statistics
----------------------------------------------------------
1698 recursive calls
455 db block gets
8728 consistent gets
8075 physical reads
66492440 redo size
630 bytes sent via SQL*Net to client
525 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
390126 rows processed
sql>insert into demo02 select * from customer;
cost --26s
and the Execution Plan is:
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=1226 Card=659568 Byt
es=680674176)
1 0 PARTITION RANGE (ALL)
2 1 TABLE ACCESS (FULL) OF 'CUSTOMER' (Cost=1226 Card=659568
Bytes=680674176)
Statistics
----------------------------------------------------------
1939 recursive calls
44876 db block gets
20918 consistent gets
1 physical reads
64954576 redo size
646 bytes sent via SQL*Net to client
467 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
390126 rows processed
The customer table has 0.4 million records and the machine is a 2-CPU SUN E420 running the Oracle database.
Can you tell me why the physical reads increase so much when using the parallel hint in the DML sql? thanks
December 13, 2003 - 11:49 am UTC
well, for a tiny number of records like this, i would not even consider PQ at this point.
But -- to answer your questions
the cost, well the cost is affected by thousands of things. HINTS definitely affect the cost (That is in part how hints work -- by playing with costs). Ignore the cost of two queries that are not the same, they are NOT comparable.
the PIO's -- well, you ran one before the other? That and parallel query prefers to checkpoint -- and do direct io many times (you might find PQ doing a global checkpoint before the query begins to get current images onto disk in order to just all hit slam the data as fast as it can without messing with the buffer cache)
but for so few records -- it would seem that regular sql is what you want.
A reader, December 13, 2003 - 4:37 pm UTC
Hi Tom,
Followed your advice... WORKS LIKE A CHARM.. updated 1.5 million records in 10 min. EXCELLENT!!!!!!!
Thanks a LOTTTTTTTTTTT
Update siebel.S_REVN r
set (x_opty_prod_city,X_opty_prod_STATE,X_PR_YEAR_QTY) =
(select /*+ INDEX(S_OPTY_PROD T_TEMP) */ X_CITY,X_STATE,X_PR_YEAR_QTY from siebel.S_OPTY_PROD p
where replace(replace(replace(p.ROW_ID,'ce','='),'-','='), '@', '$') = r.row_id)
Online updation of very big table
Praveen, March 06, 2004 - 3:37 pm UTC
Hi Tom
I want to update a table in one go on an online system. The table has 20 lakh (2 million) records and 110 columns. When I issue the update command it takes about one hour. I don't know why it is taking so much time even though an index is created on that particular field.
select count(*),app_flg from test;
app_flg has only two values, Approved or Unapproved. The default is Unapproved.
approved =170000
unapproved=30000
update test set app_flg='APPROVED' where app_flg='UNAPPROVED'
It took 1 hour to update the records, processing for the other online application users slowed down, and locks started to occur on the table.
Suggest the best solution
pkr
March 06, 2004 - 8:30 pm UTC
It would be the INDEX that slowed it down.
You want to FULL SCAN this table.
You want there to be NO index on the column you are updating.
it gives error
praveen, March 07, 2004 - 5:16 am UTC
On dropping the index it gives error ORA-00054.
thanks
March 07, 2004 - 9:26 am UTC
that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it.
thanks
jasdeep,praveen, March 07, 2004 - 11:16 am UTC
I have solved that problem: a user had locked rows on the table but was no longer logged on, so I killed that session and the index was dropped immediately.
And the updates were as fast as you can think.
thanks
Update taking hrs of time
Sachin, March 09, 2004 - 7:44 am UTC
Hi Tom,
I have a query as:
I have two tables Table1(with around 10000 records max) and table2(actually the GL code combination master table with around 600,000 records). I need to update three fields in table1 (which is a temp processing table) with a unique value from table2. T1 has ccid fields which need to be updated, and s1-s4 fields corresponding to segment1-segment4 fields of table2
Following is the query like:
UPDATE table1 t1
SET t1.ccid1 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'XXXXX'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.chart_of_accounts_id
and gb.set_of_books_id = 1),
t1.ccid2 = (SELECT t2.code_combination_id
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'zzzzzz'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1),
t1.ccid3 = (SELECT t2.ccid
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 = 'wwwww'
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1
But this is taking more than 6 hrs to complete. When I check the table locks, the table remains locked in row-exclusive mode. I am committing immediately after the update statement in the procedure. Could you please tell me why this is happening?
March 09, 2004 - 12:26 pm UTC
not that i don't believe you but -- about the locking issue -- I'm not sure i believe you.
when you commit -- locks are released. You'd have to show me the info from v$lock showing that t1 is still "locked" and locked by your session.
you could
set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )
instead of running a query per column.
if you run the query
select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = some_value_value
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = some_value_value
and t2.segment1 = some_value_value
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1
how long does that take?
Updating millions of rows
A reader, March 15, 2004 - 1:46 pm UTC
Tom,
In this discussion you mention that you would not do an UPDATE/DELETE of millions of rows, but rather would create a new table (in parallel/nologging) with the rows that needs to be retained and then create indexes in parallel, drop the old table and rename the new table to old table. If so, my understanding is :
1. Delete/UPDATE of tables with more than 1 million row would also have to maintain the index structure and also will generate considerable rollback and undo.
2. Another consideration with respect to update/delete is that blocks will be moved in and out of the freelists. Please confirm if this is true and, if so, please explain what happens during update/delete of a large number of rows with regard to freelists.
3. Another consideration is that it will cause a lot of table/segment fragmentation and many empty blocks below the HWM, and all full table scans on the table will read a lot of empty blocks.
Is my understanding correct and if I have missed any other things as to why DELETE/UPDATE of large tables is not recommended, please let me know.
4. If my table that needs to be updated/deleted is the primary/parent table with dependent child tables/constraints and triggers, is this approach correct
Disable all the triggers/constraints
Drop the table
Rename the new table to the old table
Reenable all constraints and triggers
What to do in such cases?
March 15, 2004 - 3:03 pm UTC
I "might", "probably" if it was most of the records...
1) correct
2) correct, deletes would put them on, updates might put them on, might take them off.
3) it could, yes.
the delete/update can take lots longer than CTAS + reindex if you are doing MOST all of the records in a LARGE table with lots of indexes.
4) triggers are not relevant here....
but you would have to drop the fkeys
do the CTAS+drop+rename
add the constraints back with enable/rely (don't need validation)
Deletes/Updates
A reader, March 15, 2004 - 5:44 pm UTC
Tom,
In the above discussion you mention :
1. The deletes will put them on because the used space in the block will drop below PCTUSED. Is it correct?
2. The update will put them on if the updated column makes the used space in the block fall below PCTUSED or the updated column makes the free space available in the block less than PCTFREE. Is it correct.
If both of the above understanding is not correct, please explain. Also, please let me know how we could track the movement of the blocks on and off the freelist. Any SQL scripts/tools available that I could use?
Thanks
March 15, 2004 - 6:24 pm UTC
1) the deletes could, yes.
2) yes.
to track individual blocks -- not without dumping blocks (and I don't go there). dbms_space and other tools (search this site for show_space for the script I use) can show you how MANY things are on the freelist.
A small correction
A reader, March 15, 2004 - 6:30 pm UTC
Tom,
In the above scenario for question 2, there is a small correction marked in CAPITAL letters :
"The update will put them on if the updated column makes the used space in the block fall below PCTUSED AND WILL TAKE OFF of the freelist if the updated column makes the free space available in the block less than PCTFREE. "
Is it correct?
Thanks
March 15, 2004 - 6:33 pm UTC
I'll state it simply, forget pctfree/pctused/etc -- in english:
if the update reduces the row size, it can put it back onto the freelist.
if the update increases the row size, it can take it off the freelist.
Parallel DML
A reader, March 16, 2004 - 8:22 am UTC
1)On page 630 of your Oracle Expert One on One, you mention that parallel DML can be used only upon partitions -- in 8i. I'm using 9.2. Has this behavior changed in the later versions ?
2) I need to do a mass delete on a 12 GB table and it's taking too much time (far too much, really). How could I speed it up?
Thanks
enable PK with parallel clause
A reader, March 20, 2004 - 6:40 pm UTC
Hi
I am working in 8.1.7.4.
I want to delete 30 million rows of 60 million so I am doing this
1. create a copy of original table with good data
2. disable the original table's constraints, primary keys (and child FKs) and foreign keys
3. truncate the original table
4. make all original tables indexes unusable
5. insert append back from copy to original, I have to do this way because I am not allowed to modify constraint names
6. drop the copy
7. enable constraints. Here, when I enable the PK I cannot provide a parallel clause, right? I searched the doc but it seems I can only specify a USING INDEX TABLESPACE clause -- or am I missing something?
8. rebuild all unusable indexes
March 21, 2004 - 9:50 am UTC
you can create an index in parallel (unique) and then enable/rely/novalidate the pk.
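A minimal sketch of that, assuming the disabled primary key is a constraint named T_PK on T(ID) (the names here are invented):
create unique index t_pk_idx on t(id) parallel 8 nologging;
alter table t enable novalidate constraint t_pk;   -- picks up the existing unique index
alter table t modify constraint t_pk rely;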
How to Update millions or records in a table
A reader, March 25, 2004 - 11:55 am UTC
Hi Tom,
I read your response to Murali's question above and believe there will be a downtime window for the application. This is because if I want to keep the index names the same as before, I will have to create the new table, drop the old table, rename the new table to the old table name and then create the required indexes on it.
Wondering, how if we can create the indexes by some other name on the new table and rename the indexes after dropping the old table.
As always your valuable advice helps a lot!
March 25, 2004 - 9:13 pm UTC
you can rename the indexes.
alter index supports this.
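For example (index names invented):
alter index emp_new_idx rename to emp_idx;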
how to make this restartble?
A reader, March 25, 2004 - 6:14 pm UTC
Hi
I want to delete 80% of rows from a couple of 50 million rows table. The process is divided in 10 steps
1. Create TEMP table, CTAS from original table
2. Disable constraints
3. Truncate original table
4. Set indexes to UNUSABLE
5. Insert into original table from TEMP table
6. Drop TEMP table
7. Create PK and UK
8. Enable PK and UK
9. Enable FKs
10. Rebuild indexes
I want to make this process restartable, i.e. if it fails at step 3 and I later rerun the procedure, it will start from step 3 again. How can we achieve this? I have been thinking of using a table which stores the table name and process status, something like the following, then updating the status as the process goes along:
TABLE: PROCESS_STATUS
TABLE_NAME STATUS
----------- ---------------
EMP 2
DEPT 0
Process_status is the step, 0 means that I need to start from beginning so in each process I would query this table and see which step should be run. For example if the process failed at step 3 then STATUS would be 2
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 0
then
run process 1
end if;
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 1
then
run process 2
end if;
select status into l_proc_status from PROCESS_STATUS
where table_name = 'EMP'
if l_proc_status = 2
then
run process 3
end if;
and so on.
Any suggestions :-? What I see is that I will have quite a lot of redundant code...
March 25, 2004 - 9:35 pm UTC
you'd have to keep a state table and have you code query it up, much like you are suggesting. That would do it yes.
another "inserting" idea might be to
a) insert the steps to process
b) delete them as you complete them (and commit)
c) to restart, just pick up at the step you wanted
you could insert the procedures to be called and just:
for x in ( select name from procedures order by seq )
loop
execute immediate 'begin ' || x.name || '; end;';
delete from procedures....
commit;
end loop
(just a thought, not fully baked)
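One way to flesh that thought out -- just a sketch, with made-up step and procedure names -- is to keep the remaining steps in a driver table and delete each row as its step completes:
create table steps ( seq number primary key, proc_name varchar2(30) );
insert into steps values ( 1, 'create_temp_copy' );
insert into steps values ( 2, 'disable_constraints' );
insert into steps values ( 3, 'truncate_original' );
-- ... one row per remaining step ...
commit;
begin
    for x in ( select seq, proc_name from steps order by seq )
    loop
        execute immediate 'begin ' || x.proc_name || '; end;';
        delete from steps where seq = x.seq;
        commit;
    end loop;
end;
/
If the block fails, the rows left in STEPS are exactly the steps that have not completed; rerunning the block picks up where it left off.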
provide more information about that procedure drop indexes and rebuild with nologging
mohan, April 07, 2004 - 2:39 am UTC
Hi Tom
Could you provide more information about that procedure and about how to drop indexes and rebuild them with nologging? We are using the Informatica ETL tool:
before loading bulk data into the target we drop the indexes (pre-session) and after loading the data we rebuild the indexes with nologging (post-session);
it takes less time because it generates less redo.
Regards
Mohan
April 07, 2004 - 9:09 am UTC
it is just a drop and create? not sure what you are looking for -- if it is "syntax", we document that.
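If it is just the syntax, a typical pre/post-session pair looks something like this (index, table and column names are invented; note that NOLOGGING reduces redo generation, not undo):
-- pre-session
drop index big_fact_idx1;
-- post-session
create index big_fact_idx1 on big_fact(col1) nologging parallel 4;
alter index big_fact_idx1 logging noparallel;   -- optionally reset the attributes afterwards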
Problem with Update
Ram, April 08, 2004 - 6:23 am UTC
Hi Tom,
This update is not working properly.How to correct this?
SQL> declare
2 x number;
3 y number;
4 begin
5 execute immediate 'update e set sal = sal*1.1 where ename in ('SMITH','KING') returning sal into :x,:y' using x,y;
6 dbms_output.put_line(x || chr(9)||y);
7* end;
*
ERROR at line 5:
ORA-06550: line 5, column 65:
PLS-00103: Encountered the symbol "SMITH" when expecting one of the following:
. ( * @ % & = - + ; < / > at in is mod not rem return
returning <an exponent (**)> <> or != or ~= >= <= <> and or
like between into using || bulk
Please do reply.
Bye!
April 08, 2004 - 10:09 am UTC
..... ( ''SMITH'', ''KING'' ) .....
you need two quotes for a quote in a string (until 10g when there is another way)
Updating million records
Himnish Narang, April 08, 2004 - 7:34 am UTC
Hi Tom,
i was just going through the discussion on this subject. In it you described that you would create a new table instead of updating millions of records, with the update performed as part of creating the new table.
Don't you think that the new table will also occupy the same amount of space and after dropping the table we will have to do the reorg for that tablespace.
please comment.
Thanx.
April 08, 2004 - 10:24 am UTC
reorgs are so overrated.
no, i would not reorg a tablespace or anything simply cause I copied some data and dropped some old stuff. not a chance.
How to Update millions or records in a table
Ganesh, April 08, 2004 - 9:03 am UTC
Hi Tom,
It's very useful, and using the create-new-table approach saved us lots of time.
I've got another issue similar to this. We are using 9i. There is a requirement to modify a primary key's datatype from number to varchar, and the column has a lot of dependents. Is there any option other than rebuilding the table, as the data already exists?
Can you please suggest on this.
Thanks in Advance.
Ganesh
April 08, 2004 - 10:51 am UTC
that'll be a rebuild pretty much.
Please see this
Ram, April 08, 2004 - 11:38 pm UTC
Hi Tom,
Thanks for your reply, but it's still not working properly.
SQL> declare
2 x number;
3 y number;
4 begin
5 execute immediate 'update e set sal = sal*1.1 where ename in (''SMITH'',''ALLEN'')
6 RETURNING sal into :x,:y' using x,y;
7 dbms_output.put_line(x ||chr(9) ||y);
8 end;
9 /
declare
*
ERROR at line 1:
ORA-00913: too many values
ORA-06512: at line 5
How do I correct this? You mention another way in Oracle 10g. How does that work? Could you please provide an example.
Please do reply.
Bye!
April 09, 2004 - 7:42 am UTC
you are returning SAL (one thing) into two things (:x,:y)
different problem all together.
in 10g, the strings could be
q'|how's this for quoting|'
instead of
'how''s this for quoting'
a character string literal that starts with q for quote will use the next two characters as the start quote and those two characters in reverse for the end quote.
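For instance (a trivial illustration only, reusing the EMP table from the examples above, on 10g):
begin
    execute immediate
       q'|update emp set sal = sal*1.1 where ename in ('SMITH','KING')|';
end;
/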
Thanks
Ram, April 09, 2004 - 9:39 am UTC
Hi Tom,
Thanks for your reply. I found a way of doing it, as follows. Do you have any other option to do it in a better way?
SQL> create or replace type sal_t as table of number;
2 /
Type created.
SQL> declare
2 sals sal_t := sal_t();
3 begin
4 execute immediate 'update emp set sal = sal*1.1 where ename in (''SMITH'',''KING'')
5 returning sal into :sals' returning bulk collect into sals;
6 for i in 1..sals.count loop
7 dbms_output.put_line('Updated salary:'||sals(i));
8 end loop;
9 end;
10 /
Updated salary:968
Updated salary:5500
PL/SQL procedure successfully completed.
Please do reply.
Bye!
April 09, 2004 - 10:17 am UTC
get rid of the dynamic sql, it isn't necessary (fixes your original issue with the quotes as well). also, would seem to make sense to pull back other columns so you would be able to see what rows were actually updated:
ops$tkyte@ORA9IR2> declare
2 type numArray is table of number;
3 l_sals numArray;
4 l_empno numArray;
5 begin
6 update emp
7 set sal = sal*1.1
8 where ename in ('SMITH','KING')
9 returning empno, sal bulk collect into l_empno, l_sals;
10
11 for i in 1..l_sals.count loop
12 dbms_output.put_line
13 ('Updated salary for empno ' || l_empno(i) ||
14 ' to: '||l_sals(i));
15 end loop;
16 end;
17 /
Updated salary for empno 7369 to: 880
Updated salary for empno 7839 to: 5500
PL/SQL procedure successfully completed.
To Mr. Ram
A reader, April 09, 2004 - 10:44 am UTC
Hi Ram and All,
May I request you all not to pester Tom with inane posts requesting Tom to debug and program on our behalf.
This forum is not to learn coding.
update based on rowid
john, April 16, 2004 - 2:02 am UTC
Tom,
we use non-intelligent primary key to update the table through a stored procedure.
i.e. when the table is queried, our select procedure returns the non-intelligent key along with the details like below
select emp_key, other details from emp;
when updating the data, emp_key is passed from the front end to the stored procedure and table is updated based on emp_key.
is it a bad idea, using rowid instead of non-intelligent key(emp_key)?
April 16, 2004 - 7:27 am UTC
non-intelligent? interesting choice of terminology.
But in any case -- once upon a time ago, rowids were IMMUTABLE. Once assigned a row would have a rowid assigned to it and that rowid would live with that row until you deleted it.
Starting in 8i with support for updates to partition keys that would cause a row to move from partition A to partition B -- that is no longer true (and then there are IOT's...).
In 10g, there are even more options for 'row movement' -- an online segment shrink for example.
So, rowids can change and are changing in more circumstances as time goes on.
Sooooo, if you have lost update detection in place using 'FOR UPDATE' -- rowids are very safe (forms uses them). What I mean is -- you
a) select a.*, rowid from T a where .... 'normally'
b) you decide to update a row, so you
select * from t where rowid = :that_rowid AND
(c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
(c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
(cN = :that_cN or (cN is null and :that_cN is null))
for update NOWAIT;
you lock it -- this prevents anything that could change a rowid from doing so and verifies the data hasn't been changed. If that returns 0 rows, someone changed the data (or reorg'ed it and moved it). You need to requery to get the current values before you let the user even think about modifying it. If that returns a row -- you got it. If that returns ora-54, something has it locked, you have to decide what you want to do about that.
c) you can then safely update that row by rowid
If you do not have lost update detection in place using FOR UPDATE locks -- then you should stick with a primary key probably, just to protect yourself in the future.
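A minimal PL/SQL sketch of steps (a), (b) and (c) above, using the EMP table; the variable names and the choice of empno 7369 are just for illustration:
declare
    l_rowid  rowid;
    l_ename  emp.ename%type;
    l_sal    emp.sal%type;
    l_row    emp%rowtype;
begin
    -- (a) fetch the row 'normally', remembering its rowid and column values
    select rowid, ename, sal into l_rowid, l_ename, l_sal
      from emp where empno = 7369;
    -- ... time passes, the user edits the data ...
    -- (b) re-select by rowid, verify the values, and lock it
    select * into l_row
      from emp
     where rowid = l_rowid
       and (ename = l_ename or (ename is null and l_ename is null))
       and (sal   = l_sal   or (sal   is null and l_sal   is null))
       for update nowait;          -- ORA-00054 here means someone else has it locked
    -- (c) now it is safe to update by rowid
    update emp set sal = sal * 1.1 where rowid = l_rowid;
exception
    when no_data_found then
        raise_application_error( -20001,
            'row was changed or moved -- requery before updating' );
end;
/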
update based on rowid
john, April 19, 2004 - 12:42 am UTC
thanks a lot Tom.
to understand fully what you said, can you please tell
1. Why do we need the where condition
(c1 = :that_c1 or (c1 is null and :that_c1 is null)) AND
(c2 = :that_c2 or (c2 is null and :that_c2 is null)) AND ...
(cN = :that_cN or (cN is null and :that_cN is null))
2. Another question: is there a situation where the rowid of a row could end up assigned to another row of the same table? (This could be more dangerous, as we might end up updating another row.)
thanks again.
April 19, 2004 - 6:37 am UTC
1) you want to make sure the row hasn't changed.
c1 = :that_c1 will only work if both the column and the bind are not null, hence the rest is needed to make it so nulls compare "the same"
2) this is tied into 1 again. sure, we could delete from emp where empno = 7788; commit; and then re-insert that row and get the same rowid (rare, but could happen).
But - if we inserted the same exact values and they got the same exact rowid -- then #1 would make this "safe". the values of the row are the same, for all intents and purposes it IS the same row.
If the newly inserted row doesn't match column by column the values we expect, well, then we'll reject it (won't lock it) and all will be well.
update large table
Prasad Chittori, April 22, 2004 - 6:04 pm UTC
I have a very large partitioned table with a DATE column, and I would like to strip the time portion from that column.
I did the following and it is taking a lot of time and failing with 'unable to extend rollback segment'.
Any better ideas without increasing the rollback segments or changing the query?
update t1
set dt = trunc(dt)
April 23, 2004 - 10:12 am UTC
parallel dml -- each pq slave can get its own RBS, letting you use them all at the same time, not just one of them.
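For example -- only a sketch, and the degree of 8 is something you would size for your own system:
alter session enable parallel dml;
update /*+ parallel(t1, 8) */ t1
   set dt = trunc(dt);
commit;    -- mandatory right after a parallel DML statement, before t1 can be queried again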
convert delete to insert
marvin, April 26, 2004 - 11:30 am UTC
Hi
We want to delete several million rows from several tables. The delete statements are quite simple, but to make this faster we would insert the good data into a temp table, truncate the original and insert the good data back. We want to convert the deletes into insert statements but I think it's very hard.
For example
We have these DELETE statements
DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
/
COMMIT
/
DELETE
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1 FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL)
/
COMMIT
to convert to that we did
INSERT /*+ APPEND */ INTO OM_ANNEX$TMP
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM LISTA_ORDENES l,
OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
UNION
SELECT t.*
FROM OM_ANNEX t
WHERE NOT EXISTS (SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL);
It seems incorrect: if I run the queries inside the insert separately I get correct results, but if we use UNION we get data which we don't want, and if we do it in two inserts we get duplicate rows.
One approach is this
create a temp table using the delete statements but converted to querys for example instead of
DELETE
FROM OM_ANNEX t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
we convert to
select t.*
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
then simply says
select * from om_annex a
where not exists (select null from temp_table b
where a.annexid = b.annexid)
annexid is PK
the other approach is this
select * from om_annex a
where annexid not in (select annexid
FROM OM_ANNEX t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES l,OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id=l.order_id)
union
select annexid
FROM OM_ANNEX t
WHERE EXISTS(SELECT 1
FROM OM_CONTRACT_INST c
WHERE t.contract_id=c.contract_id
AND c.order_id IS NULL))
They are all very slow because the tables involved have several million rows!
Is there a better approach?
April 26, 2004 - 2:20 pm UTC
why not:
create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.
err the problem is convert the DELETE to CTAS
A reader, April 26, 2004 - 3:03 pm UTC
Hi
create table new as select rows to keep;
drop table old;
rename new to old;
do the first in parallel, with nologging.
That is exactly what I want to do. The problem is that until now we have always done it the other way round, using plain DELETE (and it takes a week to delete everything!), and I am not sure how to convert the DELETE to CTAS. If I want to do the reverse of the DELETE statements (some tables have 5 DELETE statements!), is it as simple as writing the DELETE the other way round? For example, how would you change
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Would you change it to
insert into tmp_x
select *
from tab1
where not exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
insert into tmp_x
select *
from tab1
where not exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Is it as simple as this?
April 27, 2004 - 4:02 am UTC
if i had this:
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
I would probably have this:
create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/
outer join the three tables. Negate the conditions for the where exists.
that is, after outer joining tab1 to tab2, tab3 -- remove the rows
where tab2.fid is not null and tab3.id is not null -- that is subquery one in your deletes above.
where tab2.fid is not null and tab2.id is null -- that is subquery two in your deletes above.
thank you very much for the outer join tip
marvin, April 27, 2004 - 6:16 am UTC
Hi
I am going to have a look how to apply the outer join in order to convert DELETE to CTAS.
I have a further question, I have another table which undergoes 4 DELETEs
DELETE
FROM SW_PERSON t
WHERE EXISTS
(SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid AND t.swtype='ORDER')
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER' AND t.swobjectid IS NULL AND COMP_INST_ID IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype IS NULL
/
COMMIT
/
DELETE
FROM SW_PERSON t
WHERE t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS
(SELECT 1
FROM OM_COMPANY_INST c, LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id)
/
COMMIT
/
I need to convert this to CTAS as well; however, I am not sure if it can be done in a single statement. These DELETEs, for example, can't be converted into one as follows, right? (because of the commits between them)
DELETE
FROM SW_PERSON t
WHERE EXISTS (SELECT 1
FROM LISTA_ORDENES o
WHERE o.order_id=t.swobjectid
AND t.swtype='ORDER')
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND COMP_INST_ID IS NULL)
OR t.swtype IS NULL
OR (t.swtype='ORDER'
AND t.swobjectid IS NULL
AND EXISTS (SELECT 1
FROM OM_COMPANY_INST c,
LISTA_ORDENES l
WHERE c.COMP_INST_ID=t.COMP_INST_ID
AND l.order_id=c.order_id));
Can this use the outer join tip as well?
TIA
April 28, 2004 - 12:17 pm UTC
why have any commits in between.
but of course -- any four deletes against a single table can (and if you ask me, should) be done as a single delete.
the outer join was used in the CTAS, not in a delete.
why do you use outer join
A reader, April 27, 2004 - 6:57 am UTC
hi
Why is outer join needed for tab1, tab2 and tab3 :-?
create table tab1_new
as
select tab1.*
from tab1, tab2, tab3
where tab1.id = tab2.fid(+)
and tab2.id = tab3.id(+)
and NOT ( tab2.fid is not null and tab3.id is not null )
and NOT ( tab2.fid is not null and tab2.id is null )
/
April 28, 2004 - 12:20 pm UTC
because we wanted to keep all rows in tab1 -- if there is NO mate in tab2/tab3
since they deleted "where exists in tab2/tab3":
delete tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
delete tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
rows in tab2/tab3, we need to ALWAYS keep rows that are NOT in tab2/tab3 using CTAS. outer join is mandatory for that in this example.
to marvin
A reader, April 27, 2004 - 9:33 am UTC
hi marvin, you can try this
select * from tab1
where PK not in (select PK
from tab1
where exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
union
select PK
from tab1
where exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null))
regarding the conditions
A reader, April 27, 2004 - 11:09 am UTC
Hi Tom
Could you shed some light on why
NOT ( tab2.fid is not null and tab3.id is not null )
is same as
exists (select null
from tab2, tab3
where tab2.id = tab3.id
and tab1.id = tab2.fid)
and
NOT ( tab2.fid is not null and tab2.id is null )
is same as
exists (select null
from tab2
where tab2.fid = tab1.id
and tab2.id is null)
Cant see why. Thank you
April 28, 2004 - 12:50 pm UTC
it isn't the same.
it is in fact the opposite.
if you outer join T1 to T2 to T3
and before you were looking for (where exists)
a) a match in T2 (tab1.id = tab2.fid)
b) a match in T3 for that T2 (tabe2.id = tab3.id)
then you are saying "if I outer join T1 to T2 to T3, that row would be such that:
a) tab2.fid is NOT NULL (we found a mate)
b) tab3.id is NOT NULL (we found a mate in t3 for t2)
with the where exists -- we would have deleted that row, hence with the CTAS (which is finding rows to keep) we simply NEGATE that (with NOT).
Therefore we would keep that row, IF that was "not" satisfied.
Same logic for the second part.
the second where exists says
delete the row if
a) there is a match in T2 (where tab2.fid = tab1.id)
b) the id column in t2 for that match is NULL
that would be (in an outer join)
(tab2.fid is not null) -- we joined to a row
(tab2.id is null) -- and that row is having id is null
negate it and keep it.
Updating Table having millions of records taking lot of Time..
Anand Pandey, April 28, 2004 - 3:45 am UTC
Hi Tom,
I had a table with millions of records in which two of the columns are null.
I tried to update the null columns with data from another table, which is taking around 10-12 hrs for a single day's records, and I have to update it for 31 days.
Please help me get high performance on this update.
Query used is:
UPDATE /*+nologging parallel(4) */ MASTER_tab A SET (null_col1,null_col2)=(SELECT /*+PARALLEL(4) */ MIN(C1),MIN(C2)
FROM Table2 B WHERE SUBSTR (A.col1_T1, 1, LENGTH (B.C1)) = B.C1
AND SUBSTR(A.col2_T1,1,3)=B.C2)
WHERE c_date='01-Jan-04'
Thanks and Regards,
Anand Pandey
Nologging - how does it impact recovery?
Naresh, April 30, 2004 - 11:12 am UTC
Hi Tom,
This is a great chain of discussion. I especially liked the "outer join to replace the not exists". I am really looking forward to my copy of your first book that I oredered recently (on it's way from amazon).
One question regarding making the table nologging: Does it not have implications for recovery? What am I missing?
April 30, 2004 - 5:16 pm UTC
you need to schedule a hot backup if you use non-logged operations, yes.
db sequential waits on UPDATE
A reader, May 14, 2004 - 9:55 am UTC
Dear Tom,
Due to some processing complexities I am unable to use a single SQL for Update/Insert.
I use:
LOOP
1. Bulk select (500 rows at a time) from Table A, C with rowids from C
2. Bulk insert
3. Bulk update table C
END LOOP
I am getting a very high number of "db file sequential read" waits on the update part. Please see relevant portions of Trace File:
********************************************************************************
SELECT /*+ FULL(v) FULL(c) PARALLEL(v,4) PARALLEL(c,4) HASH(v,c) */
c.rowid,
...
FROM vp v,citi c
WHERE v.idno= c.idno
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.07 10 234 0 0
Execute 1 0.32 3.38 3023 156 3 0
Fetch 155 1.96 82.82 0 0 0 77500
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 157 2.32 86.28 3033 390 3 77500
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 4 0.01 0.03
rdbms ipc reply 26 0.00 0.00
enqueue 2 0.02 0.02
PX Deq: Join ACK 7 0.00 0.00
PX Deq: Parse Reply 4 0.23 0.35
PX Deq: Execute Reply 717 2.06 735.45
PX qref latch 4 1.02 2.75
PX Deq: Table Q Normal 5218 0.35 23.95
********************************************************************************
INSERT /*+ APPEND PARALLEL(A,4) */ INTO t_v A
VALUES(:b1,
:b2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 2 11 0 0
Execute 155 0.32 8.00 552 1116 5567 77500
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 156 0.32 8.02 554 1127 5567 77500
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 552 0.36 6.43
free buffer waits 6 1.02 1.23
********************************************************************************
UPDATE /* PARALLEL(c,4) */ citi c
SET ...
WHERE c.rowid = :b6
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 154 35.26 586.87 75251 77036 79595 77000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 155 35.26 586.87 75251 77037 79595 77000
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 95 (recursive depth: 1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 75578 1.04 461.09
free buffer waits 242 1.03 96.11
********************************************************************************
Is it not true that if the row has already been selected (and is there in the buffer cache) no disk IO should happen to update this row?
Please tell me a way to make this faster.
Regards
May 15, 2004 - 10:33 am UTC
insert /*+ append parallel... */ VALUES hmmmmmm
append = no go with values, only works with "as select"
parallel -- parallel on what exactly?
update /* parallel(c,4) */ hmmmmmm
in search of the fast = true switch. thankfully that hint is malformed and hence ignored as well -- you are updating by rowid.
parallel is *not* fast=true.
append is not either really, especially with the values clause.
you can review the trace file itself (p1,p2,p3 will tell you file and block info, along with blocks read). you can use that to verify that it is the very act of reading the indexes that need to be updated and maintained that is causing this. If they are not in the cache, well, we'll need to read them into there.
More Info
A reader, May 16, 2004 - 11:41 pm UTC
Thanks Tom,
The hint in the update was there by a Developer, it has been rightly disabled. The insert is not a bottleneck so didn't look into it.
This is a development machine, there are no other jobs running, asynchronous IO is enabled, the machine is on RAID 0 (no fault tolerance - being a development one). There are NO INDEXES on Tables being inserted and updated.
The segment on which the WAITS (db sequential read) are happening are that of the TABLE that is being UPDATED.
Please guide me next.
May 17, 2004 - 7:29 am UTC
then you are seeing physical IO performed to read into the buffer cache the data needed to be updated. If it is not cached, we'll have to read it.
PARALLEL DML
Sar, May 28, 2004 - 2:50 pm UTC
Tom,
I need to update a table that has 50 million rows but
the number of rows affected are only 1 million rows.
I have a single update statement to do this.
UPDATE acc_proc
SET acc_seq_num = 0
WHERE acc_seq_num > 0
/
The column acc_seq_num is not part of an index.
Can you please suggest me if there is anything better
that I can do on this SQL to run it faster.
Thanks
Sar
May 28, 2004 - 2:56 pm UTC
nope, that is perfect.
shouldn't take very long at all. unless you are getting blocked constantly by other sessions.
do i need to commit after execute immediate ? for dml or ddl ?
A reader, June 02, 2004 - 11:31 am UTC
June 02, 2004 - 11:35 am UTC
DDL is processed conceptually as:
begin
commit; -- commit anything you started, even if DDL fails, you are committed
do the ddl;
commit; -- commit the ddl
exception
when others
then
rollback; -- the ddl
raise;
end;
/
You need not commit DDL, it'll commit 2 times for you.
You need to commit DML as DML does not do that.
This is regardless of the method used to execute SQL, native dynamic sql, dbms_sql, static sql, whatever -- the rules are the same in all cases.
Thanks !!
A reader, June 02, 2004 - 11:50 am UTC
ok,
so i did
1.)
begin
begin
Execute immediate 'SET CONSTRAINTS ALL DEFERRED';
end;
begin
execute immediate 'delete from table ' || t;
end;
begin
execute immediate 'delete from table ' || t1;
end;
exception
when others then
rollback;
raise;
end;
/
I got the message that some constraint of t1 was violated (child record exists).
if i did not issue the commit,post or rollback,
and did not perform any DDL,constraints are DEFERRED
what is the issue? can you help?
June 02, 2004 - 11:56 am UTC
...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t;
execute immediate 'delete from ' || t1;
execute immediate 'set constraints all immediate';
end;
/
that is all the code you want -- the all immediate will validate the contraints
if they fail -- then the error (exception) will be propagated back to the client and the rollback of the statement (which is whatever the client sent over to the database -- a plsql block, whatever) will be undone.
A reader, June 02, 2004 - 12:01 pm UTC
...
begin
execute immediate 'set constraints all deferred';
execute immediate 'delete from ' || t; -- ***** this one
-- execute immediate 'delete from ' || t1;
-- execute immediate 'set constraints all immediate';
end;
/
the question is
as soon as it executes the
'delete from ' || t;
statement, it raises the referential constraint violation exception. why?
as
1.) constraints all deferred
2.) it NOT a ddl
3.) I am not commiting(or ending the tx) manually
June 02, 2004 - 12:45 pm UTC
the constraint must not be deferrable.
you have to have deferrable constraints in order to defer them. the default is "not deferrable"
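A constraint has to be declared deferrable when it is created (or re-created) for SET CONSTRAINTS to have any effect on it; for example (table and constraint names invented):
alter table child_t drop constraint child_t_fk;
alter table child_t
  add constraint child_t_fk foreign key (parent_id)
  references parent_t (id)
  deferrable initially immediate;
-- now SET CONSTRAINTS ALL DEFERRED will postpone the check until commit time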
7.3.4 - extracting 5 million records from 100 million record table
A reader, June 15, 2004 - 12:39 pm UTC
Any suggestions on how to accomplish this on 7.3.4? No concept of nologging, partitioning, Export/query clause etc in 7.3.4..
How I can extract about 5 million records (based on a condition) into a separate database/server from a 100million record table? What's the most efficient way? I'm having to make do with very limited disk/space on both machines.
Thanks.
June 15, 2004 - 5:30 pm UTC
5 million is pretty small.
I'd just
create table as select ( concept of nologging did in fact exist, unrecoverable);
export it
import it
or use the sqlplus copy command if they are connected via a nice network.
A reader, June 16, 2004 - 5:38 pm UTC
Thanks. There were some LONG columns in there, so 'create table as select' wouldn't work. I am just ending up PL/SQL looping and hitting smaller chunks of the mega table, then creating smaller target tables, then exp/imp.
Am afraid I won't have sufficient temp space to do a sqlplus "Copy from". Also, documentation talks about Copy not being intended for Oracle to Oracle DBs. No idea why.
June 16, 2004 - 6:12 pm UTC
copy is ONLY for oracle to oracle db's....
A reader, June 16, 2004 - 7:40 pm UTC
Per 7.3.4 documentation :
http://download-west.oracle.com/docs/pdf/A42562_1.pdf
Excerpt from it -
Note: In general, the COPY command was designed to be used
for copying data between Oracle and non-Oracle databases. You should use SQL commands (CREATE TABLE AS and INSERT)
to copy data between Oracle databases.
June 17, 2004 - 7:53 am UTC
but if you think about it..... doesn't matter what the doc says, it takes two connections. connections are only to oracle databases. sure you could be using a gateway -- but even there, it would be true that create table as and insert would work.
Updating and Inserting 1 Million rows Daily with Bitmap indexes
Sidda, June 24, 2004 - 6:32 pm UTC
Hi Tom,
We are facing a very big problem here. We have a partitioned table with 180 million records (70 columns, 10 bitmap and 10 B-tree indexes). End users need the bitmap indexes 24/7 so we can't drop and recreate the table and indexes. Daily we have to update and insert 1 million records. We tried bulk updates but in vain. What is the best method to follow?
Thanks in advance
Sidda
June 24, 2004 - 9:01 pm UTC
describe "in vain", what went wrong?
it would be best to do a SINGLE insert into and a SINGLE Update against this table (not even in bulk - just single statements)
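To illustrate what "single statements" means here -- only a sketch, with invented staging-table and column names:
insert /*+ append */ into big_fact
select * from stage_new_rows;
update big_fact f
   set ( col_a, col_b ) = ( select s.col_a, s.col_b
                              from stage_changed_rows s
                             where s.pk = f.pk )
 where f.pk in ( select pk from stage_changed_rows );
commit;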
Creating Table with aggregated data from another table
RB, July 26, 2004 - 1:16 pm UTC
Tom: Qn related to creating a table with data from another table -
I want to create a table with few fields and aggregated some of few columns from another table.
Here is a query:
create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum (billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by ID,LOC, SRC;
I have 128 million records in the SRC_MONTHLY table and it is taking more than 1 hr to get this table created. I have indexes on the ID, SRC and LOC fields.
Any faster way of getting this table created?
July 26, 2004 - 1:32 pm UTC
so how many records have that id?
Great aproach, but is it as fast for an IOT table?
Peter Tran, July 26, 2004 - 2:34 pm UTC
Hi Tom,
I'm trying the same approach with an IOT table. We have an IOT table partitioned daily. I want to recreate this table with a monthly partition. I do a CTAS parallel nologging using the new monthly partition, but it's SLOW. Then again, the table does have 200 million rows. Is the "Index Organization" part of table the slow part?
Thanks,
-Peter
July 26, 2004 - 2:47 pm UTC
what do the inputs/outputs look like (table structure) -- remember, you are sorting 200,000,000 rows as well!
did you give it a nice juicy sort area size?
Unfortunately no.
Peter Tran, July 26, 2004 - 2:58 pm UTC
"...nice juicy sort area size" That would be a negative. :( Man, this is going to take forever isn't it? Ugh...
SQL> show parameter sort_area
NAME TYPE VALUE
------------------------------------ ----------- --------
sort_area_retained_size integer 8388608
sort_area_size integer 8388608
SQL> desc odfrc;
Name Null? Type
----------------------------------------------------- -------- ------------
ODIFID NOT NULL NUMBER(10)
TRPORGN NOT NULL VARCHAR2(5)
TRPDSTN NOT NULL VARCHAR2(5)
POSCOUNTRYCODE NOT NULL VARCHAR2(3)
PAXTYPE NOT NULL VARCHAR2(1)
DCP NOT NULL NUMBER(2)
ODIFDATE NOT NULL DATE
FRCDATE NOT NULL DATE
BKGMEAN NUMBER
BKGMEANINFLUENCED NUMBER
BKGVARIANCE NUMBER
XXLMEAN NUMBER
XXLMEANINFLUENCED NUMBER
XXLVARIANCE NUMBER
Here's my CTAS:
CREATE TABLE ODFRC_MONTHLY (
ODIFID,
TRPORGN,
TRPDSTN,
POSCOUNTRYCODE,
PAXTYPE,
DCP,
ODIFDATE,
FRCDATE,
BKGMEAN,
BKGMEANINFLUENCED,
BKGVARIANCE,
XXLMEAN,
XXLMEANINFLUENCED,
XXLVARIANCE,
CONSTRAINT ODFRC_MONTHLY_PK PRIMARY KEY
(ODIFID, ODIFDATE, TRPORGN, TRPDSTN,POSCOUNTRYCODE, PAXTYPE, DCP, FRCDATE)
) ORGANIZATION INDEX nologging parallel 8
PARTITION BY RANGE (ODIFDATE)
(PARTITION ODFRC_20021130 VALUES LESS THAN (TO_DATE('2002-12-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX01 ,
PARTITION ODFRC_20021231 VALUES LESS THAN (TO_DATE('2003-01-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX02 ,
PARTITION ODFRC_20030131 VALUES LESS THAN (TO_DATE('2003-02-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX03 ,
PARTITION ODFRC_20030228 VALUES LESS THAN (TO_DATE('2003-03-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX04 ,
PARTITION ODFRC_20030331 VALUES LESS THAN (TO_DATE('2003-04-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX05 ,
PARTITION ODFRC_20030430 VALUES LESS THAN (TO_DATE('2003-05-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX06 ,
PARTITION ODFRC_20030531 VALUES LESS THAN (TO_DATE('2003-06-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX07 ,
PARTITION ODFRC_20030630 VALUES LESS THAN (TO_DATE('2003-07-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX08 ,
PARTITION ODFRC_20030731 VALUES LESS THAN (TO_DATE('2003-08-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX09 ,
PARTITION ODFRC_20030831 VALUES LESS THAN (TO_DATE('2003-09-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX10 ,
PARTITION ODFRC_20030930 VALUES LESS THAN (TO_DATE('2003-10-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX11 ,
PARTITION ODFRC_20031031 VALUES LESS THAN (TO_DATE('2003-11-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX12 ,
PARTITION ODFRC_20031131 VALUES LESS THAN (TO_DATE('2003-12-01', 'YYYY-MM-DD'))
TABLESPACE ODFRC_IDX13) AS SELECT
ODIFID,
TRPORGN,
TRPDSTN,
POSCOUNTRYCODE,
PAXTYPE,
DCP,
ODIFDATE,
FRCDATE,
BKGMEAN,
BKGMEANINFLUENCED,
BKGVARIANCE,
XXLMEAN,
XXLMEANINFLUENCED,
XXLVARIANCE
FROM odfrc nologging;
The original ODFRC table has daily partition for one year. Anyway, I can estimate how long this will take?
-Peter
July 26, 2004 - 3:29 pm UTC
the nologging is in the wrong place
from odfrc nologging;
is the same thing as:
from odfrc a;
would you consider
a) create the table (empty, alter it to be nologging)
b) insert /*+ append */ into the individual partitions (instead of all at once) in parallel sessions?
Create Table with data from an aggregated sum of few fields from another table
RB, July 26, 2004 - 3:48 pm UTC
Tom: Qn related to creating a table with data from another table -
I want to create a table with few fields and aggregated some of few columns from
another table.
Here is a query:
create table SRC200401 as select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = 1072915200 group by
ID,LOC, SRC;
I have 128Million record in the SRC_MONTHLY table and it is taking more than 1
hr to get this table created. I have index on ID, src and LOC fields.
Any faster way of getting this table created?
Followup:
so how many records have that id?
RB: Tom - This number varies - we have many IDs in the master table. If I pass one id then the query will have one equality predicate on that ID; if more than one, I was planning to use an IN clause. So I do not know how many records per id I will have in the table at any given point of time.
July 26, 2004 - 3:51 pm UTC
now I'm confused -- the predicate is variant? why create *that* table then?
what is the goal here -- to make queries of the form:
select ID, src, LOC, sum(BYTES) TOTAL, sum
(billable_bytes) TOTAL_BILLABLE from SRC_MONTHLY where ID = :x group by id, loc, src;
faster in general? (you don't have to duplicate lots of text, it is all right here)
Great suggestion!
Peter Tran, July 26, 2004 - 3:58 pm UTC
Hi Tom,
Thanks for the useful suggestion.
1) Part one done (easy).
2) Is this what you mean for part 2? When you say parallel sessions, do you mean kick off a bunch of them using execute immediate?
INSERT /*+ append */ INTO odfrc_monthly(
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE)
SELECT
ODIFID,TRPORGN,TRPDSTN,POSCOUNTRYCODE,
PAXTYPE,DCP,ODIFDATE,FRCDATE,BKGMEAN,
BKGMEANINFLUENCED,BKGVARIANCE,XXLMEAN,
XXLMEANINFLUENCED,XXLVARIANCE
FROM odfrc partition(ODFRC_20021114);
3) Should I still give it a nice juicy sort_area_size? :)
Thanks,
-Peter
July 26, 2004 - 5:13 pm UTC
2) fire up N sqlplus sessions and run the insert append in each
3) sure, let each session have a big SAS -- alter session can be used.
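For illustration, here is roughly what each of those N sessions might run (partition name and sizes are made up for the sketch; note the followup further down this page, where the append hint turns out to serialize direct path inserts into the table, so a normal insert ends up being the better choice here):

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 209715200;

insert /*+ append */ into odfrc_monthly
select * from odfrc partition (ODFRC_20021201);

commit;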
Create Table with data from an aggregated sum of few fields from another table
RB, July 26, 2004 - 7:57 pm UTC
Tom - The user can select one or more ids. If I have more than one ID then I was planning to use an IN clause in the where clause. The temp table that I am creating will be used in a later phase of the app for other joins. What I am looking for is a solution which will be much faster than my current approach. The query that I have given against a 120M row table is taking more than 1 hr to create the aggregated table.
July 26, 2004 - 8:19 pm UTC
what is the query plan in general then.
you would not create a temporary table in oracle -- that would be so sqlserver. just use that query in the "IN" statement in the first place!!!!!
Update...
Peter Tran, July 27, 2004 - 4:11 pm UTC
Hi Tom,
I wanted to give you an update on progress. The nice thing about your approach is I can monitor the progress, but it's not as fast as I thought it would be.
I started 6 sessions and gave each session a sort_area_size of 200M. I then executed a month's worth of inserts in each session. It's taking about 30 minutes per partition/day to insert from the source daily table to the monthly table. Each partition holds around 700K to 800K rows.
Should I expect to run this long?
Thanks,
-Peter
July 27, 2004 - 7:01 pm UTC
sounds long -- can you run one of the sessions with a 10046 level 12 trace and see what they might be waiting on?
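For reference, enabling that trace in the loading session is typically done like this (the trace file name passed to tkprof afterwards is made up -- the real one is written to user_dump_dest):

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

-- run the slow insert here, then:
alter session set events '10046 trace name context off';

-- and from the operating system prompt:
-- tkprof ora_12345.trc ora_12345.prf sys=no sort=exeela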
sort_area_size, pga_aggregate_target, insert /*+ APPEND*/
Dusan Valasek, July 28, 2004 - 3:02 am UTC
Hi Tom,
I have two questions about setting of big sort_area_size before direct path load.
1) How does setting of sort_area_size (from session) affect direct path load? I thought that this is useful for sort operation and building of indexes.
2) If I use pga_aggregate_target, does setting of sort_area_size (from session) override pga_aggregate_target?
Thanks,
Dushan
July 28, 2004 - 8:14 am UTC
1) you are loading an IOT. big sort going on.
2) you would set the workarea policy to manual (in your session) to allow you to override the sort area size for this "batch operation"
A reader, July 28, 2004 - 12:10 pm UTC
Would you suggest re-creating the table when other users want to update it online?
July 28, 2004 - 1:28 pm UTC
huh?
How to change a DATATYPE of column.
peru, July 30, 2004 - 7:54 am UTC
Hi Tom,
How do I change the datatype of a particular column?
Say a table with a VARCHAR2 column holding a date.
Now I want to change the datatype to DATE.
Assume that the table has many records and is referenced by
tables, procedures and triggers.
Thank you
July 30, 2004 - 4:43 pm UTC
not really going to happen.
you can add a new column that is a date, update this new column to be to_date() of the varchar2 and set the varchar2 to null, drop the old column and rename the newly added column to the old name.
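A minimal sketch of those steps, using made-up table/column names (the to_date format mask is whatever your strings really contain, and RENAME COLUMN needs 9i or later):

alter table t add (hire_dt date);

update t set hire_dt = to_date(hire_char, 'YYYY-MM-DD'), hire_char = null;

alter table t drop column hire_char;
alter table t rename column hire_dt to hire_char;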
Update to July 27, 2004
Peter Tran, August 22, 2004 - 11:11 am UTC
Hi Tom,
Sorry it took a while to get back to you on this. You wanted to see a 10046 level 12 trace. I wanted to do some research first and I had to rebuild the table to reproduce the step.
Here's the trace.
insert /*+ append */ into odfrc_monthly select * from odfrc
partition(ODFRC_20040201)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 15.08 5776.28 11657 15874 111333 569939
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 15.09 5776.29 11657 15874 111333 569939
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 34
Rows Row Source Operation
------- ---------------------------------------------------
1 LOAD AS SELECT
569939 INDEX FAST FULL SCAN ODFRC_PK PARTITION: 185 185 (object id 12351)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
control file sequential read 8 0.00 0.00
enqueue 1911 3.00 5726.43
db file sequential read 38 0.03 0.20
db file scattered read 380 0.63 9.50
direct path write 19 0.00 0.00
direct path read 10 0.00 0.00
log buffer space 45 0.61 12.01
log file switch completion 2 0.28 0.33
log file sync 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00
Jonathan Lewis wrote in his book that you cannot use direct-path insert using the APPEND hint into an IOT table. Can this be the reason for the large enqueue timed event? Are the 6 sessions waiting to lock the index to modify it?
Thanks,
-Peter
August 22, 2004 - 4:55 pm UTC
Umm, you were blocked by someone else for an excessively long period of time here. Enqueue waits:
enqueue 1911 3.00 5726.43
normal "blocker, blockee" problem here! doh, it was me doing it to you.
only one session at a time can append. only one session at a time can direct path insert into a table.
sorry -- use normal insert, I goofed.
delete 2M records without dropping table.
Sean, August 23, 2004 - 9:18 pm UTC
Hi Tom,
I have to delete 2M records from a 6M record table, and the table has about 400 columns (Oracle 920, Solaris 9).
I understand your suggestion of creating a temp table with the records I need, then dropping the original table and renaming the temp table to the original name.
But since our table is not that big and the application uses this table all the time, we are trying to use the traditional delete method to accomplish this.
(1) If we have enough redo tablespace, what is a good number of rows for each delete? I tried committing every 10000 records and every 1000 records. Both are quite slow.
(2) Anything you can suggest to improve the speed of the delete in the following procedure?
Thanks so much for your help
------------------------------------------------------------
create or replace procedure sp_del_audit(p_total_loop_time number)
is
total_loop_time number := 1;
BEGIN
WHILE total_loop_time <= p_total_loop_time
LOOP
delete /*+ parallel */ from audit a
where rownum < 10000
and exists ( select /*+ parallel */ null from audit_temp b
where a.Pkey = b.Pkey and
a.row_type = b.row_type and
a.update_time = b.update_time);
commit;
total_loop_time := total_loop_time + 1;
END LOOP;
END;
/
------------------------------------------------------------
August 24, 2004 - 7:19 am UTC
1) assuming you meant UNDO tablespace, there is no such thing as a REDO tablespace...
removing 2 million rows, with 400 columns (every row in this table is CHAINED, when you have more than 255 columns...) and probably many indexes -- it is not going to be what you might term "speedy".
2) do it as a SINGLE delete (deleting 10,000 records in "parallel" doesn't even begin to make sense here. PQ is for BIG BIG BIG things. this is tiny)
delete from audit
where (pkey, row_type, update_time) in ( select pkey, row_type, update_time
from audit_temp);
nothing more, nothing less. but don't expect this to be super fast if this table is indexed. removing 33% of the index entries is going to take some amount of time.
Deleting 90% of rows
A reader, October 13, 2004 - 10:54 pm UTC
As part of some ETL processing, I need to delete 90% of the rows in a large (6million rows) table. This table has a PK index so traditional delete takes a long time. I see the following options
1. Mark the index unusable, delete and rebuild index nologging. This is significantly faster than plain delete
2. create new table as remaining 10% of the rows, drop, rename, etc
3. Precreate a regular or GTT and do insert /*+ append */ in my batch program.
(2) is not very attractive as I don't want to be doing DDL in my code.
How would you compare (1) and (3) above? Both of them would generate similar amounts of undo/redo and take roughly the same amount of time?
Thanks
October 14, 2004 - 9:19 am UTC
ETL is 'special' -- it isn't something that runs 5 times a minute.
I'd go with #2 actually, CTAS a newone, drop oldone, rename newone to oldone.
#3 is "dangerous" -- you could lose everything in the event of a failure.
A reader, October 14, 2004 - 11:02 am UTC
"#3 is "dangerous" -- you could lose everything in the event of a failure"
Why would I lose everything? What kind of failure? Instance or media failure? Oracle guarantees recoverability of committed transactions, right? Why do you bring that up here?
The only reason I don't want to do (2) is that I usually want to avoid DDL in my code. The only difference between (2) and (3) is that the table is already created in (3), right? Yes, (3) would generate much more redo/undo (less if I create the table as nologging), but I thought using a GTT would cut down on the redo/undo as well?
October 14, 2004 - 11:16 am UTC
i saw "use a gtt"
i could only assume you mean:
insert rows to keep into gtt
truncate table
insert rows from gtt into the table
that would be dangerous. if you fail after the truncate, before/during the insert -- gone, all gone. gtt would be dangerous here.
(truncate is ddl btw...)
How can you Display multiple rows in one record
Mack, October 14, 2004 - 2:34 pm UTC
Hi Tom,
Let's suppose in deptno 10, there are 3 to 5 employees, I want to see the emp names like
TOM, KIM, JOHN... and so on.
Is there an easy way to do it in SQL? Not through PL/SQL. The number of records is unknown; it could be 10, 20 or one hundred. Please advise.
October 14, 2004 - 7:42 pm UTC
collect() in 10g
stragg (search for it -- nasty plsql you install once and use over and over and over and over and over) in pre 10g
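For example, assuming the stragg aggregate from this site has been installed (collect() in 10g needs a cast to a named collection type, so stragg is usually the simpler route before that):

select deptno, stragg(ename) enames
from emp
group by deptno;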
Query first and then update
A reader, October 15, 2004 - 1:23 am UTC
We have two tables with approximately 28 million and 35 million records respectively. These tables are joined to produce data to be displayed to users using IE browsers. Based on this information, around 10000 records get added to these tables and around the same updated daily. Our SLA is to display each screenful (100 rows) in 2-4 seconds.
While partitioning is being reviewed to improve the performance for the queries, could you let us know if there are any issues regarding partitions? For instance someone has reported that using global indexes on a partitioned table has degraded the performance.
October 15, 2004 - 11:19 am UTC
i seriously doubt partitioning is going to be used to increase the performance of these queries.
partitioning -- great for speeding up a full scan. are you suggesting to full scan and still return in 2-4 seconds?
"someone reported using global indexes ... degraded performance"
I just about fell out of my chair on that one. geez.
If you have my book "Effective Oracle by Design" -- I go into the "physics behind partitioning".
In order to return the first 100 rows to a web based application -- you are going to be using indexes (or you are not going to be doing what you signed up to do -- funny, you have an SLA in place but no idea if you can live up to it...). whether the tables are partitioned or not probably won't have any bearing on making this faster. Partitioning could certainly make things slower if you do it wrong, but in a system like this, you would be looking at partitioning to make administration easier and hopefully not NEGATIVELY impact performance.
given two tables...
index access
get 100 rows
I personally would be shooting for well under 1 second response times for everything -- regardless of whether there was 1 row or 1 billion.
don't get the tie into "query first and then update" though.
A reader, October 18, 2004 - 1:33 am UTC
"Our SLA is to display each *screenful* (100 rows) in 2-4 seconds."
In our case the queries, after joining 2 tables of 28 million and 35 million records, could return 10 screens or 600 screens or many more screens based on the query parameters. Each screenful (with 100 records in each screen) should appear in 2-4 seconds.
October 18, 2004 - 8:13 am UTC
so, you've signed up for an SLA you have no idea if you can meet.
but hey -- using indexes to retrieve 100 rows from 1, 1000, 10000000000000 should be about the same amount of time (and way under subsecond).
but -- getting to "screen 600" is not. Look to google as the gold standard for searching and web pagination
o totally estimate the number of returned rows -- don't even THINK about giving an accurate count
o don't give them the ability to go to page "600", pages 1-10 is more than sufficient
o even if there is a page 600 -- realize it doesn't make sense to go there (no human could know "what I need is on page 600" -- 6,000 rows into this result set). Google stops you at page 99
o understand that page 2 takes more time to retrieve than page 1, page 50 more than 2, and so on (as you page through google --each page takes longer)
But perhaps most importantly -- laugh at people that say things like:
"For instance
someone has reported that using global indexes on a partitioned table has
degraded the performance."
and say -- "yes, it has when improperly applied it can degrade performance, there have also been sightings of systems where it didn't affect performance at all, there have also been reported instances of them massively improving performance. Now what we have to do is understand how the feature works, what it does, and how it might apply to our problem"
The neat thing about that paragraph -- it is INFINITELY reusable. You can use it with regards to any feature!
(if you have Effective Oracle by Design -- i go into the "physics" of partitioning and how -- without the judicious use of global indexes -- your system could fall apart and run really really slow as well)
Senior Statistical Analyst
Hank Freeman, October 18, 2004 - 5:58 pm UTC
18 October 2004
Tom,
Thanks for the lob_replace code !!!
It worked wonders when trying to fix about 25,000 clobs with a known error. Here my detailed discussion in outline and then in detail.
1. What type of data was in the CLOB
2. What went wrong to create the error
3. What was done to correct it.
a. Stored Proc
b. Declare syntax
c. VBA code in Excel
1. What type of data was in the CLOB. The company has about 45,000 internet/intranet web pages stored in an Oracle database table column which is a CLOB. Meaning, instead of serving the website from a file server or other physical location, the entire website's web-page/html source code is held in the table in this CLOB field.
2. What went wrong to create the error. An error occurred when these records were being modified to remove a specific piece of information and a replacement null character was inserted. The null character for some unknown reason did not work and the web-page information in the CLOB field got garbage appended to the end of the web-page after the closing </html> tag.
a. It was easy to spot the records, but a conventional update would not work because the web-pages were so large, because of the special characters found within them, and because of the unique nature of working with CLOBs.
b. I contacted a friend who pointed me to your site and this URL:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1533006062995
c. Upon examination of the last changes you made for a previous person, it became clear how to update the records, and that it had to be done with a program.
3. What was done to correct the problem.
a. The last stored procedure you listed on your website was used, because it was applicable to our situation. This procedure allowed the creation of a specific version of the Declare sql to select the bad part of the CLOB record and replace it with a (space character) and not a null.
i. The stored proc syntax is not listed since it is Tom's lob_replace syntax verbatim.
ii. The Declare statement listed below takes the unique record id and the starting location of the bad information in the CLOB. The sql then passes the CLOB, the data to find, and the replacement string (a space) to the stored procedure.
Declare Sql syntax
declare
l_lob clob;
l_substr varchar2(32767);
begin
select KS_DESCRIPTION into l_lob
from ks_longdescription
WHERE KS_LONGDESCRIPTIONID = 55012 for update;
select SUBSTR(KS_DESCRIPTION, 876, length(KS_DESCRIPTION) - 876) into l_substr
from ks_longdescription
WHERE KS_LONGDESCRIPTIONID = 55012;
hf_clob_replace (l_lob, l_substr, ' ');
end;
iii. It was decided to use Excel VBA to control the update process. It was contemplated to write a second procedure for the declare statement and just execute it with parameters. However, as you noticed, the maximum size for the l_substr VARCHAR2 variable was declared as 32767. It was already known that some of the erroneous parts of the CLOB data were larger than 25,000 characters. Therefore, Excel was used to ascertain which records were going to be within this limit; those outside a 25,000 character limit would require a manual fix.
iv. Approximately 6,000 records were loaded into an Excel worksheet and the following Macro code was written to process/update the records. This was the optimum method of performing the cleanup because the VBA code was written to decide:
1. Length of clob
2. Starting location of the junk in the clob
3. The unique record id was already known, and with the aforementioned starting location determined by the VBA code, it was a simple task to just pass these parameters into the Declare SQL string and create a specific run for each qualifying record.
4. The VBA code would then execute the Declare SQL string, firing up the stored procedure hf_clob_replace, thus cleaning up the record.
5. If a record could be processed it was, otherwise it was marked for manual cleanup.
6. During the process it was determined that any l_substr value greater than 20,000 would cause an Oracle error on the buffer size. Therefore, upon seeing that the error would occur often, any l_substr (p_what) value greater than 20,000 was immediately marked as a SQL buffer failure, meaning it would have to be manually updated. The important part here was that the process would not stop if it did not like a specific record; it just kept going.
In summary, it took less than ONE second to fix a record, with auto-commit turned on, and we knew it was working because the process could be watched. Within four hours (6,000 per hour) using two systems, the 25,000 records were corrected with only 150 needing a manual fix -- a highly acceptable result.
In closing, I would like to give credit to the following.
Mr. Andy Sword, who is/was a subscriber to Tom's web page and submitted to me the first draft of the Declare statement we used. This was what we really needed to know!!! I had never worked with CLOBs, so this was a real lesson.
Tom's web page:
http://asktom.oracle.com/pls/ask/f?p=4950:1:
Tom Reed:
http://the-big-o.port5.com/index.shtml where some of the best Excel/VBA code for interfacing with Oracle can be found.
I have listed the VBA code below for those who are so inclined to read it. There is nothing there that is private and confidential. So anyone who can learn from it may feel free to do so.
Respectfully
Hank Freeman
FCS Ltd
hfreeman@msn.com
updating millions of records
Josphat, October 26, 2004 - 10:28 am UTC
Your approach is quite ok for a one-line update, but how about a procedure which updates the next row based on the value of the previous row?
Kindly look at the following scripts and give hard suggestions on how 5 million rows could be processed in under 10 hours... while capturing the business logic.
----starting here
---3 Scripts to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)
---should exec as ordered
drop table temporary_emphist;
create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/
-----values(ID ,PAYDATE,AMOUNT , PPART, IPART, INTRATE, OBAL, REF, NOOFMONTHS, ACCRUED_INT)
-----data for different ids being inserted (one id might have as many as 300 entries)
-----but i have put just a few for each id
----set for customer id 10979844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10979844','30-jun-2003',2000,2,112000);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jul-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-dec-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jan-2004',1950,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-jun-2004',5000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-aug-2004',4000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10979844','30-oct-2004',3000,2);
----set for customer id 10779844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10779844','30-jun-2003',2000,2,99380);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jul-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-dec-2003',2000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jan-2004',1950,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-jun-2004',5000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-aug-2004',4000,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10779844','30-oct-2004',3000,2);
----set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL)
values('10679844','30-jun-2003',800,4,40390);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2003',1500,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2003',1600,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-dec-2003',2000,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jan-2004',1950,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-mar-2004',1950,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-jul-2004',1000,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-aug-2004',1000,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-sep-2004',1992,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE)
values('10679844','30-oct-2004',1200,4);
commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;
create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);
---script 1 (will number transactions for each customer from 1 to n, then start from 1 for the next customer)
DECLARE -----Procedure 19
V_2 NUMBER(7) :=1;
V_1 VARCHAR2(18);
v_3 VARCHAR2(18);
v_4 VARCHAR2(18) :='T';
v_5 date;
CURSOR DRECS IS
SELECT ROWID,id,paydate
FROM temporary_emphist
ORDER BY ID,paydate;
BEGIN
OPEN DRECS;
LOOP
FETCH DRECS INTO V_1,v_3,v_5;
EXIT WHEN DRECS%NOTFOUND;
if v_3 <> v_4 then
v_4 := v_3;
v_2 := 1;
end if;
UPDATE temporary_emphist
SET REF = V_2
WHERE ROWID = V_1;
V_2 := V_2+1;
END LOOP;
commit;
CLOSE DRECS;
END;
/
select * from temporary_emphist
order by id,paydate;
-----script 2
----gets the difference in months from one transaction to the next
DECLARE -----Procedure 20
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) :='0';
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 DATE;
V14 DATE;
V13 NUMBER(13,2);
V15 number(5);
CURSOR UPLO IS
SELECT REF,OBAL,amount,ID,INTRATE,last_day(PAYDATE) FROM temporary_emphist
where nvl(ref,0)>1
ORDER BY ID, REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
---V13 :=V12;
uPDATE temporary_emphist
SET NOOFMONTHS=mONTHS_BETWEEN(v12,last_day(paydate))
WHERE NVL(REF,0) = V_1-1 and id = V_7 ;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
---script 3
-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
---as per the numbering above(script 1) value of rows for customer
----will be manipulated to get values for the subsequent row though some input from the
----current row will be integrated with derivative of the previous to populate values into the
----current row. This is applied for each customer untill all are cleared
SELECT 'ESTABLISHING BALANCES ON REDUCING BASIS-----Procedure 21 STARTS ' FROM DUAL;
DECLARE -----Procedure 21
V_1 NUMBER(7) :=0;
V_2 NUMBER(15,2):=0;
V_3 NUMBER(15,2) :=0;
v_4 NUMBER(15,2) :=0;
V_5 NUMBER(15,2) :=0;
V_6 NUMBER(15,2) :=0;
V_7 varchar2(15) ;
V_8 NUMBER(15,2) :=0;
v_9 number(15,2);
v10 number(35,10) := ROUND(1/12,10);
V11 number(15,2) ;
V12 NUMBER(15,2) :=0;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold
v16 number(15,2) :=0; ---ipart hold
V20 VARCHAR2(18) :='T';
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist
ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V_1,V_2,V_3,V_7,V11,V12;
EXIT WHEN UPLO%NOTFOUND;
if v_7<>v20 AND V_1=1 then
v20 :=v_7;
V_4 :=V_2-V_3;
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2);
v15 :=v_4;
ELSE
select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11,V12 from temporary_emphist
where NVL(ref,0) = V_1 - 1 AND ID = V_7 ;
select nvl(amount,0) into v_3 from temporary_emphist
where NVL(ref,0) = V_1 AND ID = V_7 ;
v15 := ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-v_3; -------obal
v14 :=ROUND((v_4)*POWER((1+V11/100),v10*V12),2)-V_4; ---accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11/100),v10*V12)-V_4)/V12),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V_1 AND ID = V_7 ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
I don't know whether there is a faster way to employ the logic... I've seen your authority... give it a shot.
October 26, 2004 - 10:42 am UTC
one word for you:
analytics
they rock, they roll. lag/lead -- i frankly did not really read all of your code (i'm not a compiler, v_1, v_2 etc make it really truly utterly hard to follow and comprehend) but if you need to update a row based on the prior row(s) after sorting -- analytics -- lag, lead, sum, whatever.
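For what it is worth, a minimal sketch of the analytic building block against the posted table -- this only shows how lag() exposes the prior row's values within each id (ordered by ref); it does not reproduce the full compounding logic:

select id, ref, paydate, amount, obal,
       lag(obal)   over (partition by id order by ref) prev_obal,
       lag(amount) over (partition by id order by ref) prev_amount
  from temporary_emphist
 order by id, ref;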
database
cp, October 26, 2004 - 10:45 am UTC
Hi Tom,
I have to appear for an interview and need your help.
1. How to decide the datafile size?
2. How to decide the redo log file size?
3. Any other questions which you, as an interviewer, would ask?
Please help me out as it's really urgent.
Regards
October 26, 2004 - 10:47 am UTC
how do you know they'll ask these questions -- not questions I would ask.
datafile size = pure function of amount of data you want to store. I mean, well, what else could you say?
redolog file size is a function of
o amount of redo generated by most of your transactions
o the number of these transactions you do in a period of time
predominantly (but if you are an experienced dba -- why not just fall back on your knowledge of what you've done in the past?)
problem
cp, October 26, 2004 - 1:42 pm UTC
So until and unless we get some opportunity, we can't learn the skills to decide the initial size of a database, right?
Please provide some words about the factors to think on for this.
Thanks!
October 26, 2004 - 2:43 pm UTC
experience.
you get an entry level job (you have to sort of start in the basement) and get experience.
update of millions of records/rows
Josphat, October 29, 2004 - 10:29 am UTC
Hi Tom,
Thanks for the decisive debate on this site.
Kindly go through my script because now I have elaborated on
the v_1, v_2, etc. variables as you will see... SORRY for MY FIRST SUBMISSION. I will be grateful if your suggestion/code
could reduce my processing time for 5 million records from
15 hrs to 1, 2 or 3.
---Script to process over 5 million rows
---and constitute approx 40000 distinct identifiers (id)
-------------------------------------------puts interest per month
-------------------------------------------CALCULATES INTEREST ON OUTSTANDING BALANCE
-----EACH customer HAS OWN ROWS STARTING 1..N AND REPRESENTED BY REF COLUMN ON TABLE
-----MULTIPLIER FACTOR FOR EACH MONTH IS REPRESENTED BY NOOFMONTHS
-----will be manipulated to get values for the subsequent row though some input from the
-----current row will be integrated with derivative of the previous to populate values into the
-----current row. This is applied for each customer untill all are cleared
-----PROCESSING IS SEQUENCIAL (I.E FROM REF 1 ..N FOR EACH CUSTOMER)
-----REF =1 FOR EACH CUSTOMER HOLDS A VALUE FOR OBAL WHICH WILL BE USED TO FILL UP
-----THE SUBSEQUENT ROWS AFTER MANIPULATION WITH MULTIPLIER FACTOR
----work starts next
drop table temporary_emphist;
create table temporary_emphist
(ID VARCHAR2(15),PAYDATE DATE, AMOUNT NUMBER(10,2),
PPART NUMBER(13,2), IPART NUMBER(13,2), INTRATE NUMBER(5,2), OBAL NUMBER(13,2),
REF NUMBER(7), NOOFMONTHS NUMBER(13,2), ACCRUED_INT NUMBER(15,2))
/
----DATA set for customer id 10979844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10979844','30-jun-2003',2000,2,112000,1,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jul-2003',2000,2,2,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2003',2000,2,3,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-dec-2003',2000,2,4,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jan-2004',1950,2,5,5);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-jun-2004',5000,2,6,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-aug-2004',4000,2,7,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10979844','30-oct-2004',3000,2,8,0);
----DATA set for customer id 10779844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10779844','30-jun-2003',2000,2,99380,1,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jul-2003',2000,2,2,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2003',2000,2,3,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-dec-2003',2000,2,4,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jan-2004',1950,2,5,5);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-jun-2004',5000,2,6,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-aug-2004',4000,2,7,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10779844','30-oct-2004',3000,2,8,0);
----DATA set for customer id 10679844
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE, OBAL,ref,NOOFMONTHS)
values('10679844','30-jun-2003',800,4,40390,1,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2003',1500,4,2,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2003',1600,4,3,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-dec-2003',2000,4,4,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jan-2004',1950,4,5,2);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-mar-2004',1950,4,6,4);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-jul-2004',1000,4,7,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-aug-2004',1000,4,8,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-sep-2004',1992,4,9,1);
insert into temporary_emphist(ID ,PAYDATE,AMOUNT , INTRATE,ref,NOOFMONTHS)
values('10679844','30-oct-2004',1200,4,10,0);
commit;
set lines 1000
select * from temporary_emphist
order by paydate,id;
create index temporary_emphistid on temporary_emphist(id);
create index temporary_emphistpd on temporary_emphist(paydate);
---script STARTS
DECLARE
V1_REF temporary_emphist.REF%TYPE;
V2_OBAL temporary_emphist.OBAL%TYPE;
V3_AMOUNT temporary_emphist.amount%TYPE;
v_4 NUMBER(15,2) :=0;
V5 NUMBER(15,2) :=0;
V6 NUMBER(15,2) :=0;
V7_ID Temporary_emphist.ID%TYPE;
V8 NUMBER(15,2) :=0;
v9 number(15,2);
v10 number(35,10) := ROUND(1/12,10); ---CONVERTS ONE MONTH INTO YEARS
V11_INTRATE temporary_emphist.INTRATE%TYPE;
V12_NOOFMONTHS temporary_emphist.NOOFMONTHS%TYPE;
V13 NUMBER(15,2);
v14 number(15,2) :=0; ---interest hold
v15 number(15,2) :=0; ---balance hold AS CALCULATED
v16 number(15,2) :=0; ---ipart holdING INTEREST AS CALCULATED
V20 VARCHAR2(18) :='T'; ----ARBITRALY KEY TO ENABLE FOCUS ON DIFFERENT IDS
CURSOR UPLO IS
SELECT NVL(REF,0),NVL(OBAL,0),NVL(amount,0),ID,INTRATE,nvl(NOOFMONTHS,1) FROM temporary_emphist
ORDER BY ID,REF;
BEGIN
OPEN UPLO;
LOOP
FETCH UPLO INTO V1_REF,V2_OBAL,V3_AMOUNT,V7_ID,V11_INTRATE,V12_NOOFMONTHS;
EXIT WHEN UPLO%NOTFOUND;
----V20 IS VARIABLE KEY FOR CHECKING WHETHER WORKING ON SAME ID
if v7_ID<>v20 AND V1_REF=1 then
v20 :=v7_ID;
V_4 :=V2_OBAL-V3_AMOUNT;
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2);
v15 :=v_4;
ELSE
select obal,INTRATE,nvl(NOOFMONTHS,1) into v_4,V11_INTRATE,V12_NOOFMONTHS from temporary_emphist
where NVL(ref,0) = V1_REF - 1 AND ID = V7_ID ;
select nvl(amount,0) into v3_AMOUNT from temporary_emphist
where NVL(ref,0) = V1_REF AND ID = V7_ID ;
--------V10 IS ONE MONTH COVERTED INTO YEARS
v15 := ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-v3_AMOUNT; -------CALCULATED obal
v14 :=ROUND((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS),2)-V_4; ---CALCULATED accrued interest
v16 :=ROUND((((v_4)*POWER((1+V11_INTRATE/100),v10*V12_NOOFMONTHS)-V_4)/V12_NOOFMONTHS),2); --ipart
END IF;
UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;
--END IF;
END LOOP;
commit;
CLOSE UPLO;
END;
/
select * from temporary_emphist
order by id,paydate;
spool off
October 29, 2004 - 10:49 am UTC
this isn't the right place for "recode my application".
show me your attempt at using analytics.
(UPDATE temporary_emphist
SET OBAL = v15,ACCRUED_INT =v14,IPART=v16
WHERE NVL(REF,0) =V1_REF AND ID = V7_ID ;
is still really obscure, comments didn't "do much" really)
Reader
dheeraj, November 21, 2004 - 3:35 pm UTC
Excellent discussion as usual on this site.
Selecting data over a database link from 7.3.4 to 10g
Stewart W. Bryson, December 02, 2004 - 10:54 am UTC
Tom:
If I use the "INSERT /*+ APPEND */ ..." statement on a 10g database, but the "SELECT..." part of the statement is over a database link to a 7.3.4 database, would the bulk-loader still be invoked, or would the hint be ignored?
Also, if I do a "MERGE... " statement with the "INSERT..." part of the statement being an "INSERT /*+ APPEND */...", would the MERGE work correctly, and would the bulk-loader be used?
Thanks for your time.
December 02, 2004 - 11:54 am UTC
trick question
it isn't going to work -- 9ir2 and later do not interoperate with 7.3 and earlier.
if it did work
insert /*+ append */ into local_table select * from remote_table@dblink;
could be direct pathed.
insert /*+ append */ is supported only for insert as select -- not insert values.
ops$tkyte@ORA9IR2> create table t ( x varchar2(25), y varchar2(25) );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> merge into t
2 using dual
3 on ( t.x = dual.dummy )
4 when matched then update set y = dummy
5 when not matched then insert /*+ append */ values ( dummy, null );
1 row merged.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t;
X Y
------------------------- -------------------------
X
ops$tkyte@ORA9IR2> insert /*+ append */ into t values ( 'y', null );
1 row created.
ops$tkyte@ORA9IR2> select * from t;
X Y
------------------------- -------------------------
X
y
ops$tkyte@ORA9IR2> insert /*+ append */ into t select 'z', null from dual;
1 row created.
ops$tkyte@ORA9IR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ops$tkyte@ORA9IR2>
<b>only the last insert append was a direct path...</b>
A follow-up question
Stewart W. Bryson, December 02, 2004 - 1:20 pm UTC
Thanks for the good information Tom.
I've looked and I've looked, but I still cannot find the information about 9iR2 and 7.3.4 database links stated in the documentation. As a matter of fact, the 9iR2 Migration guide lists that 7.3.4 Materialized View sites can interact with 9iR2 master sites, and vice versa:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96530/migrep.htm#620714
As Advanced Replication utilizes database links, this seems a little contradictory. Of course, if you tell me once more (I'm a bit stubborn) that it won't work, I promise to believe you.
As I am currently architecting a solution to try and bring some of the data from our European sites into a central repository, and as I don't yet have access to those sites, I cannot simply test it as we have no 7.3.4 sites (thank God) in the US.
I appreciate your time and understanding.
December 02, 2004 - 7:21 pm UTC
contact support -- they will confirm
version 7.3 was so far out of any sort of support at the time of 9ir2 being released that they never even considered testing it. and it does not work.
I believe that documentation to be erroneously left over from 9ir1.
mg, December 03, 2004 - 10:30 am UTC
Hi Tom,
I have a table T in tablespace TS1.
Every day there are a lot of insertions into table T.
So I need to check whether a particular size of the tablespace has been exceeded, and if so I need to move the older data to another tablespace.
Could you please tell me the best way to do this?
Many thanks in advance.
December 03, 2004 - 11:01 am UTC
you sound like you really want a rolling window partitioned table -- a table partitioned by date.
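A minimal sketch of such a rolling window (all names and the monthly granularity are made up):

create table t_by_month
( id       number,
  created  date,
  payload  varchar2(100)
)
partition by range (created)
( partition p_2004_11 values less than (to_date('2004-12-01','YYYY-MM-DD')),
  partition p_2004_12 values less than (to_date('2005-01-01','YYYY-MM-DD'))
);

-- each month: relocate the oldest partition to a cheaper tablespace (or drop it)
-- (any local index partitions on a moved partition would need a rebuild)
alter table t_by_month move partition p_2004_11 tablespace archive_ts;

-- and add a partition for the new month
alter table t_by_month add partition p_2005_01
  values less than (to_date('2005-02-01','YYYY-MM-DD'));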
A reader, December 06, 2004 - 10:20 am UTC
Can we use a trigger to check the percentage of the tablespace that is filled?
December 06, 2004 - 12:35 pm UTC
yes, but -- why? just to make the code run really really slow?
What about the larger tables?
Steven, December 15, 2004 - 3:44 pm UTC
I used to work at a place where the main table had 4 billion rows and growing. This tracked manufacturing details and rows for an item trickled in and then were "archived" (written to tape then deleted) as one. Finding a good partitioning scheme was difficult due to the large variable time that items could stay in the manufacturing process. That is, two rows inserted on the same day could be archived on vastly different timescales.
The system had around 40 million rows inserted and deleted on a daily basis (about 1%). Some of the items could have 1 million or more records; they needed to be archived at the same time. The table had a PK and a secondary index.
Obviously CTAS was not an option due to the 24x7 availability requirement and besides we usually didn't have enough space for a full copy of the main table.
Is this a case where you just kick off the delete and take your lumps?
(Hmm, given past growth curves, the table should by now be up to 20 billion rows and 200 million rows in/out daily.)
December 15, 2004 - 6:25 pm UTC
or, enable row movement....
partition by a date field, the date field represents "record done as of"
when you initially insert it, it has a "done as of" way into the future (in the active partition(s))
when you are done, you update this field and it moves from active into the "you can archive me later" partitions.
It would be like the delete but the delete would be happening continuously as little transactions instead of a big delete all at once.
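A minimal sketch of that idea (all names are made up; the far-future date is just a placeholder for "not done yet"):

create table mfg_detail
( serial_no   number,
  step_name   varchar2(30),
  done_as_of  date
)
partition by range (done_as_of)
( partition p_2004_12 values less than (to_date('2005-01-01','YYYY-MM-DD')),
  partition p_active  values less than (maxvalue)
)
enable row movement;

-- in-flight rows carry a far-future "done as of" and land in p_active
insert into mfg_detail values ( 1, 'ASSEMBLY', to_date('4000-01-01','YYYY-MM-DD') );

-- when the item completes, this update relocates the row into an archivable partition
update mfg_detail set done_as_of = sysdate where serial_no = 1;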
data warehouse daily load
shgoh, December 16, 2004 - 3:58 am UTC
Dear Tom,
One of my dw developers is having a performance problem with their daily data load from a legacy system. Approximately 5-8 million records are inserted into a table (append load table). The table has a lot of constraints and none of us are familiar with the data given (it could have constraint violations).
What is the method to speed up the data loading process besides parallel, direct load and nologging?
I would appreciate it if you could share your view. Since we don't know the data and the constraints well, is it advisable to disable constraints in this case? Thanks
Rgds
SHGoh
December 16, 2004 - 8:18 am UTC
parallel
direct
nologging
they are them. how long is "long"
if you are adding 5/8 million records a day -- are you using partitioning? you can look at disabling portions of partitioned indexes and rebuilding them after the fact.
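A sketch of the "disable portions of partitioned indexes" idea (index and partition names are made up; it assumes a locally partitioned index on the partition being loaded):

-- before the load: mark just the affected local index partition unusable
alter index fact_loc_idx modify partition p_2004_12 unusable;

-- ... direct path load into that table partition ...

-- after the load: rebuild only that index partition
alter index fact_loc_idx rebuild partition p_2004_12 nologging parallel 8;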
Not a partionioning table
shgoh, December 17, 2004 - 4:14 am UTC
Dear Tom,
It is not a partitioned table. Would it help if I go for parallel, nologging, direct and disable a portion of the constraints? Thanks
Rgds
Shgoh
December 17, 2004 - 2:30 pm UTC
you are already doing parallel, nologging, direct -- disabling the constraints only means "gotta enable them".
you asked for additional ideas. partitioning.
How about Bulk Update?
Tony, December 22, 2004 - 5:13 am UTC
Tom,
I have two tables PO_HDR and PO_DTL having millions of records. I want to join the two tables, and update a flag in PO_HDR table. Can I use Bulk Update based on rowid here?
Example:
DECLARE
TYPE tblrowid IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
arr_rowid tblrowid;
CURSOR po_cur
IS SELECT a.rowid FROM po_hdr a, po_dtl b
WHERE a.poseq = b.poseq
AND a.custid = b.custid
AND b.po_qty <> b.del_qty;
BEGIN
OPEN po_cur;
LOOP
FETCH po_cur BULK COLLECT INTO arr_rowid LIMIT 100000;
EXIT WHEN arr_rowid.COUNT = 0;
FORALL i IN arr_rowid.FIRST..arr_rowid.LAST
UPDATE po_hdr
SET del_flg = 'Partial'
WHERE rowid = arr_rowid(i);
COMMIT;
END LOOP;
CLOSE po_cur;
END;
/
Is a rowid-based bulk update fast and reliable? Is there any better and faster approach?
Please advise me.
December 22, 2004 - 9:41 am UTC
update po_hdr
set del_flg = 'Partial'
where (poseq,custid) in ( select poseq,custid
from po_dtl
where po_qty <> del_qty );
period.
To update or insert by sql statement and not procedurally
DAV, December 23, 2004 - 2:44 pm UTC
Hi,Tom,
I have 2 questions:
The one is :
Regarding the update sql statement that you wrote above:
say I want to execute it on millions of records and I want to commit every 1000 records.
How can I do it?
The second question is:
If I have to use a pl/sql table and I want to load
a lot of records into the table, say 70,000 -- do I do it by:
fetch bulk bind limit 1000
or SELECT ... BULK BIND ....?
Why is it a problem to load many records into pl/sql tables?
Thank's
DAV
December 23, 2004 - 3:08 pm UTC
you don't want to commit every 1000 records, that would take longer, consume more resources and break your transactional integrity.
if you have 70k rows to process, and want to do it 1k at a time, you'll be using fetch ... bulk collect ... limit 1000;
can you spell "memory"
Tom,sorry i didn't understand you.
DAV, December 23, 2004 - 3:49 pm UTC
Why is fetch better than a sql statement in the case written above? You said before that sql is better.
Can you please answer the second question that I wrote above?
Thank you very much.
DAV
December 23, 2004 - 4:30 pm UTC
not understanding you.
you asked:
If I have to use a pl/sql table and I want to load
a lot of records into the table, say 70,000 -- do I do it by:
fetch bulk bind limit 1000
or SELECT ... BULK BIND ....?
Why is it a problem to load many records into pl/sql tables?
and I answered??? the question wasn't "single sql vs procedural code"
you started with "if I HAVE TO USE..."
you would not select BULK COLLECT for 70k records, you would use fetch bulk collect with the limit clause.
I was unclear - let me explain
A reader, December 23, 2004 - 5:12 pm UTC
Sorry, Tom,
I didn't explain myself well.
What I meant was:
First:
If I have to update a million records in a pl/sql procedure,
is it better to do it in an update sql statement or in procedural code like
fetch...? I want to commit every n records because of the rollback segment, so if I use an update sql statement how can I do it?
Second:
A general question - IF I USE PROCEDURAL CODE - why is it not good to load say 40k records into a plsql table by SELECT... BULK BIND... rather than by fetch bulk bind limit...?
As I know, a plsql table's records are kept in PGA memory, so what happens if there are 40k records?
Please clarify the theory.
Thank you very much.
December 23, 2004 - 8:03 pm UTC
you need to size your rollback segment large enough to do your job. seriously -- really you do.
a million records is nothing.
big_table@ORA9IR2> update big_table set owner = lower(owner);
1000000 rows updated.
big_table@ORA9IR2> select used_ublk*8/1024 from v$transaction;
USED_UBLK*8/1024
----------------
92.4921875
big_table@ORA9IR2>
93meg, big deal.
If you commit every N records -- you do realize you have to make your code restartable somehow, so when you inflict the ora-1555 on YOURSELF by committing, or fail for whatever other reason -- you can restart where you left off. Now we are talking some code!
If you have to update tons of records and tons of columns AND they are indexed, disable indexes, consider recreating the table instead of a mass update (if you are updating millions of records -- this is going to be an offline thing anyway, you might be supporting queries but you are not going to have "high concurrency")
can you spell "memory"
and think about streaming -- keeping everyone busy. So you wait for a long time while oracle gets 40k records, jams them into ram, manages them, then you process them and do it all over. doing things in bite sized pieces is easier for everyone - couple hundred in , couple hundred out.
Update 1000000 in sql statement
A reader, December 24, 2004 - 1:24 am UTC
Thank's, Tom!
What is the ORA-1555 message? I don't have a guide.
And I really can't find the way to commit n rows in an update sql statement. Can you give me some idea?
About loading to a PL/SQL table, for example 80k rows: if I need procedural code to update at least 1,000,000 records, and not an update sql statement, why can't I avoid the pl/sql table and do it just with a regular cursor?
December 24, 2004 - 9:01 am UTC
search this site for 1555, you'll see lots and lots. check out Expert one on one Oracle -- i wrote about it lots.
YOU DO NOT WANT TO COMMIT THE ROWS INCREMENTALLY. that is my fundamental premise here.
If you have access to my book "Expert one on one Oracle" -- you can read all about it.
Another question
A reader, December 24, 2004 - 3:58 pm UTC
About loading to a PL/SQL table, for example 80k rows: if I need procedural code to
update at least 1,000,000 records, and not an update sql statement, why can't I
avoid the pl/sql table and do it just with a regular cursor?
Thank's
December 24, 2004 - 4:06 pm UTC
slow by slow is slow by slow (opps, keep doing that don't I, row by row is row by row)
Sorry,misunderstanding.
A reader, December 25, 2004 - 4:21 am UTC
I didn't explain myself well.
What I meant in my question is a general question:
If I need a cursor (and not a sql statement) for some process that accesses a big table, why is it better to load data into a pl/sql table and work on the pl/sql table than to just fetch records directly from the db table?
December 25, 2004 - 9:10 am UTC
I understood, you didn't understand me.
slow by slow processing (a row at a time)
is slower than
o a single sql statement
o bulk collect and forall updates
doing things in BULK is
o faster
o uses less resources
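If the procedural route truly cannot be avoided, a minimal sketch of the bulk pattern being described (table and column names borrow the big_table example above; a single UPDATE statement would still be the first choice):

declare
  cursor c is
    select rowid rid, owner
      from big_table
     where owner <> lower(owner);
  type rid_t is table of rowid index by binary_integer;
  type own_t is table of big_table.owner%type index by binary_integer;
  l_rids rid_t;
  l_owns own_t;
begin
  open c;
  loop
    fetch c bulk collect into l_rids, l_owns limit 500;
    exit when l_rids.count = 0;
    forall i in 1 .. l_rids.count
      update big_table
         set owner = lower(l_owns(i))
       where rowid = l_rids(i);
  end loop;
  close c;
  commit;  -- one commit at the end, not every batch
end;
/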
index DDL in the midst of transactions
Nik Malenovic, December 30, 2004 - 3:36 pm UTC
Tom,
this is a followup to your March 07, 2004 followup related to ddl operations on indexes whilst there are outstanding transactions in progress on the underlying table. You said
"that'll happen whilst there are outstanding transactions, yes. You'll want to keep trying until you get a chance to drop it."
We just ran into a similar production problem: the system was slow because we were missing an index on a column and there were updates running full table scans (don't ask!). Since there are users banging on the darn thing, there was no way to shut down and add the index, so we had to do it on the fly. Here's what we did:
declare
l_attempt pls_integer := 0;
l_limit pls_integer := 2000*10; -- 10 seconds
l_rindex binary_integer;
l_slno binary_integer;
l_target binary_integer;
begin
l_rindex := dbms_application_info.set_session_longops_nohint;
l_target := 42;
dbms_application_info.set_client_info('Vincent The Builder');
dbms_application_info.set_module('rebuild_index', 'between_transactions');
for l_attempt in 1 .. l_limit
loop
begin
dbms_application_info.set_session_longops
(
rindex => l_rindex,
slno => l_slno,
op_name => 'rebuilding the index',
target => l_target,
context => 0,
sofar => l_attempt,
totalwork => l_limit,
target_desc => 'attempt',
units => 'attempts'
);
execute immediate 'create index RESPONSE_QUESTIONID_INDX on response (questionid) nologging parallel 5';
dbms_output.put_line('SUCCESS - now rebuilding');
exit;
exception
when others then
null;
end;
if l_attempt = l_limit then
dbms_output.put_line('FAILED - there was not a phat chance to succeed');
end if;
end loop;
dbms_application_info.set_module(null, null);
dbms_application_info.set_client_info(null);
end;
/
and in another session we monitored the progress with
select sid, message from v$session_longops where time_remaining > 0;
that's all. This may be useful for somebody in a similar situation, hence the post.
thanks,
Nik
December 30, 2004 - 4:10 pm UTC
you could try (for maximum "uptime")
create index .. ONLINE;
that would
a) block and wait for outstanding transactions to complete but...
b) block new transactions from starting and
c) once it started creating -- lets the new transactions proceed and
d) at the very very end -- does a/b again to finish
reduces the amount of "downtime" by allowing people to still work in the middle (yes, they'll be full scanning, but working -- if that was something you wanted..)
CTAS doesnt have compute statistics?
A reader, January 06, 2005 - 11:15 am UTC
create index .... compute statistics;
Why doesn't create table have a similar compute statistics clause?
I have a CTAS that takes 5 hours during which Oracle already knows everything about the data (similar to create index). Gathering stats on this table later takes another 5 hours!
Is there a way to gather stats during the CTAS itself?
Thanks
January 06, 2005 - 11:31 am UTC
Oracle does not already know about everything.
Have you traced a dbms_stats.gather_table_stats to see how many queries, in how many different ways it gathers stuff?
about the only thing Oracle knows after a CTAS is number of rows and number of blocks (and so do you, you can set that)
there is no way it was computing histograms and such
I don't believe it takes 5 hours after the fact -- if it does, have you considered "estimate"
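For instance (the table name and figures are made up), you can either set what is already known right after the CTAS, or gather on a sample instead of computing everything:

-- option 1: just set the rowcount/blocks you already know
exec dbms_stats.set_table_stats( user, 'NEW_COPY', numrows => 50000000, numblks => 700000 );

-- option 2: gather on a sample instead of the full table
exec dbms_stats.gather_table_stats( user, 'NEW_COPY', estimate_percent => 10, cascade => true );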
robin, January 18, 2005 - 8:53 am UTC
I am trying to modify your code from above:
/**********************************************/
set (c1,c2,c3) = ( select max(decode(segment3,'xxxxx',ccid)),
max(decode(segment3,'zzzzz',ccid)),
max(decode(segment3,'wwwww',ccid))
FROM gl_code_combinations t2,
gl_sets_of_books gb
WHERE t2.segment5 = 'yyyyy'
and t2.segment4 = t1.s4
and t2.segment3 in ('xxxxx','zzzzzz', 'wwwww')
and t2.segment2 = t1.s2
and t2.segment1 = t1.s1
and t2.coa_id = gb.coa_id
and gb.set_of_books_id = 1 )
/***************************************************/
to allow me to update one column based on several different where clauses, and I am having no luck.
What I want to do is:
update table t1
set t1.c1 = 'X' where t1.c5 and t1.c6 are null
set t1.c1 = 'A' where t1.4 is not null and t1.c8 > sysdate
set t1.c1 = 'B' where t1.4 is null and t1.c8 < sysdate
etc....
I've tried a case statement and decode like above, but neither seems to compile....thanks
January 18, 2005 - 10:07 am UTC
well, in answer to the "what I want to do is"
update t1
set t1.c1 = (case when c5 is null and c6 is null
then 'X'
when c4 is not null and c8 > sysdate
then 'A'
when c4 is null and c8 < sysdate
then 'B'
end )
where (c5 is null and c6 is null )
or (c4 is not null and c8 > sysdate)
or (c4 is null and c8 < sysdate)
I cannot comment on the other one -- well, because there isn't anything to comment on :)
updating large table
Tracy, January 20, 2005 - 9:50 am UTC
I have table_a with 50 million rows and table_b with 10 million rows.
table_a has a new column added called statid
Table_b has statid and tabaid. tabaid has a unique constraint on it and it is also a foreign key to the primary key of table_a (id)
I want to update all rows in table_a to set the statid column to match the statid column from table_b where the tabaid column in table_b = the primary key in table_a
which of the following two do you think is better, and is there some other altogether different way which would be better still?
UPDATE
(SELECT a.statid newval
,b.statid oldval
FROM table_a a
,table_b b
WHERE a.id=b.tabaid)
SET newval = oldval
or
UPDATE table_a a SET a.statid =
(SELECT b.statid
FROM table_b b
WHERE a.id=b.tabaid)
I've tested the above on a subset of the data using rownum, and both statements seem to update at a rate of about 300,000 rows per hour, but I would like to get it going faster than this.
January 20, 2005 - 10:46 am UTC
update the join.
expect to see a nice big juicy full scan and hash join. it'll be as good as it gets.
I have a similar situation where I must update millions of records fast
Ryan, February 09, 2005 - 10:58 pm UTC
First I want to say thank you very much for all your help this last week Tom, it has helped me out a lot. Unfortunately Oracle is not easy so I have another question.
I am updating millions of records using insert statements.
I am updating the records using insert with values. I'm not updating one table using another table.
example:
insert into dbname (FirstName, Number) values ('Ryan', 1);
called with different records millions of times.
When I call "commit" at the end of all the insert's it takes more than 17 hours to commit. Of course this was importing records from a 1.5 gigabyte text file with 83000 records. The time to call all insert statements was less than a half an hour. Can you think of any reasons why the commit would be so slow. There is another program at the office that can import the same data (in another database format) in less than 20 minutes and that program is calling insert and commit in a similar fashion so it can't be that oracle is just that slow. I'm also disabling all triggers at the beginning of the import, and enabling them at the end. I'm turning off logging and using /*+ APPEND */ and parallel hints but they probably don't help as I'm not using direct load inserts.
Any help would be greatly appreciated.
February 10, 2005 - 1:07 am UTC
<quote>
I am updating millions of records using insert statements.
</quote>
really?
sorry -- but I don't believe you on your comment about commit; you are mistaken.
committing is fast, regardless of the number of records. Your method of observation is somehow "flawed".
Unless of course you have an "on commit refresh" set of mv's or something.
So, turn on 10046 level 12 trace and trace it, use tkprof to report on it and see what you see.
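As a rough sketch of that tracing recipe (file names will vary; the trace file lands in user_dump_dest):
alter session set events '10046 trace name context forever, level 12';
-- run the inserts and the commit you want to measure
alter session set events '10046 trace name context off';
then, from the operating system:
tkprof <your_trace_file>.trc report.txt sys=no
and look at where the time (including wait events, thanks to level 12) is really being spent.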
query retrieval is slow and it affects the whole system
Praveen, February 10, 2005 - 3:41 pm UTC
hi Tom
When the developer uses the escape sequence in the selection criteria, the response time slows to about 15 seconds. This delay affects the whole system. Kindly suggest the appropriate way to resolve this.
select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/UNDER~_BAR/%' escape '~')
and (lower(r_object_id) <> lower('f004000000010471')))
takes about 15 secs to run
select count(*)
from pf_document_rv where
((r_folder_path like '/Jobs/52021/Assembly/NOBAR/%')
and (lower(r_object_id) <> lower('f004000000010471')))
takes less than one second.
Perhaps this is helpful ... if I add the escape '~' specifier to this query it also runs slowly even though it doesn't use the escape character.
regards
PKR
February 11, 2005 - 7:38 pm UTC
so, why do you believe this has anything to do with escape?
(you do understand these two queries return totally different result sets and perhaps that -- that is the "difference")
have you tkprof'ed these queries and compared them?
RE : Row Migration
A reader, February 17, 2005 - 3:35 pm UTC
Hi Tom,
You mentioned in one of the above discussion threads that while updating a null-column to a not null value we have to make sure that we have enough PCTFREE and no row migration happens.
I have a table that gets 2 million rows on a daily basis and I want to update say a column for only 50000 rows.
1. What should be my approach
A. if the column to update is indexed.
B. if the column to update is not indexed
C. How do I find out row migration that may happen
during the update
Thanks
February 17, 2005 - 6:58 pm UTC
regardless of whether the column is indexed or not -- the approach would be "the same". the index will make the update take longer as it does lots more work, but for just 50,000 rows -- it won't be a very big deal.
2,000,000 and 50,000 are only big if it is US dollars and it was your bonus and paycheck. In a database, in 2005, they are very small numbers.
ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> update big_table set owner = owner||'x' where mod(id,40) = 0;
50000 rows updated.
Elapsed: 00:00:18.02
ops$tkyte@ORA9IR2> create index big_table_idx on big_table(owner);
Index created.
Elapsed: 00:00:27.84
ops$tkyte@ORA9IR2> update big_table set owner = owner||'y' where mod(id,40) = 0;
50000 rows updated.
Elapsed: 00:00:28.04
ops$tkyte@ORA9IR2> select count(*) from big_table;
COUNT(*)
----------
2000000
Elapsed: 00:00:03.02
ops$tkyte@ORA9IR2>
to discover if rows migrated, you would analyze the table (perhaps using the list chained rows, if you just analyze it, you'll get chain_cnt filled in the dictionary)
you cannot find out "before" the update what will happen.
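A sketch of both options, assuming the BIG_TABLE from the example above:
-- option 1: list the migrated/chained rows explicitly
@?/rdbms/admin/utlchain.sql         -- creates the CHAINED_ROWS table
analyze table big_table list chained rows into chained_rows;
select count(*) from chained_rows where table_name = 'BIG_TABLE';
-- option 2: just look at the count kept in the dictionary
analyze table big_table compute statistics;
select chain_cnt from user_tables where table_name = 'BIG_TABLE';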
RE : Update
A reader, February 18, 2005 - 8:59 am UTC
Tom,
I think I did not make the last statement regarding the UPDATE of 50000+ rows clear enough. Here is my requirement:
1. I have a flat file coming in to our system around 10:00 AM with over 2 million records, which has the following fields:
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS
....
....
2. I have another flat file (a delta file containing only data that has changed) coming in on a daily basis around 2:00 AM with around 50000+ records, which has the following fields:
ACCOUNT_NUMBER
SECURITY_ID
.....
.....
which we directly load into a table B with the following fields :
ACCOUNT_NUMBER
SECURITY_ID
COST_BASIS <== which is set to NULL during this load
.....
.....
3. Once this file is loaded, we run a process which takes the unmodified data (from the prior run, i.e. data which did not appear in the file in step 2) so that table B always has the full load -- the 2 million records I was talking about.
4. We need to run a process that will update the COST_BASIS field only for the modified data (i.e. data that appeared in today's delta load -- the 50000+ records or so that I need to update. For data brought over from the previous run, we use the prior run's value itself).
In order to perform this, I thought of using the 2 million record file as an external table (say table A), joining it with B and updating the COST_BASIS from A to B for the modified records. We do have the logic to identify the changed records. But my concern is that table A has to be full-scanned for the 50000+ rows from B, and no indexes can be created on A.
I kindly need your suggestion as to what would be the best approach in tackling this problem.
February 18, 2005 - 9:24 am UTC
why would A be full scanned more than once in a join?
RE : Full Scan
A reader, February 18, 2005 - 10:13 am UTC
Tom,
This is because I thought that if we join the two tables, it will do a Nested Loop join. For each row fetched from table B, it would have to full scan A to get the matching rows, and B (the delta table) would be my driving table (since it has fewer rows to start with).
If my understanding is not correct, please advise.
February 18, 2005 - 1:59 pm UTC
hash join it would do in light of no indexes.
hash join it would do even in the presence of indexes because the CBO is smart and knows indexes are evil when most or all of the rows are going to be processed.
it would be a nice big juicy fast hash join.
Use MERGE command to combine steps 3 and 4
Logan Palanisamy, February 18, 2005 - 12:51 pm UTC
Combine steps 3 and 4 with the merge command. Your problem is solved. The External table with 2 million records will be scanned just once.
MERGE INTO B
USING A
ON (B.ACCOUNT_NUMBER = A.ACCOUNT_NUMBER)
WHEN MATCHED THEN UPDATE
SET B.COST_BASIS = A.COST_BASIS
WHEN NOT MATCHED THEN INSERT (B.ACCOUNT_NUMBER, B.SECURITY_ID, B.COST_BASIS, ..) VALUES (A.ACCOUNT_NUMBER, A.SECURITY_ID, A.COST_BASIS, ..)
"A" is your external table with 2 million records.
"B" is the one that initally contains 50,000+ records. After the merge, B will have 2 million records.
The INSERT clause is step number 3
The UPDATE clause is step number 4
Hope that helps
Delete Record !!!
Anupam, February 28, 2005 - 6:49 am UTC
What is the difference between :
1. DELETE <table> TRUNCATE WHERE <clause>;
2. DELETE <table> WHERE <clause>;
?? Both work!!!
February 28, 2005 - 8:01 am UTC
just like
ops$tkyte@ORA9IR2> delete t this_is_a_correlation_name where 1=1;
27794 rows deleted.
works -- truncate is just being used as a correlation name there (a table alias)
Deleted message
Franc Uwa, March 03, 2005 - 10:11 am UTC
the deleted messages were from today; I am trying to see if you can help me resend those deleted messages back to my inbox
March 03, 2005 - 10:28 am UTC
huh?
Brad Doczkat, March 10, 2005 - 10:51 am UTC
What do you do when the resident DBAs do not approve of the use of this method? How do I convince them that this approach is much more streamlined and requires far less processing time than doing an update of millions of rows the old fashioned way? For my particular need that brought me to this thread, I only have to do a one time initialization of a one character column that I want to put a not null (with default) constraint on. There are 5.5 million rows in the table. I need to do this one-time as an implementation step, yet the DBAs do not seem to buy into this approach. They would rather update so many rows a night, over the course of several nights, doing frequent commits during the update process.
March 10, 2005 - 7:11 pm UTC
tell the dba's to do the work then?
Sorry -- I just don't know how to make your dba's do something they don't want to.
Help Please I don't Understand
A Reader, March 11, 2005 - 4:20 pm UTC
Tom:
I have a project due tomorrow that I am getting paid a boat load of money for and do not have a clue
Could you write it for me without being compensated? Oh and answer my interview questions for me as well.
Just joking ;)
As always you have the best info on your site (dare we say better than meta link?).
However I have made it through two threads in just over 9 hours. Have you ever thought of hiring a few people and extracting: best part of thread ; lessons learned; sample code (I still have not found the latest and greatest 10g show_space); and a glossary for items like CTAS, (sic), SLA, FBI? Perhaps make a book out of it and sell it but it would have to be a searchable electronic copy. (Hmm, WIKI enable a portion of asktom (naa)?)
In this thread alone I found: q| quoted strings |; CTAS for speed; I really need to start using hints; I need to look into Analytics; I have to check if my DW really needs partitions
Thanks for everything.
A Reader.
(Guess response: This is a feature request
Denied :)
March 12, 2005 - 9:58 am UTC
I do that -- it is called "a book" :)
Effective Oracle by Design -- best of asktom, with scripts as of the time of the book.
Expert One on One Oracle -- what you need to know, with scripts -- second edition under construction (in fact I'm going to take April "off" -- from asktom -- in order to just get it going)
Expert One on One Oracle second will be the 10g version of what you are looking for.
Alberto Dell'Era, March 12, 2005 - 10:16 am UTC
> best of asktom, with scripts
> as of the time of the book.
Yes, but that doesn't mean that the books are just a collection of q&a from asktom (as I've found many believe); they are "regular" books - the q&a are mainly used as "Customer-driven Topic Selection", so to speak...
March 12, 2005 - 10:20 am UTC
correct -- definitely not a collection of asktom articles, that would be disjointed at best. Rather, a flow -- a story -- with examples from the site to help it along. The things I pick to write about are driven very much by the questions and trends I see here.
A reader, March 13, 2005 - 2:45 am UTC
Hi Tom,
Regarding...
"in fact I'm going to take April "off" -- from asktom "
What will happen to desperate people who are looking for answers supported by technical reasons? I am sure that others will be of the same opinion that this is one of the greatest helps one can ever get with Oracle, in a timely and accurate manner...
March 13, 2005 - 9:54 am UTC
start here:
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
and try:
groups.google.com -> comp.databases.oracle.*
the place where I got hooked on discussing Oracle in the first place way back in October of 1994. You might get an answer from Jonathan Lewis, Connor McDonald, or a host of others. I still lurk there from time to time.
otn.oracle.com -> discussion forums
metalink.oracle.com -> discussion forums
the oracle-l list
http://www.freelists.org/archives/oracle-l/
Where Cary Millsap or Steve Adams or Tim Gorman to name but a very few might answer your question.
You might even find yourself answering some from time to time!
dblink
shiva, March 23, 2005 - 2:49 am UTC
We are trying to access a stored procedure which exists in another database through a dblink. In that stored procedure there is one update stat.. and because of it the screen gets hung.
Regards,
Shivachandramouli.R
March 23, 2005 - 8:46 am UTC
so little data... 'one update stat' what is that. and what have you done to diagnose the issue?
questions in the dependent objects after
A reader, March 27, 2005 - 10:03 pm UTC
Hi Tom,
As you suggested, instead of updating a table with millions of records we create a new table, then drop the original table and rename the new one --
it sounds like a great method: update the table via CTAS, drop the original table and then rename the new table to the original name.
Question: all the dependent objects such as packages, views, synonyms... will become invalid. Do these need to be recompiled? I know all the grants have to be re-granted.
Thanks.
March 28, 2005 - 7:18 am UTC
dependent objects will take care of themselves next time they are executed.
A reader, March 27, 2005 - 10:48 pm UTC
tom,
you said:
"
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.
"
what's the "degree of threads"?
March 28, 2005 - 7:29 am UTC
"degree of threads" is number of concurrent job queue processes, how many dbms_job jobs can be running simultaneously.
commits or not
A reader, April 01, 2005 - 5:09 pm UTC
I want to insert 30 million records into a dummy table with nologging, as you advised.
insert /*+ append parallel(t) */ into t
select * from ...
If I have large rollback segs, it can complete without doing commits during the insert (for example, committing every one million inserts) -- is it faster without the commits?
Thanks a lot
April 01, 2005 - 6:12 pm UTC
to commit or not. Every time you commit, you will wait, wait for log file sync.
Every time you commit, you say "i'm done". That's ok, if you are restartable!
However, with a big parallel direct path insert it'll be all or nothing (no point to do N rows, and then do the query all over again). Each parallel process will utilize its own rbs and if you have no indexes, a direct path insert isn't going to generate MUCH undo at all (it writes above the HWM and doesn't need to generate UNDO information)
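A hedged sketch of that direct path, parallel flavor of the insert (table names and degree are placeholders; note the hint names the correlation alias):
alter session enable parallel dml;
insert /*+ append parallel(t, 8) */ into new_table t
select /*+ parallel(s, 8) */ * from old_table s;
commit;      -- one commit at the very end; the segment is unusable by the session until you commit
alter session disable parallel dml;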
A reader, April 05, 2005 - 5:08 am UTC
Let's Tell the CBO About Our Great Idea!
Doug Winage, April 11, 2005 - 7:28 pm UTC
Given that it seems there are clear cases in which making and loading a new table is preferable to updating in place, I wonder if it would be just as well to make such an approach available to the CBO as a candidate execution plan. I'll grant that one wouldn't want to leave the choice entirely up to the optimizer; the safest thing would be to allow enabling and disabling of the plan option at the system level with a parameter called, say, "ctas_for_updates_enabled." One could also provide a pair of hints, say, /*+ USE_CTAS */ and /*+ NO_CTAS */, to allow one to override the CBO, if need be, and expressly demand or prohibit a plan involving recreation of the table.
April 11, 2005 - 8:23 pm UTC
all really good ideas need to go via support and get filed....
CTAS instead of Deletes
GG, April 11, 2005 - 11:02 pm UTC
Tom,
Please, this is a relevant follow up to this thread.
I have also included a test case.
Instead of deleting rows from a table with 270 Million records, I have used CTAS. The delete criteria is based on a join of two fields from two tables. The delete occurs only on one table (ORDERS). The other table is INVOICE.
If DOC_NUMBER and ORD_ITEM values from both tables match, then delete the matching records from ORDERS.
Test case:
drop table orders;
create table orders (
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,PROMOTION varchar2(10) not null);
drop table invoice;
create table invoice(
DOC_NUMBER number(10) not null
,S_ORD_ITEM number(10) not null
,OTHER_INFO varchar2(10) not null);
begin
for i in 1..10 loop
insert into orders values(i,i+1,'ORDERS');
end loop;
commit;
end;
/
select * from orders;
DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
2 3 ORDERS
3 4 ORDERS
4 5 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS
begin
for i in 1..5 loop
if mod(i,2) = 0 then
insert into invoice values(i,i+1,'INVOICE');
end if;
end loop;
commit;
end;
/
select * from invoice order by 1;
DOC_NUMBER S_ORD_ITEM OTHER_INFO
---------- ---------- ----------
2 3 INVOICE
4 5 INVOICE
The above two records should be deleted from ORDERS:
drop TABLE ORDERS_NEW;
CREATE TABLE ORDERS_NEW
NOLOGGING
AS
SELECT /*+ parallel(ORDERS,4) */ *
from orders ord
where NOT EXISTS
(select null from invoice inv
where inv.DOC_NUMBER = ord.DOC_NUMBER and
inv.S_ORD_ITEM = ord.S_ORD_ITEM);
select * from orders_new order by 1;
DOC_NUMBER S_ORD_ITEM PROMOTION
---------- ---------- ----------
1 2 ORDERS
3 4 ORDERS
5 6 ORDERS
6 7 ORDERS
7 8 ORDERS
8 9 ORDERS
9 10 ORDERS
10 11 ORDERS
IS THIS THE CORRECT WAY OF DOING IT? ESPECIALLY THE SQL WITH "NOT EXISTS"?
April 12, 2005 - 7:07 am UTC
hints need to use the correlation (alias) names....
I would just use NOT IN, but NOT EXISTS with the CBO will be considered as NOT IN conceptually, so it is just harder to code but has the same net effect.
See the concepts guide for details on parallel CTAS.
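For illustration only, the same CTAS with the hint referencing the alias, as suggested above:
CREATE TABLE ORDERS_NEW
NOLOGGING
AS
SELECT /*+ parallel(ord, 4) */ *
from orders ord
where NOT EXISTS
(select null from invoice inv
where inv.DOC_NUMBER = ord.DOC_NUMBER and
inv.S_ORD_ITEM = ord.S_ORD_ITEM);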
Update millions of records
Marsboy, April 12, 2005 - 1:33 pm UTC
Hi Tom,
It helped me so much that my day's work can now be accomplished in minutes.
update partition
phil, April 14, 2005 - 2:33 am UTC
I am sorry for posting a similar question in another thread. The follow up should have been here!
I am getting a little confused with the technique of updating lots of rows in a partitioned table.
I have a table partitioned by month. To do an update on a column you suggest doing a create table as select. This of course will create a new table with the updated column and the data from the table to be updated.
What I do not understand is how to handle each partition. My new table has none, just valid data. How do I now get the new table to the structure of the old with all the partitions?
I am sure the answer is here, I just cannot see it !
regards
April 14, 2005 - 7:43 am UTC
to do a huge bulk update of most of the rows, using a create table as select could be a way to go. I followed up on the other thread.
Partitioned tables
Greg, April 14, 2005 - 7:54 am UTC
a previous poster was asking about partitioned tables ..
[quote]
I am getting a little confused with the technique of updating lots of rows in a
partitioned table.
[/quote]
Feel free to ignore this question if you answered it already in the other followup (I wasn't sure which thread you posted it in, and can't seem to find it .. *sigh*)
Anyway ... if the table is partitioned, can you not just deal with each partition separately?
That is: if you have 15 partitions (ie P01 through P15 ...), and your "update" only encompassed P03 and P04 ... then could you not just "re-create" that partition in the same way you "re-create" the table .. then just "stuff" the partition(s) back in?
I can't remember all the syntax to pull this little "card-shuffle" off ... but I'm pretty sure it can be done? can't it?
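A hedged sketch of the "card-shuffle" being described (all names are made up; the staging table must match the partition's column list and, if you use INCLUDING INDEXES, have matching indexes built on it):
create table p03_new nologging
as
select <do the update "here">
  from big_tab partition (p03);
alter table big_tab
  exchange partition p03 with table p03_new
  including indexes
  without validation;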
A reader, April 21, 2005 - 12:12 pm UTC
"
declare
l_job number;
begin
dbms_job.submit( l_job, 'execute immediate ''alter index idx1 rebuild'';' );
commit;
dbms_job.submit( l_job, 'execute immediate ''alter index idx2 rebuild'';' );
commit;
end;
Now, just set job_queue_processes > 0 to set the "degree of threads" and in 8i
and before set job_queue_interval to say 60 or so and there you go.
"
will the dbms job only be executed once?
TIA
April 22, 2005 - 9:02 am UTC
yes, well, it will execute successfully ONCE.
It might fail over and over and over and over, then run successfully and go away.
How to Update millions or records in a table?
S.Darshan, May 17, 2005 - 7:48 am UTC
I have to update one table which has 10 million records. The regular rollback segment won't be enough to hold this data. In this case, how can we predict how much rollback area will be used by this update?
Initially I thought of your approach "CREATE TABLE new_table as select <do the update "here"> from old_table;"
But I have to update about 30% of the columns. The rest of the 70% of columns should not be touched. Also, this table has a lot of foreign key constraints. Is there any alternative method? Thanks in advance.
With best regards
S.Darshan
May 17, 2005 - 9:16 am UTC
if I had to update 1/3 of a 10,000,000 row table -- that is the perfect candidate for create table as select. foreign keys or not (you can always disable, ctas and for the columns you have not updated -- enable novalidate, just put them back on)
but, if you cannot do that, disable indexes on columns to be updated and update and then rebuild the indexes. to get a grasp on "how much undo", create a copy of the first 100,000 rows of data for example, update it and see how much undo you generated.
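A sketch of that undo measurement, assuming a hypothetical BIG_TABLE and column COL1:
create table t_sample as select * from big_table where rownum <= 100000;
update t_sample set col1 = <new value>;      -- the same update you plan to run for real
select t.used_ublk, t.used_urec
  from v$transaction t, v$session s
 where s.taddr = t.addr
   and s.sid = (select sid from v$mystat where rownum = 1);
rollback;
Scale the USED_UBLK figure up by the ratio of real rows to sampled rows for a rough estimate.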
Parallel DML
Tamilselvan, May 18, 2005 - 10:05 am UTC
I agree with what Tom said.
However, in many real life situations CTAS may not be executed as we think. For example, if the table has a LONG column, then CTAS will not work.
Consider Parallel DML. I have successfully implemented parallel dml in a large DB.
Here is the script:
alter table scott.table_a parallel 8;
alter index scott.table_a_p1 parallel 8;
alter table scott.tamil_temp_table_b parallel 8;
alter index scott.tamil_temp_table_b_idx parallel 8;
commit;
alter session enable parallel dml ;
alter session enable parallel query ;
update /*+ parallel(a,8) */ scott.table_a a
set a.pr_addr_id = ( select addr_per_id
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID)
where exists ( select null
from scott.tamil_temp_table_b c
where c.accnt_id = a.ID) ;
commit;
alter session disable parallel dml ;
alter session disable parallel query ;
alter table scott.table_a noparallel ;
alter index scott.table_a_p1 noparallel ;
Thanks
Tamil
May 18, 2005 - 11:20 am UTC
<quote>
if the table has a LONG column, then CTAS will not work.
</quote>
convert long to clob and be done with it.
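Two hedged sketches of that conversion (table and column names are placeholders):
-- in place, in 9i and later:
alter table t modify ( long_col clob );
-- or as part of the CTAS itself, via TO_LOB:
create table new_t as
select id, to_lob(long_col) long_col, <other columns>
  from t;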
Excellent Explanation
A reader, May 18, 2005 - 11:18 pm UTC
This is very helpful, nice Tom. I have a question I am facing while using the dbstart utility -- where can I ask that question?
How to Update millions or records in a table
Debasish, June 08, 2005 - 5:29 am UTC
You told us to create a table with the update command, but I have a problem with that (in 8i).
One of my master tables contains the following fields (approx 500000 records):
1. ID varchar2
2. Pincode Number
3. Tag char
Another three tables are bankmaster1, bankmaster2, bankmaster3; their fields are:
1. pincode number
2. branch varchar2
I have to update the tag column in the master table according to the pincode distribution in the bankmaster tables: tag E for bank1, tag C for bank2, tag V for bank3.
When I try it with an update command it works for a small table, but not for this 500000 record table.
Please let me know. Thanks in advance.
June 08, 2005 - 9:04 am UTC
I told you to create a table with the update command? Where and when -- I'll need to correct that if true, for you cannot create a table with the update command.
It works for me for a 50,000,000 row table or a 5 row table. You'll have to tell us why it isn't "working" (and hopefully it is not using a single index)
ops$tkyte@ORA10G> create table t
2 as
3 select id, id pincode, 'x' tag
4 from big_table.big_table
5 where rownum <= 1000000;
Table created.
Elapsed: 00:00:01.36
ops$tkyte@ORA10G>
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> create table t1 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 0;
Table created.
Elapsed: 00:00:01.60
ops$tkyte@ORA10G> create table t2 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 1;
Table created.
Elapsed: 00:00:01.54
ops$tkyte@ORA10G> create table t3 as select id pincode, 'xxxxxx' branch from t where mod(id,3) = 2;
Table created.
Elapsed: 00:00:01.49
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.06
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.13
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.07
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T3', method_opt=> 'for all indexed columns' );
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.13
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set timing on
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t1)) set tag = 'E';
333333 rows updated.
Elapsed: 00:00:06.13
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t2)) set tag = 'C';
333334 rows updated.
Elapsed: 00:00:06.48
ops$tkyte@ORA10G> update (select tag from t where pincode in (select pincode from t3)) set tag = 'V';
333333 rows updated.
Elapsed: 00:00:06.40
ops$tkyte@ORA10G>
Only takes a couple of seconds.
A reader, June 10, 2005 - 8:42 am UTC
Tom,
I have this query in which I'm trying to update the old POC_ID with the new POC_ID. However, it doesn't produce the results I would expect. The table does not get updated with the correct POC_ID. What am I missing here?
begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;
Thanks,
Randy
June 10, 2005 - 10:34 am UTC
need example.
but that looks like it should be at most a SINGLE sql statement, no code whatsoever...
A reader, June 10, 2005 - 10:59 am UTC
I have a table WSAR_ARCHIVE_CS which contains a column POC_ID. POC_ID is FK to the ADAS_POC POC_ID PK. I want to update that column with the new POC_ID which is stored in WSAR_POC POC_ID PK.
Thanks,
Randy
June 10, 2005 - 11:10 am UTC
create table .....
create table ....
create table ....
insert some data...
insert some more data...
followed by "and this is what I need to do..."
A reader, June 10, 2005 - 11:28 am UTC
create table wsar_poc
(POC_ID NUMBER NOT NULL,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NUMBER NOT NULL,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200))
Create table wsar_archive_cs
(FILE_ID NUMBER NOT NULL,
ONELINER_DATE DATE,
PRODUCT VARCHAR2(50),
COMPONENT VARCHAR2(400),
PRIORITY VARCHAR2(50),
POC_ID NUMBER,
SUBJECT VARCHAR2(300),
ISSUE VARCHAR2(4000))
I have the table adas_poc which contians the old poc_id that is in the WSAR_ARCHIVE_CS table.
desc adas_poc
POC_ID NOT NULL NUMBER,
POC_RESPONSIBLE_IND VARCHAR2(50),
DIVISION_ID NOT NULL NUMBER,
OUSER VARCHAR2(30),
ACCESS_ROLE VARCHAR2(50),
PHONE_NUMBER VARCHAR2(20),
EMAIL VARCHAR2(200)
I tried to update using the cursor for loop, but no luck. The column does not get updated with the correct ID.
begin for c1 in (select w.poc_id new_poc_id, a.poc_id old_poc_id
from wsar_poc w, ap32admin.adas_poc a
where w.poc_responsible_ind = a.poc_responsible_ind)
loop
update wsar_archive_cs
set poc_id = c1.new_poc_id
where poc_id = c1.old_poc_id;
end loop;
end;
Thanks,
Randy
June 10, 2005 - 3:33 pm UTC
oh, so close -- you almost had inserts there with data to show
a) what happened
b) explain what you expected
That will definitely join wsar_poc to adas_poc by poc_responsible_ind
it will definitely then update wsar_archive_cs
you need to explain why this logic isn't working, it is rather straightforward. Demonstrate for us.
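For what it's worth, a hedged sketch of that single statement -- it assumes poc_responsible_ind maps each old poc_id to exactly one new poc_id (if it does not, the subquery will raise ORA-01427):
update wsar_archive_cs cs
   set cs.poc_id = ( select w.poc_id
                       from wsar_poc w, ap32admin.adas_poc a
                      where w.poc_responsible_ind = a.poc_responsible_ind
                        and a.poc_id = cs.poc_id )
 where exists ( select null
                  from wsar_poc w, ap32admin.adas_poc a
                 where w.poc_responsible_ind = a.poc_responsible_ind
                   and a.poc_id = cs.poc_id );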
RE: Parallel Import
Bakunian, June 20, 2005 - 6:37 pm UTC
Tom,
I created a range partitioned table with parallelism enabled, then I used the import utility to load 5 GB of data into the table; however, I don't see parallel processes spawned during the load.
Shouldn't import take advantage of this functionality?
I am using 9.2 on Solaris 8.
These are instance parameters.
Thank you for your time.
NAME_COL_PLUS_SHOW_PARAM TYPE VALUE_COL_
--------------------------------------------- ----------- ----------
fast_start_parallel_rollback string LOW
log_parallelism integer 4
parallel_adaptive_multi_user boolean TRUE
parallel_automatic_tuning boolean TRUE
parallel_execution_message_size integer 4096
parallel_instance_group string
parallel_max_servers integer 40
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean FALSE
parallel_server_instances integer 1
parallel_threads_per_cpu integer 2
recovery_parallelism integer 0
June 21, 2005 - 7:51 am UTC
imp is a serial process, always has been - it reads a single DMP file sequentially and loads it.
10g with impdp has a parallel capability.
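For illustration only (10g Data Pump syntax; the credentials, directory, and dump file names are placeholders):
impdp scott/tiger directory=dmp_dir dumpfile=exp%U.dmp parallel=4
In 9.2, the closest you can get with imp is to split the work yourself, e.g. one imp session per table or partition.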
Tom u r great
shyam, June 30, 2005 - 3:08 am UTC
I have a quick question. My statistics are as below; I am loading data.
datafile Blocks_Read Blocks_Written Total_IOs
/oracle/u03/oradata/dev/data_01.dbf
26507 5240 31747
/oracle/u03/oradata/dev/data_09.dbf
26823 2823 29646
/oracle/u06/oradata/dev/data_08.dbf
24978 3704 28682
/oracle/u04/oradata/dev/data_10.dbf
26288 1831 28119
/oracle/u04/oradata/dev/data_02.dbf
24053 3634 27687
/oracle/u05/oradata/dev/data_11.dbf
26155 1284 27439
/oracle/u05/oradata/dev/data_07.dbf
23125 2865 25990
/oracle/u04/oradata/dev/data_06.dbf
23558 1978 25536
/oracle/u06/oradata/dev/data_25.dbf
21672 2234 23906
/oracle/u05/oradata/dev/data_03.dbf
21299 2574 23873
/oracle/u03/oradata/dev/data_05.dbf
20324 1819 22143
/oracle/u06/oradata/dev/data_04.dbf
19673 1746 21419
/oracle/u06/oradata/dev/data_12.dbf
19574 1306 20880
/oracle/u04/oradata/dev/data_14.dbf
19309 755 20064
/oracle/u03/oradata/dev/data_13.dbf
18807 848 19655
/oracle/u05/oradata/dev/data_24.dbf
17656 1316 18972
/oracle/u06/oradata/dev/data_16.dbf
16748 1730 18478
/oracle/u05/oradata/dev/data_15.dbf
17006 810 17816
/oracle/u03/oradata/dev/data_17.dbf
14810 1350 16160
/oracle/u05/oradata/dev/data_19.dbf
13552 1157 14709
/oracle/u03/oradata/dev/data_21.dbf
13622 911 14533
/oracle/u05/oradata/dev/data_23.dbf
13366 878 14244
/oracle/u04/oradata/dev/data_18.dbf
12644 908 13552
/oracle/u04/oradata/dev/data_22.dbf
12521 741 13262
/oracle/u06/oradata/dev/data_20.dbf
12137 546 12683
Tom u can observe that total reads are more than total writes and as a result the performance is decreasing. Let me know how I should increase the blocks written; as a result of this my data loading is slow.
June 30, 2005 - 9:29 am UTC
who is "u" and why is "u" observing things?
how do you know that reads are the cause of performance decreasing. All I see are a bunch of numbers.
loading data
shyam, June 30, 2005 - 3:10 am UTC
this is the query I used to generate the statistics
select DF.Name File_Name,
FS.Phyblkrd Blocks_Read,
FS.Phyblkwrt Blocks_Written,
FS.Phyblkrd+FS.Phyblkwrt Total_IOs
from V$FILESTAT FS, V$DATAFILE DF
where DF.File#=FS.File#
order by FS.Phyblkrd+FS.Phyblkwrt desc;
June 30, 2005 - 9:30 am UTC
so, you've just shown the number of reads done since the database was started. All they are are a bunch of numbers with no meaning.
how do you know read IO is your bottleneck? what made you decide that?
performance while loading data
shyam, June 30, 2005 - 7:54 pm UTC
datafile = /oracle/u03/oradata/dev/data_01.dbf
Blocks_Read = 26507
Blocks_Written = 5240
Total_IOs = 31747
June 30, 2005 - 9:09 pm UTC
so what? (he says again)
you've shown me that file has been read 26,507 times SINCE STARTUP and written to 5,240 times SINCE STARTUP
big deal, so what?
were they all during your load?
did they take A LONG TIME <<<===
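One way to start answering that (a sketch only -- these times are cumulative centiseconds since startup and require timed_statistics=true, so compare a snapshot taken before the load with one taken after):
select df.name,
       fs.phyrds, fs.readtim,
       fs.phywrts, fs.writetim
  from v$filestat fs, v$datafile df
 where df.file# = fs.file#
 order by fs.readtim + fs.writetim desc;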
PRECAUTIONS TO TAKE WITH HUGE INSERTS
shyam, July 03, 2005 - 8:44 am UTC
Tom, can you give me tips on how to improve the speed with which data is loading?
To improve the data load speed: for huge inserts, at present I have 73 tables being loaded and it takes 60 minutes for one project.
July 03, 2005 - 10:28 am UTC
depends on your needs and circumstances. To describe them all would be a small book.
So, you'd need to be more "specific". How many new rows, how many old rows, is there lots of validation going on, is this a reload or a load, have lots of un-necessary indexes, etc etc etc
Delete & import 6M records - Data Warehouse
A reader, July 06, 2005 - 11:54 pm UTC
Tom,
We currently have v8.0.6 running a data warehouse. We have 6M rows to delete every day and at the same time import the same amount. It currently takes us 10 hrs to do the deletion part. Data are imported every 5 mins. Is there any way we can speed up the deletion and improve the import (using oracle imp/exp)?
Thanx so much for your help
July 07, 2005 - 9:09 am UTC
insufficient data. why no partitioning? maybe a create table as select rows to keep and drop old table.
but insufficient data to answer properly.
and really old software.
and really archaic tools.
a lot has happened in the time between 8.0.6 and 10gR2 - you'd find many more options in 2005 than you did in 1997 when 8.0 was current.
my database is slow
hameed, August 06, 2005 - 6:50 pm UTC
hi tom
please help me
I have a program built with Oracle 8i & Developer 6i.
The problem:
the database is slow
One large insert Vs multiple smaller inserts
Sami, August 14, 2005 - 10:55 am UTC
Dear Tom,
We had this issue in production during huge data migration.
Environment:-
Oracle 8.1.7.3 Sun 2.6 - 2 Node OPS. 12 CPU on each box.
The table in question here involved in M-M Oracle replication.
There is no other activity on both "12Million_Rec_table" and "1Million_Rec_table " other than my process.
Process-1
=========
exec dbms_reputil.replication_off();
commit;
insert /*+ append parallel(t,12) */ into 12Million_Rec_table t
select /*+ parallel */ * from 1Million_Rec_table;
Process-2
=========
exec dbms_reputil.replication_off();
commit;
--for loop with diff predicate value
loop
insert /*+ append parallel(t,12) */ into 12Million_Rec_table
select /*+ parallel */ * from 1Million_Rec_table
where predicate=&1; -- there is an Index on Predicate column
end loop
12Million_Rec_table has only one primary key index. There is no other constraints or LOB columns on this table. For some reason it didn't use direct-path insert, don't know why.
Process-2 executed N times in a sequence. Basically Process-1 is one large transaction compared to Process-2 (which has N transactions in a loop).
In terms of execution time the Process-1 took longer than Process-2. Process-1 was running for more than
2 hrs without completion so we stopped it. Process-2 with smaller chunk of records completed very fast.
When Process-1 started the USED_UBLK value in v$rollstat increased rapidly then slowed down gradually. I have seen
this behaviour many times while running Process-1.
My questions are
1)Why Process-1 is slower than Process-2. [We didn't have any rollback segment issue while running both Process-1 and Process-2]
2) Is there any performance issue if the rollback segment goes beyond some size, say 200M or something?
August 14, 2005 - 11:49 am UTC
you did not enable parallel DML, to do a parallel direct path insert, you must.
I cannot answer these questions, insufficient data. You could have captured information while it was running.
There is a *small* chance that what happened was
a) the big insert, because of the primary key index, generated gobs of undo.
b) this caused more undo to be generated than fit in the buffer cache
c) so it had to be flushed by dbwr
d) causing buffer waits by the insert (trying to get more buffers from the cache for undo)
e) and causing IO contention on the disk
The smaller inserts *MIGHT* have
a) filled up part of the buffer cache with undo blocks
b) committed, allowing the undo blocks in the cache to be reused
c) goto A n times
avoiding the buffer waits and the flushing of the undo to disk.
However, I would not have taken your approach for a small set of one million rows. It likely would have been much faster to
a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
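A hedged sketch of steps a) through e) (object names, key column and parallel degree are all placeholders):
alter table big_table drop primary key;                 -- a)
alter table big_table nologging;                        -- b)
insert /*+ append */ into big_table
select * from small_table;                              -- c)
commit;
create unique index big_table_pk
    on big_table(id) parallel 8 nologging;              -- d) rebuild the index...
alter table big_table add primary key (id);             --    ...and put the constraint back
alter table big_table logging;                          -- e)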
did not get rid of PK index because the table is in M-M replication
Sami, August 14, 2005 - 1:22 pm UTC
Dear Tom,
Thanks a lot for your immediate response. You are absolutely
right about why the "smaller chunk transactions" defeated "one large transaction".
<asktom>
a) get rid of the primary key index
b) set big table nologging if not already
c) insert /*+ append */ into big_table select * from small_table;
d) recreate the primary key index in parallel (if even needed, just 12 million
rows), nologging
e) put logging attribute back on for the future
f) backup affected datafiles
</asktom>
I agree with you 100%. Even I prefer your approach [get rid of the primary key index,delete,rebuild index]. However the table in question is involved in Master-Master replication.
We are not sure about getting rid of primary key index for the table which involved in replication. That is why we didn't take this approach.
Tom,thanks for your time & effort.
Additional Info
Sami, August 14, 2005 - 8:20 pm UTC
Our large pool is set to 0 size. I am just wondering whether this value had any impact on our parallel processes' poor performance.
NAME TYPE VALUE
------------------------------------ ------- ------------------------------
large_pool_size string 0
August 15, 2005 - 6:52 am UTC
are you sure it is really zero, if you are using automatic SGA memory management:
ops$xp10gr1\tkyte@ORA10GR1> show parameter large
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
__large_pool_size big integer 4M
large_pool_size big integer 0
ops$xp10gr1\tkyte@ORA10GR1>
I didn't set a large pool, but Oracle has set up a 4m one (that can grow/shrink as needed)
Large pool size
Sami, August 15, 2005 - 1:45 pm UTC
Tom, We are still in 8.1.7.3.
Yes, I am sure the large pool size is set to zero on both instances(2 node OPS)
August 15, 2005 - 10:22 pm UTC
it isn't "impacting it" performance wise, it is using memory from elsewhere if, but the large pool would be a better place since this memory for parallel execution isn't "LRU" memory like the other stuff in the shared pool -- the large pool is designed for memory that is allocated and deallocated, unlike shared pool memory which is designed to be allocated and aged out over time.
Undo - With and without index
Sami, August 15, 2005 - 3:17 pm UTC
Session #1
=================================================================================
/*
create table t1 as select * from dba_objects;
create index t1_idx01 on t1(object_id);
*/
SQL> select distinct sid from v$mystat;
SID
----------
126
SQL> insert into t1 select * from dba_objects;
48601 rows created.
=================================================================================
Session #2
=================================================================================
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 568
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965AFFC 2
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 636
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965AFFC 2
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
no rows selected
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 754
SQL> select taddr,used_ublk from v$transaction t,v$session s,v$rollstat r where s.taddr=t.addr and t.xidusn=r.usn and s.sid=126;
TADDR USED_UBLK
-------- ----------
1965B9F4 1035
SQL>
=================================================================================
I monitored rollback activities for session#1(while inserting rows) through session#2.
While the USED_UBLK is gradually going up for one transaction , I noticed some other
transaction(TADDR=1965AFFC) with USED_UBLK=2.
Is it because of Index t1_idx01?
I didn't see this behavior when I tested the same test case without Index.
Sometimes I noticed "no rows selected" in session#2. I assume this is the time the transaction is
switching between TADDR 1965AFFC and 1965B9F4?
Please correct me if I am wrong?
August 15, 2005 - 10:35 pm UTC
ddl commits, there were two transactions there
the create table and the create index.
no ddl here
Sami, August 15, 2005 - 11:04 pm UTC
Tom,
Sorry for the confusion. I shouldn't have put the create table and create index statement over there (though it was commented)
The create table and create index are outside the scope of this. The reason I put those there is that I just wanted to show you that:
1) the table is created from dba_objects
2) it has index on object_id column
I have tested this behavior in both 8i and 10g. The following output and "no rows selected" appeared N times (more than 2) WHILE THE INSERT STATEMENT IS GOING ON, just for clarity I just truncated the output.
TADDR USED_UBLK
-------- ----------
1965AFFC 2
August 16, 2005 - 10:46 am UTC
then I have no clue what we are in fact looking at there.
I cannot even remotely reproduce - nor is this the way it works.
the used_ublks would in this case increase, until you commit, then they go away.
1 big INSERT with joins - OR - multiple UPDATES...
Sandeep, August 16, 2005 - 8:05 am UTC
Hi Tom,
I'm trying to archive data from some big tables (5 or 6 of them). Before doing that I need to "rollup" an "amount"
column within each of those big tables (based on
a "creation date" criteria). The summations of these amount will be grouped by date (monthly sum-up) and will
need to be put into a reporting table.
The reporting table will have columns like
"Month", "Sum_Amt_BigTable1","Sum_Amt_BigTable2",
"Sum_Amt_BigTable3" and the like.
Initially there was only one BigTable (say BigTable1), so the archiving script was simple.
Just one - INSERT INTO REPORT_TABLE... SELECT.. FROM BIGTABLE1.. WHERE (the date criteria)
GROUP BY (the date column).
Now since there are more "BigTables", I have two(?) options:
1. Modify the insert script such that there is JOIN
between all BigTables, say BigTable1..BigTable6 (eeps!)
on the date criteria, and just sum-up the required
columns. For the sake of simplicity, assume that all
the tables contain at least one record for each month
(so no need to outer-join!).
2. Use the original INSERT script to create the initial
records within the REPORT table... then run multiple
UPDATEs on the REPORT table - one for
each bigtable, like this (roughly!)-
UPDATE REPORT_TABLE
SET Sum_Amt_BigTable2 = (SELECT sum(amountxx)
FROM BigTable2 WHERE (the date criteria)
GROUP BY (the date column))
WHERE REPORT_TABLE.month = LAST_DAY(TO_DATE(TO_CHAR(
BigTable2.date_column,'DD-MON-YYYY'),'DD-MON-YYYY'))
;
... and so on - for each remaining BigTable..
Which option do you thing will be better in terms of performance? - I suspect option2, as I'm a bit
apprehensive about joining 6 big tables for one big insert. Will be testing this out, but wanted
to have your opinion about the same.
Regards,
Sandeep
August 17, 2005 - 10:36 am UTC
#1 of course, do as much as you can in a single sql statement.
big hash area size, nice juicy full scans -- ready to rock and roll and have it done fast.
big INSERT with joins - OR - multiple UPDATES
Sandeep, August 17, 2005 - 11:57 am UTC
Hi,
Was thinking you might actually suggest #2 !!
Having a big join across 6 tables with multimillion
records? hmm... not sure.. but will definitely try this
out.
Did the #2 and it works well... will try to do the
#1 and get some stats to compare... somehow joins
between 6 *huge* tables does not convince me... but...
Thanks,
Sandeep
August 17, 2005 - 1:58 pm UTC
6 tables, 60 tables -- I don't care, make SQL work for you -- NOT the other way around.
databases were BORN TO JOIN, it is what they do
Option #3?
Bob B, August 17, 2005 - 2:25 pm UTC
If there was the likelihood of an outer join in the above scenario, it might be worth using a union all approach. Something like:
SELECT
'Big Table 1' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE1
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
UNION ALL
SELECT
'Big Table 2' NAME,
SUM( WHATEVER ) WHATEVER_SUM,
SUM( WHATEVER2 ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM BIG_TABLE2
WHERE CONDITION1 = 'whatever'
GROUP BY PARAM1, PARAM2
... etc
Essentially, write the report query for each table, union all them together and then summarize the union all query as necessary:
SELECT
SUM( WHATEVER_SUM ) WHATEVER_SUM,
SUM( WHATEVER2_SUM ) WHATEVER2_SUM,
PARAM1,
PARAM2
FROM (
QUERY ABOVE
)
GROUP BY PARAM1, PARAM2
With a clearer test case
sami, August 17, 2005 - 9:12 pm UTC
<asktom>
then I have no clue what we are in fact looking at there.
I cannot even remotely reproduce - nor is this the way it works.
the used_ublks would in this case increase, until you commit, then they go away.
</asktom>
Dear Tom,
While session#1 is doing a DIRECT PATH INSERT on a TABLE which has a PRIMARY KEY, I monitored it from SESSION#2 as below.
While the USED_UBLK is gradually going up, there were 2 occurrences where it reported a different TADDR with USED_UBLK=2.
I just wanted to know what the new TADDR is.
Session#1
#############################################################################
20:37:47 SQL> select distinct sid from v$mystat;
SID
----------
162
20:37:50 SQL> insert /*+ append */ into t1 select * from dba_objects;
48603 rows created.
20:38:23 SQL>
#############################################################################
Session #2
#############################################################################
SQL> desc undo_monitor
Name Null? Type
----------------------------------------- -------- -------------------------
TADDR VARCHAR2(50)
UNDO_UBLK NUMBER
TIME DATE
SQL> truncate table undo_monitor;
Table truncated.
SQL> set time on
SQL>
SQL>
SQL> declare
2 i number :=0;
3 begin
4 loop
5 for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
6 where s.taddr=t.addr
7 and s.sid=&session_to_be_monitored)
8 loop
9 insert into undo_monitor values(c0rec.taddr,c0rec.used_ublk,sysdate);
10 end loop;
11 i := i+1;
12 if i=100000 then
13 exit;
14 end if;
15 end loop;
16 end;
17 /
Enter value for session_to_be_monitored: 162
old 7: and s.sid=&session_to_be_monitored)
new 7: and s.sid=162)
PL/SQL procedure successfully completed.
20:39:49 SQL> select distinct time,taddr,undo_ublk from undo_monitor order by time;
TIME TADDR UNDO_UBLK
--------- -------------------------------------------------- ----------
17-AUG-05 4C64FADC 1
17-AUG-05 4C64FADC 1
17-AUG-05 4C64FADC 1
17-AUG-05 4C64FADC 1
17-AUG-05 4C6504D4 2
17-AUG-05 4C64FADC 5
17-AUG-05 4C64FADC 6
17-AUG-05 4C64FADC 9
17-AUG-05 4C64FADC 10
17-AUG-05 4C64FADC 16
17-AUG-05 4C6504D4 2
17-AUG-05 4C64FADC 17
17-AUG-05 4C64FADC 21
17-AUG-05 4C64FADC 25
17-AUG-05 4C64FADC 27
17-AUG-05 4C6504D4 2
17-AUG-05 4C64FADC 27
17-AUG-05 4C64FADC 30
17-AUG-05 4C64FADC 32
17-AUG-05 4C64FADC 36
17-AUG-05 4C64FADC 38
17-AUG-05 4C64FADC 49
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
17-AUG-05 4C64FADC 52
80 rows selected.
20:39:56 SQL>
#############################################################################
August 18, 2005 - 3:34 pm UTC
as I said, I cannot even REMOTELY reproduce this at all.
date is only valid down to the second, perhaps -- these rows are scrambled. put I in that last column, not sysdate and see.
(select to_char(time, 'dd-mon-yyyy hh24:mi:ss') -- see if your times are not all in the same second there.)
What if standby database were used?
Michel, August 18, 2005 - 9:25 am UTC
Hi Tom,
replacing update with CTAS and/or INSERT /*+ append */ INTO ... is especially useful because of NOLOGGING.
Now, what do we do if we are using standby databases, so it is not a good idea to use operations with NOLOGGING?
August 18, 2005 - 4:05 pm UTC
CTAS and /*+ append */ are done with logging enabled in ARCHIVELOG mode, unless you specify "alter <segment_name> NOLOGGING" and even then the dba can "force logging" at the database level.
LOGGING happens with this stuff!!!
Excellent
Peter, August 18, 2005 - 12:55 pm UTC
Hi Tom,
You are the best of the best. I visit your site every day.
I need your expertise in this regard too.
I have a fact table T
( order_id number,
customer_id number,
order_date date,
month_id number,
prd_id number,
prd_name varchar2(30),
quantity number,
zip number,
flag smallint,
swiprd_id number)
with about 5 million records imported every month.
NOTE: one product (prd_name) may have one or more prd_ids. (There are about 200 prd_names and 2000 different prd_ids in this table.)
Oracle SERVER: Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
T was partitioned by month_id.
A local index is built on T(customer_id, order_date, prd_name).
table T was analyzed using
dbms_stats.GATHER_TABLE_STATS(
OWNNAME=>'USER',
TABNAME=>'T',
ESTIMATE_PERCENT=>dbms_stats.auto_sample_size,
METHOD_OPT => 'for all indexed columns',
cascade => true,
degree => 4);
I need to know following statitics informations each month
1) how many continue customers for each prd_id (not just prd_name) in this month.
the continue customer is defined as A customer who bought a product (with a prd_id ) in this month
and also bought same product (with same or different prd_id) in past 365 days and ((previous order_date+ 30*previous quantity) >= current order_date)
SET flag =1
2) how many restart customers for each prd_id (not just prd_name) in this month.
the restart customer is defined as A customer who is not a continue customer and bought a product (with a prd_id ) in this month
and also bought same product (with same or different prd_id) in past 365 days and ((previous order_date+ 30*previous quantity) < current order_date)
SET flag =2
3) how many switch customers for each prd_id (not just prd_name) in this month.
the restart customer is defined as A customer who is not a continue nor restart customer and bought a product (with prd_id) in this month
and also bought different product (not just different prd_id) in past 365 days.
and also record the what prd_id switched from
SET flag =3, swiprd_id = previous prd_id
4) how many new customers for each prd_id (not prd_name) in this month.
the new customer is defined as a customer who bought a product but never bought any product in the past 365 days
SET flag =4
Current solutions: use a procedure to process each order
CREATE OR REPLACE PROCEDURE get_flag(v_monthid IN NUMBER)
AS
cursor c1 IS
SELECT distinct prd_id
from T
WHERE month_id=v_monthid order by prd_id;
cursor c2 is
SELECT customer_id, order_date, prd_name
FROM T
WHERE prd_id=v_prd_id and month_id = v_monthid
order by customer_id, order_date desc, quantity desc
FOR UPDATE OF flag NOWAIT;
cursor c3 -- for all orders with same customer and same
-- product during lookback
is SELECT order_date, quantity
FROM T
WHERE customer_id = v_customer_id
and prd_name = v_prd_name
AND order_date BETWEEN (v_order_date-366)
AND (v_order_date-1)
order by order_date desc, quantity desc;
cursor c4 -- for all orders with same customer and
-- different product during lookback
is SELECT prd_id
FROM T
WHERE customer_id = v_customer_id
and prd_name <> v_prd_name
AND order_date BETWEEN (v_order_date-366)
AND (v_order_date-1)
order by order_date desc, quantity desc;
TYPE customer_array is TABLE of T.customer_id%type;
TYPE order_date_array is TABLE of T.order_date%type;
TYPE prd_name_array is TABLE of T.prd_name%type;
lv_customer_arr customer_array;
lv_order_date_arr order_date_array;
lv_prd_name_arr prd_name_array;
v_prd_id T.prd_id%type;
v_customer_id T.customer_id%type;
v_order_date T.order_date%type;
v_order_date_tmp T.order_date%type;
v_prd_name T.prd_name%type;
v_quantity T.quantity%type;
v_flag PLS_INTEGER;
v_swiprd_id T.swiprd_id%type;
BEGIN
FOR lv_prd in c1
LOOP
v_prd_id :=lv_prd.prd_id;
v_customer_id :=0 ;
OPEN c2;
LOOP
FETCH c2 BULK COLLECT INTO lv_customer_arr,
lv_order_date_arr, lv_prd_name_arr
LIMIT 100;
FOR i in 1 .. lv_customer_arr.count
LOOP
IF v_customer_id <> lv_customer_arr(i) THEN
v_customer_id := lv_customer_arr(i);
v_order_date := lv_order_date_arr(i);
v_prd_name := lv_prd_name_arr(i);
v_flag := 0;
v_swiprd_id := 0;
OPEN c3;
FETCH c3 INTO v_order_date_tmp, v_quantity ;
IF c3%NOTFOUND
THEN
OPEN c4;
FETCH c4 INTO v_swiprd_id;
IF c4%NOTFOUND
THEN -- never bought any product in the lookback period
v_flag :=4;
ELSE
v_flag :=3;
END IF;
CLOSE c4;
ELSE -- bought same product
IF (v_order_date_tmp + 30*v_quantity)
>=v_order_date
THEN
v_flag :=1 ;
ELSE
v_flag :=2;
END IF;
END IF;
CLOSE c3;
-- UPDATE flag and swiprd_id in table T
IF v_flag =3
THEN
UPDATE T
SET flag = v_flag,
swiprd_id = v_swiprd_id
WHERE CURRENT OF c2;
ELSE
UPDATE T
SET flag = v_flag
WHERE CURRENT OF c2;
END IF;
END IF;
END LOOP;
EXIT WHEN c2%notfound;
END LOOP;
CLOSE c2;
END LOOP;
EXCEPTION
WHEN ... THEN ....;
WHEN OTHERS THEN
dbms_output.put_line(SubStr('Error '||TO_CHAR(SQLCODE)||': '||SQLERRM, 1, 255));
RAISE;
END;
After processing I can run some queries to get the statistics information I need.
This solution is obviously very slow.
(for each record 1-2 queries are invoked; in total about 5-7 million queries are called)
It will take about a week to finish the whole process.
If I divide the prd_id into 6 different ranges and run 6 procedures with different prd_id ranges at the same time,
it will take about 2 days. But it is still too long.
I did go through some threads on your site. You suggested:
CREATE TABLE new_table as select <do the update "here"> from old_table;
............
but it is very hard to achieve <do the update "here"> here.
My questions are:
1) Is there any better way to achieve this?
2) IF this is a way to be, how to improve the performance.
Thanks,
As you requested
sami, August 20, 2005 - 9:54 am UTC
Dear Tom,
As you requested I put the sequence number(loop#).
I am able to reproduce it consistently in both 8i on Solaris and 10g on Windows.
Look at the TADDR difference between
SEQUENCE TADDR UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
18714 4C656458 27 20-AUG-05 09:26:51
18715 4C65695C 2 20-AUG-05 09:26:51
Session#1
==========
SQL> truncate table t1;
Table truncated.
SQL> select index_name from dba_indexes where table_name='T1';
INDEX_NAME
------------------------------
T1_PK
SQL> desc t1
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> select sid from v$mystat where rownum=1;
SID
----------
122
----*** STARTED MONITORING THIS SESSION FROM HERE ***----
SQL> insert /*+ APPEND */ into t1 select * from dba_objects;
48526 rows created.
SQL>
----*** STOPPED MONITORING THIS SESSION HERE ***----
##########################################################
Session#2
===================
SQL> drop table undo_monitor;
Table dropped.
SQL> create global temporary table undo_monitor(sequence number, taddr varchar2(20), undo_ublk number, time date)
2 ;
Table created.
SQL> alter session set nls_date_format='DD-MON-YY HH24:MI:SS';
Session altered.
SQL> declare
2 i number :=0;
3 begin
4 loop
5 for c0rec in (select taddr,used_ublk from v$session s,v$transaction t
6 where s.taddr=t.addr
7 and s.sid=&session_to_be_monitored)
8 loop
9 insert into undo_monitor values(i,c0rec.taddr,c0rec.used_ublk,sysdate);
10 end loop;
11 i := i+1;
12 if i=100000 then
13 exit;
14 end if;
15 end loop;
16 end;
17 /
Enter value for session_to_be_monitored: 122
old 7: and s.sid=&session_to_be_monitored)
new 7: and s.sid=122)
PL/SQL procedure successfully completed.
SQL> spool c:\temp\1.log
SQL> set lines 120 pages 300
SQL> select min(undo_ublk),max(undo_ublk) from undo_monitor;
MIN(UNDO_UBLK) MAX(UNDO_UBLK)
-------------- --------------
1 52
SQL>
SQL> select * from undo_monitor where undo_ublk > 1 and undo_ublk < 52 order by sequence;
SEQUENCE TADDR UNDO_UBLK TIME
---------- -------------------- ---------- ------------------
[truncated for display purpose]
18696 4C656458 27 20-AUG-05 09:26:51
18697 4C656458 27 20-AUG-05 09:26:51
18698 4C656458 27 20-AUG-05 09:26:51
18699 4C656458 27 20-AUG-05 09:26:51
18700 4C656458 27 20-AUG-05 09:26:51
18701 4C656458 27 20-AUG-05 09:26:51
18702 4C656458 27 20-AUG-05 09:26:51
18703 4C656458 27 20-AUG-05 09:26:51
18704 4C656458 27 20-AUG-05 09:26:51
18705 4C656458 27 20-AUG-05 09:26:51
18706 4C656458 27 20-AUG-05 09:26:51
18707 4C656458 27 20-AUG-05 09:26:51
18708 4C656458 27 20-AUG-05 09:26:51
18709 4C656458 27 20-AUG-05 09:26:51
18710 4C656458 27 20-AUG-05 09:26:51
18711 4C656458 27 20-AUG-05 09:26:51
18712 4C656458 27 20-AUG-05 09:26:51
18713 4C656458 27 20-AUG-05 09:26:51
18714 4C656458 27 20-AUG-05 09:26:51
18715 4C65695C 2 20-AUG-05 09:26:51
18716 4C65695C 2 20-AUG-05 09:26:51
18717 4C65695C 2 20-AUG-05 09:26:51
18718 4C65695C 2 20-AUG-05 09:26:51
18719 4C65695C 2 20-AUG-05 09:26:51
18720 4C65695C 2 20-AUG-05 09:26:51
18721 4C65695C 2 20-AUG-05 09:26:51
18722 4C65695C 2 20-AUG-05 09:26:51
18723 4C65695C 2 20-AUG-05 09:26:51
18724 4C65695C 2 20-AUG-05 09:26:51
18725 4C65695C 2 20-AUG-05 09:26:51
18726 4C65695C 2 20-AUG-05 09:26:51
18727 4C65695C 2 20-AUG-05 09:26:51
18728 4C65695C 2 20-AUG-05 09:26:51
18729 4C65695C 2 20-AUG-05 09:26:51
18730 4C65695C 2 20-AUG-05 09:26:51
18731 4C65695C 2 20-AUG-05 09:26:51
18732 4C65695C 2 20-AUG-05 09:26:51
18733 4C65695C 2 20-AUG-05 09:26:51
18734 4C65695C 2 20-AUG-05 09:26:51
18735 4C65695C 2 20-AUG-05 09:26:51
18736 4C65695C 2 20-AUG-05 09:26:51
18737 4C65695C 2 20-AUG-05 09:26:51
18738 4C65695C 2 20-AUG-05 09:26:51
18739 4C65695C 2 20-AUG-05 09:26:51
18740 4C65695C 2 20-AUG-05 09:26:51
18741 4C65695C 2 20-AUG-05 09:26:51
18742 4C65695C 2 20-AUG-05 09:26:51
18743 4C65695C 2 20-AUG-05 09:26:51
18744 4C65695C 2 20-AUG-05 09:26:51
18745 4C65695C 2 20-AUG-05 09:26:51
18746 4C656458 38 20-AUG-05 09:26:51
18747 4C656458 38 20-AUG-05 09:26:51
18748 4C656458 38 20-AUG-05 09:26:51
18749 4C656458 38 20-AUG-05 09:26:51
18750 4C656458 38 20-AUG-05 09:26:51
18751 4C656458 38 20-AUG-05 09:26:51
18752 4C656458 38 20-AUG-05 09:26:51
18753 4C656458 38 20-AUG-05 09:26:51
18754 4C656458 38 20-AUG-05 09:26:51
18755 4C656458 38 20-AUG-05 09:26:51
18756 4C656458 38 20-AUG-05 09:26:51
18757 4C656458 38 20-AUG-05 09:26:51
18758 4C656458 38 20-AUG-05 09:26:51
18759 4C656458 38 20-AUG-05 09:26:51
18760 4C656458 38 20-AUG-05 09:26:51
18761 4C656458 38 20-AUG-05 09:26:51
18762 4C656458 38 20-AUG-05 09:26:51
18763 4C656458 38 20-AUG-05 09:26:51
18764 4C656458 38 20-AUG-05 09:26:51
18765 4C656458 38 20-AUG-05 09:26:51
18766 4C656458 38 20-AUG-05 09:26:51
18767 4C656458 38 20-AUG-05 09:26:51
18768 4C656458 38 20-AUG-05 09:26:51
18769 4C656458 38 20-AUG-05 09:26:51
18770 4C656458 38 20-AUG-05 09:26:51
18771 4C656458 38 20-AUG-05 09:26:51
18772 4C656458 38 20-AUG-05 09:26:51
18773 4C656458 38 20-AUG-05 09:26:51
18774 4C656458 38 20-AUG-05 09:26:51
18775 4C656458 38 20-AUG-05 09:26:51
18776 4C656458 38 20-AUG-05 09:26:51
18777 4C656458 38 20-AUG-05 09:26:51
18778 4C656458 38 20-AUG-05 09:26:51
18779 4C656458 38 20-AUG-05 09:26:51
18780 4C656458 38 20-AUG-05 09:26:51
18781 4C656458 38 20-AUG-05 09:26:51
18782 4C656458 38 20-AUG-05 09:26:51
18783 4C656458 38 20-AUG-05 09:26:51
18784 4C656458 38 20-AUG-05 09:26:51
18785 4C656458 38 20-AUG-05 09:26:51
18786 4C656458 38 20-AUG-05 09:26:51
18787 4C656458 38 20-AUG-05 09:26:51
18788 4C656458 38 20-AUG-05 09:26:51
18789 4C656458 38 20-AUG-05 09:26:51
18790 4C656458 38 20-AUG-05 09:26:51
18791 4C656458 38 20-AUG-05 09:26:51
18792 4C656458 38 20-AUG-05 09:26:51
18793 4C656458 38 20-AUG-05 09:26:51
18794 4C656458 38 20-AUG-05 09:26:51
18795 4C656458 38 20-AUG-05 09:26:51
18796 4C656458 38 20-AUG-05 09:26:51
18797 4C656458 38 20-AUG-05 09:26:51
18798 4C656458 38 20-AUG-05 09:26:51
18799 4C656458 38 20-AUG-05 09:26:51
18800 4C656458 38 20-AUG-05 09:26:51
18801 4C656458 38 20-AUG-05 09:26:51
18802 4C656458 38 20-AUG-05 09:26:51
18803 4C656458 38 20-AUG-05 09:26:51
18804 4C656458 38 20-AUG-05 09:26:51
18805 4C656458 38 20-AUG-05 09:26:51
18806 4C656458 38 20-AUG-05 09:26:51
18807 4C656458 38 20-AUG-05 09:26:51
18808 4C656458 38 20-AUG-05 09:26:51
18809 4C656458 38 20-AUG-05 09:26:51
18810 4C656458 38 20-AUG-05 09:26:51
18811 4C656458 38 20-AUG-05 09:26:51
18812 4C656458 38 20-AUG-05 09:26:51
18813 4C656458 38 20-AUG-05 09:26:51
18814 4C656458 38 20-AUG-05 09:26:51
18815 4C656458 38 20-AUG-05 09:26:51
18816 4C656458 38 20-AUG-05 09:26:51
18817 4C656458 38 20-AUG-05 09:26:51
18818 4C656458 38 20-AUG-05 09:26:51
18819 4C656458 38 20-AUG-05 09:26:51
18820 4C656458 38 20-AUG-05 09:26:51
18821 4C656458 38 20-AUG-05 09:26:51
18822 4C656458 38 20-AUG-05 09:26:51
18823 4C656458 38 20-AUG-05 09:26:51
18824 4C656458 38 20-AUG-05 09:26:51
18825 4C656458 38 20-AUG-05 09:26:51
18826 4C656458 38 20-AUG-05 09:26:51
18827 4C656458 38 20-AUG-05 09:26:51
18828 4C656458 38 20-AUG-05 09:26:51
18829 4C656458 38 20-AUG-05 09:26:51
18830 4C656458 38 20-AUG-05 09:26:51
18831 4C656458 38 20-AUG-05 09:26:51
18832 4C656458 38 20-AUG-05 09:26:51
18833 4C656458 38 20-AUG-05 09:26:51
18834 4C656458 38 20-AUG-05 09:26:51
18835 4C656458 38 20-AUG-05 09:26:51
18836 4C656458 38 20-AUG-05 09:26:51
18837 4C656458 38 20-AUG-05 09:26:51
18838 4C656458 40 20-AUG-05 09:26:51
18839 4C656458 40 20-AUG-05 09:26:51
18840 4C656458 40 20-AUG-05 09:26:51
18841 4C656458 40 20-AUG-05 09:26:51
18842 4C656458 40 20-AUG-05 09:26:51
18843 4C656458 40 20-AUG-05 09:26:51
18844 4C656458 40 20-AUG-05 09:26:51
18845 4C656458 40 20-AUG-05 09:26:51
18846 4C656458 40 20-AUG-05 09:26:51
18847 4C656458 40 20-AUG-05 09:26:51
18848 4C656458 40 20-AUG-05 09:26:51
18849 4C656458 40 20-AUG-05 09:26:51
18850 4C656458 40 20-AUG-05 09:26:51
18851 4C656458 40 20-AUG-05 09:26:51
18852 4C656458 40 20-AUG-05 09:26:51
18853 4C656458 40 20-AUG-05 09:26:51
18854 4C656458 40 20-AUG-05 09:26:51
18855 4C656458 40 20-AUG-05 09:26:51
18856 4C656458 40 20-AUG-05 09:26:51
18857 4C656458 40 20-AUG-05 09:26:51
18858 4C656458 40 20-AUG-05 09:26:51
18859 4C656458 40 20-AUG-05 09:26:51
18860 4C656458 40 20-AUG-05 09:26:51
18861 4C656458 40 20-AUG-05 09:26:51
18862 4C656458 40 20-AUG-05 09:26:51
18863 4C656458 40 20-AUG-05 09:26:51
18864 4C656458 40 20-AUG-05 09:26:51
18865 4C656458 40 20-AUG-05 09:26:51
18866 4C656458 45 20-AUG-05 09:26:51
18867 4C656458 45 20-AUG-05 09:26:51
18868 4C656458 45 20-AUG-05 09:26:51
18869 4C656458 45 20-AUG-05 09:26:51
18870 4C656458 45 20-AUG-05 09:26:51
18871 4C656458 45 20-AUG-05 09:26:51
18872 4C656458 45 20-AUG-05 09:26:51
18873 4C656458 45 20-AUG-05 09:26:51
18874 4C656458 45 20-AUG-05 09:26:51
18875 4C656458 45 20-AUG-05 09:26:51
18876 4C656458 45 20-AUG-05 09:26:51
18877 4C656458 45 20-AUG-05 09:26:51
18878 4C656458 45 20-AUG-05 09:26:51
18879 4C656458 46 20-AUG-05 09:26:51
18880 4C656458 51 20-AUG-05 09:26:51
18881 4C656458 51 20-AUG-05 09:26:51
18882 4C656458 51 20-AUG-05 09:26:51
18883 4C656458 51 20-AUG-05 09:26:51
18884 4C656458 51 20-AUG-05 09:26:51
18885 4C656458 51 20-AUG-05 09:26:51
18886 4C656458 51 20-AUG-05 09:26:51
18887 4C656458 51 20-AUG-05 09:26:51
18888 4C656458 51 20-AUG-05 09:26:51
18889 4C656458 51 20-AUG-05 09:26:51
18890 4C656458 51 20-AUG-05 09:26:51
August 20, 2005 - 5:15 pm UTC
Recursive SQL.
turn on sql trace and see if you are not doing SQL behind your SQL (space management, quotas, etc)....
That was likely a recursive sql transaction to update the dictionary.
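A minimal sketch of turning that tracing on for the session doing the load (t1 is the table from the test case above; the 10046 event at level 12 also captures binds and waits - treat the exact steps as an illustration, not the only way):
alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

insert /*+ APPEND */ into t1 select * from dba_objects;

alter session set events '10046 trace name context off';
-- then run tkprof on the trace file from user_dump_dest and look at the
-- statements flagged as recursive (space management, quota checks, etc.)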
Which is the best way to update?
A Reader, August 22, 2005 - 11:45 pm UTC
Hi Tom,
We have a fact table with a few dimension tables. Initially the records are inserted into the fact table, and most of the columns are populated with values at that point. For the remaining columns we have to update the table by joining to other tables.
1) What will be the best way to update? We are currently using correlated queries to update, but there is no index on the join column in the fact table. The source table has an index if the join column is its key column.
2) Should we create indexes on the joining columns in the fact table? If there are many such columns, is it a good idea to create an index on each of them before updating? Will this improve performance?
3) We are currently analyzing the fact table before the update. Is this the right approach even though we are not creating indexes on all the required columns before the update?
4) Should the indexes be created and the table analyzed before the update?
Your inputs will be highly appreciated and very helpful.
Thanks in advance.
August 24, 2005 - 3:44 am UTC
I like to update joins
update ( select t1.a, t2.b from t1, t2 where t1.fk = t2.pk )
set a = b;
t2 would need to have a primary key in place for this.
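For the fact-table case above, a minimal sketch with hypothetical names (FACT carries a foreign key DIM_KEY to DIM; DIM's primary key on DIM_KEY is what makes the join key-preserved and hence updatable):
update ( select f.some_col, d.source_col
           from fact f, dim d
          where f.dim_key = d.dim_key )
   set some_col = source_col;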
Which is the best way to update?
A Reviewer, August 24, 2005 - 12:17 am UTC
Tom can you please answer the queries posted in the immediately previous post?
August 24, 2005 - 9:06 am UTC
I did, update the join by primary key. That was my answer. Use the CBO.
Allan Sun, October 02, 2005 - 1:01 pm UTC
I found your answers very helpful and wonder how I can submit my question to you.
Thanks
October 02, 2005 - 5:35 pm UTC
When I am taking questions, there is a button on the home page.
Calculate redolog generation
A reader, October 04, 2005 - 1:49 pm UTC
Tom,
Is there any way to calculate the amount of redo that is going to be generated when some 2 million records are updated?
I have to update 2 million records (I can't recreate a dummy table, etc.). I need to know how much redo will be generated and how much OS space will be required for the undo datafile.
October 04, 2005 - 6:23 pm UTC
not really, not without simulating at least a portion of it on some system and measuring the amount of redo.
Table/column widths, row chaining/migration, indexes involved - they'll all play a role.
I frequently do 1,000,000 to 10,000,000 records on my laptop - I would think you can test this out on a test system someplace?
it's very good
gopal, December 07, 2005 - 5:51 am UTC
Delete, then Insert or Update?
A reader, January 06, 2006 - 3:35 pm UTC
Tom,
Quick question: when modifying a row in a table from a procedure, is it better (performance-wise) to delete the existing row and insert it with the new values, or to update the existing values with the new values? Why?
Thanks,
January 06, 2006 - 4:52 pm UTC
update is less work.
you typically update a subset of the columns.
only the columns that change values and have indexes on them will affect the indexes.
doing a delete+insert would take more resources, hit all indexes.
Large update due to consolidation
David Piazza, January 25, 2006 - 8:41 pm UTC
We are consolidating five databases, whose schemas are identical, into one database. A test system has been created with five schemas that hold the previous data from the five databases. It isn't a clean merge, as the primary keys are all in the same range, so some number x will have to be added to the primary (pseudo) keys and child keys in each schema to avoid duplicates. After reading through this topic, I can see two ways of doing the merge and updating the first schema: CTAS, incrementing each primary and foreign key by some x value, and then either insert /*+ append */ or use the merge command to insert/merge the data into the first schema. The second way would be to insert /*+ append */ directly into the first schema from the tables in each of the five schemas, adding the appropriate value to the primary/foreign keys. How would you suggest implementing this?
January 26, 2006 - 10:16 am UTC
I would try to load the "fixed" data.
Eg: if the keys that are in need of fixing are all numbers - I might multiply each key by 10 and add 1, 2, 3, 4, or 5 to each during the load.
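A minimal sketch of loading the "fixed" data that way, assuming numeric keys and hypothetical table/column names (T with primary key ID and foreign key PARENT_ID); schema 1 gets offset 1, schema 2 gets offset 2, and so on:
-- keys from schema1 become ...1, keys from schema2 become ...2: no collisions
insert /*+ append */ into target_schema.t (id, parent_id, other_col)
select id*10 + 1, parent_id*10 + 1, other_col
  from schema1.t;
commit;

insert /*+ append */ into target_schema.t (id, parent_id, other_col)
select id*10 + 2, parent_id*10 + 2, other_col
  from schema2.t;
commit;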
mass insert code package
Brett Clark, February 13, 2006 - 4:47 pm UTC
For what it's worth, here is a package that can be used as a generic foundation for mass loads. It will disable triggers and indexes and (optionally) truncate the main table before running the insert statement that you provide. Hope this helps someone.
create or replace package table_support is
TRUNCATE_NO constant boolean := FALSE;
TRUNCATE_YES constant boolean := TRUE;
procedure load_table(G_TABLE_NAME Varchar2, G_INSERT_STMT Varchar2, TRUNC_ACTION Boolean DEFAULT TRUNCATE_NO );
end table_support;
/
create or replace package body table_support is
ACTION_ENABLE constant boolean := TRUE;
ACTION_DISABLE constant boolean := FALSE;
procedure exec( inStmt Varchar2 ) is
begin
EXECUTE IMMEDIATE inStmt;
end exec;
function exec( inStmt Varchar2 ) return Number is
begin
exec( inStmt );
return( SQL%ROWCOUNT );
end exec;
procedure tbl_triggers(G_TABLE_NAME Varchar2, action Boolean) is
begin
if( action = ACTION_ENABLE ) then
exec( 'ALTER TABLE ' || G_TABLE_NAME || ' ENABLE ALL TRIGGERS' );
elsif ( action = ACTION_DISABLE ) then
exec( 'ALTER TABLE ' || G_TABLE_NAME || ' DISABLE ALL TRIGGERS' );
end if;
end tbl_triggers;
procedure tbl_indexes( G_TABLE_NAME Varchar2, action Boolean ) is
begin
FOR C IN (SELECT INDEX_NAME
FROM USER_INDEXES
WHERE TABLE_NAME = G_TABLE_NAME) LOOP
if( action = ACTION_DISABLE ) then
exec( 'alter index ' || C.INDEX_NAME || ' unusable' );
elsif( action = ACTION_ENABLE ) then
exec( 'alter index ' || C.INDEX_NAME || ' rebuild parallel nologging' );
end if;
END LOOP;
if( action = ACTION_ENABLE ) then
exec( 'alter session set skip_unusable_indexes=false' );
elsif ( action = ACTION_DISABLE ) then
exec( 'alter session set skip_unusable_indexes=true' );
end if;
end tbl_indexes;
procedure load_table(G_TABLE_NAME Varchar2, G_INSERT_STMT Varchar2, TRUNC_ACTION Boolean DEFAULT TRUNCATE_NO ) is
rowCount Number;
begin
-- DISABLE ALL TRIGGERS
tbl_triggers( G_TABLE_NAME, ACTION_DISABLE );
-- TRUNCATE TABLE
if( TRUNC_ACTION = TRUNCATE_YES ) then
exec( 'truncate table ' || G_TABLE_NAME );
end if;
-- DISABLE INDEXES
tbl_indexes( G_TABLE_NAME, ACTION_DISABLE );
-- ENABLE PARALLEL DML
exec( 'alter session enable parallel dml' );
-- EXECUTE INSERT STATEMENT
rowCount := exec( G_INSERT_STMT );
DBMS_OUTPUT.PUT_LINE('Inserted ' || rowCount || ' records into ' || G_TABLE_NAME);
-- ENABLE/REBUILD INDEXES
tbl_indexes( G_TABLE_NAME, ACTION_ENABLE );
-- ENABLE TRIGGERS
tbl_triggers( G_TABLE_NAME, ACTION_ENABLE );
-- TURN OFF PARALLEL DML
exec( 'alter session disable parallel dml' );
end load_table;
begin
-- Initialization
NULL;
end table_support;
/
mass update
Hans, February 17, 2006 - 5:19 pm UTC
I have a table with 50 columns and 10 million records, and I have a file with 10 million rows. I have to update 5 columns in the table from the file (the rows are matched on a common ID value).
Steps I am performing:
1. Open the file using utl_file
2. Read each record
3. Perform the update in the table
The process is very slow.
I would appreciate better ideas from the group.
February 18, 2006 - 8:16 am UTC
external table (so you can query the flat file), or sqlldr it into a temporary table.
and then merge.
If you are in 9i, you'll have to:
merge into db_table
using (select * from external_table where id in (select id from db_table))
on (db_table.id = external_table.id)
when matched then update ...
when not matched then insert (id) values (null); <<<=== never happens
in 10g, you can just
merge into db_table
using external_table
on (db_table.id = external_table.id)
when matched then update ...;
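A minimal sketch of the external table piece, assuming a hypothetical directory object FLAT_DIR, a comma-delimited file update.dat, and made-up column names (id plus the five columns to be updated):
create or replace directory flat_dir as '/export/files';

create table external_table
( id number,
  c1 varchar2(30),
  c2 varchar2(30),
  c3 varchar2(30),
  c4 varchar2(30),
  c5 varchar2(30)
)
organization external
( type oracle_loader
  default directory flat_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
    missing field values are null
  )
  location ('update.dat')
)
reject limit unlimited;
-- now the flat file can be queried like a table and used as the MERGE source above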
A reader, February 18, 2006 - 5:37 am UTC
Morning TOM.
sudip, February 19, 2006 - 12:14 am UTC
What is archive log mode? Why do we prefer recovery in archive log mode? Please, will u brief it?
A reader, February 19, 2006 - 12:17 pm UTC
To sudip from BGALORE, India:
Looks like you are new to this site. Just keep one thing in mind: Tom really gets ticked off by the "u", so don't use it next time :-)
February 19, 2006 - 5:16 pm UTC
Hey, the page he used to post that actually said "if you do this, I will do what I just did..."
Which is to poke fun at it :)
It amazes me more than it ticks me off.
Just like in some parts of the world an "English accent" adds 10 IQ points (when I hear a British accent, for whatever reason, I add default IQ points - maybe it was all of the James Bond movies when I was a kid), using this instant-message speak takes them away.
Large Update Due To Consolidation
David Piazza, February 21, 2006 - 4:29 pm UTC
Tom,
Regarding my previous question: they've decided to merge one of the five clusters at a time. I've got scripts to implement the first cluster as follows:
CTAS NOLOGGING setting the keys to old_key*10+1
insert /*+ append */
DROP old table
RENAME new table
GRANTs
ALTER TABLE add Primary Key
ALTER TABLE add Foreign Key
CREATE INDEX NOLOGGING parallel(degree 4)
ALTER TABLE ADD CONSTRAINT UNIQUE ENABLE
ALTER TABLE Logging
This seems pretty fast. Anything you see that will make it faster?
When I merge in the next clusters, should I:
- Do the same steps as above(*10+2,*10+3 etc.)
- Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
- Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints
The number of rows in the tables(Approximately the same in each cluster) are: 72338677,32278464,25347185,23834984,7595305,4519904,4371347,1229877,1168373,6669752,6382030,97250,4694,934,166
-Thanks
February 22, 2006 - 8:26 am UTC
why not all of them at once to avoid having to do the disable/enable over and over.
Large Update Due To Consolidation
David Piazza, February 22, 2006 - 1:28 pm UTC
That was my suggestion, but the customer wants to do one cluster at a time, especially the first one, ensuring that it works as before and all the users are happy before merging another cluster in...
February 22, 2006 - 2:09 pm UTC
then you got it.
You don't really need to disable all constraints, and if you could get it done in two bits - that would be the least work (load cluster 1, then 2, 3, 4 and 5 after testing 1).
Large Update Due To Consolidation
David Piazza, February 23, 2006 - 4:23 pm UTC
Ok. Which of the three methods would you suggest to merge in the remaining clusters, or would you recommend something else?
1) Do the same steps as above, CTAS/DROP old table/Rename
new table to old table/Insert /*+ append(*10+3,*10+4
etc.)/Create constraints,indexes.
OR
2) Disable primary keys, make indexes unusable and
Insert /*+ append, and then enable PK's, and indexes
OR
3) Drop primary keys, indexes, and constraints, and
INSERT /*+ append, create PK's, Indexes, and constraints
OR
4) Something else
I'm leaning towards 2) or 3)
-Thanks
February 23, 2006 - 8:05 pm UTC
I would just be disabling indexes (if I wanted to - insert append does a rather efficient "merge" of data) and insert appending the other data. I would not redo the CTAS.
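A minimal sketch of option 2 with hypothetical object names (BIG_TAB, its primary key constraint BIG_TAB_PK and a secondary index BIG_TAB_IX); the key offsets follow the *10+n scheme described earlier:
alter table big_tab disable constraint big_tab_pk;    -- drops the PK index created by the constraint
alter index big_tab_ix unusable;                       -- skip maintaining this index during the load
alter session set skip_unusable_indexes = true;

insert /*+ append */ into big_tab (id, parent_id, other_col)
select id*10 + 2, parent_id*10 + 2, other_col
  from schema2.big_tab;
commit;

alter index big_tab_ix rebuild parallel nologging;
alter table big_tab enable constraint big_tab_pk;      -- re-creates and validates the PK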
Bulk Insert / Update in single SQL
Narendra, February 24, 2006 - 1:49 am UTC
Tom,
I have a target table that stores current as well as history records. Current records are identified by a field CURRENT_REC having the value 'Y'; history records have CURRENT_REC = 'N'. At any stage, for a given PART_NO, there can be only a SINGLE record with CURRENT_REC = 'Y' in the target table.
My problem is that I want to transfer data (in bulk) from the source table to the target table with the CURRENT_REC flag set to 'Y' and simultaneously update the existing records having CURRENT_REC = 'Y'. Following are the table definitions:
Source table:
CREATE TABLE PART_SRC
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)
target Table:
CREATE TABLE PART_TABLE
(
PART_NO NUMBER,
PART_NAME VARCHAR2(100),
BALANCE_QTY NUMBER(5),
CREATED_DATE DATE,
CURRENT_REC VARCHAR2(1)
)
Is this possible in a single SQL ?
February 24, 2006 - 8:25 am UTC
I don't get it.
I want to transfer data (in bulk) from source table to target table with current_rec flag as 'Y'
I understand that.
the "simultaneously update existing records have a current-rec as 'Y'"
which existing records, in which table, and update what exactly (since we are already looking only at 'Y' records - what exactly are we updating??!?!)
Bulk Insert / Update in single SQL
Michel Cadot, February 24, 2006 - 9:50 am UTC
If I understand you correctly, you want something like this (inspired by the "merge delete" or "insert all" statements):
merge into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then
update t.current_rec = 'N'
insert (t.part_no, ...) values (s.part_no, ...)
when not matched then
insert (t.part_no, ...) values (s.part_no, ...)
/
or
merge ALL into part_table t
using part_src s
on (t.part_no = s.part_no and t.current_rec='Y')
when matched then update t.current_rec = 'N'
when TRUE then
insert (t.part_no, ...) values (s.part_no, ...)
/
But this does not exist. ;)
Michel
More details
Narendra, February 24, 2006 - 9:28 pm UTC
Tom,
One correction to the table definitions:
the source table does not contain the CURRENT_REC field.
What I am trying to achieve is to transfer data from source to target. Each time, data is appended to the target table. I may transfer updated details of a PART_NO multiple times from source to target; however, I do not want data in the target table to be overwritten. In order to identify the "last loaded record", I maintain the CURRENT_REC field in the target table. If a PART_NO (from the source table) does not exist in the target table, it should simply be added to the target table with CURRENT_REC = 'Y'. However, if the PART_NO already exists (i.e. data has been transferred earlier from source to target for that PART_NO), the target table will already have a single record with CURRENT_REC = 'Y' and may have multiple records with CURRENT_REC = 'N' for the same PART_NO. Now, when I transfer data for the same PART_NO again from source to target, it should be added with CURRENT_REC = 'Y', and at the same time the existing record for that PART_NO having CURRENT_REC = 'Y' should be updated to 'N'.
Hope I am clear.
Michel,
MERGE will not help since CURRENT_REC will not be there in my source table.
Thanks
More details
Michel Cadot, February 25, 2006 - 8:45 am UTC
Narendra,
I understood this and the pseudo-merges don't use the current_rec of source (s) but only of target (t). It would be clearer if I wrote "insert (t.part_no, ..., t.current_rec) values (s.part_no, ..., 'YES')".
There is no way to do it in a single SQL statement but it is easy to do it in an efficient PL/SQL block.
Regards
Michel
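A minimal sketch of that approach as a small PL/SQL block, using only the columns defined above (the UPDATE must run before the INSERT, and both belong in one transaction):
begin
   -- flip the existing current rows for the parts that are about to be re-loaded
   update part_table t
      set t.current_rec = 'N'
    where t.current_rec = 'Y'
      and t.part_no in (select s.part_no from part_src s);

   -- append the incoming rows as the new current records
   insert into part_table (part_no, part_name, balance_qty, created_date, current_rec)
   select s.part_no, s.part_name, s.balance_qty, s.created_date, 'Y'
     from part_src s;

   commit;
end;
/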
Merging ten large tables
Jesse Buzzard, March 17, 2006 - 7:00 pm UTC
Good Evening,
This is my first foray into very large databases and I need some advice.
We have ten tables that contain blob data. The total size of the tables is approximately 1.8TB, and they are stored in the same tablespace. How this came to be is a long story for another discussion thread. Our task is to merge the tables into one table. After selecting the table with the largest number of rows as the target, we came up with the code below to insert the data from another table into it.
-------- start of code -------
#: Announce start insert
echo "-------------------------------------------------------------"
echo "`date '+ %a %e-%b-%Y %X'` - $0 - started" >> $1
echo "-------------------------------------------------------------"
$ORACLE_HOME/bin/sqlplus /nolog << EOF
connect odirms/odirms
insert /* +APPEND PARALLEL (nto, 8) */ into odirms.t_ompf4 nto
select /* +APPEND PARALLEL (tompf1, 8) */ * from t_ompf1 tompf1;
commit;
exit
EOF
#: End of program
echo "`date '+ %a %e-%b-%Y %X'` - $0 - completed" >> $1
echo "-------------------------------------------------------------"
echo
------------ end of code ----------
At present we are trying to merge 4 of the tables. All logging and indexes are off. Multi-threading is on. From all I have read, we should have completed this in a relatively short amount of time, yet this job has been running for over 28 hours. I can see the space used is growing (about 300MB/minute), but my supervisors would like to know when we will have all the tables merged.
Where have we gone astray?
Thank You,
Jesse
March 18, 2006 - 11:55 am UTC
well, you said "all indexes are off" and "we have blobs" - the two concepts do not go together. every blob has an index associated with it that cannot possibly be disabled for any reason (short of dropping the blob column!)
So, you do have logging going on, and you do have index maintenance going on.
but you don't give any information as to the size of the tables being read (and hence written into the new table).
I don't think anything has "gone astray" here, I can say that your "hint" isn't a hint, it is just text you typed in:
ops$tkyte@ORA10GR2> create table t as select * from dual;
Table created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /* +APPEND */ into t select * from dual;
1 row created.
ops$tkyte@ORA10GR2> select * from t;
D
-
X
X
ops$tkyte@ORA10GR2> insert /*+ APPEND */ into t select * from dual;
1 row created.
ops$tkyte@ORA10GR2> select * from t;
select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
See how the /* + didn't really do anything, but /*+ does. If that is your real script, you likely did not achieve your goal of either append nor parallel (parallel implies append by the way, if it was in fact using parallel)
Merging ten large tables
Jesse Buzzard, March 20, 2006 - 12:15 am UTC
Thank you for your response.
If I may, I would like to impose on your wisdom again. We are averaging 1800MB per hour. Yes, the sample I provided is one of our real scripts, and yes, the goal was to work in parallel. At present five of the tables (56% of the rows) have been merged. The total size of the data is 1.8 TB in 32,518,931 rows. As you can see from the description below, the amount of data in addition to the blob data is rather small. If we had gotten the hint correct, would we have achieved a better insert/append rate?
Name Null? Type
----------------------------------------- -------- -----------
OMPF_ID NOT NULL NUMBER
OMPF_UID NUMBER(10)
OMPF_IMG_TYPE_ID CHAR(1)
OMPF_IMG_STYPE_ID CHAR(1)
OMPF_PAGE_NUM NUMBER(5)
OMPF_ACTIVE_FLAG CHAR(1)
OMPF_ADDED_DT DATE
OMPF_INACTIVE_DT DATE
OMPF_IMAGE BLOB
OMPF_LOCK_FLAG NUMBER(5)
OMPF_USER_ID CHAR(8)
March 20, 2006 - 7:16 am UTC
you might have, you might not have. you sort of have to figure out where you are spending your time (tracing - a level 12 10046 trace on a bit of this would go a long way)
Alternative Suggestion
VK, March 21, 2006 - 2:40 pm UTC
Here is an approach based on exchange partitions that I used successfully in the past for combining tables. (This will work only if the data in the individual tables is distinct and there is no overlap. A scenario in which this works well is when people, in their infinite wisdom, create horizontally "partitioned" tables - for example, where all customers with ID 1 to one million are kept in a CUST1 table, one million to 5 million in CUST2, and so on.)
4 tables to be combined:
create table test(a number,b blob);
create table test1(a number,b blob);
create table test2(a number,b blob);
create table test3(a number,b blob);
Now, let's say table test has values 1 to 10,000 in column a and test1 has 10,001 to 20,000, test2 has 20,001 to 30,000 and test3 has 30,001 to 40,000.
Create a partitioned table that looks exactly like the tables being combined - pick a partition key where the values do not overlap between tables.
create table testpart(a number,b blob)
partition by range(a)
(
partition p1 values less than (10001),
partition p2 values less than (20001),
partition p3 values less than (30001),
partition p4 values less than (maxvalue)
);
You can combine the 4 tables into the partitioned table using exchange partition:
alter table testpart exchange partition p1 with table test;
alter table testpart exchange partition p2 with table test1;
alter table testpart exchange partition p3 with table test2;
alter table testpart exchange partition p4 with table test3;
At this point, you can use the partitioned table or if you don't want the table to be partitioned, you can merge all 4 partitions into a single partition and exchange with a non-partitioned table.
alter table testpart merge partitions p1,p2 into partition p5;
alter table testpart merge partitions p5,p3 into partition p6;
alter table testpart merge partitions p6,p4 into partition p7;
Now we have just one partition (P7) in the table and you can exchange this with any of the original tables (we'll use test).
alter table testpart exchange partition p7 with table test;
Now we have all 40,000 rows in table test. As you can see, it is a lot of work to combine 4 small tables but if you have tens/hundreds of millions of rows, this will be much faster.
Merging ten large tables
Jesse Buzzard, March 21, 2006 - 5:03 pm UTC
Tom,
VK,
Thank you for your suggestions. They are educational and helpful. The merges have nearly completed. Now I am off to read about reorganizing the tablespace to reclaim some of the space formerly used by the smaller tables.
Very slow insert into table
bayaraat, April 17, 2006 - 12:16 am UTC
Hi Tom
I need your expertise in this regard.
My table is
create table CDR00
(
CALLDATETIME DATE,
TELNO VARCHAR2(20),
CALLEDNO VARCHAR2(20),
SECONDS NUMBER,
PULSE NUMBER,
PREFIXCODEA CHAR(14),
PREFIXCODEB CHAR(14),
RATECODE CHAR(28),
TOG_AMOUNT NUMBER(20,2),
USD_AMOUNT NUMBER(20,8),
NO_TOG_AMOUNT NUMBER(20,8),
NO_USD_AMOUNT NUMBER(20,2),
TOG_TAX NUMBER(20,8),
USD_TAX NUMBER(20,8),
FNAME VARCHAR2(12),
OPERATOR VARCHAR2(10),
RESERVED VARCHAR2(10),
PRINTED NUMBER(1)
)
tablespace CDR01
pctfree 20
pctused 80
initrans 1
maxtrans 255
storage
(
initial 1M
minextents 1
maxextents unlimited
);
Every day we insert about 2 million records into this table and then truncate it; these millions of records are also inserted into a partitioned table.
The insert process is very slow - the response time is around 12:00:00 (twelve hours).
I want to insert these records quickly. I am using RAID 5.
There are no constraints or triggers on the table. The archive log process runs non-stop because this Oracle database server feeds a standby database. Any suggestions please...
April 17, 2006 - 7:51 am UTC
if done correctly, it would only take minutes or less to bulk load 2 million records.
So, you need to describe HOW you are loading this.
and if you are loading and truncating, doubtful that you are updating - so pctfree=0 would be more appropriate -and perhaps even "compress", but it depends on how you load after all - which we don't know.
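Without knowing how the load is actually done, here is only a minimal sketch of what a direct-path daily load might look like, assuming the rows arrive in a flat file exposed as a hypothetical external table CDR_EXT:
alter table cdr00 pctfree 0;     -- rows are never updated, so leave no free space in the blocks

insert /*+ append */ into cdr00
select * from cdr_ext;           -- direct-path load, minimal undo

commit;                          -- required before this session can query cdr00 again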
Moving high volumes of data from table1 to table2 in an OLTP system
A reader, May 10, 2006 - 10:54 am UTC
Tom,
We are looking at archiving data in a table (schema1.table1) having about 40 million records. This table is populated continuously by a Java application. The objective is to move all but 5 million records (the latest records) to the archive table with minimal impact to the end user. The option I am looking at is:
create a different schema (schema2) on tablespace2.
create a new table (table2), structure similar to table1.
create a procedure(p_1) that will copy data from schema1.table1 to schema2.table2.
call procedure 1 from a job on a certain schedule.
Pseudo code for procedure p_1
----------
insert into schema2.table2
select a.*
from schema1.table1 a
where year(a.creation_date) < 2006
and rownum < 1000
and a.id not in (select id from schema2.table2);
commit;
delete from schema1.table1 aa
where aa.id in (select bb.id from schema2.table2);
commit;
----------
Would like your expert comments on
1) Do you see any problems with the pseudo code?
One of the problems I see is that
after step #1 and before step #2 in the pseudo code, a user may see duplicate records. Is there a way to avoid this situation?
2) Is there a better of implementing this change?
Thank you
May 11, 2006 - 7:31 am UTC
I'd probably rather move the 5 million records to a new table.
(move the lesser of the two amounts)
time to look at partitioning for you? to avoid this in the future.
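A minimal sketch of moving the smaller set, assuming a hypothetical cutoff on creation_date and a maintenance window in which the rename can be done as the table owner (indexes, grants and constraints still have to be re-created on the new table, which is why partitioning is the better long-term answer):
create table table1_keep as
select *
  from schema1.table1
 where creation_date >= to_date('01-01-2006','DD-MM-YYYY');   -- the ~5 million "latest" rows

rename table1 to table1_archive;   -- the 35 million old rows become the archive
rename table1_keep to table1;      -- the new, small table takes over the original name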
Updating a single column to 'N' in a 50 Million records table
A reader, May 15, 2006 - 5:34 pm UTC
Tom,
In one of our applications, we have a database conversion job to update a column to 'N' for a table that has about 50 million rows.
We are using a rollback segment that has the capacity to handle about 8 million rows per transaction,
so we are running update statements that each commit about 7 million rows, with 8 identical update statements in total
(it requires us to run the update 8 times to commit all 50 million updates).
The conversion of these 50 million rows is taking about 20 hours.
We have already explored the following CTAS method:
1. CTAS/INSERT/APPEND/PARALLEL and rename the new table to the original table - but because this particular table is quite large, the DBAs say this method is not feasible due to space constraints.
So we are looking for other methods/suggestions to see if we can improve the timing.
Thank you.
May 15, 2006 - 6:08 pm UTC
parallel update, using many rollback segments simultaneously.
I pray this column is NOT indexed.
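A minimal sketch, with a hypothetical table/column name (BIG_TAB.FLAG_COL); note that parallel DML has to be enabled explicitly, and on older releases (8i) the table generally needs to be partitioned for a parallel UPDATE:
alter session enable parallel dml;

update /*+ parallel (t, 8) */ big_tab t
   set t.flag_col = 'N';

commit;   -- until you commit, this session cannot even query big_tab (ORA-12838)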
Parallel update question
A reader, May 18, 2006 - 4:22 pm UTC
Tom,
Per the Oracle documentation, it seems tables must be partitioned to support parallel update and delete.
So, I tried a parallel update on a partitioned table as shown below, but how do we confirm it really did parallel DML? After the parallel update was done, I tried to query the same table and it did not generate any error.
I see the same behaviour on non-partitioned tables when a parallel update is performed.
Parallel update statement logs are given below :
Partitioned table :
===================
SQL> update /*+ parallel (T, 3) */ T set status='NV' where status='INVALID' ;
1410 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=11 Card=138 Byte
s=690)
1 0 UPDATE OF 'T'
2 1 PARTITION RANGE* (ALL) :Q214000
3 2 TABLE ACCESS* (FULL) OF 'T' (Cost=11 Card=138 Bytes=69 :Q214000
0)
2 PARALLEL_COMBINED_WITH_PARENT
3 PARALLEL_TO_SERIAL SELECT /*+ Q214000 NO_EXPAND ROWID(A1) */ A1
.ROWID,A1."STATUS" FROM "T" PX_GRANULE(0, BL
OCK_RANGE, DYNAMIC) A1 WHERE A1."STATUS"='I
NVALID'
Statistics
----------------------------------------------------------
19 recursive calls
1733 db block gets
263 consistent gets
168 physical reads
351584 redo size
854 bytes sent via SQL*Net to client
597 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1410 rows processed
IOMDV:SQL> select count(*) from T ;
COUNT(*)
------------
12239
1 row selected.
Non-partitioned table :
========================
update /*+ parallel(tn 3) */ tn set valid_ind='N' ;
29250 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=FIRST_ROWS (Cost=14 Card=29250 By
tes=58500)
1 0 UPDATE OF 'TN'
2 1 TABLE ACCESS* (FULL) OF 'TN' (Cost=14 Card=29250 Bytes=5 :Q212000
8500)
2 PARALLEL_TO_SERIAL SELECT /*+ Q212000 NO_EXPAND ROWID(A1) */ A1
.ROWID,A1."VALID_IND" FROM "IOMDBO"."TN" PX_
GRANULE(0, BLOCK_RANGE, DYNAMIC) A1
Statistics
----------------------------------------------------------
73 recursive calls
143947 db block gets
244 consistent gets
222 physical reads
18118440 redo size
844 bytes sent via SQL*Net to client
576 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
29250 rows processed
bala@IOMDV> select count(*) from tn ;
COUNT(*)
------------
29250
1 row selected.
A reader, May 19, 2006 - 3:05 pm UTC
Thanks for the info.
However, in Oracle 8.1.7, with parallel DML on a partitioned table, I should not be able to query the same table until I commit the parallel update.
But I was able to query the same table without any error being generated.
Is this the expected behaviour?
May 20, 2006 - 4:39 pm UTC
you did not do a parallel update.
<quote src=expert Oracle database architecture>
We can actually observe the fact that there are separate independent transactions created for the parallel execution servers. We'll use two sessions again, as before. In the session with SID=162, we explicitly enable parallel DML. PDML differs from parallel query in that regard; unless you explicitly ask for it, you will not get it:
big_table@ORA10GR1> alter session enable parallel dml;
Session altered.
The fact that the table is parallel is not sufficient, as it was for parallel query. The reasoning behind the need to explicitly enable PDML in your session is the fact that PDML has certain limitations associated with it, which I list after this example.
</quote>
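Two ways to check whether an update really ran as parallel DML - the V$PQ_SESSTAT statistic names below are as I recall them, so treat them as an assumption and verify on your own version:
-- run in the same session, right after the update:
select statistic, last_query, session_total
  from v$pq_sesstat
 where statistic in ('Queries Parallelized', 'DML Parallelized');

-- and the tell-tale sign: after true parallel (direct-path) DML, this fails with
-- ORA-12838 until you commit
select count(*) from t;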
Thank you
A reader, May 20, 2006 - 8:08 pm UTC
RE: Parallel update
bakunian, May 23, 2006 - 8:20 am UTC
I have table "A" 17 GB and 120 mil. rows. I needed to add columns with NOT NULL therefore I had to add column, update every row setting new column to some value, then make it NOT NULL. When I ran update it ran for almost 10 hours, then I thought to create empty dummy table "B" as select * from "A" and setting columns default value to required value.
To my surprise when I did
INSERT /*+ parallel (b, 4) INTO b SELECT /*+ parallel (a,4) */ * FROM a
the default values were not inserted, but when I issued a regular insert into b the default values appeared. Why?
Thank you for your time.
May 23, 2006 - 8:38 am UTC
show me - you are using "*" and not listing any columns - I do not see how a conventional path insert would have used defaults.
column defaults are only used when you explicitly skip the column in the insert statement.
ops$tkyte@ORA10GR2> create table t2( x varchar2(20), y int );
Table created.
ops$tkyte@ORA10GR2> insert into t2 values ( 't2', null );
1 row created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert /*+ append */ into t (x) select 'direct path' from dual;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> insert /*+ append */ into t select * from t2;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> select * from t;
X Y
-------------------- ----------
direct path 0
t2
ops$tkyte@ORA10GR2> insert into t (x) select 'direct path' from dual;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> insert into t select * from t2;
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> select * from t;
X Y
-------------------- ----------
direct path 0
t2
direct path 0
t2
Parellel Update
bakunian, May 23, 2006 - 10:07 am UTC
I see - so for my trick to work, all I had to do was skip the columns that were supposed to have default values. Correct?
At the very beginning of this thread you gave the example below, but that does not really work for me. So what would you do in my case, where I need to update millions of rows with default values because new columns with NOT NULL constraints are being added to a large table? Was my method almost correct?
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table new_emp as
2 select empno, LOWER(ename) ename, JOB,
3 MGR, HIREDATE, SAL, COMM, DEPTNO
4 from emp;
Thank you,
May 23, 2006 - 10:11 am UTC
well, since you are going to update the entire table - hence locking every row - and you in effect have a "scheduled downtime", I don't agree with you that you cannot use the create table as select. You say "it doesn't really work for you", I say "why not"
Parallel Update
bakunian, May 23, 2006 - 10:34 am UTC
Forgive me, I've been up for almost 48 hours and am a bit slow. Using your example, would it look something like the below, assuming every person in this table will have ename = 'Bob'?
create table new_emp as
select empno, 'Bob', job, MGR,
.
From emp;
Correct?
May 23, 2006 - 3:26 pm UTC
select empno, 'Bob' ename, job, mgr, ...
from emp
yes.
Estimate time for a create table to end
Thierry Yen Suin, May 26, 2006 - 3:56 am UTC
We once had to sort a very large (non-partitioned) table in the region of 60G.
The sorted column was Type NUMBER and indexed.
So what was/is regularly done is
create table sorted_table
as select * from unsorted_table
where col_to_be_sorted>0;
The sort normally takes 3 hours to finish using the index on col_to_be_sorted. As I understand it, this create table does not need the TEMP tablespace to do the sort.
Once, the sort did not finish even after 35 hours.
The database was 8i. We stopped the create statement.
Rebuilt the index (out of desperation - who knows?).
Reissued the create table. -> never finished.
Did an export/import into 9i on another server.
Created index, issued same create statement, took less than 1hr.
My question:
Is there a way to estimate the time it will take for a create table to finish based on some V$ views?
A way to see the number of rows that have been inserted/created into the temp segment of the "to be created" table?
Or anything that gives us a clue that it will take forever.
Thanks,
Thierry Yen Suin
Humm V$SESSION_LONGOPS
Thierry Yen Suin, June 01, 2006 - 1:22 am UTC
Of Course the V$SESSION_LONGOPS view!
How could I have forgotten that?
Thierry Yen Suin.
Am I missing something?
A reader, June 13, 2006 - 4:18 am UTC
Tom,
I did the following to calculate the redo generated by CTAS with and without nologging, but the amount of redo generated is practically identical and the time taken is even greater with nologging. Am I missing something here?
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for Solaris: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
SQL> set timing on
SQL> set time on
13:25:17 SQL> drop table big_table;
Table dropped.
Elapsed: 00:00:00.67
13:25:52 SQL> select a.value
13:26:38 2 from v$mystat a , V$STATNAME b
13:26:38 3 where a.statistic# = b.statistic#
13:26:38 4 and b.name = 'redo size' ;
VALUE
----------
54144
Elapsed: 00:00:00.29
13:26:38 SQL>
13:26:46 SQL> create table big_table
13:27:01 2 as select * from dba_objects;
Table created.
Elapsed: 00:00:08.56
13:27:10 SQL> select a.value
13:27:22 2 from v$mystat a , V$STATNAME b
13:27:22 3 where a.statistic# = b.statistic#
13:27:22 4 and b.name = 'redo size' ;
VALUE
----------
109040
Elapsed: 00:00:00.33
13:27:23 SQL> drop table big_table;
Table dropped.
Elapsed: 00:00:00.19
13:27:57 SQL> select a.value
13:28:02 2 from v$mystat a , V$STATNAME b
13:28:02 3 where a.statistic# = b.statistic#
13:28:02 4 and b.name = 'redo size' ;
VALUE
----------
134372
Elapsed: 00:00:00.36
13:28:03 SQL> create table big_table nologging
13:28:12 2 as select * from dba_objects;
Table created.
Elapsed: 00:00:09.04
13:28:21 SQL> select a.value
13:28:46 2 from v$mystat a , V$STATNAME b
13:28:46 3 where a.statistic# = b.statistic#
13:28:46 4 and b.name = 'redo size' ;
VALUE
----------
188816
Elapsed: 00:00:00.24
June 13, 2006 - 12:18 pm UTC
you are in a noarchivelog mode database aren't you.
run it again, it was a fluke that it took 9 seconds. something else was happening on your system
nologging on a CTAS in a noarchivelog mode database is a big "no-operation", CTAS is already a direct path, non-logged operation in noarchivelog mode.
You would expect the "logged" one to have generated gobs of redo (of the size of the table). not 50k
Here we are in ARCHIVELOG MODE:
ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 3568320
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
Table created.
Elapsed: 00:00:00.53
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 6998672 3,430,352
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 7021144
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
Table created.
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 7088872 67,728
Elapsed: 00:00:00.00
That yours did not generate more than 50k of redo makes me believe you are in noarchivelog mode, like this:
ops$tkyte@ORA9IR2> @test
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.03
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 205344
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t as select * from dba_objects;
Table created.
Elapsed: 00:00:00.47
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 273312 67,968
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> @mystat "redo size"
ops$tkyte@ORA9IR2> set echo off
NAME VALUE
-------------------- ----------
redo size 295856
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> create table t NOLOGGING as select * from dba_objects;
Table created.
Elapsed: 00:00:00.45
ops$tkyte@ORA9IR2> @mystat2
ops$tkyte@ORA9IR2> set echo off
NAME VALUE DIFF
-------------------- ---------- ------------------
redo size 363612 67,756
Elapsed: 00:00:00.03
And the 9 seconds - totally because something else was happening on your system at that point that prevented you from being "fast".
Absolutely correct as usual
A reader, June 14, 2006 - 1:18 am UTC
SQL> SELECT log_mode
2 FROM v$database;
LOG_MODE
------------
NOARCHIVELOG
Thanks very much Tom.
Merge with aggregates.
Thiru, July 07, 2006 - 10:48 am UTC
Tom,
I think I posted this in the wrong place earlier!
-----------------------------------------
Will MERGE be a good thing to use (and if so, how) when dealing with aggregates in multi-million record tables?
For instance:
create table source ( a varchar2(2),b number)
/
create table target ( a varchar2(2),b number,status varchar2(10))
/
insert into source values('ab',100);
insert into source values('bc',100);
insert into source values('cd',100);
insert into source values('ef',100);
insert into source values('fg',100);
insert into source values('gh',100);
insert into source values('hi',100);
insert into source values('ij',100);
insert into source values('jk',100);
insert into source values('kl',100);
insert into source values('mn',100);
insert into target (a,b) values('ab',50);
insert into target (a,b)values('ab',50);
insert into target (a,b) values('bc',50);
insert into target (a,b)values('bc',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('cd',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b)values('de',50);
insert into target (a,b) values('ef',50);
insert into target (a,b)values('ef',50);
insert into target (a,b)values('gh',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('hi',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('ij',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
insert into target (a,b)values('jk',50);
commit;
I would want to update the target STATUS column to 'MCHED' if sum(target.b) = source.b for target.a = source.a.
In all other cases, update the STATUS column to 'UNMCHED'. The number of records in both tables is over 2 million.
Also, can this be done in a single update statement?
July 08, 2006 - 10:52 am UTC
look elsewhere for the answer, I did it over there.
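Purely as a hedged illustration (not necessarily the answer referred to above), one single-statement form that works for the sample data given, assuming SOURCE has at most one row per value of A:
update target t
   set t.status =
       ( select case when sum(t2.b) = max(s.b) then 'MCHED' else 'UNMCHED' end
           from target t2 left join source s on s.a = t2.a
          where t2.a = t.a
          group by t2.a );
-- rows whose A has no match in SOURCE compare against NULL and fall into 'UNMCHED'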
Large import question
Mike, July 12, 2006 - 3:57 pm UTC
I have visited this thread on and off for a while. I have a question regarding an apparent slowdown of inserts while importing a very large table. We are on 9.2.0.5 on Solaris 8.
I am importing a 60 million row table into our development database (I just got more space for it, wooohooo!). Not long ago I found a very nifty script written by Frank Naude for monitoring import progress, as follows:
select substr(sql_text,instr(sql_text,'INTO "'),30) table_name, rows_processed,
round((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60,1) minutes,
trunc(rows_processed/((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60)) rows_per_min
from sys.v_$sqlarea
where sql_text like 'INSERT %INTO "%'
and command_type = 2
and open_versions > 0
/
When the table in question started importing, this script returned very high rows-per-minute values (138905), but as the import progressed it fell to 33779 per minute after a few hours of inserting. I am doing the import from the command line and have the buffer set to 4m, recordlength 64k, commit=n, no statistics (I will do the entire schema afterward). Can you theorize as to why the performance would erode as the import of a large table progresses? Our box has 4GB of memory and 4 CPUs. My SGA is about 1GB with a healthy cache and auto undo; other parameters would likely pass a sanity check.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 524272 3.8 139805
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 676480 5.2 129263
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 744128 6.5 114775
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1234576 12.9 96075
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1471344 17.3 84885
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1877232 26.6 70484
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 1978704 28.7 68904
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 2114000 32.2 65618
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 2824304 50.4 56019
1 row selected.
SQL> /
Wed Jul 12
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 3602256 78.5 45888
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 4126528 97.8 42186
1 row selected.
SQL> /
Wed Jul 12
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 4329472 108.2 40001
1 row selected.
TABLE_NAME ROWS_PROCESSED MINUTES ROWS_PER_MIN
------------------------------ -------------- --------- ------------
INTO "COLLECTIONS_ORGANIZATION 5597872 165.7 33779
July 12, 2006 - 5:19 pm UTC
see any "cannot allocate new log" messages?
what do v$session_* views return? like session_wait?
Additional Info
Mike, July 13, 2006 - 9:08 am UTC
SELECT A.USERNAME ora_user, A.OSUSER os_user, A.PROGRAM, C.EVENT||', '|| C.P1TEXT||', '||C.P2TEXT||', '||
C.P3TEXT event,C.WAIT_TIME
FROM V$SESSION A, V$PROCESS B, V$SESSION_WAIT C
WHERE A.PADDR = B.ADDR
AND A.SID = C.SID(+)
AND A.OSUSER IS NOT NULL
and a.username is not null
and c.event not like 'SQL*Net%'
ORDER BY 1,3
(I edited the output for readability)
ORA_USER OS_USER PROGRAM                                      EVENT                                           WAIT_TIME
-------- ------- -------------------------------------------- ----------------------------------------------- ---------
SYSTEM   oracle  imp@ods-d-prod1.appliedcard.com (TNS V1-V3)   db file sequential read, file#, block#, blocks          0
There are three mirrored sets of redo logs, 150M each.
SELECT class, count FROM v$waitstat
WHERE class IN ('free list', 'system undo header', 'system undo block', 'undo header', 'undo block')
order by class
CLASS COUNT
------------------ ------------
free list 0
system undo block 0
system undo header 0
undo block 0
undo header 160
Rollback!!!
ROLLBACK SEGMENT WAITS for DEVL
waits gets Wait Ratio
71 2,131,551 .00003%
The import ran all night then failed this morning on an
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS'. Now I'm really mystified, because this table, with 59 million rows, is only about 3.5G in size. During previous imports, I have seen undo segments stretch up to a bit over 4G, then apparently wrap around with no consequence (there is nobody else reading or writing this table). The undotbs has a 4G file which can extend to 16G. In addition, I have seen the "used" size of the undotbs expand and shrink based on activity... last time I looked yesterday it had reached 14.5G, and this morning when the job failed (it is rolling back right now) it had maxed out the file.
Here are the undo settings:
transactions_per_rollback_segment 4
undo_management AUTO
undo_retention 16000
undo_suppress_errors FALSE
undo_tablespace UNDOTBS
Another thing: the table is NOT partitioned, because it was never supposed to grow this big (bad metrics given to the DBA!) and there is no natural key we can depend on. We are working on an alternate way to hold this data.
I was just looking at v$undostat and noticed this:
UNXPBLKRELCNT UNXPBLKREUCNT EXPSTEALCNT EXPBLKRELCNT
------------- ------------- ----------- ------------
0 0 0 0
0 0 1 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
4608 0 10 0
10752 0 21 0
9728 0 19 0
8192 0 16 0
13312 0 19 0
11264 0 22 0
10752 0 12 0
9216 0 8 0
11264 0 8 0
10752 3 7 0
8704 0 10 0
10752 0 7 0
8704 0 6 0
13312 3 8 0
9216 0 4 0
9216 3 5 0
12800 0 4 0
6144 0 5 0
12800 0 4 0
4676 0 5 2048
10240 0 7 2560
0 0 2 7680
0 0 2 10240
0 0 1 4096
So did this undo failure occur due to numbers shown in the last column? Hmmmm. undo_retention is set for more than 4 hours, yet the import had been running at least 16 hours when it went belly up.
For now, I am going to do a separate export from production with a where clause and populate the table that way, but I would like to further understand both the original question, which is why import performance degraded as it went along, and what caused the import to blow up on undo (we have successfully imported other very large tables).
July 13, 2006 - 9:25 am UTC
this is too much information
all I wanted to know is:
o are there any waits you see in the v$session tables
o do you have "cannot allocate new log" messages in your alert log.
160 waits is insignificant/meaningless from v$waitstat - tells us nothing. No "time" information from it and it is a count since the INSTANCE was started.
Results
Mike, July 13, 2006 - 9:59 am UTC
select sid, event, p1text, p1, p2text, p2 from v$session_wait
/
SID EVENT P1TEXT P1 P2TEXT P2
-- --------------------------- ---------- ------------ --------- --------
1 pmon timer duration 300 0
3 rdbms ipc message timeout 300 0
4 rdbms ipc message timeout 300 0
6 rdbms ipc message timeout 180000 0
7 rdbms ipc message timeout 500 0
12 db file sequential read file# 12 block# 813986
2 db file parallel write requests 153 interrupt 0
5 smon timer sleep time 300 failed 0
9 SQL*Net message to client driver id 1413697536 #bytes 1
8 SQL*Net message from client driver id 1413697536 #bytes 1
11 SQL*Net message from client driver id 1413697536 #bytes 1
18 SQL*Net message from client driver id 1413697536 #bytes 1
22 SQL*Net message from client driver id 1413697536 #bytes 1
23 SQL*Net message from client driver id 1413697536 #bytes 1
25 SQL*Net message from client driver id 1413697536 #bytes 1
The answer to the second question is no.....logs were switching about every 15 - 20 minutes during the import.
July 13, 2006 - 12:57 pm UTC
that is point in time, session_event shows what?
JUST for your session please. measure it, wait a bit, measure again - subtract. It'll show you what that session waited on during that interval (as you are measuring row counts - as the row counts decrease what wait events increase...)
(and this table is not indexed right... that is my assumption)
How to monitor the table delete prograss?
Jim, July 19, 2006 - 12:10 am UTC
Tom,
I have a long run delete process:
delete tab where col1 < '01-May-06'
/
it takes a few hours to complete.
how can i check how many records have been deleted during the process?
TIA
July 19, 2006 - 9:06 am UTC
that is a strange string to compare to isn't it? Because I know col1 cannot be a date since we never compare dates to strings and strings to numbers and so on.... (just a comment..)
as long as the delete uses a full tablescan, you can monitor it in v$session_longops, not by number of rows deleted, but rather by number of blocks scanned in the table.
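A minimal monitoring sketch along those lines (assuming you know the SID of the deleting session; :sid is a placeholder to fill in):
select opname, target, sofar, totalwork, units,
       round( 100 * sofar / nullif(totalwork,0), 1 ) pct_done,
       elapsed_seconds, time_remaining
  from v$session_longops
 where sid = :sid
   and sofar <> totalwork;
For a full scan SOFAR/TOTALWORK are reported in blocks, so you are watching blocks scanned, not rows deleted.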
Very confused
Elaine, July 25, 2006 - 11:55 am UTC
I have been reading through this thread and saw where you said that updating a few million rows should be able to take place in minutes. I have 400 million rows daily to load. We use insert/append to do the insert (sorry, that's redundant insert/insert...).
I am having a disagreement with my boss. There are 9 indexes on the table. I maintain that as each row is appended, the indexes are updated. He maintains that the index updates do not take place until after the load is complete.
Which of us is correct? I have checked in both your books and in the Concepts manual. Someone pointed me to the following quote:
Indexes are created on one or more columns of a table. After it is created, an index is automatically maintained and used by Oracle. Changes to table data (such as adding new rows, updating rows, or deleting rows) are automatically incorporated into all relevant indexes with complete transparency to the users.
This would imply (I think) that I am right that the indexes are updated as the inserts are occurring, or am I misreading this?
Any links to either chapter numbers in your book or other documentation links greatly appreciated if this has already been covered.
July 25, 2006 - 12:04 pm UTC
neither of you is entirely correct :)
both of you are partially right! but i'd say he is closer to the truth.
with a direct path load like that - we write the newly added data to the table above the high water mark for the table.
We also build a "mini index" of the newly loaded data off to the side (not in the "big, existing" index yet).
At the end of the append - we merge in BULK the mini-index with the existing index.
So, the existing index is maintained automatically.
But not row by row.
It is done in bulk - efficiently - at the end of the operation.
Great answer!
Elaine H, July 25, 2006 - 12:51 pm UTC
I never understood how that worked. Thanks for the excellent explanation. I still have one thing I don't understand though. If there is a unique index on the table, how does oracle compensate to make sure that the constraint is not violated if it is only building the mini index to merge later?
Also, do you know of any way to speed the load of the mini-index to the main index? Is there any way to influence the manner in which the update/insert is applied the way that you can the initial insert using hints?
Parallel DML and GROUP BY
Thiru, September 21, 2006 - 1:29 pm UTC
I do not know if this is an oracle bug or parallel processing is not going to work in this case:
alter table t parallel;
alter session enable parallel dml;
create sequence t_seq;
update
(select rkey,id from t order by id)
set rkey=t_seq.nextval;
The table contains 50mill records
What it looks like after running the above update is that the sequence values used were not consistent with the order on column id.
Though I cannot run a test case at this point for your perusal, the parallel processing appears to have ordered the records within each split and then allocated the sequence numbers within each split.
for eg: the data looked like:
rkey id
100 ab
101 ab
102 bc
103 bc
..
..
500 ab
The question is: Is this the way parallel dml works or is it a bug or is something wrong with my update?
Thanks and regards
September 22, 2006 - 2:21 am UTC
I would not expect it to be. each section was ordered nicely, but that is about it. I would expect gaps to be possible and basically rows to be randomly updated.
I would say "something wrong with the thought behind doing such an update" rather than something wrong with the update. I don't see the point in assigning a number like that at all.
Thiru, September 22, 2006 - 5:43 pm UTC
The reason behind doing such an update was :
The flat files that contain the data are not sorted. I used an external table to load this into the table and could not use a sequence and order the records while doing the load. No idea if that was possible??
So what I understand now is that parallel DML works as expected only if no ordering or grouping is involved, because if grouping is involved, each split ends up being grouped separately. Right?
The other option which I thought of was:
create the table through the external table load.
And then create another table CTAS using the sequence and ordering the base table.
create table t1 as select t_seq.nextval,t.id,t.c2 from t order by id
Is this a good option?
September 24, 2006 - 1:09 pm UTC
umm, but the data is apparently already "orderable" - you just tried to use an order by to assign yet another field to order by.
so, why do you need this second field when one already apparently exists.
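As an aside: if a key following the id order really were needed, the CTAS sketched in the review above would most likely fail anyway - as far as I recall, a sequence NEXTVAL cannot be used in a query containing an ORDER BY (ORA-02287). A sketch using ROW_NUMBER() instead, reusing the column names from that review:
create table t1 as
select row_number() over (order by id) rkey,   -- ordered surrogate, no sequence needed
       id, c2
  from t;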
Delete / CTAS
Chris, September 27, 2006 - 11:25 am UTC
Tom,
Trying to process all the great info in this thread and would like a little feedback. I have a 20 million row table that gets about 2 million rows deleted every day. The table is not partitioned (let's not get into the why). There is only one index on the table and it is the primary key. Due to the extremely critical nature of the data some folks are very nervous with a CTAS, drop/rename strategy. The database does not use locally managed tablespaces and the current process spawns about 70 threads to delete data and bring specific rollback segments online. For other reasons I won't go into here, the database is bounced every day and this delete process would ideally be compatible with running before and after the bounce (currently it must complete before they bounce). I am trying to benchmark something like this and was looking for your thoughts.
1. CTAS getting only those rows to be deleted (unrecoverable/parallel).
2. Delete x number of rows at a time in parallel.
Looks like:
CREATE TABLE mytab unrecoverable parallel 8 AS
SELECT /*+ parallel(mytab_source,8) */ column1
FROM mytab_source
WHERE ...;
DELETE /*+ parallel(mytab,8) */
FROM mytab
WHERE column1 IN (SELECT column1
FROM mytab_source
WHERE rownum < 50000);
The only two reasons for limiting # of rows deleted would be rollback space and time. Would be happy to do it all at once if this became of no concern. If the whole operation could complete in < 30 minutes that would be very acceptable.
Not exactly sure how I'd loop through 50000 at a time though without deleting from mytab_source as well. In other words after I delete the first 50000, how do I get the next 50000.
Anyhow, some questions.
1. Does this seem reasonable or would you suggest another approach (I'm sure you will)?
2. Not much experience with parallel DML, does the above work as is or do I need to "alter session enable parallel dml" first? I though I had to, but it seemed to run really quick without doing this.
3. I don't believe the table currently has a value specified for freelists, should it, and if so what should it be?
4. Is the delete syntax with parallel correct?
5. Might I be better off with a CTAS for the 18 million rows I want to keep, then build an index in parallel (the first time I tried with logging and not in parallel on 11 million rows it took 34 minutes) and then rename/rename? To appease those that are fearful of dropping the old table I could keep the last version with a different name if there is sufficient space keeping a rolling cycle of two tables.
6. How would you suggest doing the delete x number of rows at a time?
7. What is the difference between unrecoverable and nologging on the CTAS?
8. If you use a drop/rename or rename/rename, packages will go invalid when the original table disappears. All code should recompile itself on first use, correct? Am I sure I won't get "invalid state of packages" on the first attempt, and then a successful use on the next attempt?
September 27, 2006 - 3:51 pm UTC
laughing out loud, but by limiting the number of rows deleted you increase the runtime (which seems to be the converse of what you are trying to achieve)
have you tried "create table as select ROWS TO KEEP", "index" (all of which will require no undo and if you like, no redo), drop and rename?
how does IT perform compared to deleting 10% of the data?
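In outline, that might look like the following - the names and the keep-predicate are made up, since the real ones aren't shown above:
create table mytab_new nologging parallel 8 as
  select /*+ parallel(m,8) */ *
    from mytab m
   where keep_flag = 'Y';            -- hypothetical predicate: the ~18 million rows to keep

create unique index mytab_new_pk on mytab_new(id)   -- id stands in for the real primary key column
  nologging parallel 8;

alter table mytab_new add constraint mytab_new_pk
  primary key (id) using index mytab_new_pk;

rename mytab to mytab_old;           -- keep the old data around until everyone is comfortable
rename mytab_new to mytab;
Privileges granted on the original table follow it to mytab_old and would need to be re-granted on the new one, and dependent code will recompile on first use.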
Alexander the ok, September 27, 2006 - 4:54 pm UTC
Why does limiting the number of rows increase runtime?
September 27, 2006 - 5:33 pm UTC
eh? not sure what you are referring to exactly here?
Your last follow up
Alexander the ok, September 27, 2006 - 6:09 pm UTC
But, I've got a LONG column ...
Matt, October 11, 2006 - 1:00 pm UTC
that is used by everyone's favourite CRM application. I've got to get rid of 38 million rows of a 40 million row table. If I understand correctly the conversion from LOB to LONG is a no-no (therefore cannot do a CTAS and then convert back).
So ... how does one delete 90 odd % of a table with a LONG column?
October 11, 2006 - 4:03 pm UTC
sorry, no good answer for you - longs prevent most things.
LONG and a direct path insert
Matt, October 16, 2006 - 5:35 am UTC
Tom,
Apologies for pestering you still, but does having a LONG column also count out the use of a direct path insert?
I'm thinking, create another table for the data we want to keep, do the direct path insert, drop the 'old' table and then rename.
October 16, 2006 - 7:53 am UTC
you cannot insert a long using insert as select, so the question is moot since you cannot use insert with a "as select" and a long.
Application to insert/copy?
Basil, November 01, 2006 - 2:16 pm UTC
I've inherited a database design where we have the need to make copies of rows in one particular table. The system more or less looks like:
drop table c;
drop table u;
drop table pt;
drop table ans;
create table c (
c_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table u (
user_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table pt (
pt_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table ans (
ans_id NUMBER PRIMARY KEY NOT NULL,
name VARCHAR2(500));
create table rp
(rp_id NUMBER PRIMARY KEY NOT NULL,
ans_id NUMBER NOT NULL,
pt_id NUMBER NOT NULL,
c_id NUMBER NOT NULL ,
date_value DATE,
text_value VARCHAR2(4000),
numeric_value NUMBER,
addon_ids VARCHAR2(4000),
cr_dt DATE DEFAULT SYSDATE NOT NULL,
cr_user_id NUMBER NOT NULL,
upd_dt DATE DEFAULT SYSDATE NOT NULL,
upd_user_id NUMBER NOT NULL,
CONSTRAINT rp_c_fk FOREIGN KEY (c_id)
REFERENCES c (c_id) ENABLE,
CONSTRAINT rp_CRUSER_FK FOREIGN KEY (CR_USER_ID)
REFERENCES u (user_id) ENABLE,
CONSTRAINT rp_PRICETBL_FK FOREIGN KEY (pt_id)
REFERENCES pt (pt_id) ENABLE,
CONSTRAINT rp_SRVYRESP_FK FOREIGN KEY (ans_id)
REFERENCES ans (ans_id) ENABLE,
CONSTRAINT rp_UPDUSER_FK FOREIGN KEY (upd_user_id)
REFERENCES U (USER_ID) ENABLE
) ;
CREATE INDEX rp_ac_id ON rp(c_id, ans_id);
CREATE INDEX rp_idx ON rp(ans_id, pt_id, c_id);
CREATE INDEX rp_crsidx ON rp (c_id, ans_id, pt_id );
-- rp_idx or rp_crsidx should probably go away, to minimize duplication
CREATE INDEX rp_pt_idx ON rp(pt_id);
The table RP has about 30 million rows at present. It is unpartitioned. The use of surrogate keys everywhere makes it difficult to find a suitable key for partitioning. When we do a copy, about 600,000 rows (in one particular bad case) need to be duplicated, adjusting the ans_id, c_id, and pt_id values to new ones.
To accomplish the mapping, I created some temporary tables:
create global temporary table c_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table pt_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
create global temporary table ans_map (
previd NUMBER PRIMARY KEY NOT NULL, newid NUMBER )
on commit preserve rows;
As I generate new entries for c, pt, and ans, I do multitable inserts, so that the old and new ids get stored in the temp tables. For example:
INSERT ALL
INTO c_map(previd, newid)
VALUES (c_id, c_seq.NEXTVAL)
INTO c(c_id, name)
VALUES (c_seq.NEXTVAL, name)
SELECT c_id, name FROM c WHERE <criteria>;
My thought was then to do something like:
insert into rp (
rp_id, ans_id, pt_id, c_id,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id )
SELECT rp_seq.NEXTVAL,
ans_map.newid, pt_map.newid, c_map.newid,
date_value, text_value, numeric_value, addon_ids,
cr_dt, cr_user_id, upd_dt, upd_user_id
FROM rp
JOIN ans_map ON rp.ans_id = ans_map.previd
JOIN pt_map ON rp.pt_id = pt_map.previd
JOIN c_map ON rp.c_id = c_map.previd;
On top of the strange execution plans I'm getting for the SELECT portion (which is doing full scans of rp with its 30 million rows, in spite of indexes that should help), the actual insert is taking a REALLY long time.
Can this UPDATE approach be modified to improve the INSERT performance in my scenario?
Oh yes, this is on a 9iR2 system.
How do hints work
Andrew, December 05, 2006 - 2:03 pm UTC
From time to time, developers tell me that the hints they use to influence execution path, like INDEX, do not work and there must be a bug in Oracle. I tell them that sometimes the hint does not take because a HINT is a hint, not a directive and that they work by artificially changing the cost of the query to favor the HINTed path. The developers insist that HINTs should always be taken because they are directives. I looked through the Oracle documentation, and it consistently refers to hints as instructing or directing the optimizer to take a certain path, rather than favoring the hinted path.
I have found no documentation about how hints work internally. Can you tell me where to find information about how these 'execution path' hints work?
December 06, 2006 - 9:29 am UTC
they can only be obeyed when they make sense.
give me an example and I'll tell you what mistake your developers made.
Well, in addition to the mistake of hinting queries at the drop of a hat that is.
Using hints is a dicey proposition - for an emergency bug fix of one or two queries - sure, as a tuning tool - I'd avoid it.
ops$tkyte%ORA10GR2> create table t ( x int, y int );
Table created.
ops$tkyte%ORA10GR2> create index t_idx on t(x);
Index created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;
Execution Plan
----------------------------------------------------------
Plan hash value: 961378228
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 3 (34)| 00:00:0
| 1 | SORT ORDER BY | | 1 | 26 | 3 (34)| 00:00:0
| 2 | TABLE ACCESS FULL| T | 1 | 26 | 2 (0)| 00:00:0
------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
<b>the index CANNOT be used in this case, X is nullable - it is not possible to use the index on X since any row with X is null would not be in that particular index</b>
ops$tkyte%ORA10GR2> alter table t modify x not null;
Table altered.
ops$tkyte%ORA10GR2> select /*+ index( t t_idx ) */ * from t order by x;
Execution Plan
----------------------------------------------------------
Plan hash value: 3778778741
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 827 (1
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 26 | 827 (1
| 2 | INDEX FULL SCAN | T_IDX | 1 | | 26 (0
------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
ops$tkyte%ORA10GR2> set autotrace off
<b>and now it can be</b>
directives can only be followed when they make sense. If I gave you a directive to levitate six feet off of the floor - you would not be able to do that
Basic how to do simple update in minutes rather then hours
Matthew, January 02, 2007 - 11:21 am UTC
Tom,
Earlier in this thread you demonstrated how you were able to update a million rows in just a few minutes on your laptop.
I am currently developing on a system where it takes over 2 hours to update a single column for 757474 rows, and I am beyond frustrated. I can do a CTAS for millions of rows with a complex SQL join of 5 tables in a couple of minutes by modifying my session to use a large sort area and hash area size, but I cannot do a simple update statement (update t set columnX = 'C') in under 2 hours.
This is unacceptable as I am doing ETL and my load needs to complete within a 4 hour window. Having an update of a single table take 2 hours is an application killer. The db is at 10.2, but unfortunately the DBAs are not.
My question is what low hanging fruit should I look for in terms of speeding this processing up, and how can I prove / debug future problems of this nature.
The following the explain plan and autotrace of the update statement (working on getting access to the trace files on the server):
idl@DEVAEETL> explain plan for
2 update idl.srvc_mbr_prmry_dim
3 set current_flag ='I';
Explained.
idl@DEVAEETL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 153652758
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 755K| 1475K| 24004 (4)| 00:04:49 |
| 1 | UPDATE | SRVC_MBR_PRMRY_DIM | | | | |
| 2 | TABLE ACCESS FULL| SRVC_MBR_PRMRY_DIM | 755K| 1475K| 24004 (4)| 00:04:49 |
-----------------------------------------------------------------------------------------
9 rows selected.
idl@DEVAEETL> set autotrace on
idl@DEVAEETL> update idl.srvc_mbr_prmry_dim
2 set current_flag ='I';
757474 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=24004 Card=755486 Bytes=1510972)
1 0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
2 1 TABLE ACCESS (FULL) OF 'SRVC_MBR_PRMRY_DIM' (TABLE) (Cost=24004 Card=755486
Bytes=1510972)
Statistics
----------------------------------------------------------
960 recursive calls
922410 db block gets
865720 consistent gets
133717 physical reads
253701772 redo size
447 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
11 sorts (memory)
0 sorts (disk)
757474 rows processed
January 04, 2007 - 9:55 am UTC
that current_flag is not indexed correct?
trace it - see what the major waits in fact are.
large update for a table
ozjbrown, January 04, 2007 - 5:46 pm UTC
Hi Tom,
I have a student_achievement fact table (over 4 million rows) and need to update a single column for every record contained in this table. In order to determine the new value, I have a temporary student_school table (approximately 2 million rows) that contains the value the column should be updated to, based upon matching criteria. The update to the fact table takes over 6 hours to complete.
The student_achievement fact table has the following columns -
student_achievement_key integer (pk)
school_year varchar2(8)
student_key integer
course_key integer
mark varchar2(5)
school_key integer
The student_school temporary table has the following columns -
school_year varchar2(8)
student_key integer
school_key integer
The relationship of student_school to student_achievement is one-to-many. Not every student_school record has a record in the student_achievement fact table, and not every student_achievement fact record will have a record in the student_school table.
The logic used in my PL/SQL procedure to update all records in the student_achievement fact table is as follows -
cursor school_upd_curs is
select *
from student_school a
where exists (select 1 from student_achievement b
where a.student_key = b.student_key
and a.school_year = b.school_year)
order by student_key, school_year;
begin
open school_upd_curs;
loop
fetch school_upd_curs bulk collect
into col... limit 1000;
exit when school_year.count = 0;
for i in school_year.first .. school_year.last loop
if first record fetched for the student then
update student_achievement
set school_key = fetched school_key
where school_year <= fetched school_year;
elsif last record fetched for student then
update student_achievement
set school_key = fetched school_key
where school_year >= fetched school_year;
else
update student_achievement
set school_key = fetched school_key
where school_year = fetched school_year;
end if;
end loop;
commit;
end loop;
end;
Is there a way to speed the update?
Thanks.
January 05, 2007 - 9:43 am UTC
update the join of the tables.
update (select column_to_be_updated, value_to_update_to
from t1, t2
where (join) )
set column_to_be_updated = value_to_update_to;
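Mapped onto the tables described above, that would look something like the sketch below. The one requirement (hedged, because the real constraints aren't shown) is that student_school carry a primary key or unique constraint on the join columns, so the join is key-preserved:
-- only if such a constraint does not already exist:
alter table student_school
  add constraint student_school_uk unique (student_key, school_year);

update ( select sa.school_key old_school_key,
                ss.school_key new_school_key
           from student_achievement sa,
                student_school      ss
          where sa.student_key = ss.student_key
            and sa.school_year = ss.school_year )
   set old_school_key = new_school_key;
This covers the straight equality case; the first/last special cases in the original loop (the <= and >= branches on school_year) would still need their own handling.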
trace for update that takes 2 hours
Matthew, January 09, 2007 - 12:54 pm UTC
Tom,
Sorry that this took so long to get to you. Had to wait for the DBA's to pull the trace file from the development system for me.
The column being updated is not indexed.
Here is the trace that you requested. Please let me know if you need more information.
update idl.srvc_mbr_prmry_dim
set current_flag ='C'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 152.68 6057.11 240133 978321 917824 757474
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 152.69 6057.12 240133 978321 917824 757474
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 68 (IDL)
Rows Row Source Operation
------- ---------------------------------------------------
0 UPDATE SRVC_MBR_PRMRY_DIM (cr=973072 pr=232705 pw=0 time=2145490920 us)
1547044 TABLE ACCESS FULL SRVC_MBR_PRMRY_DIM (cr=978249 pr=237879 pw=0 time=3465511263 us)
Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL) OF 'SRVC_MBR_PRMRY_DIM'
(TABLE)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19
log buffer space 6 0.98 2.94
log file switch completion 12 0.39 3.54
latch: shared pool 5 0.02 0.06
latch: cache buffers lru chain 3 0.00 0.00
latch: cache buffers chains 1 0.00 0.00
free buffer waits 10 0.01 0.15
buffer exterminate 57 0.98 56.17
latch free 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.03 0.03
********************************************************************************
Any help that you can give me would be appreciated as the DBA's and I are stumped.
January 11, 2007 - 9:38 am UTC
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return??
optimize update millions of lobs
Wolfgang, January 10, 2007 - 10:18 am UTC
hi tom,
every year I have to reorganize my picture table (500GB). Generally I do it this way:
1. make everything offline
2. CREATE TABLE dbildtmp AS SELECT cid, bildtn, text FROM dbild ;
3.) ALTER TABLE DBILD SET UNUSED (text);
ALTER TABLE DBILD SET UNUSED (bildtn);
4. ALTER TABLE DBILD DROP UNUSED COLUMNS;
5. ALTER TABLE dbild ADD (bildtn BLOB) LOB (bildtn)
STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );
ALTER TABLE dbild ADD (text CLOB)
LOB (text) STORE AS ( DISABLE STORAGE IN ROW TABLESPACE NGEN_2K_LOB );
6. update the new created collumns:
DECLARE
CURSOR cur IS SELECT * FROM dbildlobs;
v_anz NUMBER := 0;
BEGIN
FOR rec IN cur LOOP
UPDATE DBILD
SET bildtn = rec.bildtn , text = rec.text
WHERE cid = rec.cid;
IF v_anz = 1000 THEN
COMMIT;
v_anz := 0;
END IF;
v_anz := v_anz +1;
END LOOP;
COMMIT;
END;
/
7. make everything online
please could you tell me how I can reduce the runtime of steps 2 and 4, and particularly step 6?
I tried to modify the procedure with bulk collect and forall, although it doesn't run any faster (6 hours).
here is my modified procedure:
CREATE OR REPLACE PROCEDURE Update_Dbild2 IS
TYPE t_text IS TABLE OF DBILDLOBS.TEXT%TYPE INDEX BY BINARY_INTEGER;
TYPE t_bildtn IS TABLE OF DBILDLOBS.bildtn%TYPE INDEX BY BINARY_INTEGER;
TYPE t_cid IS TABLE OF DBILDLOBS.cid%TYPE INDEX BY BINARY_INTEGER;
l_text t_text;
l_bildtn t_bildtn;
l_cid t_cid;
CURSOR cr_rec IS SELECT cid,bildtn,TEXT FROM DBILDLOBS WHERE ROWNUM < 1000;
BEGIN
OPEN cr_rec;
FETCH cr_rec BULK COLLECT INTO l_cid,l_bildtn,l_text;
CLOSE cr_rec;
FORALL i IN l_cid.FIRST .. l_cid.LAST
UPDATE DBILD
SET TEXT = l_text(i), bildtn = l_bildtn(i)
WHERE cid = l_cid(i);
END;
/
Large Update on table
ozjbrown, January 10, 2007 - 10:28 pm UTC
Hi Tom,
I am lost as to what you are recommending I change in my procedure, which uses the bulk fetch and loops through to update another table with the value.
Would it be possible to use my example to show where I need to make the change?
Thanks.
Performance Issue!! Commit Strategy
Prashant Pathak, January 11, 2007 - 9:41 am UTC
Hi Tom,
I have a table which contains the following columns
ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0 518346
Alloy 102618 170 QA 1 0 518346
Alloy 102618 160 LAB 0 0 518346
Alloy 102618 150 TRP 1 0 518346
Alloy 102618 140 QA 1 0 518346
Alloy 102618 130 QA 1 0 518346
Alloy 102618 120 WP 5 0 518346
Alloy 102618 110 BA 2 0 518346
Alloy 102618 100 TRP 1 0 518346
Alloy 102618 90 QA 1 0 518346
Alloy 102618 80 QA 1 0 518346
Alloy 102618 70 QA 1 0 518346
Alloy 102618 60 MW2 3 0 518346
Alloy 102618 50 MW1 0 0 518346
Alloy 102618 40 M2M 0 0 518346
Alloy 102618 30 M2M 3 0 518346
Alloy 102618 20 PROC 1 0 518346
Alloy 102618 10 PROC 0 0 518346
Alloy 102963 9999 PROC 0 0 518346
Alloy 102963 90 CHP 1 0 518346
Alloy 102963 80 SG 5 0 518346
Alloy 102963 70 CHP 1 0 518346
Alloy 102963 60 PM 6 0 518346
Alloy 102963 50 PM 6 0 518346
Alloy 102963 40 TRP 1 0 518346
Alloy 102963 30 AIM 0 0 518346
Alloy 102963 20 AIM 0 0 518346
Alloy 102963 10 PROC 0 0 518346
Now I have to update the table's offset column,
starting with opseq 9999 and deducting days from the previous row to get each row's offset,
so for opseq 9999 at the first step it will be 0
ALLOY itm OpSeq DEPT DAYS offset grp_id
Alloy 102618 9999 PROC 0 0 518346
Alloy 102618 180 SGP 5 0-5=-5 518346
Alloy 102618 170 QA 1 -5-1=-6 518346
Alloy 102618 160 LAB 0 -6-0=-6 518346
Alloy 102618 150 TRP 1 -6-1=-7 518346
.
.
.
So I have to do this operation based on grp_id, and this table has about 50,000 rows.
So my question is: should I commit the transaction for each grp_id as I go, or finish all of them and commit once at the end?
Because currently I am doing all the deductions and updates with only one commit at the end, and it is taking a long time.
Thanks in advance
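As an aside on the arithmetic only (not the commit question): the offset described above is just a running sum of DAYS walked from OpSeq 9999 downward, so it can be computed in one pass with an analytic SUM. A sketch, assuming the table is called routing_steps and the running total restarts per grp_id and itm (both of those are assumptions):
select grp_id, itm, opseq, dept, days,
       -- cumulative deduction from the top (opseq 9999) down; matches the worked
       -- example above, where the opseq 9999 row carries DAYS = 0
       - sum(days) over (partition by grp_id, itm
                         order by opseq desc) as offset_days
  from routing_steps
 order by grp_id, itm, opseq desc;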
Re: update taking over 2 hours
Matthew, January 11, 2007 - 2:05 pm UTC
"Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 10919 57.93 5622.86
seems rather obvious that you have a potentially serious IO issue there doesn't it? almost a minute for at least one of those IO's to return?? "
I agree that we do have an IO problem; the SAN is set up as RAID 5. I do have a couple of follow-up questions in regard to this and hope you can provide some insight:
On the Trace file execution plan it shows:
Rows Execution Plan
------- --------------------------------------------------
0 UPDATE STATEMENT MODE: ALL_ROWS
0 UPDATE OF 'SRVC_MBR_PRMRY_DIM'
1547044 TABLE ACCESS MODE: ANALYZED (FULL)
OF 'SRVC_MBR_PRMRY_DIM' (TABLE)
1. The actual update statement only updated 757474 rows. Why does the execution plan show over twice the number of rows than were actually updated?
2. There where 2 db read wait events with similar number of event counts:
db file scattered read 10919 57.93 5622.86
db file sequential read 10192 8.05 218.19
for the single update statement of the table.
Could this be caused by the width of the table forcing the full table scan into the buffer cache to make multiple passes over the disk, the first pass using sequential reads and subsequent passes doing scattered reads to match the data in the buffer to the data on disk?
3. Would it be worth testing the following scenarios to see if I can reduce db file scattered reads:
a. Creating the very wide table in a larger block size with its own buffer pool in the SGA?
b. Increasing db_file_multiblock_read_count at the session level to higher than 32?
Appreciate your input and look forward to your next book.
plsql is faster than sql
A reader, January 22, 2007 - 2:18 pm UTC
Hi Tom,
what's wrong with my database?
It seems that PL/SQL is faster than SQL:
SQL> create table test nologging as select * from dba_objects;
Table created.
Elapsed: 00:00:01.53
SQL>
SQL> create index test_pk on test(object_id);
Index created.
Elapsed: 00:00:00.41
SQL>
SQL>
SQL>
SQL> declare
2 l_icnt number := 0;
3 l_ucnt number := 0;
4 begin
5 for x in ( select * from test )
6 loop
7
8 update test set status = 'ok' where object_id= x.object_id;
9 l_ucnt:=l_ucnt+1;
10
11 end loop;
12 dbms_output.put_line ('rows updated :'||l_ucnt);
13 end;
14 /
rows updated :46087
PL/SQL procedure successfully completed.
Elapsed: 00:00:11.20
SQL>
SQL>
SQL> commit;
Commit complete.
Elapsed: 00:00:00.05
SQL>
SQL> update test set status='ok';
46087 rows updated.
Elapsed: 00:00:17.80
SQL>
SQL> commit;
A reader, January 23, 2007 - 2:27 pm UTC
Why do you think something is wrong with your database? I tried your example in the same order and the SQL test ran faster than the PL/SQL test, but I would expect that because the SQL test has the benefit of the now pre-loaded buffers. Did you run the test in the order as presented? Was your database busy with other things at the time of the test, or was it dedicated to your test? The same question can be asked of the server and the disk system. There can be a lot of variables that affect results like this, especially for such a small, short test.
Updating 1M records through bulk update take long time
RK, May 11, 2007 - 9:56 am UTC
I have an update statement provided below which is updating 1 million records
in lead_master table joining with a customer_table having around 5 million records.
This is taking more than 2 hrs and I could not figure out the reason. There are no other sessions running in parallel.
UPDATE lead_master
SET customer_gender_cd = (
SELECT
DECODE (customer_gender_cd,
NULL, NULL,
'0' ||customer_gender_cd) FROM customer_table
WHERE customer_table.customer_id = lead_master.customer_id
)
WHERE lead_master.customer_id IS NOT NULL;
May 11, 2007 - 11:44 am UTC
because you may well be executing that subselect 1,000,000 times
you don't say if this is indexed
what release
what version
what plan
what anything really.
update ( select a.customer_gender_cd a_cd,
b.customer_gender_cd b_cd
from lead_master a, customer_table b
where a.customer_id = b.customer_id(+) )
set a_cd = nvl2( b_cd, '0'||b_cd, null );
outer join if and only if necessary of course.
RK, May 14, 2007 - 10:08 am UTC
Oracle Release is Oracle9i Release 9.2.0.4.0 - 64bit Production
The suggested modified SQL was very useful.
It got executed in 6 minutes.
Thanks a lot.
sammy, May 21, 2007 - 8:01 am UTC
Hi Tom,
I am trying to duplicate your May 11th followup.
I am getting the following error...ORA-01779: cannot modify a column which maps to a non key-preserved table.
UPDATE (select a.oldCol a_oldCol,
b.newCol b_newCol
from table1 partition(part1) a,
table2 b
where a.id = b.id)
SET a_oldCol = b_newCol
b.id is indexed .
a.id is indexed (local).
I also want to use the nvl2 function you used... can you please explain
nvl2( b_cd, '0'||b_cd, null )
as opposed to...
nvl2( b_cd, b_cd, null ) ??
Thanks in advance for you advice.
I am using Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
May 21, 2007 - 11:10 am UTC
give full example - you need a CONSTRAINT (primary key or unique) on table2(id)
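A minimal illustration of that point, with made-up names. The first update is rejected with ORA-01779 because, without a unique key on table2(id), a table1 row could join to many table2 rows; once the constraint exists, the same statement is accepted:
create table table1 ( id number, oldcol varchar2(10) );
create table table2 ( id number, newcol varchar2(10) );

update ( select a.oldcol a_oldcol, b.newcol b_newcol
           from table1 a, table2 b
          where a.id = b.id )
   set a_oldcol = b_newcol;                -- ORA-01779

alter table table2 add constraint table2_id_uk unique (id);

update ( select a.oldcol a_oldcol, b.newcol b_newcol
           from table1 a, table2 b
          where a.id = b.id )
   set a_oldcol = b_newcol;                -- allowed: each table1 row now joins to at most one table2 row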
Direct-path INSERT and TEMP Tablespace
A reader, June 06, 2007 - 12:23 pm UTC
Hi Tom,
I am using direct-path INSERT to copy the contents of source_table to destination_table.
When I enable PDML and the INSERT runs in parallel, I observe the use of sort_usage, temp tablespace filling up and, at the same time, nothing being written to the tablespace containing the destination table. Eventually, temp tablespace runs out of space.
However, when I don't enable PDML and the INSERT runs in serial, Oracle starts writing to the temp segments in the destination table's tablespace as soon as the source result set is available. In this scenario, it never uses any sort segments in temp tablespace.
Why is temp tablespace being used during PDML?
As you can see from the query, there are no sorts or joins -- it's just a straight select *. There are no indexes on the destination_table. I thought PDML would work similarly to serial DML (as far as space allocation goes). Shouldn't each parallel execution server just pick a partition in destination_table and start allocating extents in the destination_table's tablespace?
Here is what I run:
alter table destination_table nologging;
insert /*+ append */ into destination_table
select /*+ parallel(s 1) full(s) */ * from source_table@link s;
commit;
The same behavior can be observed when both tables are in the same database (and the select doesn't use the link).
Environment:
Oracle 9.2.0.6
source_table and destination_table are partitioned
source_table is ~400GB in size
temp tablespace is 40GB
there are no indexes on destination_table
Thank you in advance for your help.
June 06, 2007 - 9:22 pm UTC
full disclosure - show the plans, show the queries you use to monitor this, show everything we need to see what you see.
and you do know that the dblink will serialize right, so the parallel is extremely questionable - you have a parallel query getting serialized over the dblink to return the data.
Deleting 1-2 million of records from 300 millions records
Franklin, June 20, 2007 - 8:27 pm UTC
Hi Tom,
In one of my projects I have a table which contains more than 400 million records. Every day one of my procedures executes which deletes 1 million records and inserts 1 million new records; sometimes it may not delete any records at all, but it still inserts a fresh 1 million records. I have tried a number of ways but performance is very bad - can you guide me?
My table : orders having columns
ord_id,
inv_id,
inv_date
col1,
col2,
...
my delete will be based on ord_id, inv_id, inv_date
delete from orders a
wHERE NOT EXISTS (SELECT 1 FROM ordertemp b
WHERE a.ord_id = b.ord_id
and a.inv_id = b.inv_id
and a.inv_date between b.sdate and b.edate)
My orders table generates all its records based on ordertemp, which contains sdate and edate. Now suppose this period has changed for some order; then I have to delete that data and insert data for the new period.
June 21, 2007 - 10:23 am UTC
what indexing strategies have YOU tried.
what is "performance is very bad" - what are you observing and what are your expectations.
have you considered MERGE instead of delete+insert (eg: update existing, insert new)
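A sketch of what that MERGE might look like - staged_orders is a hypothetical staging table holding the fresh rows, and the column list is illustrative, so adjust it to the real keys:
merge into orders o
using staged_orders s
   on (o.ord_id = s.ord_id and o.inv_id = s.inv_id and o.inv_date = s.inv_date)
 when matched then
      update set o.col1 = s.col1, o.col2 = s.col2
 when not matched then
      insert (ord_id, inv_id, inv_date, col1, col2)
      values (s.ord_id, s.inv_id, s.inv_date, s.col1, s.col2);
That way each incoming row is either an update of the existing order or a fresh insert, in one pass, instead of a large delete followed by a large insert.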
"insert into" using a cursor
ht, August 17, 2007 - 5:44 pm UTC
Hi Tom,
I took your advice and, instead of an update that ran for days, am using "create table as select * from".
I'm also using pl/sql with a cursor to insert additional rows into the new table and am wondering if there is a more efficient way to perform the insert. The pl/sql is trying to bring over the bare minimum of recs by first finding a "down" rec, getting only the next "up" rec and inserting it:
>declare
2
3 cursor c1 is
4 select
5 id,op,db
6 from
7 tbl1
8 where
9 status='down'
10 order by
11 id,
12 op,
13 db;
14
15 begin
16
17
18
19 for x in c1 loop
20
21 insert into tbl2
22 (
23 select
24 *
25 from
26 tbl1
27 where
28 id=
29 (
30 select min(id)
31 from
32 tbl1
33 where
34 id>x.id
35 and op=x.op
36 and status='up'
37 and db=x.db
38 )
39 );
40
41
42
43 end loop;
44
45 end;
46
47 /
August 22, 2007 - 9:48 am UTC
would need to know more about the data.
if you are trying to get the next "up" record after each "down" record - next being assigned by ordering by ID for each op/db - there are better ways, sure
insert into tbl2
select *
from (
select tbl1.*,
lag(status) over (partition by op, db order by id) last_status
from tbl1
where status in ( 'up', 'down' )
)
where status = 'up' and last_status = 'down'
would get the "next up record after each down record by OP and DB" for example
Works great - should an index be created?
ht, August 22, 2007 - 3:16 pm UTC
Tom,
As always, you helped tremendously. My process is taking 3 minutes instead of 2 hours. Given I'm new to "lag" and "partition" statements, would adding an index speed up the process? If yes, which columns would you index on? I know full table scans are not a bad thing - just wondering if I can tune the query further with indexes since I'll be running this 200+ times.
August 23, 2007 - 10:44 am UTC
unless the 'up' and 'down' records represent a tiny portion of ALL records - no index, we want a nice juicy full scan.
do they?
why are you running this 200 times?
once for each set of recs
ht, August 24, 2007 - 4:33 pm UTC
Tom,
Thanks for the advice. The fts is appropriate in this case and I have to perform the operation 200+ times since I was testing only a subset of the data. In reality, I just removed a predicate to operate on all the recs and it works great!
How to Load (6M+ Records x 300 Values ) between Databases
Saravanan Ramalingam, August 30, 2007 - 12:04 pm UTC
Hi Tom,
I have a requirement where I need to load around 6M+ records on a daily basis from a source database, and every record will hold a minimum of 300 different values. (The source DB will supply only 6 values; the others are permutations and combinations of them which will be calculated on the fly.) This will go up to 20 million records per day in the future.
The constraints are:
1. Currently I receive 300 different values to be stored (per record), which is likely to increase to 1000 and at most up to 25000 different values. These values are for a single record, and the total number of records which I receive is around 6M per day.
2. As per the SLA the load window is one hour
3. I need to retain 10 years of data in the active database: (10 Yrs * 365 Days * 6M Records) rows to be stored and (10 Yrs * 365 Days * 6M Records * 300 Different Values) values to be held
Also I would like to know how to take care of this from the performance, storage and hardware standpoint for Oracle.
I have tried all the theoretical solutions available on the net; they speak only about partitioning and compression. But I have not found a solution for handling 300 columns (which can be extended up to 10000 in the future).
Thanks in Advance,
Regards,
Saravanan Ramalingam
September 04, 2007 - 5:38 pm UTC
by 300 different values, do you mean "columns", "attributes"?
loading 20 million records will take minutes.
you will want to utilize partitioning to permit you to age the data out over time.
I see you meant 300 columns, which can theoretically go to 1000 but will never go to 25,000. The maximum number of attributes in a table is 1,000.
but - why is 300 columns "a sticking point", what are you looking for to address that? so what if you have 300 columns - why is that relevant to loading data?
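On the "utilize partitioning to age the data out" point above, a minimal sketch with made-up names - range partition the fact table on the load date so that old data is removed with a partition drop rather than a huge DELETE:
create table daily_facts (
  load_date  date    not null,
  record_id  number  not null,
  val_001    number,
  val_002    number
  -- ... the remaining attribute columns would follow
)
partition by range (load_date)
( partition p20070901 values less than (to_date('2007-09-02','YYYY-MM-DD')),
  partition p20070902 values less than (to_date('2007-09-03','YYYY-MM-DD'))
);

-- each day: add a partition for the new date, direct path load it, drop the oldest, e.g.
-- insert /*+ append */ into daily_facts select ... ;
-- alter table daily_facts drop partition p20070901;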
is this practicable ?
Karin, September 01, 2007 - 7:28 am UTC
Hallo Tom,
do you think this is still the better way in our case? The table contains 40 million sentences, has got a lot of foreign keys, and we need to update a numeric field from n to n+1 for half of the sentences. And we have got a partitioned table and a partitioned index on that table.
Thanks for your respond, I already got a lot out of your answers.
Karin
September 05, 2007 - 12:42 pm UTC
I don't know what a sentence is in this context?
sorry, language problem ;-))
A reader, September 12, 2007 - 9:56 am UTC
sentence = row
September 15, 2007 - 4:41 pm UTC
go ahead and update it?
you wrote:
...
the table contains 40 million sentences,
has got a lot of foreign keys and we need to update a numeric field from n to n+1 for half of the
sentences. And we have got a partitioned table and a partitioned index on that table.
....
40 million rows - understood.
foreign keys - so what? You don't relate that to this column at all, so not sure what relevance it has.
partitioned - again, not sure of the relevance
indexes - you don't say that THIS column is indexed, so not seeing the relevance.
A delete scenario
A reader, September 19, 2007 - 2:11 pm UTC
Hi Tom,
I have a table containing 20 million rows. The table has a composite index (3 columns). The application continually inserts into the table and also deletes from the table every hour through independent delete statements (1 per row). The delete operation is pretty slow and we see a lot of waits on "db file sequential read" during the delete operation. The delete is based on the indexed column values.
Please suggest a solution which will help in having better performance.
September 20, 2007 - 9:30 pm UTC
define pretty slow, got a tkprof?
the delete must update every index, not just the one it reads; the single block IOs are likely due to that.
slow by slow processing should be avoided, it is always the slowest way to process data. think "sets"
ENABLE PARALLEL DML" and TEMP tablespace
Florin, September 20, 2007 - 12:05 pm UTC
Hi Tom,
Would you please explain to me why the "ALTER SESSION ENABLE PARALLEL DML" is consuming TEMP tablespace?
If I don't use "alter session enable parallel dml", Oracle will use the parallelism from the hint (4) anyway.
Example:
--ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL ( BL1_CUSTOMER_INFO_TEST 4) APPEND */
INTO BL1_CUSTOMER_INFO_TEST
SELECT /*+ PARALLEL ( BL1_CUSTOMER_INFO 4) FULL (BL1_CUSTOMER_INFO ) */
*
FROM BL1_CUSTOMER_INFO;
In this case the table BL1_CUSTOMER_INFO has more than 200 million rows and over 50 GB.
If I leave the "alter session enable parallel dml" in, it fails on the TEMP tablespace, while if I take the "alter session .." out, it writes directly to the tablespace of BL1_CUSTOMER_INFO_TEST.
Many thanks in advance!
September 24, 2007 - 7:32 am UTC
are you sure it is failing on TEMP or is it failing to acquire a temporary extent.
big difference. Let us see the precise error message please.
and what indexes have you on this table.
ENABLE PARALLEL DML" and TEMP tablespace
Florin, September 26, 2007 - 9:25 am UTC
Hi Tom,
I did another test (this time in our test database not like previous one in Prod).
I have a table (TEST1) with 25 million rows (1.7 GB).
AIM_DBA>create table test2 nologging tablespace pool_ix as select * from test1 where 1=2;
Table created.
AIM_DBA>alter session enable parallel dml;
Session altered.
AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;
ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
If I'm doing the same thing without "alter session enable parallel dml" :
AIM_DBA>insert /*+ append parallel (test2 2) */ into test2 select /*+ parallel (test1 2) */ * from test1;
25192932 rows created.
Elapsed: 00:03:31.74
Please note:
No indexes in both tables.
I'm using Oracle 9.2.0.5.0 - 64 bit
I'm very curious to hear your opinion.
Thanks!
Florin
September 26, 2007 - 9:39 pm UTC
i've bookmarked and will look at this when I get back (and have a server whereby I can do parallel on)
Update Query
Pete, October 05, 2007 - 9:09 pm UTC
Hi Tom,
We have a query written like mentioned below
update table a set column B = some value where id = :id;
commit;
here column ID is a primary key
This query is written inside a package and is called each time a user logs into the system. At one point the system ran slowly when about 100 people tried to access it, and the DBA on the client side says this is the wrong way to write the query. It should be written like this (see below):
---- below is DBA reply --
"Like about a 100 people,so this table gets updated frequently so there is a need to avoid locking the index when caching.Your related rowid and using that to bypass any index locking ,access is so important"
he suggest the following way :-
Set transaction read only;
Select rowid into dbky from table a where id = :id;
(best done in a package w. package variables)
Commit;
Return Success;
Then when you need to update
Update table a SET column b = some value where rowid = :dbky; <- no locking index
Commit;
-===========================
My question is: which is the better way, rowid or using the primary key as written earlier, and why?
Thanks and appreciate your feedback
Upsert
Antony, October 10, 2007 - 10:16 am UTC
I have some issues with the merge. Currently in my database there are different schemas where the data is manipulated by different applications. There is one schema, called USAGE, which is used to maintain the login details etc. from the different applications. Currently there is a package which fetches the values from different tables in different schemas. The procedure basically first deletes from the tables in USAGE the rows corresponding to the rows deleted and updated in the source table, then it inserts the data. The biggest problem is that the insert and delete statements generate a huge amount of redo, due to the 1 to 32 million rows of data in the source table. I reviewed many comments on your site and found there are 2 solutions: either use a MERGE statement, or drop the table, create a table with NOLOGGING and then insert the data, which would not generate as much redo. Could you please suggest which way I should proceed?
When is it done?
Karthick, October 26, 2007 - 3:20 am UTC
After reading this thread I figured out one thing. BIG TABLE makes people crazy.
But my question is: when do you decide you are done with it, for now?
Say you have an update which took 1 hour. OK, you are clever and you made it run in 15 min. But is 15 min the optimal time for now? How would you decide that? Because we can't keep on testing and trying - things need to go to production, right!!
October 29, 2007 - 10:33 am UTC
flip the question around.
what are the requirements for it to complete. how fast does it NEED to execute.
If you always move half way to a wall (eg: you start at 10 feet away and get to 5 feet away. Then you get to 5/2 feet away, then 5/2/2 feet and so on) you never actually get to the wall (sure you get very very close...)
you have to know when you can stop.
Can you show how you do that.
karthick pattabiraman, November 05, 2007 - 7:21 am UTC
you said
"it only takes a couple of minutes to update 5 million rows on my desktop pc -- so, there must be
something more involved here.
I'd be able to do that in minutes -- why is it taking you so long -- what is your bottleneck. "
With a source table having 1000000 records and a target table having 5000000 records, where all the records in the target table are updated from the source,
can you show how you do it in a couple of minutes? I think it will definitely take a lot of time.
I just want to see how you create your index, how you perform your update and so on..
and I want to see that being done with an UPDATE statement, not CREATE TABLE... SELECT * FROM..
Because I can't drop a table :-(
November 06, 2007 - 9:04 am UTC
"i think it will definitely take lot of time"
why do you "think" that, why don't you "try that"?
what index would I create? If I am going to update 5 million out of 5 million rows, I am not going to use an index.
update t set col = 'value';
that is how the update would go? Nothing special.
I tried
karthick pattabiraman, November 06, 2007 - 9:19 am UTC
Yes I did try. My source table has 1 million records and target has 2 million records.
So my update was like this simple
update target_table
set target_col_1 = (select source_col_1
from source_table
where source_col_2 = target_col_2)
where exists ( select null
from source_table
where source_col_2 = target_col_2)
On seeing the explain plan I was able to identify that
Target table was going for a full table scan and source_table was going for an index range scan.
So I "think" the update will be executed in this way.
for i in (select * from target_table) --> full table scan
loop
select source_col_1
from source_table
where source_col_2 = i.taget_col_2 --> index range scan
replace i.target_col_i with source_col_1
end loop
I don't have any index on target_table since, as you said, it's going to be a full table scan anyway. I have an index on source_col_2.
So when I did the update it took more than 1.5 hours (I don't know the exact time as I killed the session), so I enabled parallel DML, ran the same update, and it completed in 20 min.
Would you suggest parallel DML for such updates? Or are there other things that can be done before going parallel?
After that I changed the source from 1 million to .1 million rows and ran the update in parallel again. It still took around 18 min.
So is it that, because source_table is accessed with an index range scan, it doesn't really matter how many rows are in source_table?
November 06, 2007 - 9:39 am UTC
update (select data_from_src, data_from_tgt from src, tgt where src.key = tgt.key)
set data_from_tgt = data_from_src;
update a join, don't do all of those correlated subqueries.
no indexes should be involved although src needs to have a unique constraint on the joined to columns.
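Using the names from the review above, that would be roughly the following - the unique constraint (if it is not already there) is what makes the join updatable:
alter table source_table
  add constraint source_table_c2_uk unique (source_col_2);

update ( select t.target_col_1, s.source_col_1
           from target_table t, source_table s
          where t.target_col_2 = s.source_col_2 )
   set target_col_1 = source_col_1;
Only target rows that actually join to a source row are touched, which is what the WHERE EXISTS in the original statement was achieving.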
Thanks a lot
karthick pattabiraman, November 07, 2007 - 2:17 am UTC
What can I say - as the world says, you "ROCK"..
My 20 min query that was running in parallel completed in just 3 min, without parallel, when I changed the update the way you suggested.
Can you explain to us how Oracle internally processes this update?
Once again thank you so much for all your help.
November 07, 2007 - 5:13 pm UTC
it just joins.
Stuart Horsman, November 08, 2007 - 6:24 am UTC
Tom,
If we're doing an application release and updating multiple tables and multiple indexes with millions of rows, is it best to:
1) Put the database into no archivelog mode.
2) Alter all tables and indexes nologging.
3) Run inserts and updates.
4) Alter all table and indexes logging.
5) Put database back into archive log mode
Regards
Stuart
November 08, 2007 - 10:15 am UTC
nologging would have no effect.
nologging works for large direct path operations such as:
o create table as select
o create index
o insert /*+ append */ (when the append hint is actually followed, can be followed) - but only for the TABLE, not for any indexes.
nologging will not affect
o insert, update, delete, merge (without append on insert or merge....)
o index maintenance
and in noarchivelog mode, nologging is not necessary, many direct path operations will not generate log in general (create table as select, insert /*+ APPEND */).
If you are not updating an indexed column, just update - try to make sure you ONLY update a table once (eg: do not update col1, then col2 then col3 - just make a single pass).
If you are updating an indexed column - consider making the index unusable and rebuild (with or without logging, up to you).
If you are updating such that you think there might be lots of migrated rows (rows grow in size), you might
o create table as select <modified data>
o drop old table
o rename new table
o re-index new table
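For the "make the index unusable and rebuild" option above, a rough sketch with made-up names (with the caveat that this does not work for UNIQUE indexes, since uniqueness still has to be enforced during the DML):
alter index t_col_idx unusable;
alter session set skip_unusable_indexes = true;    -- let the update proceed past the unusable index

update t set col = upper(col);                      -- the big update, done in a single pass
commit;

alter index t_col_idx rebuild nologging parallel 4;
alter index t_col_idx logging noparallel;           -- reset the attributes afterwards if desired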
compound key
Karthick., November 21, 2007 - 5:24 am UTC
When the column in the source is not unique, how do I do the same update?
Say source COL1+COL2 is unique. I join both columns to the target table and update the target with COL2. But COL2 alone is not unique. Because I have used both COL1 and COL2 in the join condition, it is guaranteed to return only one value, but I still can't do a join update. In such a case, what kind of update is better?
is it..
update t set t_c1 = (select s_c1 from s ...)
where exists(select null from s...)
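One hedged reading of that (the exact columns aren't clear from the description): as long as the source carries a unique or primary key constraint across BOTH join columns, the join is still key-preserved and the update-the-join form still applies; the subquery/exists form is only needed when no such constraint can be declared. A sketch with made-up names - s(col1, col2, new_val) as the source, t(col1, col2, old_val) as the target:
alter table s add constraint s_c1_c2_uk unique (col1, col2);

update ( select t.old_val, s.new_val
           from t, s
          where t.col1 = s.col1
            and t.col2 = s.col2 )
   set old_val = new_val;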
Why 'created' instead of 'Inserted'
chandra, December 11, 2007 - 3:25 pm UTC
Tom,
Please see the below:
====
SQL> insert into tmp_is_oracle_supplier_match
2 (
3 otis_vendor_num,
4 otis_vendor_name,
5 otis_tax_id,
6 status_code,
7 comments
8 )
9 select attribute2,
10 vendor_name,
11 num_1099,
12 0,
13 'Valid Record'
14 from
15 ap_suppliers_int
16 where status is null
17 and REJECT_CODE is null
18 and 1=2;
0 rows created.
===
When I try to insert with the above command, why does it say "created" instead of "inserted"??
December 11, 2007 - 9:41 pm UTC
because sqlplus likes to say "created" when you use insert.
Locking conflict
Feng, December 12, 2007 - 8:28 pm UTC
Hi Tom,
Here is my case, which is on 9.2.0.8. On a daily basis, we need to batch update a few million rows in a 30-million row table (t1) using something like:
update
( select t1.a, t2.b
from t1,t2
where t1.fk=t2.pk
)
set a=b;
t2 is a table with tens of millions of rows as well. Usually this takes a few hours. We also have a small number of real-time updates against single records in t1 from web users. If a real-time user update hits the same record being updated by the batch update process, the real-time update will wait and eventually time out if the batch update has still not finished. In order to avoid/minimize such locking conflicts, we are thinking of reducing the size of the batch update using PL/SQL. Is there another way to solve this problem?
Thanks in advance.
December 13, 2007 - 9:40 am UTC
don't do the update.
Why not join to retrieve the correct value?
Locking Conflict
feng, December 13, 2007 - 12:52 pm UTC
Tom,
Due to our business model, t2 is a temp table which contains periodic updates from our vendor that are used to update our permanent table t1, thus we cannot join to retrieve the value as you suggested.
Is there any alternative in this case besides reducing the size of the update statement, either in SQL or PL/SQL?
Thanks.
December 13, 2007 - 2:24 pm UTC
well, before I even begin to describe how to do this I want you to say yes or no in no uncertain terms to the following question:
What happens if another process reads a record that was updated and committed, and another record that was not yet updated (or was updated but has not yet been committed) during your "batch update"? Will the fact that this other process will be seeing data that heretofore was IMPOSSIBLE to see (it would never see two records that are inconsistent with respect to each other) be OK? The fact that data integrity is going to be compromised here - is that something your applications can actually live with, deal with, not have a problem with?
Because that is what you are suggesting to do - to have PART of your update be seen by every one before the entire update itself is complete.
also, we need a flag field or the ability to delete the record in t2 as we process it - so we don't process the same record more than once in the event (which will happen) of a failure. Do we have such a field that we can update in t2 as we process it - or can we delete the record from t2?
Locking Conflict
feng, December 13, 2007 - 4:58 pm UTC
Tom,
Thanks for your prompt follow up.
1) Data integrity is not an issue here, as the records are independent of each other, so they don't have to be updated at the same time.
2) We don't delete records in t2, but we can add a flag field to it if it's needed.
Please let me know if you need additional information. Thanks.
December 14, 2007 - 12:51 pm UTC
ops$tkyte%ORA10GR2> create table t1 as select * from all_users;
Table created.
ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(user_id);
Table altered.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2 as select user_id, lower(username) username, 'N' processed from all_users where mod(user_id,2) = 0;
Table created.
ops$tkyte%ORA10GR2> create index t2_idx on t2(processed);
Index created.
ops$tkyte%ORA10GR2> select * from t1;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
BIG_TABLE 58 14-DEC-05
DIP 19 30-JUN-05
TSMSYS 21 30-JUN-05
LOTTOUSER 65 30-DEC-05
MDDATA 50 30-JUN-05
OPS$ORA10GR2 56 14-DEC-05
FOO$TKYTE 60 19-DEC-05
QUOTA 94 22-FEB-06
AQ 228 15-OCT-07
R 76 09-JAN-06
OPS$TKYTE 233 16-NOV-07
SCOTT 84 12-FEB-06
B 213 08-JAN-07
RT_TEST 231 29-OCT-07
A 221 04-SEP-07
TEST 237 16-NOV-07
DMSYS 35 30-JUN-05
DBSNMP 24 30-JUN-05
WMSYS 25 30-JUN-05
EXFSYS 34 30-JUN-05
CTXSYS 36 30-JUN-05
XDB 38 30-JUN-05
ANONYMOUS 39 30-JUN-05
OLAPSYS 47 30-JUN-05
ORDSYS 43 30-JUN-05
ORDPLUGINS 44 30-JUN-05
SI_INFORMTN_SCHEMA 45 30-JUN-05
MDSYS 46 30-JUN-05
SYSMAN 51 30-JUN-05
PERFSTAT 148 31-MAR-06
SYS 0 30-JUN-05
SYSTEM 5 30-JUN-05
OUTLN 11 30-JUN-05
MGMT_VIEW 53 30-JUN-05
MY_USER 211 09-NOV-06
35 rows selected.
ops$tkyte%ORA10GR2> select * from t2;
USER_ID USERNAME P
---------- ------------------------------ -
0 sys N
24 dbsnmp N
34 exfsys N
36 ctxsys N
38 xdb N
44 ordplugins N
46 mdsys N
50 mddata N
56 ops$ora10gr2 N
58 big_table N
60 foo$tkyte N
94 quota N
228 aq N
76 r N
84 scott N
148 perfstat N
16 rows selected.
ops$tkyte%ORA10GR2> declare
2 type numArray is table of t2.user_id%type;
3 type vcArray is table of t2.username%type;
4 l_userids numArray;
5 l_usernames vcArray;
6 begin
7 loop
8 update t2
9 set processed = NULL
10 where processed = 'N'
11 and rownum <= 10
12 returning user_id, username
13 bulk collect into l_userids, l_usernames;
14
15 exit when l_userids.count = 0;
16
17 forall i in 1..l_userids.count
18 update t1 set username = l_usernames(i) where user_id = l_userids(i);
19 commit;
20 exit when l_userids.count < 10;
21 end loop;
22 end;
23 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> select * from t1;
USERNAME USER_ID CREATED
------------------------------ ---------- ---------
big_table 58 14-DEC-05
DIP 19 30-JUN-05
TSMSYS 21 30-JUN-05
LOTTOUSER 65 30-DEC-05
mddata 50 30-JUN-05
ops$ora10gr2 56 14-DEC-05
foo$tkyte 60 19-DEC-05
quota 94 22-FEB-06
aq 228 15-OCT-07
r 76 09-JAN-06
OPS$TKYTE 233 16-NOV-07
scott 84 12-FEB-06
B 213 08-JAN-07
RT_TEST 231 29-OCT-07
A 221 04-SEP-07
TEST 237 16-NOV-07
DMSYS 35 30-JUN-05
dbsnmp 24 30-JUN-05
WMSYS 25 30-JUN-05
exfsys 34 30-JUN-05
ctxsys 36 30-JUN-05
xdb 38 30-JUN-05
ANONYMOUS 39 30-JUN-05
OLAPSYS 47 30-JUN-05
ORDSYS 43 30-JUN-05
ordplugins 44 30-JUN-05
SI_INFORMTN_SCHEMA 45 30-JUN-05
mdsys 46 30-JUN-05
SYSMAN 51 30-JUN-05
perfstat 148 31-MAR-06
sys 0 30-JUN-05
SYSTEM 5 30-JUN-05
OUTLN 11 30-JUN-05
MGMT_VIEW 53 30-JUN-05
MY_USER 211 09-NOV-06
35 rows selected.
ops$tkyte%ORA10GR2> select * from t2;
USER_ID USERNAME P
---------- ------------------------------ -
0 sys
24 dbsnmp
34 exfsys
36 ctxsys
38 xdb
44 ordplugins
46 mdsys
50 mddata
56 ops$ora10gr2
58 big_table
60 foo$tkyte
94 quota
228 aq
76 r
84 scott
148 perfstat
16 rows selected.
expect this batched update to take longer than a single bulk update would.
10 is the batch size I used; you would use 100 to 500.
it'll generate more redo, more undo, consume more cpu and take longer - but it will do it in batches.
and it will not suffer from ORA-01555s, as it does not keep a cursor open across commits.
and it is restartable when it fails - just rerun it and it will do its job.
You will want an index on that column, even with two values.
why did I use NULL instead of "Y"? Because entirely null key entries are not made in b*tree indexes - hence this index will stay small; it only indexes the "N" records.
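As an aside (not part of the original demo), one way to confirm that only the "N" rows have index entries is:

-- the index alone can answer how many rows are still unprocessed
select count(*) from t2 where processed = 'N';

-- or inspect the index structure; lf_rows is roughly the number of 'N' entries
analyze index t2_idx validate structure;
select lf_rows from index_stats;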
Thanks
Feng, December 14, 2007 - 2:48 pm UTC
Tom,
Thanks for the solution. We will try this out.
Performance question
Feng, January 15, 2008 - 4:26 pm UTC
Hi Tom,
During testing of your bulk select/update solution, I found out that it has almost the same execution time as the following PL/SQL.
begin
for x in ( select * from t2 )
loop
update t1 set a=x.b where t1.fk=x.pk;
commit;
end loop;
end;
/
Since this method only locks/commits one row at a time, would this be a better solution than the bulk select and update of 500 rows at a time, given that both have a similar response time?
What do you think?
Feng
January 15, 2008 - 4:57 pm UTC
*no*
you have a transaction - a logical unit of work.
You commit IF and ONLY IF and when your transaction is complete.
what if this updates N rows and then fails? what then? where is your code to restart in the middle?
the goal - especially in Oracle - is not to reduce the number of locks at the expense of PROGRAM LOGIC AND CORRECTNESS (we are not sqlserver!).
Performance question
Feng, January 15, 2008 - 6:49 pm UTC
In our application, if we fail after updating N rows, we can start over by updating everything again. This is not as efficient as restarting from where the last run left off, but it will reduce the chance of lock conflict further, since the chance of failure is pretty low.
January 16, 2008 - 3:13 pm UTC
do you understand the concepts of transactional consistency?
if you are worried about "locks", that means other people must be updating this information.
So, you come and update a row, commit, they update the same information, commit, you fail and you come back and just blind as a bat overwrite their updates again???
You understand that the outside world will see a BIT of your update before it is totally complete?
Do you understand the natural ramifications of what you propose and are you sure your applications are a) expecting that b) can deal with that c) won't totally corrupt your database (logically speaking)
If you actually have what you call "lock conflict", I think the answers to the above are "we really need to commit when we are DONE, because apparently other people are trying to modify this data and we'll introduce lost update problems on a scale the size of the grand canyon"
You are right!
Feng, January 16, 2008 - 5:01 pm UTC
delete thousands records
Dawar, January 17, 2008 - 2:49 pm UTC
I have an Excel spreadsheet based on one of our internal tables.
It contains 7000 records.
I need to delete all records from the Oracle database.
What is the quickest way to delete all records?
Thanks
D
January 19, 2008 - 10:24 pm UTC
7,000 records is
a) big for excel
b) trivial, tiny, almost non-existent for Oracle.
I would just use DELETE FROM T;
Truncate would not be much faster, if it were faster at all.
7,000 is such a small number.
BULK UPDATES
SRINIVAS CHALLA, January 17, 2008 - 7:57 pm UTC
For bulk updates/deletes, please also consider the MERGE statement. With MERGE, one can avoid creating a temp table and the performance issues of a PL/SQL bulk update.
MERGE INTO ok a
USING (
SELECT ID
FROM ok1
) b
ON (a.id = b.id)
WHEN MATCHED THEN
update set a.flag = 'Y'
DELETE WHERE a.FLAG='Y';
SQL> desc ok;
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER
FLAG VARCHAR2(1)
SQL> desc ok1;
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER
FLAG VARCHAR2(1)
January 19, 2008 - 10:31 pm UTC
why would you even consider merge for that?
delete from ok where id in (select id from ok1);
that is all you need, why would you even consider complicating things so horribly in that case with merge?
Temp Tablespace Error
Nikhilesh, April 06, 2008 - 11:54 pm UTC
Dear Tom,
When we create a table using CTAS (create table as select) (the query returns 20 thousand rows), it works fine the first time, but then it eats up all the temp tablespace and gives an error saying that it could not allocate an extent. We are using Oracle 10.1.0.3; we have a true temporary tablespace of 9.6 GB. The extent size is 1028.
I'm the only person connected to Oracle and, yes, after it ran successfully I killed 1 session which was running the same query using dynamic SQL.
Am I doing something wrong, or is it a bug in 10.1.0.3?
I don't have metalink access.
Thanks in advance,
April 07, 2008 - 9:29 am UTC
no, it is not a bug; it is, however, insufficiently described for me to say what it is.
It seems like you are saying:
we do this:
a) create table t as select ......
b) drop table t;
c) create table t as select .... <<<= and this one fails.
are you sure it is blowing up on TEMP - or are you doing something like:
a) create table T1 as select ..... <<== works
b) create table T2 as select ... <<== fails on temp
If that is the case, it could be that what is "blowing up" is not temp at all - but the tablespace where t2 is going into - when we do a CTAS - we load the new data into temporary extents in the target tablespace AND THEN at the end, convert them into permanent extents.
Many times, people say "ran out of temp space", when they really ran out of space in the target tablespace (so, check the error - maybe even consider posting a CUT AND PASTE of it)....
Or, if it is not T2's tablespace that is blowing up, but there are two different tables - I'd say "table T1 needed little to no temp and T2's query needs more temp than you have allocated"
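Not from the original answer, but a quick way to see which tablespace is actually running out (assuming you can query the DBA and V$ views) is:

-- free space in permanent tablespaces (the CTAS target segment is built here)
select tablespace_name, round(sum(bytes)/1024/1024) mb_free
  from dba_free_space
 group by tablespace_name;

-- free space in true temporary tablespaces
select tablespace_name, round(sum(bytes_free)/1024/1024) mb_free
  from v$temp_space_header
 group by tablespace_name;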
Delete the rows that contain BLOB object as one of the column
Ramakrishna, May 13, 2008 - 11:01 am UTC
I am trying to delete more than 100,000 records. It is faster in tables that do not contain a BLOB column. I tried it through simple SQL*Plus,
like 'delete from table_name where indexed_column < 10000'. It hangs there for a long time - even after two days the query is not done. How can I efficiently delete this, and what query can I use to delete it quickly?
May 13, 2008 - 11:18 am UTC
two days? really? Not that I don't believe you - but I don't believe you - you must have been blocked or otherwise locked out.
although the statement "It is faster in the tables