rowid range splitting
Pravesh Karthik from India, December 11, 2004 - 7:14 pm UTC
Tom,
Nice to hear of rowid range splitting, but how to go about it? .. how to know that these are the rowid. i have for the range of date -- write a query or a block, fetch the rowid's for the unique record, go after those rowid's?
Thanks for your consideration,
Pravesh Karthik from India
December 11, 2004 - 7:40 pm UTC
Thanks as always.
NOTNA, December 12, 2004 - 3:08 am UTC
I both have your book and I'll just try to implement what you've suggested. by the way, when you said :
<quote>
Your current approach is maybe even using an index -- really bad (you end up
reading every row via the index -- Many times the work you could perform just by
scanning by rowid ranges).
</quote>
Even if oracle is reading 150t of rows out of 2M, using and index is also not a good approach???
Cheers,
NOTNA
December 12, 2004 - 9:58 am UTC
you are reading ultimately all 2million rows in aggregate! that was my point, you might only be reading 1/18th of the rows via an index but you have 17 other sessions doing the same.
Do it yourself parallelism.
Sam, December 12, 2004 - 8:03 am UTC
Hi Tom,
Do you recommend DIY, parallelism (Do it yourself parallelism) in the first place?
Shouldn't that be left to CBO to come out with optimum plan?
December 12, 2004 - 10:03 am UTC
You have a batch process.
It operates slow by slow (opps -- meant row by row)
so, it is doing something like:
for x in ( select * from ..... )
loop
process record;
end loop;
doesn't matter how "parallel" that select * from is done -- you are serial serial serial -- row by row.
Now, how could the CBO make your serial process go in parallel? It cannot -- hence, enter DIY parallelism, for when you need to parallelize YOUR processing.
The database -- it can parallelize a query.
The database -- it cannot parallelize your serial processing.
Only you can take advantage of all of the CPU's in this case.
foggy day for me ...
Gabe, December 12, 2004 - 10:44 am UTC
I read the original question over and over and still cannot picture the context ... is that table the source, the target or both the source and target for their batch process?
DYI parallelism is about splitting the source ... the questions says <quote>This table is updated during batch to update account balances</quote> ...
For Gabe...
NOTNA, December 12, 2004 - 11:47 am UTC
Actually it's both, source because this is your accounts table and it is also the target because the balances will be updated in that same table.
Tom,
How about partitioning the int_key to hash, would it help???
Cheers,
NOTNA
December 12, 2004 - 12:07 pm UTC
what exactly is wrong with "rowid" ranges (which would accomplish much the same of hash partitioning into some power of two number of partitions???)
Sure, you could partition into 2, 4, 8, 16, 32, ... hash partitions and start a job per partition -- but that would be virtually identical to breaking up the existing already in place table by rowid ranges.
Well, not identical, because using the hash technique you would have to use dynamic sql in order to process "from table partition(pname_N)" -- the partition name would have to be put into there. It would actually be harder with hash partitions...
DIY vs partitioning
Reader, February 15, 2005 - 11:17 pm UTC
we have a transaction table (1 million rows ) .we break this table into 20 parts then run 20 batch jobs to insert data .
this works but the load on the system is very high during this time and code is not very clean .
Will it be better if we use partitioned tables and parallelism instead of DIY approach ?
will using a plsql bulk procedure help .
Thanks in advance .
February 16, 2005 - 7:50 am UTC
DIY parallelism using rowids (see link above) achieves effectively the same thing as partitioning -- you get 20 non-overlapping ranges of data on disk.
make sure this table has AT LEAST 20 extents -- the more extents the better.
look at your *algorithms*, 99% of your tuning capabilities are there.
DIY vs partitioning
Reader, February 16, 2005 - 8:41 am UTC
If partioning will do almost the same thing then what is the point of using DIY .
Simple coding .
Everything is done in the database --parallism + Partioning .
Pl help me understand .
Thanks
February 16, 2005 - 8:46 am UTC
is your process able to make use of parallel query?
if not, parallelism in the database isn't going to help. given that you run 20 jobs, you are doing "do it your self parallelism", sounds like you have 20 jobs that:
for x in ( select * from table where ....)
loop
do some really long intense processing
end loop
parallel query cannot help that. so you want to divide this into "20 pieces". One approach is to partition (you have to range partition, hash partition wants 2, 4, 8, 16, 32, 64 partitions -- not 20. you have to figure out how to range partition into 20 more or less equal sized partitions) and have each batch job:
for x in ( select * from table partition( MY_PARTITION_TO_PROCESS ) )
loop
or, just let the table go into hundreds of small extents, use the split script above and have your 20 processes do:
for x in ( select * from table where rowid between A and B )
loop
to achieve the same "lack of contention while reading the batch input table"
but your problem is not the reading of this 1,000,000 row table (we can do that in our sleep), your problem is the stuff inside the loop in all probability.
pro *c vs plsql
Reader, February 22, 2005 - 9:16 am UTC
In your book (effective Oracle by design) you have mentioned better results using pro*c (page 535) .Our developers are not very familiar with C but know pl/sql .
Pl help --using bulk,dbms_jobs ,pl/sql --will not be as fast as pro *c -- TRUE ???
If yes ,then we will use C .
Thanks
February 22, 2005 - 9:46 am UTC
read that closer - I had to do something violently computationaly expensive -- hugely computational.
That is why "c"
That was an aberration -- it is the only time I've felt the need to drop down to C in many years.
<quote>
Additionally, you saw a demonstration of do-it-yourself (DIY) parallelism. Just the other day, I used this technique to accomplish a bulk update that required Pro*C processing of the data. <b>We had three-quarter billion records to process. Each record had three fields that needed to be sent through an extremely computation-expensive routine.</b> When this routine was written in PL/SQL (not possible in SQL at all), we were able to do it in about 0.01 second. Contrast this to serial processing, which would take about 86 * 3 days (three calls per row). We did not have 258 days. Additionally, even if we split it up in PL/SQLdid it in parallel 32 using the technique described in the previous section, it would take more than a week, and we didnt have that long.
Using C, we could execute this function in about 0.0002. So, we did a parallel 32 in Pro*C, with Pro*C bulk-fetching the data 100 rows at a time, calling the function 300 times and updating the data 100 rows at a time. It took about five hours to complete in that fashion. Instead of using DBMS_JOB and PL/SQL, we used Pro*C and shell scripts, but the concept was identical to the parallelization example you saw in the previous section. We fed ROWID ranges into the application and had it process its slice of the table.
</quote>
it was a strange case (we were computing the "distance" between strings -- doing a name match to clean some customer data). It is not *the norm*
Hey, nothing will be faster than writing machine code, assembler is just a layer of overhead on that.....
SQL is the fastest (watch out for slow by slow processing -- get SQL developers, not just coders!)
PL/SQL is generally faster than fast enough when that doesn't work (when impossible to use SQL alone, rare but it happens)
A tiny bit of java could be useful if plsql doesn't have a certain capability.
And C comes into play when you have an excessively computational expensive thing to do (although, with 10g and native floats - there is a really good chance that PLSQL could do it now....)
ops$tkyte@ORA9IR2> create or replace function pi return number
2 as
3 subtype my_number is number;
4 last_pi my_number := 0;
5 delta my_number := 0.000001;
6 pi my_number := 1;
7 denom my_number := 3;
8 oper my_number := -1;
9 negone my_number := -1;
10 two my_number := 2;
11 begin
12 loop
13 last_pi := pi;
14 pi := pi + oper * 1/denom;
15 exit when (abs(last_pi-pi) <= delta );
16 denom := denom + two;
17 oper := oper * negone;
18 end loop;
19 return pi * 4;
20 end;
21 /
Function created.
<b>a function to estimate PI -- lots of math in there. in 9i:</b>
ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> exec dbms_output.put_line( pi );
3.14159465358579324446263938327350288021
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.48
ops$tkyte@ORA9IR2> exec dbms_output.put_line( pi );
3.14159465358579324446263938327350288021
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.25
ops$tkyte@ORA9IR2> exec dbms_output.put_line( pi );
3.14159465358579324446263938327350288021
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.31
ops$tkyte@ORA9IR2> set timing off
<b>about 1.3 seconds... now in 10g, changing number to binary_double (native number)</b>
ops$tkyte@ORA10GR1> create or replace function pi return number
2 as
3 subtype my_number is BINARY_DOUBLE;
4 last_pi my_number := 0;
5 delta my_number := 0.000001;
6 pi my_number := 1;
7 denom my_number := 3;
8 oper my_number := -1;
9 negone my_number := -1;
10 two my_number := 2;
11 begin
12 loop
13 last_pi := pi;
14 pi := pi + oper * 1/denom;
15 exit when (abs(last_pi-pi) <= delta );
16 denom := denom + two;
17 oper := oper * negone;
18 end loop;
19 return pi * 4;
20 end;
21 /
Function created.
ops$tkyte@ORA10GR1> pause
ops$tkyte@ORA10GR1> clear screen
ops$tkyte@ORA10GR1> set timing on
ops$tkyte@ORA10GR1> exec dbms_output.put_line( pi );
3.1415946535856922
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.36
ops$tkyte@ORA10GR1> exec dbms_output.put_line( pi );
3.1415946535856922
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.24
ops$tkyte@ORA10GR1> exec dbms_output.put_line( pi );
3.1415946535856922
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.27
ops$tkyte@ORA10GR1> set timing off
ops$tkyte@ORA10GR1>
Works with partitions?
LewisC, April 18, 2005 - 1:51 pm UTC
Hi Tom,
I ran the split query splitting up by rowids against a partitioned table and got invalid rowid (ORA-01410: invalid ROWID, ORA-06512: at "SYS.DBMS_ROWID", line 38). This is in 10g.
It took me a while to figure out where the problem was coming from. I think it comes down to the fact that each partition has it's own object id. Is that correct? I was getting multiple (32) rows where you were getting (in your book) 4 rows.
Can this query be made to work against a partitioned table? Or should I split it by partition? I added subobject_name and selected only a single partition, but the rowids I got (I got the expected 4), return no rows when used in the between.
The process I want to use this with is selecting data from multiple partitions.
Thanks,
Lewis
April 18, 2005 - 2:49 pm UTC
ahh, doh -- of course, they have their own data object id
You want to use DIY parallelism within a partition (a segment). since you are able to target individual partitions via:
from table t partition(x)
i don't think it really makes sense to go over partitions (just use them as the "parallel" slices)
but if you want to go parallel against a single partition, that would be an easy fix in the query.
Partitioning does work
LewisC, April 18, 2005 - 3:47 pm UTC
My turn for Doh. I forgot about the
from table partition(x)
I was leaving off the partition(x) when selecting. It's working great now.
Thanks,
Lewis
About Split partition
Sathish, November 19, 2005 - 10:06 pm UTC
During a split operation with update global indexes clause,
do all indexes get updated simultaneously or does oracle
update index by index in a serial fashion?
November 20, 2005 - 8:21 am UTC
it is like a big bulk update - but everything is "sequential", but done in a bulk operation. This is very much like a really big "update of an indexed column"
Regarding ORA:01410
Sasi Kumar, December 04, 2006 - 3:30 pm UTC
Hi Tom,
We are running select queries daily against Oracle database using SQR's. We are getting the below error,only on saturdays and all other days it runs successfully.The error we received was:
ORA-01410: invalid ROWID
(SQR 3725) Bad return fetching row from database
Can You help me to understand the problem.
December 05, 2006 - 9:39 pm UTC
don't know what a "SQR" is.
maybe it doesn't like to work on saturdays?
anything else "special" happening on saturdays we might want to know about?
SQR
reader, January 14, 2010 - 4:34 am UTC
SQR's are used in Peoplesoft - Structured Query Reporter ...