Skip to Main Content
  • Questions
  • Spawn Jobs from a Procedure that run in parallel

Breadcrumb

more

Connor and Chris don't just spend all day on AskTOM. You can also catch regular content via Connor's blog and Chris's blog. Or if video is more your thing, check out Connor's latest video and Chris's latest video from their Youtube channels. And of course, keep up to date with AskTOM via the official twitter account.

Question and Answer

Tom Kyte

Thanks for the question, Kartik.

Asked: June 17, 2003 - 4:54 pm UTC

Answered by: Tom Kyte - Last updated: August 04, 2011 - 7:21 pm UTC

Category: Database - Version: 8.1.7

Viewed 10K+ times! This question is

Whilst you are here, check out some content from the AskTom team: Partitioning Guide updated

You Asked

I have a table that contains 2 million transaction records. A procedure reads these transaction records and creates an output record in the RESULT table. Usually for each record in the TRANSACTION table there is a record in the RESULT table. Even though all the SQL queries and the code has been optimized, to the best of our ability, the procedure take a very long time to run.

I thought of splitting the transaction table into four tables (T1,T2,T3,T4), each containing 500,000 records. There would be a main procedure (PROC_MAIN)that would spawn four instances of the procedure (P1, P2, P3 and P4) using DBMS_JOB package. This would create four RESULT tables (R1,R2,R3,R4).

1. In the main procedure (PROC_MAIN) how do I check if all the four jobs (P1,P2,P3,P4) have completed?
2. The Main procedure (PROC_MAIN) has to wait until all the four instances are complete because I need to consolidate the RESULT tables (R1,R2,R3,R4) into a single table and generate some reports. All this has to be automated without any manual intervention. Any suggestions?

and we said...

I use a technique like the following:

create or replace procedure simulation( p_procedure in varchar2, p_jobs in number, p_iters in number )
authid current_user
as
l_job number;
l_cnt number;
begin
for i in 1 .. p_jobs
loop
begin
execute immediate 'drop table t' || i;
exception
when others then null;
end;
execute immediate 'create table t' || i || ' ( x int )';
end loop;

for i in 1 .. p_jobs
loop
dbms_job.submit( l_job, p_procedure || '(JOB);' );
insert into job_parameters
( jobid, iterations, table_idx )
values ( l_job, p_iters, i );
end loop;

statspack.snap;
commit;
loop
dbms_lock.sleep(30);
select count(*) into l_cnt from job_parameters;
exit when (l_cnt = 0);
end loop;
statspack.snap;
end;
/


I have a parameter table which I populate with inputs to the "back ground" processes.

After I commit, the jobs start running -- I just "poll" the job parameter table till they are done...


Now, you don't need to split the table into 4 -- you can just use ROWID ranges. If you use this script I call "split"

-------------------------------------------------


set verify off
define TNAME=&1
define CHUNKS=&2


select grp,
dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid,
dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid
from (
select distinct grp,
first_value(relative_fno)
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) lo_fno,
first_value(block_id )
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) lo_block,
last_value(relative_fno)
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) hi_fno,
last_value(block_id+blocks-1)
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) hi_block,
sum(blocks) over (partition by grp) sum_blocks
from (
select relative_fno,
block_id,
blocks,
trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) /
(sum(blocks) over ()/&CHUNKS) ) grp
from dba_extents
where segment_name = upper('&TNAME')
and owner = user order by block_id
)
),
(select data_object_id from user_objects where object_name = upper('&TNAME') )
/

-------------------------------- eof --------------------


and run it

SQL> @split T 4

it'll produce something like:

big_table@ORA920LAP> @split big_table 4

GRP MIN_RID MAX_RID
---------- ------------------ ------------------
0 AAAHchAAJAAAAAJAAA AAAHchAAJAAAA4ICcQ
1 AAAHchAAJAAAA4JAAA AAAHchAAJAAABwICcQ
2 AAAHchAAJAAABwJAAA AAAHchAAJAAACgICcQ
3 AAAHchAAJAAACgJAAA AAAHchAAJAAADgICcQ


those are 4 non-overlapping rowid ranges that complete "cover" the table -- so, you can query:


for x in ( select /*+ FIRST_ROWS */ * from t where rowid between X and Y )
loop

and just pass in a rowid pair for X and Y. that way, you can use the above query to generate 4 rowid ranges, insert them into the job parameter table and run 4 copies of your procedure. they'll each process about 1/4th of the table.

and you rated our response

  (75 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

June 18, 2003 - 2:33 pm UTC

Reviewer: Kartik Dhekne from New Jersey

Thank you very much for your solution.
One quick question. I am assuming that the procedures that are spawned would delete the record in JOB_PARAMETERS table upon completion. Otherwise there will be no exit from the last loop.

Tom Kyte

Followup  

June 19, 2003 - 7:48 am UTC

correctg

Reader

June 22, 2003 - 11:22 am UTC

Reviewer: A reader

"
dbms_job.submit( l_job, p_procedure || '(JOB);' );
"

Could you explain how the PL/SQL parser works when you
concatenate the iterative procedure "p_procedure" with
the actual job "'(JOB);'" and pass it as the second
parameter to dbms_job.submit

Hi

September 03, 2003 - 3:01 pm UTC

Reviewer: A reader

I am sorry I have posted this review on wrong question earlier. I am posting it again in the correct place

In the order by clause in the analytic expressions , am I correct in saying that
for first_value(relative_fno) block_id is not necessary and for
first_value(block_id) relative_fno need not be there.

i.e. the query would look like

set verify off
define TNAME=&1
define CHUNKS=&2

select a.grp,
dbms_rowid.rowid_create( 1, b.data_object_id, a.lo_fno, a.lo_block, 0 )
min_rid,
dbms_rowid.rowid_create( 1, b.data_object_id, a.hi_fno, a.hi_block, 10000
) max_rid
from (
select distinct grp,
first_value(relative_fno) over
(partition by grp order by
relative_fno
rows between unbounded
preceding and unbounded following) lo_fno,
first_value(block_id) over
(partition by grp order by
block_id
rows between unbounded
preceding and unbounded following) lo_block,
last_value(relative_fno) over
(partition by grp order by
relative_fno
rows between unbounded
preceding and unbounded following) hi_fno,
last_value(block_id+blocks-1) over
(partition by grp order by
block_id
rows between unbounded
preceding and unbounded following) hi_block,
sum(blocks) over (partition by grp) sum_blocks
from (
select relative_fno,
block_id,
blocks,
trunc((sum(blocks) over (order by relative_fno,
block_id)-0.01) / (sum(blocks) over ()/&CHUNKS)) grp
from dba_extents
where segment_name = upper('&TNAME')
and owner = user
order by block_id
)
)a,
(
select data_object_id
from user_objects
where object_name = upper('&TNAME')
)b
/

thanks
amit


Tom Kyte

Followup  

September 04, 2003 - 8:35 am UTC

i don't get your point -- care to point out what differences you made and then you can prove it works or not via testing.

I might have extra columns in some of the inline views -- when I built this query -- i did it from the inside out. so I could see what it was doing at each level. I sometimes add a column or two in there in order to "debug" and maybe i left one in there. is that what you mean?

October 27, 2003 - 3:25 pm UTC

Reviewer: Robert from Memphis, USA


Can I Split a Global Temporary Table?

November 04, 2003 - 2:49 pm UTC

Reviewer: Upendra from NJ

Hi Tom,

   This is a great site - very informative. I also read your book - Expert One-On-One. I've already recommended it to my team-mates. You are doing a great job, keep it up!

I have a couple of questions:

1) How do I make the Split work on a Global Temporary Table? I tried the following: - 

10:41:53 SQL> create global temporary table t
10:42:03   2  on commit preserve rows
10:42:08   3  as
10:42:11   4  select * from testocc1;

Table created.

Elapsed: 00:00:02.62
10:42:19 SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
10:42:23 SQL> select count(*) from testocc1;

  COUNT(*)
----------
    675158

Elapsed: 00:00:00.24
10:42:28 SQL> @split_table t 10

no rows selected

Elapsed: 00:00:00.42

2) We are currently building a data warehouse. We get fixed width files (master and transaction ) from several source systems. The data warehouse source schema contains external tables, one per file and views which reside on top of the external tables which contains many row level transformations - for example CASE statements, NVL handling, decimal conversion etc. We used Direct Path Insert to populate fact tables and we use single 'Insert into .. Select from ..' statement approach. What are your thoughts on this approach? Some of the questions that come to my mind are :-
1. What happens if query execution plan changes in the production making the SQL statement itself go slower? What can I do to minimize this risk?
2. We have seen that when you load a fact table using the external_table->view->joined with dimension table approach is slower at times - this is in-consistent. However if I dump the data from the view into a Global Temporary Table, it goes faster - well many times. Would it make sense to split the Global Temporary Table as you've shown in this thread and then submit them as load jobs. What are your thoughts? 

Thanks,

Upendra 

Tom Kyte

Followup  

November 04, 2003 - 5:33 pm UTC

1) you cannot -- it would not make sense.

you cannot because -- it has no real "extents"

it doesn't make sense because the only use of split would be to have many sessions work on different portions but the only session that can see your GTT is -- YOUR OWN!!

2) i love that approach

2.1) what if it changes to go faster? you can freeze the plan, but in general, in absence of a bug, let it do what it does.

2.2) have you looked at using optimizer dynamic sampling. what are the different plans generated?

Load from External Table->View->into Normal Table and Split?

November 05, 2003 - 9:55 am UTC

Reviewer: A reader

Thank you for quick response. Next time, I'll take a closer look at the difference in plans when I switch to a temporary table from a View. Not sure if they looked the same or not.

Some more questions:-
-------------------------------------

1. Some of our transaction files are really large, close to 80 or more million rows in 1 day alone. We plan to run daily loads. In this scenario, would it make sense to load the data from the view into real table, then split that table into, say 10 pieces and then submit them as jobs. So they can all run in parallel. Even if some of these jobs were to fail, we will be able to restart the batch job for only the ones that failed rather that having to run one big job. What are your thoughts on this?

2. What is the difference between splitting and loading versus 'alter session enable parallel DML'. Are we introducing another layer of parallelism when we split?

3. We have a surrogate key (also defined a Primary Key) on our fact (partitioned) table. If I disable the Primary Key before loading, it would drop the UNIQUE INDEX. How to avoid INDEX maintenance during this load? I also want to avoid re-creating this INDEX because there may be history data sitting in this table.

Thanks,

Upendra

Tom Kyte

Followup  

November 05, 2003 - 5:39 pm UTC

1) could be -- depends on whether the time is spent processing the record (eg: that is the bottleneck and needs to be parallelized) or whether loading the stage table with 80m rows would be.

sounds "reasonable" that loading a stage and then processing it using "split" in parallel would be more efficient (although if you can request N input files, that would work as well -- just have N external tables)

2) my "split" by rowids -- lets you parallelize your PROCEDURAL CODE. parallel dml takes an insert/update/delete and executes it in parallel -- pdml will not help a procedural process get parallelized

3) if you disable the primary key during the load -- you need to recreate it.

if you don't maintain it during the load -- you need to recreate it.

so, unless you maintain it during the load, you are recreating it. that is it -- no other choices really.

Question on splitting.

July 07, 2004 - 10:22 am UTC

Reviewer: Kashif from Reston, VA

Hi Tom,

Splitting a table into rowid ranges seems like a fantastic idea to reduce contention on data files for one table, but frequently the need is to process a query involving several tables. A scenario, I have a multi-table query that produces records which I then write to a file. Here, even if I do split the driving table into 4 rowid ranges, for example, there will still be contention for data files when joining and looking up the detail information. Is that about right, or can I use the rowid splitting technique for multi-table queries as well somehow and reduce contention? Thanks.

Kashif

Tom Kyte

Followup  

July 07, 2004 - 11:57 am UTC

it is not just about contention -- it is about getting things done in parallel.


If you need to produce *a file*, parallel query might be what you want.

if you can produce N-files -- do it yourself parallelism (by splitting the main driving table out) might be what you want.

Why the "-.01" ?

December 13, 2004 - 12:30 pm UTC

Reviewer: Robert from Memphis, USA.

Tom,

Why did you put the "-.01" in your ROWID split formula...

<QUOTE>
trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) / (sum(blocks) over ()/&CHUNKS) ) grp
<QUOTE>

Thanks,

Robert.


Tom Kyte

Followup  

December 13, 2004 - 2:02 pm UTC

to get the division to work correctly, to "slide" the endpoint from grp=2 into grp=1

it was for boundary conditions -- if each chunk was to be 100 blocks, I wanted the numbers 1..100 to be in group "0"

trunc( 100/100 ) = 1

trunc( (100-0.01)/100 ) = 0

December 13, 2004 - 2:33 pm UTC

Reviewer: Robert from Memphis, USA


December 22, 2004 - 11:49 am UTC

Reviewer: A reader


only one out of four is working

March 20, 2005 - 10:47 pm UTC

Reviewer: Reader from US

I have a process split into four using the above technique.
The 4 jobs are inserting into the same table -Direct insert to be precise.
Only one job is able to insert with a lock mode of 6.The others complete their execution without inserting .

Why, can you explain ?

BTW ,If this method is used on RAC is it possible to execute the 4 jobs on the four different instances of RAC .

Thanks,


Tom Kyte

Followup  

March 21, 2005 - 10:09 am UTC

insert /*+ append */ can only be done on a single segment by a single session

You either need to:

a) use external tables and parallel query to parallel direct path load (eg: no do it yourself, let the database do it)

b) use sqlldr parallel direct path loads

c) use conventional path loads with do it yourself parallelism


if you are just doing a query and inserting it by 4 sessions (a) makes the most sense.

Paritioning a 110GB table with LONG columns

April 06, 2005 - 9:03 pm UTC

Reviewer: Logan Palanisamy from Sunnyvale, CA, USA

Tom,

I am trying to partition a 110gb table with one LONG column into a table with multiple partitions with the long column converted to CLOB.

The source table ST has 18 million records. I am using your Do It Yourself (DIY) parallelism technique. It looks like it is going to take nearly two days to complete the job. Is there anyway to speed it up?


Here is the scenario:

Colummn C90 is the CLOB column in the target table TT_P. It has the same structure as the source table ST except for the LONG/CLOB column.

SQL>
SQL> drop table tt_p;

Table dropped.

SQL> CREATE TABLE  TT_P (C1 VARCHAR2(11),
   2  C2 NUMBER(1, 0), C3 NUMBER(1, 0),
   3  C4 NUMBER(1, 0), C5 NUMBER(1, 0), C6
   4  NUMBER(1, 0), C7 NUMBER(1, 0), C8 NUMBER(1, 0),
   5  C9 NUMBER(1, 0), C10 NUMBER(1, 0), C11 VARCHAR2(31),
   6  C12 VARCHAR2(31), C13 VARCHAR2(31), C14 VARCHAR2(11),
   7  C15 VARCHAR2(21), C16 VARCHAR2(31), C17 VARCHAR2(31),
   8  C18 VARCHAR2(31), C19 VARCHAR2(31), C20 VARCHAR2(31),
   9  C21 VARCHAR2(31), C22 VARCHAR2(101), C23 VARCHAR2(65),
  10  C24 VARCHAR2(51), C25 VARCHAR2(51), C26 VARCHAR2(31),
  11  C27 VARCHAR2(11), C28 VARCHAR2(3), C29 VARCHAR2(31),
  12  C30 VARCHAR2(31), C31 VARCHAR2(11), C32 VARCHAR2(2),
  13  C33 VARCHAR2(11), C34 VARCHAR2(2), C35 VARCHAR2(2),
  14  VIETC35 VARCHAR2(2), C37 VARCHAR2(31), C38
  15  VARCHAR2(31), C39 VARCHAR2(31), C37_DEG VARCHAR2(21),
  16  C38_DEG VARCHAR2(21), C39_DEG VARCHAR2(21),
  17  C37_MAJOR VARCHAR2(51), C38_MAJOR VARCHAR2(51),
  18  C39_MAJOR VARCHAR2(51), C37_YEAR DATE, C38_YEAR
  19  DATE, C39_YEAR DATE, C37_GPA NUMBER(5, 2), C38_GPA
  20  NUMBER(5, 2), C39_GPA NUMBER(5, 2), C52 VARCHAR2(2000),
  21  C53 VARCHAR2(21), C54 VARCHAR2(21), C55
  22  VARCHAR2(21), C56 VARCHAR2(21), C57 VARCHAR2(21),
  23  C58 VARCHAR2(2000), C59 NUMBER(4, 0), C60 NUMBER(9,
  24  2), C61 VARCHAR2(21), C62 VARCHAR2(31), C63
  25  VARCHAR2(513), NONC63 VARCHAR2(513), C65 VARCHAR2(513),
  26  C66 VARCHAR2(51), C67 VARCHAR2(51), C68
  27  VARCHAR2(51), C69 VARCHAR2(51), C70 VARCHAR2(51),
  28  C71 VARCHAR2(51), C72 VARCHAR2(51), C73 DATE, C74
  29  DATE, C75 DATE, C76 DATE, C77 DATE, C78 DATE,
  30  C79 VARCHAR2(513), C80 VARCHAR2(513), C81 VARCHAR2(513),
  31  C82 VARCHAR2(2), C83 VARCHAR2(2), C84 VARCHAR2(2),
  32  C85 VARCHAR2(51), C86 VARCHAR2(51), C87
  33  VARCHAR2(51), C88 VARCHAR2(51), C88_SPECIFIC VARCHAR2(51),
  34  C90 CLOB, C91 VARCHAR2(11) NOT NULL ENABLE, C92
  35  VARCHAR2(11) NOT NULL ENABLE, C93 DATE NOT NULL ENABLE, C94
  36  DATE NOT NULL ENABLE, UPPER_C11 VARCHAR2(31), UPPER_C12
  37  VARCHAR2(31), UPPER_C22 VARCHAR2(101), C98 NUMBER(9, 2),
  38  C622 VARCHAR2(31), C100 VARCHAR2(31), C242
  39  VARCHAR2(51), C252 VARCHAR2(51), C262 VARCHAR2(31), C272
  40  VARCHAR2(11), C282 VARCHAR2(3), C292 VARCHAR2(31),
  41  C302 VARCHAR2(31), C108 VARCHAR2(31), C109
  42  VARCHAR2(31), C110 VARCHAR2(31), C111 NUMBER(1, 0),
  43  C112 NUMBER(1, 0)) PCTFREE 25 PCTUSED 60 INITRANS 1 MAXTRANS
  44  255 noLOGGING parallel (degree 10 instances 1) STORAGE(MINEXTENTS 1
  45  MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 50 FREELIST GROUPS 1
  46  BUFFER_POOL DEFAULT) TABLESPACE USER
  47  partition by range (ctime) (
  48  partition tt_upto_1998 values less than (to_date('01-jan-1998', 'dd-mon-yyyy')),
  49  partition tt_1998 values less than (to_date('01-jan-1999', 'dd-mon-yyyy')),
  50  partition tt_1999 values less than (to_date('01-jan-2000', 'dd-mon-yyyy')),
  51  partition tt_2000 values less than (to_date('01-jan-2001', 'dd-mon-yyyy')),
  52  partition tt_2001 values less than (to_date('01-jan-2002', 'dd-mon-yyyy')),
  53  partition tt_2002 values less than (to_date('01-jan-2003', 'dd-mon-yyyy')),
  54  partition tt_2003 values less than (to_date('01-jan-2004', 'dd-mon-yyyy')),
  55  partition tt_2004 values less than (to_date('01-jan-2005', 'dd-mon-yyyy')),
  56  partition tt_2005 values less than (to_date('01-jan-2006', 'dd-mon-yyyy'))
  57  );

Table created.

SQL> alter session enable parallel dml;

Session altered.

SQL>
SQL> create or replace procedure p_insert(p_rowid_low char, p_rowid_high char)
  2  as
  3  begin
  4  execute immediate 'insert /*+ APPEND parallel (tt_p,8) */ into ts.tt_p select /*+ parallel (p, 8) */ C1, C2, C3, C4, C5,
 C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C31
, C32, C33, C34, C35, VIETC35, C37, C38, C39, C37_DEG, C38_DEG, C39_DEG, C37_MAJOR, C38_MAJOR, C39_MAJOR, C37_YEAR, C38_YEAR,
 C39_YEAR, C37_GPA, C38_GPA, C39_GPA, C52, C53, C54, C55, C56, C57, C58, C59, C60, C61, C62, C63, NONC63, C65, C66, C67, C68,
 C69, C70, C71, C72, C73, C74, C75, C76, C77, C78, C79, C80, C81, C82, C83, C84, C85, C86, C87, C88, C88_SPECIFIC, to_lob(C90
), C91, C92, C93, C94, UPPER_C11, UPPER_C12, UPPER_C22, C98, C622, C100, C242, C252, C262, C272, C282, C292, C302, C108, C109
, C110, C111, C112 from ss.st p
  5  where rowid between chartorowid(:p_rowid_low) and chartorowid(:p_rowid_high)' using p_rowid_low, p_rowid_high;
  6  end;
  7  /

Procedure created.

-- In the procedure above, I use the TO_LOB function to convert the LONG column "C90" to CLOB.



SQL>
SQL> drop table t_rowid_pairs ;

Table dropped.

SQL> create table t_rowid_pairs as
  2  select dbms_rowid.rowid_create
  3         ( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid,
  4         dbms_rowid.rowid_create
  5         ( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid
  6    from (
  7  select distinct grp,
  8         first_value(relative_fno) over
  9         (partition by grp order by relative_fno, block_id
 10          rows between unbounded preceding and unbounded following) lo_fno,
 11         first_value(block_id    ) over
 12         (partition by grp order by relative_fno, block_id
 13          rows between unbounded preceding and unbounded following) lo_block,
 14         last_value(relative_fno) over
 15         (partition by grp order by relative_fno, block_id
 16          rows between unbounded preceding and unbounded following) hi_fno,
 17         last_value(block_id+blocks-1) over
 18         (partition by grp order by relative_fno, block_id
 19          rows between unbounded preceding and unbounded following) hi_block,
 20         sum(blocks) over (partition by grp) sum_blocks
 21    from (
 22  select relative_fno,
 23         block_id,
 24         blocks,
 25         trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) /
 26                (sum(blocks) over ()/200) ) grp
 27    from dba_extents
 28   where segment_name = upper('ST')
 29     and owner = 'SS' order by block_id
 30         )
 31         ),
 32         (select data_object_id
 33            from dba_objects
 34           where object_name = 'ST' and owner = 'SS' );

Table created.

SQL>
SQL>
SQL> declare
  2      l_job    number;
  3  begin
  4
  5  for x in (select * from t_rowid_pairs)
  6  loop
  7  dbms_job.submit(l_job, 'p_insert('''||x.min_rid||''', '''||x.max_rid||''');');
  8  end loop;
  9  commit;
 10
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t_rowid_pairs;
       197

SQL> select count(*) from dba_jobs;
       197

SQL> show parameter parallel

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 16
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> show parameter job

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
job_queue_interval                   integer 60
job_queue_processes                  integer 36
SQL> l
  1*  select message from v$session_longops where time_remaining > 0
SQL> /

MESSAGE
--------------------------------------------------------------------------------
Table Scan:  SS.ST: 1285 out of 73600 Blocks done
Table Scan:  SS.ST: 780 out of 74240 Blocks done
Table Scan:  SS.ST: 541 out of 73600 Blocks done
Table Scan:  SS.ST: 62 out of 72960 Blocks done
Table Scan:  SS.ST: 122 out of 73600 Blocks done
Table Scan:  SS.ST: 66 out of 76455 Blocks done
Table Scan:  SS.ST: 88 out of 69055 Blocks done
Table Scan:  SS.ST: 83 out of 54773 Blocks done
Table Scan:  SS.ST: 77 out of 77440 Blocks done
Table Scan:  SS.ST: 39 out of 123471 Blocks done
Table Scan:  SS.ST: 69 out of 60199 Blocks done
Table Scan:  SS.ST: 165 out of 64912 Blocks done
Table Scan:  SS.ST: 64 out of 31940 Blocks done
Table Scan:  SS.ST: 81 out of 46345 Blocks done
Table Scan:  SS.ST: 65 out of 79269 Blocks done
Table Scan:  SS.ST: 100 out of 76800 Blocks done
Table Scan:  SS.ST: 63 out of 89633 Blocks done
Table Scan:  SS.ST: 78 out of 70400 Blocks done
Table Scan:  SS.ST: 175 out of 72960 Blocks done
Table Scan:  SS.ST: 111 out of 75065 Blocks done
Table Scan:  SS.ST: 60 out of 68375 Blocks done
Table Scan:  SS.ST: 78 out of 19880 Blocks done
Table Scan:  SS.ST: 131 out of 72814 Blocks done
Table Scan:  SS.ST: 75 out of 85409 Blocks done
Table Scan:  SS.ST: 94 out of 99756 Blocks done
Table Scan:  SS.ST: 47 out of 115485 Blocks done
Table Scan:  SS.ST: 112 out of 62108 Blocks done
Table Scan:  SS.ST: 68 out of 70400 Blocks done
Table Scan:  SS.ST: 74 out of 70400 Blocks done
Table Scan:  SS.ST: 44 out of 92895 Blocks done
Table Scan:  SS.ST: 94 out of 72960 Blocks done
Table Scan:  SS.ST: 76 out of 140374 Blocks done
Table Scan:  SS.ST: 217 out of 71564 Blocks done
Table Scan:  SS.ST: 70 out of 89990 Blocks done
Table Scan:  SS.ST: 213 out of 73600 Blocks done
Table Scan:  SS.ST: 74 out of 78157 Blocks done

36 rows selected.

I see only 36 entries in the v$session_longops, one for each of the 36 job_queue_processes. How come there is no paralleism within each of the JOB process? 

I even tried

execute immediate 'alter session enable parallel DML;' 

within the p_insert procedure just above the insert statement. Same result.

Thanks in advance for your help.
 

Tom Kyte

Followup  

April 07, 2005 - 9:02 am UTC

you cannot parallel direct path insert into the table using do it yourself parallelism (unless oracle is doing the work, only one session can direct path into a table)


so either

a) use parallel create table as select and let oracle do it or b
b) parallel direct load N tables that represent your N partitions and exchange them in after the fact. That is, load N separate tables that represent your partitions. After any are loaded, do an alter table exchange partition (search this site for "exchange partition", lots of examples)



Paritioning a 110GB table with LONG columns

April 06, 2005 - 10:58 pm UTC

Reviewer: Logan Palanisamy from Sunnyvale, CA, USA

Tom,

Forgot to mention. The DB version is 8.1.7.4. So, no online redefintion.

Also, 8.1.7.4 PL/SQL doesn't seem to support analytical functions. So I had to create the table t_rowid_ranges to get around it.

The target partitioned table doesn't have any indexes yet. Will be created after all the rows are inserted.

how to handle rows which are not updated?

June 26, 2005 - 1:01 pm UTC

Reviewer: chintu from NY

Hi Tom,

We are using this technique, of rowids, for our data transformation. We are transforming data in about 100 tables. We need to update 3 or 4 columns added to all these tables with ids from master tables. Following are my question:

1. For how big tables we can use technique? Some tables have
millions of rows and some around 100k or so.
2. How to spilt based on number of rows, (I want each
process to process 100k rows at a time)? I want to run
4 processes each processing 100k rows at a time.
3. (a) Since we are updating all the rows, is it
better to create temp table, insert with nologging,
etc technique?
(b) Is it good to apply this technique to smaller
table also?
4. Right now we are using the technique of rowid you shown
in this article. We are storing the splits in a table
similar to job_parameters table. But, we are not deleting
the rows once the job is completed. The reason being, if
in case we had to rerun the process it should not pickup
the already processed rowid range. This is good, but...
when some rows did not get updated for some reason (like
not finding id, should not happen -- but nothing is
perfect) we are never going pick them up again since
we said we've already processed.

On the other hand, if we don't retain the rows in
job_parameters table, then we'll again process the whole
table. Out of 10million rows if 1000rows failed...we'll
still be processing whole table using rowid range.
For this we add another condition to the update statement
saying 'where col1 is NULL'. We are not sure whether
this is ok or not. I think we are still going thru 10m
rows to find 1000 rows.

Could you please give some idea?

5. Another alternative I'm thinking about is to use decode
which in turn will require outer joins.

UPDATE table_a a
SET col1 =
(SELECT
CASE WHEN col_from_master IS NOT NULL THEN
col_from_mater
WHEN col_from_master IS NULL THEN
call_function_to_create_row_in_master
and return id
end case
FROM master m
WHERE m.code = a.code (+)
)
WHERE col1 IS NULL
;

what do you think? Any performance hit because of outer
join?


As always, thanks a lot for your help to Oracle community.
BTW, your blog is good. Also, thanks for introducing to 'Creating passtionate users' blog. They are right on in what they are saying. Now I'm regular reader of your blog and that blog.


good day.


Tom Kyte

Followup  

June 26, 2005 - 2:01 pm UTC

1) it can be used on tables of any size, you just want lots of extents so the split routine can split it "good". Beware of small tables with small numbers of extents -- haven't tested thoroughly against them with this query (eg: if the number of extents is less than the number of chunks, I could foresee issues)

2) that would be expensive, you would have to use NTILE() and primary key ranges (eg: read entire table, rank the rows and find the hi/lo keys - not rowids).

3) for larger tables, and if the updates will cause tons of row migrations, there is definitely something to be said for a parallel create table as select to do the update, yes.

4) update the rowid ranges and mark them as "done"? so you know which have been done successfully?

5) FROM master m
WHERE m.code = a.code (+)

that'll not work. you need two tables to always get a row.

since a.code should be a primary/unique key -- you would just update the join, no correlated subquery


update ( select old, new
from t1, t2
where t1.key = t2.key )
set old = new;


how to handle rows which are not updated?

June 26, 2005 - 3:51 pm UTC

Reviewer: Chintu from NY

Thanks Tom for your comments.

3) Since we are writing a generic script for all our customers we are not sure whether row migration happens are not. Without knowing this, whats our best bet?
Row migration might happen as we are adding 3 or 4 new column in each table.
All new columns are numbers and will eventually become unique keys. (We are replacing the existing keys with the new keys as our architecture changed). But, we are still retaining the old keys.

4) Let me elaborate. We make rowid range as processed so that no other process picks up the same rowid. Within the rowid ranges, some rows might not get updated. This could happen in multiple rowid ranges. We are still marking them as success as opposed to failure (called routine does it that way). Even if we mark them as failure we will be processing more rows that needed.
Not sure how to get out of this.

5) Actually, I already tried that but will not work for us. The issue here is, we are replacing the existing key columns with new ones. Some columns being added (say 2 of 4) will become key columns replacing the old key columns.
So, in this case updating join would not work.

Would the following be ok:

update table
set new_key_col1 = nvl((scalar_subquery to return id with join using old key),
fun2create_id),
new_key_col2 = (some other query),
new_non_key_col3 = value,
new_non_key_col4 = some_other_value
where <rowid range>;

The update statement above is for step 2 below.

The reason for calling function to create ids is this:

1. Rows from old master table is taken and inserted into another (existing) master table with new ids.
A temp table is created which stores new ids along with old ids. This happens during normal business - no downtime for users.
Note that these are master tables. We will be updating actual transaction table in the next step.

2. After some time (during downtime), may be hours or days after step 1, we update transactions table with ids from temp table.
But, it might so happen that in this gap between step 1 and 2, users might have created some new data and for that we might not have rows in temp table. So, our updates are going to fail.
So, if we don't find a row in temp table I should call a function which create row new id, inserts into temp table and returns me the id. Would the above query work.

3. Once all rows are updated then we would create unique index with new columns.


I hope I'm clear.

Thanks

Tom Kyte

Followup  

June 26, 2005 - 4:41 pm UTC

3) you should be able to "guess" pretty accurately. say you add 3 columns, a number (5 bytes maybe) a date (7 bytes) and a string (average of 15 bytes maybe)

you are adding about 30bytes/row.

say you are using an 8k block, 10% free (default), you have about 800 bytes free (maybe).

so, if you have more than 20 or 30 rows/block, you will likely be migrating tons of data as the rows don't fit anymore.


just look at it and come up with a reasonable guess.

Or test it and see what happens "in general".


4) neither am I, you'll have to design something to handle your logic, like logging the rowids of the unprocessed rows into an exception table that someone needs to come back and fix later.

5) sorry, if the update of a join won't work -- nothing will. Using your bad query:

UPDATE table_a a
SET col1 =
(SELECT
CASE WHEN col_from_master IS NOT NULL THEN
col_from_mater
WHEN col_from_master IS NULL THEN
call_function_to_create_row_in_master
and return id
end case
FROM master m
WHERE m.code = a.code (+)
)
WHERE col1 IS NULL
;

IF M.CODE IS NOT UNIQUE, that subquery can return more than one row. Hence M.CODE MUST BE UNIQUE and hence the update of a join CAN WORK.




how to handle rows which are not updated?

June 26, 2005 - 5:03 pm UTC

Reviewer: Chintu from NY

Wow!!! I did not expect reply this fast. Thanks a lot for reading by big posts.

5) Now I'm clear. Initially, I misunderstood. I thought the column being updated should have uniqe index. Doesn't make sense. I should have given more thought, sorry.

It should the column from where we are fetching records should be unique to guarantee only one row is returned. Right?

Would there be any impact on performance if I add decode/nvl/case to get new ids when returned rows is null. In ideal situation, if we are lucky, it would never be called.

I'll try your solution once I get to work tomorrow.

Really, appreciate your time.

Tom Kyte

Followup  

June 26, 2005 - 5:52 pm UTC

no, the column used to look up the new value in the lookup table should have a unique constraint.



statistics

June 27, 2005 - 10:19 am UTC

Reviewer: Chintu from NY

Thanks again Tom.

Statistics:

I'm planning to run several test cases:

1. Split and process with multiple process.
2. Direct Updates
3. CTAS

Above cases with different sizes of tables.

How should I gather statistics for 1st test case? Could you please give me some direction?

How to "bold" font a particular word or line? I tried to start with hash-B and end with hash-B. But, it applied for all the lines from the start of hash-B.

Tom Kyte

Followup  

June 27, 2005 - 10:29 am UTC

what statistics or metrics would you like to gather?

statspack is a tool I've used in the past

statspack.snap
<run whatever using as many sessions as you like>
statspack.snap right after it is done.

you'll have the system utilization during that period of time, you'll know the elapsed time, waits, bottlenecks and so on.

statistics

June 27, 2005 - 10:23 am UTC

Reviewer: Chintu from NY

Oops! I forgot to ask another question.

How to find out whether row is migrated or not?


Thanks.

Tom Kyte

Followup  

June 27, 2005 - 10:29 am UTC

analyze table list chained rows.

ORA-01410: Invalid Rowid/ORA-06512: at SYS.DBMS_ROWID

July 08, 2005 - 10:52 am UTC

Reviewer: Andy from US

Tom,
Whenever I try to get the ROWID ranges for a partitioned table (Hash/Range/Composite/List) I get the error. Works just fine with non-partitioned table.Could you explain why and suggest me a workaround ?

Tom Kyte

Followup  

July 08, 2005 - 12:50 pm UTC

you'll want to do it for a PARTITION of a partitioned table (a segment), it works on segments -- not multi-segment objects.

ORA-08103 Object No Longer Exists!!

July 08, 2005 - 1:57 pm UTC

Reviewer: Andy from US

I Change my query to the following:

select grp,
dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 )
min_rid,
dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 )
max_rid
from (
select distinct grp,
first_value(relative_fno)
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) lo_fno,
first_value(block_id )
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) lo_block,
last_value(relative_fno)
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) hi_fno,
last_value(block_id+blocks-1)
over (partition by grp order by relative_fno, block_id
rows between unbounded preceding and unbounded following) hi_block,
sum(blocks) over (partition by grp) sum_blocks
from (
select relative_fno,
block_id,
blocks,
trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) /
(sum(blocks) over ()/4) ) grp
from dba_extents
where segment_name = upper('BRD_CUST')
and partition_name in ('BRD_CUST_LP01','BRD_CUST_LP02','BRD_CUST_LP03','BRD_CUST_LP04','BRD_CUST_LP05')
and owner = user order by block_id
)
),
(select data_object_id
from user_objects
where object_name = upper('BRD_CUST')
and data_object_id is not null
and subobject_name in ('BRD_CUST_LP01','BRD_CUST_LP02','BRD_CUST_LP03','BRD_CUST_LP04','BRD_CUST_LP05') )

I get the ROWID ranges. Then I go back to find the record using the rowid's and it then gives me the error. Can you please explain me why ? I'm sure I am goofing up somewhere. Please advice.

Thanks
Andy

Tom Kyte

Followup  

July 08, 2005 - 3:19 pm UTC

what I mean is -- you will have to rewrite the query to work on a SINGLE PARTITION, not across partitions.

This will not break up a partitioned table,
This could be written to break up a SINGLE partition of a partitioned table.

besides, if you have 5 partitions and were going to break this into 4 -- I'd say "just break it into 5, target separate PARTITIONS, not rowid ranges"

Why dbms_lock.sleep(30)?

August 18, 2005 - 12:58 am UTC

Reviewer: Ashish

Hi Tom,
I couldn't the understand the significance of dbms_lock.sleep(30). Why not simply call
loop
select count(*) into l_cnt from job_parameters;
exit when (l_cnt = 0);
end loop;

Thanks


Tom Kyte

Followup  

August 18, 2005 - 3:50 pm UTC

I am trying to benchmark something.

In order to not adversely affect the performance of the thing being benchmarked, I sleep lots, so as to NOT use resources.


Your approach would burn 100% of a cpu continously.

August 18, 2005 - 1:37 am UTC

Reviewer: Ashish

Sorry for posting it twice. Pls ignore the post above and answer </code> http://asktom.oracle.com/pls/ask/f?p=4950:61:11698952939567937223::::P61_ID:45428179919845 <code>

instead.

Thanks

Tom Kyte

Followup  

August 18, 2005 - 3:57 pm UTC

too late. I read them in order. Look here :)

how to split a result set without writing it into a table

September 07, 2005 - 2:13 am UTC

Reviewer: A reader

I have a large result set (around 500K), I would like to split it into 50 chunks and each chunk will have 10K rows (result sets of 4 tables join), and then spawn 50 dbms_jobs to process each and every chunk in parallel.

Do I have to store the result set into a table, then split the result table, or is there any way that I can split the result set on the fly and make sure there is no overlaps between chunks?

Thanks in advance.

Tom Kyte

Followup  

September 07, 2005 - 7:35 am UTC

one would need more information...

If the 'result set' comes from a single table OR there is a single table that "drives the query", we can use rowid ranges -- eg:

select * from emp, dept where emp.deptno=dept.deptno;

we could use our knowledge to know that EMP 'drives' this particular query and if we split EMP into 4 pieces by rowid range, we'd be good to go.

select * from emp, dept where emp.deptno=dept.deptno and emp.rowid between A and B;

using split from above.

OR is there some primary "key" we can use.  for example, all_objects has a "primary key" of object_id:

ops$tkyte@ORA9IR2> select nt, min(object_id), max(object_id), count(*)
  2    from (select object_id, ntile(4) over (order by object_id) nt
  3            from all_objects )
  4   group by nt;
 
        NT MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)
---------- -------------- -------------- ----------
         1              3           7147       6970
         2           7148          14117       6970
         3          14118          21087       6970
         4          21088          44477       6969

there are 4 non-overlapping ranges that break "all_objects" into 4 pieces. 

October 13, 2005 - 1:01 pm UTC

Reviewer: A reader

Is there a way to figure this out for an index without taking a dump? Namely, given an index block, what are the keys that reside on it

Tom Kyte

Followup  

October 13, 2005 - 4:15 pm UTC

not without a dump.

Thanks

November 21, 2005 - 5:14 pm UTC

Reviewer: Sagar from New York

Hi Tom,
In my case the source system is a view not table. The view is owned by other group, for which we have read permissions. How can i split the views?
Thanks
Sagar

Tom Kyte

Followup  

November 21, 2005 - 5:17 pm UTC

use the ntile approach.

getting ORA-01410: invalid ROWID when running parallel jobs

May 27, 2006 - 1:06 pm UTC

Reviewer: Dusan from Czech republic

Hi Tom,

We are running parallel jobs, doinng the following:
1. Insert into partitioned table rows, using alter table exchange partition with table
2. consequently, update of this partitioned table

Occasionaly, we are getting the error ORA-01410 during update. We try to do the update again, in loop, until success. We run this on ORACLE 10gR2 (10.2.0.1). Any idea what might be the reason of this error?

Thanks,

Dusan

Tom Kyte

Followup  

May 27, 2006 - 9:35 pm UTC

Please utilize support for this, getting an invalid rowid from parallel query would be "not expected"

Re: getting ORA-01410: invalid ROWID when running parallel jobs

June 21, 2006 - 2:33 am UTC

Reviewer: Dusan from Czech republic

Hi Tom,

I contacted the Oracle Support, I believe that we have solved the problem as I described above.

</code> http://metalink.oracle.com/metalink/plsql/tar_main.this_tar?tar_num=5481454.993&p_ctryCode=840 <code>

regards,

Dusan


split usage with multiple values to a column

July 07, 2006 - 4:34 pm UTC

Reviewer: Kumar hs

Tom,

I am trying to use the split procedure for a table that is required to split based on the actual values in the column that would not overlap into the next chunk grp.

For instance;

create table chunk_tbl ( a varchar2(2),b number);

begin
insert into chunk_tbl values('ab',100);
insert into chunk_tbl values('ab',100);
insert into chunk_tbl values('bc',100);
insert into chunk_tbl values('cd',100);
insert into chunk_tbl values('cd',100);
insert into chunk_tbl values('de',100);
insert into chunk_tbl values('de',100);
insert into chunk_tbl values('de',100);
insert into chunk_tbl values('ef',100);
insert into chunk_tbl values('ef',100);
insert into chunk_tbl values('ef',100);
insert into chunk_tbl values('fg',100);
insert into chunk_tbl values('fg',100);
insert into chunk_tbl values('fg',100);
insert into chunk_tbl values('fg',100);
insert into chunk_tbl values('fg',100);
commit;
end;

This is the pattern in the table that has around 5m records with the first column indexed. If I have to split them based on rowid, it is possible that one split group could have the first record with value for column 1 equal to 'fg' and the second group all other records starting with the second record with the 'fg' value.

Is there way to make use of the split procedure that would include all records of any values that make it to a group?

In effect, if the first split group has any record of the column 1 'fg' then I would want that split group to have all the records with 'fg'. This is required as the program logic is based on the aggregates for a particular set of values in column 1.

Tom Kyte

Followup  

July 08, 2006 - 8:40 pm UTC

becomes bit of a bin fitting problem (what if fg should itself be "a group" and ab+de another group and bc+cd+ef the third group - to be nicely and evenly distributed.....)

You can use a technique like this:

ops$tkyte@ORA10GR2> select min(a), max(a), sum(cnt), grp
  2    from (
  3  select a, cnt,
  4         sum(cnt) over (order by a) running_tot,
  5             sum(cnt) over () tot,
  6         trunc( sum(cnt) over (order by a) / (sum(cnt) over ()/4) ) grp
  7    from (
  8  select a, count(*) cnt
  9    from chunk_tbl
 10   group by a
 11         )
 12         )
 13   group by grp
 14   order by 1
 15  /

MI MA   SUM(CNT)        GRP
-- -- ---------- ----------
ab bc          3          0
cd cd          2          1
de ef          6          2
fg fg          5          4


but now we start to wonder what takes longer :) the split or the processing... 

July 09, 2006 - 2:33 am UTC

Reviewer: Michel Cadot from France

Hi,

I think we get a better even distribution if we substract one to the running total in the group definition then not only the last a value can be in the last group.

SQL> with 
  2    counting as (
  3      select a, count(*) cnt from chunk_tbl group by a
  4    ),
  5    data as (
  6      select a, cnt,
  7             sum(cnt) over (order by a) running_tot,
  8             trunc ((sum(cnt) over (order by a)-1) / (sum(cnt) over ()/4)) grp
  9      from counting
 10    )
 11  select min(a), max(a), sum(cnt), grp
 12  from data
 13  group by grp
 14  order by 1
 15  /
MI MA   SUM(CNT)        GRP
-- -- ---------- ----------
ab bc          3          0
cd de          5          1
ef ef          3          2
fg fg          5          3

4 rows selected.

I once posted a query using the MODEL clause to solve such a problem. Have a look at:

http://groups.google.fr/group/comp.databases.oracle.misc/browse_thread/thread/274f3a314a2f3731/d945bc1ad0e264fb#d945bc1ad0e264fb

Regards,
Michel 

Group split

July 12, 2006 - 1:47 pm UTC

Reviewer: Kumar

<"what if fg should itself be "a group" and
ab+de another group and bc+cd+ef the third group">

I probably was not clear in my question earlier. In the sceanrio descibed, fg would be a group (n no of records),
ab will be another etc. So ab+be would never be a group. All are individual groups.

a. My concern was if I use the split procedure, will more than one grp have records pertaining to a particular group of values. In other words, if a record of 'fg' is split in grp 0, then all records of 'fg' should be found in grp 0 itself.

b. Also what I observed was, after the split procedure and the grps acquired, if I query for a record based on the min rowid of a grp, I dont' find the record. While if I do a count(*) for records between min and max of the rowid, it gives the results correctly. Is this expected.

Tom Kyte

Followup  

July 12, 2006 - 3:59 pm UTC

now I'm not following you at all. You lost me entirely if every unique code is a separate group - I fail to see where split comes in at all.

Last record in a grp

July 12, 2006 - 3:03 pm UTC

Reviewer: Kumar

Tom,

ONe more question:

How do I find out the last record in grp 0, the first record in grp1 etc..



Tom Kyte

Followup  

July 12, 2006 - 4:01 pm UTC

first_value(...)
last_value(...)

pop into mind?

July 12, 2006 - 4:36 pm UTC

Reviewer: Kumar

Sorry Tom that I am confusing you.

The records are not unique in the table in question.

the records are like:

col1 col2

ab 10
ab 10
ab 20

bc 50
bc 50
bc 100

...
When I split on rowid let's say for 2 chunks, will the grp 0 and grp 1 have overlapping values? If I take max(rowid) from grp0 and find that the value of col1 is 'ab' and then if I take the min(rowid) of grp1 and find the value of col1 is also 'ab' then that's what I want to avoid. Hope this is clear.

Also your answer about first_value(..) is first_value a key word to find the rowid?

What I did was:
select min(rowid) from tbl where rowid between 'xxx' and 'xxx' for grp 0. Similarly max(rowid) I found. Is this correct?

Tom Kyte

Followup  

July 12, 2006 - 5:29 pm UTC

do you want grp1 = ab,
grp1 = bc
and so on

or can

grp1 = ab+bc
and so on


that is the question.

first_value is a documented analytic function, documented alongside all of the other analytic functions you have been using with split.



July 12, 2006 - 4:37 pm UTC

Reviewer: A reader

To add to above, the table is indexed on col1.

BS

July 13, 2006 - 4:05 am UTC

Reviewer: Bhagat Singh from India, Delhi

Hello,

I tried the split.sql but getting different result


ora92m> desc t1;
Name Null? Type
----------------------------------------- -------- -------------------

X NOT NULL CHAR(30)

ora92m> select count(*) from t1;

COUNT(*)
----------
48413

ora92m> @c:\bhagatsingh\zerk\rnd\split t1 10

GRP MIN_RID MAX_RID
---------- ------------------ ------------------
0 AAANkwAABAAAO8xAAA AAANkwAABAAAO+oCcQ
1 AAANkwAABAAAO/BAAA AAANkwAABAAAPAoCcQ
2 AAANkwAABAAAPBRAAA AAANkwAABAAAPCICcQ
3 AAANkwAABAAAPGRAAA AAANkwAABAAAPG4CcQ
4 AAANkwAABAAAPHBAAA AAANkwAABAAAPKICcQ
9 AAANkwAABAAAPKJAAA AAANkwAABAAAPMICcQ

6 rows selected.

here you see I requested for 10 but got 6 more over one more thing that in GRP column 5-8 sequences are also missing.

Tom Kyte

Followup  

July 13, 2006 - 7:52 am UTC

because you have insufficient extents - it does things on extent boundaries. The maximum degree of "splitedness" will be the number of extents.

July 13, 2006 - 8:38 am UTC

Reviewer: Kumar

"do you want grp1 = ab,
grp1 = bc
and so on

or can

grp1 = ab+bc
and so on"

I would not be doing ab+bc at all. It will always be sum(col 2) group by col 1.
If I am doing a split and then run my procedure on the range of the rowid for let's say grp0 and doing sum(col 2) grouping on col 1 then it is obvious I would require all values of the col1 required to be grouped on.

eg:
min_rowid max_rowid

grp0 uuuuuuuuuuuuu xxxxxxxxxxx

grp1 yyyyyyyyyyyy zzzzzzzzzzzz


Grp1 in effect should not have col1 values overlapping from Grp0.

Please advise.





Tom Kyte

Followup  

July 13, 2006 - 9:20 am UTC

we are not communicating.


in grp0 - what are possible values of col1 - CAN both "AB" and "BC" be in grp0 - or not.

is grp0 just "AB" records or can grp0 under some circumstances be "AB" records *and* "BC" records.


if grp0 can have MORE THAN ONE DISTINCT value for col1 - then "bin fitting problem" is exactly what you have and what I originally said holds true.

if grp0 can have EXACTLY ONE DISTINCT value for col1 - then you don't want split, you just want to distinct col1 - those are your groups apparently.




July 13, 2006 - 9:55 am UTC

Reviewer: Kumar

"in grp0 - what are possible values of col1 - CAN both "AB" and "BC" be in grp0 - or not."

Yes. It could have AB and BC. Because I am splitting 10mill records to run my procedure in multiple sessions. There could be just one AB or 100's of AB. My statement is to do the sum(col 2) grouping by AB,BC etc.



"is grp0 just "AB" records or can grp0 under some circumstances be "AB" records
*and* "BC" records."

Almost always there will be AB and BC depending on the number of records. If for eg: my table has 2mill out of which AB and BC are 1million each, then possibly the first chunk will be only AB.


if grp0 can have MORE THAN ONE DISTINCT value for col1 - then "bin fitting
problem" is exactly what you have and what I originally said holds true.

if grp0 can have EXACTLY ONE DISTINCT value for col1 - then you don't want
split, you just want to distinct col1 - those are your groups apparently.


So in the given case, you would suggest the solution for 'bin fitting'?


Tom Kyte

Followup  

July 13, 2006 - 12:53 pm UTC

then it is the bin fitting problem and we cannot look at rowids - we can only look at values and the solution sets are above, in the original feedback!!! You have to do it by value ranges, you cannot use rowids.

July 13, 2006 - 9:57 am UTC

Reviewer: Kumar

Tom, one more pleae:

Will merge statemet lock the whole table? In the chunking case, when I give the range of rowid in the query:
merge into t1
using
( <my query using range of rowid>)

I feel that the whole table is locked. And my next session procedure serially follows after the completion of the first session calling the proc.


Thanks.


Tom Kyte

Followup  

July 13, 2006 - 12:54 pm UTC

No, merge will not lock an entire table.


Don't "feel" - prove.


But your merge looks suspicious.... You are not using a range on the table you are merging into, you are using a range on the SOURCE.



July 13, 2006 - 12:58 pm UTC

Reviewer: Kumar

"You have to do it by value ranges, you cannot use rowids. "

Will this hold true even if the records are sorted (which in fact are) by col1 before the insert goes through ?

Tom Kyte

Followup  

July 13, 2006 - 1:43 pm UTC

yes, because even pre-sorted data might not be "sorted" on disk.

You cannot use rowids
You have to use value pairs.

July 13, 2006 - 1:05 pm UTC

Reviewer: Kumar

merge into t
using
(select s.c1,s.c2,S.c3,S.c4 from s,
(select T.c1,T.c2,T.c3,
sum(t.c5) TOTAL from t
where rowid between start_rid and end_rid
group by T.c1,t.c2,T.c3) t1
where S.c1= t1.c1 AND s.c2=t1.c2
and sc.c3=t1.c3 and (s.c4= T1.total )
) s1
on ( T.ROWID between start_rid and end_rid AND
T.c1=S1.c1 AND t.c2=s1.c2 AND T.c3=S1.c3)
when matched then
update set STATUS=1;

S is the source and T is the target. The rowid's are used on T.


Tom Kyte

Followup  

July 13, 2006 - 1:49 pm UTC

that source doesn't make sense.

sum(t.c5) is basically a random number, you are only looking at a small slice of the table.

I don't get this logic at all.

July 13, 2006 - 1:20 pm UTC

Reviewer: Kumar

"But your merge looks suspicious.... You are not using a range on the table you
are merging into, you are using a range on the SOURCE."

This is the expanded query :


merge into t
using
(select s.c1,s.c2,S.c3,S.c4 from s,
(select T.c1,T.c2,T.c3,
sum(t.c5) TOTAL from t
where rowid between start_rid and end_rid
group by T.c1,t.c2,T.c3) t1
where S.c1= t1.c1 AND s.c2=t1.c2
and sc.c3=t1.c3 and (s.c4= T1.total )
) s1
on ( T.ROWID between start_rid and end_rid AND
T.c1=S1.c1 AND t.c2=s1.c2 AND T.c3=S1.c3)
when matched then
update set STATUS=1;

S is the source and T is the target. The rowid's are used on T.

Thus it look questionable?


Tom Kyte

Followup  

July 13, 2006 - 1:51 pm UTC

yes, looks questionable.

November 16, 2006 - 9:13 am UTC

Reviewer: A reader


Need of Statistics for the split to work accurate

December 15, 2006 - 2:23 am UTC

Reviewer: Ragesh from India

Hi Tom,
We have a transaction table which contain around 200 million records and we have a month end query which takes around 5 hours to complete to pull an average 2lakh records. This table is not partitioned. When I tested with rowid partition the results were not matching. I guess this is because of the old statistics.
Here our difficulty is we dont know how frequent the statistics is collected on this table in the production

Please advise what we can do without the participation of DBA. (The user which we use have permission on dba_extents and dba_objects)
We are not using the DBMS_JOB since we pull the records using a SAS tool.
Thanks a lot for the great support you provide..

Tom Kyte

Followup  

December 15, 2006 - 8:45 am UTC

what is a lakh

what is a rowid partition

why would statistics lead to a wrong result?

December 19, 2006 - 1:33 am UTC

Reviewer: A reader

Hi Tom,

Thanks a lot for the quick reply and sorry for confusing you.
'a lack' is equal to a hundred thousand in Indian numbering system.
By rowid partition I mean splitting the table by rowid ranges (I should be more careful when selecting words :)

And it was my misunderstanding that statistics lead the wrong results.( The confusion came due to the 'dba_objects' in the query.)

Thanks again for the woderful support




December 19, 2006 - 1:41 am UTC

Reviewer: A reader

Hi Tom,

Thanks a lot for the quick reply and sorry for confusing you.
'a lack' is equal to a hundred thousand in Indian numbering system.
By rowid partition I mean splitting the table by rowid ranges (I should be more careful when selecting words :)

And it was my misunderstanding that statistics lead the wrong results.( The confusion came due to the 'dba_objects' in the query.) The query gave the wrong result since I added the rowid condition in an EXISTS subquery.

Thanks - Ragesh






December 19, 2006 - 1:43 am UTC

Reviewer: A reader

Hi Tom,

Thanks a lot for the quick reply and sorry for confusing you.
'a lakh' is equal to a hundred thousand in Indian numbering system.

By rowid partition I mean splitting the table by rowid ranges (I should be more careful when selecting words :)

And it was my misunderstanding that statistics lead the wrong results.( The confusion came due to the 'dba_objects' in the query.) The query gave the wrong result since I added the rowid condition in an EXISTS subquery.

Thanks - Ragesh






Serialized access

April 17, 2007 - 7:09 pm UTC

Reviewer: RP from London, UK

Hi Tom,

i think i've seen something similar on this site but a saerch turned up nothing so forgive me if you've seen this before.

I am currently in the design phase of a project that gathers data from the web.

A user submits a query (a TASK) which can potentially result in millions of rows being written to the ARCHIVE table. The ARCHIVE table has information about a URL (as well as the TASK id foreign key and a status to say ready for Java).

Java will be querying the ARCHIVE table using multiple threads to get a set of up to 100 rows for a given task id. Java will then go to each url, grab the content then write the content to the db and update the ARCHIVE row to say done.

Is their any strategy whereby Java can use simultaneous selects but guarantee another thread didnt get that row? I can see serializing access but that kill performance (there might also be another job adding more rows to ARCHIVE for another task).

I hope i've explained myself clearly.

Thanks

R
Tom Kyte

Followup  

April 18, 2007 - 11:50 am UTC

why not use AQ (advanced queues) - a feature built just for such a thing, your java stuff can even use JMS and the like to interact with it.

How to use AQ for this ?

April 19, 2007 - 1:38 pm UTC

Reviewer: Pasko

Hi Tom,

you suggested the use of AQ,
could you please elaborate on a Feature of AQ which will guarantee the Requirement as presented above, i.e

<quote>
'Is their any strategy whereby Java can use simultaneous selects but guarantee another thread didnt get that row?
</quote>

the biggest issue is to guarantee that the competing Java threads don't get the same rows...

i have some ideas:

1.Should we assign Priority to the Queue Payloads and then
let each Java Thread extract rows with specific Priority?

2.Should the Java Threads be registered as Consumers to Specific Queues?

3...


Best regards,

Pasko

Tom Kyte

Followup  

April 19, 2007 - 1:56 pm UTC

instead of putting rows into a table and have java try to select out different ranges

you put messages into a queue and java programs dequeue them.

Yes, but how do we prevent Java dequeueing the same messages

April 20, 2007 - 12:21 pm UTC

Reviewer: Pasko

Hi Tom,

Thanks for your follow-up.

so, now i put my messages into a Queue and let my Java programs dequeue them.

It's still not clear to me how should i prevent the Java threads from dequeuing the same messages.

Another Requirement:
I don't want to delete the messages after a Dequeue.

Tom Kyte

Followup  

April 20, 2007 - 1:49 pm UTC

that is what queues do, the provide highly concurrent, multi-consumer dequeueing capabilities.

which Dequeue MODE to use

April 20, 2007 - 4:33 pm UTC

Reviewer: Pasko

Hi Tom,

i just scanned through AQ Streams Guide,and noted that there are following Dequeue Modes:
BROWSE, LOCK, REMOVE, and REMOVE_NO_DATA

So, we have to use 'LOCK Dequeue Mode' in order to keep the Messages after Dequeue and guarantee that Multiple Dequeue Clients do not get the same messages.

1.What kind of lock is used internally when i choose LOCK Dequeue Mode?

2.Is this an internal Oracle Latch or a normal 'select for update' done by Oracle?



Tom Kyte

Followup  

April 20, 2007 - 5:00 pm UTC

it is all internal, AQ uses documented methods to read the queue tables in a non-blocking, but concurrently safe fashion.

want to insert to other Tables

April 24, 2007 - 4:28 am UTC

Reviewer: Pasko

Hi Tom,
so i am thinking of designing a prototype for system using AQ.
The current System is continously polling the Database for NEW Messages of some Status.
So i want to use AQ in order to avoid having to constantly poll the DB.
I also want to INSERT the enqueued messages to Other Non-AQ Relational Tables.These Tables might also be partitioned History Tables,so i can't depend only on AQ created Tables.

My Question is, should insert into other non-AQ Tables at Dequeue/Enqueue Time using a Normal 'Insert into ... values ...' Clause or should i use Streams AQ Apply Process using LCR Handler?

I guess the normal insert statement would generate extra Redo/Undo,bu i am not sure if the Streams Apply Handler using LCR would also generate the same amount of Redo/Undo.

Regards,

pasko
Tom Kyte

Followup  

April 24, 2007 - 11:06 am UTC

... also want to INSERT the enqueued messages to Other Non-AQ Relational Tables. ...

does not compute, don't know what it means to "insert an enqueued message" like that. The message would be enqueued, a process would dequeue and that process can do whatever it likes (including inserting into other tables...)

you would not be using logical change records here, you would be queuing a message, something else dequeues it and processes it.

Think of it like email. You send an email to a subordinate telling them "do a, b and c". That is the "enqueue". They receive the email and do a, b, and c - whatever that entails.

Restricting number of parallel sessions for a stored procedure @ user level

February 08, 2008 - 1:09 am UTC

Reviewer: Dheeraj from India

Hi Tom,

We have a third party Java application that makes multiple Oracle stored procedure calls, say, 30, at a time. Now, we want DB to accept only 10, 15 or 20 calls of this procedure, based on the parameter value, controlled by us.

Pls note that we wish to achieve this @user level and not at system level.

Basically, depending on the load size and DB availability, we wish to restrict no. of parallel sessions of the same procedure @ the DB side, for a particular user.

Any pointer will be highly appreciated.

Will putting following statements inside the stored proc. help in any way:
------------------------------
parallel_sessions_count := 10;
ALTER SYSTEM SET SHARED_SERVERS = parallel_sessions_count;
------------------------------

Many thanks,

Dheeraj
Tom Kyte

Followup  

February 08, 2008 - 7:38 am UTC

that statement would do nothing for you. Well, maybe really hurt you, but that is about it.

shared servers are an instance thing. You said "at the user, not the system", think about it (make sure you understand what shared servers are, how they work, what they do - see the concepts guide)

resource manager - active session pool
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14231/dbrm.htm#sthref3237


Does paralell processing actualy help?

February 08, 2008 - 8:33 am UTC

Reviewer: Vinicius Pacheco from SP, Brasil

I was wondering if the split process described in your first answer to this thread actually bring a significant reduction in processing time...

Ok, we split the processing in 4 processes, but the data still in the same disk. The head can't read 4 positions of same disk at same time, so it will generate some I/O contention... Or I'm wrong?

Thanks for this amazing source of information that your site is.
Tom Kyte

Followup  

February 08, 2008 - 9:21 am UTC

well, it is doubtful that you only READ - correct.

You read, then you process, you read, then you process.

So, maybe not 4 processes, maybe just two, but parameterize it - see what you see with different degrees - it'll matter how much time you spend "processing" (and then we'd need to look at how many cpu's you have)

parallel processing can make a difference even on a single cpu machine with a single disk - not as much on a machine with 32 cpus and 128 disks - but no one would expect that.

Chunk script stopped working

March 11, 2008 - 7:25 am UTC

Reviewer: Phillip Lewis from UK

Sir,

I have used your 'chunk' script successfully for many years. However, last weekend our database was upgraded to 'Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production' and the script now runs forever, well 2 hours before killing it! Can you please let me know if this chunking concept should work with 10g or should another method be used.

I thank you in anticipation for your help with this matter.

Phillip Lewis

Tom Kyte

Followup  

March 11, 2008 - 9:04 pm UTC

should work just dandy

have you gathered stats on the SYS tables *as well as* the fixed tables?

I suspect a bad plan derived from bad statistics.

Rowid Parellelism.....

March 12, 2008 - 9:56 am UTC

Reviewer: Vivek Sharma from Mumbai, India

Hi,

I recently used this rowid parallelism introduced by Tom. It works perfectly fine and is fruitful, when you are short of resources.

A purge, that took more than 12-14 hours, was achieved in less than 3 hours using this method. The Server was short of resources (12 CPU), hence, PARALLEL hint also failed to achieve what I could with ROWID Parallelism.

Regards
Vivek

October 20, 2008 - 4:52 am UTC

Reviewer: A reader

Hi,

Can you please elaborate on the rowID parallelism code ? I have a table with six fields in it and also a BLOB on a source machine. I have 100 million records in it and by doing direct insert its taking a lot of time .I am unable to understand the code that you posted in the first thread. I want to insert the values of this source table into target by using dblink option "@" i.e migrating the source table values into the target

I tried using insert into t (select * from t@dblink); but this is taking about 2 days with 100 million records in it.

I was successfully able to retrieve the 4 rowid's min and max for the source table using your split.sql script but please let me know how do we proceed about inserting the values in pl/sql ? (the code)

Thanks for your help
Tom Kyte

Followup  

October 21, 2008 - 11:51 am UTC

fire up 4 sqlplus sessions.

in session 1

insert into t select * from t@remote where rowid between &A and &B;

use one of the rowid ranges for &A and &B

In session 2

insert into t select * from t@remote where rowid between &A and &B;

use one of the OTHER (not the one you used above) rowid ranges...

and so on.


the rowid ranges break your big table into N non-overlapping chunks, the sum of which entirely cover your table - so now you can start N sessions, each reading a different bit of the source table.


You know - you might be much better off TRANSPORTING this data - have you thought about doing that - instead of read it over the network, reformatting it and writing it - just detach the datafiles from source, copy them and attach them to target?

October 20, 2008 - 10:26 am UTC

Reviewer: A reader

Hi tom,

I am the same reader from above ... 

Here is how my table looks like the one which has the BLOB
SQL> select count(*) from example;

  COUNT(*)
----------
  96519878
SQL> @split example 4

no rows selected

Please let me know how do I proceed with this ... 
any help would be appreciated

Tom Kyte

Followup  

October 21, 2008 - 12:04 pm UTC

give the ddl for example.

is example a local table
is example a SINGLE table - not partitioned

do you understand what the query you are using does - please take a couple of minutes to study it and understand it - it simply reads dba_extents to break the segment up - if example is not a single segment - there isn't anything for it to break up.

If it is a partitioned table, you probably have it easier already. just use the partition extended name in the select statement and use N sqlplus sessions where N = number of partitions.

Example please

October 21, 2008 - 3:37 pm UTC

Reviewer: GS

Can someone please share an example of using the split and the "simulation" procedure?

I am not clear about how the new tables (T1..T4) and the rowids from the split query are used.

Can I use the ntile and get the PK ranges and use them instead of rowid?

Thanks.
Tom Kyte

Followup  

October 21, 2008 - 4:13 pm UTC

t1, t2, t3, t4 and split were mutually exclusive.

I said

.. "Now, you don't need to split the table into 4 -- you can just use ROWID ranges. If you use this script I call "split""


meaning "instead of 4 tables, use one and split it up"

all split does is

read the extent information
create N non-overlapping rowid ranges.

Say you do "split 2" of a table, then you get two rowid ranges (a1,b1) and (a2,b2) it will be true that:

select * from t where rowid between a1 and b1
union all
select * from t where rowid between a2 and b2

is the same as

select * from t;


we just broke the table into two non-overlapping pieces that completely cover the table.

Hence, if you wanted to do something big to T, you could

a) split it into N chunks - N rowid ranges
b) run N jobs - passing into each a different rowid range - simultaneously in parallel


that is all.


If you have access to my book Expert Oracle Database Architecture, see the chapter on parallel - I have more examples in there as well.

parallel

February 09, 2009 - 9:57 pm UTC

Reviewer: A reader


DIY Parallelism -- Can we split a selection of rows from Table?

July 08, 2009 - 7:15 am UTC

Reviewer: Rahul Nimmagadda from India

I read your article on DIY Parallelism in your book "Effective Oracle by Design". The ROWID split procedure talks about splitting a whole table. Is it possible to have such a split for some subset of rows which will be selected by a where clause. I mean, can we split the result of a select query (which will have only a WHERE clause) into N chunks where some processing can be done on each chunk.

Thanks,
Rahul
Tom Kyte

Followup  

July 08, 2009 - 3:20 pm UTC

I also mentioned NTILE in that writing - you would probably use that.

Say you have a table with a single column primary key (object_id in ALL_OBJECTS) and you wanted to split up the processing of any users objects into 4 equal sized sets. It would look like this:

ops$tkyte%ORA10GR2> variable o varchar2(30)
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec :o := 'SCOTT'

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select min(object_id), max(object_id), count(*), nt
  2    from (
  3  select object_id, ntile(4) over (order by object_id) nt
  4    from all_objects
  5   where owner = :o
  6         )
  7   group by nt
  8  /

MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)         NT
-------------- -------------- ---------- ----------
         51146          51148          3          1
         51149          51151          3          2
         58116          58130          3          4
         52698          55357          3          3

ops$tkyte%ORA10GR2> exec :o := 'SYS'

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> /

MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)         NT
-------------- -------------- ---------- ----------
             2           8631       5719          1
          8632          15176       5719          2
         20896          58476       5719          4
         15177          20895       5719          3




and your process would receive as input the min/max primary key values and use that in their where clause (instead of rowid ranges)

Thanks

July 09, 2009 - 12:46 am UTC

Reviewer: Rahul Nimmagadda from India

Tom,

As always, Thanks a lot. I will let you know if I face any problems. Hopefully, not expecting. :)

Ntile approach

September 15, 2009 - 6:22 pm UTC

Reviewer: A Reader from UK

Hi Tom,

Isn't it that if we use the min-max Ntile range (object id in the above example) the range may include object_id of objects other than that of the user specified. i.e. in the first example if for NT=1 if I select objects 'where object_id between 51146 and 51148' there is a chance that the result may include objects owned by users other than 'scott'.

Thanks

Tom Kyte

Followup  

September 16, 2009 - 8:53 am UTC

so what? I didn't care about that, it wasn't a 'requirement' in my example, I just wanted to process EVERY ROW in the table

If you wanted just "scotts" rows, you would:

ops$tkyte%ORA10GR2> select nt, min(object_id), max(object_id), count(*)
  2    from (select object_id, ntile(4) over (order by object_id) nt
  3            from all_objects
  4                   where owner = 'SCOTT' )
  5   group by nt;

        NT MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)
---------- -------------- -------------- ----------
         1         175243         175246          4
         2         175247         178239          4
         4         178244         178252          3
         3         178240         178243          4



and then your query to retrieve data to process would of course be:
select * 
  from all_objects 
 where object_id between :MIN and :MAX 
   and owner = 'SCOTT';



just a variation on a theme, using a technique and adapting it for your specific needs.

split on large table

October 07, 2009 - 4:53 pm UTC

Reviewer: Paul from NJ

Tom
we have a temporary table which does not have a primary key. Right now, it is doing row-by-row cursor processing.
we are looking into parallelize the plsql processing by split or NTILE and run it parallel.

a) In the cursor, we have order by clause. How to do split without affecting the logic.

b) as there is no primary key, Is NTILE applicable in this case?

How to do range-wise split using order by in the pl/sql cursor ?


Code is here
===================
DECLARE
CURSOR ren_pol_cur IS
SELECT a.*
,ROWID ROW_ID
FROM etl_sdm.ren_pol_prod_temp a
ORDER BY
a.pol_no
,a.pol_eff_date
,a.ann_stmt_co_code
,a.product_renewal_abbr
,a.ipl_update_ts ASC;
/****************************************************************************************************/
oldrec ren_pol_cur%ROWTYPE;
newrec ren_pol_cur%ROWTYPE;
/****************************************************************************************************/
v_flag BOOLEAN :=TRUE ;
dsp_in BOOLEAN :=FALSE ;
v_counter NUMBER :=0 ;
v_dsp_gwp NUMBER :=0 ;
v_dsp_div_no oldrec.dsp_div_no%TYPE ;
v_dsp_sec_code oldrec.dsp_sec_code%TYPE ;
v_dsp_puc oldrec.dsp_puc%TYPE ;


/****************************************************************************************************/
BEGIN
OPEN ren_pol_cur;
LOOP
FETCH ren_pol_cur INTO newrec ;

EXIT WHEN ren_pol_cur%NOTFOUND ;
/****************************************************************************************************/
IF NOT
(
oldrec.pol_no = newrec.pol_no AND
oldrec.pol_eff_date = newrec.pol_eff_date AND
oldrec.ann_stmt_co_code = newrec.ann_stmt_co_code AND
oldrec.product_renewal_abbr = newrec.product_renewal_abbr
) AND
v_flag = FALSE
THEN
IF dsp_in = FALSE
THEN
INSERT INTO fei.ren_pol_prod
VALUES ( oldrec.pol_no
,oldrec.pol_eff_date
,oldrec.ann_stmt_co_code
,oldrec.product_renewal_abbr
,oldrec.renewable_ind
,oldrec.pol_exp_date
,oldrec.pol_dtl_link
,oldrec.sum_gwp
,oldrec.proc_date
,oldrec.producer_no
,oldrec.duns_no
,oldrec.premium_market_segment_code
,oldrec.insured_name
,oldrec.branch_code
,oldrec.working_branch_cd
,oldrec.dsp_div_no
,oldrec.dsp_sec_code
,oldrec.dsp_puc
,oldrec.aigrm_contract_no
,oldrec.prior_pol_no
,oldrec.source_system_id
);
v_dsp_gwp := newrec.max_gwp ;
v_dsp_div_no := newrec.dsp_div_no ;
v_dsp_sec_code := newrec.dsp_sec_code ;
v_dsp_puc := newrec.dsp_puc ;
ELSIF dsp_in = TRUE
THEN
INSERT INTO fei.ren_pol_prod
VALUES (oldrec.pol_no
,oldrec.pol_eff_date
,oldrec.ann_stmt_co_code
,oldrec.product_renewal_abbr
,oldrec.renewable_ind
,oldrec.pol_exp_date
,oldrec.pol_dtl_link
,oldrec.sum_gwp
,oldrec.proc_date
,oldrec.producer_no
,oldrec.duns_no
,oldrec.premium_market_segment_code
,oldrec.insured_name
,oldrec.branch_code
,oldrec.working_branch_cd
,v_dsp_div_no
,v_dsp_sec_code
,v_dsp_puc
,oldrec.aigrm_contract_no
,oldrec.prior_pol_no
,oldrec.source_system_id
);
v_dsp_gwp := newrec.max_gwp ;
v_dsp_div_no := newrec.dsp_div_no ;
v_dsp_sec_code := newrec.dsp_sec_code ;
v_dsp_puc := newrec.dsp_puc ;
END IF;

ELSE
IF v_flag = FALSE AND
newrec.dsp_div_no = oldrec.dsp_div_no AND
newrec.dsp_sec_code = oldrec.dsp_sec_code AND
newrec.dsp_puc = oldrec.dsp_puc
THEN
DSP_IN := FALSE;
ELSIF v_flag = FALSE
THEN
DSP_IN := TRUE ;
IF newrec.max_gwp >= v_dsp_gwp
THEN v_dsp_gwp := newrec.max_gwp ;
v_dsp_div_no := newrec.dsp_div_no ;
v_dsp_sec_code := newrec.dsp_sec_code ;
v_dsp_puc := newrec.dsp_puc ;
END IF;

ELSIF v_flag = TRUE
THEN
v_dsp_gwp := newrec.max_gwp ;
v_dsp_div_no := newrec.dsp_div_no ;
v_dsp_sec_code := newrec.dsp_sec_code ;
v_dsp_puc := newrec.dsp_puc ;
END IF;
END IF;
v_flag:=FALSE;
oldrec:=newrec;
END LOOP;
CLOSE ren_pol_cur;
IF dsp_in = FALSE
THEN
INSERT INTO fei.ren_pol_prod
VALUES ( oldrec.pol_no
,oldrec.pol_eff_date
,oldrec.ann_stmt_co_code
,oldrec.product_renewal_abbr
,oldrec.renewable_ind
,oldrec.pol_exp_date
,oldrec.pol_dtl_link
,oldrec.sum_gwp
,oldrec.proc_date
,oldrec.producer_no
,oldrec.duns_no
,oldrec.premium_market_segment_code
,oldrec.insured_name
,oldrec.branch_code
,oldrec.working_branch_cd
,oldrec.dsp_div_no
,oldrec.dsp_sec_code
,oldrec.dsp_puc
,oldrec.aigrm_contract_no
,oldrec.prior_pol_no
,oldrec.source_system_id
);
ELSIF dsp_in = TRUE
THEN
INSERT INTO fei.ren_pol_prod
VALUES (oldrec.pol_no
,oldrec.pol_eff_date
,oldrec.ann_stmt_co_code
,oldrec.product_renewal_abbr
,oldrec.renewable_ind
,oldrec.pol_exp_date
,oldrec.pol_dtl_link
,oldrec.sum_gwp
,oldrec.proc_date
,oldrec.producer_no
,oldrec.duns_no
,oldrec.premium_market_segment_code
,oldrec.insured_name
,oldrec.branch_code
,oldrec.working_branch_cd
,v_dsp_div_no
,v_dsp_sec_code
,v_dsp_puc
,oldrec.aigrm_contract_no
,oldrec.prior_pol_no
,oldrec.source_system_id
);
END IF;
COMMIT;
END;
/

missing rows using 10000 as maximum rownum

October 22, 2009 - 5:06 am UTC

Reviewer: Jan Leers from Belgium

I implemented your solution as:
rowid >= min_rowid AND (next_min_rowid IS NULL OR rowid < next_row_id)

The reason for this is when using 10000 as my maximum row_number, i get missing rows.

This approach is working, but it would be easier to use between. Is their a way to find out the real maximum row_number in a block.

Regards, jan
Tom Kyte

Followup  

October 23, 2009 - 1:22 pm UTC

I don't know what you did, this:

rowid >= min_rowid AND (next_min_rowid IS NULL OR rowid < next_row_id)

means nothing to me, I don't see it in context of anything.

...
The reason for this is when using 10000 as my maximum row_number, i get missing
rows.
...

I don't think so, can you prove that to me? You cannot have more than 10,000 rows on a block.

missing rows using 10000 as maximum rownum

October 27, 2009 - 10:50 am UTC

Reviewer: Jan Leers from Belgium

I'm sorry, I can't seem to reproduce the case.
However, I did read somewhere that the maximum rownumber in a block is 32767.

Regards, Jan
Tom Kyte

Followup  

October 27, 2009 - 12:10 pm UTC

then bump it up


but - tell me how, given that there is block overhead and a row takes way more than one byte - and the maximum block size is 32k..............

the number to hold the row number is a signed 2 byte integer.

But the maximum number of rows/block is way under that limit.

Works Great!

January 13, 2010 - 7:40 am UTC

Reviewer: Bruce Szalwinski from Seattle, WA

We used the splitter technique recently to update 3.2B records across 20 tables using 10 parallel jobs. Total time was just over 6 hours. Very cool. Thanks for the new tool.
Tom Kyte

Followup  

January 18, 2010 - 4:20 pm UTC

wait till 11gr2 with dbms_parallel_execute :)

it completely automates this process - 100%

January 31, 2010 - 11:18 am UTC

Reviewer: venkatswamy from India

Hi Sir,
Iam using 10g R2 database.
I need to generate 2000000000 simulations. Iam splitting into 10 times using 10 parallel jobs using dbms_job package. It is taking almost 8 hours to finish the task. But I want to reduce time to 2 hours.
job_queue_processes = 10(maximum as per DBA of my database).

Please help me.
Tom Kyte

Followup  

February 01, 2010 - 10:23 am UTC

make your simulation run 4 times faster???

I mean SERIOUSLY - what could *anyone* say.


but have your DBA re-read the documentation
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams089.htm#REFRN10077

not that it'll run faster - if you have 4'ish cpus, 10 is about the top number you want (2xcpus would be about the MAXIMUM)

February 01, 2010 - 9:56 pm UTC

Reviewer: venkatswamy from India

Tom, 
Thank you very much for your help.
I have asked to DBA to increase job_queue_processes value to 50. He was increased. I have tried to run 50 jobs parallelly like below but it is also taken same time (8 hours). 

FOR I IN 1..50 LOOP
     DBMS_JOB.SUBMIT (vJob,'PKG_JP_SIMULATION.SP_JP_GET_SIMULATIONS('||lSIMULATION||');');
     INSERT INTO CP_SIMLN_JOBS VALUES (vJob);
    END LOOP;
    COMMIT;

Can you please tell me the process in order to reduce the time.

show parameter parallel

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_execution_message_size      integer     2152
parallel_instance_group              string      
parallel_max_servers                 integer     120
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0

SQL> show parameter job

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     50


Tom Kyte

Followup  

February 02, 2010 - 12:16 pm UTC

... Can you please tell me the process in order to reduce the time....

I already did, make your process run faster.


You don't say how many cpu's you have, but I doubt it is 25, so 50 is probably way over the top. did you read what I said about 2*NUMBER OF CPUS ??? and that only applies if you are the only thing on the machine.

February 24, 2010 - 4:18 am UTC

Reviewer: A reader

Hi Tom,

I had and issue simmilar to this after running the procedure i got an error message :

ORA-01410: Invalid ROWID

ORA-06512: at "APPS.XXX_GAPP_UPDATE_UPDATECLIENT1" line 96

ORA-06512:at line 3


when i check the error stack i find this statement highlighted in yellow

for i in c1(branch1_code) loop

what could be the cause, this procedure takes several hours executing and it use to take just 4 secs to execute.



Tom Kyte

Followup  

March 01, 2010 - 8:47 am UTC

are you using parallel query?

A different approach

March 04, 2010 - 3:01 am UTC

Reviewer: Emre Yavuz from TURKEY

Hi,
I ve just found this thread and I am surprised that there are actually other people trying do what I ve done. Actually my purpose was to write a procedure that will improve our ETL performance by dividing tables into chunks on the remote source databases. In the above approach, its very nice to have rowid intervals with just one query, but the problem arises when you have very large databases as we do. In our case, doing a simple query on DBA_EXTENTS takes hours, so the approach would loose its effectiveness. In my approach, I first create the queryable version of DBA_EXTENTS which I ve discovered after long hours of trial and error work. Then the procedure first divides the block ranges into desired number of chunks and then converts those block ranges into rowid ranges. All these takes 2 minutes at most for a 1 Tb table on a 70 Tb database.
Next I will try to upload the scripts, I hope I can...

Enjoy...
Tom Kyte

Followup  

March 04, 2010 - 9:50 am UTC

in fact, in 11g R2, this is a builtin feature of the database. dbms_parallel_execute.

and before you say dba_extents is too slow, have you even tried the query provided above? It has worked pretty well many times in the past on many databases in the past.....

scripts

March 04, 2010 - 3:11 am UTC

Reviewer: Emre Yavuz from TURKEY

/*MAIN TABLE THAT CONTAINS ROWID CHUNKS */
CREATE TABLE T_PARALLEL_CHUNKS
(
CHUNK_ID NUMBER,
OWNER VARCHAR2(30 BYTE),
SEGMENT_NAME VARCHAR2(30 BYTE),
PARTITION_NAME VARCHAR2(30 BYTE),
START_ROWID ROWID,
END_ROWID ROWID
);

/*THIS IS TEMPORARY, SO YOU CAN USE A GLOBAL TEMPORARY TABLE INSTEAD */

CREATE TABLE T_SEGMENT_BLOCKS
(
CHUNK_ID NUMBER,
OWNER VARCHAR2(30 BYTE),
SEGMENT_NAME VARCHAR2(30 BYTE),
PARTITION_NAME VARCHAR2(30 BYTE),
REL_FNO NUMBER,
START_BLOCKID VARCHAR2(30 BYTE),
END_BLOCKID VARCHAR2(30 BYTE),
OBJECT_ID NUMBER
);


/* THIS IS OPTIONAL */

CREATE INDEX IX_PARALLEL_CHUNKS_01 ON T_PARALLEL_CHUNKS(OWNER,SEGMENT_NAME,CHUNK_ID,START_ROWID,END_ROWID);


/* SHOULD BE CREATED UNDER SYS */

CREATE OR REPLACE PROCEDURE SYS.P_PREPARE_EXTENTS_VIEW AS
BEGIN

EXECUTE IMMEDIATE 'alter session set optimizer_index_cost_adj=100';
BEGIN
EXECUTE IMMEDIATE 'DROP TABLE DB_EXTENTS_TEMP';
EXCEPTION
WHEN OTHERS THEN
NULL;
END;
EXECUTE IMMEDIATE 'CREATE TABLE DB_EXTENTS_TEMP NOLOGGING PCTFREE 0 PARALLEL 8 AS
select /*+ no_use_nl full(o) full(s) parallel(o 4) full(so) parallel(so 4) parallel(s 4)*/ NVL(u.name, ''SYS'') owner, o.name segment_name, o.subname partition_name,
so.object_type segment_type, s.type#,
f.file# header_file, s.block# header_block,
s.blocks,
s.file# relative_fno, o.dataobj#,S.TS# tablespace_id,NVL(s.spare1,0) segment_flags
from sys.user$ u, sys.obj$ o,
sys.sys_objects so, sys.seg$ s,
sys.file$ f
where s.file# = so.header_file
and s.block# = so.header_block
and s.ts# = so.ts_number
and o.obj# = so.object_id
and o.owner# = u.user# (+)
and s.type# = so.segment_type_id
and o.type# = so.object_type_id
and s.ts# = f.ts#
and s.file# = f.relfile#';

BEGIN
DBMS_STATS.GATHER_TABLE_STATS('SYS','DB_EXTENTS_TEMP',degree=>8);
END;

EXECUTE IMMEDIATE 'CREATE INDEX IX_DB_EXTENTS_TEMP ON DB_EXTENTS_TEMP(tablespace_id,relative_fno) NOLOGGING PARALLEL 4';

EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW V_DB_EXTENTS AS
select /*+ ordered use_nl(e) use_hash(f ds) no_index(f) no_index(ds) no_parallel(f) no_parallel(ds) */
ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
f.file# file_id, e.ktfbuebno block_id,
e.ktfbueblks BLOCKS, e.ktfbuefno REL_FNO
from sys.file$ f,DB_EXTENTS_TEMP ds,sys.x$ktfbue e
where e.ktfbuesegfno = ds.relative_fno
and e.ktfbuesegbno = ds.header_block
and e.ktfbuesegtsn = ds.tablespace_id
and ds.tablespace_id = f.ts#
and e.ktfbuesegtsn=f.ts#
and e.ktfbuefno=f.relfile#
and bitand(NVL(ds.segment_flags, 0), 1) = 1
and bitand(NVL(ds.segment_flags,0), 65536) = 0
';
END;
/

/* THESE PRIVILEGES SHOULD BE GRANTED TO THE OWNER OF THE MAIN PROCEDURE */

GRANT EXECUTE ON SYS.P_PREPARE_EXTENTS_VIEW TO <OWNER OF THE PROCEDURE>;

GRANT SELECT ON SYS.V_DB_EXTENTS TO <OWNER OF THE PROCEDURE>;

/* MAIN PROCEDURE */

CREATE OR REPLACE PROCEDURE P_CREATE_TABLE_CHUNKS(PAR_SOURCE_DB VARCHAR2,PAR_OWNER VARCHAR2,PAR_SEGMENT_NAME VARCHAR2,PAR_SUBOBJECT_NAME VARCHAR2 DEFAULT NULL, PAR_CHUNK_COUNT NUMBER,PAR_POPULATE_EXTENTS NUMBER DEFAULT 0) AS
nBlkcount NUMBER;
nBlks NUMBER;
nBlkid NUMBER;
nBlkPerJob NUMBER;
nBlkAssigned NUMBER:=0;
nTemp NUMBER;
nJobid NUMBER:=1;
nobjid NUMBER;
strSrcDb VARCHAR2(30):='';
TYPE OBJECT_REC IS RECORD (OBJECT_ID NUMBER,OBJECT_NAME VARCHAR2(30), SUBOBJECT_NAME VARCHAR2(30));
TYPE OBJECT_TAB IS TABLE OF OBJECT_REC INDEX BY BINARY_INTEGER;
Object_Table OBJECT_TAB;
TYPE EXTENT_REC IS RECORD (OWNER VARCHAR2(30), SEGMENT_NAME VARCHAR2(30), PARTITION_NAME VARCHAR2(30), SEGMENT_TYPE VARCHAR2(30), FILE_ID NUMBER,BLOCK_ID NUMBER, BLOCKS NUMBER, REL_FNO NUMBER);
TYPE EXTENT_TAB IS TABLE OF EXTENT_REC INDEX BY BINARY_INTEGER;
Extent_Table EXTENT_TAB;
BEGIN
IF PAR_SOURCE_DB IS NOT NULL THEN
strSrcDb:='@' || PAR_SOURCE_DB;
END IF;
IF PAR_POPULATE_EXTENTS=1 THEN
EXECUTE IMMEDIATE 'BEGIN SYS.P_PREPARE_EXTENTS_VIEW' || strSrcDb || '; END;';
END IF;
EXECUTE IMMEDIATE 'SELECT SUM(BLOCKS) FROM SYS.V_DB_EXTENTS' || strSrcDb || ' WHERE OWNER=''' || PAR_OWNER || ''' AND SEGMENT_NAME=''' || PAR_SEGMENT_NAME || ''' AND NVL(PARTITION_NAME,''XXXX'') LIKE NVL(''' || PAR_SUBOBJECT_NAME || ''',NVL(PARTITION_NAME,''XXXX''))' INTO nBlkcount;
nBlkPerJob:=CEIL(nBlkcount/PAR_CHUNK_COUNT);
EXECUTE IMMEDIATE 'SELECT DISTINCT DATA_OBJECT_ID,OBJECT_NAME,SUBOBJECT_NAME FROM DBA_OBJECTS' || strSrcDb || ' WHERE OWNER=''' || PAR_OWNER || ''' AND OBJECT_NAME='''
|| PAR_SEGMENT_NAME || ''' AND NVL(SUBOBJECT_NAME,''XXXX'') LIKE NVL(''' || PAR_SUBOBJECT_NAME || ''',NVL(SUBOBJECT_NAME,''XXXX'')) AND DATA_OBJECT_ID IS NOT NULL'
BULK COLLECT INTO Object_Table;
FOR j IN 1..Object_Table.Count LOOP
EXECUTE IMMEDIATE 'DELETE FROM T_SEGMENT_BLOCKS WHERE OWNER=''' || PAR_OWNER || ''' AND SEGMENT_NAME=''' || PAR_SEGMENT_NAME || ''' AND NVL(PARTITION_NAME,''XXX'') = ''' || NVL(Object_Table(j).Subobject_name,'XXX') || '''';
COMMIT;
EXECUTE IMMEDIATE 'DELETE FROM T_PARALLEL_CHUNKS WHERE OWNER=''' || PAR_OWNER || ''' AND SEGMENT_NAME=''' || PAR_SEGMENT_NAME || ''' AND NVL(PARTITION_NAME,''XXX'') = ''' || NVL(Object_Table(j).Subobject_name,'XXX') || '''';
COMMIT;

EXECUTE IMMEDIATE 'SELECT /*+ NO_PARALLEL(A) */ * FROM SYS.V_DB_EXTENTS' || strSrcDb || ' A WHERE OWNER=''' || PAR_OWNER || ''' AND SEGMENT_NAME=''' || PAR_SEGMENT_NAME || ''' AND NVL(PARTITION_NAME,''XXX'')=''' || NVL(Object_Table(j).Subobject_name,'XXX') || ''' ORDER BY REL_FNO,BLOCK_ID,BLOCKS'
BULK COLLECT INTO Extent_table;
FOR i IN 1..Extent_table.Count LOOP
nBlks:=Extent_table(i).BLOCKS;
nBlkid:=Extent_table(i).BLOCK_ID;
WHILE nBlks>0 LOOP
INSERT INTO T_SEGMENT_BLOCKS VALUES (njobid,Extent_Table(i).Owner,Extent_Table(i).Segment_Name,Extent_Table(i).Partition_Name,
Extent_Table(i).Rel_Fno,nBlkid,(nBlkid+LEAST(nBlks,nBlkPerJob-nBlkAssigned))-1,Object_Table(j).Object_Id);
ntemp:=LEAST(nBlks,nBlkPerJob-nBlkAssigned);
nBlkid:=nBlkid+ntemp;
nBlkAssigned:=nBlkAssigned+nTemp;
nBlks:=nBlks-ntemp;
IF nBlkAssigned=nBlkPerJob THEN
nJobid:=nJobid+1;
nBlkAssigned:=0;
END IF;
END LOOP;
END LOOP;
COMMIT;
INSERT INTO T_PARALLEL_CHUNKS SELECT CHUNK_ID,OWNER,SEGMENT_NAME,PARTITION_NAME,
DBMS_ROWID.rowid_create(1,Object_table(j).Object_id,REL_FNO,START_BLOCKID,0),DBMS_ROWID.rowid_create(1,OBJECT_ID,REL_FNO,END_BLOCKID+1,0)
FROM T_SEGMENT_BLOCKS WHERE OWNER=PAR_OWNER AND SEGMENT_NAME=PAR_SEGMENT_NAME AND NVL(PARTITION_NAME,'XXX') LIKE NVL(Object_table(j).subobject_name,NVL(PARTITION_NAME,'XXX'));
COMMIT;
END LOOP;
END;
/

Usage

March 04, 2010 - 3:24 am UTC

Reviewer: Emre Yavuz from TURKEY

First execute the procedure to get the chunks. In this case it will create 64 chunks each of which consists of multiple rowid ranges.

EXEC P_CREATE_TABLE_CHUNKS(NULL,'SUBS','SUBSCRIBERS',NULL, 64,1);

Now lets query the records for one of the chunks.
Here the main problem is to access the source table over ROWIDs , not over a full table scan, so a ROWID hint will be necessary for most of the times.

SELECT /*+ ORDERED ROWID(B) USE_NL(A B) */ *
FROM T_PARALLEL_CHUNKS A, SUBS.SUBSCRIBER B
WHERE B.ROWID >= A.START_ROWID
AND B.ROWID < A.END_ROWID AND A.OWNER='SUBS'
AND A.SEGMENT_NAME = 'SUBSCRIBER'
AND CHUNK_ID = 1;

For UPDATE and DELETE s a PLS/SQL block is necessary to make use of ROWID access. i.e :

DECLARE
BEGIN
FOR I IN (SELECT * FROM T_PARALLEL_CHUNKS WHERE OWNER='SUBS' AND SEGMENT_NAME = 'SUBSCRIBER' AND CHUNK_ID=1) LOOP
UPDATE /*+ ROWID(A) */ SUBS.SUBSCRIBER A SET SUBS_TYPE='01'
WHERE A.ROWID>=I.START_ROWID AND A.ROWID<I.END_ROWID;
COMMIT;
END LOOP;
END;

Now you can parallelize the queries using a loop by incrementing CHUNK_ID s.

Hope you can enjoy...
Regards...


Usefulness of p_iters parameter

August 14, 2010 - 3:22 am UTC

Reviewer: SV from San Jose, CA

In the procedure simulation what exactly does parameter p_iters does, what value should be passed to it and based on what condition?
Tom Kyte

Followup  

August 19, 2010 - 12:45 am UTC

in my very specific example (it was my code - for a simulation I ran) - p_iters told the stored procedure how many times to iterate - in my case it told it how many rows to insert since that is what I was simulating.


You would use entirely different parameters of course, you would be passing via that table any parameters your stored procedures run by the job queues would need to have access to.

Splitting dataset into Multiple Batches

November 05, 2010 - 3:57 am UTC

Reviewer: Kamesh from Bay Area, CA, USA

I have used LAG analytical function to split based on the derived value. However, I was not 100% successful with my requirement. But it may be useful.

My Requirement is as follows.

1) I have dataset for example 500 records which contains multiple lines for each order. Say, there are 7 Orders as follows.
Order# # of Lines
----------------------
123 44
345 40
567 57
789 110
987 96
654 104
321 49

Please remember that those are number of lines for each order #, which means if we select entire data we will get 500 Records not 7.

Now, I want to split the data by 100 records, so that I can submit each batch parallely using another concurernt program. But the split should happend based on Order. I do not want to split the Order. If Order has more than 100 lines though I want 100 lines per batch, in this case I want all 100+ lines in one batch.

The output batches should be as follows for above example.

New Batch Order# # of Lines
-----------------------------------
Yes 123 44
No 345 40
No 567 57 (It's also not a new batch
though the count crossed 100
as I don't want to send few
lines of one order in two
differnet batches. Hence all
141 lines should go into one
batch)
Yes 789 110
Yes 987 96
No 654 104(As above order has only 96,
it should proceed with next
order)
Yes 321 49 (As this is last batch and less
than 100 just submit this as
one batch).

Can this requirement achievable with one single SQL Query?????

Thanks

November 05, 2010 - 4:03 am UTC

Reviewer: Kamesh from Bay Area, CA, USA

Below is the query....that I am trying...which was not 100% successful...

_____________________________________________

SELECT CASE WHEN rn = 1 THEN
'Y'
ELSE
CASE WHEN new_order='Y' THEN
CASE WHEN LAG(b.cumulative_count) OVER (ORDER BY order_number) >= 100 THEN
'Y'
ELSE
CASE WHEN LAG(b.cumulative_count) OVER (ORDER BY order_number) = 1 THEN
'Y'
ELSE
'N'
END
END
ELSE
CASE WHEN b.cumulative_count >=100 AND new_order='Y' THEN
'Y'
ELSE
'N'
END
END
END new_batch,b.new_order,b.order_number, b.line_id, b.part_number, b.priority, b.department, b.quantity, b.cost,b.ship_confirmed_date,
b.delivery_detail_id, b.unique_id,b.use_tax,b.header_id,b.transactional_curr_code,b.org_id,b.list_cost,b.oa_country,b.oa_geocode,
b.address_id,b.st_country,b.st_state,b.st_county,b.st_city,b.st_postal_code,b.st_geocode,b.location_id,b.sf_country,b.region_2,
b.sf_postal_code
FROM (SELECT a.*,COUNT(a.rn) OVER (PARTITION BY order_number ORDER BY rn) cumulative_count
FROM (SELECT cio.order_number, cio.line_id, cio.part_number, cio.priority, cio.department, cio.quantity, cio.cost,
cio.ship_confirmed_date,cio.delivery_detail_id, cio.unique_id,use_tax,
CASE WHEN (cio.order_number = LAG (cio.order_number) OVER (ORDER BY cio.order_number)) THEN -- DO NOT CHANGE ORDER BY
'N'
ELSE
'Y'
END new_order, row_number() OVER(ORDER BY cio.order_number) rn,
attributes.header_id,attributes.transactional_curr_code,attributes.org_id,
attributes.list_cost,attributes.oa_country,attributes.oa_geocode ,attributes.address_id,
attributes.st_country,attributes.st_state,attributes.st_county,attributes.st_city,
attributes.st_postal_code,attributes.st_geocode,attributes.location_id,attributes.sf_country,
attributes.region_2,attributes.sf_postal_code
FROM xx_oe_orders cio,
(SELECT wdd.delivery_detail_id,oola.line_id,ooha.order_number,ooha.header_id,ooha.transactional_curr_code,oola.org_id,
oola.unit_list_price * oola.ordered_quantity list_cost,
asp.default_country oa_country,NVL (asp.sales_tax_geocode, 0000) oa_geocode,address.cust_acct_site_id address_id,
address.country st_country,DECODE(address.country,'US',address.state,'') st_state,address.county st_county,
address.city st_city,
DECODE (SUBSTR (address.postal_code, 1, 5),'28255', '28254',SUBSTR (address.postal_code, 1, 5)) st_postal_code,
REPLACE (REPLACE (REPLACE (SUBSTR (address.postal_code, 6), '-'), ' '),' ') st_geocode,
hl.location_id,hl.country sf_country,hl.region_2,SUBSTR (hl.postal_code, 1, 5) sf_postal_code
FROM apps.oe_order_headers_all ooha, apps.oe_order_lines_all oola, apps.ar_system_parameters_all asp,
apps.wsh_delivery_details wdd,apps.hr_locations hl,
(SELECT hcsua1.site_use_code,hp1.party_name,hca1.account_number,hp1.party_id,hca1.cust_account_id,hps1.party_site_id,
hcasa1.cust_acct_site_id,hcsua1.site_use_id,hl1.location_id,hl1.address1,hl1.address2,hl1.address3,hl1.city,
hl1.postal_code,hl1.state,hl1.province,hl1.country,hl1.county
FROM apps.hz_cust_acct_sites_all hcasa1, apps.hz_cust_site_uses_all hcsua1,
apps.hz_party_sites hps1, apps.hz_locations hl1,apps.hz_cust_accounts hca1, apps.hz_parties hp1
WHERE hcsua1.cust_acct_site_id = hcasa1.cust_acct_site_id
AND hps1.party_site_id = hcasa1.party_site_id
AND hl1.location_id = hps1.location_id
AND hcasa1.cust_account_id = hca1.cust_account_id
AND hp1.party_id = hca1.party_id) address
WHERE ooha.header_id = oola.header_id
AND oola.org_id = asp.org_id
AND asp.set_of_books_id = fnd_profile.value('GL_SET_OF_BKS_ID')
AND address.site_use_id = oola.ship_to_org_id
AND hl.location_id = wdd.ship_from_location_id) attributes
WHERE cio.delivery_detail_id IS NOT NULL
AND cio.use_tax IS NULL
AND cio.order_number IN ('44953668','42798765','44953668','45420685','46108299','47995168','48380513','48682387')
AND attributes.order_number = cio.order_number
AND attributes.line_id = cio.line_id
AND attributes.delivery_detail_id = cio.delivery_detail_id
ORDER BY cio.order_number) a) b
_____________________________________________

dbms_schedule

August 04, 2011 - 4:11 pm UTC

Reviewer: a reader from Boston USA

Hi Tom,

Thank you very much for showing some techniques using jobs to do parallel processing. I want to follow up on the thread for the following situation.

If I have a job created by the new dbms_schedule package running a PL/SQL procedure, and when I use dbms_scheduler.stop_job to stop it, how would the PL/SQL procedure react to the stop process, receiving an exception? This also applies to that a job runs within a window, and when it reaches the end of window, scheduler would stop the job, but how gracefully would the PL/SQL procedure be terminated?


My own test shows the PL/SQL procedure doesn't receive an exception. If that is the case, that would leave the PL/SQL procedure no chance to do some cleanup, and it would stay in unpredicable state.

What's your thought?

As always thank you very much in advance!

My simple test case:

create table test_job (start_dt date, end_dt date, JOB_NM VARCHAR2(30));

CREATE OR REPLACE PROCEDURE job_proc (in_job_nm VARCHAR2)
AS
BEGIN
INSERT INTO test_job (start_dt, job_nm) VALUES (sysdate, in_job_nm);
--the infinite loop is just to give me enough time to stop the job
loop
null;
end loop;
UPDATE test_job SET end_dt = SYSDATE WHERE job_nm = in_job_nm;
COMMIT;
exception
when others then
UPDATE test_job SET end_dt = SYSDATE WHERE job_nm = in_job_nm;
commit;
END;
/


exec dbms_scheduler.create_job(job_name=>'my_job', job_type=>'PLSQL_BLOCK',
job_action=>'begin JOB_PROC(''my_job'');end;', number_of_arguments=>0, auto_drop=>true, enabled=>false);

exec dbms_scheduler.enable('MY_JOB');

exec dbms_scheduler.stop_job('MY_JOB');

select * from test_job; --no rows returned
Tom Kyte

Followup  

August 04, 2011 - 7:21 pm UTC

we just sort of kill the job - your code - the job queue handles it gracefully, you just get killed and rolled back.

hence: you NEVER want to commit in job, you will not be restartable.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.