Question and Answer

Connor McDonald

Thanks for the question, Biswaranjan.

Asked: May 26, 2017 - 12:30 pm UTC

Last updated: February 03, 2023 - 6:39 am UTC

Version: 10g

Viewed 10K+ times

You Asked

How do I frame the select query that will divide the result set into the required chunks, using CREATE_CHUNKS_BY_SQL with by_rowid => true?
For example:

I have a table containing 100 records, and after filtering my select query fetches 50 of them. I want to create 5 chunks of 10 records each, and then delete those 50 fetched records in 5 chunks using the corresponding start_id and end_id.
I want to delete the fetched 50 records using the statement below:

L_sql_string := 'delete <table_name> WHERE rowid BETWEEN :start_id AND :end_id';

DBMS_PARALLEL_EXECUTE.run_task(task_name      => L_task_name,
                               sql_stmt       => L_sql_string,
                               language_flag  => DBMS_SQL.NATIVE,
                               parallel_level => 10);

Kindly assist.

Thanks in advance!!

and Connor said...

For a simple delete like that, all you need do is:

begin
  -- register the task first, then chunk the whole table by rowid
  dbms_parallel_execute.create_task('my_task');

  dbms_parallel_execute.create_chunks_by_rowid(task_name   => 'my_task',
                                               table_owner => 'OWN',
                                               table_name  => 'MY_TABLE',
                                               by_row      => true,
                                               chunk_size  => ???);
end;


which gives a rowid distribution across the entire table. Then in the statement you pass to *run_task*, you can add the other criteria, ie,

delete MY_TABLE WHERE rowid BETWEEN :start_id AND :end_id
and COL1 = 123
and COL2 = 456
etc etc
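
Putting the whole flow together, a minimal sketch might look like this (the chunk size, parallel level and COL1 filter are illustrative values, not from the original question; with by_row => true, chunk_size is the approximate number of rows per chunk):

declare
  l_task varchar2(30) := 'my_task';
begin
  -- register the task, then carve the whole table into rowid ranges
  dbms_parallel_execute.create_task(l_task);
  dbms_parallel_execute.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => 'OWN',
                                               table_name  => 'MY_TABLE',
                                               by_row      => true,
                                               chunk_size  => 10000);

  -- the extra filtering criteria live in the per-chunk statement
  dbms_parallel_execute.run_task(
     task_name      => l_task,
     sql_stmt       => 'delete MY_TABLE where rowid between :start_id and :end_id and COL1 = 123',
     language_flag  => dbms_sql.native,
     parallel_level => 10);

  -- clean up once the task has finished
  dbms_parallel_execute.drop_task(l_task);
end;
/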

Rating

  (5 ratings)


Comments

Georgi, November 05, 2021 - 3:37 pm UTC

Hi Connor,

What about not chunking the entire table, but only the result of the WHERE clause?
If we have a table with 1 billion records but we want to delete only 10 million, it would be nice to chunk only those 10 million and not the entire table, so we process only the 10 million. DBMS_PARALLEL_EXECUTE creates chunks for the entire table and then applies the WHERE clause.
Is there a way around this? I've watched your video about making DBMS_PARALLEL_EXECUTE even faster, but I can't seem to find a solution. How do I chunk only the records that I actually want to delete?

Thanks!
Connor McDonald
November 09, 2021 - 3:55 am UTC

If you have a numeric range, then you can pass any SQL you like into DBMS_PARALLEL_EXECUTE, eg

declare
  l_sql clob;
begin
  -- the chunking query must return (start_id, end_id) pairs, one row per chunk
  l_sql := 'select col1, col2 from bigtable where ....';
  dbms_parallel_execute.create_task('my_task');
  dbms_parallel_execute.create_chunks_by_sql(task_name => 'my_task',
                                             sql_stmt  => l_sql,
                                             by_rowid  => false);
end;
/
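
For example, a chunking query over a hypothetical numeric key ID could bucket just the qualifying rows into 100 roughly equal ranges (a sketch; the table, column and bucket count are illustrative):

l_sql := 'select min(id) start_id, max(id) end_id
          from  (select id, ntile(100) over (order by id) bucket
                 from   bigtable
                 where  ....)
          group by bucket';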

Georgi, November 09, 2021 - 2:06 pm UTC

Unfortunately I do not have a sequential numeric column. What I have is a DATE column, but I guess it won't work with that. My plan was to use ROWID in something like this:
DECLARE
    l_sql_1                CLOB;
    l_sql_2                CLOB;
BEGIN
    l_sql_1 := q'| select min(rid) start_id, max(rid) end_id |' || 
               q'| from ( |' || 
               q'| select rowid rid, ntile(1000) over(order by rowid) nt |' || 
               q'| from HUGE_TABLE |' || 
               q'| where INSERT_DATE between to_date('01-SEP-18') and to_date('01-OCT-18') |' || 
               q'|      ) |' || 
               q'| group by nt  |';

    l_sql_2 := 'delete from HUGE_TABLE where rowid between :start_id and :end_id ';
    
    dbms_parallel_execute.create_task('TEST_TASK');

    dbms_parallel_execute.create_chunks_by_sql(
        task_name => 'TEST_TASK',
        sql_stmt  => l_sql_1,
        by_rowid  => true
    );

    dbms_parallel_execute.run_task(
        task_name      => 'TEST_TASK',
        sql_stmt       => l_sql_2,
        language_flag  => dbms_sql.native,
        parallel_level => 16
    );
END;
/

My concern is that the range between min(rid) and max(rid) may include ROWIDs that should not be deleted.

In general, on a monthly basis I have to delete from this table (and many others), which has over 1 billion records; the records for one month are ~20 million.
This is why I am looking for a way to chunk those ~20 million and use dbms_parallel_execute for the deletion.

Connor McDonald
November 10, 2021 - 4:11 am UTC

Overlaps with other segments are not an issue, because we won't scan an extent that does not belong to the table, eg

SQL> create table t ( x char(1000));

Table created.

SQL> create table t1 ( x char(1000));

Table created.

SQL>
SQL> begin
  2  for i in 1 .. 40
  3  loop
  4    insert into t select 'x' from dual connect by level <= 10;
  5    insert into t1 select 'x' from dual connect by level <= 10;
  6  end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL>
SQL> select segment_name, block_id
  2  from   dba_extents
  3  where  segment_name in ('T','T1')
  4  and owner = user
  5  order by 2;

SEGMENT_NAME                     BLOCK_ID
------------------------------ ----------
T                                   37640
T1                                1031760
T                                 1031768
T1                                1031776
T                                 1031784
T1                                1031792
T                                 1031800
T1                                1058048
T                                 1058056
T1                                1058064
T                                 1058072
T1                                1058080
T                                 1058088
T1                                1058096
T                                 1058112
T1                                1058120

16 rows selected.

SQL>
SQL> select min(rowid), max(rowid) from t;

MIN(ROWID)         MAX(ROWID)
------------------ ------------------
AAEULoABAAAAJMLAAA AAEULoABAAAECVHAAG

SQL> select min(rowid), max(rowid) from t1;

MIN(ROWID)         MAX(ROWID)
------------------ ------------------
AAEULpABAAAD75TAAA AAEULpABAAAECVPAAG

SQL>
SQL> select count(*) from t;

  COUNT(*)
----------
       400

SQL>
SQL> select count(*) from t
  2  where rowid >= (select min(rowid) from t )
  3  and   rowid <= (select max(rowid) from t );

  COUNT(*)
----------
       400


Georgi, November 10, 2021 - 7:53 am UTC

Thanks for the example!

But is this still valid if we include a WHERE clause? Are there going to be overlaps, or scans of extents that do not belong to the table?

eg:
select min(rid) start_id, max(rid) end_id
from (
    select rowid rid, ntile(1000) over(order by rowid) nt
    from HUGE_TABLE
    where INSERT_DATE between to_date('01-SEP-18') and to_date('01-OCT-18')
    )
group by nt;

Connor McDonald
November 11, 2021 - 5:42 am UTC

Yes, but in your *processing* code (ie, the thing that you call from dbms_parallel_execute) you would have


select ...
from huge_table
where rowid between :start_id and :end_id
and INSERT_DATE between to_date('01-SEP-18') and to_date('01-OCT-18') <<<=====

thus ensuring you only get the rows of interest.
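
Applied to the delete in the earlier block, the per-chunk statement would then be something like this (a sketch; the explicit format masks are an addition here, to avoid depending on the session's NLS date format):

l_sql_2 := 'delete from HUGE_TABLE
            where rowid between :start_id and :end_id
            and   INSERT_DATE between to_date(''01-SEP-18'',''DD-MON-RR'')
                                  and to_date(''01-OCT-18'',''DD-MON-RR'')';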


RUN_TASK call does not wait for all jobs processing chunks to finish??

Narendra, January 27, 2023 - 9:57 am UTC

Hello Connor/Chris,

I am trying to make sense of the flow of processing that I witnessed yesterday when we were running a stored procedure in production using DBMS_PARALLEL_EXECUTE.
My procedure code that uses DBMS_PARALLEL_EXECUTE is something like this:

-- Create the TASK
l_taskname :=  'TESTTASK'||to_char(sysdate,'DDMONYYYYHH24MI');
DBMS_PARALLEL_EXECUTE.CREATE_TASK (l_taskname);
-- Chunk the table by REC_ID ranges
l_chunk_sql := 'select min(rec_id) from_rec, max(rec_id) to_rec
    from (
       select ntile(10) over (partition by hdr_id order by rec_id) grp, rec_id, hdr_id
      from REF_TABLE
        where hdr_id = '||p_hdrid||' 
      )
      group by grp 
      order by grp' ;
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(TASK_NAME => l_taskname, SQL_STMT  => l_chunk_sql, BY_ROWID  => false);
--
l_sql_stmt := 'BEGIN PKG_TEST.PRC_TESTPROC(p_hdrid => '||p_hdrid||', p_from_rec => :start_id, p_to_rec => :end_id); END;';
DBMS_PARALLEL_EXECUTE.RUN_TASK(TASK_NAME => l_taskname,  SQL_STMT => l_sql_stmt,  LANGUAGE_FLAG  => DBMS_SQL.NATIVE, PARALLEL_LEVEL => 16);
--
-- Check status and report errors if any
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_taskname);
IF l_status = 'FINISHED_WITH_ERROR' OR l_status = 'CRASHED' THEN
 g_err_msg := 'Error running task '||l_taskname ;
 logit(p_runid, NULL, g_err_msg);
END IF;
-- Set session identifier
dbms_application_info.set_module(module_name => NULL, action_name => NULL);
dbms_application_info.set_client_info(NULL);
logstatus(p_runid, NULL, 'E');


Yesterday, we ran this procedure in production database (4-node RAC, 11.2.0.4).
To my surprise, the line of code logstatus(p_runid, NULL, 'E'); got executed BEFORE all the chunks finished processing. The difference was a handful of seconds but I was not expecting this behaviour.
I checked the documentation https://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#ARPLS67375 and it clearly appears to say this:

"This procedure returns only when all the chunks are processed. In parallel cases, this procedure returns only when all the Job slaves finished."

So how could the RUN_TASK call have completed before all chunks got processed? In all non-production databases where we tested this, we never observed this behaviour; it always worked as expected, i.e. the RUN_TASK call waited for all chunks to be processed before control returned to the procedure.
I am at a loss about how to explain this. More importantly, because it happened only in production RAC database, it makes it very difficult for me to try to reproduce this and hence provide a working test case.

Would you know what could have caused this? Any known bug or some logical reason?

Thanks in advance
Connor McDonald
January 31, 2023 - 7:01 am UTC

Certainly something I've never heard of occurring. I had a quick look in our bug database and could not find anything that seemed to match that behaviour.

I think you'll need to log a call with Support on that one. If you can get your 'PRC_TESTPROC' to log timestamps on completion as well as logging when run_task completes, that should give enough evidence to help them diagnose it.
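
The chunk-level timestamps that DBMS_PARALLEL_EXECUTE records itself are also worth capturing as evidence; comparing the latest END_TS against the time your logstatus call ran should show whether RUN_TASK really returned early. For example (querying the standard USER_PARALLEL_EXECUTE_CHUNKS view; the task-name pattern is illustrative):

select chunk_id, status, start_ts, end_ts, error_message
from   user_parallel_execute_chunks
where  task_name like 'TESTTASK%'
order  by chunk_id;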

Re: RUN_TASK call does not wait for all jobs processing chunks to finish??

Narendra, February 01, 2023 - 6:41 pm UTC

Hello Connor,

Thank you for taking the time to help with this, and my apologies for being silly.

"If you can get your 'PRC_TESTPROC' to log timestamps on completion as well as logging when run_task completes, that should give enough evidence to help them diagnose it."

Your above comment pushed me to look at the source code of my procedure again. In fact, this is exactly what I had done with the original version of my procedure, which was single-threaded. In order to incorporate DBMS_PARALLEL_EXECUTE, I ended up creating a wrapper procedure, and while I managed to make all the other necessary changes, I missed removing the logging of "Begin" and "End" status from the core procedure. This, combined with DBMS_PARALLEL_EXECUTE running the core procedure concurrently in multiple sessions, meant the "End" status was incorrectly updated by one of the "chunks" BEFORE all the chunks finished processing.
There is a follow-up step in the chain that is designed to keep polling for the "End" status and, when it is found, proceed with the remaining steps. Somehow this was never observed in any of the non-production environments, but it caught us out in the production run, when the follow-up step ended up processing "incomplete" chunks.
On reflection, I feel silly about missing essential code changes and, more importantly, about the "lazy design". I am thinking this would have been an (almost) perfect use case for DBMS_ALERT or even chained DBMS_SCHEDULER jobs.
Once again, very sorry for asking a question based on an incorrect assumption. There is no issue with any of the DBMS_PARALLEL_EXECUTE API behaviour; after fixing my code, everything now works as expected.
Connor McDonald
February 03, 2023 - 6:39 am UTC

Glad you got to the bottom of it.

Thanks for letting us know.

More to Explore

DBMS_PARALLEL_EXECUTE

More on PL/SQL routine DBMS_PARALLEL_EXECUTE here