Hello Connor/Chris,
I am trying to make sense of the flow of processing that I witnessed yesterday when we were running a stored procedure in production using DBMS_PARALLEL_EXECUTE.
My procedure code that uses DBMS_PARALLEL_EXECUTE is something like this
-- Create the TASK
l_taskname := 'TESTTASK'||to_char(sysdate,'DDMONYYYYHH24MI');
DBMS_PARALLEL_EXECUTE.CREATE_TASK (l_taskname);
-- Chunk the table by NUM_COL
l_chunk_sql := 'select min(rec_id) from_rec, max(rec_id) to_rec
from (
select ntile(10) over (partition by hdr_id order by rec_id) grp, rec_id, hdr_id
from REF_TABLE
where hdr_id = '||p_hdrid||'
)
group by grp
order by grp' ;
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(TASK_NAME => l_taskname, SQL_STMT => l_chunk_sql, BY_ROWID => false);
--
l_sql_stmt := 'BEGIN PKG_TEST.PRC_TESTPROC(p_hdrid => '||p_hdrid||', p_from_rec => :start_id, p_to_rec => :end_id); END;';
DBMS_PARALLEL_EXECUTE.RUN_TASK(TASK_NAME => l_taskname, SQL_STMT => l_sql_stmt, LANGUAGE_FLAG => DBMS_SQL.NATIVE, PARALLEL_LEVEL => 16);
--
-- Check status and report errors if any
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_taskname);
IF l_status = 'FINISHED_WITH_ERROR' OR l_status = 'CRASHED' THEN
g_err_msg := 'Error running task '||l_taskname ;
logit(p_runid, NULL, g_err_msg);
END IF;
-- Set session identifider
dbms_application_info.set_module(module_name => NULL, action_name => NULL);
dbms_application_info.set_client_info(NULL);
logstatus(p_runid, NULL, 'E');
Yesterday, we ran this procedure in production database (4-node RAC, 11.2.0.4).
To my surprise, the line of code
logstatus(p_runid, NULL, 'E'); got executed BEFORE all the chunks finished processing. The difference was a handful of seconds but I was not expecting this behaviour.
I checked the documentation
https://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#ARPLS67375 and it clearly appears to say this
This procedure returns only when all the chunks are processed. In parallel cases, this procedure returns only when all the Job slaves finished.So how could RUN_TASK call have completed before all chunks got processed? In all non-production databases where we tested this, we never observed this behaviour and it always worked as expected i.e. RUN_TASK call waiting for all chunks to be processed before the control is returned to the procedure.
I am at a loss about how to explain this. More importantly, because it happened only in production RAC database, it makes it very difficult for me to try to reproduce this and hence provide a working test case.
Would you know what could have caused this? Any known bug or some logical reason?
Thanks in advance