Skip to Main Content
  • Questions
  • Parallel union all, rollup, cube degradation while insert (11.2.0.4)

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Evgeny.

Asked: May 26, 2017 - 1:21 pm UTC

Last updated: June 06, 2017 - 1:58 am UTC

Version: 11.2.0.4

Viewed 1000+ times

You Asked

the problem is that query like this:(table creation script at the end)
  insert /*+no_append parallel(8)*/ into tmp_px0
  select count(distinct rn) from tmp_px1
  union all
  select count(distinct rn) from tmp_px2
  union all
  select count(distinct rn) from tmp_px3

uses 2*8*(number of union subqueries) px servers, while all union operations runs consequently(according to sql monitor(OEM) and v$px_session view)
(haven't found any reliable way to check how much parallel servers were used in already executed query(except parallel tab in sql monitor)
to check this in currently running query used this)
  select sq.SQL_TEXT, ps.QCSID, ps.QCSERIAL#, count(*) from v$px_session ps
  join v$session s on ps.SID=s.SID and ps.SERIAL#=s.SERIAL#
  join (select SQL_ID, max(SQL_TEXT) as SQL_TEXT from v$sql group by SQL_ID) sq on sq.SQL_ID=s.SQL_ID
  group by sq.SQL_TEXT, ps.QCSID, ps.QCSERIAL#;

question is - is there any workaround to use 2*8 total px process instead of 2*8*(number of union subqueries), without dividing insert into several inserts(witch is not always possible)

according to v$px_session px servers allocates consequently:processing each union adds 16 servers in v$px_session, so total count of used servers grows while query runs:16-32-48
in sql_monitor(parallel tab) at the end of execution they are grouped in 3 groups, each of 2 set of 8 process(all distinct)
since database time of these groups are equal, it seems that only 16 of them are active at any time

main problem is that if there are not enought free px servers or there are resource manager limit,
remaining union subqueries will be executed serial - witch may takes a lots of time,
and all this time all allocated parallel servers where locked by this query - so other queries of this resource group(or whole server if resource manager is not properly set) had to run serial

it doesn't reproduce for select without insert:runs each union consequently, uses 16 parallel servers(2 set of 8 process each) all the time query executes
  select /*+parallel(8)*/count(distinct rn) from tmp_px1
  union all
  select /*+parallel(8)*/count(distinct rn) from tmp_px2
  union all
  select /*+parallel(8)*/count(distinct rn) from tmp_px3;

but reproduces with materialized subquery and rollup,cube(which uses temporary materialized subquery inside)
while populating internal temporary table count of servers grows same way:16-32-48
  with tmp as (select /*+materialize*/ * from (
              select count(distinct rn) from tmp_px1
              union all
              select count(distinct rn) from tmp_px2
              union all
              select count(distinct rn) from tmp_px3
              ))
  select /*+parallel(8)*/* from tmp;

it doesn''t depend on whether dml is parallel or not(append hint with enabled parallel dml/no_append hint):
parallel dml add 8 addition px servers for insert, so total count grows: 24-40-56

test tables creation(made them big enought so single select runs at least a few seconds, so there are time to check v$px_session while each select runs)
drop table tmp_px0;
drop table tmp_px1;
drop table tmp_px2;
drop table tmp_px3;
create table tmp_px0(rn number);
create table tmp_px1(rn number);
create table tmp_px2(rn number);
create table tmp_px3(rn number);

begin
  for ii in 1..10 loop
    insert into tmp_px1 select 10000000*ii+rownum as rn from dual connect by rownum<10000000;
    commit;
  end loop;
end;
call dbms_stats.gather_table_stats(ownname => 'TMP', tabname =>'TMP_PX1', degree => 8, cascade => true);

insert into tmp_px2 select * from tmp_px1;
commit;
call dbms_stats.gather_table_stats(ownname => 'TMP', tabname =>'TMP_PX2', degree => 8, cascade => true);
insert into tmp_px3 select * from tmp_px1;
commit;
call dbms_stats.gather_table_stats(ownname => 'TMP', tabname =>'TMP_PX3', degree => 8, cascade => true);


hope you give me some advice
Thanks

and we said...

Evgeny,

The behavior you are seeing is expected in Oracle Database 11g.

Here is what is happening.

When executing a UNION ALL statement in parallel, in 11g, each branch is executed one at a time by a set of parallel server processes. This means each branch will use a new parallelizer. See https://blogs.oracle.com/datawarehousing/px-server-sets%2c-parallelizers%2c-dfo-trees%2c-parallel-groups%2c-lets-talk-about-terminology for more information on what a parallelizer is.

Typically the first branch will start executing and will allocate 2*DOP parallel server processes. Once the branch completes, the parallel server processes will be released. Then the second branch will do the same, and so on. When you run the just the query part of your statement and look at the Parallel tab in the SQL Monitor report you will see that the same set of parallel server processes get reallocated for each of different branches (that is if the same parallel server processes are available at that time of course). Therefore there is no parallel server processes accumulation, as at one point in time there will only be 2*DOP parallel server processes allocated.

However, when you introduce parallel DML things change. When there is a transaction involved, none of the parallel server processes involved in executing the statement are released until the user commits.

Therefore when a parallel DML statement involves a UNION ALL the execution will begin the same way, with the first branch allocating 2*DOP parallel server processes. However, when the first branch finishes the parallel server processes will not be released. The second branch will then allocate another 2*DOP parallel server processes, and so on.

So, with 3 branches, each with 2 sets of parallel server processes (producers and consumers) and another set of parallel server processes executing the INSERT, you will see (3*2*DOP)+DOP parallel server processes allocated.

For your example, where the DOP is 8 and we have 3 branches in the UNION ALL statement, the total number of parallel server processes allocated will be (3*2*8)+8 or 56.

This is exactly what I see when I execute your test case on 11g and query v$px_process or check the parallel tab in SQL Monitor https://sqlmaria.com/11g_px_allocation/ (4 sets or groups of parallel server process)

SQL> SELECT * FROM V$PX_PROCESS;

SERV STATUS           PID SPID                            SID    SERIAL#
---- --------- ---------- ------------------------ ---------- ----------
P009 AVAILABLE         34 4079
P034 AVAILABLE         59 4129
P006 AVAILABLE         31 4073
P037 AVAILABLE         62 4135
P039 AVAILABLE         64 4139
P003 AVAILABLE         28 4067
P007 AVAILABLE         32 4075
P018 AVAILABLE         43 4097
P044 AVAILABLE         69 4149
P005 AVAILABLE         30 4071
P050 AVAILABLE         75 4161
P051 AVAILABLE         76 4163
P015 AVAILABLE         40 4091
P055 AVAILABLE         80 4171
P054 AVAILABLE         79 4169
P028 AVAILABLE         53 4117
P021 AVAILABLE         46 4103
P043 AVAILABLE         68 4147
P004 AVAILABLE         29 4069
P031 AVAILABLE         56 4123
P010 AVAILABLE         35 4081
P016 AVAILABLE         41 4093
P017 AVAILABLE         42 4095
P025 AVAILABLE         50 4111
P020 AVAILABLE         45 4101
P000 AVAILABLE         18 4032
P036 AVAILABLE         61 4133
P008 AVAILABLE         33 4077
P002 AVAILABLE         27 4065
P040 AVAILABLE         65 4141
P022 AVAILABLE         47 4105
P042 AVAILABLE         67 4145
P029 AVAILABLE         54 4119
P026 AVAILABLE         51 4113
P033 AVAILABLE         58 4127
P011 AVAILABLE         36 4083
P049 AVAILABLE         74 4159
P045 AVAILABLE         70 4151
P032 AVAILABLE         57 4125
P046 AVAILABLE         71 4153
P041 AVAILABLE         66 4143
P027 AVAILABLE         52 4115
P014 AVAILABLE         39 4089
P047 AVAILABLE         72 4155
P053 AVAILABLE         78 4167
P024 AVAILABLE         49 4109
P035 AVAILABLE         60 4131
P013 AVAILABLE         38 4087
P052 AVAILABLE         77 4165
P019 AVAILABLE         44 4099
P023 AVAILABLE         48 4107
P038 AVAILABLE         63 4137
P030 AVAILABLE         55 4121
P012 AVAILABLE         37 4085
P048 AVAILABLE         73 4157
P001 AVAILABLE         19 4034

56 rows selected.


Unfortunately, there is no way around this is 11g.

Starting in Oracle Database 12c Release 1 the way we execute UNION ALL statements in parallel has completely changed with the introduction of concurrent UNION ALL, which is on by default if OPTIMIZER_FEATURE_ENABLED set to 12.1 or higher. Only 2*DOP parallel server processes will be allocated, which is exactly what I saw when I executed you testcase in 12c and query v$px_process or check the parallel tab in SQL Monitor https://sqlmaria.com/12c_px_allocation/ (1 set of parallel server processes)

</code>
SQL> SELECT * FROM V$PX_PROCESS;

SERV STATUS PID SPID SID SERIAL# IS_GV CON_ID
---- --------- ---------- ------- ---------- ---------- ----- ---
P000 AVAILABLE 34 3904 FALSE 0
P001 AVAILABLE 35 3906 FALSE 0
P002 AVAILABLE 43 3926 FALSE 0
P003 AVAILABLE 44 3928 FALSE 0
P004 AVAILABLE 45 3930 FALSE 0
P005 AVAILABLE 46 3932 FALSE 0
P006 AVAILABLE 47 3934 FALSE 0
P007 AVAILABLE 48 3936 FALSE 0
P008 AVAILABLE 50 4061 FALSE 0
P009 AVAILABLE 53 4063 FALSE 0
P00A AVAILABLE 54 4065 FALSE 0
P00B AVAILABLE 55 4067 FALSE 0
P00C AVAILABLE 56 4069 FALSE 0
P00D AVAILABLE 57 4071 FALSE 0
P00E AVAILABLE 58 4073 FALSE 0
P00F AVAILABLE 59 4076 FALSE 0

16 rows selected.
</code>

More information on the new concurrent UNION ALL can be found at http://http://docs.oracle.com/database/122/VLDBG/types-parallelism.htm#VLDBG1472

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

thank you for your reply

Evgeny Bunakov, June 02, 2017 - 8:04 am UTC

second part of article you linked is very close to what i've been searching for, never met term "parallelizer" before
https://blogs.oracle.com/datawarehousing/multiple-parallelizers

However, when you introduce parallel DML things change. When there is a transaction involved, none of the parallel server processes involved in executing the statement are released until the user commits. 

very valuable information, seems it applies to serial dml with parallel subqueries too(alter session disable parallel dml or no_append hint)

And seems the only workaround is to make parallel_max_servers big enough(since parallelizers are nonconcurrent, that shouldn't increase the overall load by such dml)
and set Resource Manager to limit parallel resources only for querying activities
Connor McDonald
June 06, 2017 - 1:58 am UTC

or upgrade to 12 :-)

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.