Number of Chunks in DBMS_PARALLEL_EXECUTE

Question and Answer

Connor McDonald

Thanks for the question, Satender.

Asked: March 13, 2019 - 3:46 pm UTC

Last updated: May 12, 2021 - 7:00 am UTC

Version: 11.2.0.3.0

Viewed 10K+ times

You Asked

Hi TOM,

I have a data movement activity from one table to another. Due to the huge data size in both tables, I planned to use DBMS_PARALLEL_EXECUTE so that I can finish the move quickly. I am creating chunks based on ROWID.
The challenge is that the table from which I am creating the chunks is partitioned, with 562 partitions.
When I run this, it creates 96,718 chunks, which is far too many, and the job takes 6 hours to complete. I cannot allow the system to run this activity for that long.
Please note the parameters responsible for this many chunks:
In the view DBMS_PARALLEL_EXECUTE_EXTENTS I get 96,718 records (the same as the number of chunks) with a block size of 16, and all rows have segment type TABLE PARTITION.
AVG_ROW_LEN - 237
DB_BLOCK_SIZE - 8192
CHUNK_SIZE - 10000

Please help me work out how to solve this. Another table, which is not partitioned, created 7568 chunks and moved 56 M data in 1 hour, but for this one I am facing challenges.

Thanks and Regards
Satender D

and Chris said...

You're creating 96,718 chunks?! Yikes!

Does your database have 100,000 CPUs? I seriously doubt it...

Each parallel process needs CPU to run. So when the number of processes >> number of CPUs/cores, you're going to have some serious queueing going on. And you'll starve everything else the database is trying to do of resources (assuming you haven't limited the number of parallel processes).

You need each chunk to process significantly more rows. And have far fewer parallel chunks running.

7568 chunks and moved 56 M data in 1 hour

56 M => million? It depends on what exactly you're doing, but I'd say an hour is still a long time to process that size of data.

Again, I'd be impressed if your database has 7,000+ cores available.

Try with a much smaller number of chunks. Like 10.
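For illustration, one way to get roughly 10 evenly sized chunks is to chunk by a numeric key instead of ROWID, using CREATE_CHUNKS_BY_SQL. This is only a sketch, not the OP's code: the table, its (assumed unique) numeric ID column and the processing procedure are hypothetical placeholders.

-- sketch only: my_table, id and process_rows are hypothetical
begin
  dbms_parallel_execute.create_task('TEN_CHUNKS');

  -- NTILE(10) splits the ids into 10 buckets of (nearly) equal row counts;
  -- each bucket's min/max id becomes one chunk
  dbms_parallel_execute.create_chunks_by_sql(
    task_name => 'TEN_CHUNKS',
    sql_stmt  => q'[select min(id), max(id)
                    from  (select id, ntile(10) over (order by id) bkt
                           from   my_table)
                    group by bkt]',
    by_rowid  => false);

  dbms_parallel_execute.run_task(
    task_name      => 'TEN_CHUNKS',
    sql_stmt       => 'begin process_rows(:start_id, :end_id); end;',
    language_flag  => dbms_sql.native,
    parallel_level => 10);
end;
/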

If you need more help, please share a test case showing exactly what you're doing:

- create tables
- insert statements for some sample data (doesn't need to be much, just enough so we can understand what you're doing)
- the code for your parallel processing


Comments

little big (commit) world

Racer I., March 15, 2019 - 6:51 am UTC

Hi,

Caveat: I've never used DBMS_PARALLEL_EXECUTE (DPE), but I've thought about it some. We use our own batching and AQ frameworks. I also have some experience with parallel pipelining (aka Oracle forking).

> assuming you haven't limited the number of parallel processes).
> Try with a much smaller number of chunks. Like 10.

Weird. I'd have thought creating lots of chunks (of 1000 rows each or so) and setting a reasonable parallel degree (say 16) would be the regular way to drive DPE? That is, never let it flood the system by running all chunks in parallel. That would have the machine flailing wildly.

I guess this might also be related to the recommendation to commit rarely. 10 chunks = 10 commits. If so, I'm still not convinced this is sustainable in a mixed OLTP/batch system. Keeping locks that long doesn't sound right.

Running 10 (or 16) chunks AND committing every 1000 rows or so might be possible (basically the way our PL/SQL batch frameworks work). There's still the possibility that the workload is not entirely even, so the total running time is determined by the biggest/slowest chunk. With smaller chunks and X workers (basically our Java and AQ frameworks) all workers will finish at about the same time, and thus the machine is used more evenly and the batch finishes faster.

regards,
Chris Saxon
March 15, 2019 - 7:09 am UTC

I'd have thought creating lots of chunks (of 1000 rows each or so) and setting a reasonable parallel degree (say 16) would be the regular way to drive DPE?

It depends on what your goal for using DPE is.

If it's "run the process as fast as possible" then you want as many chunks (N) as your resources allow, with each handling num_rows/N rows.

If it's "trickle feed a background process using as few resources as possible", then yes, processing a "small" number of rows (e.g. 10k) with few parallel chunks (N) is a valid approach. Here the process could run for many hours though, which is the problem for the OP. To combat this they need to have fewer chunks processing more rows each.
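Whichever goal applies, it's worth checking how many chunks a task actually created and how they're progressing. Something along these lines (substitute your own task name) shows the breakdown:

select status, count(*)
from   user_parallel_execute_chunks
where  task_name = 'MY_TASK'
group  by status;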

workers of AQ, unite

Racer I., March 15, 2019 - 7:32 am UTC

Hi,

> few parallel chunks (N)

Mmmh. Maybe a different sense of "chunk" than the one I'm thinking of is in play. I would use three numbers:

X number of rows to process (say 1 million)
Y number of chunks (with a chunk size of 10,000 that would be 100)
Z number of parallel processes (say 16, the parallel_level parameter to run_task; let's call them workers)

So 16 workers start processing the first 16 chunks. As each finishes, it picks the next unprocessed chunk. It ends when the last chunk is processed. Here each worker would process about 7 chunks one after another.
The runtime for 16 chunks (of 70,000 rows or so each) and 16 workers would be about the same if the processing is perfectly uniform. If it isn't, the slowest-chunk problem appears. With smaller chunks no worker falls idle prematurely.
Typical case: using simple chunking with, say, ID ranges of 10,000 each when not all IDs in each range exist or some are filtered out by some logic.
This can be addressed (for example PERCENTILE_CONT over the actual filtered statement), but with smaller chunking you wouldn't need to. Our own frameworks use the actual rows as input, so they don't have the problem of uneven effective chunk sizes. But their code is usually that much more complex than a simple INSERT as to make the processing time uneven even so.
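As a rough sketch of how those three numbers map onto the DPE calls (all names here are placeholders, and the processing procedure is assumed to take the start/end ROWIDs): chunk_size drives Y and parallel_level is Z.

begin
  dbms_parallel_execute.create_task('SMALL_CHUNKS');

  -- Y: with ~1 million rows and 10,000 rows per chunk, about 100 chunks
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'SMALL_CHUNKS',
    table_owner => user,
    table_name  => 'MY_TABLE',
    by_row      => true,
    chunk_size  => 10000);

  -- Z: 16 workers; each picks up the next pending chunk as it finishes one
  dbms_parallel_execute.run_task(
    task_name      => 'SMALL_CHUNKS',
    sql_stmt       => 'begin process_rows(:start_id, :end_id); end;',
    language_flag  => dbms_sql.native,
    parallel_level => 16);
end;
/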

That brings another question to the fore:
> I have a data movement activity from one table to another.
Why not do this with a simple parallel INSERT AS SELECT? Unless there are good reasons against it, nothing can beat that.

regards,
Chris Saxon
March 15, 2019 - 8:29 am UTC

That's what I was getting at by "trickle feed a background process" ;)

Why not do this with a simple parallel INSERT AS SELECT? Unless there are good reasons against it, nothing can beat that.

Indeed.
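For reference, a minimal sketch of that direct approach (the table names, parallel degree and filter are placeholders; on 11.2 parallel DML has to be enabled at session level):

alter session enable parallel dml;

insert /*+ append parallel(t2, 8) */ into table_2 t2
select /*+ parallel(t1, 8) */ *
from   table_1 t1
where  ...;  -- placeholder: whatever identifies the rows to move

commit;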

Questions and thoughts

Stew Ashton, March 15, 2019 - 8:55 am UTC

As Chris stated, it's crucial to know whether there is OLTP activity at the same time as this huge batch job.

- Is the target table being modified at the same time? This is the worst case, since the batch and OLTP can slow each other down, or there can be "lost updates".
- Is the source table being modified at the same time? If so, using DBMS_PARALLEL_EXECUTE with multiple commits can cause inconsistent data to be copied to the target.
- Is there other activity that does not touch the two tables? If so, the batch job should not use all the CPU/memory/IO resources.

Next question: can you create chunks based on something other than ROWID? Creating the ROWID chunks may take more time than the actual copying.
If both tables are partitioned in the same way, why not make each partition a chunk?
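One hedged way to do that, assuming a numeric partition key with one (or only a few) distinct values per partition, is to feed CREATE_CHUNKS_BY_SQL a query returning one (start, end) pair per key value and let partition pruning confine each chunk to its partition. Everything named below (part_key, table_1, copy_range) is a placeholder, not the OP's code.

begin
  dbms_parallel_execute.create_task('COPY_BY_PART');

  -- one (start_id, end_id) pair per distinct partition-key value
  dbms_parallel_execute.create_chunks_by_sql(
    task_name => 'COPY_BY_PART',
    sql_stmt  => 'select part_key, part_key from table_1 group by part_key',
    by_rowid  => false);

  dbms_parallel_execute.run_task(
    task_name      => 'COPY_BY_PART',
    sql_stmt       => 'begin copy_range(:start_id, :end_id); end;',
    language_flag  => dbms_sql.native,
    parallel_level => 10);
end;
/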

Now some general remarks.

You should not mix two types of parallelism:
either do an INSERT...AS SELECT in parallel
or do DBMS_PARALLEL_EXECUTE with each process running serially.

With DBMS_PARALLEL_EXECUTE, there can be more chunks than processes. Each process can do several chunks, but there will be one commit per chunk.

With answers to the two questions above, there will be a much better chance to provide appropriate advice.

Best regards,
Stew

Chris Saxon
March 15, 2019 - 8:57 am UTC

Great questions Stew. To provide effective help we really need to understand what the OP is trying to do here.

the charge of trickle brigade

Racer I., March 15, 2019 - 9:03 am UTC

Hi,

Ok. But I don't see how you assume that this trickle feed is slower than the big chunk version. I think it can only be faster (if the workload is anything but perfectly even). Both utilise the system the same way (16 parallel processes in this case), but the smaller chunks never under-utilise it towards the end.

If you were to "unleash" the small chunk version it would run all 100 in parallel. I think this would bring the system to a standstill so is not an option.

Whatever intermediate level of unleashing you choose because the system can take it (say 32 workers) can equally be used for either version.

regards,
Chris Saxon
March 15, 2019 - 9:37 am UTC

if the workload is anything but perfectly even

You should aim to process the same number of rows in each chunk, right?

Assuming you do, I don't see how the trickle feed can "only be faster"!

It's like the "one delete removing N rows vs looping M times, deleting N/M rows and committing" debate. A single delete is almost always going to be faster.

But the problem's worse for DPE because the cost of splitting into chunks is higher.

There's overhead every time you start a new chunk. The database has to submit the job, start it up when there's a slot available, commit the transaction, etc.

10k rows is basically nothing for modern systems. Unless you're doing something really funky, I'd expect the database to do this in under a second. So these overheads become a non-trivial matter.

If you were to "unleash" the small chunk version it would run all 100 in parallel. I think this would bring the system to a standstill so is not an option.

That depends on your resources. If you've got a small, 4-core server then yes, you'll bring the system to its knees. If you're lucky enough to have a full rack Exadata with spare capacity, the database should be able to handle it without breaking a sweat.

So you need to figure out how many parallel processes you have enough spare resources to safely run at the same time. Say this is 10. If you then split your workload up into 10 even chunks, I don't see how this can be "only slower" than 10 processes which handle a fixed number of rows each (where this number << total rows to process, so the total number of chunks is >> 10).

The caveat would be if you need to process in small batches to avoid blocking issues with customer transactions.

as always it depends

Racer I., March 15, 2019 - 10:26 am UTC

Hi,

> You should aim to process the same number of rows in each chunk, right?

Aim, yes. Succeed not always (with DPE, maybe lack of knowledge on my part, because we haven't actually used it). Our frameworks have that part down. But still they exhibit uneven processing speed when the work per row is not constant.

This only comes into play if there is some complex processing involved. Say each row is a customer order. Most are small but some are resellers -> skew possible. Small chunks can adjust to this.

> There's overhead every time you start a new chunk

Yeah. I was thinking of our cases where the processing dominates the runtime. The chunk size would be determined by locking time limits, i.e. how many rows per, say, 10 seconds or at most a minute. That might be 10,000, or maybe only 10 if each row has lots to do. Also, again, we don't use DPE and I have misgivings about its chunk startup capabilities. For the framework side I would expect Oracle to "do its best". For the app side I would probably try to use JSON strings with all 1000 IDs listed instead of the ID BETWEEN X AND Y scheme, so processing can start right away.

> Unless you're doing something really funky

So yes I was in the funky camp. Non-funky -> MASS-DML.

There can of course still be many use cases where mass DML is too big and the logic is small enough to go the big-chunk way. In particular, that could well be the case for the OP.

As I said, I think once you have a Z for what the system can take, both big and small chunks can use that Z. Big will win if the logic is simple enough to allow even processing. Small wins if it's not, by enough to compensate for the startup overhead and then some.

regards,

"Even processing"

Stew Ashton, March 15, 2019 - 11:12 am UTC

Having the same number of rows does not necessarily mean "even processing".

If one chunk has 10,000 rows in 10,000 different blocks, that requires much more processing than if they are in 100 different blocks.

That is one reason why chunking by blocks can be preferable, provided we can process all the rows in the block at the same time.

If there are several chunks per process, "even processing" is easier to get since the first process to finish a chunk will start the next available chunk.

To be clear, "chunking by blocks" means setting the parameter "by_row" to FALSE in the CREATE_CHUNKS_BY_ROWID procedure.
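For example (a sketch, with a placeholder table name), chunking by blocks rather than by rows looks like this; chunk_size is then a number of blocks:

begin
  dbms_parallel_execute.create_task('BLOCK_CHUNKS');

  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'BLOCK_CHUNKS',
    table_owner => user,
    table_name  => 'MY_TABLE',
    by_row      => false,   -- chunk by blocks
    chunk_size  => 1000);   -- blocks per chunk
end;
/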

Regards,
Stew
Chris Saxon
March 15, 2019 - 2:52 pm UTC

Very true.

Satender, March 15, 2019 - 12:10 pm UTC

Hi,
To make it simpler, I would like to give you the details below.
I have Table_1 and Table_2 with the same structure.
Table_1 gets around 2-3 million rows every day and Table_2 has data in the billions.
Now I want to move Table_1's data to Table_2 every day.
I am doing an insert into Table_2 and a delete from Table_1.
dbms_parallel_execute.create_chunks_by_rowid(
  task_name   => 'MY_TASK',
  table_owner => SYS_CONTEXT('USERENV', 'SESSION_USER'),
  table_name  => 'TABLE_1',
  by_row      => true,
  chunk_size  => 10000);

dbms_parallel_execute.run_task(
  task_name      => 'MY_TASK',
  sql_stmt       => 'BEGIN PR(:start_id, :end_id); END;',
  language_flag  => DBMS_SQL.NATIVE,
  parallel_level => 20);

Now the challenge is that TABLE_1 is partitioned, and it is creating chunks based on the table partitions.
When I simply do the insert and delete it takes 3-4 hours, as other processes are also running.
This is the reason I want to do it with DBMS_PARALLEL_EXECUTE.
Also, it does not create any data issues: data will not be duplicated in Table_2, nor will data in Table_1 be deleted before it has been copied.
Please, I want your help in finding another way.
What I think is that I can try CREATE_CHUNKS_BY_SQL, which could decrease the number of chunks.
And I don't have 100,000 CPUs. At any one time it will run 20 parallel jobs (parallel_level => 20), and as each job finishes processing a chunk it will pick up the next one.
The number of chunks is monitored from DBA_PARALLEL_EXECUTE_CHUNKS.
Chris Saxon
March 15, 2019 - 2:43 pm UTC

Why do you need to use dbms_parallel_execute?

Why not just:

insert into table2
  select * from table1 where...


As you're partitioned, I'd also hope you could convert your delete to a:

alter table table1 (drop|truncate) partition ...


Which is "instant". If you can't, your partition scheme seems fishy to me...

As Racer mentions, partition exchange is another option too.

If you can't wait

Racer I., March 15, 2019 - 12:31 pm UTC

Hi,

> I have Table_1 and Table_2 with the same structure.
> Table_1 gets around 2-3 million rows every day and Table_2 has data in the billions.
> Now I want to move Table_1's data to Table_2 every day.

That does sound like a potential case for partition exchange. It requires some sophisticated setup. In particular, table_2 must have a partitioning structure compatible with table_1's, and the rows to move must reside in a partition (or several) of their own and not be mixed with rows that should not be moved. At least the system already uses partitioning, so it's available and some knowledge can be assumed.
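A very rough sketch of what that could look like, assuming the rows to move already sit on their own in a staging table whose structure matches the target partition; every name here is hypothetical, and the real setup needs matching columns, constraints and indexes:

-- swap the staging table's contents into the (empty) target partition
alter table table_2
  exchange partition p_today
  with table staging_today
  including indexes
  without validation;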

regards,

Satender, March 15, 2019 - 3:00 pm UTC

Hi Chris,

I can't truncate the partition as there is other data in the same partition which I don't want to delete.
So I have to use a delete.
And I am doing the insert into Table_2, but using parallel chunks.

Table_2 is huge and Table_1 also has millions of rows to process.
So it is taking a huge amount of time.
And the same action is running at the same time in several sessions, with different conditions in the where clause.
Chris Saxon
March 15, 2019 - 4:46 pm UTC

I can't truncate the partition as there is other data in the same partition which I don't want to delete.

One of the big benefits of partitioning is quick deletion of old data by dropping/truncating the partition... So you may want to revisit how you're partitioning...

I'm not sure why the size of table2 matters here. You're just inserting into it, right?

Satender, March 15, 2019 - 4:52 pm UTC

Hi Chris,

Partitions are there for several other reasons, but here I want to delete some data.

Example: the table is partitioned on column A, and below is the data.

A B
--------
A 1
A 2
A 3
A 4
A 5
A 6
A 7

Now, in the above data, I want to delete the rows where B <= 5.
So I can't truncate the partition.

And I have noticed it is reading from the view DBMS_PARALLEL_EXECUTE_EXTENTS and creating that number of chunks.
Is there any way we can reduce the number of chunks?
Connor McDonald
March 18, 2019 - 10:18 am UTC

We create chunks based on extents because, by definition, they will be split into different but contiguous parts of storage. So when we do "rowid between :start and :end" we don't get overlaps.

The number of chunks is not much of an issue as long as you control the concurrency of the execution with job_queue_processes. If this is not possible, then look at a different call to dbms_parallel_execute to use numeric ranges or similar.
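The "numeric ranges" option would look roughly like this (a sketch only; it assumes TABLE_1 has a numeric ID column, which may not match the real schema):

begin
  dbms_parallel_execute.create_task('MY_TASK_BY_ID');

  -- chunk_size here is the width of the ID range each chunk covers
  dbms_parallel_execute.create_chunks_by_number_col(
    task_name    => 'MY_TASK_BY_ID',
    table_owner  => sys_context('userenv', 'session_user'),
    table_name   => 'TABLE_1',
    table_column => 'ID',
    chunk_size   => 250000);
end;
/

Concurrency is then still capped by the parallel_level passed to run_task (and, system-wide, by job_queue_processes).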

A reader, May 10, 2021 - 8:27 am UTC

Hi, I also encountered the same problem. No matter how I set the two parameters chunk_size and parallel_level, there were tens of thousands of chunks. Later I found that, after collecting the table's statistics, the number of chunks decreased as chunk_size increased. When I increased chunk_size to 100,000, the chunk count dropped from tens of thousands to a bit more than 200. I don't know if this is the case for you, but I hope it can be helpful.
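For reference, collecting the statistics mentioned above before creating the chunks would be something along these lines (owner and table name are placeholders):

begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'TABLE_1');
end;
/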
Connor McDonald
May 12, 2021 - 7:00 am UTC

thanks for the input
