A reader, February 19, 2018 - 1:09 pm UTC
Thank you very much.
It's a good thing 12c has this function as a documented pl/sql package. I remember seeing an undocumented procedure in 11g but I couldn't remember the name.
One thing I was wondering while pondering this type of solution was how much overhead there would be. But in any case I would only need this for the duration of the data migration.
Another option I was considering was turning on dynamic sampling instance-wide for the duration of the migration, but I would like the stats to run with AUTO_SAMPLE_SIZE, and I am not sure there is a DYNAMIC(..) parameter that reproduces that stats-collection algorithm (where it builds a hash as it goes without sorting rows).
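For what it's worth, the instance-wide switch I was pondering would look something like this (level 11 is my assumption of the setting to use; it lets the optimizer decide when and how to sample automatically):

```sql
-- Enable automatic dynamic sampling instance-wide for the migration window.
-- Level 11 = the optimizer decides when dynamic sampling is worthwhile.
alter system set optimizer_dynamic_sampling = 11;
```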
Have a great day.
February 20, 2018 - 1:15 am UTC
One thing I was wondering while pondering this type of solution was how much overhead there would be.
Compared to the time it will take to run a big CTAS....negligible. It is a parse-time operation.
but I would like the stats to run with auto_sample_size
That is the default anyway, so unless you've tweaked the preferences we'll be using the optimized NDV stats algorithm.
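A quick sanity check if you suspect the preference has been tweaked - this is a sketch using the documented DBMS_STATS preference calls:

```sql
-- Check the current ESTIMATE_PERCENT preference; the default is
-- DBMS_STATS.AUTO_SAMPLE_SIZE, which drives the one-pass NDV algorithm.
select dbms_stats.get_prefs('ESTIMATE_PERCENT') from dual;

-- Reset it to the default if it was changed at the global level.
exec dbms_stats.set_global_prefs('ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
```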
online stats gathering
Rajeshwaran Jeyabal, February 20, 2018 - 8:39 am UTC
....
but instead of doing an UPDATE it creates table B (one-to-one with table A) and then does a CTAS to create table C, then drops table B.
The migration developer did not bother to take statistics on table B.
....
Since you are on Oracle 12c, creating a table with CTAS gathers statistics by default (no additional table scan is needed for stats gathering) because of the new feature "online statistics gathering".
The exception is partitioned tables, where partition-level stats won't be available by default.
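A quick way to see this in action - table names below are made up for illustration:

```sql
-- In 12c, a CTAS gathers statistics online; the execution plan shows an
-- OPTIMIZER STATISTICS GATHERING step above the table scan.
create table t_copy as select * from t_source;

-- Stats are already in place - no separate dbms_stats call needed.
select num_rows, last_analyzed
from   user_tab_statistics
where  table_name = 'T_COPY';
```

Note that online statistics gathering collects table and basic column statistics but not histograms or index statistics, so a follow-up DBMS_STATS.GATHER_TABLE_STATS may still be worthwhile if you need those.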
....
The CTAS has a parallel join which is using broadcast instead of hash. It's adding an extra 12 hours to the migration.
....
Again in 12c, a new feature called "adaptive plans" has the ability to change the parallel distribution method at run time (HYBRID HASH distribution), going with either broadcast or hash distribution based on the result-set size and the PX servers available.
Don't these features help here?
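And if the adaptive HYBRID HASH distribution does not kick in, the distribution method can be taken out of the optimizer's hands for the migration with the documented PQ_DISTRIBUTE hint - a sketch, with table and column names invented for illustration:

```sql
-- Force hash/hash distribution for the parallel join instead of
-- broadcasting the smaller row source to every PX server.
create table c as
select /*+ parallel(16) pq_distribute(b hash hash) */
       a.*, b.extra_col
from   a
join   b on b.id = a.id;
```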