Need tips on inserting records in a backup table and renaming it to the original table
Question and Answer

Chris Saxon

Thanks for the question, Samuel.

Asked: October 02, 2016 - 5:45 pm UTC

Last updated: October 05, 2016 - 1:04 am UTC

Version: Oracle 12c

Viewed 1000+ times

You Asked

Hi,

I'm currently writing a PL/SQL procedure for the following requirement: I need to update a foreign key column with new values (taken from a lookup table). Instead of updating the records in place, this is the plan.
I need to do this for 60+ tables. I have just used the EMP table for my internal testing.

1) Create table EMP_BK AS SELECT * FROM EMP WHERE ROWNUM < 1
2) Enable nologging for EMP_BK
3) Insert records into the backup table from the main table:

INSERT /*+ append parallel (EMP_BK,12) */
INTO EMP_BK (<columns list>)
SELECT E.<columns list>, L.NEW_ID
FROM EMP E, LOOKUP L
WHERE E.ID = L.OLD_ID;

4) Once records are inserted, retrieve Indexes, Constraints, Triggers, Grants definitions from EMP table
5) Build Indexes, Constraints, Triggers, Grants on EMP_BK table
6) Drop table EMP
7) Rename EMP_BK to EMP
8) Enable logging for EMP table
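The steps above can be sketched end to end like this (EMP, EMP_BK and LOOKUP are the example names from the question; the DEPT_ID column and the join condition are illustrative assumptions, not from the original):

```sql
-- Sketch of the backup-and-swap flow. Placeholder names throughout;
-- run steps 4/5 (indexes, constraints, triggers, grants) before the swap.

-- 1) Empty copy of the source table
CREATE TABLE emp_bk AS SELECT * FROM emp WHERE ROWNUM < 1;

-- 2) Minimise redo for the direct-path load
ALTER TABLE emp_bk NOLOGGING;

-- 3) Direct-path load, remapping the foreign key via the lookup
INSERT /*+ APPEND */ INTO emp_bk (id, ename, dept_id)
SELECT e.id, e.ename, l.new_id
FROM   emp e
JOIN   lookup l ON l.old_id = e.dept_id;
COMMIT;

-- 6)-8) Drop the original, rename the backup, restore logging
DROP TABLE emp;
ALTER TABLE emp_bk RENAME TO emp;
ALTER TABLE emp LOGGING;
```

Note the COMMIT straight after the direct-path insert: a table loaded with APPEND cannot be queried in the same transaction until you commit.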

I have the following doubts:

a) The table could have 40-50 million records. How do we decide the number of threads in append parallel option generally?
b) I'm planning to hard code the index, constraints, triggers, grants definitions in the plsql procedure itself. Is this a good idea?
c) I tried playing around with DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS to copy indexes, constraints etc. But it looks like I have to do the start and
finish steps too (DBMS_REDEFINITION.START_REDEF_TABLE and DBMS_REDEFINITION.FINISH_REDEF_TABLE). Not sure whether it will affect the performance.
d) Is it a good idea if I use DBMS_SCHEDULER to run the create index scripts in parallel if I'm doing this for multiple tables?
e) Should I be doing COMPUTE STATISTICS after the backup table has been inserted with records from the main table?
f) After reading Oracle's help, I came to know that we don't need to do compute statistics after index creation. It's part of the index creation
process. Please correct me if I'm wrong.
g) If I'm going to create indexes, constraints etc with the same name, should I be renaming the indexes and constraints on the main table, to retain the same names?

Any suggestions/corrections are welcome.

Samuel Schwartz

and Connor said...

a) The table could have 40-50 million records. How do we decide the number of threads in append parallel option generally?

=> 40/50 million rows isn't a lot, so you might not need parallel at all. But why not just test some values and pick what works best?

b) I'm planning to hard code the index, constraints, triggers, grants definitions in the plsql procedure itself. Is this a good idea?

=> Perhaps look at using dbms_metadata to extract the ddl, and then just do 'execute immediate' on that
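For example, something along these lines for the indexes (a sketch only; the EMP/EMP_BK names are the question's placeholders, and renaming with a _BK suffix is one way to avoid clashing with the index names still in use on the original table):

```sql
-- Sketch: extract each index's DDL with DBMS_METADATA, repoint it at
-- the backup table, and replay it with EXECUTE IMMEDIATE.
DECLARE
  l_ddl CLOB;
BEGIN
  -- Strip storage clauses so the generated DDL runs cleanly
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,
                                    'SEGMENT_ATTRIBUTES', FALSE);

  FOR ix IN (SELECT index_name
             FROM   user_indexes
             WHERE  table_name = 'EMP') LOOP
    l_ddl := DBMS_METADATA.GET_DDL('INDEX', ix.index_name);
    -- Repoint at the backup table and avoid a name clash
    -- (watch the identifier length limit when suffixing)
    l_ddl := REPLACE(l_ddl, '"EMP"', '"EMP_BK"');
    l_ddl := REPLACE(l_ddl, '"' || ix.index_name || '"',
                            '"' || ix.index_name || '_BK"');
    EXECUTE IMMEDIATE l_ddl;
  END LOOP;
END;
/
```

The same pattern works for constraints, triggers and grants via the corresponding DBMS_METADATA object types.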

c) I tried playing around with DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS to copy indexes, constraints etc. But it looks like I have to do the start and finish steps too (DBMS_REDEFINITION.START_REDEF_TABLE and DBMS_REDEFINITION.FINISH_REDEF_TABLE). Not sure whether it will affect the performance.

=> dbms_redef is not really appropriate here

d) Is it a good idea if I use DBMS_SCHEDULER to run the create index scripts in parallel if I'm doing this for multiple tables?

=> If your server has sufficient resources to do the work in parallel, then this would make sense. It does complicate the task a little, so make sure it's worth the effort in terms of the time saved.
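A minimal sketch of that approach (job name, index name and columns are all illustrative):

```sql
-- Sketch: submit an index build as a background scheduler job so
-- several builds can run concurrently. AUTO_DROP removes the job
-- definition once it completes.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name   => 'BUILD_EMP_BK_IX1',
    job_type   => 'PLSQL_BLOCK',
    job_action => q'[BEGIN
                       EXECUTE IMMEDIATE
                         'CREATE INDEX emp_bk_ix1 ON emp_bk (dept_id)';
                     END;]',
    enabled    => TRUE,
    auto_drop  => TRUE);
END;
/
```

You can then poll USER_SCHEDULER_JOB_RUN_DETAILS to confirm each job finished successfully before proceeding to the swap.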

e) Should I be doing COMPUTE STATISTICS after the backup table has been inserted with records from the main table?

=> In 12c, you should see the stats obtained automatically for *both* the table and the indexes (assuming you are doing a direct mode operation on an empty table)
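You can verify this yourself; a sketch (table names as per the question):

```sql
-- Sketch: in 12c, a direct-path insert into an empty table triggers
-- online statistics gathering, with no explicit DBMS_STATS call.
INSERT /*+ APPEND */ INTO emp_bk SELECT * FROM emp;
COMMIT;

SELECT num_rows, last_analyzed
FROM   user_tables
WHERE  table_name = 'EMP_BK';
-- NUM_ROWS and LAST_ANALYZED should already be populated.
```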

f) After reading Oracle's help, I came to know that we don't need to do compute statistics after index creation. It's part of the index creation process. Please correct me if I'm wrong.

=> see above

g) If I'm going to create indexes, constraints etc with the same name, should I be renaming the indexes and constraints on the main table, to retain the same names?

=> Sounds like a good idea to me.


Comments

Thanks a ton!

Samuel Schwartz, October 04, 2016 - 7:11 pm UTC

Thanks a lot, Connor!

Implemented your suggestions.

=> 40/50 million rows isn't a lot, so you might not need parallel at all. But why not just test some values and pick what works best?

=>> 15M rows got inserted in 10 minutes without the parallel option. So, using only the APPEND option after enabling NOLOGGING.

=> Perhaps look at using dbms_metadata to extract the ddl, and then just do 'execute immediate' on that

=>> Used dbms_metadata.get_ddl to retrieve scripts for CREATE TABLE and CREATE INDEX.

=> dbms_redef is not really appropriate here

=>> Thanks. Ignored it.

=> If your server has sufficient resources to do the work in parallel, then this would make sense. It does complicate the task a little, so make sure it's worth the effort in terms of the time saved.

=>> Still haven't tested this for a large table.

=> In 12c, you should see the stats obtained automatically for *both* the table and the indexes (assuming you are doing a direct mode operation on an empty table)

=>> Actually the db is Release 11.2.0.4.0 - 64bit. I was doing my initial testing in 12c. So, for an 11g db, should we run COMPUTE STATISTICS after the table and indexes are created?
(Yes, I am doing a direct mode operation on an empty table: creating a backup table and inserting records fresh from the main table.)


So, basically I'm left with 2 doubts.
1) Whether COMPUTE STATISTICS are required after table creation and index creation. I did do some testing.

a) Created table and inserted a record. No updates in USER_TABLES.NUM_ROWS.
b) Executed Analyze Table <table name> Compute Statistics. Value updated in USER_TABLES.NUM_ROWS.
c) Created index on table. Value updated in USER_TABLES.NUM_ROWS.

So, looks like tables have to be analyzed explicitly in 11g. Whereas statistics get updated for indexes on creation. Am I on the right path?
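For what it's worth, ANALYZE ... COMPUTE STATISTICS is deprecated for gathering optimizer statistics; in 11g the supported route is DBMS_STATS. A sketch (table name as per the question):

```sql
-- Sketch: gather table statistics (and cascade to its indexes)
-- with DBMS_STATS rather than ANALYZE.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'EMP_BK',
    cascade => TRUE);   -- also gathers index statistics
END;
/
```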


2) Parallel jobs for index creation. Testing on a large table is in progress.
Chris Saxon
October 05, 2016 - 1:04 am UTC

"So, looks like tables have to be analyzed explicitly in 11g. Whereas statistics get updated for indexes on creation. Am I on the right path?"

Yes. But one thing that people often do not take advantage of is *setting* statistics rather than gathering them.

e.g., when I do

begin
  insert into blah
  select * from something_else;
end;

then I have access to sql%rowcount to tell me how many rows I inserted.

So if that was a populating of an empty table, then why bother re-scanning the thing with dbms_stats.gather_table_stats? I already *know* how many rows are in there, because I just populated it! I can just use dbms_stats.set_table_stats.

Similarly, using some sample data gives me enough to set row len, blocks, num distinct etc without the enormous cost.
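Putting that together (a sketch; EMP/EMP_BK are the question's placeholder names):

```sql
-- Sketch: set table statistics directly from what the load itself
-- tells us, instead of re-scanning the freshly populated table.
DECLARE
  l_rows PLS_INTEGER;
BEGIN
  INSERT /*+ APPEND */ INTO emp_bk SELECT * FROM emp;
  l_rows := SQL%ROWCOUNT;   -- capture before COMMIT resets it
  COMMIT;

  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER,
    tabname => 'EMP_BK',
    numrows => l_rows);
END;
/
```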
