Delete 50 percent data from a table with billions of records.

Question and Answer
Connor McDonald

Thanks for the question, Joe.

Asked: August 31, 2016 - 9:00 am UTC

Last updated: May 27, 2019 - 6:13 am UTC

Version: 11.2.0.1

You Asked

Hi team,
I have had a nightmare recently, and it comes from a poor table design in our customer's database: a table named T_PRODUCT_TEST_DATA with more than 3.6 billion records. What's worse, this table is neither partitioned nor has a DATE column.
Because of the huge number of records in this table, the DB runs as slow as my grandma. Finally, my customer realized that something had to change.

First, they planned to delete the rows like this:
SQL> set autocommit 1000; -- I doubt this will be effective.
SQL> alter session enable parallel DML;
SQL> delete /*+ parallel(t, 16) */ from T_TEST_PRODUCT_DATA t where TPD_MO in (select MO_NUMBER from T_PM_MO_BASE where MO_CREATE_DATE>=TO_DATE('20150831','YYYYMMDD'));
--about 25,000 out of 50,000 records on T_PM_MO_BASE in this sub query.
SQL> alter table T_TEST_PRODUCT_DATA shrink space cascade;

REBUILD INDEXES.
COMPUTE STATISTICS via the DBMS_STATS package.

BTW, they planned to create a historical database center to store the deleted data (with Oracle GoldenGate synchronizing real-time data, and an OGG filter policy that ignores deletes, so the deleted data will remain in the historical data center).

As you see, they decided to clear the historical data in this table first; that means about 50% of the records (1.8 billion rows, all more than a year old) will be deleted from this table. And it will consume a lot of resources and take a long time!
I know I should redefine this table from a normal table to a partitioned one. But before that, we have to make this table lose weight, that is, remove 1.8 billion records from table T_PRODUCT_TEST_DATA. What am I supposed to do?

There is lots of discussion about this; I've read this on AskTom:
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2345591157689

Even so, I think we still need your advice, since that thread was last updated two years ago; maybe there are new ideas or new solutions. :-)

and Chris said...

The advice in that thread is still good. A "create table as select" is likely to be the best approach.
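As a sketch of what that could look like here (the table and column names come from the question; the inverted predicate, the swap steps, and the assumption that MO_NUMBER is never null are illustrative, not a tested recipe):

```sql
-- Keep only the rows you want to retain (predicate inverted from the
-- question's DELETE; NOT IN assumes MO_NUMBER has no NULLs).
create table t_product_test_data_new
  nologging
  parallel 16
as
select *
from   t_product_test_data t
where  t.tpd_mo not in (select mo_number
                        from   t_pm_mo_base
                        where  mo_create_date >= date '2015-08-31');

-- During downtime: swap the tables, then recreate indexes/constraints
-- and gather stats on the new table.
drop table t_product_test_data;
rename t_product_test_data_new to t_product_test_data;
```

No SHRINK SPACE is needed afterwards, because the new segment contains only the retained rows.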

A note on autocommit: as you suspect, this probably isn't doing what the authors intended.

From the docs:

The SQL*Plus autocommit feature causes pending changes to be committed after a specified number of successful SQL DML transactions. (A SQL DML transaction is either an UPDATE, INSERT, or DELETE command, or a PL/SQL block.)

https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch4.htm

So it's not how many rows you process. It's how many statements you run.

For example, the following inserts 99 rows in one statement. Autocommit is set to 10:

SQL> set autocommit 10
SQL> create table t (x int);

Table created.

SQL> insert into t select rownum from dual connect by level < 100;

99 rows created.

SQL> rollback;

Rollback complete.

SQL> select count(*)  from t;

  COUNT(*)
----------
         0


Note that all the rows were rolled back, because it's one transaction!

If we run 12 separate inserts then rollback, the first 10 are autocommitted. Only the last two are undone:

SQL> insert into t values (1);

1 row created.

SQL> /

1 row created.

... ("/" re-runs the insert; after the 10th statement, autocommit kicks in and SQL*Plus reports) ...

1 row created.

Commit complete.
SQL> /

1 row created.

SQL> /

1 row created.

SQL> rollback;

Rollback complete.

SQL> select count(*)  from t;

  COUNT(*)
----------
        10

Comments

Thanks and follow-up questions

Joe Huang, September 02, 2016 - 1:20 am UTC

Thanks for the 'autocommit' test example. And I agree with you that "create table as select" is likely to be the better approach: it generates much less redo, and I don't need the SHRINK SPACE and COMPUTE STATISTICS operations, which might save me plenty of time.

We may do it with the following steps during downtime:
1) alter database noarchivelog;
2) create table TMP_TABLE1 partition by range ... as select /*+ parallel(t,8) */ * from ... where ...;
3) create index IDX_1 on TMP_TABLE1 (COL) parallel 8 local nologging;
4) alter index IDX_1 noparallel;
5) drop the old big table during downtime;
6) rename TMP_TABLE1 to the original table name;
7) purge recyclebin;


Here are my follow-up questions:
1) To accelerate these operations (or minimize downtime), do you have any other advice? What should I pay attention to?
2) Before testing, I am wondering whether expdp/impdp (with a WHERE clause) will take less time than CTAS? It's a possible solution to this case, isn't it?
3) What if, hypothetically, there's not enough space left to CTAS certain big tables: would 'delete records with the parallel option and commit transactions in batches' be a better approach?

appreciate your help!
Connor McDonald
September 02, 2016 - 1:58 am UTC

The database can stay in archivelog mode (as long as you don't have "force logging" set). You just nominate NOLOGGING in your DDL, i.e.

2) create table TMP_TABLE1 nologging partition by range ... as select /*+ parallel(t,8) */ * from ... where...;
3) create index IDX_1 on TMP_TABLE1 (COL) parallel 8 local nologging;

Also, make sure your recyclebin parameter is correctly set, otherwise you'll (silently) discover you've lost that old copy of the big table :-)

But if you've got available space, don't forget you could use DBMS_REDEFINITION and do all of this without much outage at all.

Hope this helps,
Connor
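For reference, the basic DBMS_REDEFINITION flow Connor is referring to looks roughly like this (a minimal sketch: the schema name APP and the interim table name are made up for illustration, CONS_USE_ROWID is chosen only because the table may lack a primary key, and index/constraint copying via COPY_TABLE_DEPENDENTS is omitted):

```sql
-- 1) Check the table is redefinable without a primary key.
begin
  dbms_redefinition.can_redef_table(
    uname        => 'APP',
    tname        => 'T_PRODUCT_TEST_DATA',
    options_flag => dbms_redefinition.cons_use_rowid);
end;
/

-- 2) Create the partitioned interim table yourself, then start and
--    finish the online redefinition.
begin
  dbms_redefinition.start_redef_table(
    uname        => 'APP',
    orig_table   => 'T_PRODUCT_TEST_DATA',
    int_table    => 'T_PRODUCT_TEST_DATA_INT',
    options_flag => dbms_redefinition.cons_use_rowid);

  dbms_redefinition.finish_redef_table(
    uname      => 'APP',
    orig_table => 'T_PRODUCT_TEST_DATA',
    int_table  => 'T_PRODUCT_TEST_DATA_INT');
end;
/
```

Only FINISH_REDEF_TABLE takes a brief lock; the bulk copy happens while the application keeps using the original table.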

Can DBMS_REDEFINITION redefine part of a table?

Joe Huang, September 02, 2016 - 3:03 am UTC

Thank you, Connor
1) I've checked "force logging" and "recyclebin"; they are correctly set to their default values. I will use 'nologging' as you said.

2) 'But if you've got available space, dont forget you could use DBMS_REDEFINITION and do all of this without much outage at all.'
-- I don't quite get it. Do you mean redefining a non-partitioned table to a partitioned one? I understand that I can redefine a table ONLINE, and the outage means the lock while executing DBMS_REDEFINITION.FINISH_REDEF_TABLE. However, can DBMS_REDEFINITION redefine part of a table (I mean rows, not columns: for example, with a WHERE clause added)? I only need an estimated 50% of the table's records, after all.
Connor McDonald
September 02, 2016 - 9:38 am UTC

You don't have to change the table to be partitioned when using DBMS_REDEFINITION. That's just a common use case.

When you start the redefinition, you create the target table. You can do your "create table as select 50%" at this point.

Chris

Suggest

Ghassan, September 02, 2016 - 7:16 am UTC

Is the table date-partitioned? If not, it should be.
Use the partition drop, split, and move features after creating the target history table with compressed tablespace or table options.
Connor McDonald
September 02, 2016 - 9:38 am UTC

The OP says no.

Thanks

Joe Huang, September 02, 2016 - 10:38 am UTC

Thank you guys. That's of great help.

Creating a subset of data in the interim table is not possible

Guy, May 24, 2019 - 2:48 pm UTC

Hi Chris,

You claim "When you start the redefinition, you create the target table. You can do your 'create table as select 50%' at this point," but nowhere is this detailed.

I created a test table in an 11.2 database and then an interim table with half the rows in it. When I invoke START_REDEF_TABLE, I find the resulting table with the full number of rows again. My docs also state "You create an empty interim table." Has this been changed in later versions, perhaps? Seeing that your answer dates from 2016, I doubt that, however.
Connor McDonald
May 27, 2019 - 6:13 am UTC

Not sure what Chris meant, but you could look at the following:

1) redefine the table online to a 2-partition table (partition 1 = rows you want to get rid of, partition 2 = rows you want to keep)

So we have table T with partitions P1 and P2

- exchange out the partition you want to keep

So, alter table T exchange partition P2 with T_UNPAR

- drop the partitioned table

So T is dropped, and then rename T_UNPAR to T.

A few small outages there rather than a large one.
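Put together, the exchange step Connor describes could be sketched like this (table and partition names follow his example; T_UNPAR must be a non-partitioned table with the same column structure, and any indexes/constraints would need recreating afterwards):

```sql
-- T_UNPAR: empty, non-partitioned, same structure as T.
-- Swap segments: the rows in partition P2 move into T_UNPAR,
-- and T's P2 becomes empty. This is a metadata operation, not a copy.
alter table t exchange partition p2 with table t_unpar;

-- Dropping T now discards the unwanted rows (partition P1).
drop table t;
rename t_unpar to t;
```

Each of these statements needs only a short lock, which is why this gives a few small outages rather than one long one.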
