We have two tables: ITEM has more than 150 million records and ITEM_EVENT has more than 400 million. Because the data keeps growing, we want to perform periodic cleanup of both tables. We could not find a performant way to achieve this: the select query was taking very long and eventually failed with ORA-01114. The CREATED columns in both tables are not indexed; we can add indexes if that would help. So please give us some suggestions to achieve our goal. Thanks.
Deletion of records is planned as follows:
- delete records older than some compliance date
- use a batch size of, say, 50000 rows per iteration
- split the deletion into two steps: first delete the foreign-key (child) records, then the primary-key (parent) records
Our tables DDL:
CREATE TABLE "ITEM"
( "ID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"CREATED" TIMESTAMP (6) NOT NULL ENABLE,
"ITEM_TYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"ITEM_ID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
PRIMARY KEY ("ID")
);
CREATE INDEX "ITEM_ID_NDX" ON "ITEM" ("ITEM_ID");
CREATE TABLE "ITEM_EVENT"
( "ID" NUMBER(19,0) NOT NULL ENABLE,
"CREATED" TIMESTAMP (6) NOT NULL ENABLE,
"ITEM_EVENT_TYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"ITEM_BID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"ITEM_STATE" VARCHAR2(255 CHAR),
"CHANGE_REASON" VARCHAR2(255 CHAR),
"ITEM_ID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
PRIMARY KEY ("ID")
);
alter table ITEM_EVENT
add constraint ITEM_EVENT_FK_ITEM_BID
foreign key (ITEM_BID)
references ITEM;
CREATE INDEX "ITEM_EVENT_BID_NDX" ON "ITEM_EVENT" ("ITEM_BID");
CREATE INDEX "ITEM_EVENT_ID_NDX" ON "ITEM_EVENT" ("ITEM_ID");
We tried the following queries, which were very slow and caused the error:
DELETE FROM ITEM_EVENT
WHERE ITEM_ID IN (
SELECT ITEM_ID
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING MAX(ITEM_STATE) KEEP (DENSE_RANK LAST ORDER BY CREATED ASC) = 'DEACTIVATED'
FETCH FIRST 50000 ROWS ONLY);
DELETE FROM ITEM i
WHERE NOT EXISTS (SELECT 1 FROM ITEM_EVENT ie WHERE ie.ITEM_BID = i.ID)
AND CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY');
Well, there are a few things that *might* be where you are losing your time here:
- the derivation of items from item_event based on last item_state
- the deletion of item_events
- the deletion of items
or a combination of all of them.
So I would start by finding out which one it is. But let's say
SELECT ITEM_ID
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING MAX(ITEM_STATE) KEEP (DENSE_RANK LAST ORDER BY CREATED ASC)= 'DEACTIVATED'
is running reasonably efficiently.
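One quick way to check where the time goes is to look at the optimizer's plan for that aggregation query, e.g. with EXPLAIN PLAN and DBMS_XPLAN (a sketch; the statement is just your query from above):

```sql
-- Capture the plan the optimizer would use for the aggregation query
EXPLAIN PLAN FOR
SELECT ITEM_ID
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING MAX(ITEM_STATE) KEEP (DENSE_RANK LAST ORDER BY CREATED ASC) = 'DEACTIVATED';

-- Display it; look for a full scan of ITEM_EVENT and the estimated row counts
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows a full table scan of all 400 million rows just to find the old ones, that alone can account for most of the elapsed time.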
By the way, if it's not, then can you assume that "DEACTIVATED" is a final state? That is, nothing ever comes back from there? Because if so, you could try:
SELECT ITEM_ID, max(case when ITEM_STATE = 'DEACTIVATED' then 1 end) has_been_deactivated
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING max(case when ITEM_STATE = 'DEACTIVATED' then 1 end) = 1
which might be less resource-hungry.
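Also, since you said CREATED is not indexed but could be: an index that covers the filter column plus the two columns the aggregation needs would let Oracle answer this query from the index alone rather than scanning the whole table. A sketch (the index name is made up; whether the optimizer actually uses it depends on your statistics and how much of the table falls inside the 180-day window):

```sql
-- Covers the WHERE (CREATED), the GROUP BY (ITEM_ID) and the ITEM_STATE
-- aggregate, so the query can be satisfied by an index scan alone
CREATE INDEX ITEM_EVENT_CRE_NDX
  ON ITEM_EVENT (CREATED, ITEM_ID, ITEM_STATE);
```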
In cases like this, it might be more efficient to simply grab those item IDs into a PL/SQL collection and then use them for primary-key/indexed deletes, e.g.
SQL> create table chd
2 as select d.* from dba_Objects d,
3 ( select 1 from dual connect by level <= 100 )
4 where d.object_id is not null;
Table created.
SQL>
SQL> create table par
2 as select * from dba_Objects
3 where object_id is not null;
Table created.
SQL>
SQL> alter table par add primary key (object_id);
Table altered.
SQL> alter table chd add foreign key (object_id) references par (object_id);
Table altered.
SQL> create index chd_ix on chd ( object_id );
Index created.
SQL> declare
2 type nlist is table of number
3 index by pls_integer;
4 n nlist;
5 begin
6 select object_id
7 bulk collect into n
8 from
9 ( select distinct object_id
10 from chd
11 where rownum <= 50000
12 );
13
14 forall i in 1 .. n.count
15 delete from chd where object_id = n(i);
16
17 forall i in 1 .. n.count
18 delete from par where object_id = n(i);
19 end;
20 /
PL/SQL procedure successfully completed.
There are a lot of "ifs" here depending on data distribution etc.
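Adapted to your tables, the same pattern might look like the sketch below. Note the assumptions I've made: I collect ITEM_BID (the foreign-key column back to ITEM.ID) rather than ITEM_ID, so that both deletes are driven by an indexed column and the primary key respectively; I use the simpler MAX(CASE ...) test, which assumes DEACTIVATED is a terminal state; and each batch deletes *all* events for a selected item, which assumes a deactivated item receives no further events.

```sql
DECLARE
  TYPE t_ids IS TABLE OF ITEM_EVENT.ITEM_BID%TYPE;
  l_ids t_ids;
BEGIN
  -- One batch of parent keys whose old events ended in DEACTIVATED
  SELECT ITEM_BID
  BULK COLLECT INTO l_ids
  FROM (SELECT ITEM_BID
          FROM ITEM_EVENT
         WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
         GROUP BY ITEM_BID
        HAVING MAX(CASE WHEN ITEM_STATE = 'DEACTIVATED' THEN 1 END) = 1
         FETCH FIRST 50000 ROWS ONLY);

  -- Child rows first (driven by ITEM_EVENT_BID_NDX) ...
  FORALL i IN 1 .. l_ids.COUNT
    DELETE FROM ITEM_EVENT WHERE ITEM_BID = l_ids(i);

  -- ... then the parents (driven by the ITEM primary key)
  FORALL i IN 1 .. l_ids.COUNT
    DELETE FROM ITEM WHERE ID = l_ids(i);

  COMMIT;
END;
/
```

You would run this repeatedly (e.g. from a scheduler job) until the collection comes back empty, which keeps each transaction, and its undo, bounded to one batch.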