We have two tables: ITEM has more than 150 million records and ITEM_EVENT has more than 400 million. Because the data keeps growing, we want to perform periodic cleanup of both tables. We could not find a performant way to achieve this: the select query was taking very long and eventually failed with ORA-01114. The CREATED columns in both tables are not indexed; we can add indexes if that would help. So please give us some suggestions to achieve our goal. Thanks.
Deletion of records is planned as follows:
- delete records older than some compliance date
- use a batch size of, say, 50000 rows per iteration
- split the deletion into two steps: first delete the foreign-key (child) records, then the primary-key (parent) records
Our tables DDL:
CREATE TABLE "ITEM"
( "ID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"CREATED" TIMESTAMP (6) NOT NULL ENABLE,
"ITEM_TYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"ITEM_ID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
PRIMARY KEY ("ID")
);
CREATE INDEX "ITEM_ID_NDX" ON "ITEM" ("ITEM_ID");
CREATE TABLE "ITEM_EVENT"
( "ID" NUMBER(19,0) NOT NULL ENABLE,
"CREATED" TIMESTAMP (6) NOT NULL ENABLE,
"ITEM_EVENT_TYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"ITEM_BID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"ITEM_STATE" VARCHAR2(255 CHAR),
"CHANGE_REASON" VARCHAR2(255 CHAR),
"ITEM_ID" VARCHAR2(255 CHAR) NOT NULL ENABLE,
PRIMARY KEY ("ID")
);
alter table ITEM_EVENT
add constraint ITEM_EVENT_FK_ITEM_BID
foreign key (ITEM_BID)
references ITEM;
CREATE INDEX "ITEM_EVENT_BID_NDX" ON "ITEM_EVENT" ("ITEM_BID");
CREATE INDEX "ITEM_EVENT_ID_NDX" ON "ITEM_EVENT" ("ITEM_ID");
We tried the following queries, which were very slow and caused the error:
DELETE FROM ITEM_EVENT
WHERE ITEM_ID IN (
SELECT ITEM_ID
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING MAX(ITEM_STATE) KEEP (DENSE_RANK LAST ORDER BY CREATED ASC) = 'DEACTIVATED'
FETCH FIRST 50000 ROWS ONLY);
DELETE FROM ITEM i
WHERE NOT EXISTS (SELECT 1 FROM ITEM_EVENT ie WHERE ie.ITEM_BID = i.ID)
AND CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY');
Well, there are a few things that *might* be where you are losing your time here:
- the derivation of items from item_event based on last item_state
- the deletion of item_events
- the deletion of items
or a combination of all of them.
So I would start by finding out which one it is. But let's say
SELECT ITEM_ID
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING MAX(ITEM_STATE) KEEP (DENSE_RANK LAST ORDER BY CREATED ASC)= 'DEACTIVATED'
is running reasonably efficiently.
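One quick way to check where the time goes is to look at the optimizer's plan for that aggregation query, e.g. with EXPLAIN PLAN and DBMS_XPLAN (a sketch; the statement is just your query from above):

```sql
-- Capture the plan the optimizer would use for the aggregation query
EXPLAIN PLAN FOR
SELECT ITEM_ID
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING MAX(ITEM_STATE) KEEP (DENSE_RANK LAST ORDER BY CREATED ASC) = 'DEACTIVATED';

-- Display it; look for a full scan of ITEM_EVENT and the estimated row counts
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows a full table scan of all 400 million rows just to find the old ones, that alone can account for most of the elapsed time.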
By the way, if it's not, then can you assume that "DEACTIVATED" is a final state? That is, nothing ever comes back from there? Because if so, you could try:
SELECT ITEM_ID, max(case when ITEM_STATE = 'DEACTIVATED' then 1 end) has_been_deactivated
FROM ITEM_EVENT
WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
GROUP BY ITEM_ID
HAVING max(case when ITEM_STATE = 'DEACTIVATED' then 1 end) = 1
which might be less resource-hungry.
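Also, since you said CREATED is not indexed but could be: an index that covers the filter column plus the two columns the aggregation needs would let Oracle answer this query from the index alone rather than scanning the whole table. A sketch (the index name is made up; whether the optimizer actually uses it depends on your statistics and how much of the table falls inside the 180-day window):

```sql
-- Covers the WHERE (CREATED), the GROUP BY (ITEM_ID) and the ITEM_STATE
-- aggregate, so the query can be satisfied by an index scan alone
CREATE INDEX ITEM_EVENT_CRE_NDX
  ON ITEM_EVENT (CREATED, ITEM_ID, ITEM_STATE);
```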
In cases like this, it might be more efficient to simply grab those item IDs into a PL/SQL collection and then use them for primary-key/indexed deletes, e.g.
SQL> create table chd
2 as select d.* from dba_Objects d,
3 ( select 1 from dual connect by level <= 100 )
4 where d.object_id is not null;
Table created.
SQL>
SQL> create table par
2 as select * from dba_Objects
3 where object_id is not null;
Table created.
SQL>
SQL> alter table par add primary key (object_id);
Table altered.
SQL> alter table chd add foreign key (object_id) references par (object_id);
Table altered.
SQL> create index chd_ix on chd ( object_id );
Index created.
SQL> declare
2 type nlist is table of number
3 index by pls_integer;
4 n nlist;
5 begin
6 select object_id
7 bulk collect into n
8 from
9 ( select distinct object_id
10 from chd
11 where rownum <= 50000
12 );
13
14 forall i in 1 .. n.count
15 delete from chd where object_id = n(i);
16
17 forall i in 1 .. n.count
18 delete from par where object_id = n(i);
19 end;
20 /
PL/SQL procedure successfully completed.
There are a lot of "ifs" here depending on data distribution etc.
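Adapted to your tables, the same pattern might look like the sketch below. Note the assumptions I've made: I collect ITEM_BID (the foreign-key column back to ITEM.ID) rather than ITEM_ID, so that both deletes are driven by an indexed column and the primary key respectively; I use the simpler MAX(CASE ...) test, which assumes DEACTIVATED is a terminal state; and each batch deletes *all* events for a selected item, which assumes a deactivated item receives no further events.

```sql
DECLARE
  TYPE t_ids IS TABLE OF ITEM_EVENT.ITEM_BID%TYPE;
  l_ids t_ids;
BEGIN
  -- One batch of parent keys whose old events ended in DEACTIVATED
  SELECT ITEM_BID
  BULK COLLECT INTO l_ids
  FROM (SELECT ITEM_BID
          FROM ITEM_EVENT
         WHERE CREATED < current_timestamp - NUMTODSINTERVAL(180, 'DAY')
         GROUP BY ITEM_BID
        HAVING MAX(CASE WHEN ITEM_STATE = 'DEACTIVATED' THEN 1 END) = 1
         FETCH FIRST 50000 ROWS ONLY);

  -- Child rows first (driven by ITEM_EVENT_BID_NDX) ...
  FORALL i IN 1 .. l_ids.COUNT
    DELETE FROM ITEM_EVENT WHERE ITEM_BID = l_ids(i);

  -- ... then the parents (driven by the ITEM primary key)
  FORALL i IN 1 .. l_ids.COUNT
    DELETE FROM ITEM WHERE ID = l_ids(i);

  COMMIT;
END;
/
```

You would run this repeatedly (e.g. from a scheduler job) until the collection comes back empty, which keeps each transaction, and its undo, bounded to one batch.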