Hi Tom,
i am trying to call inline function from 5 million rows of table (SSF_FACT_TABLE) and update the function's result into same (SSF_FACT_TABLE) table. it is taking two days to complete the process. Statistics of the table got updated. can you help me to write the better query.
Thank you!
CREATE TABLE SSF_TEMP (PK_SSF_ID INTEGER,STATUS VARCHAR2(15));
BEGIN
LOOP
INSERT /*+ WITH_PLSQL APPEND PARALLEL(8)*/ INTO SSF_TEMP (PK_SSF_ID,STATUS)
WITH FUNCTION INLINE_FN_STATUS (
P_SSF_DATE IN DATE,
P_PK_SSF_ID IN INTEGER)
RETURN VARCHAR2
IS
R_CUST_STATUS VARCHAR2 (15) := NULL;
BEGIN
SELECT SUBSCRIPTION_STATUS INTO R_CUST_STATUS
FROM (SELECT X.*,
ROW_NUMBER ()
OVER (ORDER BY SUBSCRIPTION_STATUS_ID ASC) RNK1
FROM (SELECT STATUS_DATE ,
SUBSCRIPTION_ID,
ROW_NUMBER ()
OVER (
PARTITION BY SUBSCRIPTION_ID
ORDER BY
SF.STATUS_DATE DESC,
SF.PK_SSF_ID DESC)
RNK,
CASE
WHEN SUBSCRIPTION_STATUS = 'ACTIVE'
THEN
1
WHEN SUBSCRIPTION_STATUS = 'IN-ACTIVE'
THEN
2
WHEN SUBSCRIPTION_STATUS = 'CANCELLED'
THEN
3
END
AS SUBSCRIPTION_STATUS_ID,
SUBSCRIPTION_STATUS
FROM SSF_FACT_TABLE SF
WHERE SSF_DATE <= P_SSF_DATE
AND PK_SSF_ID <= P_PK_SSF_ID
)
X
WHERE RNK = 1)
WHERE RNK1 = 1;
RETURN R_CUST_STATUS;
END;
SELECT /*+ PARALLEL(SF,2) PARALLEL(CL,2)*/ DISTINCT SSF.PK_SSF_ID ,
INLINE_FN_STATUS(SSF.SSF_DATE, SSF.PK_SSF_ID) AS STATUS
FROM SSF_FACT_TABLE SSF
LEFT JOIN SSF_TEMP T ON SSF.PK_SSF_ID=T.PK_SSF_ID
AND T.PK_SSF_ID IS NULL
AND ROWNUM<=500;
EXIT WHEN SQL%NOTFOUND;
COMMIT;
UPDATE SSF_FACT_TABLE SF SET STATUS=
(
SELECT STATUS FROM SSF_TEMP S1
WHERE S1.PK_SSF_ID=SF.PK_SSF_ID
)
WHERE EXISTS (SELECT 1 FROM SSF_TEMP S2 WHERE SF.PK_SSF_ID = S2.PK_SSF_ID AND SF.STATUS IS NULL)
AND SF.STATUS IS NULL
AND ROWNUM<=5000;
ROW_COUNT:=NVL(ROW_COUNT,0)+SQL%ROWCOUNT;
COMMIT;
END LOOP;
COMMIT;
UPDATE SSF_FACT_TABLE SF SET STATUS=
(
SELECT STATUS FROM SSF_TEMP S1
WHERE S1.PK_SSF_ID=SF.PK_SSF_ID
)
WHERE EXISTS (SELECT 1 FROM SSF_TEMP S2 WHERE SF.PK_SSF_ID = S2.PK_SSF_ID AND SF.STATUS IS NULL)
AND SF.STATUS IS NULL
AND ROWNUM<=5000;
ROW_COUNT:=NVL(ROW_COUNT,0)+SQL%ROWCOUNT;
COMMIT;
END;
I think no matter what you do here, you are in trouble from a data modelling perspective.
WHERE SSF_DATE <= P_SSF_DATE
AND PK_SSF_ID <= P_PK_SSF_ID
This snippet suggests to me that you have a requirement which is
"For a given date/id, find all rows previous to this and find the maximum of some attributes"
Whilst we could potentially do some indexing to improve that, it doesn't really strike me as sustainable long term, ie,
If SSF_FACT_TABLE has (say) 1000 rows, then processing
row #1 - look back 0 rows
row #10 - look back 9 rows
row #500 - look back 499 rows
row #1000 - look back 999 rows
So you are heading in to O(n*n) style of magnitude. Thus when the fact table is 1,000,000 rows, the amount of work is scaling out to 1,000,000,000,000 operations.
I'm also a bit confused by the logic (based on the column names).
WHERE SSF_DATE <= P_SSF_DATE
AND PK_SSF_ID <= P_PK_SSF_ID
also suggests that we're looking (potentially) through the entire fact table to get a status result. Wouldn't this be broken up by some other attibute? eg last status "for a customer" or last status "for a product" or something other than just all of the table?