Skip to Main Content
  • Questions
  • calling inline function in 5million records of select query causing performance Issues

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Saddam.

Asked: December 15, 2021 - 5:00 pm UTC

Last updated: January 10, 2022 - 12:57 am UTC

Version: 19c

Viewed 1000+ times

You Asked

Hi Tom,

i am trying to call inline function from 5 million rows of table (SSF_FACT_TABLE) and update the function's result into same (SSF_FACT_TABLE) table. it is taking two days to complete the process. Statistics of the table got updated. can you help me to write the better query.
Thank you!



CREATE TABLE SSF_TEMP (PK_SSF_ID INTEGER,STATUS VARCHAR2(15));


  BEGIN

    LOOP
  
    INSERT /*+ WITH_PLSQL APPEND PARALLEL(8)*/ INTO SSF_TEMP (PK_SSF_ID,STATUS)
    WITH  FUNCTION INLINE_FN_STATUS (
            P_SSF_DATE                      IN DATE,
            P_PK_SSF_ID              IN INTEGER)
            RETURN VARCHAR2
        IS
            R_CUST_STATUS   VARCHAR2 (15) := NULL;
        BEGIN
            SELECT SUBSCRIPTION_STATUS   INTO R_CUST_STATUS
              FROM (SELECT X.*,
                           ROW_NUMBER ()
                               OVER (ORDER BY SUBSCRIPTION_STATUS_ID ASC)    RNK1
                      FROM (SELECT  STATUS_DATE ,
                                   SUBSCRIPTION_ID,
                                   ROW_NUMBER ()
                                       OVER (
                                           PARTITION BY   SUBSCRIPTION_ID
                                           ORDER BY
                                               SF.STATUS_DATE DESC,
                                               SF.PK_SSF_ID DESC)
                                       RNK,
                                   CASE
                                       WHEN SUBSCRIPTION_STATUS = 'ACTIVE'
                                       THEN
                                           1
                                       WHEN SUBSCRIPTION_STATUS = 'IN-ACTIVE'
                                       THEN
                                           2
                                       WHEN SUBSCRIPTION_STATUS = 'CANCELLED'
                                       THEN
                                           3
                                   END
                                       AS SUBSCRIPTION_STATUS_ID,
                                   SUBSCRIPTION_STATUS
                              FROM SSF_FACT_TABLE SF
                             WHERE    SSF_DATE <= P_SSF_DATE
                                   AND PK_SSF_ID <= P_PK_SSF_ID
                                   )
                           X
                     WHERE RNK = 1)
             WHERE RNK1 = 1;
            RETURN R_CUST_STATUS;
        END;
SELECT /*+ PARALLEL(SF,2) PARALLEL(CL,2)*/ DISTINCT SSF.PK_SSF_ID ,
INLINE_FN_STATUS(SSF.SSF_DATE, SSF.PK_SSF_ID) AS STATUS
FROM SSF_FACT_TABLE SSF
LEFT JOIN SSF_TEMP T ON SSF.PK_SSF_ID=T.PK_SSF_ID
AND T.PK_SSF_ID IS NULL
AND ROWNUM<=500;
EXIT WHEN SQL%NOTFOUND;
COMMIT;

UPDATE SSF_FACT_TABLE SF SET STATUS=
(
SELECT STATUS FROM SSF_TEMP S1
WHERE S1.PK_SSF_ID=SF.PK_SSF_ID
)
WHERE EXISTS (SELECT 1 FROM SSF_TEMP S2 WHERE SF.PK_SSF_ID = S2.PK_SSF_ID AND SF.STATUS IS NULL)
AND SF.STATUS IS NULL
AND ROWNUM<=5000;

ROW_COUNT:=NVL(ROW_COUNT,0)+SQL%ROWCOUNT;
COMMIT;

END LOOP;
COMMIT;

UPDATE SSF_FACT_TABLE SF SET STATUS=
(
SELECT STATUS FROM SSF_TEMP S1
WHERE S1.PK_SSF_ID=SF.PK_SSF_ID
)
WHERE EXISTS (SELECT 1 FROM SSF_TEMP S2 WHERE SF.PK_SSF_ID = S2.PK_SSF_ID AND SF.STATUS IS NULL)
AND SF.STATUS IS NULL
AND ROWNUM<=5000;
ROW_COUNT:=NVL(ROW_COUNT,0)+SQL%ROWCOUNT;
COMMIT;

END;

and Connor said...

I think no matter what you do here, you are in trouble from a data modelling perspective.

WHERE    SSF_DATE <= P_SSF_DATE
AND PK_SSF_ID <= P_PK_SSF_ID


This snippet suggests to me that you have a requirement which is

"For a given date/id, find all rows previous to this and find the maximum of some attributes"


Whilst we could potentially do some indexing to improve that, it doesn't really strike me as sustainable long term, ie,

If SSF_FACT_TABLE has (say) 1000 rows, then processing

row #1 - look back 0 rows
row #10 - look back 9 rows
row #500 - look back 499 rows
row #1000 - look back 999 rows

So you are heading in to O(n*n) style of magnitude. Thus when the fact table is 1,000,000 rows, the amount of work is scaling out to 1,000,000,000,000 operations.

I'm also a bit confused by the logic (based on the column names).

WHERE SSF_DATE <= P_SSF_DATE
AND PK_SSF_ID <= P_PK_SSF_ID

also suggests that we're looking (potentially) through the entire fact table to get a status result. Wouldn't this be broken up by some other attibute? eg last status "for a customer" or last status "for a product" or something other than just all of the table?

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.