Thanks for the question, basant.

Asked: February 18, 2019 - 9:41 am UTC

Last updated: February 21, 2019 - 4:26 pm UTC

Version: 12.1

Viewed 1000+ times

You Asked

Hi Tom,

I am facing an issue while making sparse data dense.

Here is the problem statement: We are getting price information for securities from upstream in a file and prices will come only when either there will be new security on boarded or there is change in the price data for that secouity.
Now we need to replicate the prices for each security per day into our DB (due to some requirement), in order to do that I have written below code but it is not getting completed even in 3 hours.

The source table (Raw prices ) contains approx 150 Million sparse records(40K distinct securities in source), all these securities is not required for further processing so based on a lookup table i am filtering the securities further (13k distinct securities required) and to achieve the desired result
i am using the partition join technique.

My question to you is :-
1) Is it the right way to do it?
2) If yes, is there any way to speed up the process.

Query:-

WITH rws AS
(    SELECT trunc(SYSDATE - LEVEL) AS dy            FROM dual          CONNECT BY LEVEL <= 365*20) ,
stg as
 (SELECT MIN(Price_Date) over(PARTITION BY ss.sec_id ORDER BY Price_Date) MIN_PRICE_DATE_PER_SEC_ID,
         (Price_Date - lag(Price_Date, 1)
          over(PARTITION BY ss.sec_id ORDER BY Price_Date)) missing_days,
         ss.*
    FROM Raw_prices ss,
         (SELECT DISTINCT sec_id FROM sec_universe SU   WHERE SU.sec_id = SS.sec_id),
qty as
 (SELECT NVL(missing_days, 0) AS mdays,
         last_value(sec_id) IGNORE NULLS OVER(PARTITION BY sec_id ORDER BY dy ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) PREV_sec_id,
         rws.dy AS running_Price_Date,
         last_value(UNADJ_CLOSE_PRICE) IGNORE NULLS OVER(PARTITION BY sec_id ORDER BY dy ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) PREV_UNADJ_CLOSE_PRICE,
         ss.sec_id,
         ss.Price_Date,
         ss.UNADJ_CLOSE_PRICE,
         min(MIN_PRICE_DATE_PER_SEC_ID) over(partition by sec_id) MIN_EFF_DATE_PER_SEC
    from rws
    LEFT JOIN stg ss PARTITION BY(sec_id)
      ON dy = Price_Date
  
  )
SELECT CASE
         WHEN mdays = 0 THEN
          prev_sec_id
         ELSE
          sec_id
       END AS sec_id,
       CASE
         WHEN mdays = 0 THEN
          running_Price_Date
         ELSE
          Price_Date
       END AS Price_Date,
       CASE
         WHEN mdays = 0 THEN
          prev_UNADJ_CLOSE_PRICE
         ELSE
          UNADJ_CLOSE_PRICE
       END AS UNADJ_CLOSE_PRICE
  FROM qty q
 WHERE running_Price_Date >= MIN_EFF_DATE_PER_SEC


--------------------------------------------------------------------------------------------------------------------------------------
| Id   | Operation                                     | Name                           | Rows     | Bytes      | Cost    | Time     |
--------------------------------------------------------------------------------------------------------------------------------------
|    0 | SELECT STATEMENT                              |                                |   150427 |   78522894 | 2074931 | 00:01:22 |
|  * 1 |   VIEW                                        |                                |   150427 |   78522894 | 2074931 | 00:01:22 |
|    2 |    WINDOW SORT                                |                                |   150427 |   41517852 | 2074931 | 00:01:22 |
|    3 |     VIEW                                      |                                |   150427 |   41517852 | 2066007 | 00:01:21 |
|    4 |      NESTED LOOPS PARTITION OUTER             |                                |   150427 |   41517852 | 2066007 | 00:01:21 |
|    5 |       BUFFER SORT                             |                                |          |            |         |          |
|    6 |        VIEW                                   |                                |        1 |          6 |       2 | 00:00:01 |
|  * 7 |         CONNECT BY WITHOUT FILTERING          |                                |          |            |         |          |
|    8 |          FAST DUAL                            |                                |        1 |            |       2 | 00:00:01 |
|  * 9 |       FILTER                                  |                                |          |            |         |          |
|   10 |        SORT PARTITION JOIN                    |                                |   150427 |   40615290 | 1975550 | 00:01:18 |
|   11 |         VIEW                                  |                                |   150427 |   40615290 | 1103176 | 00:00:44 |
|   12 |          WINDOW SORT                          |                                | 15042715 | 2421877115 | 1103176 | 00:00:44 |
|   13 |           NESTED LOOPS                        |                                | 15042715 | 2421877115 |  568801 | 00:00:23 |
|   14 |            NESTED LOOPS                       |                                | 15339650 | 2421877115 |  568801 | 00:00:23 |
|   15 |             VIEW                              |                                |     9158 |      82422 |     617 | 00:00:01 |
|   16 |              SORT UNIQUE                      |                                |     9158 |     109896 |     617 | 00:00:01 |
| * 17 |               TABLE ACCESS FULL               | SEC_UNIVERSE            |    13511 |     162132 |     615 | 00:00:01 |
| * 18 |             INDEX RANGE SCAN                  | PK_RAW_PRICES            |     1675 |            |      10 | 00:00:01 |
|   19 |            TABLE ACCESS BY GLOBAL INDEX ROWID | RAW_PRICES   |     1643 |     249736 |      62 | 00:00:01 |
--------------------------------------------------------------------------------------------------------------------------------------

and Chris said...

1) Maybe?

I don't know the full details of what you're trying to do. Or the schema you're using.

If you post a complete example, including:

- create tables
- insert intos
- expected result from these data

we can provide more help.

Also, for performance question please include the execution plan. At a minimum this should include the E(stimated) rows & A(ctual) rows columns. Ideally the buffers/consistent gets too.

If you're not sure how to get one, read:

https://blogs.oracle.com/sql/how-to-create-an-execution-plan

2) If by this:

  from raw_prices ss,
       (
         select distinct sec_id
         from   sec_universe su
         where  su.sec_id = ss.sec_id
       )

You mean, "return all the raw_prices that have a matching sec_id in sec_universe"

it may be better written as an exists:

from   raw_prices ss
where  exists (
  select null
  from   sec_universe su
  where  su.sec_id = ss.sec_id
)

Rating

(4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks for the quick response.....

basant nailwal, February 19, 2019 - 4:24 am UTC

Thanks Chris for quick response. Below is the DDL and DML for the problem statement.

create table RAW_PRICES
(
sec_id VARCHAR2(30) not null,
price_date DATE not null,
UNADJ_CLOSE_PRICE NUMBER
);

INSERT INTO RAW_PRICES (SEC_ID, PRICE_DATE, UNADJ_CLOSE_PRICE)
values ('S1', to_date('14-02-19', 'dd-mm-yy'), 1);

INSERT INTO RAW_PRICES (SEC_ID, PRICE_DATE, UNADJ_CLOSE_PRICE)
values ('S2', to_date('15-02-19', 'dd-mm-yy'), 1.1);

INSERT INTO RAW_PRICES (SEC_ID, PRICE_DATE, UNADJ_CLOSE_PRICE)
values ('S2', to_date('18-02-19', 'dd-mm-yy'), 2);

INSERT INTO RAW_PRICES (SEC_ID, PRICE_DATE, UNADJ_CLOSE_PRICE)
values ('S3', to_date('15-02-19', 'dd-mm-yy'), 3);

INSERT INTO RAW_PRICES (SEC_ID, PRICE_DATE, UNADJ_CLOSE_PRICE)
VALUES ('S3', TO_DATE('17-02-19', 'dd-mm-yy'), 3.1);

sec_id Price_date UNADJ_CLOSE_PRICE
S1 14-FEB-19 1
S2 15-FEB-19 1.1
S2 18-FEB-19 2
S3 15-FEB-19 3
S3 17-FEB-19 3.1

Output :-
sec_id Price_date UNADJ_CLOSE_PRICE
S1 14-FEB-19 1
S1 15-FEB-19 1
S1 16-FEB-19 1
S1 17-FEB-19 1
S1 18-FEB-19 1
S1 19-FEB-19 1
S2 15-FEB-19 1.1
S2 16-FEB-19 1.1
S2 17-FEB-19 1.1
S2 18-FEB-19 2
S2 19-FEB-19 2
S3 15-FEB-19 3
S3 16-FEB-19 3
S3 17-FEB-19 3.1
S3 18-FEB-19 3.1
S3 19-FEB-19 3.1

With the below query i am getting the desired result for the above small subset. But the table contains around 150 million records and the query is going on and on from 10 hrs and so , if i see the plan, i can see it is sorting the data for so many these records which might causing the performance issue, can this be optimised?

WITH RWS AS
( SELECT trunc(SYSDATE - LEVEL+1) AS dy FROM dual CONNECT BY LEVEL <= (SELECT TO_dATE(SYSDATE,'DD-MON-YY')-MIN(PRICE_DATE)+1 FROM RAW_PRICES)) ,
stg as
(SELECT MIN(Price_Date) over(PARTITION BY ss.sec_id ORDER BY Price_Date) MIN_PRICE_DATE_PER_SEC_ID,
(Price_Date - lag(Price_Date, 1)
over(PARTITION BY ss.sec_id ORDER BY Price_Date)) missing_days,
ss.*
FROM Raw_prices ss

)
,qty as (
SELECT NVL(missing_days, 0) AS mdays,
last_value(sec_id) IGNORE NULLS OVER(PARTITION BY sec_id ORDER BY dy ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) PREV_sec_id,
rws.dy AS running_Price_Date,
last_value(UNADJ_CLOSE_PRICE) IGNORE NULLS OVER(PARTITION BY sec_id ORDER BY dy ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) PREV_UNADJ_CLOSE_PRICE,
ss.sec_id,
ss.Price_Date,
ss.UNADJ_CLOSE_PRICE,
min(MIN_PRICE_DATE_PER_SEC_ID) over(partition by sec_id) MIN_EFF_DATE_PER_SEC
from rws
LEFT JOIN STG SS PARTITION BY(SEC_ID)
ON dy = Price_Date
)
SELECT
SEC_ID,
CASE WHEN mdays = 0 THEN running_Price_Date ELSE Price_Date END AS Price_Date,
CASE WHEN mdays = 0 THEN prev_UNADJ_CLOSE_PRICE ELSE UNADJ_CLOSE_PRICE END AS UNADJ_CLOSE_PRICE
FROM qty q
WHERE running_Price_Date >= MIN_EFF_DATE_PER_SEC
;

PLAN:-

Plan hash value: 2645050106

------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 90432 | 5475K| | 953K (1)| 00:00:38 | | |
|* 1 | VIEW | | 90432 | 5475K| | 953K (1)| 00:00:38 | | |
| 2 | WINDOW SORT | | 90432 | 4592K| 6040K| 953K (1)| 00:00:38 | | |
| 3 | VIEW | | 90432 | 4592K| | 952K (1)| 00:00:38 | | |
| 4 | NESTED LOOPS PARTITION OUTER | | 90432 | 4327K| | 952K (1)| 00:00:38 | | |
| 5 | BUFFER SORT | | | | | | | | |
| 6 | VIEW | | 1 | 6 | | 2 (0)| 00:00:01 | | |
|* 7 | CONNECT BY WITHOUT FILTERING | | | | | | | | |
| 8 | FAST DUAL | | 1 | | | 2 (0)| 00:00:01 | | |
| 9 | SORT AGGREGATE | | 1 | 8 | | | | | |
| 10 | INDEX FAST FULL SCAN | PK_RAW_PRICES | 151M| 1155M| | 188K (1)| 00:00:08 | | |
|* 11 | FILTER | | | | | | | | |
| 12 | SORT PARTITION JOIN | | 1713 | 73659 | | 862K (1)| 00:00:34 | | |
| 13 | VIEW | | 1713 | 73659 | | 697K (1)| 00:00:28 | | |
| 14 | WINDOW SORT | | 15M| 444M| 635M| 697K (1)| 00:00:28 | | |
| 15 | NESTED LOOPS | | 15M| 444M| | 568K (1)| 00:00:23 | | |
| 16 | NESTED LOOPS | | 15M| 444M| | 568K (1)| 00:00:23 | | |
| 17 | VIEW | | 9158 | 82422 | | 617 (1)| 00:00:01 | | |
| 18 | SORT UNIQUE | | 9158 | 107K| | 617 (1)| 00:00:01 | | |
|* 19 | TABLE ACCESS FULL | SECURITY_UNIVERSE | 13511 | 158K| | 615 (1)| 00:00:01 | | |
|* 20 | INDEX RANGE SCAN | PK_RAW_PRICES | 1675 | | | 10 (0)| 00:00:01 | | |
| 21 | TABLE ACCESS BY GLOBAL INDEX ROWID| RAW_PRICES | 1643 | 36146 | | 62 (0)| 00:00:01 | ROWID | ROWID |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("MIN_EFF_DATE_PER_SEC"<=INTERNAL_FUNCTION("RUNNING_EFFECTIVE_DATE"))
7 - filter(LEVEL<= (SELECT TO_DATE(TO_CHAR(SYSDATE@!),'DD-MON-YY')-MIN("PRICE_DATE")+1 FROM
"DTA_DERIVED"."FS_ADJUSTED_DERIVED_PRICES_RTR" "FS_ADJUSTED_DERIVED_PRICES_RTR"))
11 - filter("PRICE_DATE"=INTERNAL_FUNCTION("DY"))
19 - filter("CLASSIFICATION_CODE"='PD')
20 - access("SS"."SEC_ID"="SU"."SEC_ID")

February 20, 2019 - 2:17 pm UTC

How does the sec_universe table come into this? It's not in the query or sample data you've provided!

Adding the stats ....

basant nailwal, February 19, 2019 - 5:18 am UTC

"SQL Monitoring Report
Global Information
------------------------------
Status : EXECUTING
Instance ID : 1
SQL ID : c2n308kbf0z26
SQL Execution ID : 16777216
Execution Started : 02/19/2019 04:59:27
First Refresh Time : 02/19/2019 04:59:31
Last Refresh Time : 02/19/2019 05:08:19
Duration : 532s
Module/Action : SQL Developer/-
Program : SQL Developer

Global Stats
========================================================================

| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
========================================================================
|     545 |      58 |      487 |     2M | 622K |  20GB | 20012 |   4GB |
========================================================================

SQL Plan Monitoring Details (Plan Hash Value=2645050106)
=======================================================================================================================================================================================================================================================
| Id    |                   Operation                   |              Name              |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write | Mem  | Temp | Activity |        Activity Detail        | Progress |
|       |                                               |                                | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |      |      |   (%)    |          (# samples)          |          |
=======================================================================================================================================================================================================================================================
|     0 | SELECT STATEMENT                              |                                |         |      |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|     1 |   VIEW                                        |                                |   90432 |   2M |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|     2 |    WINDOW SORT                                |                                |   90432 |   2M |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|     3 |     VIEW                                      |                                |   90432 |   2M |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|     4 |      NESTED LOOPS PARTITION OUTER             |                                |   90432 |   2M |         1 |    +42 |     1 |        0 |       |       |       |       |      |      |          |                               |          |
|     5 |       BUFFER SORT                             |                                |         |      |        39 |     +4 |     1 |        1 |       |       |       |       | 563K |      |          |                               |          |
|     6 |        VIEW                                   |                                |       1 |    2 |        39 |     +4 |     1 |    12467 |       |       |       |       |      |      |          |                               |          |
|     7 |         CONNECT BY WITHOUT FILTERING          |                                |         |      |        39 |     +4 |     1 |    12467 |       |       |       |       |      |      |          |                               |          |
|     8 |          FAST DUAL                            |                                |       1 |    2 |         1 |     +4 |     1 |        1 |       |       |       |       |      |      |          |                               |          |
|     9 |          SORT AGGREGATE                       |                                |       1 |      |        39 |     +4 |     1 |        1 |       |       |       |       |      |      |     0.19 | Cpu (1)                       |          |
|    10 |           INDEX FAST FULL SCAN                | PK_RAW_PRICES            |    151M | 189K |        41 |     +2 |     1 |     151M | 25309 |  15GB |       |       |      |      |     7.33 | Cpu (11)                      |     100% |
|       |                                               |                                |         |      |           |        |       |          |       |       |       |       |      |      |          | db file scattered read (28)   |          |
|    11 |       FILTER                                  |                                |         |      |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|    12 |        SORT PARTITION JOIN                    |                                |    1713 |   2M |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|    13 |         VIEW                                  |                                |    1713 |   1M |           |        |     1 |          |       |       |       |       |      |      |          |                               |          |
|    14 |          WINDOW SORT                          |                                |     15M |   1M |       491 |    +42 |     1 |        0 |       |       | 20012 |   4GB | 104M |   4G |     6.39 | Cpu (22)                      |          |
|       |                                               |                                |         |      |           |        |       |          |       |       |       |       |      |      |          | direct path write temp (12)   |          |
|    15 |           NESTED LOOPS                        |                                |     15M | 569K |       491 |    +42 |     1 |      20M |       |       |       |       |      |      |     0.19 | Cpu (1)                       |          |
| -> 16 |            NESTED LOOPS                       |                                |     15M | 569K |       491 |    +42 |     1 |      20M |       |       |       |       |      |      |          |                               |          |
| -> 17 |             VIEW                              |                                |    9158 |  617 |       491 |    +42 |     1 |     7021 |       |       |       |       |      |      |          |                               |          |
| -> 18 |              SORT UNIQUE                      |                                |    9158 |  617 |       491 |    +42 |     1 |     7021 |       |       |       |       | 757K |      |          |                               |          |
|    19 |               TABLE ACCESS FULL               | SEC_UNIVERSE                  |   13511 |  615 |         1 |    +42 |     1 |    13511 |       |       |       |       |      |      |          |                               |          |
|    20 |             INDEX RANGE SCAN                  | PK_RAW_PRICES            |    1675 |   10 |       491 |    +42 |  105K |      20M |  8234 |  64MB |       |       |      |      |     1.50 | Cpu (3)                       |          |
|       |                                               |                                |         |      |           |        |       |          |       |       |       |       |      |      |          | db file parallel read (5)     |          |
| -> 21 |            TABLE ACCESS BY GLOBAL INDEX ROWID | RAW_PRICES     |    1643 |   62 |       492 |    +42 |   22M |      20M |  588K |   4GB |       |       |      |      |    84.40 | Cpu (21)                      |          |
|       |                                               |                                |         |      |           |        |       |          |       |       |       |       |      |      |          | db file parallel read (104)   |          |
|       |                                               |                                |         |      |           |        |       |          |       |       |       |       |      |      |          | db file scattered read (2)    |          |
|       |                                               |                                |         |      |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (322) |          |

======================================================================================================================================================================================================================================================="

February 20, 2019 - 2:25 pm UTC

Well, you're processing 20 million+ rows in raw_prices:

|    20 |             INDEX RANGE SCAN                  | PK_RAW_PRICES  |    1675 |   10 |       491 |    +42 |  105K |      20M |  8234 |  64MB | 
| -> 21 |            TABLE ACCESS BY GLOBAL INDEX ROWID | RAW_PRICES     |    1643 |   62 |       492 |    +42 |   22M |      20M |  588K |   4GB |

This accounts for well over 80% of the work the plan is doing.

But I think the real problem comes from SORT UNIQUE step above this. For each row this returns, it's querying raw_prices.

What happens when you change this select distinct to an exists subquery as I suggested?

Your input matters a lot.....

A reader, February 20, 2019 - 4:32 pm UTC

"How does the sec_universe table come into this? It's not in the query or sample data you've provided! "

Yes i didn't provide the sec_universe details for simplicity ....i was just trying to provide you sample data and basic query without sec_universe table filter......

And the plan is for the actual query where i need to filter all the securities from sec_universe....

And finally i did try by removing the distinct clause and tried with Exist but there is no significant effect of this.

On your suggestion of sort unique i am little confused of its presence , don't you think it should be Hash Unique instead of sort unique operation..??

Plus if i look the plan closely, most of the time spent in the sorting and the operation direct path write/read temp is slowing the whole operation , there are lots of records which doesn't fit in the memory and hence oracle uses disc for the sorting operation. This is my observation on this, any suggestions??

February 21, 2019 - 4:25 pm UTC

i was just trying to provide you sample data and basic query without sec_universe table filter......

We need as close as possible a like-for-like comparison when it comes helping; particularly for performance questions! The join between raw_prices and sec_universe is where all your time's going.

So for us to help you effectively, we need a test case that includes all the tables your query accesses. And their indexes. And ideally insert scripts to generate a large volume of realistic data (though I understand this is hard to do).

And finally i did try by removing the distinct clause and tried with Exist but there is no significant effect of this.

So what was the plan when you did this?

On your suggestion of sort unique i am little confused of its presence , don't you think it should be Hash Unique instead of sort unique operation..??

Whether it's a sort or hash unique operation is of little consequence. The point it adding DISTINCT means has to do one of these unique steps.

This is my observation on this, any suggestions??

Find a way to reduce the number of rows you're reading!

Your Thoughts?

A reader, February 20, 2019 - 4:48 pm UTC

Also i did one step further where i created a table on top of raw_prices by selecting all the securities data required after applying the lookup from sec_universe... the intention was to check if the query will work by reducing the data set.

The query took 7+ hours to get complete and it gives me the impression that it will take that time because of below reason:-

1) The dataset which is more than 20 million+ records
2) Because of the requirement we have to use last_value analytical function to get to the previous price which involves sorting and it will be performed over entire dataset.

May be and may be such huge vol will require immense I/O + sorting hence the time is justified ??

February 21, 2019 - 4:26 pm UTC

And the plan for that was?

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Thanks for the quick response.....

Adding the stats ....

Your input matters a lot.....

Your Thoughts?

More to Explore

Administration

Performance issue with data densification process

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Thanks for the quick response.....

Adding the stats ....

Your input matters a lot.....

Your Thoughts?

More to Explore

Administration