Thanks for the quick response.....
basant nailwal, February 19, 2019 - 4:24 am UTC
Thanks, Chris, for the quick response. Below are the DDL and DML for the problem statement.
CREATE TABLE raw_prices (
  sec_id            VARCHAR2(30) NOT NULL,
  price_date        DATE NOT NULL,
  unadj_close_price NUMBER
);
INSERT INTO raw_prices (sec_id, price_date, unadj_close_price)
VALUES ('S1', TO_DATE('14-02-19', 'dd-mm-yy'), 1);
INSERT INTO raw_prices (sec_id, price_date, unadj_close_price)
VALUES ('S2', TO_DATE('15-02-19', 'dd-mm-yy'), 1.1);
INSERT INTO raw_prices (sec_id, price_date, unadj_close_price)
VALUES ('S2', TO_DATE('18-02-19', 'dd-mm-yy'), 2);
INSERT INTO raw_prices (sec_id, price_date, unadj_close_price)
VALUES ('S3', TO_DATE('15-02-19', 'dd-mm-yy'), 3);
INSERT INTO raw_prices (sec_id, price_date, unadj_close_price)
VALUES ('S3', TO_DATE('17-02-19', 'dd-mm-yy'), 3.1);
Current data:

SEC_ID  PRICE_DATE  UNADJ_CLOSE_PRICE
S1      14-FEB-19   1
S2      15-FEB-19   1.1
S2      18-FEB-19   2
S3      15-FEB-19   3
S3      17-FEB-19   3.1
Desired output:

SEC_ID  PRICE_DATE  UNADJ_CLOSE_PRICE
S1      14-FEB-19   1
S1      15-FEB-19   1
S1      16-FEB-19   1
S1      17-FEB-19   1
S1      18-FEB-19   1
S1      19-FEB-19   1
S2      15-FEB-19   1.1
S2      16-FEB-19   1.1
S2      17-FEB-19   1.1
S2      18-FEB-19   2
S2      19-FEB-19   2
S3      15-FEB-19   3
S3      16-FEB-19   3
S3      17-FEB-19   3.1
S3      18-FEB-19   3.1
S3      19-FEB-19   3.1
The query below gives the desired result for this small subset. But the table contains around 150 million rows, and the query has now been running for more than 10 hours. Looking at the plan, I can see it is sorting a huge number of rows, which may be causing the performance issue. Can this be optimised?
WITH rws AS (
  -- Calendar of every day from the earliest price date up to today.
  -- TRUNC(SYSDATE) replaces TO_DATE(SYSDATE, 'DD-MON-YY'), which forced an
  -- implicit DATE -> VARCHAR2 -> DATE conversion (visible in the plan's
  -- predicate section as TO_DATE(TO_CHAR(SYSDATE), ...)).
  SELECT TRUNC(SYSDATE) - LEVEL + 1 AS dy
  FROM   dual
  CONNECT BY LEVEL <= (SELECT TRUNC(SYSDATE) - MIN(price_date) + 1 FROM raw_prices)
),
stg AS (
  SELECT MIN(price_date) OVER (PARTITION BY ss.sec_id ORDER BY price_date) AS min_price_date_per_sec_id,
         price_date - LAG(price_date, 1) OVER (PARTITION BY ss.sec_id ORDER BY price_date) AS missing_days,
         ss.*
  FROM   raw_prices ss
),
qty AS (
  SELECT NVL(missing_days, 0) AS mdays,
         LAST_VALUE(sec_id) IGNORE NULLS
           OVER (PARTITION BY sec_id ORDER BY dy
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev_sec_id,
         rws.dy AS running_price_date,
         LAST_VALUE(unadj_close_price) IGNORE NULLS
           OVER (PARTITION BY sec_id ORDER BY dy
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev_unadj_close_price,
         ss.sec_id,
         ss.price_date,
         ss.unadj_close_price,
         MIN(min_price_date_per_sec_id) OVER (PARTITION BY sec_id) AS min_eff_date_per_sec
  FROM   rws
  LEFT JOIN stg ss PARTITION BY (sec_id)
    ON   dy = price_date
)
SELECT sec_id,
       CASE WHEN mdays = 0 THEN running_price_date ELSE price_date END AS price_date,
       CASE WHEN mdays = 0 THEN prev_unadj_close_price ELSE unadj_close_price END AS unadj_close_price
FROM   qty q
WHERE  running_price_date >= min_eff_date_per_sec;
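Incidentally, the plan below shows a 151M-row index fast full scan whose only purpose is computing MIN(price_date) for the calendar CTE. If the earliest price date is known and stable, the calendar can be built without touching raw_prices at all. A sketch (the literal start date below is an assumption taken from the sample data; substitute the real earliest price date):

```sql
-- Calendar rows from a fixed start date up to today,
-- avoiding the 151M-row scan that MIN(price_date) requires.
-- DATE '2019-02-14' is a placeholder taken from the sample data.
WITH rws AS (
  SELECT DATE '2019-02-14' + LEVEL - 1 AS dy
  FROM   dual
  CONNECT BY LEVEL <= TRUNC(SYSDATE) - DATE '2019-02-14' + 1
)
SELECT dy FROM rws;
```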
Plan:
Plan hash value: 2645050106
------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 90432 | 5475K| | 953K (1)| 00:00:38 | | |
|* 1 | VIEW | | 90432 | 5475K| | 953K (1)| 00:00:38 | | |
| 2 | WINDOW SORT | | 90432 | 4592K| 6040K| 953K (1)| 00:00:38 | | |
| 3 | VIEW | | 90432 | 4592K| | 952K (1)| 00:00:38 | | |
| 4 | NESTED LOOPS PARTITION OUTER | | 90432 | 4327K| | 952K (1)| 00:00:38 | | |
| 5 | BUFFER SORT | | | | | | | | |
| 6 | VIEW | | 1 | 6 | | 2 (0)| 00:00:01 | | |
|* 7 | CONNECT BY WITHOUT FILTERING | | | | | | | | |
| 8 | FAST DUAL | | 1 | | | 2 (0)| 00:00:01 | | |
| 9 | SORT AGGREGATE | | 1 | 8 | | | | | |
| 10 | INDEX FAST FULL SCAN | PK_RAW_PRICES | 151M| 1155M| | 188K (1)| 00:00:08 | | |
|* 11 | FILTER | | | | | | | | |
| 12 | SORT PARTITION JOIN | | 1713 | 73659 | | 862K (1)| 00:00:34 | | |
| 13 | VIEW | | 1713 | 73659 | | 697K (1)| 00:00:28 | | |
| 14 | WINDOW SORT | | 15M| 444M| 635M| 697K (1)| 00:00:28 | | |
| 15 | NESTED LOOPS | | 15M| 444M| | 568K (1)| 00:00:23 | | |
| 16 | NESTED LOOPS | | 15M| 444M| | 568K (1)| 00:00:23 | | |
| 17 | VIEW | | 9158 | 82422 | | 617 (1)| 00:00:01 | | |
| 18 | SORT UNIQUE | | 9158 | 107K| | 617 (1)| 00:00:01 | | |
|* 19 | TABLE ACCESS FULL | SECURITY_UNIVERSE | 13511 | 158K| | 615 (1)| 00:00:01 | | |
|* 20 | INDEX RANGE SCAN | PK_RAW_PRICES | 1675 | | | 10 (0)| 00:00:01 | | |
| 21 | TABLE ACCESS BY GLOBAL INDEX ROWID| RAW_PRICES | 1643 | 36146 | | 62 (0)| 00:00:01 | ROWID | ROWID |
------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("MIN_EFF_DATE_PER_SEC"<=INTERNAL_FUNCTION("RUNNING_EFFECTIVE_DATE"))
7 - filter(LEVEL<= (SELECT TO_DATE(TO_CHAR(SYSDATE@!),'DD-MON-YY')-MIN("PRICE_DATE")+1 FROM
"DTA_DERIVED"."FS_ADJUSTED_DERIVED_PRICES_RTR" "FS_ADJUSTED_DERIVED_PRICES_RTR"))
11 - filter("PRICE_DATE"=INTERNAL_FUNCTION("DY"))
19 - filter("CLASSIFICATION_CODE"='PD')
20 - access("SS"."SEC_ID"="SU"."SEC_ID")
February 20, 2019 - 2:17 pm UTC
How does the sec_universe table come into this? It's not in the query or sample data you've provided!
Adding the stats ....
basant nailwal, February 19, 2019 - 5:18 am UTC
SQL Monitoring Report
Global Information
------------------------------
Status : EXECUTING
Instance ID : 1
SQL ID : c2n308kbf0z26
SQL Execution ID : 16777216
Execution Started : 02/19/2019 04:59:27
First Refresh Time : 02/19/2019 04:59:31
Last Refresh Time : 02/19/2019 05:08:19
Duration : 532s
Module/Action : SQL Developer/-
Program : SQL Developer
Global Stats
========================================================================
| Elapsed | Cpu | IO | Buffer | Read | Read | Write | Write |
| Time(s) | Time(s) | Waits(s) | Gets | Reqs | Bytes | Reqs | Bytes |
========================================================================
| 545 | 58 | 487 | 2M | 622K | 20GB | 20012 | 4GB |
========================================================================
SQL Plan Monitoring Details (Plan Hash Value=2645050106)
=======================================================================================================================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Write | Write | Mem | Temp | Activity | Activity Detail | Progress |
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | Reqs | Bytes | | | (%) | (# samples) | |
=======================================================================================================================================================================================================================================================
| 0 | SELECT STATEMENT | | | | | | 1 | | | | | | | | | | |
| 1 | VIEW | | 90432 | 2M | | | 1 | | | | | | | | | | |
| 2 | WINDOW SORT | | 90432 | 2M | | | 1 | | | | | | | | | | |
| 3 | VIEW | | 90432 | 2M | | | 1 | | | | | | | | | | |
| 4 | NESTED LOOPS PARTITION OUTER | | 90432 | 2M | 1 | +42 | 1 | 0 | | | | | | | | | |
| 5 | BUFFER SORT | | | | 39 | +4 | 1 | 1 | | | | | 563K | | | | |
| 6 | VIEW | | 1 | 2 | 39 | +4 | 1 | 12467 | | | | | | | | | |
| 7 | CONNECT BY WITHOUT FILTERING | | | | 39 | +4 | 1 | 12467 | | | | | | | | | |
| 8 | FAST DUAL | | 1 | 2 | 1 | +4 | 1 | 1 | | | | | | | | | |
| 9 | SORT AGGREGATE | | 1 | | 39 | +4 | 1 | 1 | | | | | | | 0.19 | Cpu (1) | |
| 10 | INDEX FAST FULL SCAN | PK_RAW_PRICES | 151M | 189K | 41 | +2 | 1 | 151M | 25309 | 15GB | | | | | 7.33 | Cpu (11) | 100% |
| | | | | | | | | | | | | | | | | db file scattered read (28) | |
| 11 | FILTER | | | | | | 1 | | | | | | | | | | |
| 12 | SORT PARTITION JOIN | | 1713 | 2M | | | 1 | | | | | | | | | | |
| 13 | VIEW | | 1713 | 1M | | | 1 | | | | | | | | | | |
| 14 | WINDOW SORT | | 15M | 1M | 491 | +42 | 1 | 0 | | | 20012 | 4GB | 104M | 4G | 6.39 | Cpu (22) | |
| | | | | | | | | | | | | | | | | direct path write temp (12) | |
| 15 | NESTED LOOPS | | 15M | 569K | 491 | +42 | 1 | 20M | | | | | | | 0.19 | Cpu (1) | |
| -> 16 | NESTED LOOPS | | 15M | 569K | 491 | +42 | 1 | 20M | | | | | | | | | |
| -> 17 | VIEW | | 9158 | 617 | 491 | +42 | 1 | 7021 | | | | | | | | | |
| -> 18 | SORT UNIQUE | | 9158 | 617 | 491 | +42 | 1 | 7021 | | | | | 757K | | | | |
| 19 | TABLE ACCESS FULL | SEC_UNIVERSE | 13511 | 615 | 1 | +42 | 1 | 13511 | | | | | | | | | |
| 20 | INDEX RANGE SCAN | PK_RAW_PRICES | 1675 | 10 | 491 | +42 | 105K | 20M | 8234 | 64MB | | | | | 1.50 | Cpu (3) | |
| | | | | | | | | | | | | | | | | db file parallel read (5) | |
| -> 21 | TABLE ACCESS BY GLOBAL INDEX ROWID | RAW_PRICES | 1643 | 62 | 492 | +42 | 22M | 20M | 588K | 4GB | | | | | 84.40 | Cpu (21) | |
| | | | | | | | | | | | | | | | | db file parallel read (104) | |
| | | | | | | | | | | | | | | | | db file scattered read (2) | |
| | | | | | | | | | | | | | | | | db file sequential read (322) | |
=======================================================================================================================================================================================================================================================
February 20, 2019 - 2:25 pm UTC
Well, you're processing 20 million+ rows in raw_prices:
| 20 | INDEX RANGE SCAN | PK_RAW_PRICES | 1675 | 10 | 491 | +42 | 105K | 20M | 8234 | 64MB |
| -> 21 | TABLE ACCESS BY GLOBAL INDEX ROWID | RAW_PRICES | 1643 | 62 | 492 | +42 | 22M | 20M | 588K | 4GB |
This accounts for well over 80% of the work the plan is doing.
But I think the real problem is the SORT UNIQUE step above this. For each row it returns, the database queries raw_prices.
What happens when you change this select distinct to an exists subquery as I suggested?
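Something along these lines (a sketch; I'm assuming the sec_universe column names from the predicate section of your plan):

```sql
-- Instead of joining to a DISTINCT list of sec_ids, probe sec_universe
-- with EXISTS so no SORT/HASH UNIQUE step is needed.
-- classification_code = 'PD' is taken from the plan's predicate section.
SELECT rp.*
FROM   raw_prices rp
WHERE  EXISTS (
  SELECT NULL
  FROM   sec_universe su
  WHERE  su.sec_id = rp.sec_id
  AND    su.classification_code = 'PD'
);
```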
Your input matters a lot.....
A reader, February 20, 2019 - 4:32 pm UTC
"How does the sec_universe table come into this? It's not in the query or sample data you've provided! "
Yes, I didn't provide the sec_universe details, for simplicity; I was just trying to give you the sample data and the basic query without the sec_universe filter.
And the plan is for the actual query, where I need to filter the securities through sec_universe.
And finally, I did try replacing the DISTINCT with EXISTS, but there was no significant effect.
On your point about the SORT UNIQUE, I am a little confused by its presence; shouldn't it be a HASH UNIQUE operation instead?
Also, looking at the plan closely, most of the time is spent sorting, and the direct path write/read temp operations are slowing the whole thing down: there are lots of records that don't fit in memory, so Oracle spills the sort to disk. That is my observation; any suggestions?
February 21, 2019 - 4:25 pm UTC
i was just trying to provide you sample data and basic query without sec_universe table filter......
We need as close as possible to a like-for-like comparison when it comes to helping, particularly for performance questions! The join between raw_prices and sec_universe is where all your time is going.
So for us to help you effectively, we need a test case that includes all the tables your query accesses. And their indexes. And ideally insert scripts to generate a large volume of realistic data (though I understand this is hard to do).
And finally i did try by removing the distinct clause and tried with Exist but there is no significant effect of this.
So what was the plan when you did this?
On your suggestion of sort unique i am little confused of its presence , don't you think it should be Hash Unique instead of sort unique operation..??
Whether it's a sort or a hash unique operation is of little consequence. The point is that adding DISTINCT means the database has to do one of these unique steps.
This is my observation on this, any suggestions??
Find a way to reduce the number of rows you're reading!
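For example, one option is to apply the sec_universe filter once, up front, so the window sorts only process the rows you actually need. A sketch (assuming the column names from your plan's predicate section; fold the CTE into your full query in place of raw_prices):

```sql
-- Filter raw_prices down to the securities of interest first,
-- then run the gap-fill logic against this smaller set.
WITH filtered_prices AS (
  SELECT /*+ materialize */ rp.*
  FROM   raw_prices rp
  WHERE  EXISTS (
    SELECT NULL
    FROM   sec_universe su
    WHERE  su.sec_id = rp.sec_id
    AND    su.classification_code = 'PD'
  )
)
SELECT COUNT(*) FROM filtered_prices;
```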
Your Thoughts?
A reader, February 20, 2019 - 4:48 pm UTC
Also, I went one step further and created a table on top of raw_prices, selecting all the securities data required after applying the sec_universe lookup; the intention was to check whether the query would work on the reduced data set.
The query still took 7+ hours to complete, which gives me the impression that it will always take that long, for the reasons below:
1) The data set is more than 20 million records.
2) Because of the requirement, we have to use the LAST_VALUE analytic function to get to the previous price, which involves sorting over the entire data set.
Maybe such a huge volume simply requires immense I/O plus sorting, and so the time is justified?
February 21, 2019 - 4:26 pm UTC
And the plan for that was?