It's subqueries (with aggregations) in general, not just CTEs
Anthony Munoz, September 14, 2016 - 2:23 pm UTC
Thanks for the reply Connor. The 4 sec to 5 min problem happened much less frequently in Prod than in the lower life cycle I was testing in. (Happened once in Prod though with a larger data set, went from 2 sec to me having to kill it after 2.5 hours.) Haven't dug into the plan or statistics yet to pinpoint the cause.
That said, that was step one of my task to improve the performance of a 600+ line query that had 17 correlated subqueries with aggregations, plus unions, etc.
I took the same approach with each successive aggregation, adding them as "stand-alone" subqueries to which we can join. Performance degraded significantly with each. In the end there was a modest 50% reduction in execution time: from 1 hr 12 min to 34 min in one example.
with my_targ_pop as (select id1, id2 ...)
select col1, agg1.sum_col, nvl(agg2.sum_col, 0) ...
from   my_targ_pop
join      (select ..., sum(col) ... group by ...) agg1 on ...
left join (select ..., sum(col) ... group by ...) agg2 on ...
...
Then I noticed nearly every subquery ran isolated as a standalone query in a second or less. I tried using the exact same subqueries with indexed GTTs:
delete from gtt1;
delete from gtt2;

insert into gtt1
select sum() ...;

insert into gtt2
select sum() ...;

insert into gtt2
select another sum() ...;
...

select col1, col2 ...
from   gtt1
left join gtt2 on something ...
left join gtt2 on something else ...
...
Total execution time dropped from 1hr 12min to < 7 sec.
So, if I wasn't before, now I'm completely floored by the lack of local temp tables. In order to make this work in Prod I had to find someone with the rights to create and alter any (even permanent) tables so they could run my GTT create scripts. Then they had to explicitly grant me permissions to insert and delete.
Honestly it seems rather ridiculous that I need to go through all that to generate this ad-hoc report. Am I missing something? Any ideas you might suggest as a viable alternate approach without GTTs (short of building a data warehouse)?
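For anyone hitting the same wall, the one-time setup looks something like this (table, column, and user names here are hypothetical). The DDL needs CREATE TABLE privileges once; after that, each session gets its own private copy of the rows, so no further admin involvement is needed:

```sql
-- One-time DDL, run by a privileged user; names are made up for illustration.
create global temporary table gtt1 (
  id    number,
  total number
) on commit preserve rows;   -- rows persist until the session ends

create index gtt1_ix on gtt1 (id);

-- Grants so the reporting user can populate and clear it:
grant select, insert, delete on gtt1 to report_user;
```

Because each session sees only its own rows, the delete/insert/select pattern above is safe to run concurrently from multiple sessions.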
Simpler Example
Anthony Munoz, September 15, 2016 - 9:08 pm UTC
More related to my original issue. Since Oracle lacks SQL Server's SELECT TOP syntax, here's my actual, terribly simple query.
select /*+ GATHER_PLAN_STATISTICS */ * from
(select * from posting p
where nvl(p.amount, -999999) <> nvl(p.original_amount, -999999)
and p.amount <> 0
order by id desc)
where rownum <= 1
;
It runs instantaneously, of course, the FIRST time I run it (in Prod). I mean, how much effort does it take to return a single row based on a descending PK? But the SECOND time I run the same untouched query (I'm beginning to think Oracle is trying to discourage me from running the same query more than once) I'm not sure how long it takes. It's been running for > 6 minutes and I'm still watching as I type this.
Obviously I'm trying to use gather_plan_statistics. Is this 'cardinality feedback' something I should expect to encounter often? 8 minutes and counting...
September 15, 2016 - 10:14 pm UTC
Since you are using gather_plan_statistics, you can send us the output (in code tags) from:
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
for each execution.
Stats
Anthony Munoz, September 16, 2016 - 2:30 pm UTC
Yesterday I killed it after 18 minutes. Also worth mentioning: yesterday I could add pretty much anything to the where clause (e.g. 'and 1=1') and it would be nearly instantaneous again, but only once, of course, before the same problem returned.
Today the difference is much less dramatic: < 1 sec then 7 or 8 sec. First time it's using the PK index, second time looks like a table scan.
BEFORE (< 1 sec)

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  7bdyv5nkpnc40, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from (select * from
cycowner.b_posting p where nvl(p.amount, -999999) <>
nvl(p.original_amount, -999999) and p.amount <> 0 -- and
p.original_amount <> 0 -- and 1=1 order by id desc) where rownum <= 1

Plan hash value: 3888742477

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |      1 |00:00:00.14 |     211 |    29 |
|*  1 |  COUNT STOPKEY                |                |      1 |        |      1 |00:00:00.14 |     211 |    29 |
|   2 |   VIEW                        |                |      1 |      1 |      1 |00:00:00.14 |     211 |    29 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| B_POSTING      |      1 |    19M |      1 |00:00:00.14 |     211 |    29 |
|   4 |     INDEX FULL SCAN DESCENDING| B_POSTING_PKEY |      1 |      3 |   2103 |00:00:00.01 |      11 |     6 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(("P"."AMOUNT"<>0 AND "P"."AMOUNT"<>NVL("P"."ORIGINAL_AMOUNT",(-999999))))

25 rows selected

AFTER (7-8 sec)
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7bdyv5nkpnc40, child number 1
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from (select * from
cycowner.b_posting p where nvl(p.amount, -999999) <>
nvl(p.original_amount, -999999) and p.amount <> 0 -- and
p.original_amount <> 0 -- and 1=1 order by id desc) where rownum <= 1

Plan hash value: 904310323

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |      1 |        |      1 |00:00:09.30 |    140K |  140K |       |       |          |
|*  1 |  COUNT STOPKEY          |           |      1 |        |      1 |00:00:09.30 |    140K |  140K |       |       |          |
|   2 |   VIEW                  |           |      1 |      1 |      1 |00:00:09.30 |    140K |  140K |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY|           |      1 |      1 |      1 |00:00:09.30 |    140K |  140K |   12M | 1362K | 11M (0)  |
|*  4 |     TABLE ACCESS FULL   | B_POSTING |      1 |      1 |  1486K |00:00:06.00 |    140K |  140K |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   4 - filter(("P"."AMOUNT"<>0 AND "P"."AMOUNT"<>NVL("P"."ORIGINAL_AMOUNT",(-999999))))

Note
-----
   - cardinality feedback used for this statement

30 rows selected

Worth mentioning: my boss contacted our off-site DBA group and this was their reply:
I have reviewed the base table B_POSTING and it has 736M+ records. It seems to me that table B_POSTING and its associated indexes are not being analyzed by the auto-scheduled maintenance job, because the Oracle optimizer does not consider the stats on that table and its indexes stale. This is a very large table and data set. The table and associated indexes are highly fragmented, which is one of the reasons the optimizer is choosing a different plan. I feel the data which is very old can be archived to another table, and the table and associated indexes rebuilt. I do not feel that Oracle Partitioning is an option in this case, as the database is running Standard Edition. Partitioning the table could have helped here, but it requires Enterprise Edition.
You may also think about creating a stored outline, as described in https://oracle-base.com/articles/misc/outlines This will ensure that the optimal plan is used for SQL statements.

Granted, I'm not an admin and it's a relatively large, fragmented table, but I still don't understand how the optimizer could strike out so completely on the second run of each simple query yesterday, even with the worst available plan.
September 17, 2016 - 2:15 am UTC
Here's my hypothesis for you.
On the assumption that your 'amount' columns are not indexed, the query can be thought of as equivalent to the simple case of:
select *
from  (select *
       from   cycowner.b_posting p
       where  [some_condition]
       order  by id desc)
where rownum <= 1
So we've got two options here:
a) walk backwards through the index until we meet [some_condition]
b) full scan
Now there is nothing to assist the optimizer in knowing just *how far* we need to walk back through that index - it could be one row, it could be all of them. So the optimizer went with a default percentage guess, which came out to your 19M rows. When the query actually ran, the actuals were *way* different, so we marked the query as being worth another look (cardinality feedback) and we ended up with the full scan. Ultimately, when the query ages out of the library cache, we re-parse and the situation repeats itself.
Since in this case *we* know more than the optimizer, i.e., that the descending scan is likely to get a match *quickly*, this is a case where you could use an outline or SQL plan baseline to lock down the query so that you don't bounce back and forth.
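For illustration, one low-tech way to pin the descending index walk is an INDEX_DESC hint in the inner query block (the index name is taken from the plan output earlier in the thread); an outline or baseline is the more durable way to achieve the same effect. This is a sketch - verify the hinted plan with dbms_xplan before relying on it:

```sql
select *
from  (select /*+ INDEX_DESC(p b_posting_pkey) */ *
       from   cycowner.b_posting p
       where  nvl(p.amount, -999999) <> nvl(p.original_amount, -999999)
       and    p.amount <> 0
       order  by id desc)
where rownum <= 1;
```

The hint only requests the access path; if the optimizer cannot honor it (e.g. the index name is wrong), it silently falls back, which is why checking the actual plan afterwards matters.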
The comments about stats, and fragmentation and old data etc are not really pertinent or useful to this discussion.
Oops, it IS as bad as yesterday
Anthony Munoz, September 16, 2016 - 5:25 pm UTC
Doing too many things at once...
I just realized that it's not actually performing any better today. I forgot that in order to gather stats I had to switch back to a lower-life cycle environment where I had the necessary permissions; much less data but still a similar (albeit less extreme) issue.
In Prod it's still running in < 1 sec for the first run, who knows how long for subsequent runs... many minutes and counting again.
4 hours plus
Anthony Munoz, September 16, 2016 - 9:16 pm UTC
More exactly, I'm about to kill my select where rownum <= 1 query after 3 hours and 58 minutes.
This is an incredibly important table and we would have heard by now if applications or files were being affected by some sort of systemic problem. This is crazy.
But more than 4 hours!?!?!?
Anthony Munoz, September 19, 2016 - 2:53 pm UTC
Based on the latest followup, it sounds as though it's not terribly surprising that Oracle could take < 1 sec to run a simple-as-can-be query and then, based on 'cardinality feedback', decide on a plan that takes who-knows-how-much-longer-than 4 hours!?
And if I want to run this query more than once it's up to ME to tell the optimizer what to do!?!?
Really!?!?
Anything else I could look at since you're telling me the response from our off-site "DBA's" was more or less irrelevant?