Thanks for the question, Jose.

Asked: July 15, 2009 - 6:36 pm UTC

Last updated: August 25, 2009 - 9:11 am UTC

Version: 10.2.0.4

Viewed 1000+ times

You Asked

In an 9.2.0.8 EE db recently upgraded to 10.2.0.4 the developers are reporting a strange behavior : in complex queries (with EXISTS sub-queries , or views on views) the first execution is relatively slow (say, 15 minutes), but the sub-sequente executions are extremely fast (say, 3 minutes). The cache of data blocks is NOT the cause (the behaviour is the same if they do a ALTER SESSION FLUSH BUFFER CACHE and ALTER SESSION FLUSH SHARED POOL), only after a db restart the execution goes back to the 15 minutes, and the subsequent are fast, again.
And more, if we alter the SQL (example, put or remove some columns in the main query, the execution is yet fast, only if we alter substantially the sub-query the execution goes back to the 'slow' timeframe, the impression is that some kind of 'result cache' is being triggered, the results of the sub-query are being 'cached', materialized in some kind of temporary table , or something like that.
I will be trying to trace the SQLs to be more sure, but meanwhile my question is : this suposition of some kind of 'result cache' internal in 10.2.0.4 makes sense, or no, not at all ?

Regards,

Chiappa

and Tom said...

... The cache of data blocks is NOT the cause ...

sure it is, just it isn't in the database buffer cache, it is in the OS file system cache.

You have a secondary buffer cache - the OS file system cache. If you run tkprof, you'll see the same amount of physical IO's (since you say the data isn't in the cache), but the IO's will be much faster the second time. This is because we issue a physical IO request to the OS and the OS looks in it's cache and says "here you go" instead of going to disk.

see http://asktom.oracle.com/pls/ask/search?p_string=%22secondary+sga%22

there is no magic result cache here, just the OS doing what it does.

Rating

(7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Indeed ..

J. Laurindo Chiappa, July 16, 2009 - 3:42 pm UTC

Yes, it makes sense : really the datafiles are in an cooked FS (not raw nor ASM), surely the OS cache can be the responsible... I´m struggling for an 10046 trace to confirm this (it´s not my db, I´m just trying to help my customer´s DBAs), with it I would see the same I/Os taking less time in the second exec, thus proving the possibility, yes... Let´s see.
I was thinking in the code for 11g result cache set "creeping" (being introduced but nod documented) here in 10.2.04 patchset, but if ot´s not the case, only FS cache could explain it...

Regards,

Chiappa

July 16, 2009 - 6:01 pm UTC

... I´m struggling for an 10046
trace to confirm this ...

how can the database trace file prove something that is OS level?

Basically - run the query with trace for the first time, observe it is slow. run tkprof (and have wait events in there). You'll see the wait time for db file scattered read and/or db file sequential read.

Run the query again, observe it run fast, observer (via the tkprof of the second query) that it did the same amount of physical IO. Observe that the wait events for physical IO is significantly reduced.

We did the physical IO's both times, the second time it was faster. The only cause I know of that would do that would be.....

A secondary cache we (Oracle) are not part of - a cache on a SAN, a cache in the OS of the file system. We did the physical IO's, they were just apparently faster. We cannot tell the difference between a physical IO that really went all of the way to disk vs one that was satisfied out of the SAN cache vs one that was satisfied out of the OS cache.

Simple..

J. Laurindo Chiappa, July 17, 2009 - 8:03 am UTC

"how can the database trace file prove something that is OS level? "

easy, if in the second exec (flusing the cache before the 2nd exec to 'force' PIOs, altering the SQL to 'force' a parse, etc) the total I/Os are the same, BUT the time elapsed for some of them (it´s registered in 10046) is less, it proved that some kind of external cache is in action, that´s it.

July 24, 2009 - 8:24 am UTC

then you just answered your own question.

There is some 'outside force', but what pray tell was it?

Was it due to reduced contention?
Was it due to the file system cache?
Was it due to your logical volume manager?
Was it due to a ram cache in your SAN?
Was it due to the drive spinning up after it went to sleep?
Was it due to ???????????

That the IO's were faster is recorded and presented to you (that was in fact the original input to the question here, they NOTICED that) - what caused the IO's to be faster is something you have to figure out using your knowledge of your system - it could be MANY things that cause it.

Some thoughts more...

J. Laurindo Chiappa, August 02, 2009 - 6:15 pm UTC

Yeah Tom, the 'outside db' hiypothesis id becomming more and more plausible, and your are 100% right, one cannot know the real one looking inside the db (via traces), but as I said I can 'proof' that the cause is outside.
Before I comment the rest of your answer, let me ask : in http://jonathanlewis.wordpress.com/2006/11/06/filter-subqueries/ Jonathan Lewis speak about a case where fluctuation can occur due to Oracle caching average data (data coming from a subquery), this results being stored in a hash table. Can this be ocurring in my case ? It appears to be possible (I haven´t said it yet, but most of the related plans are hash-joins), but don´t know if the FLUSH BUFFER CACHE and the FLUSH SHARED POOL clean the recent created hash tables or no... Please chime in and let us to know...

Thinking about the other possibilities pointed by you :

>> Was it due to reduced contention?

Very unlikely, when I tested last time I check, the number of sessions and the work being done in the server (CPU/RAM/DISK usage) was the same during the 'fast' 2nd run and the 'slow' first run

>> Was it due to the file system cache?

Probably, but as I said if we shutdown the db the execution becomes 'slow' again : with FS caching this would occurs or not ?

>> Was it due to your logical volume manager?

Don´t know what volume manager is being used, but according to the sysadmins no kind of caching by the manager itselfs exists, something to be proved yet

>> Was it due to a ram cache in your SAN?

Possible, the SAN have a large cache-dedicated RAM, but again, the db shutdown really would 'clear' these caches ? What is your experience regarding it ?

Chiappa

August 04, 2009 - 1:27 pm UTC

... this results being stored in a hash table.
Can this be ocurring in my case ? ...

you would see a material, measurable drop in logical IO - but you say "the IO's in the tkprof are the same" - and the hash table he speaks of is used WITHIN A SINGLE QUERY EXECUTION - not across queries (so query 1 will not save and share its results with query 2)

This is almost certainly "the file system is your secondary SGA, you might need a larger SGA instead of relying on the file system buffering to make physical IO a little faster than it would normally be"

... Possible, the SAN have a large cache-dedicated RAM, but again, the db shutdown
really would 'clear' these caches ? ...

absolutely NOT - a OS reboot would probably not clear them either - they are outside of the machine.

Other possibilities are dying, FS cache must be the answer...

J. Laurindo Chiappa, August 04, 2009 - 2:07 pm UTC

>.. this results being stored in a hash table.
>Can this be ocurring in my case ? ...

>>you would see a material, measurable drop in logical IO - but you say "the IO's in the tkprof are the same" - and the hash table he
>> speaks of is used WITHIN A SINGLE QUERY EXECUTION - not across queries (so query 1 will not save and share its results with query 2)

This point is crucial to reject JL article in my case : as I said even if , besides the caches clering, I alter a little the SQL (adding columns, removing columns, something like that, getting a new SQL) the second execution is faster, so being true that query 1 not save and share the hash table with query 2 , the possibility cited by Jonathan is dead, totally not applicable.

>> ... Possible, the SAN have a large cache-dedicated RAM, but again, the db shutdown
>> really would 'clear' these caches ? ...

> absolutely NOT - a OS reboot would probably not

ok : as I said, after a db shutdown the query is "slow" again in the first execution, so the SAN-cache being not-affected by db restarts (and even OS restarts) it can be discarded.

Yeah, filesystem cache is becomming the most probable cause, yes.... Will try augment SGA and/or, as a test, try to use in the test environment raw disks, if possible.

Again, thank you for all your answers and comments.

Regards,

Chiappa

August 04, 2009 - 3:25 pm UTC

... as I said, after a db shutdown the query is "slow" again in the first
execution, so the SAN-cache being not-affected by db restarts (and even OS
restarts) it can be discarded....

not necessarily - just means that whatever happened during the database shutdown/restart did sufficient work somewhere to flush some cache.

Sometimes you can mount file systems with 'directio', that'll disable the caching - ask your (knowledgeable) system administrator about it.

Makes me think again

Kevin Meade, August 22, 2009 - 11:46 pm UTC

This is an interesting post. I have long been interested in the performance characteristics of queries that are both PHYSICAL I/O heavy vs. being PARSE heavy vs. being PLAN GENERATION heavy. Please let me confirm some points:

1) flushing the shard pool will cause statements to be hard parsed again
2) flushing the buffer cache would require oracle to request data from the O/S again

What interested me when I first read the question and description was the phrase "VIEWS ON VIEWS" which was clearly a red-herring, because I have a simliar problem which I thought was this one but clearly not. If you think I am not diverging too much from the original question, let me ask you about extending the question to cover PLAN GENERATION HEAVY queries.

I have had a problem with complex views and typically views on views. These views often have many many joins (20,30,40 tables being joined or existentially queried). A query on these views will take several minutes to execute the first time out and then centiseconds or even milliseconds on each subsequent execution. The number of rows and amount of data returned seems not to matter. I have concluded (maybe incorrectly) that my issue is extreme plan generation costs.

My queries on these view typically take several minutes the first time they are presented to the database and then only milliseconds after that. If I flush the shared pool, but not the buffer cache, and then execute the query again, it is back to the multiple minutes followed by instant response each time after.

Assuming the issue is PLAN GENERATION HEAVY queries, what steps can I take to reduce these costs in an environment where complex layered views with many joins and existential subqueries is the norm.

In fact, I can give you an example using Oracle products. Consider Oracle Workspace Manager. OWM has this same issue. Take a typical schema of 15 or so tables and enable vesioning on every table WITHOUT OVERWRITE and WITH VALIDTIME SUPPORT. As you know, Oracle will create a view layer with instead of triggers on top of our tables in order to implement the workspace and versioning logic. A query against any single exposed view is fast enough, but when you join several of them together to answer even a modestly interesting question OWM stalls for each new query the first time it is asked.

I was looking at this approach as one possible solution for a BI-TEMPORAL versioning implementation and OWM always gave the right answer (which I expected and which impressed a lot of people who did not believe BI-TEMPORALISM could actually be done), but the first time any query that did more than two joins was asked, OWM took minutes to give the answer even if the answer was zero rows, and then every time after that took only .01 seconds or less. I attributed this to plan generation time. The query plans were some 300, 400, and 500 lines long. There was even a set of queries where each query took 35 minutes to answer the first time it was asked, after which they returned zero rows. Subsequently these queries each took .01 seconds every time after that to give the zero rows answer.

I too have complex layered views with queries that have plans of several hundred lines long. These exhibit the same behavior as OWM described above.

So assuming my issue is PLAN GENERATION costs, what can I do in this situation. I have to think there is some kind of database parameter setup I am missing, some database configuration I may need to change, or some feature I can use to reduce or remove plan generation cost from the mix. Alas I may be out of my league on this one and I am so far ahead of the curve at my workplace that there is no one here to turn to for help (or I am just so stupid that they all know it and want no part of it).

Can you comment. Thanks, Kevin.

August 25, 2009 - 9:11 am UTC

1) flushing the shard pool will cause statements to be hard parsed again
2) flushing the buffer cache would require oracle to request data from the O/S again

1) correct
2) correct - and that request to the O/S might be satisfied from the O/S file system cache, a SAN cache, or by actually doing IO to disk.

.... The number of rows and amount of data returned seems not to matter. I have concluded (maybe incorrectly) that my issue is extreme plan generation costs.
....

that would be easy to measure.

a) enable trace (that'll force a hard parse the first time by the way...). Run query so that it is "slow"
b) run it again so it is fast
c) exit sqlplus
d) tkprof with aggregate=no and see where time was spent in each case.

... If I flush the shared pool, but not the buffer cache, and then execute the query again, it is back to the multiple minutes followed by instant response each time after. ....

that would tend to lead us to "optimization is taking a really long time"

... So assuming my issue is PLAN GENERATION costs, what can I do in this situation. I have to think there is some kind of database parameter setup I am missing, some database configuration I may need to change, or some feature I can use to reduce or remove plan generation cost from the mix. Alas I may be out of my league on this one and I am so far ahead of the curve at my workplace that there is no one here to turn to for help (or I am just so stupid that they all know it and want no part of it). ....

reduce your hard parse calls, use bind variables, move everything possible into plsql so as to reduce parsing period.

avoid the views of views of views - avoid views that join 30 tables together (when you only need 5 of them). Think of a view much like a stored procedure or an API - it should do something very specific, not generic.

Thanks Tom, I always feel better when you speak

Kevin Meade, September 04, 2009 - 8:25 pm UTC

Thanks very much Tom. I did not think of trace and tkprof. I'll investigate that right away for proof of my problem. As always your responses teach me something.

If I manage to get somewhere I will respond with my results so others may benefit.

Kevin

Confirmation of disk cache impact in some cases

Cris Mooney, July 25, 2012 - 2:15 pm UTC

While this is an old thread, Kevin did not confirm his results and Tom's great advice did prove valuable to me (a forced "weak" DBA) today using Oracle Database 11g Release 11.2.0.1.0 - 64bit. This info should be useful for most any version of any DB, warranting late web page update since it comes up high in Google.

My query ran a few minutes in the AM for customers, and yet less than a second the rest of the day. Optimizing this dynamically generated query would be challenging, and we would rather leave Oracle to it's own devices in this case rather than have to deal with it each upgrade. Moreover, even to optimize I needed to "recreate", which proved illusive.

My question was: why the unpredictable variation? My suspicion was that nightly tasks flushed some cached data, but not knowing what, and how, I could not recreate it. And, frankly, I was surprised by the EXTREME variation. Spend speculative time on query optimization, and it might be wasted.

Using Tom's insight on a simpler test system with local disks, I finally found that flushing both the the OS disk cache AND the Oracle BUFFER_CACHE allowed me to reproduce. This confirmed my expectations: nightly backup and cleanup tasks that happen during mostly idle system times must cause these caches to clear. Most importantly: reproducible, we can now look into work-arounds.

Specifically, running both "sync; echo 3 > /proc/sys/vm/drop_caches" (clear linux disk cache), and then as Oracle system user "alter system flush BUFFER_CACHE" (clear a relevant Oracle cache), gave a reproducible 3 min query time. Clearing just one of these caches did not have significant impact (less than a second), though clearing the Oracle cache had more immediate impact as one would predict.

Point: caching can dramatically impact performance not indicated in anyway by an explain plan (nor effecting it), and this performance can be illusive to diagnose with so many cache points. While advanced folks might delve into complexity of tkprof and logs, simpler cache flush tools (if commands are known) may be sufficient for others who have less time and ability. My solution is now to either improve the query with bad performance reproducible and/or force morning operations that preload the cache(s) at suspect times. At least I know the problem, and have a reproducible test case.

Relevant commands:

Linux (RedHat/CentOS) disk read cache flush:

root$ sync; echo 3 > /proc/sys/vm/drop_caches

Oracle cache flushing, logged in as "system":

sql> ALTER system FLUSH BUFFER_CACHE;
sql> ALTER system FLUSH SHARED_POOL;
sql> ALTER system FLUSH GLOBAL CONTEXT;

sql> select * from dba_tablespaces;
sql> ALTER TABLESPACE <TABLE SPACE NAME> OFFLINE;
sql> ALTER TABLESPACE <TABLE SPACE NAME> ONLINE;

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Indeed ..

Simple..

Some thoughts more...

Other possibilities are dying, FS cache must be the answer...

Makes me think again

Thanks Tom, I always feel better when you speak

Confirmation of disk cache impact in some cases

More to Explore

PL/SQL demos

PL/SQL docs

Result cache for subqueries in 10gr2 ?

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Indeed ..

Simple..

Some thoughts more...

Other possibilities are dying, FS cache must be the answer...

Makes me think again

Thanks Tom, I always feel better when you speak

Confirmation of disk cache impact in some cases

More to Explore

PL/SQL demos

PL/SQL docs