Question and Answer

Tom Kyte

Thanks for the question, Venugopal.

Asked: November 26, 2002 - 7:51 pm UTC

Last updated: January 22, 2013 - 1:39 am UTC

Version: 8.1.7

Viewed 100K+ times!

You Asked

Tom,
I have heard DBAs say all the time to keep logical reads low, but when asked why, they can't come up with an answer. When a block is requested by a query, Oracle looks for it in the buffer cache; if it is found, that is a logical read, and if not, the result is a physical read (disk I/O). I understand that physical I/O is the most expensive component and that finding the block in the buffer cache avoids most of that work.

Am I absolutely wrong? If not, why is logical I/O harmful?

Can you please throw light on this.

regards

and Tom said...

go to

http://www.hotsos.com/

and download the paper "Are you still using cache hit ratios?"

and keep an open mind.

the less work we do -- well -- the less work we do... The fewer LIO's you do, the less WORK you do.

You have good DBA's -- keep them.


When tuning a query -- I ignore PIO's. I concentrate 100% on reducing LIO's.


Comments

Logical Reads vs Physical Reads

Sitarama Peruvel, November 26, 2002 - 11:19 pm UTC

Tom, I am not able to navigate to the website mentioned. Can you please confirm?

Tom Kyte
November 27, 2002 - 7:08 am UTC

click on it again -- it is a "real" URL, it is a real site -- hotsos.com. The link works for me.

reduce lio - i'm missing something here ?

Ram, November 26, 2002 - 11:41 pm UTC

please correct me if my understanding is wrong.
LIO - from/to the buffer cache ( not disk ).
PIO - to/from the disk.
correct ?

in many of the threads on tuning I notice you mention that the fewer PIOs as well as LIOs, the better. I can understand the 'physical' part of it but not the logical.
the query has to get the results, and it has to do as much work as is necessary to get them, doesn't it? so at least that much LIO is a must. how can we reduce that?

Tom Kyte
November 27, 2002 - 7:47 am UTC

LIO = logical = buffer cache. LIO *may* have incurred a PIO in order to get into the cache in the first place.

PIO = physical = disk read

Consider this example. Pretend I am one of those people (of which there are many) that believe "if my query ain't using an index, it is broken". So, I have a table and query:

big_table@ORA817DEV.US.ORACLE.COM> select object_id, object_name from big_table order by object_id
2 /

1576192 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=9086 Card=1576192 Bytes=47285760)
1 0 SORT (ORDER BY) (Cost=9086 Card=1576192 Bytes=47285760)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=658 Card=1576192 Bytes=47285760)




Statistics
----------------------------------------------------------
0 recursive calls
117 db block gets
22861 consistent gets
physical reads
0 redo size
63003739 bytes sent via SQL*Net to client
11664194 bytes received via SQL*Net from client
105081 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1576192 rows processed

Now, being that person who fervently believes "full scan = evil, index = goodness", I "fix" the optimizer via:

big_table@ORA817DEV.US.ORACLE.COM> select /*+ index( big_table big_table_idx2 ) */ object_id, object_name
2 from big_table order by object_id
3 /

1576192 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1579626 Card=1576192 Bytes=47285760)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=1579626 Card=1576192 Bytes=47285760)
2 1 INDEX (FULL SCAN) OF 'BIG_TABLE_IDX2' (NON-UNIQUE) (Cost=3434 Card=1576192)




Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
1684470 consistent gets
physical reads
0 redo size
63003739 bytes sent via SQL*Net to client
11664194 bytes received via SQL*Net from client
105081 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1576192 rows processed

Now, same answer -- same number of rows -- which is better? (yes, I purposely removed PIO's -- assume they are the same).

Each consistent get is a latch, and latches = contention. Over-abused indexes can be a killer for scaling. A runstats of the above showed me:

                                FULL SCAN      INDEX       DIFF
LATCH.cache buffers chains         67,428  6,547,789  6,480,361

6.4 million more cbc latches - ugg.

pio many times == lio

ls cheng, November 27, 2002 - 6:02 am UTC

hi Ram from India

Many times, although Oracle reports a read as physical, it is actually a logical I/O served from the filesystem cache, so you can't really say that all the physical I/O seen in v$sysstat is true PIO.

Many SQL statements read more blocks than necessary. When we say we want to reduce I/O, we are not saying we want to reduce the result set, but rather the extra reads -- for example, reading index blocks when it is not necessary. LIO uses CPU; if you are on unix/linux, check the processes using 100% CPU and you will see they are the ones running massive SQL statements that read tons of blocks (very high buffer gets, which are LIOs).


regards

A reader, November 27, 2002 - 8:13 am UTC

Hi Tom

"When tuning a query -- I ignore PIO's. I concentrate 100% on reducing LIO's. "

Very strange........ (for 90% of DBAs). Throw some light on it.

I guess you are doing this for more scalability. Am i right?

Tom Kyte
November 27, 2002 - 9:24 am UTC

Scalability -- and because reducing PIO's will come *naturally* from reducing LIO's.

Sure, I can reduce PIO's by increasing the buffer cache. But have I fixed the problem? No, maybe I just made it worse (read that paper I refer to!!!)

Thanks

Venugopal, November 27, 2002 - 9:48 am UTC

Tom,
Thanks for the response.

I need more clarity on your example showing table access by index scan vs. FTS. Wasn't that a bad example for index hinting, since it was a query without any WHERE clause?

My question is -
The consistent gets of the second query (using the index) are alarmingly high. Is that because it tries to find the index blocks in the buffer cache, and would setting OPTIMIZER_INDEX_CACHING to a low value (say, the default of 0) bring down the consistent gets in this case?

We shouldn't be using indexes at all in this case...but just curious.

regards



Tom Kyte
November 27, 2002 - 10:15 am UTC

where clause or no -- doesn't matter. Add where object_name = 'FRANK' to the query if you like... better yet, add "where object_id > 0". Here, I'll make it classic.

Fake question: "whats wrong with the cbo"
tom -- i analyzed my table and believe it or not, the cbo won't use the index. when I hint to use rule it does... consider:

big_table@ORA817DEV.US.ORACLE.COM> select object_id, object_name from big_table
2 where object_id > 0 order by object_id;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=15189 Card=1576192 Bytes=33100032)
1 0 SORT (ORDER BY) (Cost=15189 Card=1576192 Bytes=33100032)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=658 Card=1576192 Bytes=33100032)



big_table@ORA817DEV.US.ORACLE.COM> select /*+ RULE */ object_id, object_name from big_table
2 where object_id > 0 order by object_id;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE'
2 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX2' (NON-UNIQUE)


end of fake question....

some people believe "i created this index, i've already sorted that data, why do I want to sort it again, darn database won't use the index, aren't those indexes for sorting after all"


The consistent gets cannot and will not be affected by any of the optimizer_* settings. They affect how the optimizer costs things; they have no effect on how things are actually processed.


My point here is people go nested loops happy on really big things cause "full table scans are bad" (NOT). They end up hinting to use indexes or using the RBO which is "index happy" (index crazy sometimes is more like it) and two things happen:

o they do tons more LIO's than they should (but they feel DARN GOOD about it because their indexes are being used)

o they inflate their silly cache hit ratio (hey, there is a new idea -- a cache hit ratio ABOVE 95% is a bad thing -- it means you are probably doing too much LIO, and you need to work to reduce it). This inflation of the cache hit ratio (as they read and re-read and re-read the same blocks over and over and over and over and over) makes them feel *really* good, like they've done their tuning job (because we all know that a LIO is 10,000 times faster than a PIO, right -- not, and not that it EVEN MATTERS)



A reader, November 27, 2002 - 9:55 am UTC

Each consistent get is a latch, latches = contention

---Can you elaborate that a little? Thank you.

Tom Kyte
November 27, 2002 - 10:49 am UTC

to get a block from the cache, we have to latch (to prevent someone from modifying the data structures we are currently reading).

A latch is a lock.

Locks are serialization devices

Serialization devices inhibit scalability, the more you use them, the less concurrency you get.
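How much a serialization device hurts can be sketched with a little Amdahl's-law arithmetic. This is an illustrative model, not Oracle internals -- the 5% "latched fraction" is an assumed number, chosen only to show the shape of the curve:

```python
def speedup(n_cpus, serial_fraction):
    """Amdahl's law: best-case speedup on n_cpus when serial_fraction
    of the work must run under a serialization device (latch/lock)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

# Assume 5% of a query's time is spent holding cache buffers chains
# latches: 32 CPUs then deliver nowhere near a 32x speedup.
print(round(speedup(32, 0.05), 1))    # 12.5

# Halve the LIOs (and so the serialized fraction) and the same
# hardware suddenly scales much further.
print(round(speedup(32, 0.025), 1))   # 18.0
```

Which is the point of attacking LIOs rather than buying hardware: shrinking the latched fraction helps every CPU you already have.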



Got it

Ram, November 28, 2002 - 3:40 am UTC

great explanation !

so the bottom line seems to be - " tune so that you get your results doing ONLY as much work as is required using MINIMUM possible resources "
and after that... stop. Don't overtune it :)



Tom Kyte
November 28, 2002 - 8:04 am UTC

I would modify that slightly:

DEVELOP so that you get your results......


but yes, well said.

Why is this so?

A reader, November 28, 2002 - 8:31 pm UTC

Tom,

I am just pasting your sentence here.

"This inflation of the cache hit ratio (as
they read and re-read and re-read the same blocks over and over and over and over and over)"

Can you please explain why a block is re-read again? Re-read from where, buffer or disk? If it is the buffer, then isn't that the way it is supposed to be? I mean, if the block is there in memory then it should be read from there, right?
Thousand apologies if I am getting it wrong.

regards

Tom Kyte
November 29, 2002 - 9:34 am UTC

Look at my "big_table" example above.

I let the optimizer do what it wants. It full scans. It did:

Statistics
----------------------------------------------------------
0 recursive calls
117 db block gets
22861 consistent gets
physical reads
0 redo size
....


22,861 consistent gets (LOGICAL IOS). It read the table from head to tail and got me the answer.

Now, I forced it to use an index cause *we all know indexes are good, full table scans are evil* (sarcasm dripping as I type that) and we get:

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
1684470 consistent gets
physical reads
0 redo size

1,684,470 logical ios!!

why so many more? Well, it read the first index leaf block. The first row on that leaf block said "read block 55 of the table", the second row says "read block 100", 3rd = read block 1001, 4th = read block 2134 and so on. Say there were 500 index entries on that first leaf block. We might have had to read 500 different database blocks for this first leaf block alone.

Now, we move onto the second leaf block. It says "read block 2134" -- well, we now need to retrieve block 2134 from the buffer cache again. We already read it once, for the first leaf block, but here we are back at it again.

We'll read and re-read and re-read the table blocks over and over again because the index tells us the order of the rows to process -- we do not do the simple, fast, full scan.


If I do 1.6 million LIO's -- that'll make my cache hit ratio go up more than if I do 22k LIO's. The end result for this deluded "tuner" who got the query to use the index:

a) It uses an index, we all "know that is good", evil full table scan is gone
b) and look at that! my cache hit ratio is 99.999% now!!! After running this query just once! isn't that fantastic (no, no it is not -- in fact the opposite is true)


So, maybe we need to reverse the advice on the cache hit ratio. The higher it is, the worse your system is performing!!!
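The re-read effect above is easy to reproduce with a toy model. This is a deliberate simplification (random row placement, one consistent get per block hop, index blocks ignored), not Oracle's actual algorithm, but it shows why driving a big table in index-key order costs many times the gets of a straight full scan:

```python
import random

random.seed(1)
ROWS, BLOCKS = 100_000, 2_000   # ~50 rows per table block

# Full scan: read each table block once, head to tail.
full_scan_gets = BLOCKS

# Index-ordered access: the key order is uncorrelated with where rows
# physically live, so each successive row sits on an effectively
# random block -- almost every row costs its own consistent get.
visit_order = [random.randrange(BLOCKS) for _ in range(ROWS)]
index_gets = 1 + sum(1 for a, b in zip(visit_order, visit_order[1:])
                     if a != b)

print(full_scan_gets, index_gets)
```

With these assumed numbers the index-driven order does on the rough order of one get per row, versus one get per block for the full scan -- the same shape as the 1,684,470 vs 22,861 figures in the autotrace output.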





Could you pl explain how logical reads work on index scans?

Ramakrishna, November 29, 2002 - 12:08 am UTC

Dear Tom,
Since this posting is about logical reads, I wanted to clarify one point about them:
In your book, in the chapter on indexes, you have an example of a Colocated and a Disorganized table, both of them with the same data and the same index. A query that does a range scan on the two shows vastly different results based on the clustering of the index due to the huge difference in logical reads. My question is as follows:
When Oracle does an index scan, does it take one index entry, go to the block for the rowid, come back, take the next entry, go to the block for the rowid and so on or does it "batch" a number of consecutive index entries, figure out how many distinct blocks to visit and get them all at one time? If Oracle steps through the index row by row, then should we not get the exact same number of logical reads regardless of clustering?

thanks
ramakrishna

Tom Kyte
November 29, 2002 - 9:54 am UTC

it is a function of your arraysize and the cluster factor.  Here is an example using the same concept -- but different array sizes to show the difference.  I stuck with the RBO since that thing is so index happy that it'll always use the index in my example below:


ops$tkyte@ORA920.LOCALHOST> create table colocated ( x int primary key, data char(255) );

Table created.

ops$tkyte@ORA920.LOCALHOST>
ops$tkyte@ORA920.LOCALHOST> insert into colocated select rownum, dbms_random.random from all_objects;

29319 rows created.

ops$tkyte@ORA920.LOCALHOST>
ops$tkyte@ORA920.LOCALHOST> create table disorganized ( x int primary key, data char(255) );

Table created.

ops$tkyte@ORA920.LOCALHOST> insert into disorganized select * from colocated ORDER BY DATA;

29319 rows created.

ops$tkyte@ORA920.LOCALHOST>
ops$tkyte@ORA920.LOCALHOST> commit;
Commit complete.


ops$tkyte@ORA920.LOCALHOST> set autotrace traceonly
<b>
ops$tkyte@ORA920.LOCALHOST> set arraysize 2</b>

ops$tkyte@ORA920.LOCALHOST> select * from colocated where x > 0;

29319 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'COLOCATED'
   2    1     INDEX (RANGE SCAN) OF 'SYS_C003530' (UNIQUE)




Statistics
----------------------------------------------------------
...
      29890  consistent gets
...
      29319  rows processed

ops$tkyte@ORA920.LOCALHOST> select * from disorganized where x > 0;
29319 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'DISORGANIZED'
   2    1     INDEX (RANGE SCAN) OF 'SYS_C003531' (UNIQUE)




Statistics
----------------------------------------------------------
...
      45116  consistent gets
...
      29319  rows processed


<b>See, here since the array size was 2 -- our consistent gets were pretty close to the number of rows for colocated.  Why?  Each array fetch did this:

- get index block (1 LIO)
- get row 1 from the table (1 LIO for block X)
- get row 2 from the table (0 LIO if the second row was on block X, which it will be since we loaded the data in that order)

So, every 2 rows = 2 LIO's.

Now, disorganized -- we have about 1.5 times the LIO's as we do rows.  Why?  Each array fetch did this:

- get index block (1 LIO)
- get row 1 from table (1 LIO for block X)
- get row 2 from table (1 LIO for block Y, the odds that Y=X are very very small so we'll do another LIO)

So, we do 3 LIOS on average to get 2 rows...


Now, going higher on the array size (this is the sqlplus default size):</b>


ops$tkyte@ORA920.LOCALHOST> set arraysize 15
ops$tkyte@ORA920.LOCALHOST> select * from colocated where x > 0;

29319 rows selected.


Statistics
----------------------------------------------------------
...
       5047  consistent gets
...
      29319  rows processed

ops$tkyte@ORA920.LOCALHOST> select * from disorganized where x > 0;

29319 rows selected.


Statistics
----------------------------------------------------------
...
      31325  consistent gets
...
      29319  rows processed

<b>the colocated table now does a fraction of the LIO's.  We did 15-row array fetches, meaning we made about 2,000 "fetch" calls.  Apparently we did about 2.5 LIO's per fetch.  How so?  Well, we did this:

- get index block (1 LIO)
- get row 1 from table for this index block (1 LIO for block X)
- get rows 2..15 from table for this index block (maybe 0 LIO's - all on block X, maybe 1 additional LIO cause we just read all of the rows on block X)

So, we do 2-3 LIO's for every 15 rows -- 5k LIOS.

Disorganized -- for the same reason as above -- does LIO's like crazy cause we are constantly hopping from block to block in order to get this data.  The data is all over the place.

Stepping arraysize up again:</b>

ops$tkyte@ORA920.LOCALHOST> set arraysize 100
ops$tkyte@ORA920.LOCALHOST> select * from colocated where x > 0;

29319 rows selected.


Statistics
----------------------------------------------------------
...
       2597  consistent gets
...
      29319  rows processed

ops$tkyte@ORA920.LOCALHOST> select * from disorganized where x > 0;

29319 rows selected.


Statistics
----------------------------------------------------------
...
      30541  consistent gets
...
      29319  rows processed

ops$tkyte@ORA920.LOCALHOST> set autotrace off
ops$tkyte@ORA920.LOCALHOST>

<b>colocated gets better, disorganized not really.  no matter what you do, disorganized is disorganized -- we give up the block during the scan and end up reading and re-reading the same blocks over and over and over and over again...

</b>

The CBO is "smart" about this as well.  There is a statistic -- the clustering factor -- that it looks at to decide whether it wants to use an index or not.  It would be more likely to use colocated to retrieve a larger "range" of rows than it would with disorganized.  If you have my book "Expert one on one Oracle" -- that is the example that is in there.
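The arraysize/clustering interplay can be mimicked with a small model. Assumptions made up for illustration: ~25 rows per block, one table block pinned per fetch call, and index-block gets ignored -- so the absolute numbers won't match the autotrace output above, but the shape does: colocated gets fall as arraysize grows, while disorganized gets stay near one per row no matter what.

```python
import random

def table_gets(n_rows, rows_per_block, arraysize, colocated, seed=7):
    """Count table-block consistent gets for an index range scan,
    releasing the current block at the start of each fetch call."""
    rng = random.Random(seed)
    n_blocks = n_rows // rows_per_block
    if colocated:   # table stored in index key order
        block_of = [r // rows_per_block for r in range(n_rows)]
    else:           # row location uncorrelated with key order
        block_of = [rng.randrange(n_blocks) for _ in range(n_rows)]
    gets, current = 0, None
    for r in range(n_rows):
        if r % arraysize == 0:      # new fetch call: block pin released
            current = None
        if block_of[r] != current:  # block hop = one consistent get
            gets += 1
            current = block_of[r]
    return gets

for arr in (2, 15, 100):
    print(arr,
          table_gets(29319, 25, arr, colocated=True),
          table_gets(29319, 25, arr, colocated=False))
```

Same 29,319 rows in both layouts -- only the physical order changes, and with it the consistent gets.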

Thanks, it is clear now

A reader, December 02, 2002 - 3:40 am UTC

Dear Tom,
Thanks for your detailed explanation. I had not considered arraysize at all. From your example, it appears that the block on which the LIO was done is held on to till all the rows specified by the arraysize are fetched out and is then released because it is now time to send data back to the client.

regards
ramakrishna

How to Colocated

H, December 05, 2002 - 1:50 pm UTC

Tom,
How do I make sure that the indexes and tables are colocated?

Tom Kyte
December 05, 2002 - 3:25 pm UTC

(it is not that the table and indexes are colocated -- but rather that the table is organized -- typically sorted by -- the index key values. If you have my book "Expert One on One Oracle" -- i go over this in more detail)

Well, first you have to drop the PLURAL. It is doubtful that the clustering factor of two different indexes on the same table will both be "very good". it can happen but it is doubtful -- why? because the clustering factor relies on physical location of data and data can only be sorted one way.

If table COLOCATED is sorted by ID, an index on ID will have a good clustering factor -- BUT an index on DATA on this table would have a terrible one!

If table DISORGANIZED, containing the same data, is sorted by DATA -- an index on ID would have a bad clustering factor -- but an index on DATA would have an EXCELLENT one.


So, once you accept that pretty much only one index is going to have a good clustering factor -- what can you do to make sure it is the one you want?

If you are a transactional system, you can use b*tree clusters to force the physical location of data. A cluster has a really good clustering factor by definition -- all of the data for a given key is stored (optimally) on the same block (again, if you have the book -- lengthy detail on this in the chapter on tables)

If you are a data warehouse -- doing mass loads -- you would physically sort the data. Eg, in 9i -- i would do a direct path load from a flat file like this:

insert /*+ append */ into my_table
select * from flat_file order by <thing I wanted clustered>;

In 8i -- I might use an OS sort utility, or load the data into a scratch table and then do the insert /*+ append */ with the order by.



How to reduce LIO's

reader, December 19, 2002 - 10:38 am UTC

What should be done to reduce LIO's?

Tom Kyte
December 19, 2002 - 11:28 am UTC

tuning queries
using physical structures that facilitate the queries you ask most often
using database techniques like materialized views in a data warehouse, bitmaps, etc

among dozens of others -- using your knowledge of what Oracle provides and applying it to your problem at hand.

Re: Ramakrishna's question

Connor McDonald, December 19, 2002 - 12:16 pm UTC

On a somewhat related note, there is an Oracle 9 enhancement when probing via an index in a nested loop. The index probing is still done as described in this question, but when it comes to looking up the table, Oracle can choose to defer that read until it has retrieved all of the rowids from the index.

Then it sorts the rowids so that it only has to visit each table block once. Very nifty...
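A sketch of the batching idea Connor describes (illustrative only -- the rowids and block names below are made up, and this is not Oracle's actual code path): sort the rowids from the index probe by block address, and the duplicate block visits collapse.

```python
# Hypothetical rowids as (block, slot) pairs returned by an index probe.
rowids = [("blk7", 3), ("blk2", 1), ("blk7", 9), ("blk2", 4), ("blk5", 2)]

# Row-at-a-time order: one block get per rowid, re-reading blk7 and blk2.
naive_gets = len(rowids)

# Sorted by block address: equal blocks become adjacent, one get each.
batched_gets, current = 0, None
for blk, _ in sorted(rowids):
    if blk != current:
        batched_gets += 1
        current = blk

print(naive_gets, batched_gets)   # 5 3
```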

Latching and Consistent Gets

Ramakrishna, December 30, 2002 - 12:07 am UTC

Dear Tom,
I have a small doubt with the way latching, consistent gets and arraysize work together. Could you kindly clarify?
If I am doing a full table scan, Oracle will do the following:
1. Compute the hash address of the table block that is to be read.
2. Take a latch on the chain in the cache buffers hash table where this block resides
3. Read up to arraysize rows from the table block (this counts as one logical I/O)
4. Release the latch
Is this correct?

With an index scan, I am confused about how this happens in 8i (Connor above has mentioned how this happens in 9i). Suppose I have an index block that is well clustered and all the rowids in that block point to the same data block. In this case, does Oracle:
1. Compute the hash address of the table block corresponding to the first row in the index block.
2. Take a latch on the chain in the cache buffers hash table where this block resides
3. Read that first row value (one logical I/O)
4. Does Oracle release the latch on the table block here? Or does it hold on to it till it has fetched up to arraysize entries from the index block? If the latter, then what happens if I had a query that did a
SELECT COUNT(*) FROM (SELECT something via index range scan from table)?
In this case, since arraysize doesn't matter, are the latches held till the entire range scan is complete? But, if that is the case, what if there is another query that is reading the same blocks 'backwards'? How does it all work without deadlocking?

It would be great if you could explain how the latching and logical IO is done for an index range scan.

thanks in advance,
Ramakrishna





Tom Kyte
December 30, 2002 - 8:18 am UTC

as for the full scan first example -- conceptually correct, yes.

Now, Connor was talking about NL joins, which is different from just an index range scan.  NL joins are bunches of separate index range scans.


The index range scan works like this:

o if the index is very well clustered (organized) the same as the table -- then array fetching will do the consistent gets much like a full table scan would.  That is, if the TABLE is ordered the same as the INDEX is ordered -- we'll get an index block that has our first value -- latch the table block that has that and keep that table block until either we hit arraysize rows or the index entry tells us to move onto a different database block.

o if the index is sorted very differently from the table data -- you'll see higher consistent gets since we jump from table block to table block as we go from row to row.

Here is a quick and dirty example:


ops$tkyte@ORA817DEV> create table t as select * from all_objects <b>order by object_id;</b>

Table created.

ops$tkyte@ORA817DEV> alter table t add constraint t_pk primary key(object_id);

Table altered.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> set autotrace traceonly explain

ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T'
   2    1     INDEX (RANGE SCAN) OF 'T_PK' (UNIQUE)



ops$tkyte@ORA817DEV> set autotrace traceonly statistics

ops$tkyte@ORA817DEV> set arraysize 10
ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;
23712 rows selected.


Statistics
----------------------------------------------------------
       5083  consistent gets

ops$tkyte@ORA817DEV> set arraysize 100
ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;
23712 rows selected.


Statistics
----------------------------------------------------------
        846  consistent gets

ops$tkyte@ORA817DEV> set arraysize 500
ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;
23712 rows selected.


Statistics
----------------------------------------------------------
        472  consistent gets

ops$tkyte@ORA817DEV> set autotrace off



ops$tkyte@ORA817DEV> drop table t;
Table dropped.

ops$tkyte@ORA817DEV> create table t as select * from all_objects <b>order by dbms_random.random;</b>
Table created.

ops$tkyte@ORA817DEV> alter table t add constraint t_pk primary key(object_id);
Table altered.

ops$tkyte@ORA817DEV> set autotrace traceonly explain

ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T'
   2    1     INDEX (RANGE SCAN) OF 'T_PK' (UNIQUE)



ops$tkyte@ORA817DEV> set autotrace traceonly statistics

ops$tkyte@ORA817DEV> set arraysize 10
ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;
23712 rows selected.

Statistics
----------------------------------------------------------
      26065  consistent gets

ops$tkyte@ORA817DEV> set arraysize 100
ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;
23712 rows selected.


Statistics
----------------------------------------------------------
      23934  consistent gets

ops$tkyte@ORA817DEV> set arraysize 500
ops$tkyte@ORA817DEV> select object_name from t where object_id > 0;
23712 rows selected.

Statistics
----------------------------------------------------------
      23744  consistent gets

ops$tkyte@ORA817DEV> set autotrace off



<b>see how in the second case the consistent gets didn't go down significantly -- regardless of the array size.  Each row caused us to hop from block to block to block -- every row fetched via the index was another consistent get</b> 

Thanks, that clears it

A reader, December 30, 2002 - 11:15 pm UTC

Dear Tom,
Thanks very much for your reply. That explains why the logical I/O is less for a colocated table even through an index range scan.

regards
ramakrishna

LIO and tkprof

Sam, January 02, 2003 - 8:56 pm UTC

Tom,

Suppose I have a query that resulted in reading 10 blocks from disk.

In the tkprof output, does the value in the "query" column include the LIO that resulted in the PIO? Or do I need to sum disk+query+current to get the actual number of LIOs?

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.02       0.01          0         90          0          0
Execute      1      0.06       0.08          2        274         17          4
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.08       0.09          2        364         17          4

Tom Kyte
January 02, 2003 - 9:24 pm UTC

query+current = buffer reads.

physical IO's are *generally* part of query+current, that is a physical IO becomes a LIO right away.

Direct READS from temp -- do not become LIO's however -- so, it is easy to have disk > query+current in that case.


So, a physical io into the buffer cache will also bump query|current.
A physical IO from temp will not, that bypasses the cache.

Re: Connor's Response

Sam, January 02, 2003 - 10:10 pm UTC

Tom,

Could you also explain a bit more the 9i feature that Connor mentioned, and maybe point us to some documentation?

"On a somewhat related note, there is an oracle 9 enhancement when probing via an index in a nested loop. The index probing is still done as described in this question, but when it comes to looking up the table, Oracle can choose to defer that read until its retrieved all of the rowid from the index."

Thanks


Please explain this in this context

Subrata Saha, January 03, 2003 - 1:21 pm UTC

if mode eq 'consistent' then
    clone(data block address, …)
    'consistent gets'++
else /* mode eq 'current' */
    'db block gets'++

What is this cloning?

Tom Kyte
January 03, 2003 - 2:25 pm UTC

you typed it in?

it looks like, from the context you typed in, that it means to get a consistent-read "clone" (copy) of the block -- using RBS data if need be to roll back the block. Consistent gets may go up by more than one (to reflect how many RBS reads we did as well)

how about famous "readers don't block readers"

Mikito Harakiri, February 24, 2003 - 7:02 pm UTC

>>> A latch is a lock.

>>> Locks are serialization devices

In other words, if blocks are on disk, readers don't block each other, but if blocks are in memory, they do! So much for "readers don't block readers".


Tom Kyte
February 25, 2003 - 9:03 am UTC

Mikito, you are boring after a while. really.

The only place really that Oracle reads blocks from is memory -- physical io is used to read the blocks into the buffer cache. The buffer cache -- being a shared data structure -- must be protected from concurrent "readers of memory" and "updaters of memory". hence the latch.

Now you tell me -- even by "reductio ad absurdum", you cannot actually believe what you are saying.

and "reads don't block reads" isn't famous - it is boring.

what is famous is

"reads don't block writes"
"writes don't block reads"



Hash Join versus Nested Loops

Mikito Harakiri, February 25, 2003 - 5:49 pm UTC

In your example Nested Loops has more buffer gets than, say, Hash Join. OK. Fewer buffer gets translates into fewer latches and a better concurrent performance profile.

Now I'm trying to understand the reason. Suppose both tables are cached in memory. Then we can ignore the random I/O versus sequential I/O difference. Is the Nested Loop algorithm inherently "nonoptimal" compared to Hash Join? What does HJ do differently? It puts a hash table in front of the inner table. That's just extra work to me. Then it does similar nested looping, only sugarcoated by the hash table probe. Specifically, Oracle probes the hash table, so that if there is no match the target table is not accessed. But an Indexed Nested Loop does similar probing of the join index! Why is Hash Join considered "more scalable"?

Tom Kyte
February 25, 2003 - 8:47 pm UTC

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154

for a better example.


Think of it like this. Each consistent get takes CPU work. If you do something that takes 0.0001 CPU seconds once, it is really fast. Now, if you do it 100,000,000 times, it takes 10,000 seconds!

do something really fast and small a lot and it adds up.
do something really fast and small alot and it adds up.


Lets say you have a room of boys and girls and you want to have a dance. You would like to have the kids paired up by size (tall girl, tall boy).

You could pick a girl and then search around for a boy about the same size, pair them up and continue. This works well if you have say 5 boys and 5 girls.

now, lets say you have 1000 of each. It gets a little unwieldy now -- takes quite a while to pair them up. You could have the boys sort themselves by height and then the girls -- now you pair them up quickly.

That is sort of a sort merge but it is similar to a hash join.

The first method -- pick a girl, find a boy -- output -- gets people on the dance floor really fast (first rows back fast) but it takes a long time to get all 1000 couples out there.

The second method keeps the dance floor empty during the sort -- but when the rows start coming -- they are flying out there (with very little work).


Neither is "less optimal" -- no more then an 18 wheeler is less optimial then a BMW. One is good for big long hauls, one is good for getting to work. Different tools for different jobs.


There are no "bad" features, just inappropriate uses of technologies.



Nice example

A reader, February 26, 2003 - 8:01 am UTC


sort-merge join versus nested loop

Mikito harakiri, February 26, 2003 - 12:40 pm UTC

Excellent explanation.

Let me elaborate, though. Suppose we want to pair 1M girls and 1M boys. In MJ method we sort girls -- that's about 6*1000000 operations -- then, we sort boys -- that's another 6*1000000 operations. Then we go through the sorted lists matching the pairs -- that's about 2*1000000 operations.

Now, for comparison, we perform indexed nested loops. For each boy we refer to the index of girl's height. We'll do roughly 6 (logarithm of 1M) navigations in the index structure. Index leaf refers to the exact location of the girl. Then, the total amount of work is about the same 6*1000000 operations.

Therefore, I still don't understand how we can see Merge Join having significantly fewer buffer gets than Nested Loops.

Tom Kyte
February 27, 2003 - 7:09 am UTC

you have it backwards -- the sorts are the log, NL is linear.

for each girl -- you will do 3 index reads + one table read = 4 LIOs, times 1,000,000 girls = 4 million operations.

the sorts are the ones that are on the order of log(n)


try it and see -- oh wait -- I already did -- see link above.
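A back-of-envelope comparison of buffer gets (with assumed numbers: b*tree height of 3, 100 rows per table block, 1M rows joined) shows where the difference in consistent gets comes from -- the nested-loops plan visits the cache for every row, while the sort/hash work happens in private (PGA) memory after a single scan of each table:

```python
n = 1_000_000          # rows to join (assumed)
rows_per_block = 100   # assumed rows per table block
btree_height = 3       # assumed index height: root + branch + leaf

# Indexed nested loops: per driving row, ~3 index gets + 1 table get,
# and every one of those is a (latched) buffer get.
nl_buffer_gets = n * (btree_height + 1)

# Sort-merge / hash: full-scan each table once; the n*log(n) comparison
# work of sorting/hashing is CPU spent in the PGA, not buffer gets.
scan_buffer_gets = 2 * (n // rows_per_block)

print(nl_buffer_gets, scan_buffer_gets)
```

Under these assumptions: 4,000,000 buffer gets for nested loops versus 20,000 for the scan-based plans -- the sorting still costs CPU, but it is not charged as logical IO.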

Jim, March 02, 2003 - 11:04 pm UTC

Excellent

arraysize and full-table scan

Sam, April 15, 2003 - 2:41 pm UTC

Tom,

You mentioned:
"Each array fetch did this:

- get index block (1 LIO)
- get row 1 from the table (1 LIO for block X)
- get row 2 from the table (0 LIO if the second row was on block X, which it will be since we put the data in that order)"

In the case of a full-table scan, how does arraysize affect the number of LIOs? I assume if there are lots of rows in a block, with a small array size then Oracle will read the same block many times, whereas if there are only a few rows in a block then the negative impact will be less since the same block will be revisited fewer times. Is my understanding correct?

Also, what does Oracle "remember" between each array fetch? Does it remember the index block or table block it last read, so it just resumes from there?

Tom Kyte
April 16, 2003 - 9:18 am UTC

that is correct (the arraysize / lio thing)


Oracle remembers everything it needs to in order to pick up processing where it left off. This could be thousands of things. But, think of it like this.

A SQL query is a program.
Oracle is a compiler.
Oracle compiles the program into object code (shared sql)
You run the program (like running something from the command line)
The operating system (Oracle) starts the program, sets an instruction pointer, creates a stack, initializes a data segment.
Program runs, gets pre-empted, OS restores context -- runs, gets pre-empted, OS restores context -- runs, gets pre-empted and so on

So, just think of SQL as a program, Oracle as an OS, arraysize is the timeslice the program ran for -- and you'll have a very nice analogy for what happens.
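The arraysize/LIO relationship for a full scan can be modeled in a few lines of Python. This is an assumption-laden sketch, not Oracle's actual buffer handling: assume each FETCH call re-gets the block it starts on, plus any blocks it crosses into, so a small array size revisits the same block many times.

```python
def full_scan_consistent_gets(total_rows, rows_per_block, arraysize):
    """Model a full scan: each FETCH call re-gets its starting block
    and gets each further block it crosses into during that call."""
    gets = 0
    row = 0
    while row < total_rows:
        first_block = row // rows_per_block
        last_row = min(row + arraysize, total_rows) - 1
        last_block = last_row // rows_per_block
        gets += (last_block - first_block) + 1
        row += arraysize
    return gets

# 1000 rows, 10 rows per block (100 blocks):
for size in (1, 5, 10, 100):
    print(size, full_scan_consistent_gets(1000, 10, size))
```

Consistent gets fall as arraysize rises and bottom out at one get per block -- matching the observation above that a small arraysize makes Oracle read the same block many times.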

how much logical read is too much ?

kumar, May 21, 2003 - 6:27 am UTC

Tom,

How do i know whether the logical reads done for a particular query are actually essential ? How do i find the minimal/optimal no of reads approximately required for a particular query that returns a certain number of rows?

Thanks

Tom Kyte
May 21, 2003 - 8:53 am UTC

You think about it (seriously).

If I ask a query like:

select * from emp where empno = :x

I would expect 3 logical IOs for the index and one for the table -- 4 logical IOs in all.


If I ask a query like:

select * from emp, dept where emp.deptno = dept.deptno and dept.deptno = 10;

I would expect

3 LIO's to read the index on deptno plus
1 LIO to do the table access by rowid on DEPT to get the dept row plus
N*3 LIO's to read the index on emp(deptno) where N is the number of emps in a department plus
N LIO's to get the emp data


for example. You have to sort of reason it out.
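That reasoning can be written down as arithmetic (assuming, as in the answer above, a b*tree height of 3 -- root, branch, leaf):

```python
index_height = 3   # assumed: root + branch + leaf block reads

# select * from emp where empno = :x
single_row_lio = index_height + 1            # index walk + table access

# select * from emp, dept
#  where emp.deptno = dept.deptno and dept.deptno = 10
def join_lio(n_emps, height=index_height):
    # dept index walk + dept row, then per emp: index probe + table row
    return (height + 1) + n_emps * (height + 1)

print(single_row_lio, join_lio(10))
```

So a 10-person department would be "reasoned out" to roughly 44 LIOs; if autotrace shows a number wildly above that, the query is doing more work than it should.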

That's really a good one

kumar, May 21, 2003 - 9:02 am UTC

Tom,

Thanks for your response. I was also thinking along those lines -- but then I felt this method gets a little cumbersome when more tables are involved in a query. I thought there might be some more efficient method of doing it. But to do it this way, I assume that we need to know the data completely -- am I right?

Tom Kyte
May 21, 2003 - 9:41 am UTC

if you don't know the data completely, you cannot be tuning the queries. Maybe this is a chicken-and-egg situation: in order to know if the query is tuned, you need to understand the data; in order to tune a query, you need to understand the data...

the hotsos article doubt

A reader, June 24, 2003 - 8:48 pm UTC

You said
"download the "are you still using cache hit ratios"
and keep an open mind."

from the www.hotsos.com.

Quote from the article:
"
Of course, if your database spends a lot of time parsing, then you should try to reduce the
number of parse calls that your application requires. If your database spends a lot of time doing language processing,
then you should carefully consider moving that application logic to a middle tier, where scalable CPU capacity is
much less expensive than it is in your database."

I could not understand what the article meant about
language processing and the corresponding advice to
move that logic to the middle tier. I also don't understand
what they mean by "scalable CPU capacity is much less
expensive than it is in your database".

Could you throw some light on it?

Thanx for the GREAT SITE!!!

Tom Kyte
June 25, 2003 - 11:47 am UTC

You would have to ask Cary ;)

well, I asked Cary and here is what he has to say

A reader, June 25, 2003 - 5:02 pm UTC

What are your thoughts on this, Tom? I am not quite sure
if he is contradicting your opinion of putting more
stuff into the database -- elsewhere you said that a
middle-tier CPU cycle is not any more or less cheap than
a database CPU cycle. See "I sort of thought a cpu cycle was a cpu cycle?" at
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:883929178230 <code>

Cary says something interesting in this regard too,
in his answer to my second followup question.

My conversation with Cary
_________________________
Q1. I could not understand what the article meant about
language processing and the corresponding advice to
move that logic to the middle tier. I also don't understand
what they mean by "scalable CPU capacity is much less
expensive than it is in your database".

Cary's response:
_______________
You often have a choice about whether to execute application logic in the database, or in the client using the database. There are tradeoffs. On one extreme, if you get one row at a time from the database and do all application logic (including joins) in the application, it is bad. It's much more efficient to do filtering and joining, for example, in the database than in the application. At the other extreme, if you do all your application processing in the database (for example, CPU-intensive operations like data processing that have nothing to do with data storage and retrieval), then you're using database server CPU capacity unnecessarily.

On busy databases, the Oracle database server is the bottleneck. It's why Oracle Corp. is focusing so much effort on trying to make RAC a scalable solution for database systems that demand more CPU capacity than the biggest computers can provide. But RAC (and even huge single-node Oracle boxes) are very expensive. So, it's smart to conserve all the CPU capacity that is practical on your database tier. In the second extreme described above, it's much more scalable to move the processing to a cheap middle-tier computer instead of having extra workload that competes against all your other database processing for CPU capacity.

My Followup Q 2:
_______________
If you can, please give a real-life example of "CPU intensive" operations that don't involve data processing -- operations that are apt for the middle tier and NOT apt for the database.

This is because sql engine itself is very rich for most of the data transformations that I can think of.

Is "language processing" in your article the right example - if so could you please elaborate?

At what point does the database cpu become "costlier" than the middle tier cpu?

Cary's response:
_______________
There are lots of examples. One is Oracle E-Business Suite's Financial Statement Generator (FSG). These things get lots of data out of the database and then work for minutes to hours processing those rows (making balance sheets, income statements, and such). Anything complicated that happens to data after getting it out of the database is what I'm talking about.

By the way, by "language processing," I mean the execution of PL/SQL that makes no database calls.

Regarding your final question, I think that one second of CPU capacity on a database server is more expensive than a second of CPU anywhere else if the database server is suffering from a CPU capacity shortage (which they often do).

When you get down to it, the database server is the hardest component of an Oracle application architecture to scale. It's not an Oracle product deficiency; it's a simple matter that the single component of an Oracle-based application that gets visited repeatedly by every user is usually the database server. So, any waste that takes place at that tier is usually the first problem to impact everyone. Therefore, it pays to be careful what you put there. If a CPU-consuming process can be run with equal functional effectiveness either on the database server or somewhere else, then it probably should be run somewhere else.


Tom Kyte
June 25, 2003 - 8:54 pm UTC

perfect, we are in agreement....

application logic = outside, in the application, where ever that be.

data logic = inside, in the database, that be in the center.


In many (many many many) systems, the database is more than big enough for both.

In some (big big systems) the database is not.

Problem is, people think "everything big, everything needs 14 tiers". Many things, well, they just do not.


Like Cary, I hate things that pull data out to a middle tier a row at a time, process it, put it back. what a waste.

Things that pull it out, keep it out, and format it for "looking at" or "reports" -- great. If you have the cycles and want to -- keep it in the center. If you don't have the cycles or don't want to -- put it outside.

But never -- never get into the mode of "we cannot do anything in the DB, too 'expensive'"

It is too expensive NOT to. The simple "pull it out, row by row, put it back" that is so common is much less "impactful" on the database if you simply use a single query and don't make the db throw the data out just to put it back somewhere else.

So, would I do a fast fourier transformation in my database?
No.

Would I do all of my ETL?
Yes.

Would I do my security?
Yes.

Would I use analytics?
Yes.

Would I use dbms_jobs to run many things?
Yes.

Would I generate a gif image?
Probably Not.

Would I text search?
Yes.
........


I would just end his last paragraph with

then it probably should be run somewhere else -- assuming, of course, you've already gone down the path of multiple machines to service your users. If you use a single machine, as many people do, well then -- maybe you can just keep it in the database.....


It is truly amazing what you can do on a single 2/4 cpu server -- "database" and "middle tier" all on one. It takes a fairly "big" system to make that not feasible.

decreasing Physical read

A reader, July 18, 2003 - 10:03 am UTC

Hi Tom,

First I want to thank you again for your valuable information and informative site.

I have a table as follow:

graindba@COLG> set serveroutput on
graindba@COLG> exec show_space('FIN_GL_AUDIT_TRAIL');
Free Blocks.............................1
Total Blocks............................9045
Total Bytes.............................74096640
Unused Blocks...........................125
Unused Bytes............................1024000
Last Used Ext FileId....................4
Last Used Ext BlockId...................132091
Last Used Block.........................135

graindba@COLG> select count(*) from fin_gl_audit_trail;

COUNT(*)
----------
366874

The db block size is 8K. As you see, the table has 8919 data blocks. Now, when I execute the following query it takes 26 seconds, which is not good for me. I need better performance. As you mentioned before, I looked at the number of consistent gets, which is exactly the same as the number of data blocks. So to my knowledge that is OK, because Oracle needs to read each block at least once. Am I right? But I noticed it did 6127 physical reads!

My question is, how can I tune this query? Do I need to decrease Physical Reads? If so, how?

Thanks again for your time,
Arash
P.S. The LOCK_FLAG column has an index, but I think that because of the data distribution the CBO doesn't use it.



graindba@COLG> select 1
2 from
3 fin_gl_audit_trail
4 where lock_flag ='Y'
5 /

no rows selected

Elapsed: 00:00:26.04

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1354 Card=366874 Byt
es=366874)

1 0 TABLE ACCESS (FULL) OF 'FIN_GL_AUDIT_TRAIL' (Cost=1354 Car
d=366874 Bytes=366874)





Statistics
----------------------------------------------------------
0 recursive calls
4 db block gets
8919 consistent gets
6127 physical reads
0 redo size
210 bytes sent via SQL*Net to client
209 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
0 rows processed



Tom Kyte
July 19, 2003 - 11:10 am UTC

so, what is the distribution of values in that column, what does

select lock_flag, count(*) from t group by lock_flag

return?

A reader, July 20, 2003 - 1:17 pm UTC

Hi Tom,

select 1
2 from
3 fin_gl_audit_trail
4 where lock_flag ='Y'
5 /

no rows selected

As you see, it doesn't return any rows for lock_flag = 'Y'. It's kind of a dynamic column which will be set to 'Y' any time a user queries a record. So all records have the value 'N' by default.

Regarding the index on lock_flag, I made a mistake (sorry). It didn't have one, and when I added the index it worked OK. I mean, the query execution took less than a second.

Now another question came to my mind. I've read in Oracle books that we don't need to index low-cardinality columns, because the CBO won't use them and would rather do a FULL TABLE scan. So why did the CBO use the index in this case?

And another thing -- why that many physical reads?! How can I, in general, reduce physical reads?

Thanks millions of tons for your time,
Arash



Tom Kyte
July 20, 2003 - 3:48 pm UTC

the CBO intelligently uses indexes when they make sense.

Here -- it knows that 'Y' has few rows -- the index is good. If you did it for = 'N', the CBO should go for a full scan.

You might consider indexing:

decode( lock_flag, 'Y', 'Y', NULL )

that way ONLY locked records will be in the index (index will be tiny) since entirely null entries are not made in b*tree indexes.
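The reason the decode()-based index stays tiny can be illustrated with a Python analogy. This is only an analogy for how b*tree indexes skip entirely-null entries, not Oracle syntax: only rows where the indexed expression evaluates to non-null get an entry.

```python
# Analogy: an index on decode(lock_flag, 'Y', 'Y', NULL) stores entries
# only where the expression is non-null, i.e. only for locked rows.
def decode_lock_flag(flag):
    return 'Y' if flag == 'Y' else None     # mimics the decode() above

# 100,000 unlocked rows plus one locked row (made-up sample data)
table = [('row%d' % i, 'N') for i in range(100_000)] + [('row_x', 'Y')]

tiny_index = {rowid: decode_lock_flag(flag)
              for rowid, flag in table
              if decode_lock_flag(flag) is not None}

print(len(table), len(tiny_index))
```

The "index" here holds a single entry for 100,001 rows -- which is why the query for locked records can stay fast no matter how big the table grows.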

A reader, July 21, 2003 - 9:06 am UTC

Thank you so much. I learned one more thing.

A reader, November 07, 2003 - 12:34 pm UTC

I have a session whose logical reads keep increasing. But no SQL statement is found for the session in v$sqlarea, and its status shows INACTIVE. There are some wait events for the session, mainly log file sync and latch free, shown 100% timed out. Could you explain what is happening to the session? Why is there no SQL statement in the sqlarea?

Tom Kyte
November 07, 2003 - 3:16 pm UTC

you are looking at the session as "an instant in time" and each time you look, it is inactive.

that is all, the session IS active.
Just not at the instant you look at it.




Where does Oracle hold this block

A reader, December 22, 2003 - 5:55 pm UTC

Tom,

In the above discussion you mention the following :
<Quote>
See, here since the array size was 2 -- our consistent gets were pretty close to number of rows for colocated. Why? Each array fetch did this:

- get index block (1 LIO)
- get row 1 from the table (1 LIO for block X)
- get row 2 from the table (0 LIO if the second row was on block X which it will be since we put the data in that order)
</Quote>

In the above case, we're almost sure that row 2 will be on the same block X because we put data in that order. However, I am not clear on this and I have the following question(s) :
It has to access block X from the buffer cache again to get to the second row (which I understand is already in the buffer cache), which constitutes 1 LIO, but you mention that it does 0 LIOs in this case.
1. Can you please clarify why you say that it does 0 LIOs?
2. Does that mean that Oracle keeps the block X temporarily in a memory buffer array somewhere (without accessing the buffer cache again) after fetching row 1 but before fetching row 2. If so where does Oracle keep the block without accessing the buffer cache?
3. If Arraysize is set to 1 instead of 2, then does that mean it will perform 1 LIO for each fetch?

As always thanks much for your help.

Tom Kyte
December 22, 2003 - 6:49 pm UTC

1) we already had the block, we did not need to go back to the buffer cache to re-get it in that very same call. if the next row we need is on the block we already have -- 0 LIO's to get it.

2) yes, in the PGA (process global area) of your dedicated or shared server process.

3) it would perform 2 LIO's for each fetch. 1 for the index, 1 for the row. It would do 2x the number of fetches -- hence 2 times the LIO.
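A small simulation of the index-access case -- again a model under assumptions (one index-leaf re-get per FETCH call, two table rows per block), not Oracle internals. With arraysize 1 you pay 2 LIOs per row; with arraysize 2, the second row of each call usually sits on the block you already hold.

```python
def index_fetch_lios(nrows, arraysize, rows_per_table_block=2):
    """Per FETCH call: re-get the current index leaf (1 LIO), then get
    each table block needed; a block already in hand within the same
    call costs nothing extra."""
    lios = 0
    row = 0
    while row < nrows:
        lios += 1                         # index leaf for this call
        held_block = None
        for r in range(row, min(row + arraysize, nrows)):
            block = r // rows_per_table_block
            if block != held_block:       # not the block we already have
                lios += 1
                held_block = block
        row += arraysize
    return lios

print(index_fetch_lios(100, 1))   # 200: 2 LIOs per fetch, 100 fetches
print(index_fetch_lios(100, 2))   # 100: second row free, half the fetches
```

This mirrors the tkprof output later in the thread, where single-row PL/SQL fetches of ~100 rows show ~200 consistent gets -- 2 per row.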

One question

A reader, December 22, 2003 - 7:23 pm UTC

Tom,

Please confirm the following :

1. Is question 1 related to latching the table block as long as the next row is in the same table block being latched, or until the number of rows fetched exceeds the array size?
2. If we do a single row-by-row fetch, does that mean it would require a latch for each table block fetched -- and is that what you mean when you say that latches are serialization devices which make the query not scalable?
3. If we do a row-by-row fetch (i.e. arraysize = 1), it will do the following:
a. Read 1 index block (1 LIO) + 1 table block (1 LIO) = 2 LIOs
b. Get the index block read in step 1 (1 LIO) and get the next row of the table block (1 LIO) = 2 LIOs. Is that what you mean by 2 LIOs per row?

Thanks much

Tom Kyte
December 22, 2003 - 7:54 pm UTC

1) if we already have the block we need -- we do not get it again. that is all i'm trying to say.

2) yes

3) i mean you'll get index block (1 lio) that points to table block (1 lio) and that equals TWO.

every FETCH will require

a) get index block we last used
b) get table block it points to



Index block read

A reader, December 22, 2003 - 9:58 pm UTC

Tom,

To recap and understand/confirm what I have understood from the discussion on index range scan.

1. The index block is read from the buffer cache (1 LIO). The table block pointed to by the first row of the index is read from the buffer cache and brought into the PGA (1 LIO; if the block is not already in the buffer cache, a PIO is initiated). A latch is held on the table block if the next row of the index block points to the block we already have, and the row is put into the array. This process continues until the number of rows exceeds the array size, or until the index entry points to another table block, at which time the next table block is accessed from the buffer cache and the LIO count is incremented.
Pls. confirm if the above understanding is correct.
2. You mentioned that the table block is held in the PGA. Is the index block also held in the PGA? If not where is the index block held?
3. Suppose the arraysize is 5 and, for example, the first index entry points to table block X, the second entry points to table block Y, the third entry points to table block Z, the fourth entry points again to X, and the fifth entry points to Z. In that case the following happens:
1 index block read (1 LIO)
1 table block (X) read into PGA (1 LIO)
1 table block (Y) read into PGA (1 LIO)
1 table block (Z) read into PGA (1 LIO)
When reading the fourth and fifth table rows from blocks X and Z respectively, will Oracle use the blocks X and Z already read into the PGA, or will it go to the buffer cache to access the blocks, incurring additional LIOs? If the latter, will increasing the PGA size help reduce LIOs, and what parameters should be increased/altered?
Please clarify questions 1 to 3.

As always, your explanation will be really helpful

Thanks

Tom Kyte
December 23, 2003 - 9:59 am UTC

1) the latch is removed as soon as we copied the block into the pga address space.

we latch to copy it. then unlatch it.

2) yes, copied data is in the pga where we can work on it.

3) it'll have to re-get them (you haven't any real control over PGA memory like that; you cannot "increase it")

Clarification to Qn 3 above

A reader, December 23, 2003 - 4:20 pm UTC

Tom,

What I really meant in Qn 3 above was: does Oracle store any information that says 'Hey, you already read table block X while reading row 1, and it is already in the PGA. Since the index entry for row 4 also points to the same table block X, don't re-get it from the buffer cache; instead use the table block X already in the PGA.'

Tom Kyte
December 23, 2003 - 5:29 pm UTC

we don't still have it -- we moved off of that block onto another one. block x isn't there.

Doubt

A reader, December 23, 2003 - 7:48 pm UTC

Tom,

Can you then explain what you mean by

'yes, copied data is in the pga where we can work on it.'

in response to question 2 above in the prior discussion



Tom Kyte
December 23, 2003 - 8:00 pm UTC

the block(s) we are processing is(are) copied from the buffer cache into the pga for further processing.

Pls. clarify

A reader, December 24, 2003 - 10:52 am UTC

Tom,

As a followup to the earlier discussion, you mentioned the following and I quote here :

we don't still have it -- we moved off of that block onto another one. block x isn't there

What do you mean by :

1. We don't still have the block -- wasn't it copied to the PGA when reading block X?
2. Block X is not there -- what does that mean?
Does it mean that when Oracle processes each row pointed to by an index entry, it will access the block from the PGA row by row until the array size is exhausted, or until the index entry points to a different block, at which point it will leave the current block alone and fetch from the buffer cache another table block Y pointed to by the next index entry (i.e. 1 more LIO)?
3. If I have the following index entries :
index entry 1 points to table block X
index entry 2 points to table block X
index entry 3 points to table block Y
index entry 4 points to table block X

Will it mean Oracle will perform the following for table blocks :
1 LIO to fetch table block X for 1st row
0 LIO since the 2nd row is on same block X
1 LIO to fetch table block Y (since it is in a different block)
1 LIO again to fetch table block X (since the 4th row is in X and Oracle has to re-read block X, i.e. hop from block to block). My question is about this last case:
can't it use the table block X already in the PGA?
If not pls. explain.
4. Tom, I am almost clear as to how the LIOs and physical reads happen, except for the following:
how Oracle accesses the required table and index blocks once the first index block and table blocks are copied into the PGA, and how the LIOs and PIOs are affected from there on.
Please explain how Oracle handles this part.

Thanks much and have a Happy and Wonderful Christmas and New Year 2004.

Tom Kyte
December 24, 2003 - 11:03 am UTC

1) it was, but then we moved onto block y.

2) means "not there", we used it, we saw we needed y, we moved on. then discovered "oh, we need x again"

3) set up the test and try it out! (i did, its pretty easy to simulate)

4) sorry -- guess i'm not able to make it clear enough here. no whiteboards

LIO always larger than PIO ?

FZ, January 27, 2004 - 3:12 am UTC

Tom,
My understanding is that LIO should always be larger than PIO for any SQL. But by querying statspack data, I have seen that PIO > LIO for some SQLs. Can you shed light on this?
Thank you.

Tom Kyte
January 27, 2004 - 8:50 am UTC

IO to temp is not counted as LIO.

So, it is very easy to see PIO's far exceed LIO if you do lots of hash/sorts.

clarification

amit poddar, March 29, 2004 - 9:57 am UTC

quoting you from above
"3) it would perform 2 LIO's for each fetch. 1 for the index, 1 for the row. It
would do 2x the number of fetches -- hence 2 times the LIO. "

Does that mean that at any given time there are two blocks in the PGA for an index access?

1. Index block
2. Table block

and Oracle keeps getting different data blocks as guided by the entries in the index block, and when it has completed reading the whole index block it gets the next leaf block, and the process continues in similar fashion??



Tom Kyte
March 29, 2004 - 10:43 am UTC

yes, no, maybe.

depends on the array size, the query plan, what's going on.

So much useful info, so well explained!

hxlvt, March 29, 2004 - 12:02 pm UTC

Thanks so much.

More questions ..

RD, July 14, 2004 - 3:57 am UTC

Hi Tom,

You in your example of colocated and disorganised tables ( starting of this thread)
said :-

Stepping arraysize up again:

ops$tkyte@ORA920.LOCALHOST> set arraysize 100
ops$tkyte@ORA920.LOCALHOST> select * from colocated where x > 0;

29319 rows selected.


Statistics
----------------------------------------------------------
...
       2597  consistent gets
...
      29319  rows processed

ops$tkyte@ORA920.LOCALHOST> select * from disorganized where x > 0;

29319 rows selected.


Statistics
----------------------------------------------------------
...
      30541  consistent gets
...
      29319  rows processed

ops$tkyte@ORA920.LOCALHOST> set autotrace off
ops$tkyte@ORA920.LOCALHOST>

colocated gets better, disorganized not really."

on my system when I do:- 
select dbms_rowid.rowid_block_number(rowid), count(*) from colocated group by dbms_rowid.rowid_block_number(rowid);

I get:-(sample)
                               20329         27
                               20333         27
                               20337         27
                               20341         27
                               20345         27

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)   COUNT(*)
------------------------------------ ----------
                               20349         27
                               20353         14

948 rows selected.

That's 948 blocks with around 27 rows in each block.

What exactly happened at 100 arraysize ?
1) The first index block is gotten (1 IO here),
then the data block is read to get the rows (right?)
 a) The index block has the rowid and the key value for  each row there(also the block id is got from here). Do you 
mean more than one block was read here say 27 rows from first block + 27 from 2nd + 27 from 3rd + 19 from 4th = 100.
 This means now there have been a total of 2 IO's till now.
The 4th block is now revisited and now the same happens again. Now 3 IO's because 4th block was revisited.

for 948 blocks. Approximately -
(948/4 * 2) + 948/4 = 711 IO's.
948/4 = 2 IO's for every 4th block apporx.
948/4 = every 4th block is read again. Can be 3rd also sometimes but just for approximate measures here.

 b) How is the index block read exactly? I thought it was read once to get each rowid, and then the data is got from the block for that rowid, and so on -- so if 10 rows are to be got through an index, that's 20. From an earlier thread:-


"Followup:  
1) well, it'll get the index block to find the table block, and then get the 
index block to find the next table block, and so on.

ops$tkyte@ORA920LAP> set echo on
ops$tkyte@ORA920LAP> /*
DOC>drop table t;
DOC>create table t ( x int, y int );
DOC>create index t_idx on t(x);
DOC>insert into t select rownum , rownum from all_objects;
DOC>*/
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter session set sql_trace=true;

Session altered.

ops$tkyte@ORA920LAP> begin
  2          for x in ( select * from t where x < 100 )
  3          loop
  4                  null;
  5          end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> begin
  2          for x in ( select * from t where x < 200 )
  3          loop
  4                  null;
  5          end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

select * from t where x < 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      100      0.00       0.00          0        200          0          99
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      102      0.00       0.00          0        200          0          99

Rows     Row Source Operation
-------  ---------------------------------------------------
     99  TABLE ACCESS BY INDEX ROWID T (cr=200 r=0 w=0 time=1748 us)
     99   INDEX RANGE SCAN T_IDX (cr=101 r=0 w=0 time=984 us)(object id 41082)
********************************************************************************
select * from t where x < 200

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      200      0.00       0.03          0        400          0         199
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      202      0.00       0.03          0        400          0         199

Rows     Row Source Operation
-------  ---------------------------------------------------
    199  TABLE ACCESS BY INDEX ROWID T (cr=400 r=0 w=0 time=29098 us)
    199   INDEX RANGE SCAN T_IDX (cr=201 r=0 w=0 time=27490 us)(object id 41082)


so, 100 rows -- 200 consistent gets.  200 rows -- 400 consistent gets.  2 
consistent gets per row....


2) well -- it is a matter of the number of rows per block, the array size and 
such.  It just "was" in this case.  do that test case and play with the 
arraysize - set it up and watch that go down. set it down and watch it go up.

3) no, no it isn't.  you need to look closer at the example.  look at the 
consistent gets between colocated and disorganized.  It is all about the LOGICAL 
IO, even more so then physical.  they may well perform the same amount of 
physical io.
"

Please clarify, and also check if my understanding is OK or not.

By the way my result of the test is as follows...

SQL> select * from colocated where x > 0;

25583 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'COLOCATED'
   2    1     INDEX (RANGE SCAN) OF 'SYS_C003812' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1498  consistent gets
          0  physical reads
          0  redo size
    6977320  bytes sent via SQL*Net to client
      28726  bytes received via SQL*Net from client
        257  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25583  rows processed
Regards,
RD.
 


 

Tom Kyte
July 14, 2004 - 11:33 am UTC

please summarize your q's here -- there is such a mix of what "i said" and such that I cannot really tell what you are after??

Question again...

RD, July 14, 2004 - 5:52 pm UTC

Hi Tom,

In a previous thread I read an example you've given --
"Followup:
1) well, it'll get the index block to find the table block, and then get the
index block to find the next table block, and so on.

ops$tkyte@ORA920LAP> set echo on
ops$tkyte@ORA920LAP> /*
DOC>drop table t;
DOC>create table t ( x int, y int );
DOC>create index t_idx on t(x);
DOC>insert into t select rownum , rownum from all_objects;
DOC>*/
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter session set sql_trace=true;

Session altered.

ops$tkyte@ORA920LAP> begin
2 for x in ( select * from t where x < 100 )
3 loop
4 null;
5 end loop;
6 end;
7 /

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> begin
2 for x in ( select * from t where x < 200 )
3 loop
4 null;
5 end loop;
6 end;
7 /

PL/SQL procedure successfully completed.

select * from t where x < 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      100      0.00       0.00          0        200          0          99
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      102      0.00       0.00          0        200          0          99

Rows     Row Source Operation
-------  ---------------------------------------------------
     99  TABLE ACCESS BY INDEX ROWID T (cr=200 r=0 w=0 time=1748 us)
     99   INDEX RANGE SCAN T_IDX (cr=101 r=0 w=0 time=984 us)(object id 41082)
********************************************************************************
select * from t where x < 200

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      200      0.00       0.03          0        400          0         199
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      202      0.00       0.03          0        400          0         199

Rows     Row Source Operation
-------  ---------------------------------------------------
    199  TABLE ACCESS BY INDEX ROWID T (cr=400 r=0 w=0 time=29098 us)
    199   INDEX RANGE SCAN T_IDX (cr=201 r=0 w=0 time=27490 us)(object id 41082)


so, 100 rows -- 200 consistent gets. 200 rows -- 400 consistent gets. 2
consistent gets per row....
"

which means 2 IO's per row, right? The point I understood here was that when Oracle visits, say, the 1st index block (1 IO) for the data block rowid, it will read the data block (1 IO), and so on. Now, if this sequence of one-for-one is maintained, how is the arraysize going to matter at all, since only one rowid is read at a time? Or is the index block read differently under different circumstances, i.e. is more than one rowid obtained somehow?

Please clarify, because not too many people really understand how the index block is read and then the corresponding data block. Is it on a per-row basis, or what?

Regards,
RahulD.

Ps - after this clarification I'll come to the question I posted above to avoid making a real hash out of it like last time.




Tom Kyte
July 15, 2004 - 11:52 am UTC

if you fetch 100 rows using an array size of 100 -- there are optimizations in fetching that reduce the number of consistent gets.

instead of getting that index block over and over -- we'll get it once and exhaust it.
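One way to see that optimization for yourself (a sketch against a table like the T/T_IDX setup above; the exact consistent-get counts will vary by version and data):

```sql
-- assumes the t / t_idx setup from the example above
set autotrace traceonly statistics

set arraysize 2
select * from t where x < 5000;
-- many small fetches: each fetch call re-gets the index/table
-- blocks it left off in, so consistent gets stay high

set arraysize 100
select * from t where x < 5000;
-- few large fetches: each block is gotten once and exhausted,
-- so consistent gets drop noticeably
```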

Extra fetch seen in SQL trace

Sam, August 18, 2004 - 6:08 pm UTC

Tom,

I'm using 9.2.0.3 and I executed the following via sqlplus:

SQL> alter session set events '10046 trace name context forever, level 12';
Session altered.

SQL> set arraysize 20
SQL> select object_name from all_objects
  2  where rownum <= 40;

OBJECT_NAME
------------------------------
/1005bd30_LnkdConstant
/10076b23_OraCustomDatumClosur
...
/12012e35_ClassDescription

40 rows selected.

SQL> exit

In the trace file, I see 3 fetches:
PARSING IN CURSOR #1 len=54 dep=0 uid=61 oct=3 lid=61 tim=2461251641443 hv=1157308234 ad='78bc021c'
select object_name from all_objects
where rownum <= 40
END OF STMT
PARSE #1:c=40000,e=37037,p=0,cr=3,cu=0,mis=1,r=0,dep=0,og=4,tim=2461251641428
BINDS #1:
EXEC #1:c=0,e=781,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=2461251642602
WAIT #1: nam='SQL*Net message to client' ela= 7 p1=1650815232 p2=1 p3=0
FETCH #1:c=0,e=842,p=0,cr=12,cu=0,mis=0,r=1,dep=0,og=4,tim=2461251643707
WAIT #1: nam='SQL*Net message from client' ela= 1125 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 2 p1=1650815232 p2=1 p3=0
FETCH #1:c=0,e=4031,p=0,cr=100,cu=0,mis=0,r=20,dep=0,og=4,tim=2461251649233
WAIT #1: nam='SQL*Net message from client' ela= 5529 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 2 p1=1650815232 p2=1 p3=0
FETCH #1:c=0,e=3716,p=0,cr=96,cu=0,mis=0,r=19,dep=0,og=4,tim=2461251658749

Could you explain why there are 3 fetches (instead of the expected 2), and why the first fetch only retrieved one row? 

Tom Kyte
August 18, 2004 - 11:42 pm UTC

sqlplus always over fetches by one, it just does, there isn't a good reason to, no I don't know why, it just does (been there, done that :)

A reader, November 08, 2004 - 12:54 pm UTC

"When tuning a query -- I ignore PIO's. I concentrate 100% on reducing LIO's"

OK, a classic case of index-happy RBO vs. FTS-happy CBO.

Here is RBO

343 rows selected.

Elapsed: 00:00:47.74

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: RULE
   1    0   SORT (UNIQUE)
   2    1     NESTED LOOPS
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE'
   4    3         INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (NON-UNIQUE)
   5    2       TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE_2'
   6    5         INDEX (UNIQUE SCAN) OF 'BIG_TABLE_2_PK' (UNIQUE)


Statistics
----------------------------------------------------------
         36  recursive calls
          0  db block gets
    1779424  consistent gets
      19086  physical reads
          0  redo size
       6389  bytes sent via SQL*Net to client
        893  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        343  rows processed

Here is CBO

343 rows selected.

Elapsed: 00:03:46.56

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=15477 Card=1271 Bytes=38130)
   1    0   SORT (UNIQUE) (Cost=15477 Card=1271 Bytes=38130)
   2    1     HASH JOIN (Cost=12799 Card=761719 Bytes=22851570)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=1202 Card=761719 Bytes=12949223)
   4    3         INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (NON-UNIQUE) (Cost=215 Card=20)
   5    2       TABLE ACCESS (FULL) OF 'BIG_TABLE_2' (Cost=5715 Card=14601076 Bytes=189813988)


Statistics
----------------------------------------------------------
          7  recursive calls
          1  db block gets
     182937  consistent gets
     166409  physical reads
         44  redo size
       6389  bytes sent via SQL*Net to client
        893  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        343  rows processed

Obviously, in terms of elapsed time, RBO wins hands down. So why didn't the CBO come up with this plan? Yes, up-to-date stats (9.2.0.4).

CBO did minimize the LIO, but 90% of that LIO turned out to be PIO! :-( RBO, on the other hand, had a lot more LIO due to its NL plan, but incurred very little PIO.

Comments? Thanks

Tom Kyte
November 08, 2004 - 5:29 pm UTC

obviously, in a single user system....... how about doing that with lots of users for example (which is what I'm mostly faced with)

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154 <code>

it can work both ways, depends on sizes of things (buffer cache, how much of the stuff was cached before you started, can it all be cached) as well.


but when tuning sql, my first and foremost goal is "reduction of the amount of work it performs" which is LIO's in general

A reader, November 08, 2004 - 4:59 pm UTC

And yes, I did play around with the optimizer_index_* parameters; all 3 combos of these parameters (10/90, 50/50 and 90/10) did not change the CBO plan.

anto, November 08, 2004 - 5:34 pm UTC

Hi Tom,

Ours is a data warehouse on 9i with a default block size of 8K. We currently have queries doing index range scans of bitmap indexes and subsequently using hash joins/star transformations etc. My question is:

Will it help if we move the bitmap indexes alone (and not the corresponding tables) to a tablespace with a 32K block size? My assumption here is that since it is a 32K block, each block will hold more index entries than an 8K block, which reduces the number of reads and hence the consistent gets -- since Oracle will be doing a db file sequential read of one block at a time for the indexes.

What other factors do I need to consider here ?

thanks


Tom Kyte
November 08, 2004 - 9:22 pm UTC

well 32k is one of those "outside the box" things (you may well need to recreate your database to get there).

do you have an identified physical io problem on your system?
has it been shown to be on db file sequential reads?
that are definitely resulting from index reads?
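For reference, the mechanics being discussed can be sketched as follows (names, sizes and file paths here are made up; on 9i a non-default block size tablespace requires a matching buffer cache to be configured first):

```sql
-- a 32K-block tablespace needs a 32K buffer cache configured first
alter system set db_32k_cache_size = 400m;

-- hypothetical tablespace and datafile
create tablespace idx_32k
  datafile '/u01/oradata/db/idx_32k_01.dbf' size 2g
  blocksize 32k;

-- move an index into it
alter index my_bitmap_idx rebuild tablespace idx_32k;
```

Whether this actually helps is exactly the question above: bigger blocks per se do not fix anything unless an identified IO problem points at index reads.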

anto, November 08, 2004 - 6:10 pm UTC

Hi Tom,

Regarding the above post -- choosing a higher db_block_size for indexes: after creating the indexes in a tablespace of 32K block size, running a few queries, and doing 'set autotrace on', the consistent gets did not come down as expected (w.r.t. the 8K block size index). The physical reads went UP for most of the queries, maybe since we set a cache size of only 400M for this 32K block size.

Is there anything that I am missing here? -- pctused, clustering factor etc. are the same in both cases. Maybe a 10046 trace will help?

Tom Kyte
November 08, 2004 - 9:24 pm UTC

No, you just read a posting by a guy with 20 years of experience - and found it was wrong.


you didn't isolate where the problem was, you tried a silver bullet -- but the problems with silver bullets is that they kill good guys just as easily as bad guys.



how many blocks did you range scan before? (probably a small number)
and now you are caching really big blocks with stuff you don't need,
bumping out stuff you do.

A reader, November 08, 2004 - 7:12 pm UTC

"it can work both ways, depends on sizes of things (buffer cache, how much of the stuff was cached before you started, can it all be cached) as well"

I agree with everything you say, but in my case, no matter when I run the query -- cache warmed or not, 1 user or 100 -- the RBO query always performs faster. As you can see, it does more LIO but way less PIO. Yes, I realize all the stuff you mention above comes into play, but at the end of the day elapsed time is what counts.

I can't go to my end users and say "Not to worry, the query is actually using minimum LIO, it is perfect." No one cares about LIO, PIO, etc. -- all anyone cares about is 'Elapsed: NN:NN:NN'!!

How can I make the CBO query go faster? I agree that the CBO query is the superior one since it does way less LIO, but why isn't it the "winner"? Thanks

Tom Kyte
November 08, 2004 - 9:31 pm UTC

have you looked at system statistics and the optimizer_index_* parameters (system stats *first* -- tell the system about your CPU and IO stats)

System stats

A reader, November 08, 2004 - 9:40 pm UTC

I did look at system stats.

I did dbms_stats.gather_system_stats(gathering_mode=>'interval',interval=>60)

and then ran some pretty heavy batch jobs just to exercise the system and ramp up the LIO/PIO.

Didn't make any difference to this query.

And, yes, I already tried the 3 combos of the optimizer_index* parameters (10/90,50/50,90/10)

And yes, I even tried this in 9.2.0.6.

Sigh, it is frustrating when, sometimes, the CBO cannot get a simple query right. It is designed to handle 10-page queries with dozens of tables and predicates, and sometimes it can't get a query with 3 tables right!

Tom Kyte
November 09, 2004 - 7:55 am UTC

Ok, do this -- run both with TKPROF and compare the card= in the autotrace explain plan with reality -- are they even *close*?

A reader, November 09, 2004 - 10:05 am UTC

They are bang-on, actually. Being a frequent reader of your site, that is the first thing I check!

Here is the tkprof row source operation and the autotrace

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      314     36.24      43.34     170886     175661          0        4689
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      316     36.26      43.36     170886     175661          0        4689

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1235

Rows     Row Source Operation
-------  ---------------------------------------------------
   4689  HASH JOIN (cr=175661 r=170886 w=0 time=43295732 us)
  18068   TABLE ACCESS FULL L (cr=1206 r=69 w=0 time=161269 us)
   5172   HASH JOIN SEMI (cr=174455 r=170817 w=0 time=43028269 us)
  26533    TABLE ACCESS FULL A (cr=1738 r=712 w=0 time=487790 us)
  54126    TABLE ACCESS FULL CTD (cr=172717 r=170105 w=0 time=42231529 us)

4689 rows selected.

Elapsed: 00:00:47.69

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=13538 Card=14150 Bytes=693350)
   1    0   HASH JOIN (Cost=13538 Card=14150 Bytes=693350)
   2    1     TABLE ACCESS (FULL) OF 'L' (Cost=99 Card=18068 Bytes=234884)
   3    1     HASH JOIN (SEMI) (Cost=13435 Card=15717 Bytes=565812)
   4    3       TABLE ACCESS (FULL) OF 'A' (Cost=155 Card=26533 Bytes=530660)
   5    3       TABLE ACCESS (FULL) OF 'CTD' (Cost=13272 Card=58910 Bytes=942560)

It got the individual table cards right, but the output of the HASH JOIN SEMI is way off?

Here is the RBO output

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      314     25.84      34.42      23074     475040          0        4689
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      316     25.85      34.44      23074     475040          0        4689

Misses in library cache during parse: 1
Optimizer goal: RULE
Parsing user id: 1235

Rows     Row Source Operation
-------  ---------------------------------------------------
   4689  VIEW (cr=475040 r=23074 w=0 time=34394331 us)
   4689   SORT UNIQUE (cr=475040 r=23074 w=0 time=34388758 us)
  30220    NESTED LOOPS (cr=475040 r=23074 w=0 time=33955797 us)
  53996     NESTED LOOPS (cr=361386 r=22957 w=0 time=30979246 us)
  54126      TABLE ACCESS BY INDEX ROWID CTD (cr=245931 r=22851 w=0 time=27198233 us)
 910832       INDEX RANGE SCAN CTD_NK (cr=9295 r=9295 w=0 time=5161471 us)(object id 532812)
  53996      TABLE ACCESS BY INDEX ROWID A (cr=115455 r=106 w=0 time=3181080 us)
  54126       INDEX UNIQUE SCAN A_PK (cr=54128 r=0 w=0 time=1122936 us)(object id 202429)
  30220     TABLE ACCESS BY INDEX ROWID L (cr=113654 r=117 w=0 time=2461588 us)
  53996      INDEX UNIQUE SCAN L_PK (cr=53998 r=1 w=0 time=1092650 us)(object id 202710)



Thanks

Tom Kyte
November 09, 2004 - 11:05 am UTC

where is the query itself?

and what non-default init.ora parameters do you have set.

A reader, November 09, 2004 - 11:17 am UTC

Come on, you can't guess the query from the plan? :-) Here it is

select accnt_no
from a,l
where a.l_pk=l.l_pk
and l.col1 = 'Client'
and a.some_flg=1
and a_pk in (
select a_pk
from ctd
where ctd.the_date<='19-aug-2004' and ctd.the_date>='01-jan-2004' and ctd.the_code in ('1Y','6L','4Z')
) ;

There is an index on ctd.the_date

Optimizer related non-default parameters are

optimizer_index_cost_adj=10
optimizer_index_caching=90

But, changing these does not change the CBO plan, it still FTSs the 3 tables and hashes them

Tom Kyte
November 09, 2004 - 11:23 am UTC

no other non-default parameters? db_file_multiblock_read_count, hash area size, pga_aggregate_target, workarea policy, etc etc etc.

A reader, November 09, 2004 - 11:28 am UTC

select name,value from v$parameter where isdefault='FALSE';

[I edited out some archiver related parameters]

processes 400
timed_statistics TRUE
resource_limit TRUE
shared_pool_size 67108864
db_block_checksum TRUE
db_block_size 8192
db_cache_size 218103808
compatible 9.2.0
db_files 1021
undo_management AUTO
undo_tablespace RBS
undo_retention 86400
max_enabled_roles 148
session_cached_cursors 8
parallel_min_servers 4
parallel_max_servers 16
optimizer_features_enable 9.2.0
audit_trail DB
open_cursors 500
parallel_adaptive_multi_user TRUE
parallel_automatic_tuning TRUE
optimizer_index_cost_adj 10
optimizer_index_caching 90
pga_aggregate_target 104857600
workarea_size_policy AUTO


Tom Kyte
November 09, 2004 - 12:12 pm UTC

at the end of the day -- everything looks OK -- it is doing what it is designed to do. It did pick the "right plan" given all of the inputs it had.



select accnt_no
from a,l,(select a_pk
from ctd
where ctd.the_date<='19-aug-2004' and ctd.the_date>='01-jan-2004' and
ctd.the_code in ('1Y','6L','4Z')
) c
where a.l_pk=l.l_pk
and l.col1 = 'Client'
and a.some_flg=1
and a.a_pk = c.a_pk;

what does that do -- use distinct if need be on the inline view.

A reader, November 09, 2004 - 1:51 pm UTC

No difference

Tom Kyte
November 09, 2004 - 2:24 pm UTC

first_rows hint it then.

A reader, November 09, 2004 - 3:15 pm UTC

Yes, that did the trick, but I have frequently found that FIRST_ROWS hinting produces the same plan as the RULE hint!

Why did first_rows work? What information did that give the CBO that it didn't already have?

So, in general, are you recommending that one should try FIRST_ROWS whenever the CBO generates a sub-optimal plan?

4689 rows selected.

Elapsed: 00:00:28.52

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=37036 Card=53036 Bytes=2598764)
   1    0   NESTED LOOPS (Cost=37036 Card=53036 Bytes=2598764)
   2    1     NESTED LOOPS (Cost=31101 Card=58910 Bytes=2120760)
   3    2       SORT (UNIQUE)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'CTD' (Cost=24859 Card=58910 Bytes=942560)
   5    4           INDEX (RANGE SCAN) OF 'CTD_NK' (NON-UNIQUE) (Cost=3926 Card=901746)
   6    2       TABLE ACCESS (BY INDEX ROWID) OF 'A' (Cost=2 Card=1 Bytes=20)
   7    6         INDEX (UNIQUE SCAN) OF 'A_PK' (UNIQUE) (Cost=1 Card=1)
   8    1     TABLE ACCESS (BY INDEX ROWID) OF 'L' (Cost=2 Card=1 Bytes=13)
   9    8       INDEX (UNIQUE SCAN) OF 'L_PK' (UNIQUE) (Cost=1 Card=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     268706  consistent gets
      22167  physical reads
          0  redo size
      95541  bytes sent via SQL*Net to client
       4083  bytes received via SQL*Net from client
        314  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       4689  rows processed


Tom Kyte
November 09, 2004 - 3:40 pm UTC

it can produce any plan open to the CBO -- many options available to the CBO that the RBO isn't clued into at all.


You see, in this particular case, the original CBO plan was "right", it did everything by the book -- even if the semi join numbers were corrected, your apparent issue was not at that level (it was with ctd, and it had already "happened" by then).

No, I'm not saying that in general, we are at the end of a winding road here -- it was not the first choice by any means at all.

To Reader on CBO vs RBO

msc, November 09, 2004 - 3:28 pm UTC

Hello Reader!

> How can I make the CBO query go faster?
> I agree that the CBO query is the
> superior one since it does way less LIO,
> but why isnt it the "winner"? Thanks

This is only a guess but...

Oracle spends too much time on physical IO (look at the trace file). You have to reduce IO operations on full reads.

Your db_file_multiblock_read_count has the default value (8).
Set it to something *large*:

alter session set db_file_multiblock_read_count = 128;

and then try again.



Tom Kyte
November 09, 2004 - 3:42 pm UTC

umm, not sure I would do that -- if you look at the difference between cpu and ela, the one doing more IO wasn't as penalized!

Even if you made the PIO "free" here -- it wouldn't have "fixed" it.

and upping the dbmbrc means full scan costs go down -- meaning a full scan will become even more attractive in more cases.
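That costing effect is easy to observe for yourself (a sketch; big_table_2 stands in for any analyzed table, and the cost figures depend on version and statistics):

```sql
alter session set db_file_multiblock_read_count = 8;
explain plan set statement_id = 'mbrc8' for
  select /*+ full(t) */ count(*) from big_table_2 t;

alter session set db_file_multiblock_read_count = 128;
explain plan set statement_id = 'mbrc128' for
  select /*+ full(t) */ count(*) from big_table_2 t;

-- the full scan cost in the second plan should be noticeably lower,
-- which is why the CBO then favors full scans in more cases
select statement_id, cost
  from plan_table
 where id = 0;
```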

difference between cpu and ela...

msc, November 09, 2004 - 5:50 pm UTC

Tom,

you are right (again ;-)) -- cpu and ela are practically the same, so PIO wait times should not be the problem here (and the RBO has longer wait times on PIO anyway).

        Fetch cpu    elapsed    diff (PIO wait)
CBO:        36.24      43.34               7.10
RBO:        25.84      34.42               8.58

Can the parameter db_block_checksum=TRUE cause more CPU in the CBO case? More PIO results in more CPU, but more LIO does not.

        Fetch cpu      disk     query
CBO:        36.24    170886    175661
RBO:        25.84     23074    475040



> and upping the dbmbrc means full scan costs go down --
> meaning a full scan will
> become even more attractive in more cases.

Yes -- I had no intention of avoiding the full scan. By the book, the full scan is "the right thing". Raising dbmbrc usually reduces the full scan time spent on PIO. I was looking at this Row Source Operation line:

54126 TABLE ACCESS FULL CTD (cr=172717 r=170105 w=0 time=42231529 us)

where most of the time is spent.





Tom Kyte
November 09, 2004 - 7:42 pm UTC

I should have asked for the version -- it could be as simple as "sql_trace=true" doing it; in various 9iR2 releases, sql_trace had a huge overhead in some cases.

I *suppose* it could have been db block checksum (be interesting to test that theory), since it is computed upon read/write only (not LIO) and we did read more.

A reader, November 09, 2004 - 7:52 pm UTC

My version is 9.2.0.4

Hmmm, wonder how to test the effects of db_block_checksum on PIO? But regardless, if it is indeed that, there is not really much that can be done, right? Having db_block_checksum turned on is a "good thing", right?

Also, why does the CBO plan incur 90% of its LIO as PIO? Does this have anything to do with the way my data is laid out? Why doesn't the RBO/first_rows plan see the same behaviour? Or am I over-analysing this? ;)

Tom Kyte
November 09, 2004 - 8:52 pm UTC

yes, having it on is a "good thing (tm)" in general.


the full scan blocks age out faster than single block IO's do, they are not there to be LIO'ed.

there probably isn't anything such as "over analysing" -- as long as we don't get paralyzed -- looking at something and trying to understand it, is good.

use single table index clusters to get colocated tables?

A reader, November 19, 2004 - 7:23 am UTC

hi tom,

perfect as always -- you showed us the impact of colocated and disorganized tables on LIO's.

I would like to know whether you would suggest going for single table index clusters to get colocated tables, or whether you would prefer to simply rebuild heap organized tables from time to time?

regards,
max


Tom Kyte
November 19, 2004 - 11:22 am UTC

remember -- a table can be sorted exactly ONE WAY (unless there is some inferred relationship, e.g.: document_id = sequence.nextval, document_number = an increasing sequence as well -- the table could be sorted by both document id and document number by accident)


But you have to answer the question "gee, does the data need to be physically colocated".

If the answer is yes, you have the choice of

o clustering (hash or btree)
o IOT's
o waiting for the structures to break down and rebuild them

If physical colocation were my goal, I would use a structure (IOT's are really nice for this) to do my work for me.
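To illustrate that last option (a sketch; the table and column names are made up): an IOT stores the rows in the index structure itself, so rows sharing a key prefix are kept physically together by design:

```sql
-- heap: rows for one order_id end up wherever free space was found
-- IOT:  rows are stored in primary key order, so one order's line
--       items are colocated in a few leaf blocks
create table order_lines
( order_id  number,
  line_no   number,
  item      varchar2(40),
  qty       number,
  constraint order_lines_pk primary key (order_id, line_no)
)
organization index;
```

A range scan by order_id then touches a handful of blocks instead of potentially one block per row -- the structure maintains the colocation, with no periodic rebuilds.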

Best source of information about LIO/PIO/arraysize I have found anywhere.

Impressed, January 15, 2005 - 12:30 am UTC

What a great thread.

Latches

Ik, February 16, 2005 - 3:19 am UTC

Tom,

You have said "to get a block from the cache, we have to latch (to prevent someone from
modifying the data structures we are currently reading)."

Does this mean that readers block other readers? Meaning that if I issue a query that reads a row from block A, and simultaneously another query also tries to access block A -- would this be serialized? In the sense that only after the latch held by query 1 is released can query 2 read the block? Is it so?

I do not know if I am making much sense here... but I sure am confused.

Thanks Tom in advance....

Tom Kyte
February 16, 2005 - 8:06 am UTC

how it is handled depends on the implementation of latching on that OS. And remember, there are many cbc latches -- the buffer cache is split up into many pieces by database block address (DBA); we hash that to a list and put the block on that list, and there are many lists.
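You can see how many of those protecting latches exist on your own system (a sketch; the counts vary widely by version and buffer cache size):

```sql
-- each "cache buffers chains" child latch protects a set of
-- buffer cache hash chains; readers take one briefly per LIO
select count(*)    child_latches,
       sum(gets)   gets,
       sum(misses) misses
  from v$latch_children
 where name = 'cache buffers chains';
```

The misses/gets ratio gives a rough feel for how often readers actually collide on the same chain.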



Why no timing results

Mike Ault, March 30, 2005 - 8:23 am UTC

Tom,

In looking through your results for your proofs, I rarely see timing results. I see lots of LIO/PIO and various other internal statistics, but rarely do I see the one statistic that in the end game the users will care about, namely: How fast does it come back to me?

Getting all of these fancy stats without getting back timing data is, quite frankly, like getting back wait statistics without timing data: worthless.

Mike

Tom Kyte
March 30, 2005 - 9:39 am UTC

tkprof? you don't see tons of tkprofs on this site? huh. (you were kidding about that "rarely", I hope -- it is a trademark of sorts: here is the tkprof, let's look at the cpu time, consider how doing this does that....). There is that thread on predictive stuff you participated in; as I recall there was a timing or two in there.

and no, most proofs, sets of evidence, facts here aren't about performance and timing, many (i would say most) of my proofs are evidence that what I say is true under a set of circumstances. "I am saying that in order to reduce LIO's, you want to measure this, look at this, try this".

But, to say "I rarely see", well, I don't know what site you have been reading or what books you have been looking at personally?

What I rarely see.... what I rarely see on some other sites are the supporting bits of evidence to go with statements, and those unqualified statements are many times wrong (we just had a long thread about that recently -- never could get educated on how to see something that has been seen so often by someone else).

It seems that showing how things work has been deemed irresponsible and evil by your company. These "false proofs" as your company calls them are "very dangerous, especially to beginners". I suppose filling them up with years of hunches, with no way to tell if they apply or not is better?? Lets give them a box of silver bullets....

In the "false proof" vs "I have a hunch", I'll come down on the side of "show me" every time...

You can twist the word "proof" any which way you and your company want, but it is also known as "evidence". For example, if someone says "if you do this, you can expect to see this" -- I want the evidence as to why that cause and effect are linked together; we should be able to see it, we should be able to explain it. If you just say "well, every time I've done X, Y seems to be true" that is nice, but doesn't explain anything and frankly I'm curious.

Maybe what we have (typically, I might dare say) is a case of false causality (examples below) where the thing X we are doing produces Y as an accidental side effect (sounds good at first), but if we actually took a second to look at it -- we could discover (many times we do) that if we did Z instead of X, we not only achieve Y (and maybe even Y++), but we get Y or Y++ at a fraction of the cost as compared to when we did X.

And at least the numbers you see here are "real", they are not made up in an attempt to make the work look better or more real. Doing this stuff takes time, if I expect people to believe me, I better show them why what I say is true.
</code> http://www.jlcomp.demon.co.uk/review_01.html

"Me: use bind variables.  Developer: Why.  Me: Because I said so and I've been doing this for years.  Developer: Go away."

or

"Me: use bind variables.  Developer: Why.  

Me: Because here is the simulation that shows the amount of work performed by the database.  Here is a larger 5 user simulation showing the growing impact of this (whoa, what is that, he does multi-user stuff too, on a multi-cpu server, who'd of thunk it).  As you can see the increased latching (and let me define a latch for you...) in a MULTI-USER environment has this serious side effect, one that is provably not solvable with more hardware (the only thing you can prove is that you always need one more cpu when you have a massive hard parse problem).  Now lets ramp this up even more.

Developer: I had no idea, ok"

You might have the clout to just say "Because I said so", but you know some people still want to be shown.  And I have no problem showing how the software works.  Further, I have no problem showing how the software *doesn't* work (those negative proofs you don't like)

We were discussing LIO and PIO's here frankly.  Perhaps one of the other pages on this site has a timing or two. (and you know what is really cool -- you can take my work here and rip it to shreds.  You can actually see my assumptions, you can see exactly what I did, you can actually reproduce it -- or not, and then ask "why not"  -- and we'll discuss it!  We'll refine it.  I'll not just tell you "this is so, I've seen it, trust me".  I'll very much try to backup my hypothesis with something concrete.  No silver bullets, whatever they are. Wish we could comment on all published articles this way on the web -- don't you?)

Here I was trying to talk about why LIO's are not "free", many people believe that myth -- that LIO is good, free, or at least very cheap.  

do you disagree with "the less LIO's you do, the less work you do"?  Do LIO's require latches?  Are latches a serialization device?  Do you try to keep logical reads low yourself?  Or don't you care about how high they go?

Did you read the original question?  "why is Logical I/O harmful"?  That was the crux of this one particular page.

Here is one with time:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154
already linked to on this page.  Very much the same.  


guess not every 'proof' in the world is about performance; most of my examples are just that -- "hey, here is a way to understand how the software actually works; you don't have to TRUST me to be right -- you can see for yourself." And remember, things change.... so keep on running them. And if someone says "X is true, I know it to be so. Why? BECAUSE I SEE IT ALL OF THE TIME" -- but cannot explain how they "see it all of the time", maybe they are seeing something else and not realizing it.

They show you how to figure out what direction to take, what might be the best way to try first.

Rather than just "guess".

How about this:

I have a hunch, call it intuition, that being in my car makes my cell phone ring.  Really, I'm not making this up.  You see, almost every day when driving, it rings.  When I'm not driving, it doesn't ring nearly as much.  No fooling.  I see it all of the time, very repeatable.  I don't need to understand why it is true, it just is.  So, in order to get more calls, I'll go sit in my car.

Actually, I basically drive to and from work and that is about it. My wife frequently calls me on the way home to ask me to stop somewhere to do something, like get milk. So maybe it has less to do with me being in my car than the fact that I tend to leave at the same time each day to go home and she doesn't like to bother me at work. Could be that -- or it could be that my car actually induces the phone to ring.


without studying things, we'll never understand why they are so.  I think Jonathan Lewis wrote this up very nicely:
http://www.jlcomp.demon.co.uk/ind_misc.html <code>
in the "The Rumsfeld Box observations and understanding".


Otherwise we get into the world of:

I've seen exp/imp of a table go faster (observation). compress=y is the default (observation). Therefore, a single extent is best (hypothesis). If you stop there, well, what do you have? A myth that was always a myth.

I've seen that by putting indexes on one disk and tables on another, things go faster (observation). Therefore, tables and indexes must be separated (hypothesis). If you stop there, well, what do you have? A myth that was always a myth. (In actuality, if you had in the olden days used small extents and a single tablespace with BOTH devices in it -- two files -- and let Oracle stripe the table and index over BOTH devices in a round-robin fashion as it was designed to do, you could have achieved the same or probably even better. It wasn't that indexes and tables need to be separate, but rather that IO should be distributed.)

but of course that second "fact" about indexes/tables always conflicted with the first "fact" about single extents.... so it was a very circular sort of thing.

And the list goes on and on and on and on....
o most selective columns first in an index (utter mythology, meaningless actually)

o views are evil things that slow performance (myth)

o issue frequent commits (myth) AND yes, you can actually with 100% accuracy prove that issuing frequent commits is slower and wastes resources -- you can actually prove that. Believe it, or not.

o primary keys must have a unique index (myth)

o count(1) is superior to count(*) (myth)

o remove large table full table scans (myth)

o hit ratios should be as near 100% as possible (myth)

o .................


And a funny thing about proving things, Mike. I sort of recall a discussion you and I had about "undropping a column".

I had a hunch, call it intuition, that updating the data dictionary -- especially as you had suggested it (being someone that many newbies 'trust') -- was a "bad thing". This hunch was based on many years of seeing bad things happen as a result of this sort of technique. In fact, it was how asktom and this whole site got started (but that is another story).

At that point, you basically said "all hypothetical, doesn't mean it is bad, prove it to me". You asked me "can you show me an actual test case on this?"

You got me there, trapped by my own words. So I did, using the tools of the trade -- sql_trace to see what happens when Oracle sets a column unused, seeing what you did to undo it, HYPOTHESIZING what might go wrong since you fixed one of about a dozen update cascades in the dictionary. Found three easy ones (wrong answers from queries, sporadic ora-xxxx errors, and access to data I should not have access to).

So, it shows the positive power of a test case. Many times these test cases are in fact used to show that something is not universally true (it is in fact very easy to prove something is not true with them and that is in fact what most of them do).



Now, I know this is getting long winded, but as someone once said to me "I'll leave that to folks who use pages to show what could be easily stated in paragraphs" -- I'll continue to use pages thanks very much.



I call those people RAINDANCERs

Andy Schwarz, March 31, 2005 - 3:33 am UTC

Hi Tom,
that was the longest definition about raindancers and why to avoid them I've ever seen. Brilliant!

For me, a raindancer is a person who claims he can make it rain through dancing.
He can always prove it in most places of the world. It's just a matter of how long you pay him. And of course, if it takes longer (weeks, months) you pay more, and at the end, when it's raining, everybody is sure he did the right thing.

The IT world is full of highly paid raindancers and carpet salespersons (other story), but it's the only job where you can always KNOW. But raindancers can PowerPoint better sometimes, and it is so much easier to understand.

Thanks for all your efforts and proofs.
Andy


optimizer_* parameters

bob, March 31, 2005 - 8:26 am UTC

Tom,

Don Burleson says

<quote>
I think Tom Kyte might have forgotten how changing parameters can reduce LIO. Changing the values of optimizer_index_cost_adj and optimizer_index_caching will change the CBOs decision about whether to do a full-scan or an index access execution plan.

Thus, the optimizer_* setting has a direct impact on the amount of consistent gets for any query.

This statement is false, right?
</quote>

What about that?

Tom Kyte
March 31, 2005 - 9:16 am UTC

You must be referring to the thread on his discussion forums where he is supposedly "debunking" this "evil myth that test cases are evil, especially for beginners".  Actually, he says "is debunked" as in past tense.  It is funny you know, I always thought that debunking included some level of 'showing us'.  I've yet to see a single fact that shows us that these test cases are 'bad'.  I suppose that if he means "to debunk X is to say X is bad over and over and over", he has debunked it.  I think of debunking as showing - but there he and I apparently disagree.  

Given that most of the test cases I provide show how the software works, how to measure what you are doing -- I'm not sure what his hang-up with "performance proofs" is; most proofs show how things work, give you a clue as to the path to take.  

How many proofs have you seen me do that say "do this always"?  I'm more or less known for giving the good and the bad with everything (look at how the chapters in Expert One on One are laid out -- for example, the chapter on MVs:

Materialized Views
  A Brief History
  An Example
  Why use materialized views <<<==== hey, lets understand when they are useful
  How they work              <<<==== hey, lets understand how they work
  Caveats                    <<<==== hey, lets understand WHEN THEY don't and 
                                     why

basically hundreds of pages following the same format -- "what, why, when, when not").  So when he said:

<quote>
Some authors are misleading their trusting followers with the mantra of Prove it, and they never note that your mileage may vary, especially for performance-related proofs.
</quote>

I certainly hope he wasn't referring to me (he's read my material, I certainly hope he did anyway -- or else his review on amazon.com is sort of misleading).

I don't see how showing how the software works (and more importantly doesn't work) is bad or misleading myself.


But back to the question at hand.  I said above the optimizer_* parameters don't have a direct cause and effect on LIO.  Why?  Because they do not.  They (as carefully explained in Effective Oracle by Design, chapter 6) have a direct cause and effect on the costing of plans.  

Maybe Don is forgetting that one of three things will happen upon changing these parameters:

a) nothing at all, not a thing changes
b) LIO goes up for a given query.
c) LIO goes down for a given query.

A direct impact would imply that we could set up a formula that says "if you change parameter A by n%, then you will observe an m% change in LIO". But if any (or some or all) of a, b, and c can happen in my system -- can you say such a thing?

I know he isn't finding test cases useful, but perhaps an illustrative example would be useful:


ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2> create index t_idx on t(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> set autotrace traceonly
ops$tkyte@ORA9IR2> alter session set optimizer_index_caching = 0;
 
Session altered.
 
ops$tkyte@ORA9IR2> alter session set optimizer_index_cost_adj = 100;
 
Session altered.
 
ops$tkyte@ORA9IR2> select * from t where object_id < &1;
old   1: select * from t where object_id < &1
new   1: select * from t where object_id < 900
 
837 rows selected.
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=62 Card=661 Bytes=61473)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=62 Card=661 Bytes=61473)
 
 
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        443  consistent gets
          0  physical reads
          0  redo size
      44109  bytes sent via SQL*Net to client
       1104  bytes received via SQL*Net from client
         57  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        837  rows processed
 
ops$tkyte@ORA9IR2> select * from t where object_id < &1;
old   1: select * from t where object_id < &1
new   1: select * from t where object_id < 900
 
837 rows selected.
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=62 Card=661 Bytes=61473)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=62 Card=661 Bytes=61473)
 
 
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        443  consistent gets
          0  physical reads
          0  redo size
      44109  bytes sent via SQL*Net to client
       1104  bytes received via SQL*Net from client
         57  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        837  rows processed
 
ops$tkyte@ORA9IR2> alter session set optimizer_index_caching = 90;
 
Session altered.
 
ops$tkyte@ORA9IR2> alter session set optimizer_index_cost_adj = 10;
 
Session altered.
 
ops$tkyte@ORA9IR2> select * from t where object_id < &1;
old   1: select * from t where object_id < &1
new   1: select * from t where object_id < 900
 
837 rows selected.
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=59 Card=661 Bytes=61473)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=59 Card=661 Bytes=61473)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=3 Card=661)
 
 
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        712  consistent gets
          0  physical reads
          0  redo size
      35571  bytes sent via SQL*Net to client
       1104  bytes received via SQL*Net from client
         57  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        837  rows processed
 
ops$tkyte@ORA9IR2> select * from t where object_id < &1;
old   1: select * from t where object_id < &1
new   1: select * from t where object_id < 900
 
837 rows selected.
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=59 Card=661 Bytes=61473)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=59 Card=661 Bytes=61473)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=3 Card=661)
 
 
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        712  consistent gets
          0  physical reads
          0  redo size
      35571  bytes sent via SQL*Net to client
       1104  bytes received via SQL*Net from client
         57  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        837  rows processed
 
ops$tkyte@ORA9IR2> set autotrace off


<b>so, there is an example whereby the LIO goes up.  On my system with my data, running @test 1000 shows *NO CHANGE IN LIO* because the plans did not change, and on my system @test 500 shows a *DECREASE IN LIO*.

So, did changing the optimizer_* parameters have a direct impact on the amount of consistent gets?  It had a direct IMPACT on the plans chosen, but for a given plan (assuming it did not change) it would have ZERO impact.

That is the point here -- simply changing that parameter does not mean LIOs will go down (that would be magic). Changing that parameter means you've set off a rather complex chain of events that fundamentally changes the costing model of the optimizer and may lead to a different plan, whereupon one of three things happens:

a) nothing at all, not a thing changes
b) LIO goes up for a given query.
c) LIO goes down for a given query.</b>
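The distinction can be sketched as a toy model (the LIO counts and the full-scan cost come from the transcript above; the index plan's unadjusted cost and the costing rule are my own simplification, NOT Oracle's actual formula): each candidate plan carries a fixed LIO count, and the parameter only changes which plan looks cheapest -- it never changes a given plan's LIO.

```python
# Toy model -- hypothetical numbers/costing rule, not Oracle's real math.
full_scan  = {"cost": 62,  "lio": 443}
index_scan = {"base_cost": 590, "lio": 712}   # hypothetical unadjusted cost

def chosen_lio(optimizer_index_cost_adj):
    # the adjustment scales the index plan's cost down (or up)
    index_cost = index_scan["base_cost"] * optimizer_index_cost_adj / 100
    plan = index_scan if index_cost < full_scan["cost"] else full_scan
    return plan["lio"]

print(chosen_lio(100))  # 443 -- full scan wins, as in the first run above
print(chosen_lio(50))   # 443 -- parameter changed, plan didn't: no LIO change
print(chosen_lio(10))   # 712 -- index plan now looks cheaper, and LIO went UP
```

Same query, same data -- depending on where you land, you get outcome a, b, or c.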



When asked repeatedly for an example -- a single example, just one -- whereby these test cases, which show how the software actually works, mislead anyone, Don had this very interesting comment to make:

<quote>
<from a reader>So, which of the hundreds of test cases that Tom has -- either in one of his two books or on his site -- don't hold up in the 'real world'? I'm curious to know which ones you mean specifically, for I've seen lots of his examples and they seem to hold water with me. Can you 'scientifically' tell, show or otherwise relate which ones don't hold up?
</from a reader>

<from Don> That's the whole point. <b>I don't really care enough to find out. </b>Really, a valid re-test can take weeks to set-up and run. All that matters (to me) is that something worked (or didn't) under a particular test. I refine my rule-of-thumb accordingly, and move on.
</from Don>
</quote>

wow, I actually could not believe what I was reading.  His argument is "I don't know why, I don't care why, it is bad -- trust me, I've been doing this for a long time and I feel it is bad".  

Huh.  Still scratching my head on that one.

Finally an example did come up, starting with the "story so cool, I cannot tell you" one -- Don says he'd have to ask his lawyers if it would be OK to talk about a DBA getting fired:

<quote>
Yeah, I think we do! I know of one case where a DBA lost his job after submitting a performace "proof" from an Oracle book.

Let me talk to my lawyer and see if I can give you examples without getting sued!
</quote>

Wow, sounds pretty darn exciting.  Makes a nice headline.  Makes you scared to learn, doesn't it?  So finally you get a follow-up from Don, a tad more detail:

<quote>
In this the DBA read a "proof" that raw was faster than cooked I/O, and talked management into spending $20k on Veritas to speed-up his I/O, based on the "proof", from an author he trusted.

The I/O went faster, but the overall performance did not change (because he was CPU-bound).
</quote>

well, now excuse me for being really, totally confused by this.  So, if instead of some anonymous author saying "Raw is faster, here is my setup, here are my conditions, here is my test" -- it would be just dandy for someone like Don, with so much experience, to just say "Raw is faster, trust me"?

I don't see the difference here -- if the DBA had read a "just trust me" book from an author he trusted and that author said "Raw is faster", how is that any different????? How does this even remotely show 'test cases bad, misleading, very dangerous'???

In fact, as someone else pointed out, it seems to make the CASE for TEST CASES.  What that DBA should have done (I think you've heard me say this before) was evaluate the information, test it out, see if it holds water, see if it applies.

The DBA tried to apply a SILVER BULLET (I know of someone that's written that silver bullets exist so it must be true), the SILVER BULLET was "Raw is faster" (not true -- I can show raw faster, slower and just the same as cooked -- you need to understand some things about your system).  The problem was, the DBA never tried to understand what resource they needed to provide more of (either by reducing their usage of said resource or getting more of said resource).

His "example" helps prove that test cases, trying things out, benchmarking, testing are very important (for that, I thank him profusely, it is always good when someone provides evidence that the best practice of benchmarking is a good thing).


Sorry, but I'll stay in the camp of "I'm from Missouri, so show me".  Guesses, hunches, intuition - great if you are really lucky.  So, do you feel lucky?  


You know what I find misleading?  Making stuff up.  Providing things like silver bullets based on phony test cases.  Trying to make your stuff look real when it is provably made up.  That I find a tad misleading.  Here is an example of why showing something, offering EVIDENCE, should I say showing some proof is important:

<quote src= 
http://www.dbazine.com/burleson20.shtml
>
Implement cursor_sharing=force

I had a 9.0.2.4 database in Toronto Canada where the end-users were complaining about poor performance right after a new manufacturing plant was added to the existing database.  A quick look at the STATSPACK top-5 timed events looked like this:

Top 5 Wait Events
~~~~~~~~~~~~~~~~~                                             Wait     % Total
Event                                               Waits  Time (cs)   Wt Time
-------------------------------------------- ------------ ------------ -------
enqueue                                            25,901      479,654   46.71
db file scattered read                         10,579,442      197,205   29.20
db file sequential read                           724,325      196,583    9.14
latch free                                      1,150,979       51,084    4.97
log file parallel write                           148,932       39,822    3.88

My first look was into the SQL section of the STATSPACK report where I noted that almost all of the SQL was using literals in the WHERE clause of all queries.

WHERE customer_state = Alabama and customer_type = REDNECK;

This was a vendor package with dynamically-generated SQL, so cursor_sharing was the only fast solution.  Setting cursor_sharing=force greatly reduced the contention on the library cache and reduce CPU consumption. The end-users reported a 75% improvement in overall performance.
</quote>



One might question a couple of things here:

a) if  197,205  is 29.20% of the wait time, how could 196,583 be only 9.14%?

b) if this came from a 9i database, why does it say "Top 5 Wait Events"

c) enqueue waits won't be affected by cursor sharing (they are basically the side effect of me updating a row and you trying to update the same row -- fixing a bind variable issue will have no effect on that)

d) if you have a bind variable problem -- would the top SQL section of the statspack report (which reports by unique sql) really show you this?  I mean, if we have literal SQL -- executed once, not over and over....

e) how reducing contention on the library cache and reducing CPU consumption would fix an enqueue wait problem

f) how removing that tiny bit of latch wait (relative to the enqueue and IO waits) would result in a 75% improvement
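Point (a) is simple arithmetic. A quick sketch, using the figures exactly as quoted in the excerpt:

```python
# Wait times (centiseconds) exactly as quoted in the STATSPACK excerpt
waits = {
    "enqueue":                 479_654,
    "db file scattered read":  197_205,
    "db file sequential read": 196_583,
    "latch free":               51_084,
    "log file parallel write":  39_822,
}

# If "db file scattered read" really is 29.20% of the total wait time,
# the implied total is:
total = waits["db file scattered read"] / 0.2920        # ~675,360 cs

# Then "db file sequential read" -- almost the same wait time -- must be
# almost the same percentage; it cannot possibly be 9.14%:
pct_sequential = 100 * waits["db file sequential read"] / total
print(round(pct_sequential, 2))   # 29.11
```

Whatever the total was, two nearly identical wait times cannot be 29.20% and 9.14% of it at the same time.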



So, if I trusted this author, I am now led to believe that

a) when you have an enqueue wait problem, the solution is to set cursor sharing (with possible horrendous side effects by the way that might get you fired)


I myself would spend (have spent) a couple of pages (instead of paragraphs) explaining the "who, what, where, when and why".  The side effects of setting cursor_sharing=force/similar.  What cursor sharing = force/similar do under the covers.  What problems they can and cannot solve.  How to set them, where to set them, why to set them.

But to present as a magical silver bullet?  


Don seems to be putting words into mouths.  He keeps talking about performance proofs.  I have simulations that show how things work.  I wish he would show a performance proof, so we could see what one looks like.  Maybe he is crying fire in an empty theatre here.

Could be this is about the predictive thread 
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:35336203098853

all we did there was show that rebuilding can

a) do nothing
b) do something good
c) do something bad

and wouldn't it be nice to understand what your goals are and implement a solution for them? Not to just find indexes with whitespace and rebuild them (because that can make your 'system' go slower, faster, or not change at all).

I'd still love -- just love -- to know how to see these unbalanced freelists that happen all of the time.


Never say Never, Never Say Always -- I always say.

(oh wait, that is a conundrum isn't it)
 

I can't believe that someone would purposely instruct people to NOT TEST things

Ian, March 31, 2005 - 9:47 am UTC

I just really can't believe it. It actually hurts my head to consider.

If you don't adequately test to prove/disprove a hypothesis, how can you be sure that it wasn't affected by variables that weren't directly controlled?

The whole debate over science/art is irrelevant to this. This is simple logic surely?

This kind of information coming to light really is scary for me as someone striving to learn all I can about database systems in general and Oracle in particular. The fact that 'experts' actually fabricate evidence to prove an instinct... If I didn't know about resources such as this site and Jonathan Lewis' site - how would I know this stuff is false..?

TESTING is how!!!!!!!

</rant>

Tom Kyte
March 31, 2005 - 9:51 am UTC

<quote>
how would I know this stuff is false..?
</quote>

well, by testing it :)

Amazing

A reader, March 31, 2005 - 9:54 am UTC

No Tom, it's because of you.

Why my physical IO not getting reduced ?

Parag J Patankar, March 31, 2005 - 9:58 am UTC

Hi Tom,

I have done

create table t as select * from all_objects;

select * from t where object_name like 'T%'
20:19:48 SQL> /

783 rows selected.


Execution Plan
----------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'T'




Statistics
----------------------------------------------------
          0  recursive calls
         12  db block gets
        519  consistent gets
        462  physical reads
          0  redo size
      41883  bytes sent via SQL*Net to client
       3791  bytes received via SQL*Net from client
         55  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        783  rows processed

20:19:50 SQL> /

783 rows selected.


Execution Plan
----------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'T'




Statistics
----------------------------------------------------
          0  recursive calls
         12  db block gets
        519  consistent gets
        462  physical reads
          0  redo size
      41883  bytes sent via SQL*Net to client
       3791  bytes received via SQL*Net from client
         55  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        783  rows processed

20:19:53 SQL> 

I am executing the SQL twice, back to back. My understanding regarding Logical IOs and Physical IOs is:

LIO = logical = buffer cache.  LIO *may* have incurred a PIO in order to get 
into the cache in the first place.

PIO = physical = disk read

So when I execute the SQL a 2nd time, my data should be found in the buffer. But when I look at the two traces, my physical I/Os are exactly the same. Kindly tell me why?

regards & thanks
pjp 

Tom Kyte
March 31, 2005 - 10:21 am UTC

Cache sizes, number of concurrent users, whether you full scan or range scan -- many things will affect that.


That is why I prefer to show you how things work, rather than make blanket, widespread generalizations.  

Seems you just have a small buffer cache or other factors are influencing your ability to cache the blocks.  It is easy to see:


ops$tkyte@ORA9IR2> create table t as select * from all_objects union all select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2> show parameter db_cache_size;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 67108864


ops$tkyte@ORA9IR2> set autotrace traceonly
ops$tkyte@ORA9IR2> select * from t;
 
55870 rows selected.
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'T'
 
 
Statistics
----------------------------------------------------------
       4431  consistent gets
        765  physical reads
 
ops$tkyte@ORA9IR2> select * from t;
55870 rows selected.
 
Statistics
----------------------------------------------------------
       4431  consistent gets
 
ops$tkyte@ORA9IR2> select * from t;
55870 rows selected.

Statistics
----------------------------------------------------------
       4431  consistent gets
          0  physical reads
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> alter system set db_cache_size=4m;
System altered.
 
ops$tkyte@ORA9IR2> show parameter db_cache_size;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 4194304

ops$tkyte@ORA9IR2> set autotrace traceonly
ops$tkyte@ORA9IR2> select * from t;
55870 rows selected.
 
Statistics
----------------------------------------------------------
       4431  consistent gets
        765  physical reads
 
ops$tkyte@ORA9IR2> select * from t;
55870 rows selected.

Statistics
----------------------------------------------------------
       4431  consistent gets
        675  physical reads
 
ops$tkyte@ORA9IR2> select * from t;
55870 rows selected.
 
Statistics
----------------------------------------------------------
       4431  consistent gets
        676  physical reads
 
ops$tkyte@ORA9IR2> set autotrace off
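The small-cache effect in the transcript above can be mimicked with a toy LRU cache model. This is a sketch assuming a plain LRU policy -- Oracle's real buffer cache uses touch counts and treats full-scan blocks specially, which is partly why the 4m run above still caches some blocks -- but it shows the mechanism: a sequential scan through a too-small cache keeps evicting the blocks it is about to need again.

```python
from collections import OrderedDict

def scan(cache, cache_blocks, table_blocks):
    """Full-scan table_blocks blocks once; return (logical, physical) reads."""
    lio = pio = 0
    for block in range(table_blocks):
        lio += 1                        # every block visit is a logical I/O
        if block in cache:
            cache.move_to_end(block)    # cache hit: just an LRU touch
        else:
            pio += 1                    # miss: physical read into the cache
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least-recently-used
    return lio, pio

big = OrderedDict()                 # cache larger than the table
print(scan(big, 8192, 4431))        # (4431, 4431) -- first scan, all misses
print(scan(big, 8192, 4431))        # (4431, 0)    -- re-scan, pure LIO

small = OrderedDict()               # cache much smaller than the table
print(scan(small, 512, 4431))       # (4431, 4431)
print(scan(small, 512, 4431))       # (4431, 4431) -- thrashing: every block
                                    # is evicted before it is revisited
```

The LIO count never changes -- the query does the same work either way; only how many of those gets also incur a PIO depends on the cache.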
 

Excellent as usual, Tom!

Bill Schwartz, March 31, 2005 - 10:24 am UTC

As some here may know, I have been following that thread (very minor contributions on occasion, others are certainly providing more of the technical side) for days now. For me, it is like watching a traffic accident: I don't want to look, but I can't look away either. It absolutely astounds me how quickly the contradictions come. And the dancing, that thread sort of feels like the '70s all over again.

For anyone out there who is new to this site (AskTom), stay tuned here - this is one of the best Oracle sites going, Tom has always provided quality answers, and just remember to stay away from those "experts" that keep saying "trust me, I know, I don't have to prove it".

And Tom, please keep providing those same "useless" test cases - for we are learning from them.

Tom Kyte
March 31, 2005 - 10:48 am UTC

Funny thing on his site -- we just have these confusing conundrums:

<quote src=</code> http://www.dba-oracle.com/oracle_tips_load_blksze.htm
>
In sum, I am always wary of anyone who claims that they can prove the way that Oracle behaves with a code snippet.
</quote>

but later screams out in clarification:

<quote src=
http://dba.ipbhost.com/index.php?showtopic=1396&st=90
>
Scripts CAN AND DO PROVE how Oracle works
</quote>

Anyway, you can twist terms and semantics all you want.  Proof is also known as evidence.  Knowledge is good, not bad (opinion obviously, call it a "hunch" or "intuition").  If you believe that X is caused by Y, we should be able to reason that out and in doing so, you might find that Z causes X and Z is much nicer than Y.

To paraphrase a cartoon of his
http://www.dba-oracle.com/oracle_cartoons.htm <code>
(I thought it was funnier when the cartoon had my name by it instead of "scientists'", but so be it. Gotta wonder why 'scientists' are so mean or evil as to deserve cartoons.)

You can either do a massive room cleaning every month and live in clean surroundings for a couple of days, progressively getting messier, until your mom yells at you to clean your room.

or

You can do it once, get it over with, keep it clean and have more time on Saturday to play with your friends.


You choose. I always hated cleaning my room as a kid; as I matured over time, I found that keeping it clean worked pretty well -- overall, tons less work, and I appreciated my room for a longer period.

His claim, for example, that there are two sides:

<quote src=cartoon>
There are two schools of thought about periodic Oracle maintenance. One claims that Oracle tables and indexes rarely become sub-optimal and seldom need re-building. The other-side maintains that regularly-scheduled table and index reorganizations are unavoidable, and that most high-DML objects show increasing rates of I/O as the internal structures become sub-optimal from high insert/update/delete activity.
</quote>


Hmm, wouldn't you rather figure out "hey, you set pctused/pctfree wrong on the table, we can rebuild this once and be done with it", rather than be stuck in the "Groundhog Day" movie?

(Classic "Y implies X" fallacy. Y = scheduled reorg. X = short-term improvement. Z = figuring out pctused/pctfree is wrong; fixing it once gives X++ = long-term improvement without having to repeat Y over and over....)





Testing vs Proof

Niall Litchfield, March 31, 2005 - 10:26 am UTC

Hi Ian,

come along to some UKOUG things - well worth the money in my (possibly biased) opinion.

To be scrupulously fair to Don, I don't think he is really advocating not testing things. I *think* he is suggesting that you can't once and for all prove performance tuning rules with a simple script. He has certainly said on his board that scripts can illustrate how Oracle does and doesn't work. If he sticks to that in future I would welcome it.

His argument with scripts is about performance 'proofs'; the trouble is that I'm not sure that anyone, anywhere has posted performance 'proofs' -- occasionally Tom and others have posted scripts that disprove some idea, but not that prove it. This does go to the heart of the science/art debate, since the scientist (at least in theory) only accepts that theories can be disproven by individual tests, not that theories (with the exception of some mathematical ones) can be proven.

To this extent I'm not really sure who Don is arguing with -- he accepts that you can illustrate the behaviour of Oracle with simple scripts, and argues with a position that I have never seen anyone take.

btw, I prefer the term 'engineering' over 'science' to describe DBA activities -- defined (in my head) as the practical application of scientific knowledge to a particular task.

Tom Kyte
March 31, 2005 - 10:50 am UTC

well said -- I'm looking for this flotilla of performance proofs, they must be currently misplaced.

I have shown things like "an index rebuild can cause a measurable, quantifiable, provable increase in resource usage".

I've never shown that not rebuilding is the way to best performance.

I've tried to say "your mileage may vary, so be wary of precepts and broad generalizations". Most 'performance' proofs show the opposite side of the coin, the caveat. They show measurable increases in resource utilization.

A reader, March 31, 2005 - 10:26 am UTC

SQL> SHO USER
USER is "SYS"
SQL> SET AUTOTRACE TRACEONLY
SQL> SELECT * FROM T WHERE OBJECT_NAME LIKE 'T%'
  2  /

74 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=40 Card=406 Bytes=34
          510)

   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=40 Card=406 Bytes=34510)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
          0  bytes sent via SQL*Net to client
          0  bytes received via SQL*Net from client
          0  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         74  rows processed


Tom, I really don't understand why there are no reads at all. 

Tom Kyte
March 31, 2005 - 10:53 am UTC

Do not do things as SYS -- ever. It is not your account.

in fact, in 9i, you should see:

ops$tkyte@ORA9IR2> connect sys/manager;
ERROR:
ORA-28009: connection to sys should be as sysdba or sysoper
 


SYS is special, SYS is magic, SYS is different.

Please, use your own accounts and don't use SYS. Many things are different for SYS (e.g.: go ahead and 'set transaction read only' as SYS... see what happens). 

To Niall

Ian, March 31, 2005 - 10:55 am UTC

Hi Niall,

Thanks for the invite. I'll definitely see if I can swing it with my boss :)

Just to add, your site is in my Favorites as well alongside a few others...

Cheers,

Ian

Tom Kyte
March 31, 2005 - 11:00 am UTC

the only problem I've had with the UKOUG is....

there is way too much good stuff to choose from simultaneously, impossible to see everything you want to see.

David Aldridge, March 31, 2005 - 5:05 pm UTC

When we're talking about reduction in LIO being beneficial, that's when we achieve it through improving the efficiency of the SQL (doing less work), right? Not when we achieve it by just making the block size larger -- or am I missing something there?

Tom Kyte
March 31, 2005 - 5:30 pm UTC

didn't really consider the alternative of changing the blocksize. In the original example (full scan vs index range scan) it (alternative larger block size) would have "most likely":

a) benefited the full table scan greatly as far as reducing LIO goes (say you doubled the blocksize -- it would have almost halved the consistent gets in this case). Now, what you can draw from that performance-wise is that it would technically scale better with more users, but we'd have to see about other things. It would not in general reduce any of the other work done (physical IO, sorting and such).

b) probably not affect the index range scan, since we were doing single block IO, and unless the clustering factor of the index was really, really good (near the number of blocks in the table) we'd still be spending lots of time skipping from block to block. I would hypothesize that the index range scan itself would consume approximately 1/2 the consistent gets, but that the table hops from the index would not improve as much -- so it would not be so good for this one.


But I was showing the difference in LIO work performed by equivalent queries on the same data, changing nothing physical. There would be many ways to slice and dice this one.
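The shape of the argument in (a) and (b) can be sketched with a toy model. To be clear, this is my illustration, not anything from the test above: the row size, per-block overhead constant, and clustering figures are all hypothetical, and this is not Oracle's actual space accounting.

```python
import math

def full_scan_lios(row_count, avg_row_bytes, block_size, overhead=100):
    """Approximate consistent gets for a full scan: ~1 LIO per table block."""
    rows_per_block = max((block_size - overhead) // avg_row_bytes, 1)
    return math.ceil(row_count / rows_per_block)

def range_scan_lios(rows_fetched, clustering_factor, table_blocks, height=3):
    """Approximate LIOs for an index range scan plus table access by rowid.
    With a poor clustering factor, nearly every fetched row costs a table
    LIO, so the block size barely matters."""
    visits_per_row = min(clustering_factor / table_blocks, 1.0)
    return height + round(rows_fetched * visits_per_row)

# (a) quadrupling the block size (4k -> 16k) cuts full-scan LIOs ~4x:
print(full_scan_lios(2_000_000, 30, 4096))    # ~15038
print(full_scan_lios(2_000_000, 30, 16384))   # ~3691

# (b) a badly clustered range scan of 10,000 rows is dominated by the
# per-row table visits at either block size:
print(range_scan_lios(10_000, clustering_factor=10_000, table_blocks=5_000))
```

The point of the sketch: only the full-scan number is a function of block size; the range-scan number is driven by the clustering factor.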


Why so many LIO's ?

Ashiq Shamsudeen A, March 31, 2005 - 11:29 pm UTC

Can you explain what's the root cause of so many LIOs, even though the query didn't fetch any rows? The statistics are up to date.

********************************************************************************

SELECT UNQ_ATT_ID
FROM
ATTRIBUTE_VAL_HEADERS WHERE OBJECT_ID = :b1 AND DEFINITION_ID = :b2 AND
DATA_SOURCE = :b3


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute   1772      1.19       1.32          0          0          0          0
Fetch     1772      6.17       6.35          0     228471          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     3545      7.36       7.67          0     228471          0          0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 84 (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  TABLE ACCESS BY INDEX ROWID ATTRIBUTE_VAL_HEADERS
      0   INDEX RANGE SCAN I_ATTRIBUTE_VAL_HEADERS_FK4 (object id 33065)

The db version is 9205.

Tom Kyte
April 01, 2005 - 8:18 am UTC

Well, it is hard to set up exactly but it would appear that (there could be other reasons, but this is the most typical)

a) you have an index on some but not all of object_id, definition_id, data_source (say object_id, definition_id)

b) each of those combinations (object_id,definition_id) points to about 125 rows on different blocks.

c) when you traverse the index to the table, you discover that data_source <> :b3.


so for each execution of that query on average, you do 3 LIOs against the index (root/branch/leaf).

You do 125 table access by index rowids.

ops$tkyte@ORA9IR2> select 1772*128 from dual;
 
  1772*128
----------
    226816
 
can we see the table create and the index definition? 
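The arithmetic in that answer can be sanity-checked in a few lines. Only the 1772 executions and 228471 total gets come from the trace above; the 3-LIO index traversal and 125 table visits are the assumed figures from the answer:

```python
# Figures assumed from the answer above: ~3 LIOs to traverse the index
# (root/branch/leaf) plus ~125 table-access-by-rowid visits per execution.
index_lios = 3
table_visits = 125
executions = 1772          # from the tkprof output

per_execution = index_lios + table_visits
total = executions * per_execution
print(per_execution, total)    # 128 226816 -- close to the observed 228471
```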

Why so many ios?

Niall Litchfield, April 01, 2005 - 3:44 am UTC

Well, you executed the query 1772 times with about 128 LIOs each time; that doesn't *seem* like many per execution to me, but I'm having to assume that this is a real production system with significantly sized objects, since you don't tell us anything about the structure of the objects at all.

David Aldridge, April 01, 2005 - 9:35 am UTC

>> ... what you can draw from that [reduced LIO's due to larger block size] performance wise is that it would technically scale better with more users ... <<

I'm a bit fuzzy on this -- would the scaling improvement come from reduced use of a hardware resource (CPU I guess?) or from an internal Oracle process (reduced latch contention?)

Tom Kyte
April 01, 2005 - 11:03 am UTC

reduced latch contention -- but that would be dubious, for I would hope you would not have 1,000's of users full scanning a really large table.

The ultimate goal of reduced LIO is increased scalability, decreased cpu utilization.  

Ok, I got real curious. I've read about tests that said they prove index range scans do fewer LIOs when the index is in a large block tablespace, but I frankly "felt" they would not do significantly fewer:
http://www.dba-oracle.com/oracle_tips_tpc_c_large_blocks.htm
(funny, Mike points to a "proof" without any timing? And I thought he thought "Getting all of these fancy stats, without getting back timing data is quite frankly, like getting back wait statistics without timing data, worthless." Also, in the temp tablespace comment he says "All temp segment access is sequential and 32k blocksizes greatly reduce logical I/O and disk sort times". Hmm, what do LIO and temp have in common? But he says he is working on a benchmark that shows a 1/3 reduction in runtime; I'll look for that 'proof' (aka 'evidence', 'test case'))

But if indexes "like" this for range scans (unfortunately the test he points to is for a FAST FULL SCAN -- sort of treating the index as a skinny table and using multi-block IO; for a range scan, you typically have to introduce a table access by index rowid, since we normally range scan to find rows in the table) we should be able to see it.

So, here we go. I'll leave timing on, expose the IO, and explain that my buffer caches were a measly 32m for both the 4k and 16k tests -- we could perhaps tune PIO by fixing the buffer cache sizing. I sort of hid them before -- I wanted to concentrate on LIO. (But note: I did not hide them to hide anything; you or anyone could have run this and seen them.)

Now, I forecast two probable outcomes above, based on my understanding of how things work. The full table scan would benefit from reduced LIO (allowing it to 'scale' -- but again, if you have 1,000's of users full scanning a big table -- wow). The index range scans -- because of their nature -- would not. By their nature, what I mean is: when you have select * from t where x = :x using an index range scan / table access by index rowid, you would expect the INDEX RANGE SCAN to contribute a very low percentage of IO to the overall outcome, and the table access by index rowid to contribute the most.

Note, Your Mileage May Vary on this and all things.  That is the primary reason for posting them.  When they diverge from what you see here, let's discuss "why".

so, here is the test, we'll discuss the results at the bottom.

ops$tkyte@ORA9IR2> create table t4k tablespace tbs4k
  2  as
  3  select * from big_table.big_table;
Table created.
ops$tkyte@ORA9IR2> create index t4k_idx on t4k(object_id) tablespace tbs4k;
Index created.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T4K', cascade=>true);
PL/SQL procedure successfully completed.
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> create table t16k tablespace tbs16k
  2  as
  3  select * from big_table.big_table;
Table created.
ops$tkyte@ORA9IR2> create index t16k_idx on t16k(object_id) tablespace tbs16k;
Index created.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T16K', cascade=>true);
PL/SQL procedure successfully completed.
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> set timing on
ops$tkyte@ORA9IR2> set autotrace traceonly
ops$tkyte@ORA9IR2> select object_id, object_name from t4k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:28.39
                                                                                                                                                                                                                                      
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=17078 Card=2000000 Bytes=60000000)
   1    0   SORT (ORDER BY) (Cost=17078 Card=2000000 Bytes=60000000)
   2    1     TABLE ACCESS (FULL) OF 'T4K' (Cost=5703 Card=2000000 Bytes=60000000)
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
         18  db block gets
      59297  consistent gets
      66697  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select object_id, object_name from t4k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:25.75
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
         18  db block gets
      59297  consistent gets
      66697  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select /*+ index( t4k t4k_idx ) */ object_id, object_name from t4k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:42.89
                                                                                                                                                                                                                                      
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1971443 Card=2000000 Bytes=60000000)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T4K' (Cost=1971443 Card=2000000 Bytes=60000000)
   2    1     INDEX (FULL SCAN) OF 'T4K_IDX' (NON-UNIQUE) (Cost=8806 Card=2000000)
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
          0  db block gets
    2218155  consistent gets
     861030  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select /*+ index( t4k t4k_idx ) */ object_id, object_name from t4k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:40.77
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
          0  db block gets
    2220256  consistent gets
     827976  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select object_id, object_name from t16k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:25.52
                                                                                                                                                                                                                                      
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=14790 Card=2000000 Bytes=60000000)
   1    0   SORT (ORDER BY) (Cost=14790 Card=2000000 Bytes=60000000)
   2    1     TABLE ACCESS (FULL) OF 'T16K' (Cost=3415 Card=2000000 Bytes=60000000)
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
         18  db block gets
      14261  consistent gets
      23239  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select object_id, object_name from t16k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:25.15
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
         18  db block gets
      14261  consistent gets
      23239  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select /*+ index( t16k t16k_idx ) */ object_id, object_name from t16k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:54.63
                                                                                                                                                                                                                                      
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2002152 Card=2000000 Bytes=60000000)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T16K' (Cost=2002152 Card=2000000 Bytes=60000000)
   2    1     INDEX (FULL SCAN) OF 'T16K_IDX' (NON-UNIQUE) (Cost=2152 Card=2000000)
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
          0  db block gets
    2216762  consistent gets
     777503  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> select /*+ index( t16k t16k_idx ) */ object_id, object_name from t16k order by object_id;
2000000 rows selected.
                                                                                                                                                                                                                                      
Elapsed: 00:00:55.27
                                                                                                                                                                                                                                      
Statistics
----------------------------------------------------------
          0  db block gets
    2216393  consistent gets
     782773  physical reads
    2000000  rows processed
                                                                                                                                                                                                                                      
ops$tkyte@ORA9IR2> set autotrace off



<b>
Elapsed: 00:00:28.39
Elapsed: 00:00:25.75
Elapsed: 00:00:25.52
Elapsed: 00:00:25.15
                                                                                                                                                         
      59297  consistent gets
      66697  physical reads
                                                                                                                                                         
      59297  consistent gets
      66697  physical reads
                                                                                                                                                         
      14261  consistent gets
      23239  physical reads
                                                                                                                                                         
      14261  consistent gets
      23239  physical reads
</b>

Ok, runtimes -- no difference as far as I'm concerned by the wall clock.

The LIO's are down as predicted.  1/4 the number of blocks, 1/4th the number of LIOs.  What is up with the PIO though?  Well, a tkprof (not shown here) showed me we did IO on the table, wrote to temp, and read from temp.  Most PIO translates into LIO, but not all -- temp, for example.
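As a quick numeric cross-check of that claim, using the consistent gets straight from the autotrace output above:

```python
# Consistent gets from the two full-scan runs above.
gets_4k, gets_16k = 59297, 14261
ratio = gets_4k / gets_16k
print(round(ratio, 2))   # ~4.16, tracking the 4x (4k -> 16k) block-size ratio
```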


<b>
Elapsed: 00:00:42.89
Elapsed: 00:00:40.77
Elapsed: 00:00:54.63
Elapsed: 00:00:55.27
                                                                                                                                                         
    2218155  consistent gets
     861030  physical reads
    2220256  consistent gets
     827976  physical reads
                                                                                                                                                         
    2216762  consistent gets
     777503  physical reads
    2216393  consistent gets
     782773  physical reads
</b>

Now that was interesting.  As predicted -- no significant change in LIO (if my index and table had a good clustering factor -- this would have been different perhaps, but in general I would not expect a significant change).

So maybe the comment "Indexes like large tablespaces. This is quite true. Robin Schumacher proved that the index tree builds cleaner and that range scans happen with less consistent gets: 
http://www.dba-oracle.com/oracle_tips_blocksizes-.htm
Note the test that proves it." was a little premature (again, Robin showed a FAST FULL SCAN -- basically a full table scan on an index, nothing wrong there, easy to predict; the conclusion that index RANGE scans would benefit is to be questioned)

So, why did the second set take longer?  It did less PIO and, for all intents and purposes, the same amount of LIO.

Well, it did fewer PIOs, but larger IOs overall.

  1  select a, b, b-a, b/a*100
  2*   from (select 827976*4/1024 a, 777503*16/1024 b from dual)
ops$tkyte@ORA9IR2> /
 
         A          B        B-A    B/A*100
---------- ---------- ---------- ----------
3234.28125 12148.4844 8914.20313 375.616201
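The same arithmetic, redone outside SQL as a quick sketch (same physical-read counts and block sizes as the query above, no new data):

```python
# Physical reads from the two index-path runs, times their block sizes, in MB.
mb_4k = 827976 * 4 / 1024     # data read in the 4k test, ~3234 MB
mb_16k = 777503 * 16 / 1024   # data read in the 16k test, ~12148 MB
print(round(mb_4k, 2), round(mb_16k, 2), round(mb_16k / mb_4k * 100, 2))
```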

3.75 times the physical IO by volume (375%, again because of the smallish buffer cache). But could it be anything else?

Maybe -- we'd need another test, but I can hypothesize that the overhead of moving 16k blocks from cache to pga over and over would consume more CPU cycles than moving 4k blocks from cache to pga over and over.  But that is a hunch right now, we'd need YAT (yet another test) to see that.


Thanks for pushing me on this one, was always curious if my guess was right, here is some evidence in support of that guess.

But, please remember, your mileage may vary.

 

David Aldridge, April 01, 2005 - 11:33 am UTC

Thanks Tom

When I was running some (single user) tests on this sort of thing I noted that a fast full index scan on a fully cached table was using more CPU than an index range scan of the same fully cached tables, which was anti-intuitive to me. I'm about to hop on a flight but I'll dig up the test case over the weekend.

Thanks again.

Tom Kyte
April 01, 2005 - 11:47 am UTC

I'd be interested to see, not 100% sure what you mean.


table & index structure

Ashiq Shamsudeen A, April 04, 2005 - 3:08 am UTC

the table structure for ATTRIBUTE_VAL_HEADERS

SQL> desc ATTRIBUTE_VAL_HEADERS
Name                 Type            Nullable  Default  Comments
-------------------  --------------  --------  -------  --------
UNQ_ATT_ID           NUMBER
OBJECT_ID            NUMBER
DEFINITION_ID        NUMBER
REFERENCE_ID         NUMBER          Y
ATT_VALUE            VARCHAR2(2000)  Y
NO_OF_INSTANCES      NUMBER(4)       Y
DATA_SOURCE          VARCHAR2(1)               'E'
UPLOAD_REFERENCE_ID  NUMBER          Y
UPLOAD_VALUE         VARCHAR2(2000)  Y
DISCREPANCY          VARCHAR2(1)     Y

For index I_ATTRIBUTE_VAL_HEADERS_FK4, the columns are OBJECT_ID, DEFINITION_ID, REFERENCE_ID, DATA_SOURCE.

From your answer you said "You do 125 table access by index rowids" and "select 1772*128 from dual". Can you explain this?

Tom Kyte
April 04, 2005 - 7:10 am UTC

Well, if you had say an index on OBJECT_ID, DEFINITION_ID and you have 125 rows in the table and their values are all:

(object_id,definition_id,data_source) = (1,1,'s')

and you searched for 1,1,'x' -- we would have to go from the index ((1,1) is in the index 125 times) and hit the table 125 times. eg:


create table t ( object_id number, definition_id number, data_source varchar2(1), data char(2000) );

insert into t
select mod(rownum,5), mod(rownum,5), 'A', 'x'
from all_objects
where rownum <= 125*5;
create index t_idx on t(object_id,definition_id);

alter session set events '10046 trace name context forever, level 12';
select *
from t
where object_id = 1
and definition_id = 1
and data_source = 'B'
/


(yes, i carefully crafted this data so that each 1,1,'A' was on a different block). tkprof will show us:

select *
from t
where object_id = 1
and definition_id = 1
and data_source = 'B'

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        1      0.00       0.00          1        127          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        3      0.00       0.00          1        127          0          0

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  TABLE ACCESS BY INDEX ROWID OBJ#(39758) (cr=127 r=1 w=0 time=612 us)
    125   INDEX RANGE SCAN OBJ#(39759) (cr=2 r=1 w=0 time=216 us)(object id 39759)


we did 2 IO's on the index (cr=2) and 125 on the table (cr=127-2). Total of 127 IO's in order to find....

Nothing.

Now, in your case, the problem is different. You say the index is on:

(object_id,definition_id,REFERENCE_ID,data_source).

So, all I can think is each object_id, definition_id has lots of reference_ids and we need to scan tons of blocks to find them all -- looking at each data_source. If this query is the "problem" query, you should have data_source in front of reference_id BUT BEAR IN MIND THAT WILL AFFECT OTHER QUERIES.

See
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:35336203098853#36038083399229 <code>
for a discussion on this -- the impact of column reordering in an index.

So here I'm not telling you to move the columns, just that given the order they are in, we have to (apparently) search lots of leaf rows. For example (yes, crafted data to demonstrate with)

create table t ( object_id number, definition_id number, reference_id number, data_source varchar2(1), data varchar2(1) );

insert into t
select 9999999, 9999999, 999999999+rownum, 'x', 'x'
from all_objects ;
create index t_idx on t(object_id,definition_id,reference_id,data_source);

alter session set events '10046 trace name context forever, level 12';
select *
from t
where object_id = 9999999
and definition_id = 9999999
and data_source = 'B'
/


select *
from t
where object_id = 9999999
and definition_id = 9999999
and data_source = 'B'

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        1      0.01       0.01        125        126          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        3      0.01       0.01        125        126          0          0

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  TABLE ACCESS BY INDEX ROWID OBJ#(39770) (cr=126 r=125 w=0 time=17491 us)
      0   INDEX RANGE SCAN OBJ#(39771) (cr=126 r=125 w=0 time=17486 us)(object id 39771)



do that 1772 times and you'll have your IO counts.
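The effect of the column order can also be mimicked with a tiny model. This is purely illustrative code of my own -- a deliberate simplification, not Oracle's actual b-tree behavior -- using the same crafted values as the example above:

```python
# Toy model of the crafted example above: 125 rows, all sharing
# object_id = definition_id = 9999999, distinct reference_ids, data_source 'x'.
rows = [{'object_id': 9999999, 'definition_id': 9999999,
         'reference_id': 999999999 + i, 'data_source': 'x'}
        for i in range(125)]

preds = {'object_id': 9999999, 'definition_id': 9999999, 'data_source': 'B'}

def range_scan_visits(index_cols, predicates, data):
    """Index entries a range scan must visit: only predicate columns forming
    a contiguous leading prefix of the index narrow the scanned range; later
    columns are merely filters applied while scanning."""
    prefix = []
    for col in index_cols:
        if col not in predicates:
            break
        prefix.append(col)
    return sum(all(r[c] == predicates[c] for c in prefix) for r in data)

# Index (object_id, definition_id, reference_id, data_source):
# reference_id is unconstrained, so data_source cannot narrow the range.
print(range_scan_visits(
    ['object_id', 'definition_id', 'reference_id', 'data_source'],
    preds, rows))    # 125 entries scanned, 0 rows returned

# With data_source ahead of reference_id it joins the prefix -- empty range.
print(range_scan_visits(
    ['object_id', 'definition_id', 'data_source', 'reference_id'],
    preds, rows))    # 0
```

The 125-entry scan in the first case lines up with the cr=126 seen against the index in the trace above.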


A reader, April 04, 2005 - 12:18 pm UTC

Hi Tom,
I was going through the discussion on Don's forum, and since some of the people (well, two of them) believe that SQL*Plus scripts don't prove anything about performance, it made me think: could there be cases where the execution plan of a statement changes in a multi-user environment from the one seen on a single-user system, all statistics and parameters remaining the same? I could only think of one, where due to the presence of a histogram and bind variable peeking a plan changed between a single-user test and a multi-user test. But that in my opinion is a problem with the test case rather than a real change of plan. I think I remember reading in an Oracle newsgroup JL saying that a future version of Oracle may change plans based on the presence of cached data. But at present, is there a chance of execution plans changing if we move from a single-user development environment to a multi-user environment with all other things remaining the same?

BTW, what on earth do people have to do to stop you from proving things with sqlplus scripts ;-)

Regards,
Kal

Tom Kyte
April 04, 2005 - 12:27 pm UTC

the execution plans are entirely reproducible -- single user, a billion users. They are deterministic.

Even when you add in things like sql profiles -- they are deterministic. You tell me what is coming into the system and we can tell you what will come out.

Their "point" is "We have observed X. After observing X, we note Y is true. Therefore X implies Y -- we see it all of the time"


However, my point (and I'm working on a much larger pair of papers to go with this point) is

X happens, Y is observed.

Did X in fact cause Y?

Did a side effect of X cause Y?

Is there perhaps a much better Y++ we could achieve?

Is there perhaps a much less resource intensive Z that achieves the goodness of Y?


Now, this is my opinion, but I feel it is imperative that a person who wishes to put forth these axioms "X implies Y" with hand waving and "it is so obvious" statements needs to supply some evidence (lets not get tied up in the semantics of how to define proof, proof is evidence) that what they say is true. That X in fact causes Y. And that in fact, given the investigation they did (they are 'experts' after all) they conclude that X is the best way to achieve Y.

If they choose not to, so be it, but they should be prepared to have people say "I don't believe you; you have been wrong in the past, we all have -- look at the myths about how things work that abound out there."


I'll save the punchline and "proofs" for the papers. This is a serious topic and deserves attention.



A reader, April 04, 2005 - 2:36 pm UTC

Brilliant as always!

David Aldridge, April 04, 2005 - 4:46 pm UTC

>>When I was running some (single user) tests on this sort of thing I noted that a
fast full index scan on a fully cached table was using more CPU than an index
range scan of the same fully cached tables, which was anti-intuitive to me. ...


Followup:

I'd be interested to see, not 100% sure what you mean.<<

Well, can't reproduce the result now, so never mind.

Tom Kyte
April 04, 2005 - 5:37 pm UTC

It's ok, I'm going to break a big paper into two and take a look at this, measuring what you can and cannot expect :)

as I suspected, every time I add another variable, the model gets more complex. Arraysize really impacts this, and other things do too

Dave, April 04, 2005 - 7:36 pm UTC

publicly available papers?

Tom Kyte
April 04, 2005 - 9:04 pm UTC

soon, soon. You'll see big links to them on the home page, they will be unavoidable.

In fact, an installment from Jonathan is there right now, my two will be later in the week....

Response for David Aldridge

Jonathan Lewis, April 05, 2005 - 3:02 am UTC

I've taken a guess about what may have been happening, and built a simple example that may be relevant.

create table t1 pctfree 90 pctused 10
as select rownum id, lpad(rownum,10) small_vc, rpad('x',100) padding
from all_objects
where rownum <= 10000
;

create index i1 on t1(id, small_vc) pctfree 90;

analyze table t1 compute statistics;

select small_vc from t1 where id between 100 and 200;

The execution path for this query could be either
INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE)
or
INDEX (FAST FULL SCAN) OF 'I1'

On my instance I saw the range scan, but could use a hint to get the fast full scan.

The choice is dictated (crudely speaking) by comparing the number of leaf blocks actually needed with the number of multiblock read requests that could pull the whole index from disc. Clearly, there will be a point where my 'between' range is so large that the CBO will switch from a range scan to a fast full scan.

With a little bad luck, you could find the CBO does a fast full scan on a big index when the target data set is very small. If the index were fully cached (as per your original comment about the tables) then the CPU cost of checking each row in the index could be much higher than the CPU cost of finding the small number of leaf blocks that would be needed by a range scan - for no apparent reason. (If it were not cached, you would get a clue from the physical reads statistic showing a higher value for the FFS.)
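That choice can be sketched crudely in a few lines. All figures here are hypothetical and of my own choosing; the real CBO costing is far more involved than this comparison:

```python
import math

def prefers_fast_full_scan(leaf_blocks_needed, total_leaf_blocks,
                           multiblock_read_count=16):
    """Crude sketch: a fast full scan costs roughly the multiblock reads
    needed to pull in the whole index; a range scan costs roughly the leaf
    blocks actually needed plus the root/branch traversal."""
    ffs_cost = math.ceil(total_leaf_blocks / multiblock_read_count)
    range_cost = leaf_blocks_needed + 2      # + root/branch visits
    return ffs_cost < range_cost

# Narrow 'between' range on a 1000-leaf index: the range scan wins.
print(prefers_fast_full_scan(5, 1000))      # False
# Very wide range: the fast full scan's multiblock reads win.
print(prefers_fast_full_scan(900, 1000))    # True
```

The crossover Jonathan describes falls out naturally: widen the range enough and the multiblock-read arithmetic flips the decision.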




More LIO's on concurrent execution

Ashiq Shamsudeen A, April 05, 2005 - 7:38 am UTC

When I execute the package in single-user mode I'm getting fewer LIO's for the query discussed before. See the tkprof below:
********************************************************************************

SELECT UNQ_ATT_ID
FROM
ATTRIBUTE_VAL_HEADERS WHERE OBJECT_ID = :b1 AND DEFINITION_ID = :b2 AND
DATA_SOURCE = :b3


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1919 1.53 1.40 0 0 0 0
Fetch 1919 0.21 0.18 0 5762 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3839 1.74 1.58 0 5762 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 84 (recursive depth: 1)
********************************************************************************

But when I open two SQL*Plus sessions and execute the package in parallel, the LIO's go up drastically. Why are there more LIO's when I execute in parallel?

For some queries in tkprof I'm not getting the query plan (I know this question is outside this discussion topic).


Tom Kyte
April 05, 2005 - 12:03 pm UTC

are you modifying the object in this parallel test?

if so, you are seeing consistent reads.  Easy to see in single user as well:

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*) from all_objects;
 
  COUNT(*)
----------
     27941
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set owner = owner;
 
27941 rows updated.
 
ops$tkyte@ORA9IR2> @trace
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context forever, level 12';
 
Session altered.
 
ops$tkyte@ORA9IR2> declare
  2      pragma autonomous_transaction;
  3  begin
  4      for x in (select * from t other_transaction )
  5      loop
  6          null;
  7      end loop;
  8  end;
  9  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> begin
  2      for x in (select * from t same_transaction )
  3      loop
  4          null;
  5      end loop;
  6  end;
  7  /
 
PL/SQL procedure successfully completed.



SELECT * FROM T OTHER_TRANSACTION
                                                                                                                                                                                                     
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    27942      0.45       0.36          0      56269          0       27941
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    27944      0.45       0.36          0      56269          0       27941
                                                                                                                                                                                                     
Rows     Row Source Operation
-------  ---------------------------------------------------
  27941  TABLE ACCESS FULL T (cr=56269 r=0 w=0 time=184321 us)
********************************************************************************
SELECT * FROM T SAME_TRANSACTION
                                                                                                                                                                                                     
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    27942      0.28       0.30          0      27945          0       27941
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    27944      0.28       0.30          0      27945          0       27941


see
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c21cnsis.htm#2570

for details on multi versioning.

but you might ask yourself "why am I running this query thousands of times, why did I write slow by slow code, why didn't I write in sets, and why does it never return rows?  is there a way I can not even have to run this query that returns no data???"
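To illustrate the "write in sets" point: the 1,919 single-row executions in the tkprof above could collapse into a single join. A sketch only; the driving table and its columns are invented here, since the trace shows just the lookup side (ANSI join syntax, 9i and up):

```sql
-- instead of looping and probing attribute_val_headers once per row:
--   for r in (select ... from some_driving_table) loop
--     select unq_att_id from attribute_val_headers
--      where object_id = r.object_id and ... ;
--   end loop;
-- do it in one set-based statement:
select d.object_id, d.definition_id, d.data_source, avh.unq_att_id
  from some_driving_table d
  left outer join attribute_val_headers avh
    on  avh.object_id     = d.object_id
    and avh.definition_id = d.definition_id
    and avh.data_source   = d.data_source;
```

The outer join also preserves the driving rows for which the lookup finds nothing, which is exactly what the zero-row fetches in the trace were doing one at a time.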

 

Jonathan's test

David Aldridge, April 05, 2005 - 12:30 pm UTC

I'm a bit fuzzy on the internals of the fast full index scan, but I think I see what you're saying. Intuitively I would suppose that during the FFIS there is a CPU cost in examining every indexed value in every leaf block to see if it matches the query predicate, whereas the index range scan is more selective in which blocks' entries have to be "examined".

Now I wonder whether there would be a measurable difference between two FFIS's in which one of them needed a simple count(*) of all rows and the other needed a count(*) of some portion of the rows (to take the extreme example, a count(*) of rows that are known to not exist in the index). In both cases the rows have to be read, but I wonder whether a FFIS can just read an index block header in the former case to say "there are 100 indexed rows in this block"?

In fact, your example is pretty close to the original tests that I ran -- I was testing "select count(*) from t ... " based on FFIS and IRS of the same index, and the IRS was coming out with lower CPU usage. Maybe the IRS might just read the index block header and the FFIS reads the entries in the block. Perhaps Tom can post the source code for us ;)

Here are the test cases that I recalled -- they're on page 5 of this thread ... </code> http://dba.ipbhost.com/index.php?showtopic=1239&st=0 <code>... though the point of those was really to compare indexes on different block sizes, but ignore that bit -- Or not, as you please

Clarification

Mike Ault, April 12, 2005 - 2:43 pm UTC

I believe I have said repeatedly that the proofs are good for what they prove, that for a single user, small database with a single CPU and small memory, this is how it works. However, when data volumes grow, memory increases, CPU numbers increase, the results can be different.

For example, in 9i and 10g certain underscore parameters take effect at undocumented data volumes. Bitmap plans for example, the undoc parameter that controls them is set to TRUE. At a client that has been running fine on 10g for several months, they suddenly saw CPU usage leap from 30-40% to nearly 100% for each and every SQL statement run.

Using any and all available tests, nothing showed the problem. Finally one Oracle analyst smacked his forehead and said, I remember this happening before, unset the bitmap plans parameter. The client did and performance was better than before: a specific test query which had run at 20-30 seconds before the problem and 20 minutes during the problem dropped to 6 seconds after setting the parameter to false, with the same explain plan output and statistics. This also happened at a 9i client a few months ago (I wish we had a complete look at the stats for them, we might have been able to fix this second client sooner). At the volumes most of these small-scale proofs are run at, this problem would not occur.

I have no problem with proofs, as long as they are properly designed to mimic the system you are proving. A go cart uses gasoline, steers with a steering wheel and uses brakes to stop. Would I depend on load tests and mileage tests from a go cart to say what a Montero SUV would do? No.


Tom Kyte
April 13, 2005 - 8:37 am UTC




the magic underscores you talk of are 100% predictable in single user mode (guess you are referring to your blog talk, the one where you said "oracle support messed up, I saved the day and that proves tests are bad" again, without ONCE showing or stating any tests.... oh well)

the _ optimizer parameters are in effect from the instant the database is started up, before anything happens -- they do not magically "kick in" at some data volume.

The volume of data in a table (or more simply the stats fed to the optimizer) in conjunction with these parameter settings will cause different costs to be assigned to various execution operations. They are in fact in effect from row 0 to row infinity.

However, what you observed was that a table passed a threshold (i don't care if there was 1, 10, 1000000000000 users in this database) and the plan changed. Period. I can, you could, anyone can demonstrate that fact.

You are telling me you could not find HIGH LOAD SQL -- the SQL consuming the CPU??? You are telling me you could not then see the plans for that high load sql? Really?


There was no "proof" needed in such a case. You have a "cpu spiking" problem observed. Well, what was causing the spike. Someone smacking their forehead could have said

o cpu spiking -- I remember seeing hard parsing once, set cursor sharing

o cpu spiking -- I remember once a programmer used plsq and in a loop used all of our cpu, turn off plsql, erase it.

o cpu spiking -- i remember once that smon would consume all of our cpu coalesciing free extents, here is the oradebug commands to stop that


and the list goes on and on and on and on...


give me a break Mike, you had high load sql, you got lucky setting a parameter, the luck was the plans went back to what they were. But you didn't see the change in plan for whatever reason.

There is no other way that an optimizer parameter that kicks in with 0 users and 1000000000 users equally, could magically take a query from 20 minutes to 6 seconds.

This is precisely what I'm talking about with this mystic advice. "Hey everyone, you have queries running long times, try this mystic parameter, it might just work. I've seen it work, I've no idea why it works because apparently no plans change, but hey, it does work so give it a try, don't understand why it works or not, just do it"


But face it, 1 user or 10000000000 users, the plans would have changed because the plans are based on statistics....

You miss the point

Mike Ault, April 13, 2005 - 10:27 am UTC

This was seemingly caused by a bug in the code path, not a change in the execution path. Top SQL reports didn't show it because the SQL wasn't having problems on execution, it was during parsing. The bug is documented on Metalink. Why our searching before we called metalink didn't turn it up I don't know. Believe me we did search, we did check top SQLs, we did check events and waits before calling Oracle support.

Tom Kyte
April 13, 2005 - 10:28 am UTC

so, now you are telling us that sql was taking 19 minutes and 14 seconds to parse...


and you could not identify that from the v$ tables, that you have astronomical parse times? It was magical?

seems a tkprof would have made this painfully obvious immediately, would it not?




<quote>
specific test query which had run at 20-30 seconds before the problem
and 20 minutes during the problem, dropped to 6 seconds after setting the
parameter to false, with the same explain plan output and statistics.
</quote>

sorry, someone has made a mistake somewhere there Mike, think about it. Here is what you are saying in effect:

...
We set a magical parameter _b_tree_bitmap_plans....

Now, this didn't change anything, no plans changed....

But by setting this, queries went from 20 minutes to 6 seconds. It is truly magic. A silver bullet.
.......

it doesn't work that way Mike, that parameter simply enabled the optimizer to create bitmaps on the fly or not. If no plans changed how could this have affected anything.....

You couldn't find any high parse times really fast?

And it would only affect the hard parse, was this system hard parsing like mad?

What are all of the other variables, as I read your blog, all I see is "silver bullet"

Wouldn't you like to know, wouldn't you like to understand? I sure would have, especially before I publish on the internet "hey all, set this and yippee, things go faster and isn't oracle support pretty stupid"

Lets see what I can derive from your description of the problem:

<quote, stuff in bold is from me>
We never go onsite and start issuing recommendations without analysis. The last time I saw this technique used was from Oracle support three days ago. Without looking at detailed statistics or analysis they suggested increasing the shared pool (even though all of it wasn't being used), increasing the database cache size (even though 30% of the buffers were free in V$BH), turn off MTS (we had already tried this and it made no difference) and several other Oracle "Silver Bullets",

silver bullets, you work for a company that actively promotes them no? Funny thing is 2 of the 3 you listed are favorites of someone you work with. Add more ram.

Now, since the shared pool was not heavily used, we should be able to say "probably using bind variables". Is that safe? I think I might be able to conclude that because if you were hard parsing like mad, it would tend to be "more full"


none of which made the slightest difference in the problem. It turned out the problem was that 10g has a default value of TRUE for the parameter _B_TREE_BITMAP_PLANS which, even though the application had been running for months, suddenly started making a difference.

nothing to do with USER volumes, all about statistics. Running 10g, using the auto stats gathering jobs? Could have reverted the statistics back...

Note there were no bitmaps in the database other than those that Oracle had in their schemas. They offered no proofs, no explanations, just a sigh of relief that this Oracle Silver Bullet fixed the problem.

Now silver bullets are *good*? What if support had suggested this?

This is a classic example of why "simple", "repeatable" proofs can be hazardous.

what?????? where did we see ANYTHING about proofs here???? Me thinks you have been slamming them so much recently, you see them everywhere.

By all of the indications there was nothing wrong at the database level. No waits or events showed the problem, even plans for the SQL didn't change. Any SQL activity resulted in CPU spiking of up to 70-80%. However, the setting of this undocumented parameter had a profound effect on the optimizer and overall database performance. If an Oracle analyst hadn't "seen this before" and remembered this "Silver Bullet" the client would still be having problems. The troubled queries dropped from taking several minutes to 6 seconds.

a second ago, slamming the Oracle analysts for using silver bullets, now they are awesome for doing so
</quote>


I checked the bugdb; all of the excessive parse times related to this are in the 8.1.7 / early 9.2.0 time frame (and they were on the order of seconds, 10s of seconds, not 19 minutes). I see none in 10g, hope you file a new bug on this.

But since this system did not use the shared pool all up, it was probably bind variable friendly, so it is confusing how this could be a systemic long term problem as the hard parse would be the only affected thing.

And a simple tkprof would have shown it *immediately*
Statspack even would have (parse cpu....)

this is tom following up his followup

Thomas Kyte, April 13, 2005 - 11:43 am UTC

<quote>
Top SQL reports didn't show it because the SQL wasn't having
problems on execution, it was during parsing.
</quote>

didn't sit right with me either. I know, I know, this is a single user test.... but.... what the heck:

drop table t;
create table t ( x int );

exec statspack.snap
declare
l_sql long;
l_cursor sys_refcursor;
begin
l_sql := 'select t1.x x1';
for i in 2 .. 500
loop
l_sql := l_sql || ', t' || i || '.x x' || i;
end loop;
l_sql := l_sql || ' from t t1';
for i in 2 .. 500
loop
l_sql := l_sql || ', t t' || i;
end loop;

open l_cursor for l_sql;
close l_cursor;
end;
/

exec statspack.snap


(beware, please don't do that on your live system, it takes a bit of resources and all...)


Now, that query actually did appear in my top sql.


                                                     CPU      Elapsd     Old
  Buffer Gets    Executions  Gets per Exec %Total  Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
            502            1          502.0    6.6   219.30    219.30 1725867735
Module: SQL*Plus
select t1.x x1, t2.x x2, t3.x x3, t4.x x4, t5.x x5, t6.x x6, t7.
x x7, t8.x x8, t9.x x9, t10.x x10, t11.x x11, t12.x x12, t13.x x
13, t14.x x14, t15.x x15, t16.x x16, t17.x x17, t18.x x18, t19.x
x19, t20.x x20, t21.x x21, t22.x x22, t23.x x23, t24.x x24, t25
.x x25, t26.x x26, t27.x x27, t28.x x28, t29.x x29, t30.x x30, t




Even just a quick peek in v$sql:

ops$tkyte@ORA10G> select * from (select sql_text, cpu_time, elapsed_time from v$sql order by 2 desc )
2 where rownum <= 50;
SQL_TEXT
------------------------------------------------------------------------------
CPU_TIME ELAPSED_TIME
---------- ------------
declare l_sql long; l_cursor sys_refcursor; begin l_sql := 'select t1.x x1'
; for i in 2 .. 500 loop l_sql := l_sql || ', t' || i || '.x x' || i; end
loop; l_sql := l_sql || ' from t t1'; for i in 2 .. 500 loop l_sql := l_
sql || ', t t' || i; end loop; prt( l_sql ); open l_cursor for l_sql; clos
e l_cursor; end;
219306579 219306579
select t1.x x1, t2.x x2, t3.x x3, t4.x x4, t5.x x5, t6.x x6, t7.x x7, t8.x x8,
t9.x x9, t10.x x10, t11.x x11, t12.x x12, t13.x x13, t14.x x14, t15.x x15, t1
6.x x16, t17.x x17, t18.x x18, t19.x x19, t20.x x20, t21.x x21, t22.x x22, t23
.x x23, t24.x x24, t25.x x25, t26.x x26, t27.x x27, t28.x x28, t29.x x29, t30.
x x30, t31.x x31, t32.x x32, t33.x x33, t34.x x34, t35.x x35, t36.x x36, t37.x
x37, t38.x x38, t39.x x39, t40.x x40, t41.x x41, t42.x x42, t43.x x43, t44.x
x44, t45.x x45, t46.x x46, t47.x x47, t48.x x48, t49.x x49, t50.x x50, t51.x x
51, t52.x x52, t53.x x53, t54.x x54, t55.x x55, t56.x x56, t57.x x57, t58.x x5
8, t59.x x59, t60.x x60, t61.x x61, t62.x x62, t63.x x63, t64.x x64, t65.x x65
, t66.x x66, t67.x x67, t68.x x68, t69.x x69, t70.x x70, t71.x x71, t72.x x72,
t73.x x73, t74.x x74, t75.x x75, t76.x x76, t77.x x77, t78.x x78, t79.x x79,
t80.x x80, t81.x x81, t82.x x82, t83.x x83, t84.x x84, t85.x x85, t86.x x86, t
87.x x87, t88.x x88, t89.x x89, t90.x x90, t91.x x91, t92.x x92,
219299348 219299348


would show it.


A tkprof would have had it "glaring hugely bigtime".


The statspack % non-parse cpu: would have flagged it.


So many things would point to "excessive parse times", for me, a tkprof would have been the thing, but even without that

parse time cpu 22,276 99.0 22,276.0
parse time elapsed 22,456 99.8 22,456.0


from the statspack would have flagged it.

Tom Kyte
April 13, 2005 - 11:53 am UTC

(just a note from self to self, it might not have appeared in the top sql unless you did a report showing SQL by cpu consumption, so it is true that it might not have been in the top sql reports, however -- there would have been tons of pointers to it)

David Aldridge, April 13, 2005 - 12:07 pm UTC

Seems that the _b_tree_bitmap_plans problem is documented in metalink note 225466.1, and it identifies one of the symptoms as being "Execution plan shows BITMAP CONVERSION". So it seems odd that "using any and all available tests, nothing showed the problem" -- if that was the case then the execution plans would have showed it. Unless this is a different bug of course.

Although Tom says "... it might not have appeared in the top sql unless you did a report showing SQL by cpu consumption, so it is true that it might not have been in the top sql reports ..." I'd say that if the major symptom of the problem was that "they suddenly saw CPU usage leap from 30-40% to nearly 100% for each and every SQL statement run" then finding top SQL's by CPU usage would have been an obvious place to start. On the other hand, if you have a battery of set scripts that you apply and none of them did such a report ... ?

_b_tree_bitmap_plans

bipul, April 13, 2005 - 1:48 pm UTC

We had the same issue when we moved from 8.1.7.4 to 9.2.0.3. Interestingly, it didn't affect all the queries. Some queries were affected, and the performance of the database was not as good as it used to be on 8.1.7.4 [I am not using quantitative terms like a 20% decrease in performance as I don't have those numbers now]. Setting the parameter to its previous default of FALSE fixed those issues. But having said that, we might be losing out on performance for other queries [where it might have a positive impact; I am sure that if they changed the default value there must have been some positive intention ;-)].

Recently, around 6 weeks ago, we migrated our dev database to 10.1.0.2 and we had at least one query that was taking forever to complete. Not sure if the cause is the same. I will check if I can find that query, reproduce it, and apply the same fix.

What I want to stress here is the fact that we see the same issue on the dev database, which has far fewer users and much less usage compared to the live database. The issue is reproducible irrespective of the size and the number of users connected.

thanks
bipul

Tom Kyte
April 13, 2005 - 2:14 pm UTC

that is provably so...  it is an OPTIMIZER setting that affects plan generation.

Could it cause long parse times -- absolutely (was known to in 8iR2 and 9iR2).

Would you be able to detect that -- absolutely, everything you need to see it is right there.

It would only affect the hard parse if it were a parse issue. You would be able to definitely measure it; it would be a visible problem.


And you could definitely see any of this in a single user database...  If it were a parse issue, it could be more self-evident (worse) with more users on a multi-cpu machine if you are hard parsing like mad, given the latching you would be doing (but again, with a "not all used" shared pool, it doesn't fit that profile), but you would see it in a single user machine easily.

If it "was working" in the past and "stopped working", something changed, that would have been the stats (and wouldn't it be nice for the customer to know that so they could use the 10g feature to put back the stats that were working nicely for them yesterday?)


ops$tkyte@ORA10G> create table t as select * from all_objects where 1=0;
Table created.
 
ops$tkyte@ORA10G> create index t_idx on t(object_id);
Index created.
 
ops$tkyte@ORA10G> create index t_idx2 on t(object_name);
Index created.
 
ops$tkyte@ORA10G> create or replace procedure set_stats( p_n in number )
  2  as
  3  begin
  4          dbms_stats.set_table_stats( user, 'T', numrows => p_n, numblks => p_n/50 );
  5          dbms_stats.set_column_stats( user, 'T', 'OBJECT_ID', distcnt => p_n, nullcnt => 0 );
  6          dbms_stats.set_column_stats( user, 'T', 'OBJECT_NAME', distcnt => p_n, nullcnt => 0 );
  7  end;
  8  /
 
Procedure created.
 
ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> exec set_stats(100);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> select object_id, object_name from t
  2   where object_id = 42 or object_name = 'FOO'
  3      or object_id = 55 or object_name = 'BAR';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=4 Bytes=120)
   1    0   TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=2 Card=4 Bytes=120)
 
 
 
ops$tkyte@ORA10G> exec set_stats(1000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> select object_id, object_name from t
  2   where object_id = 42 or object_name = 'FOO'
  3      or object_id = 55 or object_name = 'BAR';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=4 Bytes=120)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=6 Card=4 Bytes=120)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP OR
   4    3         BITMAP CONVERSION (FROM ROWIDS)
   5    4           INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1)
   6    3         BITMAP CONVERSION (FROM ROWIDS)
   7    6           INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1)
   8    3         BITMAP CONVERSION (FROM ROWIDS)
   9    8           INDEX (RANGE SCAN) OF 'T_IDX2' (INDEX) (Cost=1)
  10    3         BITMAP CONVERSION (FROM ROWIDS)
  11   10           INDEX (RANGE SCAN) OF 'T_IDX2' (INDEX) (Cost=1)
 
 
 
ops$tkyte@ORA10G> set autotrace off
 

Wording

A reader, May 16, 2005 - 8:00 pm UTC

"The consistent gets cannot, will not be affected by any of the optimizer_*
settings. they affect how the optimizer costs things, they have no effect on
how things are actually processed."

I saw this quoted in an article by Mike Ault. I think what you meant by it was that the consistent gets for a PARTICULAR ACCESS PLAN are not affected by the optimizer_* settings, no?

Tom Kyte
May 16, 2005 - 9:04 pm UTC

Actually, it was Don, Mike just cut and pasted Don's prior material....

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6643159615303#37716676582772 <code>

but YES, as both of them were told repeatedly by many people their forum "hey, little out of context don't you think", but anyway....

Yes, my point was "if you flip these little switches, they don't directly make LIO go down, or up -- they will do one of three things:

make it go up
make it go down
not move it at all"

So, in answer to the question
My question is -
Consistent gets of the second query (using index) is alarmingly high. Is it
because it tries to find the index blocks in the buffer cache, and, would the
setting of OPTIMIZER_INDEX_CACHING to a low value (say default value of 0) bring
down the consistent gets in this case?


the answer is "yes, no, maybe".

unless and until the plan changes, nothing will happen. These parameters affect how the optimizer COSTS things in the development of a plan, period. Simply changing them (without getting a resulting change in plan) isn't going to do a thing to the LIO performed by a query.



performance issue

Sanjeevakumar, May 17, 2005 - 2:32 am UTC

Hi Tom,
I am working on a data warehouse project in which we are facing a performance issue.
We have a few transaction tables from which we build fact tables storing aggregated values based on some dimensions. Around 6 months ago the fact build took around 4 hours, when the main transaction table held 14 million records and the fact table 12 million. Now the same fact build takes 9 hours, and the main transaction table holds around 20 million rows. We have indexes and partitions on the main transaction table. To build the facts we created a PL/SQL procedure that verifies the dimension values; if a row exists we update it in the fact table, else we insert into the fact table. The fact table is partitioned as well. We are unable to find why the fact build is taking so much time.
Can you please shed some light on this?

Many Thanx.

Rgds
Sanjeev

Tom Kyte
May 17, 2005 - 8:27 am UTC

indexes probably are not a useful thing here (and if you are forcing them to be used, that could well be the problem)....

do you know about tkprof, sql_trace and other tools that come with the database?

sachin, May 31, 2005 - 4:14 pm UTC

Hi Tom, you have used the word "rowid" many times in this thread and I understand what a rowid is, but I am a little confused about the difference between a physical rowid, a logical rowid, and a urowid. Could you please explain this for me?

Thanks


Tom Kyte
June 01, 2005 - 8:00 am UTC

a physical rowid is an address of a row. the rowid (since version 8) contains the object information (what data object it points to), the file the row is in, the block in the file, and the row slot on the block. It is an address.

a logical rowid is used for objects that do not have physical rowids, like rows in an index organized table. It contains enough information to find the row -- for an IOT, for example, it contains the primary key value.

a urowid is a type capable of storing either a physical or a logical rowid.

</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c13datyp.htm#918 <code>
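The pieces Tom lists can be pulled out of a physical rowid with the supplied DBMS_ROWID package; a quick sketch (any heap table works in place of dual):

```sql
select rowid,
       dbms_rowid.rowid_object(rowid)        data_object_id,  -- which object it points to
       dbms_rowid.rowid_relative_fno(rowid)  file_no,         -- the file the row is in
       dbms_rowid.rowid_block_number(rowid)  block_no,        -- the block in that file
       dbms_rowid.rowid_row_number(rowid)    row_slot         -- the row slot on the block
  from dual;
```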

Reader

sachin, June 01, 2005 - 11:04 am UTC

Thanks a lot,it is Reallyyyyy helpful:-)

sachin


High consistent gets w/ no CPU usage

Brandon Allen, September 01, 2005 - 2:00 pm UTC

Hi Tom,

I am trying to tune a query that is running for several hours and generating billions of consistent gets, but oddly the 'CPU used' statistic for the same session is not increasing.  Is it possible to perform consistent gets w/o using CPU, or is this possibly a bug with the CPU usage not being reported correctly?

Thanks,
Brandon

SQL> select value from v$sesstat a, v$statname b where a.statistic#=b.statistic# and b.name = 'CPU used by this session' and a.sid=78;

     VALUE
----------
    572663

SQL> select value from v$sesstat a, v$statname b where a.statistic#=b.statistic# and b.name = 'consistent gets' and a.sid=78;

     VALUE
----------
1899768658

SQL> select value from v$sesstat a, v$statname b where a.statistic#=b.statistic# and b.name = 'consistent gets' and a.sid=78;

     VALUE
----------
1900577773

SQL> select value from v$sesstat a, v$statname b where a.statistic#=b.statistic# and b.name = 'CPU used by this session' and a.sid=78;

     VALUE
----------
    572663
 

Tom Kyte
September 01, 2005 - 4:07 pm UTC

cpu is reported *at the end of the call*. hence you won't see it increment during the call.
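This is easy to confirm from a second session while the long call is still running: the gets climb continuously, but the CPU figure stays flat until the call completes. A sketch (substitute the busy session's SID for :sid):

```sql
-- poll repeatedly from another session during the long-running call
select b.name, a.value
  from v$sesstat a, v$statname b
 where a.statistic# = b.statistic#
   and a.sid = :sid
   and b.name in ('consistent gets', 'CPU used by this session');
-- 'consistent gets' keeps incrementing mid-call;
-- 'CPU used by this session' jumps only after the call returns
```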

a basic question again,

sns, October 20, 2005 - 11:11 am UTC

I have "select count(*) from tab" on instance A and instance B.

On instance A, the rows start returning pretty much instantly, and on instance B it took a while (5 minutes) to start returning data in my SQL Navigator.

The consistent gets in instance A is less than consistent gets in B. The number of rows in both the cases are same.

My assumption is that when rows are requested by a query, Oracle will first do physical reads into memory, then do consistent gets within memory (taking the necessary latches), and then return the data to the client (in my case SQL Navigator).

Does data start coming back to the client right after it gets into memory, or does Oracle wait until all the rows are in memory?

In instance B, if it takes 5 minutes to start returning the data, I am assuming this extra time is due to physical reads. Is that correct?

The explain plan for the SQL statement in both the instances are exactly same (both are doing full table scans).

Thanks,




Tom Kyte
October 20, 2005 - 4:44 pm UTC

select count(*) is returning "rows"??????


or "row"

sorry...

sns, October 20, 2005 - 6:47 pm UTC

It is select * from tab...



Tom Kyte
October 21, 2005 - 7:56 am UTC

how queries are processed depends.

some queries can start returning data before the "last row is processed" -- select * from T, where T is a just a table for example - just get row 1 and return it. Immediate response.

select count(*) from T however must pretty much get the answer before you get your first row.


I'll guess that the optimizer modes are different on the two systems IF the plans are different.

If the plans are the same, then I cannot even "guess".
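A quick way to feel the difference between "first row fast" and "must finish first" queries (a sketch; big_table is a placeholder for any large table, and timings depend entirely on your data and cache):

```sql
set timing on

-- can return immediately: Oracle ships the first row it finds
select * from big_table where rownum = 1;

-- the single result row exists only after the full scan completes
select count(*) from big_table;
```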

10g performance issue

Raj, January 09, 2006 - 11:54 am UTC

Tom ,

I have a Business Objects generated SQL that queries one of the partitions of a partitioned fact table that has
approx. 100 million records. The fact table structure is below; it is range partitioned by YYMM_ID.



SQL> desc per_month_ne2_fact
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------
 SSN                                       NOT NULL VARCHAR2(9)
 REC_ID                                    NOT NULL NUMBER
 YYMM_ID                                   NOT NULL NUMBER
 ELEMENT_ID                                NOT NULL NUMBER
 UNITS_COST                                         NUMBER
 COUNT                                              NUMBER
 ESC_YR1                                            NUMBER
 ESC_YR2                                            NUMBER
 ESC_YR3                                            NUMBER



In Dev (Oracle 10gR1 running on Tru64 Unix) this takes 18 minutes, and in PROD (Oracle 9.2 on Sun Solaris)
it takes 46 seconds. I have given below the details of running this SQL against DEV and PROD. In DEV it does
400K+ physical reads and in PROD it does 15K+ physical reads. (I should have read your new book on 9i/10g.)


Can you please give your valuable inputs? Thanks for your help.


//=====================================================================================================//
SQLDEV> set autotrace traceonly
SQLDEV> SELECT
2 PER_ELEMENT_DIMENSION.ELEMENT_LEVEL1|| ' ' || PER_ELEMENT_DIMENSION.ELEMENT_DESC1,
3 PER_EMF_OMF_MONTH_DIMENSION_NE.PAY_GRADE,
4 PER_EMF_OMF_MONTH_DIMENSION_NE.MILITARY_SERVICE,
5 sum(PER_MONTH_NE2_FACT.UNITS_COST),
6 count(distinct(PER_MONTH_NE2_FACT.SSN)),
7 count(distinct(PER_MONTH_NE2_FACT.SSN))/12,
8 PER_TIME_MONTH_DIMENSION.CY_MONTH
9 FROM
10 PER_ELEMENT_DIMENSION,
11 PER_EMF_OMF_MONTH_DIMENSION PER_EMF_OMF_MONTH_DIMENSION_NE,
12 PER_MONTH_NE2_FACT,
13 PER_TIME_MONTH_DIMENSION
14 WHERE
15 ( PER_ELEMENT_DIMENSION.ELEMENT_ID=PER_MONTH_NE2_FACT.ELEMENT_ID )
16 AND ( PER_TIME_MONTH_DIMENSION.YYMM_ID=PER_MONTH_NE2_FACT.YYMM_ID )
17 AND ( PER_EMF_OMF_MONTH_DIMENSION_NE.EMF_ID=PER_MONTH_NE2_FACT.REC_ID )
18 AND ( PER_TIME_MONTH_DIMENSION.FILE_DATE=PER_EMF_OMF_MONTH_DIMENSION_NE.FILE_DATE )
19 AND ( PER_MONTH_NE2_FACT.ELEMENT_ID not in ( 311, 341, 361, 371) )
20 AND (
21 PER_TIME_MONTH_DIMENSION.CY_MONTH = '2004/09'
22 )
23 GROUP BY
24 PER_ELEMENT_DIMENSION.ELEMENT_LEVEL1|| ' ' || PER_ELEMENT_DIMENSION.ELEMENT_DESC1,
25 PER_EMF_OMF_MONTH_DIMENSION_NE.PAY_GRADE,
26 PER_EMF_OMF_MONTH_DIMENSION_NE.MILITARY_SERVICE,
27 PER_TIME_MONTH_DIMENSION.CY_MONTH
28 /

27 rows selected.

Elapsed: 00:18:00.06

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19524 Card=11 Bytes=
891)

1 0 PX COORDINATOR
2 1 PX SEND* (QC (RANDOM)) OF ':TQ10005' (Cost=19524 Card=11 :Q1005
Bytes=891)

3 2 SORT* (GROUP BY) (Cost=19524 Card=11 Bytes=891) :Q1005
4 3 PX RECEIVE* (Cost=19524 Card=11 Bytes=891) :Q1005
5 4 PX SEND* (HASH) OF ':TQ10004' (Cost=19524 Card=11 :Q1004
Bytes=891)

6 5 SORT* (GROUP BY) (Cost=19524 Card=11 Bytes=891) :Q1004
7 6 PX RECEIVE* (Cost=19524 Card=11 Bytes=891) :Q1004
8 7 PX SEND* (HASH) OF ':TQ10003' (Cost=19524 Ca :Q1003
rd=11 Bytes=891)

9 8 SORT* (GROUP BY) (Cost=19524 Card=11 Bytes :Q1003
=891)

10 9 HASH JOIN* (Cost=19522 Card=37970 Bytes= :Q1003
3075570)

11 10 BUFFER* (SORT) :Q1003
12 11 PX RECEIVE* (Cost=2 Card=55 Bytes=10 :Q1003
45)

13 12 PX SEND* (BROADCAST) OF ':TQ10001'
(Cost=2 Card=55 Bytes=1045)

14 13 TABLE ACCESS (FULL) OF 'PER_ELEM
ENT_DIMENSION' (TABLE) (Cost=2 Card=55 Bytes=1045)

15 10 HASH JOIN* (Cost=19519 Card=38590 Byte :Q1003
s=2392580)

16 15 PX RECEIVE* (Cost=9901 Card=477601 B :Q1003
ytes=17193636)

17 16 PX SEND* (BROADCAST LOCAL) OF ':TQ :Q1002
10002' (Cost=9901 Card=477601 Bytes=17193636)

18 17 HASH JOIN* (Cost=9901 Card=47760 :Q1002
1 Bytes=17193636)

19 18 BUFFER* (SORT) :Q1002
20 19 PX RECEIVE* (Cost=2 Card=1 B :Q1002
ytes=13)

21 20 PX SEND* (PARTITION (KEY))
OF ':TQ10000' (Cost=2 Card=1 Bytes=13)

22 21 TABLE ACCESS (FULL) OF '
PER_TIME_MONTH_DIMENSION' (TABLE) (Cost=2 Card=1 Bytes=13)

23 18 PX PARTITION RANGE* (ITERATOR) :Q1002
(Cost=9768 Card=22924861 Bytes=527271803)

24 23 TABLE ACCESS* (FULL) OF 'PER :Q1002
_EMF_OMF_MONTH_DIMENSION' (TABLE) (Cost=9768 Card=22924861 B
ytes=527271803)

25 15 PX BLOCK* (ITERATOR) (Cost=2712 Card :Q1003
=22227822 Bytes=577923372)

26 25 TABLE ACCESS* (FULL) OF 'PER_MONTH :Q1003
_NE2_FACT' (TABLE) (Cost=2712 Card=22227822 Bytes=577923372)



2 PARALLEL_TO_SERIAL
3 PARALLEL_COMBINED_WITH_PARENT
4 PARALLEL_COMBINED_WITH_PARENT
5 PARALLEL_TO_PARALLEL
6 PARALLEL_COMBINED_WITH_PARENT
7 PARALLEL_COMBINED_WITH_PARENT
8 PARALLEL_TO_PARALLEL
9 PARALLEL_COMBINED_WITH_PARENT
10 PARALLEL_COMBINED_WITH_PARENT
11 PARALLEL_COMBINED_WITH_CHILD
12 PARALLEL_COMBINED_WITH_PARENT
13 PARALLEL_FROM_SERIAL
15 PARALLEL_COMBINED_WITH_PARENT
16 PARALLEL_COMBINED_WITH_PARENT
17 PARALLEL_TO_PARALLEL
18 PARALLEL_COMBINED_WITH_PARENT
19 PARALLEL_COMBINED_WITH_CHILD
20 PARALLEL_COMBINED_WITH_PARENT
21 PARALLEL_FROM_SERIAL
23 PARALLEL_COMBINED_WITH_CHILD SELECT distinct TBL$OR$IDX$PART$NUM("PER_EMF
_OMF_MONTH_DIMENSION", 0, d#, p#, "F

24 PARALLEL_COMBINED_WITH_PARENT
25 PARALLEL_COMBINED_WITH_CHILD SELECT distinct TBL$OR$IDX$PART$NUM("PER_MON
TH_NE2_FACT", 0, d#, p#, "YYMM_ID")

26 PARALLEL_COMBINED_WITH_PARENT


Statistics
----------------------------------------------------------
366 recursive calls
1390 db block gets
8739 consistent gets
409477 physical reads
0 redo size
2049 bytes sent via SQL*Net to client
437 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
37 sorts (memory)
8 sorts (disk)
27 rows processed

SQLDEV>

SQLDEV> sho sga

Total System Global Area 419430400 bytes
Fixed Size 1302224 bytes
Variable Size 417784112 bytes
Database Buffers 0 bytes
Redo Buffers 344064 bytes

sort_area_size 8388608

sort_area_retained_size 2097152


db_cache_size 0

db_2k_cache_size 0

db_4k_cache_size 0

db_8k_cache_size 0

db_16k_cache_size 0

db_32k_cache_size 0

db_keep_cache_size 0

db_recycle_cache_size 0


sga_max_size 419430400

pre_page_sga FALSE

lock_sga FALSE

sga_target 0






//=====================================================================================================================/


SQLPROD> set autotrace traceonly
SQLPROD> SELECT
2 PER_ELEMENT_DIMENSION.ELEMENT_LEVEL1|| ' ' || PER_ELEMENT_DIMENSION.ELEMENT_DESC1,
3 PER_EMF_OMF_MONTH_DIMENSION_NE.PAY_GRADE,
4 PER_EMF_OMF_MONTH_DIMENSION_NE.MILITARY_SERVICE,
5 sum(PER_MONTH_NE2_FACT.UNITS_COST),
6 count(distinct(PER_MONTH_NE2_FACT.SSN)),
7 count(distinct(PER_MONTH_NE2_FACT.SSN))/12,
8 PER_TIME_MONTH_DIMENSION.CY_MONTH
9 FROM
10 PER_ELEMENT_DIMENSION,
11 PER_EMF_OMF_MONTH_DIMENSION PER_EMF_OMF_MONTH_DIMENSION_NE,
12 PER_MONTH_NE2_FACT,
13 PER_TIME_MONTH_DIMENSION
14 WHERE
15 ( PER_ELEMENT_DIMENSION.ELEMENT_ID=PER_MONTH_NE2_FACT.ELEMENT_ID )
16 AND ( PER_TIME_MONTH_DIMENSION.YYMM_ID=PER_MONTH_NE2_FACT.YYMM_ID )
17 AND ( PER_EMF_OMF_MONTH_DIMENSION_NE.EMF_ID=PER_MONTH_NE2_FACT.REC_ID )
18 AND ( PER_TIME_MONTH_DIMENSION.FILE_DATE=PER_EMF_OMF_MONTH_DIMENSION_NE.FILE_DATE )
19 AND ( PER_MONTH_NE2_FACT.ELEMENT_ID not in ( 311, 341, 361, 371) )
20 AND (
21 PER_TIME_MONTH_DIMENSION.CY_MONTH = '2004/09'
22 )
23 GROUP BY
24 PER_ELEMENT_DIMENSION.ELEMENT_LEVEL1|| ' ' || PER_ELEMENT_DIMENSION.ELEMENT_DESC1,
25 PER_EMF_OMF_MONTH_DIMENSION_NE.PAY_GRADE,
26 PER_EMF_OMF_MONTH_DIMENSION_NE.MILITARY_SERVICE,
27 PER_TIME_MONTH_DIMENSION.CY_MONTH
28 /

27 rows selected.

Elapsed: 00:00:41.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=804 Card=11 Bytes=95
7)

1 0 SORT* (GROUP BY) (Cost=804 Card=11 Bytes=957) :Q111030
05

2 1 SORT* (GROUP BY) (Cost=804 Card=11 Bytes=957) :Q111030
04

3 2 SORT* (GROUP BY) (Cost=804 Card=11 Bytes=957) :Q111030
03

4 3 HASH JOIN* (Cost=787 Card=50382 Bytes=4383234) :Q111030
03

5 4 TABLE ACCESS* (FULL) OF 'PER_ELEMENT_DIMENSION' (C :Q111030
ost=6 Card=55 Bytes=1045) 01

6 4 HASH JOIN* (Cost=780 Card=51205 Bytes=3481940) :Q111030
03

7 6 NESTED LOOPS* (Cost=65 Card=483127 Bytes=1739257 :Q111030
2) 02

8 7 TABLE ACCESS* (FULL) OF 'PER_TIME_MONTH_DIMENS :Q111030
ION' (Cost=6 Card=1 Bytes=13) 00

9 7 PARTITION RANGE* (ITERATOR) :Q111030
02

10 9 TABLE ACCESS* (FULL) OF 'PER_EMF_OMF_MONTH_D :Q111030
IMENSION' (Cost=59 Card=1 Bytes=23) 02

11 6 PARTITION RANGE* (ITERATOR) :Q111030
03

12 11 TABLE ACCESS* (FULL) OF 'PER_MONTH_NE2_FACT' ( :Q111030
Cost=585 Card=22227822 Bytes=711290304) 03



1 PARALLEL_TO_SERIAL SELECT /*+ CIV_GB */ A1.C0,A1.C1,A1.C2,A1.C3
,COUNT(DISTINCT SYS_OP_CSR(A1.C4,0))

2 PARALLEL_TO_PARALLEL SELECT /*+ TIV_GB */ A1.C0 C0,A1.C1 C1,A1.C2
C2,A1.C3 C3,SYS_OP_MSR(COUNT(DISTIN

3 PARALLEL_TO_PARALLEL SELECT /*+ PIV_GB */ A1.C1||' '||A1.C0 C0,A
1.C2 C1,A1.C5 C2,A1.C4 C3,SYS_OP_MSR

4 PARALLEL_COMBINED_WITH_PARENT
5 PARALLEL_FROM_SERIAL
6 PARALLEL_COMBINED_WITH_PARENT
7 PARALLEL_TO_PARALLEL SELECT /*+ ORDERED NO_EXPAND USE_NL(A2) */ A
1.C0 C0,A2.C0 C1,A2.C3 C2,A1.C1 C3,A

8 PARALLEL_FROM_SERIAL
9 PARALLEL_COMBINED_WITH_PARENT
10 PARALLEL_COMBINED_WITH_PARENT
11 PARALLEL_COMBINED_WITH_PARENT SELECT distinct TBL$OR$IDX$PART$NUM("PER_MON
TH_NE2_FACT", 0, d#, p#, "YYMM_ID")

12 PARALLEL_COMBINED_WITH_PARENT


Statistics
----------------------------------------------------------
10 recursive calls
16 db block gets
7655 consistent gets
15060 physical reads
0 redo size
1741 bytes sent via SQL*Net to client
282 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
78 sorts (memory)
0 sorts (disk)
27 rows processed

SQLPROD> sho sga

Total System Global Area 753190408 bytes
Fixed Size 730632 bytes
Variable Size 620756992 bytes
Database Buffers 131072000 bytes
Redo Buffers 630784 bytes

sort_area_size 8388608

sort_area_retained_size 2097152


db_cache_size 0

db_2k_cache_size 0

db_4k_cache_size 0

db_8k_cache_size 0

db_16k_cache_size 0

db_32k_cache_size 0

db_keep_cache_size 0

db_recycle_cache_size 0


sga_max_size 753190408

pre_page_sga FALSE

lock_sga FALSE

sga_target 0


//======================================================================================================================/


OK

A reader, March 10, 2006 - 2:36 am UTC

Hi Tom,
I have cached my widely used tables in the buffer cache,
as I intended to rely on logical I/O.
But data retrieval is still slow.

What can be the reasons behind that?

Could you please explain if you have time?

And also, when should one go for logical I/O versus physical I/O?

Please do reply.
Bye

Tom Kyte
March 10, 2006 - 12:13 pm UTC

er?

does
not
compute


because - there is no supporting information whatsoever.


slower than what? slower than 5? or 42?



Logical read and FTS

Suvendu, March 20, 2006 - 7:19 am UTC

Hi Tom,

Good Morning!!!
It’s a question regarding the database buffer cache, as follows:

Step 1:
A user issues: SELECT * FROM EMP WHERE DEPTNO=10, and the data relating to this condition is retrieved from disk into the buffer cache.
Step 2:
Another user issues: UPDATE EMP SET SAL=SAL+500 WHERE DEPTNO=20; the data relating to this condition is retrieved from disk into the buffer cache, and the old (consistent) image of the data goes into an undo block.
Step 3:
Another user issues: SELECT * FROM EMP; if the EMP table only has data for departments 10, 20 and 30, this will be a full table scan (FTS).

Now, the question is:
When Oracle retrieves data for the FTS of EMP, will it retrieve from disk ONLY the data relating to DEPTNO=30 (which is NOT in the buffer) while accessing the data for DEPTNO=10 and DEPTNO=20 from memory?
OR
Will it retrieve the full data from disk?


Please, correct me if I’m wrong and your explanation with example will be clearer.

Thanks a lot for taking my question.

Regards,
Suvendu


Tom Kyte
March 20, 2006 - 7:23 am UTC

if the query is a serial query - it'll use a mix of buffer cache and disk.

if the query is a parallel query - it'll checkpoint and then do direct IO on the disk.
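One way to observe the difference Tom describes is to compare the wait events for a serial and a parallel scan of the same table. This is a hedged sketch - the table name EMP and the degree of 4 are illustrative only:

```sql
-- Serial full scan: blocks come through the buffer cache, so a mix
-- of cache hits and 'db file scattered read' waits for the misses.
select * from emp;

-- Parallel full scan: the segment is checkpointed first, then the
-- PX slaves read past the cache with 'direct path read' waits.
select /*+ parallel(emp, 4) */ * from emp;
```

Tracing both statements with event 10046 would show the different wait events.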



One more question....

Suvendu, March 20, 2006 - 12:34 pm UTC

Hi Tom,

It is a new thing to learn.
As per my understanding, an Oracle Parallel Query (OPQ) operation takes place when:
1. The data segment is created with a parallelism option, i.e. ALTER TABLE t1 PARALLEL (DEGREE 8), and the session is enabled in parallel mode (ALTER SESSION ENABLE PARALLEL DML/DDL/QUERY).
OR
2. The database is configured with OPQ-enabling parameters like PARALLEL_SERVER, parallel_automatic_tuning etc. in a multi-CPU environment.
OR
3. Parallelism is forced through HINTS in a multi-CPU environment.

Correct me if I’m wrong.

According to your answer “if the query is a parallel query - it'll checkpoint and then do direct IO on the disk”, on execution of any parallel statement (QUERY|DDL|DML) under any of the above options there is always a checkpoint. Could you please redirect me to the documentation page where this is described? If you could provide an example, it would make the answer even more valuable.

Your answers inspire me to learn a lot.

Thanking you once again for answering me.

Regards,
Suvendu


Tom Kyte
March 22, 2006 - 1:15 pm UTC

for first part, read:

</code> http://docs.oracle.com/docs/cd/B19306_01/server.102/b14223/usingpe.htm#i1007196 <code>

for point two about the direct io and checkpoints, see metalink technical Note 237328.1

A reader, March 20, 2006 - 7:01 pm UTC

Tom,

Regarding the question Suvendu from Bangalore, INDIA asked so i performed a small test turning on 10046 event.

Session 1.
scott@TEST> select * from emp where deptno=10
/

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7839 KING PRESIDENT 17-NOV-81 5000 10
7934 MILLER CLERK 7782 23-JAN-82 1300 10
PARSING IN CURSOR #4 len=33 dep=0 uid=26 oct=3 lid=26 tim=41522641512 hv=1593706562 ad='674775bc'
select * from emp where deptno=10
END OF STMT
PARSE #4:c=490706,e=709894,p=22,cr=536,cu=0,mis=1,r=0,dep=0,og=1,tim=41522641503
BINDS #4:
EXEC #4:c=0,e=77,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=41522641666
WAIT #4: nam='SQL*Net message to client' ela= 5 p1=1111838976 p2=1 p3=0
WAIT #4: nam='db file sequential read' ela= 18829 p1=4 p2=132 p3=1
WAIT #4: nam='db file sequential read' ela= 10955 p1=4 p2=102 p3=1
FETCH #4:c=0,e=30994,p=2,cr=2,cu=0,mis=0,r=1,dep=0,og=1,tim=41522672779
WAIT #4: nam='SQL*Net message from client' ela= 569 p1=1111838976 p2=1 p3=0
WAIT #4: nam='SQL*Net message to client' ela= 2 p1=1111838976 p2=1 p3=0
FETCH #4:c=0,e=96,p=0,cr=2,cu=0,mis=0,r=2,dep=0,og=1,tim=41522673623
WAIT #4: nam='SQL*Net message from client' ela= 4463 p1=1111838976 p2=1 p3=0


--------------------------------------------
Session 2.
scott@TEST> update emp set sal=sal+500 where deptno=20;
5 rows updated.
--commit not given
=====================
PARSING IN CURSOR #1 len=42 dep=0 uid=26 oct=6 lid=26 tim=41703613359 hv=3408573792 ad='67353110'
update emp set sal=sal+500 where deptno=20
END OF STMT
PARSE #1:c=230331,e=291151,p=0,cr=21,cu=0,mis=1,r=0,dep=0,og=1,tim=41703613348
BINDS #1:
EXEC #1:c=10015,e=26047,p=0,cr=1,cu=3,mis=0,r=5,dep=0,og=1,tim=41703654802
WAIT #1: nam='SQL*Net message to client' ela= 7 p1=1111838976 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 336 p1=1111838976 p2=1 p3=0

----------------------------------------

Session 3

scott@TEST> select * from emp;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
......
.....
14 rows selected.

=====================
PARSING IN CURSOR #1 len=17 dep=0 uid=26 oct=3 lid=26 tim=41826363760 hv=1745700775 ad='673330c8'
select * from emp
END OF STMT
PARSE #1:c=0,e=1643,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=41826363751
BINDS #1:
EXEC #1:c=10014,e=2387,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=41826377446
WAIT #1: nam='SQL*Net message to client' ela= 8 p1=1111838976 p2=1 p3=0
WAIT #1: nam='db file sequential read' ela= 64935 p1=4 p2=99 p3=1
WAIT #1: nam='db file scattered read' ela= 369 p1=4 p2=100 p3=2
FETCH #1:c=0,e=71149,p=3,cr=6,cu=0,mis=0,r=1,dep=0,og=1,tim=41826453239
WAIT #1: nam='SQL*Net message from client' ela= 634 p1=1111838976 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 6 p1=1111838976 p2=1 p3=0
WAIT #1: nam='db file scattered read' ela= 411 p1=4 p2=103 p3=2
FETCH #1:c=0,e=5562,p=2,cr=3,cu=0,mis=0,r=13,dep=0,og=1,tim=41826464599
WAIT #1: nam='SQL*Net message from client' ela= 11067 p1=1111838976 p2=1 p3=0
=====================

Why did Oracle perform scattered and sequential reads?

Thanks.

Tom Kyte
March 22, 2006 - 1:38 pm UTC

you are missing the important data:

a) the query plans

I would guess "where deptno = ..." used an index - hence db file sequential read (single block IOs from index to table)

I would guess that "select * from emp" full scans - no index - hence db file scattered reads (multi-block IOs from table)

I would guess the update was just there to confuse :) since it seems not relevant.
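Tom's guesses can be checked directly, because the access path in the plan predicts the wait event. A minimal sketch, assuming the SCOTT schema plus an index on DEPTNO (which the default schema does not have):

```sql
set autotrace traceonly explain

-- INDEX RANGE SCAN + TABLE ACCESS BY INDEX ROWID: single-block
-- reads, i.e. 'db file sequential read' waits.
select * from emp where deptno = 10;

-- TABLE ACCESS FULL: multiblock reads,
-- i.e. 'db file scattered read' waits.
select * from emp;
```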

INSERTS

Ik, March 22, 2006 - 9:43 am UTC

Tom,

Why would an INSERT consume large amounts of logical reads?

Here's the TKPROF report

INSERT INTO TESTING ( PARTIITONED_COL, REG_COLUMN1... )
VALUES ( :B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 ,:B14 ,
:B15 ,:B16 ,:B17 ,:B18 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 91839 284.00 285.98 419 99144 2599849 91839
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 91840 284.00 285.98 419 99144 2599849 91839

Misses in library cache during parse: 1
Misses in library cache during execute: 3
Optimizer mode: CHOOSE
Parsing user id: 20 (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 435 0.03 1.37
buffer busy waits 1 0.00 0.00
control file sequential read 128 0.01 0.14
Data file init write 24 0.00 0.07
db file single write 8 0.00 0.00
control file parallel write 24 0.00 0.00
rdbms ipc reply 8 0.00 0.02
********************************************************************************

Please let us know.

Thanks,

Tom Kyte
March 22, 2006 - 4:12 pm UTC

Indexes.

take:

create table t
( a char(20),
b char(20),
c char(20),
d char(20)
)
/
alter session set sql_trace=true;
begin
for i in 1 .. 1000
loop
insert into t no_index values ( 'x', 'x', 'x', 'x' );
end loop;
execute immediate 'create index t_idx1 on t(a)';
for i in 1 .. 1000
loop
insert into t index1 values ( 'x', 'x', 'x', 'x' );
end loop;
execute immediate 'create index t_idx2 on t(a,b)';
for i in 1 .. 1000
loop
insert into t index2 values ( 'x', 'x', 'x', 'x' );
end loop;
execute immediate 'create index t_idx3 on t(a,b,c)';
for i in 1 .. 1000
loop
insert into t index3 values ( 'x', 'x', 'x', 'x' );
end loop;
execute immediate 'create index t_idx4 on t(a,b,c,d)';
for i in 1 .. 1000
loop
insert into t index4 values ( 'x', 'x', 'x', 'x' );
end loop;
end;
/




INSERT INTO T NO_INDEX VALUES ( 'x', 'x', 'x', 'x' )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.07 0.08 0 8 1157 1000
********************************************************************************
INSERT INTO T INDEX1 VALUES ( 'x', 'x', 'x', 'x' )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.08 0.10 4 62 3477 1000
********************************************************************************
INSERT INTO T INDEX2 VALUES ( 'x', 'x', 'x', 'x' )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.13 0.19 11 91 5652 1000
********************************************************************************
INSERT INTO T INDEX3 VALUES ( 'x', 'x', 'x', 'x' )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.22 0.16 12 179 8024 1000
********************************************************************************
INSERT INTO T INDEX4 VALUES ( 'x', 'x', 'x', 'x' )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.24 0.24 20 282 10807 1000


We have to read the structures and then modify them.

Reading is query mode gets.
Modifying involves the CURRENT mode gets.

They are done from the buffer cache - so they are logical IO's

How do we bring it down?

Ik, March 23, 2006 - 12:51 am UTC

Thanks Tom.

But, is there a way to bring it down, apart from disabling the indexes before the load?

Does it depend on the data inserted? I mean, if the data inserted is sequential, would the index maintenance be cheaper than if the data inserted is random?

Would having up-to-date stats on the indexes help here?

Thanks,


Tom Kyte
March 23, 2006 - 10:36 am UTC

well, look at your processing:

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 91839 284.00 285.98 419 99144 2599849 91839
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 91840 284.00 285.98 419 99144 2599849 91839

slow by slow - 91,839 Executes to insert 91,839 rows. slow by slow by slow processing. The worst way to do it.

Each insert took 0.003 cpu seconds (not bad eh), but if you do something "fast" over and over and over (and over and over), it ADDS UP.

You want this to "go better", to "do less work", bulk it up.

take the above example, rewrite as:

declare
type array is table of t.a%type index by binary_integer;
l_data1 array;
l_data2 array;
l_data3 array;
l_data4 array;
begin
for i in 1 .. 100
loop
l_data1(i) := 'x';
l_data2(i) := 'x';
l_data3(i) := 'x';
l_data4(i) := 'x';
end loop;

for i in 1 .. 10
loop
FORALL i in 1 .. l_data1.count
insert into t no_index values
( l_data1(i), l_data2(i), l_data3(i), l_data4(i) );
end loop;
execute immediate 'create index t_idx1 on t(a)';
for i in 1 .. 10
loop
FORALL i in 1 .. l_data1.count
insert into t index1 values
( l_data1(i), l_data2(i), l_data3(i), l_data4(i) );
end loop;
execute immediate 'create index t_idx2 on t(a,b)';
for i in 1 .. 10
loop
FORALL i in 1 .. l_data1.count
insert into t index2 values
( l_data1(i), l_data2(i), l_data3(i), l_data4(i) );
end loop;
execute immediate 'create index t_idx3 on t(a,b,c)';
for i in 1 .. 10
loop
FORALL i in 1 .. l_data1.count
insert into t index3 values
( l_data1(i), l_data2(i), l_data3(i), l_data4(i) );
end loop;
execute immediate 'create index t_idx4 on t(a,b,c,d)';
for i in 1 .. 10
loop
FORALL i in 1 .. l_data1.count
insert into t index4 values
( l_data1(i), l_data2(i), l_data3(i), l_data4(i) );
end loop;
end;
/


Array insert 100 rows at a time, compare the tkprofs (first total record is the NEW one, second one is the OLD one)




INSERT INTO T NO_INDEX VALUES ( :B1 , :B2 , :B3 , :B4 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 0.00 0.00 0 24 161 1000
total 1001 0.07 0.08 0 8 1157 1000
********************************************************************************
INSERT INTO T INDEX1 VALUES ( :B1 , :B2 , :B3 , :B4 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 0.02 0.02 5 63 383 1000
total 1001 0.08 0.10 4 62 3477 1000
********************************************************************************
INSERT INTO T INDEX2 VALUES ( :B1 , :B2 , :B3 , :B4 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 0.03 0.05 8 110 712 1000
total 1001 0.13 0.19 11 91 5652 1000
********************************************************************************
INSERT INTO T INDEX3 VALUES ( :B1 , :B2 , :B3 , :B4 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 0.04 0.05 7 169 1127 1000
total 1001 0.22 0.16 12 179 8024 1000
********************************************************************************
INSERT INTO T INDEX4 VALUES ( :B1 , :B2 , :B3 , :B4 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 0.07 0.05 5 197 1371 1000
total 1001 0.24 0.24 20 282 10807 1000


Lots less IO (bulk operations!!!! they rule), lots less CPU, lots more scalable



-

Ik, March 23, 2006 - 1:48 pm UTC

Thanks very much, Tom.

We had purposefully not chosen this as it was difficult to implement given the business needs. It had to be row-by-row processing.

I had done a dummy test using FORALL and it clearly is the best solution.

But, given that we cannot change the design at this point, is there any way out for us? (apart from disabling indexes)

Thanks again

Tom Kyte
March 23, 2006 - 2:02 pm UTC

...
We had purposefully not chosen this as it was difficult to implement given the
business needs. It had to be row-by-row processing.
.....

I doubt it - we are developers, we develop algorithms. We write programs. We take business requirements and make programs from them. We implement the programs any way we see fit. Bulk processing is pretty much ALWAYS possible.

changing the design - which I refer to as 'implementing correctly' - is likely the way to go. There aren't any "please_do_less_work_than_I_ask_you_to = true" parameters.

It is all about the implementation. Every time.

optimal number of LIO's for a query

Ajeet, April 25, 2006 - 8:21 am UTC

Hi Tom,

I have an OLTP system with tons of transactions. I am supposed to tune it (it is a third party app, but we can modify the queries etc. if required). Most of the queries are simple select/insert/update statements (updates are rare) and they always select/insert/update 1 row at a time, but the number of concurrent users is high.

My question is: if a query is doing 5 LIO's, how do I know whether 5 LIO's is OK? What's the best way to find out?

From statspack, when I see a query with the maximum buffer gets and gets/execution, I feel like reducing the LIO's -- but how can I be sure that these numbers should be reduced, or in other words that they are a bit high?

Any help would be a big learning.

Tom Kyte
April 25, 2006 - 10:14 am UTC

A query doing 5 LIO's sounds like a "keyed read"

select * from t where primary_key = :x;

with 3 or 4 IO's in the index and an IO against the table. The only thing that would reduce that would be to use a different physical structure like a hash cluster, but that is sort of a big change for a third party application maybe.
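For reference, a single-table hash cluster of the kind Tom mentions lets a keyed lookup hash straight to the data block, skipping the index root/branch/leaf gets. A hedged sketch - the names and the HASHKEYS/SIZE figures are illustrative and would need sizing against the real data:

```sql
-- The cluster key hashes directly to a block, so a lookup by PK
-- can cost 1 LIO instead of 4-5 via a B*Tree index.
create cluster t_hash_cluster ( pk number )
  size 100 single table hashkeys 100000;

create table t
( pk   number primary key,
  data varchar2(100)
) cluster t_hash_cluster ( pk );
```

The trade-offs (fixed up-front sizing, poorer full-scan performance) are part of why this is a big change for a third party application.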

yes. LIO's per execution are less

Ajeet, April 25, 2006 - 10:37 am UTC

Tom,

Thanks. True, most of my queries return 1 row or insert 1 row, but the number of concurrent selects/inserts is very high as we are doing a benchmark test.

So even if a query does just 5 or 10 LIO's, the total number of gets for that query is very large, as that query is getting executed say 250,0000 times or so. A snippet is below. Sure, I can change the table to a hash cluster if I know that 10 LIO's is high (not optimal) for this query - so that is what I wanted to know: is there a way, an algorithm, to know this?

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
2,350,341 287,923 8.2 13.2 27.54 27.58 2762571655
Module: lisrvr-uniserver@lnibmp51 (TNS V1-V3)
select mesg ,lchg_user_id ,TO_CHAR(lchg_time,'DD-MM-YYYY HH24:MI
:SS') ,rcre_user_id ,TO_CHAR(rcre_time,'DD-MM-YYYY HH24:MI:SS')
,tran_id ,TO_CHAR(tran_date,'DD-MM-YYYY HH24:MI:SS') ,NVL(ts_cnt
,0) ,sol_id ,contra_acid ,TO_CHAR(tran_amt) ,TO_CHAR(value_date,

Also, below is the header information of my statspack report for 1 run of the benchmark.

STATSPACK report for

Database DB Id Instance Inst Num Startup Time Release RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
729973733 FIN91BML 1 22-Apr-06 14:54 10.2.0.1.0 NO

Host Name: lnibmp52 Num CPUs: 8 Phys Memory (MB): 20,480
~~~~

Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- -------------------
Begin Snap: 187 24-Apr-06 15:32:49 58 23.3
End Snap: 189 24-Apr-06 15:40:07 58 24.1
Elapsed: 7.30 (mins)

Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 584M Std Block Size: 8K
Shared Pool Size: 288M Log Buffer: 2,052K

Load Profile Per Second Per Transaction
~~~~~~~~~~~~ --------------- ---------------
Redo size: 2,225,939.05 7,176.17
Logical reads: 40,590.14 130.86
Block changes: 10,631.63 34.28
Physical reads: 13.32 0.04
Physical writes: 160.45 0.52
User calls: 10,527.63 33.94
Parses: 623.49 2.01
Hard parses: 0.01 0.00
Logons: 0.00 0.00
Executes: 9,250.04 29.82
Transactions: 310.18

% Blocks changed per Read: 26.19 Recursive Call %: 6.21
Rollback per transaction %: 0.00 Rows per Sort: 509.17

Instance Efficiency Percentages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.95 Redo NoWait %: 100.00
Buffer Hit %: 99.97 In-memory Sort %: 100.00
Library Hit %: 100.00 Soft Parse %: 100.00
Execute to Parse %: 93.26 Latch Hit %: 99.64
Parse CPU to Parse Elapsd %: 2.19 % Non-Parse CPU: 99.89

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 92.97 92.94
% SQL with executions>1: 12.67 12.72
% Memory for SQL w/exec>1: 37.00 37.11


Thanks again
Ajeet

Tom Kyte
April 25, 2006 - 10:57 am UTC

no way to tell, we cannot see the actual query from beginning to end.

excellent soft parse % and execute to parse

how to measure logical reads

Antonio Vargas, June 23, 2006 - 4:53 am UTC

Regarding logical reads, I've been toying with a query to try to measure the logical read cost of the queries used on a production database (mainly OLTP, but it also has to support big queries for big reports):


SELECT B.* FROM(
SELECT A.*, ROUND(AVG(MED) OVER (), 2) MED_AVG FROM(
SELECT WWW.ADDRESS || '-' || TO_CHAR(WWW.PIECE, '0000') ADDR,
WWW.SQL_TEXT TEXT,
QQQ.EXECUTIONS EXE,
QQQ.BUFFER_GETS GETS,
ROUND(QQQ.BUFFER_GETS / DECODE(QQQ.EXECUTIONS,0,1,QQQ.EXECUTIONS), 2) MED
FROM V$SQL QQQ, V$SQLTEXT WWW
WHERE QQQ.ADDRESS = WWW.ADDRESS
AND QQQ.HASH_VALUE = WWW.HASH_VALUE
AND QQQ.EXECUTIONS >= 2
) A ) B
WHERE MED >= MED_AVG
ORDER BY EXE DESC, MED DESC, ADDR ASC


Varying the 2 to 10 or 20 produces very different results. Am I on the right path to finding the most consuming queries?


Tom Kyte
June 23, 2006 - 10:18 am UTC

what is your definition of "most consuming queries"

to me, it could just be "IO/executions"

consistent gets

Jdam, June 28, 2006 - 9:16 pm UTC

Tom,

If I have a query, how can I know whether its consistent gets are OK or not?

Thanks

Tom Kyte
June 29, 2006 - 7:12 am UTC

by understanding what is involved mostly. By understanding the physical structures available to you, the methods you can interact with the database, that LIO reduction is "in general" one of the goals.

for example

o using an IOT or cluster can radically decrease LIO (but could have other negative side effects as well)

o increasing the array fetch size may radically decrease LIO (or not, depends on how the query is processed)


There is no formula you can apply.

One thing you can do is look for "high LIO per execute" queries and mentally verify they are OK. Do they appear to be doing too much work based on how often they are executed and the amount of data inspected and ultimately returned?


You cannot use a formula like 'if it returns N rows from M tables it should take X LIOs' - since aggregates, sorts, clustering of data all affect it.
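The array fetch size point is easy to demonstrate in SQL*Plus. A hedged sketch (ALL_OBJECTS is just a convenient wide row source; exact get counts vary by version and data):

```sql
set autotrace traceonly statistics

-- Small array size: each block is revisited across many fetch
-- calls, inflating 'consistent gets'.
set arraysize 2
select * from all_objects;

-- Larger array size: far fewer fetch calls, so far fewer
-- consistent gets for the identical result set.
set arraysize 100
select * from all_objects;
```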

amazing amount of physical reads

A Reader, July 12, 2006 - 10:30 am UTC

Tom,

I have a SQL statement from a third party application,
which I traced as below:

INSERT INTO TESTTAB1
NOLOGGING
(wr_konf_nr, wr_igr_nr,
wr_typ, wr_wert_char,
wr_wert_number,wr_wert_datum)
SELECT new_konf_nr, wr_igr_nr,
wr_typ, DECODE(:b2,wr_igr_nr,'False',wr_wert_char),
wr_wert_number, wr_wert_datum
FROM TESTTAB1, TESTTAB2
WHERE wr_konf_nr = old_konf_nr AND
konf_copy_nr = :b1

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 52.71 1098.85 800504 366191 3165596 383856
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 52.71 1098.85 800504 366191 3165596 383856


The blocksize is 8kb on this 9i database.

The total size of the segment TESTTAB1 is around 900MB
and the size of TESTTAB2 is much smaller (few MB's).

I checked the execution plan and it does an index scan on
TESTTAB1 (index size is around 700MB).

I see that the physical reads are 800,504 blocks, which when converted equals ~6 GB!

I'm surprised that it's reading 6 GB of data for this SQL. Does that sound correct?

Thanks

Tom Kyte
July 12, 2006 - 5:02 pm UTC

could be, entirely a function of - well, lots of stuff.

can we see it with statistics level set to all? (the row source stuff in the tkprof - it'll show you exactly what step is doing the physical IO's)

probably shouldn't be using indexes at all....

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154 <code>

LIO Correlation to Statspack

Terry, November 17, 2006 - 8:49 am UTC

Tom,
When reviewing a Statspack report we see the physical reads, writes, and logical IO very clearly for the sample period. Can we logically assume that, if there are high-LIO queries involved, the logical IO number shown in Statspack may contain a significant physical read component? The reason I ask is that in our case we are seeing high LIO (approaching full saturation of the HBA), and there are many 10-digit-LIO queries in the sample period.

Tom Kyte
November 19, 2006 - 3:51 pm UTC

a physical IO pretty much turns into a logical IO.

a logical IO is pretty much a cache read.

A high LIO query may or may not be increasing PIO.

A PIO turns into an LIO.
An LIO does not imply PIO.

latch vs lock,

A reader, December 04, 2006 - 5:31 am UTC

When I try to update a record in a table, it acquires a row-level lock. Is this lock enforced in memory or on disk?

I understand that a latch is held on the buffer cache chain to prevent someone modifying it in the buffer cache. The block which contains my row (that I am updating) will be in some hash chain and a latch will be applied to that chain.

If some other user process tries to update a different record, and that row belongs to the same block as the previous record, will the second user process have to wait until the latch is released in the buffer cache? Now there will be two row locks (on disk or in memory?)

Which will happen first? a latch in the buffer cache or a lock?

Thanks,


Tom Kyte
December 04, 2006 - 7:27 am UTC

row locks like that are an attribute of the data, the "lock" is on the block itself. You can have 10,000,000,000 row locks in Oracle without consuming 'ram'

latches are locking/serialization devices used to prevent concurrent modifications to shared memory structures. They are held for EXTREMELY SHORT periods of time. The cache buffers chains latch is held only while we are traversing the list of blocks, for example - when we find the block, we give up that latch. Latches are not held for the transaction - only for extremely short operations. Their goal is to protect those shared data structures in memory.
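To see the serialization those latch gets imply, you can watch the cache buffers chains latch in V$LATCH (a sketch; the view and latch name are real, but what counts as "too many" misses is entirely workload dependent):

```sql
-- gets   = times the latch was requested
-- misses = requests that did not get it on the first try
-- sleeps = times a process had to sleep waiting for it
select name, gets, misses, sleeps
  from v$latch
 where name = 'cache buffers chains';
```

Reducing LIO reduces the gets, which in turn reduces the opportunity for misses and sleeps under concurrency.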

A reader

Reader, March 21, 2007 - 10:07 am UTC

Tom,

This is the autotrace stats

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=121)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T2217' (Cost=3 Card=1 By
tes=121)

2 1 INDEX (UNIQUE SCAN) OF 'IT2217' (UNIQUE) (Cost=2 Card=1)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
76710 consistent gets
65951 physical reads
0 redo size
632 bytes sent via SQL*Net to client
230 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
0 rows processed

select num_rows, chain_cnt, LAST_ANALYZED from all_tables where table_name = 'T2217';

NUM_ROWS CHAIN_CNT LAST_ANAL
---------- ---------- ---------
929415 38860 21-MAR-07

select count(1) from t2217;

COUNT(1)
----------
932866

The table has 98 columns. 12 columns have more than varchar2(255) but not every column has value.

What is the reason for such high physical reads (65,951)? The consistent gets are also high.

What could be done?



Tom Kyte
March 21, 2007 - 11:25 am UTC

umm, why don't we get to see the query?!?!?

the plan does not seem to match with the IO performed.

an index unique scan would do 2-5 IO's and then 1-2 more against the table.

something else is happening here.

here is the query

Reader, March 21, 2007 - 12:19 pm UTC

Tom,

SELECT T2217.C1,T2217.C1,C655500119,C600000010,C655630001,C655630000,C655630070,C655630312,C4,C536870920, C600000013,C600000138,C655631066,C655630166,C655632505,C7,C655620007,T2217.C1
FROM T2217
WHERE
(
(
(
(
0 = 1
AND T2217.C655630000 > 0
AND T2217.C600000138 IS NULL
AND T2217.C600000010 IS NULL
AND (T2217.C655500119 IS NULL OR T2217.C655500119 > 0)
AND T2217.C7 != 4
AND T2217.C7 != 1
AND (T2217.C655620007 = 4 OR T2217.C655620007 = 1)
)
OR
(
0 != 0
AND T2217.C655630000 > 0
AND T2217.C600000010 IS NULL
AND 0 != 0
AND (T2217.C655500119 IS NULL OR T2217.C655500119 > 0)
)
)
OR
(
0 = 0
AND T2217.C655630000 > 0
AND T2217.C1 = 'BOM000000931263'
AND T2217.C7 != 1
AND (T2217.C655620007 = 4 OR T2217.C655620007 = 1)
)
)
OR
(
0 = 2
AND T2217.C655630000 > 0
AND (T2217.C655632466 = ' ' OR T2217.C4 = ' ')
AND T2217.C7 != 1
AND (T2217.C655620007 = 4 OR T2217.C655620007 = 1)
)
)
ORDER BY 18 DESC

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=125)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T2217' (Cost=3 Card=1 By
tes=125)

2 1 INDEX (UNIQUE SCAN) OF 'IT2217' (UNIQUE) (Cost=2 Card=1)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
104546 consistent gets
43812 physical reads
0 redo size
844 bytes sent via SQL*Net to client
273 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

I wanted to clarify whether there will be any performance issue because there are unwanted conditions (like 0=2 ... 0!=0 ...) in the WHERE clause. I posted this as a separate question.

I am asking because when I removed the unwanted conditions, the query ran in less than 3 seconds.

Here is the modified query.

1 SELECT T2217.C1,T2217.C1,C655500119,C600000010,C655630001,C655630000,C655630070,C655630312,C4,
2 C600000013,C600000138,C655631066,C655630166,C655632505,C7,C655620007,T2217.C1
3 FROM T2217
4 WHERE
5 0 = 0
6 AND T2217.C655630000 > 0
7 AND T2217.C1 = 'BOM000000931263'
8 AND T2217.C7 != 1
9 AND (T2217.C655620007 = 4 OR T2217.C655620007 = 1)
10* ORDER BY 18 DESC
SQL> /

Elapsed: 00:00:02.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=119)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T2217' (Cost=3 Card=1 By
tes=119)

2 1 INDEX (UNIQUE SCAN) OF 'IT2217' (UNIQUE) (Cost=2 Card=1)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
844 bytes sent via SQL*Net to client
273 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Note: Sorry for asking this question twice
Tom Kyte
March 21, 2007 - 12:27 pm UTC

just verify yourself that the two are the same - just a problem in logic. No need for us to sit down and rip apart your predicates for you????


that first plan does not seem to match your query.

Why ???

Reader, March 21, 2007 - 12:49 pm UTC

First of all, why do you think I would send you a plan which does not match the query? I want my problem to be solved.

Secondly, I did not ask you to go through the WHERE clause. My question was/is: does it really affect performance if there are conditions in the WHERE clause which never execute?

I think you could have answered it with Yes/No/Can't say/Will not reply.




Tom Kyte
March 21, 2007 - 1:28 pm UTC

laugh out loud. alter session set sense_of_entitlement = true;


you did ask us - if those two where clauses are the same - why would you post both? You would use - well - the one that "goes better"

I said that - because that query and that plan, posted at two entirely different times, do not appear to go together.

"prove it" as I like to say.

show us a tkprof with that big query (and yeah, the 0!=0 and so on make those bits "not relevant"), that includes that plan, and shows us doing that much IO.

Dude, Calm Down

Venkat K, March 21, 2007 - 1:21 pm UTC


Should I worry or ignore?

Mark, March 27, 2007 - 10:55 am UTC

In the first query I have forced Oracle to use an index; as a result, consistent gets are 44. In the second query the consistent gets are quite high. The difference in time taken is very small.

Should I worry too much, or ignore it and let Oracle decide?

Indexes are created on actlinkId in both tables. This query is used frequently, and it's one of the top 5 queries in Statspack under "SQL ordered by Gets".

I am not using the hint in the actual query. I am on Oracle 9iR2.


SQL> SELECT /*+ index(a) */ a.actlinkId,groupId ,b.schemaId
2 FROM actlink_group_ids a,
3 actlink_mapping b
4 WHERE a.actlinkId=b.actlinkId
5 AND b.schemaId=3757
6 ORDER BY 1
7 ;

101 rows selected.

Elapsed: 00:00:02.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=74 Card=31 Bytes=403
)

1 0 SORT (ORDER BY) (Cost=74 Card=31 Bytes=403)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'ACTLINK_GROUP_IDS' (Co
st=2 Card=1 Bytes=6)

3 2 NESTED LOOPS (Cost=54 Card=31 Bytes=403)
4 3 INDEX (RANGE SCAN) OF 'ACTLINK_MAPPING_IND' (UNIQUE)
(Cost=2 Card=26 Bytes=182)

5 3 INDEX (RANGE SCAN) OF 'ACTLINK_GROUP_IDS_IND' (NON-U
NIQUE) (Cost=1 Card=1)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
44 consistent gets
0 physical reads
0 redo size
1518 bytes sent via SQL*Net to client
313 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
101 rows processed




SQL> SELECT a.actlinkId,groupId ,b.schemaId
2 FROM actlink_group_ids a,
3 actlink_mapping b
4 WHERE a.actlinkId=b.actlinkId
5 AND b.schemaId=3757
6 ORDER BY 1;

101 rows selected.

Elapsed: 00:00:01.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=29 Card=31 Bytes=403
)

1 0 SORT (ORDER BY) (Cost=29 Card=31 Bytes=403)
2 1 NESTED LOOPS (Cost=9 Card=31 Bytes=403)
3 2 TABLE ACCESS (FULL) OF 'ACTLINK_GROUP_IDS' (Cost=9 Car
d=22122 Bytes=132732)

4 2 INDEX (UNIQUE SCAN) OF 'ACTLINK_MAPPING_IND' (UNIQUE)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
22176 consistent gets
0 physical reads
0 redo size
1518 bytes sent via SQL*Net to client
313 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
101 rows processed

Tom Kyte
March 27, 2007 - 11:58 am UTC

well, i cannot really tell what indexes are on what columns

and - does this hold true for all/most of the values you plug in for schemaid or is schemaid a heavily skewed column

Consistent Gets

Mark, March 27, 2007 - 12:16 pm UTC

ACTLINK_GROUP_IDS has 2 columns and both have normal index.

ACTLINK_MAPPING has 3 columns. Out of 3, SchemaId and ActlinkId column are indexed. They are normal index.

None of the tables have composite index.

select schemaid, count(1)
from actlink_mapping
group by schemaid

The above query returned 850 rows. The total number of rows in the actlink_mapping table is 22,000.

The ACTLINK_GROUP_IDS table also has 22,600 rows.


Tom Kyte
March 28, 2007 - 11:01 am UTC

and - does this hold true for all/most of the values you plug in for schemaid or is schemaid a heavily skewed column

you did not address that. so there are 850 rows - what are the min count(*)'s (count(1) - why do people count ones, count(*) is what you mean to use) and max count(*)'s

if that column is SKEWED, that is something we need to understand.


and knowing the names of the indexes and the columns they are on would be useful. so the columns are indexed, we still cannot tell what indexes on which columns are being used above
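If SCHEMAID does turn out to be skewed, one thing to look at (a sketch, assuming the standard DBMS_STATS interface; the bucket count of 254 is just the classic maximum, not a recommendation) is a histogram on that column, so the optimizer can cost each literal value differently:

```sql
begin
   dbms_stats.gather_table_stats(
      ownname    => user,
      tabname    => 'ACTLINK_MAPPING',
      method_opt => 'for columns schemaid size 254',
      cascade    => true );
end;
/
```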

Thanks

Mark, March 28, 2007 - 12:23 pm UTC

Yes, schemaid is a heavily skewed column.

select schemaid, count(*) ctr
from actlink_mapping
group by schemaid

The count(*) varies from 1 to 550. Min count(*) is 1 and maximum is 550.

select ctr, count(ctr)
from
(
select schemaid, count(*) ctr
from actlink_mapping
group by schemaid
) group by ctr

CTR COUNT(CTR)

1 80
2 79
3 45
4 42
5 36
6 39
7 32
8 29
9 56
10 19
11 25
12 31
13 17
14 24
15 18
16 14
17 9
18 12
19 10
20 7
21 5
22 7
23 2
24 3
25 5
26 6
27 7
28 5
29 1
30 4
31 1
32 1
33 3
34 4
35 5
36 1
37 7
38 1
39 4
40 3
41 1
42 2
43 1
44 3
45 3
46 1
47 1
49 2
50 1
51 2
52 4
54 1
55 1
56 4
57 2
58 1
59 2
60 5
62 6
63 4
64 1
65 2
66 3
67 4
68 1
69 1
70 1
71 3
72 2
74 2
75 1
76 3
78 2
79 1
80 1
81 2
83 1
84 1
85 2
86 1
87 2
90 1
91 1
94 1
95 1
96 1
99 1
102 1
105 1
110 1
121 1
122 1
130 1
131 2
136 1
141 1
143 1
145 1
146 1
147 1
159 2
160 1
162 1
163 1
172 1
173 1
174 1
177 1
201 1
203 1
204 1
219 1
226 1
230 1
232 1
240 1
243 1
246 1
264 1
266 1
277 1
296 1
355 1
408 1
417 1
458 1
502 1
550 1

The ACTLINK_GROUP_IDS table has ACTLINKID column which is indexed (ACTLINK_GROUP_IDS_IND)

The ACTLINK_MAPPING table has SCHEMAID, OBJECTID, ACTLINKID columns.
SCHEMAID and ACTLINKID form a composite index (ACTLINK_MAPPING_ID). In my previous post I mentioned by mistake that it is not a composite index. Sorry for that.

Hope I have given sufficient information.




consistent read clone

Rajarshi, December 12, 2007 - 4:23 pm UTC

I am trying to research consistent read clone in 10g, is there a way to track this from in-memory v$ views ?
I really appreciate your help if you can point me to any detail documentation or discussions in this forum.
Thanks !
Rajarshi
Tom Kyte
December 13, 2007 - 9:37 am UTC

define "track", what are you looking for exactly, you can see that there are multiple versions in the cache via v$bh.

URL to hotsos.com paper

Cary Millsap, December 17, 2007 - 10:44 am UTC

Note that the URL for the hotsos.com article that Tom refers to in this article is now http://www.hotsos.com/e-library/abstract.php?id=7

Physical Reads

A reader, September 27, 2008 - 2:37 am UTC

What are some of the possible reasons that physical and logical reads stay approximately the same after repeatedly running the same query over and over again, assuming that the db buffer cache is sufficiently large to accommodate all the blocks required by the SQL and there are no other processes running on the system?

Sorry I couldn't come up with a test case at the moment, but I've seen this scenario somewhere.
Tom Kyte
September 27, 2008 - 11:40 am UTC

full scans of large tables.

we tend to not cache them in order to not blow out the cache.

Physical Reads

A reader, September 27, 2008 - 3:30 pm UTC

Yes, but... I thought that in order for Oracle to read data, the data must be brought into memory from disk if it is not already there. In the case of a full table scan, the blocks would be placed on the least recently used end of the buffer instead of the most recently used end, causing them to be flushed out sooner. If there are no processes running other than my SQL statement (which invokes a full table scan), when I run the same SQL again, wouldn't the blocks already be in the cache?
Tom Kyte
September 28, 2008 - 9:43 am UTC

we overwrite them - we do not want to flush out everything in the cache. We use and reuse the blocks we just read into. That is the entire point. It doesn't matter if we are the only user - the goal is to not make the cache devoid of the other blocks that are in the cache.

A full scan should not obliterate the cache.

Carry, June 05, 2009 - 6:10 am UTC

Hi Tom,

I cannot find the document "are you still using cache hit ratios" on the web site http://www.hotsos.com/


Where can I download it? Thanks in advance.

Regards,

Carry
Tom Kyte
June 08, 2009 - 12:14 pm UTC

Question about LIOs

pranay, July 09, 2009 - 3:07 pm UTC

Hi Tom,
I am an avid follower of your website and treat you as my idol. I have been following this website for the past year (I am a junior DBA). A few days ago I came across this discussion, which said that LIO should mostly be the focus rather than PIO (PIOs will take care of themselves).
This brought me to hotsos.com, where I got two papers by Cary Millsap: "Why a 99%+ Database Buffer Cache Hit Ratio is Not Ok" and "Why You Should Focus on LIOs Instead of PIOs".
But I wanted to know how I can actually achieve this - I mean, how can I actually reduce the LIOs in my query? What steps should be undertaken for this? Can you please present some test cases regarding this?
Tom Kyte
July 14, 2009 - 4:43 pm UTC

You have to sort of imagine the work the server must perform in order to retrieve your data.

eg: you have to understand what happens, how the query is processed, how it could be processed and what limits the physical organization of the data on disk might impose on you

Take this example:


ops$tkyte%ORA10GR2> create table documents
  2  ( username varchar2(30),
  3    docname  varchar2(30),
  4    metadata varchar2(1000) default rpad('x',1000,'x')
  5  )
  6  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2      for i in 1 .. 100
  3      loop
  4          for x in (select username from all_users)
  5          loop
  6              for j in 1 .. 5
  7              loop
  8                  insert into documents
  9                  (username, docname)
 10                  values
 11                  ( x.username||case when j=1 then null else j end, x.username || i );
 12              end loop;
 13          end loop;
 14      end loop;
 15  end;
 16  /

PL/SQL procedure successfully completed.

<b>each user has 100 documents.  Note how I loaded though, note the order of the loops.  Since just like you would not create all 100 of your documents on a single day - I do not load all 100 of your documents in order.  I load ONE of your documents - then others from other people.  Then I load your second document and so on. 

So, the first document of yours is on block 1, the second is on SOME OTHER block (block 1 will be full by the time you create your second document).  This mimics how the data would arrive in real life.</b>

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'DOCUMENTS' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly;
ops$tkyte%ORA10GR2> set termout off
ops$tkyte%ORA10GR2> select username, docname, substr(metadata,1,10)
  2    from documents
  3   where username = USER;

100 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2677483334

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   100 |    99K|   211   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| DOCUMENTS |   100 |    99K|   211   (1)| 00:00:02 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("USERNAME"=USER@!)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2965  consistent gets
          0  physical reads
          0  redo size
       2275  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

<b>Now, I go to retrieve your 100 documents.  Since we did not index or anything, it needs a full scan.

Enter "what would we do to reduce the logical IO's".

Options include (BUT ARE NOT LIMITED TO):

a) use compression, create the table compressed.
b) partition the data, range partition by username - if you partition into 100 equal partitions, we could reduce the IO to 1% of what it was before
c) create an index on some fields

I like option C for right now, let's try that and see what happens:</b>


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table documents add constraint t_pk primary key(username,docname);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set termout off
ops$tkyte%ORA10GR2> select username, docname, substr(metadata,1,10)
  2    from documents
  3   where username = USER;

100 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2201894078

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   100 |    99K|   102   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DOCUMENTS |   100 |    99K|   102   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_PK      |   100 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("USERNAME"=USER@!)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        104  consistent gets
          0  physical reads
          0  redo size
       3265  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

<b>Pretty good - we are doing about 3.5% of the IO's now - not too bad. 

But, we are not happy yet.  We understand OUR data (we do, if we care to - I care to, do you?).  We understand how it is used (we retrieve by username frequently).  We understand how it is organized (arrives randomly by username over time).  We know that therefore if we have 100 documents to retrieve, we have to do AT LEAST 100 IO's (since our 100 documents will be scattered all over).

What could we do about that?

Options include (but are not limited to)

a) partitioning again, fewer users in a partition increases the odds that two or more of my documents could be on the same block

b) use a b*tree cluster with a cluster key of username
c) use a hash cluster with same key
d) use a sorted hash cluster with hash key of username and sort key of document name maybe
e) use an Index organized table (IOT) - remove the need for the primary key index altogether, store the data in a single segment, store it such that all of the rows for a given user are near each other


let's try (e)</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table new_documents
  2  (username,docname,metadata,constraint t_iot_pk primary key(username,docname))
  3  organization index
  4  as
  5  select * from documents;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table documents;

Table dropped.

ops$tkyte%ORA10GR2> rename new_documents to documents;

Table renamed.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'DOCUMENTS' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set termout off
ops$tkyte%ORA10GR2> select username, docname, substr(metadata,1,10)
  2    from documents
  3   where username = USER;

100 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3941613593

-----------------------------------------------------------------------------
| Id  | Operation        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |          |   100 |    99K|    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T_IOT_PK |   100 |    99K|    18   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("USERNAME"=USER@!)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       3265  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed




there, now we have about 0.6% of the original logical IO's. The maximum possible physical IO here is 20 blocks (it was 2,965 before). We are more efficient with our buffer cache (when I run my query, I only use what I need of the cache; I don't put in there a lot of stuff others might not actually need). We removed the need to hip hop from index to table over and over. We stored data that is queried together - together on disk.


And it all took "being able to close your eyes and envision what the server actually has to do/could do to put your data back together"

Excellent

A reader, July 14, 2009 - 4:54 pm UTC

Great example. This shows why it is so important to know the many features available in the database. Not everything should be in heap organized tables.

There's an excellent book on these matters written by a somewhat known guy ;) the title is: Expert Oracle Database Architecture.

Check what's available and test, test, test!


A reader, July 15, 2009 - 3:41 pm UTC

Once again an awesome explanation
you are great TOM!!!

Version 10.2.0.2

Alexander, September 04, 2009 - 2:21 pm UTC

This has got to be a rookie question but I'm going to ask it anyway. I don't understand this, came across it while tuning a query for a developer:

SQL> alter system flush shared_pool;

System altered.

SQL> set autotrace traceonly
SQL> select distinct ad.pin,
  2     ad.last_name,
  3     ad.first_name,
  4     ad.office_number,
  5     ad.department_number,
  6     to_date('09/20/2009', 'mm/dd/yyyy'),
  7     ad.lti_tracking,
  8     hrf.job_title,
  9     hrf.email_address
 10  from dct_alignment_detail_t ad, dct_hr_feed_t hrf
 11  where ad.pin = hrf.pin
 12  and ad.lti_tracking like '%Zero Touch Installation%';

1114 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 774132996

-------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                        |    75 |  7575 |  3998   (4)| 00:00:48 |
|   1 |  SORT UNIQUE                 |                        |    75 |  7575 |  3998   (4)| 00:00:48 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DCT_HR_FEED_T          |     1 |    64 |     2   (0)| 00:00:01 |
|   3 |    NESTED LOOPS              |                        |    75 |  7575 |  3997   (4)| 00:00:48 |
|*  4 |     TABLE ACCESS FULL        | DCT_ALIGNMENT_DETAIL_T |    75 |  2775 |  3869   (4)| 00:00:47 |
|*  5 |     INDEX RANGE SCAN         | DCT_HR_FEED_T_X1       |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("AD"."LTI_TRACKING" LIKE '%Zero Touch Installation%')
   5 - access("AD"."PIN"="HRF"."PIN")


Statistics
----------------------------------------------------------
       1536  recursive calls
          0  db block gets
      16540  consistent gets
          <b>0  physical reads</b>
          0  redo size
     119144  bytes sent via SQL*Net to client
       1283  bytes received via SQL*Net from client
         76  SQL*Net roundtrips to/from client
         75  sorts (memory)
          0  sorts (disk)
       1114  rows processed


No PIOs when I flushed the buffer cache?
Tom Kyte
September 04, 2009 - 4:01 pm UTC

you didn't flush the buffer cache, you flushed the shared pool (the shared sql area)

alter system flush buffer_cache;

SQL Tuning

A reader, September 05, 2009 - 12:54 pm UTC

excellent

A reader, September 05, 2009 - 12:58 pm UTC

Tom:

In your last example above, how is Oracle getting the data? It does not seem to be doing any work.

You either get it from the buffer cache (logical) or disk (physical), and both are close to zero?


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
20 consistent gets
0 physical reads
0 redo size
3265 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

2. When you do SQL tuning, is your primary goal reducing the time it takes to run, or minimizing the logical and physical IO, or what? Can you list your criteria?
Tom Kyte
September 06, 2009 - 9:32 am UTC

what do you think consistent gets are? they are consistent reads (logical IO's) from the buffer cache.




and you do not always get it from the buffer cache or disk anymore, you might get the answer from the shared pool (result set caching - 11gr1)

2) I generally start with reducing logical IO, as logical IO consumes latching (contention) and might need physical IO (slow) and consumes CPU. In GENERAL (STRESS IN GENERAL - SAY IN GENERAL, SAY USUALLY many times over and over again), if you reduce the logical IO, you reduce the resources necessary - cpu, latching, physical IO
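The 11gR1 result caching mentioned above is worth a look when the underlying data is mostly read-only (a sketch; RESULT_CACHE is a real 11g hint and EMP here is just the familiar demo table - whether it helps depends entirely on how often the data changes, since any modification invalidates the cached result):

```sql
-- first execution computes the answer and caches the result set
-- in the shared pool; identical later executions can return it
-- with essentially zero logical IO against EMP
select /*+ result_cache */ deptno, count(*)
  from emp
 group by deptno;
```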

"db block gets" -> current block -> dirty read?

Duke Ganote, November 05, 2009 - 9:52 am UTC

Your article here
https://asktom.oracle.com/Misc/oramag/on-the-explicit-size-and-complex.html
says among other things
"db block gets" => "Current mode blocks are retrieved as they exist right now, not in a consistent read fashion."

That means, internally of course, Oracle may choose to do "dirty reads" for its own purposes?
-- Duke
Tom Kyte
November 11, 2009 - 12:53 pm UTC

No, it is not a dirty read.

A dirty read would look at uncommitted data. We are getting the block to get a row as it exists right now to modify it (that row). If that row has uncommitted changes - we obviously cannot modify that row - so the current mode get would have to wait until that row we are interested in is committed.


reducing logical ios

A reader, December 15, 2009 - 9:43 am UTC

Tom,

You demonstrate the effects of ARRAYSIZE on LIOs above, can you tell if ARRAYSIZE would affect the following;

select xmlagg(xmlelement(deptno,
                         (select xmlagg(xmlelement(empno))
                          from   emp
                          where  emp.deptno = dept.deptno
                         ),
                         (select xmlagg(xmlelement(manager))
                          from   dept_mgrs mgr
                          where  mgr.deptno = dept.deptno
                         )
             )
from   dept
where  deptno in (30,40,50)


We've recorded significant difference in "DB FILE SEQUENTIAL READ" time between EMP and DEPT_MGRS. Now, there are say 100 employees for each DEPT above, but only 2 dept managers for each DEPT.

What we noticed was that instead of roughly 50 times the amount of IO, reading EMP would take, say, 300 or more times longer in "DB FILE SEQUENTIAL READ" than reading DEPT_MGRS.

Just wonder if there is any way to diagnose what the delay is (we thought ARRAYSIZE could be a factor). EMP and DEPT_MGRS both have index on DEPTNO and are assigned to the same tablespace.

The bit of SQL resides in a PL/SQL stored procedure, if arraysize is a factor, can we change it to work for PL/SQL?
Tom Kyte
December 15, 2009 - 10:16 am UTC

did you measure the raw NUMBER of physical IO's against each.

My guess would be

a) index on emp(deptno) is large - because emp is large.
b) index on dept_mgrs(deptno) is small - because dept_mgrs is small

hypothesis: most, if not all, of dept_mgrs index is in cache. Some, but not most nor all of emp index is in the cache. Conclusion: we do more IO's against the emp index.

hypothesis: same thing about the tables.

Look at the RAW IO COUNTS - how many physical IOs did you do.

Since you are waiting on physical IO, reducing logical IO isn't going to do much in this case - you need to do the physical IO regardless in this case.

But yes, I could see a SMALL reduction in logical IO's against the DEPT table (only the DEPT table - not the other two) if you upped the array size.
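The arraysize effect on the driving query can be observed in SQL*Plus; a sketch, assuming autotrace is available:

```sql
set autotrace traceonly statistics

set arraysize 15       -- the SQL*Plus default
select * from dept;    -- note the "consistent gets" figure

set arraysize 100
select * from dept;    -- fewer consistent gets for the same rows,
                       -- since each fetch call revisits fewer blocks
```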

A reader, December 16, 2009 - 2:42 am UTC

Thanks Tom.

TKPROF was showing "DB FILE SEQUENTIAL READ" as 95% of the total elapsed time, so I did a grep on the trace file for "READ" and "WAIT" events, imported these into a temp table and compared the wait times for EMP and DEPT_MGRS. The longest wait was on EMP, then its index on DEPTNO, followed by DEPT_MGRS and its DEPTNO index.

How would I measure the raw number of physical IO's? Is it in the row source operation as indicated by "r=" (in our trace, we don't have "r=", but we have "pr=")?

DEPT_MGRS is actually almost an identical copy of EMP (don't ask about the design, that's another issue), so the sizes of the tables are similar, but the DEPTNO index on DEPT_MGRS is smaller than EMP's.

It just seems a bit strange: although I understand there would be more physical IO (approx 50 times as much), the wait events would indicate a whole lot more is being done on EMP.


Tom Kyte
December 16, 2009 - 7:26 am UTC

post the tkprof. R is for read as well.


Not only is the index smaller, but the number of blocks needed for that table to be in the cache is likely much smaller. The emp table is the universe of all employees - without anything else, if you query a large set of deptnos, you'll need that entire table. The dept_mgrs is a small universe of values with lots of extra junk in it that doesn't represent manager data - so a smaller working set of blocks need be present in the cache.


Just what I was looking for.....

Robert, January 22, 2010 - 2:21 pm UTC

Tom,

I am trying to clarify understanding of clustering factor, I/O, and PGA memory as Oracle processes a query.
From reading the above discussions are these statements true?.....

(1)
Oracle can basically keep only 1 index block and 1 table block in PGA memory at any given time.
(2)
For index scans Oracle does LIO from SGA on index block into PGA memory and begins sequentially reading the table/row entries in that block.
The Oracle process will do LIO from SGA on first row/table block in the index block and load that table block into PGA.
From within PGA memory the Oracle process will continue to read the rows from that table block and do no more LIO on the table until one of the following
a. arraysize limit is reached
b. next index entry in index block points to a different table block

Are the above 2 statements correct, more or less?

Thanks a lot,

Robert.

Tom Kyte
January 25, 2010 - 5:46 am UTC

conceptually it would be OK that way, conceptually. That is what you would observe to be happening in general.

sometimes we read the index, collect the rowids, sort the rowids and then access the table too - to try to 'improve' the clustering factor


But you need to add in client calls

eg:

open c for select * from t where indexed_column = x;
loop 
   fetch c into ...;
   exit when c%notfound;
end loop;
close c;


every leaf block in the index would be read and re-read for every fetched row.


versus

 
for r in (select * from t where indexed_column = x)
loop
   null;
end loop;


every leaf block would be read ONCE - maybe twice - instead of once per row.

We only keep the block for the duration of the call; the implicit cursor FOR loop implicitly array fetches (10g and above) 100 rows at a time, hence would use the index more efficiently than the slow-by-slow code.

Most Very Most Useful!

Robert, January 25, 2010 - 11:07 am UTC

Thank you very much Tom!

A reader, February 11, 2010 - 4:33 am UTC

SQL Trace displays consistent gets based on array size. So if we are executing the query in SQL*Plus, its consistent gets may be different from what it actually does when the front-end application calls the procedure containing this query and returns a ref cursor.
Tom Kyte
February 16, 2010 - 9:37 am UTC

... Sql Trace displays consistent gets based on array size. ...

more accurate to say "sql trace displays consistent gets based on actual observed consistent gets" and that consistent gets may be affected (or not) by your array fetch size....


... So if we are executing the query in SQL*Plus, its consistent gets may be different from what it actually does when the front-end application calls the procedure containing this query and returns a ref cursor...


of course.

gets

Sam, February 16, 2010 - 10:37 am UTC

Tom:

I always test query performance using SQL*Plus which I believe uses array size of 15 by default.

Those queries are always called by other clients such as browser (web) or client/sever app (power builder).

How do you simulate the same environment? For web you call a stored procedure, so do you set the array size in PL/SQL?
Tom Kyte
February 17, 2010 - 8:07 am UTC

... which I believe uses array size
of 15 by default.
...

correct



... by other clients such as browser...

tell me, how is a web browser a client of the database? It isn't - something in the middle is. If that is plsql, you would either be fetching 100 rows at a time (for x in (select ...) will array fetch 100 rows at a time in 10g and above), or one row at a time (if you use an explicit cursor and just fetch), or N rows - where N is decided by you - if you fetch bulk collect. If you use java, it'll be 10 by default - but many change that, and so on....

Power builder, you control that as well - as far as I know.



You do not "set" an arraysize in plsql - you either

a) use implicit array fetching:

for x in (select * from all_objects) loop
   ...   -- 10g and above: array fetches 100 rows at a time
end loop;

b) single row fetching (bad idea)

open c for select * from all_objects;
loop
   fetch c into record;
   exit when c%notfound;
end loop;
close c;



c) bulk collect


open c for select * from all_objects;
loop
   fetch c bulk collect into array_of_record;  -- no LIMIT: fetches all rows in one call
   -- process array
   exit when c%notfound;
end loop;
close c;


d) select into BULK COLLECT





I would suggest using sql_trace and tkprof to see what really happens - use sqlplus "casually" to test out an idea or two, but use sql_trace and tkprof with the REAL PROGRAM to see what really happens.
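One way to trace "the REAL PROGRAM" as suggested is dbms_monitor (10g and above); a sketch for tracing your own session - the trace file name shown is a placeholder:

```sql
exec dbms_monitor.session_trace_enable(waits => true, binds => false);

-- ... run the real workload here ...

exec dbms_monitor.session_trace_disable;

-- then format the generated trace file on the server, e.g.:
--   tkprof orcl_ora_1234.trc report.txt sys=no
```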

full scan

A reader, May 12, 2010 - 9:58 am UTC

Hi Tom,

When I executed the below query (in 9iR2), we full scanned the table the first time. I executed the query a second time immediately (within one minute), but I could not see a decrease of even 10 in physical I/O.

I thought there would be at least a few blocks located in the buffer cache, which would reduce the physical I/O - why is that not happening?

Are the buffers overwritten during the full scan?

Can you please clarify this?

2. To find whether a value exists in a table or not - one of our developers suggests always using 'count(*)'. Is there a better way to implement this?

Many thanks 

SQL> select count(*) from lr_prov_orderheader where ordercomplete='0' and ptt_accountno='123646654';


  COUNT(*)
----------
         0


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'LR_PROV_ORDERHEADER'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      31274  consistent gets
      31263  physical reads
          0  redo size
        211  bytes sent via SQL*Net to client
        230  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(*) from lr_prov_orderheader where ordercomplete='0' and ptt_accountno='123646654';


  COUNT(*)
----------
         0


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'LR_PROV_ORDERHEADER'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      31274  consistent gets
      31262  physical reads
          0  redo size
        211  bytes sent via SQL*Net to client
        230  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Tom Kyte
May 24, 2010 - 7:48 pm UTC

when we full scan something, we decide if it is either

a) a short table
b) a long table (big)

if it is (a) we just do it normal - blocks are read into the cache and cached just like we would for any single block io.

If the table is (b), then we would not want that table to in effect FLUSH the buffer cache - so as we read a section of the table, we tend to immediately overwrite what we just read for that table with the next read of that table - so the data ages out immediately - we use a very small portion of the buffer cache.

And current releases of the database tend to use direct IO for full scans even if you do a serial (non-parallel) query - which means we basically bypass the buffer cache altogether as it would slow down the retrieval (if you have 32,000 blocks to get out of the cache, you have to do 32,000 individual lookups - we SCATTER them in the cache - better just to go to disk and read into your PGA and avoid the cache altogether).

A reader, May 25, 2010 - 4:30 am UTC

Thanks Tom for explaining the full scans. Could you please suggest a better way for the below?

2. To confirm whether a value/record exists in a table or not - one of our developers suggests always using 'count(*)'. Is there a better solution?

select count(*) into v_count from lr_prov_orderheader
 where account_no = '12346';
if v_count > 0 then
   -- do something here
else
   -- do something else here ...
end if;

As per my knowledge, with count(*) we cannot predict the exact cost/plan while executing, so I want to know if there is a better way to do this.

Tom Kyte
May 25, 2010 - 7:35 am UTC

why not:


attempt to do something here.

if (attempt to do something resulted in nothing being done )
then
  do something here
end if;



eg:

update t set x = ? where y = ?;
if (sql%rowcount = 0)
then 
    insert into t values ( .... );
end if;


if you mean to update existing or insert new if no existing record is found. Or even:

merge into t 
using ( select ?,?,? from dual ) x
on (t.y = x.y)
when matched then update set ...
when not matched then insert ...



I hate count(*), developers that do that drive me *nuts*.


count(*), immediately after counting, might change (there are other users after all). So, even if you count 0, there might be one when you get there. Even if you count 1, there might be zero when you get there. It is pointless, error prone (it is a bug mostly), it wastes resources.


Just

a) try to do something
b) and upon finding nothing to do, do something else.

logical reads vs cache buffer reads

a reader, October 15, 2010 - 2:36 pm UTC

You said logical reads = cache buffer reads, and I agree, but this doesn't seem true in the AWR report. Under the "IOStat by Function Summary" section there are buffer cache reads in units of bytes, and there are total logical reads under the "Segments by Logical Reads" section - I assume each read is a block. Shouldn't the number of logical reads x 8k (if the block size is 8k) equal the buffer cache reads?

But they are not equal in AWR - not even close. Does that mean they are not what you meant here?
Tom Kyte
October 15, 2010 - 3:40 pm UTC

well, I don't know where I ever talked about "cache buffer reads"...


A logical read is a read from the cache.

Not all reads from the cache are categorized as logical IOs. There are others - such as db block gets - that are reads from the cache as well.

what are other types of reads which are not logical IO

Ajeet ojha, October 18, 2010 - 1:07 am UTC

Hi Tom

Why are db block gets not logical IO, even though they are read from the cache? Can you please explain a bit?
Also, what are the other reads from the cache which are not considered logical IO, and why?

Regards
Tom Kyte
October 25, 2010 - 8:44 am UTC

they are logical IO's, they are just tracked as db block gets - they are blocks gotten from the cache in current mode, not read consistent mode. You need to get the current version of a block to update it for example.

They are *all* considered logical IO's, they are just reported on by different names.
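The different names Tom mentions can be watched for your own session via v$mystat; a sketch, assuming select privileges on the v$ views:

```sql
select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where ms.statistic# = sn.statistic#
   and sn.name in ('db block gets', 'consistent gets',
                   'session logical reads');
-- 'session logical reads' is the total: db block gets + consistent gets
```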

logical reads vs buffer cache reads

a reader, October 18, 2010 - 9:18 am UTC

Hi Tom,

Logical reads = db block gets + consistent gets; logical reads are reads from the buffer cache, and that includes db block gets. That's what I understand - are you saying that's not true?

I kind of think that there might be something else in buffer cache read not counted as logical read, though I don't know what it is, or there is no such thing.


Another thing that I want to confirm: even with the same number of logical reads, the underlying I/O could still be different. For example, for a query run against a heap-organized table vs the same table created as an index-organized table with the same PK index, I imagine the logical reads would be somewhat similar because the query still reads the same number of rows, but the underlying IO for the index-organized table could be far less because it doesn't have to get the index and then get the table block. Do I understand this correctly?

Thanks
Tom Kyte
October 25, 2010 - 9:02 am UTC

Another thing that I want to confirm is even with same number of logical read,
underneath I/O still could be different. For example, a query runs against a
heap organized table vs same table created as index organized table with same
PK index, I imagine logical reads would be somewhat similar because the query
still reads same number of rows, but underneath IO for index organized could be
far less because it doesn't have to get index and then get table block. Do I
understand this correctly?


How would one query needing less logical IO than another imply/infer that the underlying IO's would be "different". No, it would not be different - it would just be that ONE does LESS IO than the other. That is all.

There are many classifications of logical IO's - db block gets ARE a sort of logical IO.

consistent gets data

A reader, December 24, 2010 - 5:56 am UTC

Hi Tom,

I have the following different scenarios for consistent gets, but I couldn't understand how we are getting the consistent gets.

In all the scenarios I execute the select statement and run the traces in one session; the analyze/commit is done in a different session.

In all the below cases, why didn't I get any db block reads?

db block read=PIO?
consistent gets= LIO?

1. table created but not analyzed

SQL>  select *from test2;

no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'TEST2'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        127  bytes sent via SQL*Net to client
        235  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

2. table analyzed but no row inserted.


no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)
   1    0   TABLE ACCESS (FULL) OF 'TEST2' (Cost=2 Card=1 Bytes=13)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        134  bytes sent via SQL*Net to client
        235  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

3. one row inserted but not commited.

no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)
   1    0   TABLE ACCESS (FULL) OF 'TEST2' (Cost=2 Card=1 Bytes=13)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        134  bytes sent via SQL*Net to client
        235  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

4. rows commited.


         A
----------
         2


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13)
   1    0   TABLE ACCESS (FULL) OF 'TEST2' (Cost=2 Card=1 Bytes=13)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        206  bytes sent via SQL*Net to client
        276  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

5. table analyzed


         A
----------
         2


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=2)
   1    0   TABLE ACCESS (FULL) OF 'TEST2' (Cost=3 Card=1 Bytes=2)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        207  bytes sent via SQL*Net to client
        276  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

in case 1 >
I have created only the table, and as I have not analyzed it, I can't see the empty/data block counts in the dba_tables view. But how did Oracle do 3 LIOs for the select statement?

in case 2 >
I analyzed the table and then executed the select statement. In this case also I got 3 LIOs. Why didn't it differ? As I understand it, after analyze Oracle will have all the statistics of the table, so the LIOs might differ?

in case 3 >

I inserted one row but did not analyze. Then Oracle shows the LIO value 17 after the select. How did this number increase from 3 to 17 in one go?

in case 4 >

I committed the data and executed the select statement. Then I found LIO 15. Why is this LIO lowered to 15 from 17?

in case 5 >

I analyzed the table and executed the statement. The LIO is the same as in case 4. Why didn't it differ?

Thanks in advance.

Tom Kyte
December 24, 2010 - 12:28 pm UTC

... db block read=PIO?
...

no, db block gets are CURRENT MODE gets.

.... In all the below cases why i didn't get any db block read.
...

because you didn't do any modifications - no need for current mode reads.


case1: the segment header was read to determine how many blocks to read... in this case - none, the table was empty. All three reads were the segment header.

case2: the plan didn't change so I don't know why you would think the IO's would? Nothing about the physical table changed.

case3: the table, for the first time, has data - you probably are using automatic segment space management which spreads inserts out in order to maximize concurrency. so, you advanced the "low high water mark" of the table and we had to scan the blocks under the low high water mark. Also, we had to read undo in order to rollback the changes. I didn't see 17 IO's and I didn't see any insert - so I assume a cut and paste issue on your part and I assume the insert was done in another session.

case4: no more rollback reads.

consistent gets

Bhaskar, December 24, 2010 - 6:06 am UTC

Hi Tom,

In the above scenarios, why are the physical reads 0 and why do the redo size values change?
Thanks in advance.
Tom Kyte
December 24, 2010 - 12:30 pm UTC

physical reads are zero because........ drum roll...... we didn't need to have any physical io's, everything was in the buffer cache.

i didn't see any redo size changes, they were all zero.

Direct Reads

Rajeshwaran, Jeyabal, June 07, 2011 - 12:57 am UTC

Tom:

I am reading from Oracle product documentation, concepts guide.

http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/cncptdba.htm#CHDBFDCB

<quote>
Direct writes operate on blocks above the high water mark and write directly to disk, bypassing the database buffer cache. Direct reads read directly from disk into the PGA, again bypassing the buffer cache
</quote>


We know that direct write is possible using

1) APPEND hint in sql statments
2) direct=true in sql loader.

Questions
1) What do these direct reads mean? Do they mean physical IO?

2) How do you define a direct read in SQL statements? Can you please give me an example?

3) When will these direct reads be really useful?
Tom Kyte
June 07, 2011 - 2:01 pm UTC

1) yes, they are physical IO

2) you do not 'define' a direct read in a sql statement, they just 'happen'. Parallel query for example - very often does direct reads. In parallel query, the thought is "the buffer cache is too small to hold everything - we are doing big stuff. Checking the buffer cache for every block we want to read would be slow as well - since most of the blocks we want wouldn't be there. Therefore - we'll checkpoint before running the query and get all of the blocks onto disk and then do direct reads from disk into our parallel execution servers' pga and process the data there - bypassing the buffer cache".

direct reads can happen for other things - like reading from temp. Also, in current releases - we might choose to use a direct read for a non-parallel full scan as well.

3) see above.

Direct Reads

Rajeshwaran, Jeyabal, June 07, 2011 - 4:22 pm UTC

The parallel query (direct path read) took only 3 IO's to complete.

select /*+ parallel(big_table,8) */ owner,object_type,count(*)
from big_table
group by owner,object_type
order by owner,object_type

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.02          0          0          0           0
Execute      1      0.07       0.99          0          3          0           0
Fetch        3      0.01       6.34          0          0          0         214
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.10       7.35          0          3          0         214

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61  

Rows     Row Source Operation
-------  ---------------------------------------------------
    214  PX COORDINATOR  (cr=3 pr=0 pw=0 time=7234283 us)
      0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us)
      0    SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)
      0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0      PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
      0       SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)
      0        PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)
      0         TABLE ACCESS FULL BIG_TABLE (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       4        0.00          0.00
  SQL*Net message from client                     4        0.43          0.55
  reliable message                                1        0.00          0.00
  enq: KO - fast object checkpoint                1        0.08          0.08
  os thread startup                              18        0.02          0.24
  PX Deq: Join ACK                                2        0.00          0.00
  PX Deq: Parse Reply                             4        0.48          0.48
  PX Deq: Execute Reply                         154        0.28          6.12
  PX qref latch                                   9        0.00          0.00
  PX Deq: Table Q qref                            5        0.00          0.00
  SQL*Net more data to client                     1        0.00          0.00
  PX Deq: Signal ACK                             12        0.10          0.22
********************************************************************************


But the non-parallel query took 21K IO's.

select owner,object_type,count(*)
from big_table
group by owner,object_type
order by owner,object_type

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.81       0.81          0      21946          0         214
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.81       0.81          0      21946          0         214

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61  

Rows     Row Source Operation
-------  ---------------------------------------------------
    214  SORT GROUP BY (cr=21946 pr=0 pw=0 time=817622 us)
1596672   TABLE ACCESS FULL BIG_TABLE (cr=21946 pr=0 pw=0 time=3193398 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3        0.17          0.25
  SQL*Net more data to client                     1        0.00          0.00
********************************************************************************


Questions

1) "we'll checkpoint before running the query and get all of the blocks onto disk and then do direct reads from disk into our parallel execution servers pga" - does this correspond to the "enq: KO - fast object checkpoint" wait event of the parallel query?

2) "Parallel query for example - very often does direct reads - bypassing the buffer cache" - which portion of the TKPROF output represents this? How do I know I am bypassing the buffer cache?
Tom Kyte
June 07, 2011 - 4:45 pm UTC

no it didn't take only 3 IO's, that is the trace of the query coordinator. You forgot to look at the other 8 traces for the parallel execution servers.


1) yes

2) you'd need to look at the trace files associated with the parallel execution servers themselves. You can use statspack/AWR at the system level to observe the direct reads statistic increasing.
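At the system level, a before/after sample of v$sysstat shows the direct reads; a sketch:

```sql
select name, value
  from v$sysstat
 where name in ('physical reads', 'physical reads direct');

-- run the parallel query, then re-run the select above; the delta in
-- "physical reads direct" is IO that bypassed the buffer cache
```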

Direct Reads

Rajeshwaran, Jeyabal, June 07, 2011 - 10:26 pm UTC

Tom:

As you said, the direct reads statistic increases for a parallel query. Here is what I see in the AWR report.

Parallel Query took 22K buffer gets (Logical IO) and 21K Disk reads (Physical IO)

SQL ID: 0wk3r40kq7xuw                 DB/Inst: ORA10GR2/ora10gr2  Snaps: 90-91
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs within the snapshot range
-> select /*+ parallel(big_table,8) */ owner,object_type,count(*) from bi...

    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   1556513508                 29,452             1            91             91
          -------------------------------------------------------------


Plan 1(PHV: 1556513508)
-----------------------

Plan Statistics                       DB/Inst: ORA10GR2/ora10gr2  Snaps: 90-91
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                            29,452       29,452.4    60.3
CPU Time (ms)                                   342          342.1     5.3
Executions                                        1            N/A     N/A
Buffer Gets                                  22,288       22,288.0     7.9
Disk Reads                                   21,934       21,934.0    49.1
Parse Calls                                      17           17.0     0.3
Rows                                            214          214.0     N/A
User I/O Wait Time (ms)                      22,431            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                       86            N/A     N/A
Concurrency Wait Time (ms)                      270            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                     1            N/A     N/A
Sharable Mem(KB)                                 12            N/A     N/A
          -------------------------------------------------------------

Execution Plan
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |       |       |   714 (100)|          |        |      |            |
|   1 |  PX COORDINATOR          |           |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001  |   368 |  5520 |   714   (6)| 00:00:09 |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT GROUP BY         |           |   368 |  5520 |   714   (6)| 00:00:09 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |           |   368 |  5520 |   714   (6)| 00:00:09 |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE       | :TQ10000  |   368 |  5520 |   714   (6)| 00:00:09 |  Q1,00 | P->P | RANGE      |
|   6 |       SORT GROUP BY      |           |   368 |  5520 |   714   (6)| 00:00:09 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |           |  1589K|    22M|   682   (2)| 00:00:09 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| BIG_TABLE |  1589K|    22M|   682   (2)| 00:00:09 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------


The non-parallel query took 21K buffer gets (logical IO) and 7 disk reads (physical IO):


Plan 1(PHV: 3184991183)
-----------------------

Plan Statistics                       DB/Inst: ORA10GR2/ora10gr2  Snaps: 90-92
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100

Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                               758          758.1     0.8
CPU Time (ms)                                   704          703.9     2.5
Executions                                        1            N/A     N/A
Buffer Gets                                  21,946       21,946.0     1.6
Disk Reads                                        7            7.0     0.0
Parse Calls                                       1            1.0     0.0
Rows                                            214          214.0     N/A
User I/O Wait Time (ms)                          56            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                        0            N/A     N/A
Concurrency Wait Time (ms)                        0            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                     1            N/A     N/A
Sharable Mem(KB)                                  8            N/A     N/A
          -------------------------------------------------------------

Execution Plan
--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |       |  5203 (100)|          |
|   1 |  SORT GROUP BY     |           |   368 |  5520 |  5203   (7)| 00:01:03 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  1589K|    22M|  4940   (2)| 00:01:00 |
--------------------------------------------------------------------------------



Full SQL Text

SQL ID       SQL Text
------------ -----------------------------------------------------------------
96ycbfdk53pj select owner, object_type, count(*) from big_table group by
             owner, object_type order by owner, object_type


Questions

1) no it didn't take only 3 IO's, that is the trace of the query coordinator. You forgot to look at the other 8 traces for the parallel execution servers

I am using the alter statements below to enable trace in my session. How can I enable tracing for the other 8 parallel execution servers from the query coordinator session?

alter session set timed_statistics=true;
alter session set events '10046 trace name context forever, level 12';


2)Parallel query for example - very often does direct reads - bypassing the buffer cache

Tom, if the above statement is true, why am I getting 22,288 buffer gets (logical IO) for the parallel query?
Tom Kyte
June 08, 2011 - 10:30 am UTC

1) it is not very easy at all - not until 11g anyway with the sql monitoring that is newly available.

You'd have to turn trace on in the parallel execution servers - you can use dbms_monitor to enable tracing for another session. Don't forget to turn it off.
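A sketch of that approach - the sids, serial numbers, and coordinator sid below are made-up values. The rows in v$px_session only exist while the parallel execution servers are assigned, so in practice you would capture them from a second session while the parallel query is running:

```sql
-- Identify the parallel execution servers attached to the coordinator.
select sid, serial#
  from v$px_session
 where qcsid = 123;          -- hypothetical coordinator sid

-- Enable tracing (with waits) for one of them...
begin
   dbms_monitor.session_trace_enable(
      session_id => 145,     -- hypothetical PX server sid
      serial_num => 202,     -- and its serial#
      waits      => true,
      binds      => false );
end;
/

-- ...and remember to turn it off afterwards.
begin
   dbms_monitor.session_trace_disable(
      session_id => 145,
      serial_num => 202 );
end;
/
```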

2) check the AWR report for direct reads - did you see direct reads take place?

Direct Reads

Rajeshwaran, Jeyabal, June 08, 2011 - 10:58 am UTC

Tom:

I think I am getting this wrong here.

check the AWR report for direct reads, did you see direct reads take place

In the AWR report above for the parallel query (SQL ID: 0wk3r40kq7xuw), I don't see direct reads - I see only disk reads. Is there something I am doing wrong?
Tom Kyte
June 08, 2011 - 11:02 am UTC

I'm not asking you to look at a sql report, I'm asking you to look in the AWR or statspack report for any evidence that "direct reads" took place.


search for the words "direct reads"

Direct Reads

Rajeshwaran, Jeyabal, June 08, 2011 - 11:15 am UTC

I'm not asking you to look at a sql report, I'm asking you to look in the AWR or statspack report for any evidence that "direct reads" took place.

I am running awrsqlrpt.sql, available in $ORACLE_HOME/rdbms/admin.

Is there something else that has to be run to see these "direct reads", Tom?
Tom Kyte
June 08, 2011 - 12:13 pm UTC

@awrrpt

Direct Reads

Rajeshwaran, Jeyabal, June 08, 2011 - 3:22 pm UTC

Tom:

I am running the SQL below from SQL*Plus and, once done, calling awrrpt.sql:

exec dbms_workload_repository.create_snapshot();

select /*+ parallel(big_table,8) */ owner,object_type,count(*)
from big_table
group by owner,object_type
order by owner,object_type


Here is what I see in the AWR report. You can see that table scans (direct read) is zero:

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
rollback changes - undo records                   0            0.0           0.0
rollbacks only - consistent read                  4            1.3           0.5
rows fetched via callback                       367          122.4          45.9
session connect time                              0            0.0           0.0
session cursor cache hits                         6            2.0           0.8
session logical reads                         2,427          809.3         303.4
session uga memory max                    3,798,960    1,266,742.3     474,870.0
shared hash latch upgrades - no                  15            5.0           1.9
sorts (memory)                                   42           14.0           5.3
sorts (rows)                                  8,973        2,992.0       1,121.6
sql area evicted                                  0            0.0           0.0
sql area purged                                   0            0.0           0.0
switch current to new buffer                      0            0.0           0.0
table fetch by rowid                            535          178.4          66.9
table fetch continued row                         0            0.0           0.0
table scan blocks gotten                         20            6.7           2.5
table scan rows gotten                          595          198.4          74.4
table scans (direct read)                         0            0.0           0.0
table scans (long tables)                         0            0.0           0.0
table scans (rowid ranges)                        0            0.0           0.0
table scans (short tables)                        6            2.0           0.8
total number of times SMON poste                  0            0.0           0.0
transaction rollbacks                             0            0.0           0.0
undo change vector size                     144,700       48,249.4      18,087.5
user I/O wait time                                5            1.7           0.6
user calls                                        6            2.0           0.8
user commits                                      8            2.7           1.0
workarea executions - optimal                    36           12.0           4.5
          -------------------------------------------------------------



Tom Kyte
June 08, 2011 - 5:13 pm UTC

Well, it also shows no big table scans at all - so I am not sure what we are looking at.


Direct Reads

Rajeshwaran, Jeyabal, June 08, 2011 - 11:57 pm UTC

Tom:

I think I got it.

rajesh@ORA10GR2> select count(*) from big_table;

  COUNT(*)
----------
  12773888

Elapsed: 00:00:32.35
rajesh@ORA10GR2>
rajesh@ORA10GR2> exec show_space(user,'BIG_TABLE','TABLE');
l_total_blocks****************  188416
l_total_bytes*****************  1543503872
l_unused_blocks***************  1665
l_unused_bytes****************  13639680
l_last_used_extent_file_id****  6
l_last_used_extent_block_id***  91657
l_last_used_block*************  6527
l_unformatted_blocks**********  0
l_unformatted_bytes***********  0
l_fs1_blocks******************  0
l_fs1_bytes*******************  0
l_fs2_blocks******************  0
l_fs2_bytes*******************  0
l_fs3_blocks******************  0
l_fs3_bytes*******************  0
l_fs4_blocks******************  0
l_fs4_bytes*******************  0
l_full_blocks*****************  186083
l_full_bytes******************  1524391936

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.42
rajesh@ORA10GR2>


Then I ran the script below from SQL*Plus:

exec dbms_workload_repository.create_snapshot();

select /*+ parallel(big_table,8) */ owner,object_type,object_name
from big_table;


Now this is what I see in AWR:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
PX Deq Credit: send blkd            277,868      17,822     64   98.2      Other
direct path read                     23,911         176      7    1.0   User I/O
CPU time                                             54           0.3
db file scattered read               12,656          32      3    0.2   User I/O
db file sequential read              11,105          21      2    0.1   User I/O
          -------------------------------------------------------------

                                                                   Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
PX Deq Credit: send blkd            277,868     .0      17,822      64     624.4
direct path read                     23,911     .0         176       7      53.7
db file scattered read               12,656     .0          32       3      28.4
db file sequential read              11,105     .0          21       2      25.0
control file parallel write           3,286     .0           9       3       7.4
control file sequential read         11,338     .0           8       1      25.5
db file parallel write                1,578     .0           5       3       3.5
SQL*Net more data to client         166,611     .0           1       0     374.4
os thread startup                        57     .0           1      24       0.1
log file parallel write               2,492     .0           1       1       5.6
enq: KO - fast object checkp              1     .0           1     813       0.0
Data file init write                     16     .0           0      26       0.0
rdbms ipc reply                          76     .0           0       4       0.2
latch free                               19     .0           0      16       0.0
PX Deq: Signal ACK                        6   66.7           0      35       0.0
SQL*Net message to client            85,410     .0           0       0     191.9
log file sync                           115     .0           0       1       0.3
log file switch completion                2     .0           0      34       0.0
kksfbc child completion                   1  100.0           0      59       0.0
cursor: pin S wait on X                   3  100.0           0      15       0.0
log buffer space                         74     .0           0       0       0.2
read by other session                     4     .0           0       4       0.0
direct path write                        22     .0           0       0       0.0
log file sequential read                  2     .0           0       5       0.0
PX qref latch                         1,658   99.9           0       0       3.7
PX Deq: Table Q Get Keys                 20     .0           0       0       0.0
direct path write temp                   10     .0           0       1       0.0
enq: RO - fast object reuse              73     .0           0       0       0.2
reliable message                         75     .0           0       0       0.2
log file single write                     2     .0           0       1       0.0
LGWR wait for redo copy                 104     .0           0       0       0.2
library cache load lock                   2     .0           0       1       0.0
row cache lock                            8     .0           0       0       0.0
db file single write                      1     .0           0       1       0.0
PX Deq: Table Q qref                      6     .0           0       0       0.0
enq: PS - contention                      3     .0           0       0       0.0
cursor: mutex S                           1     .0           0       0       0.0
Streams AQ: qmn slave idle w            345     .0       9,611   27858       0.8
Streams AQ: qmn coordinator             691   50.1       9,611   13909       1.6
virtual circuit status                  321  100.0       9,583   29853       0.7
PX Idle Wait                          3,595   99.5       7,121    1981       8.1
SQL*Net message from client          85,410     .0       6,756      79     191.9
Streams AQ: waiting for time              4  100.0       2,405  601321       0.0
PX Deq: Execution Msg                   539   34.7         407     756       1.2
jobq slave wait                          40  100.0         120    3000       0.1
PX Deq: Execute Reply                   139     .0          27     196       0.3
PX Deq Credit: need buffer              159     .0           7      45       0.4
PL/SQL lock timer                         2   50.0           1     495       0.0
PX Deq: Parse Reply                      22     .0           0       1       0.0
PX Deq: Table Q Sample                   21     .0           0       0       0.0
PX Deq: Msg Fragment                      7     .0           0       0       0.0
PX Deq: Join ACK                         17     .0           0       0       0.0
class slave wait                         30     .0           0       0       0.1
PX Deq: Table Q Normal                   10     .0           0       0       0.0
          -------------------------------------------------------------



Now direct path read is listed in the Top 5 Timed Events and in the full event list.

When I removed the parallel hint from the query

exec dbms_workload_repository.create_snapshot();

select owner,object_type,object_name
from big_table;


I don't see the Direct Path read in AWR.

                                                                   Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
db file scattered read               12,446     .0         176      14     177.8
db file sequential read               2,261     .0           6       3      32.3
control file parallel write           1,391     .0           6       4      19.9
control file sequential read          4,872     .0           3       1      69.6
SQL*Net more data to client         160,000     .0           2       0   2,285.7
db file parallel write                  468     .0           2       3       6.7
log file parallel write                 360     .0           0       1       5.1
os thread startup                        13     .0           0      22       0.2
SQL*Net message to client            85,164     .0           0       0   1,216.6
latch free                                5     .0           0      19       0.1
log file sync                            14     .0           0       3       0.2
direct path write                         4     .0           0       6       0.1
LGWR wait for redo copy                  32     .0           0       0       0.5
virtual circuit status                  139  100.0       4,164   29957       2.0
Streams AQ: qmn slave idle w            149     .0       4,152   27866       2.1
Streams AQ: qmn coordinator             298   50.0       4,152   13933       4.3
SQL*Net message from client          85,164     .0       3,352      39   1,216.6
Streams AQ: waiting for time              1  100.0       1,202 #######       0.0
class slave wait                         13     .0           0       0       0.2
          -------------------------------------------------------------


Please correct me if I am wrong.
Tom Kyte
June 09, 2011 - 9:57 am UTC

you are now seeing the direct path reads taking place.
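Since the direct reads are performed by the parallel execution servers rather than the coordinator, one more way to confirm this is to watch their per-session counters while the query runs. A sketch (qcsid 123 is a made-up coordinator sid; statistic name assumed from v$statname):

```sql
-- While the parallel query is executing, sum the direct reads done
-- by its parallel execution servers.
select sum(ss.value) as direct_reads
  from v$px_session px, v$sesstat ss, v$statname sn
 where px.qcsid      = 123
   and ss.sid        = px.sid
   and ss.statistic# = sn.statistic#
   and sn.name       = 'physical reads direct';
```

Running the same query against v$mystat in the coordinator session would show almost nothing - which is exactly the point of looking at the servers.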

Reads in dbms_xplan

Prashant Shenava, September 19, 2011 - 3:38 am UTC

I have two dbms_xplan outputs pasted below; I consider the first BAD and the second GOOD because of the response times and reads. The queries used to generate these plans have only one difference: they run for different months (05/2011 vs 08/2011).

1) BAD - Takes 64.57 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes |
----------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:01:04.57 | 453K| 25576 | 252 |
| 1 | SORT GROUP BY NOSORT | | 1 | 1 | 1 |00:01:04.57 | 453K| 25576 | 252 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 128K|00:01:04.90 | 453K| 25576 | 252 |
| 3 | HASH GROUP BY | | 1 | 1 | 128K|00:01:04.38 | 453K| 25576 | 252 |
| 4 | NESTED LOOPS | | 1 | | 128K|00:01:02.30 | 453K| 25324 | 0 |
| 5 | NESTED LOOPS | | 1 | 1 | 144K|00:00:50.93 | 312K| 25324 | 0 |
| 6 | NESTED LOOPS | | 1 | 1 | 144K|00:00:48.79 | 168K| 25324 | 0 |
| 7 | NESTED LOOPS | | 1 | 1 | 144K|00:00:46.22 | 168K| 25324 | 0 |
| 8 | NESTED LOOPS | | 1 | 1 | 146K|00:00:41.69 | 168K| 25324 | 0 |
|* 9 | TABLE ACCESS BY INDEX ROWID | TIME | 1 | 1 | 28 |00:00:00.01 | 11 | 0 | 0 |
|* 10 | INDEX RANGE SCAN | IND_PE01_TIME | 1 | 1 | 28 |00:00:00.01 | 2 | 0 | 0 |
|* 11 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUPPLIER_PAYMENT_SCHEDULE | 28 | 29 | 146K|00:00:54.45 | 168K| 25324 | 0 |
|* 12 | INDEX RANGE SCAN | IND03_SUPP_PAYMT_SCHED | 28 | 2961 | 196K|00:00:00.52 | 212 | 157 | 0 |
|* 13 | INDEX RANGE SCAN | IND_01_BOOKING_CODE | 146K| 1 | 144K|00:00:01.08 | 4 | 0 | 0 |
|* 14 | TABLE ACCESS BY INDEX ROWID | LEDGER | 144K| 1 | 144K|00:00:01.81 | 7 | 0 | 0 |
|* 15 | INDEX UNIQUE SCAN | PK_LEDGER | 144K| 1 | 144K|00:00:00.73 | 5 | 0 | 0 |
|* 16 | INDEX RANGE SCAN | IND_01_SUPPLIER | 144K| 1 | 144K|00:00:01.41 | 144K| 0 | 0 |
|* 17 | TABLE ACCESS BY INDEX ROWID | SUPPLIER | 144K| 1 | 128K|00:00:00.95 | 140K| 0 | 0 |
----------------------------------------------------------------------------------------------------------------------------------------------------

2) GOOD - Takes 11.34 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes |
----------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:11.34 | 426K| 710 | 231 |
| 1 | SORT GROUP BY NOSORT | | 1 | 1 | 1 |00:00:11.34 | 426K| 710 | 231 |
| 2 | VIEW | VW_DAG_0 | 1 | 1 | 119K|00:00:11.48 | 426K| 710 | 231 |
| 3 | HASH GROUP BY | | 1 | 1 | 119K|00:00:11.20 | 426K| 710 | 231 |
| 4 | NESTED LOOPS | | 1 | | 119K|00:00:14.03 | 426K| 479 | 0 |
| 5 | NESTED LOOPS | | 1 | 1 | 135K|00:00:08.18 | 295K| 479 | 0 |
| 6 | NESTED LOOPS | | 1 | 1 | 135K|00:00:06.54 | 159K| 479 | 0 |
| 7 | NESTED LOOPS | | 1 | 1 | 135K|00:00:04.44 | 159K| 479 | 0 |
| 8 | NESTED LOOPS | | 1 | 1 | 136K|00:00:07.42 | 159K| 479 | 0 |
|* 9 | TABLE ACCESS BY INDEX ROWID | TIME | 1 | 1 | 28 |00:00:00.01 | 19 | 0 | 0 |
|* 10 | INDEX RANGE SCAN | IND_PE01_TIME | 1 | 1 | 28 |00:00:00.01 | 2 | 0 | 0 |
|* 11 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUPPLIER_PAYMENT_SCHEDULE | 28 | 29 | 136K|00:00:05.61 | 159K| 479 | 0 |
|* 12 | INDEX RANGE SCAN | IND03_SUPP_PAYMT_SCHED | 28 | 2961 | 187K|00:00:00.32 | 205 | 148 | 0 |
|* 13 | INDEX RANGE SCAN | IND_01_BOOKING_CODE | 136K| 1 | 135K|00:00:00.80 | 3 | 0 | 0 |
|* 14 | TABLE ACCESS BY INDEX ROWID | LEDGER | 135K| 1 | 135K|00:00:01.44 | 7 | 0 | 0 |
|* 15 | INDEX UNIQUE SCAN | PK_LEDGER | 135K| 1 | 135K|00:00:00.57 | 5 | 0 | 0 |
|* 16 | INDEX RANGE SCAN | IND_01_SUPPLIER | 135K| 1 | 135K|00:00:01.02 | 135K| 0 | 0 |
|* 17 | TABLE ACCESS BY INDEX ROWID | SUPPLIER | 135K| 1 | 119K|00:00:00.70 | 131K| 0 | 0 |
----------------------------------------------------------------------------------------------------------------------------------------------------

The index has a high clustering factor (7.5 million), very close to the number of rows in the table (9 million). The optimizer wasn't using the index, so I set the clustering factor to a lower value using the dbms_stats package.

I wanted to understand whether the reads are the issue and whether, for the "BAD" month, the data is scattered, causing the increase in LIOs. I tested again after reorganizing my table (CTAS..order by the indexed column) and the performance improved. Am I on the right path? Do you suspect something else?
Tom Kyte
September 19, 2011 - 5:55 pm UTC

given that I cannot really read this (does anyone look at what they post?) - can you tell me: do the a-rows and e-rows differ by an order of magnitude anywhere here? It looks like they are way way way way off

if so, that is a huge problem - perhaps we should NOT be using an index at all (probably!)

Sorry ..and thank you

Prashant, September 20, 2011 - 12:36 am UTC

Hi Tom
Sorry I didn't format it well.

You were still correct that there was a problem with the estimated versus actual rows. Why does Oracle use the index in such a case?
I tried the no_index hint for testing purposes and it did a full scan, but even then the estimated and actual rows were off. Is there a problem with the cardinality, and do I need to collect histograms, the lack of which is perhaps causing Oracle to select the index when it shouldn't?

Note: The table is yearly partitioned on record create date but not on the report driving date hence global indexes were created.

| Id |Operation |Name |Starts|E-Rows|A-Rows|Buffers|Reads|Writes|
-----------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1 | | 1 | 516K| 118K| 231 |
| 1| SORT GROUP BY NOSORT | | 1 | 1 | 1 | 516K| 118K| 231 |
| 2| VIEW |VW_DAG_0 | 1 | 1 | 123K| 516K| 118K| 231 |
| 3| HASH GROUP BY | | 1 | 1 | 123K| 516K| 118K| 231 |
| 4| NESTED LOOPS | | 1 | | 123K| 516K| 117K| 0 |
| 5| NESTED LOOPS | | 1 | 1 | 140K| 380K| 117K| 0 |
| 6| NESTED LOOPS | | 1 | 1 | 140K| 239K| 117K| 0 |
| 7| NESTED LOOPS | | 1 | 1 | 142K| 117K| 117K| 0 |
|* 8| HASH JOIN | | 1 | 1 | 142K| 117K| 117K| 0 |
|* 9| TABLE ACCESS BY INDEX ROWID|TIME | 1 | 1 | 28 | 5 | 0 | 0 |
|* 10| INDEX RANGE SCAN |IND_PE01_| 1 | 1 | 28 | 2 | 0 | 0 |
| 11| PARTITION RANGE ALL | | 1 |77527 | 6614K| 117K| 117K| 0 |
|* 12| TABLE ACCESS FULL |SUPPLIER_| 7 |77527 | 6614K| 117K| 117K| 0 |
|* 13| TABLE ACCESS BY INDEX ROWID |LEDGER | 142K| 1 | 142K| 7 | 0 | 0 |
|* 14| INDEX UNIQUE SCAN |PK_LEDGER| 142K| 1 | 142K| 5 | 0 | 0 |
|* 15| TABLE ACCESS BY INDEX ROWID |BOOKING_C| 142K| 1 | 140K| 121K| 0 | 0 |
|* 16| INDEX RANGE SCAN |IND_01_BO| 142K| 1 | 142K| 4 | 0 | 0 |
|* 17| INDEX RANGE SCAN |IND_01_SU| 140K| 2 | 140K| 140K| 0 | 0 |
|* 18| TABLE ACCESS BY INDEX ROWID |SUPPLIER | 140K| 1 | 123K| 136K| 0 | 0 |
-----------------------------------------------------------------------------------------------

This is the query

SELECT
/*+ gather_plan_statistics NO_INDEX(a IND03_SUPP_PAYMT_SCHED) */
COUNT(1)
/* there are a few computations that I have removed for readability purposes */
FROM SUPPLIER_PAYMENT_SCHEDULE a,
SUPPLIER b,
TIME c,
BOOKING_CODE d,
LEDGER e
WHERE a.SUPPLIER_ID = b.SUPPLIER_ID
AND a.LEDGER = e.LEDGER
AND a.ORIG_BC = d.BC
AND e.CURRENCY =
CASE 'USD-Global'
WHEN 'USD-Global'
THEN e.CURRENCY
ELSE
CASE 'USD-Global'
WHEN 'USD-Local'
THEN 'USD'
ELSE 'USD-Global'
END
END
AND a.DUE_DT = c.CAL_DT
AND c.FIN_QTR_NAME = '2011 Q 1'
AND c.FIN_MONTH_NAME = '2011 / 02'
AND d.LEVEL_1_CODE = 'GEOTOTAL'
AND b.SUPPLIER_TYPE = 'VENDOR'
AND
CASE
WHEN a.INVOICE_DATE <> a.DUE_DT
THEN 1
ELSE 0
END = 1
AND CAST(c.FIN_YEAR AS CHARACTER ( 30 ) ) = '2011'
AND (e.LEDGER IN ('Australia Global AUD', 'BZL_FAS', 'Canada Global CAD', 'China Global CNY', 'Cummins Engine Company, Inc.', 'Cummins Singapore Group', 'European Global EUR', 'India Global INR', 'Japan Global JPY', 'Korea Global KRW', 'Mexico Global MXP', 'Norway Global NOK', 'South African Global ZAR', 'UNITED KINGDOM PRIMARY GBP'))
AND a.INVOICE_TYPE <> 'CREDIT'
GROUP BY c.FIN_MONTH_NAME
Tom Kyte
September 20, 2011 - 10:19 am UTC

You were still correct that there was a problem with the estimated versus actual
rows. Why does Oracle use the index in such a case?
I tried the no_index hint for testing purposes and it did a full scan,
but even then the estimated and actual rows were off. Is there a problem with
the cardinality, and do I need to collect histograms, the lack of which is perhaps
causing Oracle to select the index when it shouldn't?


why does Oracle use the index? Because we estimated a given cardinality (we guessed) and it was wrong, hence we used the wrong plan. Wrong cardinality often leads to wrong plans (and wrong plans are almost always the result of getting the wrong cardinality).

Suppose I told you the following facts:

a) there are 1,000 rows in table T
b) column A has two values - Y and N
c) column B has two values - Y and N
d) the data in columns A and B are uniformly distributed (50% are Y and 50% are N in both columns)


Now, you can tell me easily:

1) how many rows are in the table (1,000)
2) how many rows would come back for "where a = 'N'"
3) how many rows would come back for "where b = 'N'"

Now - you tell me - how many rows will come back for "where a = 'N' and b = 'N'"????

it is somewhere between 0 and 500 - whatever you GUESS - I'll give you a data set that gives a different result. This is how the optimizer works - sometimes it has to GUESS.

and when the guess is wrong - we get the wrong cardinality and that can lead to the wrong plan....
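To make that concrete, here is a sketch (table names t1 and t2 are made up) of two data sets that both satisfy facts (a) through (d) above, yet give opposite answers:

```sql
-- t1: columns A and B perfectly correlated.
create table t1 as
select case when rownum <= 500 then 'N' else 'Y' end a,
       case when rownum <= 500 then 'N' else 'Y' end b
  from dual connect by level <= 1000;

-- t2: columns A and B perfectly anti-correlated.
create table t2 as
select case when rownum <= 500 then 'N' else 'Y' end a,
       case when rownum <= 500 then 'Y' else 'N' end b
  from dual connect by level <= 1000;

select count(*) from t1 where a = 'N' and b = 'N';   -- 500 rows match
select count(*) from t2 where a = 'N' and b = 'N';   -- 0 rows match
```

Single-column statistics cannot distinguish t1 from t2; the independence assumption would estimate 1000 * 0.5 * 0.5 = 250 rows for both.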


so, how to fix that? tools you have are:

a) dynamic sampling - set it to level 3 or above and we'll validate (verify) our guesses with a dynamic sample
http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

b) sql profiles - we gather statistics for a query - based on its where clause

c) extended statistics (new in 11g) we could gather statistics on A and B *at the same time* to see if there is a correlation - so we'd be able to accurately estimate.
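As a sketch of how tools (a) and (c) would be applied to the Y/N example above (table name T and column names A, B are hypothetical):

```sql
-- (a) dynamic sampling at the statement level - level 3 or above
--     lets the optimizer validate its guess with a sample:
select /*+ dynamic_sampling(t 3) */ count(*)
  from t
 where a = 'N' and b = 'N';

-- (c) 11g extended statistics - create and gather statistics on the
--     column group (A,B) so the correlation is captured permanently:
declare
   l_ext_name varchar2(30);
begin
   l_ext_name := dbms_stats.create_extended_stats(
                    ownname   => user,
                    tabname   => 'T',
                    extension => '(A,B)' );
   dbms_stats.gather_table_stats( user, 'T' );
end;
/
```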




we'd need to see the filter/access predicates for each step (especially the ones where the cardinality is way off) to diagnose this further at this point.

I'm going to guess it is going to be centered around things like:

AND CAST(c.FIN_YEAR AS CHARACTER ( 30 ) ) = '2011'


that is like "and f(database_column) = 'K'"

we have statistics on database_column - we could guess "database_column = 'K'" pretty well. We probably have NO INFORMATION on f(database_column) - so we'll have to GUESS.

why wouldn't you write "and c.fin_year = 2011"? I'm assuming fin_year is a number - just compare a number to a number. Or, if fin_year is a string, compare a string to a string - why cast it?
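In other words, a sketch of the rewrite being suggested (assuming fin_year is stored as a number):

```sql
-- instead of wrapping the column in a function, which hides it from
-- the column statistics:
--    and cast(c.fin_year as character(30)) = '2011'
-- compare like types directly, so the optimizer can use its
-- statistics on fin_year:
and c.fin_year = 2011
```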


Dynamic Sampling

Prashant, September 21, 2011 - 5:26 am UTC

Thanks for pointing me in this direction. Although I was aware of this, I wouldn't have tried it out if you hadn't suggested it.

select * from v$version

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
"CORE 11.2.0.2.0 Production"
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production


2. Results from dynamic sampling
(A) Default dynamic sampling level of 2 (i.e. no sampling)
SQL_ID bh522t78ffyz2, child number 2
-------------------------------------
SELECT /*+ gather_plan_statistics */ D1.c7 AS c1,
D1.c2 / NULLIF( D1.c1, 0) AS c2, D1.c6 / NULLIF( D1.c5, 0) AS c3,
D1.c4 AS c4, D1.c3 AS c5,
CASE WHEN 'USD-Global' IN ('USD-Global', 'USD-Local') THEN
'USD' ELSE 'USD-Global' END AS c6, 0 AS c7, D1.c2
AS c8, D1.c1 AS c9 FROM (SELECT COUNT(DISTINCT
T2009395.INVOICE_ID) AS
c1, SUM(( TRUNC( T2009395.PAYMENT_COMPLETION_DT ) - TRUNC(
T2009395.INVOICE_DATE ) )) AS c2, SUM( CASE WHEN
T2009395.PAYMENT_COMPLETION_DT <= ( CAST(T2009395.DUE_DT AS DATE) + 3 )
THEN 1 ELSE 0 END ) AS c3, SUM( CASE
WHEN T2009395.PAYMENT_COMPLETION_DT > ( CAST(T2009395.DUE_DT AS DATE)
+ 3 ) THEN 1 ELSE 0 END )

Plan hash value: 190296202

---------------------------------------------------------------------------------------------
| Id|Operation |Name |Starts|E-Rows|A-Rows|Buffers|Reads|
---------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1| | 1| 547K| 252|
| 1| SORT GROUP BY NOSORT | | 1| 1| 1| 547K| 252|
| 2| VIEW |VW_DAG_0| 1| 1| 128K| 547K| 252|
| 3| HASH GROUP BY | | 1| 1| 128K| 547K| 252|
| 4| NESTED LOOPS | | 1| | 128K| 547K| 0|
| 5| NESTED LOOPS | | 1| 1| 144K| 407K| 0|
| 6| NESTED LOOPS | | 1| 1| 144K| 263K| 0|
| 7| NESTED LOOPS | | 1| 1| 144K| 263K| 0|
| 8| NESTED LOOPS | | 1| 1| 146K| 139K| 0|
|* 9| TABLE ACCESS BY INDEX ROWID |TIME | 1| 1| 28 | 21 | 0|
|*10| INDEX RANGE SCAN |IND_PE01| 1| 1| 28 | 2 | 0|
|*11| TABLE ACCESS BY GLOBAL INDEX ROWID|SUPPLIER| 28| 21| 146K| 139K| 0|
|*12| INDEX RANGE SCAN |IND03_SU| 28| 2491| 196K| 213 | 0|
|*13| TABLE ACCESS BY INDEX ROWID |BOOKING_| 146K| 1| 144K| 124K| 0|
|*14| INDEX RANGE SCAN |IND_01_B| 146K| 1| 146K| 5 | 0|
|*15| TABLE ACCESS BY INDEX ROWID |LEDGER | 144K| 1| 144K| 9 | 0|
|*16| INDEX UNIQUE SCAN |PK_LEDGE| 144K| 1| 144K| 6 | 0|
|*17| INDEX RANGE SCAN |IND_01_S| 144K| 2| 144K| 144K| 0|
|*18| TABLE ACCESS BY INDEX ROWID |SUPPLIER| 144K| 1| 128K| 139K| 0|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - filter(CAST(TO_CHAR("T2009605"."FIN_YEAR") AS CHARACTER ( 30 ) )='2011')
10 - access("T2009605"."FIN_MONTH_NAME"='2011 / 05' AND "T2009605"."FIN_QTR_NAME"='2011 Q 2')
11 - filter((CASE WHEN "T2009395"."INVOICE_DATE"<>"T2009395"."DUE_DT" THEN 1 ELSE 0 END =1 AND "T2009395"."INVOICE_TYPE"<>'CREDIT' AND
INTERNAL_FUNCTION("T2009395"."LEDGER")))
12 - access("T2009395"."DUE_DT"="T2009605"."CAL_DT")
13 - filter("T2009726"."LEVEL_1_CODE"='GEOTOTAL')
14 - access("T2009395"."ORIG_BC"="T2009726"."BC")
15 - filter("T2009005"."CURRENCY"=CASE WHEN ('USD-Global'='USD-Global') THEN "T2009005"."CURRENCY" ELSE CASE WHEN
('USD-Global'='USD-Local') THEN 'USD' ELSE 'USD-Global' END END )
16 - access("T2009005"."LEDGER"="T2009395"."LEDGER")
filter(("T2009005"."LEDGER"='Australia Global AUD' OR "T2009005"."LEDGER"='BZL_FAS' OR "T2009005"."LEDGER"='Canada Global CAD' OR
"T2009005"."LEDGER"='China Global CNY' OR "T2009005"."LEDGER"='XYZ Engine Company, Inc.' OR "T2009005"."LEDGER"='XYZ Singapore
Group' OR "T2009005"."LEDGER"='European Global EUR' OR "T2009005"."LEDGER"='India Global INR' OR "T2009005"."LEDGER"='Japan Global JPY' OR
"T2009005"."LEDGER"='Korea Global KRW' OR "T2009005"."LEDGER"='Mexico Global MXP' OR "T2009005"."LEDGER"='Norway Global NOK' OR
"T2009005"."LEDGER"='South African Global ZAR' OR "T2009005"."LEDGER"='UNITED KINGDOM PRIMARY GBP'))
17 - access("T2009395"."SUPPLIER_ID"="T2010557"."SUPPLIER_ID")
18 - filter("T2010557"."SUPPLIER_TYPE"='VENDOR')

(B) Set dynamic sample of 3
alter session set optimizer_dynamic_sampling = 3;

SQL_ID 1ftt3x9rhz9nn, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ D1.c7 AS c1,
D1.c2 / NULLIF( D1.c1, 0) AS c2, D1.c6 / NULLIF( D1.c5, 0) AS c3,
D1.c4 AS c4, D1.c3 AS c5,
CASE WHEN 'USD-Global' IN ('USD-Global', 'USD-Local') THEN
'USD' ELSE 'USD-Global' END AS c6, 0 AS c7, D1.c2
AS c8, D1.c1 AS c9 FROM (SELECT /*+ LEADING(T2009605) */
COUNT(DISTINCT T2009395.INVOICE_ID)
AS c1, SUM(( TRUNC( T2009395.PAYMENT_COMPLETION_DT ) -
TRUNC( T2009395.INVOICE_DATE ) )) AS c2, SUM( CASE
WHEN T2009395.PAYMENT_COMPLETION_DT <= ( CAST(T2009395.DUE_DT AS DATE)
+ 3 ) THEN 1 ELSE 0 END ) AS c3, SUM( CASE
WHEN T2009395.PAYMENT_COMPLETION_DT > ( CAST(T2009395.DUE_DT AS
DATE) + 3 ) THEN 1 ELSE 0 END )

Plan hash value: 190296202

---------------------------------------------------------------------------------------------
| Id|Operation |Name |Starts|E-Rows|A-Rows|Buffers|Reads|
---------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1| | 1| 547K| 252|
| 1| SORT GROUP BY NOSORT | | 1| 1| 1| 547K| 252|
| 2| VIEW |VW_DAG_0| 1| 1| 128K| 547K| 252|
| 3| HASH GROUP BY | | 1| 1| 128K| 547K| 252|
| 4| NESTED LOOPS | | 1| | 128K| 547K| 0|
| 5| NESTED LOOPS | | 1| 1| 144K| 407K| 0|
| 6| NESTED LOOPS | | 1| 1| 144K| 263K| 0|
| 7| NESTED LOOPS | | 1| 1| 144K| 263K| 0|
| 8| NESTED LOOPS | | 1| 1| 146K| 139K| 0|
|* 9| TABLE ACCESS BY INDEX ROWID |TIME | 1| 1| 28 | 21 | 0|
|*10| INDEX RANGE SCAN |IND_PE01| 1| 1| 28 | 2 | 0|
|*11| TABLE ACCESS BY GLOBAL INDEX ROWID|SUPPLIER| 28| 1510| 146K| 139K| 0|
|*12| INDEX RANGE SCAN |IND03_SU| 28| 2491| 196K| 215 | 0|
|*13| TABLE ACCESS BY INDEX ROWID |BOOKING_| 146K| 1| 144K| 124K| 0|
|*14| INDEX RANGE SCAN |IND_01_B| 146K| 1| 146K| 5 | 0|
|*15| TABLE ACCESS BY INDEX ROWID |LEDGER | 144K| 1| 144K| 7 | 0|
|*16| INDEX UNIQUE SCAN |PK_LEDGE| 144K| 1| 144K| 5 | 0|
|*17| INDEX RANGE SCAN |IND_01_S| 144K| 2| 144K| 144K| 0|
|*18| TABLE ACCESS BY INDEX ROWID |SUPPLIER| 144K| 1| 128K| 139K| 0|
---------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

9 - filter(CAST(TO_CHAR("T2009605"."FIN_YEAR") AS CHARACTER ( 30 ) )='2011')
10 - access("T2009605"."FIN_MONTH_NAME"='2011 / 05' AND "T2009605"."FIN_QTR_NAME"='2011 Q 2')
11 - filter((CASE WHEN "T2009395"."INVOICE_DATE"<>"T2009395"."DUE_DT" THEN 1 ELSE 0 END =1 AND "T2009395"."INVOICE_TYPE"<>'CREDIT' AND
INTERNAL_FUNCTION("T2009395"."LEDGER")))
12 - access("T2009395"."DUE_DT"="T2009605"."CAL_DT")
13 - filter("T2009726"."LEVEL_1_CODE"='GEOTOTAL')
14 - access("T2009395"."ORIG_BC"="T2009726"."BC")
15 - filter("T2009005"."CURRENCY"=CASE WHEN ('USD-Global'='USD-Global') THEN "T2009005"."CURRENCY" ELSE CASE WHEN
('USD-Global'='USD-Local') THEN 'USD' ELSE 'USD-Global' END END )
16 - access("T2009005"."LEDGER"="T2009395"."LEDGER")
filter(("T2009005"."LEDGER"='Australia Global AUD' OR "T2009005"."LEDGER"='BZL_FAS' OR "T2009005"."LEDGER"='Canada Global CAD' OR
"T2009005"."LEDGER"='China Global CNY' OR "T2009005"."LEDGER"='Cummins Engine Company, Inc.' OR "T2009005"."LEDGER"='Cummins Singapore
Group' OR "T2009005"."LEDGER"='European Global EUR' OR "T2009005"."LEDGER"='India Global INR' OR "T2009005"."LEDGER"='Japan Global JPY' OR
"T2009005"."LEDGER"='Korea Global KRW' OR "T2009005"."LEDGER"='Mexico Global MXP' OR "T2009005"."LEDGER"='Norway Global NOK' OR
"T2009005"."LEDGER"='South African Global ZAR' OR "T2009005"."LEDGER"='UNITED KINGDOM PRIMARY GBP'))
17 - access("T2009395"."SUPPLIER_ID"="T2010557"."SUPPLIER_ID")
18 - filter("T2010557"."SUPPLIER_TYPE"='VENDOR')

Note
-----
- dynamic sampling used for this statement (level=3)





Questions
---------
1. The table access estimates seem to have improved, but they are still off (28 * 1510 = ~42K compared with the actual 146K). Any reason?
2. It still selects the index, so I assume the optimizer feels an index scan is still better, but the cardinalities seem off (28 * 2491 = ~70K)? Strangely, I also see 0 reads in most of the rows.
3. We are planning to gather stats with "method_opt => size auto for all columns". During production cutover we will run the historical load and then gather stats.
However, since sys.col_usage$ contains no rows, it wouldn't generate histograms for the columns that need them. Am I right? So do you recommend copying stats to production when we go live?
4. I am yet to try sql profiles and extended stats. I am aware of what it does. Thanks for pointing me in this direction.
5. I have asked the OBIEE developers to remove cast. They have obliged :)

I must also add that dynamic sampling helped improve performance for some of the other reports. There are problems with the design (stats gathering is currently done with ANALYZE, not DBMS_STATS, etc.)
which we are planning to correct, but in the short term I am thinking of implementing a logon trigger to enable dynamic sampling at the session level.


Tom Kyte
September 21, 2011 - 9:40 am UTC

1) if you didn't have predicates affected by level 3, or just some - then it wouldn't check them out. did you go to the documentation and read about what the various levels do? the article recaps that information as well.

3) that would be correct - it would not. so - you could run the queries before gathering stats, that would be all that is necessary.


Thanks

Prashant, September 22, 2011 - 8:43 am UTC

Thanks. Extended stats is a good feature I am using for the first time and I have found success with my testing.

I am still not very clear on how Oracle estimates join cardinality. Can you please refer me to some docs?
Tom Kyte
September 22, 2011 - 6:00 pm UTC

it mostly has to average. If you join a table T1 by primary key to another table T2 - it would look at the number of rows in T2, the number of distinct values in that column(s) in T2 and average it out.

For example, join emp to dept by deptno. for each row in DEPT it figures it will get - it will think it is getting about 14 (number of rows in emp) / 3 (number of distinct values of deptno in emp) - or about 4 or 5 rows from emp for each dept record.
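Tom's averaging rule can be checked with simple arithmetic. A minimal sketch in Python, using the classic EMP/DEPT demo figures (14 rows in EMP, 3 distinct deptno values) as the assumed statistics:

```python
# Optimizer-style join cardinality estimate: for a join of DEPT to EMP
# on DEPTNO, the estimated rows returned per DEPT record is
#   num_rows(EMP) / num_distinct(EMP.DEPTNO)
emp_rows = 14          # rows in EMP
emp_deptno_ndv = 3     # distinct DEPTNO values in EMP

rows_per_dept = emp_rows / emp_deptno_ndv
print(round(rows_per_dept, 2))  # prints 4.67, i.e. "about 4 or 5 rows"
```

This is why the estimate degrades when the join column is skewed: the average of 4.67 is a poor guess if one department holds 12 of the 14 rows.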


Logical Vs Physical Reads

Lal, September 21, 2012 - 9:03 am UTC

Tom,

I read this entire link. I have a query involving joins (outer joins) of 10 tables. Out of the 10 some are big transaction tables and some are small tables. When the data is not in cache the response time is around 7 secs (returns around 1500 rows). The same query when executed again returned in less than a second.

The plan used is the same for both executions, and the consistent gets are also the same for both executions. The consistent gets were around 4500 and the physical reads around 1000. For the second execution the physical reads are 0.

I verified the execution plan (actual run plan) and it uses the required indexes and join order as expected (logically). The top wait event for this query is db file sequential read.

Why is it taking 7 secs for the first execution?

Based on my understanding, the difference between the two executions is that the first execution had to account for hard parsing and also the physical reads of 1000 blocks.

Will a hard parse plus physical reads of 1000 blocks take 6 secs?

I did a full scan of another big table, and based on the statistics of the query from DB Console I found that one block read required only 0.03 milliseconds.
If that's the case, 1000 physical reads should take only 1000 * 0.03 = 30 milliseconds.
Will a hard parse take more than a second?

I am really confused about the split-up of the 7 secs for the first execution.

Can you please enlighten me.

The NFR requirement for this query is less than a sec. In the majority of cases the data won't be available in the buffer cache.

Is there a way I can cache the big tables in memory? (Database Smart Flash Cache?)

I am using an 11.2 database

Thanks in advance for your time
Tom Kyte
September 26, 2012 - 12:53 pm UTC

A random read (seek+IO) takes about 5ms on average.

if you do 1,000 of them, that is about 5 seconds. Add in the time to process the query and you are close to six.

Throw in a slow IO here or there and you easily have seven.


are you sure it was 0.03ms that is a really small number, are you sure you don't mean 3ms?


if you were taking IO times from a full scan (using big juicy multiblock reads) and dividing by block counts - then you have a meaningless number!
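Tom's two points can be shown numerically. A sketch using the poster's full-scan figure (0.03 ms per block) and the 5 ms average random read assumed throughout this thread:

```python
# Dividing a full scan's elapsed time by its block count gives a per-block
# cost for MULTIBLOCK reads -- it says nothing about a single random read.
full_scan_per_block_ms = 0.03   # the poster's figure, derived from a full scan
random_read_ms = 5.0            # typical single-block seek + read time

misleading_ms = 1000 * full_scan_per_block_ms   # ~30 ms: wrong model for random I/O
realistic_s = 1000 * random_read_ms / 1000      # 5 seconds: matches the ~7 s observed
print(f"{misleading_ms:.0f} ms vs {realistic_s:.0f} s")  # prints "30 ms vs 5 s"
```

The two estimates differ by more than two orders of magnitude, which is exactly the gap between the poster's 30 ms expectation and the 7 seconds observed.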

Time for physical reads

Lal, September 27, 2012 - 1:03 am UTC

Tom,

Thanks very much for your reply.

I got the following stats from DB Console. Basically I created a single table with all the joins and then did a count(*) from the summary table.

                     Total     Per Execution   Per Row
Executions           1         1               1.00
Elapsed Time (sec)   31.66     31.66           31.66
CPU Time (sec)       6.80      6.80            6.80
Buffer Gets          947,326   947,326.00      947,326.00
Disk Reads           947,378   947,378.00      947,378.00
Direct Writes        0         0.00            0.00
Rows                 1         1.00            1
Fetches              1         1.00            1.00
    
31.66 secs for 947,378 disk reads.
So the time required for one block = 31660 / 947,378 = 0.03 ms

As you mentioned, this was for a full scan.
A few more doubts:
1) For a nested loop query plan, if 1000 blocks have to be read, will those reads happen sequentially, or can they happen in parallel (if the data is striped)?
2) How can I measure the time taken for a single-block physical read on my database? Any easy ways to find that?
3) If I have to get a query response time of 1 sec, I can do a max of maybe 200 disk reads, right? (assuming 5 ms for one disk read and no hard parse)
Tom Kyte
September 27, 2012 - 9:05 am UTC

your math is a bit off.

if you have an elapsed sum of 31.66 seconds and you did 947,326 things, then the average time of each thing is:
ops$tkyte%ORA11GR2> select 31.66/947326 from dual;

31.66/947326
------------
   .00003342


but you were doing multiblock reads - large sequential (orders of magnitude less seek time) - not single block reads.



1) sequentially, serially

2) dbms_stats, system stats has that information. or you can look at the wait events for a query that used indexes to read all of its blocks

3) simple math, yes. 200*5ms = 1000ms = 1second.
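That budget arithmetic generalizes into a quick rule of thumb. A sketch (the 5 ms default is the average random-read time assumed throughout this thread; the function name is mine):

```python
def max_random_reads(budget_ms: float, read_ms: float = 5.0) -> int:
    """Upper bound on single-block physical reads within a response-time budget."""
    return int(budget_ms // read_ms)

print(max_random_reads(1000))        # prints 200: fits in a 1-second budget
print(max_random_reads(1000, 10.0))  # prints 100: only half as many on slower disks
```

In other words, if most blocks will miss the cache, the disk's random-read latency, not the plan's cost, sets the floor on response time.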


ASM and parallel reads

Jayadevan, October 03, 2012 - 11:43 pm UTC

Hi Tom,
1) With reference to the question on will nested loop join do 'things' sequentially or in parallel - If the nested loop needs data to be fetched from two tables, and the tables are on separate disks - is it possible that the table reads will happen in parallel and hence the effective physical read time may go down?
2) Do we have some guideline on how to group big tables into different tablespaces and spread them across disks?
3) In case it is actually one physical disk, but we use ASM, can we get a similar effect (i.e. parallel reads)?
Thanks,
Jay
Tom Kyte
October 09, 2012 - 12:04 pm UTC

1) no, it doesn't work that way. Think about the processing involved.

you use index I1 to get a rowid, that tells you what block to read from T1. T1's row data gives you the key for index I2 into table T2, so you do the single block reads on I2 to get a rowid that tells you what block to read from T2.

it is one thing after the other.


2) just stripe everything over your data disks. ASM does that for you automatically.

3) yes - you can issue more than one IO request against a single disk. Obviously they won't really be "in parallel" physically - but they will be performed apparently in parallel.
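The serial dependency in answer 1 can be sketched as a toy nested-loop lookup (a conceptual sketch only; the dict-based "indexes" and row layout are invented for illustration):

```python
# Toy nested-loop join: each lookup needs the result of the previous one,
# so the four single-block reads cannot be issued in parallel.
i1 = {"key1": "rowid_t1"}                 # index I1 -> rowid into T1
t1 = {"rowid_t1": {"join_key": "key2"}}   # T1 row carries the key for I2
i2 = {"key2": "rowid_t2"}                 # index I2 -> rowid into T2
t2 = {"rowid_t2": {"val": 42}}            # T2 row

def nested_loop_lookup(probe_key):
    rid1 = i1[probe_key]          # read 1: index I1
    row1 = t1[rid1]               # read 2: table T1 (needs read 1's result)
    rid2 = i2[row1["join_key"]]   # read 3: index I2 (needs read 2's result)
    return t2[rid2]               # read 4: table T2 (needs read 3's result)

print(nested_loop_lookup("key1"))  # prints {'val': 42}
```

No amount of striping helps here: until read 1 returns, the address for read 2 is simply unknown.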

Stripe and 'apparent' parallel reads

Jayadevan, October 11, 2012 - 11:59 pm UTC

Tom,
Thanks a lot for spending so much time on educating us on the finer details.
1) You are saying - don't bother with splitting tables into different tablespaces (for the sake of performance) or putting specific data files on different disks, but try to stripe the data chunks - either 'real' stripes (use RAID) or using ASM (i.e. automatic)?
2) When we use ASM, there will be 'apparent' parallelism. What is that and how will it help?
Jay
Tom Kyte
October 12, 2012 - 7:54 am UTC

1) yup. use tablespaces (and this has been true forever, this is not new or changed) as an administrative tool, as a tool to make your life easier. they were *never* a performance tuning tool.

2) I think you misunderstood me. If you have one disk - the heads can be in one place only. You never would get a true "parallel read", they might appear to be in parallel to the human eye - but you know that only one read could really be taking place at a time. If you have N disks where N > 1, then you could truly get a parallel read.

how to reduce LIO

A reader, October 19, 2012 - 2:17 am UTC

Hi Tom,

The more I read, the more I get confused on certain topics. I know this is due to my poor knowledge base :)

But I still need your patience to elaborate more on 'how to reduce LIO'.

From your example, you mentioned that using an unnecessary index is a bad idea, since it needs more latches to read the index data.

Besides an overused index, could you please provide another example?

Basically my understanding is that LIO is what Oracle uses to get our query result set, so how can it be reduced?
Tom Kyte
October 23, 2012 - 11:08 am UTC

xiaowei, January 21, 2013 - 5:59 am UTC

In autotrace, db block gets and consistent gets are both logical reads. Db block gets appear for DELETE, UPDATE, INSERT; consistent gets appear for SELECT. Is that correct?
Tom Kyte
January 22, 2013 - 1:39 am UTC

in general. you have select for updates to consider as well.

Why i have different Trace Output

Dillip Kumar Sahoo, April 22, 2014 - 7:42 am UTC

Hi Tom,
I have asked many questions in the past. Thanks a ton for answering them all; they all worked for me. ;)

I have a funny issue: when I run autotrace on my production server, I get the trace with a different set of statistics:

Statistics
----------------------------------------------------------
1 recursive calls
0 spare statistic 4
0 ges messages sent
0 db block gets direct
0 calls to kcmgrs
0 PX remote messages recv'd
224855 buffer is pinned count
72899 buffer is not pinned count
3 workarea executions - optimal
0 workarea executions - onepass
94 rows processed

Why is this? I would love to see these statistics instead, which speak English :-):

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
331 bytes sent via SQL*Net to client
332 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Thanks a lot !