Question and Answer

Tom Kyte

Thanks for the question, Anandkumar.

Asked: September 21, 2011 - 11:09 am UTC

Last updated: November 01, 2013 - 7:52 pm UTC

Version: 11.2.0.1

Viewed 10K+ times

You Asked

Hi Tom,

I need one clarification regarding 11gR2 about direct path reads.

One of our 11.2.0.1 databases, running on 64-bit RHEL, shows the wait event 'direct path read', and I can't see any 'db file scattered read' events whenever there is a full table scan.

From one of the Metalink documents I learned that this direct path read behavior changed compared to earlier releases when it comes to serial table scans.

As per metalink doc
-------------------------------------------------------------------------------
There have been changes in 11g in the heuristics to choose between direct path reads or reads through buffer cache for serial table scans.
In 10g, serial table scans for "large" tables used to go through cache (by default) which is not the case anymore. In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats. Direct path reads are faster than scattered reads and have less impact on other processes because they avoid latches.

Does it mean that in 11gR2 and higher we won't see the 'db file scattered read' wait event any more? Instead, will we see 'direct path read' whenever a session would previously have waited on 'db file scattered read' (full table scan)?

Here is the configuration

Version : 11.2.0.1
OS : RHEL 5.3
SGA Target : 12Gig
PGA Target : 4Gig
Work area policy : Auto
Sort Area Size : 65535

In the AWR top 5 wait events, 'db file sequential read' and 'direct path read' are the top events.

Your inputs are really valuable to us.

and Tom said...

Does it mean that in 11gR2 and higher we won't see the 'db file scattered read' wait event any more?

No, it does not mean that - the note doesn't say that, in fact the note states:

In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats


It states it makes a decision based on those factors.

select count(*)
from
 t


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.02          0       1032          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.02          0       1032          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 274
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=1032 pr=0 pw=0 time=23995 us)
     72137      72137      72137   TABLE ACCESS FULL T (cr=1032 pr=0 pw=0 time=59165 us cost=288 size=0 card=72137)


That was a tkprof from 11gR2 showing that a modest-sized table was read via the buffer cache, whereas:

select count(secondary)
from
 big_table.big_table


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.68       1.48      14501      14508          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.68       1.49      14501      14508          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 129
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=14508 pr=14501 pw=0 time=1489842 us)
   1000000    1000000    1000000   TABLE ACCESS FULL BIG_TABLE (cr=14508 pr=14501 pw=0 time=562526 us cost=3950 size=2000000 card=1000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file sequential read                         1        0.00          0.00
  direct path read                              228        0.17          1.30
  SQL*Net message from client                     2        0.00          0.00



For a much larger table, it did direct I/O.
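The decision the note describes can be sketched roughly as follows. This is a simplification: the 2% figure reflects the commonly reported behavior of the hidden "small table threshold" parameter, not a documented interface, and the buffer cache size used below is a hypothetical figure, not one taken from the traced database.

```python
def likely_serial_direct_read(table_blocks, buffer_cache_blocks):
    """Rough sketch of the 11g serial direct path read heuristic.

    Commonly reported behavior: a serial full scan switches to direct
    path reads once the segment is larger than the "small table
    threshold", which defaults to roughly 2% of the buffer cache.
    The real decision also weighs cached/dirty block counts and other
    statistics, none of which are modeled here.
    """
    small_table_threshold = buffer_cache_blocks * 0.02
    return table_blocks > small_table_threshold

# The two tkprof examples above, assuming a buffer cache of
# 256,000 blocks (a hypothetical ~2GB cache at an 8K block size):
print(likely_serial_direct_read(1032, 256_000))    # modest table -> cached scan
print(likely_serial_direct_read(14508, 256_000))   # big table -> direct path
```

With these assumed numbers the threshold is 5,120 blocks, so the 1,032-block table stays in the cache and the 14,508-block table goes direct, matching the two traces.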


Comments

Tkprof in 11GR2

Rajeshwaran, Jeyabal, September 24, 2011 - 2:45 pm UTC

Tom:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=14508 pr=14501 pw=0 time=1489842 us)
   1000000    1000000    1000000   TABLE ACCESS FULL BIG_TABLE (cr=14508 pr=14501 pw=0 
time=562526 us cost=3950 size=2000000 card=1000000)


1) What do Rows (1st), Rows (avg), and Rows (max) mean in tkprof? Also, what option should I use to get output like that from tkprof?

2) I was referring to the documentation link below, but I don't find info about Rows (1st), Rows (avg), Rows (max). Can you guide me to the proper link?

http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/sqltrace.htm#i4191
Tom Kyte
September 25, 2011 - 11:33 am UTC

When you execute a query more than once - you have a "1st" time you executed it - so we report the rows flowing through each step of the plan for the "1st" execution. You also have an average and a maximum number of rows.

for example:

ops$tkyte%ORA11GR2> create table t as select * from all_objects;

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> variable x varchar2(30)
ops$tkyte%ORA11GR2> @trace
ops$tkyte%ORA11GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA11GR2> exec :x := 'SCOTT';

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select count(*) from t where owner = :x;

  COUNT(*)
----------
        11

ops$tkyte%ORA11GR2> exec :x := 'SYS';

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select count(*) from t where owner = :x;

  COUNT(*)
----------
     30982

ops$tkyte%ORA11GR2> exec :x := user

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select count(*) from t where owner = :x;

  COUNT(*)
----------
        12


and tkprof will show (in 11.2 and above only....)

select count(*) 
from
 t where owner = :x
  

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          1          0           0
Fetch        6      0.08       0.12        703       3099          0           3
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       12      0.09       0.13        703       3100          0           3

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 274  
Number of plan statistics captured: 3

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------
         1          1          1  SORT AGGREGATE (cr=1033 pr=234 pw=0....
        11      10335      30982   TABLE ACCESS FULL T (cr=1033 pr=234 pw....



so, the first time I executed it - we had 11 rows go through - the maximum we saw was 30,982 and the average (11+12+30982)/3 = 10,335...
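The arithmetic behind the three tkprof columns can be checked directly:

```python
# Rows seen by the full scan across the three executions traced above
# (SCOTT, SYS, then the current user).
rows_per_execution = [11, 30982, 12]

rows_1st = rows_per_execution[0]                               # first execution
rows_max = max(rows_per_execution)                             # largest execution
rows_avg = sum(rows_per_execution) // len(rows_per_execution)  # integer average

print(rows_1st, rows_avg, rows_max)   # 11 10335 30982
```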

see also

http://antognini.ch/2010/10/tkprof-new-features-in-11gr2-11-2-0-2/


A reader, September 26, 2011 - 8:07 am UTC

Hi Tom,

As usual, an excellent way of demonstrating it when someone like me is confused; it's clear now.

If direct path read is the top event, what could be the best way to tackle this issue? One more thing: these are apps-related queries.

One thing I noticed: we have a db_file_multiblock_read_count of 128 and are not using workload system stats. Do you think this high value might be leading to full table scans instead of index access paths in some cases? As a test case, I traced one simple query to see what value Oracle actually uses internally for this MBRC, and I was surprised to see that it uses a value of 16.

Here is the test case I followed

SQL> alter session set events '10046 trace name context forever, level 12';

SQL>select count(*) From APPLSYS.FND_CONCURRENT_REQUESTS;

SQL> alter session set events '10046 trace name context forever, off';

Below is the sample of trace file what I found

WAIT #2: nam='direct path read' ela= 223903 file number=701 first dba=635465 block cnt=16 obj#=74112 tim=1316715335780783
WAIT #2: nam='direct path read' ela= 17018 file number=711 first dba=740137 block cnt=16 obj#=74112 tim=1316715335798014
WAIT #2: nam='direct path read' ela= 2762 file number=732 first dba=748906 block cnt=15 obj#=74112 tim=1316715335801007
WAIT #2: nam='direct path read' ela= 14778 file number=741 first dba=744217 block cnt=16 obj#=74112 tim=1316715335815908
WAIT #2: nam='direct path read' ela= 2881 file number=761 first dba=741417 block cnt=16 obj#=74112 tim=1316715335819018
WAIT #2: nam='direct path read' ela= 82 file number=772 first dba=737210 block cnt=15 obj#=74112 tim=1316715335819191
WAIT #2: nam='direct path read' ela= 15666 file number=782 first dba=735065 block cnt=16 obj#=74112 tim=1316715335834939
WAIT #2: nam='direct path read' ela= 525 file number=790 first dba=732457 block cnt=16 obj#=74112 tim=1316715335835608
WAIT #2: nam='direct path read' ela= 15283 file number=798 first dba=618489 block cnt=16 obj#=74112 tim=1316715335851009
WAIT #2: nam='direct path read' ela= 1949 file number=813 first dba=723689 block cnt=16 obj#=74112 tim=1316715335853179
WAIT #2: nam='direct path read' ela= 1082 file number=821 first dba=721385 block cnt=16 obj#=74112 tim=1316715335854353
WAIT #2: nam='direct path read' ela= 100517 file number=827 first dba=613321 block cnt=16 obj#=74112 tim=1316715335954952
WAIT #2: nam='direct path read' ela= 15234 file number=838 first dba=623305 block cnt=16 obj#=74112 tim=1316715335970444
WAIT #2: nam='direct path read' ela= 5739 file number=132 first dba=730761 block cnt=16 obj#=74112 tim=1316715335976414


Kindly let me know your opinion on this high value of MBRC (128). Is the present value OK, or is it causing the optimizer to favor full table scans, given that as per the trace it uses an MBRC value of 16 at most?

Regards
Anandkumar S.M.
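The per-wait block counts in a 10046 trace excerpt like the one above can be tallied with a short script. This is a sketch: the line format matched is the one shown in the excerpt, and other trace line variants would need their own handling.

```python
import re

def tally_block_counts(trace_lines):
    """Tally 'block cnt=N' values from 'direct path read' WAIT lines
    of a 10046 trace, to see the multiblock read sizes actually used."""
    counts = {}
    pattern = re.compile(r"nam='direct path read'.*\bblock cnt=(\d+)")
    for line in trace_lines:
        m = pattern.search(line)
        if m:
            n = int(m.group(1))
            counts[n] = counts.get(n, 0) + 1
    return counts

# Two lines from the excerpt above:
sample = [
    "WAIT #2: nam='direct path read' ela= 223903 file number=701 first dba=635465 block cnt=16 obj#=74112 tim=1316715335780783",
    "WAIT #2: nam='direct path read' ela= 2762 file number=732 first dba=748906 block cnt=15 obj#=74112 tim=1316715335801007",
]
print(tally_block_counts(sample))
```

Running this over the whole trace file shows at a glance that the reads top out at 16 blocks, regardless of the 128 parameter setting.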
Tom Kyte
September 26, 2011 - 9:44 am UTC

If direct path read is the top event, what could be the best way to tackle this issue? One more thing: these are apps-related queries.


Well, you might already be optimized - as optimized as you can get on your platform, in any case - on your hardware.

Indexes are not necessarily faster than a full scan.

However, you should not be setting the mbrc in 10g and above - you should just let it default.

setting it artificially high will affect the costing of a full scan:

ops$tkyte%ORA11GR2> connect /
Connected.
ops$tkyte%ORA11GR2> set linesize 1000
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select count(subobject_name) from big_table.big_table;

Execution Plan
----------------------------------------------------------
Plan hash value: 599409829

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     2 |  3942   (1)| 00:00:48 |
|   1 |  SORT AGGREGATE    |           |     1 |     2 |            |          |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|  1953K|  3942   (1)| 00:00:48 |
--------------------------------------------------------------------------------

ops$tkyte%ORA11GR2> alter session set db_file_multiblock_read_count = 16;

Session altered.

ops$tkyte%ORA11GR2> select count(subobject_name) from big_table.big_table;

Execution Plan
----------------------------------------------------------
Plan hash value: 599409829

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     2 |  3187   (1)| 00:00:39 |
|   1 |  SORT AGGREGATE    |           |     1 |     2 |            |          |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|  1953K|  3187   (1)| 00:00:39 |
--------------------------------------------------------------------------------

ops$tkyte%ORA11GR2> alter session set db_file_multiblock_read_count = 128;

Session altered.

ops$tkyte%ORA11GR2> select count(subobject_name) from big_table.big_table;

Execution Plan
----------------------------------------------------------
Plan hash value: 599409829

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     2 |  2526   (1)| 00:00:31 |
|   1 |  SORT AGGREGATE    |           |     1 |     2 |            |          |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  1000K|  1953K|  2526   (1)| 00:00:31 |
--------------------------------------------------------------------------------


You want to use the default cost - not the artificially lowered cost - my default cost is the highest of the above three costs - that is what you want to go with.


You can test what the effect on the plan would be by setting your sessions value to various values and seeing if the plan changes or not...

Oleksandr Alesinskyy, September 27, 2011 - 4:00 am UTC

A reader, September 27, 2011 - 8:09 am UTC

Hi Tom,

I agree with you. I tried to change the db_file_multiblock_read_count value at the session level, but irrespective of the MBRC value, the cost of the given query remained the same for all the values I tried.

I will come up with test case to demonstrate the same here.

Thanks here again for your direction.

Regards
Anandkumar S.M.
Tom Kyte
September 27, 2011 - 8:34 am UTC

You don't have to come up with a test case - no one ever said "the plan will definitely change"

Only that "it affects the costing of the full scan and you want to let it default".

I did not say the plan would 100% definitely change. Only that you definitely want to let db_file_multiblock_read_count default in 10g and above.




A reader, September 27, 2011 - 9:29 am UTC

Hi Tom,

Thanks for clarifying it.

Regards
Anandkumar S.M.

regarding MBRC

Michael, September 28, 2011 - 7:03 am UTC

tom:
<quote>
However, you should not be setting the mbrc in 10g and above - you should just let it default.
</quote>


docu ( http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/stats.htm#sthref1221 ):
<quote>
13.4.1.2 Multiblock Read Count
...
If Oracle Database cannot gather or validate gathered mbrc or mreadtim values, but has gathered sreadtim and cpuspeed values, then only the sreadtim and cpuspeed values are used for costing. In this case, the optimizer uses the value of the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to cost a full table scan. However, if DB_FILE_MULTIBLOCK_READ_COUNT is not set or is set to 0 (zero), then the optimizer uses a value of 8 for costing.
</quote>


That's why I set DB_FILE_MULTIBLOCK_READ_COUNT to 128 (my max. I/O size), as I don't want to get 8 if statistics gathering couldn't calculate the MBRC.

Any objections?

Tom Kyte
September 28, 2011 - 11:59 am UTC

let it default.

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams047.htm#REFRN10037

the default is ALREADY your maximum IO size - but you don't want to cost it with that since you'll pretty much NEVER do the maximum IO size.
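The relationship Tom describes - reads sized to the platform maximum when the parameter is left unset - works out as follows. This is a back-of-the-envelope sketch; the 1MB maximum I/O size and 8K block size are assumed figures, not values read from any database.

```python
max_io_bytes = 1024 * 1024   # typical maximum I/O size: 1MB (assumed)
block_size   = 8192          # typical db_block_size: 8K (assumed)

# When db_file_multiblock_read_count is left unset, the effective value
# is derived from the maximum I/O size rather than set explicitly:
effective_mbrc = max_io_bytes // block_size
print(effective_mbrc)   # 128
```

So explicitly setting the parameter to 128 on an 8K-block database buys nothing at the I/O level; it only changes how full scans are costed.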

regarding MBRC

Michael, September 29, 2011 - 8:32 am UTC

IF the MBRC cannot be computed
AND DB_FILE_MULTIBLOCK_READ_COUNT is not set
THEN the optimizer uses a value of 8 for costing

We are recalculating system stats once a month, as we have a highly volatile (fully virtualised and shared) environment. I found the database often not being able to compute the MBRC. I would fall back to 8 in that case (having the default settings), which I don't want.

do you still think to let it on default?

Tom Kyte
September 30, 2011 - 5:50 pm UTC

it would use 8 to COST

it would use 1mb to do the IO

I don't think you should be recalculating once a month - just collect it once and for all and be done with it.


why don't you want 8 to COST and 1mb to READ? Seems like a good fair setting.


regarding MBRC

Michael, October 03, 2011 - 3:34 am UTC

hi,

our calculated MBRC is much higher than 8, mostly somewhere between 32 and 64.

Using 8 for costing would probably result in many more nested-loop joins and single-block reads when hash joins and multiblock I/O would be much more efficient, no?

cheers,
michael
Tom Kyte
October 03, 2011 - 9:58 am UTC

probably not, no. Look at the costing differences I demonstrated above.

and if you have a calculated mbrc of 32 to 64 - it would USE THAT, not 8. It would use 8 when it didn't know better. (that is why you just want to capture it once and let it be)

I've said what I've said. Go default. If you don't want to - so be it. I've tried to explain why I say what I say.

direct path read too high

TARIQ LAKHO, December 17, 2012 - 3:38 am UTC

Dear Tom,

I am using Oracle 11g 11.1.0.7 and observed that one of my production databases shows the following values in AWR. I took multiple AWR reports, and this value is consistently very high.

Event             Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
----------------  ---------  -------  -------------  ---------  ----------
direct path read  6,658,002   13,731              2      21.49  User I/O
direct path read    745,665    1,188              2       3.39  User I/O
direct path read  1,922,646    2,674              1       3.55  User I/O
direct path read  3,482,257    4,204              1       4.97  User I/O

How can I reduce this? Please give two options to handle this situation.
Tom Kyte
December 18, 2012 - 12:31 pm UTC

1) run fewer queries
2) buy faster disks



2ms is awesome for a physical IO, the average is usually around 5ms...

I would guess you are not really doing physical IO's but you are using the OS file system cache as a secondary sga.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2109833600346625821


if you are using parallel query, then you are done (as long as the full scans make sense)

if you are not using parallel query, review the various queries that are full scanning and make sure it makes sense to full scan - are you missing an index? (I am NOT SAYING FULL SCANS ARE BAD - I am just saying "make sure you are not full scanning 1,000,000,000 rows to find 10 rows")

25 ms of Direct path read in 11.2.0.3

Vignesh, October 16, 2013 - 2:42 pm UTC

After upgrading to 11.2.0.3, direct path read appears in the top 5 wait events with an average wait time of 10 to 25 ms.

Should this be addressed?

Tom Kyte
November 01, 2013 - 7:52 pm UTC

maybe yes, maybe no.

do things run faster or slower after the upgrade?

More queries will use direct path reads in 11g than in 10g, rather than processing tons of blocks through the buffer cache. It used to be that only parallel queries used direct path IO - now serial queries can too.

10-25ms for a single read (it is not clear what the 10-25 represents) would be a long read, however; that would be a problem if it is for a single IO. Single IOs should probably be 5ms or less.

Select VS CTAS

Pinson, January 23, 2014 - 2:10 pm UTC

Hi Tom,

Have a strange case with a complex select:

- when run as it is, it takes 103 seconds to complete
- if I do a CREATE TABLE AS SELECT with the very same select, it completes in 0.2 sec.

In both cases the set of retrieved rows is identical, and I have paid attention to the buffer cache effect; this does not come from one query taking advantage of the buffer cache filled by the other.

I have looked into the plans, and:

- in the case of the "select" alone, the plan uses a hash join of the 3 big tables involved, which seems a correct choice, as we join them on FK columns without any condition,
- in the case of the "CTAS", the plan uses nested loops and indexes to join those 3 big tables, no hash join, no full scan, which seems inappropriate. But it is still orders of magnitude faster.

I wonder if you have encountered this somewhere, or if this is a classic that I have just never heard of before. Is the "create table as select" somehow executing the select at a physical level instead of using the buffer cache, so that the access methods differ, Oracle internally speaking?

I can provide the query, plans, and indexes on the tables; however, it is huge, and I did not want to pollute my review with all of this unless you judge it useful for providing an explanation. For now, I would just like to know whether, conceptually, this seems normal.

Thanks a lot for reading me,

Pinson



