Consider Cost or Time - trying to compare the COST of two queries
Venkateshwaran, March 15, 2002 - 1:27 pm UTC
It was very useful information.
Thanks Tom.
then how do we tune SQL?
A reader, May 26, 2002 - 2:11 pm UTC
hi tom
if cost is not comparable between 2 queries -- or even the same query with different execution plans (one with a hint, one without) -- what can we base our SQL tuning strategies on? Is the only way to run it and see which one is faster...? That would sound bizarre though
for example if I have
select nom_proceso
from md_grupo a, md_aplicacion b, md_tipo_carga c, md_proceso d
where a.cod_grupo=b.cod_grupo
and b.cod_aplicacion=c.cod_aplicacion
and c.cod_tipo_carga=d.cod_tipo_carga
and d.cod_tipo_carga=4
I get cost 5
if I add a hint
select /*+ ORDERED */ nom_proceso
from md_grupo a, md_aplicacion b, md_tipo_carga c, md_proceso d
where a.cod_grupo=b.cod_grupo
and b.cod_aplicacion=c.cod_aplicacion
and c.cod_tipo_carga=d.cod_tipo_carga
and d.cod_tipo_carga=4
I get cost 7. Since you said we cannot compare the costs of queries, can I say the second statement is faster if it returns the rows faster?
May 26, 2002 - 3:13 pm UTC
why does it sound bizarre?
You can say NOTHING about the speed of query two over query one. All you've done is use a hint to artificially inflate the costs used by the CBO internally for all other access plans. Hints work explicitly by playing with the internal costs -- making the path you want us to take look more appealing.
If getting the lowest cost resulted in the fastest query -- the CBO would get it perfect every time without your help ever!
ok then the only way is running it
A reader, May 26, 2002 - 4:24 pm UTC
thank you tom
I will run the queries now, instead of comparing costs
cheers
Cost
Andre, May 27, 2002 - 12:07 pm UTC
Speaking of costs, I'm trying to measure the cost of a query using SQL*Plus's autotrace. I've noticed that the "physical reads" stat in autotrace seems to be ONLY the blocks fetched through IO, whereas "physical reads" in V$SYSSTAT is the combination of db block gets and consistent gets, both from IO and memory. Is that right?
May 27, 2002 - 3:14 pm UTC
physical reads in v$sysstat is physical IO as well. On a single user system:
ops$tkyte@ORA817DEV.US.ORACLE.COM> select name, value from v$sysstat where name = 'physical reads';
NAME VALUE
------------------------------ ----------
physical reads 7612
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly statistics;
ops$tkyte@ORA817DEV.US.ORACLE.COM> select count(*) from all_objects;
Statistics
----------------------------------------------------------
261 recursive calls
4 db block gets
76412 consistent gets
497 physical reads
0 redo size
369 bytes sent via SQL*Net to client
425 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off
ops$tkyte@ORA817DEV.US.ORACLE.COM> select name, value from v$sysstat where name = 'physical reads';
NAME VALUE
------------------------------ ----------
physical reads 8116
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 8116-7612 from dual;
8116-7612
----------
504
ops$tkyte@ORA817DEV.US.ORACLE.COM>
That shows that whilst the query itself took 76,412 logical IO's, there were only about 500 physical IO's performed -- as reported both by autotrace and by querying v$sysstat. Remember v$sysstat is the SYSTEM statistic; autotrace uses v$sesstat SESSION statistics.
Couple of queries....
Ruck, May 27, 2002 - 9:42 pm UTC
Tom,
Your explanation is very useful. But I still have a couple of questions.
1. Does it mean the lowest cost may not get you the fastest query path?
2. How is this cost calculated? Is it just based on resource utilization, without regard to the execution speed of the query?
Thanks,
Ruck.
May 28, 2002 - 7:10 am UTC
1) yes.
2) The cost is a set of numbers assigned by the optimizer to various steps of a query based on available statistics. These costs are then factored together to come up with an overall query cost -- a single number that assigns a relative "cost" to a query plan. We then pick the plan for a given query with the lowest cost. In theory, this will be the best performing query (that, after all, is the goal). In reality, it doesn't work that way 100% of the time.
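To make that "factored together, pick the lowest" idea concrete, here is a toy sketch (plain Python, with invented plan names and step costs -- not Oracle's actual algorithm):

```python
# Toy sketch of cost-based plan selection -- NOT Oracle's actual algorithm.
# Plan names and step costs below are invented for illustration.

def total_cost(steps):
    # Factor the per-step costs together into one overall plan cost
    # (here simply a sum; the real combination rules are more involved).
    return sum(cost for _step, cost in steps)

candidate_plans = {
    "index range scan + table access": [("INDEX RANGE SCAN", 52),
                                        ("TABLE ACCESS BY ROWID", 379)],
    "full table scan":                 [("TABLE ACCESS FULL", 577)],
}

# The optimizer picks the plan whose combined cost is lowest.  Nothing in
# that number promises the lowest *elapsed time* -- it is a relative score.
best = min(candidate_plans, key=lambda p: total_cost(candidate_plans[p]))
# best == "index range scan + table access" (52 + 379 = 431 < 577)
```

The point of the sketch is that the chosen plan is simply the one with the smallest relative score -- there is no clock anywhere in the selection.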
can we compare cost of SAME query?
A reader, June 14, 2002 - 7:22 am UTC
hi tom
you said we cannot compare the cost of 2 queries, but can we compare the same query under different optimizer_modes such as
first_rows and all_rows????
if for example
select * from emp in first_rows I get cost 10 and in all_rows cost 20 does that mean that first_rows is better?
June 14, 2002 - 7:34 am UTC
NO. NO. NO. NO.
The optimizer mode is a tool you can use to ALTER (on purpose) the COST associated with various steps.
That is what your example shows. By changing the optimizer goal, we've increased the cost of some operation that was "cheap" in the first_rows case (eg: index access is "cheap" in first_rows). all_rows makes that "more expensive". For all we know -- the SAME plan might be generated with different costs! What then -- would one be better than the other?
You cannot compare costs across queries, even if the queries are "identical".
You cannot make any judgments about the relative performance of a query based on the cost (queries with tiny costs may take days to run, queries with huge costs could run in milliseconds).
I wish we did not even expose this number so easily .....
Finally We Can Put This (apparent) MisConception to Bed!!!
Robert Wood, June 14, 2002 - 12:56 pm UTC
Hi Tom,
Thanks for being a breath of fresh air in the stuffy, aloof world of Oracle information!
Can you please confirm, yes or no....
What I hear you saying is this...
1) There is NO information we can use from the cost value.
2) We should COMPLETELY IGNORE the cost value.
Thanks,
Robert.
June 14, 2002 - 2:05 pm UTC
Well, no. I did not say that. I said:
... I wish we did not even expose this number so easily ....
they are useful, when properly used (eg: when you are TRYING to influence the optimizer via hints and TRYING to figure out why it is doing what it is doing).
Jonathan Lewis's most awesome book -- "Practical Oracle8i" shows how to practically use the cost in diagnosing a query for example.
I think we are missing a point here
An Oracle fan, June 14, 2002 - 1:20 pm UTC
Comparing the same query using COST is perfectly legitimate because that is what the COST is for. What we expect is the optimizer to permute all possible combinations and pick up the plan with the least cost. If you could rewrite the same SQL to get a cheaper cost, theoretically it should be faster. If it is not, I guess it is an optimizer bug. Also, if the COST is not just based on logical reads but also on CPU, then comparing the SQL in the same instance is still legitimate.
June 14, 2002 - 2:31 pm UTC
I'll agree with one statement you make:
"Comparing the same query using COST is perfectly legitimate because that is what
the COST is for. What we expect is the optimizer to permute all possible
combinations and pick up the plan with the least cost."
that is accurate. After that, you are missing the point.
I don't know how to say it
I'll shout
THE COST HAS NOTHING WHATSOEVER TO DO WITH THE RUNTIME PERFORMANCE OF A QUERY.
YOU CANNOT COMPARE THEM AND SAY "THIS WILL BE FASTER"
IT IS NOT A BUG, IT IS A DESIGN FEATURE, THE WAY IT WORKS. IT IS DOING THIS ON PURPOSE.
YOU ARE PROJECTING YOUR THOUGHTS OF HOW YOU WOULD HAVE DONE IT -- instead of understanding how it is actually done. (hey, i have a case study of that in my book actually -- what happens when you assume something is true and it is false.)
I give up. Explain this output to me if you can. The queries do the same thing, they have the same cost. why are the performance characteristics so radically different? (hint -- it is not a bug, repeat, it is not a bug. answer, well, apparently you cannot really compare them... the costs that is)
ops$tkyte@ORA817DEV.US.ORACLE.COM> variable x varchar2(25)
ops$tkyte@ORA817DEV.US.ORACLE.COM> set timing on
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big_table a
2 where object_name like '%' || :x;
1101008 rows selected.
Elapsed: 00:05:21.46
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=431 Card=54951 Bytes=5495100)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=431 Card=54951 Bytes=5495100)
2 1 INDEX (RANGE SCAN) OF 'OBJECT_NAME_IDX' (NON-UNIQUE) (Cost=52 Card=54951)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1166039 consistent gets
21649 physical reads
256 redo size
102352990 bytes sent via SQL*Net to client
8147825 bytes received via SQL*Net from client
73402 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1101008 rows processed
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select /*+ full( a ) */ * from big_table a
2 where object_name like '%' || :x;
1101008 rows selected.
Elapsed: 00:01:50.15
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=431 Card=54951 Bytes=5495100)
1 0 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=431 Card=54951 Bytes=5495100)
Statistics
----------------------------------------------------------
0 recursive calls
27 db block gets
87396 consistent gets
14409 physical reads
0 redo size
128246913 bytes sent via SQL*Net to client
8147825 bytes received via SQL*Net from client
73402 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1101008 rows processed
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off
then how the heck these tools...
A reader, June 15, 2002 - 9:38 am UTC
tom
If we cannot tune a query WITHOUT running it, then how the heck do tools like Oracle Enterprise Manager and Quest SQL Navigator tune a SQL without running it?
June 17, 2002 - 7:10 am UTC
they apply a set of rules, heuristics (like the things I have in my head after doing it for 15 years)...
I can look at a query, and a plan -- and with a KNOWLEDGE of the data (statistics to the software, even more knowledge to a human) -- I can generally "make the query better". Not always, but many times.
That is what the software does: it applies a series of rules to the queries and suggests (based on rules of thumb) enhancements to the schema or the query that would make it better.
I've yet to see any software take a query though and say things like "well, if you remove that unnecessary outer join and use the analytic functions instead, it'll run much faster". I see them give "add this index" sorts of advice (very basic).
Query tuning
Anton, June 16, 2002 - 6:58 pm UTC
Tom.
So, just to clarify: for tuning SQL, we should be using tkprof/explain, looking at the parse/fetch/execute stats and CPU stats, and concentrating on lowering those, rather than lowering the query cost?
PS - keep up thinking !!!
June 17, 2002 - 7:35 am UTC
Well, let me put it this way.
I never look at the cost personally -- unless I'm trying to figure out why the optimizer chose what it did (and even then, you need to use an event to have the optimizer dump more information so you can see the reasons it assigned the costs that it did).
Provide some more info
Same Oracle fan from USA, June 21, 2002 - 4:50 pm UTC
Could you provide information like DB_FILE_MULTIBLOCK_READ_COUNT, the statistics of BIG_TABLE (BLOCKS, NUM_ROWS), and the index statistics (LEAF_BLOCKS, CLUSTERING_FACTOR, NUM_ROWS, etc.)?
Are we sure the statistics are correct and not manipulated? What if I set the index statistics to something crazy? It is really bizarre to see the COST of the INDEX ACCESS and the FTS be the same. Have you got any other example, one without using "%"?
I still believe comparing the COST for the same query is directly proportional to its run time.
June 21, 2002 - 6:57 pm UTC
You can do it yourself.
Frankly, I don't think anyone has accused me of "faking" a result to make a point before. Geez.
Yes, the stats were correct
No, the stats were not manipulated
but you know what -- EVEN IF THEY WERE, IT TOTALLY PROVES MY POINT, factually I might add -- not by belief.
Anyway, it is not bizarre -- it is (as I said) to be expected. The costs of two different queries cannot be compared; no judgement as to performance may be made. You should consider them random numbers. Others, who have read Jonathan Lewis's great book -- they can use them rationally, to understand WHY the optimizer did what it did.
It is not about what you believe to be true. It is about what is in fact true vs not true.
Here is the test, done on a system running:
big_table@ORA817DEV.US.ORACLE.COM> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
PL/SQL Release 8.1.7.3.0 - Production
CORE 8.1.7.0.0 Production
TNS for Solaris: Version 8.1.7.3.0 - Production
NLSRTL Version 3.4.1.0.0 - Production
big_table@ORA817DEV.US.ORACLE.COM> show parameter db_file_mul
NAME TYPE VALUE
------------------------------------ ------- ------------------------------
db_file_multiblock_read_count integer 100
big_table@ORA817DEV.US.ORACLE.COM> show parameter block_size
NAME TYPE VALUE
------------------------------------ ------- ------------------------------
db_block_size integer 8192
big_table@ORA817DEV.US.ORACLE.COM> exec show_space( 'BIG_TABLE' )
Free Blocks.............................0
Total Blocks............................20096
Total Bytes.............................164626432
Unused Blocks...........................63
Unused Bytes............................516096
Last Used Ext FileId....................7
Last Used Ext BlockId...................27529
Last Used Block.........................1
PL/SQL procedure successfully completed.
Here is the test you can run yourself. Nuff said. Facts is facts, beliefs -- well, nuff said.
big_table@ORA817DEV.US.ORACLE.COM> create table big_table as select * from all_objects;
Table created.
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into big_table select * from big_table;
22811 rows created.
big_table@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
big_table@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into big_table select * from big_table;
45622 rows created.
big_table@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
big_table@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into big_table select * from big_table;
91244 rows created.
big_table@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
big_table@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into big_table select * from big_table;
182488 rows created.
big_table@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
big_table@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into big_table select * from big_table;
364976 rows created.
big_table@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
big_table@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into big_table select * from big_table;
729952 rows created.
big_table@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> create index object_name_idx on big_table(object_name) parallel 8 unrecoverable;
Index created.
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> analyze table big_table compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> variable x varchar2(25)
big_table@ORA817DEV.US.ORACLE.COM> set autotrace traceonly explain
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> select *
2 from big_table a
3 where object_name like '%' || :x;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=577 Card=72996 Bytes=7080612)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=577 Card=72996 Bytes=7080612)
2 1 INDEX (RANGE SCAN) OF 'OBJECT_NAME_IDX' (NON-UNIQUE) (Cost=69 Card=72996)
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> select /*+ full(a) */ *
2 from big_table a
3 where object_name like '%' || :x;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=577 Card=72996 Bytes=7080612)
1 0 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=72996 Bytes=7080612)
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> set autotrace off
big_table@ORA817DEV.US.ORACLE.COM>
big_table@ORA817DEV.US.ORACLE.COM> select count(*) from big_table;
COUNT(*)
----------
1459904
I totally agree with you. I have one question though.
Prince, June 22, 2002 - 2:42 am UTC
Isn't it almost always the case that applying a hint to a normal query (a query without any hint) results in a higher cost?
I tested many cases and it was so in all of them.
I believe Oracle might already have considered the path (the one I choose with the hint) and found it was costlier than the one Oracle chose (the present plan).
Just out of academic interest, do you have an example where we can add some hints and still get a lower cost?
Thanks,
June 22, 2002 - 9:43 am UTC
Yes, generally.
But as to "in all my test cases" -- I just gave you one where this is not the case, so you know now that it is not a rule, but a generalization.
Oracle will not "reason" about the query -- which is what your third paragraph seems to imply (that would be neat; maybe in Oracle20i). It applies rules, assigning numbers (costs) to various access paths. It weights those numbers, adds 'em up, and compares. Nothing more, nothing less. The "magic" is in the assignment of the numbers only. (It really has nothing to do with "how long" something will take to execute -- just the number / weight assigned to a specific operation.)
Here is an example going the other way:
big_table@ORA817DEV.US.ORACLE.COM> alter session set optimizer_goal=first_rows;
Session altered.
big_table@ORA817DEV.US.ORACLE.COM> select * from big_table where object_name is not null order by object_name;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1454076 Card=1459904 Bytes=141610688)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=1454076 Card=1459904 Bytes=141610688)
2 1 INDEX (FULL SCAN) OF 'OBJECT_NAME_IDX' (NON-UNIQUE) (Cost=7356 Card=1459904)
big_table@ORA817DEV.US.ORACLE.COM> select /*+ all_rows */ * from big_table where object_name is not null order by object_name;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: ALL_ROWS (Cost=46883 Card=1459904 Bytes=141610688)
1 0 SORT (ORDER BY) (Cost=46883 Card=1459904 Bytes=141610688)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=1459904 Bytes=141610688)
So, here, by overriding the environment (the environment consists of many init.ora parameters other than the optimizer goal -- db_file_multiblock_read_count, sort_area_size, hash_join_enabled, etc etc etc) I got a query via a hint with a much lower cost.
Does it run faster? Maybe, maybe not ;)
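The "weights those numbers, adds 'em up, and compares" behaviour -- and the way a goal setting can flip the comparison without the data or runtime changing -- could be sketched like this (a toy model with invented operations and weights, not Oracle's real costing):

```python
# Toy model of how an optimizer-goal-like setting reweights the SAME
# operations so the cost comparison flips.  All numbers are invented;
# this is not Oracle's real costing.

OP_COST = {"INDEX ACCESS": 30, "FULL SCAN": 40, "SORT": 35}

GOAL_WEIGHT = {
    # index access looks "cheap" when we optimize for the first rows back
    "first_rows": {"INDEX ACCESS": 0.2, "FULL SCAN": 1.0, "SORT": 2.0},
    # ...and "expensive" when we optimize for total throughput
    "all_rows":   {"INDEX ACCESS": 3.0, "FULL SCAN": 1.0, "SORT": 1.0},
}

def plan_cost(ops, goal):
    # weight each operation's base number by the goal, then add 'em up
    return sum(OP_COST[op] * GOAL_WEIGHT[goal][op] for op in ops)

index_plan = ["INDEX ACCESS"]          # rows come back sorted via the index
scan_plan  = ["FULL SCAN", "SORT"]     # read everything, then sort

# Under first_rows the index plan "wins"; under all_rows the scan plan does.
# Nothing about the data or the runtime changed -- only the weights did.
```

This is why the two costs above (1454076 vs 46883) say nothing about which statement runs faster: only the weighting changed.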
Here we go again
Same Oracle fan, June 27, 2002 - 3:21 pm UTC
Ok, let me rephrase this again. You can compare the same SQL when you are trying to tune that SQL. If you can tune the SQL to get a lower cost than whatever the CBO comes up with, it should perform better (let's not get into the run time, because that depends on physical characteristics like the file system, disk speed etc.). I'm not wishing the CBO to behave like this, but that's how the CBO works.
Cost CANNOT be treated as a random number. Up to 8i the Cost is a metric based on I/O (logical reads), and starting with 9i there is an option to make the Cost based on CPU as well as I/O (by calling the API DBMS_STATS.GATHER_SYSTEM_STATS).
The very fact COST is exposed is to give the user an idea about the unit of work or resource it is estimating to use for the SQL.
I'm disappointed to see you pick a SQL which is useless for proving your point. You are anyhow interested in selecting all the rows from the table, and you are hiding that fact behind "like and %". Since the CBO goes by the stats provided to it from the dictionary tables and by the selectivity estimates assigned already, "like" results in a 5% selectivity of the total rows (5% of 1459904 = 72996). That's why it always comes up with an index access plan. If you wanted to prove your point you could always pass null to your bind variable and let the SQL select all the rows using the INDEX, which will be more expensive than the FTS, and claim your plan with the FTS hint is faster than the default CBO plan.
If you know the user will not pass a value, then why introduce a bind variable?
What if, you passed a value for your bind variable? Have you tested this scenario?
I'm not saying all the above after reading it from a manual. I've been working with the CBO for a couple of years and I'm saying the above from my experience. I can also reiterate: if you can come up with a tuned SQL where the COST is cheaper than the default CBO plan, then it is a CBO bug. It is not what I wish, it is how it is.
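As an aside, the 5% arithmetic above can be checked directly (a quick Python check; the 5% default LIKE selectivity figure comes from the post itself, not from any documentation consulted here):

```python
import math

# Checking the arithmetic quoted above: a default 5% selectivity guess for
# the LIKE predicate, applied to the table's NUM_ROWS, reproduces the
# Card= figure seen in the earlier plans.
num_rows = 1459904
card_estimate = math.ceil(num_rows * 0.05)   # 72996, matching Card=72996
```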
June 27, 2002 - 4:57 pm UTC
(for another question, I added a column ID with a primary key on it and reanalyzed the table and all indexes and all indexed columns)
Then, according to you, this is a "bug" right? (just want to be clear on this)
big_table@ORA817DEV.US.ORACLE.COM> select * from big_table where exists ( select null from big_table bt2 where bt2.id = big_table.id );
1459904 rows selected.
Elapsed: 00:03:34.90
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=577 Card=72996 Bytes=7518588)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=72996 Bytes=7518588)
3 1 INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=2 Card=1 Bytes=5)
Statistics
----------------------------------------------------------
0 recursive calls
105 db block gets
4495763 consistent gets
15584 physical reads
0 redo size
185831456 bytes sent via SQL*Net to client
10803611 bytes received via SQL*Net from client
97328 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1459904 rows processed
big_table@ORA817DEV.US.ORACLE.COM> select * from big_table where id in ( select id from big_table );
1459904 rows selected.
Elapsed: 00:02:41.62
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1760 Card=1459904 Bytes=157669632)
1 0 HASH JOIN (Cost=1760 Card=1459904 Bytes=157669632)
2 1 INDEX (FAST FULL SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=90 Card=1459904 Bytes=7299520)
3 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=1459904 Bytes=150370112)
Statistics
----------------------------------------------------------
0 recursive calls
147 db block gets
119328 consistent gets
15622 physical reads
0 redo size
185831456 bytes sent via SQL*Net to client
10803611 bytes received via SQL*Net from client
97328 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1459904 rows processed
Two queries, semantically identical (will always give the same answer). Cost of faster one is 3x that of the slower under identical conditions.
It is still a bug !!
An Oracle fan, July 17, 2002 - 4:42 am UTC
Yes, this is a bug. Actually if you try your same query in a 9i database this issue is fixed. You should see that your query with EXISTS shows a higher COST than the one with IN.
I tried the same test case as yours in 9.2 and here are the numbers.
Case 1)
select * From big_test b
where exists
(select 1 from big_test b1
where b.id = b1.id);
500000 rows selected.
Elapsed: 00:02:46.36
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=50190 Card=25000 Bytes=250000)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'BIG_TEST' (Cost=190 Card=25000 Bytes=250000)
3 1 INDEX (UNIQUE SCAN) OF 'BIG_TEST_U1' (UNIQUE) (Cost=2 Card=1 Bytes=4)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1534495 consistent gets
0 physical reads
0 redo size
9018679 bytes sent via SQL*Net to client
367166 bytes received via SQL*Net from client
33335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
500000 rows processed
Case 2)
select * From big_test b
where id in
(select id from big_test b1);
500000 rows selected.
Elapsed: 00:02:14.11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=639 Card=500000 Bytes=7000000)
1 0 HASH JOIN (Cost=639 Card=500000 Bytes=7000000)
2 1 INDEX (FAST FULL SCAN) OF 'BIG_TEST_U1' (UNIQUE) (Cost=160 Card=500000 Bytes=2000000)
3 1 TABLE ACCESS (FULL) OF 'BIG_TEST' (Cost=190 Card=500000 Bytes=5000000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
35545 consistent gets
0 physical reads
0 redo size
9018679 bytes sent via SQL*Net to client
367166 bytes received via SQL*Net from client
33335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
500000 rows processed
Note the COST difference. In 8.1.7 the EXISTS showed a cheaper cost than IN, which is a bug.
Controversial topic
Tahir, January 06, 2003 - 2:56 pm UTC
Tom, I have tested both scenarios as presented by Oracle Fan, and theoretically I agree it does make sense for COST to be a tuning metric. But after testing in both 8.1.7 and 9.2.0 I did not find evidence suggesting that the 8.1.7 CBO has a bug which is fixed in 9.2. For instance, the "IN" and "EXISTS" queries have exactly the same COST in 9.2, whereas the fix should make "IN" less costly. I see that it's different from 8.1.7, which is expected, but where is the bug? Could you please confirm whether you find the same? I did not see any further discussion on this controversial topic.
Thank you.
10:11:33 TK@DB92> select * from
10:11:34 2 cbo_test where exists (select 1 from cbo_test ct
10:11:34 3 where cbo_test.object_id = ct.object_id)
10:11:34 4 /
1067520 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=91721 Card=68321280 Bytes=6900449280)
1 0 MERGE JOIN (Cost=91721 Card=68321280 Bytes=6900449280)
2 1 SORT (UNIQUE)
3 2 INDEX (FULL SCAN) OF 'ID_IDX' (NON-UNIQUE) (Cost=2359 Card=1067520 Bytes=4270080)
4 1 SORT (JOIN) (Cost=89362 Card=1067520 Bytes=103549440)
5 4 TABLE ACCESS (FULL) OF 'CBO_TEST' (Cost=1433 Card=1067520 Bytes=103549440)
Statistics
----------------------------------------------------------
0 recursive calls
195 db block gets
17090 consistent gets
67915 physical reads
0 redo size
9320953 bytes sent via SQL*Net to client
498444 bytes received via SQL*Net from client
71169 SQL*Net roundtrips to/from client
1 sorts (memory)
1 sorts (disk)
1067520 rows processed
10:22:38 TK@DB92> select * from cbo_test
10:23:10 2 where object_id in (select ct.object_id from cbo_test ct);
1067520 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=91721 Card=68321280 Bytes=6900449280)
1 0 MERGE JOIN (Cost=91721 Card=68321280 Bytes=6900449280)
2 1 SORT (UNIQUE)
3 2 INDEX (FULL SCAN) OF 'ID_IDX' (NON-UNIQUE) (Cost=2359 Card=1067520 Bytes=4270080)
4 1 SORT (JOIN) (Cost=89362 Card=1067520 Bytes=103549440)
5 4 TABLE ACCESS (FULL) OF 'CBO_TEST' (Cost=1433 Card=1067520 Bytes=103549440)
Statistics
---------------------------------------------------------
0 recursive calls
195 db block gets
17090 consistent gets
67967 physical reads
0 redo size
9747969 bytes sent via SQL*Net to client
498444 bytes received via SQL*Net from client
71169 SQL*Net roundtrips to/from client
1 sorts (memory)
1 sorts (disk)
1067520 rows processed
January 06, 2003 - 3:29 pm UTC
You have a different test; not knowing what the parameters are, I cannot really comment.
Just different table name
Tahir, January 06, 2003 - 4:16 pm UTC
I used different table name but other than that it's all the same.
TK@DB92> select * from v$version;
BANNER
------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for Solaris: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
select name, type, value
from v$parameter
where name like 'db_file_mul%'
NAME TYPE VALUE
------------------------------- ---------- ------
db_file_multiblock_read_count 3 16
db_block_size = 8192
1. create table cbo_test as select * from all_objects;
2. create index id_idx on cbo_test(object_id) parallel 8 unrecoverable;
3. analyze table cbo_test compute statistics
for table for all indexes for all indexed columns;
4. insert /*+ append */ into cbo_test select *
from cbo_test; -- repeated to generate enough data
January 06, 2003 - 4:21 pm UTC
Some other setting is different -- I see it chose a sort merge join (I got these same results, by the way).
Perhaps you have disabled hash joins or you have an enormous sort area. Try with as vanilla an init.ora as you can and see.
Good catch
Tahir, January 06, 2003 - 4:40 pm UTC
I did in fact set hash_join_enabled=false, but that is the only one, at least as far as I can recall. So is it still a bug, and can/should COST not be used as a measuring stick?
January 06, 2003 - 4:51 pm UTC
I do not use cost as a measuring stick, no.
A reader, January 07, 2003 - 6:59 am UTC
In a data warehouse database I need to run queries fast, and all I care about is how to get the output fast -- so I wonder about the CBO; maybe it is not the fastest.
Cost of query is not related to performance of query
Wolfgang Breitling, January 07, 2003 - 11:40 am UTC
To those who cling to the illusion that the "cost" of an explained sql is in any way related to the performance, read Tom's lips: "It is not".
The cost is used during the parse to pick one plan among all the ones considered (the one with the lowest "cost"). Once it has served that purpose and a plan is chosen, it is meaningless. In particular, it can NOT be compared to the cost of another explain. If everything is the same between the two explains, the resulting plans and their costs will be the same. If they are not, then something is different and you cannot compare the results anymore.
In pre-Oracle9, the "cost" is roughly equal to the estimated number of logical reads (db blocks needed to find the answer). That estimate of the number of blocks to be visited can differ from reality for a number of reasons, only a few of which could be considered bugs. A lot of the reasons for wrong estimates are due to data distribution and dependencies. That is what Tom refers to as human knowledge about the data, which goes far beyond what the optimizer can discern from the statistics. For a presentation I am giving at the Hotsos Performance Symposium (IOUG wasn't interested) I have prepared test cases where the same query, on tables with virtually identical statistics, returns vastly different numbers of rows -- to show that it is impossible for the optimizer to come up with accurate estimates, and therefore the best plan, in all cases.
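That "cost roughly equals estimated logical reads" idea can be sketched using the commonly cited pre-9i I/O approximations for index versus full-scan access (a simplification, not Oracle's exact formulas; every statistic below is invented):

```python
import math

# Rough sketch of commonly cited pre-9i I/O cost approximations.
# These are simplifications (not Oracle's exact formulas) and every
# statistic below is invented for illustration.

def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, sel):
    # descend the index, walk a fraction of the leaf blocks, then visit
    # table blocks -- the clustering_factor term often dominates
    return (blevel
            + math.ceil(sel * leaf_blocks)
            + math.ceil(sel * clustering_factor))

def full_scan_cost(table_blocks, adjusted_mbrc):
    # multiblock reads let a full scan cover many blocks per I/O
    return math.ceil(table_blocks / adjusted_mbrc)

# With a badly clustered index, even 5% selectivity makes the index path
# "cost" far more than the full scan -- all driven by the *estimates*:
ix = index_range_scan_cost(blevel=2, leaf_blocks=7000,
                           clustering_factor=1400000, sel=0.05)  # 70352
fs = full_scan_cost(table_blocks=20000, adjusted_mbrc=10)        # 2000
```

Everything here hinges on the selectivity estimate; if the real data does not match that estimate, the "cheapest" number no longer corresponds to the cheapest execution.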
reply to Wolfgang Breitling
Mikito harakiri, January 07, 2003 - 6:30 pm UTC
Of course, cost and execution time are related. Why did they add CPU costing to 9i? Why are they adding caching statistics? Why are they making query rewrite cost based? Those improvements serve a singular purpose: to eliminate any discrepancy between real performance and estimated cost. Because if there is such a discrepancy, what is the basis for claiming that the selected plan is better than just a random plan?
It is well known that query optimization is a hard problem. On even a pretty modest query the optimizer can easily be confused by complex predicates, correlation between column values, bind variables, and many other factors. This is why cost today is a very rough approximation of the execution time. This, however, should be considered a bug, not a feature.
Response to Mikito harakiri
Wolfgang Breitling, January 07, 2003 - 10:45 pm UTC
Cost and execution time are NOT related. There are plenty of examples that prove that they are not. The comparison of costs can only be made by the optimizer while it is parsing a sql and is evaluating different access paths. Once an access path is chosen it is invalid to compare its cost number to that of a different plan and draw any conclusion from it.
When parsing a sql, in addition to using the gathered statistics - table as well as system in 9i - the optimizer has to make many assumptions about the data (uniform distribution, for example, in the absence of histogram data), and if reality differs from the assumptions, the plan that came out with the lowest cost based on those assumptions can very well be far inferior to a plan that has a higher cost number (based on the assumptions) but is better suited to the reality and hence performs better. It is a rather far stretch to call the inability of the optimizer to accurately estimate certain characteristics of the data in the database, given limited information, a bug.
cost of execution
Birna, February 14, 2003 - 5:01 am UTC
tremendously helpful, thanks
trial outline
Peter, April 23, 2003 - 10:45 am UTC
I think this topic is extremely important because many Oracle programmers use costs to optimize their SQL's performance. This can happen because practice justifies them: in everyday practice, a smaller cost almost always means faster execution (for semantically equivalent selects). In this way there is a relation between cost and execution time -- at least in a statistical sense.
Taking all of the above into account, the following seems to follow.
Oracle (the CBO) must pick one of the possible execution plans. It tries to pick the "best" one. The decision is based on costs (...). "Best" mostly means fastest. In Oracle 9 it is in general impossible to determine exactly which plan will result in the fastest execution, but usually it is possible to come close enough (through costs). In this way, cost usually represents execution time with sufficient accuracy. Maybe this is a side effect (so one can say "they are not related"), but this side effect is the substance of cost, and Oracle (the CBO) essentially uses it for this. As newer Oracle versions arrive, the relation between cost and time keeps getting tighter.
Are these true?
April 23, 2003 - 7:17 pm UTC
well, yes and no, and sort of.
Since we'll always have knobs and dials to twist -- to artificially adjust costs (optimizer_index_* for example, multiblock read counts, and so on...) the relationship between COST and an amount of time will always be fuzzy.
The ultimate goal is to have it so that for a given set of plans generated for an individual query -- if all external influences are described correctly (we do that with init.ora parameters today), the plan with the lowest cost FOR THAT query will be the fastest plan for that query.
Will you be able to compare cost X for query Y to cost A for query B and say query Y will run faster/slower than B -- weeeelll.... I'm not 100% convinced yet that it is truly obtainable 100% of the time. Incremental improvements will get us there eventually I suppose -- but it is a really hard task to undertake.
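Tom's point about the knobs and dials can be made concrete with a quick sketch -- the table and index names below are made up for illustration, and this is not part of the original answer:

```sql
-- Sketch: a session setting can change the reported cost of a plan
-- without changing the work the plan actually does.
create table t as select * from all_objects;
create index t_idx on t(object_id);
exec dbms_stats.gather_table_stats( user, 'T', cascade => true );

set autotrace traceonly explain
select * from t where object_id = 42;            -- note the cost of the index plan

alter session set optimizer_index_cost_adj = 10; -- index paths now "cost" 10% as much
select * from t where object_id = 42;            -- same plan, much lower cost number
set autotrace off
```

The runtime work is identical both times; only the internal costing was rescaled, which is one reason cost numbers from differently configured sessions cannot be compared.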
so: how?
Peter, April 24, 2003 - 11:14 am UTC
Thank you for your quick answer! Now I am sure that the key is to understand that the relation (between cost and time) is statistical in nature. That is why anyone can show individual examples to support his or her standpoint, and why those examples don't really prove a strict relation or its very opposite. And that is why we can use costs for tuning our SQL, but we can never be sure of conclusions drawn from costs.
Above you confirmed that (if we need more reliable information) "for tuning SQL, we should be using tkprof/explain and looking at the parse/fetch/execute stats and CPU stats". Please explain this. What are the steps for getting this data? What system tables/views should we examine, and when? What does the data mean? Etc. (Feel free to refer to the relevant parts of the Oracle (or other freely available) documentation.)
TIA.
sql timing
Tony, August 26, 2003 - 2:00 pm UTC
Tom,
Thanks a lot for all your help,
When we set timing on from sqlplus, does the resulting time include the time to display all the results? My query returns 10,000 rows; I am just wondering if I can get the actual time without displaying the results on screen (if possible).
August 27, 2003 - 7:47 am UTC
set timing on
set autotrace traceonly
run query
is one way -- or
alter session set sql_trace=true;
set termout off
run query
set termout on
in a script and use tkprof
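Put together as a single script, that second approach might look like the sketch below -- the script and trace-file names are made up, and the exact tkprof invocation depends on where your user_dump_dest points:

```sql
-- timed_query.sql: trace a query while suppressing its screen output
alter session set sql_trace = true;
set termout off
@my_query.sql            -- hypothetical script holding the query to time
set termout on
alter session set sql_trace = false;

-- then, from the OS shell, format the resulting trace file, e.g.:
--   tkprof ora_12345.trc report.txt
-- and read cpu/elapsed from the parse/execute/fetch lines of the report
```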
Still in doubt - I can read your lips but don't know if I should agree
Mahesh, September 02, 2003 - 2:26 am UTC
Scenario - CBO used, all other conditions vanilla.
Query 1 and Query 2 - different queries returning same result set.
explain plan performed for both queries but queries not actually executed.
Consider cost of query 1 to be higher than cost of query 2.
Logout and restart database - run query 1 and check the time taken.
Logout and restart database again - run query 2 and check the time taken.
What do you anticipate about the time taken by each query and its relation to the cost, if any? What are the actual results?
September 02, 2003 - 7:28 am UTC
no ideas...
(and you would have to reboot the server if you were trying to flush the buffer cache, since you are probably using a buffered file system -- not that it matters, since rebooting the server is as "not real world" as restarting the database)
Could you give some idea after trying ?
Mahesh, September 02, 2003 - 10:01 pm UTC
The reason I am trying to do this is as follows -
1. If Cost is not to be considered for better query decisions, then Oracle works like this -
Query 1 is submitted, returning a data set. The optimiser randomly assigns it a cost of 200, then finds multiple ways to do it better and costs the best way to execute it at, say, 180 -- hence that's the plan on which it will execute that query.
Query 2 is submitted, returning the same data set. The optimiser randomly assigns a cost of 300, then finds multiple ways to do it better and costs the best way to execute it at, say, 260 -- hence that's the plan on which it will execute that query.
Since the algorithm generates the first cost -- used to compare the various execution plans for a query -- totally at random, the cost is not comparable between two queries. Is that the best explanation?
2. The reason I feel case 1 above is not true: the optimiser, while deciding among various execution plans for query 1, should have considered the execution plan for query 2, as both return the same result set -- or vice versa. Am I right?
If yes then cost can be considered for query performance decisions.
If no then the first random cost assignment for a query by the optimiser algorithm holds true.
Lastly, the reason I am doing this analysis: when a developer comes to a DBA and asks which query is better in terms of time performance, the DBA should be able to say query 1 will take less time than query 2, considering it is the only query fired on a newly started database. This is because if I fire query 1 or query 2 in whatever sequence, the time for the 2nd query is going to be affected by the presence of some data blocks in the buffer, and so on. That's why the database restart.
But the fact is, in the real world you don't always want to fire 2 queries, note their performance times, and then decide which is the better query.
All this reasoning -- to say that cost can be used to decide on query performance time -- is going to be wrong if you say
"Optimiser algorithm assigns the cost completely at random each time a query is submitted even though the queries may be returning the same result set."
I would appreciate getting your views in detail rather than "no ideas". I understand it's a well-chewed topic, but unfortunately the CBO is still there and query performance is always a big issue. Please help.
September 03, 2003 - 6:49 am UTC
i understand what you are saying -- I'm also saying that, given the numerical information, no one can say ANYTHING accurate with 100% certainty.
they can only say "what is supposed to happen", not what will.
the numbers from the CBO are a prediction based on its knowledge of the data, the system, and many other factors.
A tweak here (change a setting) and the numbers change radically. The same query -- same plan -- lower cost. Now, did tweaking that parameter really make the query go faster?? if that setting were db_file_multiblock_read_count -- maybe it did. if that setting was optimizer_index_cost_adj -- no way, that won't affect runtime performance -- just the costing of the query.
#1 of what you say above is not accurate at all. there is no randomness involved anywhere. it is a mathematical model into which inputs are fed and an output derived.
a database restart, as I said, is wholly insufficient to clear blocks from caches AND furthermore is totally non-real world.
Randomness ruled out....
Mahesh, September 03, 2003 - 11:39 pm UTC
Ok, then one query to help me try to conclude my understanding:
Query 1 works on a table returns 400000 rows out of a million.
Query 2 works on the same table returns same 400000 rows out of a million.
Will the optimiser algorithm, while working out the best execution plan for query 1, as part of its selection process assign a cost to a possible execution plan for query 2 (since both return the same result set), and then choose the execution plan for query 1 on the basis of least cost?
Would it be true vice versa as well ?
September 04, 2003 - 8:56 am UTC
query plans are generated for a query -- without looking at other query plans. they are generated independently.
i don't see how query 1 and query 2 could be even remotely related to each other in general. they are apples and toaster ovens -- nothing but nothing in common.
Its cleared a lot of cobwebs and concepts of COST
Mahesh, September 04, 2003 - 8:06 pm UTC
Thanks that explains a lot.
Rules which govern the Rule based
Anil Pant, September 05, 2003 - 9:08 am UTC
Can you list the rules which govern things when the rule based optimizer is set? Or is there any book or link where I can read about them?
Thanks
Anil Pant
September 05, 2003 - 4:15 pm UTC
easier to say when it WON'T be used.
the cbo is used when:
o any of the tables accessed in a query are analyzed and the optimizer mode is not set to rule
o you access an object that requires the CBO -- such as a partitioned table, parallel table, index organized table and so on.
o your optimizer goal is set to anything other than choose or rule
o you use a hint
so, if you have analyzed tables, access a "special" table, or force it via an optimizer mode or hint -- you are using the CBO.
Else, you are using the rbo.
Rule v/s Cost ........
Anil Pant, September 08, 2003 - 12:49 am UTC
You said cost is used when ...
(a) you access an object that requires the CBO -- such as a partitioned table, parallel table, index organized table and so on.
Only if I set the optimizer to Cost or Choose is cost used in the above case. If I set it to Rule, then rule is used, but it may still affect performance. Am I correct?
(b) you use a hint
so, if you have analyzed tables, access a "special" table, force it via a optimizer mode or hint -- you are using the CBO.
I could not completely understand this statement. Please explain.
September 08, 2003 - 11:05 am UTC
a) no, incorrect. you will NEVER use the rbo against a special object such as a partitioned table or an IOT -- it quite simply doesn't recognise those objects. We can see this simply:
ops$tkyte@ORA920LAP> create table t1 ( x int primary key ) organization index;
Table created.
ops$tkyte@ORA920LAP> create table t2 ( x int primary key );
Table created.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter session set optimizer_mode=rule;
Session altered.
ops$tkyte@ORA920LAP> set autotrace traceonly explain;
ops$tkyte@ORA920LAP> select * from t1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE (Cost=4 Card=2000 Bytes=26000)
1 0 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_40222' (UNIQUE) (Cost=4 Card=2000 Bytes=26000)
ops$tkyte@ORA920LAP> select * from t2;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 TABLE ACCESS (FULL) OF 'T2'
ops$tkyte@ORA920LAP> set autotrace off
see how the IOT has the cost/card/bytes -- that means "cbo was definitely used". the rbo doesn't understand "iot"
b) i was saying
so, if you are
o accessing tables that are analyzed
o access a special table (IOT, partitioned, parallel, etc)
o use the optimizer mode and say "first_rows" for example
o or use a hint
you are pretty much using the CBO. autotrace can be used to see which one you are using.
what then is a better metric in approximating a faster execution time
bobspa, September 12, 2003 - 1:39 am UTC
tom,
i have come across this thread searching for clues in optimizing query performance.
going back to the 2 query formats giving different costs, you made it so clear that lower optimizer costs do not correlate with faster execution times. well said...
BTW, event 10035 gives us a window as to how the optimizer concluded the best plan.
so, given different execution plans for a given query, is there a way by which we can pick a plan(for faster exec) based on your "heuristics"?
OR, do we just go back to the "test a subset of data then conclude" principle on a "live/production" system? (doing the tests on a development system and relying solely on elapsed times might not be very conclusive, as datafiles/indexfiles are laid out differently, thus impacting read seek times).
i was thinking that perhaps the plan with fewer LOGICAL I/Os (consistent gets + buffer gets from the stats) will generally run better, notwithstanding the access paths/join methods used.
what do you think?
thanks
September 12, 2003 - 10:00 am UTC
you cannot go solely by LIO's as you need to see how the query scales. It is easy to have cases where a query does exponentially MORE LIO's as the volume of data scales up. what you need to do is find out how the query scales and that is really really hard. you would need to say "well, when there are 100 rows, it does this, 1,000 rows it does this, 10,000 rows this, 100,000 rows this" and plot the scale. Then, you can sort of forecast what it will do.
it is my belief that test/dev should be a mirror of prod -- if you want smooth running prod's anyway. before inflicting anything on the end users, you would scale test things.
and the metric for scale up would be?
bobspa, September 26, 2003 - 12:19 am UTC
oops ... maybe i wasn't very clear on the "... test a subset on a live/production db ..."
i was focusing on this environment from a reactive tuning perspective -- to get an understanding as to why a query is "misbehaving".
so, going back to the previous notes from this thread, other LIO threads, tkprofs, etc., and basing on my limited knowledge of the optimizer: when searching for a better access path for a given SQL, what do we do (given the system params):
1. do an explain plan ==> not too good to rely on, as it is just predictive, heavily dependent on table/index stats no matter how outdated they may be; in the absence of stats it assigns some "heuristic" value per kke.c. right?
2. re-analyzing the tables and indices (plus histograms) on the fly is likewise not very attractive, as we don't know how other modules are manipulating our targeted objects, i.e. bitmap locks for a range of blocks during an update
3. trying to understand the CBO's behavior further, we set event 10035, re-run the sql, and tediously review the output. (BTW, would you happen to know the level settings for 10058?)
4. if not yet satisfied with the access path chosen by the CBO, pump in some hints, re-run the query, loop until OK.
*** i believe you are aware that sub-optimal plans generated by CBO are very common for queries involving 6 tables or more, mainly due (perhaps) to the default setting of _optimizer_search_limit to 5 -- which decimates search space pruning ***
all along the tkprof analyses, we are concentrating on cpu times and block visits, as these are tightly related (although i have seen some big hash joins with fewer LIOs, as compared to sort/merge or index-<whatever> scans, that eat up a lot of CPU time). never mind the elapsed times and session waits, as these are functions of concurrency...
you were saying query scaling... how is that? isn't this a function of the number of CPUs and parallel slaves? purportedly, the more the merrier, as long as it does not encourage swapping/thrashing/memory shortage, CPU shortage, or processes shooting up wildly. these are ENV issues which are always a given during tuning. reloading of blocks (thus increasing LIO further) into the cache through an index range scan is likewise unpredictable, as it is a function of data load -- this can't be simulated on a test/dev system, hence the PROD/live option during a lesser workload.
now, i'm confused. given a static ENV, if a query running very fast with 100K rows runs badly at 1000K rows -- doesn't scale up, as you say -- is that a design flaw, or is it that the CBO chose a different access plan believing it would get better throughput? a sub-optimal plan, in which case we add a hint ...
so, what is a better metric for evaluating whether a query's plan is optimal and will scale up?
as an aside, plan stability was introduced to tell the CBO not to infringe on our endeared plan, which implies that the CBO might opt for a red herring despite the presence of stats, histograms, bitmap indices or otherwise...
your comments please ...
September 26, 2003 - 1:05 am UTC
1) and not only that -- but can you tell from an EP whether it is "good" or not? I can't
3) do you really mean 10053?
4) no, you use hints to prove the optimizer wrong and then we figure out why. i don't like hints in general - in real code.
when I was talking about scale I was talking about testing the query against a subset of the data. that ONLY works if the same query against the full set of data does the same amount of work
eg: in order to get 5 rows of output from query X against 1,000 rows of data, we find it takes the same amount of work as against a 1,000,000,000 row table (the query does the same work regardless of the table size)
or does that query take many orders of magnitude more work to achieve its goal.
or it is somewhere in between.
when testing against a subset, if and only if all of your queries fall into that first category will your work be meaningful. else, it is not. that is what I was saying with regards to "scale". how does the query "scale" as the data volume "scales"
take the query "select * from emp where empno = :x". assuming empno is the primary key -- that query scales. it'll take about the same amount of work regardless of the number of rows in EMP.
take the query "select count(*) from emp where job = :x". that query does not scale in the same way. the more rows in emp, PROBABLY the more work that query must do in order to get the answer -- assuming the number of distinct values of job does not change dramatically.
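One way to see that difference in practice -- a sketch, not part of the original answer -- is to watch the logical I/O of the two example queries as EMP grows:

```sql
-- Sketch: compare "consistent gets" for a scaling and a non-scaling query.
set autotrace traceonly statistics

variable x number
exec :x := 7788
select * from emp where empno = :x;        -- note consistent gets (PK lookup)

variable j varchar2(20)
exec :j := 'CLERK'
select count(*) from emp where job = :j;   -- note consistent gets (full count)

-- now load 10x the rows into EMP, re-gather statistics, and repeat.
-- the primary-key lookup should do about the same number of gets,
-- while the count by JOB will typically do roughly 10x as many --
-- that growth curve is the "scale" Tom is describing.
set autotrace off
```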
Oracle Question
Mahesh, September 26, 2003 - 5:21 am UTC
Dear Sir,
I have a different question on Oracle. In which language is Oracle written?
For e.g.: we create a table and get the message
Table Created.
What is the actual programming behind it? I want to generate such code. Please send me the code so that I can write a small module of that. What exactly is behind it?
Bye
Thank's
Mahesh
September 26, 2003 - 6:23 am UTC
er?
Oracle is written in C.
sqlplus is written in C.
it sounds like you want to rewrite sqlplus. you would code that in C using OCI (the Oracle Call interface) to execute the sql.
Oracle
Mahesh, September 28, 2003 - 3:01 am UTC
Dear sir,
Thanks very much for replying. How do I use OCI? Send me some tips for that. Send me the code also.
Mahesh
Mahesh, September 28, 2003 - 11:39 pm UTC
Ok.
Is it the 9i doc or the 8i doc?
Which one should I read?
Mahesh
September 29, 2003 - 7:33 am UTC
you should read the one that is most relevant to you?
I pointed you to the 9i doc, no version from you -- so i point you to the most current stuff.
rejoinder
bobspa, September 30, 2003 - 5:52 pm UTC
rejoinder:
"1) and not only that -- but can you tell from an EP whether it is "good" or not?
I can't"
what?? if you can't, then nobody can -- this makes almost everybody plain pretenders ... sigh :<
seriously, with your stature, you won't need any QEP to tell if a SQL construct is "good" or not. From the other threads, you always give samples that use fewer LIOs and latches to persuade us that they run better ... per your comments from runstats.sql ... so, how does everything add up?
"3) do you really mean 10053?"
yup, not 10035 (erred twice) ... OR, maybe 10060 ...
but the outputs of these events are inconsequential if we are not satisfied with CBOs decision, so we go to item (4), as a last resort ...
going back to scale-up, help me tune my understanding...
aside from the "classic case" of predicates evaluating to primary/unique key accesses, what else scales? certainly not hash joins, range scans, or FFS. someone from years ago cried that "scale up" was Oracle's buzzword when PQO was introduced.
October 01, 2003 - 8:14 am UTC
stature or not, you have to benchmark -- I do. I cannot look at just any query and say "it's good". I use those tools to try out different ideas, using my knowledge of how things work. I can generally come up with a couple of different ways -- all of which need to be benchmarked and studied.
Oracle
Mahesh, October 03, 2003 - 5:29 am UTC
Dear sir,
Thanks very much. I will read whichever is most relevant to me.
I will try my level best.
Thanks
Mahesh
Count(*)
Mahesh, October 06, 2003 - 1:07 am UTC
Dear sir,
If I have 20 tables and each table contains 10 rows, how do I get the total count from those 20 tables?
For e.g.: in general, scott has 3 tables: emp, dept and salgrade
In emp : 11 rows
Dept : 5 rows
Salgrade: 4 rows
If I fire the query below then I get the output correctly. But this is not a correct method if we have 40 tables or 1000 tables.
1 select count(*) + a+b cnt from emp,
2 (select count(*) a from dept),
3 (select count(*) b from salgrade)
4* group by a,b
SQL> /
CNT
---------
20
How to do this ?
Thanks
Mahesh
October 06, 2003 - 8:00 am UTC
it is sort of what I would categorize as "a bad idea" here. I cannot imagine the relevance of "the sum of rows in N tables" -- unless you have a really bad design, and the correct answer for that would be "fix it"
you have to hard code each table reference
select (select count(*) from t1)+(select count(*) from t2)+.......
from dual;
is one way. you could also write a function using dynamic sql that accepts a table name as input.
ops$tkyte@ORA920> create or replace
2 function get_rows( p_tname in varchar2 ) return number
3 AUTHID CURRENT_USER
4 as
5 l_columnValue number default NULL;
6 begin
7 execute immediate
8 'select count(*)
9 from ' || p_tname INTO l_columnValue;
10
11 return l_columnValue;
12 end;
13 /
Function created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> select sum( get_rows(table_name) )
2 from user_tables
3 /
SUM(GET_ROWS(TABLE_NAME))
-------------------------
66
"Procedure for emp"
Mahesh, October 11, 2003 - 1:16 am UTC
Dear sir,
Please see the following procedure and give me how to solve it.
1 create or replace procedure q4
2 (v_ename out emp.ename%TYPE,
3 v_empno out emp.empno%TYPE,
4 v_sal out emp.sal%TYPE)
5 AS
6 begin
7 FOR i IN 6000..8000 loop
8 select ename, sal into v_ename, v_sal from emp
9 where empno = i;
10 dbms_output.put_line(v_ename);
11 end loop;
12 exception
13 when no_data_found then
14 dbms_output.put_line('Sorry');
15* end;
16 /
Procedure created.
SQL> execute q4;
BEGIN q4; END;
*
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00306: wrong number or types of arguments in call to 'Q4'
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored
I want all the employees whose
empno is between 6000 and 8000.
Thanks
Mahesh
October 11, 2003 - 10:24 am UTC
well, let's see -- you have a procedure q4
It expects 3 out parameters.
You have given it no OUT parameters. hence it fails.
I don't understand your logic at all -- there are so many things wrong with this procedure.
if you want all the emps between 6000 and 8000, you run a SINGLE query. the procedure would be as simple as:
procedure q4
is
begin
for x in ( select ename from emp where empno between 6000 and 8000 )
loop
dbms_output.put_line( x.ename );
end loop;
end;
/
Procedure
Mahesh, October 17, 2003 - 5:46 am UTC
Dear Sir,
Thanks very much for sending the reply. I did it; it is working
successfully.
Thanks
Girish
Differences between 7.0,8i,9i
Mahesh, October 28, 2003 - 11:00 pm UTC
Dear sir,
Please tell me the major 10 differences between Oracle 7.0, 8i and 9i.
I want the differences between all three. I read the book but was unable to find any differences between them.
Kind regards,
GIrsh
October 29, 2003 - 6:34 am UTC
hmmm, you read the 8i "getting to know" guide and the 9i "new features guide" and didn't find any differences????
go to otn.oracle.com
click on documentation
go to either the 8i docs and read the "getting to know" guide, or the 9i docs and read the "new features guide"
Oracle
Mahesh, November 04, 2003 - 1:21 am UTC
Hi,
I have a different question. The design of Oracle is totally based on engineering concepts and it has all these great features, so why can a user not recover a table once it is dropped? Once it is dropped it is pretty much gone; we can recover the table from a backup utility or some other method.
But why can we not recover the table by giving a simple command? Why did Oracle make such things that we can not pass DDL statement into DML? Is there any specific reason for dropping a table?
Thanks,
Mahesh
November 04, 2003 - 8:17 am UTC
Oracle 10g has an "undrop" command. It is implemented using a recycle bin concept. Meaning, as long as there is plenty of free space in the tablespace and we don't need to reuse the dropped table's extents -- you can undrop it. But if you drop a table and then immediately create a new table that consumes lots of space in that tablespace -- the odds go down that you can undrop the table.
I don't know what you mean by "we cannot pass ddl statement into dml"
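For reference, the 10g "undrop" Tom mentions is the FLASHBACK TABLE ... TO BEFORE DROP command, backed by the recycle bin. A sketch (the table name is made up):

```sql
drop table t;                      -- goes into the recycle bin, space permitting

select object_name, original_name  -- the dropped table is still visible here,
  from recyclebin;                 -- under a system-generated name

flashback table t to before drop;  -- the "undrop"
```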
VARRAY
Mahesh, November 04, 2003 - 11:45 pm UTC
Hello Sir,
Thanks for the reply on dropping table query.
NOw my queseion on VARRAY :
I am trying to write a function that takes as its input parameter a character string representing a date, formatted in at least seven different ways. Within the function, store all of those formats in a VARRAY. The function should return a DATE.
Please tell me how to use VARRAY concept in this.
Thanks
Mahesh
November 05, 2003 - 8:23 am UTC
sounds like a bad idea -- especially -- ESPECIALLY if the formats include
dd/mm/yyyy
mm/dd/yyyy
sounds like a really really bad idea. sounds like you are going to have garbage in your DATABASE!
ops$tkyte@ORA920PC> create or replace function my_to_date( p_str in varchar2 ) return date
2 as
3 type array is table of varchar2(30);
4
5 l_fmts array := array( 'dd-mon-yyyy', 'Mon DD YYYY' );
6 l_dt date;
7 begin
8 for i in 1 .. l_fmts.count
9 loop
10 begin
11 l_dt := to_date( p_str, l_fmts(i) );
12 return l_dt;
13 exception
14 when others then null;
15 end;
16 end loop;
17 raise_application_error( -20001, p_str || ' cannot be converted' );
18 end;
19 /
Function created.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> select my_to_date( 'fa' ) from dual;
select my_to_date( 'fa' ) from dual
*
ERROR at line 1:
ORA-20001: fa cannot be converted
ORA-06512: at "OPS$TKYTE.MY_TO_DATE", line 17
ops$tkyte@ORA920PC> select my_to_date( '01-jan-2003' ) from dual;
MY_TO_DAT
---------
01-JAN-03
ops$tkyte@ORA920PC> select my_to_date( 'Jan 01 2003' ) from dual;
MY_TO_DAT
---------
01-JAN-03
ops$tkyte@ORA920PC>
VARRAY
Mahesh, November 06, 2003 - 1:14 am UTC
Dear sir,
Thanks for sending the code. That is a nested table you used in the code; I wanted a VARRAY, but I did it and it is working fine. Now I want 10 different date formats (you have taken only 2), so the user can choose any one of those formats. How do I do it?
Thanks
Mahesh
November 06, 2003 - 7:45 am UTC
why do you want a varray?
a nested table is easier.
you can put any number of formats in that array; it was just "an example". i thought it was self evident that you would just list N formats in that array.
you can make it a varray easily - just change the type declaration. but -- to what end? it would only limit the number of elements in the array.
I never use varray's, never.
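For completeness, if you did want the VARRAY form the follow-up asks about, the only change to Tom's example is the type declaration -- the extra format strings below are just placeholders:

```sql
-- Sketch: the same format list as a VARRAY. The hard upper bound
-- (here 10) is the only thing it adds over the nested-table version.
set serveroutput on

declare
    type fmt_array is varray(10) of varchar2(30);

    l_fmts fmt_array := fmt_array( 'dd-mon-yyyy', 'Mon DD YYYY',
                                   'yyyy-mm-dd', 'dd.mm.yyyy' );
begin
    for i in 1 .. l_fmts.count
    loop
        dbms_output.put_line( l_fmts(i) );
    end loop;
end;
/
```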
Optimizer Cost
Ash, November 06, 2003 - 9:42 pm UTC
I have always thought that the Cost shown in the plan always depended upon the resources used, such as cpu time, number of disk reads, and so on. So if more of these resources are used, the cost could go up, while the runtime may come down or go up irrespective of cost.
Is this right ?
Thanks for all your support.
November 07, 2003 - 8:29 am UTC
ESTIMATED -- guestimated.
if it were perfect, this would be true. It cannot be perfect (only hindsight is 20/20)
Drop table
Mahesh, November 10, 2003 - 5:50 am UTC
Dear sir,
Oracle 10g has an "undrop" command.
What is Oracle 10g? Where is it used? Is anybody using Oracle 10g yet?
Thanks
Mahesh
November 10, 2003 - 8:00 am UTC
Oracle 10g is the next release of Oracle after Oracle 9iR2. It's the next release.
Mahesh NOOOOOOO
Matt, November 10, 2003 - 10:37 am UTC
Why don't you do some searches before adding simple questions to a completely unrelated thread?
Everything you could need can be found at:
technet.oracle.com
What do these figures mean???
A reader, November 10, 2003 - 5:58 pm UTC
Tom,
I have 2 queries with the same execution plans and cost but different rows/bytes, based on whether I put a hint in or not.
Query - 1 (with hint)
====================
SELECT /*+ORDERED USE_NL(pd,pcd) */ p.position_id, pd.position_detail_id, pd.Alt_Market_Value,
pd.Alt_Market_Value_Income, pd.Alt_Book_Value, pcd.Alt_GAAP_Cost_Value,
pd.Base_FX_Unrealized_GL, pd.Alt_Base_Unrealized_GL, pd.Alt_Base_FX_Unrealized_GL,
pd.Alt_Accrued_Income, pd.Local_to_sec_basis_fx_rate, pd.Income_Receivable_Local,
pd.Income_Receivable_Base, pd.Alt_Income_Receivable
FROM holdingdbo.position p,
holdingdbo.position_detail pd,
holdingdbo.position_cost_detail pcd
WHERE p.position_id = pd.position_id
AND pd.position_detail_id = pcd.position_detail_id
AND p.update_date >= '09-nov-03'
AND p.src_intfc_inst = 17
(plan columns below: Rows, Bytes, Cost)
SELECT STATEMENT Hint=CHOOSE 1 K 1767
NESTED LOOPS 1 K 88 K 1767
NESTED LOOPS 1 K 76 K 703
TABLE ACCESS BY INDEX ROWID POSITION 16 304 47
INDEX RANGE SCAN I_POSITION_UPDATE_DATE 187 4
TABLE ACCESS BY INDEX ROWID POSITION_DETAIL 89 3 K 41
INDEX RANGE SCAN UK_POSITION_DETAIL 89 5
TABLE ACCESS BY INDEX ROWID POSITION_COST_DETAIL 1 9 3
INDEX UNIQUE SCAN SYS_C002482 1 2
Query 2 (Without hint)
======================
SELECT p.position_id, pd.position_detail_id, pd.Alt_Market_Value,
pd.Alt_Market_Value_Income, pd.Alt_Book_Value, pcd.Alt_GAAP_Cost_Value,
pd.Base_FX_Unrealized_GL, pd.Alt_Base_Unrealized_GL, pd.Alt_Base_FX_Unrealized_GL,
pd.Alt_Accrued_Income, pd.Local_to_sec_basis_fx_rate, pd.Income_Receivable_Local,
pd.Income_Receivable_Base, pd.Alt_Income_Receivable
FROM holdingdbo.position p,
holdingdbo.position_detail pd,
holdingdbo.position_cost_detail pcd
WHERE p.position_id = pd.position_id
AND pd.position_detail_id = pcd.position_detail_id
AND p.update_date >= '09-nov-03'
AND p.src_intfc_inst = 17
(plan columns below: Rows, Bytes, Cost)
SELECT STATEMENT Hint=CHOOSE 1 K 1767
NESTED LOOPS 1 K 88 K 1767
NESTED LOOPS 1 K 76 K 703
TABLE ACCESS BY INDEX ROWID POSITION 16 304 47
INDEX RANGE SCAN I_POSITION_UPDATE_DATE 187 4
TABLE ACCESS BY INDEX ROWID POSITION_DETAIL 89 3 K 41
INDEX RANGE SCAN UK_POSITION_DETAIL 89 5
TABLE ACCESS BY INDEX ROWID POSITION_COST_DETAIL 85 M 733 M 3
INDEX UNIQUE SCAN SYS_C002482 1 2
1. Can you please explain why the figures are different even though the cost and execution plans are the same.
2. What do these figures mean and what to look for in both of the above EXPLAIN PLAN. Both of the above queries select around 56583 records.
3. What does it mean by 85M in Rows and 733M in Bytes. Why is it 85M and 733M in second query 1 and 9 in first query and what difference does the hint make to the query even though they have the same execution plans.
4. The Rows says 1K, but the total number of rows retreived by the query is 58K.
5. Any ideas on how to read this output and the output produced by EXPLAIN PLAN section in TKPROF output such as the Card etc. I am struggling to interpret the EXPLAIN PLAN output. Pls let me know what to look for in these plans and where to focus our efforts in the plan
6. As soon as I showed the query without the hint, he immediately spotted the problem and asked me to put an hint (which is the first query). What would have prompted him to put the hint and if I were him how to diagonise such problems
Your help in this regard is highly appreciated.
November 10, 2003 - 7:13 pm UTC
the end result numbers are not any different?
but basically it is saying "you know, if you let me do it my way, i'll apply the predicates like this, resulting in N rows from this table and M rows from over here.
If you make me do it your way, i'll have to go to this table and get L rows and then apply this other predicate to get K rows."
the end result is 1k rows -- in both cases
How hints made the difference
A reader, November 11, 2003 - 7:38 pm UTC
Tom,
My question is: even though they show the same execution plans in both cases, why did putting the hint make a huge difference in the number of bytes, particularly for the last table, position_cost_detail, which shows 733M bytes without hints and only 9 with hints?
1. I would like to know how to interpret these figures and what they mean. I particularly want to know about Cost, Card, Rows and Bytes.
2. How does putting a hint drastically change these figures?
3. How can I get a clue that I need to use a hint, or is it going to be a trial-and-error approach?
November 12, 2003 - 7:09 am UTC
it is pure "math" here.
you said "hey, apply these predicates and join these tables" -- without hints the optimizer said,
you know, I'll goto table t1 and get 50 rows given the predicate on t1, then we'll join that to t2 and get on average 2 rows for each row in t1 given the predicate and join condition on t2, resulting in 100 rows
you said "no no, don't drive from t1, drive from t2"
the optimizer says then
ok, we'll goto t2 and get 1,000 rows based on the predicate. we'll join that to t1 and get an average of 0.1 rows resulting in 100 rows
you made it do a different amount of work to START with -- but then the answer of course would be the same.
here is a simple example. t1 indexed by owner, t2 indexed by object_type.
if you say "where owner = scott", that'll get very few rows.
if you say "where object_type = table", that'll get a ton.
so you join t1 to t2 and have a predicate:
where t1.owner = scott and t2.object_type = table
if you do t1 FIRST, you get a few rows, then join to t2.
if you do t2 FIRST, you get a TON of rows, then join to t1.
you should drive from t1 into t2, your ordered use_nl "hint" changed that and said drive from t2 to t1. you changed the cardinalities by changing the order of application of predicates.
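The effect of driving order on work done can be simulated outside the database. The sketch below (Python, with made-up row counts — a toy model, not Oracle's implementation) counts how many driving-table rows a nested-loops join must inspect in each direction for the t1/t2 example: the answer is identical, the work is not.

```python
# Hypothetical row counts echoing the t1/t2 example:
# few rows match owner = 'SCOTT', many match object_type = 'TABLE'.
t1 = [{"owner": "SCOTT", "object_id": i} for i in range(10)] + \
     [{"owner": "SYS", "object_id": i} for i in range(10, 1000)]
t2 = [{"object_type": "TABLE", "object_id": i} for i in range(500)] + \
     [{"object_type": "VIEW", "object_id": i} for i in range(500, 1000)]

def rows_inspected(driving, probe, drive_pred, probe_pred, key="object_id"):
    """Nested-loops join: count driving rows inspected and final matches
    when driving from `driving` into `probe` via an index on `key`."""
    index = {}  # simulate an index on the join key of the probed table
    for r in probe:
        index.setdefault(r[key], []).append(r)
    inspected, matches = 0, 0
    for r in filter(drive_pred, driving):  # rows surviving the driving predicate
        inspected += 1
        for m in index.get(r[key], []):    # index probe into the other table
            if probe_pred(m):
                matches += 1
    return inspected, matches

scott = lambda r: r["owner"] == "SCOTT"
table = lambda r: r["object_type"] == "TABLE"

print(rows_inspected(t1, t2, scott, table))  # drive from t1: 10 rows inspected
print(rows_inspected(t2, t1, table, scott))  # drive from t2: 500 rows inspected
```

Driving from the selective predicate touches 10 rows; driving from the unselective one touches 500 — yet both produce the same 10-row result, which is exactly the cardinality shift the ORDERED/USE_NL hint forces.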
you should (IMHO) use hints only when
a) you've identified a problem
b) you want to see if there is a "better" way
c) you use hints for testing, to see if there is a better way
d) you then figure out what statistics you are missing based on that in order to get the hints the heck out of there. i don't use hints in real code (with the exception of "safe" hints like FIRST_ROWS -- hints that give the optimizer INFORMATION but don't tell it exactly HOW to process the query)
(if you have my new book, effective oracle by design, I go over this in more detail)
DOC>create table t1 as select * from all_objects;
DOC>alter table t1 add constraint t1_pk primary key(object_id);
DOC>create index t1_idx on t1(owner);
DOC>analyze table t1 compute statistics
DOC>for table for all indexes for all indexed columns;
DOC>
DOC>create table t2 as select * from all_objects;
DOC>alter table t2 add constraint t2_pk primary key(object_id);
DOC>create index t2_idx on t2(object_type);
DOC>analyze table t2 compute statistics
DOC>for table for all indexes for all indexed columns;
DOC>*/
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> set autotrace traceonly explain
ops$tkyte@ORA920LAP> select *
2 from t2, t1
3 where t1.owner = 'SCOTT'
4 and t2.object_type = 'TABLE'
5 and t1.object_id = t2.object_id;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=10 Bytes=1920)
1 0 NESTED LOOPS (Cost=12 Card=10 Bytes=1920)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=10 Bytes=960)
3 2 INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=10)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=1 Card=1 Bytes=96)
5 4 INDEX (UNIQUE SCAN) OF 'T2_PK' (UNIQUE)
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> select /*+ ORDERED USE_NL( t2, t1 ) */ *
2 from t2, t1
3 where t1.owner = 'SCOTT'
4 and t2.object_type = 'TABLE'
5 and t1.object_id = t2.object_id;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=253 Card=10 Bytes=1920)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=253 Card=1 Bytes=96)
2 1 NESTED LOOPS (Cost=253 Card=10 Bytes=1920)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=38 Card=1048 Bytes=100608)
4 3 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=3 Card=1048)
5 2 BITMAP CONVERSION (TO ROWIDS)
6 5 BITMAP AND
7 6 BITMAP CONVERSION (FROM ROWIDS)
8 7 INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE)
9 6 BITMAP CONVERSION (FROM ROWIDS)
10 9 INDEX (RANGE SCAN) OF 'T1_PK' (UNIQUE)
ops$tkyte@ORA920LAP> set autotrace off
Execution plans did not change
A reader, November 12, 2003 - 6:36 pm UTC
Tom,
Your explanation on this is very clear. But what I am confused about is that both of the plans in my example show the same execution path. The only difference is in the bytes/rows accessed. My understanding and your example show that the bytes/rows changed as a result of a change in execution plan, whereas in my case the execution plan did not change, but only the bytes/rows changed (decreased) as a result of the hint.
Also, you mention that you investigate why the optimizer chose the plan and, once that is figured out, remove the hint. How do I approach this investigation, what steps need to be followed, and what things need to be checked?
November 13, 2003 - 6:39 am UTC
can you post a nicely (not wrapped) formatted AUTOTRACE TRACEONLY EXPLAIN.
I'm not really sure what numbers I'm looking at here.
Why cost doesn't appear
OracleO, January 05, 2004 - 12:32 am UTC
Hi Tom,
I ran the following autotrace, but don't get to see the 'cost' variable.
Here is the copy and paste of the sqlplus session. Any ideas why cost is not appearing? Oracle version 8.1.7.0.1.
------------------------------------------------
oracleo@oracle:indiadb>set autotrace traceonly
oracleo@oracle:indiadb>/
304 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'
3 2 INDEX (RANGE SCAN) OF 'I_OBJ2' (UNIQUE)
4 1 TABLE ACCESS (CLUSTER) OF 'TAB$'
5 4 INDEX (UNIQUE SCAN) OF 'I_OBJ#' (NON-UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
840 consistent gets
0 physical reads
0 redo size
16470 bytes sent via SQL*Net to client
2641 bytes received via SQL*Net from client
22 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
304 rows processed
------------------------------------------------------
Thanks,
OracleO
January 05, 2004 - 7:15 am UTC
when using the RBO (rule based optimizer), there are no costs.
you are using the RBO -- I use autotrace like that just to see which optimizer was in fact used. no COST/CARD/BYTES -> RBO
Your Book
A reader, January 05, 2004 - 6:02 pm UTC
Tom,
In your book do you cover the following as you mentioned above?
<quote>
a) you've identified a problem
b) you want to see if there is a "better" way
c) you use hints for testing, to see if there is a better way
d) you then figure out what statistics you are missing based on that in order to get the hints the heck out of there. i don't use hints in real code (with the exception of "safe" hints like FIRST_ROWS -- hints that give the optimizer INFORMATION but don't tell it exactly HOW to process the query)
</quote>
How do you identify step d, i.e. determine the missing statistics, based on putting in and removing the hint? Please explain. Is that detailed in your new book?
January 06, 2004 - 8:01 am UTC
the chapter on the tools I use (autotrace, explain plan and the 10053 trace event) as well as the chapter on the CBO talk about this.
Not using RBO !
OracleO, January 06, 2004 - 12:15 am UTC
No Tom. I am not using the RBO, as the output shows Optimizer=CHOOSE !!
Am I right?
Thanks for your time.
OracleO
January 06, 2004 - 8:28 am UTC
you are wrong. choose means "hey Oracle -- choose which optimizer you want to use, the RBO or CBO"
the database chose to use the RBO in your case. It'll do that when
o there are no stats on any referenced object
o you are not accessing any objects that require the cbo (partitioned tables for example, iots for example)
o there are no valid hints (other than the RULE hint of course ;)
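Those three conditions can be restated as a tiny decision function — a simplification of CHOOSE for illustration only, ignoring version-specific corner cases:

```python
# Toy restatement of the three bullets above, assuming OPTIMIZER_MODE = CHOOSE.
def optimizer_chosen(any_stats, needs_cbo_feature, has_non_rule_hint):
    """Return which optimizer CHOOSE falls back to (simplified)."""
    if any_stats or needs_cbo_feature or has_non_rule_hint:
        return "CBO"
    return "RBO"

print(optimizer_chosen(False, False, False))  # RBO: no stats, plain objects, no hints
print(optimizer_chosen(True,  False, False))  # CBO: stats exist on a referenced object
```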
TKPROF output
A reader, January 07, 2004 - 10:44 am UTC
Tom,
1. In a TKPROF output, I see things like cr, r, w, us, time, etc. under Row Source Operation. Can you explain what those figures are, and can we get any clues by looking at them? Also, I do not see these things discussed in "Expert One-on-One Oracle". Are they covered in "Effective Oracle By Design"? If not, where can I get info on those?
Rows Row Source Operation
------- ---------------------------------------------------
937495 UPDATE (cr=1918058 r=2957 w=0 time=203219226 us)
937495 INDEX UNIQUE SCAN SYS_C002495 (cr=1874992 r=1973 w=0 time=20988911 us)(object id 227224)
2. Also, on the 9i Trace output, I see sections on various Wait Events. Does that mean we no longer need to use the 10046 (extended trace) event in 9i
3. I have the OPTIMIZER_GOAL as FIRST_ROWS on the UPDATE statement and I use a BULK UPDATE using the FORALL syntax. My DBA says that since it is using FIRST_ROWS, it is trying to do a row-by-row process and asked me to change the optimizer mode to CHOOSE or ALL_ROWS.
Is his claim true? Also, what is the use of OPTIMIZER_GOAL in a DML? My understanding is that it is valid only for a query.
Will parallel DML be helpful if the table is not partitioned? Also, how do we use parallel DML with BULK operations?
Right now, I use the following construct:
OPEN cursor
LOOP
BULK fetch
some update statements
END LOOP
CLOSE cursor
My DBA says we could use SELECT FOR UPDATE and WHERE CURRENT OF in the UPDATE to improve performance. How do we do that in a FORALL construct?
4. I have difficulty in interpreting as to why a certain DML took a long time to execute. Any help on that is greatly appreciated.
I have a situation where I update 937495 rows in each of 3 tables, and it takes 14 minutes for the entire process to complete (for all 3 tables together, in spite of using bulk operations); I update the tables one after another. Is there any way in Oracle to do this for all 3 tables at the same time?
Thanks
January 07, 2004 - 6:20 pm UTC
1) they are new with 9ir2.
from Effective Oracle By Design:
...
Now, we see even more detail. Not only do we get the plan and the number of rows flowing through each step of the plan, but we also see logical I/O, physical I/O, and timing information. (This 9.2.0.2 example was performed on a different server, so these statistics are slightly different than the ones shown earlier.) These are represented as follows:
· cr is consistent mode reads, showing consistent gets (logical I/O)
· r is physical reads
· w is physical writes
· time is elapsed time in millionths of a second; us stands for microsecond
.......
2) you NEED the 10046 to get the wait events recorded in the trace file.
what it means is you no longer need to read the trace file to find the waits, tkprof is showing them to you.
3) and tom says "lose the procedural code", just UPDATE. done.
instead of:
loop
fetch
update
end
just:
update
first_rows just affects plans -- it does not make it "row by row". if you are happy with the plan, it is just dandy (but first_rows in an update, which does not return until the last row is processed "doesn't make sense")
4) crystal ball is in the shop -- sorry, with that level of detail -- no way I can answer (other than to reiterate that perhaps you need 3 updates with no procedural code!)
Optimizer operation
Wayne, March 01, 2004 - 9:08 am UTC
Tom
How have you been?
This article is helping me, but I am trying to figure out (1) what affects the Optimizer and its choice of operation? We have the same code & tables on a test and on a PROD box and the PROD is running dog slow (and choosing a full table scan over the index search).
(2) How does the clustering factor actually effect the CBO?
Or, what are the apples and oranges that I am actually comparing?
The PROD box is returning a query in 24 seconds and in TEST it is about 3.
Like you always say, "show me!" ... clustering factor is within 2% of each other on PROD and TEST
Clustering/rows is 70.xx% on PROD and it is 69.xx% on TEST for one of the tables.
I have looked at all of the arcane and seldom used parameters that I can think of.
Test was imported out of production last week. Test tablespace is on a LMT. I moved the tables, rebuilt the indexes and analyzed tables (and checked the date/time group in dba_tables & indexes.)
Thanks
Wayne
March 01, 2004 - 9:15 am UTC
lets see
a) the autotrace traceonly explain from both
b) the tkprof from both (using the same inputs)
optimizer Hierarchy
A reader, March 01, 2004 - 10:31 am UTC
Hi tom,
I know if I am retrieving all the rows from the table, I need a full table scan. But if I am searching for a value based on a key, or doing some select operation where I am retrieving a small amount of data (using cost based analysis), what should be the "general hierarchy" of the plan,
like nested loops vs. hash join vs. sort merge ... ??
how do I compare those?
March 01, 2004 - 10:41 am UTC
well, it depends really.
select * from emp, dept where emp.deptno = dept.deptno;
Now, i'm accessing EMP by deptno, "searching based on a key or some select operation"
but, this would look like a HASH join to me.. however
select * from emp, dept where emp.deptno = dept.deptno and dname like 'A%';
might be a nested loops.
I would categorize/GENERALIZE it this way (generalizations never hold up 100% remember):
if i want first rows, nested loops
if i want all rows, hash
never want sort merge, that happens when you disable hash joins ;)
Thanks !!
A reader, March 01, 2004 - 10:46 am UTC
thanks tom.
what does the nested loop really do ? is it
for (int i = 0; i < count; i++)
for (int j = 0; j < i; j++)
{
some operation;
}
is this ? or something else ?
March 01, 2004 - 10:55 am UTC
that's sort of it, yes. if you are interested, i put the pseudo code for the access paths in my new book "Effective Oracle by Design"
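For the curious, here is a rough Python sketch of the two join methods discussed above — a toy model of the algorithms, not Oracle's actual code:

```python
def nested_loops_join(outer, inner, key):
    # for each row in the outer (driving) table, scan the inner table;
    # in a real plan the inner "scan" is usually an index probe
    out = []
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                out.append((o, i))
    return out

def hash_join(build, probe, key):
    # hash the smaller (build) input once, then stream the probe input past it
    buckets = {}
    for b in build:
        buckets.setdefault(b[key], []).append(b)
    out = []
    for p in probe:
        for b in buckets.get(p[key], []):
            out.append((b, p))
    return out

# tiny emp/dept-style tables (made up for illustration)
emp  = [{"deptno": d % 3, "ename": f"e{d}"} for d in range(6)]
dept = [{"deptno": d, "dname": f"d{d}"} for d in range(3)]
print(len(nested_loops_join(emp, dept, "deptno")))  # 6
print(len(hash_join(dept, emp, "deptno")))          # 6
```

The nested-loops version does work proportional to outer times inner (unless the inner side is an index lookup), so it shines when the driving set is small and you want the first rows fast; the hash version pays a one-time build cost and then probes in roughly constant time per row, which is why it tends to win for all-rows queries.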
autotrace
Wayne, March 01, 2004 - 11:29 am UTC
TEST ENVIRONMENT
SQL> set autotrace traceonly explain;
SQL> SELECT sum(total_trnd), sum(total_reqd), sum(avail_trnd), sum(avail_reqd) FROM trn_event_pers_summary s, v_pers_unit v WHERE cmdstr_cs_id = 95986 AND s.pers_id = v.pers_pers_id AND element_id = 1683150;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2363 Card=1 Bytes=102)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (OUTER) (Cost=2363 Card=1 Bytes=102)
3 2 NESTED LOOPS (OUTER) (Cost=2363 Card=1 Bytes=97)
4 3 NESTED LOOPS (OUTER) (Cost=2362 Card=1 Bytes=86)
5 4 NESTED LOOPS (Cost=2361 Card=1 Bytes=76)
6 5 NESTED LOOPS (Cost=2327 Card=17 Bytes=1003)
7 6 NESTED LOOPS (Cost=2327 Card=17 Bytes=935)
8 7 NESTED LOOPS (Cost=2327 Card=17 Bytes=850)
9 8 NESTED LOOPS (Cost=2327 Card=17 Bytes=816)
10 9 NESTED LOOPS (Cost=603 Card=862 Bytes=18102)
11 10 NESTED LOOPS (Cost=3 Card=20 Bytes=240)
12 11 TABLE ACCESS (BY INDEX ROWID) OF 'COMMAND_STRUCT' (Cost=2 Card=1 Bytes=10)
13 12 INDEX (UNIQUE SCAN) OF 'CS_PK' (UNIQUE) (Cost=1 Card=1)
14 11 INDEX (FULL SCAN) OF 'VRANK_RANK_NUMERIC_UK' (UNIQUE) (Cost=1 Card=20 Bytes=40)
15 10 TABLE ACCESS (BY INDEX ROWID) OF 'MANPOWER' (Cost=30 Card=203329 Bytes=1829961)
16 15 INDEX (RANGE SCAN) OF 'MP_PASCODE_I' (NON-UNIQUE) (Cost=1 Card=203329)
17 9 TABLE ACCESS (BY INDEX ROWID) OF 'PERSONNEL_FEED' (Cost=2 Card=145255 Bytes=3921885)
18 17 INDEX (RANGE SCAN) OF 'PFEED_MP_FK_I' (NON-UNIQUE) (Cost=1 Card=145255)
19 8 INDEX (UNIQUE SCAN) OF 'VVL_FLYER_PK' (UNIQUE)
20 7 INDEX (UNIQUE SCAN) OF 'VAFSC_PK' (UNIQUE)
21 6 INDEX (UNIQUE SCAN) OF 'PERS_PK' (UNIQUE)
22 5 TABLE ACCESS (BY INDEX ROWID) OF 'TRN_EVENT_PERS_SUMMARY' (Cost=2 Card=498 Bytes=8466)
23 22 INDEX (UNIQUE SCAN) OF 'TRNEVPERS_PK' (UNIQUE) (Cost=1 Card=498)
24 4 TABLE ACCESS (BY INDEX ROWID) OF 'UTC_GROUP_MEMBERS' (Cost=1 Card=189060 Bytes=1890600)
25 24 INDEX (UNIQUE SCAN) OF 'UGM_PFEED_FK_I' (UNIQUE)
26 3 TABLE ACCESS (BY INDEX ROWID) OF 'UTC_GROUP' (Cost=1 Card=6875 Bytes=75625)
27 26 INDEX (UNIQUE SCAN) OF 'UG_PK' (UNIQUE)
28 2 INDEX (UNIQUE SCAN) OF 'VUTC_PK' (UNIQUE)
PROD ENVIRONMENT
SQL> set autotrace traceonly explain;
SQL> SELECT sum(total_trnd), sum(total_reqd), sum(avail_trnd), sum(avail_reqd) FROM trn_event_pers_summary s, v_pers_unit v WHERE cmdstr_cs_id = 95986 AND s.pers_id = v.pers_pers_id AND element_id = 1683150;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1596 Card=1 Bytes=102)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (OUTER) (Cost=1596 Card=1 Bytes=102)
3 2 NESTED LOOPS (OUTER) (Cost=1596 Card=1 Bytes=97)
4 3 NESTED LOOPS (OUTER) (Cost=1595 Card=1 Bytes=86)
5 4 NESTED LOOPS (Cost=1594 Card=1 Bytes=76)
6 5 NESTED LOOPS (Cost=1560 Card=17 Bytes=1003)
7 6 NESTED LOOPS (Cost=1560 Card=17 Bytes=935)
8 7 HASH JOIN (Cost=1560 Card=17 Bytes=850)
9 8 NESTED LOOPS (Cost=611 Card=80106 Bytes=3284346)
10 9 HASH JOIN (Cost=611 Card=80106 Bytes=3124134)
11 10 NESTED LOOPS (Cost=3 Card=20 Bytes=240)
12 11 TABLE ACCESS (BY INDEX ROWID) OF 'COMMAND_STRUCT' (Cost=2 Card=1 Bytes=10)
13 12 INDEX (UNIQUE SCAN) OF 'CS_PK' (UNIQUE) (Cost=1 Card=1)
14 11 INDEX (FULL SCAN) OF 'VRANK_RANK_NUMERIC_UK' (UNIQUE) (Cost=1 Card=20 Bytes=40)
15 10 TABLE ACCESS (FULL) OF 'PERSONNEL_FEED' (Cost=603 Card=145255 Bytes=3921885)
16 9 INDEX (UNIQUE SCAN) OF 'VVL_FLYER_PK' (UNIQUE)
17 8 TABLE ACCESS (FULL) OF 'MANPOWER' (Cost=343 Card=203329 Bytes=1829961)
18 7 INDEX (UNIQUE SCAN) OF 'VAFSC_PK' (UNIQUE)
19 6 INDEX (UNIQUE SCAN) OF 'PERS_PK' (UNIQUE)
20 5 TABLE ACCESS (BY INDEX ROWID) OF 'TRN_EVENT_PERS_SUMMARY' (Cost=2 Card=492 Bytes=8364)
21 20 INDEX (UNIQUE SCAN) OF 'TRNEVPERS_PK' (UNIQUE) (Cost=1 Card=492)
22 4 TABLE ACCESS (BY INDEX ROWID) OF 'UTC_GROUP_MEMBERS' (Cost=1 Card=189118 Bytes=1891180)
23 22 INDEX (UNIQUE SCAN) OF 'UGM_PFEED_FK_I' (UNIQUE)
24 3 TABLE ACCESS (BY INDEX ROWID) OF 'UTC_GROUP' (Cost=1 Card=6905 Bytes=75955)
25 24 INDEX (UNIQUE SCAN) OF 'UG_PK' (UNIQUE)
26 2 INDEX (UNIQUE SCAN) OF 'VUTC_PK' (UNIQUE)
March 01, 2004 - 11:45 am UTC
ok, can you do a 10053 level trace on each and just post the differences in the init.ora parameters? eg:
alter session set events '10053 trace name context forever, level 1'
then autotrace traceonly explain in both, take the trace files and looking at the "PARAMETERS USED BY THE OPTIMIZER" section, diff them -- see if anything is "different"
followup
Wayne, March 01, 2004 - 11:56 am UTC
Thanks Tom!
The output will be out to you soon.
However, I was just told that the Resource Plan could be a problem depending upon the Operating System.
Could that be true?
Wayne
March 01, 2004 - 12:10 pm UTC
you mean with the dbms_resource_manager stuff? I'm not aware of that affecting the plan, it uses the developed plan, but I don't think it influences it.
DBMS_resource and optimizer
A reader, March 01, 2004 - 2:02 pm UTC
Tom,
I will send the full tkprofs to your email.
Here are the timing results - I think it may be that the underlying user is included in the batch resource plan. I cannot think of another good reason for the bigger box to spend that much extra time on the query.
PROD
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.85 0.84 0 23 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 6.41 31.73 6052 12326 24 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 7.26 32.57 6052 12349 24 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 20853
TEST
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.57 1.70 50 1772 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.34 2.03 485 18517 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.91 3.73 535 20289 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 26694
March 01, 2004 - 3:10 pm UTC
nah, that won't be it -- i'd be interested in the parameters here.
Parameters affecting optimizer
Wayne, March 01, 2004 - 5:00 pm UTC
$> cat server_params.dif
$> diff training_params production_params > server_params.dif
7c7
< HASH_AREA_SIZE = 393216
---
> HASH_AREA_SIZE = 1048576
53,54c53,54
< DB_FILE_MULTIBLOCK_READ_COUNT = 8
< SORT_AREA_SIZE = 393216
---
> DB_FILE_MULTIBLOCK_READ_COUNT = 16
> SORT_AREA_SIZE = 1048576
-- ********************** TRAINING SERVER
training> cat training_params
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
OPTIMIZER_FEATURES_ENABLE = 8.1.7
OPTIMIZER_MODE/GOAL = Choose
OPTIMIZER_PERCENT_PARALLEL = 0
HASH_AREA_SIZE = 393216
HASH_JOIN_ENABLED = TRUE
HASH_MULTIBLOCK_IO_COUNT = 0
OPTIMIZER_SEARCH_LIMIT = 5
PARTITION_VIEW_ENABLED = FALSE
_ALWAYS_STAR_TRANSFORMATION = FALSE
_B_TREE_BITMAP_PLANS = FALSE
STAR_TRANSFORMATION_ENABLED = FALSE
_COMPLEX_VIEW_MERGING = FALSE
_PUSH_JOIN_PREDICATE = FALSE
PARALLEL_BROADCAST_ENABLED = FALSE
OPTIMIZER_MAX_PERMUTATIONS = 80000
OPTIMIZER_INDEX_CACHING = 0
OPTIMIZER_INDEX_COST_ADJ = 100
QUERY_REWRITE_ENABLED = FALSE
QUERY_REWRITE_INTEGRITY = ENFORCED
_INDEX_JOIN_ENABLED = FALSE
_SORT_ELIMINATION_COST_RATIO = 0
_OR_EXPAND_NVL_PREDICATE = TRUE
_NEW_INITIAL_JOIN_ORDERS = FALSE
_OPTIMIZER_MODE_FORCE = TRUE
_OPTIMIZER_UNDO_CHANGES = FALSE
_UNNEST_SUBQUERY = FALSE
_PUSH_JOIN_UNION_VIEW = FALSE
_FAST_FULL_SCAN_ENABLED = TRUE
_OPTIM_ENHANCE_NNULL_DETECTION = TRUE
_ORDERED_NESTED_LOOP = FALSE
_NESTED_LOOP_FUDGE = 100
_NO_OR_EXPANSION = FALSE
_QUERY_COST_REWRITE = TRUE
QUERY_REWRITE_EXPRESSION = TRUE
_IMPROVED_ROW_LENGTH_ENABLED = TRUE
_USE_NOSEGMENT_INDEXES = FALSE
_ENABLE_TYPE_DEP_SELECTIVITY = TRUE
_IMPROVED_OUTERJOIN_CARD = TRUE
_OPTIMIZER_ADJUST_FOR_NULLS = TRUE
_OPTIMIZER_CHOOSE_PERMUTATION = 0
_USE_COLUMN_STATS_FOR_FUNCTION = TRUE
_SUBQUERY_PRUNING_ENABLED = TRUE
_SUBQUERY_PRUNING_REDUCTION_FACTOR = 50
_SUBQUERY_PRUNING_COST_FACTOR = 20
_LIKE_WITH_BIND_AS_EQUALITY = FALSE
_TABLE_SCAN_COST_PLUS_ONE = FALSE
_SORTMERGE_INEQUALITY_JOIN_OFF = FALSE
_DEFAULT_NON_EQUALITY_SEL_CHECK = TRUE
_ONESIDE_COLSTAT_FOR_EQUIJOINS = TRUE
DB_FILE_MULTIBLOCK_READ_COUNT = 8
SORT_AREA_SIZE = 393216
***************************************
BASE STATISTICAL INFORMATION
-- PRODUCTION SERVER
production> cat production_params
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
OPTIMIZER_FEATURES_ENABLE = 8.1.7
OPTIMIZER_MODE/GOAL = Choose
OPTIMIZER_PERCENT_PARALLEL = 0
HASH_AREA_SIZE = 1048576
HASH_JOIN_ENABLED = TRUE
HASH_MULTIBLOCK_IO_COUNT = 0
OPTIMIZER_SEARCH_LIMIT = 5
PARTITION_VIEW_ENABLED = FALSE
_ALWAYS_STAR_TRANSFORMATION = FALSE
_B_TREE_BITMAP_PLANS = FALSE
STAR_TRANSFORMATION_ENABLED = FALSE
_COMPLEX_VIEW_MERGING = FALSE
_PUSH_JOIN_PREDICATE = FALSE
PARALLEL_BROADCAST_ENABLED = FALSE
OPTIMIZER_MAX_PERMUTATIONS = 80000
OPTIMIZER_INDEX_CACHING = 0
OPTIMIZER_INDEX_COST_ADJ = 100
QUERY_REWRITE_ENABLED = FALSE
QUERY_REWRITE_INTEGRITY = ENFORCED
_INDEX_JOIN_ENABLED = FALSE
_SORT_ELIMINATION_COST_RATIO = 0
_OR_EXPAND_NVL_PREDICATE = TRUE
_NEW_INITIAL_JOIN_ORDERS = FALSE
_OPTIMIZER_MODE_FORCE = TRUE
_OPTIMIZER_UNDO_CHANGES = FALSE
_UNNEST_SUBQUERY = FALSE
_PUSH_JOIN_UNION_VIEW = FALSE
_FAST_FULL_SCAN_ENABLED = TRUE
_OPTIM_ENHANCE_NNULL_DETECTION = TRUE
_ORDERED_NESTED_LOOP = FALSE
_NESTED_LOOP_FUDGE = 100
_NO_OR_EXPANSION = FALSE
_QUERY_COST_REWRITE = TRUE
QUERY_REWRITE_EXPRESSION = TRUE
_IMPROVED_ROW_LENGTH_ENABLED = TRUE
_USE_NOSEGMENT_INDEXES = FALSE
_ENABLE_TYPE_DEP_SELECTIVITY = TRUE
_IMPROVED_OUTERJOIN_CARD = TRUE
_OPTIMIZER_ADJUST_FOR_NULLS = TRUE
_OPTIMIZER_CHOOSE_PERMUTATION = 0
_USE_COLUMN_STATS_FOR_FUNCTION = TRUE
_SUBQUERY_PRUNING_ENABLED = TRUE
_SUBQUERY_PRUNING_REDUCTION_FACTOR = 50
_SUBQUERY_PRUNING_COST_FACTOR = 20
_LIKE_WITH_BIND_AS_EQUALITY = FALSE
_TABLE_SCAN_COST_PLUS_ONE = FALSE
_SORTMERGE_INEQUALITY_JOIN_OFF = FALSE
_DEFAULT_NON_EQUALITY_SEL_CHECK = TRUE
_ONESIDE_COLSTAT_FOR_EQUIJOINS = TRUE
DB_FILE_MULTIBLOCK_READ_COUNT = 16
SORT_AREA_SIZE = 1048576
***************************************
BASE STATISTICAL INFORMATION
March 01, 2004 - 5:19 pm UTC
7c7
< HASH_AREA_SIZE = 393216
---
> HASH_AREA_SIZE = 1048576
53,54c53,54
< DB_FILE_MULTIBLOCK_READ_COUNT = 8
< SORT_AREA_SIZE = 393216
---
> DB_FILE_MULTIBLOCK_READ_COUNT = 16
> SORT_AREA_SIZE = 1048576
Larger hash area size and larger multiblock read counts. that would do it.
do this in test
alter session set hash_area_size = 1048576;
alter session set DB_FILE_MULTIBLOCK_READ_COUNT = 16;
and see if you don't get the same behaviour (that would explain "why" they are different)
Awesome!
Wayne, March 02, 2004 - 9:30 am UTC
Tom,
.27 seconds is a pretty sight for sore eyes.
Now then what are my implications of:
(1) reducing hash_area_size
(2) reducing DB_FILE_MULTIBLOCK_READ_COUNT
(3) reducing both?
Thanks!
Wayne
March 02, 2004 - 3:56 pm UTC
besides having a consistent set of parameters in test and production so you can expect near same behaviors :)
reducing 1) will make hash joins look less enticing. reducing 2) will make full scans more costly. reducing both will just be "cumulative"
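To see why DB_FILE_MULTIBLOCK_READ_COUNT changes the full-scan math, here is the back-of-envelope version (the 8i optimizer actually uses an internally adjusted multiblock read count, so treat this strictly as an approximation):

```python
def full_scan_cost(blocks, mbrc):
    # a full scan reads `blocks` blocks, mbrc at a time,
    # so roughly blocks / mbrc multiblock I/Os
    return -(-blocks // mbrc)  # ceiling division

# hypothetical 10,000-block table
print(full_scan_cost(10_000, 8))   # 1250 -- test box setting
print(full_scan_cost(10_000, 16))  # 625  -- prod box setting
```

With the prod setting a full scan looks about half as expensive as on test, so on that box the CBO flips to full scans and hash joins sooner -- which matches the plan difference seen above.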
Statistics
Mike, March 04, 2004 - 11:48 am UTC
Oracle 9.2.0.4
Tom,
We just moved from 8.1.7.4 (Rule Based) to 9.2.0.4 (Cost Based)...
And obviously... I am seeing some changes in my sleeping habits :)
Before, I could see that an index was missing, add the index, and boom, the query was faster. Now I know that this won't be true, and I am having a hard time explaining that to developers. Anyway, here are my questions:
These are plans for the same query on two different databases:
production:
636 rows selected.
Elapsed: 00:00:01.09
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=125 Card=219 Bytes=25185)
1 0 HASH JOIN (Cost=125 Card=219 Bytes=25185)
2 1 TABLE ACCESS (FULL) OF 'CP_ORG' (Cost=3 Card=241 Bytes=10122)
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'PROJ_PROJECT' (Cost=121 Card=219 Bytes=15987)
4 3 INDEX (RANGE SCAN) OF 'I_PROJ_PROJECT_UPPER_PROJID' (UNIQUE) (Cost=4 Card=438)
Statistics
----------------------------------------------------------
4453 recursive calls
0 db block gets
11896 consistent gets
0 physical reads
0 redo size
35420 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
44 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
636 rows processed
development:
636 rows selected.
Elapsed: 00:00:01.04
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=4114)
1 0 NESTED LOOPS (Cost=5 Card=1 Bytes=4114)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'PROJ_PROJECT' (Cost=4 Card=1 Bytes=72)
3 2 INDEX (RANGE SCAN) OF 'I_PROJ_PROJECT_UPPER_PROJID' (UNIQUE) (Cost=3 Card=1)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'CP_ORG' (Cost=1 Card=1 Bytes=4042)
5 4 INDEX (UNIQUE SCAN) OF 'I_CP_ORG_ORGID' (UNIQUE)
Statistics
----------------------------------------------------------
4453 recursive calls
0 db block gets
12563 consistent gets
0 physical reads
0 redo size
35049 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
44 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
636 rows processed
question 1:
This query was running really slow when we moved it to production. I saw that some of the tables it used were not analyzed/monitored, so I fixed that, and the query ran much faster. Now, those same tables were not analyzed in development either, but the query ran faster there...
Another difference was nested loops vs. hash join... and my favorite (the one I am trying to get out of my head): one is doing a FTS on CP_ORG.
question 2:
Do statistics get dropped on a snapshot table when we run a complete refresh on it?
Thanks..
March 04, 2004 - 2:29 pm UTC
question 1: isn't a question? (but full scans are not evil)
question 2: it just does a truncate and insert /*+ append */ for a complete refresh (or delete + insert if part of a snapshot group). So, no, it does not drop the stats but the stats are in definite need of "regathering"
Tom is really great
TOM Would be friend, May 10, 2004 - 8:12 am UTC
Thanx TOM
Referential integrity in query optimizing
Dorothea, June 14, 2004 - 5:21 am UTC
Hi Tom,
hope you can help me in an argument about how the Oracle optimizer works...
The thing is, we develop our application with a 4GL tool called UniFace, which has its own repository. So all referential integrity rules are stored in the UniFace repository, but they are not known to Oracle.
Does the Oracle query optimizer take into consideration the business rules? In other words, will the application performance be increased if we define the integrity rules also in Oracle?
Thank you so much in advance! Great to talk to you..
June 14, 2004 - 8:01 am UTC
yes, the optimizer uses primary key, foreign key, not null constraints in processing queries. consider:
ops$tkyte@ORA9IR2> create table t
2 as
3 select * from all_objects;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx on t(object_id);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics for table for all indexes;
Table analyzed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable x number;
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id = :x;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 <b>Card=321</b> Bytes=30816)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=321 Bytes=30816)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=128)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add constraint t_pk primary key(object_id);
Table altered.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id = :x;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 <b>Card=1</b> Bytes=96)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=96)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)
ops$tkyte@ORA9IR2> set autotrace off
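A sketch of the arithmetic behind the two plans above (hypothetical statistics; the real optimizer has more inputs, such as histograms):

```python
def estimated_rows(num_rows, num_distinct, unique=False):
    # without a uniqueness guarantee, an equality predicate is estimated
    # with a density-style guess of num_rows / num_distinct;
    # a primary key tells the optimizer at most one row can match
    if unique:
        return 1
    return max(1, round(num_rows / num_distinct))

# made-up stats chosen to echo the Card=321 -> Card=1 change shown above
print(estimated_rows(32100, 100))               # 321, the density guess
print(estimated_rows(32100, 100, unique=True))  # 1, thanks to the PK
```

This is why declaring constraints matters: the constraint is pure information to the CBO, and better cardinality estimates cascade into better join orders and access paths.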
cost based optimization - one more question...
Dorothea, June 14, 2004 - 11:38 am UTC
Thank you very much, the example was very good! Left no place for doubts...
We actually have all primary key and not-null constraints in Oracle, only the foreign key constraints are missing..
Does it have the same effect on cost-based optimization? If so, by providing the foreign keys directly to Oracle we can make a significant improvement in performance!
June 14, 2004 - 11:55 am UTC
if you use materialized views and utilize query rewrite -- YES (if you have "Expert One on One Oracle", I give the classic example in there with emp and dept)
if you do not use materialized views and query rewrite -- I cannot think of a case where it would matter with foreign keys. That doesn't mean one doesn't exist, just that after thinking about it -- I cannot think of a case where I've seen it matter.
I can say that unless uniface LOCKS EXPLICITLY (serializes lots of activity), it is pretty much impossible for it to enforce RI in a multi-user environment (to enforce it correctly, in any case). I'd be very, very leery of not having RI on.
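For what it's worth, here is a minimal sketch of that classic emp/dept case. The MV name, the EMP/DEPT columns, and the assumption that query_rewrite_enabled is on are illustrative, not from the thread; the point is only that the declared FK plus NOT NULL is what lets the optimizer trust the rewrite.

```sql
-- With the FK and NOT NULL declared, the optimizer knows every EMP row
-- joins to exactly one DEPT row, so the join in the MV is lossless with
-- respect to EMP.
create materialized view emp_dept_mv
enable query rewrite
as
select d.deptno, d.dname, e.empno, e.ename, e.sal
  from emp e, dept d
 where e.deptno = d.deptno;

alter table emp add constraint emp_fk_dept
  foreign key (deptno) references dept;
alter table emp modify deptno not null;

-- With query rewrite enabled, even this single-table query can be
-- answered from EMP_DEPT_MV -- but only because the constraints prove
-- no EMP rows are lost or duplicated by the join:
select empno, ename, sal from emp;
```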
Thanx
Dorothea, June 15, 2004 - 5:15 am UTC
Thank you once again!
Why using like operator giving the different explain plan
Goh Seong Hin, June 16, 2004 - 12:31 am UTC
Dear Tom,
I did a small test on the like operator.
a. create table emp as select object_id, object_name,status from dba_objects;
b. create index emp_idx1 on emp (object_name,status);
c. dbms_stats.gather_table_stats('APPS','EMP');
d. select count(*) from emp;
COUNT(*)
----------
111102
e. set autotrace on exp
f. select * from emp *** It use index
where object_name like 'EM%'
and status = 'VALID'
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2 Bytes=62)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=4 Card=2 Bytes=62)
2 1 INDEX (RANGE SCAN) OF 'EMP_IDX1' (NON-UNIQUE) (Cost=3 Card=2)
g. select * from emp *** It use FTS
where object_name like 'E%'
and status = 'VALID'
/
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=89 Card=380 Bytes=11780)
1 0 TABLE ACCESS (FULL) OF 'EMP' (Cost=89 Card=380 Bytes=11780)
h. select object_name from emp
where object_name like 'E%'
and status = 'VALID'
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=380 Bytes=10260)
1 0 INDEX (RANGE SCAN) OF 'EMP_IDX1' (NON-UNIQUE) (Cost=11 Card=380 Bytes=10260)
Why does the optimizer use a different method of executing each SQL statement? Thanks.
Rgds
SHGoh
June 16, 2004 - 12:21 pm UTC
f used index and table access by index rowid because the estimated cardinality was 2 (small)
g used a full scan because the estimated cardinality was computed to be larger, and I'd bet your clustering factor for that table was large too (meaning each of the blocks in the table that had an object starting with E contained only one object starting with E -- so it would have to visit lots and lots of blocks)
h used an index range scan because the TABLE itself did not need to be accessed, the entire answer would be retrieved from one or two blocks in the index structure.
Thanks Tom
Goh Seong Hin, June 17, 2004 - 3:48 am UTC
Dear Tom,
Thanks for clarification. Appreciated if you can elaborate more on answer (g).
A bit confused by the following statement. Why did it choose the index instead of a FTS, as it did in the select * SQL?
select object_id,object_name from emp
where object_name like 'E%'
and status = 'VALID'
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=380 Bytes=11780)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=5 Card=380 Bytes=11780)
2 1 INDEX (RANGE SCAN) OF 'EMP_IDX1' (NON-UNIQUE) (Cost=2 Card=380)
Thanks.
RGds
SHGoh
June 17, 2004 - 10:01 am UTC
g. select * from emp *** It use FTS
where object_name like 'E%'
and status = 'VALID'
/
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=89 Card=380 Bytes=11780)
1 0 TABLE ACCESS (FULL) OF 'EMP' (Cost=89 Card=380 Bytes=11780)
g) used a full scan, you are confusing me here...
Sorry Tom
Goh Seong Hin, June 17, 2004 - 11:32 pm UTC
Dear Tom,
Sorry for the confusion. I totally can't recall what I did to produce the FTS (perhaps the index was not created, or some other reason).
Today, I re-ran the SQL and it uses the index, as it is supposed to.
select * from emp
where object_name like 'E%'
and status LIKE 'VALID'
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=760 Bytes=24320)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=8 Card=760 Bytes=24320)
2 1 INDEX (RANGE SCAN) OF 'EMP_IDX1' (NON-UNIQUE) (Cost=2 Card=760)
Once again, sorry for the confusion. Thanks.
Rgds
SHGoh
comparing 2 plans identical queries
sonali, September 22, 2004 - 11:23 am UTC
I have a client who says a particular query runs slow.. but the same query runs fast here with the same client schema and data.
I can post the query here but don't know if that will help.
I asked him to do the explain plan for that query -
Here is what he sent me
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=23 Bytes=3519)
1 0 SORT (ORDER BY) (Cost=19 Card=23 Bytes=3519)
2 1 COUNT (STOPKEY)
3 2 FILTER
4 3 NESTED LOOPS (Cost=17 Card=23 Bytes=3519)
5 4 NESTED LOOPS (Cost=11 Card=1 Bytes=144)
6 5 HASH JOIN (Cost=10 Card=1 Bytes=42)
7 6 TABLE ACCESS (FULL) OF 'ETMOSTUSED' (Cost=3 Card=29 Bytes=551)
8 6 TABLE ACCESS (FULL) OF 'ETASSIGN' (Cost=5 Card=8142 Bytes=187266)
9 5 TABLE ACCESS (BY INDEX ROWID) OF 'ETACTIVITY' (Cost=1 Card=11618 Bytes=1185036)
10 9 INDEX (UNIQUE SCAN) OF 'SYS_C004844872' (UNIQUE)
11 4 TABLE ACCESS (FULL) OF 'MWEBAUTH' (Cost=6 Card=9381 Bytes=84429)
12 3 TABLE ACCESS (BY INDEX ROWID) OF 'ETASSIGN' (Cost=2 Card=1 Bytes=8)
13 12 INDEX (UNIQUE SCAN) OF 'SYS_C004844880' (UNIQUE) (Cost=1 Card=1)
14 3 TABLE ACCESS (BY INDEX ROWID) OF 'ETASSIGN' (Cost=2 Card=1 Bytes=8)
15 14 INDEX (UNIQUE SCAN) OF 'SYS_C004844880' (UNIQUE) (Cost=1 Card=1)
Statistics
----------------------------------------------------------
0 recursive calls
192 db block gets
1717329 consistent gets
0 physical reads
0 redo size
9538 bytes sent via SQL*Net to client
758 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
46 rows processed
I ran the plan for same query here on their schema -
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=29 Card=1 Bytes=189)
1 0 SORT (ORDER BY) (Cost=29 Card=1 Bytes=189)
2 1 COUNT (STOPKEY)
3 2 NESTED LOOPS (Cost=27 Card=1 Bytes=189)
4 3 NESTED LOOPS (Cost=26 Card=1 Bytes=177)
5 4 NESTED LOOPS (Cost=26 Card=1 Bytes=167)
6 5 NESTED LOOPS (Cost=25 Card=1 Bytes=157)
7 6 HASH JOIN (Cost=24 Card=1 Bytes=50)
8 7 TABLE ACCESS (FULL) OF 'ETMOSTUSED' (Cost=7 Card=31 Bytes=682)
9 7 TABLE ACCESS (FULL) OF 'ETASSIGN' (Cost=11 Card=8599 Bytes=240772)
10 6 TABLE ACCESS (BY INDEX ROWID) OF 'ETACTIVITY' (Cost=1 Card=9775 Bytes=1045925)
11 10 INDEX (UNIQUE SCAN) OF 'SYS_C006874' (UNIQUE)
12 5 TABLE ACCESS (BY INDEX ROWID) OF 'ETASSIGN' (Cost=1 Card=10308 Bytes=103080)
13 12 INDEX (UNIQUE SCAN) OF 'SYS_C006882' (UNIQUE)
14 4 TABLE ACCESS (BY INDEX ROWID) OF 'ETASSIGN'
15 14 INDEX (UNIQUE SCAN) OF 'SYS_C006882' (UNIQUE)
16 3 TABLE ACCESS (BY INDEX ROWID) OF 'MWEBAUTH' (Cost=1 Card=10184 Bytes=122208)
17 16 INDEX (UNIQUE SCAN) OF 'SYS_C007062' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
8 db block gets
555 consistent gets
0 physical reads
0 redo size
11068 bytes sent via SQL*Net to client
750 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
46 rows processed
Why are these 2 plans different.. the queries are identical !
Additional info: they are using 8.1.7 whereas we are using 8.1.7.4.1 for this test.
Also, their database is set to the RULE based optimizer, so I sent him a schema logon trigger to set it to COST. The plan shows that they are using the CBO.
Also, in their plan I see 2 differences - it says FILTER in there; what is that?
And it's doing a full table scan of mwebauth whereas here it's doing the unique index scan on mwebauth.. I am puzzled, since we have the same schema here and at their place, the same data.. both have been analyzed (gather statistics on all the tables).
How come the plans are different ?
Thanks
September 22, 2004 - 11:52 am UTC
get 10053 traces (if you don't know what those are, search for that number or see chapter 6 of my "Effective Oracle by Design")
compare them.
it'll show you at the top what optimizer parameters were in effect and you can "diff" them to see what non-standard settings they have.
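A short, hedged sketch of capturing a 10053 trace as suggested (the dummy statement below stands in for the real query under investigation):

```sql
-- enable the optimizer trace for this session
alter session set events '10053 trace name context forever, level 1';

-- force a hard parse of the statement you are investigating;
-- "select * from dual" here is only a placeholder
explain plan for select * from dual;

alter session set events '10053 trace name context off';

-- the trace file is written to user_dump_dest; near the top it lists
-- the optimizer-related parameter settings, which you can then "diff"
-- between the two systems
```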
Same query in two databases of same kind!
Badari, October 13, 2004 - 9:44 am UTC
Hi Tom,
I have a query that is taking a long time in the production environment. When I looked at the explain plan, the query is not using a particular index in the production database, but the same query is using that index in the development database.
For your information, the total number of records in production and development is more or less the same. Both databases are using the CBO and have been analyzed.
Can you explain why it would be like that?
Thank You!
October 13, 2004 - 10:04 am UTC
clustering factors, different init.ora settings, different session settings, statistics gathered at different times, <long list here>
if you have my book "Effective Oracle by Design" -- see chapter 6, on using the 10053 trace to compare optimizer dumps from the system (can be very telling)
if not, search for 10053 on this site and you'll see it used. you'll compare the optimizer init.ora settings (all cbo related ones will be listed there), then you can compare the statistics used (the first part of the optimizer dump will have indexes, selectivity, cardinalities, clustering factors and the like). That'll help you understand why.
You can just compare the statistics as well using "sql" -- see if the num rows, blocks, clustering factors, etc etc etc etc etc are even remotely close.
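As a sketch of that "compare the statistics using sql" suggestion: run something like the following on both databases (substituting your own table names -- the names here are placeholders) and compare the output side by side.

```sql
-- table-level statistics the optimizer uses
select table_name, num_rows, blocks, avg_row_len, last_analyzed
  from user_tables
 where table_name in ('YOUR_TABLE1', 'YOUR_TABLE2');

-- index-level statistics, including the clustering factor
select index_name, blevel, leaf_blocks, distinct_keys,
       clustering_factor, num_rows
  from user_indexes
 where table_name in ('YOUR_TABLE1', 'YOUR_TABLE2');
```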
Thanks!
Badari, October 13, 2004 - 11:45 am UTC
Hi Tom,
Thank you very much for your help. Yeah, it was the optimizer_index_caching parameter that was set up wrongly.
Thanks again!
Query is not returning with the results.....!
Badari, October 25, 2004 - 9:38 am UTC
Hi Tom,
I tuned one of the queries and it was returning results within an hour. That was the case 3 days ago.
But the same query in the same database is not returning results even after 2 hours. Can you shed some light on what could be the reason?
Well, for your information, no data loading or deletion has happened on this database during these 3 days.
Thank You in Advance!
Badari
October 25, 2004 - 10:51 am UTC
something changed -- the number of users for example (eg: the load on the system).
Badari, October 25, 2004 - 11:08 am UTC
Hi Tom,
I know, it is surprising, but nothing changed during the weekend.
What I can observe from TOAD is that the query got stuck at the SQL execution forever, stating the following:
"SQL Execution: : 19291 out of 19292 units done"
Regards,
badari
October 25, 2004 - 11:40 am UTC
you have more users, you have less RAM available, you are under more load, there is more IO going on.
I don't use toad, makes the brain weak. I'm guessing toad has a bug with regard to how it processes v$session_longops -- it looks like the query was in the full scan and is done with the full scan now, off sorting or hashing or something else.
use v$sesstat/v$session_event to see what your session is doing/waiting on
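A hedged sketch of that suggestion: given the SID of the long-running session (looked up in v$session; `&sid` below is a SQL*Plus substitution variable, not a fixed value), these queries show what it has been waiting on and how much work it has done.

```sql
-- what the session has waited on, biggest waits first
select e.event, e.total_waits, e.time_waited
  from v$session_event e
 where e.sid = &sid
 order by e.time_waited desc;

-- key work statistics for the same session
select n.name, s.value
  from v$sesstat s, v$statname n
 where s.statistic# = n.statistic#
   and s.sid = &sid
   and n.name in ('CPU used by this session',
                  'consistent gets', 'db block gets', 'physical reads');
```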
So... what to do then?
Eddy Mak, December 22, 2004 - 3:03 am UTC
Hi Tom,
Your explanation is very useful. But I'm just confused about how we should choose the optimal plan/hint among a few different plans/hints generated by EXPLAIN PLAN.
db block gets? consistent gets? physical reads? sorts (memory)? sorts (disk)? rows processed?
Any rule of thumb?
Thanks.
December 22, 2004 - 9:34 am UTC
typically, in general, it is LIO based, the less LIO the better.
however, there are exceptions -- eg: a hash join will generally do lots less LIO but might chew up more cpu -- and if cpu is a scarce resource in your system...
So, you benchmark them.
EXPLAIN PLAN
Eddy Mak, December 22, 2004 - 9:51 pm UTC
Hi Tom,
Thanks.
Does 'Less LIO' mean less 'db block gets' or 'consistent gets'?
If so, I found that the first time I executed a SQL statement, Oracle usually reported larger numbers. When I re-executed it, Oracle reported smaller numbers. I guess Oracle cached some db blocks. So what to do?
Otherwise, if only EXPLAIN PLAN is available and TKPROF or SQL Trace is not set up, can we still use the plan output to make a decision? Or is there any query we could use to obtain the LIO information of a SQL statement?
Many thanks.
December 23, 2004 - 10:57 am UTC
the first time you probably saw 'recursive sql' as well -- sql run on your behalf to compile your sql.
take the numbers from the second run, and LIO = sum(consistent+db block gets)
autotrace in sqlplus will show you the LIO's
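A small sketch of measuring your own session's LIO without autotrace (the statistic names are the standard v$statname entries):

```sql
-- snapshot your session's logical IO counters
select n.name, s.value
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name in ('consistent gets', 'db block gets');

-- run the query under test, then re-run the select above;
-- LIO for the query = delta(consistent gets) + delta(db block gets)
```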
Thank you.
Eddy Mak, December 23, 2004 - 8:41 pm UTC
Hi Tom,
Thanks so much. It's really useful.
This was interesting thread !
Vinayak, April 28, 2005 - 11:51 pm UTC
This was a really very interesting thread. I am surprised to see how many people think that ACTUAL RUNTIME is directly proportional to COST and that you can compare COSTs.
I think people need to understand just one basic thing: the Oracle CBO is based on mathematical principles of probability. If one is familiar with the theory of probability, one can easily understand how the CBO actually works and why it picks a certain access plan.
I think the CBO checks the possible access paths for a given query and assigns a relative cost to each one of them; the one with the lowest cost is MOST PROBABLY going to run in the shortest time. But again, it depends on your data. We may know more about the data than the optimizer does, and what we know may not be available to the CBO, so it may miss the most optimal plan.
With a very quick and simple example, you can see and reproduce that:
SQL> create table t1 (c1 number, c2 number, c3 number) ;
Table created.
Elapsed: 00:00:00.07
SQL> begin
2 for i in 1..10000 loop
3 insert into t1 values(i,i,i) ;
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:03.48
SQL> create table t2 as select * from t1 ;
Table created.
Elapsed: 00:00:00.17
SQL> create table t3 as select * from t1 ;
Table created.
Elapsed: 00:00:00.15
SQL> create index ix1 on t1(c1,c2) ;
Index created.
Elapsed: 00:00:00.86
SQL> create index ix2 on t2(c1,c2) ;
Index created.
Elapsed: 00:00:01.17
SQL> exec dbms_stats.gather_schema_stats(user) ;
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.82
SQL> SELECT COUNT(*) FROM t1,t2,t3 WHERE t1.c1=t2.c1 AND t1.c2=t2.c2 AND t2.c1=t3.c1 ;
COUNT(*)
----------
10000
Elapsed: 00:01:58.56
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=20)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=7 Card=1 Bytes=20)
3 2 NESTED LOOPS (Cost=3 Card=1 Bytes=16)
4 3 INDEX (FULL SCAN) OF 'IX1' (NON-UNIQUE) (Cost=26 Card=10000 Bytes=80000)
5 3 INDEX (RANGE SCAN) OF 'IX2' (NON-UNIQUE)
6 2 TABLE ACCESS (FULL) OF 'T3' (Cost=4 Card=1 Bytes=4)
************************************************
Note the cost here is 7
************************************************
SQL> SELECT /*+ USE_HASH(t1,t2,t3) */ COUNT(*) FROM t1,t2,t3 WHERE t1.c1=t2.c1 AND t1.c2=t2.c2 AND t2.c1=t3.c1 ;
COUNT(*)
----------
10000
Elapsed: 00:00:00.20
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=1 Bytes=20)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=11 Card=1 Bytes=20)
3 2 HASH JOIN (Cost=6 Card=1 Bytes=16)
4 3 INDEX (FULL SCAN) OF 'IX1' (NON-UNIQUE) (Cost=26 Card=10000 Bytes=80000)
5 3 INDEX (FULL SCAN) OF 'IX2' (NON-UNIQUE) (Cost=26 Card=10000 Bytes=80000)
6 2 TABLE ACCESS (FULL) OF 'T3' (Cost=4 Card=10000 Bytes=40000)
************************************************
Note the cost here is higher, but it runs a lot
faster than the other plan, which has a lower cost
************************************************
************************************************
Now, if we give the optimizer more knowledge about
this data, it will compute different costs
for the various plans and then choose an optimal plan.
************************************************
SQL> exec dbms_stats.set_column_stats(user,'T1','C1',distcnt=>1,density=>1) ;
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.04
SQL> exec dbms_stats.set_column_stats(user,'T2','C1',distcnt=>1,density=>1) ;
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03
SQL> get test
1 SELECT COUNT(*) FROM t1,t2,t3
2* WHERE t1.c1=t2.c1 AND t1.c2=t2.c2 AND t2.c1=t3.c1
SQL> /
COUNT(*)
----------
10000
Elapsed: 00:00:00.36
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=1 Bytes=20)
1 0 SORT (AGGREGATE)
2 1 MERGE JOIN (Cost=34 Card=10000 Bytes=200000)
3 2 NESTED LOOPS (Cost=3 Card=10000 Bytes=160000)
4 3 INDEX (FULL SCAN) OF 'IX2' (UNIQUE) (Cost=26 Card=10000 Bytes=80000)
5 3 INDEX (UNIQUE SCAN) OF 'IX1' (UNIQUE)
6 2 SORT (JOIN) (Cost=31 Card=10000 Bytes=40000)
7 6 TABLE ACCESS (FULL) OF 'T3' (Cost=10 Card=10000 Bytes=40000)
************************************************
See, it now runs much much faster than the
original query (plan)
************************************************
While looking at the different access plans, you should check what value the optimizer is picking for the "cardinality". It should closely match the actual values. Then you will get the most optimal plans.
Performance Problem
Your Fan From India, May 19, 2005 - 5:11 am UTC
Hi Tom,
I want to know whether restarting a database will really improve performance, as we recently experienced in our project: one of the procedures was taking 10 hrs and, after restarting the database, it is now taking 6 hrs under the same conditions..
Thanks.
May 19, 2005 - 8:06 am UTC
restarting twice is even better ;)
seriously, you fixed something: you had, perhaps, a process running that was consuming large amounts of resources. By restarting, you cleaned it out (for example). There are lots of possibilities.
Conditions are "not the same", something is different (else, it would be the same)
No, I would not recommend restarting a database for "performance" -- I would recommend figuring out "why"
Performance Problem
Same Fan From India, May 20, 2005 - 3:17 am UTC
Hi Tom,
Seriously, I have not understood what you are saying in the above matter.
Can you please explain in more detail..
Thanks
May 20, 2005 - 8:00 am UTC
most likely "guess"
you had some process running away in the database, consuming resources.
bouncing the instance killed this process of yours
performance was fixed because this process is no longer running.
but, we can only "guess" given the supplied information.
how did Oracle rewrite this?!
A reader, September 14, 2005 - 10:37 am UTC
Hi
I have this query
Select N_CONCN, P_VENTA, CODPRD, FECTUR, LTS, NUMTUR, CODTAN
From GCC_VNTCAR_PDC
Where Codtan IS NOT NULL;
GCC_VNTCAR_PDC is a plain table; when I look at the explain plan, it does this
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 221K| 8856K| 1274 |
| 1 | TABLE ACCESS BY INDEX ROWID| GCC_VNTCAR_PDC | 221K| 8856K| 1274 |
|* 2 | INDEX RANGE SCAN | IDX_VNTCAR_ORG | 221K| | 433 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
2 - access("GCC_VNTCAR_PDC"."ID_ORG"='03')
Note: cpu costing is off
SQL> select column_name from dba_ind_columns
2 where index_name = 'IDX_VNTCAR_ORG';
COLUMN_NAME
------------
ID_ORG
The query's predicate is, however, Codtan IS NOT NULL; how does Oracle know it has to use the index on another column...?
September 14, 2005 - 11:27 am UTC
does it reproduce - like from a cut and paste
can you give me a way to reproduce it on my machine.
(is there VPD on this perhaps?)
ops$tkyte@ORA9IR2> create table t ( id int, x int );
Table created.
ops$tkyte@ORA9IR2> create index t_idx on t(id);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace
2 function my_security_function( p_schema in varchar2,
3 p_object in varchar2 ) return varchar2
4 as
5 begin
6 return 'id = 3';
7 end;
8 /
Function created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> begin
2 dbms_rls.add_policy
3 ( object_schema => user,
4 object_name => 'T',
5 policy_name => 'MY_POLICY',
6 function_schema => user,
7 policy_function => 'My_Security_Function',
8 statement_types => 'select, insert, update, delete' ,
9 update_check => TRUE );
10 end;
11 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @plan "select * from t where x is null"
ops$tkyte@ORA9IR2> delete from plan_table;
3 rows deleted.
ops$tkyte@ORA9IR2> explain plan for &1;
old 1: explain plan for &1
new 1: explain plan for select * from t where x is null
Explained.
ops$tkyte@ORA9IR2> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
|* 1 | TABLE ACCESS BY INDEX ROWID| T | | | |
|* 2 | INDEX RANGE SCAN | T_IDX | | | |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."X" IS NULL)
2 - access("T"."ID"=3)
Note: rule based optimization
16 rows selected.
ah yes it does have policies
A reader, September 14, 2005 - 12:21 pm UTC
Hi Tom
You are correct, there are policies!
Thank you!
decision making between less LIOs and waits
Narendra, November 09, 2005 - 6:10 am UTC
Hi tom,
I have following 2 tables
SQL> desc big_table
Name Null? Type
-------------------- -------- ------------
ID NOT NULL NUMBER => Primary Key
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(18)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> desc chld
Name Null? Type
-------------------- -------- -------
C_ID NOT NULL NUMBER => Primary Key
ID NUMBER => FK BIG_TABLE(ID)
STATUS CHAR(1)
Following is TKProf output of a query with different hints.
Can you please guide me as to which one should be considered for scalability & best response time?
declare
ln_cnt number;
ln_start number;
begin
ln_start := dbms_utility.get_time;
select count(*)
into ln_cnt
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y');
dbms_output.put_line('Count with Join ' || to_char(ln_cnt) || ' in ' || (dbms_utility.get_time - ln_start));
--
ln_start := dbms_utility.get_time;
select /*+ USE_NL(p c) */ count(*)
into ln_cnt
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y');
dbms_output.put_line('Count with Join(NL Hint) ' || to_char(ln_cnt) || ' in ' || (dbms_utility.get_time - ln_start));
--
ln_start := dbms_utility.get_time;
select /*+ USE_HASH(p c) */ count(*)
into ln_cnt
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y');
dbms_output.put_line('Count with Join(HASH HINT) ' || to_char(ln_cnt) || ' in ' || (dbms_utility.get_time - ln_start));
--
ln_start := dbms_utility.get_time;
select /*+ USE_MERGE(p c) */ count(*)
into ln_cnt
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y');
dbms_output.put_line('Count with Join(MERGE HINT) ' || to_char(ln_cnt) || ' in ' || (dbms_utility.get_time - ln_start));
--
end;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.12 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.01 0.13 0 0 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 64
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00
********************************************************************************
SELECT count(*)
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.01 0 0 0 0
Fetch 1 1.26 3.49 522 250681 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.26 3.50 522 250681 0 1
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 64 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
249926 NESTED LOOPS
249926 TABLE ACCESS FULL CHLD
249926 INDEX UNIQUE SCAN BIG_TABLE_PK (object id 31267)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 78 0.02 0.61
db file sequential read 12 0.02 0.11
********************************************************************************
SELECT /*+ USE_NL(p c) */ count(*)
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.31 1.83 522 250681 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.31 1.83 522 250681 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 64 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
249926 NESTED LOOPS
249926 TABLE ACCESS FULL CHLD
249926 INDEX UNIQUE SCAN BIG_TABLE_PK (object id 31267)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 78 0.03 0.43
db file sequential read 12 0.00 0.01
********************************************************************************
SELECT /*+ USE_HASH(p c) */ count(*)
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.35 1.57 1194 1177 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.35 1.58 1194 1177 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 64 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
249926 HASH JOIN
249926 TABLE ACCESS FULL CHLD
200000 INDEX FAST FULL SCAN BIG_TABLE_PK (object id 31267)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 78 0.05 0.43
direct path write 96 0.02 0.12
db file sequential read 12 0.01 0.04
direct path read 96 0.03 0.31
********************************************************************************
SELECT /*+ USE_MERGE(p c) */ count(*)
from chld c, big_table p
where c.id = p.id
and c.status in ('P','A','Y')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.12 3.24 1264 1171 11 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.12 3.25 1264 1171 11 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 64 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
249926 MERGE JOIN
200000 INDEX FULL SCAN BIG_TABLE_PK (object id 31267)
249926 SORT JOIN
249926 TABLE ACCESS FULL CHLD
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 78 0.12 1.00
db file sequential read 12 0.03 0.05
direct path write 4 0.00 0.00
direct path read 58 0.04 0.69
********************************************************************************
BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 64
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 3.13 3.13
November 10, 2005 - 5:23 pm UTC
did you analyze this table.
Records
Narendra, November 09, 2005 - 6:12 am UTC
Tom,
BIG_TABLE has 200000 records and CHLD table has 300000 records.
Missed out version
Narendra, November 09, 2005 - 7:10 am UTC
Sorry Tom,
The oracle version I am using is 9.2.0.1.0
Thanks
Hi
Narendra, November 10, 2005 - 11:55 pm UTC
Tom,
Yes, I did analyze the tables. Even after that, the optimizer is choosing the nested loops plan, which generates more LIOs than the HASH JOIN plan; the latter has fewer LIOs but more physical reads. Also, there are WAIT events in the HASH JOIN approach for DIRECT PATH READ and DIRECT PATH WRITE that do not exist for the NESTED LOOP approach.
Thanks
November 12, 2005 - 8:12 am UTC
are the cardinalities in an "autotrace traceonly explain" output close to reality (compare the plan to the actuals in the tkprof).
what non-standard init.ora optimizer parameters do you have
Cardinalities in Autotrace
Narendra, November 14, 2005 - 12:23 am UTC
Tom,
Here is the output of SET AUTOTRACE TRACEONLY EXPLAIN.
SQL> select count(*) from chld c, big_table p where c.id = p.id and c.status in ('P','A','Y') ;
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=74 Card=1 Bytes=13)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=74 Card=3000 Bytes=39000)
3 2 TABLE ACCESS (FULL) OF 'CHLD' (Cost=74 Card=3000 Bytes=24000)
4 2 INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE)
SQL> select /*+ USE_HASH(c p) */ count(*) from chld c, big_table p where c.id = p.id and c.status in ('P','A','Y') ;
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=128 Card=1 Bytes=13)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=128 Card=3000 Bytes=39000)
3 2 TABLE ACCESS (FULL) OF 'CHLD' (Cost=74 Card=3000 Bytes=24000)
4 2 INDEX (FAST FULL SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=42 Card=200000 Bytes=1000000)
I have not altered the default value of any initialization parameter. Do you want details of any particular parameters, or of all of them?
Thanks
November 14, 2005 - 8:43 am UTC
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
249926 NESTED LOOPS
249926 TABLE ACCESS FULL CHLD
249926 INDEX UNIQUE SCAN BIG_TABLE_PK (object id 31267)
The optimizer thought 3,000 rows (autotrace), reality said 249,926 rows.
This could be a case where a histogram on this column is something to consider, so that the optimizer better understands how the data is distributed by status. Consider:
ops$tkyte@ORA9IR2> create table t
2 as
3 select decode( mod(rownum,5), 0, 'P', 1, 'A', 2, 'Y', 3, 'X', 4, 'Z' ) xstatus,
4 all_objects.*
5 from all_objects;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where xstatus in ( 'P', 'A', 'Y' );
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=61 Card=16870 Bytes=1602650)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=61 Card=16870 Bytes=1602650)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> select count(*) from t where xstatus in ( 'P', 'A', 'Y' );
COUNT(*)
----------
16870
<b>there it got it right - but only because:
ops$tkyte@ORA9IR2> select 3/5*28116 from dual;
3/5*28116
----------
16869.6
the data was evenly distributed - it had 5 values - and I had 28,116 rows in T (we got 3/5ths of the data).
But watch:</b>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set xstatus = lower(xstatus);
28116 rows updated.
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where xstatus in ( 'P', 'A', 'Y' );
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=61 Card=16870 Bytes=1602650)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=61 Card=16870 Bytes=1602650)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> select count(*) from t where xstatus in ( 'P', 'A', 'Y' );
COUNT(*)
----------
0
<b>it still thinks 3/5ths - there is insufficient detail on the values in the xstatus column:</b>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for columns xstatus size 254' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where xstatus in ( 'P', 'A', 'Y' );
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=61 Card=1 Bytes=95)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=61 Card=1 Bytes=95)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> select count(*) from t where xstatus in ( 'P', 'A', 'Y' );
COUNT(*)
----------
0
<b>Now it gets it right. But before just adding a histogram to the mix, understand that this will likely change other query plans as well! Make sure to test it out, looking for any adverse effects.</b>
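The uniform-distribution guess versus the histogram can be sketched with toy arithmetic - a simplified illustration only, not Oracle's actual formulas; the row counts mirror the example above:

```python
# With only basic column statistics the optimizer assumes values are
# uniformly distributed, so an IN-list of k values out of num_distinct
# gets selectivity k/num_distinct. A frequency histogram records the
# actual count per value instead. (Simplified model for illustration.)

def card_uniform(num_rows, num_distinct, in_list_size):
    """Cardinality estimate from basic stats: assumes even distribution."""
    return round(num_rows * in_list_size / num_distinct)

def card_histogram(histogram, in_list):
    """Cardinality estimate from a frequency histogram: sum actual counts."""
    return sum(histogram.get(v, 0) for v in in_list)

num_rows = 28116
# after "update t set xstatus = lower(xstatus)" only lowercase values exist
histogram = {'p': 5623, 'a': 5624, 'y': 5623, 'x': 5623, 'z': 5623}

print(card_uniform(num_rows, 5, 3))                 # -> 16870 (the wrong guess)
print(card_histogram(histogram, ['P', 'A', 'Y']))   # -> 0 (reality)
```

With basic stats the model can only say "3 of 5 distinct values", which is 16,870 rows; the histogram knows no uppercase values remain at all.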
Cardinality
Narendra, November 14, 2005 - 11:50 pm UTC
Tom,
I appreciate your ability to trace the problem and provide a fix; it was bang on target for the cardinality issue.
However, even after doing that, the CBO is still choosing the NESTED LOOPS approach to run the query. Here is the AUTOTRACE output:
SQL> select count(*) from chld c, big_table p where c.id = p.id and c.status in ('P','A','Y') ;
COUNT(*)
----------
250125
Elapsed: 00:00:01.08
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=74 Card=1 Bytes=12)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=74 Card=250125 Bytes=3001500)
3 2 TABLE ACCESS (FULL) OF 'CHLD' (Cost=74 Card=250125 Bytes=1750875)
4 2 INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
250880 consistent gets
748 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
With HASH JOIN hint
SQL> select /*+ USE_HASH(c p) */ count(*) from chld c, big_table p where c.id = p.id and c.status in ('P','A','Y') ;
COUNT(*)
----------
250125
Elapsed: 00:00:02.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=447 Card=1 Bytes=12)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=447 Card=250125 Bytes=3001500)
3 2 INDEX (FAST FULL SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=42 Card=200000 Bytes=1000000)
4 2 TABLE ACCESS (FULL) OF 'CHLD' (Cost=74 Card=250125 Bytes=1750875)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1177 consistent gets
1476 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
1. Can I conclude from this that the CBO chose the execution plan PURELY on cost - 447 for the HASH JOIN versus 74 for NESTED LOOPS, as shown in the AUTOTRACE output? But in one of your followups, I read not to bother about cost.
2. Which plan will scale better?
3. You said other query plans might change due to this way of gathering statistics. On what basis should we decide which approach to take for gathering statistics in an OLTP environment, if Oracle does not maintain them correctly on its own, as in this case?
Can you please throw some light on the same?
Thanks
November 15, 2005 - 8:35 am UTC
1) the cost is the thing that drives the CBO, it is what it optimizes towards. You and I looking at that cost? No, the optimizer - YES.
2) the hash one likely.
3) this i believe to be an optimizer "issue" with the costing of the primary key. consider:
ops$tkyte@ORA9IR2> @big_table 200000
ops$tkyte@ORA9IR2> create table big_table
2 as
3 select rownum id, a.*
4 from all_objects a
5 where 1=0
6 /
Table created.
ops$tkyte@ORA9IR2> alter table big_table nologging;
Table altered.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> declare
2 l_cnt number;
3 l_rows number := &1;
4 begin
5 insert /*+ append */
6 into big_table
7 select rownum, a.*
8 from all_objects a
9 where rownum <= &1;
10
11 l_cnt := sql%rowcount;
12
13 commit;
14
15 while (l_cnt < l_rows)
16 loop
17 insert /*+ APPEND */ into big_table
18 select rownum+l_cnt,
19 OWNER, OBJECT_NAME, SUBOBJECT_NAME,
20 OBJECT_ID, DATA_OBJECT_ID,
21 OBJECT_TYPE, CREATED, LAST_DDL_TIME,
22 TIMESTAMP, STATUS, TEMPORARY,
23 GENERATED, SECONDARY
24 from big_table
25 where rownum <= l_rows-l_cnt;
26 l_cnt := l_cnt + sql%rowcount;
27 commit;
28 end loop;
29 end;
30 /
old 3: l_rows number := &1;
new 3: l_rows number := 200000;
old 9: where rownum <= &1;
new 9: where rownum <= 200000;
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table big_table add constraint
2 big_table_pk primary key(id)
3 /
Table altered.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> begin
2 dbms_stats.gather_table_stats
3 ( ownname => user,
4 tabname => 'BIG_TABLE',
5 method_opt => 'for all indexed columns',
6 cascade => TRUE );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> select count(*) from big_table;
COUNT(*)
----------
200000
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table chld
2 as
3 select rownum c_id, id,
4 case when mod(rownum,30) <= 8 then 'P'
5 when mod(rownum,30) <= 16 then 'A'
6 when mod(rownum,30) <= 23 then 'Y'
7 else 'Z'
8 end status
9 from big_table;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into chld
2 select rownum+200000 c_id, id,
3 case when mod(rownum,30) <= 8 then 'P'
4 when mod(rownum,30) <= 16 then 'A'
5 when mod(rownum,30) <= 23 then 'Y'
6 else 'Z'
7 end status
8 from big_table
9 where rownum <= 100000;
100000 rows created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table chld add constraint chld_pk primary key(c_id);
Table altered.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'CHLD', method_opt=>'for columns status size 254', cascade=>true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> SELECT count(*)
2 from chld c, big_table p
3 where c.id = p.id
4 and c.status in ('P','A','Y');
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=75 Card=1 Bytes=16)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=75 Card=240006 Bytes=3840096)
3 2 TABLE ACCESS (FULL) OF 'CHLD' (Cost=75 Card=240006 Bytes=2640066)
4 2 INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> alter table big_table drop constraint big_table_pk;
Table altered.
ops$tkyte@ORA9IR2> create index big_table_pk_nonunique on big_table(id) compute statistics;
Index created.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> SELECT count(*)
2 from chld c, big_table p
3 where c.id = p.id
4 and c.status in ('P','A','Y');
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=329 Card=1 Bytes=14)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=329 Card=240006 Bytes=3360084)
3 2 INDEX (FAST FULL SCAN) OF 'BIG_TABLE_PK_NONUNIQUE' (NON-UNIQUE) (Cost=44 Card=200000 Bytes=600000)
4 2 TABLE ACCESS (FULL) OF 'CHLD' (Cost=75 Card=240006 Bytes=2640066)
ops$tkyte@ORA9IR2> set autotrace off
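The consistent-gets gap seen earlier (roughly 250,880 for the nested loop versus 1,177 for the hash join) can be approximated with back-of-envelope arithmetic. The block counts below are made-up round figures chosen to be in the ballpark of the AUTOTRACE output, not measurements:

```python
# Rough LIO model (an illustration, not Oracle's internal accounting):
# a nested loop probes the unique index once per driving row, while a
# hash join reads each row source just once.

def nl_lios(driving_blocks, joined_rows, gets_per_probe=1):
    # full scan of the driving table + index gets for every joined row
    return driving_blocks + joined_rows * gets_per_probe

def hash_lios(driving_blocks, probe_blocks):
    # one pass over each source; the join itself happens in memory/temp
    return driving_blocks + probe_blocks

# hypothetical block counts, same order of magnitude as the example
print(nl_lios(driving_blocks=750, joined_rows=250_125))   # hundreds of thousands of gets
print(hash_lios(driving_blocks=750, probe_blocks=420))    # only ~1,200 gets
```

The per-row index probe is what makes the nested loop's logical I/O scale with the driving row count, while the hash join's stays roughly the sum of the two segment sizes.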
Thanks
Narendra, November 15, 2005 - 11:36 pm UTC
Tom,
Thank you for those precise answers. I am not sure whether this kind of issue will occur in a live environment, but the tables I used are similar to ones from one of my projects. Those two tables were frequently joined in some routines. There was a general perception (among senior people) that joining two large tables (about 50000 rows of BIG_TABLE to 200000 rows in the CHLD table) must be causing the long execution times and the delayed response. However, I still feel that is not so large a volume that it should force any process to run for hours. I know there are any number of factors affecting performance, but on a Windows machine with 512 MB RAM and a single CPU for the Oracle server, I strongly feel the processing should finish in seconds, not even minutes. However, nobody seems to be convinced. I think the only way to prove it is by giving evidence (as you do).
Thanks once again and will keep coming back to you with more queries.
November 16, 2005 - 8:55 am UTC
sql_trace and tkprof are your friends, you should use them to see where you are spending your time.
How to use cost to compare 2 queries?
Jack Douglas, January 23, 2006 - 1:44 pm UTC
Hi Tom,
This is a long thread and I didn't read it all, so I apologise if I am going over ground already covered.
I have used the following technique to relate the cost of 2 queries by making them into 1 query. In this case I am comparing the relative costs of the queries with different values of db_file_multiblock_read_count.
Is this a valid technique?
Thanks,
Jack
SQL*Plus: Release 10.2.0.1.0 - Production on Mon Jan 23 18:32:35 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Release 10.2.0.1.0 - 64bit Production
TRACKER@new> set timing off
TRACKER@new> create table t1 as select owner, table_name from dba_tables;
Table created.
TRACKER@new> alter table t1 add constraint pk_t1 primary key (owner, table_name);
Table altered.
TRACKER@new> analyze table t1 compute statistics;
Table analyzed.
TRACKER@new> create table t2 as select owner, table_name, column_name from dba_tab_columns;
Table created.
TRACKER@new> alter table t2 add constraint pk_t2 primary key (owner, table_name, column_name);
Table altered.
TRACKER@new> analyze table t2 compute statistics;
Table analyzed.
TRACKER@new> create or replace view v as
2 select /*+ ordered full (t1) full (t2) */ count(*) as a from t1 natural join t2
3 union all
4 select /*+ ordered use_nl (t2) */ count (*) from t1 natural join t2;
View created.
TRACKER@new> alter session set db_file_multiblock_read_count = 1;
Session altered.
TRACKER@new> explain plan for select * from v;
Explained.
TRACKER@new> select * from table (dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
Plan hash value: 1644421090
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 | 2168 (1)| 00:00:27 |
| 1 | VIEW | V | 2 | 26 | 2168 (1)| 00:00:27 |
| 2 | UNION-ALL | | | | | |
| 3 | SORT AGGREGATE | | 1 | 44 | | |
|* 4 | HASH JOIN | | 52119 | 2239K| 298 (1)| 00:00:04 |
| 5 | TABLE ACCESS FULL | T1 | 1849 | 42527 | 13 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T2 | 52119 | 1068K| 285 (1)| 00:00:04 |
| 7 | SORT AGGREGATE | | 1 | 44 | | |
| 8 | NESTED LOOPS | | 52119 | 2239K| 1870 (1)| 00:00:23 |
| 9 | INDEX FAST FULL SCAN| PK_T1 | 1849 | 42527 | 10 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | PK_T2 | 28 | 588 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."TABLE_NAME"="T2"."TABLE_NAME" AND
"T1"."OWNER"="T2"."OWNER")
10 - access("T1"."OWNER"="T2"."OWNER" AND
"T1"."TABLE_NAME"="T2"."TABLE_NAME")
25 rows selected.
TRACKER@new> alter session set db_file_multiblock_read_count = 16;
Session altered.
TRACKER@new> explain plan for select * from v;
Explained.
TRACKER@new> select * from table (dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
Plan hash value: 1644421090
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 | 1933 (1)| 00:00:24 |
| 1 | VIEW | V | 2 | 26 | 1933 (1)| 00:00:24 |
| 2 | UNION-ALL | | | | | |
| 3 | SORT AGGREGATE | | 1 | 44 | | |
|* 4 | HASH JOIN | | 52119 | 2239K| 69 (2)| 00:00:01 |
| 5 | TABLE ACCESS FULL | T1 | 1849 | 42527 | 4 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T2 | 52119 | 1068K| 65 (2)| 00:00:01 |
| 7 | SORT AGGREGATE | | 1 | 44 | | |
| 8 | NESTED LOOPS | | 52119 | 2239K| 1864 (1)| 00:00:23 |
| 9 | INDEX FAST FULL SCAN| PK_T1 | 1849 | 42527 | 4 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | PK_T2 | 28 | 588 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."TABLE_NAME"="T2"."TABLE_NAME" AND
"T1"."OWNER"="T2"."OWNER")
10 - access("T1"."OWNER"="T2"."OWNER" AND
"T1"."TABLE_NAME"="T2"."TABLE_NAME")
25 rows selected.
January 23, 2006 - 11:02 pm UTC
I say "no", not really.
You changed parameters - inputs to a mathematical model.
I can take a query.
I can get the cost for it.
I can make the cost go up or down (without changing the plan and without changing the 'access' - eg multiblock read count) by changing parameters.
does query go faster? maybe yes, maybe no, maybe I don't know.
I do not compare costs of two queries like that.
You might want to get Jonathan Lewis's book (linked to from home page here) to get a good understanding of the CBO and how the costing works.
Thanks
Jack Douglas, January 24, 2006 - 3:12 am UTC
Thanks Tom,
Your input is valued as always - I will get hold of that book ASAP.
I didn't explain that I wasn't comparing the cost between the two queries - rather the relative cost of each side of the union. I thought I had proved that the full scan becomes relatively less expensive with a higher multiblock read count: (285/(1870-10)) > (65/(1864-4)).
Anyway - I definitely take your point about the costs. I've been using them in ways I shouldn't for quite a while, and with hindsight I can see it has caused a lot of confusion in my poor head :-)
Jack
January 24, 2006 - 8:10 am UTC
you showed that as you take the multiblock read count up and down - the cost of performing a full scan will go down and up. The cost assigned by Oracle will vary as you vary the inputs to the mathematical model that is the CBO.
You did show that - yes.
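The shape of what Jack observed can be sketched with a toy cost model. The exponent below is an assumption standing in for Oracle's "adjusted" multiblock read count (the benefit of a larger MBRC is sub-linear); it is not Oracle's published formula:

```python
import math

# Toy model: full-scan I/O cost ~= blocks / adjusted_mbrc.
# The 0.6581 exponent is an assumption used only to show the curve's
# shape; single-block index probes are unaffected by MBRC, which is
# why only the full-scan side of the union got cheaper above.

def full_scan_cost(blocks, mbrc):
    adjusted_mbrc = mbrc ** 0.6581
    return math.ceil(blocks / adjusted_mbrc)

for mbrc in (1, 4, 8, 16):
    print(mbrc, full_scan_cost(blocks=285, mbrc=mbrc))
```

Raising the parameter lowers the number the model assigns to a full scan without the scan itself doing any less work - which is exactly why a cheaper cost does not promise a faster query.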
cost from Jonathan's view
A reader, January 25, 2006 - 1:16 am UTC
Hi
I acquired Jonathan Lewis's latest book about the CBO. He states that the cost in 8.1.7 is basically the number of I/Os Oracle will perform; I think he even supplied the hidden function you have mentioned here. He further says that if the cost is NOT the number of I/Os, then there must be something wrong with the statistics - what he calls GIGO, Garbage In Garbage Out.
So, at the end of the day, cost CAN be used by us as well, and not only by the optimizer!
January 25, 2006 - 1:27 pm UTC
in a perfect world, you would be correct. problem is, that number is not always accurate. That would be what he means by GIGO
Confused
Oracle_Student, September 29, 2006 - 3:08 am UTC
Tom,
I read and reread the entire link, and all I could understand is that nobody has really understood what to check while tuning a query. Till now I thought cost and time were related; it seems that is not true.
Even now I'm not convinced when someone says they are not related, because if they are not related, why does explain plan show them?
All I want to ask is: if one cannot evaluate queries on the basis of cost and time, why can't they be removed from the explain plan output?
Secondly, the explain plan shows a time value, but in reality the query does not take that much time to execute. I referred to the Oracle manuals but never got a proper explanation of what this TIME means.
I would appreciate if you explain it to me.
September 29, 2006 - 8:11 am UTC
because it helps us understand how oracle costs the queries - that is all. ignore it if you like.
I'm trying to be very clear when I say:
you are seeing guesses
you are seeing estimates
you are seeing the results of a mathematical model
a sometimes imperfect model
working with less than perfect inputs in many cases
see
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:8764517459743 <code>
also
Khurram Siddiqui From Pakistan,KHI
Khurram, February 13, 2007 - 12:52 am UTC
Tom, I have read this thread of yours - maybe I missed some point - but I am still unable to understand. I have two queries fetching some records from a table, say INS_ASSORTED. This table has a primary key on column ASSORTED_CODE.
When I retrieve the PK column as well as the non-indexed columns DOCUMENT_CODE and ENT_BY, the optimizer does a full scan - OK. And when I retrieve only the PK column ASSORTED_CODE, reading the index is faster because there is less physical space to traverse.
But the two plans show some difference in COST and TIME. Conceptually the second query will return results faster, as it reads only the index column, and the plan itself shows a lower cost (which you are against using to measure performance) as well as a lower TIME.
My question is: with these plans in hand, how will I decide which is faster, apart from considering the INDEX column - should I measure TIME or COST?
SQL> DELETE FROM plan_table
2 /
2 rows deleted.
SQL> COMMIT;
Commit complete.
SQL> EXPLAIN PLAN FOR SELECT assorted_code,document_code,ent_by FROM ins_assorted
2 /
Explained.
SQL> SET LINE 300
SQL> SELECT * FROM TABLE(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
Plan hash value: 2493300600
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9011 | 202K| 180 (1)| 00:00:03 |
| 1 | TABLE ACCESS FULL| INS_ASSORTED | 9011 | 202K| 180 (1)| 00:00:03 |
----------------------------------------------------------------------------------
8 rows selected.
SQL> DELETE FROM plan_table
2 /
2 rows deleted.
SQL> COMMIT;
Commit complete.
SQL> EXPLAIN PLAN FOR SELECT assorted_code FROM ins_assorted
2 /
Explained.
SQL> SELECT * FROM TABLE(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
Plan hash value: 194131136
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9011 | 114K| 8 (0)| 00:00:01 |
| 1 | INDEX FAST FULL SCAN| SYS_C00109334 | 9011 | 114K| 8 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
8 rows selected.
Khurram
prem, May 14, 2007 - 8:45 am UTC
Hi Tom,
I read through your article on tuning the CBO using the SQL*Plus AUTOTRACE utility. I have some questions to ponder:
1) Can two queries selecting different columns have the same plan hash value? If "yes", how?
2) Comparing the two queries below, why does one take 3 minutes 11 seconds while the other takes only 25 seconds?
I am running Oracle 10.2.0.3 (Windows 2000 Server).
SQL> SELECT CUST.CUSTOMER_NAME,
            TO_DATE(TO_CHAR(TIME_ACCESSED_LAST,'DD-MON-YY'),'DD-MON-YY') REPORT_DATE,
            CIRC.CIRCLE_NAME,
            COUNT(USAGEREP.MOBILE_NUMBER) SUBSCRIBER
       FROM x@y USAGEREP, CUSTOMER_MASTER CUST, CIRCLE_MASTER CIRC
      WHERE USAGEREP.TIME_ACCESSED_LAST >= TO_DATE('01-APR-2007','DD-MON-YY')
        AND USAGEREP.TIME_ACCESSED_LAST < (TO_DATE('30-APR-2007','DD-MON-YY')+1)
        AND CIRC.CIRCLE_ID IN (1,2,3,4,5,6,7,8,9,10)
        AND USAGEREP.APPS_ACCESSED LIKE '%ONSPIRE%'
        AND CUST.CUSTOMER_ID = CIRC.CUSTOMER_ID
        AND CIRC.CIRCLE_ID = USAGEREP.CIRCLE_ID
      GROUP BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_LAST,'DD-MON-YY'),'DD-MON-YY')
      ORDER BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_LAST,'DD-MON-YY'),'DD-MON-YY');
208 rows selected.
Elapsed: 00:03:11.78
Execution Plan
----------------------------------------------------------
Plan hash value: 1043666463
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 169 | 32617 | 1011 (2)| 00:00:15 | | |
| 1 | SORT ORDER BY | | 169 | 32617 | 1011 (2)| 00:00:15 | | |
| 2 | HASH GROUP BY | | 169 | 32617 | 1011 (2)| 00:00:15 | | |
| 3 | NESTED LOOPS | | 169 | 32617 | 1010 (1)| 00:00:15 | | |
|* 4 | HASH JOIN | | 10 | 280 | 5 (20)| 00:00:01 | | |
| 5 | INLIST ITERATOR | | | | | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| CIRCLE_MASTER | 10 | 190 | 2 (0)| 00:00:01 | | |
|* 7 | INDEX RANGE SCAN | CIRCLE_MASTER_PK | 10 | | 1 (0)| 00:00:01 | | |
| 8 | TABLE ACCESS FULL | CUSTOMER_MASTER | 16 | 144 | 2 (0)| 00:00:01 | | |
| 9 | REMOTE | X | 17 | 2805 | 341 (1)| 00:00:05 | Y | R->S |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUST"."CUSTOMER_ID"="CIRC"."CUSTOMER_ID")
7 - access("CIRC"."CIRCLE_ID"=1 OR "CIRC"."CIRCLE_ID"=2 OR "CIRC"."CIRCLE_ID"=3 OR "CIRC"."CIRCLE_ID"=4 OR
"CIRC"."CIRCLE_ID"=5 OR "CIRC"."CIRCLE_ID"=6 OR "CIRC"."CIRCLE_ID"=7 OR "CIRC"."CIRCLE_ID"=8 OR "CIRC"."CIRCLE_ID"=9
OR "CIRC"."CIRCLE_ID"=10)
Remote SQL Information (identified by operation id):
----------------------------------------------------
9 - SELECT "MOBILE_NUMBER","CIRCLE_ID","TIME_ACCESSED_LAST","APPS_ACCESSED" FROM "X" "USAGEREP"
WHERE "TIME_ACCESSED_LAST">=TIMESTAMP'2007-04-01 00:00:00' AND "APPS_ACCESSED" LIKE '%ONSPIRE%' AND ("CIRCLE_ID"=1 OR
"CIRCLE_ID"=2 OR "CIRCLE_ID"=3 OR "CIRCLE_ID"=4 OR "CIRCLE_ID"=5 OR "CIRCLE_ID"=6 OR "CIRCLE_ID"=7 OR "CIRCLE_ID"=8
OR "CIRCLE_ID"=9 OR "CIRCLE_ID"=10) AND "TIME_ACCESSED_LAST"<TIMESTAMP'2007-05-01 00:00:00' AND :1="CIRCLE_ID"
(accessing 'Y' )
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
24 consistent gets
0 physical reads
0 redo size
7124 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
15 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
208 rows processed
SQL>
SQL>
SQL> SELECT CUST.CUSTOMER_NAME,
            TO_DATE(TO_CHAR(TIME_ACCESSED_FIRST,'DD-MON-YY'),'DD-MON-YY') REPORT_DATE,
            CIRC.CIRCLE_NAME,
            COUNT(USAGEREP.MOBILE_NUMBER) SUBSCRIBER
       FROM x@y USAGEREP, CUSTOMER_MASTER CUST, CIRCLE_MASTER CIRC
      WHERE USAGEREP.TIME_ACCESSED_FIRST >= TO_DATE('01-APR-2007','DD-MON-YY')
        AND USAGEREP.TIME_ACCESSED_FIRST < (TO_DATE('30-APR-2007','DD-MON-YY')+1)
        AND CIRC.CIRCLE_ID IN (1,2,3,4,5,6,7,8,9,10)
        AND USAGEREP.APPS_ACCESSED LIKE '%ONSPIRE%'
        AND CUST.CUSTOMER_ID = CIRC.CUSTOMER_ID
        AND CIRC.CIRCLE_ID = USAGEREP.CIRCLE_ID
      GROUP BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_FIRST,'DD-MON-YY'),'DD-MON-YY')
      ORDER BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_FIRST,'DD-MON-YY'),'DD-MON-YY');
192 rows selected.
Elapsed: 00:00:25.64
Execution Plan
----------------------------------------------------------
Plan hash value: 1043666463
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 166 | 32038 | 1011 (2)| 00:00:15 | | |
| 1 | SORT ORDER BY | | 166 | 32038 | 1011 (2)| 00:00:15 | | |
| 2 | HASH GROUP BY | | 166 | 32038 | 1011 (2)| 00:00:15 | | |
| 3 | NESTED LOOPS | | 166 | 32038 | 1010 (1)| 00:00:15 | | |
|* 4 | HASH JOIN | | 10 | 280 | 5 (20)| 00:00:01 | | |
| 5 | INLIST ITERATOR | | | | | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| CIRCLE_MASTER | 10 | 190 | 2 (0)| 00:00:01 | | |
|* 7 | INDEX RANGE SCAN | CIRCLE_MASTER_PK | 10 | | 1 (0)| 00:00:01 | | |
| 8 | TABLE ACCESS FULL | CUSTOMER_MASTER | 16 | 144 | 2 (0)| 00:00:01 | | |
| 9 | REMOTE | X | 17 | 2805 | 341 (1)| 00:00:05 | Y | R->S |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUST"."CUSTOMER_ID"="CIRC"."CUSTOMER_ID")
7 - access("CIRC"."CIRCLE_ID"=1 OR "CIRC"."CIRCLE_ID"=2 OR "CIRC"."CIRCLE_ID"=3 OR "CIRC"."CIRCLE_ID"=4 OR
"CIRC"."CIRCLE_ID"=5 OR "CIRC"."CIRCLE_ID"=6 OR "CIRC"."CIRCLE_ID"=7 OR "CIRC"."CIRCLE_ID"=8 OR "CIRC"."CIRCLE_ID"=9
OR "CIRC"."CIRCLE_ID"=10)
Remote SQL Information (identified by operation id):
----------------------------------------------------
9 - SELECT "MOBILE_NUMBER","CIRCLE_ID","TIME_ACCESSED_FIRST","APPS_ACCESSED" FROM "x" "USAGEREP"
WHERE "TIME_ACCESSED_FIRST">=TIMESTAMP'2007-04-01 00:00:00' AND "APPS_ACCESSED" LIKE '%ONSPIRE%' AND ("CIRCLE_ID"=1
OR "CIRCLE_ID"=2 OR "CIRCLE_ID"=3 OR "CIRCLE_ID"=4 OR "CIRCLE_ID"=5 OR "CIRCLE_ID"=6 OR "CIRCLE_ID"=7 OR
"CIRCLE_ID"=8 OR "CIRCLE_ID"=9 OR "CIRCLE_ID"=10) AND "TIME_ACCESSED_FIRST"<TIMESTAMP'2007-05-01 00:00:00' AND
:1="CIRCLE_ID" (accessing 'Y' )
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
24 consistent gets
0 physical reads
0 redo size
6603 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
192 rows processed
SQL> SELECT CUST.CUSTOMER_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_LAST,'DD-MON-YY'),'DD-MON-YY') REPORT_DATE, CIRC.CIRCLE_NAME, COUNT( USAGEREP.MOBILE_NUM
BER) SUBSCRIBER FROM
2 x@y USAGEREP , CUSTOMER_MASTER CUST , CIRCLE_MASTER CIRC
3 WHERE USAGEREP.TIME_ACCESSED_LAST >= TO_DATE( '01-APR-2007','DD-MON-YY') AND USAGEREP.TIME_ACCESSED_LAST <(TO_DATE( '30-APR-2007','DD-MON-YY')+1)
AND CIRC.CIRCLE_ID IN ( 1,2,3,4,5,6,7,8,9,10 ) AND USAGEREP.APPS_ACCESSED LIKE '%ONSPIRE%' AND
4 CUST.CUSTOMER_ID = CIRC.CUSTOMER_ID AND CIRC.CIRCLE_ID = USAGEREP.CIRCLE_ID GROUP BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_LA
ST,'DD-MON-YY'),'DD-MON-YY') ORDER BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_LAST,'DD-MON-YY'),'DD-MON-YY');
208 rows selected.
Elapsed: 00:03:11.78
Execution Plan
----------------------------------------------------------
Plan hash value: 1043666463
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 169 | 32617 | 1011 (2)| 00:00:15 | | |
| 1 | SORT ORDER BY | | 169 | 32617 | 1011 (2)| 00:00:15 | | |
| 2 | HASH GROUP BY | | 169 | 32617 | 1011 (2)| 00:00:15 | | |
| 3 | NESTED LOOPS | | 169 | 32617 | 1010 (1)| 00:00:15 | | |
|* 4 | HASH JOIN | | 10 | 280 | 5 (20)| 00:00:01 | | |
| 5 | INLIST ITERATOR | | | | | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| CIRCLE_MASTER | 10 | 190 | 2 (0)| 00:00:01 | | |
|* 7 | INDEX RANGE SCAN | CIRCLE_MASTER_PK | 10 | | 1 (0)| 00:00:01 | | |
| 8 | TABLE ACCESS FULL | CUSTOMER_MASTER | 16 | 144 | 2 (0)| 00:00:01 | | |
| 9 | REMOTE | X | 17 | 2805 | 341 (1)| 00:00:05 | Y | R->S |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUST"."CUSTOMER_ID"="CIRC"."CUSTOMER_ID")
7 - access("CIRC"."CIRCLE_ID"=1 OR "CIRC"."CIRCLE_ID"=2 OR "CIRC"."CIRCLE_ID"=3 OR "CIRC"."CIRCLE_ID"=4 OR
"CIRC"."CIRCLE_ID"=5 OR "CIRC"."CIRCLE_ID"=6 OR "CIRC"."CIRCLE_ID"=7 OR "CIRC"."CIRCLE_ID"=8 OR "CIRC"."CIRCLE_ID"=9
OR "CIRC"."CIRCLE_ID"=10)
Remote SQL Information (identified by operation id):
----------------------------------------------------
9 - SELECT "MOBILE_NUMBER","CIRCLE_ID","TIME_ACCESSED_LAST","APPS_ACCESSED" FROM "X" "USAGEREP"
WHERE "TIME_ACCESSED_LAST">=TIMESTAMP'2007-04-01 00:00:00' AND "APPS_ACCESSED" LIKE '%ONSPIRE%' AND ("CIRCLE_ID"=1 OR
"CIRCLE_ID"=2 OR "CIRCLE_ID"=3 OR "CIRCLE_ID"=4 OR "CIRCLE_ID"=5 OR "CIRCLE_ID"=6 OR "CIRCLE_ID"=7 OR "CIRCLE_ID"=8
OR "CIRCLE_ID"=9 OR "CIRCLE_ID"=10) AND "TIME_ACCESSED_LAST"<TIMESTAMP'2007-05-01 00:00:00' AND :1="CIRCLE_ID"
(accessing 'Y' )
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
24 consistent gets
0 physical reads
0 redo size
7124 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
15 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
208 rows processed
SQL>
SQL>
SQL> SELECT CUST.CUSTOMER_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_FIRST,'DD-MON-YY'),'DD-MON-YY') REPORT_DATE,
CIRC.CIRCLE_NAME, COUNT(USAGEREP.MOBILE_NUMBER) SUBSCRIBER
FROM x@y USAGEREP, CUSTOMER_MASTER CUST, CIRCLE_MASTER CIRC
WHERE USAGEREP.TIME_ACCESSED_FIRST >= TO_DATE('01-APR-2007','DD-MON-YY')
AND USAGEREP.TIME_ACCESSED_FIRST < (TO_DATE('30-APR-2007','DD-MON-YY')+1)
AND CIRC.CIRCLE_ID IN (1,2,3,4,5,6,7,8,9,10)
AND USAGEREP.APPS_ACCESSED LIKE '%ONSPIRE%'
AND CUST.CUSTOMER_ID = CIRC.CUSTOMER_ID
AND CIRC.CIRCLE_ID = USAGEREP.CIRCLE_ID
GROUP BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_FIRST,'DD-MON-YY'),'DD-MON-YY')
ORDER BY CUSTOMER_NAME, CIRCLE_NAME, TO_DATE(TO_CHAR(TIME_ACCESSED_FIRST,'DD-MON-YY'),'DD-MON-YY');
192 rows selected.
Elapsed: 00:00:25.64
Execution Plan
----------------------------------------------------------
Plan hash value: 1043666463
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 166 | 32038 | 1011 (2)| 00:00:15 | | |
| 1 | SORT ORDER BY | | 166 | 32038 | 1011 (2)| 00:00:15 | | |
| 2 | HASH GROUP BY | | 166 | 32038 | 1011 (2)| 00:00:15 | | |
| 3 | NESTED LOOPS | | 166 | 32038 | 1010 (1)| 00:00:15 | | |
|* 4 | HASH JOIN | | 10 | 280 | 5 (20)| 00:00:01 | | |
| 5 | INLIST ITERATOR | | | | | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| CIRCLE_MASTER | 10 | 190 | 2 (0)| 00:00:01 | | |
|* 7 | INDEX RANGE SCAN | CIRCLE_MASTER_PK | 10 | | 1 (0)| 00:00:01 | | |
| 8 | TABLE ACCESS FULL | CUSTOMER_MASTER | 16 | 144 | 2 (0)| 00:00:01 | | |
| 9 | REMOTE | X | 17 | 2805 | 341 (1)| 00:00:05 | Y | R->S |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUST"."CUSTOMER_ID"="CIRC"."CUSTOMER_ID")
7 - access("CIRC"."CIRCLE_ID"=1 OR "CIRC"."CIRCLE_ID"=2 OR "CIRC"."CIRCLE_ID"=3 OR "CIRC"."CIRCLE_ID"=4 OR
"CIRC"."CIRCLE_ID"=5 OR "CIRC"."CIRCLE_ID"=6 OR "CIRC"."CIRCLE_ID"=7 OR "CIRC"."CIRCLE_ID"=8 OR "CIRC"."CIRCLE_ID"=9
OR "CIRC"."CIRCLE_ID"=10)
Remote SQL Information (identified by operation id):
----------------------------------------------------
9 - SELECT "MOBILE_NUMBER","CIRCLE_ID","TIME_ACCESSED_FIRST","APPS_ACCESSED" FROM "x" "USAGEREP"
WHERE "TIME_ACCESSED_FIRST">=TIMESTAMP'2007-04-01 00:00:00' AND "APPS_ACCESSED" LIKE '%ONSPIRE%' AND ("CIRCLE_ID"=1
OR "CIRCLE_ID"=2 OR "CIRCLE_ID"=3 OR "CIRCLE_ID"=4 OR "CIRCLE_ID"=5 OR "CIRCLE_ID"=6 OR "CIRCLE_ID"=7 OR
"CIRCLE_ID"=8 OR "CIRCLE_ID"=9 OR "CIRCLE_ID"=10) AND "TIME_ACCESSED_FIRST"<TIMESTAMP'2007-05-01 00:00:00' AND
:1="CIRCLE_ID" (accessing 'Y' )
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
24 consistent gets
0 physical reads
0 redo size
6603 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
192 rows processed
Eagerly awaiting your reply,
with regards
Prem
May 14, 2007 - 2:15 pm UTC
1) of course. a hash number has a finite range (it is a discrete set of values, of finite size). the number of sql queries out there is infinite pretty much - at least I can safely say it is much larger than the number of hash values we hash into.
therefore, conflicts will happen. for it to happen due to a simple column name change - unlikely, but possible.
2) use tkprof - something is up with your environment I would suspect.
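Tom's pigeonhole argument - a finite set of hash values versus an effectively unbounded set of SQL texts - can be shown with a toy hash. This is a sketch only: the 4-bit hash range and the generated statements below are invented for illustration and have nothing to do with Oracle's actual SQL hashing.

```python
# Toy illustration of the pigeonhole principle behind hash collisions.
# NOT Oracle's real SQL hash -- a deliberately tiny 4-bit hash range.
def tiny_hash(sql: str) -> int:
    return sum(ord(c) for c in sql) % 16   # only 16 possible hash values

# Generate more distinct "statements" than there are hash values;
# at least two of them MUST collide (pigeonhole principle).
statements = ["select col%d from t" % i for i in range(17)]
buckets = {}
collisions = []
for s in statements:
    h = tiny_hash(s)
    if h in buckets:
        collisions.append((buckets[h], s))
    else:
        buckets[h] = s

print(len(collisions) >= 1)   # True: 17 distinct statements, 16 buckets
```

A real hash range is vastly larger, so a collision from one simple column-name change is unlikely - but, as Tom says, never impossible.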
prem, May 17, 2007 - 7:51 am UTC
Hi Tom,
Kindly find the TKPROF output for the above queries
QUERY 1 :
=========
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.08 0 0 1 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 15 0.00 116.96 0 24 0 207
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 17 0.00 117.04 0 24 1 207
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 25
Rows Row Source Operation
------- ---------------------------------------------------
207 SORT ORDER BY (cr=24 pr=0 pw=0 time=116961499 us)
207 HASH GROUP BY (cr=24 pr=0 pw=0 time=116961000 us)
68816 NESTED LOOPS (cr=24 pr=0 pw=0 time=215956911 us)
10 HASH JOIN (cr=24 pr=0 pw=0 time=573 us)
10 INLIST ITERATOR (cr=20 pr=0 pw=0 time=223 us)
10 TABLE ACCESS BY INDEX ROWID CIRCLE_MASTER (cr=20 pr=0 pw=0 time=140 us)
10 INDEX RANGE SCAN CIRCLE_MASTER_PK (cr=10 pr=0 pw=0 time=77 us)(object id 9959)
16 TABLE ACCESS FULL CUSTOMER_MASTER (cr=4 pr=0 pw=0 time=36 us)
68816 REMOTE Y (cr=0 pr=0 pw=0 time=311794566 us)
Query 2:
========
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 14 0.40 16.96 0 24 0 193
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 16 0.40 16.97 0 24 0 193
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 25
Rows Row Source Operation
------- ---------------------------------------------------
193 SORT ORDER BY (cr=24 pr=0 pw=0 time=16966158 us)
193 HASH GROUP BY (cr=24 pr=0 pw=0 time=16965657 us)
51091 NESTED LOOPS (cr=24 pr=0 pw=0 time=14040018 us)
10 HASH JOIN (cr=24 pr=0 pw=0 time=558 us)
10 INLIST ITERATOR (cr=20 pr=0 pw=0 time=220 us)
10 TABLE ACCESS BY INDEX ROWID CIRCLE_MASTER (cr=20 pr=0 pw=0 time=137 us)
10 INDEX RANGE SCAN CIRCLE_MASTER_PK (cr=10 pr=0 pw=0 time=82 us)(object id 9959)
16 TABLE ACCESS FULL CUSTOMER_MASTER (cr=4 pr=0 pw=0 time=37 us)
51091 REMOTE Y (cr=0 pr=0 pw=0 time=29062154 us)
Can you also suggest whether using a hint would improve the performance?
With regards
Prem
May 17, 2007 - 11:24 am UTC
enable wait tracing as well please - you are waiting for something
is your cpu totally maxed out on that machine?
Manoj Pradhan, May 17, 2007 - 3:31 pm UTC
Hi Tom,
I have a different type of request.
When a user asks a question, please show the status of the request on your web site, for example with comments or colored symbols (red star, orange star, etc.):
1. If you have time, you answer it.
2. If you have read the question and want to answer it later, mark it orange.
3. If you don't want to answer it, mark it with a red star.
4. Unread (the default) is blue.
This is just a suggestion. I ask questions here, and while you do answer them, for some questions I cannot tell what is happening or whether I will get a reply, so I check on my question daily.
Now to my other question:
I have two queries that both give the same output. On what basis can I prove that one will run faster than the other?
Following your suggestion I rewrote some queries from scratch. Using the pivoting technique I reduced the number of tables in the FROM clause from 85 to 53, and the COST dropped by 98%. But according to tkprof and the explain plan statistics, my new query is no better.
There are many tools - TKPROF, AUTOTRACE, STATSPACK, etc. With which one should we look, and for what?
I am now going through both of your books and this site, but I am still unable to draw any conclusions about SQL tuning.
Prem, May 18, 2007 - 8:12 am UTC
A) Kindly find the wait events for the above two queries:
QUERY 1:-
========
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: ALL_ROWS
207 SORT (ORDER BY)
207 HASH (GROUP BY)
67917 NESTED LOOPS
10 HASH JOIN
10 INLIST ITERATOR
10 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF
'CIRCLE_MASTER' (TABLE)
10 INDEX MODE: ANALYZED (RANGE SCAN) OF
'CIRCLE_MASTER_PK' (INDEX (UNIQUE))
16 TABLE ACCESS MODE: ANALYZED (FULL) OF
'CUSTOMER_MASTER' (TABLE)
67917 REMOTE OF 'X' (REMOTE) [Y]
SELECT "MOBILE_NUMBER","CIRCLE_ID","TIME_ACCESSED_LAST",
"APPS_ACCESSED" FROM "X" "USAGEREP" WHERE
"TIME_ACCESSED_LAST">=TIMESTAMP'2007-04-01 00:00:00' AND
"APPS_ACCESSED" LIKE '%ONSPIRE%' AND ("CIRCLE_ID"=1 OR
"CIRCLE_ID"=2 OR "CIRCLE_ID"=3 OR "CIRCLE_ID"=4 OR
"CIRCLE_ID"=5 OR "CIRCLE_ID"=6 OR "CIRCLE_ID"=7 OR
"CIRCLE_ID"=8 OR "CIRCLE_ID"=9 OR "CIRCLE_ID"=10) AND
"TIME_ACCESSED_LAST"<TIMESTAMP'2007-05-01 00:00:00' AND :1=
"CIRCLE_ID"
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 15 0.00 0.00
SQL*Net message to dblink 21 0.00 0.00
SQL*Net message from dblink 21 0.78 4.61
SQL*Net more data from dblink 1081 0.49 109.25
SQL*Net message from client 15 0.00 0.00
********************************************************************************
QUERY 2:-
========
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: ALL_ROWS
193 SORT (ORDER BY)
193 HASH (GROUP BY)
51133 NESTED LOOPS
10 HASH JOIN
10 INLIST ITERATOR
10 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF
'CIRCLE_MASTER' (TABLE)
10 INDEX MODE: ANALYZED (RANGE SCAN) OF
'CIRCLE_MASTER_PK' (INDEX (UNIQUE))
16 TABLE ACCESS MODE: ANALYZED (FULL) OF
'CUSTOMER_MASTER' (TABLE)
51133 REMOTE OF 'x' (REMOTE) [y]
SELECT "MOBILE_NUMBER","CIRCLE_ID","TIME_ACCESSED_FIRST",
"APPS_ACCESSED" FROM "X" "USAGEREP" WHERE
"TIME_ACCESSED_FIRST">=TIMESTAMP'2007-04-01 00:00:00' AND
"APPS_ACCESSED" LIKE '%ONSPIRE%' AND ("CIRCLE_ID"=1 OR
"CIRCLE_ID"=2 OR "CIRCLE_ID"=3 OR "CIRCLE_ID"=4 OR
"CIRCLE_ID"=5 OR "CIRCLE_ID"=6 OR "CIRCLE_ID"=7 OR
"CIRCLE_ID"=8 OR "CIRCLE_ID"=9 OR "CIRCLE_ID"=10) AND
"TIME_ACCESSED_FIRST"<TIMESTAMP'2007-05-01 00:00:00' AND :1=
"CIRCLE_ID"
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 14 0.00 0.00
SQL*Net message to dblink 21 0.00 0.00
SQL*Net message from dblink 21 0.90 3.61
SQL*Net more data from dblink 812 0.55 33.85
SQL*Net message from client 14 0.00 0.00
********************************************************************************
B) CPU usage is only around 60-70%.
Please suggest a solution.
With Regards
Prem
May 18, 2007 - 4:09 pm UTC
so, you see where the time is spent right.
you are doing things over a dblink, over a network.
A reader, May 18, 2007 - 2:30 pm UTC
check the network connection to the first server :)
There is a significant difference in "SQL*Net more data from dblink".
Prem, May 21, 2007 - 1:04 am UTC
Hi Tom,
I have some questions for you
Machine X --> Machine Y : query A fetches 193 rows in 21s
Machine X --> Machine Y : query B fetches 203 rows in 3 min 11s
Both queries use the same network (dblink).
1) Why the difference in the time taken by the two queries?
2) The difference in the number of rows returned by the remote step for queries A and B is only around 17,725 rows. Why, then, is there such a huge difference in the time taken?
With regards
Prem
May 21, 2007 - 10:23 am UTC
1) you obviously have two different queries. consider these two:
select * from t; <<== returns 193 rows, because there are 193 rows in the table
select a, count(*) from t2 group by a; <<== returns 203 rows, but there were 1,000,000,000 rows to read and aggregate
which one will take longer?
2) see #1, number of rows returned is not very relevant to query response time.
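Tom's point - that rows *returned* says little about rows *processed* - can be sketched outside the database. The table sizes below are invented to mirror his two-query example: one query returns every row of a small table, the other aggregates a huge table down to about the same number of rows.

```python
import random

# Query A analogue: return every row of a small table.
# 193 rows read, 193 rows returned.
small_table = list(range(193))
result_a = small_table
rows_read_a = len(small_table)

# Query B analogue: aggregate a large table down to ~200 groups.
# 1,000,000 rows read, but only ~200 rows returned.
random.seed(0)
big_table = [random.randrange(203) for _ in range(1_000_000)]
counts = {}
for key in big_table:                 # every one of the 1M rows is touched
    counts[key] = counts.get(key, 0) + 1
result_b = counts
rows_read_b = len(big_table)

print(len(result_a), rows_read_a)     # 193 rows returned, 193 rows read
print(len(result_b), rows_read_b)     # 203 rows returned, 1000000 rows read
```

Both "queries" return about two hundred rows, yet the second does roughly 5,000 times the work - which is exactly why comparing response times by row counts alone is meaningless.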
Alex, June 09, 2007 - 7:22 pm UTC
Tom,
Many times, I found solutions only here at asktom site. Thank you very much for your dedication!
But I have a question: our database has multiple schemas and when I query one table in the first one:
1* select count(*) from r.fastmove
SQL> /
COUNT(*)
--------------
15888611
1 row selected.
Elapsed: 00:00:06.25
Execution Plan
----------------------------------------------------------
Plan hash value: 1060808214
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10792 (2)| 00:02:10 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| MOVE_MV_NUM_IDX | 15M| 10792 (2)| 00:02:10 |
------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
2 recursive calls
0 db block gets
48883 consistent gets
48824 physical reads
0 redo size
414 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
-------------
1* select count(*) from l.fastmove
SQL> /
COUNT(*)
--------------
11044672
1 row selected.
Elapsed: 00:00:00.50
Execution Plan
----------------------------------------------------------
Plan hash value: 1060808214
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4569 (3)| 00:00:55 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| MOVE_MV_NUM_IDX | 11M| 4569 (3)| 00:00:55 |
------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
2 recursive calls
0 db block gets
16736 consistent gets
0 physical reads
0 redo size
414 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
When I query the second table, which has the same structure in another schema, it has about 4 million fewer rows, but it does no physical reads and responds in under 1 second, while the first query takes more than 6 seconds. Both queries use the PK index. Where should I look to fix the problem? Both tables reside in locally managed tablespaces and both are partitioned. I could not reach a definite conclusion on my own; the first query runs slower all the time, even when the number of rows was roughly equal in both tables.
Thanks.
Alex
June 11, 2007 - 10:12 am UTC
Well, they might be within 4million rows of each other - but one is 300% the size.
Size counts.
Rows - not so much. Rows * width = size, size matters....
See the consistent gets - 48,000 vs 16,000.
Here is my hypothesis....
first index, not used to retrieve individual rows. Hence, the blocks from it are not in the cache via SINGLE BLOCK IO, only via FAST FULL SCAN (multi-block) IO. Hence - they are aged out very fast (so the fast full scan does not blow out the cache, wipe it out). Much like a FULL TABLE SCAN will perform multiblock IO and reuse the last sets of blocks in the cache rather than overwrite lots of other useful data.
second index, used heavily to retrieve individual rows. Hence, the blocks from it are IN the cache via SINGLE BLOCK IO. They are not aged out. When we go to fast full scan - we instead find we do not do physical IO, we find them all in the cache.
Here is an example that approximates yours, I create your 15,000,000 and 11,000,000 row tables with keys that are multiples of eachother in size (the 300% factor).
I used an 8k block.
sga_target was 1gb, sga ended up looking like this:
Total System Global Area 1073741824 bytes
Fixed Size 1265920 bytes
Variable Size 327159552 bytes
Database Buffers 738197504 bytes
Redo Buffers 7118848 bytes
I primed the SGA with the 11,000,000 index by doing an index access per row - sort of ensuring it would be all cached.
Then, ran counts against the two - first index, all physical IO - all of the time - because we never did any single block IO's of it. second index - all logical IO since it is fully cached from the single block IO's
ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t1;
ops$tkyte%ORA10GR2> drop table t2;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1 ( x char(25), y number );
ops$tkyte%ORA10GR2> insert into t1
ops$tkyte%ORA10GR2> select l, l
ops$tkyte%ORA10GR2> from (select level L from dual connect by level <= 15000000)
ops$tkyte%ORA10GR2> /
ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(x);
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T1' );
ops$tkyte%ORA10GR2> create table t2 ( x char(8), y number );
ops$tkyte%ORA10GR2> insert into t2
ops$tkyte%ORA10GR2> select l, l
ops$tkyte%ORA10GR2> from (select level L from dual connect by level <= 11000000)
ops$tkyte%ORA10GR2> /
ops$tkyte%ORA10GR2> alter table t2 add constraint t2_pk primary key(x);
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter system flush buffer_cache;
System altered.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
2 l_x t2.x%type;
3 l_rec t2%rowtype;
4 begin
5 for i in 1 .. 11000000
6 loop
7 l_x := i;
8 select * into l_rec from t2 where x = l_x;
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> set autotrace on
ops$tkyte%ORA10GR2> select count(*) from t1;
COUNT(*)
----------
15000000
Execution Plan
----------------------------------------------------------
Plan hash value: 1018460547
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21027 (2)| 00:04:13 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T1_PK | 14M| 21027 (2)| 00:04:13 |
-----------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
75548 consistent gets
75513 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte%ORA10GR2> select count(*) from t1;
COUNT(*)
----------
15000000
Execution Plan
----------------------------------------------------------
Plan hash value: 1018460547
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21027 (2)| 00:04:13 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T1_PK | 14M| 21027 (2)| 00:04:13 |
-----------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
75548 consistent gets
75512 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte%ORA10GR2> select count(*) from t2;
COUNT(*)
----------
11000000
Execution Plan
----------------------------------------------------------
Plan hash value: 3724366055
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8070 (3)| 00:01:37 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T2_PK | 11M| 8070 (3)| 00:01:37 |
-----------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
29254 consistent gets
1 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte%ORA10GR2> select count(*) from t2;
COUNT(*)
----------
11000000
Execution Plan
----------------------------------------------------------
Plan hash value: 3724366055
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8070 (3)| 00:01:37 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T2_PK | 11M| 8070 (3)| 00:01:37 |
-----------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
29254 consistent gets
0 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte%ORA10GR2> set autotrace off
A reader, June 11, 2007 - 12:52 pm UTC
simple and nice
Thanks
Alex, June 11, 2007 - 1:02 pm UTC
But if t2 is not cached, what will a COUNT on it do - the same number of physical IOs? I get different IO results even right after starting my database. The average row length in both tables is almost identical: 264 versus 261.
Should I reorganize the table in some way?
Thanks,
Alex
June 11, 2007 - 4:10 pm UTC
row length, row smength.
do you see the number of logical IO's you perform in each case, that first index of yours is 300% the size of the other one!!! it is BIGGER. Perhaps your index key is 3x the size.
did you understand what my example was demonstrating? I could reorganize my tables until the cows came home - but it would not change the results.
I had my 11,000,000 row table use its index to access each row. That used single block IO. Those index blocks were cached using in effect an LRU (least recently used) algorithm (based on a touch count really - but just presume LRU-like caching).
My 15,000,000 row table - it did not use the index to access the rows one by one. That index only experienced FULL SCANS (fast full scans) - that is cached using an algorithm that says "these are full scan blocks, reuse their buffer cache space right away, please don't wipe out my cache"
It has nothing to do with reorganizing the data, it is all about how you ACCESS the data. Your index that experiences all physical IO's is just not used in a single block fashion, hence it is not cached.
Now question for you:
why the heck are you counting rows in a table. I usually find the way to tune that is simply:
THIS SPACE LEFT INTENTIONALLY BLANK
eg: counting rows in a table is "not smart", don't do it, it is generally not worth the bytes of code you wasted to type it in.
Alex, June 12, 2007 - 11:14 am UTC
Tom,
I am familiar with the LRU algorithm, and I understand what your example was doing.
I found that some similar queries take longer on one table than on the other; the simple count was just an example.
I do not count rows unless necessary, of course.
Thanks,
Alex
June 12, 2007 - 1:54 pm UTC
Ok, it boils down to this:
things you full scan - do multiblock IO on only - will in general NOT be in the cache, by design. it has nothing to do with "reorganizing" the bits on disk - it is all about how you access the data.
things you single block IO on - they will in general be well cached (given a sufficiently sized cache) and if you sometimes do multi-block IO on them - we'll read the blocks from the cache, not disk.
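The two caching behaviors Tom keeps describing can be mimicked with a toy buffer cache. This is a deliberate simplification (Oracle's touch-count algorithm is more involved, and the block counts below are invented to echo his t1/t2 demo): single-block reads populate an LRU cache, while multiblock (full scan) reads find cached blocks but never displace the cache.

```python
from collections import OrderedDict

class ToyBufferCache:
    """LRU-ish cache where multiblock (full scan) reads deliberately
    do not cache their blocks, so a big scan cannot wipe out blocks
    that arrived via single-block IO -- the behavior Tom describes."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = OrderedDict()        # blocks cached via single-block IO
        self.physical_reads = 0

    def single_block_read(self, block):
        if block in self.lru:
            self.lru.move_to_end(block) # cache hit: promote
        else:
            self.physical_reads += 1
            self.lru[block] = True
            if len(self.lru) > self.capacity:
                self.lru.popitem(last=False)

    def multiblock_read(self, block):
        if block in self.lru:
            return                      # found via earlier single-block IO
        self.physical_reads += 1        # read it, but do NOT cache it

cache = ToyBufferCache(capacity=50_000)

# "t2" index: every block first touched via single-block IO -> cached.
for b in range(29_000):
    cache.single_block_read(b)

# Fast full scan of the "t2" index: all blocks found in cache, zero physical IO.
before = cache.physical_reads
for b in range(29_000):
    cache.multiblock_read(b)
print(cache.physical_reads - before)    # 0

# "t1" index: only ever full-scanned -> its blocks are never cached,
# so every repeat scan pays full physical IO again.
before = cache.physical_reads
for b in range(100_000, 175_000):
    cache.multiblock_read(b)
print(cache.physical_reads - before)    # 75000

before = cache.physical_reads
for b in range(100_000, 175_000):
    cache.multiblock_read(b)
print(cache.physical_reads - before)    # 75000 again
```

Same data sizes, same "reorganization" (none): only the access pattern changed, and that alone decides what stays in the cache.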
Cost comparison
karthik, July 08, 2007 - 2:29 pm UTC
Tom,
Please enlighten me on whether:
1. Comparing the relative cost of each step in a plan with the relative cost of each step in another plan for the same SQL statement is valid (e.g., one plan chooses a full table scan and another an index range scan, where the cost of the FTS is greater than that of the IR scan, but knowledge of the cardinality tells me to take the FTS).
2. Comparing the cost of two different SQL statements that return the same result set is valid.
Ultimately I want to know the purpose of the cost when it comes to comparison.
July 08, 2007 - 6:01 pm UTC
1) if the math you are comparing is similar - sure.
2) no, not at all.
cost is a number we assign to the various plans generated for a given query. we then sort by cost and take the one with the smallest number.
that is the purpose of the cost.
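Tom's description - generate candidate plans for one statement, cost each, take the cheapest - is conceptually just a minimum over a set. A sketch, with invented plan names and cost numbers:

```python
# Conceptual sketch of CBO plan selection: several candidate plans for
# ONE statement, each assigned a cost. The numbers are invented and are
# meaningful only relative to each other, for this one statement.
candidate_plans = {
    "full table scan":      412,
    "index range scan":      87,
    "index fast full scan":  95,
}

chosen = min(candidate_plans, key=candidate_plans.get)
print(chosen)   # index range scan -- cheapest among THESE candidates

# Comparing 87 against the cost of a different statement's plan
# tells you nothing about which statement runs faster.
```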
cost meaning
karthik, July 09, 2007 - 2:21 am UTC
Tom,
1. Thanks. Earlier in this post you said that the relative cost of each step in a plan is factored to derive a number. Many times I see that the step just below the topmost one (say SELECT) has the same cost value. Does that mean the factoring may not happen, or is it an exception?
2. For my earlier question you spoke of "similar math for comparison". My understanding is that cost is based either purely on the I/O incurred, or on I/O + CPU if system statistics are gathered, so I would use one of those to understand a cost. Is there any other "math" I am unaware of that differs from this I/O costing or CPU costing?
3. You say "adding a hint inflates the cost artificially". In that case the cost of the hinted plan shoots up, so it should not be picked when the optimizer looks for the plan with the least cost. Then why does using a hint force anything, when it internally decreases the probability of that plan being selected?
July 09, 2007 - 7:00 am UTC
1) no, it is a mathematical model. rows times costs, cpu costing affects things, init.ora parameters affect things. if two steps have the "same cost" that just means "nothing happened there to affect the cost"
2) there are other factors, init.ora settings, statistics on the tables, system statistics.
3) i said in this page that adding a hint might cause a query with a higher cost (than the unhinted set of queries) to be chosen - because you hinted and said you wanted that. it did not decrease the probability at all, it caused us to choose a plan with a higher cost.
ops$tkyte%ORA9IR2> create table t as select * from all_objects;
Table created.
ops$tkyte%ORA9IR2> create index t_idx on t(object_id);
Index created.
ops$tkyte%ORA9IR2> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);
PL/SQL procedure successfully completed.
ops$tkyte%ORA9IR2> set autotrace traceonly explain
ops$tkyte%ORA9IR2> select * from t order by object_id;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=881 Card=30303 Bytes=2818179)
1 0 SORT (ORDER BY) (Cost=881 Card=30303 Bytes=2818179)
2 1 TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=30303 Bytes=2818179)
ops$tkyte%ORA9IR2> select /*+ index( t t_idx ) */ * from t order by object_id;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=25460 Card=30303 Bytes=2818179)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=25460 Card=30303 Bytes=2818179)
2 1 INDEX (FULL SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=68 Card=30303)
ops$tkyte%ORA9IR2> set autotrace off
Meaning of Cost
Paul, July 09, 2007 - 2:24 pm UTC
Jonathan Lewis in his book Cost-Based Oracle Fundamentals stated that 'The cost is the time spent on single-block reads, plus the time spent on multiblock reads, plus the CPU time required, all divided by the time it takes to do a single-block read. Which means the cost is the total predicted execution time for the statement, expressed in units of the single-block read time.' (page 4).
But from your answers, I get the impression that the cost might as well be a random number, meaningful only for the particular statement it was generated for.
Are the two views contradictory?
July 09, 2007 - 6:07 pm UTC
if the optimizer were perfect - got the exact numbers, precisely correct (which is functionally impossible, impracticable) - the cost would be "time"
in theory, over time, it may well get near enough to be good enough.
for now, people build silly "laws" like "no query with a cost of 500 or more can go production" or they sit there and try to figure out why two queries that return the same results - but have wildly different costs (and just for a bonus presume the one with the honking big cost runs tons faster than the one with the tiny cost) - spinning wheels.
For what we (you and I and developers in general) need to do, you are many times better off treating it as a 'random pretty number'.
when you are trying to diagnose why a query plan was chosen, the costing is very relevant - but I wouldn't be comparing it across queries, but looking within a SINGLE query and a set of generated plans
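The formula Paul quotes from Jonathan Lewis can be written down directly. The system-statistic values and workload numbers below are made up purely to show the units working out (the result is expressed in single-block read times):

```python
def plan_cost(sreads, mreads, cpu_cycles,
              sreadtim_ms, mreadtim_ms, cpuspeed_cycles_per_ms):
    """Cost as Jonathan Lewis describes it: predicted elapsed time
    (single-block IO + multiblock IO + CPU), expressed in units of
    one single-block read time."""
    io_time = sreads * sreadtim_ms + mreads * mreadtim_ms
    cpu_time = cpu_cycles / cpuspeed_cycles_per_ms
    return (io_time + cpu_time) / sreadtim_ms

# Illustrative (invented) system statistics and workload:
# 100 single-block reads at 5ms, 200 multiblock reads at 10ms,
# 5M CPU cycles at 1M cycles/ms.
cost = plan_cost(sreads=100, mreads=200, cpu_cycles=5_000_000,
                 sreadtim_ms=5.0, mreadtim_ms=10.0,
                 cpuspeed_cycles_per_ms=1_000_000)
print(round(cost, 1))   # 501.0
```

Seen this way, Tom's and Lewis's views reconcile: the cost *is* modeled time, but since the inputs (estimated reads, cardinalities, system statistics) are never exact, the output is only as trustworthy as the model's inputs for that one statement.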
cpu vs i/o costing
karthik, July 09, 2007 - 3:12 pm UTC
Tom,
1. Our environment has a database that runs 24x7, supporting countries in different time zones with ad hoc queries. In this case, if I collect system statistics for the peak period, should that period span 24 hours?
2. Even if the peak period is fixed, if I gather CPU costing statistics then, will queries generate suboptimal plans during off-peak hours?
As we are moving from 9i to 10g, when I suggested CPU costing (which is the default in 10g), the other DBAs raised these points:
3. To implement CPU costing, knowledge of the OS and hardware (particularly the disks) is essential - for example, sreadtim can be greater than mreadtim when SAN disks with cache are used. Is that right, and does I/O costing ignore such things?
4. I/O costing is good for databases where the top wait events are I/O based, and CPU costing when the wait is for CPU time. If so, why did Oracle default to CPU costing in 10g? And where my system uses analytic functions consuming CPU while also reading large amounts of data causing I/O, what should my approach/costing be?
Sorry to post so many questions in one shot. I am eager to read your response, which may be delayed by the time zone difference if I post them one by one.
July 09, 2007 - 6:11 pm UTC
1) probably not, doubtful you have "peak" for 24 hours. You want to give the optimizer representative values for single block IO speed, multi-block IO speed, the true multi-block read count and your cpu speed, that is the goal of system statistics
2) doubtful, you are telling Oracle about those things listed above, in practice, one setting is sufficient..
3) did not follow
4) who said that.
costing
karthik, July 12, 2007 - 3:53 pm UTC
Tom,
I read in another thread of yours ("system statistics") that gathering system statistics is in addition to, not a replacement for, gathering table/index-level statistics.
This leads to a few questions:
1. Previously, when system statistics were off by default, the only available method of calculating cost was based on I/O, where a physical read cost the same as a logical read. If my system does little I/O during a call but runs many CPU-intensive functions, won't the cost of a query differ from the actual work and mislead? So what was the strategy before 10g, when CPU costing/system statistics could be left off? You even said in one thread that you would leave the CPU costing default (off) as-is when you felt the system did not need it.
2. In a default 9i environment without system statistics, how were attributes like sreadtim, mreadtim, and the multiblock read count obtained to estimate a cost? Does that mean that even without real measured values for those attributes, Oracle could still generate the best plan? Simply put: what is the use of table/index/column statistics, which speak to cardinality and selectivity, when there is no way to know about CPU or I/O utilization?
3. Why is the CPU cost accounted for only when system statistics are collected? If the CPU cost were taken into account while collecting schema-level statistics, we could get the best statistics for any particular schema that really needs them.
4. With CPU costing, is only the CPU overhead of LIOs costed, or is the CPU cost of CPU-intensive functions also accounted for?
Thanks
July 12, 2007 - 5:09 pm UTC
1) I say leave at default - and the default will change over time. I still say "leave at default" - whatever that happens to be in that version.
Since systems are extremely dynamic
And parsed sql is rather static
nothing is going to be 'perfect' for all situations - ever.
it wasn't before, it isn't now, it won't be in the future
2) they were not obtained, unless and until you obtained them - that was the point. If you didn't gather them, you didn't have them and they were not used.
3) that comment didn't make sense to me.
4) as much as can be taken into consideration - will be. These do the same LIOs:
ops$tkyte%ORA10GR2> select count(*) from t;
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 82 (61)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T | 1000K| 82 (61)| 00:00:01 |
-------------------------------------------------------------------
ops$tkyte%ORA10GR2> select sum(user_id) from t;
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 88 (64)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | TABLE ACCESS FULL| T | 1000K| 12M| 88 (64)| 00:00:01 |
---------------------------------------------------------------------------
optimizer cost on inserts
Ray DeBruyn, July 18, 2007 - 4:53 pm UTC
Hi Tom,
I've been modifying SQL in my application over time and found a correlation between poor SQL and very high optimizer cost. I'm glad I haven't chased my tail so much, after reading that optimizer costs can't really be compared between SQL statements.
The last SQL I investigated was an insert. The table is simple, with no foreign keys and is not referenced by other foreign keys. It is large for this application (2 million rows) and it is accessed quite often. The DDL follows:
CREATE TABLE logins
(
check_value VARCHAR2(50) NOT NULL,
checksum VARCHAR2(10) NOT NULL,
login_date DATE NOT NULL,
logout_date DATE,
username VARCHAR2(30) NOT NULL,
logout_type VARCHAR2(1),
display_help_flag VARCHAR2(1) DEFAULT 'N',
organization_id NUMBER(10),
id NUMBER(10) NOT NULL,
last_hit_date DATE
)
/
ALTER TABLE logins
ADD CONSTRAINT login_pk PRIMARY KEY (id)
/
The optimizer cost is 2678 for:
INSERT INTO logins VALUES (:b1,:b2,:b3,:b4,:b5,:b6,:b7,:b8,:b9,:b10)
I think I may be just starting to chase my tail. I don't see any issues with this table and its use in the application. Is it possible to have an optimizer cost this high and not have it be a cause for concern?
July 19, 2007 - 10:40 am UTC
you mean if you do this, you see a big cost??
I generally, well, just ignore the cost.
I would look for SQL that is performing poorly.
Looking for large cost SQL and then seeing if it runs "poorly" seems a lot less efficient than looking for poorly running SQL - regardless of cost.
ops$tkyte%ORA10GR2> CREATE TABLE logins
2 (
3 check_value VARCHAR2(50) NOT NULL,
4 checksum VARCHAR2(10) NOT NULL,
5 login_date DATE NOT NULL,
6 logout_date DATE,
7 username VARCHAR2(30) NOT NULL,
8 logout_type VARCHAR2(1),
9 display_help_flag VARCHAR2(1) DEFAULT 'N',
10 organization_id NUMBER(10),
11 id NUMBER(10) NOT NULL,
12 last_hit_date DATE
13 )
14 /
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> ALTER TABLE logins
2 ADD CONSTRAINT login_pk PRIMARY KEY (id)
3 /
Table altered.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @plan "INSERT INTO logins VALUES (:b1,:b2,:b3,:b4,:b5,:b6,:b7,:b8,:b9,:b10)"
ops$tkyte%ORA10GR2> delete from plan_table;
0 rows deleted.
ops$tkyte%ORA10GR2> explain plan for &1;
old 1: explain plan for &1
new 1: explain plan for INSERT INTO logins VALUES (:b1,:b2,:b3,:b4,:b5,:b6,:b7,:b8,:b9,:b10)
Explained.
ops$tkyte%ORA10GR2> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 100 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------
6 rows selected.
I Believe!
Ray DeBruyn, July 19, 2007 - 11:22 am UTC
I've been lucky so far in using the optimizer cost. In truth, I started by identifying poorly running SQL and noticed a correlation (it seemed). I've fixed maybe 10 SQL statements this way.
I see that what you've been trying to convince so many of in this thread is true. The optimizer cost is only the best estimate made while creating the explain plan of that one SQL statement. Costs from two SQL statements cannot be compared, and a higher number does not mean poor SQL.
I may have been lucky so far - or it may be that most of the 10 SQL statements where I saw the correlation were dynamic ref cursors without bind variables! I was doing a hard parse every time. They may have been given an unusually high cost and floated to the top.
You mentioned earlier companies with a policy not to allow SQL with a cost above a certain value. What a waste of good SQL!
Thanks again Tom!
Comparing output from explain plan and tkprof
Rune Mörk, September 18, 2007 - 4:41 am UTC
Just wondering: why is there (sometimes) a difference between what the CBO is expecting to do, i.e. the explain plan, and what it actually did at run time, as seen through tkprof?
I have experienced a query coming out with a rather good explain plan, just as I expected, but when running the PL/SQL job the query is part of, the plan, as seen via a 10046 trace, looked different.
What made the CBO alter its decision?
September 18, 2007 - 4:46 pm UTC
Compute vs. Estimate
Joe Bloggs, July 03, 2008 - 7:26 am UTC
If I 'analyse compute' a table, the query I run against it goes much slower than when I estimate the stats.
What the hell is the point of providing more accurate stats. if it doesn't yield better performance?
I never use 'compute stats' anymore. Queries always run slower than when the stats. are estimated !
July 07, 2008 - 10:01 am UTC
why the heck do you feel the need to use inappropriate language?
I can prove you wrong, you say "always run slower". All I need is one case where they do not and you are wrong.
And, by the way, you are wrong.
And given the information you have provided (none), that is about all we can say.
Cost is Time ( Again :-)
Parthiban Nagarajan, January 12, 2011 - 5:15 am UTC
January 12, 2011 - 11:02 am UTC
did you critically read his blog entry? In a perfect world - cost would be time, but as he says - the world isn't perfect.
You cannot compare the cost of two different queries with each other - practically speaking. As Jonathan himself wrote:
Cost IS time – but only in theory.
Unless you know the optimizer inside and out - better than I - comparing the cost of two queries is an exercise in frustration. Setting up a rule that says "all queries must have a cost less than X" is a bad idea.
RE: Cost is Time
Parthiban Nagarajan, January 13, 2011 - 2:26 am UTC
Hi Tom
Thanks for your kind reply.
I agree with your <quote>Setting up a rule that says "all queries must have a cost less than X" is a bad idea.</quote>
But ...
Q: What is the goal of the CBO?
A: To choose the best of all available execution plans for a particular query.
It arrives at the best one using only the cost estimate. It compares the costs of all available execution plans for the query and chooses the execution plan with the least cost.
(1) My question is, if CBO can compare the costs of the execution plans, why should not we do comparison?
(2) The only difference I see between CBO and us is - "CBO compares the cost of execution plans for a single query whereas we compare the costs of two (or more) queries". But, that also makes sense because, we know that those queries, that are subjected to comparison, will output the same records. So, why should not we compare the costs of two queries?
January 13, 2011 - 10:28 am UTC
(1) it is comparing the costs of various ways to execute a SINGLE QUERY. The CBO itself doesn't really compare the cost of "select * from emp" with that of "select * from dept"
(unless of course dept is really "create materialized view dept as select * from emp"...)
The CBO compares the costs of various execution plans for a single query.
The CBO (outside of query rewrite which is a different beast altogether) doesn't compare the cost of random query Q1 with random query Q2.
(2) Because cost = time only in theory, because the optimizer is imperfect and works with imperfect data. As stated "in theory - it should be ok to compare them, but in reality - it doesn't work so well - unless you know the optimizer better than yourself"
Could it be that...
Enrique Aviles, January 13, 2011 - 12:35 pm UTC
People tend to compare cost between different queries because it turns out that, more often than not, the one with the lower cost runs faster.
If this is true 80% (to make up a number) of the time, then even if it doesn't make sense to compare cost between queries chances are in one's favor that the one with the lower cost runs faster.
Again, I made up the 80% number but I'd venture to guess the number is higher than 50% otherwise this entire thread wouldn't even exist. People would have realized comparing costs doesn't mean anything or is wrong most of the time so the practice would have been abandoned.
I'm not saying cost=time for all cases or advocating for one point or another, just trying to offer a logical explanation on why people compare the cost of two queries.
January 13, 2011 - 2:36 pm UTC
the problem is that people look at the cost and say "that is time and therefore all queries with a lower cost must run faster than a query with a higher cost - they have to or it is all broken"
And they get sorely disappointed.
I know why people try to compare the costs - they make the association that cost = time and cost(x) < cost(y) implies X must be faster than Y and then find that many times it just doesn't work that way. Then they get confused, mad, annoyed, whatever and bad things happen. So, I just say "don't do it". Then there is no confusion.
RE: Cost is Time
Parthiban Nagarajan, January 14, 2011 - 2:09 am UTC
Hi Tom
What does the optimizer guy think about the cost?
His goal is to deliver the output with minimum response time. For that purpose, he chooses the plan with the least cost (for a single query).
So, it means Cost is Time, or
Cost = f(time)
I hope my understanding is correct.
January 14, 2011 - 12:04 pm UTC
as written many times above and in the reference post by Jonathan - in theory
time = f(cost)
and
cost = f( cpu and IO resources)
cost is absolutely a function of cpu and IO resources that can imperfectly be thought of as an estimate of the amount of time it will take.
However, both the cpu and IO resource estimates are just that - and if they are inaccurate (as they are from time to time) then the f(cost) giving time would be off as well.
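To make the "cost is a function of CPU and I/O estimates" point concrete, here is a small Python sketch of the general shape of the documented cost function, with the result expressed in units of single-block read time. Every parameter value below is made up for illustration; none comes from a real system:

```python
# Illustrative sketch (not Oracle's actual code) of the shape of the
# CPU-costing formula: I/O time plus CPU time, normalized by the
# single-block read time so "cost" is in units of single-block reads.
# sreadtim/mreadtim are milliseconds; cpuspeed is cycles per millisecond.
def estimated_cost(sr_reads, mr_reads, cpu_cycles,
                   sreadtim=4.0, mreadtim=10.0, cpuspeed=2_000_000):
    io_time = sr_reads * sreadtim + mr_reads * mreadtim  # ms spent on I/O
    cpu_time = cpu_cycles / cpuspeed                     # ms spent on CPU
    return (io_time + cpu_time) / sreadtim               # "single-block reads"

# Two plans doing the same I/O can still differ in cost once the
# estimated CPU work differs (e.g. count(*) vs. sum(user_id) above):
plain_count = estimated_cost(0, 100, 50_000_000)   # -> 256.25
with_sum    = estimated_cost(0, 100, 80_000_000)   # -> 260.0
```

Feed it inaccurate read counts or cycle estimates and the resulting "time" is off in exactly the way described above.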
Queries
Enrique Aviles, January 14, 2011 - 10:17 am UTC
This is what I understand from reading this thread. The premise is that we should not (or cannot) compare the cost of two queries and unequivocally conclude the one with the lowest cost will run faster or is more efficient.
Example.
Query1: The optimizer internally generates a number of plans (which we don't see) and chooses the one with the lowest cost. Let's say the internal plans have the following costs: 100, 500, 1000, 5000. The plan with cost 100 wins and is how Query1 is executed.
Query2: This one is a modified version of Query1 and returns exactly the same data as Query1. The optimizer again generates a number of plans internally. The costs of those plans are 1, 10, 100, 500, 1000, 10000. Again, the plan with cost 1 wins and is how Query2 is executed.
The point here is that we should not assume that Query2 with a cost of 1 is more efficient than Query1 with a cost of 100 because they are different, albeit semantically equivalent, queries.
But... how many times have we all seen that a query with a lower cost is more efficient than an equivalent query with a higher cost? I think more often than not which is why the notion of lower cost = faster query came to be. Of course, there are cases where the opposite is true as shown by the original poster but, are those cases as prevalent as the opposite cases? If they were we wouldn't be having this conversation, it would have been much clearer that comparing the cost of two queries is a waste of time.
January 14, 2011 - 12:26 pm UTC
if there are exceptions to a "rule of thumb" and the exceptions are frequent, then the "rule of thumb" ain't so good and I'll just say "do not do that, you'll only confuse yourself and others"
That is my stand on this and until the optimizer becomes perfect and all knowing (version 42 I think), it will continue to be my stance.
Single query
Enrique Aviles, January 15, 2011 - 7:13 pm UTC
OK, so comparing costs between two semantically equivalent queries needs to stop because the one with the lower cost doesn't necessarily mean it is better than the one with a higher cost.
Does the same principle apply for a single query? When the optimizer generates different plans for a single query, it chooses the one with the lowest cost. Is that guaranteed to be the best plan? What if the second best is really the best plan or the third best? Since the optimizer is not perfect (until version 42g :) ), I would assume there are situations where the chosen plan is not necessarily the optimal plan. Still, I'd guess the chosen plan (the one with the lowest cost) is the best most of the time. Is that correct?
January 23, 2011 - 3:08 pm UTC
it needs to not become a rule that the query with the lowest costs between two syntactically different queries is the one we will use.
If you know exactly how the cost is computed...
If you know the data (its physical layout, distributions, estimated cardinalities) really well
then you could actually do the comparison - but not many do.
... Is that guaranteed to be the best plan? ...
of course not - if it was, we wouldn't even be having this discussion, the optimizer would be so amazing that no one would ever talk about it other to say "how amazing".
It is a mathematical model, it uses cardinality ESTIMATES to come up with a cost, that is imperfect - it probably always will be - at least probably in our (ok, my) lifetime.
Assuming the estimated cardinalities are nearly correct, the optimizer does a very good job of picking the right plan.
guaranteed? no
John Stegeman, January 17, 2011 - 5:58 am UTC
@Enrique - no, it's not guaranteed to be the best plan. It is, however, guaranteed to be what Oracle "thinks" is the best plan, according to the mathematical model that is the optimizer.
Jonathan Lewis OUG
Oli, January 17, 2011 - 10:18 am UTC
If Cost is not relevant to the run time speed, why does Jonathan Lewis talk about making costs cheaper, and improving sql as a result of reducing the cost of the query.
To Oli from UK
Jonathan Lewis, January 19, 2011 - 2:43 am UTC
"If Cost is not relevant to the run time speed, why does Jonathan Lewis talk about making costs
cheaper, and improving sql as a result of reducing the cost of the query."
But I don't - at least, not intentionally.
If you read the book I make the point that people often use the word "cost" with two different meanings, first as the figure calculated by the optimizer and secondly as a reference to the actual resource usage at run-time.
I try quite hard to use the word "cost" only when I mean the optimizer's estimate. If I have ever said anything about "reducing the cost of the query to make it go faster" that's been a slip of the tongue, and the intention has been to say "reducing the resource consumption of the query to make it go faster".
Regards
Jonathan Lewis
Cost based query transformations
Enrique Aviles, January 19, 2011 - 9:32 am UTC
Isn't this similar to what's been discussed on this thread?
http://blogs.oracle.com/optimizer/2010/09/optimizer_transformations_subquery_unesting_part_1.html
"When a SQL statement is parsed, the Optimizer will try to "transform" or rewrite the SQL statement into a semantically equivalent SQL statement that can be processed more efficiently."
"Note the difference between the costs of the plan XC and the plan XA; recall that XA was generated by disabling unnesting. Clearly, the plan with unnesting (XC) is much more optimal; the cost has come down from 5006K to 2300."
In this case the optimizer is not evaluating different plans for a single SQL statement, it is comparing the cost of plans for semantically equivalent SQL statements and choosing the one with the lowest cost. How is that different from a person doing the same task manually?
January 23, 2011 - 3:50 pm UTC
because the CBO is rather conservative when rewriting queries; because the rewrite is done internally and isn't necessarily a "sql to sql rewrite" but a "plan to plan rewrite" (more basic); and because the CBO KNOWS the SQL it rewrote is semantically equivalent (whereas I'd say 50% or more of the rewrites performed by humans only appear to them to be semantically equivalent but are NOT - the two queries are actually different. You may (or may not) be surprised at this; I'm not anymore).
If the optimizer does the rewrite and as a side effect gets the WRONG cardinality then you will get the wrong plan.
It all goes back to the above - as to why it is difficult (and therefore I do not recommend it) to compare the cost of two different queries - it all comes back to the estimated cardinalities and if they were perfect - we could easily compare.
But, they are not.
Cost, in theory, is time.
cost, cpu_cost, io_cost
Mark, April 08, 2011 - 9:36 am UTC
Hi Tom,
So the COST is just a number used by optimizer to compare different plans. Do CPU_COST and IO_COST reflect the CPU time or the number of IO operations or are they also just numbers?
April 12, 2011 - 3:38 pm UTC
they reflect the cost of using the cpu and the cost of doing the io. They are just numbers - numbers with a meaning, just like the "cost" is just a number, but a number with a meaning.
io cost is based on an estimate of the number of IO's that will be performed. cpu cost is based on an estimate of the cpu that would be necessary.
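Both components are visible alongside the blended number: PLAN_TABLE (and V$SQL_PLAN) expose COST, IO_COST and CPU_COST columns. A quick sketch, assuming a table T like the one used earlier in this thread:

```sql
EXPLAIN PLAN FOR SELECT SUM(user_id) FROM t;

-- COST is the blended figure; IO_COST and CPU_COST show its two inputs
SELECT id, operation, cost, io_cost, cpu_cost
  FROM plan_table
 ORDER BY id;
```

That is also where the "(%CPU)" column in the DBMS_XPLAN output above comes from - the CPU share of the total cost.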
Value of cost in plan
A reader, April 21, 2011 - 2:01 pm UTC
Tom - what is the significance of COST in an explain plan? I have a query using a full outer join (due to business logic this cannot be avoided) and it is consuming resources. The cost of the query is in the billions, and when I use the NATIVE_FULL_OUTER_JOIN hint the cost reduces to the thousands. I want to understand whether the lower cost will improve performance to any extent. Without the hint, a count on the query runs for 12 hours (2M rows), but with the hint the count returns in 3 minutes. Any suggestions are highly appreciated.
April 25, 2011 - 8:30 am UTC
"I want to understand whether low cost will improve performance to any extent?"
A query X with a lower cost than a query Y
a) you might see X run faster than Y
b) you might see X run slower than Y
c) you might see X and Y run the same
The cost is the output of a mathematical function implemented by the cost based optimizer in an attempt to find the 'best' way to execute a query. If the mathematical model gets something wrong (an estimated cardinality for example), then the cost will be wrong as well (that is what leads to bad query plans).
So a lower cost does not assure you of faster performance - in a perfect world it would, but it isn't perfect - it is a model that includes some guessing at times.