Tatiane, March 27, 2003 - 3:41 pm UTC
Ok, but...
1) Don't you think the cases you mentioned are exceptions, and the ease of not worrying about composite indexes and then indexing single columns might outweigh that ?
2) Suppose another query comes up and uses the indexes some other way than that the ordering in the index is proper for. Then I would have to create another composite index for that, like first building t(a,b,c) and then t(a,c,b) -- then I would have 2 composite indexes for the same columns -- wouldn't single column indexes be better here ?
3) This is a bit off-topic. In your query:
>select *
> from t
> where x = :x
> and y > :y
> order by z;
Wouldn't the order by z be done in a second stage after you have retrieved the rows with x and y ? From what you said, I understand this is all done in one pass.
Thanks
March 27, 2003 - 4:15 pm UTC
1) nope, not at all -- I think the exceptions are the ones you are thinking are the common ones.
concatenated indexes are necessary tuning and performance tools. disregard them as your own peril.
How many times I've fixed a system simply by adding a column to an index -- bypassing the TABLE ACCESS BY ROWID step.
Consider ALL_OBJECTS -- pretend it was a table T. A common query is obviously:
select * from all_objects where owner = :x and object_type = :y and object_name = :z;
A concatenated index is necessary there.
2) No, look at it. Why would you create a,c,b when you have a,b,c?
In general, you'll get ONE index at a time to be used. conspicous exceptions to that are:
a) bitmap indexes -- these should probably never be concatenated but on single columns only.
b) index joins -- but there ALL columns need for the query need to be in the index.
Tell you what -- test it out. create single column indexes and try to get them to be used simultaneously. Then think "hmm, how could Oracle use more then one in this case"....
3) whoops -- missed something there --
ops$tkyte@ORA920> variable x number;
ops$tkyte@ORA920> variable y number;
ops$tkyte@ORA920> variable z number;
ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select * from t where x = :x and y > :y order by y,z;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=50 Bytes=2600)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=50 Bytes=2600)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=4)
order by y,z will use the index only -- no extra step. Alternatively:
ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select * from t where x = :x and y = :y order by z;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=10 Bytes=520)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=10 Bytes=520)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=2)
if we have equality on Y, we skip the sort. You are correct as originally written:
ops$tkyte@ORA920> select * from t where x = :x and y > :y order by z;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=50 Bytes=2600)
1 0 SORT (ORDER BY) (Cost=6 Card=50 Bytes=2600)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=50 Bytes=2600)
3 2 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=4)
it would have sorted after the fact....
Errata -- one more thing I forgot to mention
Tatiane, March 27, 2003 - 3:47 pm UTC
Sorry... in 2) above, suppose I'm not using 9i's skip scanning -- suppose I'm using 8i. With skip-scanning, t(a,b,c) and t(a,c,b) would work the same -- not considering performance.
March 27, 2003 - 4:16 pm UTC
you do not need to have skip scans for those to be very much almost the same.
But tell me -- you need to explain how
t(a)
t(b)
t(c)
would be superior to a query that would make use of t(a,b,c) in the first place!
What if composite index and individual column indexes co-exist...
A reader, March 28, 2003 - 1:46 am UTC
What would Oracle use if the composite index on t(x,y,z) and the indexes on individual columns t(x), t(y), t(z) co-exist? Is there any benefit to have them all, or is it even worse than simple wasting of space? Thanks.
March 28, 2003 - 7:10 am UTC
yes it would.
the index on t(x) might be considered redundant in that set of indexes.
On Index Range Scan
raajesh, January 06, 2005 - 4:35 am UTC
Hi Tom
I want one clarification. In my explain plan for a query, It does an index range scan using one of my index. On the cardinality,the number of rows are given as 22. How can one manually determine the value of cardinality in index range scans..or is it internal to oracle.
Please clarify.
Thanks much
raajesh
January 06, 2005 - 10:53 am UTC
it uses statistics and math.
say you collect simple column stats and oracle knows:
column X
----------
lo value is 1
hi value is 100
number of distinct values is 100
number of total values is 500
and you query:
select * from t where x = :x
oracle says "100 distinct values, 500 total values, average of 5 rows per value, you are doing equals -- hence, this will return about 5 rows on average, cardinality = 5"
for example.
Composite index
A reader, January 06, 2005 - 2:16 pm UTC
I have a very large table (non-partitioned, I am working on evaluating if partitioning would be beneficial, anyway...)
It has 70 million rows, 100 columns but only 8 columns are used very heavily in queries
(acct_no,sec_no,enter_date,trade_date,settle_date,amt1,amt2,amt3)
The dates range evenly over 10 years
acct_no - 30K distinct values
sec_no - 100K distinct values
I want to be able to access data by any of the above columns (except the amt columns!).
But creating separate indexes on each combination (with each column on the leading edge) would be expensive, these indexes are large.
Suggestions? Thanks. Since the cardinality of the columns are quite high, the index skip scan feature in 9i wouldnt kick in here, right?
Thanks
January 06, 2005 - 6:57 pm UTC
tell us more -- how is the table updated?
are any of the columns "always" accessed?
anything else that might be interesting?
A reader, January 06, 2005 - 7:01 pm UTC
The table is never updated. Its a transaction log table, its always inserted into.
The 3 amt columns are always accessed. The date columns are also accessed. Its like the queries we have been discusing in
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:9623094173253 <code>
with the CASE and stuff.
Either the acct_no or sec_no or both are always present in the queries.
Some queries are by ONE of the date columns alone.
I just want to design as few indexes as posible to give me the most bang for my buck!
Thanks
January 06, 2005 - 7:50 pm UTC
(inserts are "updates")
rules out bitmaps.
date1, date2, acct_no, seq_no, amt1, amt2, amt3
date1, date2, seq_no, amt1, amt2, amt3
would give you the ability to sum/avg/whatever amt1..3
with predicates on acct_no, OR seq_no OR acct_no, seq_no
if you range scan on either of acct_no or seq_no -- answer might change.
A reader, January 06, 2005 - 8:22 pm UTC
Either the acct_no or sec_no or both are always present in the queries
Given that acct_no and/or sec_no are always present in all queries, shouldnt they be on the leading edge of the index?
January 07, 2005 - 8:49 am UTC
it'll depend totally on the predicate
put the things that use "=" the most on the leading edge
put the things that use "range operations" inside the index.
A reader, January 07, 2005 - 4:46 pm UTC
So I guess if 5 columns use "=", I need 5 indexes?
Anyway, I created some indexes and ran some queries/views where all the columns referenced are in the indexes, but I still see a "table access by rowid" step in the explain plan. Why would this be?
Thanks
January 08, 2005 - 4:18 pm UTC
no -- my point was if you:
where x = :x and y = :y and z > :z
the logical ordering would be:
(x,y,z)
or
(y,x,z)
that is, things you use equality on -- go first, things you range on -- typically go after them
if you queried "x=:x and y=:y and z=:z" then X,Y,Z in ANY order would be appropriate.
prove that last part to me -- that you have the index on ALL columns and that you still have table access by index rowid.
A reader, January 07, 2005 - 7:27 pm UTC
If I have
create table t(i int,j int,k int,.....);
create index i on t(i,j,k);
create or replace view v as
select i,j,sum(k) k from t group by i,j;
Now, if I do
select i,sum(k) from v group by i;
I think this is essentially what I am seeing. Since the query above can be satisfied fully from the index, why do I see a table access by index rowid?
January 08, 2005 - 4:24 pm UTC
Actually -- I'd be asking you "why or how could you possibly even use the index in this case as it would give the wrong answer -- since I, J, K are all NULLABLE"
I would not expect to see an index here *at all*
that example screams "full table scan"
and if you make column I NOT NULL -- i would expect nothing other than an index full scan or fast full scan.
And that is in fact what I see:
ops$tkyte@ORA9IR2> create table t(i int,j int,k int );
Table created.
ops$tkyte@ORA9IR2> create index i on t(i,j,k);
Index created.
ops$tkyte@ORA9IR2> create or replace view v as
2 select i,j,sum(k) k from t group by i,j;
View created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 10000000, numblks => 10000 );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> exec dbms_stats.set_index_stats( user, 'I', numrows => 10000000, numlblks => 1000 );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select i,sum(k) from v group by i;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39160 Card=10000000 Bytes=260000000)
1 0 SORT (GROUP BY) (Cost=39160 Card=10000000 Bytes=260000000)
2 1 VIEW OF 'V' (Cost=39160 Card=10000000 Bytes=260000000)
3 2 SORT (GROUP BY) (Cost=39160 Card=10000000 Bytes=390000000)
4 3 TABLE ACCESS (FULL) OF 'T' (Cost=1519 Card=10000000 Bytes=390000000)
ops$tkyte@ORA9IR2> alter table t modify I NOT NULL;
Table altered.
ops$tkyte@ORA9IR2> select i,sum(k) from v group by i;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1001 Card=10000000 Bytes=260000000)
1 0 SORT (GROUP BY) (Cost=1001 Card=10000000 Bytes=260000000)
2 1 VIEW OF 'V' (Cost=1001 Card=10000000 Bytes=260000000)
3 2 SORT (GROUP BY NOSORT) (Cost=1001 Card=10000000 Bytes=390000000)
4 3 INDEX (FULL SCAN) OF 'I' (NON-UNIQUE) (Cost=1001 Card=10000000 Bytes=390000000)
ops$tkyte@ORA9IR2> set autotrace off
So are you saying one should add indexes on only NOT NULL columns ?
Sonali, January 11, 2005 - 2:14 pm UTC
January 11, 2005 - 2:52 pm UTC
where do you see anything remotely close to that being said?
I need to know, so I can fix it.
NO, that is not even remotely close to what I was saying.
I said the index could not possibly be used for that query in that example directly above (because it would get the wrong answer).
However, that index could be infinitely useful to the person asking:
select * from t where i = :i and j = :j and k = :k;
Not clear
A reader, January 12, 2005 - 12:24 pm UTC
I guess we are still not on the same page.
My query output columns are
(acct_no,sec_no,enter_date,trade_date,settle_date,amt1,amt2,amt3)
All these are NOT NULL columns
Most queries use
acct_no=:b1 and sec_no=:b2 and *_date <= :b3
Some queries use
*_date=:b1
Some queries use
sec_no=:b1 and *_date <= :b2
amt1/amt2/amt2 are NEVER in the predicate, just sum/avg in the output.
Given this, what should my indexes be? Minimum number of indexes to satisfy above needs
Thanks
January 12, 2005 - 1:53 pm UTC
why minimum?
do you want 'best performance' or 'smallest size'
the minimum is of course ZERO. you need none.
the mimimum that lets you always use an index is 3. but that might be horrible for performance.
the three minimal ones are:
i1(enter_date,sec_no,acct_no)
i2(trade_date,sec_no,acct_no)
i3(settle_date,sec_no,acct_no)
they can do all of:
acct_no=:b1 and sec_no=:b2 and *_date <= :b3
*_date=:b1
sec_no=:b1 and *_date <= :b2
however, what if we used:
sec_no=:b1 and *_date <= :b2
and :b2 was "sysdate" and ALL of the records where less than "sysdate" (or we just use a :b2 whereby MOST of the records are less than sysdate)
that would have to read MOST, if not all of the index a block at a time -- since the data is sorted by date.
if the index were on
sec_no,enter_date
instead -- then it would only read the relevant records from the index. consider:
ops$tkyte@ORA9IR2> create table t(sec_no int, edate date, amt1 number );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t
2 select mod(rownum,1000), sysdate-rownum , rownum
3 from all_objects;
27939 rows created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable b1 number
ops$tkyte@ORA9IR2> variable b2 varchar2(30)
ops$tkyte@ORA9IR2> exec :b1 := 42; :b2 := to_char(sysdate,'dd-mon-yyyy');
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx on t(edate,sec_no);
Index created.
ops$tkyte@ORA9IR2> set autotrace traceonly
ops$tkyte@ORA9IR2> @trace
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context forever, level 12';
Session altered.
ops$tkyte@ORA9IR2> select * from t t1 where sec_no = :b1 and edate <= to_date(:b2,'dd-mon-yyyy');
28 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T'
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
121 consistent gets
90 physical reads
0 redo size
1145 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28 rows processed
ops$tkyte@ORA9IR2> @traceoff
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context off';
Session altered.
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop index t_idx;
Index dropped.
ops$tkyte@ORA9IR2> create index t_idx on t(sec_no,edate);
Index created.
ops$tkyte@ORA9IR2> set autotrace traceonly
ops$tkyte@ORA9IR2> @trace
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context forever, level 12';
Session altered.
ops$tkyte@ORA9IR2> select * from t t2 where sec_no = :b1 and edate <= to_date(:b2,'dd-mon-yyyy');
28 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T'
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE)
Statistics
----------------------------------------------------------
29 recursive calls
0 db block gets
35 consistent gets
1 physical reads
0 redo size
1145 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
28 rows processed
ops$tkyte@ORA9IR2> @traceoff
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context off';
Session altered.
tkprof says:
select * from t t1 where sec_no = :b1 and edate <= to_date(:b2,'dd-mon-yyyy')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3 0.01 0.01 90 121 0 28
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.01 0.01 90 121 0 28
Rows Row Source Operation
------- ---------------------------------------------------
28 TABLE ACCESS BY INDEX ROWID T (cr=121 r=90 w=0 time=12737 us)
28 INDEX RANGE SCAN T_IDX (cr=93 r=90 w=0 time=12539 us)(object id 34753)
<b>notice the CR=93, we read the ENTIRE index there -- had to, had no choice. CBO on a large table would most likely say "index fast full scan", so while it MIGHT use an index, it would be full scanning it</b>
********************************************************************************
select * from t t2 where sec_no = :b1 and edate <= to_date(:b2,'dd-mon-yyyy')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3 0.00 0.00 1 32 0 28
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.00 0.00 1 32 0 28
Rows Row Source Operation
------- ---------------------------------------------------
28 TABLE ACCESS BY INDEX ROWID T (cr=32 r=1 w=0 time=319 us)
28 INDEX RANGE SCAN T_IDX (cr=4 r=1 w=0 time=173 us)(object id 34754)
<b>turn the index around and wow -- we did 4 IO's against the index </b>
So, do you really want "minimum needed just to say we are using indexes" or "the right amount to make queries go fast.
Only you know however which is right, it could well be that:
i1(enter_date,sec_no,acct_no)
i2(trade_date,sec_no,acct_no)
i3(settle_date,sec_no,acct_no)
is more than satisfactory because you *know* the value supplied against the date will be selective enough.
On the other hand, you might well need more indexes, some with sec_no, acct_no on the leading edge, others with dates on the leading edge.
A reader, January 13, 2005 - 10:06 am UTC
"why minimum?do you want 'best performance' or 'smallest size'"
Well, the 2 are not necessarily mutually exclusive goals. I want best performance with smallest size! I dont want a brute force design with indexes all over the place. I know exactly how the queries are written, this is a very focused, specific app doing known things. So, I want to use this knowledge to design the least number of indexes to get max performance.
Thanks for the example. In my case, I know that the date specified will be selective, selecting less than 5% of the range of values. So, I guess your i1,i2,i3 will be sufficient.
Regarding the need to throw in the 3 amt fields in the index also to avoid the "table access by rowid" entirely)...
Is it really a desirable goal to try to satisy queries using the index alone? i.e. is aiming to skip the "table access by rowid" step a worthwhile goal?
In other words, given a table segment of xMB and an index segment of xMB, would the FTS be faster or would the full index scan be faster? Since a b*tree index is a much more complicated structure to traverse compared to a table, I would think a FTS would be faster, right?
Thanks
January 13, 2005 - 11:03 am UTC
no, you either want
a) smallest size, with the performance it gets
b) best performance, with the size that results
it is not that they are mutually exclusive -- you may be able to achieve both -- but it depends totally on your data, your queries and your inputs to your queries against that data.
if you can satisfy the query from the index entirely -- you save tons of work.
a table segment of Xmb and index segment of Xmb would full scan/fast full scan in the same amount of time.
a full index scan would read the data IN ORDER (which can be used to do group by nosorts)
but you said "less than 5%..." so where does a full scan come into play? I'd be trying to skip the table access by index rowid as a result of a index range scan, not an index fast full scan.
related question
sonali, January 13, 2005 - 11:02 am UTC
IF I make a composite index say
idex1 (col1, col2, col3)
and column col2 is my Foreign key..do you think I need to have separate index on that column in 1st place.
Like this
idex1 (col1, col2, col3)
idex2 (col2) to avoid locking issues. I see lots of documents that says that all foreign keys must be indexed. But now with 9i and up with skip scanning mentioned in this link I would guess that I would not need separate index. AM I wrong ?
Thanks
January 13, 2005 - 11:08 am UTC
the fkey columns need be on the LEADING EDGE of the index to be used by the referential integrity code.
skip scanning is an access path -- it is not sufficient for the fkey issue. besides, unless col1 is very low cardinality -- skip scanning does not take place.
(and we have to have a precise section of the index to lock here -- remember, the table lock happens
a) when we delete parent
b) update parent primary key
we need to "lock" NON-EXISTENT data in the child table for all intents and purposes. a skip scan would give us many "non-existent" places to lock!)
A reader, January 13, 2005 - 11:46 am UTC
"a table segment of Xmb and index segment of Xmb would full scan/fast full scan in the same amount of time"
Hm, thats surprising. FTS seems much "simpler" than traversing with b*tree, left/right, breadth-first or depth-first, etc issues involved. OK I will take your word for it.
"but you said "less than 5%..." so where does a full scan come into play?"
Most users/queries retrieve 5% of the data, but I was just asking in case some crazy user puts in a date range that happens to cover the entire index!
"I'd be trying to skip the table access by index rowid as a result of a index range scan, not an index fast full scan"
Ok, lets say I put all my desired columns in the index so as to try and skip the table access entirely.
This makes my index segment larger than the table segment.
As you said above, it takes the same amount of time/resources to full scan both a table or index segment of the same size.
So, by trying to avoid table acess by rowid, I stuff all the columns in the index and thus make the index larger, thus making a "range scan" (or even a full table/index) scan operation on the index more expensive than the table, thus defeating the purpose of adding the columns to the index to begin with?!
Is my logic flawed? Thanks
January 13, 2005 - 12:53 pm UTC
full scan - table
FAST full scan - index, reads index blocks in "extent order", not from root to branch to leaf to leaf to leaf to leaf. Just reads the index like a table.
there is an index full scan as well -- but if you were using the index "as a skinny table", it would be a FAST full scan.
so, to me the size of an index is quite simply not relevant. Not with disk being so darn cheap anyway.
My point was -- if I can range scan and get the data, all of it, bypassing the table access by index rowid step -- that is going to be faster.
If we are going to full scan, the optimizer would pick the best one to scan -- if the scan of the table was cheaper than index, it would do the table.
Table access by rowid
A reader, January 18, 2005 - 10:38 am UTC
"prove that last part to me -- that you have the index on ALL columns and that you still have table access by index rowid"
Here is my ddl
create table txn
(
tran_id number(14) not null,
acct_num char(6 byte) not null,
sec_num char(6 byte) not null,
settle_date date not null,
trade_date date not null,
enter_date date not null,
tran_units number(15,3) not null,
book_cost number(14,2) not null,
tax_cost number(14,2) not null,
)
create index l on txn
(sec_num, enter_date, trade_date, settle_date, tran_units,
book_cost, tax_cost)
create index k on txn (acct_num, sec_num, enter_date, trade_date, settle_date,
tran_units, book_cost, tax_cost)
Here is my view
CREATE OR REPLACE VIEW vw
AS
SELECT
acct_num,
sec_num,
CASE rec_type
WHEN 's' THEN settle_date
WHEN 'e' THEN enter_date
WHEN 't' THEN trade_date
END bal_date,
rec_type bal_type,
SUM (tran_units) units,
SUM (book_cost) book_cost,
SUM (tax_cost) tax_cost
FROM txn,(SELECT 's' rec_type FROM my_dual
UNION ALL
SELECT 'e' FROM my_dual
UNION ALL
SELECT 't' FROM my_dual)
GROUP BY
acct_num,
sec_num,
CASE rec_type
WHEN 's' THEN settle_date
WHEN 'e' THEN enter_date
WHEN 't' THEN trade_date
END,
rec_type;
Here is the query
ALTER SESSION SET nls_date_format='mm/dd/yyyy';
SELECT sec_num,bal_date,SUM(units)
FROM vw
WHERE sec_num='123456'
AND bal_type='S'
AND bal_date='1/1/2003'
GROUP BY sec_num,bal_date;
Here is the plan
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 12 (25)|
| 1 | SORT GROUP BY | | 1 | 32 | 12 (25)|
| 2 | VIEW | VW | 7 | 224 | |
| 3 | SORT GROUP BY | | 7 | 385 | 11 (19)|
| 4 | TABLE ACCESS BY INDEX ROWID| TXN | 2 | 104 | 2 (50)|
| 5 | NESTED LOOPS | | 7 | 385 | 10 (10)|
| 6 | VIEW | | 3 | 9 | |
| 7 | UNION-ALL | | | | |
|* 8 | FILTER | | | | |
| 9 | INDEX FULL SCAN | SYS_C0037974 | 1 | | 27 (4)|
|* 10 | FILTER | | | | |
| 11 | INDEX FULL SCAN | SYS_C0037974 | 1 | | 27 (4)|
|* 12 | FILTER | | | | |
| 13 | INDEX FULL SCAN | SYS_C0037974 | 1 | | 27 (4)|
|* 14 | INDEX RANGE SCAN | L | 2 | | 2 (50)|
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - filter('s'='S')
10 - filter('e'='S')
12 - filter('t'='S')
14 - access("TXN"."SEC_NUM"='123456')
filter(CASE WHEN "from$_subquery$_003"."REC_TYPE"='s' THEN
"TXN"."SETTLE_DATE" WHEN "from$_subquery$_003"."REC_TYPE"='e' THEN
"TXN"."ENTER_DATE" WHEN "from$_subquery$_003"."REC_TYPE"='t' THEN
"TXN"."TRADE_DATE" END =TO_DATE('2003-01-01 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))
Why does step 4 have the table access by rowid since everything is in the index?
Thanks
January 18, 2005 - 11:44 am UTC
you do see that it is blowing off your big index in favor of bulk operations against the table right?
that is, it is not using that huge index.
have you sufficiently scaled up your example, the counts look "very teeny tiny" in that plan.
A reader, January 18, 2005 - 1:46 pm UTC
I am not sure I understand. Yes, the table is large (~80 million rows). The index segment is larger than the table segment. But the query I am asking is very specific so the card numbers are actually pretty much right on.
You seemed to imply that if a query can be satisfied from an index, Oracle will never go to the table, ever. No ifs and or buts. Now you are indicating otherwise?
How does it matter if the index is huge or not? We just had a discussion at
</code>
http://asktom.oracle.com/pls/ask/f?p=4950:61:15083739629133027981::::P61_ID:1295519444744#31406314364415 <code>
where you said that the size of the index is irrelevant, the index is not there to be smaller than the table, it is there to speed up queries.
In this case, using the index is the right thing to do, so why does Oracle go to the table?
Thanks
January 18, 2005 - 1:54 pm UTC
so basically it is saying "regardless of the fact that you have this nice big fat index, we find this smaller index to be better"
Here, Oracle chose not to use the big fat index -- so therefore it did not use it. Since it did not use it, it had to go to the table.
Perhaps I should say "if and when Oracle decides to use an index that has all of the columns needed to satisfy the query, Oracle can skip going to the table"
In this case, this index isn't deemed appropriate by the optimizer -- my statement is still accurate, but only when the index is actually *used*.
it is looking at your pivot -- obviating the ability to use the index on the date columns -- and saying "where sec_no = whatever" returns WAY too many rows.
that pivot in there will obviate most concatenated indexes.
given this view:
CREATE OR REPLACE VIEW vw
AS
SELECT
acct_num,
sec_num,
CASE rec_type
WHEN 's' THEN settle_date
WHEN 'e' THEN enter_date
WHEN 't' THEN trade_date
END bal_date,
rec_type bal_type,
SUM (tran_units) units,
SUM (book_cost) book_cost,
SUM (tax_cost) tax_cost
FROM txn,(SELECT 's' rec_type FROM my_dual
UNION ALL
SELECT 'e' FROM my_dual
UNION ALL
SELECT 't' FROM my_dual)
GROUP BY
acct_num,
sec_num,
CASE rec_type
WHEN 's' THEN settle_date
WHEN 'e' THEN enter_date
WHEN 't' THEN trade_date
END,
rec_type;
the ONLY index that could be used would be on (sec_no,acct_num) and that is probably no where near selective enough.
If you lose the view, and pivot your final ANSWER, it might well be in the realm of "useful"
A reader, January 18, 2005 - 2:05 pm UTC
"If you lose the view, and pivot your final ANSWER"
Not sure what you mean by that, can you please elaborate?
Also, I guess I can try adding acct_num to the index "l"
create index l on txn (sec_num, acct_num, enter_date, trade_date, settle_date, tran_units, book_cost, tax_cost)
But since acct_num is aggregated away in the final query, I thought it didnt matter?
Gasp...wait a minute, is it going to the table to get the acct_num since it isnt present in the index "l"? How can I tell "why" it is going to the table, to get what column?
Thanks
January 18, 2005 - 3:25 pm UTC
query the data you want,
then pivot that data.
do you see -- that if you query on that pivoted data -- gasp -- it would be impossible to use the date in your view against the index?
This is different from the way you said you would query it above, this indexing schema therefore makes no sense.
when you said:
All these are NOT NULL columns
Most queries use
acct_no=:b1 and sec_no=:b2 and *_date <= :b3
Some queries use
*_date=:b1
Some queries use
sec_no=:b1 and *_date <= :b2
I assumed that *_date meant ONE OF the dates, I could not anticipate a pivot being in there.
so really -- this entire discussion has been "sort of moot", we haven't been talking about the same things *at all*
A reader, January 18, 2005 - 3:56 pm UTC
January 18, 2005 - 4:00 pm UTC
you can query data, then pivot
if you can pivot and query, you can certainly query and pivot.
so, I'm missing something.
Seems you might well want to 'pivot' with a union all view of three queries - each of which would be able to use an appropriate index.
and remember, I get lots and lots of questions here -- lots of conversations. It is many times hard enough to keep the context on a single page, let alone across pages. You are just "a reader" after all, that makes this even more impossible to do.
Context
A reader, January 18, 2005 - 5:27 pm UTC
Sorry for mixing up the context across 2 pages.
Let me try to recap here
Here is my table
create table txn
(
tran_id number(14) not null,
acct_num char(6 byte) not null,
sec_num char(6 byte) not null,
settle_date date not null,
trade_date date not null,
enter_date date not null,
tran_units number(15,3) not null,
book_cost number(14,2) not null,
tax_cost number(14,2) not null,
);
Here is my (intermediate) view
CREATE OR REPLACE VIEW vw
AS
SELECT
acct_num,
sec_num,
CASE rec_type
WHEN 's' THEN settle_date
WHEN 'e' THEN enter_date
WHEN 't' THEN trade_date
END my_date,
rec_type,
SUM (tran_units) units,
SUM (book_cost) book_cost,
SUM (tax_cost) tax_cost
FROM txn,(SELECT 's' rec_type FROM my_dual
UNION ALL
SELECT 'e' FROM my_dual
UNION ALL
SELECT 't' FROM my_dual)
GROUP BY
acct_num,
sec_num,
CASE rec_type
WHEN 's' THEN settle_date
WHEN 'e' THEN enter_date
WHEN 't' THEN trade_date
END,
rec_type;
Here is my desired output (pivot)
create or replace view final_output_1
select acct_num,my_date,
sum(decode(rec_type,'s',amt,0)) settle_amt,
sum(decode(rec_type,'e',amt,0)) enter_amt,
sum(decode(rec_type,'t',amt,0)) trade_amt
from vw
group by acct_num,my_date;
create or replace view final_output_2
select sec_num,my_date,
sum(decode(rec_type,'s',amt,0)) settle_amt,
sum(decode(rec_type,'e',amt,0)) enter_amt,
sum(decode(rec_type,'t',amt,0)) trade_amt
from vw
group by sec_num,my_date;
create or replace view final_output_3
select acct_num,sec_num,my_date,
sum(decode(rec_type,'s',amt,0)) settle_amt,
sum(decode(rec_type,'e',amt,0)) enter_amt,
sum(decode(rec_type,'t',amt,0)) trade_amt
from vw
group by acct_num,sec_num,my_date;
When querying by acct_num and date, final_output_1 is used
When quering by sec_num and date, final_output_2 is used
When quering by acct_num and sec_num and date, final_output_3 is used
Questions:
1. How can I optimally use indexes in this situation?
2. Do I need to query and then pivot instead of pivot and query? Not sure how can I do this given the design above?
Thanks a lot
January 19, 2005 - 10:10 am UTC
either your data model is wrong (it is, you have in record dimensions that should have been cross record -- not columns -- but rows)
or
you want to use a union all of three queries to pivot, in order to be able to use indexes.
this is getting too large and out of control here, please -- when I'm taking new questions, feel free to put it there.
Composite indexes ??
A reader, March 01, 2005 - 11:02 am UTC
Hi Tom,
In our current project we have a web retrieval application from which we can initiate some searches. There are a lot of fields that can be searched on like empno, lastname, firstname, date of birth, ssn, height, weight, gender etc. In our table we have an index on these fields separately. There are times when they can combine 2 or more fields for a search. During these times, the performance is very slow. It gets very cumbersome to imagine all possible combinations and create composite indexes. I don't know how else to address this issue? Do you have any thoughts? Your help is greatly appreciated.
Thanks.
March 01, 2005 - 11:27 am UTC
these retrievals
mostly read only data you are querying or is this data updated frequently.
tell me about the use pattern of the data.
I have two ways to solve this depending on your answer.
Inserts
A reader, March 01, 2005 - 11:31 am UTC
This table is mostly insert only.
March 01, 2005 - 11:33 am UTC
"bulk insert at night"
or
"row by row inserts continously"
?
A reader, March 01, 2005 - 12:28 pm UTC
Bulk insert at night. I have a scheduled job that runs every night and inserts into this database from another database.
March 01, 2005 - 12:37 pm UTC
as long as it is a true bulk insert:
o insert /*+ append */ as select...
o insert as select ...
o or inserts done with large array sizes (100-500 rows)
then bitmap index the columns of interest. We can AND / OR bitmaps together nicely.
single column bitmap indexes (not a concatenated one)
A reader, March 01, 2005 - 1:26 pm UTC
so should I create individual bitmap indexes on all columns like lastname, firstname, dob, gender, height, weight, etc. should i drop the existing indexes?
A reader, March 01, 2005 - 1:56 pm UTC
Thank you for directing me to these articles. Let me go through them and get back to you if I have questions. Thanks a lot.
Testing??
A reader, March 01, 2005 - 2:05 pm UTC
Hi Tom,
If I am going to be following the articles you directed me to and if I want to test the use of bitmap indexes versus btree indexes, can I perform the following tests:
1. In the current setup for example I have btree indexes on lastname separately, firstname separately and (lastname, firstname) together.
2. I will drop all these 3 indexes and create bitmap indexes on lastname and firstname separately.
If I do the above 2 steps and then run a search, will that give me an indication of which is better? And then I can carry on the change to all the other fields? Am I missing something?
Thanks.
March 01, 2005 - 3:08 pm UTC
well -- with b*trees you should probably only have 2
lname,fname
fname
or
fname,lname
lname
I'd probably want to test a broader range of queries (involving typical predicates -- perhaps you can gather that out of v$sql? )
but it would start to give you a feel.
A reader, March 02, 2005 - 10:15 am UTC
Ok Tom, I will run a test and then update you.
Thanks.
Laura, March 03, 2008 - 5:14 pm UTC
Hi Tom,
the situation in my company is very similar to the one above, but it is an extreme case. We also have one 100 mil rows single table, and users are querying only a few columns. However all queries always contain the same first 3 columns.
So the table has 10 composite indexes, with this pattern:
index1: col1, col2, col3, col44
index2: col1, col2, col3, col99, col88
index3: col1, col2, col3, col66
...
...
etc. The columns in the indexes are very large, and the load to this table, which is done in real time several times a day, takes a very long time and interferes with response time for the queries.
I tried putting a single column index, or a smaller composite index, on the columns that vary, but Oracle never uses the smaller indexes. There must be a better solution other than having the same 3 columns repeated in 10 indexes, is there?
The second question is about an observation that I found bizarre:
the query was:
Select *
from table1
where id = 123
The id is the primary key. Oracle used a composite index where id was the second column. It did not use the unique PK index. What could be the rationale for that?
Thank you.
March 03, 2008 - 9:00 pm UTC
In general - a single index will be used for access (bitmaps - that is not true, there is also a special "index join")
...
The second question is about an observation that I found bizarre:
the query was:
Select *
from table1
where id = 123
The id is the primary key. Oracle used a composite index where id was the
second column. It did not use the unique PK index. What could be the rationale
for that?
...
got the example, show us the table, the indexes in place - any chance that ID is a varchar2? show us the plans
A reader, March 04, 2008 - 7:16 pm UTC
I just noticed that in the 'create index' script the pk index is created as noparallel, and the composite index is created as parallel (degree 4).
Both indexes are on integers, not varchar2. Could it be that Oracle will always prefer a parallel processing index to a noparallel one? It would make sense, right?
March 04, 2008 - 8:00 pm UTC
no example, pointless to guess.
no plans
no datatypes
no information
sorry - I will not hypothesize. No, Oracle does not always prefer anything to anything. It is costed out and then selected.
Laura, March 05, 2008 - 2:26 pm UTC
Sorry for the delay, here it is:
Query:
select count(0) from MAIN_TBL where ID < 56020255 and ID >= 16020255
Plan:
SELECT STATEMENT ALL_ROWS
Cost: 23,484 Bytes: 7 Cardinality: 1
6 SORT AGGREGATE Bytes: 7 Cardinality: 1
5 PX COORDINATOR
4 PX SEND QC (RANDOM) PARALLEL_TO_SERIAL SYS.:TQ10000 Bytes: 7 Cardinality: 1
3 SORT AGGREGATE PARALLEL_COMBINED_WITH_PARENT
Bytes: 7 Cardinality: 1
2 PX BLOCK ITERATOR PARALLEL_COMBINED_WITH_CHILD
Cost: 23,484 Bytes: 63,489,951 Cardinality: 9,069,993
1 INDEX FAST FULL SCAN INDEX PARALLEL_COMBINED_WITH_PARENT MAIN_TBL_IDX
Cost: 23,484 Bytes: 63,489,951 Cardinality: 9,069,993
CREATE TABLE MAIN_TBL
(
ID INTEGER NOT NULL,
...
...
REGION_ID INTEGER,
....)
)
TABLESPACE MAIN_DATA
PCTUSED 0
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 40M
NEXT 40M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
LOGGING
NOCOMPRESS
NOCACHE
PARALLEL ( DEGREE 4 INSTANCES 1 )
MONITORING;
CREATE UNIQUE INDEX MAIN_TBL_PK ON MAIN_TBL
(ID)
LOGGING
TABLESPACE MAIN_IDX
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 400K
NEXT 400K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
NOPARALLEL;
CREATE INDEX MAIN_TBL_IDX ON MAIN_TBL
(REGION_ID, ID)
LOGGING
TABLESPACE MAIN_DATA
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 40K
NEXT 40K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
PARALLEL ( DEGREE 4 INSTANCES 1 );
From constraints table:
constraint name
SYS_C0071476
table:
MAIN_TBL
Type:
primary key
column name:
id
Thank you. By the way, my main concern is on question 1, how to get rid of those 3 leading columns in all 10 of our indexes. They have very low cardinality. I can't make my case with my mgr though, because the Oracle optimizer is choosing the larger indexes instead of the 1-column ones.
You suggested bitmap indexes. I read though that bitmap indexes have to be dropped and re-created very often, or they get out of date. The client wants the ability to query 24x7, and recreating the indexes takes too many hours, so we never do it. Can I use bitmap indexes anyway?
(P.S. the join indexes you mention seem to be used for table joins, and we have only one table. At least we don't have that problem...)
Thank you very much for answering!
March 06, 2008 - 7:45 am UTC
it is doing a fast full scan, it is not using the index as an index, but as a skinny version of the table.
It preferred a parallel fast full scan of that index
over
a serial full scan of the other
seems pretty straightforward.
Composite Index structure
Arun Mathur, July 02, 2010 - 11:29 am UTC
Hi Tom,
I've gathered a lot of useful information on composite indexes. I could be missing something simple, but how is a composite index structure stored?
For example, if I have a table and composite index such as :
SQL> create table t(t_pk number(11) not null primary key,
2 t_val1 number(11),
3 t_desc varchar2(100)) tablespace users;
Table created.
SQL> create index t_idx on t(t_val1,t_desc) tablespace users;
Index created.
Does this imply that the values for the branch blocks will consist of ranges for both t_val1 and t_desc values in the order of (t_val1,t_desc)? Or, does the branch block contain ranges for just t_val1, followed by additional branches for t_desc, or none of the above? :)
Regards,
Arun
July 06, 2010 - 2:40 pm UTC
the entire key is considered, t_val1,t_desc would be the key, rowid would be the functionally dependent data (in a unique index)
In a non-unique index, the key would be t_val1,t_desc,rowid (hence in a non-unique single column index, you really have a composite unique index on column,rowid!!)
And the entire key can be seen in the root and branch blocks.
Thank you
Arun Mathur, July 07, 2010 - 9:40 am UTC