But...
A reader, September 22, 2002 - 7:35 pm UTC
Tom,
My question is whether in any situation would indexes slow down the query or will it cause the optimizer to take more time in arriving at the best path?
Would having Indexes on columns when it is NOT very selective have negative effects on Querying? Like say an index on a date field. There could be any number of records having the same date. This question is because we often query all records within a date range.
Other situation is having multiple indexes on one field with different combinations. For eg:- i have an index on id,exp_dt combination and another on id, some_other_col1, some_other_col2. Would this impact the queries which use only id in the where clause?
This is with respect to OLAP application.
Thanks very much Tom...Really appreciate your efforts.
September 23, 2002 - 7:24 am UTC
Sure, absolutely, but that can happen with 1, 2, 3, 4, 5, ... 10000 indexes. The *number* of indexes is not relevant here.
If you use the CBO and analyze your objects, this becomes less of a concern as the optimizer will understand if the data is selective or not.
In your case, you said you already did the analysis, all indexes are needed -- hence they must be selective or you wouldn't need them.
All Columns are indexed
Tony, September 23, 2002 - 12:41 am UTC
Hi Tom,
1.Is it good to index all columns in a table if required?
Is there any alternative?
2. I heared that date columns should not be indexed. Is it true? Why?
September 23, 2002 - 7:38 am UTC
1) if it is required, although I'm pretty hard pressed to come up with an example
2) myth, totally false.
I have heard of the 5-index "limit" before
Tony Andrews, September 23, 2002 - 6:23 am UTC
It used to be the case (I was told) that once the RBO had considered 5 indexes on a table for use in a query, it picked the best of those 5 and did not consider any others. We had our database design checked by an Oracle consultant "guru" some years ago, and he advised dropping all but 5 indexes on our main table to "improve" performance, citing the reason above. Since he was supposed to be the expert, our manager insisted we follow his advice, despite us knowing it to be nonsense (i.e. despite seeing an immediate degradation in performance). Once the consultant had left, I quietly reinstated the necessary indexes and no more was ever heard about it...
September 23, 2002 - 8:02 am UTC
http://docs.oracle.com/docs/cd/A87860_01/doc/server.817/a76992/optimops.htm#39156
it is talking about being able to MERGE upto 5 single column indexes together.
for example:
ops$tkyte@ORA920.US.ORACLE.COM> create table t ( c1 int, c2 int, c3 int, c4 int, c5 int, c6 int );
Table created.
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> begin
2 for i in 1 .. 6
3 loop
4 execute immediate 'create index c'||i||'_idx on t(c'||i||')';
5 end loop;
6 end;
7 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> set autotrace traceonly explain
ops$tkyte@ORA920.US.ORACLE.COM> select *
2 from t
3 where c1 = 55 and c2 = 32 and c3 = 21 and c4 = 34 and c5 = 121 and c6 = 6543;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T'
2 1 AND-EQUAL
3 2 INDEX (RANGE SCAN) OF 'C1_IDX' (NON-UNIQUE)
4 2 INDEX (RANGE SCAN) OF 'C2_IDX' (NON-UNIQUE)
5 2 INDEX (RANGE SCAN) OF 'C3_IDX' (NON-UNIQUE)
6 2 INDEX (RANGE SCAN) OF 'C4_IDX' (NON-UNIQUE)
7 2 INDEX (RANGE SCAN) OF 'C5_IDX' (NON-UNIQUE)
We'll merge upto 5 indexes together to get the answer.
Now, think about it -- should I drop c6_idx?
If you say yes, I say "no, no way". Why? because the most popular query in our system is
select * from t where c6 = 42313;
If you say no, I say "yes, sure, go ahead". Why? because our system never queries on c6 alone, in fact -- it is rare to use c6 in a predicate at all.
If you want to "tune" the above query, should you add a 7th index to the system? Could be.
ops$tkyte@ORA920.US.ORACLE.COM> create index c7_idx on t(c1,c2,c3,c4,c5,c6);
Index created.
ops$tkyte@ORA920.US.ORACLE.COM> set autotrace traceonly explain
ops$tkyte@ORA920.US.ORACLE.COM> select * from t where
2 c1 = 55 and c2 = 32 and c3 = 21 and c4 = 34 and c5 = 121 and c6 = 6543;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 INDEX (RANGE SCAN) OF 'C7_IDX' (NON-UNIQUE)
would definitely be faster then 5 index scans plus a merge.
<b>Here you are all seeing the genesis of a myth. Virtually all Oracle myths (you should have as few extents as possible, one being best for example) have a nugget of truth hiding in them somewhere.
Here the number 5, being the number we can merge. It has NOTHING WHATSOEVER to do with "how many indexes should a table have" -- nothing. But it has been perverted to mean that over time.
The right number = the number that is right for you. Somewhere between zero and inifinity will do</b>
I despise these rules of thumb, too
Keith, September 23, 2002 - 10:59 am UTC
One other point I like to make when people tell me that a table has too many indexes.
Would you rather a table have too many or too few indexes? I always error on the side of too many. Why, because a rogue query which is scanning a large table for a small amount of rows will always impact the performance of the system. If the number of concurrent "bad" queries equals the number of CPU's on your db server, everything gets bogged down, not just the resopnse time of the query.
September 23, 2002 - 1:12 pm UTC
There is no such thing as "too many" or "too few"
There is only one answer "just right".
Just right might be 100
Just right might be 1
It totally depends on your needs. You do not want to have indexes for which there is no requirements. You need to have all indexes for which there is a requirement.
A reader, October 23, 2002 - 8:07 pm UTC
Hi Tom,
Will an insert take more time if I have a composite index of two columns rather than one column?
Thanks
October 24, 2002 - 6:47 am UTC
It'll be a function of the size of the indexed columns -- the wider the key -- the longer it takes.
It is marginal.
A reader, October 24, 2002 - 12:29 pm UTC
Thanks for answering me Tom!
Let me see if I understand correctly, when you say "it is marginal" does it mean that difference between the time it will take to insert a row in a table with a compositive index of two columns and the time to insert in the same table but with an index of one column is so low that we can consider it insignificant?
Thanks
October 24, 2002 - 3:14 pm UTC
Depends on the size.
char(5)
vs
char(5) || char(255)
might not be as "marginal" but
char(5)
vs
char(5) || char(5)
marginal.
Interesting!
Jim, October 24, 2002 - 5:55 pm UTC
Very informative discussion Tom..
I think people for some reason people
believe indexes slow DML when in fact
indexes MAY slow down DML, but they MAY
also speed certain operations up
On equestion regarding design
Suhail, December 29, 2003 - 3:06 pm UTC
Tom,
We have a table which have several millions of rows almost 10 millions . It has 30 columns and out of 30 columns there are 25 indexes. This table consist of several sub tables ie just think of Sub type , we have 8 sub type on this table and if we enter data of one sub type then other 7 sub types data will not be entered , meaning those will be NULL. Now when we generate this table from Designer, it creates all the foreign keys links and indexes on all the sub type columns. So we have almost 25 indexs. This is very much similar to FACT table in DW environment whereas we are building an OLTP application. Let me know is this really a good design or should we break one table into multiple smaller tables.
When I argued with my project manager( He is DBA too) that if we break this table into multiple tables instead of one table nthen we woul dbetter model. He did not agree, he thinks that for reporting purpose this is best design, he thinks that reports will be slow due to join so he wants to have one big flat table. I donot agree, whats your opinion on this matter? Could you provide some compelling arguments if you think my argument would give a better result?
Thanks
Suhail
December 29, 2003 - 5:49 pm UTC
insufficient data to answer.
if reporting is "key, crucial, critical, the thing to design for" -- your coworker may have a point.
It seems "ok" to me -- i've done this before -- but not knowing all of the details, it is really hard to "say". the 25 indexes don't scare me since we only maintain the index when the key to be placed in there is NOT NULL so -- really, only the affected indexes for an insert/update will be bothered (not all 25).
so, you may both be right -- for different reasons, depends on the totality of the circumstances.
Indexs and sub types
Suhail, December 29, 2003 - 9:33 pm UTC
Tom I agree that only the effecetd inderxs would be bothered in case of index/update. But dont you think we are wasting lot of space here. If I enter data for one type , other types data are empty in that row so in a row we only enter 5 columns at a tiem and all other columns are going null. Whereas if tables are divided into seperate independednt tables then we will not mix apple with oranges.
I donot think its a good design at all specially in the OLTP environment. We have several complicated online screens along with reports so reports is not only the key factor, we got to have good online screens.
Thanks for your any comments.
December 30, 2003 - 9:40 am UTC
you will "waste" 25 bytes per row with 25 nulls -- not really too much considering.
My question to you is
are you really an OLTP environment with some casual reporting or are you a system where there is some data entry to support massive reporting and reporting is the key element.
The schema won't affect screen design at all, that I don't see the tie in between.
Tell me why you think this will be "bad" -- I'm not saying its good, but I don't see any "really bad things" here yet?
its OLTP environment
A reader, December 30, 2003 - 2:57 pm UTC
Well its OLTP environment, however we are suppose to load lot of data from batch, online screens are there to do the same job but its one at a time not in bulk. Well here is the design of the table
(it_id number, PK -- sequence
att_id number, --FK
bmp number not null,
emp number not null,
length number not null,
description varchar2(100),
att_value varchar2(100),
cty_code varchar2(3)--FK
Cty_date date , --FK
muni_id number, --FK
dscrptn varchar2(100),
eff_date date ,
reg_id number , --FK
org_id number , --FK
dvd_id number, --FK
tip_id number, --FK
dig_id number, --FK
lid_id number, --FK
tiv_id number, --FK
bltm number,
eltm number,
....
and so one , there are some more FKs and more columns.
For every PK it_id , at least all not null columns and att_id must be stored and one of other Fks. All other FKs are null and plus other columns may be null.
Does it not make more sence to have smaller table similar to one as follows:
(it_id number , --PK
att_id number not null, FK
bmp number not null,
emp number not null,
length number not null,
description varchar2(100),
reg_id number --FK
)
similary more tables like this for every FKs ie we will have 23 smaller tables in my case. It will be more normalised and better for OLTP environment. I had looked the reporting needs and none of our reports can be directly developed based on the above big table structure. I have to create several temp tables to pre process the data in order to run the reports.
We will be adding at least 5- millions rows on an yearly basis in this single table. Initial load is 10 Millions if we load into one tables , if we create several smaller tables then each table will have several thousands rows ( which is not so bad) .
Just for a report format here is one example
BMP EMP LENGTH Description Muni_id org_id dig_id...
1 5 4 Atlanta 123 34 190
Just imagine how complicated it would be to create a report like above , meaning I have to write a pl/sql Package to pre process the data into a temp table and then run the report from the temp table. Whereas , if we store the data into multiple smaller tables then we can just use view or simple join to create a report like above.
Thanks
December 30, 2003 - 3:19 pm UTC
what are those things fk's too? i don't see the other part out there?
FKs
A reader, December 30, 2003 - 3:53 pm UTC
FKs are foreign keys from different lookup tables.
December 30, 2003 - 4:53 pm UTC
right, i don't understand the model, i don't see the other tables -- tell you what, this isn't working very well here, getting way off the original topic and all.
I have no problems with one table in general.
I have problems if the design
a) cannot meet the required performance of the critical features (a simple prototype would show that)
b) cannot supply the most frequently requested, important data efficiently (see a, a simple prototype)
so, suggestion -- prototype both models and see what you see!
most exciting
deepak, February 25, 2005 - 12:00 am UTC
this was quite useful for me.i am reader and it gives me much more knowledge.
Excellent! the ultimate answer for antique PM's...
Shimon Tourgeman, February 28, 2005 - 3:38 am UTC
One of the most annoying dilemmas have found rest, at last...
May the force be with you, Tom...
Jess, March 27, 2017 - 12:04 pm UTC
Hi Tom,
We have a 'user' table of about 1M records that has your usual name, DOB, etc., as well as a system-generated unique id (not the samek as PK) that we share with 3rd party systems to identify a user. When each party sends us records, they send us their ID for each of our users. There are 10 sources, so we need to keep 10 columns. It seems we need an index for each of these because we need to query based on each source id.
Would it be more sensible to add these 10 into the 'user' table and add 10 indexes or to take our user unique id into a separate table and have 10 indexes there? I am leaning toward the latter, but stuck on deciding if each index should be a pair of (source_use_id, our_user_id) or if it'd be ok to only index source IDs and read ours from disk? Not really sure where the io/memory trade-off works here... What would you advise?
March 28, 2017 - 1:15 am UTC
If you stored the entries as:
("external system name", "their provided id", "reference to our id")
then would you not only need a single index ?
Because if an external system wants to query your data, then they'll provide their ID, and you will know where they came from.
Jess, March 30, 2017 - 5:31 pm UTC
Hi Connor,
You're totally right. I think I was having a major stupid moment...
We'll do exactly what you're saying (though we'll need 2 indexes, as other queries search by UID without knowing the source and then fetch the data from correct source data accordingly).
Thanks again!
March 31, 2017 - 4:43 am UTC
glad we could help