Skip to Main Content

Breadcrumb

Dev Live Dev Intro

We are celebrating Developers at AskTOM. We welcome Developers of all levels of experience to join us at our FREE Developer Live events coming in August and September. Just click on the left to register today! If you are brand new to Database Technology, then we also have got you covered. Just click on the right for your comprehensive FREE training program to kick start your Oracle Database Development journey!

Question and Answer

Tom Kyte

Thanks for the question, Hariharan.

Asked: October 20, 2009 - 8:00 am UTC

Answered by: Tom Kyte - Last updated: January 10, 2017 - 4:03 am UTC

Category: Database - Version: 10.02

Viewed 10K+ times! This question is

You Asked

Hi Tom,

Good Day

I was reading a book written by a popular author (I would prefer not to mention the name of the book and its author). He says:

In discussion on bitmap indexes, the following was written:

"Concatenated B*tree indexes were being used and the distinct values for each of these columns numbered less than 200. By replacing the b*tree indexes with bitmap indexes, the overall performance improvement of the system was incredible"...

Further the author also says...

Bitmap indexes do not update quickly, so the best approach is to only use them in database that update all rows in a nightly batch. In this fashion, the bitmap is dropped, changes applied to the table and the bitmap rebuilt.

In view of the above, can you please clarify the following points for us?

a) Do we create bitmap indexes when the distinct numbers are 200? Is it not high?
b) Why Bitmap indexes do not update quickly? Is it only for update or any DML activities?
c) I also heard many people telling me that number of bitmap indexes to be created on a table should be limited. Is this true?
d) Any new books on Oracle Database by you in near future?

Thanks for helping us

Hari

and we said...

a) 200 distinct values means the key would in general (on average) return 0.5% of the table.

A b*tree index would be great for that - a bitmap index would be great for that. Whether it was a 100,000 row table or a 100,000,000 row table.


However, I think there is missing information. I put the quote into google and it is clear where it comes from:

http://www.remote-dba.net/t_bitmap_index_tuning.htm

The entire quote was:

An ad-hoc query system was experiencing slow query performance. The system was read-only except for a 30-minute window at night for data loading. Inspection of the SQL showed complex combinational WHERE clauses like:

WHERE color=’BLU’ and make=’CHEVY’ and year=1997 and doors=2;

Concatenated B-tree indexes were being used and the distinct values for each of these columns numbered less than 200. By replacing the b-tree indexes with bitmap indexes, the overall performance improvement of the system was incredible.

 Queries that had taken as much as three seconds to run were reduced to runtimes of under one-tenth of a second. 


Now, if I had a concatenated index on (color,make,year,doors) - I don't really care if it was a b*tree or bitmap index - it would perform approximately the same.

If I had to just "count" things - both would be able to count via the index (without touching the table). The bitmap index might be smaller than the b*tree index - but not to such a degree that it would turn a 3 second query into a 1/10th of a second query.

If I had to access the table - both would perform *about the same*, the bitmap might even go a tad bit slower CPU wise since it has to convert bitmaps into rowids whereas the b*tree index would have the rowids already.

Wouldn't it be nice if the book had an example, one that showed the difference before and after? And then an explanation as to WHY it was so massively improved.



Here is how a bitmap index could be very useful. Think of an ad-hoc situation. You have a table with 20 attributes. All of them are searched on - but you never know which ones will be used. Maybe I like to query on C1, C5, C20 and you like to use C1..C5, C17. Maybe someone else likes to use another set. Maybe I use equals on C1, you use > on C1 (I would like C1 to be near the front of a b*tree index - you would like it to be near the end in the list of columns).

Now, you have these 20 attributes and the end users pick what they want to query against at runtime and they get to pick HOW they want to query against them (=, <, >, etc). I defy you to come up with a set of concatenated b*tree indexes that would be useful for each and every query that would benefit from an index.

Now, if I used single column bitmap indexes - having say 20 indexes, one each on each of the columns - we can put them together and create new indexes that never existed by ANDing and ORing them together.

Here is my write up on this


http://asktom.oracle.com/pls/asktom/z?p_url=ASKTOM%2Edownload_file%3Fp_file%3D6950060120505500493&p_cat=bitmap.pdf&p_company=822925097021874



b) See, the fact that you ask this means "things that talk about Oracle should explain why they say the things they say - if someone writes 'bitmap indexes do not update quickly' - they should qualify that.

Bitmaps in fact do update rather quickly. However, they also tend to SERIALIZE and do not respond well to slow by slow (row by row) updates. They are suitable for read only/read mostly data that is modified in BULK (not row by row)


They tend to serialize because a bitmap index by definition is an index such that a single key value in the index points to 10's or 100's or 1000's of rows in a table. If you had a bitmap index on the JOB column in the EMP table, and you issued

update emp set job = 'CLERK' where ename = 'SCOTT' -- scott is an analyst


you would lock an index key of ANALYST (to change a 1 to a 0, scott isn't an analyst anymore) and an index key of CLERK (to change a 0 to a 1...). However, that ANALYST key points to hundreds of other EMP records - they would be effectively locked, you could not (probably) insert another analyst right now as that key is locked. Same for the clerk key - you could not delete someone that was a clerk and you'd have a hard time updating the JOB column for clerks and analysts, and creating a new record with those values would block as well.



c) ask them why, ask them for the science. The right number of indexes on a table TOTALLY DEPENDS ON THE USE CASES.


d) nothing concrete right now, but lots of answers to your bitmap questions - so you can UNDERSTAND them, to see how they work, what they do and don't do (so you can decide for yourself WHEN and WHERE to use them - rather then "it was incredible once - maybe it'll be incredible again some day") is in Expert Oracle Database Architecture right now.

and you rated our response

  (44 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Bitmap Indexes

October 20, 2009 - 8:41 am UTC

Reviewer: Hariharan from Chennai, India

Brilliant Tom.

I actually do not want to specify the author name as I remember seeing some heated arguments exchanged on the "Precepts".

But you found out.

Good Explanation.

Thanks Again

Hmmmmm

October 20, 2009 - 9:55 am UTC

Reviewer: djb from UK

'I actually do not want to specify the author name'

Have to say, it was pretty obvious !

Don't worry, Hari

October 20, 2009 - 9:57 am UTC

Reviewer: Martijn Hoekstra

The author is currently far too busy over here: http://forums.oracle.com/forums/message.jspa?messageID=3838839#3838839 and here: http://forums.oracle.com/forums/message.jspa?messageID=3836101#3836101

;)

(very interesting and informing reads by the way...)

October 20, 2009 - 12:30 pm UTC

Reviewer: A reader

Hello Tom,

<qoute>

the bitmap might even go a tad bit slower CPU wise since it has to convert bitmaps into rowids
whereas the b*tree index would have the rowids already

<qoute>


Is it not that bitmap index in leaf block have start row-id and end row-id ? Am I correct ?

could please clarify your above comment


Thanks





Tom Kyte

Followup  

October 22, 2009 - 4:45 pm UTC

the bitmap index has a rowid range (start/stop) and a bitmap

in order to get the rowid for the i'th row in the bitmap, you must do some "math" on the start/stop rowids - perform some function f(start,stop,bitmap,i)

the b*tree just has the rowid there.

Attachment link

October 21, 2009 - 12:28 am UTC

Reviewer: Raajesh from India

Hi Tom,

I checked the PDF link that you have provided and got a doubt in that.

"If a session modifies the indexed data, then all of the rows that index entry points to are effectively locked in most cases. Oracle cannot lock an individual bit in a bitmap index entry; it locks the entire bitmap index entry"

Is this an implementation limitation or is there a better way to address this?

Can you explain the overhead of this individual bit locking if it is possible?

Thanks,
Raajesh
Tom Kyte

Followup  

October 22, 2009 - 4:47 pm UTC

It is pretty much in the definition of a bitmap index - in these structures, a single key points to hundreds or thousands of rows.

That is exactly what sets them apart from a b*tree or inverted list, or r*tree or any of the other indexing schemes.

They are all tools
They are all appropriate some time
They are all inappropriate other times
None are best
None are worst
None is better than the other

they are just *tools*

Understand how they work and you can use them safely.

Just like you would not use that table top saw without reading the directions (well, hope you wouldn't)

Excellent Explanation

October 21, 2009 - 1:47 am UTC

Reviewer: A reader

Excellent Explanation about bitmap indexes.

In an OLTP multi-user system with update, delete and insert you have better not implement Bitmap indexes at all in order to avoid deadlocks.

Houri Mohamed

Bitmap Indexes

November 16, 2009 - 7:28 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

Good Day.

We need some clarifications on the following scenario from you.

Database: Oracle 10gR2
OS: Solaris 10

There is a huge table called TRADES in one of the products in our company, which is partitioned on the basis of transaction_date. Almost around 300K records gets loaded into this table daily and around 80-90 million records are there as of now.

There are four boolean columns like ISCANCEL, ISACTIVE, ISTRADER and ISLOSS, which are often used in most of the queries.

My DBA and system architect recommending to create bitmap index on these four columns to improve performance.

Approximately 300 partitions are currently available in this table. Is it okay to create bitmap indexes on these columns?

If you need any other information, please let me know

Thanks

Hari


Tom Kyte

Followup  

November 23, 2009 - 10:01 am UTC

..
My DBA and system architect recommending to create bitmap index on these four
columns to improve performance.
.....

How do they anticipate it improving performance? Give me a for example "for example, we have this query..."

For you see, if there are only two values - it is likely that the bitmap index would retrieve MILLIONS of records from the table. You would need to use all four columns together - frequently - and have us bitmap OR and AND them together to find a small set of rows. I'm skeptical - so give me a for example...

When you say 300k records get loaded - is that a bulk load of 300k records or is this a transactional table.


Bitmap Indexes

November 19, 2009 - 9:50 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

Good Day.

I was reading a chapter written by Jonathan Lewis and found the following statement:

Updates to bitmapped columns, and general insertion/deletion of data can degrade the quality of the indexes quite dramatically.

In order to check this fact, I did the following:

With Bitmap Index

create table bitmapcheck as select rownum id, mod(rownum,10) bt_col, mod(rownum,10) bit_col, rpad('x',200) pad from all_objects;
Table created.
create index btindex on bitmapcheck(bt_col);
Index created.
create bitmap index bitindex on bitmapcheck(bit_col);
Index created.
intxqa > insert into bitmapcheck select * from bitmapcheck;
41291 rows created.
Execution Plan
----------------------------------------------------------
Plan hash value: 52221000
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time|
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 82490 | 11M| 559 (1)| 00:00:07|
| 1 | TABLE ACCESS FULL| BITMAPCHECK | 82490 | 11M| 559 (1)| 00:00:07|
---------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
716 recursive calls
98204 db block gets
4665 consistent gets
95 physical reads
22488228 redo size
555 bytes sent via SQL*Net to client
577 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
41291 rows processed

Without Bitmap Index

create table bitmapcheck as select rownum id, mod(rownum,10) bt_col, mod(rownum,10) bit_col, rpad('x',200) pad from all_objects;

Table created.

create index btindex on bitmapcheck(bt_col);
Index created.

intxqa > insert into bitmapcheck select * from bitmapcheck;
41293 rows created.
Execution Plan
----------------------------------------------------------
Plan hash value: 52221000
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time|
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 82490 | 11M| 559 (1)| 00:00:07|
| 1 | TABLE ACCESS FULL| BITMAPCHECK | 82490 | 11M| 559 (1)| 00:00:07|
---------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
670 recursive calls
97416 db block gets
4448 consistent gets
81 physical reads
21830272 redo size
562 bytes sent via SQL*Net to client
577 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
41293 rows processed

Pardon me for the improper formatting.

Now, when comparing the results except for a very few differences in Consistent Gets and Physical Reads, I do not see any major difference.

In what situation bitmap indexes affect the DML operations adversely?

Thanks

Hari
Tom Kyte

Followup  

November 23, 2009 - 3:37 pm UTC

Jonathan was talking about single row inserts, updates, and deletes. Operations done in bulk (direct path loads, large insert as selects) are fine with bitmap indexes

it is single row inserts/updates/deletes that kill them - test that out.



Bitmap Indexes

March 24, 2010 - 5:03 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

Good Day.

I have a peculiar scenario in my current company. We have an Oracle 10g database on Solaris 10 holding a product database. They have a table called Account with the following information:

Account_Number
Account_Name
Is_Error_Account
Is_Processed
Account_Networth
Is_Current
Is_NRI
.......
......

Now, in this table they have bitmap indexes on IS_ERROR_ACCOUNT, IS_PROCESSED, IS_CURRENT, IS_NRI

When the developer uses a query statement which does not involve the bitmap indexed columns in both select and update, the explain plan shows the usage of bitmap index as shown below.

select statement
SORT
BITMAP CONVERSION
BITMAP INDEX <Bitmap Index Name> Fast Full Scan

Even if we see the explain plan for the following query

select count(*) from account

Oracle uses the bitmap index to execute the query

Also, Oracle will not use the same bitmap index column everywhere, it randomly picks the bitmap index column.

I am very much confused and still not able to understand why Oracle chooses the bitmap indexed columns to execute the queries if it is not used.

Can you provide me an insight and solution for this?

Thanks

Hari
Tom Kyte

Followup  

March 26, 2010 - 10:53 am UTC

... Even if we see the explain plan for the following query

select count(*) from account

Oracle uses the bitmap index to execute the query
...

umm - perfect - why would you want it to do anything else??!??!?!!?




so, give us an example of a query that should NOT use an index that you see it using.


I would much rather Oracle used a small bitmap index to count rows in a table, rather then full scan the table - wouldn't you???

Bitmap Indexes

March 29, 2010 - 12:48 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

Greetings.

Thanks for your explanation. But I am still not able to provide an answer for the question asked by the developers. I will explain this with the real situation:

update <table A> set (columnA1, ColumnA2)=(select columnB1, Column B2 from <Table B> where <Column B3>=<value> and <Column B4>=<Value>) amd <Column B4>=<Value> and table b.column A3=tableb.column B5)

Now, Table A does not contain any bitmap indexes and Table B contain bitmap indexes but not used in the where clause of the select statement. Also, when we make these bitmap indexes to unused, the above update statement takes less than 2 minutes to execute and if we do not make it to unused it takes couple of hours to execute.

Also when I saw the explain plan, the usage of these bitmap indexes are taking either FULL SCAN/FAST FULL SCAN and cost is comparatively higher.

I would like to know the following:

a) Why is this taking more time?
b) I am not able to understand when you said that the usage of bitmap index (when it is not used in the where clause of a select statement) is nothing wrong. Can you please tell me the reason behind it?

Thanks for your help

Hari
Tom Kyte

Followup  

April 05, 2010 - 9:16 am UTC

(a) more time than what? You give us *nothing* to work with here.

Probably the answer is "two minutes is extremely fast". But we don't know, because you have given so little data - so little information here to work with.

an update of the form:

updtae a set (c1,c2) = (select x1, x2 from b where ....)

will FULL SCAN A (hopefully, you are updating EVERY SINGLE ROW in A, all of them, every single one).

and hopefully, it would full scan B and do a semi join, since you are hitting SO MUCH OF A, you will be hitting MOST ALL OF B as well in general. So, I would hope for full scans all over the place and would be very upset if it used any indexes.

read:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6749454952894#6760861174154

full scans are AWESOME

(b) read my answer to (a) again

Bimap Indexes

April 15, 2010 - 3:43 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

Database: Oracle 10g
OS: Solaris
Type of Job: Batch Job

In Sequel to my previous post, I have some data in my hands which I got it from explain plan and need your expert explanation on this.

The update statement is:

update stg_extend_account s set (account_number,liquid_net_worth,INVESTMENT_OBJ_CODE)=
(select /*+ index (a idx_account_acct_num_fn)*/ a.acct_num, case when s.LIQUID_NET_WORTH > 99999999999 then 5000001 else s.liquid_net_worth end ,
decode(s.INVESTMENT_OBJ_CODE,'%N/A%',substr(IBD,1,3)||' -UNDC',null,substr(IBD,1,3)||' -UNDC',substr(IBD,1,3)||' -'||substr(s.INVESTMENT_OBJ_CODE,1,4))
from act_dev.account a,party p where substr(a.acct_num,1,length(a.acct_num)-9)=s.account_number and
p.party_key=a.primary_party_key and p.ssn=s.ssn) where processed='A'

(Note: Sorry about the usage of Index Hint. I keep telling them not to use hints, but nobody is listening to me :) :) )

Now the explain plan for the above statement is:

OPERATION OBJECT_NAME OPTIONS COST
Update Statement 23
UPDATE STG_EXTEND_ACCOUNT
TABLE ACCESS STG_EXTEND_ACCOUNT FULL 23
FILTER PREDICATES
PROCESSED='A'
HASH JOIN 2507
ACCESS PREDICATES
P.PARTY_KEY=A.PRIMARY_PARTY_KEY
TABLE ACCESS PARTY FULL 381
FILTER PREDICATES
P.SSN=:B1
TABLE ACCESS ACCOUNT FULL 2125
FILTER PREDICATES
SUBSTR(A.ACCT_NUM,1,LENGTH(A.ACC....)

Now the above plan is generated without any bitmap indexes on account table.

If we create the bitmap indexes for one of the tables ACCOUNT and re-execute the above query, the plan changes and it is provided below:

OPERATION OBJECT_NAME OPTIONS COST
Update Statement 23
UPDATE STG_EXTEND_ACCOUNT
TABLE ACCESS STG_EXTEND_ACCOUNT FULL 23
FILTER PREDICATES
PROCESSED='A'
TABLE ACCESS ACCOUNT BY INDEX ROWID 23624933
FILTER PREDICATES
AND
SUBSTR(A.ACCT_NUM,1,LENGTH(A.ACC....)
P.PARTY_KEY=A.PRIMARY_PARTY_KEY
NESTED LOOPS 23624933
TABLE ACCESS PARTY FULL 381
FILTER PREDICATES
P.SSN=:B1
BITMAP CONVERSION TO ROWIDS
BITMAP INDEX BI_IDX_ACCOUNT_IS_ERR_ACT FULL SCAN

Now, my questions:

a) Why the cost is very high after the bitmap index is created?
b) The bitmap index is not used in the update statement. I would like to know why the whole plan itself is changed?
c) Please provide me a detailed explanation on why this happens

Thanks for your help

Hari
Tom Kyte

Followup  

April 15, 2010 - 8:26 am UTC

a) I see a cost of 23 for both, what did you see?

b) I see a bitmap index in the send plan??

c) I cannot explain since we seem to be looking at different stuff.

Bitmap Indexes

April 15, 2010 - 8:43 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

In the first plan, the table access for ACCOUNT table is

TABLE ACCESS ACCOUNT FULL 2125
FILTER PREDICATES
SUBSTR(A.ACCT_NUM,1,LENGTH(A.ACC....)

Where as in the second plan, the table access for the ACCOUNT table is

TABLE ACCESS ACCOUNT BY INDEX ROWID 23624933
FILTER PREDICATES
AND
SUBSTR(A.ACCT_NUM,1,LENGTH(A.ACC....)
P.PARTY_KEY=A.PRIMARY_PARTY_KEY

Also, there is a new operation with huge cost (23624933) in the second plan which is not seen in my first plan:

NESTED LOOPS 23624933
TABLE ACCESS PARTY FULL 381
FILTER PREDICATES
P.SSN=:B1

The increase in this cost is noticed after creating the bitmap indexes in the ACCOUNT table and that is what we are actually getting worried and confused and not able to justify why there is a increase in cost

Am I clear on my statements here?

Thanks

Hari
Tom Kyte

Followup  

April 15, 2010 - 9:02 am UTC

the cost is just a number, different inputs, different costs - tell us, what are the OBSERVED RESULTs.


Bimap Indexes

April 15, 2010 - 9:18 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

The following are the observed results:

a) Execution Time

The update statement without bitmap indexes take 7 minutes to execute.
The update statement with bitmap indexes take almost more than 3 hrs to execute

b) Number of records

Number of records remains the same

Thanks

Hari
Tom Kyte

Followup  

April 15, 2010 - 9:52 am UTC

a) that is likely not due to the bitmap slowing down the query, but the bitmap slowing down the actual update.

Meaning, even if we didn't use the index, it would be slow - just because it exists.

I was suspicious of the bitmap index in the first place, why would you add an index that is so expensive to maintain for a query that updates the data the index is on?

b) of course - I would certainly hope so.

Unusual Usage of Bitmap Indexes

July 27, 2010 - 6:23 am UTC

Reviewer: Hari from Chennai, India

Hi Tom,

Good Day.

I was experimenting the results on index usage to demonstrate the same to my team members, when I found something which I could not understand. Can you please explain?

I created a table and 3 indexes as follows:

create table generalworks (GENNO NUMBER(7), GENNAME VARCHAR2(40), GENCATG VARCHAR2(2), GENDESC VARCHAR2(3), GENDETS VARCHAR2(6))

create index GENERALWORKSGENCATGIDX on generalworks(gencatg);
create index GENERALWORKSGENNMEIDX on generalworks(genname);
create index GENERALWORKSGENIDX on generalworks(genno);

Now I inserted 50K records into the table. When I issue a select statement as follows, I am getting the BITMAP Conversion execution step as well:

SQL> select * from generalworks where genname='aaaabbbb' and gencatg='ab';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4242548220
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                        |   298 |  8046 |    99   (2)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | GENERALWORKS           |   298 |  8046 |    99   (2)| 00:00:02 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |                        |       |     |            |          |
|   3 |    BITMAP AND                    |                        |       |     |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS|                        |       |    |            |          |
|*  5 |      INDEX RANGE SCAN            | GENERALWORKSGENCATGIDX |  3860 |    |    12   (0)| 00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|                        |       |    |            |          |
|*  7 |      INDEX RANGE SCAN            | GENERALWORKSGENNMEIDX  |  3860 |    |    16   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("GENCATG"='ab')
   7 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        571  consistent gets
          0  physical reads
          0  redo size
     160080  bytes sent via SQL*Net to client
       3197  bytes received via SQL*Net from client
        258  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3847  rows processed

a) Why is the extra step of BITMAP CONVERSION TO ROWIDS and FROM ROWIDs are?
b) Will this cause any issues in the execution step?
c) I have created only Normal Index. Why is the BITMAP operation is happening?
d) Please note that this BITMAP conversion occurs only when I use GENCATG column, otherwise No. (i.e.) the following Select does not involve BITMAP CONVERSION

SQL> select * from generalworks where genno > 10000 and genname='aaaabbbb';

3077 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1336713796
-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |  3090 | 83430 |245   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| GENERALWORKS          |  3090 | 83430 |245   (1)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | GENERALWORKSGENNMEIDX |  3860 |       | 16   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("GENNO">10000)
   2 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        685  consistent gets
          0  physical reads
          0  redo size
      90912  bytes sent via SQL*Net to client
       2636  bytes received via SQL*Net from client
        207  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3077  rows processed

Last question, I inserted around 200K records into this table. However, when I executed the above select statements, it never used the index and always went for FULL TABLE SCAN. Any specific reason?

Thanks

Hari

Tom Kyte

Followup  

July 27, 2010 - 12:32 pm UTC

a) we decided to make a bitmap index on the fly. We converted rowids into 0's and 1's (a bitmap) did the operation (bitmap and) and converted the 0's and 1's back into rowids to find the rows again.

b) other than hopefully getting the data back faster than any other approach would provide - no.

c) because we can do lots of things under the covers, create temporary tables, cartesian joins, hash joins, sort merges, index skip scans, etc etc etc

d) it might, if your data was generated differently. It all has to do with the expected "rows". Look at your last plan there - step 2 get 3,860 rows via the index (where genname='aaaabbbb'). Step 1, apply filter after step 2 took place, reduce the set from 3,860 to 3,090. It decided the reduction of less than 800 rows could be done more efficiently by going to the single index and then the table rather than two indexes - converting to bitmaps, anding, converting to rowids, getting rows from table. that is all




last answer: because, based on the nature of your data, that is what it deemed was the most efficient way to get it. Indexes are useful to get a small number of rows from a big thing, when you get a (relatively) large number of rows from a big thing - they stop being as useful.

Bitmap Indexes

August 18, 2010 - 9:37 am UTC

Reviewer: Hari from India

Hi Tom,

Thanks for your excellent and expert reply for my previous post. I am still not able to find the answer for the following. Can you please help me?

From my previous post:

SQL> select * from generalworks where genname='aaaabbbb' and gencatg='ab';

3847 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4242548220
----------------------------------------------------------------------------------------------------
-------
| Id  | Operation                        | Name                   | Rows  | Bytes | Cost (%CPU)| 
Time     |
----------------------------------------------------------------------------------------------------
-------
|   0 | SELECT STATEMENT                 |                        |   298 |  8046 |    99   (2)| 
00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | GENERALWORKS           |   298 |  8046 |    99   (2)| 
00:00:02 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |                        |       |        |            |   
       |
|   3 |    BITMAP AND                    |                        |       |        |            |   
       |
|   4 |     BITMAP CONVERSION FROM ROWIDS|                        |       |          |            | 
         |
|*  5 |      INDEX RANGE SCAN            | GENERALWORKSGENCATGIDX |  3860 |          |    12   (0)| 
00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|                        |       |          |            | 
         |
|*  7 |      INDEX RANGE SCAN            | GENERALWORKSGENNMEIDX  |  3860 |          |    16   (0)| 
00:00:01 |
----------------------------------------------------------------------------------------------------
-------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("GENCATG"='ab')
   7 - access("GENNAME"='aaaabbbb')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        571  consistent gets
          0  physical reads
          0  redo size
     160080  bytes sent via SQL*Net to client
       3197  bytes received via SQL*Net from client
        258  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3847  rows processed

There are 3847 rows selected from the table for this particular query. However, in the execution plan, the ROWS column for STEP 0 and STEP 1 shows as 298. Can you please tell me the reason? 

The tables and indexes are analyzed using DBMS_STATS.

Thanks

Hari

Tom Kyte

Followup  

August 19, 2010 - 1:53 am UTC

because an execution plan is a ESTIMATE and will rarely if ever have exactly the same precise numbers as reality.

Bitmap Indexes

August 19, 2010 - 6:27 am UTC

Reviewer: Hari from India

Hi Tom,

Thanks for your response.

Sorry for the following questions. Could not convince myself:

a) There can be some difference between the reality and estimate. But not too much. Is this because of some other issue?
b) The CLUSTERING_FACTOR for this index is 264. Will there be any connection between the cost and clustering_factor?
c) When we analyze table, will Index also gets analyzed?

Thanks

Hari
Tom Kyte

Followup  

August 19, 2010 - 3:06 pm UTC

a) it can be widely different and sometimes 'fixable' and sometimes 'not'. The predicate being evaluated:

where genname='aaaabbbb' and gencatg='ab';


has access to some basic information - for example, it might know that 'genname=aaaabbbb' returns about 25% of the table. further, it might know that gencatg=ab returns 25% of the table. It would assume that the two together would return 25%*25% of the table (1/16th of the table).

In 11g, you could gather extended statistics on genname AND gencatg together in order to get a better estimate. But you might not need to - the estimate it currently gets might typically be better than good enough. But - the process to get extended statistics exists (in 11g and above) and is yet another tool you can use.


b) bitmap indexes treat the clustering factor differently than b*tree indexes, but yes, it will affect the costing. All statistics will affect the costing.

search for

bitmap clustering lewis

on google, Jonathan Lewis has written about bitmaps and clustering factors many times - he explains it well.


c) do not use analyze, use dbms_stats to gather statistics. And if you want, you would use cascade=>true to cascade the statistics gathering from the table down to the indexes.

what is bulk update?

August 25, 2010 - 2:42 am UTC

Reviewer: jian huang zheng from china

Hi Tom,

You answered:They are suitable for read only/read mostly data that is modified in BULK (not row by row).

I thought update always is row by row process, since you have to find each row to update,
can you give an example of bulk update?

Thanks,
Tom Kyte

Followup  

August 26, 2010 - 11:56 am UTC


for example, instead of:

for x in (select * from new_updates)
loop
   update existing_data
      set columns = values
    where existing_data.key = X.key;
end loop;



which is "slow by slow", really inefficient, you would:

update (select ed.columns, nu.values
          from existing_data ed, new_updates nu
         where ed.key = nu.key)
set columns = values;

which is "bulk updating"


or - at the very least, you would:


open c for select * from new_values;
loop
   fetch bulk collect c into your_array limit 100;
   .... whatever processing you do ...
   forall i in 1 .. c.count
      update existing_data set columns = your_array(i);
   exit when c%notfound;
end loop;
close c;



I would prefer the single update if at all possible, the bulk collect and forall i processing is a far down the path second choice, the slow by slow is something I would not consider.

re: what is bulk?

August 25, 2010 - 12:47 pm UTC

Reviewer: Duke Ganote from Amelia, OH USA

Thanks,

August 26, 2010 - 7:57 pm UTC

Reviewer: jian huang zheng from china

Thanks,

To continue with Zheng`s question

January 10, 2011 - 10:06 am UTC

Reviewer: phoenixbai from China

quote:
I thought update always is row by row process, since you have to find each row to update,
can you give an example of bulk update?

------
I have a similar question as above:

either the sql is written in 'row by row' format or 'bulk updating', I always thought that the processes that perform the actual update, would do the update row by row, in serialized way.

But now it seems not.

So, could you help explain how the bulk update is performed internally? like how does it manage to do the update faster than the 'row by row'?

Also, is the 'row by row' update performed in serialized way?

Note: I am only talking about the one update statement in one transaction.

Thanks

Tom Kyte

Followup  

January 10, 2011 - 10:31 am UTC

pretend you have a stack of paper (rows)

pretend someone else wants that stack of paper (updates the rows)

You could

a) row by row it. They would ask for page 1, you would give them page 1. They would ask for page 2, you would give them page 2. And so on and so on and so on.


b) bulk update it. They would ask for pages 1..100. You would give them pages 1..100. They would ask for pages 101..200, You would give them pages 101..200 and so on and so on and so on.

c) do it in a single efficient sql statement. They would ask for the stack of paper. You would give it to them.




So, instead of:
for x in (select a,b,c from t)
loop
   update t set b = ..., c = ... where a = x.a;
end

which would be slow by slow...

and instead of:

open c;
loop
    fetch c bulk collect into some_arrays limit 100;
    for i in 1..some_array.count
    loop
         array(i).b = ...;
         array(i).c = ...;
    end loop;
    forall i in 1.. some_array.count 
         update t set b=array(i).b, c = array(i).c
         where a = array(i).a;
    exit when c%notfound;
end loop
close c;


which would be a clumsy bulk update - better than row by row but not better than:

update t set b = ..., c = ...;


and be done with it.

Bitmap index takes more sort than index size

June 28, 2011 - 2:46 am UTC

Reviewer: goiyala3 from India

I have a query which is using 2 indexes both are bitmap indexes (sizes are 37 and 24 Mbs) and table size is 17gb. While i ran the following querywhich can very well get the index itself, it takes around 6-8 minutes and using pga around 3 gb.
could you please explain me why or is it accessing the table ?

SQL_ID 5z0a50a2yzdky, child number 0
-------------------------------------
select count(1) from (select distinct SRC_SYS,PROD_KEY from dds.REV_F)
Plan hash value: 867747470
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 221K(100)| | | |
| 1 | SORT AGGREGATE | | 1 | | | | | |
| 2 | VIEW | | 24533 | | 221K (6)| 00:44:22 | | |
| 3 | HASH UNIQUE | | 24533 | 479K| 221K (6)| 00:44:22 | | |
| 4 | VIEW | index$_join$_002 | 63M| 1209M| 212K (2)| 00:42:28 | | |
|* 5 | HASH JOIN | | | | | | | |
| 6 | PARTITION LIST ALL | | 63M| 1209M| 3591 (1)| 00:00:44 | 1 | 145 |
| 7 | BITMAP CONVERSION TO ROWIDS| | 63M| 1209M| 3591 (1)| 00:00:44 | | |
| 8 | BITMAP INDEX FULL SCAN | REV_F_IDX1 | | | | | 1 | 145 |
| 9 | PARTITION LIST ALL | | 63M| 1209M| 13724 (1)| 00:02:45 | 1 | 145 |
| 10 | BITMAP CONVERSION TO ROWIDS| | 63M| 1209M| 13724 (1)| 00:02:45 | | |
| 11 | BITMAP INDEX FULL SCAN | REV_F_IDX5 | | | | | 1 | 145 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access(ROWID=ROWID)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 610.89 1464.86 707459 17090 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 610.90 1464.87 707459 17090 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=17090 pr=707459 pw=446115 time=1464867976 us)
26066 VIEW (cr=17090 pr=707459 pw=446115 time=1464795748 us)
26066 HASH UNIQUE (cr=17090 pr=707459 pw=446115 time=1464769678 us)
63422824 VIEW index$_join$_002 (cr=17090 pr=707459 pw=446115 time=1084846545 us)
63422824 HASH JOIN (cr=17090 pr=707459 pw=446115 time=958000889 us)
63422824 PARTITION LIST ALL PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=63423134 us)
63422824 BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)
7112 BITMAP INDEX FULL SCAN REV_F_IDX1 PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=155525 us)(object id 366074)
63422824 PARTITION LIST ALL PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=190268771 us)
63422824 BITMAP CONVERSION TO ROWIDS (cr=13529 pr=8864 pw=0 time=63553723 us)
432700 BITMAP INDEX FULL SCAN REV_F_IDX5 PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=3157351 us)(object id 366658)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
direct path write temp 29741 1.62 107.05
db file sequential read 8864 0.20 2.35
direct path read temp 46573 0.79 211.02
SQL*Net message from client 2 29.22 29.22

Tom Kyte

Followup  

June 28, 2011 - 11:54 am UTC

look at the plan - you have hash joins feeding into hash joins feeding into sort steps. There are multiple intermediate results to content with here.


I wish you would have used the CODE button so we could actually read it :(


Bitmap indexes join

July 15, 2011 - 7:14 am UTC

Reviewer: goiyala3

Thanks tom. here is formatted plan

SQL_ID  5z0a50a2yzdky, child number 0
-------------------------------------
select count(1) from (select distinct SRC_SYS,PROD_KEY from dds.REV_F)
 
Plan hash value: 867747470
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                   |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                            |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                    |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     VIEW                          | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  5 |      HASH JOIN                    |                      |       |       |            |          |       |       |
|   6 |       PARTITION LIST ALL          |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|   7 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   8 |         BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|   9 |       PARTITION LIST ALL          |                      |    63M|  1209M| 13724   (1)| 00:02:45 |     1 |   145 |
|  10 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |         BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
--------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access(ROWID=ROWID)
 
28 rows selected.
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2    610.89    1464.86     707459      17090          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    610.90    1464.87     707459      17090          0           1
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
 
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=17090 pr=707459 pw=446115 time=1464867976 us)
  26066   VIEW  (cr=17090 pr=707459 pw=446115 time=1464795748 us)
  26066    HASH UNIQUE (cr=17090 pr=707459 pw=446115 time=1464769678 us)
63422824     VIEW  index$_join$_002 (cr=17090 pr=707459 pw=446115 time=1084846545 us)
63422824      HASH JOIN  (cr=17090 pr=707459 pw=446115 time=958000889 us)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=63423134 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)
   7112         BITMAP INDEX FULL SCAN REV_F_IDX1 PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=155525 us)(object id 366074)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=190268771 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=13529 pr=8864 pw=0 time=63553723 us)
 432700         BITMAP INDEX FULL SCAN REV_F_IDX5 PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=3157351 us)(object id 366658)
 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  direct path write temp                      29741        1.62        107.05
  db file sequential read                      8864        0.20          2.35
  direct path read temp                       46573        0.79        211.02
  SQL*Net message from client                     2       29.22         29.22


bitmap indexes (sizes are 37 and 24 Mbs) join is taking around 3gb sort/hash and runs slower than the full table scan.

In general, Document says Datawarehouse environment will benefit with bitmap single column indexes, which helps in bitmap index joins. If all the bitmap index joins take this much sort/hash, then it is obviously slower than the full table scan.

is this plan visiting the table? Please clarify
Tom Kyte

Followup  

July 18, 2011 - 9:12 am UTC

but I already answered this:

look at the plan - you have hash joins feeding into hash joins feeding into sort steps. There are multiple intermediate results to content with here.


you look like you have 63,000,000 rows in one of your intermediate results there, that accounts for your pga usage and such.



Your temp space looks like it might be on "flaky" disks - a max write time of well over a second? A read time of 0.79? That's pretty bad.

March 21, 2012 - 1:37 pm UTC

Reviewer: Jess from USA

Hi Tom,

We recently had an issue with a bitmap index on a table. It has a few b-tree indexes and one bitmap. The bitmap indexed column has 3 values ('A' for about 90% of the rows, 'B' for about 8% and 'C' for 2%).

There are never updates to this table, only inserts. This is OLTP, so transactions are inserted one at a time by a number of parallel threads. (Deletes are possible, but the code that does it isn't currently being run though will be shortly.)

The size of the table is meant to be about 1-2 M records by design. The index doesn't appear to have caused any problems, certainly no deadlocks or timeouts.

Recently the table grew to 10M rows. One day (have we hit some capacity point?), we just started getting 'distributed txn waiting for lock' and 'deadlock detected' errors on insert with massive row lock waits. The index was dropped temporarily to allow in-flight records to continue processing.

The bitmap index has been called out as the cause. I am struggling to understand whether this really is the primary cause and, if so, why the problem hit so suddenly (the table growth was gradual).

We're planing to remove unnecessary records from the table and take it back down to 1-2 M. It's been recommended that we recreate the index as b-tree at that time.

What do you make of this?

Thanks as always.
Tom Kyte

Followup  

March 21, 2012 - 10:52 pm UTC

This is OLTP,

i read that and here is my answer before reading anything else:

DROP THE BITMAP INDEX RIGHT NOW. bitmapped indexes work in a read mostly, reporting/warehouse environment. They absolutely DO NOT WORK in OLTP - period, nothing will change that - not ever.

and bitmaps are not just for "low cardinality data", b*trees are too. If you were wanting to select out B or C values - a b*tree index on this column would be ENTIRELY appropriate. If you are going at the A values - you do not want an index.


you do not want the bitmap. the size of the table doesn't matter - it wasn't the size, it was that you are doing more and more concurrent transactions now and it is killing you - it was always negatively impacting you - but you didn't care about it before, it was "small enough of a hit". now it isn't.

you cannot have a bitmap index in this situation, plain and simple.

March 22, 2012 - 4:49 am UTC

Reviewer: A reader

Thanks Tom!
There are a few bitmaps sprinkled around in that database, so sounds like it might be worthwhile to go through the entire lot and see about changing the rest of them as well.

March 22, 2012 - 5:25 am UTC

Reviewer: Jess from USA

Having scanned other bitmap indexes this one popped out as an interesting one: A 'customer_ids' table of ~5M (growing maybe 1M a year). It seems it started out with a unique b-tree index on customer_id and a non-unique b-tree index on customer_type.

Customer_type has about 6 different values (4 are evenly spread across most of the values, with 2 small outliers).

It seems that at some point last year the indexes were changed. The unique customer_id pk index was changed to a unique b-tree reverse key index on combination of customer_id and customer_type. The customer_type index was not dropped but, rather, re-created as a bitmap index... There is no documentation for why this change was done.

Most of the queries selecting from this table do select [all] customer_id where customer_type = 'X'.

Based on your earlier advice, the bitmap index shouldn't be there, but is it actually necessary to retain a stand-alone customer_type index at all (as b-tree)? It doesn't seem like it's necessary at all.

Thank you again for all your help.


Tom Kyte

Followup  

March 22, 2012 - 9:04 am UTC

An index on (customer,customer_id) might make sense.

that index could be used INSTEAD OF THE TABLE to answer that query.

March 22, 2012 - 9:40 am UTC

Reviewer: Jess from USA

Thanks Tom. Sounds like the bitmap can go without being replaced by anything, as the two columns are already indexed.
Tom Kyte

Followup  

March 22, 2012 - 10:35 am UTC

are they indexed together? in a single index? with the type first?

How to test?

March 23, 2012 - 9:30 am UTC

Reviewer: Micheal Kutz from Greensboro, NC

Tom,
Going back to the original question: in part of your answer, you mentioned that bitmap index could be very useful for ad-hoc queries.

Your example had 20 attributes.

How in the world would I write a test script to (dis)prove that using bitmap index would be faster for more ad-hoc queries then if I used b*tree?
I understand the logic behind "why bitmap would be faster then b*tree [under certain circumstance]", but how do I know if it will work for MY data set?
If I limit the type of queries on each column to =,<,>,"do not query", (using your example) that gives me 4^20 (~1 trillion) combinations!
Do I need to test ALL combinations? Do I need to use the exact query for both methods?

Thanks,

MK
Tom Kyte

Followup  

March 24, 2012 - 10:12 am UTC

How in the world would I write a test script to (dis)prove that using bitmap
index would be faster for more ad-hoc queries then if I used b*tree?


using a keyboard perhaps? I don't know what you mean here.

You have a theory, a hypothesis - "using bitmap index would be faster for more ad-hoc queries than if I used b*tree"

Now, set out your parameters - such as

a) user can "where" on any combination of the 20 attributes - they can pick any, all, some, none of 20 of them and use any predicate they like.

b) Now, come up with a bitmap indexing schema for that scenario (probably 20 single column bitmap indexes or less) - if you have some columns that are horribly non-selective and not skewed (so they would always return hundreds of thousands or millions of rows) you might skip them. Maybe

c) then - do the impossible. Come up with an indexing scheme using b*trees to satisfy the same



Do I need to test ALL combinations?

of course not, you have actually already proven out a major portion of your hypothesis and as far as I'm concerned - you are done.

20 bitmaps
~1 trillion b*trees

which would be better?

Confustion about bitmap index

March 28, 2012 - 1:51 am UTC

Reviewer: A reader

Hi Tom,

Per your response below, i am a little bit confused

'Jonathan was talking about single row inserts, updates, and deletes. Operations done in bulk (direct path loads, large insert as selects) are fine with bitmap indexes
it is single row inserts/updates/deletes that kill them - test that out. '

I am thinking this should be same for B*tree index or even table without index right?

I know your point is that bitmap index has lock issue, but my though is even i do insert/update/delete individually,
after the insert/update/delete statement is done, the lock gets disappeared, so how comes there is issue?

Are you talking about insert/update/delete in different session?

Tom Kyte

Followup  

March 28, 2012 - 8:59 am UTC

I am thinking this should be same for B*tree index or even table without index
right?


no, you are not.

It is true that bulk (direct path) operations will handle b*tree modifications more efficiently than single row (conventional path) operations - but not anything on the order of a bitmap index.


In a bitmap index - a key may cover many rows. If you insert a row at a time, the amount of rows it covers "shrinks" and you end up with the same key in the bitmap index (covering small sets of rows) over and over - in the worst case, you get almost a "key per row" - in effect a b*tree index! This is because of the way bitmaps are maintained - the more rows you through at us at a time, the larger the bitmap will be per key in general. The less rows you give us at a time, the less rows covered by each key.


In a b*tree index - there is a one to one relationship between key and rows. It doesn't matter if you give us one row or one million rows - there will still be a key entry per row

How to test? (part 2)

March 28, 2012 - 10:57 am UTC

Reviewer: Micheal Kutz from Greensboro, NC

Tom,
In an attempt to clearify my previous post:
I'm actually trying to test 20 B*trees vs 20 bitmaps with my data by developing a benchmark to answer "Will bitmaps actually help out?"
The concern was really: "How do I test a really large number of different WHERE clauses"?
For that question, I've decided to use a procedure (a la: dynamic SQL) for the searching part and a table of inputs to simulate the multiple ad-hoc queries.

The question I now have is in regards to the processing of the SYS_REFCURSOR durring the benchmark run.
(see run_benchmark() procedure at the bottom)

For accurate testing, do I need to do anything with the the SYS_REFCURSOR or will a simple "open-close" (as currently coded) be suffice?

For simplification, my example code is using only three(3) attributes.

Thanks,

MK

create table t_x (
t_pk int primary key
,att_1 number
,att_2 number
,att_3 number
);

REM copy real data into t_x

REM add 3 bitmap indexes to t_x

REM analyze t_x

REM repeat for T_Y but use 3 b*tree indexes instead

create or replace procedure search_T( p_results in out sys_refcursor
,p_1 in number ,p_1_type in varchar2
,p_2 in number ,p_2_type in varchar2
,p_3 in number ,p_3_type in varchar2
,p_table in varchar2) /* p_table determines which table to search */
as
l_sql varchar2(1024);
begin
/* verify input */
if not ( (p_1_type is null or p_1_type in ('<','=','>') )
and (p_2_type is null or p_2_type in ('<','=','>') )
and (p_3_type is null or p_3_type in ('<','=','>') ) )
then
raise_application_error(-20000,'Bad search terms');
end if;

/* build initial SQL statement */
if upper(p_table) = 'X' then
l_sql := 'select * from T_X where 1=1';
else
l_sql := 'select * from T_Y where 1=1';
end if;

/* add portion for att_1 */
if p_1 is null or p_1_type is null
then
l_sql := l_sql || ' and (1=1 or :p1 is null)';
else
l_sql := l_sql || ' and ( att_1 ' || p_1_type || ' :p1 )';
end if;

/* add portion for att_2 */
if p_2 is null or p_2_type is null
then
l_sql := l_sql || ' and (1=1 or :p2 is null)';
else
l_sql := l_sql || ' and ( att_2 ' || p_2_type || ' :p2 )';
end if;

/* add portion for att_3 */
if p_3 is null or p_3_type is null
then
l_sql := l_sql || ' and (1=1 or :p3 is null)';
else
l_sql := l_sql || ' and ( att_3 ' || p_3_type || ' :p3 )';
end if;

/* open cursor */
open p_results for l_sql using p_1,p_2,p_3;
end;
/

create table table_of_searches (
p1 number
,p_1_type varchar2(1) -- check (p1_type in ('<','=','>'))
,p2 number
,p_2_type varchar2(1)
,p3 number
,p_3_type varchar2(1)
);

REM fill in table_of_searches

create or replace procedure run_benchmark
as
l_result sys_refcursor;
begin
runstats_pkg.rs_start;

/* search table T_X */
for curr1 in ( select * from table_of_searches )
loop
search_T( l_result
,curr1.p1, curr1.p_1_type
,curr1.p2, curr1.p_2_type
,curr1.p3, curr1.p_3_type
,'X' );
-- QUESTION What else needs to be done?
close l_result;
end loop;

runstats_pkg.rs_middle;

/* search table T_Y */
for curr2 in ( select * from table_of_searches )
loop
search_T( l_result
,curr2.p1, curr2.p_1_type
,curr2.p2, curr2.p_2_type
,curr2.p3, curr2.p_3_type
,'Y' );
-- do the same here as for curr1
close l_result;
end loop;

runstats_pkg.rs_end;
end;
/

Tom Kyte

Followup  

March 28, 2012 - 12:04 pm UTC

if you only have 3 searchable attributes - why have 20 indexes?

You don't really need to do this - just write an arbitrary query - flip a coin to see if an attribute should be or should not be in a given query.

then get the plan - with bitmaps, without. run the query. evaluate the performance

ask yourself "how would I index using b*trees for this query, then how would I index for the next possible query, and so on"

ask the same question with bitmaps

one of them will have a small bounded set of indexes, the other will have a very large set of indexes.

bitmap index, btree index, bitmap join index

July 23, 2012 - 3:51 am UTC

Reviewer: A reader

Hi Tom,

I did some testing and found below:
1. for low cardinality column, bitmap index size < B*tree index size
2. for high cardinality column, bitmap index size > B*tree index size
3. if the number of distinct value is same as the number of rows, bitmap index size = B*tree index size
/*******************************************test script********************************************************/
create table test(id int, name1 varchar2(10), name2 varchar2(10));
insert into test select rownum, 'name'||rownum, 'name'||rownum from dual connect by level<=50000;
commit;
create index btree_ind1 on test(name1);
create bitmap index bitmap_ind1 on test(name2);
exec dbms_stats.gather_schema_stats(USER);
select segment_name, bytes from user_segments where segment_name like '%IND%';
SEGMENT_NAME BYTES
--------------------------------------------------------------------------------- ----------
BTREE_IND1 2097152
BITMAP_IND1 2097152
/*******************************************test script********************************************************/

Question 1. I know it is due to how the index is implemented and stored, but not quite sure, could you please elaborate more?

I intented to test if both B*tree and Bitmap index exists, which one will be choosed by ORACLE, unfortunately i can
not create 2 indexes on 1 column, so i used below:

/*******************************************test script********************************************************/
create table test(id int, name varchar2(10));
insert into test select rownum, 'name'||rownum from dual connect by level<=50000;
commit;
create index btree_ind1 on test(name,1);
create bitmap index bitmap_ind1 on test(name,2);
exec dbms_stats.gather_schema_stats(USER);
select segment_name, bytes from user_segments where segment_name like '%IND%';
SEGMENT_NAME BYTES
--------------------------------------------------------------------------------- ----------
BTREE_IND1 2097152
BITMAP_IND1 2097152

select * from test where name='name1';
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | TEST | 1 | 15 | 2 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX RANGE SCAN | BITMAP_IND1 | | | | |
--------------------------------------------------------------------------------------------

select /*+ index(test btree_ind1) */ * from test where name='name1';
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 15 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BTREE_IND1 | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
/*******************************************test script********************************************************/

Question 2, why seems ORACLE prefer bitmap index to b*tree index? Or it is just random? Or my test script is not qualified enough?

Also in your book, you talk about the BITMAP-JOIN-INDEX.
I am think it is behaviouring same as normal index except it refer to more than one tables.
/*******************************************test script********************************************************/
create table dept(did number, name varchar2(10), location varchar2(10));
alter table dept add constraint dept_pk primay key(did);
create table emp(eid number, did number, name varchar2(10), title varchar2(10));
create bitmap index ind1 on emp(d.name, e.name)
from emp e, dept d
where e.did = d.did;

begin
dbms_stats.set_table_stats( user, 'EMP', numrows => 1000000, numblks => 300000 );
dbms_stats.set_table_stats( user, 'DEPT', numrows => 100000, numblks => 30000 );
dbms_stats.delete_index_stats( user, 'IND1' );
end;
/

select e.name from emp e, dept d where e.did = d.did and d.name='AA';
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 195K| 6140 (1)| 00:01:14 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP | 10000 | 195K| 6140 (1)| 00:01:14 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX RANGE SCAN | IND1 | | | | |
-------------------------------------------------------------------------------------

select d.name from emp e, dept d where e.did = d.did and d.name='AA';
select e.name,d.name from emp e, dept d where e.did = d.did and d.name='AA';
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4000 | 6141 (1)| 00:01:14 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 100 | 4000 | 6141 (1)| 00:01:14 |
| 3 | TABLE ACCESS BY INDEX ROWID | EMP | 10000 | 195K| 6140 (1)| 00:01:14 |
| 4 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 5 | BITMAP INDEX RANGE SCAN | IND1 | | | | |
|* 6 | INDEX UNIQUE SCAN | DEPT_PK | 1 | | 0 (0)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 20 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
/*******************************************test script********************************************************/

Question 3. why my last 2 queries need to visit table 'DEPT'? I am thinking all info are covered by index, why not just accessing index is enough?
Tom Kyte

Followup  

July 30, 2012 - 8:49 am UTC

1. for low cardinality column, bitmap index size < B*tree index size
2. for high cardinality column, bitmap index size > B*tree index size
3. if the number of distinct value is same as the number of rows, bitmap index
size = B*tree index size,


maybe, maybe and maybe

and you do realize that your #2 and #3 contradict each other. You cannot get higher distinct cardinality than "number of distinct values = number of rows".....


you should have used two tables created identically so as to remove any issues with the extra bits and bytes of the concatenated columns in the index.


Question 1. I know it is due to how the index is implemented and stored, but
not quite sure, could you please elaborate more?


the bitmap index may have a single entry in it for a given user followed by a compressed bitmap that represents pointer to all of the rows with that key value.

for example - scott might have one entry in the bitmap index like this:


SCOTT, <low-rowid>, <high-rowid>, <compressed bitmap of 0's and 1's>

and that could cover hundreds, thousands of rows.


In the b*tree index however, scott would have:

SCOTT, <rowid>
SCOTT, <rowid>
.... hundred/thousands of entries
SCOTT, <rowid>


the key would repeat once per row and there would be a rowid (big, many bytes) for each row.


Question 2, why seems ORACLE prefer bitmap index to b*tree index? Or it is just
random? Or my test script is not qualified enough?


the height of the bitmap index was probably smaller (bitmaps are stored in b*tree index structures!!!). It looks like it was - so there would have been one less IO against the bitmap perhaps.


Question 3. why my last 2 queries need to visit table 'DEPT'? I am thinking all
info are covered by index, why not just accessing index is enough?,


just an optimmization that isn't yet implemented for the bitmap join index.

Limit serialization on bitmap index updates?

July 23, 2012 - 1:11 pm UTC

Reviewer: Matthew McPeak from Cherry Hill, NJ

Tom,

In your original answer, you wrote:

Bitmaps in fact do update rather quickly. However, they also tend to SERIALIZE and do not respond well to slow by slow (row by row) updates. They are suitable for read only/read mostly data that is modified in BULK (not row by row).

Now, I have a table where I would like to create five (5) bitmap indexes, because we use Oracle BI to facilitate ad-hoc queries on these 5 columns, and I want the performance to be good no matter which combination of the 5 columns the user specifies values for.

Say the columns are A, B, C, D, and E. Now, I have a program that will maintain the table T. The program only maintains the data for one value of column A at a time. Multiple copies of the program run in parallel, each working on a distinct value of "A".

Is there any way to build the bitmap indexes so that the bitmap maintenance done by the program maintaining A=1, say, does not serialize with the program maintaining A=2?

If it matters, there are only 7 distinct values of "A".



Tom Kyte

Followup  

July 30, 2012 - 8:59 am UTC

as long as they work on separate key values for A, they'll be fine on the A index.


But, what if A=1 has B=5
And A=2 has B=5 also.

they will contend on the B index, and the C, D, E indices as well.


It isn't about A, it is about b,c,d,e on this case.

one more thing...

July 23, 2012 - 1:30 pm UTC

Reviewer: Matthew McPeak from Cherry Hill, NJ

I should probably mention that we are not licensed for Partitioning. :(

Thanks!

No tricks?

July 31, 2012 - 8:53 am UTC

Reviewer: Matthew McPeak from Cherry Hill, NJ

Thanks, Tom. I guess I was hoping there was some "trick", like, including column A into every index or something.

I've backed of the bitmap indexes for now, even though they really make one of the reports in the system fly. Performance is still decent with B*tree indexes.

I can probably count on one hand the number of times I've actually used bitmap indexes in a live system... they often appear promising, it seems like you *really* need quiescent tables to get away with them.

It seems that not only must there be infrequent updates, but those updates also cannot be done in parallel?



Tom Kyte

Followup  

July 31, 2012 - 12:49 pm UTC

it seems like you
*really* need quiescent tables to get away with them.


it doesn't seem like it is true.

it is true.


bitmap indexes and concurrent updates just do not happen, they are like oil and water.


you MUST use bulk SQL (large inserts of thousands or millions of rows - not row by slow row).

you MUST be the only one doing it against a given segment (table, partition)

if you need to do it in multiple sessions - you would have to use parallel DML, not a do it yourself parallel

row by row is slow for the bitmap indexes

August 22, 2012 - 12:12 am UTC

Reviewer: Hrishikesh from New Zealand

Hi Tom,

Grretings!! It's an honour being able to converse with you.

I have gone throughthe thread above, I have a related question.

following is the sample query I am running

MERGE INTO smp s
USING( SELECT service_key
, calc_latest_flag
FROM ( SELECT service_key
, latest_flag
, CASE WHEN ( row_rank = 1 )
THEN 'Y'
ELSE 'N'
END AS calc_latest_flag
FROM ( SELECT service_key,
latest_flag,
row_number() OVER ( PARTITION BY service_id ORDER BY start_date DESC) as row_rank
FROM smp
)
)
WHERE (latest_flag <> calc_latest_flag)
) inset
ON ( s.service_key = inset.service_key )
WHEN MATCHED THEN
UPDATE SET latest_flag = inset.calc_latest_flag

WHEN NOT matched THEN
INSERT (service_key) VALUES ( -1 )

& I have bitmap index on latest flag & it's taking hours to update. Now latest flag only has 2 values Y & N so bitmap keys covering lot of rows.. so in order to improve the performance..

should I

- Write an update statement with
update smp a set (latest flag)=(...smp b where a.key=b.key ) where exists (...smp b where a.key = b.key)?

-- drop & recreate index pre & post this query?



Tom Kyte

Followup  

August 28, 2012 - 1:33 pm UTC

why do you have a bitmap index?


if the answer is "because there are only two values" that is the wrong answer.


an index - any index, b*tree or bitmap, are good to retrieve a small number of rows from a table. Period.


Now, if your Y/N values are very skewed so that most are one value - and you frequently want to use an index to retrieve those very few values - then you would want an index.

but probably not a bitmap index. You want a bitmap when you have the opportunity to combine many of them together. You don't want a bitmap when you update the data, do single row modifications, have any sort of concurrent modifications. You do not create a bitmap index just because there are few values.


So, you probably meant to use a b*tree index I'm guessing? If you really meant to use an index at all???


Followup

August 23, 2012 - 5:40 pm UTC

Reviewer: Hrishikesh from New Zealand

Hi Tom, Request you to kindly answer the query!

follow up

August 26, 2012 - 9:02 pm UTC

Reviewer: Hrishikesh from New Zealand

Hi Tom, Request you to kindly answer the query!



Regarding scenario explained by Hrishikesh

February 22, 2013 - 12:29 pm UTC

Reviewer: Amit from New York, USA

Hi Tom,

I am little curious about question asked by Hrishikesh. I agree with you that even though flag has only 2 values, it is not good to use bitmap index in this case as this column is updated frequently. But I am curious to know if B Tree index is used on flag column where , lets say many records have value 1 (representing 'Processed'). And my application frequently queries that table for value 0 (representing 'Not Processed' which occurs for less records).
1. So, in this scenario, will optimizer use index as there are only 2 distinct values?
2. Application will have query like "SELECT * FROM TABLE_NAME WHERE FLAG_COLUMN = v_flag_not_processed; ". Hence, when this query is parsed for first time, how will optimizer decide when to use B Tree index?
3. I guess, in 11g, this scenario can benefit from bind peeking. But will it work in older versions like 9i?

Thanks!
Tom Kyte

Followup  

February 25, 2013 - 11:17 am UTC

bitmap, b*tree, ANY-INDEX-TYPE HERE - are good for retrieving small numbers of rows from a table (not a certain percentage, just a SMALL number of rows)


if you have a flag column with two values, and the values are skewed so that one of the values occurs very frequently and the other non-frequently (points to a small number of rows) - then an index, be it bitmap or btree, on this column makes sense if you retrieve the infrequently occurring values.

You can make the index smaller by using a function based index:


create index my_index on t( case when flag = 0 then 0 end );

and query:

select * from t where (case when flag=0 then 0 end) = 0;

I'd suggest in 11g using a virtual column to add that column to the table (no extra storage) and then indexing that - so you can more easily query the data. In 10g and before, use a view plus function based index to accomplish the same.

In this fashion, ONLY the flag=0 rows will be indexed.



1) yes, it would - even if you didn't use the function based index approach above. the optimizer works by estimating cardinalities and it would estimate "flag=0" as returning a small number of rows. Hence it would tend towards an index.

2) bind peeking would kick in, optimizer would estimate small number of rows.

But in this case, if you are going after unprocessed records, why would you even bind???? just query "where flag = 0" - no need to bind in that case.

3) bind peeking functioned in 9i, 10g, and 11g on up...

Using a BitMap Index or not using any index?

July 02, 2013 - 7:15 am UTC

Reviewer: Pravellika from chennai, India

Hello Tom,
I was recently asked in an interview about BitMaps Index.
The question was that in a column there are only 2 distinct values 'M', 'F' which are equal in count.
These are no frequent updates/inserts on these tables.
Table count = 10M.
Would you use a Bitmap index or leave it without any index? Which would be better/faster?
Thanks,
Pravellika
Tom Kyte

Followup  

July 02, 2013 - 4:59 pm UTC

I would say "you haven't given me enough information to answer that"

do you ever use this column in predicates?

do you ever use this column in predicates with some other columns that already have a bitmap index on them?

do you ever plan on asking "how many M's do I have?" or "how many F's do I have?"



if you are planning on running queries with just a where clause of "c = 'M'" or "c = 'F'" - and retrieving those rows, then no, a bitmap index (any index) would be not smart - there are too many rows being retrieved.


If you are planning on combining this bitmap index on this column with a few other bitmap indexes - so you can do a where clause like:

where gender = 'M' and age between 20 and 25 and ( state = 'WI' or state = 'WA' )


and so you'd be taking a bitmap index on gender, age, and state and or'ing/and'ing them together to get a new bitamp - yes, it might make sense.


In short, you'd have to tell me a lot more before I could come up with an answer - and hopefully the interviewers know that and would be prepared to come up with the missing data. If anyone ever gave them an answer - they probably shouldn't be hired :)

How to get data distribution

July 04, 2013 - 6:18 am UTC

Reviewer: Pradeep Sorari from India

I read somewhere that if you are selecting less than 15% of data from a table then you should use index and this percentage varies with the fact that how data is clustered with regards to the index key,i.e If data is stored near to each other then index might be useful in selecting large percentage of data since it would get it by reading fewer blocks ......

Question is, given a unknown database how one is going to determine the data distribution across the data files..... ..and then to take decision, index should be used or not.....

Please add up scenarios with your experiences !!

As always,really appreciate your help !!

Thanks in advance.

Tom Kyte

Followup  

July 16, 2013 - 1:08 pm UTC

I read somewhere that if you are selecting less than 15% of data from a table
then you should use index


find out where you read that and throw that book away and stop reading that author.


a) you use an index to select a small number of rows.
b) you use an index as a smaller version of the table (the index has all columns needed to answer the query, known as a 'covering' index)


you do not use an index to retrieve 15% of the rows in a table. You might, but there isn't any "rule". You might not use an index to retrieve 1% of the rows in a table!!!!!!!!!!!!!!!!

what if you have a 1,000,000 row table and you want 1% of that.

suppose further, this table has about 100 rows per block.

suppose also that this columns data value is uniformally distributed throughout the table so that approximately 1 row exists on each block


to get this set of 1% out of the table via an index will require you to READ EVERY BLOCK IN THE TABLE using single block IO.

to get this set of 1% out of the table via a full scan will require you to READ EVERY BLOCK IN THE TABLE using *multi-block IO*.

so, now, what would you like - an index (have to read the index AND read every block in the table one at a time) or an efficient full scan (have to read every block in the table using BIG IO's)


so, throw out that 15% number. it is rubbish. Sometimes you might want to retrieve 100% of the rows via an index (think of first rows optimization where you want the first 25 rows as fast as possible). you might not use an index to retrieve 0.1% of the rows in a table.




On an unknown database - you have to get to know it. You have to understand the data arrival patterns, how the data is used, what the data means. If you don't - you cannot make an informed decision. You need to understand the schema, the applications - the ecosystem you are working in. adding an index might make one query go faster - but it might make a data load that you are not looking at go MUCH slower all of a sudden.

the only way I know to make a truly informed decision is to having some understanding of the data patterns.



other than that, get the tuning diagnostic pack, let it suggest indexes to you, use the SQL Performance analyzed and Real application testing to test the outcome of adding those suggested indexes and let the software do the work for you (basically a brute force approach, but a safe one since you'll be doing all testing in a non-production environment)

Please answer

July 09, 2013 - 12:19 pm UTC

Reviewer: Pradeep Sorari from India

Hi Tom,

I know you are very much busy, I had been trying to search answer to above question but didn't got a satisfactory answer.
Would you please take some time and help me out.

Looking forward for your response !!

Many Thanks.
Tom Kyte

Followup  

July 16, 2013 - 3:14 pm UTC

I was traveling - part business, but mostly vacation....

everyone has to take time off every now and then.

Many thanks TOM !!

July 18, 2013 - 4:34 pm UTC

Reviewer: Pradeep Sorari from India

Yes we do require off !! coincidentally I was also off these days .... Many thanks for your response.

I might had't been able to explain : Author wasn't actually mandating a 15% rule for index usage, it was just explanation of some examples where index can be used for
larger than expected percentage of data and sometime not for very small percentage ... Which comes to below two examples:

1. (Which you already mentioned)- IF data is evenly distributed thought the database then we "might" not require index for even 1% of data ....
2. When data is highly clustered with respect to index key we might require it for 15% or more than that as we may find all of our data in fewer data files only.....

By unknown database I meant is there any way to find out data distribution with the data dictionary views ... etc ..by querying in database and calculating etc

Yes I know by knowing the data arrival methods , knowing how data is used and how it is being stored , we will get our answer. I am looking for any other way like querying the some data dictionary views etc ... (Quite weird , but just to gain some knowledge) :)

Many Thanks !!
Tom Kyte

Followup  

July 18, 2013 - 5:53 pm UTC

you might use an index to retrieve 100% of the rows in a table too - regardless of the clustering factor (think of a first rows situation where you want the query to start returning data as soon as possible and you will page through the result set on screen).


what do you mean by "data distribution"?

if you mean "can we discover how clustered the data is", you'd sort of need to have an index in place with representative statistics on it so you could see the clustering factor. if you do not have that, you'd have to write a query using lag/lead to read the data out and compare the blocks of rows in an ordered fashion to compute the clustering factor yourself (eg: full scan the data, sort it, analyze it). unless we have an index - we don't really need the clustering factor and so we wouldn't compute it.



for example, let's create a table with a well clustered ID1 column and a poorly clustered ID2 column.

then I'll create an index on each just to see what the clustering factor is

and then I'll show you the queries that could tell you what the clustering factors would be - without having the need for an index to be in place.

ops$tkyte%ORA11GR2> create table t
  2  as
  3  select rownum id1, trunc( dbms_random.value( 0, 1000000 ) ) id2, rpad( 'x', 100, 'x' ) data
  4    from all_objects
  5  /

Table created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create index id1_idx on t(id1);

Index created.

ops$tkyte%ORA11GR2> create index id2_idx on t(id2);

Index created.

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select num_rows, blocks
  2    from user_tables
  3   where table_name = 'T';

  NUM_ROWS     BLOCKS
---------- ----------
     72908       1204

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select index_name, clustering_factor
  2    from user_indexes
  3   where table_name = 'T'
  4   order by index_name;

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
ID1_IDX                                     1176
ID2_IDX                                    72850

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select count( case when bno <> last_bno then 1 end )
  2    from (
  3  select dbms_rowid.rowid_block_number(rowid) bno,
  4         lag(dbms_rowid.rowid_block_number(rowid)) over (order by id1, rowid) last_bno
  5    from t
  6         )
  7  /

COUNT(CASEWHENBNO<>LAST_BNOTHEN1END)
------------------------------------
                                1175

ops$tkyte%ORA11GR2> select count( case when bno <> last_bno then 1 end )
  2    from (
  3  select dbms_rowid.rowid_block_number(rowid) bno,
  4         lag(dbms_rowid.rowid_block_number(rowid)) over (order by id2, rowid) last_bno
  5    from t
  6         )
  7  /

COUNT(CASEWHENBNO<>LAST_BNOTHEN1END)
------------------------------------
                               72849





April 11, 2014 - 7:41 pm UTC

Reviewer: Sujit Mondal from Westfield , NJ

Tom ,
I have a table with 68 bitmap index among those 4 are function based index. Inserting into this table took around 3-4 minute for 1000 records. While without the bitmap indxes it inserts in seconds. We refresh new data into this table every half hour. Is there anyway I can improve performance of this insert. We are using 11.2.0.3 , table has 8.5 million rows , with 1.2 million blocks. I am not sure what additional information you need to better explain the scenario , please let me know if you need any addtional info.

Tom Kyte

Followup  

April 16, 2014 - 3:57 pm UTC

how do you "refresh", be very very very specific. I have no clue what "refresh" might mean to you.

what is the size of the "refresh". be very very very specific

this should be a direct path load of all records in ONE step.


followup

April 16, 2014 - 4:28 pm UTC

Reviewer: Sujit from Westfield , NJ

Hi Tom,
This is followup to the earlier response in this thread. By refresh I mean insert of incremental data. It is insert of 1000 records at a time. Are you suggesting to do the dircet path insert with bitmap indexes on.
Tom Kyte

Followup  

April 16, 2014 - 4:40 pm UTC

you only insert 1,000 records every 30 minutes???? that is teeny tiny and is going to really break those bitmaps (you'll need rebuilding frequently).

are you sure you want/need bitmaps? what types of queries are you doing that necessitate bitmaps in this case? do you have lots of plans that merge multiple bitmaps together?


are you employing partitioning at all. since this is an add only table (just inserts), you *probably* should be partitioning by some date range so you can load say a weekly or monthly partition and at the end of the month - alter move the partition to compress and rebuild the bitmaps on that partition and then leave it alone.


are you doing that tiny load in direct path in a SINGLE step? (single step - ONE insert /*+ append */ as select * from external_table)

num_rows in BITMAP index

January 09, 2017 - 8:08 am UTC

Reviewer: Rajeshwaran Jeyabal

Team,

With B*Tree index, for each row in the table we have one entry in the leaf block.
with Bitmap index, for each DISTINCT value in the indexed column, we have only bitmap piece in the bitmap index.

But could you help me to understand why this num_rows is shown here as 600?

demo@ORA12C> create table t as
  2  select a.*, mod(rownum,100) as x
  3  from all_objects a , all_users
  4  where rownum <=1000000;

Table created.

demo@ORA12C>
demo@ORA12C> select count(*) , count(distinct x) from t;

  COUNT(*) COUNT(DISTINCTX)
---------- ----------------
   1000000              100

1 row selected.

demo@ORA12C>
demo@ORA12C> create bitmap index t_idx on t(x);

Index created.

demo@ORA12C> select num_rows,leaf_blocks,distinct_keys,
  2        sample_size,avg_leaf_blocks_per_key,
  3          avg_data_blocks_per_key
  4  from user_indexes;

  NUM_ROWS LEAF_BLOCKS DISTINCT_KEYS SAMPLE_SIZE AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
---------- ----------- ------------- ----------- ----------------------- -----------------------
       600         300           100         600                       3                       6

1 row selected.

demo@ORA12C>

Connor McDonald

Followup  

January 10, 2017 - 4:03 am UTC

A bitmap index is similar to a b-tree *structure* where the index information contains:

column_data, start_rowid, end_rowid

so the NUM_ROWS is is still the number of "rows" in the index, where the definition of "row" is the permutations of the three attributes above.

So even for the same column_data value, as you cross (for example) an extent boundary, you would get another 'row'.