Skip to Main Content
  • Questions
  • Partial indexing using multiple function indexes or precomputed json object

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, paul.

Asked: May 20, 2020 - 9:15 am UTC

Answered by: Chris Saxon - Last updated: May 22, 2020 - 12:14 am UTC

Category: Database Development - Version: 11.1.0.6

Viewed 100+ times

You Asked

I have an orders table with a status flag showing whether that order has been completed. Most orders in the tables are completed: probably only 5%ish of the table are incomplete.
I know I can build a partial index by using a function based index which returns the order id or null depending upon the value of the status flag. This way, only incomplete orders are stored in the index. This all works fine.
My question is if i need a number of other columns from the table, and want an index only scan to avoid TABLE ACCESS BY INDEX ROWID in the plan, what's the best way of doing this. Let's say I also want customer id and product id. These values will never be null, so if I add them to the index, it will mean I wont end up with a partial index as null values will just be added for the initial function column, as the other 2 columns will always have values. So I see to basic approaches, and am not sure which is better.

1. I build an index made up of a string of functions, all returning the required values based upon the value of the status field.
2. I retain the first column based upon the function looking at status, but I only build one more index which is not used as part of the predicate or filter, but actually returns a larger precomputed json object that is returned from a second function within the index, and returns a json object based upon all the required values being parameters within that function, and also the status flag being used to determine if the function returns null or the json object.

Or maybe there is just a much better way of building a partial index with additional foelds?

and we said...

First question: do you really need to do anything fancy?

Is performance "good enough" with a plain index on just STATUS?

If not, a plain old index on:

create index ...
  on ... ( status, customer, product );


Should get an index-only scan. Is this "good enough"?

I'd only start doing something more esoteric if the above really are too slow.

If you really need a skinny index that only stores rows of interest, then I'd:

- Create virtual columns returning null for all columns of interest for completed rows
- Index those
- Use the virtual columns in your query:

create table t as 
  select level tid,
         case 
           when level > 9500 then 'TODO'
           else 'DONE'
         end status,
         mod ( level, 31 ) product_id, 
         mod ( level, 79 ) customer_id,
         rpad ( 'x', 100, 'x' ) stuff
  from   dual
  connect by level <= 10000;

alter table t 
  add (
    status_todo as ( case when status = 'TODO' then status end ),
    customer_todo as ( case when status = 'TODO' then customer_id end ),
    product_todo as ( case when status = 'TODO' then product_id end )
  );

create index i 
  on t ( status_todo, customer_todo, product_todo );
  
select num_rows from user_indexes
where  index_name = 'I';

NUM_ROWS   
        500 

set serveroutput off

select customer_todo, product_todo
from   t
where  status_todo = 'TODO';

select * 
from   table(dbms_xplan.display_cursor(format => 'BASIC LAST'));

---------------------------------                                          
| Id  | Operation        | Name |                                          
---------------------------------                                          
|   0 | SELECT STATEMENT |      |                                          
|   1 |  INDEX RANGE SCAN| I    |                                          
---------------------------------

and you rated our response

  (4 ratings)

Reviews

When to use a skinny index?

May 21, 2020 - 10:12 am UTC

Reviewer: Paul Brown from LA

Chris, thanks for the reply. I had no idea virtual columns existed in Oracle: they seem super useful.
My follow up question looking at your answer is what are the strongest user cases for skinny indexes. Because our application is only really interested in unprocessed orders which is only about 5% of the data (and getting smaller), it seems to make sense to build an index with only the data that we are interested in working with in the application. However, I understand that the nature of the B-tree and the way it traverses branched means huge tables can be traversed through a small # of branches. With this in mind, Is a partial or skinny index a waste of time? The only real advantage I can see is the index will be a 20th of the size of an index covering the whole table. Is this reason enough (and that proportion grows smaller as the table grows)?
Also, Even though I am only getting 5% of the data, I may still need to get 1000's of rows for certain report exports. If I wanted to put these in the index to avoid the cost of fetching from the table, are there any major pitfalls in building a wide index? Are there any advantages to precomputing a lot of the data into a json object and returning that in a function which is part of the index, so rather than indexing 15 or so columns, i index the first couple which are used as part of the indexed search , and the third is a function that returns a json object containing all the rest of the data from multiple columns, which would be a deterministic function defined as part of the index?

Chris Saxon

Followup  

May 21, 2020 - 3:01 pm UTC

Your situation seems like a good use case for a skinny index. But think about the trade-offs you're making using this:

- An index range scan of < 5% of entries will still be efficient; it's likely to be only marginally more work than scanning the skinny index.

- To use the skinny index, you need to write queries in a specific way (using the virtual columns). This detail tends to get lost over time & people write new queries using the regular columns so the database still reads the table

- The fat index may be useful for other queries. If there are only 2-3 values for STATUS, the optimizer may choose an index-skip scan when searching for rows using whichever column is second in the index. This might avoid the need for another index on this column.

- If the optimizer chooses a full index scan, it's much faster to read the skinny index (vs fat).

For me the big drawback of a skinny index is point 2 - it increases the "surprise factor" of the database. In my experience someone will be caught out by this at some point in the (distant) future.

If your testing proves that the performances gains of the skinny index are worth it, by all means go ahead. Just make sure you comment/document this EXTENSIVELY!

If I wanted to put these in the index to avoid the cost of fetching from the table, are there any major pitfalls in building a wide index?

Probably the biggest thing to watch for is the index key - the maximum length of the indexed data types - exceeds the index's blocksize. Note that the data doesn't have to be large itself. If you index a varchar2(4000), the database assumes it needs up to 4,000 bytes of index space for each entry. Even if all strings are less than 100 bytes.

See https://blogs.oracle.com/sql/how-to-fix-ora-01450-maximum-key-length-6398-exceeded-errors

Also there's the issue of diminishing returns. If you've stuck most of the columns from a table in an index, the index could be bigger than the table itself! So full scanning the index is MORE work than table accesses by index rowid.

Are there any advantages to precomputing a lot of the data into a json object

You could create a JSON object returning the other columns... but frankly that idea seems crackers to me!

Aside from being convoluted, remember that the JSON object includes the table metadata - i.e. the column names (as attributes). So this will be bigger than just indexing the columns themselves. In extreme cases - long attribute names, short values - well over half the size of JSON may be metadata!

Partial index Vs skinny index

May 21, 2020 - 2:30 pm UTC

Reviewer: Rajeshwaran, Jeyabal

Partial index are possible if the base tables are partitioned

RichardFootee - got an excellent example about that (simillar to yours case) @ https://richardfoote.wordpress.com/2013/07/08/12c-partial-indexes-for-partitioned-tables-part-i-ignoreland/

where as skinny indexes are something - used, while indexing only the interested set of values, making the index to appeal more tiny to the optimizer.
Chris Saxon

Followup  

May 21, 2020 - 3:04 pm UTC

True. Adding partitioning (if not already used) adds a whole bunch of other considerations though...

Full index scan on skinny indexes

May 21, 2020 - 4:58 pm UTC

Reviewer: Paul Brown from LA

Table partitioning is not an option at the moment, but a really interesting article link to read.
Thanks for feedback Chris: really insightful. Last couple of points on this.
Effectively putting some of the predicate conditions into the logic of the virtual column removes the need for a composite index. In my case, i would probably only need to search on one condition: location id. Does a much simpler predicate have any effect on index performance and speed, or does the balanced structure of the B-tree negate this.
Finally, when you say a full index scan will be much quicker, is the general rule of thumb still that an index will perform a full index scan if more than 10% of the index rows need to be selected?
Chris Saxon

Followup  

May 21, 2020 - 5:22 pm UTC

Does a much simpler predicate have any effect on index performance and speed, or does the balanced structure of the B-tree negate this.

I'm not sure what you're asking here. Could you clarify please?

Finally, when you say a full index scan will be much quicker

I meant this only in the context of the optimizer choosing a full index scan. Full scanning a skinny index will be quicker than full scanning a fat index. This is due to the fact the smaller index is a tiny fraction of the size of the larger index.

is the general rule of thumb still that an index will perform a full index scan if more than 10% of the index rows need to be selected?

There is no such rule of thumb! Generally the decision to full scan an index is independent of the number of rows accessed. Usually it's because a range scan is impossible.

Simpler predicate

May 21, 2020 - 8:22 pm UTC

Reviewer: Paul Brown from LA

Does a much simpler predicate have any effect on index performance and speed, or does the balanced structure of the B-tree negate this.

So lets say I add a virtual column where the order_id for a row is only returned if both the status flag and the paidInFull flag are both set to 1. I can then index this virtual column and I have a skinny index of current orders, which are the only orders the application is interested in, and are a fraction of the table as a whole.
I can then build an index based on loctionId and this virtual column which gives me current orders for a given location.
Without the virtual column, I would have to build an index based upon loctionId , status and paidInFull columns to find the current orders for a given location. In other cases, I have queries where I may need to search on 4 or 5 columns.
In building a virtual column, a lot of the where clauses are already filtered for me, and I only have 2 search conditions . Are there any advantages in performance of having a simpler where clause with fewer conditions?
Connor McDonald

Followup  

May 22, 2020 - 12:14 am UTC

Are there any advantages in performance of having a simpler where clause with fewer conditions?


Very unlikely. There could be niche cases where that might be true, but as a general rule you'd never notice.


More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.