Chris, thanks for the reply. I had no idea virtual columns existed in Oracle: they seem super useful.
My follow-up question, looking at your answer, is: what are the strongest use cases for skinny indexes? Because our application is only really interested in unprocessed orders, which is only about 5% of the data (and getting smaller), it seems to make sense to build an index containing only the data we work with in the application. However, I understand that the nature of the B-tree and the way it traverses branches means huge tables can be traversed through a small number of branches. With this in mind, is a partial or skinny index a waste of time? The only real advantage I can see is that the index will be a 20th of the size of an index covering the whole table. Is this reason enough (and that proportion grows smaller as the table grows)?
Also, even though I am only getting 5% of the data, I may still need to fetch thousands of rows for certain report exports. If I wanted to put these in the index to avoid the cost of fetching from the table, are there any major pitfalls in building a wide index? And are there any advantages to precomputing a lot of the data into a JSON object returned by a function that is part of the index? So rather than indexing 15 or so columns, I would index the first couple, which are used as part of the indexed search, and a third expression: a deterministic function, defined as part of the index, that returns a JSON object containing all the rest of the data from multiple columns.
May 21, 2020 - 3:01 pm UTC
Your situation seems like a good use case for a skinny index. But think about the trade-offs you're making using this:
- An index range scan of < 5% of entries will still be efficient; it's likely to be only marginally more work than scanning the skinny index.
- To use the skinny index, you need to write queries in a specific way (using the virtual columns). This detail tends to get lost over time and people write new queries using the regular columns, so the database still reads the table.
- The fat index may be useful for other queries. If there are only 2-3 values for STATUS, the optimizer may choose an index-skip scan when searching for rows using whichever column is second in the index. This might avoid the need for another index on this column.
- If the optimizer chooses a full index scan, it's much faster to read the skinny index (vs fat).
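As a sketch of point 3, a fat index leading with a low-cardinality column can serve queries that only filter on the second column (the table, column names, and bind variable here are hypothetical):

```sql
-- Fat index leading with STATUS, which has only a handful of values.
create index orders_status_cust_i
  on orders ( status, customer_id );

-- A predicate on CUSTOMER_ID alone can still use this index: the
-- optimizer may choose an INDEX SKIP SCAN, probing the index once
-- per distinct STATUS value.
select *
from   orders
where  customer_id = :cust;
```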
For me the big drawback of a skinny index is point 2 - it increases the "surprise factor" of the database. In my experience someone will be caught out by this at some point in the (distant) future.
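For concreteness, the virtual-column trick looks something like this (a minimal sketch; the ORDERS table, STATUS values, and column names are assumptions):

```sql
-- The CASE expression is NULL for processed rows. Oracle excludes
-- entirely-NULL keys from B-tree indexes, so only the ~5% of
-- unprocessed rows get index entries.
alter table orders add (
  unprocessed_order_id generated always as (
    case when status = 'UNPROCESSED' then order_id end
  ) virtual
);

create index orders_unprocessed_i
  on orders ( unprocessed_order_id );

-- Queries must reference the virtual column for the index to be used:
select *
from   orders
where  unprocessed_order_id = :order_id;
```

This is exactly where the surprise factor bites: a query written against ORDER_ID and STATUS directly cannot use this index.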
If your testing proves that the performance gains of the skinny index are worth it, by all means go ahead. Just make sure you comment/document this EXTENSIVELY!

"If I wanted to put these in the index to avoid the cost of fetching from the table, are there any major pitfalls in building a wide index?"
Probably the biggest thing to watch for is the index key length: the sum of the maximum lengths of the indexed data types must fit within the limit imposed by the index's block size. Note that the stored data doesn't have to be large itself. If you index a varchar2(4000), the database assumes it needs up to 4,000 bytes of index key space for that column, even if all the strings are less than 100 bytes.
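To illustrate (a hypothetical table; the exact key-length limit depends on your block size):

```sql
create table wide_docs (
  a varchar2(4000),
  b varchar2(4000)
);

-- With a default 8K block size this raises ORA-01450 (maximum key
-- length exceeded), even if every stored string is tiny: the limit
-- is checked against the columns' maximum lengths, not the data.
create index wide_docs_i on wide_docs ( a, b );
```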
Also there's the issue of diminishing returns. If you've stuck most of a table's columns in an index, the index could be bigger than the table itself! So full scanning the index is MORE work than accessing the table by index rowid.

"Are there any advantages to precomputing a lot of the data into a json object"
You could create a JSON object returning the other columns... but frankly that idea seems crackers to me!
Aside from being convoluted, remember that the JSON object includes the table metadata, i.e. the column names (as attributes). So this will be bigger than just indexing the columns themselves. In extreme cases (long attribute names, short values) well over half the size of the JSON may be metadata!
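You can see the overhead by serializing a couple of short values under long attribute names (hypothetical names; JSON_OBJECT needs Oracle 12.2 or later):

```sql
select json_object (
         'customer_first_name' value 'Al',
         'customer_last_name'  value 'Bo'
       ) as j
from   dual;

-- {"customer_first_name":"Al","customer_last_name":"Bo"}
-- 54 characters of JSON to carry 4 characters of actual data.
```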