We have a column with varchar2(2000) and the column is indexed. But first few characters can be same for multiple rows. but when the first few string length goes it narrows the numbers of hits.
Our data is like a nested directory structure.
I reading an article about this
http://www.bwong64.com/oracles-32-character-limit-in-generating-histograms/ "if you have a column that is wider than 32 characters AND its values are longer than 32 characters, Oracle will only consider the first 32 characters in the column in generating histograms. In other words, if the first 32 characters don’t vary much in all the rows or it’s identical in all rows, then the histograms generated or the absence of them can be way off. As a consequence, the optimizer will make wrong decisions, doing range scans when it shouldn’t, thinking that more rows will match the criteria than there should be."
One of the option is reverse key index but that is not effective in case of range scan ...
Could you please explain and provide your valuable thoughts and solutions
Thanks
Rama
see
http://www.oracle.com/technetwork/issue-archive/2011/11-jan/o11asktom-195084.html for more info.
In there I wrote:
<quote>
You should be aware of this histogram implementation, however, because it could prompt you to store some data attributes differently. For example, suppose you have a system that stores URLs in a table—an audit trail, for example. If you look at a URL such as the ones you see on asktom.oracle.com, you’ll discover that the first 39 characters of my URL are constant—
http://asktom.oracle.com/pls/asktom/f?p —regardless of what page you end up on in my site. Having a histogram on that data would not be very useful.
However, what if I stored the data in a more meaningful way, such as storing the URL in a series of fields—specifically, using the fields PROTOCOL, HOST, PORT, and PATH? In this case, the PROTOCOL field would contain values such as HTTP or HTTPS, the HOST field would store asktom .oracle.com (and other host names), the PORT field would contain one of the standard port numbers used by typical HTTP, and the PATH field would be the unique bit. You have the ability to render the original URL in its entirety, so you haven’t lost any data. And you have the new ability to provide even better searching capabilities, because you have the components all split out. You also gain the ability to compress this data in the database very easily. (Oracle’s basic and advanced compression for tables works by removing repetitive attributes in a block. PROTOCOL, HOST, and PORT values would all repeat often.) And you would now have an attribute (PATH) that would be able to support a histogram nicely.
This is just another case where knowing how things work enables you to implement a solution more efficiently or at least justify why you have chosen to do something in a particular fashion.
</quote>
so, for example, if were storing file names - I would probably store the path in one attribute and the file name in another. That would give me two search fields - each with their own 32 character histogram. I might even go a step further and store "mount point", "path", "filename", "extension" - since mount point (the parent directory path) would repeat ALOT, path less often, filename less so, and so on. Giving us more attributes to gather statistics on.