Skip to Main Content
  • Questions
  • Index creation time estimation with XML Type data in the index

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, L�szl�.

Asked: January 11, 2019 - 5:20 pm UTC

Last updated: January 13, 2019 - 6:32 am UTC

Version: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

Viewed 1000+ times

You Asked

We have a few tables, each of them has a structure something like this:
CREATE TABLE "myTABLE" 
   ( "FIELD1" CHAR(36 BYTE) NOT NULL ENABLE, 
 "FIELD2_XML" "SYS"."XMLTYPE" , 
 "FIELD3" NUMBER DEFAULT 0 NOT NULL ENABLE, 
 "FIELD4" NUMBER(*,0)
   );



"FIELD2_XML" contains XML data – the main data in the table – of course in various length.

There is function based composite index on each table like this:
CREATE INDEX "myHASH_INDEX" ON "myTABLE " 
(
RAWTOHEX("CREATEHASH_VARCHAR2"("FIELD1"))||
RAWTOHEX("CREATEHASH_XMLTYPE"("FIELD2_XML"))||
RAWTOHEX("CREATEHASH_VARCHAR2"("FIELD3"))||
RAWTOHEX("CREATEHASH_VARCHAR2"("FIELD4")), 
"FIELD1", "FIELD3", "FIELD4");



Createhash_Varchar2(text) practically means a
DBMS_CRYPTO.HASH(Utl_Raw.Cast_To_raw(text),3);

function, while Createhash_Xmltype() practically means a
Dbms_Crypto.Hash(Text.getBlobVal(NLS_CHARSET_ID ('UTF8')) ,3);

function.

The index is used to speed up hash calculation algorithm, in which the whole tables (each row of them) must be considered.

The question is: If we know, how long does it take to create that indices on TEST environment, how can we estimate the creation time on the PRODUCTION environment? (we can suppose same HW/SW components, configuration settings etc. of the TEST and PROD systems). Of course, there is a significant difference on the number of the rows in the table and size of the segments.

What I have already tried?
Try1:
EXPLAIN PLAN FOR
CREATE INDEX …..
SELECT * FROM TABLE(dbms_xplan.display);


Unfortunately its result and the real result measured on the TEST system are very far from each other ☹….therefore I’m skeptic to us this estimation on PROD system.

Try2:
I determined total size of the source data to be read for the index calculation
select bytes from user_segments where segment_name='myTABLE';

For XMLTYPE column
select bytes from user_lobs, user_segments joined by table_name filtered by table_name;

and the size of the index is also known after the creation
select bytes from user_segments where segment_name='myHASH_INDEX';


The elapsed time can be considered as sum of read and write parts. Well, I know, that there is a calculation time as well, but I wanted to keep the model simple.

I executed the create index command many times on many tables and measured the elapsed time. Knowing the data to be read and to be written, I tried to calculate a kind “Read Speed” and the “Write Speed” values.
After that I used them to estimate the execution time using the data/segment size of the PRODUCTION system …..But the time which I got was unrealistic long…… -> I think, this model cannot be used…..

What can I do? How can I get a realistic execution time estimation?


Thanks in advance,
L. Károly



and Connor said...

Index creation is pretty much:

A- scan the data
B- sort the data
C- write the new segment

and in your case, "scan the data" will include an execution of each of your functions.

So in terms of estimation, you could do:

For (A) and (B) run a series of:

select
RAWTOHEX("CREATEHASH_VARCHAR2"("FIELD1"))||
RAWTOHEX("CREATEHASH_XMLTYPE"("FIELD2_XML"))||
RAWTOHEX("CREATEHASH_VARCHAR2"("FIELD3"))||
RAWTOHEX("CREATEHASH_VARCHAR2"("FIELD4"),"FIELD1", "FIELD3", "FIELD4"
from mytable
order by 1,2,3,4


and time those and extrapolate out. The scan time will scale pretty much linearly, but note that the sort time is perhaps less predictable because it will depend on temp segment performance etc.

For (C), you know that the index size will be approx ( keysize + rowid) * rows. Pad that out by say 20% and see how fast you can write that much data to a tablespace.


But is there any obstacle to creating the index with the ONLINE clause? The duration of build time is much less of an issue then no?

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.