Thanks for the question, goran.

Asked: December 06, 2003 - 11:17 am UTC

Last updated: October 06, 2020 - 6:33 am UTC

Version: 817

Viewed 50K+ times! This question is

You Asked

Hi Tom,

Here is a problem:
I have snapshot of one of our big tables and wanted to test how will be the performance of that table if it is partitioned.
OK, here is a model:
New partitioned table with name orig_pos_histories_01_new is partitioned per column opoh_valid_from on a monthly basis.
Old table called orig_pos_histories_orig. Tables have around 3Mil. records.
So, when I execute the following queries, the query with old table runs a little bit faster than with the new table.
Here is a queries and explain plans:

For new one:
select *
from orig_position_histories_new
where opoh_valid_from = to_date('31.05.2002', 'dd.mm.yyyy');

EP:
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 416 7
PARTITION RANGE SINGLE KEY KEY
TABLE ACCESS BY LOCAL INDEX ROWID ORIG_POSITION_HISTORIES_NEW 416 1 M 7 KEY KEY
INDEX RANGE SCAN OPOH_NEW_LOC_VF 416 1 KEY KEY

Old one:
select *
from orig_position_histories_orig a, ids.orig_validity_dates b
where a.opoh_ovld_id = b.ovld_id
and b.ovld_validity_date = to_date('31.05.2002', 'dd.mm.yyyy');
Note:
this query have to be like this (with joij), old database desing

EP:
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 182 K 17825
NESTED LOOPS 182 K 33 M 17825
TABLE ACCESS FULL ORIG_VALIDITY_DATES 22 1 K 5
TABLE ACCESS BY INDEX ROWID ORIG_POSITION_HISTORIES_ORIG 3 M 346 M 810
INDEX RANGE SCAN OPOH_OVLD_FK_ORIG_I 3 M 48

So, what I can see is that new query have a cost of 7 and old one cost of 18000 and also all other parameters of new query are better that old one, but old one somehow still run almost with the same speed like new one. So, it seems to me that partition will not give me better performance.
I can not understand the reason of this behavior.

Thanks in advance.

Goran

and Tom said...

Well, the myth you have to expel is "partitioning is like setting fast=true"

(i wrote about this in both my books -- expert one on one oracle and more recently in Effective Oracle by Design -- you might be very interested in the full discourse on this)

Partitioning is a tool.

It is a tool to make administration easier (divide and conquer)

It is a tool to make availability higher (one of your partitions could take a hit and the others stay up and running while you repair one. many people might well never know there was a failure)

It is a tool that on a transactional system with lots of indexed reads you need to employ carefully to ensure you do not NEGATIVELY impact performance. It is a tool on a warehouse with lots of "big operations" (full scans for example) you can use to significantly reduce run times (assuming you've DESIGNED to exploit that of course).

Look at the physics behind the data here. You take a table and partition it. To access the rows by that (assuming local) local index on "opoh_valid_from" Oracle must:

a) determine what partition to access
b) index range scan that partition(s)
c) table access by index rowid the real data

Unpartitioned Oracle must:

a) index range scan that index
b) table access by index rowid the real data

There is no net reduction in work here! (maybe, just maybe if the index height in the partitioned index is reduced -- you may remove a logical in the partition case but that would only show up as a benefit "in the long haul" over many queries executed simultaneously -- or over hundreds/thousands of queries in sequence)

Your transactional system with this keyed read may well now benefit from concurrent inserts/updates and deletes -- given that you have N index segments (n root blocks to contend over, n branch blocks to contend over -- not just ONE) and N table segments (N freelists at least, increased concurrency).

You have to be careful in many cases that partitioning doesn't KILL YOU performance wise. That you saw no net change is a good thing here -- it is not "fast=true" for these types of queries (although if you do it wrong, it can easily, easily mean slow=sure)

Rating

(30 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Review

Goran, December 07, 2003 - 6:36 am UTC

Hi Tom,
Thanks for answer.
Yes, what you said I have already experienced. But, what badger me is the following:
Do you have from your experience case whare for example up to the certain number of records queries on a non-partitioned table have better performance than on a partitioned but, when number of records in table are bigger then some number, in that case performance on partitioned table are better. If so, could you give me example of such table?

Thanks a lot!

December 07, 2003 - 8:56 am UTC

No not really, do you?

Look -- if you partition intelligently, you can make it so that performance is either

o not really affected
o tiny bit better
o dramatically better (warehouse, bulky big queries, not single row accesses)

If you do it "wrong", just turn it on you can make performance go down the tubes.

None of it has to do with "n" records.

All of it has to do with how the data is organized, the physics of the data.

Review

goran, December 07, 2003 - 10:03 am UTC

Thanks Tom for answer.
Yes, you are right, the physics of the data is the key.
Partitoning is a hot thing. If one does not use it correctly, will have a lot of pain.

Thanks a lot once more Tom!

Cheers!!!

just cross my mind

Goran, December 07, 2003 - 3:32 pm UTC

Hi Tom,
Once more I with my problem...)))
This thinking just cross my mind and I would like to hear your opinion.
Let us say that - in above mentioned case - that I do not have local index on partition. It will mean that when I run the query, DB will do a full table scan on particular partition depending on a date. On other hand on a non-partitioned table DB will still do range index scan. In both cases result set is about 200000 records.
If tables grows up from 3Mil to 30Mil, partition full table scan will not be affected by table growth - DB will find partition for particular date and do full table scan for 200000 i.e. partition full scan is not affected by table growth.
For non-partitioned table, when table grow up from 3Mil to 30Mil, DB have to find using index three 200000 records out of 30Mil. records.
If my logic is correct, that will means partition query should run faster or I am wrong.

December 07, 2003 - 7:15 pm UTC

why wouldn't the partitioned table "still do index range scan" as well?

I'm not "following the logic" here.

the LIO's to get 20,000 records out of 30million or 3million is "about the same"

the LIO's to full scan a partition that did not grow in size will be "exactly the same"

data distribution

Goran, December 08, 2003 - 6:00 am UTC

Thing is in data distribution. Partitions are on monthly basis, but data distribution per months is something like this:
Example:
from 01th to 30th in month total number of records per month is 30%. The rest of 70% comes on 31th.
If the table is partitioned and I say give me all records for date 31 (and it is let us say 200000 records) the execution time will be the same regardless of table size - this is clear.
On other hand, for plain table where index exist for date column, I was wondering whether index selectivity will remain the same for table that have 3Mil or 30Mil. rec. if index have to return me 200000 records with the same data distribution.
If I have understand you correctly, you said that:
"the LIO's to get 20,000 records out of 30million or 3million is 'about the same'"
lead me to conclusion that index scan no matter if he returning single column or some big data set will always be faster than partitioning. It is clear to me that number of index iterations between 3Mil and 30Mil table is not big,
but I just had in mind that maybe, just maybe when index have to return bulky number of records, index performance will decrease a lot.

December 08, 2003 - 6:44 am UTC

You are getting yourself confused.

partitioning is a tool

partitioning is NOT fast = true

I can partition your tables and make them behave the same as your single non-partitioned table performance wise.

Partitioning will not necessarily speed up, slow down or whatever your performance.

It is a tool to make things easier to admin, more available, more concurrent, faster in warehouses with lots of smaller full scans due to partition elimination and partition wise joins.

It is false to say:

lead me to conclusion that index scan no matter if he returning single column or
some big data set will always be faster than partitioning.

A little suggestion to Goran

Christo Kutrovsky, December 08, 2003 - 12:09 pm UTC

Goram,

You said that from about ~200'000 rows for a month, 30% are from 1 to 30th, and 70% at 31st.

Why don't you partition a little bit more, and have 2 partitions per month. One for 1-30 (or whatever the end date of the month) and another for the last day of the month.

That way, if you query for 31st, you will have a FTS of a single partition, which is much better then a index range scan.

P.S.
Don't forget to gather statistics on the partition level. You could eventually benefit from histograms also, as your data is probably skewed. Use:
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(ownname => 'lala', tabname => 'lala', method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => true);

When to partition

Richard, December 08, 2003 - 7:10 pm UTC

Goran -

According to the instructor of my Oracle 9i Implement Partitioning course, 5GB is about the size you should look for before implementing table partitioning, however size is just one of the variables to review. Tom brings up several valid points above. If performance is your issue you need to review the concept of "partitioning pruning" where the CBO will actually eliminate partitions from the partition access list.

December 09, 2003 - 6:10 am UTC

did anyone ask them "why, why 5gig, what was the science behind that number. why not 4gig?

what if I have a DW with a 4 gig table -- with 4 regions and many full scans -- but by region. would not 4 1gig segments make perfect sense for that??

what if I have an audit trail. must keep 12 months online. must roll off oldest month every month. I generate about 1.5 million audit records a month, 100 bytes per record. about 2gig for a year. would not partitioning make perfect sense for me??

size is not a relevant variable if you ask me, at least not on the low end. Maybe a more accurate statement would have been "If your segments are getting into the 5 gig and above range, partitioning is something you want to start looking at for administrative and availability reasons -- if you haven't started looking at it yet"

always question authority, push people to back up their statements, regardless of who is saying it.

Review

Goran, December 09, 2003 - 7:18 am UTC

I think that Tom is right.
Number of records in table does not play role whether the table should be partitioned or not. For a plain table if we are retrieving via index scan 200000 records out of 3Mil. or 30Mil. table (as Tom already said) is similar. So, no big improvement on that level. As Tom already said, the physics of the data is the key.
Basically, what I have tested and sow from the test, performance improvement I will have (in my case) only for bulky big queries where instead of index scan we have full table scan.
Local index scan gave me no big performance improvement - for some cases a little bit better for some the same exec. time.
Here I would like to ask Tom a question.
I have found in documentation that local indexes can improve performance since many index partitions can be scanned in parallel by range queries on the index key. On the other hand, on some other topic here I have found that you said something like local index scan can be very expensive if oracle have to scan more than one partition.
I will try to give you an example:
If table is partitioned per date column and on each partition I have local index on columns (opoh_valid_from, opoh_sdsy_id) and if I execute query like this:

select *
from orig_pos_hist_new
where opoh_valid_from < to_date('31052002', 'ddmmyyyy')
and opoh_sdsy_id = 123

execution plan is:

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 15 K 116
PARTITION RANGE ITERATOR 1 KEY
TABLE ACCESS BY LOCAL INDEX ROWID ORIG_POSITION_HISTORIES_NEW 15 K 37 M 116 1 KEY
INDEX RANGE SCAN OPOH_NEW_SDSY_VF_LI 15 K 85 1 KEY

So, what I can see is that database is doing local index scan for more then one partition.
Question: Is it done in parallel (automatically) or serial i.e. index scan on each partition (which is qiute expensive)?

December 09, 2003 - 8:19 am UTC

it can be done in parallel using parallel query but it will default to serial which for most index cases you'll find faster then setting up parallel!! (eg: index implies few row *in general* and for few rows, PQ is too expensive)

In your case, there are lots of rows, PQ might be called for IF and ONLY IF you have excess capacity on your machine (can afford the cost of PQ)

for Christo

Goran, December 09, 2003 - 7:20 am UTC

Thanks Christo,
I did already something simillar.
Anyway, you have good way of thinking.

Thanks,

Goran

Carmen, March 04, 2004 - 3:41 pm UTC

Oracle9i multiple slaves can act on each partition

A reader, November 29, 2004 - 6:36 am UTC

Tom,

" In Oracle9i multiple slaves can act on each partition increasing performance where there is data skew between partitions."

I came across the above lines. If we need to have multiple slaves act on each partition - what is that required?

Does oracle take care of this internally by itself? - Please let me know

Thanks

November 29, 2004 - 8:26 am UTC

Oracle has always dynamically partitioned PQ slaves transparently, you do not influence it at all.

Surprising performance improvement after partitioning

Pinto, May 06, 2005 - 7:00 am UTC

Hello Tom,
I was analyzing to check if partitioning will help in improving performance in a big way for us. I have read your chapter on partitioning in expert one-on-one and i was not expecting the results that i got. Can you please help understand why exactly i am getting this result.
table creation script is
CREATE TABLE ADS_AlarmObject (
AlarmID NUMBER(10) NOT NULL,
NEName VARCHAR2(60) NOT NULL,
IntProcStatID NUMBER(1) NOT NULL,
ClearabilityID NUMBER(1) NOT NULL,
IsConfirmable NUMBER(1) NOT NULL,
IsAlarm NUMBER(1) NOT NULL,
IsSecurityAlarm NUMBER(1) NOT NULL,
AlarmTypeID NUMBER(1) NOT NULL,
MaintenanceManualKey VARCHAR2(255),
ManagedObjectClass VARCHAR2(255),
ManagedObjectInstance VARCHAR2(255),
EventTypeID NUMBER(3) NOT NULL,
EventTime DATE NOT NULL,
ProbableCauseID NUMBER(4) NOT NULL,
SpecificProblems VARCHAR2(255),
SeverityID NUMBER(1) NOT NULL,
ProposedRepairAction VARCHAR2(255),
NotificationID NUMBER(10),
AdditionalText VARCHAR2(2000),
SecurityAlarmDetector VARCHAR2(255),
ServiceUser VARCHAR2(255),
ServiceProvider VARCHAR2(255),
CookieID NUMBER(5) DEFAULT -1,
AlarmSourceID NUMBER(5) DEFAULT -1,
IsLOBInCookie NUMBER(1) DEFAULT 0,
SyncState NUMBER(1) DEFAULT 0,
UsedSegID NUMBER(10) NOT NULL,
CONSTRAINT ADS_PK_AO PRIMARY KEY (AlarmID)
)
PARTITION BY RANGE (UsedSegID)
(
PARTITION SegA001 VALUES LESS THAN (1),
PARTITION SegA002 VALUES LESS THAN (2),
PARTITION SegA003 VALUES LESS THAN (3),
PARTITION SegA004 VALUES LESS THAN (4),
PARTITION SegA005 VALUES LESS THAN (5),
PARTITION SegA006 VALUES LESS THAN (6),
PARTITION SegA007 VALUES LESS THAN (7),
PARTITION SegA008 VALUES LESS THAN (8),
PARTITION SegA009 VALUES LESS THAN (9),
PARTITION SegA010 VALUES LESS THAN (10)
)
ENABLE ROW MOVEMENT
/
CREATE INDEX ADS_AO_NN ON ADS_AlarmObject(NEName);
inserted around 1 million records into this table with each partition having roughly 100,000 records. I created a similar table but without the partition and inserted same data into that table also.
Most of our query is based on nename and there is an application logic which will make sure that same nename will have same UsedSegID which is the parition key.
if i run a query like this
select * from ads_alarmobject where nename='NE3' and intprocstatid=5;
It returns roughly 11,000 records.
The explain plan for query on both partitioned and non partitioned table are exactly the same
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=83 Card=1 Bytes=410)
1 0 TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'ADS_ALARMOBJECT'
(Cost=83 Card=1 Bytes=410)

2 1 INDEX (RANGE SCAN) OF 'ADS_AO_NN' (NON-UNIQUE) (Cost=3 C
ard=100000)

but non-partitioned table takes 17 seconds to return the data while partitioned table takes only 3 seconds.
can you please explain why the partioned table query is so fast even though the explain plan is indentical (cost, card everything is same)
I used the following script to insert data into the table
declare
ne_id number:=1;
begin
for i in 1 .. 1000000 loop
insert into ads_alarmobject values(i,'NE'||(mod(i,10)+1),mod(i,9),1,3,2,2,2,
'hjkdhfgkseriouteiorukdhgkldfhjgl;dfkgl;dgkjdfklg',
'dfhjkghreutyeuityeruityierutyuiertuioerutperio',
'dhfgjhdfjgherutyuierytijfhgjdfdfg',
234,sysdate,423,
'dfhsgjkhdgjkertuerytuierytuiyeriut',
1,
'shdgjdfheruytuierytuiert',
12343,
'djsghdfjkghsdjhgeruityeruityuiqwyetughadbshfdgsaftrweyryugfsdhgfasdhgfhj',

'dfjghdfjghwerteruitiuioerut',
'dfkgjdkfgjdfrtoioiouio',
'xcnb,xc,vncbghdfghdfguydfuig',234,4544,1,1,mod(i,10)+1);
if (mod(i,100000)=0) then
commit;
end if;
end loop;
end;
if i look at the traces generated, there are more physical reads for non-partitioned table qry and db file sequential read is the wait event. but why more physical read in non-partitioned table?
Thanks and regards,
Pinto.

May 06, 2005 - 8:04 am UTC

lets see the tkprofs of each.

(and why more PIO on one and not the other ? because one was already in the cache and the other was not -- you did run this multiple times right?)

Surprising performance improvement after partitioning

Pinto, May 09, 2005 - 2:23 am UTC

Hello Tom,
I have run it multiple times.
For non-partitioned table, the time taken does not decrease for the second and the third run. Always its almost the same.
For partitioned table, there is a big difference between the first and the subsequent runs

tkprof for non-partitioned table (First 3 runs)
***********************************************
SELECT *
FROM
ADS_ALARMOBJECT WHERE NENAME='NE3' AND INTPROCSTATID=5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.01 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 33336 7.09 55.45 177566 224517 0 33333
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 33342 7.10 55.46 177566 224517 0 33333

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 25 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
11111 TABLE ACCESS BY INDEX ROWID ADS_ALARMOBJECT
100000 INDEX RANGE SCAN ADS_AO_NN (object id 7148)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 177566 0.02 50.64
********************************************************************************

tkprof for non-partitioned table (Just the 4th Run)
***************************************************
SELECT *
FROM
ADS_ALARMOBJECT WHERE NENAME='NE3' AND INTPROCSTATID=5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 11112 2.68 18.84 59177 74839 0 11111
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11114 2.68 18.84 59177 74839 0 11111

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 25 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
11111 TABLE ACCESS BY INDEX ROWID ADS_ALARMOBJECT
100000 INDEX RANGE SCAN ADS_AO_NN (object id 7148)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 59177 0.04 17.24
********************************************************************************

tkprof for partitioned table (First 3 runs)
***********************************************

SELECT *
FROM
ADS_ALARMOBJECT WHERE NENAME='NE3' AND INTPROCSTATID=5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.01 0.02 2 85 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 33336 0.90 4.08 6356 83616 0 33333
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 33342 0.92 4.10 6358 83701 0 33333

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 26 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
11111 TABLE ACCESS BY GLOBAL INDEX ROWID ADS_ALARMOBJECT PARTITION: ROW LOCATION ROW LOCATION
100000 INDEX RANGE SCAN ADS_AO_NN (object id 7173)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 6356 0.02 3.28
********************************************************************************

tkprof for partitioned table (Just the 4th Run)
***************************************************

SELECT *
FROM
ADS_ALARMOBJECT WHERE NENAME='NE3' AND INTPROCSTATID=5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 11112 0.21 0.21 0 27872 0 11111
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11114 0.21 0.22 0 27872 0 11111

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 26 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
11111 TABLE ACCESS BY GLOBAL INDEX ROWID ADS_ALARMOBJECT PARTITION: ROW LOCATION ROW LOCATION
100000 INDEX RANGE SCAN ADS_AO_NN (object id 7173)

*************************************************************

Thanks and Regards,
Pinto.

May 09, 2005 - 7:45 am UTC

only thing I can think is that by partitioning by usedsegid -- you accidently or by design got all of the 'NE3' records nicely clustered together whereas before they where not.

To show you how this would affect it, I created your tables:

CREATE TABLE ADS_AlarmObject_nonpart
(
    AlarmID                 NUMBER(10)      default 1 NOT NULL,
    NEName                  VARCHAR2(60)    NOT NULL,
....
    UsedSegID        NUMBER(10)    NOT NULL
)
/
                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                             
CREATE TABLE ADS_AlarmObject_part
(
    AlarmID                 NUMBER(10)      default 1 NOT NULL,
    NEName                  VARCHAR2(60)    NOT NULL,
....
    UsedSegID        NUMBER(10)    NOT NULL
)
    PARTITION BY RANGE (UsedSegID)
    (
        PARTITION SegA001 VALUES LESS THAN (1),
        PARTITION SegA002 VALUES LESS THAN (2),
        PARTITION SegA003 VALUES LESS THAN (3),
        PARTITION SegA004 VALUES LESS THAN (4),
        PARTITION SegA005 VALUES LESS THAN (5),
        PARTITION SegA006 VALUES LESS THAN (6),
        PARTITION SegA007 VALUES LESS THAN (7),
        PARTITION SegA008 VALUES LESS THAN (8),
        PARTITION SegA009 VALUES LESS THAN (9),
        PARTITION SegA010 VALUES LESS THAN (10)
    )
    ENABLE ROW MOVEMENT
/
                                                                                                                                                                                                                                             
insert /*+ APPEND */ into ads_alarmobject_nonpart( usedsegid, nename )
select mod(rownum,10), 'NE' || mod(rownum,10)
  from big_table.big_table
 where rownum <= 1000000;
commit;
insert /*+ APPEND */ into ads_alarmobject_part (usedsegid, nename )
select usedsegid, nename
  from ads_alarmobject_nonpart;
commit;
                                                                                                                                                                                                                                             
create index ads_ao_nn_part on ads_alarmobject_part(nename);
create index ads_os_nn_nonpart on ads_alarmobject_nonpart(nename);
                                                                                                                                                                                                                                             
ops$tkyte@ORA9IR2> update ads_alarmobject_part set intprocstatid = 5 where nename = 'NE3' and rownum <= 11111;
 
11111 rows updated.
 
ops$tkyte@ORA9IR2> update ads_alarmobject_nonpart set intprocstatid = 5 where nename = 'NE3' and rownum <= 11111;
 
11111 rows updated.

exec dbms_stats.gather_table_stats( user, 'ADS_ALARMOBJECT_NONPART', cascade=>true );
exec dbms_stats.gather_table_stats( user, 'ADS_ALARMOBJECT_PART', cascade=>true );


<b>but I defaulted everything to be a 1, sysdate or rpad('*',40,'*') and left the primary key off (not relevant to the example).

Note, how I just happened to have put the NE3 data together, it is all in sega004</b>


Now, I run your queries -- once no tracing, once with, results:



SELECT * FROM ADS_ALARMOBJECT_part WHERE NENAME='NE3' AND INTPROCSTATID=5
                                                                                                                                                       
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      742      0.65       0.69       6486       7952          0       11111
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      744      0.65       0.69       6486       7952          0       11111
                                                                                                                                                       
Rows     Row Source Operation
-------  ---------------------------------------------------
  11111  TABLE ACCESS BY GLOBAL INDEX ROWID ADS_ALARMOBJECT_PART PARTITION: ROW LOCATION ROW LOCATION (cr=7952 r=6486 w=0 time=624270 us)
 100000   INDEX RANGE SCAN ADS_AO_NN_PART (cr=1008 r=237 w=0 time=134169 us)(object id 38599)
********************************************************************************
SELECT * FROM ADS_ALARMOBJECT_nonpart WHERE NENAME='NE3' AND INTPROCSTATID=5
                                                                                                                                                       
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      742      2.93       3.02      61489      64090          0       11111
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      744      2.93       3.02      61489      64090          0       11111
                                                                                                                                                       
Rows     Row Source Operation
-------  ---------------------------------------------------
  11111  TABLE ACCESS BY INDEX ROWID OBJ#(38587) (cr=64090 r=61489 w=0 time=2936385 us)
 100000   INDEX RANGE SCAN OBJ#(38600) (cr=952 r=211 w=0 time=133005 us)(object id 38600)


<b>and these results are directly due to the implicit relationship between the nename='NE3' and the usedsegid I generated in my test data, could that be the case here?  Your partitioned table is accidently "very well clustered" and your non partitioned table is worst case clustered??  (I'm guessing YES)

Also, you know that index is the wrong index, it should be a concatenated index on both columns.  when you see 1000000 rows flowing out of an index set and 11111 rows flowing out of the table access step -- that means you want to add a column or two to the end in general</b>

Funny, I was just rewriting this section in Expert One on One Oracle over the weekend, here is the original explanation of clustering factor to go with the example -- to further explain "why"

<quote>

There are factors of course that change these calculations (aren't there always?).  Suppose you have a table such that the rows have a primary key populated by a sequence.  As data is added to the table, rows with sequential sequence numbers are in general "next" to each other.  They table is naturally clustered in order by the primary key (since the data is added in more or less that order).  It will not be strictly clustered in order by the key of course (we would have to use an IOT to achieve that) - but in general rows with primary keys that are close in value will be "close" in physical proximity.  Now when you issue the query: "select * from T where primary_key between :x and :y" - it will be the case that the rows you want are typically co-located on the same blocks.  Here an index range scan may be useful even if it accesses a large percentage of rows simply because the database blocks we need to read and re-read will most likely be cached - since the data is co-located.  On the other hand, if the rows are not co-located using that same index may be disastrous for performance.  A small demonstration will drive this fact home.  We'll start with a table that is pretty much ordered by its primary key:

tkyte@TKYTE816> create table colocated ( x int, y varchar2(2000) ) pctfree 0;
Table created.

tkyte@TKYTE816> begin
  2    for i in 1 .. 100000
  3    loop
  4           insert into colocated values ( i, rpad(dbms_random.random,75,'*') );
  5    end loop;
  6  end;
  7  /
PL/SQL procedure successfully completed.
 
tkyte@TKYTE816> alter table colocated    
  2  add constraint colocated_pk primary key(x);
Table altered.

This is a table fitting the description we laid out above - about 100 rows/block in my 8k database.  Here there is a very good chance that the rows with x = 1, 2, 3 are on the same block.  Now, we'll take this table and purposely "disorganize" it.  In the colocated table above, we created the Y column with a leading random number - we'll use that fact to "disorganize" the data - so it will definitely not be ordered by primary key anymore:

tkyte@TKYTE816> create table disorganized nologging pctfree 0
  2  as
  3  select x, y from colocated ORDER BY y
  4  /
Table created.
 
tkyte@TKYTE816> alter table disorganized 
  2 add constraint disorganized_pk primary key(x);
Table altered.

Arguably these are the same table - it is a relational database, physical organization plays no bearing on how things work (at least that's what they teach in theoretical database courses).  In fact, the performance characteristics of these two tables are night and day different given the same exact query:

tkyte@TKYTE816> select * from COLOCATED where x between 20000 and 40000;
20001 rows selected.

Elapsed: 00:00:01.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'COLOCATED'
   2    1     INDEX (RANGE SCAN) OF 'COLOCATED_PK' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2909  consistent gets
        258  physical reads
          0  redo size
    1991367  bytes sent via SQL*Net to client
     148387  bytes received via SQL*Net from client
       1335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      20001  rows processed

tkyte@TKYTE816> select * from DISORGANIZED where x between 20000 and 40000;
20001 rows selected.

Elapsed: 00:00:23.34

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'DISORGANIZED'
   2    1     INDEX (RANGE SCAN) OF 'DISORGANIZED_PK' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      21361  consistent gets
       1684  physical reads
          0  redo size
    1991367  bytes sent via SQL*Net to client
     148387  bytes received via SQL*Net from client
       1335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      20001  rows processed
 
That is pretty incredible (I think so).  What a difference physical data layout can make.  To summarize the results:

    Elapsed Time    Logical I/O
Colocated    1.02 seconds    2,909
Disorganized    23.34 seconds    21,361

In my database using an 8k block size - these tables had 1,088 total blocks apiece.  The query against disorganized table bears out the simple math we did above - we did 20,000 plus logical I/Os.  We processed each and every block 20 times!!  On the other hand, the physically colocated data took the logical I/Os way down.  Here is the perfect example of why rules of thumb are so hard to provide - in one case, using the index works great, in the other - it stinks.  Consider this the next time you dump data from your development system and load it into production - this may very well be part of the answer to the question that typically comes up "why is it running differently on this machine - they are identical" (they are not identical).  

</quote>

Surprising performance improvement after partitioning

Pinto, May 10, 2005 - 9:05 am UTC

Thanks a lot for that explanation.
Yes, your guesses are correct.
The application was designed such a way so that the same NE data is put in same partition. But this big a performance improvement was not expected.

btw, are you going to release a new edition of your expert one-on-one? if so, when is it due? it is by far one of the best oracle books i have read.
Thanks and Regards,
Pinto.

May 10, 2005 - 9:25 am UTC

i'm working on the second edition now, yes, it'll be out this year.

You accidently "clustered" the data.

We could achieve the same without partitioning using

b*tree clusters
hash clusters
index organized tables

as well (goes to show that these structures can be very useful)

speed and partitioning

varun verma, July 19, 2007 - 4:55 am UTC

hi tom

As you stated initially in your reviews that partitioning does not neccessarily help in speeding up of queries what can be an alternative if in case you want to speed up query fetches. FOr instance i have a set of indexes assigned to a table'; when i search in for a particular data a lot of unwanted data gets traversed whcih i want to avoid.

In lieu of the above i partitoned the table ( into 10 partitions) depending upon a column(say date of birth) and
now when i write in a stmnt

" select * from table_name where dob > 11-aug-1990 ", will there be any effect on the performance of the query or will the dob's that are less than 11-aug-1990 will also be traversed ?

Please give me any other alternative in case i want to sped up the above search !

Statistics collection on partitions

Yong, March 05, 2008 - 1:59 pm UTC

Tom,

We have a fact table having over 200 partitions and 4000 subpartitioins! We are experiencing performance problem on data load. The fact table has 40million records. Do you think the table are over partitioned? plus, our developer create a primary key on the fact table, so that each time a partition is truncated, we have to rebuild the global indexes, the data loading is become slower and slower and data grows.

The data loading process is also including statistics collections on the fact table.
If we do a partition exchange and collect statistics on the parition table, do we have to collect statistics on the fact table after the partition exchange?
what will happen to the statistics on the partition, if we collect statistics on the fact table after the partition exchange?

thanks,

March 06, 2008 - 7:42 am UTC

... We are
experiencing performance problem on data load. ..

I won't hazzard a guess, you should trace it and/or gather relevant v$ information to see what it is bottlenecked on.

Look to the obvious problems - where you are spending time. And you don't have to rebuild global indexes, you could do the truncate and maintain the indexes. You'll want to

a) figure out where you spend time
b) concentrate on reducing that

it'll likely involve a complete overhaul to the approach of data loading and until you do that, no suggestions are really possible or in fact helpful

A reader, March 06, 2008 - 1:55 pm UTC

if there is a global index on a partitioned table, after truncating a partition, the global index become UNUSABLE.
Rebuilding global indexes takes 2 hours. After rebuilding the indexes, collect statistics takes another 2 hours.

Using partition exchange should be a best approach. Here is my suggestions

1) change global index to local index so that after truncating a partition, there is no need to rebuild global index
2) collect statistics on loading table before the partition exchange.

I have question on the statistic collection:

1) if we collect statistics on the loading table, after the partition exchange with a fact table, do we need to collect statistics on the main fact table? if there is no need to run statistics on the main fact table, it will save a lot time.

2) If after the partition exchange with statistics collected before the exchange, does oracle re-collect statistics on that partition?

thanks

March 07, 2008 - 5:21 pm UTC

... after truncating a
partition, the global index become UNUSABLE.
...

only because you wanted it to, not because it has to...

1) DO NOT EVEN THINK ABOUT THAT. You cannot willy nilly just say "change to local", you would likely KILL PERFORMANCE of all data retrieval. People use a global index *on purpose*, for a reason.

2) you might not even HAVE to gather statistics, totally depends. If you just wiped out a partition, and you only have local statistics, you might just need to wipe out that partitions statistics.

(wish you wouldn't use 1, 2 and 1, 2 - confusion abounds....)

1) you give entirely insufficient information to say anything. If you only gather local statistics ever - maybe not. If you gather global statistics for a reason - yes you would.

2) oracle only gathers statistics when you tell it to (or you run the default job that looks for stale or missing statistics)

But please - lose the idea of turning a global index into a local index until you understand WHY it is global in the first place. It will likely become obvious *why* is cannot possibly be a local index.

for example:

a) you hash partition in to 64 partitions a big table by "ID"

b) you query by last_name

if you local partition the index on last_name, you'll do 64 index range scans for "where last_name = :x"

if you globally partition the index on last_name - putting the A's into one partition, B's in another and so on, you'll index range scan ONE partition.

Please - do not ever suggest "no global indexes" or "change the global to local" - you would seriously damage your database.

ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /

Table created.

ops$tkyte%ORA10GR2> insert into t select to_date( '12-mar-2003','dd-mon-yyyy')+mod(rownum,3), user_id, username from all_users;

58 rows created.

ops$tkyte%ORA10GR2> create index t_idx on t(x) global;

Index created.

ops$tkyte%ORA10GR2> select status from user_indexes where index_name = 'T_IDX';

STATUS
--------
VALID

ops$tkyte%ORA10GR2> alter table t truncate partition part1 update global indexes;

Table truncated.

ops$tkyte%ORA10GR2> select status from user_indexes where index_name = 'T_IDX';

STATUS
--------
VALID

ops$tkyte%ORA10GR2> alter table t truncate partition part2 ;

Table truncated.

ops$tkyte%ORA10GR2> select status from user_indexes where index_name = 'T_IDX';

STATUS
--------
UNUSABLE

Regarding Data loading

anupam, May 20, 2008 - 12:42 am UTC

I have a finance table which gets loaded from a flat file 
which is generated every hour .This flat file is loaded with follwing 
processes
 
 1> create a temp table with the same structure as of fact table.
 2> Load the flat file into temp table with the help of sqlldr.
 3>create the index on the temp table having same defintion as of 
    main fact table . 
 4>create the partition into main fact table for the hour being loaded through 
   flat file . 
 5>exchange the temp table with the new partition created in main fact table.

 But now here comes the problem .

 Now along with the finance flat file we are supposed to load the accounts data 
 also in the same table.

 But the problem is there is a bit of time lag between getting the two flat files .
 Suppose we have 03 hrs accounts data and 01 hrs finanace data.We want a parrellel loading kind of thing .

 Right now i can't think of any optimised way of loding these two flat files 
 in parellel .Please help me in this regard.

 Thanks in advance

May 20, 2008 - 10:56 am UTC

I don't follow what the problem is here.

load each file into it's own table.
make each file it's own partition.

sort of like you have now - not sure what the stumbling block here is?

anupam, May 20, 2008 - 2:12 pm UTC

Target table is same where we will have to load the finance as well as accounts data.
Right now the way i have implemented is as follows

1> wait for both the files(accounts and finance) to be available of same hour(say both are of 03 hour).
2> load them into temp table.
3>create partition for 03 hour in main partition table .
4>exchange partition with temp table.

Now problem is in first step .Both the files for same hour will not be available same time .So wait time is huge.

Suppose I have finance data for 03 hour and accounts data for 01 hour .So there will be two files generated every hour for loading by upstream process.

I was thinking of spawning two parellel process for loading these files because i cant afford doing them one by one it will take lot of time.But in implementing this the problem is if at 03 hour i load 01 hour data of finace where in main table 01 hour accounts is already there then it wipes out the data for accounts because of exchange partition .

Plz give me some guidelines so that i can implement it properly.

Thanks in advance.

May 20, 2008 - 3:57 pm UTC

I still don't get the problem here.

You are right now deciding to wait until two file types arrive and then load them into a single partition and slide it in

why not just create a partition for each file as it arrives (as you were doing hourly, now you just have an extra file) and load

The "problem" is not clear here - I don't know what "issue" you have.

I don't even know what is involved in your "loading" process - if it is just a create table as select or insert into as select - just use parallel query and we'll parallel process for you if that makes sense.

A reader, May 21, 2008 - 1:37 am UTC

CREATE TABLE fact
(timeid NUMBER NOT NULL,
fileid NUMBER NOT NULL,
metric1 NUMBER NOT NULL,
metric2 NUMBER NOT NULL)
PARTITION BY RANGE (TIMEID)
(PARTITION PARTITION T_2008052100 VALUES LESS THAN (2008052101)
PARTITION PARTITION T_2008052101 VALUES LESS THAN (2008052102)
PARTITION PARTITION T_2008052102 VALUES LESS THAN (2008052103)
);

Step 1>Now I get to flat files from upstream process.
a>accounts.txt( for 04 hour )
b>finance.txt ( for 01 hour)
(Now i cant load them into single temp table and do exchange partiton because
fact table is partitioned based on hour)

Step2> Load Accounts temp

Create table accounts_temp
(timeid number,
fileid number,
metric1 number,
metric2 number
);

LOAD DATA
APPEND INTO TABLE accounts_temp
FIELDS TERMINATED BY " " OPTIONALLY ENCLOSED BY '"'
(
timeid,
fileid constant 1,
metric1 ,
metric2
)

Step3> Load finance temp.

Create table finance_temp
(timeid number,
fileid number,
metric1 number,
metric2 number
);

LOAD DATA
APPEND INTO TABLE finance_temp
FIELDS TERMINATED BY " " OPTIONALLY ENCLOSED BY '"'
(
timeid,
fileid constant 2,
metric1 ,
metric2
)

Step4> Create partiton in main fact to accomodate the accounts temp data .

if partiton exists then
null;
else
alter table fact add partition T_2008052104 values less than (2008052105);
end if;

step5> Create partiton in main fact to accomodate the finance temp data .

if partiton exists then
null; --timeid 2008052101 already exists .
else
alter table fact add partition T_<timeid> values less than (<timeid>+1);
end if;

step6>In final step we are using exchange partition.

Alter table fact exchange partition T_2008052104 with table accounts_temp including indexes;

Step7>
Alter table fact exchange partition T_2008052101 with table finance_temp including indexes;

Here for timeid 2008052101 ,accounts data was already loaded and now if we do exchange patition
it will wipe out the data for acconts and insert the data of finance .

Problem is that i wanted to use the exchange partition utility in last step for fast loading but not able to use because of
above problem .Is there any trick by which i can solve this problem .

Thanks in advance

May 21, 2008 - 8:36 am UTC

... (Now i cant load them into single temp table and do exchange partiton
because
fact table is partitioned based on hour)
...

so, change your partitioning scheme to suit your new requirements?

A reader, May 26, 2008 - 12:17 am UTC

Actually i dont want to disturb the partitioning schema which is right now in use.Because there were lots of consideration we used for this partitoninig schema...

Thanks and regards,
Anupam

May 27, 2008 - 7:51 am UTC

and your requirements have CHANGED - so, maybe time to rethink what you have. Think about that.

Rethink partitioning: real Business case

YukonKid, May 29, 2008 - 5:09 am UTC

If you let me contribute a real business case here to encourage you going with the partition option and avoiding pitfalls (rethinking strategies):

We have a SAP BW Datawarehouse - for god's sake it runs on Oracle 9i :-) .
In this warehouse we have a customer application that does costs calculation on actual
, budget and forecast figures on monthly basis , storing several versions
for different costs scenarios. One Scenarion is defined by the fields USAGE, YearMonth and Version.

1. For 1 scenario we have about 5 -11 Million rows (about 4 - 9 Gbyte of storage).

2. Everytime they calculate the Developers have done a boring DELETE to remove a scenario form the
unpartitioned table: 3- 5 hours runtime, other processes running in the warehouse were
competing about the resources (UNDO, CPU , I/O...)

3. They did this for a year and in the meantime this table became the biggest
in the Warehouse (140 Gybtes).

4. The ran into problems to deliver the data on time to the board (especially when
the calculation broke and they had to rerun (guess what: the Rollback put additional
stress on the database)

5. Project leader cried for the fast = true parameter (as Tom would state it...)
6. They detected it's not existing
7. They ask the DB experts (about 1,5 years after they start...)

8. We devloped the strategy:

* What should be the goal of partitioning the big table: Reducing writes or speed up queries?
(the administration benefit was a bonus we gave to the SAP Basis team .. but it was not the
main goal of our strategy).

It was clear that partitioning was the way to go: We had to reduce the writes to the database
(Undo generation). But we had to keep the runtime of the warehouse queries at least at the same speed
as before partitioning: There were only INDEX RANGE scans so no full table scans could be replaced by
full "partition scans". They did the indexing well so no big gain from partitioning on the query side.
A few queries select over several scenarios so a parallel INDEX RANGE scan on some partitions could
speed up somehow.
(btw: Bitmap indexes were not considered because the accesses on the big table was a mixed-use of OLTP and
datawarehouse processes updating and selecting from it)

* Finding the right partition key(s):

The key(s) should be able to identify 1 calculation scenario that we could truncate later
on the partition.
Because one scenario was defined by the fields USAGE, YearMonth,Version wich could be the
right key? Sure alltogether would do the trick...but we were running on an SAP Netweaver Application
server and we were limited somehow:
We could only use the partitioning options that Netweaver allowed us to use to
not loose support of the SAP system: RANGE on single column , no subpartitions possible.

The goal was to introduce a so called ID column - Number( 10,0) - that would identify
a scenario defined by USAGE, YearMonth,Version.

We discussed the topic and in the beginning the project leader refused to change the data model
because of the additional costs involved with the changes. They prefer the YearMonth as a
partition key - wich was more than suboptimal (it would make no sense):
Only a few scenarios were uniquely defined by the YearMonth field, more often you found sevaral scenarios
in one YearMonth. And looking into the future they wanted to use the version field
intensively: In one year the partition key would be completely senseless - wasting money!

We could convince him to do it right in the start and that he would waste money
keeping with the old model (this works always with your project leader: help him to
save money by spending money to be on the right track).

So the big table looked like that now:

ID (partition key) ,USAGE, YearMonth,Version ...other charcteristics , ...fact figures...,

* To migrate the table was not really difficult

* Partition elimination and artificial keys like our ID...

It was a MUST that the ID had to be in the WHERE clause of all queries selecting on the partitioned
table to get the partitions eliminated.
But how to tell the user to forget his well-known characteristics and look-up the ID value
pointing to his scenario he wanted to select from?

On the other hand there were a lot of ABAP Routines using the table as source for data retrievals
into Warehouse cubes, Web interfaces selecting form the table ...you name it.
A lot of changes to be done. But we want to keep them at a minimum and didn't want to force the user
to think in scenario ID's and let him use his business characteristics to select on.

In that case a reference table came in really handy (we needed it to inforce refrential integrity
with the calculations in the big table).

It looked like that:

ID ,USAGE, YearMonth,Version

What we did was to join the partitioned table to the reference table and put that in a view.
Now we used the View name instead of the table name in the Routines and queries (these changes were
quickly done).

The important test was: could the CBO eliminate partitions on the table when we select on the view
that would return the ID(s) to do this pruning?
We theoretically know it should do it (see Metalink note :209070.1 ("Partition Pruning based on Joins
to Partitioning Criteria Stored in Dimension Tables")
and the good news was : it did it very well...

* The Truncate on the partitioned table was really cool: Instead of waiting 3- 4 hours
we delete in zero time (15 seconds)

* Because we programmed an ABAP interface for the partition handling (add, split, drop,
truncate partition)the Developers didn't need the SAP Basis team and they don't have to care about creating the
partitions on time: It could all be handeled inside the ABAP program using this package.

* All were happy and we started the next project to create data life cycle management
(what is based on Partitioned tables) to reduce storage costs by moving historical
scenarios to low-cost storage devices -> saving more money: lucky project leader,
lucky developers, lucky DBA ;-)

access current partition

Reader, July 12, 2009 - 1:21 am UTC

We have RANGE partitioned tables (weekly partitions) using the date column as partition key column. Our application runs end of the day/week reports. We have created local index for all partitioned tables. Also, we have several composite global indexes created as well. Global indexes do not necessarily have the partition key column as a leading column. Is this a problem? How do we ensure that our query always accesses the current partition? We see in execution plan that it uses global index. However, I was hoping it would use local index to get to the current week partition. Our RDBMS is 9.2.0.7.0. Thanks.

July 14, 2009 - 5:39 pm UTC

... Global indexes do not necessarily have the
partition key column as a leading column. Is this a problem? ...

In general - no, of course not (else it wouldn't be permitted)

In your specific case? Maybe, but we have no idea. Probably not.

... How do we ensure
that our query always accesses the current partition? ...

define to me what a 'current partition' is, there is no such concept. Your where clause tells us what data you need to access and we access that data. If you supply a where clause that gets you to your current partition - we'll be OK.

As for the last bit about the global index - so?

Say you range partition by DT and each partition has 30 days of data (30 unique dates) in it. And say each partition has 1,000,000 rows. (so about 33,333 rows/dt)

Say you have a global index on last_name. It is range partitioned on the last name (there are 26 partitions - one for each letter of the alphabet)

Now, suppose you say:

select * from t where dt = :dt and last_name = :ln;

I would presume that last name is in general 'selective' (forget about outliers like JONES and SMITH - think 'KYTE').

I would expect that query to return maybe 1 to 10 rows because there at 10 people with the same last name in ALL partitions.

If I use the index on DT, I'd have to hit the table 33,333 times to find the 10 rows.

If I use the index on last_name, I'd have to hit the table 1 to 10 times to find the rows.

which makes more sense now?

global index

Reader, July 14, 2009 - 9:22 pm UTC

With reference to your above comments...Do you recommend global index also should be partitioned to obtain optimal performance for a query?

July 15, 2009 - 11:59 am UTC

there are only two answers to every question

a) why?
b) it depends

the answer in this case is (b).

A partitioned index might

a) be faster
b) be slower
c) be the same as

a non-partitioned index, it depends.

For example

suppose the index is unique. suppose the index is HUGE (it has a height of 5) when not partitioned. assume the index is used by the query:

select * from t where index_key = :bv;

a billion times per day. So that query returns a single row. We do 5 IO's against the index and 1 IO against the table. If we range partition that index into N partitions - maybe the new index partitions would have heights of 3 instead of 5. We now run a bit of code to figure out what partition we'll hit (CPU), then read that partition (3 IO's - a little less cpu, less latching) and then read the table.

So, we'll have gone from 6 billion logical IO's per day down to 4 billion logical IO's per day, the cost of processing the partitioned index is almost certainly more than offset by the reduction in logical IO's (not only that, but we have made the host index root block less hot by having more of them)

Outcome: probably beneficial.

Suppose however, the index is not that large, it has a height of 3 (possible for many tens of millions of records). Suppose that partitioning it makes it have a height of.... 3 (no change). Now we have made index range scans perhaps take a little longer since we didn't reduce the logical IO's but we did increase the CPU needed to make use of the index (we have to partition eliminate first).

Outcome: probably either negative or no impact one way or the other.

In other words - you need to close your eyes and envision the work that will be performed in your system, using your knowledge of how the database works, how partitioning works, how your data is accessed, how your data is physically represented.

where clause

Reader, July 14, 2009 - 9:27 pm UTC

In your example above,

select * from t where dt = :dt and last_name = :ln;

why would I refer dt column FIRST in the where clause, if I am interested only in last_name?

Can I just write the query as,

select * from t where last_name = :ln;

Thanks.

July 15, 2009 - 12:02 pm UTC

because the data is partitioned by dt

and last_name = :ln could be in ANY (or all or none) of the partitions.

I didn't want the data for "last_name = :ln" from the old data. That is why. It was my query, I can make it be anything I want.

In other words, it was an *example* - in direct response to the "a reader" above who said they wanted to query the 'current partition' (which since they partition by date, must have a date column in the where clause to hit the 'current partition') but use other things in the where clause - they were upset (for whatever reason - none provided - just 'fear' of the unknown I guess) that we used the global index and not the index on the partition key.

So I gave a conceptual example to demonstrate that "it is probably OK, think about it..."

Partition key column

David, July 17, 2009 - 10:46 pm UTC

select * from t where dt = :dt and last_name = :ln;

If I used for example, today's date for dt above, would the optimizer access the current week partition alone? OR would it use the global index on last name and then access the data in current week partition?

Thanks

July 24, 2009 - 9:04 am UTC

it depends. did you apply critical reading to what I wrote above? I addressed this exact case.

If the optimizer thinks the index would be best, it will use the global index.

If the optimizer thinks a full scan of a given partition would be best, it will full scan the single partition.

I can demonstrate BOTH cases being used easily. Sometimes the index is correct, sometimes a full scan of a single partition is correct.

insert into partitioned tables slow

A reader, April 08, 2010 - 1:17 pm UTC

Hi

We have a DWH and every night we load around 500 million of rows into the fact tables.

We run plain INSERT APPEND ... INTO TABLE ... SELECT. Really slow, compared with INSERT APPEND ... INTO TABLE PARTITION() ... SELECT.

The first takes 35 minutes to load 30 million of rows, the second takes 4 minutes. Big difference.

Obvisouly specifying partition is faster but sometimes we load so many partitions data that it can be tricky to find the partition names.

My question is: is the behaviour we observe normal?

April 13, 2010 - 8:12 am UTC

tell me, are you doing this "in parallel", meaning - are you firing off many sessions at the same time?

If so, only one session at a time can append into a given segment.

insert /*+ append */ into table;

locks the TABLE.

insert /*+ append */ into table partition)(p);

locks the PARTITION.

I'd say you are doing many sessions and you are serializing unless you use the partition extended name.

did I guess right? Not too much to go on here, no diagnostics by you? No wait events?

insert into partitioned tables slow

A reader, April 24, 2010 - 1:29 pm UTC

Hi

Sorry for the late reply, was travelling with no inet access

We run one insert per table, this insert invokes parallel dml and parallel query so we dont suffer any locking issues.

We found some bottleneck, seems that the space management was causing the slowing down. For this system we use bigfiles in 10gR2 in RAC and whenever the datafile needs to extend we suffer contention on block 2 in the file (the lmt bitmap?), the autoextend seems too small, only 50MB, after increased to 1024MB we observe better load time.

But later on we found that when we do insert into table uses much more rowcache than insert into table partition(), guess that is unavoidable.

Thanks for the help

Ashwin, January 20, 2011 - 6:06 am UTC

Hi Tom,
A fairly simple question here: Whats the effect of doing a "select *" from a very BIG table as opposed to reading all the records across all the partitions of the same table after partitioning the table? I hate to ask this but which one would be faster?

Regards,
Ashwin,

January 24, 2011 - 6:03 am UTC

it would be something I call "six one way, half a dozen the other"

they would be about the same - assuming the IO speeds were the same (eg: the partitioned table is on devices that have the same speed, layout, contention as the non-partitioned table)

Partition table with Index

car, February 14, 2011 - 8:56 pm UTC

Hi Tom,

We have a Table Partitioned by Range(Date) then subpartion by list. There are 14mn records in the table. We also created an Index on four columns with the same above parition plan as local.

When i run a query on the table without index it is taking 2 secs and resulting 20,000 records. If i create Index the same query is taking more than 5 mins. When i checked explain plan without index it is using parallel and with index also it is using parallel and Index. but still the query is slow.

Is there any explanation for this behavior?

Regards
car

February 15, 2011 - 8:18 am UTC

no example detailing what the query looks like, what the index looks like, what the table looks like, what the data looks like, what the plans actually are....

no comment can be made.

Full table scan or partition scan

Udhay, October 05, 2020 - 6:39 am UTC

Hi,

Let us consider due to a large number of entries in a table, the oracle decides to perform a full table scan instead of an index range scan.

My questions are:

1. If our table is partitioned (consider the query contains partition key ) will it search only the partition or the entire table. If the search happens only in partition, then the performance is better than a non-partitioned table. Is my understanding correct?

2. If our table is partitioned (consider the query contains partition key) will it search all the partitions .In this case, it will be equivalent to the FTS of the non-partitions table. Is my understanding correct?

October 06, 2020 - 6:33 am UTC

Depends on the query.

If you do

select * from table where col = ....

and "col' is NOT the partition key, then we'll scan (index or otherwise) each partition

select * from table where parcol = .... and col = ...

and 'parcol' is the partitioning column, then we'll only need to scan the particular partition. (Same goes for each greater than and less than)

Similarly, the index itself might be partitioned which can impact how much of the index is scanned (if an index was chosen by the optimizer)

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Review

Review

just cross my mind

data distribution

A little suggestion to Goran

When to partition

Review

for Christo

Oracle9i multiple slaves can act on each partition

Surprising performance improvement after partitioning

Surprising performance improvement after partitioning

Surprising performance improvement after partitioning

speed and partitioning

Statistics collection on partitions

Regarding Data loading

Rethink partitioning: real Business case

access current partition

global index

where clause

Partition key column

insert into partitioned tables slow

insert into partitioned tables slow

Partition table with Index

Full table scan or partition scan

More to Explore

VLDB

VLDB

performance problem with partitioning table

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Review

Review

just cross my mind

data distribution

A little suggestion to Goran

When to partition

Review

for Christo

Oracle9i multiple slaves can act on each partition

Surprising performance improvement after partitioning

Surprising performance improvement after partitioning

Surprising performance improvement after partitioning

speed and partitioning

Statistics collection on partitions

Regarding Data loading

Rethink partitioning: real Business case

access current partition

global index

where clause

Partition key column

insert into partitioned tables slow

insert into partitioned tables slow

Partition table with Index

Full table scan or partition scan

More to Explore

VLDB

VLDB