Thanks for the question, Venugopal.

Asked: March 05, 2020 - 6:49 pm UTC

Last updated: March 10, 2020 - 9:21 am UTC

Version: oracle database 11g

Viewed 1000+ times

You Asked

Hi Team,

We have non partitioned Audit tables with billion of rows and now we are trying to partition those tables and archive them periodically .
From your blog we understood that partition strategy should be purely based on data access and manageability.
Most of our tables are parent/child relationship in 11g database and so we have chosen composite Reference partitioning in POC.
Main partition is Range over (log_date_time) and sub partition Hash on Corpid (CIID Number).
We tested both Global Hash /Local Indexes on primary keys( CIID, ADLOGKEY) and ( log_date_time,CIID, ADLOGKEY)
With Global Hash on indexes (insert performed better with less Index contention) but select where bad for 3months range too with proper Indexing.
With Local Indexes we see insert taking 2-3mints more than global with more contention and select doing good.
We want an optimal solution for both OLTP and OLAP.

Most of our select queries will be based on logdatetime range between 1 to 18 months ,with VPD Context on Corpid (CIID) in every query.
Ex:

 select few columns from FROM  ADLOG_TMP, ADLOG_DETAIL_TMP
 WHERE  ADLOG_TMP.account_id IS NOT NULL
       AND ADLOG_TMP.auditlog_key = ADLOG_DETAIL_TMP.auditlog_key (+)
       AND (ADLOG_TMP.log_date_time) >= to_date('1/1/2018 00:00:00 ','mm/dd/yyyy hh24:mi:ss')
       AND (ADLOG_TMP.log_date_time) <= to_date('12/31/2018 23:59:59','mm/dd/yyyy hh24:mi:ss')
       AND (ADLOG_TMP.record_type) NOT IN (
   'Forfeiture Rate',
            'Performance Measurement',
            'Contribution'
            
    )
    AND nvl((ADLOG_TMP.log_type), 'NULL_LOG_TYPE') = CASE ('All')
        WHEN 'All'         THEN nvl( (ADLOG_TMP.log_type),'NULL_LOG_TYPE')
  WHEN 'Additions'   THEN 'ADD'
  WHEN 'Deletions'   THEN 'DELETE'
  WHEN 'Changes'     THEN 'CHANGE'
    END

Partition strategy POC details :

1) Single partition Range on table and global on index primary key ( CIID,ADLOGKEY)
2) Composite partition Range-Hash on (logdatetime/CIID ) and with local Indexes (logdatetime,CIID,ADLOGKEY)

does single/composite partition makes a difference or not ? please suggest.

metrics are similar in both scenarios

single partition - Inserts elapsed time Indexing(global)
6M 6:23:24
Selects elapsed time
400krows 4Months 2:23:24

Composite (Range/Hash) 6M 6:00 23 Indexing(local)
400krows 4Mon 1:50 23

Composite(Range/Hash) 6M 4:59:05 (Global Hash)
selects elapsed time
400krows 4Months Query takes long time. fetches only 1500rows in 7mints.

Questions :
1) which one we should go with single /composite partitioning based on insert & selects.
2) what type of indexing would be better in this scenario
3) Why selects not performing good with globalHash indexing.

Table Partitioning:

CREATE TABLE "XXX"."ADLOG_TMP" 
   ( "AUDITLOG_KEY" NUMBER(38,0) NOT NULL ENABLE, 
 "RECORD_TYPE" VARCHAR2(100 CHAR), 
 "ACCOUNT_ID" VARCHAR2(11 CHAR), 
 "GRANT_NUMBER" VARCHAR2(25 CHAR), 
 "REFERENCE" VARCHAR2(100 CHAR), 
 "AUDIT_CONTEXT" VARCHAR2(100 CHAR), 
 "LOG_TYPE" VARCHAR2(15 CHAR), 
 "USER_ID" VARCHAR2(30 CHAR), 
 "LOG_DATE_TIME" DATE not null, 
 "PROCESS_TYPE" VARCHAR2(50 CHAR), 
 "LASTNAME" VARCHAR2(30 CHAR), 
 "RECORDCOUNT" NUMBER, 
 "RELATION" NUMBER, 
 "PROCESS_KEY" NUMBER, 
 "PARENTCHECK" NUMBER(1,0), 
 "CORPINSTANCE_ID" NUMBER(15,0) DEFAULT SYS_CONTEXT ('VPDSETM_EEO_CONTEXT','CIID') NOT NULL ENABLE, 
 "RECORD_LAST_MODIFIED" DATE, 
 "RECORD_ADDED" DATE, 
 "USER_NAME" VARCHAR2(20 CHAR), 
 "VERSIONCOLUMN" NUMBER(15,0), 
 "LOG_DATE" DATE , 
 "SQLTEXT" VARCHAR2(4000 CHAR)
  )
  SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 15 MAXTRANS 255 
 COMPRESS FOR OLTP  LOGGING 
  STORAGE( BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "EEO_LARGE_DATA" 
  PARTITION BY RANGE ("LOG_DATE_TIME") 
  SUBPARTITION BY HASH(CORPINSTANCE_ID) SUBPARTITIONS 64
( PARTITION "P1"  VALUES LESS THAN (TO_DATE(' 2013-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P2"  VALUES LESS THAN (TO_DATE(' 2013-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P3"  VALUES LESS THAN (TO_DATE(' 2013-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P4"  VALUES LESS THAN (TO_DATE(' 2014-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P5"  VALUES LESS THAN (TO_DATE(' 2014-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P6"  VALUES LESS THAN (TO_DATE(' 2014-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P7"  VALUES LESS THAN (TO_DATE(' 2014-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P8"  VALUES LESS THAN (TO_DATE(' 2015-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P9"  VALUES LESS THAN (TO_DATE(' 2015-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P10"  VALUES LESS THAN (TO_DATE(' 2015-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P11"  VALUES LESS THAN (TO_DATE(' 2015-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P12"  VALUES LESS THAN (TO_DATE(' 2016-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P13"  VALUES LESS THAN (TO_DATE(' 2016-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P14"  VALUES LESS THAN (TO_DATE(' 2016-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P15"  VALUES LESS THAN (TO_DATE(' 2016-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P16"  VALUES LESS THAN (TO_DATE(' 2017-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P17"  VALUES LESS THAN (TO_DATE(' 2017-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P18"  VALUES LESS THAN (TO_DATE(' 2017-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P19"  VALUES LESS THAN (TO_DATE(' 2017-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P20"  VALUES LESS THAN (TO_DATE(' 2018-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P21"  VALUES LESS THAN (TO_DATE(' 2018-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P22"  VALUES LESS THAN (TO_DATE(' 2018-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P23"  VALUES LESS THAN (TO_DATE(' 2018-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P24"  VALUES LESS THAN (TO_DATE(' 2019-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P25"  VALUES LESS THAN (TO_DATE(' 2019-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P26"  VALUES LESS THAN (TO_DATE(' 2019-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P27"  VALUES LESS THAN (TO_DATE(' 2019-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P28"  VALUES LESS THAN (TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P29"  VALUES LESS THAN (TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')), 
  PARTITION "P30"  VALUES LESS THAN (TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  PARTITION "P31"  VALUES LESS THAN (TO_DATE(' 2020-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) 
  );
  
Child Table :
=========
 CREATE TABLE "xxxx"."ADLOG_DETAIL_TMP" 
   (
    "AUDITLOGDETAIL_KEY" NUMBER(38,0) not null, 
 "AUDITLOG_KEY" NUMBER(38,0) not null, 
 "FIELD_CHANGED" VARCHAR2(30 CHAR), 
 "PRIOR_VALUE" VARCHAR2(500 CHAR), 
 "NEW_VALUE" VARCHAR2(500 CHAR), 
 "RELATION" NUMBER, 
 "CORPINSTANCE_ID" NUMBER(15,0) DEFAULT SYS_CONTEXT ('VPDSETM_EEO_CONTEXT','CIID') not null,
 "RECORD_LAST_MODIFIED" DATE, 
 "RECORD_ADDED" DATE, 
 "USER_NAME" VARCHAR2(20 CHAR), 
 "VERSIONCOLUMN" NUMBER(15,0) DEFAULT 0, 
 "USER_ID" NUMBER(38,0),
 "LOG_DATE_TIME" date not null,
 CONSTRAINT "FK_ADLOGDETTMP" FOREIGN KEY ("LOG_DATE_TIME","CORPINSTANCE_ID","AUDITLOG_KEY")
   REFERENCES "CISETM1"."ADLOG_TMP" ("LOG_DATE_TIME","CORPINSTANCE_ID","AUDITLOG_KEY" ) ON DELETE CASCADE ENABLE 
    )
 SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 15 MAXTRANS 255 
 COMPRESS FOR OLTP  LOGGING STORAGE(BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "EEO_LARGE_DATA" 
    Partition by Reference (FK_ADLOGDETTMP) ;

Thanks in Advance
Venu

and Chris said...

If you haven't already done so, check Connor's partitioning guide:

https://asktom.oracle.com/partitioning-for-developers.htm

This should help you decide what to go for. Ultimately there's always going to be trade-offs. You need to decide which gains are worth the costs. That's a decision you need to make bearing in mind your business objectives.

1) I'm not entirely clear what your numbers are showing. Exactly what are the inserts and selects?

But I'd stick with the KISS principle and go with single-level partitioning. Unless you find strong performance or manageability benefits of composite partitioning.

2) Hard to say for sure without seeing the exact queries you use. But you've mentioned VPD, so you'll need to ensure any indexes you create account for predicates this will add.

3) We'll need to see the execution plan for these queries. If you're saying it takes 7 minutes to get 1,500 rows back... something's not right.

If you're not sure how to get the plan, read:

https://blogs.oracle.com/sql/how-to-create-an-execution-plan

Rating

(4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

some suggestions

A reader, March 09, 2020 - 10:33 am UTC

Hi

In order to have benefits of partitiong one should have in where clause or in join conditions references to partitioning columns.
There should be ADLOG_TMP.CIID = ADLOG_DETAIL_TMP.CIID join condition on this sql statement.

If You use the partitioning scheme You suggested, You might try to change index (log_date_time,CIID, ADLOGKEY) to (CIID, ADLOGKEY,log_date_time)
That might make that index more attractive to optimizer.

One possibility to consider is to partition tables on by the date (log_date_time) and then subpartition them by
automatic list partition by several columns (record_type, log_type).

The benefits of this partitioning method are very dependent of the distribution of values and what kind of queries one is performing.
Be also beware that
- current version of partition pruning seems not to understand 'not in' -operator. This is a real shame because this kind subpartitioning is attractive in some cases with multicolumn list partitioning.
- functions around partitioning columns disable partition pruning.

So Your current sql example would not not benefit of this partitioning.

Also consider compressing of old partitions. These rows won't change, so this is a good case for this.
Also consider using of linear clustering. In your suggested partioning method linear clustering by CIID, ADLOGKEY, LOG_DATE_TIME would make indexes to have much less IO when fetching rows.
Using basic compression and linear clustering without online clause as an separate operation when partition is 'ready', should not require any extra licenses beside EE and partitioning.

lh

March 09, 2020 - 10:42 am UTC

Yep, those suggestions are worth exploring.

reference partitioning

lh, March 09, 2020 - 11:42 am UTC

Hi

Also it might be worth noticing : https://jonathanlewis.wordpress.com/2018/03/19/reference-costs/

But if this is meaningful here depends how Your rows are inserted...

lh

March 09, 2020 - 2:33 pm UTC

Ah yes, missed that it's using reference partitioning. Good catch.

Venu, March 10, 2020 - 12:49 am UTC

Thanks for the useful info Tom & LH.

some clarifications required :

a) In order to have benefits of partitiong one should have in where clause or in join conditions references to partitioning columns.
There should be ADLOG_TMP.CIID = ADLOG_DETAIL_TMP.CIID join condition on this sql statement.

Our env is VPD enabled, by default this predicate clause will be added for each query and I was able to see them in explain plain clearly.

b) One possibility to consider is to partition tables on by the date (log_date_time) and then subpartition them by
automatic list partition by several columns (record_type, log_type).

Our business requirement is to partition on CorpID , so that it help out in other process which completely depend on this partition.

..I mean if we can segregate data using CIID , ease to fetch & copy for particular ciid data in smaller chunks only from that partition.

IS it right approach ?

c) Also it might be worth noticing : https://jonathanlewis.wordpress.com/2018/03/19/reference-costs/

Thanks for sharing that link, i was able to understand the extra cost for inserting in child table.

But with out reference partitioning, I was not able to delete the parent partition as referential constraint was enabled. Since we are still in 11g

I have to either disable and re-enable the constraint inorder to drop my parent partition/delete the rows which may cost more as per my understanding.

IS there any better way apart from referene partition which drop's parent & child related table partitions with out any overhead.

Thanks in Advance.
Venu

March 10, 2020 - 9:21 am UTC

IS it right approach ?

You'll need to test and verify it meets your performance goals, but it sounds like a good strategy. But it's worth trying out a few partitioning methods to see how they compare.

IS there any better way apart from referene partition which drop's parent & child related table partitions with out any overhead.

Not that I'm aware of.

humble suggestion

lh, March 13, 2020 - 4:22 pm UTC

Hi

In order to make good (?) suggestions some vital information is lacking.
- how many rows are inserted into tables each month.
- how many distinct values there are for CORPINSTANCE_ID and for (CORPINSTANCE_ID, AUDITLOG_KEY) per one month.

I would however dare to suggest following data structures and operations.

ADLOG_TMP:
partitioned by LOG_DATE_TIME, monthly basis (preferrable by internal partitioning)
subpartioned by hash(corpinstance_id), 64partitions (as currently is)

primary key (CORPINSTANCE_ID, AUDITLOG_KEY), compress 1. Globally partitioned by CORPINSTANCE_ID, hash 64 partitions.
define linear clustering by (CORPINSTANCE_ID, AUDITLOG_KEY)

ADLOG_DETAIL_TMP:
reference partitioned by ADLOG_TMP.

primary key index (CORPINSTANCE_ID, AUDITLOG_KEY, LOG_DATE_TIME), compress 1 or 2, depends of the how many different auditlog_key values there are)
globally partitioned by hash(CORPINSTANCE_ID)

or
primary key index (CORPINSTANCE_ID, AUDITLOG_KEY, LOG_DATE_TIME), compress 1 or 2, depends of the how many different auditlog_key values there are)
globally partitioned by LOG_DATE_TIME.

In either case linear clustering to CORPINSTANCE_ID, AUDITLOG_KEY, LOG_DATE_TIME.

Also it should be considered is there risk for 'hot blocks' when these audit logs are inserted. Is there need to distribute inserts into different partitions by having
hash(CORPINSTANCE_ID)?

After there will no more new records to these partitions, old partitions should be reorganized by doing partition move and having linear clustering.
That would reduce number of io-operations when accessing data by CORPINSTANCE_ID.

When considering options for partitioning strategy and indexes it might be useful to look at
https://richardfoote.wordpress.com/2018/10/25/hidden-efficiencies-of-non-partitioned-indexes-on-partitioned-tables-part-iii-ricochet/

((
The partitioning scheme that used multicolumns list partitioning is rather unknown. It was mentioned mainly as to give idea that this is currently (12.2) possible.
Also I wanted to point out, that Oracle has made a great feature, but it is not 'complete'. Oracle's feature that 'not in' operator is not doing partition pruning
has been quite sensible simplification in earlier versions. However multicolumn list partitioning has made list partitioning much more attractive and it now prevents partition pruning.
( Another one of these features that make much more sense nowadays is that, optimizer would utilize metadata more. E.g. if in table2 list partition is based on column STATUS
and there is statement
select * from table1 where exists (select * from table2 where table1.column1 = table2.column2 and table2.status='ACTIVE')
and table2 has index on column2.
Nowadays subquery does fetch row on table2, but if it would utilize metadata, it would only have to access index.
)

))

lh

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

some suggestions

reference partitioning

humble suggestion

More to Explore

Administration

Partition strategy implementation

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

some suggestions

reference partitioning

humble suggestion

More to Explore

Administration