You Asked

Hi AT

The database I am tasked to work on has partitions across a date column. It has 90 million rows and growing, which I know is considered a small table, by reading some of the answers to other questions on Ask Tom.

The following is just an example, not the actual table design.

There are currently 5 primary partitions, 2020, 2019, 2018, 2017, and 2016, and growing. Each of these partitions are divided into 4 sub partitions based on a VARCHAR2 claim status type column. I will use the following for the example:

BC - Before Claim
DC - During Claim
AC - After Claim
RC - Retired Claim.

So as a claim status type changes, what happens to the data, does the data "move" from one sub partition to the other sub partition within a year partition, with updates to the primary and secondary indexes? This would be one type of performance issue, correct?

When trying to work on queries with response time performance issues, I often have queries that do full table scans. I also know from reading other questions on Ask Tom that "Full Table Scans are Not Evil" necessarily.

Some times the queries need all data across all years for one of the claim types, say the BC claim type as a report. So 5 year partitions are used and 5 sub partitions are used.

If the partition scheme is reversed, where the Claim Types are used as the primary partitions and the years as the sub partitions, would there be any query performance advantage? For the same query across all years for the BC claim type, one main partition and (currently) 5 year partitions would be used. The number of rows processed would go down as a claim "moved" from one primary partition to the next and finally becomes a "RC" retired claim. (The RC primary partition would eventually have more records than the other primary partitions.)

The number of rows also increase as new years are added, but older claims are "in the process" of being "moved" through the various Claim Type status changes and help manage the number of rows being processed for report queries.

Would the "update" overhead be too much of a trade off?

Thank you
Charles

and Chris said...

So as a claim status type changes, what happens to the data, does the data "move" from one sub partition to the other sub partition within a year partition, with updates to the primary and secondary indexes?

If you update the partitioning column(s) for a table, then yes, the affected rows will move to another (sub)partition.

You need to enable row movement on the table to be able to do this.

If the partition scheme is reversed, where the Claim Types are used as the primary partitions and the years as the sub partitions, would there be any query performance advantage?

If you partition by year, subpartition by claim type, then finding all the rows for a claim type, the database will:

- Iterate through all the year partitions
- Access only the relevant type subpartition within this

If you flip the scheme around, then the query will

- Access only the partition for the claim type
- Read all the year subpartitions in this

To answer which is faster, the best thing to do is test!

Here's partitioning by year, subpartitioning by type:

create table t (
  id       integer,
  ins_year integer,
  status   varchar2(2),
  stuff    varchar2(100)
    default rpad ( 'stuff', 100, 'f' )
) partition by range ( ins_year )
  interval ( 1 )
  subpartition by list ( status ) 
  subpartition template (
    subpartition sub_bc values ( 'BC' ),
    subpartition sub_dc values ( 'DC' ),
    subpartition sub_ac values ( 'AC' ),
    subpartition sub_rc values ( 'RC' )
  ) (
    partition p2016 values less than ( 2017 )
  ) enable row movement;
  
insert into t ( id, ins_year, status )
with rws as (
  select level x from dual
  connect by level <= 100000
)
  select x, 
         ceil ( x / 25000 ) + 2016,
         case mod ( x, 4 ) 
           when 0 then 'BC'
           when 1 then 'DC'
           when 2 then 'AC'
           when 3 then 'RC'
         end 
  from   rws;
  
commit;

set serveroutput off
alter session set statistics_level = all;

select count(*) from t
where  status = 'AC';

select * 
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST +PARTITION'));

----------------------------------------------------------------------------------------------------------    
| Id  | Operation               | Name | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |    
----------------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT        |      |      1 |        |       |       |      1 |00:00:00.01 |     628 |    
|   1 |  SORT AGGREGATE         |      |      1 |      1 |       |       |      1 |00:00:00.01 |     628 |    
|   2 |   PARTITION RANGE ALL   |      |      1 |  24962 |     1 |1048575|  25000 |00:00:00.01 |     628 |    
|   3 |    PARTITION LIST SINGLE|      |      5 |  24962 |   KEY |   KEY |  25000 |00:00:00.01 |     628 |    
|   4 |     TABLE ACCESS FULL   | T    |      5 |  24962 |   KEY |   KEY |  25000 |00:00:00.01 |     628 |    
----------------------------------------------------------------------------------------------------------


select count(*) from t
where  ins_year = 2018;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST +PARTITION'));

----------------------------------------------------------------------------------------------------------    
| Id  | Operation               | Name | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |    
----------------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT        |      |      1 |        |       |       |      1 |00:00:00.01 |     596 |    
|   1 |  SORT AGGREGATE         |      |      1 |      1 |       |       |      1 |00:00:00.01 |     596 |    
|   2 |   PARTITION RANGE SINGLE|      |      1 |  20455 |     3 |     3 |  25000 |00:00:00.01 |     596 |    
|   3 |    PARTITION LIST ALL   |      |      1 |  20455 |     1 |     4 |  25000 |00:00:00.01 |     596 |    
|*  4 |     TABLE ACCESS FULL   | T    |      4 |  20455 |     9 |    12 |  25000 |00:00:00.01 |     596 |    
----------------------------------------------------------------------------------------------------------

Flipping the partitioning schemes around, loading the same data and running the same queries gives:

drop table t 
  cascade constraints purge;
create table t (
  id       integer,
  ins_year integer,
  status   varchar2(2),
  stuff    varchar2(100)
    default rpad ( 'stuff', 100, 'f' )
) partition by list ( status ) 
  subpartition by range ( ins_year )
  subpartition template (
    subpartition p2016 values less than ( 2017 ),
    subpartition p2017 values less than ( 2018 ),
    subpartition p2018 values less than ( 2019 ),
    subpartition p2019 values less than ( 2020 )
  ) (
    partition sub_bc values ( 'BC' ),
    partition sub_dc values ( 'DC' ),
    partition sub_ac values ( 'AC' ),
    partition sub_rc values ( 'RC' )
  ) enable row movement;

select count(*) from t
where  status = 'AC';

---------------------------------------------------------------------------------------------------------    
| Id  | Operation              | Name | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |    
---------------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT       |      |      1 |        |       |       |      1 |00:00:00.01 |     628 |    
|   1 |  SORT AGGREGATE        |      |      1 |      1 |       |       |      1 |00:00:00.01 |     628 |    
|   2 |   PARTITION LIST SINGLE|      |      1 |  20455 |   KEY |   KEY |  25000 |00:00:00.01 |     628 |    
|   3 |    PARTITION RANGE ALL |      |      1 |  20455 |     1 |     4 |  25000 |00:00:00.01 |     628 |    
|   4 |     TABLE ACCESS FULL  | T    |      4 |  20455 |       |       |  25000 |00:00:00.01 |     628 |    
---------------------------------------------------------------------------------------------------------

select count(*) from t
where  ins_year = 2018;

-----------------------------------------------------------------------------------------------------------    
| Id  | Operation                | Name | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |    
-----------------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT         |      |      1 |        |       |       |      1 |00:00:00.01 |     660 |    
|   1 |  SORT AGGREGATE          |      |      1 |      1 |       |       |      1 |00:00:00.01 |     660 |    
|   2 |   PARTITION LIST ALL     |      |      1 |  28637 |     1 |     4 |  25000 |00:00:00.01 |     660 |    
|   3 |    PARTITION RANGE SINGLE|      |      4 |  28637 |     3 |     3 |  25000 |00:00:00.01 |     660 |    
|*  4 |     TABLE ACCESS FULL    | T    |      4 |  28637 |       |       |  25000 |00:00:00.01 |     660 |    
-----------------------------------------------------------------------------------------------------------

So the work done (buffers) is similar for all the queries. Querying on the partition key comes out slightly faster than the subpartition key in both cases, but the difference is small. YMMV

Would the "update" overhead be too much of a trade off?

Only you can answer that!

From your description, it sounds to me like you'll have up to three status (and thus partition changing) updates to a row in its life.

I suspect you'll query each row many times more than this. Meaning it's likely you'll get much more benefit from partition elimination speeding up your queries than an extra overhead for updates will cost you. But it really comes down to the details of how exactly and how often you'll be querying and updating.

Test, test, test!

One other important consideration is partition maintenance.

You can only have interval partitioning at the top-level, not subpartitions. So if you choose to make year the subpartition, you'll need to update the subpartition templates regularly to add the new years - or pre-create a huge number.

Just in case this is overlooked one year, you'll also want a maxvalue subpartition to avoid any data loss. Which means splitting rows into subpartitions in the event this is missed.

Whereas if you partition by year, subpartition by type, you can use interval partitioning. So you'll only need to do partition maintenance in the (rare?) event you add more statuses.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Design

Choosing Subpartition vs. Partition Columns

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Design