Skip to Main Content
  • Questions
  • Best indexing for LEAD / LAG analytic functions

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Claudio.

Asked: October 03, 2016 - 11:51 am UTC

Last updated: October 05, 2016 - 1:06 am UTC

Version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

Viewed 10K+ times! This question is

You Asked

Hi,

I was wondering which is the best index structure you can issue on a table in order to get LEAD / LAG functions perform at their best.

An example.

CREATE TABLE DWH1_PLF.TEST
(
  COD_SAP      VARCHAR2( 10 CHAR ),
  PLAFOND      NUMBER,
  DT_CALENDAR  DATE
)


Each day a full set of new rows is added ( /*+Append */) to the table, with current values of PLAFOND field for that day.
With a full Customer base of about 50.000+ codes, each day the TEST table therefore increases by 50.000+ rows.

Now I'd like to scan the table to extract the variations of the attribute PLAFOND, squeezing therefore the information into a much more manageable table with a fraction of the "master" Filter table (which is an object I cannot modify).

I wrote:

SELECT   
        Q.COD_SAP, 
        Q.PLAFOND_PREV,
        Q.DT_PLAFOND_NEW - 1 AS DT_UP_TO,
        Q.PLAFOND_NEW,
        Q.DT_PLAFOND_NEW
FROM 
        (
        SELECT 
                COD_SAP,
                DT_CALENDAR AS DT_PLAFOND_NEW,
                PLAFOND AS PLAFOND_NEW,
                LAG ( PLAFOND ) 
                  OVER (
                       PARTITION BY COD_SAP 
                       ORDER BY DT_CALENDAR 
                       ) AS PLAFOND_PREV
        FROM 
                TEST
        ) Q
WHERE 
        PLAFOND_PREV != PLAFOND_NEW 
    OR  PLAFOND_PREV IS NULL
ORDER 
    BY  COD_SAP, 
        DT_PLAFOND_NEW



which gets, for the code "1614052":

COD_SAP PLAFOND_PREV    DT_UP_TO    PLAFOND_NEW  DT_PLAFOND_NEW
======================================================================
1614052       <NULL>    2015-09-11        20000  2015-09-12
1614052        20000    2016-01-12        10000  2016-01-13
1614052        10000    2016-02-15         5000  2016-02-16
1614052         5000    2016-04-26            0  2016-04-27



that is what I want. But, I was wondering, which is the best combination of Indexes that should be put on the table to get the best performance?

Would it be better to add several simple indexes (e.g. an Index on COD_SAP, another one on PLAFOND, another one on DT_CALENDAR) or two or more compund indexes, e.g. one with (COD_SAP, DT_CALENDAR, PLAFOND) and one with (PLAFOND, DT_CALENDAR) or a combination of both?
Another way to pose the question is: in presence of an analytical grouping function LEAD / LAG, is it there a "standard indexing guideline" recommending, for example, to have an index on the field used in the PARTITION BY, another one for the field used in the ORDER BY and so on? (just like the "rule" to index the columns used in a JOIN clause).

Thank you for your support.

Claudio de Biasio

and Connor said...

I'm assuming that you are scanning all of the COD_SAP values each day - which pretty much means a indexing strategy is unlikely to be of benefit, because that's at least 50,000 index probes you would be doing.

But let's explore options

1) Basic table

SQL> CREATE TABLE t
  2  (
  3    COD_SAP      VARCHAR2( 10 CHAR ),
  4    PLAFOND      NUMBER,
  5    DT_CALENDAR  DATE
  6  ) pctfree 0 ;

Table created.

SQL> insert /*+ append */ into t
  2  select x1,
  3         trunc(x2/10),
  4         date '2016-01-01' + x2
  5  from
  6    ( select /*+ no_merge */ level x1 from dual connect by level <= 50000 order by dbms_random.value ) x,
  7    ( select level x2 from dual connect by level <= 100 ) y;

5000000 rows created.

SQL> commit;

Commit complete.

SQL> set arraysize 100
SQL> set autotrace traceonly stat
SQL> SELECT
  2          Q.COD_SAP,
  3          Q.PLAFOND_PREV,
  4          Q.DT_PLAFOND_NEW - 1 AS DT_UP_TO,
  5          Q.PLAFOND_NEW,
  6          Q.DT_PLAFOND_NEW
  7  FROM
  8          (
  9          SELECT
 10                  COD_SAP,
 11                  DT_CALENDAR AS DT_PLAFOND_NEW,
 12                  PLAFOND AS PLAFOND_NEW,
 13                  LAG ( PLAFOND )
 14                    OVER (
 15                         PARTITION BY COD_SAP
 16                         ORDER BY DT_CALENDAR
 17                         ) AS PLAFOND_PREV
 18          FROM
 19                  t
 20          ) Q
 21  WHERE
 22          PLAFOND_PREV != PLAFOND_NEW
 23      OR  PLAFOND_PREV IS NULL
 24  ORDER
 25      BY  COD_SAP,
 26          DT_PLAFOND_NEW;

550000 rows selected.


Statistics
----------------------------------------------------------
        206  recursive calls
          6  db block gets
      13508  consistent gets
      43168  physical reads
          0  redo size
   15990077  bytes sent via SQL*Net to client
      61041  bytes received via SQL*Net from client
       5501  SQL*Net roundtrips to/from client
          1  sorts (memory)
          1  sorts (disk)
     550000  rows processed


------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  5000K|   200M|       | 29909   (1)| 00:00:02 |
|*  1 |  VIEW               |      |  5000K|   200M|       | 29909   (1)| 00:00:02 |
|   2 |   WINDOW SORT       |      |  5000K|    81M|   134M| 29909   (1)| 00:00:02 |
|   3 |    TABLE ACCESS FULL| T    |  5000K|    81M|       |  2395   (2)| 00:00:01 |
------------------------------------------------------------------------------------


So you can see we did a big sort across the whole table, and as a result, I had to dump some sort data to disk. So you can see sorts to disk, and 43,000 IO's.

2) Partitioned table

We can achieve some data "locality" by partitioning the table on COD_SAP, so data for each code is more co-located, and thus requires less sorting effort


SQL> drop table t1 purge;

Table dropped.

SQL>
SQL> CREATE TABLE t1
  2  (
  3    COD_SAP      VARCHAR2( 10 CHAR ),
  4    PLAFOND      NUMBER,
  5    DT_CALENDAR  DATE
  6  ) pctfree 0
  7  partition by hash ( COD_SAP )
  8  partitions 32;

Table created.

SQL>
SQL> insert /*+ append */ into t1
  2  select x1,
  3         trunc(x2/10),
  4         date '2016-01-01' + x2
  5  from
  6    ( select /*+ no_merge */ level x1 from dual connect by level <= 50000 order by dbms_random.value ) x,
  7    ( select level x2 from dual connect by level <= 100 ) y;

5000000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> set arraysize 100
SQL> set autotrace traceonly stat
SQL>
SQL> SELECT
  2          Q.COD_SAP,
  3          Q.PLAFOND_PREV,
  4          Q.DT_PLAFOND_NEW - 1 AS DT_UP_TO,
  5          Q.PLAFOND_NEW,
  6          Q.DT_PLAFOND_NEW
  7  FROM
  8          (
  9          SELECT
 10                  COD_SAP,
 11                  DT_CALENDAR AS DT_PLAFOND_NEW,
 12                  PLAFOND AS PLAFOND_NEW,
 13                  LAG ( PLAFOND )
 14                    OVER (
 15                         PARTITION BY COD_SAP
 16                         ORDER BY DT_CALENDAR
 17                         ) AS PLAFOND_PREV
 18          FROM
 19                  t1
 20          ) Q
 21  WHERE
 22          PLAFOND_PREV != PLAFOND_NEW
 23      OR  PLAFOND_PREV IS NULL
 24  ORDER
 25      BY  COD_SAP,
 26          DT_PLAFOND_NEW;

550000 rows selected.


Statistics
----------------------------------------------------------
         85  recursive calls
          0  db block gets
      13585  consistent gets
      13520  physical reads
        124  redo size
   15990077  bytes sent via SQL*Net to client
      61041  bytes received via SQL*Net from client
       5501  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
     550000  rows processed



------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |  5000K|   200M|       | 29984   (1)| 00:00:02 |       |       |
|   1 |  SORT ORDER BY        |      |  5000K|   200M|       | 29984   (1)| 00:00:02 |       |       |
|   2 |   PARTITION HASH ALL  |      |  5000K|   200M|       | 29984   (1)| 00:00:02 |     1 |    32 |
|*  3 |    VIEW               |      |  5000K|   200M|       | 29984   (1)| 00:00:02 |       |       |
|   4 |     WINDOW SORT       |      |  5000K|    81M|   134M| 29984   (1)| 00:00:02 |       |       |
|   5 |      TABLE ACCESS FULL| T1   |  5000K|    81M|       |  2470   (2)| 00:00:01 |     1 |    32 |
------------------------------------------------------------------------------------------------------




So we still *read* the same amount of data, but notice our sorting was in memory, because we did a sort *per partition*, because a COD_SAP cannot span a partition. Hence our IO's dropped noticeably.

3) IOT

You'll see why I said an indexing strategy is possible not appropriate. What if the table itself was an index - does that help ?


SQL> drop table t2 purge;

Table dropped.

SQL>
SQL> CREATE TABLE t2
  2  (
  3    COD_SAP      VARCHAR2( 10 CHAR ),
  4    PLAFOND      NUMBER,
  5    DT_CALENDAR  DATE,
  6    constraint t2_pk primary key ( COD_SAP,DT_CALENDAR, PLAFOND)
  7  )
  8  organization index ;

Table created.

SQL>
SQL> insert /*+ append */ into t2
  2  select x1,
  3         trunc(x2/10),
  4         date '2016-01-01' + x2
  5  from
  6    ( select /*+ no_merge */ level x1 from dual connect by level <= 50000 order by dbms_random.value ) x,
  7    ( select level x2 from dual connect by level <= 100 ) y;

5000000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> set arraysize 100
SQL> set autotrace traceonly stat
SQL>
SQL> SELECT
  2          Q.COD_SAP,
  3          Q.PLAFOND_PREV,
  4          Q.DT_PLAFOND_NEW - 1 AS DT_UP_TO,
  5          Q.PLAFOND_NEW,
  6          Q.DT_PLAFOND_NEW
  7  FROM
  8          (
  9          SELECT
 10                  COD_SAP,
 11                  DT_CALENDAR AS DT_PLAFOND_NEW,
 12                  PLAFOND AS PLAFOND_NEW,
 13                  LAG ( PLAFOND )
 14                    OVER (
 15                         PARTITION BY COD_SAP
 16                         ORDER BY DT_CALENDAR
 17                         ) AS PLAFOND_PREV
 18          FROM
 19                  t2
 20          ) Q
 21  WHERE
 22          PLAFOND_PREV != PLAFOND_NEW
 23      OR  PLAFOND_PREV IS NULL
 24  ORDER
 25      BY  COD_SAP,
 26          DT_PLAFOND_NEW;

550000 rows selected.


Statistics
----------------------------------------------------------
        459  recursive calls
          4  db block gets
      13319  consistent gets
      46729  physical reads
        352  redo size
   15990077  bytes sent via SQL*Net to client
      61041  bytes received via SQL*Net from client
       5501  SQL*Net roundtrips to/from client
          1  sorts (memory)
          1  sorts (disk)
     550000  rows processed


---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  4594K|   184M| 13160   (1)| 00:00:01 |
|*  1 |  VIEW             |       |  4594K|   184M| 13160   (1)| 00:00:01 |
|   2 |   WINDOW BUFFER   |       |  4594K|   127M| 13160   (1)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T2_PK |  4594K|   127M| 13160   (1)| 00:00:01 |
---------------------------------------------------------------------------




You can see - even though we walked the index in order, the window operation still required a hefty sort.

Hope this helps.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thank you!

Claudio de Biasio, October 04, 2016 - 10:00 am UTC

Hi Connor,

thank you so much for your comprehensive answer. I hoped that would have not been the case, and that an index could have helped...

Well, I have *A LOT* to learn about Oracle, I see!!! Thank you again for your interesting analysis.
Connor McDonald
October 04, 2016 - 11:33 am UTC

We all never stop learning about Oracle.

That's why I love working on AskTom

very good ; +question

A reader, October 04, 2016 - 12:44 pm UTC

does
set autotrace traceonly stat
works on sqldeveloper?

what would be the privs and rights to make it work

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.