You Asked

Hi!,
recently I came with the task of getting the newest address from customers from an ORACLE JD Edwards database. In JDE, the table F0101 is the Address Book Master and table F0116 is Address by Date.
I got two solutions: one using joins:
SELECT T0.*
FROM jdedata910.F0116 T0
INNER JOIN (SELECT Max(T2.ALUPMJ) as MaxALUPMJ, T2.ALAN8
FROM jdedata910.F0116 T2
GROUP BY T2.ALAN8
) T1 ON (T0.ALAN8 = T1.ALAN8)
AND (T0.ALUPMJ = T1.MaxALUPMJ);

Other using Rank() Over():
SELECT *
FROM (SELECT T1.*,
RANK() OVER (PARTITION BY ALAN8
ORDER BY ALUPMJ DESC) "R"
FROM jdedata910.F0116 T1);

Where ALAN8 is the Address Number, AJUPMJ is the Address Date...

The SQLdeveloper advisor tells me that the first solution has a cost of 46 and that the second one a cost of 128.

I like the second solution very much being very straight forward, but it costs more. Please advice: am I over killing it using RANK() for this task?

By the way: in the submit form, Years Using Oracle select list always display one char, ie. 15 years shows 1.

Regards

and Connor said...

Since you have the two versions of the SQL, rather than looking at COST, I would look at what is really going to matter to users/admins, ie, elapsed time and resources consumed.

So use some simple tests in SQL Plus, eg

SQL> set timing on
SQL> set autotrace traceonly stat
SQL> select count(*) from t;

Elapsed: 00:00:00.00

Statistics
----------------------------------------------------
          0  recursive calls
          0  db block gets
        455  consistent gets
          0  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Compare elapsed time for each, and "consistent gets/physical reads" for each.

Please post the two here - because I'm curious to see which one "wins"

Rating

(2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Luis Capriles, August 28, 2016 - 5:13 pm UTC

First, there was a typo in the Rank() query: WHERE missing... here it is:

SELECT *
FROM (SELECT T1.*, RANK() OVER (PARTITION BY ALAN8 ORDER BY ALUPMJ DESC) "R"
      FROM jdedata910.F0116 T1)
WHERE R = 1;

After setting autotrace I got this results (run the query three times in a row), that I think are not very interesting because the data set is very small...

                                         Test#1 Join Test#2 Join Test#3 Join  Test#1 Rank Test#2 Rank Test#3 Rank
      SAMPLE 1                           0:00:00.14  0:00:00.04  0:00:00.04   0:00:00.07  0:00:00.06  0:00:00.07
      recursive calls                             1           0           0            1           0           0
      db block gets                               0           0           0            0           0           0
      consistent gets                           208         208         208           76          76          76
      physical reads                              0           0           0            0           0           0
      redo size                                   0           0           0            0           0           0
      bytes sent via SQL*Net to client      128479      128479      128479       128387      128387      128387
      bytes received via SQL*Net from cli       990         990         990          990         990         990
      SQL*Net roundtrips to/from client          59          59          59           59          59          59
      sorts (memory)                              0           0           0            1           1           1
      sorts (disk)                                0           0           0            0           0           0
      rows processed                            860         860         860          860         860         860 of 864 rows

So I switched database and repeated:
      SAMPLE 2                           0:00:00.74  0:00:00.71  0:00:00.71   0:00:01.59  0:00:01.37  0:00:01.40
      recursive calls                             0           0           0           10          10          10
      db block gets                               0           0           0            5           5           5
      consistent gets                         10187       10186       10186         4531        4531        4531
      physical reads                           7117        7115        7115         5737        5737        5737
      redo size                                   0           0           0            0           0           0
      bytes sent via SQL*Net to client     2793698     2793698     2793698      2600750     2600750     2600750
      bytes received via SQL*Net from cli     34107       34107       34107        34107       34107       34107
      SQL*Net roundtrips to/from client        3055        3055        3055         3055        3055        3055
      sorts (memory)                              0           0           0            0           0           0
      sorts (disk)                                0           0           0            1           1           1
      rows processed                          45802       45802       45802        45902       45802       45802 of 561840 rows

Rank() is consistently taking almost double with low or medium volume of data, and I think worst, the sort went from memory to disk... I am curious how MAX() in the join query gets the result without sorting -full scan, per address? But RANK() uses less gets and reads: in a busy database will that be an advantage?

So, the winner is... join query (?)

Thank you.

August 29, 2016 - 12:50 am UTC

Interesting results on the larger set (especially the sort going to disk).

Could you run them both with as:

select /*+ GATHER_PLAN_STATISTICS */ ...
from ...

and then run

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'))

I'd be interested in seeing the output for each.

In terms of "who wins" ? Well...(in most cases) priority #1 is response time for the users, so if your second set is representative of the real data, I'd pick the join.

Luis Capriles, August 29, 2016 - 4:01 am UTC

Hi!

Join query...

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4p0bx29gs48rq, child number 0
-------------------------------------
SELECT  /*+ GATHER_PLAN_STATISTICS */ T0.* FROM GDCNO.GD_CC_CITAS T0
INNER JOIN (SELECT Max(T2.HORAINICIO) as MaxHI, T2.PERSONAID
            FROM GDCNO.GD_CC_CITAS T2
            GROUP BY T2.PERSONAID
            ) T1 ON (T0.PERSONAID = T1.PERSONAID)
                AND (T0.HORAINICIO = T1.MaxHI)

Plan hash value: 3542141419


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                 |      1 |        |  45802 |00:00:00.52 |   10186 |   7134 |       |       |        |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN              |                 |      1 |     38 |  45802 |00:00:00.52 |   10186 |   7134 |  2550K|  1511K| 3169K (0)|
|   2 |   VIEW                  |                 |      1 |  45856 |  45802 |00:00:00.32 |    2630 |   2606 |       |       |        |
|   3 |    HASH GROUP BY        |                 |      1 |  45856 |  45802 |00:00:00.31 |    2630 |   2606 |  2534K|  1399K| 2951K (0)|
|   4 |     INDEX FAST FULL SCAN| GD_CC_CITAS_UK1 |      1 |    560K|    561K|00:00:00.08 |    2630 |   2606 |       |       |        |
|   5 |   TABLE ACCESS FULL     | GD_CC_CITAS     |      1 |    560K|    561K|00:00:00.10 |    7556 |   4528 |       |       |        |
-----------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------

   1 - access("T0"."PERSONAID"="T1"."PERSONAID" AND "T0"."HORAINICIO"="T1"."MAXHI")


27 rows selected.

Rank() query...

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
SQL_ID  05abcrwd8vvy1, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * 
FROM (SELECT T1.*, RANK() OVER (PARTITION BY PERSONAID ORDER BY HORAINICIO DESC) "R"       
FROM GD_CC_CITAS T1) 
WHERE R = 1

Plan hash value: 2208011245

---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
---------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |             |      1 |        |  45802 |00:00:03.49 |    4536 |   5737 |   1209 |       |   |          |         |
|*  1 |  VIEW                    |             |      1 |    560K|  45802 |00:00:03.49 |    4536 |   5737 |   1209 |       |   |          |         |
|*  2 |   WINDOW SORT PUSHED RANK|             |      1 |    560K|  87044 |00:00:01.99 |    4536 |   5737 |   1209 |    10M|  1264K|   45M (1)|   10240 |
|   3 |    TABLE ACCESS FULL     | GD_CC_CITAS |      1 |    560K|    561K|00:00:00.10 |    4531 |   4528 |      0 |       |   |          |         |
---------------------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("R"=1)
   2 - filter(RANK() OVER ( PARTITION BY "PERSONAID" ORDER BY INTERNAL_FUNCTION("HORAINICIO") DESC )<=1)


23 rows selected.

August 29, 2016 - 12:20 pm UTC

It looks like somewhere between 2.5M and 10M is the crossover point for your sorts to head down to disk, which possibly accounts for the rank option being that little slower.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

More to Explore

Analytics

Table join vs Rank()Over()

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

More to Explore

Analytics