Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, Omar.

Asked: July 08, 2019 - 8:46 pm UTC

Last updated: May 23, 2022 - 10:16 am UTC

Version: 11G R2

Viewed 1000+ times

You Asked

Hello, I'm not a native speaker so I hope to explain myself correctly,

I have a table three with the headers and a body table with the data, and I need get the average by every level, there are 5 levels but could modify in the future, but there are some conditions actually the level 4 and 5 have data (I know this could by weird but that was the requirement years ago), now I need to make a report getting the average by every level hierarchical, but I couldn't get the expected result, I show you an example:

WITH filtered AS (
    SELECT matriz_key,
           coalesce(paa,0) paa
      FROM totales
     WHERE period_key = 1
),groupeddata AS (
    SELECT mh.matriz_key,
           matriz_key_f,
           coalesce(paa,0) paa
      FROM matriz_h mh,
         filtered a1
     WHERE mh.matriz_key = a1.matriz_key (+)
)
SELECT matriz_key,
       matriz_key_f,
       paa,
       coalesce(round( (
           SELECT AVG(a2.paa)
             FROM groupeddata a2 START WITH
               a2.matriz_key_f = a1.matriz_key
           CONNECT BY
               PRIOR a2.matriz_key = a2.matriz_key_f
       ),2),0) avg_paa
  FROM groupeddata a1 START WITH
    matriz_key_f IS NULL
CONNECT BY
    PRIOR matriz_key = matriz_key_f;

MATRIZ_KEY MATRIZ_KEY_F LABEL LVL PAA AVG_PAA Avg_paa_man
1  -  1  1 0 22.06 31.88
2 1 1.1  2 0 23.44 31.88
3 2 1.1.1  3 0 25.00 20.00
4 3 1.1.1.1  4 0 25.00 20.00
5 4 1.1.1.1.1 5 10 0.00 0.00
6 4 1.1.1.1.2 5 0 0.00 0.00
7 4 1.1.1.1.3 5 50 0.00 0.00
8 4 1.1.1.1.4 5 40 0.00 0.00
9 3 1.1.1.2  4 20 20.00 20.00
10 9 1.1.1.2.1 5 0 0.00 0.00
11 9 1.1.1.2.2 5 35 0.00 0.00
12 9 1.1.1.2.3 5 0 0.00 0.00
13 9 1.1.1.2.4 5 45 0.00 0.00
14 3 1.1.2  3 0 43.75 43.75
15 14 1.1.2.1  4 35 46.67 43.75
16 15 1.1.2.1.1 5 100 0.00 0.00
17 15 1.1.2.1.2 5 20 0.00 0.00
18 15 1.1.2.1.3 5 20 0.00 0.00
19  -  2  1 0 26.82 36.88
20 19 2.1  2 0 29.50 36.88
21 20 2.1.1  3 0 25.00 25.00
22 21 2.1.1.1  4 0 33.33 25.00
23 22 2.1.1.1.1 5 30 0.00 0.00
24 22 2.1.1.1.2 5 0 0.00 0.00
25 22 2.1.1.1.3 5 70 0.00 0.00
26 20 2.1.2  3 0 48.75 48.75
27 26 2.1.2.1  4 55 46.67 48.75
28 27 2.1.2.1.1 5 65 0.00 0.00
28 27 2.1.2.1.1 5 75 0.00 0.00
29 27 2.1.2.1.2 5 0 0.00 0.00


in my example I calculated avg_paa in sql, but in excel I calculated avg_paa_man, avg_paa_man is how I expected get the result, is calculated getting the average from level 4-5, then to get the level 3 is avg of all level 4, after all to get level 2 is get the average of all level 3, for get level 1 I get the average of all level 2, but in SQL I get the average of all rows, and the calculates change, so, how can I correct my query to get avg_paa_man?

I hope you can understand me, thank you

--More info

In some cases this appears to be the mean of all the rows lower in the tree. In others it includes the current row. <- That's right, level 4 is the mean of all the rows lower even itself, because level 4 sometimes have some value by example matriz_key = 9 paa = 20, so level 4 needs to get average of all rows lower in the tree and itself, but in level 1-3 only need the average of the immediate leaves

now, avg_paa_man this is calculated of the next way:
Example calculations for tree 1.0:
matriz_key = 4 => avg(matriz_key=>4-8 & lvl => 4,5) = (0+10+0+50+40)/5=20
matriz_key = 9 => avg(matriz_key=>9-13 & lvl = 4,5) = (20+0+35+0+45)/5=20
matriz_key = 3 => (matriz_key=>4,9 & lvl => 4) = (20+20)/2=20
matriz_key = 2 => (matriz_key=>3,14 & lvl => 3) = (20+43.75)/2=round(31.875,2) = 31.88
matriz_key = 1 => (matriz_key=>2 & lvl => 2) = 31.88/1 = 31.88

Example calculations for tree 2.0:
matriz_key = 22 => avg(matriz_key=>22-25 & lvl => 4,5) = (0+30+0+70)/4=25
matriz_key = 27 => avg(matriz_key=>27-29 & lvl => 4,5) = (55+65+75+0)/4=48.75
matriz_key = 21 => (matriz_key=>22 & lvl => 4) = (25)/1=25
matriz_key = 26 => (matriz_key=>27 & lvl => 4) = (48.75)/1=48.75
matriz_key = 20 => (matriz_key=>20,26 & lvl => 3) = (25+48.75)/2=round(36.875,2)=36.88
matriz_key = 19 => (matriz_key=>20 & lvl => 2) = (36.88)/1=36.88

In this example I only wrote some leaves, but for each level = 2 there are around of 2-4, for level = 3 there are around 1-20 for level = 4 there are around 1-20 and for level =5 there are around 1-20, so the tree could be very big, but the calculations are like in my examples, only that I need to include every leaf for calculated it.

Thank you

with LiveSQL Test Case:

and Chris said...

To solve this problem you need to traverse the data set twice. Once down the tree to build the hierarchy. And back up it again to calculate the averages.

I reached out to the community on Twitter for help building solutions. Several people submitted queries, which we discussed in this month's SQL Ask TOM Office Hours. You can see this recording below:



The key to cracking this is to note that you're computing the mean across all rows with the same parent. You could do this using pipelined table functions, recursive with or the model clause.

For simplicity, I'm only showing Stew Ashton's model solution. You can get the code for other solutions on Live SQL:

https://livesql.oracle.com/apex/livesql/file/content_IOHVI2K5THQ213SZ6XOZPB402.html

Also we've assumed that:

- Only the bottom two levels have PAA values to include in the mean
- All leaves are at the same depth

If these aren't true, things get a more complicated ;)

The core of the model solution is:

    rules iterate (999) until iteration_number >= l.lvl[1] 
    ( 
      avg_paa[l.lvl[1] - iteration_number, any, any] =  
      case iteration_number 
        when 0 then 0 
        when 1 then 
          (paa[cv(),cv(),cv()] + sum(paa)[cv(lvl)+1, cv(matriz_key), any]) / 
          (1 + count(*)[cv(lvl)+1, cv(matriz_key), any]) 
        else avg(avg_paa)[cv(lvl)+1, cv(matriz_key), any] 
      end 
    ) 


l.lvl[1] stores the maximum depth of the tree. So the iteration loops through these levels until it hits this level. But reverses the level values. So the leaves (level 5) are zero.

For the leaves the average is simply zero. So this is the first case expression.

The next is the most complex. Not only does it have to average its child values, it has to add itself to this calculation. Which happens here:

(paa[cv(),cv(),cv()] + sum(paa)[cv(lvl)+1, cv(matriz_key), any]) / 
(1 + count(*)[cv(lvl)+1, cv(matriz_key), any]) 


Finally, the clause works back up the tree with:

avg(avg_paa)[cv(lvl)+1, cv(matriz_key), any] 


This averages all rows below the current which have the current matriz_key as the parent.

Put it together and you have:

with data as ( 
  select matriz_key, matriz_key_f, label, 
    coalesce(paa,0) paa, 0 avg_paa 
  from matriz_h mh 
  left join totales tot using(matriz_key) 
) 
, hier as ( 
  select matriz_key, matriz_key_f, label, level lvl, paa, avg_paa 
  from data s 
  start with matriz_key_f is null 
  connect by matriz_key_f = prior matriz_key 
) 
select matriz_key, matriz_key_f, label, lvl, paa, avg_paa 
from hier 
model 
  reference l on (select 1 dim, max(lvl) lvl from hier) 
    dimension by (dim) 
    measures (lvl) 
  main m 
    dimension by (lvl, matriz_key_f, matriz_key) 
    measures (label, paa, avg_paa) 
    rules iterate (999) until iteration_number >= l.lvl[1] 
    ( 
      avg_paa[l.lvl[1] - iteration_number, any, any] =  
      case iteration_number 
        when 0 then 0 
        when 1 then 
          (paa[cv(),cv(),cv()] + sum(paa)[cv(lvl)+1, cv(matriz_key), any]) / 
          (1 + count(*)[cv(lvl)+1, cv(matriz_key), any]) 
        else avg(avg_paa)[cv(lvl)+1, cv(matriz_key), any] 
      end 
    ) 
order by matriz_key;

MATRIZ_KEY    MATRIZ_KEY_F    LABEL        LVL    PAA    AVG_PAA   
            1          <null> 1                 1      0     31.875 
            2               1 1.1               2      0     31.875 
            3               2 1.1.1             3      0         20 
            4               3 1.1.1.1           4      0         20 
            5               4 1.1.1.1.1         5     10          0 
            6               4 1.1.1.1.2         5      0          0 
            7               4 1.1.1.1.3         5     50          0 
            8               4 1.1.1.1.4         5     40          0 
            9               3 1.1.1.2           4     20         20 
           10               9 1.1.1.2.1         5      0          0 
           11               9 1.1.1.2.2         5     35          0 
           12               9 1.1.1.2.3         5      0          0 
           13               9 1.1.1.2.4         5     45          0 
           14               2 1.1.2             3      0      43.75 
           15              14 1.1.2.1           4     35      43.75 
           16              15 1.1.2.1.1         5    100          0 
           17              15 1.1.2.1.2         5     20          0 
           18              15 1.1.2.1.3         5     20          0 
           19          <null> 2                 1      0     36.875 
           20              19 2.1               2      0     36.875 
           21              20 2.1.1             3      0         25 
           22              21 2.1.1.1           4      0         25 
           23              22 2.1.1.1.1         5     30          0 
           24              22 2.1.1.1.2         5      0          0 
           25              22 2.1.1.1.3         5     70          0 
           26              20 2.1.2             3      0      48.75 
           27              26 2.1.2.1           4     55      48.75 
           28              27 2.1.2.1.1         5     65          0 
           29              27 2.1.2.1.2         5     75          0 
           30              27 2.1.2.1.3         5      0          0 


PS - there appears to be a couple of errors in the source data.

* matiz_key 14 has 3 instead of 2 as a parent.
* matiz_key 28 appears twice in the data set.

I'm assuming these are mistakes, so have massaged the data.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.