Skip to Main Content
  • Questions
  • Calculating an exact percentile value in 8i

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Andrew.

Asked: March 23, 2004 - 10:03 am UTC

Last updated: September 11, 2007 - 8:24 am UTC

Version: 8.1.6

Viewed 1000+ times

You Asked

Tom,

I need to calculate a 95th (or any) percentile from a list of values. I realise there are NTILE and PERCENT_RANK functions in 8i but not the PERCENT_CONT function that's in 9i.

Do you know of a way of using the above 8i functions or another easy way to calculate a precise percentile value?

I realise this will involve interpolating the value. I'd like to use the analytic functions as I believe that should reduce the amount of code required.

Thanks

Andrew

and Tom said...

You may well be able to get what you need with row_number() and count(*). for example:

ops$tkyte@ORA9IR2> select deptno,
2 count(*),
3 percentile_cont(0.5) within group (order by sal) med
4 from emp
5 group by deptno
6 /

DEPTNO COUNT(*) MED
---------- ---------- ----------
10 3 2450
20 5 2975
30 6 1375

well, we want the average of the middle or middle two rows -- we can use this to see what rows we are interested in:

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select deptno,
2 sal,
3 cnt,
4 rn,
5 case when
6 ((mod(cnt,2) = 1 and rn = ceil(cnt/2) )
7 or (mod(cnt,2) = 0 and rn in ( cnt/2, cnt/2+1 ) )) then '<<===='
8 end
9 from (
10 select deptno,
11 sal,
12 count(*) over (partition by deptno) cnt,
13 row_number() over(partition by deptno order by sal) rn
14 from emp
15 )
16 /

DEPTNO SAL CNT RN CASEWH
---------- ---------- ---------- ---------- ------
10 1300 3 1
10 2450 3 2 <<====
10 5000 3 3
20 800 5 1
20 1100 5 2
20 2975 5 3 <<====
20 3000 5 4
20 3000 5 5
30 950 6 1
30 1250 6 2
30 1250 6 3 <<====
30 1500 6 4 <<====
30 1600 6 5
30 2850 6 6

14 rows selected.


and then it becomes easy to get the 50th percenttile

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select deptno,
2 avg(sal),
3 cnt
4 from (
5 select deptno,
6 sal,
7 count(*) over (partition by deptno) cnt,
8 row_number() over(partition by deptno order by sal) rn
9 from emp
10 )
11 where (mod(cnt,2) = 1 and rn = ceil(cnt/2) )
12 or (mod(cnt,2) = 0 and rn in ( cnt/2, cnt/2+1 ) )
13 group by deptno, cnt
14 /

DEPTNO AVG(SAL) CNT
---------- ---------- ----------
10 2450 3
20 2975 5
30 1375 6



using this technique, you should be able to find the row(s) such that 85% are 'before' it and 15% are 'after' it -- just by using different predicates on cnt and rn.

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent

Andrew Briggs, March 23, 2004 - 11:38 am UTC

Tom,

Thanks for that. I took some time to understand it though! I've tried it on my sample data (too big to include) and I get a result that reconciles with my test spreadsheet function.

Could you give me a quick example of how I would modify the cnt an rn predicates to get a 95th percentile for example, just to be clear.

Thanks again

Andrew

Tom Kyte
March 23, 2004 - 5:24 pm UTC

Ok, head hurts, been long day -- this is as far as I've gotten it, say you wanted 95th percentile:

ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select distinct deptno,
  2             max( case when rn/cnt <= 0.95 then rn/cnt end ) over ( partition by deptno ) min_rncnt,
  3             max( case when rn/cnt <= 0.95 then sal end ) over ( partition by deptno ) min_sal,
  4             min( case when rn/cnt >= 0.95 then rn/cnt end ) over ( partition by deptno ) max_rncnt,
  5             min( case when rn/cnt >= 0.95 then sal end ) over ( partition by deptno ) max_sal
  6    from (
  7  select deptno,
  8         sal,
  9         count(*) over (partition by deptno) cnt,
 10         row_number() over(partition by deptno order by sal) rn
 11    from emp
 12         )
 13  /
 
    DEPTNO  MIN_RNCNT    MIN_SAL  MAX_RNCNT    MAX_SAL
---------- ---------- ---------- ---------- ----------
        10 .666666667       2450          1       5000
        20         .8       3000          1    7781.24
        30 .833333333       1600          1       2850


that gives you by each deptno, the row(s) that are closest to 0.95 -- one below and one above (or just the one row if that applies).

Somewhere between 2450 and 5000 is your value for deptno 10 (and so on) :)

somewhere between....  I'll let someone who knows how to calculate a percentile take it from here :) 

It's put me on the right track

Andrew Briggs, March 23, 2004 - 6:04 pm UTC

Tom,

Thaks for the help. The use of analytics has helped put me on the right track.

I have a mathematical formula for getting the exact percentile. I'll test with the analytic functions and then post the answer.

Cheers

Andrew

Formula for calculating the percentile

Andrew Briggs, March 24, 2004 - 3:51 am UTC

Tom,

Hope your head is better this morning. This is the statement I eventually
produced. It only considers deptno = 10. I did this to simplify it for myself.

select (max(sal)-min(sal)) *
((1+(0.95*(cnt-1))) - floor(1+(0.95*(cnt-1)))) +
min(sal)
from
(
select sal,
count(*) over () cnt,
row_number() over(order by sal) rn
from scott.emp
where deptno = 10
)
where rn = floor(1+(0.95*(cnt-1)))
or rn = ceil(1+(0.95*(cnt-1)))
group by cnt

This example uses a formula that is consistent with the percentile function in
Excel, according to information I have found on the internet, although this
differs from other percentile calculation formulas.

The exact row number for a percentile is calculated using the formula below

m = 1 + p/100(n - 1)

If this gives an integer then the value at that row is the percentile.

If it gives a decimal then we have to interpolate between the value for the row
number immediately below the percentile (x) and the value at the row number
immediately above the percentile (y). The point between these values is
determined by the decimal fraction (df) for m, as defined above. e.g. if row
number is 2.9 then the decimal fraction is 0.9

These values are used in the formula below.

95th percentile = x + df(y - x)

Using the three values for deptno 10 in the emp table

1300
2450
5000

The exact row number = 1 + 0.95(3 - 1) = 2.9

Therefore the 95th percentile = 2450 + 0.9(5000 - 2450) = 4745

If anyone spots any errors in this or a better way of doing it please post an
answer

Thanks

Andrew



Tom Kyte
March 24, 2004 - 9:02 am UTC

that interpolation is the part i was missing, I could not get 4745 (percentile_cont() in 9i does...)

10G Average percentile

Bonnie, September 06, 2007 - 2:04 pm UTC

Hi Tom,

Your example was very helpful. Thank you for showing this. I'm trying to do the same thing, but instead of the min and max values, is there a way to get the average? Also, are there newer functions in 10G that allows this type of calculation?
Tom Kyte
September 11, 2007 - 8:24 am UTC

this question was about median's - not min max

so no idea what you mean....


AVG works as an aggregate
AVG works as an analytic function

it always has.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.