Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question.

Asked: October 17, 2006 - 12:30 pm UTC

Last updated: August 30, 2022 - 1:32 am UTC

Version: 9.2.0

Viewed 10K+ times! This question is

You Asked

Are analytical functions replacement for aggregate functions? Some of the developers here are hurrying in replacing the group by aggregate functions with analytical functions just because they learnt something new :-).. Are the analytical functions faster always(compared to aggregate)?

and Tom said...

aggregates take "many rows, collapse them into one"

analytics "do not, there is no row collapsing"


the two are orthogonal concepts really.


It is not sensible to replace aggregates with analytics - I don't even know how you would do that (short of using a distinct after computing the analytic for lots of rows - which would be "a bad idea" indeed)

Rating

  (26 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks, Tom

Sundar, October 17, 2006 - 1:43 pm UTC

Yes tom, right now distinct is being applied on the resultset of the analytical queries. I'll suggest them to revert back to group by wherever aggregate functions can be used. Thanks a lot.

Tom Kyte
October 17, 2006 - 1:51 pm UTC

absolutely!

No way you want to

a) compute the analytic
b) distinct out the single rows

versus

a) just group by and do the minimal work

more questions..

raajesh, October 17, 2006 - 2:56 pm UTC

For a table data like this

Dept_Name

A1
A1
A1
A2
A3
A3

insert into dept values('A1');
insert into dept values('A1');
insert into dept values('A1');
insert into dept values('A2');
insert into dept values('A3');
insert into dept values('A3');


I give a SQL like this

select Dept_Name,count(*) from dept group by Dept_Name

This would give an output like this
A1 3
A2 1
A3 2

Rather , If I execute

select distinct dept_name,counter from (select Dept_Name,count(*) over(partition by Dept_Name) as counter from dept)

This will give the same output

Now, for a very huge table which one would you prefer? Which is better?

Does the optimizer convert the SQLs to the same execution plan internally?







Tom Kyte
October 17, 2006 - 3:37 pm UTC

this should be obvious don't you thing???!?!?!?


the first query, with group by, is the only reasonable query to consider.

The second query is just plain "wrong"

"analytical vs aggregate functions",

Karl Bergerson, October 17, 2006 - 7:47 pm UTC

Perhaps someone might review the responses for proper English.

Ravi, October 18, 2006 - 3:57 am UTC

What about the amount of work done by Analytics, say if we don't mind rows to be collapsed and return all rows with a column for aggregates. Would the analytics do less work. My guess is they would do the same amount, because they need to perform aggregation anyway.

Something like say

Select sum(sal) over (partition by emp.dept) sal_per_Dept
from emp

vs

select sum(sal), emp.dept from emp group by emp.dept

???

Tom Kyte
October 18, 2006 - 8:06 am UTC

think about temp.

say you have 100 deptnos
with an average of 100 employees per deptno


which would you rather have sitting in your temp? 100 rows, or 10,000


I don't even know why we are having this discussion - it seems so blatantly obvious that

a) you use aggregation when you needs to, well, AGGREGATE
b) you use analytics when you don't want to AGGREGATE

Even if they performed IDENTICALLY - I cannot understand why we would be having this discussion -


select sum(sal), deptno from emp group by deptno;

versus

select distinct sum(sal) over (partition by deptno), deptno from emp;

it just seems obvious which is the "right" approach.


select owner, count(*) from big_table group by owner

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.01 0 0 0 0
Fetch 3 0.84 0.83 12856 14465 0 26
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.84 0.84 12856 14465 0 26

Rows Row Source Operation
------- ---------------------------------------------------
26 HASH GROUP BY (cr=14465 pr=12856 pw=0 time=845645 us)
1000000 TABLE ACCESS FULL BIG_TABLE (cr=14465 pr=12856 pw=0 time=1011134 us)
********************************************************************************
select distinct owner, count(*) over (partition by owner) from big_table

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3 2.37 2.32 12856 14465 0 26
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 2.37 2.32 12856 14465 0 26

Rows Row Source Operation
------- ---------------------------------------------------
26 HASH UNIQUE (cr=14465 pr=12856 pw=0 time=2325741 us)
1000000 WINDOW SORT (cr=14465 pr=12856 pw=0 time=2709348 us)
1000000 TABLE ACCESS FULL BIG_TABLE (cr=14465 pr=12856 pw=0 time=1000114 us)





Not fully clarified

raajesh, October 18, 2006 - 4:08 am UTC

Please refer to the information on my executions.

Ansluten till:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

Case 1: Less number of rows


SQL> select * from testtable;

RBTACCTNAME
--------------------------------------------------------------------------------
A1
A1
A1
A2
A3
A3

6 rader.

SQL> set autotrace traceonly statistics;
SQL> select rbtacctname,count(*) from testtable group by rbtacctname;


Statistik
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        273  bytes sent via SQL*Net to client
        278  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed

SQL> select distinct rbtacctname,counter from(select rbtacctname,count(*) over (partition by rbtacct
name) as counter from testtable)
  2  /


Statistik
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        272  bytes sent via SQL*Net to client
        278  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed



Case 2: Added some more rows



SQL> select rbtacctname,count(*) from testtable group by rbtacctname
  2  /

5098 rader.


Statistik
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         31  consistent gets
          0  physical reads
          0  redo size
     133063  bytes sent via SQL*Net to client
       2651  bytes received via SQL*Net from client
        341  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5098  rows processed

SQL> select distinct rbtacctname,counter from (select rbtacctname,count(*) over (partition by rbtacc
tname) as counter from testtable)
  2  /

5098 rader.


Statistik
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         31  consistent gets
          0  physical reads
          0  redo size
     132804  bytes sent via SQL*Net to client
       2651  bytes received via SQL*Net from client
        341  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       5098  rows processed

------------------------------------------------------------------------------------------------------------------------------------------------------

Case 3: An index on field rbtacctname



SQL> create index tes_idx on testtable(rbtacctname)
  2  /

Index är skapat.

SQL> analyze table testtable compute statistics;

Tabellen är analyserad.

SQL> select rbtacctname,count(*) from testtable group by rbtacctname
  2  /

5098 rader.


Statistik
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        369  consistent gets
          0  physical reads
          0  redo size
     132337  bytes sent via SQL*Net to client
       2651  bytes received via SQL*Net from client
        341  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5098  rows processed

SQL> select distinct rbtacctname,counter from (select rbtacctname,count(*) over (partition by rbtacc
tname) as counter from testtable)
  2  /

5098 rader.


Statistik
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         34  consistent gets
          0  physical reads
          0  redo size
     130075  bytes sent via SQL*Net to client
       2651  bytes received via SQL*Net from client
        341  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       5098  rows processed
------------------------------------------------------------------------------------------------------------------------------------------

In the third case Analytic functions prove to be better. Are'nt they? What is your comment?
 

Tom Kyte
October 18, 2006 - 8:11 am UTC

no they didn't, see above.


you are being mislead on the logical IO's there. The "third" case has analytics copying the data out into temp (artificially reduced the logical IOs) and doing work in temp instead of just hitting the table itself.

Ask yourself which is "better" in the following example:


big_table%ORA10GR2> set autotrace traceonly statistics;
big_table%ORA10GR2> select * from big_table;

1000000 rows selected.


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80099 consistent gets
12615 physical reads
204 redo size
63252350 bytes sent via SQL*Net to client
733711 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed

big_table%ORA10GR2> select * from big_table order by 1,2,3;

1000000 rows selected.


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
14465 consistent gets
11934 physical reads
0 redo size
63252350 bytes sent via SQL*Net to client
733711 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000000 rows processed

big_table%ORA10GR2> set autotrace off



If you were to increase your array fetch size (set arraysize 100), you would see a measurable decrease in logical IO for the correct query (the one with aggregates)

raajesh, October 18, 2006 - 9:25 am UTC

Tom,

The "Consistent Gets" increased only after adding an index to the table. Before that, it was the same for both. The execution plan is given below

Körschema
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=30 Card=5098 Bytes
=81568)

1 0 SORT (GROUP BY NOSORT) (Cost=30 Card=5098 Bytes=81568)
2 1 INDEX (FULL SCAN) OF 'TES_IDX' (INDEX) (Cost=30 Card=748
2 Bytes=119712)


I'm not giving the execution plan for Analytics case.

As "indexes are not always good", is it like group by on an indexed field will cause a considerable decrease in performance?

Also, why did the optimizer chose to go with the index though it could have used the one without index?

Oracle Version

Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

Tom Kyte
October 18, 2006 - 9:38 am UTC

right, look at the sorts memory.

did you get my point about the sort - its reads the blocks into temp and then works on them, versus the other query that didn't need to read blocks into temp, it got them from the buffer cache.

tell you what, run your third case with


set arraysize 500

you might be surprised :)


but I'm still surprised we are having a discussion on this at all. If you want to aggregate - then AGGREGATE. Don't use analytics and distinct, that is just so very wrong.

raajesh, October 18, 2006 - 10:11 am UTC

Almost clear. I gave the array size as 500.

Statistik
----------------------------------------------------------
0 recursive calls
0 db block gets
41 consistent gets
30 physical reads
0 redo size
115817 bytes sent via SQL*Net to client
359 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5098 rows processed

The physical reads shot up now :(



Tom Kyte
October 18, 2006 - 11:04 am UTC

so? run it again, they'll go down. physical IO goes up, it goes down, it goes up, it goes down - depending on what is in the cache.


"In general, a goal to have tuning queries is logical IO reduction"

However, reducing logical IO by forcing a sort into TEMP isn't necessarily a good idead, that is what you did with your inefficient analytic query + distinct.

if you look at my example above on big_table - you'll see the logical IO went down (because of the forced sort into temp) but the CPU went way way up (because of the forced sort into temp)

raajesh, October 18, 2006 - 11:28 am UTC

That was a great piece of patience and explanation. Thanks Tom !!

Starting a discussion again !!! Need to understand better

Vinayak, October 27, 2006 - 5:58 am UTC

Hi Tom,
I am posting two versions of a query one with aggragates and other with analytics (without distinct but with row_number)

Aggregate Query
---------------

SELECT p.employee_id,v.display_name,
ccy_code unit,
SUM (cp.value_usd) display_value,
MIN (cp.price_date) value_date
FROM
tab1 cp, --840696 rows
vehicle v, -- 2046 rows
plan pl, -- 1613 rows
tab2 p -- 308418 rows
WHERE cp.key = v.key
AND cp.pkey = p.pkey
AND cp.pl_key = pl.pl_key
AND pl.pp_name != 'Open System'
AND p.employee_id NOT IN ('-3','-4')
AND p.flag = 'N'
AND cp.pl_key NOT IN (-3, -4)
AND (cp.key NOT IN (-3, -4) OR
(v.display_name = 'Unix' AND v.pk_id IN (218, 325, 1538, 1539)))
GROUP BY p.employee_id,v.display_name
ccy_code
HAVING SUM (cp.value_usd) != 0

Analytic Query
---------------

select employee_id,ccy_code,display_name,display_value,value_date FROM(
select employee_id,ccy_code,group_name display_name,
SUM (value_usd) OVER(partition by employee_id,GROUP_NAME,ccy_code) display_value ,
min (price_date) OVER(partition by employee_id,GROUP_NAME,ccy_code) value_date,
row_number() over(partition by employee_id,GROUP_NAME,ccy_code ORDER BY decode(employee_id,NULL,1,0)) rn
from(
SELECT p.employee_id,v.display_name,
ccy_code unit,
cp.value_usd display_value,
cp.price_date value_date
FROM
tab1 cp, --840696 rows
vehicle v, -- 2046 rows
plan pl, -- 1613 rows
tab2 p -- 308418 rows
WHERE cp.key = v.key
AND cp.pkey = p.pkey
AND cp.pl_key = pl.pl_key
AND pl.pp_name != 'Open System'
AND p.employee_id NOT IN ('-3','-4')
AND p.flag = 'N'
AND cp.pl_key NOT IN (-3, -4)
AND (cp.key NOT IN (-3, -4) OR
(v.display_name = 'Unix' AND v.pk_id IN (218, 325, 1538, 1539)))
)) where rn=1 and display_value!=0

But the problem is both queries are returning differnt rows of rows although the normal query (without aggregate/analytic function) has the same number of records.
Aggregate returns 91943 records whereas analytic returns 93556 records.

What is wrong there ?

Tom, in queries like this how one should decide what approach to take : Aggregate or Analytic

Are there any guidelines for this.Please help me understand.
Also how to make text bold while posting ??

thanks


Tom Kyte
October 27, 2006 - 7:52 am UTC

I cannot reverse engineer your queries on the fly to explain why they are different queries.

do not use analytics when aggregation is what you want to do, that is the bottom line

Still Confused!!!!!!!!!!!!!!

Nikhil, November 21, 2006 - 8:02 am UTC

I have gone thru entire set of discussion,but i still hav some doubts.

Here i do hav two sets of query.

1.SELECT cp.period_id
FROM cn_periods cp, cn_srp_period_quotas cspq, cn_quotas cq, cn_srp_plan_assigns cspa
WHERE cspq.period_id = cp.period_id
AND cspq.salesrep_id = 100001539
AND cq.CREDIT_TYPE_ID = -1000
AND cspq.quota_id = cq.quota_id
AND cspq.srp_plan_assign_id = cspa.srp_plan_assign_id
AND cp.period_id BETWEEN (SELECT DECODE(cq.attribute1,'Q',MIN(cp2.period_id),2007005)
FROM cn_periods cp1, cn_periods cp2
WHERE cp1.period_id = 2007005
AND cp1.quarter_num = cp2.quarter_num
AND cp1.period_year = cp2.period_year)

Statistics
----------------------------------------------------------
108 recursive calls
0 db block gets
2296 consistent gets
14 physical reads
0 redo size
307 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

2.SELECT distinct cp.period_id
FROM cn_periods cp, cn_srp_period_quotas cspq, cn_quotas cq, cn_srp_plan_assigns cspa
WHERE cspq.period_id = cp.period_id
AND cspq.salesrep_id = 100001539
AND cq.CREDIT_TYPE_ID = -1000
AND cspq.quota_id = cq.quota_id
AND cspq.srp_plan_assign_id = cspa.srp_plan_assign_id
AND cp.period_id BETWEEN (SELECT DECODE(cq.attribute1,'Q',MIN(cp2.period_id),2007005)
FROM cn_periods cp1, cn_periods cp2
WHERE cp1.period_id = 2007005
AND cp1.quarter_num = cp2.quarter_num
AND cp1.period_year = cp2.period_year)
AND 2007005

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2257 consistent gets
6 physical reads
0 redo size
307 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

i feel i still need some clarification on it, i mean which one to use in order to improve performance.

i may sound little bit naive,but need ur inputs

Looking forward to hear from you.




Tom Kyte
November 22, 2006 - 3:30 pm UTC

not really sure what you could possibly be looking for.



analytic

Rafael Almeida, November 22, 2006 - 3:41 pm UTC

I don't advise to move

Tom Kyte
November 24, 2006 - 12:50 pm UTC

but if you don't move, how do you accomplish anything?

don't you get stiff and sore after a while?

The previous post

Shivdeep Modi, December 15, 2006 - 8:49 am UTC

Apart from the contents of the of thread. The last post certainly is amusing. Things like this really brighten up your day :)

A reader, July 09, 2010 - 10:48 am UTC

""aggregates take "many rows, collapse them into one"
analytics "do not, there is no row collapsing"
the two are orthogonal concepts really. ""

Sir,
1.Can you please explain about what is 'row collapsing'?
if possible with one example :

2. can you please explain when to use aggregate and when to use analytics functions ? if a work can be done by both - which is best to vote for ?

3.What are drawbacks of Analytics ?

Thanks
Tom Kyte
July 09, 2010 - 10:53 am UTC

select count(*) from t;

that is an aggregate, it will take zero, one or many rows of T and return a single row.


select deptno, sum(sal) from emp group by deptno;

that will likewise take the rows of emp and collapse many of them into a single row - and return just a single row for each department.


2) use aggregates to get aggregates. Use analytics when you need the details.

Eg: I want to report on the salaries by deptno -

that would be "select deptno, sum(sal) from emp group by deptno"

that would aggregate and return three rows from the original 14 rows in scott.emp


Eg: I want to report on each employees salary and show it as a percentage of the total salaries paid and the total salaries paid by deptno

select empno, ename, sal, ratio_to_report(sal) over (), ratio_to_report(sal) over (partition by deptno) from emp;


that would not aggregate, that would return all 14 rows from emp.



Your QUESTION - what you are trying to ANSWER - will dictate whether you use aggregation or analytics or neither.

3) the same as the drawbacks for select itself. I don't really know how to answer that - analytics, when used to answer a problem directly - thus avoiding multiple self joins and other overheads - are great. If you use them when you shouldn't have, did not need to have used them - they are not great as they represent extra work you shouldn't have done.

row collapsing, etc.

Duke Ganote, July 09, 2010 - 1:46 pm UTC

Exactly the questions I had starting to understand analytics back in 2004 when I bought and devoured Tom's "Expert Oracle One-on-One" book.

Examples like the following helped me understand:

select count(*) as cnt -- "many rows collapsed into one"
from (
select level as emp#
from dual
connect by level < 6
) EMP;
CNT
---------------
5

select count(*)
OVER () as cnt -- "no row collapsing"
, emp#
from (
select level as emp#
from dual
connect by level < 6
) EMP;
CNT EMP#
--------------- ----------
5 1
5 2
5 3
5 4
5 5

"Which is better?" They're different. Is an axe better than a jigsaw? They both cut wood, but "better" depends on what you're trying to accomplish.

Do not use analytics in materialized views?

Mihail Bratu, July 13, 2010 - 4:08 am UTC

Hello Tom,
Once, I saw an implementation with analytics in a materialized view. I think that is a unfortunate combination. In this way we lose the power of query rewrite feature. Materialized views naturally work with aggregates.
What is your oppinion?

Regards
Tom Kyte
July 19, 2010 - 11:12 am UTC

huh??? what????

you are mistaken.


ops$tkyte%ORA11GR2> create table t
  2  as
  3  select * from all_objects
  4  /

Table created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> alter table t add constraint t_pk primary key(object_id);

Table altered.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create materialized view mv
  2  enable query rewrite
  3  as
  4  select *
  5    from (
  6  select owner, object_type, object_name, created, object_id,
  7         row_number() over (partition by owner order by created , object_id) rn
  8    from t
  9         )
 10   where rn <= 10;

Materialized view created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'MV' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select *
  2    from (
  3  select owner, object_type, object_name, created, object_id,
  4         row_number() over (partition by owner order by created , object_id) rn
  5    from t
  6         )
  7   where rn <= 3;

Execution Plan
----------------------------------------------------------
Plan hash value: 572630632

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |   100 |  4500 |     3   (0)| 00:00:01 |
|*  1 |  MAT_VIEW REWRITE ACCESS FULL| MV   |   100 |  4500 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MV"."RN"<=3)

ops$tkyte%ORA11GR2> set autotrace off

Do not use analytics in materialized views?

Mihail Bratu, July 20, 2010 - 10:23 am UTC

Hello Tom,
If you understood, from my above post, that I doubt that the query rewrite works fine in the context of using analytic functions in materialized views, is only my fault, caused by my poor English!
I will attempt to be more specific:
Suppose we copy the Scott's emp table; 

create table emp
as
select * from scott.emp;

We make some preliminaries,

create unique index e__empno__ux on emp
  (empno asc
  )
/
alter table emp
add constraint e__empno__pk primary key (empno)
/
create index e__deptno__ix on emp
  (deptno asc
  )
/
exec dbms_stats.gather_table_stats( user, 'EMP' );

Now, let's determine the employees who have the salary less or equal with the average of their department!
Basically there are two ways, one using analytic and other using aggregate functions:

select ename
   ,sal
from (
    select ename
       ,sal
       ,avg(sal) over(partition by deptno) asal
    from emp
    )
where sal <= asal
order by 1
/

select emp.ename
   ,emp.sal
from emp
  ,(select deptno
       ,avg(sal) asal
    from emp
    group by deptno
   ) dept
where emp.deptno = dept.deptno
    and emp.sal <= dept.asal
order by 1
/
We all agreed with the assertion that the analytic solution must be the choice of the two above. "Analytics rock and roll".

It is the time to add a new dimension in this test case, supposing that we need, for some obscure reasons, to implement a fast refresh materialized view. 

create
materialized view log on emp
/

Corresponding of the issues above we have a choice between analytic and aggregate:

create
materialized view mv_analytic
build immediate 
refresh on commit with primary key
enable query rewrite
as
select ename
   ,sal
   ,avg(sal) over(partition by deptno) asal
from emp
/

create
materialized view mv_aggregate
build immediate 
using index 
refresh on commit with primary key
enable query rewrite
as
select deptno
   ,count(sal)
   ,avg(sal)
from emp
group by deptno
/

In this case I think that the analytic solution isn't anymore the best! Probably analytic solution entails more work to do in the fast refresh process (?) and less versatility for other queries (less query rewrite options).
That was my point that I wanted to express in the previous post.

mihai@ORCL11G> select ename
  2     ,sal
  3  from (
  4      select ename
  5         ,sal
  6         ,avg(sal) over(partition by deptno) asal
  7      from emp
  8      )
  9  where sal <= asal
 10  order by 1
 11  /
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1038994406

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |    11 |   363 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY                |             |    11 |   363 |     4  (25)| 00:00:01 |
|*  2 |   MAT_VIEW REWRITE ACCESS FULL| MV_ANALYTIC |    11 |   363 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MV_ANALYTIC"."SAL"<="MV_ANALYTIC"."ASAL")

Note
-----
   - dynamic sampling used for this statement

mihai@ORCL11G>
mihai@ORCL11G> select emp.ename
  2     ,emp.sal
  3  from emp
  4    ,(select deptno
  5         ,avg(sal) asal
  6      from emp
  7      group by deptno
  8     ) dept
  9  where emp.deptno = dept.deptno
 10      and emp.sal <= dept.asal
 11  order by 1
 12  /
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 82276337

--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |               |     1 |    39 |     7  (29)| 00:00:01 |
|   1 |  SORT ORDER BY                   |               |     1 |    39 |     7  (29)| 00:00:01 |
|   2 |   MERGE JOIN                     |               |     1 |    39 |     6  (17)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID   | EMP           |    14 |   182 |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN              | E__DEPTNO__IX |    14 |       |     1   (0)| 00:00:01 |
|*  5 |    FILTER                        |               |       |       |            |          |
|*  6 |     SORT JOIN                    |               |     3 |    78 |     4  (25)| 00:00:01 |
|   7 |      MAT_VIEW REWRITE ACCESS FULL| MV_AGGREGATE  |     3 |    78 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("EMP"."SAL"<="MV_AGGREGATE"."AVG(SAL)")
   6 - access("EMP"."DEPTNO"="MV_AGGREGATE"."DEPTNO")
       filter("EMP"."DEPTNO"="MV_AGGREGATE"."DEPTNO")

Note
-----
   - dynamic sampling used for this statement

mihai@ORCL11G>

I'll appreciate your comments on this issue.

Regards

Tom Kyte
July 23, 2010 - 7:55 am UTC

it is true that incremental refreshes and analytics do not mix (for good reason, there is not a mapping to a single row for a given change to be applied to).

However that does not mean they are "not best", it just means you have to consider whether a full refresh is possible - it is a trade off.

RE: aggregates + analytics

Duke Ganote, October 29, 2011 - 8:53 am UTC

And of course they play nicely together, as long as you understand the logic order of operation, e.g. analytics last:

select deptno
, count(*) as emp_per_dept
, sum(count(*)) over () as emp_across_depts
from emp
group by deptno
order by deptno
/
deptno emp_per emp_across
------ ------- ----------
10 3 14
20 5 14
30 6 14

RE: aggregates + analytics

Duke Ganote, December 05, 2011 - 11:51 am UTC

As says the docs:

"Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause."
http://docs.oracle.com/cd/E11882_01/server.112/e17118/functions004.htm

Applause for TOM

chandra prakash, September 21, 2012 - 7:27 am UTC

Tom however I am not an expert but your way of explanations are great

Padmaja Pothumudi, June 17, 2014 - 7:26 am UTC

Hi Tom,
We are using aggregate function group by to check if sum of credit and debit amounts are balanced or not. What would be best approach w.r.t performance using group by or over partition.
here temp1 table is hash partitioned on temp_id,temp_date

Case 1:
SELECT 1 FROM DUAL WHERE EXISTS (
SELECT 1 FROM (
SELECT SUM(REF_AMT * DECODE(PART_TEMP_TYPE, 'D', -1, 1)) over( partition by REF_CRNCY_CODE) sm FROM TEMP1 WHERE BANKID = :1
AND TEMP_DATE = TO_DATE( :2 , 'DD-MM-YYYY HH24:MI:SS') AND TEMP_ID = :3 AND NVL(DEL_FLG, 'N') != 'Y' )
WHERE SM!=0);


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2047113583

-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (25)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | |
|* 3 | VIEW | | 1 | 13 | 2 (50)| 00:00:01 | | |
| 4 | WINDOW SORT | | 1 | 33 | 2 (50)| 00:00:01 | | |
| 5 | PARTITION HASH SINGLE | | 1 | 33 | 1 (0)| 00:00:01 | KEY | KEY |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID| TEMP1 | 1 | 33 | 1 (0)| 00:00:01 | KEY | KEY |
|* 7 | INDEX RANGE SCAN | IDX_TEMP1 | 1 | | 1 (0)| 00:00:01 | KEY | KEY |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT 0 FROM (SELECT SUM("REF_AMT"*DECODE("PART_TEMP_TYPE",'D',(-1),1)) OVER ( PARTITION BY
"REF_CRNCY_CODE") "SM" FROM TEMP1 WHERE "TEMP_DATE"=TO_DATE(:2,'DD-MM-YYYY HH24:MI:SS')
AND "TEMP_ID"=:3 AND NVL("DEL_FLG",'N')<>'Y' AND "BANKID"=:1) "from$_subquery$_002" WHERE "SM"<>0))
3 - filter("SM"<>0)
6 - filter(NVL("DEL_FLG",'N')<>'Y')
7 - access("TEMP_ID"=:3 AND "TEMP_DATE"=TO_DATE(:2,'DD-MM-YYYY HH24:MI:SS') AND BANKID"=:1)
filter("BANKID"=:1)

25 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3013799171

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 19 (6)| 00:00:01 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| DISPLAY | 8168 | 16336 | 19 (6)| 00:00:01 |
---------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
19 recursive calls
12 db block gets
55 consistent gets
0 physical reads
0 redo size
3170 bytes sent via SQL*Net to client
535 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
25 rows processed


case 2:

SELECT 1 FROM DUAL WHERE EXISTS (
SELECT 1 FROM (
SELECT SUM(REF_AMT * DECODE(PART_TEMP_TYPE, 'D', -1, 1)) over( partition by REF_CRNCY_CODE) sm FROM TEMP1 WHERE BANKID = :1
AND TEMP_DATE = TO_DATE( :2 , 'DD-MM-YYYY HH24:MI:SS') AND TEMP_ID = :3 AND NVL(DEL_FLG, 'N') != 'Y' )
WHERE SM!=0);


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2047113583

-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 (25)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | |
|* 3 | VIEW | | 1 | 13 | 2 (50)| 00:00:01 | | |
| 4 | WINDOW SORT | | 1 | 33 | 2 (50)| 00:00:01 | | |
| 5 | PARTITION HASH SINGLE | | 1 | 33 | 1 (0)| 00:00:01 | KEY | KEY |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID| TEMP1 | 1 | 33 | 1 (0)| 00:00:01 | KEY | KEY |
|* 7 | INDEX RANGE SCAN | IDX_TEMP1 | 1 | | 1 (0)| 00:00:01 | KEY | KEY |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT 0 FROM (SELECT SUM("REF_AMT"*DECODE("PART_TEMP_TYPE",'D',(-1),1)) OVER ( PARTITION BY
"REF_CRNCY_CODE") "SM" FROM TEMP1 WHERE "TEMP_DATE"=TO_DATE(:2,'DD-MM-YYYY HH24:MI:SS')
AND "TEMP_ID"=:3 AND NVL("DEL_FLG",'N')<>'Y' AND "BANKID"=:1) "from$_subquery$_002" WHERE "SM"<>0))
3 - filter("SM"<>0)
6 - filter(NVL("DEL_FLG",'N')<>'Y')
7 - access("TEMP_ID"=:3 AND "TEMP_DATE"=TO_DATE(:2,'DD-MM-YYYY HH24:MI:SS') AND BANKID"=:1)
filter("BANKID"=:1)

25 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3013799171

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 19 (6)| 00:00:01 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| DISPLAY | 8168 | 16336 | 19 (6)| 00:00:01 |
---------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
19 recursive calls
12 db block gets
55 consistent gets
0 physical reads
0 redo size
3170 bytes sent via SQL*Net to client
535 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
25 rows processed

Not Quite...

N.B., October 28, 2015 - 5:49 pm UTC

in the following case we do need analytic over aggregate:
(select all managers having at least 10 employees under him, and all those employees have at least 5 jobs assigned to them)
</>select distinct manager_name from (
select m.manager_name, count(j.description) job_count, count(me.employee_name) employee_count from manager m
left join manager_employee me on me.manager_id = m.manager_id
left join jobs j on j.employee_id = me.employee_id
)
where employee_count >= 10
and job_count >= 5</>.

if you have more than one count condition it is easier to make it a analytic function.

Seems documentation bug or ambiguity

Asim, August 01, 2022 - 1:41 am UTC


The 21c documentation here

https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Aggregate-Functions.html#GUID-62BE676B-AF18-4E63-BD14-25206FEA0848

says

......In the listing of aggregate functions at the end of this section, the functions that allow the windowing_clause are followed by an asterisk (*).......

But when I see the SUM, AVG, and many other functions, they are not followed by *, whereas they do allow windowing clause.

Moreover CORR allows windowing but not followed by *

Moreover CORR_ dosnt allow window clause but it is followed by *, later on I realized that in case of CORR_* star denotes K or S.

So is there any documentation bug?

Is there any documentation ambiguity in case of CORR_*.?


Chris Saxon
August 01, 2022 - 4:03 pm UTC

Hmmm, that is confusing. I'm not sure what's going on there, as those functions don't appear to support a window!

I'll follow up internally.

Waiting

Asim, August 02, 2022 - 10:37 am UTC

Please let me know as soon as you get the information, as I am waiting to see if it is really a documentation error or am I misunderstanding some thing. Because its hard to believe that it maybe a documentation error.
Chris Saxon
August 02, 2022 - 3:58 pm UTC

It's a doc error - we're removing the line about the windowing_clause.

Because its hard to believe that it maybe a documentation error.

The Oracle Database docs are huge! The chances of there being zero errors in any body of text that size are tiny. Thanks for reporting this.

The docs include a thumbs up/down link at the bottom of each page. If you spot issues you can raise them in the popup you get when clicking these.

In which version

Asim, August 04, 2022 - 1:48 am UTC

So will this correction appear in Oracle's next versions document ie 22c ?

Or the current 21c doc will be edited/updated?




Connor McDonald
August 04, 2022 - 11:54 am UTC

We generally attempt to back port fixes to any currently supported versions.

User defined Analytic functions

Asim, August 10, 2022 - 8:25 pm UTC


I have seen in other discussions Tom created a User defined aggregatw function STRAGG

Can we create user defined analytical functions in Oracle? if no, any possibility in future versions?

Connor McDonald
August 11, 2022 - 4:34 am UTC

Livesql issue 19c or 21c

Asim, August 27, 2022 - 7:58 pm UTC


ANY_VALUE aggregate function
and
GROUPS 3 preceding Keyword inside the OVER

are both 21c features. But livesql which is on 19c,
is runing the ANY_VALUE correctly but giving error on GROUPS 3 preceding keyword inside OVER.

So is livesql on 19c or 21c? or somewere between 19c and 21c?



Connor McDonald
August 30, 2022 - 1:32 am UTC

Several 21c features have been backported to 19c, eg

SQL> select any_value(empno)
  2  from emp;

ANY_VALUE(EMPNO)
----------------
            7369

SQL> select banner_full from v$version;

BANNER_FULL
------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0


LiveSQL is 19c currently.