Skip to Main Content
  • Questions
  • SUM OVER PARTITION BY with a DISTINCT in the SELECT clause gives unexpected results

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Pavankumar.

Asked: February 28, 2017 - 4:33 am UTC

Last updated: March 06, 2017 - 1:55 pm UTC

Version: 12C

Viewed 10K+ times! This question is

You Asked

Hi Tom,
We see a very strange issue with SUM OVER PARTITION BY statements when there is DISTINCT clause in the query. I can illustrate the issue with an example as below:
Assume there is a stock table with the structure:

create table TEST_STOCK (item_id number, location_id number, QUANTITY number);

Please note that the table can have duplicate data for a given item_id and location_id combination.

We have to show the total quantity for a given item in a location in our UI. We have the following SQL (legacy code )in our code :

SELECT DISTINCT item_id , org_id,
SUM(quantity) OVER (PARTITION BY org_id,item_id) as sum_quantity
FROM test_stock ts
WHERE item_id = :ITEM_ID
AND org_id = :ORG_ID;

This query works fine. If we pass a value of 1 and 1 for the bind variables (for example), the sum_quantity column returns a value of 10.

At run time, we will include a dynamic EXISTS clause to the above query to filter out some results. Sine the EXISTS clause gets added on top of the existing criteria, we expect that there should be no increment in the sum_quantity column of the result. But we observed that the sum_quantity column gets doubled, tripled or even more value depending on the EXISTS clause. For example, let us consider that the EXISTS clause is

EXISTS (SELECT 1 from EMP_TEST WHERE TS.item_id in (select 1 from DUAL union select 2 from DUAL))

With the above EXISTS clause, the value of SUM_QUANTITY column in multiplied by the number of records in EMP_TEST table. So, if the number of reords in EMP_TEST table is 6, then the column shows a value of 60.

If we remove the DISTINCT clause from the SELECT clause, then the results look fine.

Can you please let us know what we are missing here ?

DMLs into test_stock :

insert into TEST_STOCK values (1,1,10)
insert into TEST_STOCK values (1,2,9)
insert into TEST_STOCK values (2,1,10)
insert into TEST_STOCK values (2,2,9)

DDL and DMLs of EMP_TEST:

create table EMP_TEST (EMP_ID number, DEPT_ID number);
insert into EMP_TEST values (1,1)
insert into EMP_TEST values (2,1);
insert into EMP_TEST values (3,2);
INSERT INTO EMP_TEST VALUES (4,2);

Thanks,
Pavan.


and Chris said...

Hmmm, so it does!

create table TEST_STOCK (item_id number, org_id number, QUANTITY number); 
create table EMP_TEST (EMP_ID number, DEPT_ID number);

insert into TEST_STOCK values (1,1,10);
insert into TEST_STOCK values (1,2,9);
insert into TEST_STOCK values (2,1,10);
insert into TEST_STOCK values (2,2,9);

insert into EMP_TEST values (1,1);
insert into EMP_TEST values (2,1);
insert into EMP_TEST values (3,2);
INSERT INTO EMP_TEST VALUES (4,2);
commit;

var item_id number;
var org_id number;

exec :item_id := 1;
exec :org_id := 1;

SELECT DISTINCT item_id , org_id,
   SUM(quantity) OVER (PARTITION BY org_id,item_id) as sum_quantity
FROM test_stock ts
WHERE item_id = :ITEM_ID
AND org_id = :ORG_ID;

ITEM_ID  ORG_ID  SUM_QUANTITY  
1        1       10  

SELECT 
   item_id , org_id,
   SUM(quantity) OVER (PARTITION BY org_id,item_id) as sum_quantity
FROM test_stock ts
WHERE item_id = :ITEM_ID
AND org_id = :ORG_ID
and  EXISTS (
  SELECT 1 from EMP_TEST WHERE TS.item_id in (select 1 from DUAL union select 2 from DUAL)
);

ITEM_ID  ORG_ID  SUM_QUANTITY  
1        1       10 

SELECT 
   distinct
   item_id , org_id,
   SUM(quantity) OVER (PARTITION BY org_id,item_id) as sum_quantity
FROM test_stock ts
WHERE item_id = :ITEM_ID
AND org_id = :ORG_ID
and  EXISTS (
  SELECT 1 from emp_test WHERE TS.item_id in (select 1 from DUAL union select 2 from DUAL)
);

ITEM_ID  ORG_ID  SUM_QUANTITY  
1        1       40  


This appears to be related to Bug 8714580 - Wrong results from DISTINCT query with unnest of ANY subquery (Doc ID 8714580.8). This is fixed in 12.1.

But there's an important wrinkle in your SQL:

You're not joining emp_test in the subquery!

SELECT 1 from emp_test WHERE TS.item_id in 


Refers to item_id in the main table, test_stock. There's no reference to emp_test columns! So I'm struggling to understand what purpose your exists clause serves...

In any case, there are a couple of workaround available:

- Replace emp_test in the subquery with a single row table such as dual
- Put the /*+ no_unnest */ hint in the exists clause, as suggested in the MOS note:

SELECT 
   distinct
   item_id , org_id,
   SUM(quantity) OVER (PARTITION BY org_id,item_id) as sum_quantity
FROM test_stock ts
WHERE item_id = :ITEM_ID
AND org_id = :ORG_ID
and  EXISTS (
  SELECT 1 from dual WHERE TS.item_id in (select 1 from DUAL union select 2 from DUAL)
);

ITEM_ID  ORG_ID  SUM_QUANTITY  
1        1       10   

SELECT 
   distinct
   item_id , org_id,
   SUM(quantity) OVER (PARTITION BY org_id,item_id) as sum_quantity
FROM test_stock ts
WHERE item_id = :ITEM_ID
AND org_id = :ORG_ID
and  EXISTS (
  SELECT /*+ no_unnest */1 from emp_test WHERE TS.item_id in (select 1 from DUAL union select 2 from DUAL)
);         

ITEM_ID  ORG_ID  SUM_QUANTITY  
1        1       10   

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks a lot Chris

A reader, March 03, 2017 - 1:45 pm UTC

Thanks a lot Chris. This information is very much useful.
Our query is more complex. I tried to reproduce the issue with simple query and hence the "meaningless where clause".
Chris Saxon
March 03, 2017 - 3:30 pm UTC

FYI, I've raised this bug internally.

Hey Chris. I also raised one with the number 25644910

A reader, March 03, 2017 - 6:29 pm UTC

Hey Chris. I also raised one with the number 25644910. Shall we close one of ours as a DUP ?
Thanks,
Pavan.
Chris Saxon
March 06, 2017 - 1:55 pm UTC

Keep yours. I'll close mine.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.