Skip to Main Content
  • Questions
  • the best way to count distinct tuples

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Sokrates.

Asked: January 20, 2017 - 1:32 pm UTC

Last updated: January 26, 2017 - 1:03 pm UTC

Version: all versions

Viewed 1000+ times

You Asked

I was always afraid to ask, but probably you can tell me:
what is the best way to count distinct tuples ?

Why does just
select count(distinct a, b) from
(
 select 1 a, 1 b, 1 c from dual union all
 select 1 a, 1 b, 2 c from dual union all
 select 2 a, 3 b, 4 c from dual
)
/

gives
ORA-00909: invalid number of arguments
and
select count(distinct (a, b)) from
(
 select 1 a, 1 b, 1 c from dual union all
 select 1 a, 1 b, 2 c from dual union all
 select 2 a, 3 b, 4 c from dual
)
/

gives
ORA-00907: missing right parenthesis,
instead we have to use
select count(*) from
(
select distinct a, b
from
(
 select 1 a, 1 b, 1 c from dual union all
 select 1 a, 1 b, 2 c from dual union all
 select 2 a, 3 b, 4 c from dual
)
)
/

  COUNT(*)
----------
         2

?

Should we better use something else for example
select max(rownum) from
(
select a,b
from
(
 select 1 a, 1 b, 1 c from dual union all
 select 1 a, 1 b, 2 c from dual union all
 select 2 a, 3 b, 4 c from dual
)
group by a, b
)
/

MAX(ROWNUM)
-----------
          2

?

and Chris said...

Well:

select count(*) from
(
select distinct a, b
from
(
 select 1 a, 1 b, 1 c from dual union all
 select 1 a, 1 b, 2 c from dual union all
 select 2 a, 3 b, 4 c from dual
)
)


Is the most obvious as to your intent. There are many other ways you could write this, like:

select count(*) from (
select row_number() over (partition by a, b order by a) rn
from
(
 select 1 a, 1 b, 1 c from dual union all
 select 1 a, 1 b, 2 c from dual union all
 select 2 a, 3 b, 4 c from dual
)
)
where  rn = 1;

COUNT(*)  
2 


But these all make it harder for anyone looking at your code to figure out what it's doing. I'd stick with the "select count(*) from (select distinct ..." method.

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

one more alternative

Igor, January 20, 2017 - 8:11 pm UTC

select count(distinct a ||','|| b) from
(
select 1 a, 1 b, 1 c from dual union all
select 1 a, 1 b, 2 c from dual union all
select 2 a, 3 b, 4 c from dual
)

If either a or b contain commas, use another symbol

count distinct tuples and distinct part of them

Sokrates, January 24, 2017 - 8:34 pm UTC

ok, now I want not only count distinct tuples but also the distinct parts of them separately and group these by another column.

See
create view v(letter, day, n)
as
 select
     cast(dbms_random.string('l', 1) as varchar2(1)) as  letter
     date'2017-01-01' + trunc(dbms_random.value(0, 365)) as day
     trunc(dbms_random.value(0, 100)) as n
  from dual
  connect by level<=1e4
/


I want to group by letter and count the number of distinct days per letter, the number of distinct n's per letter and the number of distinct pairs (day, n) per letter.

select
   letter,
   count(distinct day),
   count(distinct n),
   count(distinct day, n)
from
   v
group by
   letter
order by
   letter
/

again gives the annoying ORA-00909: invalid number of arguments (why ?)
select
   letter,
   count(distinct day),
   count(distinct n),
   count(distinct (day, n))
from
   v
group by
   letter
order by
   letter
/

gives ORA-00907: missing right parenthesis (why oh why ?), so I usually use here the ugly - but working
select
   letter,
   count(distinct day),
   count(distinct n),
   count(distinct day || chr(0) || n)
from
   v
group by
   letter
order by
   letter
/


ugly because it uses implicit datatype conversion (even without explicitly specifying a format mask), but working

Question: is there any better way to do what I want ?
Bonus question: why does SQL (at least Oracle SQL) not support any of the better (looking better for me) alternatives I tried ?

Thanks for having a look !
Chris Saxon
January 26, 2017 - 1:03 pm UTC

Why is it like that? Because that's how it's implemented! As in no one's done the work to support count (distinct (c1, c2, ...)). It's not part of the ANSI standard either as far as I know.

If you feel this is something we should implement you could:

- Submit an ER for this. Including a business case stating how this wi
- Submit it to the DB ideas forum and see if you can gather support from the community:

https://community.oracle.com/community/database/database-ideas

alternative version

Racer I., January 26, 2017 - 2:16 pm UTC

create table tv(letter, day, n)
as select
cast(dbms_random.string('l', 1) as varchar2(1)) as letter,
date'2017-01-01' + trunc(dbms_random.value(0, 365)) as day,
trunc(dbms_random.value(0, 100)) as n
from dual
connect by level<=1e4

WITH
Ranks AS (
select
letter,
DENSE_RANK() OVER (PARTITION BY letter ORDER BY day) dd,
DENSE_RANK() OVER (PARTITION BY letter ORDER BY n) dn,
DENSE_RANK() OVER (PARTITION BY letter ORDER BY day, n) ddn
from tv),
CountA AS (
select letter, max(dd) dd, max(dn) d, max(ddn) ddn
from Ranks
group by letter
order by letter),
CountB AS (
select
letter,
count(distinct day) dd,
count(distinct n) dn,
count(distinct day || chr(0) || n) ddn
from tv
group by letter
order by letter)
SELECT 'A', c.* FROM CountA c
UNION
SELECT 'B', c.* FROM CountB c
ORDER BY 2, 1

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.