Skip to Main Content
  • Questions
  • How to Use Count(Distinct) for Oracle Analytic View Aggregations

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Ross.

Asked: December 09, 2021 - 10:06 pm UTC

Last updated: September 06, 2022 - 1:23 am UTC

Version: 19c

Viewed 1000+ times

You Asked

Hi.
I'm using Oracle Analytic Views (from 19c) as defined here: https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/overview-analytic-views.html

I can create measures with simple aggregations (eg: SUM, COUNT, MIN), but I can't create a measure that collates COUNT(DISTINCT). I've tried both calculated and basic measures, without success. So,

How do I define a measure that is equivalent to aggregating by COUNT(DISTINCT(x)) using Oracle Analytic Views?

Any help you can provide would be greatly appreciated!
Some stripped-down, example SQL to show my issue is (which I can run, but not share, in livesql.oracle.com):

create table sales_fact (
    CUSTOMER_ID VARCHAR2(50 CHAR),
    COUNTRY VARCHAR2(50 CHAR),
    PRODUCT_ID VARCHAR2(50 CHAR),
    ORDER_ID VARCHAR2(50 CHAR),
    MONTH_ID VARCHAR2(30 CHAR)
    );

insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '2', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A2', 'Jun-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('B', 'England', '2', 'B1', 'Jul-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('C', 'Germany', '2', 'C1', 'Aug-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('D', 'Germany', '3', 'D1', 'Apr-12');

create table time_dim (
    MONTH_ID VARCHAR2(30),
    MONTH_NAME VARCHAR2(40),
    YEAR_ID VARCHAR2(30),
    YEAR_NAME VARCHAR2(40)
);
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-11', 'April', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jun-11', 'June', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jul-11', 'July', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Aug-11', 'August', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-12', 'April', '12', 'CY2012');

CREATE OR REPLACE ATTRIBUTE DIMENSION time_attr_dim
USING time_dim
ATTRIBUTES
 (year_id,
  month_id)
LEVEL MONTH
  KEY month_id
  DETERMINES (year_id)
LEVEL YEAR
  KEY year_id;

CREATE OR REPLACE HIERARCHY time_hier
USING time_attr_dim
 (month  CHILD OF year);

CREATE OR REPLACE ANALYTIC VIEW sales_av
USING sales_fact
DIMENSION BY
  (time_attr_dim
    KEY month_id REFERENCES month_id
    HIERARCHIES (time_hier DEFAULT)
  )
MEASURES
 (salesCount FACT ORDER_ID AGGREGATE BY COUNT,
  orderCount FACT ORDER_ID AGGREGATE BY COUNT -- How to make this count(distinct)?
 )
;

-- 2011 should have a salesCount = 5 and orderCount = 4 (if it was distinct)
SELECT YEAR_ID, salesCount, orderCount
FROM sales_av HIERARCHIES(time_hier)
WHERE time_hier.level_name = 'YEAR'
ORDER BY time_hier.hier_order;

and Chris said...

Try defining it as COUNT DISTINCT and you'll find:

CREATE OR REPLACE ANALYTIC VIEW sales_av
USING sales_fact
DIMENSION BY
  (time_attr_dim
    KEY month_id REFERENCES month_id
    HIERARCHIES (time_hier DEFAULT)
  )
MEASURES
 (
   salesCount FACT ORDER_ID AGGREGATE BY COUNT,
   orderCount FACT ORDER_ID AGGREGATE BY COUNT DISTINCT -- How to make this count(distinct)?
 );

ORA-18297: invalid data type detected on base measure "ORDERCOUNT" for use with aggregation operator "COUNT DISTINCT"


Before we resolve that, some background.

In a classic star schema, the various _ID columns are typically integer surrogate keys. These have FK references to dimension tables.

Measures on the fact table are usually numeric values - quantity sold, price, etc. The columns CUSTOMER_ID, PRODUCT_ID, etc. are really dimensions, not measures.

Personally I'd create a fact table like this:

create table sales_fact (
    CUSTOMER_ID INTEGER,
    COUNTRY_ID INTEGER,
    PRODUCT_ID INTEGER,
    ORDER_ID INTEGER,
    DATE_ID INTEGER,
    units_sold integer,
    unit_price number
);


With all the *ID columns having FKs to a corresponding dimension table (CUSTOMER_DIM, PRODUCT_DIM, etc.).

So how do you enable COUNT DISTINCT?

Change the measures to be numeric columns!

For simplicity I've changed PRODUCT_ID to be an integer from your demo:

create table sales_fact (
    CUSTOMER_ID VARCHAR2(50 CHAR),
    COUNTRY VARCHAR2(50 CHAR),
    PRODUCT_ID INTEGER, 
    ORDER_ID VARCHAR2(50 CHAR),
    MONTH_ID VARCHAR2(30 CHAR)
    );

insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '2', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A2', 'Jun-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('B', 'England', '2', 'B1', 'Jul-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('C', 'Germany', '2', 'C1', 'Aug-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('D', 'Germany', '3', 'D1', 'Apr-12');

create table time_dim (
    MONTH_ID VARCHAR2(30),
    MONTH_NAME VARCHAR2(40),
    YEAR_ID VARCHAR2(30),
    YEAR_NAME VARCHAR2(40)
);
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-11', 'April', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jun-11', 'June', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jul-11', 'July', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Aug-11', 'August', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-12', 'April', '12', 'CY2012');

CREATE OR REPLACE ATTRIBUTE DIMENSION time_attr_dim
USING time_dim
ATTRIBUTES
 (year_id,
  month_id)
LEVEL MONTH
  KEY month_id
  DETERMINES (year_id)
LEVEL YEAR
  KEY year_id;

CREATE OR REPLACE HIERARCHY time_hier
USING time_attr_dim
 (month  CHILD OF year);

CREATE OR REPLACE ANALYTIC VIEW sales_av
USING sales_fact
DIMENSION BY
  (time_attr_dim
    KEY month_id REFERENCES month_id
    HIERARCHIES (time_hier DEFAULT)
  )
MEASURES
 (
   sales_count FACT PRODUCT_ID AGGREGATE BY COUNT,
   product_count FACT PRODUCT_ID AGGREGATE BY COUNT DISTINCT
 )
;

SELECT YEAR_ID,  sales_count, product_count
FROM sales_av HIERARCHIES(time_hier)
WHERE time_hier.level_name = 'YEAR'
ORDER BY time_hier.hier_order;


YEAR_ID    SALES_COUNT    PRODUCT_COUNT    
11                   5                2 
12                   1                1 

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Count Distinct with large values

Ross Adamson, January 09, 2022 - 7:55 pm UTC

Hi Chris,

thanks for taking the time to answer my question.
I can make it work, just as you say, but as soon as I enter a value greater than 64991 in the data, I get the error:
ORA-62600: Invalid value is passed to a bitmap operator.
when performing an OAV query.
For our application we need the potential for more than 65000 values. Is there a way to disable the bitmap operator (or some other solution) so that larger distinct counts can be supported?
Thanks,
Ross.

A complete script showing the issue is:
create table sales_fact (
    CUSTOMER_ID VARCHAR2(50 CHAR),
    COUNTRY VARCHAR2(50 CHAR),
    PRODUCT_ID INTEGER, 
    ORDER_ID VARCHAR2(50 CHAR),
    MONTH_ID VARCHAR2(30 CHAR)
    );

insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '2', 'A1', 'Apr-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('A', 'France', '1', 'A2', 'Jun-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('B', 'England', '2', 'B1', 'Jul-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('C', 'Germany', '2', 'C1', 'Aug-11');
insert into sales_fact(CUSTOMER_ID, COUNTRY, PRODUCT_ID, ORDER_ID, MONTH_ID) values ('D', 'Germany', '70000', 'D1', 'Apr-12');

create table time_dim (
    MONTH_ID VARCHAR2(30),
    MONTH_NAME VARCHAR2(40),
    YEAR_ID VARCHAR2(30),
    YEAR_NAME VARCHAR2(40)
);
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-11', 'April', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jun-11', 'June', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Jul-11', 'July', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Aug-11', 'August', '11', 'CY2011');
insert into time_dim(MONTH_ID, MONTH_NAME, YEAR_ID, YEAR_NAME) values ('Apr-12', 'April', '12', 'CY2012');

CREATE OR REPLACE ATTRIBUTE DIMENSION time_attr_dim
USING time_dim
ATTRIBUTES
 (year_id,
  month_id)
LEVEL MONTH
  KEY month_id
  DETERMINES (year_id)
LEVEL YEAR
  KEY year_id;

CREATE OR REPLACE HIERARCHY time_hier
USING time_attr_dim
 (month  CHILD OF year);

CREATE OR REPLACE ANALYTIC VIEW sales_av
USING sales_fact
DIMENSION BY
  (time_attr_dim
    KEY month_id REFERENCES month_id
    HIERARCHIES (time_hier DEFAULT)
  )
MEASURES
 (
   sales_count FACT PRODUCT_ID AGGREGATE BY COUNT,
   product_count FACT PRODUCT_ID AGGREGATE BY COUNT DISTINCT
 )
;

SELECT YEAR_ID,  sales_count, product_count
FROM sales_av HIERARCHIES(time_hier)
WHERE time_hier.level_name = 'YEAR'
ORDER BY time_hier.hier_order;

Approx count Distinct

Ross Adamson, January 14, 2022 - 1:39 am UTC

Hi Chris,
just a follow up note to my previous comment.
We've found that by using APPROX_COUNT_DISTINCT as the aggregation function we avoid the issue that I highlighted in my previous comment. (And it is accurate enough for our purposes).
Regards,
Ross

MDX

Asim, September 04, 2022 - 10:30 pm UTC

I have read somewhere that Oracle now allows to query the analytic views using MDX queries/expressions language. But cant find further detail how to do that, can we write MDX queries in SQLPLUS or SQL Developer?
Can you please show an example of it.

Thanks.


Connor McDonald
September 05, 2022 - 5:44 am UTC

I don't know of anywhere that we permit MDX natively in the database. You need tooling for that, eg our analytics cloud product

https://docs.oracle.com/en/cloud/paas/analytics-cloud/tress/mdx.html

Essbase

Asim, September 05, 2022 - 11:57 am UTC

What it seems that analytic cloud is based on a seperate oracle bought product ie hyperion essbase?

But analytic views are Oracle' RDBMS database object and has no relation with Oracle hyperion essbase.

Right?
Connor McDonald
September 06, 2022 - 1:23 am UTC

That's my point. MDX is not a property of the *database* it is a property of the *tool*.

Thus it is the job of the tool to take a request presented via MDX and present that to the database as an appropriately constructed query against a table, view, or analytic view.

The database won't care because it will; just see the final query

Mdx

Asim, September 05, 2022 - 7:58 pm UTC

Here at this link it says that
.....You can even access Analytic Views from MDX-enabled applications such as Excel. Analytic Views can be defined against a variety of data sources, including In-Memory column stores, regular tables, external tables, even Big Data SQL.

https://vlamis.com/presentations/analytic-views-simplify-complex-business-intelligence-queries/


Seems database 18c natively support MDX queries

Asim, September 19, 2022 - 10:13 pm UTC

But from these links it seems that the database now natively support MDX queries directly.

https://www.oracle.com/database/technologies/sql-analytics.html

point no. 5
...
....Analytic Views now support MDX (Multi-Dimensional Expression) query language

and a few lines down
Also its written
....
Also new in 18c for : Analytic Views which organize data using a dimensional model. They provide a fast and efficient way to create analytic queries over data stored in existing database tables and views. They allow you to easily add aggregations and calculations to data sets and to present data in views that can be queried with relatively simple SQL. New in Database 18c is the ability to use the MDX (Multi-Dimensional Expression) query language.

The above no where says anything about tool, it seems they are saying 18c database natively support MDX now for queryjng analytic views.

Also on this link it says about DBMS_MDX_ODBO PL/SQL Package although its not saying about native support.

https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/18c/dba/querying_analytic_views/querying_analytic_views.html

This 15-minute tutorial shows you how to query analytic views (AVs) with SQL to achieve the same functionality as Microsoft’s Multidimensional Expression language. You can also use the DBMS_MDX_ODBO package, the Multidimensional Expression language interface provided by PL/SQL.

I couldnt find in 12c, 18c, 21c docs anything about DBMS_MDX_ODBO plsql package, how ever searching on google, I found this non oracle link which describes DBMS_MDX_ODBO package, and claims it started from 12.2

https://www.google.com/url?sa=t&source=web&rct=j&url=https://www.morganslibrary.org/reference/pkgs/dbms_mdx_odbo.html&ved=2ahUKEwjk7OSS2qH6AhVX_rsIHdN3D_kQFnoECAUQAQ&usg=AOvVaw1saHeiYud7-HK-ChsCJ9_b

Please, can you clarify this?





More to Explore

SQL

The Oracle documentation contains a complete SQL reference.