How to get max and sum from two different tables in the same select

Tom Kyte

Thanks for the question, Pablo.

Asked: November 06, 2002 - 7:14 am UTC

Last updated: November 18, 2005 - 3:20 pm UTC

Version: 8.1.7


You Asked

Hi Tom,
I have tables with invoices, invoice payments and invoice accounting. I would like to show the following:

Invoice Number (from invoices)
Last payment for the Invoice (from invoice payments, may be several per invoice)
Total Accounted Amount (from invoice accounting, may be several rows per invoice)

I would like to do this in a single sql and if possible without using inline views. I guess I can use analytic functions, but I cannot get it without inline views. Is there any way?

Thanks for your time.

and Tom said...

Why the "fear" of inline views???

Anyway -- need a tad more detail:

invoice number -- easy enough

last payment for the invoice -- is that the "payment_amount" for the max(payment_date), or just the max(payment_date) itself? Eg: are you asking for the salary of the last hired employee, or for the date of the last hire?

total accounted amount -- easy enough

Oh well -- I'll give you every possible answer then - starting with some test data:


ops$tkyte@ORA920.US.ORACLE.COM> create table t1 ( id number primary key );

Table created.

ops$tkyte@ORA920.US.ORACLE.COM> create table t2 ( id references t1, payment_date date, amt number );

Table created.

ops$tkyte@ORA920.US.ORACLE.COM> create table t3 ( id references t1, amt number );

Table created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> insert into t1 select rownum from all_users where rownum <= 5;

5 rows created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> insert into t2 select mod(rownum,5)+1, sysdate-dbms_random.value(1,100), round(dbms_random.value( 1, 100 ),2)
2 from all_users;

38 rows created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> insert into t3 select mod(rownum,5)+1, round(dbms_random.value( 1, 100 ),2)
2 from (select * from all_users union all select * from all_users );

76 rows created.


so, we have different numbers of rows in t2 and t3 for each ID like you do -- here are the singleton queries that show us what values we need to get:


ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> select t1.id, max(t2.payment_date) max_pay_date
2 from t1, t2
3 where t1.id = t2.id
4 group by t1.id
5 /

        ID MAX_PAY_D
---------- ---------
         1 31-OCT-02
         2 31-OCT-02
         3 02-NOV-02
         4 31-OCT-02
         5 16-OCT-02

ops$tkyte@ORA920.US.ORACLE.COM> select t1.id, substr( max(to_char(t2.payment_date,'yyyymmddhh24miss') || amt), 15 ) max_t2_amt
2 from t1, t2
3 where t1.id = t2.id
4 group by t1.id
5 /

        ID MAX_T2_AMT
---------- ----------------------------------------
         1 20.22
         2 23.48
         3 71.17
         4 76.45
         5 81.02

(that's the amount for the max payment date)
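
As a side note, on releases that support it (it may not be available in the 8.1.7 version in the question), the same "amount for the max payment date" can be written with the MAX ... KEEP (DENSE_RANK LAST) aggregate instead of the concatenate-and-substr trick; a sketch, output omitted:

select t1.id,
       max(t2.payment_date) max_pay_date,
       max(t2.amt) keep (dense_rank last order by t2.payment_date) max_t2_amt
  from t1, t2
 where t1.id = t2.id
 group by t1.id;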

ops$tkyte@ORA920.US.ORACLE.COM> select t1.id, sum(t3.amt) sum_t3_amt
2 from t1, t3
3 where t1.id = t3.id
4 group by t1.id
5 /

        ID SUM_T3_AMT
---------- ----------
         1     601.01
         2     821.66
         3     706.87
         4     857.05
         5     893.75

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> select t1.id,
2 (select max(payment_date) from t2 where t2.id = t1.id) max_pay_date,
3 (select to_number(substr( max(to_char(t2.payment_date,'yyyymmddhh24miss') || amt), 15 ))
4 from t2
5 where t2.id = t1.id) max_t2_amt,
6 (select sum(amt) from t3 where t3.id = t1.id ) sum_t3_amt
7 from t1
8 /

        ID MAX_PAY_D MAX_T2_AMT SUM_T3_AMT
---------- --------- ---------- ----------
         1 31-OCT-02      20.22     601.01
         2 31-OCT-02      23.48     821.66
         3 02-NOV-02      71.17     706.87
         4 31-OCT-02      76.45     857.05
         5 16-OCT-02      81.02     893.75

Using a select of a select (scalar subqueries), we can get that information pretty easily. Those are not inline views -- technically -- so that is a "fair" answer....


ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> select t1.id,
2 max(payment_date) max_pay_date,
3 to_number(substr( max(to_char(t2.payment_date,'yyyymmddhh24miss') || t2.amt), 15 )) max_t2_amt,
4 sum(t3.amt)/count(distinct t2.rowid) sum_t3_amt
5 from t1, t2, t3
6 where t1.id = t2.id and t2.id = t3.id
7 group by t1.id
8 /

        ID MAX_PAY_D MAX_T2_AMT SUM_T3_AMT
---------- --------- ---------- ----------
         1 31-OCT-02      20.22     601.01
         2 31-OCT-02      23.48     821.66
         3 02-NOV-02      71.17     706.87
         4 31-OCT-02      76.45     857.05
         5 16-OCT-02      81.02     893.75


I'll let you figure out why that one that uses no subqueries or inline views works.... the count(distinct rowid) could be count(distinct t2.id) as well...


ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> select t1.id, max_pay_date, max_t2_amt, sum_t3_amt
2 from t1,
3 ( select id, max(payment_date) max_pay_date,
4 to_number(substr( max(to_char(payment_date,'yyyymmddhh24miss') || amt), 15 )) max_t2_amt
5 from t2
6 group by id ) t2,
7 ( select id, sum(amt) sum_t3_amt
8 from t3
9 group by id ) t3
10 where t1.id = t2.id and t1.id = t3.id
11 /

        ID MAX_PAY_D MAX_T2_AMT SUM_T3_AMT
---------- --------- ---------- ----------
         1 31-OCT-02      20.22     601.01
         2 31-OCT-02      23.48     821.66
         3 02-NOV-02      71.17     706.87
         4 31-OCT-02      76.45     857.05
         5 16-OCT-02      81.02     893.75

ops$tkyte@ORA920.US.ORACLE.COM>

and that last one -- the inline view version -- for completeness...



Comments

"How to get max and sum from two different tables in the same select",

Prasanna, November 06, 2002 - 5:19 pm UTC

Is there a different way to approach this problem possibly looking at the data model aspect?

I do not know the full details of this database, but it seems to me that between invoice payments and invoice accounting the relationship is one to one.

In that case, going back to Tom's example, tables t2 and t3 would each need an additional column "id2".

Then the query becomes simpler:

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> select t1.id,
  2         max(payment_date) max_pay_date,
  3         to_number(substr( max(to_char(t2.payment_date,'yyyymmddhh24miss') || t2.amt), 15 )) max_t2_amt,
  4         sum(t3.amt) sum_t3_amt                    -- /count(distinct t2.rowid) -- no need for the count
  5    from t1, t2, t3
  6   where t1.id = t2.id and t2.id2 = t3.id2         -- was: t2.id = t3.id
  7   group by t1.id
  8  /








Tom Kyte
November 06, 2002 - 6:32 pm UTC

that would be true if and only if it were one to one -- but they said it wasn't -- it was 1:n and 1:m -- hence you need to divide to get the sum back right.



Not sure...

Kashif, November 06, 2002 - 5:49 pm UTC

Hi Tom,

I'm a little unclear on your second query, in which you use the count (distinct t2.rowid). How does it work? I know the count (distinct t2.rowid) will give the number of distinct rowids per id, but that's where I get lost. If we don't divide by it, we'll get the sum of amts in t3 times the number of rows in t2. But why? And why distinct? I guess my understanding of joins and groups is a little lacking here... Any feedback would be useful. Thanks.

Kashif

Tom Kyte
November 06, 2002 - 6:38 pm UTC

Ok, let's say there are

1 row  in t1 with id = 1
2 rows in t2 with id = 1
3 rows in t3 with id = 1

when we join:

from t1, t2, t3
where t1.id = t2.id and t1.id = t3.id

we'll get 6 rows -- (1*2 from joining t1 to t2) * 3 from joining that to t3

We apparently are counting each T3.amt 2 times -- once for each row in T2.  That is the nature of the join.  Look:

ops$tkyte@ORA920.US.ORACLE.COM> create table t1 ( id int );

Table created.

ops$tkyte@ORA920.US.ORACLE.COM> create table t2 ( id int, msg varchar2(10) );

Table created.

ops$tkyte@ORA920.US.ORACLE.COM> create table t3 ( id int, amt int );

Table created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> insert into t1 values ( 1 );

1 row created.

ops$tkyte@ORA920.US.ORACLE.COM> insert into t2 values ( 1, 'row1' );

1 row created.

ops$tkyte@ORA920.US.ORACLE.COM> insert into t2 values ( 1, 'row2' );

1 row created.

ops$tkyte@ORA920.US.ORACLE.COM> insert into t3 values ( 1, 2 );

1 row created.

ops$tkyte@ORA920.US.ORACLE.COM> insert into t3 values ( 1, 4 );

1 row created.

ops$tkyte@ORA920.US.ORACLE.COM> insert into t3 values ( 1, 6 );

1 row created.

ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM>
ops$tkyte@ORA920.US.ORACLE.COM> select t1.id, t2.msg, t3.amt
  2    from t1, t2, t3
  3   where t1.id = t2.id
  4     and t1.id = t3.id
  5  /

        ID MSG               AMT
---------- ---------- ----------
         1 row1                2
         1 row2                2
         1 row1                4
         1 row2                4
         1 row1                6
         1 row2                6

6 rows selected.


So -- each row in T3 is "amplified" or "multiplied" by each row in T2.  two rows in T2 -- then T3's rows will appear in the join twice.  Three rows in T2 -- then T3's rows will appear in the join three times.  and so on.


So, when we sum them up:

ops$tkyte@ORA920.US.ORACLE.COM> select t1.id, sum(t3.amt), count(distinct t2.rowid)
  2    from t1, t2, t3
  3   where t1.id = t2.id
  4     and t1.id = t3.id
  5   group by t1.id
  6  /

        ID SUM(T3.AMT) COUNT(DISTINCTT2.ROWID)
---------- ----------- -----------------------
         1          24                       2

ops$tkyte@ORA920.US.ORACLE.COM>

we double counted (in this example) everything.  Dividing the SUM by the number of rows in T2 that contributed to this group will fix that -- 24/2 = 12 = 6+4+2...
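
To see that correction applied directly to this toy example, a minimal sketch against the t1/t2/t3 tables just created (output omitted):

select t1.id,
       sum(t3.amt) / count(distinct t2.rowid) sum_t3_amt   -- 24 / 2 = 12 = 2+4+6
  from t1, t2, t3
 where t1.id = t2.id
   and t1.id = t3.id
 group by t1.id;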


 

Does it not have to be count(distinct t2.rowid) always?

Ramakrishna, November 07, 2002 - 3:34 am UTC

Dear Tom,
Thanks for your detailed explanation. You have said in
your response that "count(distinct rowid) could be count(distinct t2.id) as well". But they return different results! The count(distinct t2.id) for a given t1.id always returns 1, unlike count(distinct t2.rowid), which returns the factor to "de-amplify" by. Is that right?

Also, the Discoverer and Business Objects documentation
seem to call this 'Fan Trap' or 'Chasm Trap'.
It would be great if you could list the generic conditions
under which such connection traps can occur in SQL so that we will know what to watch out for.

thanks,
Ramakrishna

Tom Kyte
November 07, 2002 - 7:42 am UTC

DOH -- sorry about that -- you are correct of course, needs to be count(distinct t2.rowid) -- we need to know the number of rows.......


I've never heard of "Fan Trap" or "Chasm Trap" (I can see the "fan" fitting in -- it "fans out" the result set -- chasm-- that one doesn't click with me)

Anyway -- here are the rules:

If

a) you are using aggregation
b) and you have more than 2 tables in the query (you have N tables, resulting in N-1 joins)
c) and there are NOT at least "N-2" one to one relationships involved

you have this issue.

Eg: suppose you have:

select ..
  from t1, t2, t3
 where t1.x = t2.x
   and t1.x = t3.x          -- N = 3, there are 2 joins

t1 1:m t2
t1 1:m t3                   -- there are 0 1:1 conditions, 0 < 3-2, so this "fans" out


Suppose you have:

select ...
  from t1, t2, t3
 where t1.x = t2.x
   and t1.x = t3.x          -- N = 3, there are 2 joins

t1 1:1 t2
t1 1:m t3                   -- there is 1 1:1 condition, 1 >= 3-2, so this does NOT fan out



How about a new book: "Cool Little SQL Tricks"

Chuck Jolley, November 07, 2002 - 10:18 am UTC

to_number(substr( max(to_char( t2.payment_date, 'yyyymmddhh24miss') ||
t2.amt), 15 )) max_t2_amt

Clever!
Thanks,
Chuck


Tom Kyte
November 07, 2002 - 10:45 am UTC

Well, actually, that is what my next book is mutating into -- more of a best practices book, rather than a formal "performance tuning text book".

Similar to:

http://www.amazon.com/exec/obidos/ASIN/0201310058

curious -- what do people think of that format for an Oracle best practices book? there are online samples of the content so you can see what it would look like. (i have about 100 items in my list)

Thanks.

Kashif, November 07, 2002 - 11:57 am UTC

Thanks Tom, the explanation was fantastic. When do you think you might have a revision of "Expert One on One" out for 9i, by the way? The best practices, bullet-point style is perfect for recommending tips and tricks, especially for intermediate to advanced level programmers, which is probably the audience interested in such a text. It prevents us readers from having to sift through tons of text and dog-ear each page we're interested in, or even worse, highlight lines in the text with fluorescent markers. To get what we're looking for, we can just zoom in to the topic we're interested in using the TOC. I've read a best practices book by Feuerstein, and while I don't agree with some of his points, it was really hands-down the most effective tips book I have read yet, primarily as a result of the concise, bullet-point style of the book.

Kashif

P.S. Is there a way to get in line to order a copy of the book now?

Tom Kyte
November 07, 2002 - 12:36 pm UTC

I'm not planning on doing an Expert one on one for 9i - for a couple of reasons.

One is that 90% of the current one applies 100% to 9i (there is just more STUFF in 9i, lots more). So, the guts of the book is relevant.

Second is -- well, time is marching on and the next release would come out before I could do it -- so, I'm thinking about targeting that one ;)


I don't even have an ISBN for the other book as yet.

Interesting...

jan van mourik, November 07, 2002 - 12:02 pm UTC

That "Best Practices" books looks interesting, well, if I were a Java developer :-) I'd definitely be interested in your Oracle version...

Sufficient condition for fan trap?

Ramakrishna, November 08, 2002 - 6:18 am UTC

Dear Tom,
Thanks for your answer. Chasm trap and Fan trap seem to
be used to mean much the same thing. I went through your
conditions when this occurs and wanted to know if it is
correct to restate them as follows:

The NECESSARY conditions for a 'fan trap' to occur in a
query are:
1. Aggregation is being done
2. Involves more than one 1:m relationship

However, I am unable to formulate the SUFFICIENT
conditions. For example, there is one case where a query
meets both the above criteria but has no fan trap:
queries on the classic 'star schema' used in data
warehouses involve many 1:m relationships, and aggregations
are done on the central fact table, yet there is no problem.

For example, you could have a Day table, a Product table and a Region table all linked via 1:m relationships to a
central Sales table but a query joining all these 4 tables
to find total sales by, say, Month, by Category and by
Geography does not face this problem.

I want to know if it will be possible to state the SUFFICIENT condition for this in plain English so that I can
use it to check any multi-table queries easily.

thanks,
ramakrishna


Tom Kyte
November 08, 2002 - 8:06 am UTC

Queries on star schemas DO NOT involve more than one 1:m -- they are all 1:1 by definition!!!


You have a series of dimension tables, you have a single FACT table.

The relationship from FACT to DIMENSION is always 1:1 (the relationship from DIMENSION to FACT is generally 1:M but it is not relevant here -- you join FACT to a SINGLE DIMENSION row -- you join from FACT to a dimension's primary key)

So, with a star schema -- you have a whole bunch of 1 to 1 relationships.


You were looking at the relationship the wrong way....


You stated the sufficient conditions.
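
For illustration, a sketch of the star query described above -- the table and column names here are made up -- where each fact row joins to exactly one row in each dimension, so nothing fans out:

select d.month_name, p.category, r.geography, sum(f.sales_amt) total_sales
  from sales_fact f, day_dim d, product_dim p, region_dim r
 where f.day_id     = d.day_id        -- each fact row matches exactly one day row
   and f.product_id = p.product_id    -- each fact row matches exactly one product row
   and f.region_id  = r.region_id     -- each fact row matches exactly one region row
 group by d.month_name, p.category, r.geography;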

group functions

Amit, November 12, 2002 - 11:30 pm UTC

Dear Tom,
My requirement is: I have to find the sum of a quantity and the date corresponding to it, but display only the max of the summed-up values and its corresponding date. I am using the following query:

select * from (select to_char(docdt,'DD-MON-YYYY'),sum(val/100000) from t
where trunc(dt) >= to_date(:fdt,'dd-mon-yyyy')
and trunc(dt) < to_date(:tdt,'dd-mon-yyyy')+1
group by to_char(dt,'DD-MON-YYYY') order by sum(val/100000) desc) where rownum=1

If my from and to dates are far apart, the query takes more time to execute.
Any improvement is greatly appreciated.
TIA


Tom Kyte
November 13, 2002 - 1:21 pm UTC

Hmm, the more data I have to scan and process the longer the query takes.... seems reasonable to me?

But anyway, there is definite room for improvement. For example:

select * from (select to_char(docdt,'DD-MON-YYYY'),sum(val/100000) from t
where trunc(docdt) >= to_date(:fdt,'dd-mon-yyyy')
and trunc(docdt) < to_date(:tdt,'dd-mon-yyyy')+1
group by to_char(docdt,'DD-MON-YYYY') order by sum(val/100000) desc) where rownum=1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.46       0.48         72         75          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.46       0.48         72         75          0           1
********************************************************************************
select *
from (
select trunc(docdt), sum(val/100000)
from t
where docdt >= to_date(:fdt,'dd-mon-yyyy')
and docdt < to_date(:tdt,'dd-mon-yyyy')+1
group by trunc(docdt)
order by 2 desc
)
where rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.22       0.24         72         75          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.22       0.24         72         75          0           1


those two queries are the same but by removing many of the unnecessary function calls you are making -- we can cut the CPU needs in half (my table t was created by select created, object_id from all_objects by the way).


I looked at:

where trunc(dt) >= to_date(:fdt,'dd-mon-yyyy')

Now, the trunc(dt) isn't needed since DT >= trunc(dt) always and if trunc(dt) >= to_date(:fdt) then DT by definition is >= to_date(:fdt)

Looking at
trunc(dt) < to_date(:tdt,'dd-mon-yyyy')+1

we can also say that if, for example, DT = 01-jan-2002 23:59:59, then trunc(dt) = 01-jan-2002 00:00:00. And if

01-jan-2002 00:00:00 < to_date(:tdt)+1

then to_date(:tdt)+1 must be 02-jan-2002 OR GREATER (the minimum it could be is 02-jan-2002).

Hence, if trunc(dt) < to_date(:tdt)+1 in this case, dt is ALSO < to_date(:tdt)+1 (because TRUNC is just whacking off the time component here)


So, we can remove lots of trunc calls. I also changed the TO_CHARS (lots of overhead there to string-ize a date) to TRUNC -- achieving the same results but at a lower CPU cost:

select to_char(docdt,'dd-mon-yyyy')
from
t group by to_char(docdt,'dd-mon-yyyy')


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.16       0.19         72         75          0          17
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.16       0.19         72         75          0          17

select trunc(docdt)
from
t group by trunc(docdt)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.08       0.11         72         75          0          17
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.08       0.11         72         75          0          17



Now, the removal of the functions on docdt in the where clause will also permit this to use an index on docdt when/if appropriate -- if you happen to have an index on DOCDT,VAL -- you would get the biggest bang for the buck here as well.
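
A sketch of the index Tom is referring to (the index name here is made up; whether it helps depends on your data and the rest of the workload):

create index t_docdt_val_idx on t ( docdt, val );

With the TRUNC calls gone from the WHERE clause, the range predicates on DOCDT can use that index, and since VAL is in the index as well, the query could potentially be answered from the index alone.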

group functions

Amit, November 12, 2002 - 11:39 pm UTC

Dear Tom,
Sorry, I just muffed up the above query; the correct one is:

select * from (select to_char(dt,'DD-MON-YYYY'),sum(val/100000) from t
where trunc(dt) >= to_date(:fdt,'dd-mon-yyyy')
and trunc(dt) < to_date(:tdt,'dd-mon-yyyy')+1
group by to_char(dt,'DD-MON-YYYY') order by sum(val/100000) desc) where rownum=1

TIA


Tom Kyte
November 13, 2002 - 1:22 pm UTC

see above -- i figured dt and docdt were one and the same...

what I'm waiting for

Bill, November 13, 2002 - 1:59 pm UTC

is a line of cool clothes called BIND VARIABLES
the tag on the clothes could be :TOM

Explanation of the terms "fan trap" and "chasm trap"

Dan Kefford, November 13, 2002 - 2:46 pm UTC

In case anyone was wondering, the following URL appears to explain the two terms quite well:

http://www.businessobjects.com/services/InfoCenter/tips/unvdes/tip/ut001/ut001.htm



Randy, November 10, 2004 - 11:21 am UTC

Tom,
I have a query that takes the sum from two tables then divides the sum to give % of program. The query below is an example, however, the output is not correct.

select i.appn, p.program_name, i.i_pron, i.ssn,
       sum(i.APPN_DOLLARS - i.WITHHELD + i.REPROGRAM - i.TAXES) "current_program",
       sum(l.commitment), sum(l.obligations),
       sum(l.obligations) / sum(i.APPN_DOLLARS - i.WITHHELD + i.REPROGRAM - i.TAXES) "program_percent"
  from program p, i_pron i, ledger l
 where p.dispid = i.program
   and i.dispid = l.i_pron
 group by i.ssn, p.program_name, i.appn, i.i_pron

The sums of "current_program" and "program_percent" do not add up correctly. However, the sums of commitment and obligations show the correct totals. Where am I going wrong with this?

Thanks in advance,
Randy

Tom Kyte
November 10, 2004 - 12:24 pm UTC

show me, give me a "for example"

Randy, November 10, 2004 - 2:46 pm UTC

Ok, I have a query shown below...

  1  select i.dispid, p.program_name, i.appn, i.ssn,
  2  sum(i.APPN_DOLLARS - i.WITHHELD + i.REPROGRAM -  i.TAXES) "current_program"
  3  from i_pron i, program p
  4  where i.program = p.dispid
  5* group by i.dispid, i.ssn, i.appn, p.program_name
SQL> /

    DISPID PROGRAM_NAME                             APPN            SSN            current_program
---------- ---------------------------------------- --------------- -------------- ---------------
        41 Test Mod Line                            WTCV            GZ2400                41565000
        42 Carrier MODs                             WTCV            GB1930                40770000
        68 Reimb Customer-OMA                       Customer - OMA  123207.NC000        18958418.9

"current_program" show's the correct total.  However, when I add the third table to the from line all the current_program totals change.

  1  select i.dispid, p.program_name, i.appn, i.ssn,
  2  sum(i.APPN_DOLLARS - i.WITHHELD + i.REPROGRAM -  i.TAXES) "current_program"
  3  from i_pron i, program p, ledger l
  4  where i.program = p.dispid
  5* group by i.dispid, i.ssn, i.appn, p.program_name
SQL> /

    DISPID PROGRAM_NAME                             APPN            SSN            current_program
---------- ---------------------------------------- --------------- -------------- ---------------
        41 Test Mod Line                            WTCV            GZ2400              8687085000
        42 Carrier MOD                              WTCV            GB1930              8520930000
        68 Reimb Customer-OMA                       Customer - OMA  123207.NC000        3962309548

Tom Kyte
November 10, 2004 - 8:14 pm UTC

yes, well, you did a cartesian product to ledger there.


where is the join to L?

Group functions

Peg, December 29, 2004 - 10:34 am UTC

Thanks so much! I've been looking for a technique to solve this type of problem (group functions on multiple tables) for days!

How to sum the dimension table columns?

Michael, January 30, 2005 - 6:33 am UTC

Hi Tom,
I have a very similar problem.
I have 2 tables: a dimension and a fact.
For example:
My dimension table stores books. The structure is: book_id, shelf_id, num_of_pages.
In the fact table I have sales data: book_id, date, price.
The tables contain the following records:
Books:
book_id  shelf_id  num_of_pages
-------  --------  ------------
      1         1           100
      2         1           200

Sales:

book_id  date  price
-------  ----  -----
      1  d1       10
      1  d2       11
      2  d1       20
      2  d2       21
      2  d3       22
      2  d4       23
I want to create a query that groups the data by shelf and presents the total sales (based on price) and the sum of pages for all the books that were sold. The problem is that the number of pages gets multiplied by the number of sales.
For example the following SQL:
Select sum(num_of_pages) num_of_pages
     , sum(price) total_price
  from books, sales
 where books.book_id = sales.book_id
 group by books.shelf_id

I'll get num_of_pages = 1000 (100 * 2 + 200 * 4), total_price = 107.

In this case, even dividing by a count will not work, since some of the records are multiplied by 2 and some by 4.
Can you help?


Tom Kyte
January 30, 2005 - 9:37 am UTC

if i only had a create table and some inserts... i can see how to do this with analytics. do you?

here are the scripts

Michael, January 30, 2005 - 10:59 am UTC

CREATE TABLE books
( book_id       NUMBER,
  shelf_id      NUMBER,
  num_of_pages  NUMBER );

Insert into books
(BOOK_ID, SHELF_ID, NUM_OF_PAGES)
Values
(1, 1, 100);
Insert into books
(BOOK_ID, SHELF_ID, NUM_OF_PAGES)
Values
(2, 1, 200);

CREATE TABLE books_sales
( book_id     NUMBER,
  sales_date  DATE,
  price       NUMBER );

Insert into books_sales
(BOOK_ID, SALES_DATE, PRICE)
Values
(1, TO_DATE('01/30/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 10);
Insert into books_sales
(BOOK_ID, SALES_DATE, PRICE)
Values
(1, TO_DATE('01/31/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 11);
Insert into books_sales
(BOOK_ID, SALES_DATE, PRICE)
Values
(2, TO_DATE('01/30/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 20);
Insert into books_sales
(BOOK_ID, SALES_DATE, PRICE)
Values
(2, TO_DATE('01/31/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 21);
Insert into books_sales
(BOOK_ID, SALES_DATE, PRICE)
Values
(2, TO_DATE('02/01/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 22);
Insert into books_sales
(BOOK_ID, SALES_DATE, PRICE)
Values
(2, TO_DATE('02/02/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 23);

Tom Kyte
January 30, 2005 - 12:33 pm UTC

actually, don't even need analytics -- just roll up the sales to the book level, join, then aggregate up to the shelf level.



select a.shelf_id, sum(a.num_of_pages), sum(b.price)
  from books a,
       ( select book_id, sum(price) price
           from books_sales
          group by book_id ) b
 where a.book_id = b.book_id
 group by a.shelf_id;


what's the solution you have?

Gabe, January 30, 2005 - 11:00 am UTC

create table book
( book_id       number(9) primary key
 ,shelf_id      number(9) not null
 ,num_of_pages  number(9) not null
);

insert into book values (1,1,100);
insert into book values (2,1,200);
insert into book values (3,2,300);
insert into book values (4,3,120);
insert into book values (5,3,120);

create table sale
( book_id       number(9) not null references book(book_id)
 ,sale_dt       date      not null
 ,price         number(9) not null
);

insert into sale values (1, sysdate-9, 10);
insert into sale values (1, sysdate-8, 11);
insert into sale values (2, sysdate-7, 20);
insert into sale values (2, sysdate-6, 21);
insert into sale values (2, sysdate-5, 22);
insert into sale values (2, sysdate-4, 23);
insert into sale values (3, sysdate-3, 30);
insert into sale values (3, sysdate-2, 40);
insert into sale values (3, sysdate-1, 50);
insert into sale values (4, sysdate+1, 30);
insert into sale values (4, sysdate+2, 40);
insert into sale values (5, sysdate+3, 50);

SQL> select shelf_id, sum(pages), sum(sales)
  2  from   ( select b.shelf_id
  3                 ,b.book_id
  4                 ,b.num_of_pages pages
  5                 ,sum(s.price) sales
  6           from   sale s, book b
  7           where  s.book_id = b.book_id
  8           group by b.shelf_id, b.book_id, b.num_of_pages
  9         )
 10  group by shelf_id
 11  ;

  SHELF_ID SUM(PAGES) SUM(SALES)
---------- ---------- ----------
         1        300        107
         2        300        120
         3        240        120

I can do some analytics as well ... but is it worth it (performance, elegance)?

Unless I'm missing something here! 

The thing that is missing...

Michael, January 31, 2005 - 2:43 am UTC

That is great but my problem is that the format of the SQL is strict since it is going into a template:
The template is:
Select shelf_id,
<expression that will sum pages>,
<expression that will sum sales>
from sale s, book b
where s.book_id = b.book_id
group by b.shelf_id

so I can't add sub-selects or an additional group by. I need to do this in one expression that is grouped by shelf.


Tom Kyte
January 31, 2005 - 8:15 am UTC

The thing that is missing is you have cut out the power of SQL in the name of "generic code"

That is the thing that is missing. If you lobotomize SQL, you get -- well, brain dead SQL.

What is this "template" thing all about (besides sounding like a bad idea(tm))...

It would take two levels of aggregation to do this. I don't believe you'll "get there from here" with that template (without inline views at the very least)

not for nothing

Michael, January 31, 2005 - 8:49 am UTC

We have a generic system that allows the user to configure what data he wants to look at.
The user configures parts of the select statement and also which columns to group by.
The application combines this into a whole SQL statement. So what we have is some kind of SQL generator.
I know that we can get this data by running different SQL, but enabling the application to run this kind of SQL would force us to make the application's SQL generator even more complex.
This is why I thought that we could achieve this by using analytic functions.
My logic is: Oracle multiplies the records in the books table by the records in the sales table as a result of the join between the 2 tables. Maybe there is a function that can sum this data before the multiplication.


Tom Kyte
January 31, 2005 - 9:16 am UTC

I know they are "not for nothing", but they always -- repeat always -- come at a cost, a pretty high cost if you ask me after seeing so many variations of them over many many many years.

can "expression" be a "scalar subquery"

If so

ops$tkyte@ORA10GR1> select A.shelf_id,
  2         sum(A.num_of_pages) pages,
  3         (select sum(price)
  4            from sale, book
  5           where sale.book_id = book.book_id
  6             and book.shelf_id = A.shelf_id) price
  7    from book A
  8   group by A.shelf_id;
 
  SHELF_ID      PAGES      PRICE
---------- ---------- ----------
         1        300        107
         2        300        120
         3        240        120
 

almost

Michael, January 31, 2005 - 9:20 am UTC

It can be a subselect but the main select must join the books and sales tables.

Tom Kyte
January 31, 2005 - 9:45 am UTC

"why" he says.  you just need to use the template against "book" instead of "book and sales"

anyway, what we'd need is the factor to divide "num_of_pages" by (the number of rows it was multiplied by).  that would be CNT:

ops$tkyte@ORA10GR1> select a.shelf_id,
  2         a.num_of_pages,
  3         count(distinct b.rowid) over (partition by a.rowid) cnt,
  4         b.price
  5    from book a, sale b
  6   where a.book_id = b.book_id
  7  /
 
  SHELF_ID NUM_OF_PAGES        CNT      PRICE
---------- ------------ ---------- ----------
         1          100          2         10
         1          100          2         11
         1          200          4         20
         1          200          4         21
         1          200          4         22
         1          200          4         23
         2          300          3         30
         2          300          3         40
         2          300          3         50
         3          120          2         30
         3          120          2         40
         3          120          1         50
 
12 rows selected.
 

<b>so, using an inline view, we could:</b>


ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> select shelf_id, sum(num_of_pages/cnt), sum(price)
  2    from (
  3  select a.shelf_id,
  4         a.num_of_pages,
  5         count(distinct b.rowid) over (partition by a.rowid) cnt,
  6         b.price
  7    from book a, sale b
  8   where a.book_id = b.book_id
  9         )
 10   group by shelf_id
 11  /
 
  SHELF_ID SUM(NUM_OF_PAGES/CNT) SUM(PRICE)
---------- --------------------- ----------
         1                   300        107
         2                   300        120
         3                   240        120

<b>but it won't work as an analytic without the inline view:</b>
 
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> select a.shelf_id,
  2         sum(a.num_of_pages/
  3         count(distinct b.rowid) over (partition by a.rowid)) pages,
  4         sum(b.price)
  5    from book a, sale b
  6   where a.book_id = b.book_id
  7   group by shelf_id
  8  /
       count(distinct b.rowid) over (partition by a.rowid)) pages,
       *
ERROR at line 3:
ORA-30483: window  functions are not allowed here
 
 
<b>so that is out, but now you know what you need -- the count of rows in sale for that book id</b>

ops$tkyte@ORA10GR1> select a.shelf_id,<b>
  2         sum(a.num_of_pages/
  3         (select count(*) from sale where book_id = a.book_id)) pages,</b>
  4         sum(b.price) price
  5    from book a, sale b
  6   where a.book_id = b.book_id
  7   group by shelf_id
  8  /
 
  SHELF_ID      PAGES      PRICE
---------- ---------- ----------
         1        300        107
         2        300        120
         3        240        120
 
ops$tkyte@ORA10GR1>

<b>there, a way underperforming query that fits the template</b>


I am for some reason reminded of a childhood show -- the Brady Bunch (you might not know it at all).  There was an Episode "Johnny Bravo"

http://www.bradyresidence.com/bravo99.html

Greg Brady got to be a "rock star" just because the "suit fit" -- not for any other reason.  Seems to fit here for some reason :)

 

You are the KING

Michael, January 31, 2005 - 10:43 am UTC

Thanks,
This is great.
Fits like a glove.
In addition, it will not be so bad regarding performance, since we have an index on sales.book_id, so the sub-SQL performs a fast scan on the index.

Helped a lot.


Tom Kyte
January 31, 2005 - 11:33 am UTC

over and over and over again. anything done a lot adds up, remember....

ok

Siva, January 31, 2005 - 11:42 am UTC

Hi Tom,
What is the difference between an inline view and a subquery?
Can they replace each other?
Please reply.

Tom Kyte
January 31, 2005 - 11:56 am UTC

inline view can be used where a view could be used:

select * from VIEW

select * from ( select * from t );


a subquery is a set used in a predicate

where x in ( subquery );
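
For completeness, the thread above also uses a third form -- the scalar subquery, a subquery in the SELECT list that returns at most one value per row -- as in the original answer:

select t1.id,
       (select max(payment_date) from t2 where t2.id = t1.id) max_pay_date
  from t1;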

What is (tm)?

Menon, January 31, 2005 - 12:10 pm UTC

"What is this "template" thing all about (besides sounding like a bad idea(tm)).."
E.g. in the above sentence you have "idea (tm)."
I keep seeing it and thought it stood for trademark.
However, that does not fit in the above statement?

Tom Kyte
January 31, 2005 - 12:20 pm UTC

it is a "joke"


that's a really good idea "(tm)"


sum of same fields in two different tables

Sonali Sherma, February 10, 2005 - 1:23 pm UTC

Hi Tom,
I am facing a problem; I hope you can help.

I have two tables... here is the sample data:

create table t1
(item_ID NUMBER(2),
book_ID NUMBER(2),
price NUMBER(2))

/

insert into t1 (item_ID, book_ID, price) values (1,1,60);
insert into t1 (item_ID, book_ID, price) values (1,2,70);
insert into t1 (item_ID, book_ID, price) values (1,3,80);
insert into t1 (item_ID, book_ID, price) values (2,1,10);
insert into t1 (item_ID, book_ID, price) values (2,1,20);
insert into t1 (item_ID, book_ID, price) values (1,3,40);




create table t2
(item_ID NUMBER(2),
book_ID NUMBER(2),
price NUMBER(2))

/

insert into t2 (item_ID, book_ID, price) values (1,1,89);
insert into t2 (item_ID, book_ID, price) values (1,2,29);
insert into t2 (item_ID, book_ID, price) values (1,3,99);
insert into t2 (item_ID, book_ID, price) values (2,1,55);
insert into t2 (item_ID, book_ID, price) values (2,1,90);
insert into t2 (item_ID, book_ID, price) values (1,3,10);




As you can see, both tables have the same column names with different data. What I require is to calculate the SUM of price grouped by book_id, across both tables, in a single SQL statement.

so my required output is this:


BOOK_ID  SUM(PRICE)
-------  ----------
      1         323
      2          99
      3         229


where the SUM(price) for each book_id covers the rows from both tables: book_id = 1 in the first row, 2 in the second row, and 3 in the third row.

Would you please help, as I am unable to write the query for this?


Thanks,
Sonali.

Tom Kyte
February 11, 2005 - 7:01 pm UTC

select ...
  from ( select book_id, sum(price) price from t1 group by book_id ) t1,
       ( select book_id, sum(price) price from t2 group by book_id ) t2
 where t1.book_id = t2.book_id;
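
An alternative sketch (not part of the reply above) that would also pick up a book_id present in only one of the two tables is to UNION ALL first and then aggregate:

select book_id, sum(price) sum_price
  from ( select book_id, price from t1
         union all
         select book_id, price from t2 )
 group by book_id
 order by book_id;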



Date function

A reader, August 16, 2005 - 1:34 pm UTC

Tom, is having a function like this a good practice?

CREATE OR REPLACE function customer_sum_FUNC(date_IN IN DATE)
return number
is
v_sum number := 0 ;
begin
select count(*) into v_sum
from his_customer
where trunc(completed_date) = trunc(date_IN);
return(v_sum);
end ;
/

Tom Kyte
August 17, 2005 - 12:01 pm UTC

I doubt it.


why would you not just query that out with the data in the first place?

but at the very least,


where completed_date >= trunc(date_in) and completed_date < trunc(date_in)+1;


to let you at least use an index on completed_date.
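
Folding that predicate back into the original function would look something like this (a sketch, untested):

create or replace function customer_sum_func( date_in in date )
return number
is
    v_sum number := 0;
begin
    select count(*)
      into v_sum
      from his_customer
     where completed_date >= trunc(date_in)
       and completed_date <  trunc(date_in) + 1;  -- open-ended range; an index on completed_date can be used

    return v_sum;
end;
/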

A reader, August 17, 2005 - 2:24 pm UTC

This will use the index. Correct me if I am wrong, Tom.

select count(*)
from his_customer
where completed_date between to_date( completed_date,'dd-mon-yyyy hh24:mi:ss' )
and to_date(:DATE_IN,'dd-mon-yyyy hh24:mi:ss' )

Tom Kyte
August 17, 2005 - 2:44 pm UTC

sorry, that doesn't even compute.

where completed_date between to_date( completed_date,'dd-mon-yyyy hh24:mi:ss' )

????????


you are comparing completed_date to completed_date using an IMPLICIT conversion of completed date to string and then explicitly back into a date?????

??

A reader, August 17, 2005 - 3:02 pm UTC

Was trying to use between... just another way of trying to use the index.



Tom Kyte
August 17, 2005 - 5:11 pm UTC

but you betweened the column with ITSELF

where x between x and y


that would just be where x <= y



A reader, August 17, 2005 - 4:13 pm UTC

Tom,

Can you give us an example of using between, in this case?

Tom Kyte
August 17, 2005 - 5:16 pm UTC

why? it would be much "harder".


where completed_date between trunc(date_in)
and completed < trunc(date_in)+1-1/24/60/60;




Correcting Tom's typo

A reader, August 17, 2005 - 6:36 pm UTC

It should be

where completed_date between trunc(date_in)
and trunc(date_in)+1-1/24/60/60;

instead of

where completed_date between trunc(date_in)
and completed < trunc(date_in)+1-1/24/60/60;

Tom Kyte
August 18, 2005 - 3:27 pm UTC

yup, correct, thanks!

Dynamic Condition

Mita, August 31, 2005 - 4:10 pm UTC

I have a reporting requirement which needs the query conditions to be defined dynamically.

section  column  value  incl_excl
-------  ------  -----  ---------
A        id      1      I
A        id      2      E
A        catg    A      I
A        catg    B      I
B        type    3      I
C        cntry   ABC    I

The SECTION column represents a section of the report which should display data satisfying all the include and exclude conditions. Each section can have a different number of include/exclude conditions. Data should meet all the include conditions across the different columns (it would be like an IN list for include conditions on the same column).

what's the best way to do it ??

Tom Kyte
September 01, 2005 - 1:25 am UTC

not sure I understand.

More Info

Mita, September 01, 2005 - 11:34 am UTC

Let me explain you in simple terms.

I have a report which shows products under different headers/sections based on the attributes of the product. We have to add new headers/sections every now and then based on new products, so we decided to make this query metadata driven, where we store each section/header and the attributes of the products which fall under it.

What would be the best way to achieve that? One way would be generating the whole SQL dynamically based on the metadata.

Tom Kyte
September 01, 2005 - 3:51 pm UTC

sorry, not quite simple enough.

metadata is too big a word maybe.

remember, I don't work with you, I haven't been looking at your problem with you for hours or days. I know nothing about it.

(and remember, metadata sounds really cool, it is good for documentation, but to run a real system - you might rethink that if scalability, performance, or ease of understanding is among your goals)

Aggregate Function - Multiplication

A reader, November 18, 2005 - 9:22 am UTC

Hi Tom,

Is there any aggregate function for multiplication which is the counterpart of the sum() function? Currently I am using a FOR loop to iterate over all the rows, which is not elegant.

e.g.
select product(number) from mytable;
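
No reply is recorded here, but a common workaround (an assumption on my part, not an answer from this thread) is to go through logarithms, which only works when every value is positive; the column name below is hypothetical:

select exp(sum(ln(num_col))) product_of_rows   -- LN requires num_col > 0
  from mytable;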
