Question and Answer

Connor McDonald

Thanks for the question, suman.

Asked: October 17, 2017 - 1:09 pm UTC

Last updated: October 18, 2017 - 3:26 am UTC

Version: 11g

Viewed 1000+ times

You Asked

I want to optimize the following query

SELECT t.merchant_id,
       t.brand_id,
       t.transaction_type,
       t.trns_currency_code,
       COUNT (*) AS total_no_tx,
       SUM (t.mv_is_trnx_approved) AS approved_no_tx,
       COUNT (t.mv_is_trnx_approved) AS count_approved_trnx,
       SUM (t.mv_total_sales) AS total_sales,
       COUNT (t.mv_total_sales) AS count_total_sales,
       TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT') AT TIME ZONE m.timezone) AS transaction_date
FROM   dmart.trxns t, mdb.merchant m
WHERE  t.merchant_id = m.merchant_id
GROUP BY t.merchant_id,
       t.brand_id,
       t.transaction_type,
       t.trns_currency_code,
       TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT') AT TIME ZONE m.timezone)

The problem is both tables have millions of records, and the count and sum are taking a long time to execute. I have indexes created on mv_is_trnx_approved and mv_total_sales. Is there an alternative way?

and Connor said...

"The problem is both tables have millions of records" when combined with

a) this appears to be a standard parent/child relationship (i.e., every txn will have a merchant)
b) no other criteria besides the join
c) the absence of any other information

your best option is probably a hash join, and indexes are of little benefit.

So you could try as a first step the hints:

USE_HASH(t) LEADING(m t) FULL(m) FULL(t)
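Applied to the original query, those hints would go in the SELECT clause; a sketch only, using the aliases from the question and unverified against the actual schema and stats:

```sql
SELECT /*+ USE_HASH(t) LEADING(m t) FULL(m) FULL(t) */
       t.merchant_id,
       t.brand_id,
       t.transaction_type,
       t.trns_currency_code,
       COUNT (*) AS total_no_tx,
       SUM (t.mv_is_trnx_approved) AS approved_no_tx,
       COUNT (t.mv_is_trnx_approved) AS count_approved_trnx,
       SUM (t.mv_total_sales) AS total_sales,
       COUNT (t.mv_total_sales) AS count_total_sales,
       TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT') AT TIME ZONE m.timezone) AS transaction_date
FROM   dmart.trxns t, mdb.merchant m
WHERE  t.merchant_id = m.merchant_id
GROUP BY t.merchant_id,
       t.brand_id,
       t.transaction_type,
       t.trns_currency_code,
       TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT') AT TIME ZONE m.timezone)
```

The hints only direct the join order, join method, and access paths; check the resulting plan (e.g. with DBMS_XPLAN.DISPLAY_CURSOR) to confirm they took effect.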

If that gives you the performance you want, then look at why the optimizer did not do that automatically (bad stats etc).

If it does not, then do a trace and see where the time is lost - you may have other issues (slow storage, insufficient CPU, etc.).

Rating

  (5 ratings)


Comments

Manual PLACE_GROUP_BY

Sayan Malakshinov, October 18, 2017 - 12:53 am UTC

You may know that since 11.1 Oracle has had the "QKSFM_PLACE_GROUP_BY" mechanism (hints /*+ PLACE_GROUP_BY */ and /*+ NO_PLACE_GROUP_BY */). It allows Oracle to reduce the amount of work it does. Unfortunately, in your case it can't help, because you are using a value from the second table.
But you can use scalar subquery caching to reduce this overhead:
SELECT 
   t.merchant_id, 
   t.brand_id, 
   t.transaction_type, 
   t.trns_currency_code, 
   COUNT (*) AS total_no_tx, 
   SUM (t.mv_is_trnx_approved) AS approved_no_tx, 
   COUNT (t.mv_is_trnx_approved) AS count_approved_trnx, 
   SUM (t.mv_total_sales) AS total_sales, 
   COUNT (t.mv_total_sales) AS count_total_sales, 
   TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT') 
      AT TIME ZONE (select m.timezone from  mdb.merchant m WHERE t.merchant_id = m.merchant_id)
   ) AS transaction_date 
FROM dmart.trxns t
GROUP BY 
      t.merchant_id, 
      t.brand_id, 
      t.transaction_type, 
      t.trns_currency_code, 
      TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT') 
         AT TIME ZONE (select m.timezone from mdb.merchant m WHERE t.merchant_id = m.merchant_id)
      )
/

Redesign the table

Sayan Malakshinov, October 18, 2017 - 12:59 am UTC

Though perhaps I'd prefer to redesign this table: a merchant can relocate to another country and change its timezone. Storing the timezone on the transaction would not only avoid this join, but would also preserve the original timezone of the transaction (for example, for local discounts and so on).
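A minimal sketch of that idea (the column name is hypothetical; the insert code or a trigger would populate it):

```sql
-- capture the merchant's timezone on the transaction itself at insert
-- time, so reporting needs no join and relocations don't rewrite history
ALTER TABLE dmart.trxns ADD (merchant_timezone VARCHAR2(64));

-- the reporting query could then compute the local date without joining:
--   TRUNC (FROM_TZ (CAST (t.order_date AS TIMESTAMP), 'GMT')
--          AT TIME ZONE t.merchant_timezone)
```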
Connor McDonald
October 18, 2017 - 3:25 am UTC

Agreed, but there is not much we can work with here given that we've just been given the SQL and nothing else.

To Sayan on Scalar subquery caching

Rajeshwaran, Jeyabal, October 18, 2017 - 1:30 am UTC

....
But you can use scalar subquery caching to reduce this overhead
....


then

"The problem is both tables have millions of records"

So given millions of records in both tables, does a scalar subquery scale up?
Scalar subquery caching is good for response time, not for throughput.

drop table t1 purge;
drop table t2 purge;

create table t1 as select * from all_objects;
create table t2 as select * from t1;
insert into t2 select * from t1;
insert into t2 select * from t1;
commit;

alter table t1 add constraint t1_pk primary key(object_id);
alter table t2 add constraint t2_fk foreign key(object_id)
references t1(object_id);
create index t2_idx on t2(object_id);
exec dbms_stats.gather_table_stats(user,'T1');
exec dbms_stats.gather_table_stats(user,'T2');


exec runstats_pkg.rs_start;
begin 
 for x in ( select t1.object_id, max(t2.object_id)
  from t1, t2 
  where t1.object_id = t2.object_id
  group by t1.object_id )
 loop
  null ;
 end loop;
end;
/

exec runstats_pkg.rs_middle;
begin 
 for x in ( select t1.object_id, ( select max(t2.object_id) 
  from t2 
  where t2.object_id = t1.object_id ) max_id 
  from t1 )
 loop
  null;
 end loop;
end;
/ 

exec runstats_pkg.rs_stop(1000);

demo@ORA11G> exec runstats_pkg.rs_stop(1000);
Run1 ran in 11 hsecs
Run2 ran in 34 hsecs
run 1 ran in 32.35% of the time

Name                                  Run1        Run2        Diff
STAT...buffer is pinned count            0       2,227       2,227
STAT...no work - consistent re       1,406       5,233       3,827
STAT...consistent gets from ca       1,416       5,250       3,834
STAT...consistent gets               1,426       5,825       4,399
STAT...consistent gets from ca       1,426       5,825       4,399
STAT...session logical reads         1,461       5,870       4,409
LATCH.cache buffers chains           3,030      11,804       8,774
STAT...index scans kdiixs1               1      84,575      84,574
STAT...logical read bytes from  11,968,512  48,087,040  36,118,528

Run1 latches total versus runs -- difference and pct
        Run1        Run2        Diff       Pct
       3,361      12,560       9,199     26.76%

PL/SQL procedure successfully completed.

demo@ORA11G>

Connor McDonald
October 18, 2017 - 3:26 am UTC

Which brings us back to point (c) in my response.

You can't tune a SQL statement by looking at just the SQL text.

To Rajeshwaran on Scalar subquery caching

Sayan Malakshinov, October 18, 2017 - 7:12 am UTC

> So given millions of records in both tables,
> does a scalar subquery scale up?
> Scalar subquery caching is good for response time,
> not for throughput.
Of course, you're right that a hash join is often much better than SSC, but in my experience:
1) The number of transactions is much greater than the number of merchants;
2) Usually just 5% of merchants generate 95% of transactions
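That skew is easy to measure against the table from the question (a sketch):

```sql
-- how concentrated are transactions across merchants?
SELECT merchant_id, COUNT(*) AS txn_count
FROM   dmart.trxns
GROUP  BY merchant_id
ORDER  BY txn_count DESC;
```

If a small fraction of merchants dominates, the scalar subquery cache will hit far more often than it misses.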

PS. By the way, your example is not quite relevant to the question:
1) your query is an ideal subject for group-by placement;
2) on versions > 12.1 your subquery from the second query could be unnested, and group-by placement can move it into a group-by on just the second table,
so the CBO can rewrite your query as the first one and both of them will have the same plan.
Check it:
SQL> explain plan for
  2    select t1.object_id, ( select max(t2.object_id)
  3    from t2
  4    where t2.object_id = t1.object_id ) max_id
  5    from t1
  6  ;

Explained.

SQL> @xplan

P_FORMAT
----------------
typical


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 3931102600

------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          | 75299 |  2279K|       |   745   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER      |          | 75299 |  2279K|  1256K|   745   (1)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| T1_PK    | 75299 |   367K|       |    44   (0)| 00:00:01 |
|   3 |   VIEW                | VW_SSQ_1 | 76296 |  1937K|       |   502   (1)| 00:00:01 |
|   4 |    HASH GROUP BY      |          | 76296 |   372K|  2672K|   502   (1)| 00:00:01 |
|   5 |     INDEX FULL SCAN   | T2_IDX   |   225K|  1103K|       |   502   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_1"(+)="T1"."OBJECT_ID")

As you can see, the group-by was done on just the T2 table, and after that it was outer-joined to T1.

You can also check a 10053 trace to see how the query looks after transformations:
set echo on;
alter session set MAX_DUMP_FILE_SIZE = unlimited;
alter session set tracefile_identifier='grby';
alter session set events '10053 trace name context forever, level 1';

select t1.object_id, ( select max(t2.object_id) from t2 where t2.object_id = t1.object_id ) max_id from t1;
set echo off;
disc;
--after that from the trace:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T1"."OBJECT_ID" "OBJECT_ID","VW_SSQ_1"."MAX(T2.OBJECT_ID)" "MAX_ID" 
FROM  (SELECT MAX("T2"."OBJECT_ID") "MAX(T2.OBJECT_ID)","T2"."OBJECT_ID" "ITEM_1" 
FROM "XTENDER"."T2" "T2" GROUP BY "T2"."OBJECT_ID") "VW_SSQ_1","XTENDER"."T1" "T1" 
WHERE "VW_SSQ_1"."ITEM_1"(+)="T1"."OBJECT_ID"


You would get the same transformed query with your first query.

Fix of my response To Rajeshwaran on SSC

Sayan Malakshinov, October 18, 2017 - 7:20 am UTC

There is a typo in
"2) on versions > 12.1 your subquery from second query could be unnested and group-by placement can move it into group-by just by second table, "

It should be:
"2) on versions >= 12.1 your subquery from second query could be unnested and group-by placement can move it into group-by just by second table, "
