Thanks for the question, lh.

Asked: April 10, 2019 - 6:13 am UTC

Last updated: April 16, 2019 - 9:56 am UTC

Version: 12.2

Viewed 1000+ times

You Asked

Hi

We have merge statements which get their values from non deterministic functions.
Some of the updates merge does are not 'real' in a sence that actual values do not change.
We would prefer so that these updates are not made at all but do not want those functions to be evaluated twice .

How can we accomplish this in an Oracle supported way ?
Only way we have found sofar is by using undocumented materialize hint.

Here is simple test case for this.

create or replace package lhkoep as
    function lhkoe(p_a number, p_b varchar2, p_c date) return varchar2 ;
end;
/

create or replace package body lhkoep as
    function lhkoe(p_a number, p_b varchar2, p_c date) return varchar2 is
    begin
        dbms_lock.sleep(1);
        return p_a;
    end;
end;
/


select 
    lhkoep.lhkoe(x, 'A', TO_DATE('20190103','YYYYMMDD') )
from (
select
    level x
from 
    dual connect by level < 5
);
-- 4 seconds.
    
select 
    lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') )
  from 
(select 
     level x
from 
    dual connect by level < 5
)
where not ((lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') ) is null
       and X is null)
       or (lhkoep.lhkoe(X,'A', TO_DATE('20190103','YYYYMMDD') ) = X)
       );
-- 8 s

select 
    lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') )
  from 
(select 
     level x
from 
    dual connect by level < 5
)
where ((lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') ) is null
       and X is null)
       or (lhkoep.lhkoe(X,'A', TO_DATE('20190103','YYYYMMDD') ) = X)
       );
-- 12 s


     
select 
   new_v
from (  
     select 
         lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') )  new_v,
         x old_v
       from 
           (select 
                 level x
             from 
                  dual connect by level < 5
           )
     )
where ((new_v is null and old_v is null)
      or (new_v = old_v));
-- -- -- 12 s
with d as (
select --+ MATERIALIZE
         lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') )  new_v,
         x old_v
       from 
           (select 
                 level x
             from 
                  dual connect by level < 5
           )
)
select 
   new_v
from d
where ((new_v is null and old_v is null)
      or (new_v = old_v));
-- 4 s

and Chris said...

We have merge statements which get their values from non deterministic functions.

That sounds... dangerous...

Anyway, just push the function call into your row generator and you'll be back to 4s execution times:

set timing on

with d as (
  select new_v,
         x old_v
  from (
    select level x,
           lhkoep.lhkoe (level,'A',to_date ('20190103','YYYYMMDD')) new_v
    from   dual 
    connect by level < 5
  )
)
select new_v
from   d
where (
  ( new_v is null and old_v is null ) or 
  ( new_v = old_v )
);

NEW_V   
1       
2       
3       
4       

Elapsed: 00:00:04.571

Rating

(3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

lh, April 11, 2019 - 11:22 am UTC

Hi

These functions can read data from database or access sysdate function. So in theory the results are not same when called with same parameters. In practice, they are.

My original test case seems not to have been very good.
This row generator might affect results.
When testing with regular table, results are different.

create table four_row_table (
c1 number(1)
);
insert into four_row_table values (1);
insert into four_row_table values (2);
insert into four_row_table values (3);
insert into four_row_table values (4);
commit;


with d as (
  select new_v,
         c1 old_v
  from (
    select c1,
           lhkoep.lhkoe (c1,'A',to_date ('20190103','YYYYMMDD')) new_v
    from   four_row_table
  )
)
select new_v
from   d
where (
  ( new_v is null and old_v is null ) or 
  ( new_v = old_v )
);
8s

with d as (
  select --+ materialize 
         new_v,
         c1 old_v
  from (
    select c1,
           lhkoep.lhkoe (c1,'A',to_date ('20190103','YYYYMMDD')) new_v
    from   four_row_table
  )
)
select new_v
from   d
where (
  ( new_v is null and old_v is null ) or 
  ( new_v = old_v )
);
4s

With 12.1 this materialize hint seems to generate some io. If no better solution is found we are hoping to get rid of it when upgrade to 12.2 happens.
(In memory cursor duration temp tables,
https://oracle-big-data.blogspot.fi/2017/03/this-post-covers-one-of-new-sql.html )

lh

April 11, 2019 - 2:27 pm UTC

Scalar subquery caching may help:

set timing on

with d as (
  select ( select lhkoep.lhkoe (c1,'A',to_date ('20190103','YYYYMMDD'))
             from   dual 
         ) new_v,
         c1 old_v
  from   four_row_table
)
select new_v
from   d
where (
  ( new_v is null and old_v is null ) or 
  ( new_v = old_v )
);

NEW_V   
1       
2       
3       
4       


Elapsed: 00:00:04.311

why does it have to be this way?

Racer I., April 12, 2019 - 7:58 am UTC

Hi,

Fascinating.

https://oracle-big-data.blogspot.fi/2017/03/this-post-covers-one-of-new-sql.html

> Just to be absolutely clear - there are no hints or parameters you can set to force the use of In-Memory “cursor duration” temporary tables.

Why, oh wy? This would be the perfect opportunity to make MATERIALIZE offical AND better...

Many a query I used MATERIALIZE to force single executions with abandon.

You have to see if 12.2 and up will be obliging enough to use this in your case.

A chink :
> The following types of queries commonly use cursor duration temp tables:
> Grouping Sets

with
rs as (select  level x from dual connect by level < 5),
fv as (
  select lhkoep.lhkoe(x,'A', TO_DATE('20190103','YYYYMMDD') )  new_v, x old_v
  from rs),
gs as (
  select min(old_v) old_v, new_v
  from fv
  group by grouping sets (new_v))
select new_v
from gs
where ((new_v is null and old_v is null) or (new_v = old_v));

with
fv as (
  select lhkoep.lhkoe(c1,'A', TO_DATE('20190103','YYYYMMDD') )  new_v, c1 old_v
  from four_row_table),
...same as above
;

Both these run 4 s in our 12.1 db.
Strangely both report dual executions in their explan plans and no temp table creation. Don't know what the overhead for the GS would be in a regular scenario.

Also strange : the OPs original version (without MATERIALIZE) also reports two executions but the 12 s means triple executions. So it seems that version actually computes the inner select values even though the statement ignores these and executes them twice again for the filter. How inefficient can you get?

regards,

April 15, 2019 - 4:38 am UTC

Ultimately, (even with materialize or any other optimization mechanism) we *never* make any guarantees about how many times a function will be executed during the running of a SQL statement.

If it's deterministic, then it should be specified as so, so the chance of avoiding repeated executions are minimized.

If it's not deterministic...then ... man, it shouldn't be in a SQL statement :-)

have no cake AND don't eat it

Racer I., April 15, 2019 - 1:28 pm UTC

Hi,

Fair point. I would try to split this into two arguments :
- SQL (the standard) apparently doesn't specify any behaviour for functions (and likely can't). So don't rely on specific numbers of executions for a correct result. Or the other way round : Any result you get is still "correct" (quote/unquote ;) even if function shenanigans *) have screwed it over. CAVEAT EMPTOR.
- However I wish there was a way to influence number of executions for performance reasons. Like PARALLEL is a performance hint without effect on the correctness of the result. I would still get "correct" results but at least faster. If I move multiple expensive function calls into an inner query to have them executed only once I'd like them to stay there or have an (official) HINT to influence that. Like NOT_LIFT_PRED (opposite of PUSH_PRED) or something.
*) like the mid-statement-restart you may get (from MERGE-statements I think)

regards,

April 16, 2019 - 9:56 am UTC

Well there is the UDF pragma. Which tells the database that you'll primarily call the function in SQL statements. And to optimize accordingly.

This may not reduce the number of executions. But can help minimize the SQL<>PL/SQL context switch.

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

why does it have to be this way?

have no cake AND don't eat it

More to Explore

PL/SQL demos

PL/SQL docs

How to avoid functions being called twice in a single sql-statement?

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

why does it have to be this way?

have no cake AND don't eat it

More to Explore

PL/SQL demos

PL/SQL docs