I have some situations where I have a report built in Apex with a huge query in the application. Ideally we'd like to keep as much code as possible in the database (for dependency checking purposes etc.), but these queries have been left in Apex because they are parameterised and, for whatever reason, the predicates can't be pushed into views.
So one thing I've been considering is a package with a set of pipelined table functions for these: the function takes the parameters, runs the query, and pipes the rows out.
The main concern I have, however, is whether there is a performance risk in doing this - i.e. is one row at a time really processed, or does the database bulk it up internally?
My initial tests suggest there are some differences, but I'm not 100% sure how to interpret the results - i.e. whether it's just an artifact of the method used or a genuine performance decrease.
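For reference, the packaged approach I'm considering would look something like this sketch (package, function, and parameter names are hypothetical; the table matches the test case below):

```sql
create or replace package report_qry as
  type rpt_row is record (j number);
  type rpt_tab is table of rpt_row;
  -- parameterised report query exposed as a pipelined table function
  function report_rows(p_i in number) return rpt_tab pipelined;
end report_qry;
/
create or replace package body report_qry as
  function report_rows(p_i in number) return rpt_tab pipelined is
    l_row rpt_row;
  begin
    for r in (select j from xxtst where i = p_i) loop
      l_row.j := r.j;
      pipe row (l_row);
    end loop;
    return;
  end report_rows;
end report_qry;
/
```

which Apex (or anything else) could then query as `select * from table(report_qry.report_rows(p_i => 2))`.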
create table xxtst as select mod(level,5) i, level j from dual connect by level <= 10000;
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'xxtst');
create type xxtst_tab as table of number;
/
create or replace function xxtst_p(pi in number) return xxtst_tab pipelined is
begin
for i in (select j from xxtst where i=pi) loop
pipe row(i.j);
end loop;
end;
/
set autotrace traceonly;
select j from xxtst where i=2;
select * from table(xxtst_p(pi=>2));
drop function xxtst_p;
drop type xxtst_tab;
drop table xxtst;
exit;
$> sp @tst.sql
SQL*Plus: Release 10.1.0.5.0 - Production on Wed Aug 30 12:01:07 2017
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Table created.
PL/SQL procedure successfully completed.
Type created.
Function created.
2000 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=25 Card=2000 Bytes=14000)
1 0 TABLE ACCESS (FULL) OF 'XXTST' (TABLE) (Cost=25 Card=2000 Bytes=14000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
162 consistent gets
0 physical reads
0 redo size
37017 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
144 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2000 rows processed
2000 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=104 Card=8168 Bytes=16336)
1 0 COLLECTION ITERATOR (PICKLER FETCH) OF 'XXTST_P' (PROCEDURE) (Cost=104 Card=8168 Bytes=16336)
Statistics
----------------------------------------------------------
53 recursive calls
6 db block gets
117 consistent gets
0 physical reads
0 redo size
37028 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
144 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2000 rows processed
Function dropped.
Type dropped.
Table dropped.
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Thanks for the nice test case.
Yeah, I'm not sure we can totally believe the autotrace results here, so I'll use a SQL trace instead, which lets us dig into the recursive calls.
SQL> create table xxtst as select mod(level,5) i, level j from dual connect by level <= 100000;
Table created.
SQL>
SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'xxtst');
PL/SQL procedure successfully completed.
SQL>
SQL> create or replace type xxtst_tab as table of number;
2 /
Type created.
SQL>
SQL> create or replace function xxtst_p(pi in number) return xxtst_tab pipelined is
2 begin
3 for i in (select j from xxtst where i=pi) loop
4 pipe row(i.j);
5 end loop;
6 end;
7 /
Function created.
SQL> set arraysize 100
SQL> set feedback only
SQL> exec dbms_monitor.session_trace_enable
PL/SQL procedure successfully completed.
SQL> select j from xxtst where i=2;
20000 rows selected.
SQL> select * from table(xxtst_p(pi=>2));
20000 rows selected.
SQL> exec dbms_monitor.session_trace_disable
PL/SQL procedure successfully completed.
SQL> set feedback on
SQL> @tk
VALUE
-------------------------------------------------------------------------------------------------
VALUE1
-------------------------------------------------------------------------------------------------
C:\ORACLE\diag\rdbms\db122\db122\trace\db122_ora_4492.trc
C:\ORACLE\diag\rdbms\db122\db122\trace\db122_ora_4492.prf
1 row selected.
TKPROF: Release 12.2.0.1.0 - Development on Wed Sep 6 08:52:45 2017
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
SQL>
SQL>
SQL>
In the trace file I see this:
Raw SQL
=======
select j
from
xxtst where i=2
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 2 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 201 0.00 0.00 0 388 4 20000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 203 0.00 0.00 0 388 6 20000
Pipelined SQL
=============
SELECT J
FROM
XXTST WHERE I=:B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 201 0.01 0.00 0 388 4 20000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 203 0.01 0.00 0 388 4 20000
select *
from
table(xxtst_p(pi=>2))
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 45 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 201 0.01 0.02 0 0 0 20000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 203 0.03 0.02 0 45 0 20000
So, as expected, in a case where you are doing *exactly* the same work - fetching from a table - adding a pipelined function on top of it does create some overhead. But you can also see that all the normal goodness still applies: we are doing array fetching both inside the function and from the function itself (201 fetch calls for 20000 rows, consistent with the arraysize of 100).
But of course the real benefits of pipelined functions come into play when the functionality is more complicated, ie
- fetch (or be passed) some source rows
- apply additional logic to transform the data
- pipe out the computed results
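A minimal sketch of that pattern (function name and the doubling "transformation" are hypothetical; the table and collection type are from the test case above), using an explicit BULK COLLECT with a LIMIT so the fetching inside the function is done in controlled batches:

```sql
create or replace function transformed_rows(p_filter in number)
  return xxtst_tab pipelined
is
  type src_t is table of xxtst.j%type;
  l_rows src_t;
  cursor c is select j from xxtst where i = p_filter;
begin
  open c;
  loop
    -- fetch up to 100 source rows at a time rather than row by row
    fetch c bulk collect into l_rows limit 100;
    for idx in 1 .. l_rows.count loop
      -- additional transformation logic goes here (doubling as a stand-in)
      pipe row (l_rows(idx) * 2);
    end loop;
    exit when c%notfound;
  end loop;
  close c;
  return;
end;
/
```

Note that a plain cursor FOR loop is also array-fetched automatically (as the trace above shows), so the explicit BULK COLLECT is mainly useful when you want control over the batch size or need to do set-based work on each batch.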