To start this off, I kind of have a solution for my problem, I'm just trying to understand the mechanics behind it.
Ok, here's the set-up:
- I have a very simple APEX page: a report on a view + 2 date pickers to filter the data;
- The query is as simple as can be too: SELECT * FROM view WHERE ck_date BETWEEN :P1_START_DATE AND :P1_END_DATE;
- The view itself is very straightforward too: SELECT columns FROM table. No extra filters, conditions, or any processed columns. All it does is select a subset of the table's columns, so that not all columns are exposed to the app;
- Now here is the trick: The APEX app and the view are on server A, while the underlying table is on server B. So it all goes through a DB link. And unfortunately, there is considerable latency (TNSPING=60ms);
Here's what I've done:
I'm working with a date range that generates roughly 3k rows of data.
If I use this:
SELECT *
FROM db_view
WHERE ck_date BETWEEN :P1_START_DATE AND :P1_END_DATE
as the source of the report region, the page eventually crashes after several minutes of waiting.
If I use this:
WITH daterange AS (
  SELECT TO_DATE(:P1_START_DATE, 'DD-MON-YY') start_date,
         TO_DATE(:P1_END_DATE, 'DD-MON-YY') end_date
  FROM   dual
)
SELECT db.*
FROM   db_view db,
       daterange dr
WHERE  db.ck_date BETWEEN dr.start_date AND dr.end_date
... the page loads just fine. Of course, there's a bit of delay due to latency on the DB link, but it loads up to 40k rows without failing, while the other version can't even load 3k.
I arrived at the second version based on two assumptions:
- that somehow the bind variables are evaluated for every row;
- that the optimizer can't really ... optimize without a given value, and using the WITH clause might help with this;
It looks to me like using the bind variables causes row-by-row processing, generating a lot of back-and-forth on the network, which in turn increases the wait times by a huge factor.
I'm looking to understand
- the mechanics behind the two queries: why the second is faster, and how it differs from the first;
- What is the impact of using bind variables;
- When are bind variables evaluated - is it with every row?
- ... and whatever you may see fit;
If there are any URLs / topics for me to research, that would be even better - I tried googling first, but honestly I don't really know what to google for.
Can you please help a newbie out? :)
Thanks a lot!
First up, you have to_date on your binds in the with query, but not in the standard where clause. So it's possible you have some implicit conversions going on, which is why one is faster than the other.
Oracle Database will bind the values when it executes the query. It only does this once. But it does have to inspect all the rows considered against the bind values.
There's no real overhead to this though. The main difference between using binds and literals (e.g. date'2016-07-07') is the optimizer may choose a different plan.
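If implicit conversion is the culprit, a quick sanity check is to add explicit TO_DATE calls to the plain query too. This is just a sketch: the 'DD-MON-YY' mask is copied from your WITH version, so swap in whatever format your APEX date pickers actually submit.

```sql
-- Same query as the slow version, but with the conversion made explicit
-- on the binds instead of implicitly on ck_date for every row.
SELECT *
FROM   db_view
WHERE  ck_date BETWEEN TO_DATE(:P1_START_DATE, 'DD-MON-YY')
                   AND TO_DATE(:P1_END_DATE, 'DD-MON-YY');
```

If this performs like the WITH version, the implicit conversion (or the plan it forced) was the real difference, not the bind variables themselves.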
To figure out why the first is slower than the second, you need to get an execution plan. For details on these read:
https://blogs.oracle.com/sql/entry/how_to_create_an_execution

Note: as you have a database link, you'll need to go to the remote site to get the plan there. This is a bit convoluted. You need to:
1. Enable statistics_level = all at the remote site
2. Figure out what the query at the remote site is
3. Get the plan at the remote site
For step 1, create a procedure like this on the remote site:
create or replace procedure p is
begin
execute immediate 'alter session set statistics_level = all';
end p;
/
Then call it from your local database:
exec p@loopback;
You can usually find the SQL statement at the remote site by enabling autotrace in SQL*Plus:
SQL> with dts as (
2 select to_date(:st, 'dd/mm/yyyy') st, to_date(:en, 'dd/mm/yyyy') en from dual
3 )
4 select /*+ gather_plan_statistics */v.* from v, dts
5 where x between st and en;
Elapsed: 00:00:00.10
Execution Plan
----------------------------------------------------------
Plan hash value: 593061001
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 33 | 297 | 4 (0)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | NESTED LOOPS | | 33 | 297 | 4 (0)| 00:00:01 | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | |
| 4 | REMOTE | T | 33 | 297 | 2 (0)| 00:00:01 | LOOPB~ | R->S |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_DATE(:EN,'dd/mm/yyyy')>=TO_DATE(:ST,'dd/mm/yyyy'))
Remote SQL Information (identified by operation id):
----------------------------------------------------
4 - SELECT /*+ */ "X" FROM "T" "T" WHERE "X">=:1 AND "X"<=:2 (accessing
'LOOPBACK.DBPM8.ORACLECLOUD.INTERNAL' )
Note the "Remote SQL" at the bottom. Use this on the other database to find the SQL_ID of the remote SQL:
SQL> select sql_id
2 from v$sql
3 where sql_text = 'SELECT /*+ */ "X" FROM "T" "T" WHERE "X">=:1 AND "X"<=:2'
4 /
SQL_ID
-------------
c3f73w62gsq4q
You can then pass this SQL_ID to DBMS_XPLAN on the remote site to get the plan and its stats!
select * from table(dbms_xplan.display_cursor('c3f73w62gsq4q', null, 'ALLSTATS LAST'));
If, once you have plans for both queries, you're still not clear what the issue is, please post them both here and we'll see how we can help.
Finally - try executing the queries in SQL Dev, SQL*Plus, etc. Do you get the same differences in these environments?
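To reproduce the APEX behaviour in SQL*Plus, you can declare binds there too, so you're exercising the same bind-variable code path as the page. The names and date values below are placeholders; substitute a range from your own data.

```sql
-- Hypothetical bind values; APEX page items arrive as strings,
-- so VARCHAR2 binds mimic what the page passes.
VARIABLE p1_start_date VARCHAR2(20)
VARIABLE p1_end_date   VARCHAR2(20)

EXEC :p1_start_date := '01-JAN-16';
EXEC :p1_end_date   := '31-JAN-16';

SELECT *
FROM   db_view
WHERE  ck_date BETWEEN :p1_start_date AND :p1_end_date;
```

If the query is also slow here, the problem is in the SQL/plan, not in APEX itself.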