Tom -
no luck on getting access to those views. However, I was able to create a small test case for you that (at least on my machine) shows the discrepancy.
Setup

drop table samplenames;
-- Drop objects from any prior run (these DROPs error harmlessly on a clean schema).
drop table customers;
drop table emails;
drop sequence primkey;

-- Pool of ten name fragments; the cross joins below combine them into
-- 10^2 first names x 10^3 last names = 100,000 distinct customers.
create table samplenames
(name VARCHAR2(10));
insert into samplenames values ('dan');
insert into samplenames values ('joe');
insert into samplenames values ('bob');
insert into samplenames values ('sam');
insert into samplenames values ('weslington');
insert into samplenames values ('sue');
insert into samplenames values ('ann');
insert into samplenames values ('mary');
insert into samplenames values ('pam');
insert into samplenames values ('lucy');

-- Surrogate-key generator for CUST_ID.
create sequence primkey
start with 1000000
increment by 1;

-- 100k customers: every two-fragment first name paired with every
-- three-fragment last name.  Explicit CROSS JOIN replaces the original
-- comma join -- the Cartesian product is intentional here.
create table customers as
select primkey.nextval as cust_id,
       tmp1.name || tmp2.name as first_name,
       tmp3.name || tmp4.name || tmp5.name as last_name
from samplenames tmp1
cross join samplenames tmp2
cross join samplenames tmp3
cross join samplenames tmp4
cross join samplenames tmp5;

create table emails
(cust_id NUMBER,
use_type VARCHAR2(2),
add_dttm DATE,
email VARCHAR2(50)
);

-- Two email rows per customer, each stamped with a random date within the
-- last 20 days.  DBMS_RANDOM.VALUE(low, high) returns a value >= low and
-- < high, so low must be the smaller bound: (today - 20, today).  The
-- original call had the bounds reversed.  TO_NUMBER makes the Julian-day
-- string -> number conversion explicit instead of relying on implicit
-- coercion by the subtraction.
insert into emails
select cust_id,
       'IE',
       TO_DATE(TRUNC(DBMS_RANDOM.VALUE(
                   TO_NUMBER(TO_CHAR(trunc(sysdate), 'J')) - 20,
                   TO_NUMBER(TO_CHAR(trunc(sysdate), 'J')))), 'J'),
       substr(first_name,1,1) || last_name || '@fakeemailsrus.com'
from customers;
insert into emails
select cust_id,
       'IE',
       TO_DATE(TRUNC(DBMS_RANDOM.VALUE(
                   TO_NUMBER(TO_CHAR(trunc(sysdate), 'J')) - 20,
                   TO_NUMBER(TO_CHAR(trunc(sysdate), 'J')))), 'J'),
       last_name || substr(first_name,1,1) || '@fakeemailsrus.com'
from customers;
That gives us 100k customers, each with two emails on a random date.
Each test case is against 25k customers.
Test 1

drop table cust_sample;
-- Clean up the working table from any prior run (harmless error on first run).
drop table tmp_emails;
-- Take a 25k-customer sample.  NOTE(review): ROWNUM <= 25000 without an
-- ORDER BY returns an arbitrary, plan-dependent subset -- fine for a
-- benchmark, but the two tests should be run against the same sample.
create table cust_sample as
SELECT cust_id, first_name, last_name
FROM customers
where rownum <= 25000;
-- Column populated by the MERGE below.
alter table cust_sample add (email VARCHAR2(50));
-- Test 1 variant: materialize the candidate emails for the sample into a
-- NOLOGGING work table first, then merge from it.
CREATE TABLE tmp_emails NOLOGGING
AS
SELECT cust_id,
email,
add_dttm
FROM emails
WHERE use_type = 'IE'
AND cust_id IN (SELECT cust_id FROM cust_sample);
-- For each sampled customer, set EMAIL to the most recently added address
-- that passes the regexp filter.  FIRST_VALUE ... ORDER BY add_dttm DESC
-- has no tie-breaker, so ties on ADD_DTTM resolve arbitrarily.
-- NOTE(review): the pattern caps the TLD at 2-3 letters, so addresses
-- like 'x@y.info' are rejected -- confirm that is intended.
MERGE INTO cust_sample tmp
USING (SELECT DISTINCT cust_id,
FIRST_VALUE(email) OVER (PARTITION BY cust_id ORDER BY add_dttm DESC) AS frst_val
FROM tmp_emails
WHERE REGEXP_LIKE (trim(email),
'^[A-Za-z0-9._%+-]+@[A-Za-z0-9._%+-]+\.[A-Za-z]{2,3}$')
) tmp2
ON (tmp.cust_id = tmp2.cust_id)
WHEN MATCHED THEN
UPDATE
SET email = tmp2.frst_val;
Test 1 completes in ~30 seconds in my development environment.
Test 2

drop table cust_sample;
-- Drop the Test 1 work table so both tests start from the same state.
drop table tmp_emails;
-- Same 25k sample as Test 1 (see the ROWNUM determinism caveat there does
-- not apply here: this block stands alone).  NOTE(review): ROWNUM <= 25000
-- without an ORDER BY returns an arbitrary, plan-dependent subset.
create table cust_sample as
SELECT cust_id, first_name, last_name
FROM customers
where rownum <= 25000;
-- Column populated by the MERGE below.
alter table cust_sample add (email VARCHAR2(50));
-- Test 2 variant: identical MERGE logic to Test 1, but the candidate-email
-- set is an inline subquery instead of a pre-materialized work table --
-- this is the only difference being benchmarked.
MERGE INTO cust_sample tmp
USING (SELECT DISTINCT cust_id,
FIRST_VALUE(email) OVER (PARTITION BY cust_id ORDER BY add_dttm DESC) AS frst_val
FROM (SELECT cust_id,
email,
add_dttm
FROM emails
WHERE use_type = 'IE'
AND cust_id IN (SELECT cust_id FROM cust_sample))
WHERE REGEXP_LIKE (trim(email),
'^[A-Za-z0-9._%+-]+@[A-Za-z0-9._%+-]+\.[A-Za-z]{2,3}$')
) tmp2
ON (tmp.cust_id = tmp2.cust_id)
WHEN MATCHED THEN
UPDATE
SET email = tmp2.frst_val;
Test 2 completes in ~60 seconds in my development environment.
I'm aware my timing method is non-scientific, but the difference is large and it is consistent across runs.