You Asked

Hi Connor/Chris/Maria,

My question is I have 2 schemas on 2 different servers Schema A1 and Schema B1.
There is a DBlink on Schema B1 to access A1. Now I have a query where I need to access 2 huge tables (each contains 10 million records).
My colleague wrote a co-related sub_query on schema B1

select * from B1.B1_tab_1 where NOT (SELECT NULL FROM A1_tab_1@B1toA1 tab_1, A1_tab_2@B1toA1 tab_2 WHERE tab_1.col2 = tab_2.col2 AND........AND tab_1.id = B1_tab_1.id )

Where B1_tab_1 is comparatively small table ..say it contains 1m records.
I disagree with above logic because it is calling dblink for each record.
Instead I am asking him to write query in this way

select * from B1.B1_tab_1 where id NOT IN (SELECT tab_1.id FROM A1_tab_1@B1toA1 tab_1, A1_tab_2@B1toA1 tab_2 WHERE tab_1.col2 = tab_2.col2 AND........ )

or
select * from B1.B1_tab_1 where id IN
(SELECT id from B1.B1_tab_1
MINUS
SELECT tab_1.id FROM A1_tab_1@B1toA1 tab_1, A1_tab_2@B1toA1 tab_2 WHERE tab_1.col2 = tab_2.col2 AND........ )

Could you please help me to understand which query will have less cost associated with it.
thanks in advance :)

and we said...

The text of the query is not really the critical factor here, it is what the optimizer does with your query that is important.

Let's look at a very simple example using a join between two tables on two different databases (where the remote database is called "db11"):

--
-- Create table t on both databases
--

SQL> create table t as select * from dba_objects where object_id is not null;

Table created.

SQL>
SQL> alter table t add primary key (object_id );

Table altered.

Now Let's look at a simple join between the two table and the two possible execution plans the optimizer *might* come up with:

select count(*)
from t, t@db11 t_remote
where t.object_id = t_remote.object_id;

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |     1 |    18 |    38  (14)| 00:00:01 |        |      |
|   1 |  SORT AGGREGATE     |              |     1 |    18 |            |          |        |      |
|   2 |   NESTED LOOPS      |              |   103K|  1822K|    38  (14)| 00:00:01 |        |      |
|   3 |    REMOTE           | T            |   103K|  1315K|    33   (0)| 00:00:01 |   DB11 | R->S |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0033070 |     1 |     5 |     0   (0)| 00:00:01 |        |      |
----------------------------------------------------------------------------------------------------

In this first plan we "hop" across the database link for each row in the local table. The larger the local table the longer this plan will take to execute.

Or an alternative plan for the very same query might be:

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    18 |       |   741   (1)| 00:00:01 |        |      |
|   1 |  SORT AGGREGATE     |      |     1 |    18 |       |            |          |        |      |
|*  2 |   HASH JOIN         |      |   103K|  1822K|  1736K|   741   (1)| 00:00:01 |        |      |
|   3 |    TABLE ACCESS FULL| T    |   104K|   508K|       |   311   (1)| 00:00:01 |        |      |
|   4 |    REMOTE           | T    |   103K|  1315K|       |   222   (1)| 00:00:01 |   DB11 | R->S |
----------------------------------------------------------------------------------------------------

In this plan we pull the rows from the remote table T back to the local database "en masse" for a local hash join.

Optimization *across* database links is notoriously difficult for the optimizer, because there are so many factors in play (network latency, load on the remote node, etc, etc.).

Rather than be concerned about the sql syntax your colleague is using - check the execution plans, and if necessary, use the available mechanisms (hints, sql plan management, profiles etc) to lock down the plan that works best for your infrastructure.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer

You Asked

and we said...

More to Explore

Performance

cost linked to DBLink

Breadcrumb

Question and Answer

You Asked

and we said...

More to Explore

Performance