You Asked

Hello,

I have a question to be clarified related to costs associated with SQL query execution.

I have view built on 3 big tables.

CREATE OR REPLACE VIEW TEXT_TRANSLATION AS
SELECT
      c.module                                                               module,
      -- about 10 more direct table columns similar to above here
      NVL((SELECT text -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = 'en'), prog_text)                              translation_prog_text,
       l.lang_code                                                           lang_code,
       l.description                                                         lang_description,
      (SELECT text -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = l.lang_code)                                   text,
      NVL((SELECT status -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = l.lang_code), 'N')                             status_db,
      NVL((SELECT Language_Tr_Status_API.Decode(status)  -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = l.lang_code),'Not Translated')                 status,
      (SELECT reject_status -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = l.lang_code)                                   reject_status,
      (SELECT reject_information -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = l.lang_code)                                   reject_information,
      (SELECT case_id -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = a.attribute_id
            AND t.lang_code = l.lang_code)                                   case_id
FROM   language_attribute_tab a,
       language_context_tab c,
       language_code_tab l
WHERE  a.context_id = c.context_id
  AND  a.obsolete = 'N'
  AND  a.name != 'Help Data'
  AND  a.name != 'Status Text'
  AND  l.status = 'A'
  AND A.prog_text IS NOT NULL
WITH   READ ONLY
;

When I execute the query

SELECT * FROM TEXT_TRANSLATION

, it takes longer time with the following execution plan which is understandable.

----------------------------------------------------------------------------------------------------------------
| Id   | Operation                     | Name                     | Rows    | Bytes      | Cost     | Time     |
----------------------------------------------------------------------------------------------------------------
|    0 | SELECT STATEMENT              |                          | 6942275 | <b>1041341250</b> | <b>48687095 </b>| 00:31:42 |
|    1 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         34 |        1 | 00:00:01 |
|  * 2 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
|    3 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         34 |        1 | 00:00:01 |
|  * 4 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
|    5 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         11 |        1 | 00:00:01 |
|  * 6 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
|    7 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         11 |        1 | 00:00:01 |
|  * 8 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
|    9 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         51 |        1 | 00:00:01 |
| * 10 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
|   11 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         60 |        1 | 00:00:01 |
| * 12 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
|   13 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |       1 |         60 |        1 | 00:00:01 |
| * 14 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |       1 |            |        1 | 00:00:01 |
| * 15 |   HASH JOIN                   |                          | 6942275 | 1041341250 |   111886 | 00:00:05 |
|   16 |    TABLE ACCESS FULL          | LANGUAGE_CONTEXT_TAB     | 1034257 |   96185901 |     5861 | 00:00:01 |
|   17 |    MERGE JOIN CARTESIAN       |                          | 6942275 |  395709675 |    78133 | 00:00:04 |
| * 18 |     TABLE ACCESS FULL         | LANGUAGE_CODE_TAB        |      14 |        210 |        4 | 00:00:01 |
|   19 |     BUFFER SORT               |                          |  495877 |   20826834 |    78129 | 00:00:04 |
| * 20 |      TABLE ACCESS FULL        | LANGUAGE_ATTRIBUTE_TAB   |  495877 |   20826834 |     5581 | 00:00:01 |
----------------------------------------------------------------------------------------------------------------

If you notice the view TEXT_TRANSLATION, all the sub-select statements are on the same table, LANGUAGE_TRANSLATION_TAB so I decided to increase the performance by
introducing an ANSI outer join instead of those sub-selects which I believe correct.

CREATE OR REPLACE VIEW TEXT_TRANSLATION AS
SELECT t0.* , NVL((SELECT text -- sub-select used to improve performance
            FROM language_translation_tab t
            WHERE t.attribute_id = t0.attribute_id
            AND t.lang_code = 'en'), t0.prog_text)                              translation_prog_text,
            t1.text, 
        NVL(t1.status, 'N')                             status_db,
        NVL(Language_Tr_Status_API.Decode(t1.status),'Not Translated')      status,
        t1.reject_status                                   reject_status,
        t1.reject_information                                   reject_information,
        t1.case_id     case_id
FROM
( SELECT
      c.module                                                               module,
      -- about 10 more direct table columns similar to above here      
       l.lang_code                                                           lang_code,
       l.description                                                         lang_description
FROM   language_attribute_tab a,
       language_context_tab c,
       language_code_tab l
WHERE  a.context_id = c.context_id
  AND  a.obsolete = 'N'
  AND  a.name != 'Help Data'
  AND  a.name != 'Status Text'
  AND  l.status = 'A'
  AND A.prog_text IS NOT NULL) t0
<b>LEFT OUTER JOIN language_translation_tab t1
  ON t1.attribute_id = t0.attribute_id
 AND t1.lang_code = t0.lang_code</b>
WITH   READ ONLY
;

When I ran the same SQL, following execution plan is now used.

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                     | Rows     | Bytes      | Cost    | Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                          |  6942275 | <b>1457877750 </b>| <b>7186447 </b>| 00:04:41 |
|   1 |   TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB |        1 |         34 |       1 | 00:00:01 |
| * 2 |    INDEX UNIQUE SCAN          | LANGUAGE_TRANSLATION_PK  |        1 |            |       1 | 00:00:01 |
| * 3 |   HASH JOIN                   |                          |  6942275 | 1457877750 |  255532 | 00:00:10 |
|   4 |    TABLE ACCESS FULL          | LANGUAGE_CONTEXT_TAB     |  1034257 |   96185901 |    5861 | 00:00:01 |
| * 5 |    HASH JOIN OUTER            |                          |  6942275 |  812246175 |  202083 | 00:00:08 |
|   6 |     MERGE JOIN CARTESIAN      |                          |  6942275 |  395709675 |   78133 | 00:00:04 |
| * 7 |      TABLE ACCESS FULL        | LANGUAGE_CODE_TAB        |       14 |        210 |       4 | 00:00:01 |
|   8 |      BUFFER SORT              |                          |   495877 |   20826834 |   78129 | 00:00:04 |
| * 9 |       TABLE ACCESS FULL       | LANGUAGE_ATTRIBUTE_TAB   |   495877 |   20826834 |    5581 | 00:00:01 |
|  10 |     TABLE ACCESS FULL         | LANGUAGE_TRANSLATION_TAB | 16450150 |  987009000 |   44997 | 00:00:02 |
---------------------------------------------------------------------------------------------------------------

Here you can clearly see that the CPU cost has drastically decreased
compared to previous plan but the query seems taking more resources
as I now receive a TEMP tablespace overflow.

As I know in general terms (as suggested in most of the resources), CPU cost is the
most significant thing in optimizing SQLs in Oracle. But here I can notice that Bytes column
here is high compared to previous.

Since the final row count fetched in both approaches are same, is number of Bytes
here a significant factor compared to low cost I have here?
Or do you see any other problem with my approach?

Thank You.
Navinth

and we said...

Let me begin by explaining what the bytes column in the plan represents.

The bytes column represents the estimated number of bytes (rows x expected row size) produced by each operation in the plan.

If you look in DBA_TAB_COLUMNS, you will see a column called avg_col_len (average column length). If you multiple the number of rows the Optimizer estimates will be return from each operation by theavg_col_len for the columns involved in that operation you should get the value shown in the bytes column.

Let's take a look at a simple example of a join between TEST_TAB1 and TEST_TAB2, on the code column:

SELECT count(*) 
FROM   test_tab1 t1, test_tab2 te
WHERE  t1.code=t2.code;

---------------------------------------------------------------------------------
| Id  | Operation     | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |  | | |     6 (100)|  |
|   1 |  SORT AGGREGATE     |  |     1 |     6 |      |  |
|*  2 |   HASH JOIN     |  |     5 |    30 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_TAB2 |     5 |    15 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| TEST_TAB1 |    40 |   120 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

TEST_TAB1 has 40 rows
TEST_TAB2 has   5 rows

When we query DBA_TAB_COLUMNS we can see that the avg_col_len for the code column in each of the tables is 3.

SELECT table_Name, column_name, avg_col_len
FROM   dba_tab_columns 
WHERE  table_name in ('TEST_TAB1','TEST_TAB2’);

TABLE_NAME COLUMN_NAME AVG_COL_LEN
---------- -----------  -----------
TEST_TAB1  CODE   3
TEST_TAB1  EFFECDATE  8
TEST_TAB1  RE_DATE  8
TEST_TAB1  RE_DO  2
TEST_TAB1  E_TYPE  3
TEST_TAB1  D_TYPE  3
TEST_TAB1  DATE_DELET  1
TEST_TAB2  CODE   3
TEST_TAB2  RE_DATE  8

So, the value in the bytes columns on line 4 of the plan above comes from multiplying 40 rows X 3 (avg_col_len), which is 120 and the value in the bytes column on line 3 comes from multiplying 5 rows X 3(avg_col_len), which is 15.

Now, that we know where the numbers come from, let me answer your question “Is the number of bytes here a significant factor compared to the low cost?”

Yes, the number of bytes is a significant factor for the join operations, in so much as, it will represent the estimated amount of space (PGA or temp tablespace) required to complete each of join.

When you rewrote the query to use the LEFT OUTER JOIN, you simplified the execution plan (thus lowering the cost) but increased the amount of PGA or temp tablespace required to complete the query because you now have to join the entire LANGUAGE_TRANSLATION_TAB to the result of the MERGE JOIN CARTESIAN between LANGUAGE_ATTRIBUTE_TAB and LANGUAGE_CODE_TAB.

There is no problem with your approach, assuming you have enough PGA or temp table space to accommodate the HASH JOIN OUTER.

The only comment I would make regarding your queries is that you are missing a join predicate on the LANGUAGE_CODE_TAB table.

Assuming the estimated number of rows is correct and there are only a small number of rows returned from this table, it’s not a big deal but if the estimate number of rows is not correct, or there is a chance that this table may grow over time, then the MERGE JOIN CARTESIAN may become a performance bottleneck.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.

Breadcrumb

Question and Answer

You Asked

and we said...

More to Explore

SQL

Costs associated in a query execution plan

Breadcrumb

Question and Answer

You Asked

and we said...

More to Explore

SQL