Hello,
I have a question to be clarified related to costs associated with SQL query execution.
I have view built on 3 big tables.
CREATE OR REPLACE VIEW TEXT_TRANSLATION AS
SELECT
c.module module,
-- about 10 more direct table columns similar to above here
NVL((SELECT text -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = 'en'), prog_text) translation_prog_text,
l.lang_code lang_code,
l.description lang_description,
(SELECT text -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = l.lang_code) text,
NVL((SELECT status -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = l.lang_code), 'N') status_db,
NVL((SELECT Language_Tr_Status_API.Decode(status) -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = l.lang_code),'Not Translated') status,
(SELECT reject_status -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = l.lang_code) reject_status,
(SELECT reject_information -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = l.lang_code) reject_information,
(SELECT case_id -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = a.attribute_id
AND t.lang_code = l.lang_code) case_id
FROM language_attribute_tab a,
language_context_tab c,
language_code_tab l
WHERE a.context_id = c.context_id
AND a.obsolete = 'N'
AND a.name != 'Help Data'
AND a.name != 'Status Text'
AND l.status = 'A'
AND A.prog_text IS NOT NULL
WITH READ ONLY
;
When I execute the query
SELECT * FROM TEXT_TRANSLATION
, it takes longer time with the following execution plan which is understandable.
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6942275 | <b>1041341250</b> | <b>48687095 </b>| 00:31:42 |
| 1 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 34 | 1 | 00:00:01 |
| * 2 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 34 | 1 | 00:00:01 |
| * 4 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 11 | 1 | 00:00:01 |
| * 6 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 11 | 1 | 00:00:01 |
| * 8 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 51 | 1 | 00:00:01 |
| * 10 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| 11 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 60 | 1 | 00:00:01 |
| * 12 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| 13 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 60 | 1 | 00:00:01 |
| * 14 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| * 15 | HASH JOIN | | 6942275 | 1041341250 | 111886 | 00:00:05 |
| 16 | TABLE ACCESS FULL | LANGUAGE_CONTEXT_TAB | 1034257 | 96185901 | 5861 | 00:00:01 |
| 17 | MERGE JOIN CARTESIAN | | 6942275 | 395709675 | 78133 | 00:00:04 |
| * 18 | TABLE ACCESS FULL | LANGUAGE_CODE_TAB | 14 | 210 | 4 | 00:00:01 |
| 19 | BUFFER SORT | | 495877 | 20826834 | 78129 | 00:00:04 |
| * 20 | TABLE ACCESS FULL | LANGUAGE_ATTRIBUTE_TAB | 495877 | 20826834 | 5581 | 00:00:01 |
----------------------------------------------------------------------------------------------------------------
If you notice the view TEXT_TRANSLATION, all the sub-select statements are on the same table, LANGUAGE_TRANSLATION_TAB so I decided to increase the performance by
introducing an ANSI outer join instead of those sub-selects which I believe correct.
CREATE OR REPLACE VIEW TEXT_TRANSLATION AS
SELECT t0.* , NVL((SELECT text -- sub-select used to improve performance
FROM language_translation_tab t
WHERE t.attribute_id = t0.attribute_id
AND t.lang_code = 'en'), t0.prog_text) translation_prog_text,
t1.text,
NVL(t1.status, 'N') status_db,
NVL(Language_Tr_Status_API.Decode(t1.status),'Not Translated') status,
t1.reject_status reject_status,
t1.reject_information reject_information,
t1.case_id case_id
FROM
( SELECT
c.module module,
-- about 10 more direct table columns similar to above here
l.lang_code lang_code,
l.description lang_description
FROM language_attribute_tab a,
language_context_tab c,
language_code_tab l
WHERE a.context_id = c.context_id
AND a.obsolete = 'N'
AND a.name != 'Help Data'
AND a.name != 'Status Text'
AND l.status = 'A'
AND A.prog_text IS NOT NULL) t0
<b>LEFT OUTER JOIN language_translation_tab t1
ON t1.attribute_id = t0.attribute_id
AND t1.lang_code = t0.lang_code</b>
WITH READ ONLY
;
When I ran the same SQL, following execution plan is now used.
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6942275 | <b>1457877750 </b>| <b>7186447 </b>| 00:04:41 |
| 1 | TABLE ACCESS BY INDEX ROWID | LANGUAGE_TRANSLATION_TAB | 1 | 34 | 1 | 00:00:01 |
| * 2 | INDEX UNIQUE SCAN | LANGUAGE_TRANSLATION_PK | 1 | | 1 | 00:00:01 |
| * 3 | HASH JOIN | | 6942275 | 1457877750 | 255532 | 00:00:10 |
| 4 | TABLE ACCESS FULL | LANGUAGE_CONTEXT_TAB | 1034257 | 96185901 | 5861 | 00:00:01 |
| * 5 | HASH JOIN OUTER | | 6942275 | 812246175 | 202083 | 00:00:08 |
| 6 | MERGE JOIN CARTESIAN | | 6942275 | 395709675 | 78133 | 00:00:04 |
| * 7 | TABLE ACCESS FULL | LANGUAGE_CODE_TAB | 14 | 210 | 4 | 00:00:01 |
| 8 | BUFFER SORT | | 495877 | 20826834 | 78129 | 00:00:04 |
| * 9 | TABLE ACCESS FULL | LANGUAGE_ATTRIBUTE_TAB | 495877 | 20826834 | 5581 | 00:00:01 |
| 10 | TABLE ACCESS FULL | LANGUAGE_TRANSLATION_TAB | 16450150 | 987009000 | 44997 | 00:00:02 |
---------------------------------------------------------------------------------------------------------------
Here you can clearly see that the CPU cost has drastically decreased
compared to previous plan but the query seems taking more resources
as I now receive a TEMP tablespace overflow.
As I know in general terms (as suggested in most of the resources),
CPU cost is the
most significant thing in optimizing SQLs in Oracle. But here I can notice that Bytes column
here is high compared to previous.
Since the final row count fetched in both approaches are same, is
number of Bytes here a significant factor compared to low cost I have here?
Or do you see any other problem with my approach?
Thank You.
Navinth
Let me begin by explaining what the bytes column in the plan represents.
The bytes column represents the estimated number of bytes (rows x expected row size) produced by each operation in the plan.
If you look in DBA_TAB_COLUMNS, you will see a column called avg_col_len (average column length). If you multiple the number of rows the Optimizer estimates will be return from each operation by theavg_col_len for the columns involved in that operation you should get the value shown in the bytes column.
Let's take a look at a simple example of a join between TEST_TAB1 and TEST_TAB2, on the code column:
SELECT count(*)
FROM test_tab1 t1, test_tab2 te
WHERE t1.code=t2.code;
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | HASH JOIN | | 5 | 30 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_TAB2 | 5 | 15 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST_TAB1 | 40 | 120 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
TEST_TAB1 has 40 rows
TEST_TAB2 has 5 rows
When we query DBA_TAB_COLUMNS we can see that the avg_col_len for the code column in each of the tables is 3.
SELECT table_Name, column_name, avg_col_len
FROM dba_tab_columns
WHERE table_name in ('TEST_TAB1','TEST_TAB2’);
TABLE_NAME COLUMN_NAME AVG_COL_LEN
---------- ----------- -----------
TEST_TAB1 CODE 3
TEST_TAB1 EFFECDATE 8
TEST_TAB1 RE_DATE 8
TEST_TAB1 RE_DO 2
TEST_TAB1 E_TYPE 3
TEST_TAB1 D_TYPE 3
TEST_TAB1 DATE_DELET 1
TEST_TAB2 CODE 3
TEST_TAB2 RE_DATE 8
So, the value in the bytes columns on line 4 of the plan above comes from multiplying 40 rows X 3 (avg_col_len), which is 120 and the value in the bytes column on line 3 comes from multiplying 5 rows X 3(avg_col_len), which is 15.
Now, that we know where the numbers come from, let me answer your question “Is the number of bytes here a significant factor compared to the low cost?”
Yes, the number of bytes is a significant factor for the join operations, in so much as, it will represent the estimated amount of space (PGA or temp tablespace) required to complete each of join.
When you rewrote the query to use the LEFT OUTER JOIN, you simplified the execution plan (thus lowering the cost) but increased the amount of PGA or temp tablespace required to complete the query because you now have to join the entire LANGUAGE_TRANSLATION_TAB to the result of the MERGE JOIN CARTESIAN between LANGUAGE_ATTRIBUTE_TAB and LANGUAGE_CODE_TAB.
There is no problem with your approach, assuming you have enough PGA or temp table space to accommodate the HASH JOIN OUTER.
The only comment I would make regarding your queries is that you are missing a join predicate on the LANGUAGE_CODE_TAB table.
Assuming the estimated number of rows is correct and there are only a small number of rows returned from this table, it’s not a big deal but if the estimate number of rows is not correct, or there is a chance that this table may grow over time, then the MERGE JOIN CARTESIAN may become a performance bottleneck.