July/August 2014
Due to an application upgrade, we have to change all the records in one of our tables, based on new data supplied in a temporary table. (That table exists just for the upgrade; it is not a global temporary table.) What is the best approach for accomplishing this? The table has millions of records, and we need to complete the upgrade as quickly as possible.
This is a frequently asked question with an answer that might surprise you. The best way to update this table is not to update it but, rather, to re-create it! Because this is part of an application upgrade and the application is scheduled to be offline during the upgrade, re-creating the table will almost certainly be the most efficient way to accomplish this.
Rather than updating every row, which would necessarily generate UNDO for every modification and generate REDO for the table modification as well as more UNDO, simply use CREATE TABLE NEW_TABLE AS SELECT (CTAS). Using this approach, you can definitely skip all generated UNDO, and if you choose, all REDO as well. Additionally, instead of trying to maintain any and all indexes on these modified columns during the application update, simply create fresh indexes after the CREATE TABLE process is complete. If you were to update every row, the index on these modified columns would likely be at least twice as large as it needs to be, so you would almost certainly want to rebuild these indexes anyway.
So, your update steps are as follows:To see this in action against a large data set, I recommend that you look at the four-part video series at bit.ly/1fXdD8L, bit.ly/1idIpdK, bit.ly/1mO3dwL, and bit.ly/1lIg7yO. These videos show the loading, validating, and transforming (you are interested in the transforming portion most of all) of a large set of data with DDL and compare that approach with trying to use DML to do the same.
If You Can Hint It, You Can Baseline ItI recently had a problem with two SQL statements performing semantically equivalent queries but with very different runtimes. I think I narrowed the problem down to a simple case where the optimizer is wildly off in cardinality estimates. The statistics are up to date and represent the actual data.
When I run the query with a subquery (returning a single constant value) in the WHERE clause, the query is excruciatingly slow—about 350 seconds. If I evaluate the subquery by hand (which takes some milliseconds) and pass the return value as a constant to the query, the query will run in some milliseconds—with a completely different plan.
The subquery is a simple MAX(...) query, so why can’t the optimizer use the same plan or predict that a single value will be returned and optimize the query?
[Editor's note: Listing 1 shows the example table, a slow query and plan, and a faster query and plan.]
What is the best way to achieve this faster query plan without having to rewrite the original query?
Code Listing 1: Example table, slow query and plan, and faster query and plan
Here is the setup for the example table:
CREATE TABLE t AS SELECT * FROM all_objects; ALTER TABLE t ADD CONSTRAINT t_id_pk PRIMARY KEY (object_id); CREATE INDEX t_idx_id ON t(object_id); CREATE INDEX t_idx_type ON t(object_type); exec DBMS_STATS.GATHER_TABLE_STATS(user, 'T' );Here is the slow query I would like to tune, along with its plan:
SQL> SELECT * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2); 6115 rows selected. Execution Plan ——————————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | ——————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 413 | 39235 | | 1 | TABLE ACCESS BY INDEX ROWID | T | 413 | 39235 | | 2 | BITMAP CONVERSION TO ROWIDS | | | | | 3 | BITMAP AND | | | | | 4 | BITMAP CONVERSION FROM ROWIDS| | | | | 5 | SORT ORDER BY | | | | |* 6 | INDEX RANGE SCAN | T_IDX_ID | 8252 | | | 7 | SORT AGGREGATE | | 1 | 5 | | 8 | INDEX FULL SCAN (MIN/MAX)| T_IDX_ID | 1 | 5 | | 9 | BITMAP CONVERSION FROM ROWIDS| | | | |* 10 | INDEX RANGE SCAN | T_IDX_TYPE | 8252 | | ——————————————————————————————————————————————————————————————————————— Predicate Information (identified by operation id): ——————————————————————————————————————————————————————————————————————— 6 - access("T1"."OBJECT_ID"> (SELECT MAX("T2"."OBJECT_ID")-500000 FROM "T" "T2")) filter("T1"."OBJECT_ID"> (SELECT MAX("T2"."OBJECT_ID")-500000 FROM "T" "T2")) 10 - access("T1"."OBJECT_TYPE"='TABLE')
Here is the query and the faster plan I would like it to use:
SQL> SELECT MAX(t2.object_id) - 500000 FROM t t2; MAX(T2.OBJECT_ID)-500000 ———————————————————————————————————— 19975 SQL> SELECT * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' AND t1.object_id > 19975; 6115 rows selected. Execution Plan ————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ——————————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 7198 | 667K| 102 (2)| |* 1 | TABLE ACCESS BY INDEX ROWID| T | 7198 | 667K| 102 (2)| |* 2 | INDEX RANGE SCAN | T_IDX_TYPE | 8553 | | 6 (0)| ——————————————————————————————————————————————————————————————————————————————— Predicate Information (identified by operation id): ———————————————————————————————————————————————————————————————————————————— 1 - filter("T1"."OBJECT_ID">19975) 2 – access("T1"."OBJECT_TYPE"='TABLE')
Although these two queries are semantically equivalent, they present very different challenges to the optimizer. In the first case, the optimizer will have to “guess” at the value returned by the subquery, because it has not yet executed the subquery. That is, the query
SQL> SELECT * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > (SELECT 5 MAX(t2.object_id) - 500000 6 FROM t t2);
is very much like
SQL> SELECT * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > ???;
The optimizer cannot see what the constraint against OBJECT_ID will be, and it cannot see what value will ultimately be plugged in there when the query executes. Therefore, the optimizer will guess at the estimated cardinality.
However, when you query with the construct
SQL> SELECT * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > 19975;
the optimizer has a lot more information to work with. It knows approximately how many rows will come back, due to the predicate with constant values. So what you are dealing with is a query in which the optimizer will have a really hard time obtaining correct cardinality estimates at hard parse time and therefore stands a good chance of not selecting the optimal plan—at least for the first execution. The question is, How can you fix this without rewriting the code to use two queries and, optimally, without using any hints in the production code? The answer is to temporarily use hints to get the plan you desire but then use the SQL Plan Management feature of Oracle Database and query plan baselines to implement the fix, removing the need for hints in the production environment and ensuring that you get the plan you want, unless and until the optimizer discovers a better plan.
The reason you do not want hints in your production code is that it is very difficult to correctly hint a query. To properly hint a query, you must fully, verbosely hint it. You might discover in your testing environment that using hints A, B, and C achieves the plan you want but that in production, after statistics are gathered and the data has changed, or after a database upgrade, hints A, B, and C may no longer do the job. Maybe you didn’t know you also needed hint D to ensure that the optimizer would choose the plan you wanted, even though the query was working well “by accident” before. Or, maybe after a database upgrade, the optimizer introduces a new query transformation/rewrite that invalidates your hint.
I’ll demonstrate the shortest list of hints you would need to ensure that the above query uses the plan you want, and you’ll see that this list of hints is much longer than any list of hints you would have coded yourself.
Another reason to avoid putting hints into the query and to use SQL Plan Management is that you enable the optimizer to evolve your query plan over time. That is, you won’t freeze your query plan, and you’ll have the opportunity to use a better plan in the future. If you fully hinted the query, you’d be stuck with whatever plan you had today. See “Baselines and Better Plans” at for a discussion of this query plan evolution.
Now let’s see how to get the plan you want and then how to use SQL Plan Management to put that plan in place and have it be used transparently by your query. In a testing environment, I discovered that by using either the FIRST_ROWS(1) or the OPT_PARAM( ‘_b_tree_bitmap_plans’, ‘FALSE’ ) hint to disable the use of B-tree bitmap plans for this SQL statement, I get the plan I want, as shown in Listing 2.
Code Listing 2: Getting a plan by using hints
SQL> SELECT /*+ first_rows(1) */ * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2); Execution Plan ————————————————————————————————————————————————————————————————————————————— Plan hash value: 1289158178 —————————————————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | Cost (%CPU) —————————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 1 | 97 | 4 (0) |* 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0) |* 2 | INDEX RANGE SCAN | T_IDX_TYPE | 1871 | | 1 (0) | 3 | SORT AGGREGATE | | 1 | 5 | | 4 | INDEX FULL SCAN (MIN/MAX)| T_ID_PK | 1 | 5 | 2 (0) —————————————————————————————————————————————————————————————————————————————— SQL> select /*+ opt_param( '_b_tree_bitmap_plans', 'FALSE' ) */ * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2); Execution Plan ————————————————————————————————————————————————————————————————————————————— Plan hash value: 1289158178 —————————————————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | Cost (%CPU) —————————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 94 | 9118 | 85 (0) |* 1 | TABLE ACCESS BY INDEX ROWID| T | 94 | 9118 | 83 (0) |* 2 | INDEX RANGE SCAN | T_IDX_TYPE | 1871 | | 6 (0) | 3 | SORT AGGREGATE | | 1 | 5 | | 4 | INDEX FULL SCAN (MIN/MAX)| T_ID_PK | 1 | 5 | 2 (0)
Now, if I wanted to ensure that the plan in Listing 2 is, in fact, the plan used by default, I would need more than just a FIRST_ROWS(1) or OPT_PARAM hint. In fact, when I use DBMS_XPLAN to display not only the plans but also the full set of hints necessary to reproduce this plan reliably, I see the plan and the outline data in Listing 3.
Code Listing 3: Determining plan and outline info needed to make a better hinted plan
SQL> select * 2 from table(dbms_xplan.display_cursor (sql_id=>'4rg4kcxr83kup', 3 cursor_child_no => 0, format=>'+outline')); PLAN_TABLE_OUTPUT —————————————————————————————————————————— SQL_ID 4rg4kcxr83kup, child number 0 ——————————————————————————————————————————————————————— select /*+ opt_param( '_b_tree_bitmap_plans', 'FALSE' ) gather_plan_statistics */ * FROM t t1 WHERE t1.object_type = 'TABLE' AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2) Plan hash value: 1289158178 —————————————————————————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes| Cost (%CPU)| Time | —————————————————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | | | 83 (100)| | |* 1 | TABLE ACCESS BY INDEX ROWID| T | 91| 8827| 81 (0)| 00:00:01| |* 2 | INDEX RANGE SCAN | T_IDX_TYPE| 1823| | 6 (0)| 00:00:01| | 3 | SORT AGGREGATE | | 1| 5| | | | 4 | INDEX FULL SCAN (MIN/MAX)| T_ID_PK | 1| 5| 2 (0)| 00:00:01| —————————————————————————————————————————————————————————————————————————————————————— Outline Data ——————————————————— /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE('11.2.0.2') DB_VERSION('11.2.0.3') OPT_PARAM('_b_tree_bitmap_plans' 'false') ALL_ROWS OUTLINE_LEAF(@"SEL$2") OUTLINE_LEAF(@"SEL$1") INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T"."OBJECT_TYPE")) PUSH_SUBQ(@"SEL$2") INDEX(@"SEL$2" "T2"@"SEL$2" ("T"."OBJECT_ID")) END_OUTLINE_DATA */As you can see in Listing 3, the full list of hints in the outline data that you need is quite extensive—far beyond just the OPT_PARAM hint. I can assure you that properly hinting a query to repeatedly get the same plan is much harder than it looks. Also, I don’t really want to freeze the plan forever, because future releases of Oracle Database will likely be able to rewrite the query or perform some optimization that just doesn’t exist yet, resulting in an even better plan. I’d like to be able to take advantage of future features, but if I’ve hinted the query, I won’t be able to.
So, what would the steps be to get the optimizer to choose the desired plan until a provably better plan comes along?
First, I would capture a SQL plan baseline for the existing query—the query with the “bad” plan. Why would I want to capture the bad plan? To establish a SQL plan baseline for this query (into which I will load the desired plan), I need the plan baseline to exist before I can load the plan into it. I execute the code in Listing 4 to create the baseline.
Code Listing 4: Creating the baseline
SQL> begin 2 dbms_output.put_line( 3 dbms_spm.load_plans_from_cursor_cache 4 ( sql_id => 'crvk9z6mx9n4d' ) 5 ); 6 end; 7 / 1 SQL> select sql_handle, 2 substr(sql_text,1,10)||'...'|| 3 substr(sql_text,length(sql_text)-10) stext, 4 plan_name, enabled 5 from dba_sql_plan_baselines 6 where sql_text like 7 'SELECT%FROM t t1%(SELECT MAX(t2.object_id) - 500000 FROM t t2)'; SQL_HANDLE STEXT PLAN_NAME ENA ———————————————————— ——————————————— —————————————————————————————— ——— SQL_e738c19a5191e8fd SELECT * SQL_PLAN_fff61m98t3u7xda64b1bb YES ... FROM t t2)
That is my baseline, and that plan will be used. But it isn’t what I want. The next step is to disable that plan:
SQL> begin 2 dbms_output.put_line( 3 dbms_spm.alter_sql_plan_baseline 4 ( sql_handle => 'SQL_e738c19a5191e8fd', 5 attribute_name => 'enabled', 6 attribute_value => 'NO' ) 7 ); 8 end; 9 / 1
The database will still use that plan, but only because it is the only game in town right now. So let’s create something better: the first query plan in Listing 2—the plan for the SELECT /*+ first_rows(1) */ * query.
That is the plan I want. All I need to do is load that plan in place of the bad one, as shown in Listing 5.
Code Listing 5: Loading the new—good—plan
SQL> set autotrace off SQL> begin 2 dbms_output.put_line( 3 dbms_spm.load_plans_from_cursor_cache 4 ( sql_id => '5mn39tz7fpjnu', 5 plan_hash_value => 1289158178, 6 sql_handle => 'SQL_e738c19a5191e8fd' ) 7 ); 8 end; 9 / 1 SQL> select sql_handle, 2 substr(sql_text,1,10)||'...'|| 3 substr(sql_text,length(sql_text)-10) stext, 4 plan_name, enabled 5 from dba_sql_plan_baselines 6 where sql_text like 7 'SELECT%FROM t t1%(SELECT MAX(t2.object_id) - 500000 FROM t t2)'; SQL_HANDLE STEXT PLAN_NAME ENA ———————————————————— ———————— —————————————————————————————— ——— SQL_e738c19a5191e8fd SELECT * SQL_PLAN_fff61m98t3u7x971f1a3f YES ... FROM t t2) SQL_e738c19a5191e8fd SELECT * SQL_PLAN_fff61m98t3u7xda64b1bb NO ... FROM t t2)
There are now two plans loaded for that SQL statement: one—the bad plan—is disabled; the other—the good plan (even though it is for a “different query,” it can work with this query)—is enabled. When I run the query again, as shown in Listing 6, I see that it used the query plan baseline for my good plan.
Code Listing 6: Confirming use of the good plan
SQL> set autotrace traceonly SQL> SELECT * 2 FROM t t1 3 WHERE t1.object_type = 'TABLE' 4 AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2); Execution Plan ——————————————————————————————————————————————————— Plan hash value: 1289158178 ————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | Cost (%CPU) ————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 1 | 97 | 4 (0) |* 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0) |* 2 | INDEX RANGE SCAN | T_IDX_TYPE | 1871 | | 1 (0) | 3 | SORT AGGREGATE | | 1 | 5 | | 4 | INDEX FULL SCAN (MIN/MAX)| T_ID_PK | 1 | 5 | 2 (0) ————————————————————————————————————————————————————————— Predicate Information (identified by operation id): ——————————————————————————————————————————————————————————————————————————— 1 - filter("T1"."OBJECT_ID"> (SELECT /*+ PUSH_SUBQ INDEX ("T2" "T_ID_PK") * MAX("T2"."OBJECT_ID")-500000 FROM "T" "T2")) 2 - access("T1"."OBJECT_TYPE"='TABLE') Note ——————— - SQL plan baseline "SQL_PLAN_fff61m98t3u7x971f1a3f" used for this statement
With this baseline solution, there is no permanent hinting—and permanent hinting is much harder to set up and use than you think it is! Many hinted queries out there in production today may be ticking time bombs.
And there are no upgrade issues with this baseline solution. (You may have to completely change all your hints after a database upgrade as you add new query rewrites, access paths, and so on. With this baseline solution, you won’t.)
And, finally, if a better plan comes along, you’ll be able to use it if you are using query plan evolution.
|
Next Steps
ASK Tom
Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.
READ more Tom
DOWNLOAD Oracle Database 12c
LEARN more about
Oracle Database 12c
Baselines
“Baselines and Better Plans”
FOLLOW Tom on Twitter
FOLLOW Oracle Database
on Twitter
on Facebook
DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.