March/April 2008
I've received some good questions since the last issue of Oracle Magazine, and they are all online at asktom.oracle.com. However, I've been using Oracle Database 11g for so many tasks recently that I feel compelled to address more new feature tidbits I keep stumbling on that just make the new release that much easier to use.
From SQL that pivots to virtual columns, Oracle Database 11g continues to impress.
How to Pivot

A frequently asked question on Ask Tom and virtually every online Oracle forum is, "How can I pivot a result set?" For example, suppose I have the query
SQL> select deptno, job, sum(sal) sal
  2    from emp
  3   group by deptno, job
  4  /

    DEPTNO JOB              SAL
---------- ---------- ---------
        20 CLERK           1900
        30 SALESMAN        5600
        20 MANAGER         2975
        30 CLERK            950
        10 PRESIDENT       5000
        30 MANAGER         2850
        10 CLERK           1300
        10 MANAGER         2450
        20 ANALYST         6000

9 rows selected.
and I need to have the JOB attributes become the column headings, showing just three rows (for DEPTNO 10, 20, and 30) with columns CLERK, ANALYST, and so on. Well, we've been able to do this before Oracle Database 11g, and it looked like Listing 1.
Code Listing 1: Pivot with DECODE
SQL> select deptno,
  2         sum( decode( job, 'CLERK', sal ) ) clerk,
  3         sum( decode( job, 'SALESMAN', sal ) ) salesman,
  4         sum( decode( job, 'MANAGER', sal ) ) manager,
  5         sum( decode( job, 'ANALYST', sal ) ) analyst,
  6         sum( decode( job, 'PRESIDENT', sal ) ) president
  7    from emp
  8   group by deptno
  9   order by deptno
 10  /

    DEPTNO      CLERK   SALESMAN    MANAGER    ANALYST  PRESIDENT
---------- ---------- ---------- ---------- ---------- ----------
        10       1300                  2450                  5000
        20       1900                  2975       6000
        30        950       5600       2850
Now that certainly works, but it was never considered intuitive or easy. Starting with Oracle Database 11g, there is new PIVOT syntax for this operation. Documented and a bit more intuitive to use, this syntax is shown in Listing 2.
Code Listing 2: Pivot with PIVOT

SQL> select *
  2    from ( select deptno, job, sal
  3             from emp ) e
  4   pivot( sum(sal) for job in
  5        ( 'CLERK', 'SALESMAN', 'MANAGER', 'ANALYST', 'PRESIDENT' ) )
  6   order by deptno
  7  /

    DEPTNO    'CLERK' 'SALESMAN'  'MANAGER'  'ANALYST' 'PRESIDENT'
---------- ---------- ---------- ---------- ---------- -----------
        10       1300                  2450                   5000
        20       1900                  2975       6000
        30        950       5600       2850
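Note that the generated column names include the single quotes ('CLERK' and so on), so referencing them downstream requires quoted identifiers. The PIVOT clause also lets you alias each pivot value to produce ordinary column names. Here is a minimal sketch against the same EMP table (I have not shown its output; it is the same data as Listing 2, just with unquoted headings):

select *
  from ( select deptno, job, sal
           from emp ) e
 pivot( sum(sal) for job in
      ( 'CLERK'     as clerk,
        'SALESMAN'  as salesman,
        'MANAGER'   as manager,
        'ANALYST'   as analyst,
        'PRESIDENT' as president ) )
 order by deptno;
-- the result set now has plain columns: DEPTNO, CLERK, SALESMAN, ...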
We have a data model based on a legacy system, and one of our fields is actually a concatenation of many other fields. We cannot modify this field right now, but we would like to enforce referential integrity on the values embedded within it. For example, if we applied the SUBSTR(column,5,2) function to the column, that value should appear in our lookup table. How can we achieve this?
In the past, there was no good answer for this. Referential integrity mandated that the child table had a value in a real column that pointed to a parent table. Here, there is no real column but rather a function on a column that needs to exist in this parent table. In Oracle Database 10g Release 2 and before, the only reasonable solution was to create an additional column and to define a trigger to maintain this value by applying the SUBSTR function. The solution might look like the code in Listing 3.
Code Listing 3: Referential integrity via a new column and trigger
SQL> create table lookup
2 ( status_code varchar2(2) primary key,
3 other_data varchar2(20)
4 );
Table created.
SQL> create table data
2 ( the_key varchar2(10) primary key,
3 status_code varchar2(2)
4 NOT NULL
5 references lookup
6 );
Table created.
SQL> create trigger data
2 before insert or update of the_key on data
3 for each row
4 begin
5 :new.status_code := substr(:new.the_key,5,2);
6 end;
7 /
Trigger created.
SQL> insert into lookup (status_code) values ( 'AA' );
1 row created.
SQL> insert into data (the_key) values ( '1234AA7890' );
1 row created.
SQL> insert into data (the_key) values ( '1234XX7890' );
insert into data (the_key) values ( '1234XX7890' )
*
ERROR at line 1:
ORA-02291: integrity constraint (OPS$TKYTE.SYS_C009710) violated - parent key not found
Now, the problems with this solution are twofold. One is an obvious storage overhead issue, because we are storing the data twice: once in the THE_KEY column and once in the STATUS_CODE column. The other issue is the performance implication of the trigger: it will necessarily add to the runtime of every INSERT and at least some UPDATEs.
Starting with the introduction of virtual columns in Oracle Database 11g, we can define referential integrity based on functions of columns, not just on real columns themselves. So, in this new release, we can get rid of the trigger and instead just code a virtual column, as shown in lines 3 through 5 of Listing 4.
Code Listing 4: Referential integrity via a virtual column
SQL> create table data
2 ( the_key varchar2(10) primary key,
3 status_code
4 as (substr( the_key, 5, 2 ))
5 references lookup
6 );
Table created.
SQL> insert into data (the_key) values ( '1234AA7890' );
1 row created.
SQL> insert into data (the_key) values ( '1234XX7890' );
insert into data (the_key) values ( '1234XX7890' )
*
ERROR at line 1:
ORA-02291: integrity constraint (OPS$TKYTE.SYS_C009730) violated - parent key not found
The virtual column is simply metadata: it does not store the data redundantly, and it does not require a trigger to maintain the value. And it can now be used to enforce the integrity constraint, further increasing the times when you can use declarative integrity in your applications.
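Although STATUS_CODE stores nothing, it can be queried like any other column, and it can even be indexed; in Oracle Database 11g, an index on a virtual column is, under the covers, essentially a function-based index on the underlying expression. A minimal sketch against the DATA table above (the index name is made up for illustration):

-- the virtual column is computed on the fly when queried
select the_key, status_code
  from data;

-- indexing the virtual column is allowed; this is effectively
-- a function-based index on SUBSTR(THE_KEY,5,2)
create index data_status_idx on data(status_code);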
But Wait, There's More to Virtual Columns

Virtual columns can do even more for you, and the most important new capability associated with virtual columns has to do with the optimizer.
The cost-based optimizer is a complex piece of software that takes many bits of input, including statistics on the tables, columns, and indexes involved in a query; system statistics, such as CPU speed and I/O performance; and the settings of various initialization parameters.
Based on the input, the optimizer creates a plan for executing a SQL statement. It does this in large part by estimating cardinality values (the number of rows) for each step of a plan and coming up with a cost for executing that step of the plan, based on how many rows it anticipates being returned. For example, for even the most trivial of queries,
select * from t where x = 5;
Oracle Database will attempt to compute the cardinality of the WHERE clause (WHERE X=5) and, based on that, it will figure out how to execute the query.
For example, suppose a table T has 100,000 records and X is indexed. The query optimizer will attempt to figure out the cardinality returned by WHERE X=5 when it decides whether to use the index. If X=5 is estimated to return 2 rows, the optimizer will almost certainly use the index. On the other hand, if X=5 is estimated to return 50,000 rows, the optimizer is likely to go for the more efficient full-table scan. For intermediate values, there will be a point up to which the optimizer will use the index, and after that, it will go for the full-table scan.
So, getting the right cardinality estimate is crucial. If the optimizer guesses "2 rows" but the query actually returns 50,000 rows, we will have a performance issue. An index will be used when it should not be. Likewise, if the optimizer guesses "50,000 rows" but the query actually returns only 2 rows, we will have run a full-table scan when using an index would have been the obviously correct choice.
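If you want to see the estimate the optimizer is working with, EXPLAIN PLAN shows it directly. Here is a minimal sketch, assuming a hypothetical table T with an indexed column X (these objects are not part of the article's examples):

-- hypothetical table and index, for illustration only
create table t ( x number, y varchar2(30) );
create index t_x_idx on t(x);

-- ...load data and gather statistics, then:
explain plan for
select * from t where x = 5;

-- the Rows column in the output is the cardinality estimate
select * from table( dbms_xplan.display );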
Well, given the above really simple query, the optimizer is typically pretty good. Collect statistics on the table, column, and index, and the optimizer would probably estimate the right cardinality in most cases. But what about when the query is a bit more complex:
select * from emp where salary+commission > 50000;
This is a more typical query, involving an expression: a function applied to various columns. The result of this expression is what we need to know about to correctly estimate the cardinality here. But because the optimizer has information only on individual things (the table, the SALARY column, the COMMISSION column, and so on), it cannot correctly estimate the number of rows returned. If we were using this query in a larger, more complex one, where it was in a subquery or being joined to, a wrong cardinality estimate could lead to a disastrous plan in terms of performance.
In Oracle Database 10g Release 2 and before, we had a couple of tools we could use for this. We could have created a function-based index on EMP(SALARY+COMMISSION) and gathered statistics on that index. That would have given the optimizer some information, but at the cost of maintaining the index at runtime and the increased storage that goes with it. We could also have used dynamic sampling, whereby the optimizer, at hard-parse time, would have dynamically sampled the underlying table with the predicates it needed to "guess" about (our WHERE SALARY+COMMISSION > 50000 predicate above, for example). But this approach would come at the cost of increased parse times and increased resource utilization, because a mini "gather statistics" (similar to DBMS_STATS.GATHER_TABLE_STATS) would be executed each time we hard-parsed the query. Both workarounds are sketched below.
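For reference, here is a minimal sketch of those two pre-11g workarounds; the index name EMP_COMP_IDX and the sampling level are assumptions for illustration, not from the original column:

-- workaround 1: a function-based index on the expression,
-- with statistics gathered on it
create index emp_comp_idx on emp(salary+commission);

begin
   dbms_stats.gather_table_stats
   ( user, 'EMP',
     cascade => true   -- gathers statistics on the index as well
   );
end;
/

-- workaround 2: ask the optimizer to sample at hard-parse time
-- (level 4 here is an arbitrary choice for illustration)
select /*+ dynamic_sampling(emp 4) */ *
  from emp
 where salary+commission > 50000;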
Enter virtual columns and the ability to gather statistics against them in Oracle Database 11g. By way of example, I'll create a small EMP table with some SALARY and COMMISSION data. I'll use ALL_OBJECTS to generate the data, using the OBJECT_ID column to populate SALARY and using DBMS_RANDOM to make up some COMMISSION values:
SQL> create table emp_data
  2  as
  3  select all_objects.*,
  4         object_id salary,
  5         round(
  6           dbms_random.value(1000,100000))
  7           commission
  8    from all_objects;

Table created.

and then gather statistics on it:

SQL> begin
  2     dbms_stats.gather_table_stats
  3     ( user,
  4       'EMP_DATA',
  5       method_opt=>'for columns ' ||
  6                   'salary,commission size 254'
  7     );
  8  end;
  9  /

PL/SQL procedure successfully completed.
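We can confirm that the histograms were collected by querying the data dictionary; a quick sketch (the reported values will vary with the contents of your ALL_OBJECTS):

select column_name, num_distinct, histogram
  from user_tab_col_statistics
 where table_name = 'EMP_DATA'
   and column_name in ('SALARY','COMMISSION');

With statistics in place, Listing 5 compares each row's compensation with the averages.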
Code Listing 5: Determine averages and total compensation
SQL> column avg_salary new_val ASALARY
SQL> column avg_commission new_val ACOMMISSION
SQL> column avg_comp new_val ACOMP
SQL> select count( case when salary+commission > avg_comp then 1 end ) above_comp, avg_comp,
  2         count( case when salary > avg_salary then 1 end ) above_salary, avg_salary,
  3         count( case when commission > avg_commission then 1 end )
  4              above_commission, avg_commission,
  5         count(*) total
  6    from emp_data, ( select avg(salary) avg_salary,
  7                            avg(commission) avg_commission,
  8                            avg(salary+commission) avg_comp
  9                       from emp_data )
 10   group by avg_salary, avg_commission, avg_comp
 11  /

ABOVE_COMP   AVG_COMP ABOVE_SALARY AVG_SALARY ABOVE_COMMISSION AVG_COMMISSION TOTAL
---------- ---------- ------------ ---------- ---------------- -------------- -----
     33829 85039.7057        33603 34564.1069            33739     50475.5988 67477
So far, so good. As expected, roughly 50 percent of the 67,477 rows come out above average in each case: 33,829 above the average total compensation (SALARY+COMMISSION), 33,603 above the average SALARY, and 33,739 above the average COMMISSION. If we ask Oracle Database how many rows it expects for each of these predicates, however, we'll see a different story. For the discrete columns SALARY and COMMISSION, Oracle Database is very accurate, as shown in Listing 6. But for the SALARY+COMMISSION expression, the cardinality story is very different, as shown in Listing 7.
Code Listing 6: Average cardinality estimate for SALARY and COMMISSION
SQL> set autotrace traceonly explain
SQL> select * from emp_data where salary > &Asalary;
old   1: select * from emp_data where salary > &Asalary
new   1: select * from emp_data where salary > 34564.1069

Execution Plan
----------------------------------------------------------
Plan hash value: 4079136661

------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          | 33315 | 3611K |   307   (1) | 00:00:04 |
|* 1 | TABLE ACCESS FULL | EMP_DATA | 33315 | 3611K |   307   (1) | 00:00:04 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SALARY">34564.1069)

SQL> select * from emp_data where commission > &Acommission;
old   1: select * from emp_data where commission > &Acommission
new   1: select * from emp_data where commission > 50475.5988

Execution Plan
----------------------------------------------------------
Plan hash value: 4079136661

------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          | 34663 | 3757K |   307   (1) | 00:00:04 |
|* 1 | TABLE ACCESS FULL | EMP_DATA | 34663 | 3757K |   307   (1) | 00:00:04 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COMMISSION">50475.5988)
Code Listing 7: Low estimate for rows returned for SALARY+COMMISSION
SQL> select * from emp_data where salary+commission > &ACOMP;
old   1: select * from emp_data where salary+commission > &ACOMP
new   1: select * from emp_data where salary+commission > 85039.7057

Execution Plan
----------------------------------------------------------
Plan hash value: 4079136661

------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          | 3374 |  365K |   307   (1) | 00:00:04 |
|* 1 | TABLE ACCESS FULL | EMP_DATA | 3374 |  365K |   307   (1) | 00:00:04 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SALARY"+"COMMISSION">85039.7057)
As you can see, the cardinality estimate is way off. The value reported, 3,374, is about 5 percent of the rows, very far from the actual 50 percent. This is because the optimizer knows a lot about the SALARY column and the COMMISSION column individually but nothing about the SALARY+COMMISSION expression.
So, what can we do in this case? We need to "feed" the optimizer information about the SALARY+COMMISSION expression. In Oracle Database 11g, we can add a virtual column
SQL> alter table emp_data add (comp as (salary+commission));

Table altered.

and then gather statistics on it:

SQL> begin
  2     dbms_stats.gather_table_stats
  3     ( user,
  4       'EMP_DATA',
  5       method_opt=> 'for columns COMP'
  6     );
  7  end;
  8  /

PL/SQL procedure successfully completed.
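We can confirm that COMP is purely virtual, and that it now has statistics, via the data dictionary; a quick sketch:

-- virtual columns are flagged in USER_TAB_COLS;
-- DATA_DEFAULT holds the defining expression ("SALARY"+"COMMISSION")
select column_name, virtual_column, data_default
  from user_tab_cols
 where table_name = 'EMP_DATA'
   and column_name = 'COMP';

-- and the freshly gathered column statistics
select column_name, num_distinct, histogram
  from user_tab_col_statistics
 where table_name = 'EMP_DATA'
   and column_name = 'COMP';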
Now the optimizer has the same amount of information on this virtual COMP column as it does on the SALARY and COMMISSION columns, and the cardinality estimates are dead on, as shown in Listing 8.
Code Listing 8: With virtual column, better estimate for SALARY+COMMISSION
SQL> select * from emp_data where salary+commission > &ACOMP;
old   1: select * from emp_data where salary+commission > &ACOMP
new   1: select * from emp_data where salary+commission > 85039.7057

Execution Plan
----------------------------------------------------------
Plan hash value: 4079136661

------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          | 34162 | 3869K |   307   (1) | 00:00:04 |
|* 1 | TABLE ACCESS FULL | EMP_DATA | 34162 | 3869K |   307   (1) | 00:00:04 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMP_DATA"."COMP">85039.7057)