May/June 2013
This article is the second in a three-article sequence that introduces you to some commonly used SQL analytic functions and their associated clauses. Analytic functions add extensions to SQL that make complex queries easier to code and faster-running. In Part 10 of this series, “A Window into the World of Analytic Functions” (Oracle Magazine, March/April 2013), you learned how the SUM analytic function, the PARTITION and OVER clauses, the ROWS and RANGE windowing clauses, and several windowing clause parameter specifications help you manipulate result set data for business reporting purposes. This article introduces you to analytic functions that enable your queries to
Rank data—for example, to display the three employees with the highest salaries by department
Return the first or last value from a group—for example, to compare the salary of every employee in a department with that of the last employee hired in that department
Provide your report with the rows that either precede (lead) or follow (lag) the current row being processed—for example, to discover how many days before an employee’s hire date the penultimately hired employee was hired
Obtain percentages within a group—for example, to find out what percentage a particular employee received of the total amount a department pays its employees annually
To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.
Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema required for this article’s examples. (View the script in a text editor for execution instructions.)
Being OutrankedA query that retrieves the top or bottom N row(s) from a database table that satisfy certain criteria is sometimes referred to as a top-N query. For example, you might want to ask who the most highly paid employees are or which department has the lowest sales figures. An easy way to answer such a question is to use either the RANK or the DENSE_RANK analytic function, both of which calculate and display the numerical rank of a value within a group of values. The example in Listing 1 lists all employees alongside their respective salary values, partitioned and sorted by department and further sorted, in descending order, by salary. It uses the DENSE_RANK analytic function to assign a numerical rank to the salaries within each department.
Code Listing 1: Code Listing 1: List employees, ranked by department, by salary
SQL> set feedback on SQL> set lines 32000 SQL> select department_id, last_name, first_name, salary, 2 DENSE_RANK() over (partition by department_id 3 order by salary desc) dense_ranking 4 from employee 5 order by department_id, salary desc, last_name, first_name; DEPARTMENT_ID LAST_NAME FIRST_NAME SALARY DENSE_RANKING ————————————— ——————————— ————————————————————————— —————— ————————————— 10 Dovichi Lori 1 10 Eckhardt Emily 100000 2 10 Newton Donald 80000 3 10 Michaels Matthew 70000 4 10 Friedli Roger 60000 5 10 James Betsy 60000 5 20 peterson michael 90000 1 20 leblanc mark 65000 2 30 Jeffrey Thomas 300000 1 30 Wong Theresa 70000 2 Newton Frances 75000 1 11 rows selected.
The results in Listing 1 reveal an interesting analytic function phenomenon. When a query uses a descending sort order, a NULL value can affect the outcome of the analytic function being used. By default, with a descending sort, SQL views NULLs as being higher than any other value. In Listing 1, the record for employee Lori Dovichi has no salary value, yet the DENSE_RANK analytic function gives her salary a rank value of 1— the highest rank—in Department 10.
You can eliminate NULLs from consideration by adding a WHERE clause such as
WHERE SALARY IS NOT NULLAlternatively, you can use the NULLS LAST extension to the ORDER BY clause in your windowing clause, as demonstrated in Listing 2. The record for Lori Dovichi still appears first for Department 10, because the query’s overall ORDER BY clause still orders by salary in descending order. But the rank value attributed to that record is now 5—the lowest rank value for Department 10. Note also that the DENSE_RANK function assigns the same rank value, 4, to two records (Roger Friedli’s and Betsy James’) in the results for Department 10, because both employees have the same salary value.
Code Listing 2: List employees, ranked by department, by salary, with NULLS LAST
SQL> select department_id, last_name, first_name, salary, 2 DENSE_RANK() over (partition by department_id 3 order by salary desc NULLS LAST) dense_ranking 4 from employee 5 order by department_id, salary desc, last_name, first_name; DEPARTMENT_ID LAST_NAME FIRST_NAME SALARY DENSE_RANKING ————————————— ——————————— ————————————————————————— —————— ————————————— 10 Dovichi Lori 5 10 Eckhardt Emily 100000 1 10 Newton Donald 80000 2 10 Michaels Matthew 70000 3 10 Friedli Roger 60000 4 10 James Betsy 60000 4 20 peterson michael 90000 1 20 leblanc mark 65000 2 30 Jeffrey Thomas 300000 1 30 Wong Theresa 70000 2 Newton Frances 75000 1 11 rows selected.
Listing 3 performs a similar query to that of Listing 2, with the RANK analytic function instead of DENSE_RANK. Note that the results include no rank value of 5 for Department 10. The reason is that DENSE_RANK and RANK attribute rank values to records differently. DENSE_RANK returns ranking numbers without any gaps, regardless of any records that have the same value for the expression in the ORDER BY windowing clause. In contrast, when the RANK analytic function finds multiple rows with the same value and assigns them the same rank, the subsequent rank numbers take account of this by skipping ahead. As you see in the results for Listing 3, RANK assigns a rank value of 4 to two records and skips to a rank value of 6 for the final record in the department, which has the lowest rank value.
Code Listing 3: Use the RANK analytic function instead of the DENSE_RANK analytic function
SQL> select department_id, last_name, first_name, salary, 2 RANK() over (partition by department_id 3 order by salary desc NULLS LAST) regular_ranking 4 from employee 5 order by department_id, salary desc, last_name, first_name; DEPARTMENT_ID LAST_NAME FIRST_NAME SALARY REGULAR_RANKING ————————————— ——————————— ——————————————————————— —————— ——————————————— 10 Dovichi Lori 6 10 Eckhardt Emily 100000 1 10 Newton Donald 80000 2 10 Michaels Matthew 70000 3 10 Friedli Roger 60000 4 10 James Betsy 60000 4 20 peterson michael 90000 1 20 leblanc mark 65000 2 30 Jeffrey Thomas 300000 1 30 Wong Theresa 70000 2 Newton Frances 75000 1 11 rows selected.
For reporting purposes, it might occasionally be useful to include the first value obtained for a particular group or window when displaying your query results. You can use the FIRST_VALUE analytic function for this purpose, as shown in Listing 4. The query in Listing 4 returns windows that are partitioned by department and ordered by date of hire within each partition. Alongside each returned salary value, the first salary value obtained per window is also displayed. This information could be useful for comparing the salary value of every employee in a department with that of the first employee hired in that department.
Code Listing 4: Display the first value returned per window, using FIRST_VALUE
SQL> select last_name, first_name, department_id, hire_date, salary, 2 FIRST_VALUE(salary) 3 over (partition by department_id order by hire_date) first_sal_by_dept 4 from employee 5 order by department_id, hire_date; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY FIRST_SAL_BY_DEPT ————————— —————————————— —————————————— ————————— ——————— ————————————————— Eckhardt Emily 10 07-JUL-04 100000 100000 Newton Donald 10 24-SEP-06 80000 100000 James Betsy 10 16-MAY-07 60000 100000 Friedli Roger 10 16-MAY-07 60000 100000 Michaels Matthew 10 16-MAY-07 70000 100000 Dovichi Lori 10 07-JUL-11 100000 peterson michael 20 03-NOV-08 90000 90000 leblanc mark 20 06-MAR-09 65000 90000 Jeffrey Thomas 30 27-FEB-10 300000 300000 Wong Theresa 30 27-FEB-10 70000 300000 Newton Frances 14-SEP-05 75000 75000 11 rows selected.
Contrast the query results in Listing 4 with the results in Listing 5. The query in Listing 5 uses the LAST_VALUE analytic function—but uses it incorrectly. You cannot simply swap the LAST_VALUE analytic function for the FIRST_VALUE analytic function and expect the results to return the last value per window. Recall that the default behavior of an ORDER BY clause in a partition without an accompanying windowing clause is to make the default window a sliding view that operates on the current row and all preceding rows. In Listing 5, then, the value returned from the call to the LAST_VALUE function is always the same as the current row’s salary value. To make the call to the LAST_VALUE function more meaningful, you must add a windowing clause to the ORDER BY clause in the partition, as shown in Listing 6.
Code Listing 5: Fail to obtain the last value per window, because of incorrect use of LAST_VALUE
SQL> select last_name, first_name, department_id, hire_date, salary, 2 LAST_VALUE(salary) 3 over (partition by department_id order by hire_date) last_sal_by_dept 4 from employee 5 order by department_id, hire_date; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY LAST_SAL_BY_DEPT ————————— ——————————— ——————————————— ————————— ——————— ———————————————— Eckhardt Emily 10 07-JUL-04 100000 100000 Newton Donald 10 24-SEP-06 80000 80000 James Betsy 10 16-MAY-07 60000 70000 Friedli Roger 10 16-MAY-07 60000 70000 Michaels Matthew 10 16-MAY-07 70000 70000 Dovichi Lori 10 07-JUL-11 peterson michael 20 03-NOV-08 90000 90000 leblanc mark 20 06-MAR-09 65000 65000 Jeffrey Thomas 30 27-FEB-10 300000 300000 Wong Theresa 30 27-FEB-10 70000 70000 Newton Frances 14-SEP-05 75000 75000 11 rows selected.
The query in Listing 6 displays the employee records that fall within each department partition, sorted by date of hire, along with the last salary value within each partition and its associated last name value. The query specifies WHERE SALARY IS NOT NULL, because a NULL salary value would not provide a useful comparison. Because the record for the employee Lori Dovichi is NULL, it’s not included in the partition for Department 10. All other records in that partition display the last name and salary values for Matthew Michaels as the last employee record that falls within the partition. The values for LAST_EMP and LAST_SAL for the employee Frances Newton are the same as those for her employee record, because no other employee records have a NULL value for their Department ID. You can use the IGNORE NULLS extension to eliminate NULLS from consideration in your LAST_VALUE analytic function call if you want to include all records, regardless of the presence of NULL values. To do this, change the call to LAST_VALUE (salary) to LAST_VALUE (salary IGNORE NULLS).
Code Listing 6: Display the last value per window through correct use of LAST_VALUE
SQL> select last_name, first_name, department_id dept_id, hire_date, salary, 2 LAST_VALUE(last_name) 3 over (partition by department_id order by hire_date 4 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) last_emp, 5 LAST_VALUE(salary) 6 over (partition by department_id order by hire_date 7 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) last_sal 8 from employee 9 where salary is not null 10 order by department_id, hire_date, last_name, first_name; LAST_NAME FIRST_NAME DEPT_ID HIRE_DATE SALARY LAST_EMP LAST_SAL —————————— ————————————— ————————— ————————— ————————— ———————— —————————— Eckhardt Emily 10 07-JUL-04 100000 Michaels 70000 Newton Donald 10 24-SEP-06 80000 Michaels 70000 Friedli Roger 10 16-MAY-07 60000 Michaels 70000 James Betsy 10 16-MAY-07 60000 Michaels 70000 Michaels Matthew 10 16-MAY-07 70000 Michaels 70000 peterson michael 20 03-NOV-08 90000 leblanc 65000 leblanc mark 20 06-MAR-09 65000 leblanc 65000 Jeffrey Thomas 30 27-FEB-10 300000 Wong 70000 Wong Theresa 30 27-FEB-10 70000 Wong 70000 Newton Frances 14-SEP-05 75000 Newton 75000 10 rows selected.
A common reporting requirement, for comparison purposes, is to access data not only from the current row being reviewed but also from the rows that precede or follow the current row. Consider the query in Listing 7. Using the LAG analytic function, you can obtain a side-by-side view of when the current employee was hired, alongside the last date on which an employee was hired, on a per-department basis. For example, the record for the employee Donald Newton shows that the employee hired before him was hired on 07-JUL-04. If you look at the record immediately preceding the record for Donald Newton—the record for Emily Eckhardt—you can see that she was indeed hired on 07-JUL-04.
Code Listing 7: Use the LAG analytic function to obtain row data preceding the current row
SQL> select last_name, first_name, department_id, hire_date, 2 LAG(hire_date, 1, null) over (partition by department_id 3 order by hire_date) prev_hire_date 4 from employee 5 order by department_id, hire_date, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE PREV_HIRE ————————— —————————————— —————————————————————————— ————————— ————————— Eckhardt Emily 10 07-JUL-04 Newton Donald 10 24-SEP-06 07-JUL-04 Friedli Roger 10 16-MAY-07 24-SEP-06 James Betsy 10 16-MAY-07 16-MAY-07 Michaels Matthew 10 16-MAY-07 16-MAY-07 Dovichi Lori 10 07-JUL-11 16-MAY-07 peterson michael 20 03-NOV-08 leblanc mark 20 06-MAR-09 03-NOV-08 Jeffrey Thomas 30 27-FEB-10 Wong Theresa 30 27-FEB-10 27-FEB-10 Newton Frances 14-SEP-05 11 rows selected.
The syntax for the LAG analytic function is
LAG(column | expression, offset, default)
Offset is a positive integer that defaults to a value of 1. This parameter tells the LAG function how many previous rows it should go back. A value of 1 means, “Look at the row immediately preceding the current row within the current window.” Default is the value you want to return if the offset value (index) is out of range for the current window. For the first row in a group, the default value will be returned.
The syntax for the LEAD analytic function is almost the same as that for the LAG analytic function, with two differences:
The offset parameter tells the LEAD function how many rows after the current row it should go forward.
For the last row in a group, the default value will be returned.
Consider the query in Listing 8. As Listing 8 shows, the LEAD analytic function looks at and reports on the row directly following the current row. The value for the FOLLOWING_HIRE_DATE column for the employee records that are listed last for each department is NULL, because there are no further records in each department group. Similarly, every time a new department group is displayed, the value for the PREV_HIRE_DATE column for the employee records listed first is also NULL, because there are no previous records in the group.
Code Listing 8: Use LAG and LEAD to obtain row data preceding and following the current row
SQL> select last_name, first_name, department_id, hire_date, 2 LAG(hire_date, 1, null) over (partition by department_id 3 order by hire_date) prev_hire_date, 4 LEAD(hire_date, 1, null) over (partition by department_id 5 order by hire_date) following_hire_date 6 from employee 7 order by department_id, hire_date, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE PREV_HIRE FOLLOWING ——————————— ———————————— —————————————————— ————————— ————————— ————————— Eckhardt Emily 10 07-JUL-04 24-SEP-06 Newton Donald 10 24-SEP-06 07-JUL-04 16-MAY-07 Friedli Roger 10 16-MAY-07 24-SEP-06 16-MAY-07 James Betsy 10 16-MAY-07 16-MAY-07 16-MAY-07 Michaels Matthew 10 16-MAY-07 16-MAY-07 07-JUL-11 Dovichi Lori 10 07-JUL-11 16-MAY-07 peterson michael 20 03-NOV-08 06-MAR-09 leblanc mark 20 06-MAR-09 03-NOV-08 Jeffrey Thomas 30 27-FEB-10 27-FEB-10 Wong Theresa 30 27-FEB-10 27-FEB-10 Newton Frances 14-SEP-05 11 rows selected.
Business users often need to report on percentages. Sales amounts, overall costs, and annual salaries are just some of the figures that are likely to require a percentage calculation. The query in Listing 9 uses the RATIO_TO_REPORT analytic function to answer the question “What percentage of the total annual salary allotment does each employee receive?” The syntax for the RATIO_TO_REPORT analytic function is
RATIO_TO_REPORT( column | expression)
Code Listing 9: Use RATIO_TO_REPORT to obtain the percentage of salaries
SQL> select last_name, first_name, department_id, hire_date, salary, round(RATIO_TO_REPORT(salary) over ()*100, 2) sal_percentage 2 from employee 3 order by department_id, salary desc, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY SAL_PERCENTAGE ——————————— ———————————— —————————————— —————————— —————— —————————————— Dovichi Lori 10 07-JUL-11 Eckhardt Emily 10 07-JUL-04 100000 10.31 Newton Donald 10 24-SEP-06 80000 8.25 Michaels Matthew 10 16-MAY-07 70000 7.22 Friedli Roger 10 16-MAY-07 60000 6.19 James Betsy 10 16-MAY-07 60000 6.19 peterson michael 20 03-NOV-08 90000 9.28 leblanc mark 20 06-MAR-09 65000 6.7 Jeffrey Thomas 30 27-FEB-10 300000 30.93 Wong Theresa 30 27-FEB-10 70000 7.22 Newton Frances 14-SEP-05 75000 7.73 11 rows selected.
One nice feature of this analytic function is that it does the work for you of summing the expression values that are used to obtain the resultant percentage values. You don’t need an additional aggregate function call. Note also that the analytic function call in this query example uses the entire set of rows as its window, because it does not specify any ORDER BY clause or additional windowing clauses. Compare the results from Listing 9 with those obtained from the query in Listing 10. The query in Listing 10 adds a PARTITION clause to the OVER clause to calculate the percentage of total departmental salaries each employee receives.
Code Listing 10: Use RATIO_TO_REPORT to obtain the percentage of salaries, by department
SQL> select last_name, first_name, department_id, hire_date, salary, round(ratio_to_report(salary) 2 over(partition by department_id)*100, 2) sal_dept_pct 3 from employee 4 order by department_id, salary desc, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY SAL_DEPT_PCT —————————— ————————————— ——————————————— ————————— —————— ———————————— Dovichi Lori 10 07-JUL-11 Eckhardt Emily 10 07-JUL-04 100000 27.03 Newton Donald 10 24-SEP-06 80000 21.62 Michaels Matthew 10 16-MAY-07 70000 18.92 Friedli Roger 10 16-MAY-07 60000 16.22 James Betsy 10 16-MAY-07 60000 16.22 peterson michael 20 03-NOV-08 90000 58.06 leblanc mark 20 06-MAR-09 65000 41.94 Jeffrey Thomas 30 27-FEB-10 300000 81.08 Wong Theresa 30 27-FEB-10 70000 18.92 Newton Frances 14-SEP-05 75000 100 11 rows selected.
This article continued the discussion of analytic functions introduced in Part 10 of this series. It demonstrated how you can use seven more of the most common analytic functions to manipulate the way your results display. You’ve seen how to use the RANK and DENSE_RANK analytic functions to obtain results for top-N queries and understand the differences between them. You’ve learned how the FIRST_VALUE and LAST_VALUE analytic functions can be used in your reports for data comparisons within groups. You also now know how LEAD and LAG can show you the row values preceding and following your current row values to facilitate data comparisons. Those reading the online version of this article have also learned how to obtain percentages within a group with the RATIO_TO_REPORT analytic function.
In all cases, you can see that harnessing the power of these analytic functions greatly reduces the need to write complicated SQL to obtain the same results. Review the documentation at bit.ly/yWtbz1 and bit.ly/R4cZyq for more details. The next installment of SQL 101 will conclude the discussion of analytic functions.
READ more about
|
DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.