Database, SQL and PL/SQL

Leading Ranks and Lagging Percentages: Analytic Functions, Continued

Part 11 in a series on the basics of the relational database and SQL

By Melanie Caffrey

May/June 2013

This article is the second in a three-article sequence that introduces you to some commonly used SQL analytic functions and their associated clauses. Analytic functions add extensions to SQL that make complex queries easier to code and faster-running. In Part 10 of this series, “A Window into the World of Analytic Functions” (Oracle Magazine, March/April 2013), you learned how the SUM analytic function, the PARTITION and OVER clauses, the ROWS and RANGE windowing clauses, and several windowing clause parameter specifications help you manipulate result set data for business reporting purposes. This article introduces you to analytic functions that enable your queries to

  • Rank data—for example, to display the three employees with the highest salaries by department

  • Return the first or last value from a group—for example, to compare the salary of every employee in a department with that of the last employee hired in that department

  • Provide your report with the rows that either precede (lead) or follow (lag) the current row being processed—for example, to discover how many days before an employee’s hire date the penultimately hired employee was hired

  • Obtain percentages within a group—for example, to find out what percentage a particular employee received of the total amount a department pays its employees annually

To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.

Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema required for this article’s examples. (View the script in a text editor for execution instructions.)

Being Outranked

A query that retrieves the top or bottom N row(s) from a database table that satisfy certain criteria is sometimes referred to as a top-N query. For example, you might want to ask who the most highly paid employees are or which department has the lowest sales figures. An easy way to answer such a question is to use either the RANK or the DENSE_RANK analytic function, both of which calculate and display the numerical rank of a value within a group of values. The example in Listing 1 lists all employees alongside their respective salary values, partitioned and sorted by department and further sorted, in descending order, by salary. It uses the DENSE_RANK analytic function to assign a numerical rank to the salaries within each department.

Code Listing 1: Code Listing 1: List employees, ranked by department, by salary

SQL> set feedback on
SQL> set lines 32000
SQL> select department_id, last_name, first_name, salary,
  2         DENSE_RANK() over (partition by department_id
  3                                order by salary desc) dense_ranking
  4    from employee
  5  order by department_id, salary desc, last_name, first_name;
DEPARTMENT_ID LAST_NAME    FIRST_NAME                    SALARY DENSE_RANKING
————————————— ———————————  —————————————————————————     —————— —————————————
           10 Dovichi      Lori                                             1
           10 Eckhardt     Emily                         100000             2
           10 Newton       Donald                         80000             3
           10 Michaels     Matthew                        70000             4
           10 Friedli      Roger                          60000             5
           10 James        Betsy                          60000             5
           20 peterson     michael                        90000             1
           20 leblanc      mark                           65000             2
           30 Jeffrey      Thomas                        300000             1
           30 Wong         Theresa                        70000             2
              Newton       Frances                        75000             1
11 rows selected.

The results in Listing 1 reveal an interesting analytic function phenomenon. When a query uses a descending sort order, a NULL value can affect the outcome of the analytic function being used. By default, with a descending sort, SQL views NULLs as being higher than any other value. In Listing 1, the record for employee Lori Dovichi has no salary value, yet the DENSE_RANK analytic function gives her salary a rank value of 1— the highest rank—in Department 10.

You can eliminate NULLs from consideration by adding a WHERE clause such as

WHERE SALARY IS NOT NULL
Alternatively, you can use the NULLS LAST extension to the ORDER BY clause in your windowing clause, as demonstrated in Listing 2. The record for Lori Dovichi still appears first for Department 10, because the query’s overall ORDER BY clause still orders by salary in descending order. But the rank value attributed to that record is now 5—the lowest rank value for Department 10. Note also that the DENSE_RANK function assigns the same rank value, 4, to two records (Roger Friedli’s and Betsy James’) in the results for Department 10, because both employees have the same salary value.

Code Listing 2: List employees, ranked by department, by salary, with NULLS LAST

SQL> select department_id, last_name, first_name, salary,
  2      DENSE_RANK() over (partition by department_id
  3                             order by salary desc NULLS LAST) dense_ranking
  4    from employee
  5  order by department_id, salary desc, last_name, first_name;
DEPARTMENT_ID LAST_NAME    FIRST_NAME                    SALARY DENSE_RANKING
————————————— ———————————  —————————————————————————     —————— —————————————
           10 Dovichi      Lori                                             5
           10 Eckhardt     Emily                         100000             1
           10 Newton       Donald                         80000             2
           10 Michaels     Matthew                        70000             3
           10 Friedli      Roger                          60000             4
           10 James        Betsy                          60000             4
           20 peterson     michael                        90000             1
           20 leblanc      mark                           65000             2
           30 Jeffrey      Thomas                        300000             1
           30 Wong         Theresa                        70000             2
              Newton       Frances                        75000             1
11 rows selected.

Listing 3 performs a similar query to that of Listing 2, with the RANK analytic function instead of DENSE_RANK. Note that the results include no rank value of 5 for Department 10. The reason is that DENSE_RANK and RANK attribute rank values to records differently. DENSE_RANK returns ranking numbers without any gaps, regardless of any records that have the same value for the expression in the ORDER BY windowing clause. In contrast, when the RANK analytic function finds multiple rows with the same value and assigns them the same rank, the subsequent rank numbers take account of this by skipping ahead. As you see in the results for Listing 3, RANK assigns a rank value of 4 to two records and skips to a rank value of 6 for the final record in the department, which has the lowest rank value.

Code Listing 3: Use the RANK analytic function instead of the DENSE_RANK analytic function

SQL> select department_id, last_name, first_name, salary,
  2         RANK() over (partition by department_id
  3                          order by salary desc NULLS LAST) regular_ranking
  4    from employee
  5  order by department_id, salary desc, last_name, first_name;
DEPARTMENT_ID LAST_NAME    FIRST_NAME                SALARY REGULAR_RANKING
————————————— ———————————  ———————————————————————   —————— ———————————————
           10 Dovichi      Lori                                           6
           10 Eckhardt     Emily                     100000               1
           10 Newton       Donald                     80000               2
           10 Michaels     Matthew                    70000               3
           10 Friedli      Roger                      60000               4
           10 James        Betsy                      60000               4
           20 peterson     michael                    90000               1
           20 leblanc      mark                       65000               2
           30 Jeffrey      Thomas                    300000               1
           30 Wong         Theresa                    70000               2
              Newton       Frances                    75000               1
11 rows selected.

Finishing First or Last

For reporting purposes, it might occasionally be useful to include the first value obtained for a particular group or window when displaying your query results. You can use the FIRST_VALUE analytic function for this purpose, as shown in Listing 4. The query in Listing 4 returns windows that are partitioned by department and ordered by date of hire within each partition. Alongside each returned salary value, the first salary value obtained per window is also displayed. This information could be useful for comparing the salary value of every employee in a department with that of the first employee hired in that department.

Code Listing 4: Display the first value returned per window, using FIRST_VALUE

SQL> select last_name, first_name, department_id, hire_date, salary,
  2       FIRST_VALUE(salary)
  3       over (partition by department_id order by hire_date) first_sal_by_dept
  4   from employee
  5  order by department_id, hire_date;
LAST_NAME     FIRST_NAME   DEPARTMENT_ID HIRE_DATE  SALARY FIRST_SAL_BY_DEPT
————————— ——————————————  —————————————— ————————— ——————— —————————————————
Eckhardt      Emily                   10 07-JUL-04  100000            100000
Newton        Donald                  10 24-SEP-06   80000            100000
James         Betsy                   10 16-MAY-07   60000            100000
Friedli       Roger                   10 16-MAY-07   60000            100000
Michaels      Matthew                 10 16-MAY-07   70000            100000
Dovichi       Lori                    10 07-JUL-11                    100000
peterson      michael                 20 03-NOV-08   90000             90000
leblanc       mark                    20 06-MAR-09   65000             90000
Jeffrey       Thomas                  30 27-FEB-10  300000            300000
Wong          Theresa                 30 27-FEB-10   70000            300000
Newton        Frances                    14-SEP-05   75000             75000
11 rows selected.

Contrast the query results in Listing 4 with the results in Listing 5. The query in Listing 5 uses the LAST_VALUE analytic function—but uses it incorrectly. You cannot simply swap the LAST_VALUE analytic function for the FIRST_VALUE analytic function and expect the results to return the last value per window. Recall that the default behavior of an ORDER BY clause in a partition without an accompanying windowing clause is to make the default window a sliding view that operates on the current row and all preceding rows. In Listing 5, then, the value returned from the call to the LAST_VALUE function is always the same as the current row’s salary value. To make the call to the LAST_VALUE function more meaningful, you must add a windowing clause to the ORDER BY clause in the partition, as shown in Listing 6.

Code Listing 5: Fail to obtain the last value per window, because of incorrect use of LAST_VALUE

SQL> select last_name, first_name, department_id, hire_date, salary,
  2       LAST_VALUE(salary)
  3       over (partition by department_id order by hire_date) last_sal_by_dept
  4    from employee
  5  order by department_id, hire_date;
LAST_NAME     FIRST_NAME   DEPARTMENT_ID HIRE_DATE  SALARY LAST_SAL_BY_DEPT
—————————    ——————————— ——————————————— ————————— ——————— ————————————————
Eckhardt      Emily                   10 07-JUL-04  100000           100000
Newton        Donald                  10 24-SEP-06   80000            80000
James         Betsy                   10 16-MAY-07   60000            70000
Friedli       Roger                   10 16-MAY-07   60000            70000
Michaels      Matthew                 10 16-MAY-07   70000            70000
Dovichi       Lori                    10 07-JUL-11
peterson      michael                 20 03-NOV-08   90000            90000
leblanc       mark                    20 06-MAR-09   65000            65000
Jeffrey       Thomas                  30 27-FEB-10  300000           300000
Wong          Theresa                 30 27-FEB-10   70000            70000
Newton        Frances                    14-SEP-05   75000            75000
11 rows selected.

The query in Listing 6 displays the employee records that fall within each department partition, sorted by date of hire, along with the last salary value within each partition and its associated last name value. The query specifies WHERE SALARY IS NOT NULL, because a NULL salary value would not provide a useful comparison. Because the record for the employee Lori Dovichi is NULL, it’s not included in the partition for Department 10. All other records in that partition display the last name and salary values for Matthew Michaels as the last employee record that falls within the partition. The values for LAST_EMP and LAST_SAL for the employee Frances Newton are the same as those for her employee record, because no other employee records have a NULL value for their Department ID. You can use the IGNORE NULLS extension to eliminate NULLS from consideration in your LAST_VALUE analytic function call if you want to include all records, regardless of the presence of NULL values. To do this, change the call to LAST_VALUE (salary) to LAST_VALUE (salary IGNORE NULLS).

Code Listing 6: Display the last value per window through correct use of LAST_VALUE

SQL> select last_name, first_name, department_id dept_id, hire_date, salary,
  2         LAST_VALUE(last_name)
  3         over (partition by department_id order by hire_date
  4               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) last_emp,
  5         LAST_VALUE(salary)
  6         over (partition by department_id order by hire_date
  7               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) last_sal
  8    from employee
  9   where salary is not null
 10   order by department_id, hire_date, last_name, first_name;
LAST_NAME     FIRST_NAME   DEPT_ID HIRE_DATE    SALARY LAST_EMP   LAST_SAL
—————————— ————————————— ————————— ————————— ————————— ———————— ——————————
Eckhardt      Emily             10 07-JUL-04    100000 Michaels      70000
Newton        Donald            10 24-SEP-06     80000 Michaels      70000
Friedli       Roger             10 16-MAY-07     60000 Michaels      70000
James         Betsy             10 16-MAY-07     60000 Michaels      70000
Michaels      Matthew           10 16-MAY-07     70000 Michaels      70000
peterson      michael           20 03-NOV-08     90000 leblanc       65000
leblanc       mark              20 06-MAR-09     65000 leblanc       65000
Jeffrey       Thomas            30 27-FEB-10    300000 Wong          70000
Wong          Theresa           30 27-FEB-10     70000 Wong          70000
Newton        Frances              14-SEP-05     75000 Newton        75000
10 rows selected.

In the Lead and Lagging Behind

A common reporting requirement, for comparison purposes, is to access data not only from the current row being reviewed but also from the rows that precede or follow the current row. Consider the query in Listing 7. Using the LAG analytic function, you can obtain a side-by-side view of when the current employee was hired, alongside the last date on which an employee was hired, on a per-department basis. For example, the record for the employee Donald Newton shows that the employee hired before him was hired on 07-JUL-04. If you look at the record immediately preceding the record for Donald Newton—the record for Emily Eckhardt—you can see that she was indeed hired on 07-JUL-04.

Code Listing 7: Use the LAG analytic function to obtain row data preceding the current row

SQL> select last_name, first_name, department_id, hire_date,
  2         LAG(hire_date, 1, null) over (partition by department_id
  3                                 order by hire_date) prev_hire_date
  4    from employee
  5  order by department_id, hire_date, last_name, first_name;
LAST_NAME     FIRST_NAME              DEPARTMENT_ID HIRE_DATE PREV_HIRE
————————— —————————————— —————————————————————————— ————————— —————————
Eckhardt      Emily                              10 07-JUL-04
Newton        Donald                             10 24-SEP-06 07-JUL-04
Friedli       Roger                              10 16-MAY-07 24-SEP-06
James         Betsy                              10 16-MAY-07 16-MAY-07
Michaels      Matthew                            10 16-MAY-07 16-MAY-07
Dovichi       Lori                               10 07-JUL-11 16-MAY-07
peterson      michael                            20 03-NOV-08
leblanc       mark                               20 06-MAR-09 03-NOV-08
Jeffrey       Thomas                             30 27-FEB-10
Wong          Theresa                            30 27-FEB-10 27-FEB-10
Newton        Frances                               14-SEP-05
11 rows selected.

The syntax for the LAG analytic function is

LAG(column | expression, offset, default)

Offset is a positive integer that defaults to a value of 1. This parameter tells the LAG function how many previous rows it should go back. A value of 1 means, “Look at the row immediately preceding the current row within the current window.” Default is the value you want to return if the offset value (index) is out of range for the current window. For the first row in a group, the default value will be returned.

The syntax for the LEAD analytic function is almost the same as that for the LAG analytic function, with two differences:

  • The offset parameter tells the LEAD function how many rows after the current row it should go forward.

  • For the last row in a group, the default value will be returned.

Consider the query in Listing 8. As Listing 8 shows, the LEAD analytic function looks at and reports on the row directly following the current row. The value for the FOLLOWING_HIRE_DATE column for the employee records that are listed last for each department is NULL, because there are no further records in each department group. Similarly, every time a new department group is displayed, the value for the PREV_HIRE_DATE column for the employee records listed first is also NULL, because there are no previous records in the group.

Code Listing 8: Use LAG and LEAD to obtain row data preceding and following the current row

SQL> select last_name, first_name, department_id, hire_date,
  2         LAG(hire_date, 1, null) over (partition by department_id
  3                                 order by hire_date) prev_hire_date,
  4         LEAD(hire_date, 1, null) over (partition by department_id
  5                                 order by hire_date) following_hire_date
  6    from employee
  7  order by department_id, hire_date, last_name, first_name;
LAST_NAME      FIRST_NAME       DEPARTMENT_ID  HIRE_DATE  PREV_HIRE  FOLLOWING
———————————  ————————————  ——————————————————  —————————  —————————  —————————
Eckhardt       Emily                       10  07-JUL-04             24-SEP-06
Newton         Donald                      10  24-SEP-06  07-JUL-04  16-MAY-07
Friedli        Roger                       10  16-MAY-07  24-SEP-06  16-MAY-07
James          Betsy                       10  16-MAY-07  16-MAY-07  16-MAY-07
Michaels       Matthew                     10  16-MAY-07  16-MAY-07  07-JUL-11
Dovichi        Lori                        10  07-JUL-11  16-MAY-07
peterson       michael                     20  03-NOV-08             06-MAR-09
leblanc        mark                        20  06-MAR-09  03-NOV-08
Jeffrey        Thomas                      30  27-FEB-10             27-FEB-10
Wong           Theresa                     30  27-FEB-10  27-FEB-10
Newton         Frances                         14-SEP-05
11 rows selected.


Increasing Your Ratios

Business users often need to report on percentages. Sales amounts, overall costs, and annual salaries are just some of the figures that are likely to require a percentage calculation. The query in Listing 9 uses the RATIO_TO_REPORT analytic function to answer the question “What percentage of the total annual salary allotment does each employee receive?” The syntax for the RATIO_TO_REPORT analytic function is

RATIO_TO_REPORT( column | expression)

Code Listing 9: Use RATIO_TO_REPORT to obtain the percentage of salaries

SQL> select last_name, first_name, department_id, hire_date, salary,
round(RATIO_TO_REPORT(salary) over ()*100, 2) sal_percentage
  2    from employee
  3  order by department_id, salary desc, last_name, first_name;
LAST_NAME      FIRST_NAME    DEPARTMENT_ID  HIRE_DATE  SALARY  SAL_PERCENTAGE
———————————  ————————————   —————————————— ——————————  ——————  ——————————————
Dovichi        Lori                     10  07-JUL-11
Eckhardt       Emily                    10  07-JUL-04  100000          10.31
Newton         Donald                   10  24-SEP-06   80000           8.25
Michaels       Matthew                  10  16-MAY-07   70000           7.22
Friedli        Roger                    10  16-MAY-07   60000           6.19
James          Betsy                    10  16-MAY-07   60000           6.19
peterson       michael                  20  03-NOV-08   90000           9.28
leblanc        mark                     20  06-MAR-09   65000            6.7
Jeffrey        Thomas                   30  27-FEB-10  300000          30.93
Wong           Theresa                  30  27-FEB-10   70000           7.22
Newton         Frances                      14-SEP-05   75000           7.73
11 rows selected.

One nice feature of this analytic function is that it does the work for you of summing the expression values that are used to obtain the resultant percentage values. You don’t need an additional aggregate function call. Note also that the analytic function call in this query example uses the entire set of rows as its window, because it does not specify any ORDER BY clause or additional windowing clauses. Compare the results from Listing 9 with those obtained from the query in Listing 10. The query in Listing 10 adds a PARTITION clause to the OVER clause to calculate the percentage of total departmental salaries each employee receives.

Code Listing 10: Use RATIO_TO_REPORT to obtain the percentage of salaries, by department

SQL> select last_name, first_name, department_id, hire_date, salary,
round(ratio_to_report(salary)
  2         over(partition by department_id)*100, 2) sal_dept_pct
  3    from employee
  4  order by department_id, salary desc, last_name, first_name;
LAST_NAME      FIRST_NAME    DEPARTMENT_ID  HIRE_DATE  SALARY  SAL_DEPT_PCT
——————————  —————————————  ———————————————  —————————  ——————  ————————————
Dovichi        Lori                     10  07-JUL-11
Eckhardt       Emily                    10  07-JUL-04  100000        27.03
Newton         Donald                   10  24-SEP-06   80000        21.62
Michaels       Matthew                  10  16-MAY-07   70000        18.92
Friedli        Roger                    10  16-MAY-07   60000        16.22
James          Betsy                    10  16-MAY-07   60000        16.22
peterson       michael                  20  03-NOV-08   90000        58.06
leblanc        mark                     20  06-MAR-09   65000        41.94
Jeffrey        Thomas                   30  27-FEB-10  300000        81.08
Wong           Theresa                  30  27-FEB-10   70000        18.92
Newton         Frances                      14-SEP-05   75000          100
11 rows selected.

Conclusion

This article continued the discussion of analytic functions introduced in Part 10 of this series. It demonstrated how you can use seven more of the most common analytic functions to manipulate the way your results display. You’ve seen how to use the RANK and DENSE_RANK analytic functions to obtain results for top-N queries and understand the differences between them. You’ve learned how the FIRST_VALUE and LAST_VALUE analytic functions can be used in your reports for data comparisons within groups. You also now know how LEAD and LAG can show you the row values preceding and following your current row values to facilitate data comparisons. Those reading the online version of this article have also learned how to obtain percentages within a group with the RATIO_TO_REPORT analytic function.

In all cases, you can see that harnessing the power of these analytic functions greatly reduces the need to write complicated SQL to obtain the same results. Review the documentation at bit.ly/yWtbz1 and bit.ly/R4cZyq for more details. The next installment of SQL 101 will conclude the discussion of analytic functions.

Next Steps

 READ SQL 101, Parts 1–10

READ more about
relational database design and concepts
 Oracle Database Concepts 11g Release 2 (11.2)
 Oracle Database SQL Language Reference 11g Release 1 (11.1)
 Oracle Database Data Warehousing Guide 11g Release 2 (11.2)
 Oracle Database SQL Language Reference 11g Release 2 (11.2)

 DOWNLOAD the sample script for this article



 

DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.