Analytics and Big Data

A Window into the World of Analytic Functions

Part 10 in a series on the basics of the relational database and SQL

By Melanie Caffrey

March/April 2013

Part 9 in this series, “Having Sums, Averages, and Other Grouped Data” (Oracle Magazine, January/February 2013), introduced common SQL aggregate functions and the GROUP BY and HAVING clauses, showing how you can use them to manipulate single-row and grouped result set data to convey more-meaningful results. The discussion of aggregate functions segues logically into the subject of more-advanced SQL operations that use aggregations and other specific views of your data. This article is the first in a three-article sequence that introduces you to some commonly used analytic functions and their associated clauses. Analytic functions not only operate on multiple rows but also can perform operations such as ranking data, calculating running totals, and identifying changes between different time periods (to name a few)—all of which facilitate creation of queries that answer business questions for reporting purposes.

To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.

Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)

Increasing Your Bottom Line

You can use standard SQL to answer most data questions. However, pure SQL queries that answer questions such as “What is the running total of employee salary values as they are summed row by row?” aren’t easy to write and may not perform well over time. Analytic functions add extensions to SQL that make such operations faster-running and easier to code.

The query in Listing 1 demonstrates use of the SUM analytic function. The query results list all employees alongside their respective salary values and display a cumulative total of their salaries.

Code Listing 1: Obtain a cumulative salary total, row by row, for all employees

SQL> set feedback on
SQL> set lines 32000
SQL> select last_name, first_name, salary,
  2     SUM (salary)
  3    OVER (ORDER BY last_name, first_name) running_total
  4    from employee
  5   order by last_name, first_name;
LAST_NAME  FIRST_NAME                         SALARY RUNNING_TOTAL
—————————  ——————————— ————————————————————————————— —————————————
Dovichi    Lori
Eckhardt   Emily                              100000        100000
Friedli    Roger                               60000        160000
James      Betsy                               60000        220000
Jeffrey    Thomas                             300000        520000
Michaels   Matthew                             70000        590000
Newton     Donald                              80000        670000
Newton     Frances                             75000        745000
Wong       Theresa                             70000        815000
leblanc    mark                                65000        880000
peterson   michael                             90000        970000
11 rows selected.

This result is accomplished with the query line that reads

SUM (salary)
  OVER (ORDER BY last_name, first_name) running_total
Anatomy of an Analytic Function

Learning the syntax of an analytic function is half the battle in harnessing its power for efficient query processing. The syntax for the analytic query line in Listing 1 is

FUNCTION_NAME( column | expression,column | expression,... )
OVER
( Order-by-Clause )

In Listing 1, the function name is SUM. The argument to the SUM function is the SALARY column (although it could also be an expression). The OVER clause identifies this function call as an analytic function (as opposed to an aggregate function). The ORDER BY clause identifies the piece of data this analytic function will be performed “over.”

This series will discuss scalar subqueries in a later installment. Suffice it to say for this article’s purposes that using a scalar subquery is another method you could employ to achieve the result obtained in Listing 1. However, it would perform significantly more slowly and its syntax would be more difficult to write than the analytic query line in Listing 1.

Code Listing 2: Obtain a cumulative salary total, row by row, by department

SQL> select last_name, first_name, department_id, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY last_name, first_name) department_total
  4    from employee
  5  order by department_id, last_name, first_name;
LAST_NAME  FIRST_NAME     DEPARTMENT_ID     SALARY  DEPARTMENT_TOTAL
—————————  ——————————  ————————————————  ————————— —————————————————
Dovichi    Lori                      10
Eckhardt   Emily                     10     100000           100000
Friedli    Roger                     10      60000           160000
James      Betsy                     10      60000           220000
Michaels   Matthew                   10      70000           290000
Newton     Donald                    10      80000           370000
leblanc    mark                      20      65000            65000
peterson   michael                   20      90000           155000
Jeffrey    Thomas                    30     300000           300000
Wong       Theresa                   30      70000           370000
Newton     Frances                           75000            75000
11 rows selected.

The query in Listing 2 cumulatively sums the salary values of the employee rows within each department. The PARTITION clause ensures that the analytic function is applied independently to each department group (or partition). You can see that the cumulative total resets after the department changes from 10 to 20, and again from 20 to 30, and finally from 30 to an employee record that has no department ID. The analytic function syntax including a PARTITION clause expands as follows on the syntax used in the Listing 1 example:

FUNCTION_NAME( argument,argument,… )
OVER
( Partition-Clause Order-by-Clause )
A Separate Order

The queries in Listings 1 and 2 sort the rows returned by employee last name and first name. The query in Listing 3 uses a slightly different ordering criterion for the analytic function computation.

Code Listing 3: Compute each row based on salary value

SQL> select last_name, first_name, department_id, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY salary) department_total
  4    from employee
  5  order by department_id, salary, last_name, first_name;
LAST_NAME  FIRST_NAME     DEPARTMENT_ID     SALARY  DEPARTMENT_TOTAL
—————————  ———————————    —————————————  —————————  ————————————————
Friedli    Roger                     10      60000            120000
James      Betsy                     10      60000            120000
Michaels   Matthew                   10      70000            190000
Newton     Donald                    10      80000            270000
Eckhardt   Emily                     10     100000            370000
Dovichi    Lori                      10                       370000
leblanc    mark                      20      65000             65000
peterson   michael                   20      90000            155000
Wong       Theresa                   30      70000             70000
Jeffrey    Thomas                    30     300000            370000
Newton     Frances                           75000             75000
11 rows selected.

The analytic function in Listing 3 computes the department total values based on salary, in ascending order for each partition, with NULL salary values evaluated last. You can see that the record for Lori Dovichi—the only record with a NULL salary value—ends up with the same DEPARTMENT_TOTAL value as the record in the same department (Emily Eckhardt) that has the highest salary value.

An analytic function’s ORDER BY clause works independently from the ORDER BY clause of the overall query that contains the analytic function. Little or no correlation exists between the two unless they use the same column or expression listings in the same order. In Listing 4, for example, note that even though the data returned is listed in department/last name/first name order (like the result sets in Listings 1 and 2), the values returned for the DEPARTMENT_TOTAL expression match the values for those returned in Listing 3. And even though Betsy James and Lori Dovichi appear in a different order in the result sets of Listings 3 and 4, the values returned for their respective department total computations are the same.

Code Listing 4: Sort the data returned from the query in Listing 3 differently

SQL> select last_name, first_name, department_id, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY salary) department_total
  4    from employee
  5  order by department_id, last_name, first_name;
LAST_NAME  FIRST_NAME     DEPARTMENT_ID     SALARY  DEPARTMENT_TOTAL
—————————  ———————————    —————————————  —————————  ————————————————
Dovichi    Lori                      10                      370000
Eckhardt   Emily                     10     100000           370000
Friedli    Roger                     10      60000           120000
James      Betsy                     10      60000           120000
Michaels   Matthew                   10      70000           190000
Newton     Donald                    10      80000           270000
leblanc    mark                      20      65000            65000
peterson   michael                   20      90000           155000
Jeffrey    Thomas                    30     300000           370000
Wong       Theresa                   30      70000            70000
Newton     Frances                           75000            75000
11 rows selected.
Your Choice of Window

An analytic function might or might not include a windowing clause. A windowing clause is a set of parameters or keywords that defines the group (or window) of rows within a particular partition that will be evaluated for analytic function computation. The query in Listing 1 uses a windowing clause by default, because it uses an ORDER BY clause. An ORDER BY clause, in the absence of any further windowing clause parameters, effectively adds a default windowing clause: RANGE UNBOUNDED PRECEDING, which means, “The current and previous rows in the current partition are the rows that should be used in the computation.” When an ORDER BY clause isn’t accompanied by a PARTITION clause, the entire set of rows used by the analytic function is the default current partition.

The queries in Listings 3 and 4 include a PARTITION clause but use no windowing clause parameters. In the calculated results, the DEPARTMENT_TOTAL values for Betsy James and Roger Friedli are identical. In the absence of windowing clause parameters, when your query’s analytic function orders by a particular column or expression within its partition and two or more rows have the same value, the analytic function is applied to each of them and returns the same result, because the analytic function cannot ascertain the order in which they should be evaluated.

The query in Listing 5 uses the ROWS 2 PRECEDING windowing clause to sum the current row’s salary value with just the two preceding rows’ salary values. Even though the employee listed just above Matthew Michaels, Betsy James, has a DEPARTMENT_TOTAL value of 220000, the DEPARTMENT_TOTAL value listed for Matthew Michaels is 190000. This occurs because only the SALARY value for Matthew Michaels, 70000, is summed with the SALARY values of the two rows directly preceding his—those of Betsy James and Roger Friedli.

Code Listing 5: Add a ROWS windowing clause

SQL> select last_name, first_name, department_id, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY last_name, first_name
  4          ROWS 2 PRECEDING) department_total
  5    from employee
  6   order by department_id, last_name, first_name;
LAST_NAME  FIRST_NAME     DEPARTMENT_ID     SALARY  DEPARTMENT_TOTAL
—————————  ———————————    —————————————  —————————  ————————————————
Dovichi    Lori                      10
Eckhardt   Emily                     10     100000           100000
Friedli    Roger                     10      60000           160000
James      Betsy                     10      60000           220000
Michaels   Matthew                   10      70000           190000
Newton     Donald                    10      80000           210000
leblanc    mark                      20      65000            65000
peterson   michael                   20      90000           155000
Jeffrey    Thomas                    30     300000           300000
Wong       Theresa                   30      70000           370000
Newton     Frances                           75000            75000
11 rows selected.

If a windowing clause that uses parameters is added to an analytic function, the resulting syntax will look like this:

FUNCTION_NAME( argument,argument,… )
OVER
( Partition-Clause Order-by-Clause Windowing-Clause)
Multiple Windows into Your Data

The windowing clause provides either a sliding or an anchored view of data, depending on which parameters you pass to it. Queries with just an ORDER BY clause (such as those in Listings 1, 2, 3, and 4) provide an anchored view of the data: it begins with the first row (or top) of the partition and ends with the current row being processed. The query in Listing 5 results in a sliding view of the data, because the DEPARTMENT_TOTAL value for each row can change, depending on how the data is sorted (ordered) within each partition.

Listing 5 demonstrates use of the ROWS clause as the parameter input to the windowing clause. You can also create a sliding view of data by using the RANGE clause. Unlike the ROWS clause, the RANGE windowing clause can be used only with ORDER BY clauses containing columns or expressions of numeric or date datatypes. It has this datatype requirement because it operates on all rows within a certain range of the current row. The value for the column or expression by which your data is ordered within each partition falls within specified numeric or date units from the current row.

Code Listing 6: Sort a partition by date of hire and use a RANGE windowing clause

SQL> select last_name, first_name, department_id, hire_date, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY hire_date
  4          RANGE 90 PRECEDING) department_total
  5   from employee
  6   order by department_id, hire_date;
LAST_NAME  FIRST_NAME      DEPARTMENT_ID HIRE_DATE   SALARY DEPARTMENT_TOTAL
—————————  —————————— —————————————————— —————————   —————— ————————————————
Eckhardt   Emily                      10 07-JUL-04   100000           100000
Newton     Donald                     10 24-SEP-06    80000            80000
James      Betsy                      10 16-MAY-07    60000           190000
Friedli    Roger                      10 16-MAY-07    60000           190000
Michaels   Matthew                    10 16-MAY-07    70000           190000
Dovichi    Lori                       10 07-JUL-11
peterson   michael                    20 03-NOV-08    90000            90000
leblanc    mark                       20 06-MAR-09    65000            65000
Jeffrey    Thomas                     30 27-FEB-10   300000           300000
Wong       Theresa                    30 27-FEB-10    70000           370000
Newton     Frances                       14-SEP-05    75000            75000
11 rows selected.

The query in Listing 6 illustrates how the RANGE clause works. The query’s partition is sorted by HIRE_DATE. The query then specifies the following windowing clause:

RANGE 90 PRECEDING
This line means, “Provide a summary of the current row’s salary value together with the salary values of all previous rows whose HIRE_DATE value falls within 90 days preceding the HIRE_DATE value of the current row.” Note that within Department 10, only three rows have a DEPARTMENT_TOTAL value different from their SALARY value. The employees listed in these rows were all hired on the same date and therefore fall within the range of date values required for salary summation.

Also note that within Department 30, two employees were hired on the same date but only one of the rows lists a DEPARTMENT_TOTAL value different from its SALARY value. This result is due to the PRECEDING keyword in the RANGE clause. Effectively, this means, “Look at any rows that precede the current row before determining whether the HIRE_DATE units being sorted fall within the range of the current row’s HIRE_DATE.” No row precedes that of Thomas Jeffrey in Department 30, so his resultant DEPARTMENT_TOTAL value remains unchanged and is no different from his listed SALARY value.

Additionally, keep in mind that the three employees hired on May 16, 2007 all have a 00:00:00 time component to their hire date, as does the hire date for Thomas Jeffrey. Conversely, Theresa Wong was hired at 09:02:45, so Thomas Jeffrey's record comes before hers.

The query in Listing 7 illustrates the importance of using only columns or expressions of date or numeric datatypes. It tries to sort each partition by employee last name and first name. Because a RANGE windowing clause can determine only an appropriate range of values dependent upon numeric or date ranges—not textual or string ranges—it cannot determine the appropriate range and causes the query to fail.

Code Listing 7: RANGE windowing clause that uses an incorrect datatype

SQL> select last_name, first_name, department_id, hire_date, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY last_name, first_name
  4         RANGE 90 PRECEDING) department_total
  5  from employee
  6  order by department_id, hire_date;
   SUM (salary)
   *
ERROR at line 2:
ORA-30486: invalid window aggregation group in the window specification

Also, if your query’s analytic function uses a RANGE windowing clause, you will be able to use only one column or expression in the ORDER BY clause; ranges are one-dimensional. These restrictions do not apply to the ROWS windowing clause, which can be applied to any datatype and is not limited to a single column or expression in the ORDER BY clause.

Narrowing Your Viewpoint

In its most basic form, a window can be specified in one of three mutually exclusive ways. Table 1 shows the types of parameters that can be passed to the ROWS or RANGE windowing clauses.

Windowing Clause Parameter Description
current row The window begins and ends with the current row being processed.
UNBOUNDED PRECEDING The window begins with the first row of the current partition and ends with the current row being processed.
numeric expression PRECEDING ROWS clause – The window begins with the row that is numeric expression rows preceding the current row and ends with the current row being processed.
RANGE clause – The window begins with the row whose ORDER BY value is numeric expression rows less than, or preceding, the current row and ends with the current row being processed.


Table 1: Windowing clause parameters

So far, all the windows demonstrated in this article end at the current row and use preceding row or range values in their computations. You can also use the BETWEEN operator to specify a window in which the current row falls somewhere in the middle of the result set. The query in Listing 8 demonstrates that in addition to a ROWS or RANGE clause that specifies that your window starts with previous row values and ends with the current row being processed, you can also use the FOLLOWING parameter to look at rows following the current row being processed and make an evaluation based on those row values.

Code Listing 8: Query with a RANGE windowing clause that uses the BETWEEN and FOLLOWING parameters

SQL> select last_name, first_name, department_id, hire_date, salary,
  2     SUM (salary)
  3    OVER (PARTITION BY department_id ORDER BY hire_date
  4          RANGE BETWEEN 365 PRECEDING AND 365 FOLLOWING) department_total
  5  from employee
  6  order by department_id, hire_date;
LAST_NAME FIRST_NAME       DEPARTMENT_ID  HIRE_DATE   SALARY DEPARTMENT_TOTAL
————————— ———————————— ————————————————— —————————— ———————— ————————————————
Eckhardt   Emily                      10 07-JUL-04   100000           100000
Newton     Donald                     10 24-SEP-06    80000           270000
James      Betsy                      10 16-MAY-07    60000           270000
Friedli    Roger                      10 16-MAY-07    60000           270000
Michaels   Matthew                    10 16-MAY-07    70000           270000
Dovichi    Lori                       10 07-JUL-11
peterson   michael                    20 03-NOV-08    90000           155000
leblanc    mark                       20 06-MAR-09    65000           155000
Jeffrey    Thomas                     30 27-FEB-10   300000           370000
Wong       Theresa                    30 27-FEB-10    70000           370000
Newton     Frances                       14-SEP-05    75000            75000
11 rows selected.
Conclusion

Using analytic functions is a powerful way to get answers about your data that would otherwise require convoluted, possibly poorly performing SQL. Your reporting needs will dictate not only which analytic functions you use but also which windowing clauses (if any) will provide the reporting view into your data that best conveys meaningful results to your users. This article has demonstrated use of a common analytic function (SUM); the PARTITION and OVER clauses; the ROWS and RANGE windowing clauses; and several basic, common windowing clause parameter specifications. The next installment of SQL 101 will continue the discussion of analytic functions. To learn more details about what you can glean from using the Oracle analytic functions, review the documentation.

Next Steps

 READ SQL 101, Parts 1–9

READ more about
relational database design and concepts
 Oracle Database Concepts 11g Release 2 (11.2)
 Oracle Database SQL Language Reference 11g Release 1 (11.1)
 Oracle Database Data Warehousing Guide 11g Release 2 (11.2)
 Oracle SQL Developer User’s Guide Release 3.1

 DOWNLOAD the sample script for this article

DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.