March/April 2013
Part 9 in this series, “Having Sums, Averages, and Other Grouped Data” (Oracle Magazine, January/February 2013), introduced common SQL aggregate functions and the GROUP BY and HAVING clauses, showing how you can use them to manipulate single-row and grouped result set data to convey more-meaningful results. The discussion of aggregate functions segues logically into the subject of more-advanced SQL operations that use aggregations and other specific views of your data. This article is the first in a three-article sequence that introduces you to some commonly used analytic functions and their associated clauses. Analytic functions not only operate on multiple rows but also can perform operations such as ranking data, calculating running totals, and identifying changes between different time periods (to name a few)—all of which facilitate creation of queries that answer business questions for reporting purposes.
To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.
Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)
Increasing Your Bottom LineYou can use standard SQL to answer most data questions. However, pure SQL queries that answer questions such as “What is the running total of employee salary values as they are summed row by row?” aren’t easy to write and may not perform well over time. Analytic functions add extensions to SQL that make such operations faster-running and easier to code.
The query in Listing 1 demonstrates use of the SUM analytic function. The query results list all employees alongside their respective salary values and display a cumulative total of their salaries.
Code Listing 1: Obtain a cumulative salary total, row by row, for all employees
SQL> set feedback on SQL> set lines 32000 SQL> select last_name, first_name, salary, 2 SUM (salary) 3 OVER (ORDER BY last_name, first_name) running_total 4 from employee 5 order by last_name, first_name; LAST_NAME FIRST_NAME SALARY RUNNING_TOTAL ————————— ——————————— ————————————————————————————— ————————————— Dovichi Lori Eckhardt Emily 100000 100000 Friedli Roger 60000 160000 James Betsy 60000 220000 Jeffrey Thomas 300000 520000 Michaels Matthew 70000 590000 Newton Donald 80000 670000 Newton Frances 75000 745000 Wong Theresa 70000 815000 leblanc mark 65000 880000 peterson michael 90000 970000 11 rows selected.
This result is accomplished with the query line that reads
SUM (salary) OVER (ORDER BY last_name, first_name) running_totalAnatomy of an Analytic Function
Learning the syntax of an analytic function is half the battle in harnessing its power for efficient query processing. The syntax for the analytic query line in Listing 1 is
FUNCTION_NAME( column | expression,column | expression,... ) OVER ( Order-by-Clause )
In Listing 1, the function name is SUM. The argument to the SUM function is the SALARY column (although it could also be an expression). The OVER clause identifies this function call as an analytic function (as opposed to an aggregate function). The ORDER BY clause identifies the piece of data this analytic function will be performed “over.”
This series will discuss scalar subqueries in a later installment. Suffice it to say for this article’s purposes that using a scalar subquery is another method you could employ to achieve the result obtained in Listing 1. However, it would perform significantly more slowly and its syntax would be more difficult to write than the analytic query line in Listing 1.
Code Listing 2: Obtain a cumulative salary total, row by row, by department
SQL> select last_name, first_name, department_id, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY last_name, first_name) department_total
4 from employee
5 order by department_id, last_name, first_name;
LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL
————————— —————————— ———————————————— ————————— —————————————————
Dovichi Lori 10
Eckhardt Emily 10 100000 100000
Friedli Roger 10 60000 160000
James Betsy 10 60000 220000
Michaels Matthew 10 70000 290000
Newton Donald 10 80000 370000
leblanc mark 20 65000 65000
peterson michael 20 90000 155000
Jeffrey Thomas 30 300000 300000
Wong Theresa 30 70000 370000
Newton Frances 75000 75000
11 rows selected.
The query in Listing 2 cumulatively sums the salary values of the employee rows within each department. The PARTITION clause ensures that the analytic function is applied independently to each department group (or partition). You can see that the cumulative total resets after the department changes from 10 to 20, and again from 20 to 30, and finally from 30 to an employee record that has no department ID. The analytic function syntax including a PARTITION clause expands as follows on the syntax used in the Listing 1 example:
FUNCTION_NAME( argument,argument,… ) OVER ( Partition-Clause Order-by-Clause )A Separate Order
The queries in Listings 1 and 2 sort the rows returned by employee last name and first name. The query in Listing 3 uses a slightly different ordering criterion for the analytic function computation.
Code Listing 3: Compute each row based on salary value
SQL> select last_name, first_name, department_id, salary, 2 SUM (salary) 3 OVER (PARTITION BY department_id ORDER BY salary) department_total 4 from employee 5 order by department_id, salary, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL ————————— ——————————— ————————————— ————————— ———————————————— Friedli Roger 10 60000 120000 James Betsy 10 60000 120000 Michaels Matthew 10 70000 190000 Newton Donald 10 80000 270000 Eckhardt Emily 10 100000 370000 Dovichi Lori 10 370000 leblanc mark 20 65000 65000 peterson michael 20 90000 155000 Wong Theresa 30 70000 70000 Jeffrey Thomas 30 300000 370000 Newton Frances 75000 75000 11 rows selected.
The analytic function in Listing 3 computes the department total values based on salary, in ascending order for each partition, with NULL salary values evaluated last. You can see that the record for Lori Dovichi—the only record with a NULL salary value—ends up with the same DEPARTMENT_TOTAL value as the record in the same department (Emily Eckhardt) that has the highest salary value.
An analytic function’s ORDER BY clause works independently from the ORDER BY clause of the overall query that contains the analytic function. Little or no correlation exists between the two unless they use the same column or expression listings in the same order. In Listing 4, for example, note that even though the data returned is listed in department/last name/first name order (like the result sets in Listings 1 and 2), the values returned for the DEPARTMENT_TOTAL expression match the values for those returned in Listing 3. And even though Betsy James and Lori Dovichi appear in a different order in the result sets of Listings 3 and 4, the values returned for their respective department total computations are the same.
Code Listing 4: Sort the data returned from the query in Listing 3 differently
SQL> select last_name, first_name, department_id, salary, 2 SUM (salary) 3 OVER (PARTITION BY department_id ORDER BY salary) department_total 4 from employee 5 order by department_id, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL ————————— ——————————— ————————————— ————————— ———————————————— Dovichi Lori 10 370000 Eckhardt Emily 10 100000 370000 Friedli Roger 10 60000 120000 James Betsy 10 60000 120000 Michaels Matthew 10 70000 190000 Newton Donald 10 80000 270000 leblanc mark 20 65000 65000 peterson michael 20 90000 155000 Jeffrey Thomas 30 300000 370000 Wong Theresa 30 70000 70000 Newton Frances 75000 75000 11 rows selected.Your Choice of Window
An analytic function might or might not include a windowing clause. A windowing clause is a set of parameters or keywords that defines the group (or window) of rows within a particular partition that will be evaluated for analytic function computation. The query in Listing 1 uses a windowing clause by default, because it uses an ORDER BY clause. An ORDER BY clause, in the absence of any further windowing clause parameters, effectively adds a default windowing clause: RANGE UNBOUNDED PRECEDING, which means, “The current and previous rows in the current partition are the rows that should be used in the computation.” When an ORDER BY clause isn’t accompanied by a PARTITION clause, the entire set of rows used by the analytic function is the default current partition.
The queries in Listings 3 and 4 include a PARTITION clause but use no windowing clause parameters. In the calculated results, the DEPARTMENT_TOTAL values for Betsy James and Roger Friedli are identical. In the absence of windowing clause parameters, when your query’s analytic function orders by a particular column or expression within its partition and two or more rows have the same value, the analytic function is applied to each of them and returns the same result, because the analytic function cannot ascertain the order in which they should be evaluated.
The query in Listing 5 uses the ROWS 2 PRECEDING windowing clause to sum the current row’s salary value with just the two preceding rows’ salary values. Even though the employee listed just above Matthew Michaels, Betsy James, has a DEPARTMENT_TOTAL value of 220000, the DEPARTMENT_TOTAL value listed for Matthew Michaels is 190000. This occurs because only the SALARY value for Matthew Michaels, 70000, is summed with the SALARY values of the two rows directly preceding his—those of Betsy James and Roger Friedli.
Code Listing 5: Add a ROWS windowing clause
SQL> select last_name, first_name, department_id, salary, 2 SUM (salary) 3 OVER (PARTITION BY department_id ORDER BY last_name, first_name 4 ROWS 2 PRECEDING) department_total 5 from employee 6 order by department_id, last_name, first_name; LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL ————————— ——————————— ————————————— ————————— ———————————————— Dovichi Lori 10 Eckhardt Emily 10 100000 100000 Friedli Roger 10 60000 160000 James Betsy 10 60000 220000 Michaels Matthew 10 70000 190000 Newton Donald 10 80000 210000 leblanc mark 20 65000 65000 peterson michael 20 90000 155000 Jeffrey Thomas 30 300000 300000 Wong Theresa 30 70000 370000 Newton Frances 75000 75000 11 rows selected.
If a windowing clause that uses parameters is added to an analytic function, the resulting syntax will look like this:
FUNCTION_NAME( argument,argument,… ) OVER ( Partition-Clause Order-by-Clause Windowing-Clause)Multiple Windows into Your Data
The windowing clause provides either a sliding or an anchored view of data, depending on which parameters you pass to it. Queries with just an ORDER BY clause (such as those in Listings 1, 2, 3, and 4) provide an anchored view of the data: it begins with the first row (or top) of the partition and ends with the current row being processed. The query in Listing 5 results in a sliding view of the data, because the DEPARTMENT_TOTAL value for each row can change, depending on how the data is sorted (ordered) within each partition.
Listing 5 demonstrates use of the ROWS clause as the parameter input to the windowing clause. You can also create a sliding view of data by using the RANGE clause. Unlike the ROWS clause, the RANGE windowing clause can be used only with ORDER BY clauses containing columns or expressions of numeric or date datatypes. It has this datatype requirement because it operates on all rows within a certain range of the current row. The value for the column or expression by which your data is ordered within each partition falls within specified numeric or date units from the current row.
Code Listing 6: Sort a partition by date of hire and use a RANGE windowing clause
SQL> select last_name, first_name, department_id, hire_date, salary, 2 SUM (salary) 3 OVER (PARTITION BY department_id ORDER BY hire_date 4 RANGE 90 PRECEDING) department_total 5 from employee 6 order by department_id, hire_date; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY DEPARTMENT_TOTAL ————————— —————————— —————————————————— ————————— —————— ———————————————— Eckhardt Emily 10 07-JUL-04 100000 100000 Newton Donald 10 24-SEP-06 80000 80000 James Betsy 10 16-MAY-07 60000 190000 Friedli Roger 10 16-MAY-07 60000 190000 Michaels Matthew 10 16-MAY-07 70000 190000 Dovichi Lori 10 07-JUL-11 peterson michael 20 03-NOV-08 90000 90000 leblanc mark 20 06-MAR-09 65000 65000 Jeffrey Thomas 30 27-FEB-10 300000 300000 Wong Theresa 30 27-FEB-10 70000 370000 Newton Frances 14-SEP-05 75000 75000 11 rows selected.
The query in Listing 6 illustrates how the RANGE clause works. The query’s partition is sorted by HIRE_DATE. The query then specifies the following windowing clause:
RANGE 90 PRECEDINGThis line means, “Provide a summary of the current row’s salary value together with the salary values of all previous rows whose HIRE_DATE value falls within 90 days preceding the HIRE_DATE value of the current row.” Note that within Department 10, only three rows have a DEPARTMENT_TOTAL value different from their SALARY value. The employees listed in these rows were all hired on the same date and therefore fall within the range of date values required for salary summation.
Also note that within Department 30, two employees were hired on the same date but only one of the rows lists a DEPARTMENT_TOTAL value different from its SALARY value. This result is due to the PRECEDING keyword in the RANGE clause. Effectively, this means, “Look at any rows that precede the current row before determining whether the HIRE_DATE units being sorted fall within the range of the current row’s HIRE_DATE.” No row precedes that of Thomas Jeffrey in Department 30, so his resultant DEPARTMENT_TOTAL value remains unchanged and is no different from his listed SALARY value.
Additionally, keep in mind that the three employees hired on May 16, 2007 all have a 00:00:00 time component to their hire date, as does the hire date for Thomas Jeffrey. Conversely, Theresa Wong was hired at 09:02:45, so Thomas Jeffrey's record comes before hers.
The query in Listing 7 illustrates the importance of using only columns or expressions of date or numeric datatypes. It tries to sort each partition by employee last name and first name. Because a RANGE windowing clause can determine only an appropriate range of values dependent upon numeric or date ranges—not textual or string ranges—it cannot determine the appropriate range and causes the query to fail.
Code Listing 7: RANGE windowing clause that uses an incorrect datatype
SQL> select last_name, first_name, department_id, hire_date, salary, 2 SUM (salary) 3 OVER (PARTITION BY department_id ORDER BY last_name, first_name 4 RANGE 90 PRECEDING) department_total 5 from employee 6 order by department_id, hire_date; SUM (salary) * ERROR at line 2: ORA-30486: invalid window aggregation group in the window specification
Also, if your query’s analytic function uses a RANGE windowing clause, you will be able to use only one column or expression in the ORDER BY clause; ranges are one-dimensional. These restrictions do not apply to the ROWS windowing clause, which can be applied to any datatype and is not limited to a single column or expression in the ORDER BY clause.
Narrowing Your ViewpointIn its most basic form, a window can be specified in one of three mutually exclusive ways. Table 1 shows the types of parameters that can be passed to the ROWS or RANGE windowing clauses.
Windowing Clause Parameter | Description |
current row | The window begins and ends with the current row being processed. |
UNBOUNDED PRECEDING | The window begins with the first row of the current partition and ends with the current row being processed. |
numeric expression PRECEDING | ROWS clause – The window begins with the row that is numeric expression rows preceding the current row and ends with the current row being processed. RANGE clause – The window begins with the row whose ORDER BY value is numeric expression rows less than, or preceding, the current row and ends with the current row being processed. |
Table 1: Windowing clause parameters
So far, all the windows demonstrated in this article end at the current row and use preceding row or range values in their computations. You can also use the BETWEEN operator to specify a window in which the current row falls somewhere in the middle of the result set. The query in Listing 8 demonstrates that in addition to a ROWS or RANGE clause that specifies that your window starts with previous row values and ends with the current row being processed, you can also use the FOLLOWING parameter to look at rows following the current row being processed and make an evaluation based on those row values.
Code Listing 8: Query with a RANGE windowing clause that uses the BETWEEN and FOLLOWING parameters
SQL> select last_name, first_name, department_id, hire_date, salary, 2 SUM (salary) 3 OVER (PARTITION BY department_id ORDER BY hire_date 4 RANGE BETWEEN 365 PRECEDING AND 365 FOLLOWING) department_total 5 from employee 6 order by department_id, hire_date; LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY DEPARTMENT_TOTAL ————————— ———————————— ————————————————— —————————— ———————— ———————————————— Eckhardt Emily 10 07-JUL-04 100000 100000 Newton Donald 10 24-SEP-06 80000 270000 James Betsy 10 16-MAY-07 60000 270000 Friedli Roger 10 16-MAY-07 60000 270000 Michaels Matthew 10 16-MAY-07 70000 270000 Dovichi Lori 10 07-JUL-11 peterson michael 20 03-NOV-08 90000 155000 leblanc mark 20 06-MAR-09 65000 155000 Jeffrey Thomas 30 27-FEB-10 300000 370000 Wong Theresa 30 27-FEB-10 70000 370000 Newton Frances 14-SEP-05 75000 75000 11 rows selected.Conclusion
Using analytic functions is a powerful way to get answers about your data that would otherwise require convoluted, possibly poorly performing SQL. Your reporting needs will dictate not only which analytic functions you use but also which windowing clauses (if any) will provide the reporting view into your data that best conveys meaningful results to your users. This article has demonstrated use of a common analytic function (SUM); the PARTITION and OVER clauses; the ROWS and RANGE windowing clauses; and several basic, common windowing clause parameter specifications. The next installment of SQL 101 will continue the discussion of analytic functions. To learn more details about what you can glean from using the Oracle analytic functions, review the documentation.
Next Steps READ SQL 101, Parts 1–9
READ more about
relational database design and concepts
Oracle Database Concepts 11g Release 2 (11.2)
Oracle Database SQL Language Reference 11g Release 1 (11.1)
Oracle Database Data Warehousing Guide 11g Release 2 (11.2)
Oracle SQL Developer User’s Guide Release 3.1
DOWNLOAD the sample script for this article
DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.