You Asked

I was searching a large emissions database for pairs of columns with correlated values.
My first effort identified a large number of sets with an R2 value of 1 based on blindly applying the REGR_R2 function to discrete pairs of numeric data columns.

When I started reviewing outputs I noticed that many of the REGR_R2 results were not reliable because REGR_R2 appears to return a value of 1 when the y column is constant (i.e., y_col) in the syntax REGR_R2(y_col, x_col).

I plan to add a standard deviation calculation to the script prior to the REGR_R2 calculation and forgo the REGR_R2 calculation when the standard deviation of y_col = 0.

But why does REGR_R2 return a value of 1 when the y parameter is constant and the x parameter is variable? Shouldn't it return either 0 or null?

table xy_values

y_val x_val
row 1 0.0038 0.023
row 2 0.0038 0.022
row 3 0.0038 0.085
row 4 0.0038 0.087
row 5 0.0038 0.019

select regr_r2(xy_values.y_val, xy_values.x_val) 
from  xy_values;

yields
REGR_R2(Y_VAL,X_VAL) = 1

Thanks,
Stephen

and Connor said...

As per the docs:

https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9

NULL if VAR_POP(expr2)  = 0

                        1 if VAR_POP(expr1)  = 0 and 
                             VAR_POP(expr2) != 0

POWER(CORR(expr1,expr),2) if VAR_POP(expr1)  > 0 and
                             VAR_POP(expr2  != 0

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

PL/SQL demos

PL/SQL docs

REGR_R2 returns a value of 1 when y column contains a constant value (e.g., all rows have value of 0.0038).

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

PL/SQL demos

PL/SQL docs