Skip to Main Content
  • Questions
  • A problem when ranking regression slopes over a database link

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Yury.

Asked: September 04, 2024 - 7:41 am UTC

Last updated: September 05, 2024 - 4:06 am UTC

Version: Oracle Database 19c Enterprise Edition

Viewed 100+ times

You Asked

Dear Colleagues,

I have come across a problem when ranking regression slopes over a database link.

I have a table containing dates and file volumes for downloads from several sources. I want to make a Top N for the sources most growing in their volumes. For this, I calculate slopes for all the sources using regr_slope() function and then rank the slopes using row_number() over (order by slope desc nulls last) as rn_top. Then I want to select rows where rn_top <= :n.

The results of the query obtained over a database link differ from the results obtained locally: locally obtained results are correct, remote ones demonstrate strange behavior. Regression is correct, ranking is correct; select * from (…ranking…) is NOT correct even without any “where” condition – the slopes are correct, but their ranking is not.

This effect does not take place for constants, nor for max/min/avg functions – only for regr_* functions. The effect is stable for two versions of remote servers: Oracle Database 19c Enterprise Edition and Oracle Database 11g Express Edition, and Oracle Database 19c Enterprise Edition wherefrom the database link goes.

For reproducing:

CREATE TABLE "TEST_003" ("TASK_SOURCE_ID" NUMBER(6,0), "TASK_DATE_DOWNLOADED" DATE, "TASK_FILE_SIZE" VARCHAR2(128)) ; /* historically task_file_size is varchar in original table */
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('26.08.24','DD.MM.RR'),'8266');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('26.08.24','DD.MM.RR'),'6925');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('26.08.24','DD.MM.RR'),'8783');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('26.08.24','DD.MM.RR'),'6590');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('26.08.24','DD.MM.RR'),'7204');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('122994',to_date('26.08.24','DD.MM.RR'),'59904');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('120116',to_date('27.08.24','DD.MM.RR'),'35125');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('27.08.24','DD.MM.RR'),'8226');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('27.08.24','DD.MM.RR'),'8784');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('27.08.24','DD.MM.RR'),'6591');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('27.08.24','DD.MM.RR'),'7206');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('28.08.24','DD.MM.RR'),'8230');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('28.08.24','DD.MM.RR'),'6928');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('28.08.24','DD.MM.RR'),'8786');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('28.08.24','DD.MM.RR'),'6592');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('28.08.24','DD.MM.RR'),'7205');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('29.08.24','DD.MM.RR'),'8230');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('29.08.24','DD.MM.RR'),'6928');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('29.08.24','DD.MM.RR'),'8787');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('29.08.24','DD.MM.RR'),'6592');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('29.08.24','DD.MM.RR'),'7207');
Commit;

select * from (
-- vvvv -- correct
select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
from
(
select task_source_id
, regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
from test_003@wr_dbl -- << -- DBlink that spoils the outer select
group by task_source_id
)
order by rn_top
-- ^^^^ -- correct
) order by rn_top
;


The correct results for local running of the entire query and for the remote running of the ranking subquery:
1 114658 1,4
2 114657 1,1
3 114660 0,8
4 114659 0,7
5 114656 -10,4
6 120116 null
7 122994 null


The incorrect results for remote running of the entire query:
1 114660 0,8
2 114659 0,7
3 114658 1,4
4 114657 1,1
5 114656 -10,4
6 120116 null
7 122994 null


Thank you for your attention.

and Connor said...

Unless I'm misunderstanding your instructions, I can't reproduce on my set up

v21 local => v19 remote
=================
SQL> select * from (
  2  -- vvvv -- correct
  3  select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
  4  from
  5  (
  6  select task_source_id
  7  , regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
  8  from test_003@db19pdb1 -- << -- DBlink that spoils the outer select
  9  group by task_source_id
 10  )
 11  order by rn_top
 12  -- ^^^^ -- correct
 13  ) order by rn_top
 14  ;

    RN_TOP TASK_SOURCE_ID      SLOPE
---------- -------------- ----------
         1         114658        1.4
         2         114657        1.1
         3         114660         .8
         4         114659         .7
         5         114656      -10.4
         6         120116
         7         122994

7 rows selected.




v19 local => v11.2.0.4 remote
=================
SQL> select * from (
  2  -- vvvv -- correct
  3  select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
  4  from
  5  (
  6  select task_source_id
  7  , regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
  8  from test_003@db11 -- << -- DBlink that spoils the outer select
  9  group by task_source_id
 10  )
 11  order by rn_top
 12  -- ^^^^ -- correct
 13  ) order by rn_top
 14  ;

    RN_TOP TASK_SOURCE_ID      SLOPE
---------- -------------- ----------
         1         114658        1.4
         2         114657        1.1
         3         114660         .8
         4         114659         .7
         5         114656      -10.4
         6         120116
         7         122994

7 rows selected.



Am I missing something?

Rating

  (1 rating)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

Yury, September 05, 2024 - 9:31 am UTC

Dear Connor,

Thank you for your attention. Seems, all your steps were correct, but I have started over and have got the same issue. Comments are limited, so briefly:

- Created a new user AskTom at remote DB, created and populated a table there;
- Created a new DB link from Monitor at local DB to the remote AskTom
create database link "AT_DBL" connect to "ASKTOM" identified by "..." using '.../...';

- Ran the query at remote AskTom without DBlink - correct;
- Ran the query at local Monitor over DBlink to remote AskTom - wrong;
- Ran the ranking subquery at local Monitor over DBlink to remote AskTom - correct;
Worked around:
- Created a view of the ranking subquery at remote AskTom;
- Ran select *... query at local Monitor from the view at remote AskTom - correct, fixed!

Seems, something exists.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.