Dear Colleagues,
I have come across a problem when ranking regression slopes over a database link.
I have a table containing dates and file volumes for downloads from several sources. I want to make a Top N for the sources most growing in their volumes. For this, I calculate slopes for all the sources using regr_slope() function and then rank the slopes using row_number() over (order by slope desc nulls last) as rn_top. Then I want to select rows where rn_top <= :n.
The results of the query obtained over a database link differ from the results obtained locally: locally obtained results are correct, remote ones demonstrate strange behavior. Regression is correct, ranking is correct; select * from (…ranking…) is NOT correct even without any “where” condition – the slopes are correct, but their ranking is not.
This effect does not take place for constants, nor for max/min/avg functions – only for regr_* functions. The effect is stable for two versions of remote servers: Oracle Database 19c Enterprise Edition and Oracle Database 11g Express Edition, and Oracle Database 19c Enterprise Edition wherefrom the database link goes.
For reproducing:
CREATE TABLE "TEST_003" ("TASK_SOURCE_ID" NUMBER(6,0), "TASK_DATE_DOWNLOADED" DATE, "TASK_FILE_SIZE" VARCHAR2(128)) ; /* historically task_file_size is varchar in original table */
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('26.08.24','DD.MM.RR'),'8266');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('26.08.24','DD.MM.RR'),'6925');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('26.08.24','DD.MM.RR'),'8783');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('26.08.24','DD.MM.RR'),'6590');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('26.08.24','DD.MM.RR'),'7204');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('122994',to_date('26.08.24','DD.MM.RR'),'59904');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('120116',to_date('27.08.24','DD.MM.RR'),'35125');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('27.08.24','DD.MM.RR'),'8226');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('27.08.24','DD.MM.RR'),'8784');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('27.08.24','DD.MM.RR'),'6591');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('27.08.24','DD.MM.RR'),'7206');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('28.08.24','DD.MM.RR'),'8230');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('28.08.24','DD.MM.RR'),'6928');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('28.08.24','DD.MM.RR'),'8786');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('28.08.24','DD.MM.RR'),'6592');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('28.08.24','DD.MM.RR'),'7205');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('29.08.24','DD.MM.RR'),'8230');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('29.08.24','DD.MM.RR'),'6928');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('29.08.24','DD.MM.RR'),'8787');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('29.08.24','DD.MM.RR'),'6592');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('29.08.24','DD.MM.RR'),'7207');
Commit;
select * from (
-- vvvv -- correct
select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
from
(
select task_source_id
, regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
from test_003@wr_dbl -- << -- DBlink that spoils the outer select
group by task_source_id
)
order by rn_top
-- ^^^^ -- correct
) order by rn_top
;
The correct results for local running of the entire query and for the remote running of the ranking subquery:
1 114658 1,4
2 114657 1,1
3 114660 0,8
4 114659 0,7
5 114656 -10,4
6 120116 null
7 122994 null
The
incorrect results for remote running of the entire query:
1 114660 0,8
2 114659 0,7
3 114658 1,4
4 114657 1,1
5 114656 -10,4
6 120116 null
7 122994 null
Thank you for your attention.