Skip to Main Content
  • Questions
  • A problem when ranking regression slopes over a database link

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Yury.

Asked: September 04, 2024 - 7:41 am UTC

Last updated: September 30, 2024 - 7:09 am UTC

Version: Oracle Database 19c Enterprise Edition

Viewed 1000+ times

You Asked

Dear Colleagues,

I have come across a problem when ranking regression slopes over a database link.

I have a table containing dates and file volumes for downloads from several sources. I want to make a Top N for the sources most growing in their volumes. For this, I calculate slopes for all the sources using regr_slope() function and then rank the slopes using row_number() over (order by slope desc nulls last) as rn_top. Then I want to select rows where rn_top <= :n.

The results of the query obtained over a database link differ from the results obtained locally: locally obtained results are correct, remote ones demonstrate strange behavior. Regression is correct, ranking is correct; select * from (…ranking…) is NOT correct even without any “where” condition – the slopes are correct, but their ranking is not.

This effect does not take place for constants, nor for max/min/avg functions – only for regr_* functions. The effect is stable for two versions of remote servers: Oracle Database 19c Enterprise Edition and Oracle Database 11g Express Edition, and Oracle Database 19c Enterprise Edition wherefrom the database link goes.

For reproducing:

CREATE TABLE "TEST_003" ("TASK_SOURCE_ID" NUMBER(6,0), "TASK_DATE_DOWNLOADED" DATE, "TASK_FILE_SIZE" VARCHAR2(128)) ; /* historically task_file_size is varchar in original table */
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('26.08.24','DD.MM.RR'),'8266');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('26.08.24','DD.MM.RR'),'6925');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('26.08.24','DD.MM.RR'),'8783');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('26.08.24','DD.MM.RR'),'6590');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('26.08.24','DD.MM.RR'),'7204');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('122994',to_date('26.08.24','DD.MM.RR'),'59904');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('120116',to_date('27.08.24','DD.MM.RR'),'35125');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('27.08.24','DD.MM.RR'),'8226');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('27.08.24','DD.MM.RR'),'8784');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('27.08.24','DD.MM.RR'),'6591');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('27.08.24','DD.MM.RR'),'7206');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('28.08.24','DD.MM.RR'),'8230');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('28.08.24','DD.MM.RR'),'6928');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('28.08.24','DD.MM.RR'),'8786');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('28.08.24','DD.MM.RR'),'6592');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('28.08.24','DD.MM.RR'),'7205');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114656',to_date('29.08.24','DD.MM.RR'),'8230');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114657',to_date('29.08.24','DD.MM.RR'),'6928');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114658',to_date('29.08.24','DD.MM.RR'),'8787');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114659',to_date('29.08.24','DD.MM.RR'),'6592');
Insert into TEST_003 (TASK_SOURCE_ID,TASK_DATE_DOWNLOADED,TASK_FILE_SIZE) values ('114660',to_date('29.08.24','DD.MM.RR'),'7207');
Commit;

select * from (
-- vvvv -- correct
select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
from
(
select task_source_id
, regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
from test_003@wr_dbl -- << -- DBlink that spoils the outer select
group by task_source_id
)
order by rn_top
-- ^^^^ -- correct
) order by rn_top
;


The correct results for local running of the entire query and for the remote running of the ranking subquery:
1 114658 1,4
2 114657 1,1
3 114660 0,8
4 114659 0,7
5 114656 -10,4
6 120116 null
7 122994 null


The incorrect results for remote running of the entire query:
1 114660 0,8
2 114659 0,7
3 114658 1,4
4 114657 1,1
5 114656 -10,4
6 120116 null
7 122994 null


Thank you for your attention.

and Connor said...

Unless I'm misunderstanding your instructions, I can't reproduce on my set up

v21 local => v19 remote
=================
SQL> select * from (
  2  -- vvvv -- correct
  3  select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
  4  from
  5  (
  6  select task_source_id
  7  , regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
  8  from test_003@db19pdb1 -- << -- DBlink that spoils the outer select
  9  group by task_source_id
 10  )
 11  order by rn_top
 12  -- ^^^^ -- correct
 13  ) order by rn_top
 14  ;

    RN_TOP TASK_SOURCE_ID      SLOPE
---------- -------------- ----------
         1         114658        1.4
         2         114657        1.1
         3         114660         .8
         4         114659         .7
         5         114656      -10.4
         6         120116
         7         122994

7 rows selected.




v19 local => v11.2.0.4 remote
=================
SQL> select * from (
  2  -- vvvv -- correct
  3  select row_number() over (order by slope desc nulls last) as rn_top, task_source_id, round(slope, 1) /*for better reading results*/ as slope
  4  from
  5  (
  6  select task_source_id
  7  , regr_slope(to_number(task_file_size), task_date_downloaded - to_date('01.01.2022', 'dd.mm.yyyy')) as slope
  8  from test_003@db11 -- << -- DBlink that spoils the outer select
  9  group by task_source_id
 10  )
 11  order by rn_top
 12  -- ^^^^ -- correct
 13  ) order by rn_top
 14  ;

    RN_TOP TASK_SOURCE_ID      SLOPE
---------- -------------- ----------
         1         114658        1.4
         2         114657        1.1
         3         114660         .8
         4         114659         .7
         5         114656      -10.4
         6         120116
         7         122994

7 rows selected.



Am I missing something?

Rating

  (2 ratings)

Comments

Yury, September 05, 2024 - 9:31 am UTC

Dear Connor,

Thank you for your attention. Seems, all your steps were correct, but I have started over and have got the same issue. Comments are limited, so briefly:

- Created a new user AskTom at remote DB, created and populated a table there;
- Created a new DB link from Monitor at local DB to the remote AskTom
create database link "AT_DBL" connect to "ASKTOM" identified by "..." using '.../...';

- Ran the query at remote AskTom without DBlink - correct;
- Ran the query at local Monitor over DBlink to remote AskTom - wrong;
- Ran the ranking subquery at local Monitor over DBlink to remote AskTom - correct;
Worked around:
- Created a view of the ranking subquery at remote AskTom;
- Ran select *... query at local Monitor from the view at remote AskTom - correct, fixed!

Seems, something exists.
Connor McDonald
September 17, 2024 - 5:10 am UTC

Ideally we'd like to see your script (where it fails) from top to bottom - full definition, data population etc etc...

Yury Kirchin, September 26, 2024 - 11:00 am UTC

Dear Connor,

Sorry for delay. Seems, the problem is in the local DB server wherefrom the request goes via the DBlink to the remote DB server.

Initially the problem was discovered at following DB versions:
- Remote: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0
- Local: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0

Then it was repeatedly discovered at following DB versions:
- Remote (another): Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
- Local (same): Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0

Then I wrote the question and the problem was not reproduced at your set up.

Then I changed the local DB server - and the problem was NOT reproduced any more!
- Remote (same): Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
- Local (another): Oracle Database 10g Express Edition Release 10.2.0.1.0 - Product

These let me to suppose that the problem is in some way related to one particular version of the LOCAL database: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0.

Would you still like the entire script? It may be longer than the comment form allows.

Regards,
Yury
Connor McDonald
September 30, 2024 - 7:09 am UTC

I suppose in all cases, I'm going to recommend the same thing

19.3.0.0.0 => nearly 20 sets of patches out of date now
11.2.0.2.0 => out of support, and even if you had to stay on it, you'd definitely want 11.2.0.4

If there is a bug in particular patch level of these databases, then Support are going to ask you to do exactly this - can you reproduce on current/supported versions

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.