I want to run the following code, but Oracle doesn't like the distinct in the second column build. Essentially, I want to develop a running (cumulative) patient count for every incremental row. While pt_id is unique, com_pt_id is not; therefore there will be instances where the edw_pts_cum_2 starts to be less than the pts_cum count.
Here is sample of what I'm trying to accomplish. Do have any thoughts on how to develop column edw_pts_cum_2?
Cum Distance pts_cum edw_pts_cum_2 Pt_id com_pt_id
10 1 1 101 1
20 2 2 201 2
30 3 3 301 3
40 4 3 102 1
create table lab_data_res_sub_xy_com_sub
nologging
nocache
pctfree 0
as
select a.*,
count(*) over (order by distance asc
rows between unbounded preceding and current row)
as pts_cum,
count(distinct com_pt_id) over (order by distance
rows between unbounded preceding and current row)
as edw_pts_cum_2
from lab_data_res_sub_xy_com_dda a
order by distance asc, test_epatient_id;
well, the problem is..... fairly hard....
think about this, you order by distance and say the com_pt_id data looks like this
1
2
1
2
3
1
2
4
You have to remember what values you've already hit - for example, we keep hitting 1 over and over - we need to remember "1" has already been hit. Even if there are 1,000,000 intervening values - we need to remember "1 has been hit" so as to not count it again
Here are two approaches - one is a scalar subquery (rt_2b). I do not think (well, I know) it won't scale very well if the result set is huge. The other is an analytic on an analytic. We "mark" the first com_pt_id by distance and only count the first one.
ops$tkyte%ORA10GR2> create table t
2 as
3 select hiredate distance, empno pt_id, mgr com_pt_id
4 from scott.emp;
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select pt_id, com_pt_id, distance,
2 count(*) over (order by distance) rt_1,
3 count(case when rn=1 then com_pt_id end) over (order by distance) rt_2a,
4 (select count(distinct com_pt_id) from t t2 where t2.distance <= t1.distance) rt_2b
5 from (select pt_id, com_pt_id, distance,
6 row_number() over (partition by com_pt_id order by distance) rn
7 from t) t1
8 order by distance
9 /
PT_ID COM_PT_ID DISTANCE RT_1 RT_2A RT_2B
---------- ---------- --------- ---------- ---------- ----------
7369 7902 17-DEC-80 1 1 1
7499 7698 20-FEB-81 2 2 2
7521 7698 22-FEB-81 3 2 2
7566 7839 02-APR-81 4 3 3
7698 7839 01-MAY-81 5 3 3
7782 7839 09-JUN-81 6 3 3
7844 7698 08-SEP-81 7 3 3
7654 7698 28-SEP-81 8 3 3
7839 17-NOV-81 9 3 3
7902 7566 03-DEC-81 11 4 4
7900 7698 03-DEC-81 11 4 4
7934 7782 23-JAN-82 12 5 5
7788 7566 09-DEC-82 13 5 5
7876 7788 12-JAN-83 14 6 6
14 rows selected.