Skip to Main Content
  • Questions
  • Random distribution of data based on a column value

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Rizwan.

Asked: June 29, 2006 - 1:00 am UTC

Last updated: June 30, 2006 - 4:29 pm UTC

Version: 9.2.0.5

Viewed 1000+ times

You Asked

Tom,
Thanks in advance for your time.

I have a table:

create table t ( id number, priority number, weight number );

with this data in it:

insert into t values ( 1, 100, 60);
insert into t values ( 2, 100, 20);
insert into t values ( 3, 100, 10);
insert into t values ( 4, 100, 10);


insert into t values ( 5, 200, 10);
insert into t values ( 6, 200, 80);
insert into t values ( 7, 200, 10);

I need to write a query which returns the id such that for the same priority(egs. 100) I get 1 returned approximately 60% times, 2 returned 20% times and 3,4 returned 10% times.

Is there anyway I can do it by analytics or otherwise?

Regards,
Rizwan



and Tom said...

ops$tkyte@ORA10GR2> create table data( id number );
Table created.


ops$tkyte@ORA10GR2> variable n number
ops$tkyte@ORA10GR2> exec :n := 100
PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> select id, nvl( lag(hiweight) over (order by id),0) loweight, hiweight
2 from (
3 select id, sum(weight/100) over (order by id) hiweight
4 from t
5 where priority = :n
6 )
7 /

ID LOWEIGHT HIWEIGHT
---------- ---------- ----------
1 0 .6
2 .6 .8
3 .8 .9
4 .9 1

we need a range...

Yes, in the following rownum is relevant - you could also use the undocumented /*+ MATERIALIZE */ hint instead, but rownum does it (materializes the view, making rnd deterministic with a single row)


ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> with rnd
2 as
3 (select dbms_random.value*rownum r
4 from dual
5 ),
6 data
7 as
8 (
9 select id, nvl( lag(hiweight) over (order by id),0) loweight, hiweight
10 from (
11 select id, sum(weight/100) over (order by id) hiweight
12 from t
13 where priority = :n
14 )
15 )
16 select *
17 from data, rnd
18 where loweight < rnd.r
19 and hiweight >= rnd.r
20 /

ID LOWEIGHT HIWEIGHT R
---------- ---------- ---------- ----------
1 0 .6 .295654975

and just to test it out:

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> declare
2 l_id number;
3 begin
4 for i in 1 .. 1000
5 loop
6 with rnd
7 as
8 (select dbms_random.value*rownum r
9 from dual
10 ),
11 data
12 as
13 (
14 select id, nvl( lag(hiweight) over (order by id),0) loweight, hiweight
15 from (
16 select id, sum(weight/100) over (order by id) hiweight
17 from t
18 where priority = :n
19 )
20 )
21 select id into l_id
22 from data, rnd
23 where loweight < rnd.r
24 and hiweight >= rnd.r;
25 insert into data (id) values ( l_id );
26 end loop;
27 end;
28 /

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select id, cnt, round(ratio_to_report(cnt) over () *100,2) rtr
2 from (
3 select id, count(*) cnt from data group by id
4 )
5 /

ID CNT RTR
---------- ---------- ----------
1 590 59
2 205 20.5
3 111 11.1
4 94 9.4

the approximate distributions you were looking for


Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Random distribution of data based on a column value

Rizwan Qazi, June 29, 2006 - 12:18 pm UTC

Great! Thanks Tom...this is exactly what I was looking for!

Regards,
Rizwan

... Curious why you used dbms_random.value*rownum from dual

Venkat, June 30, 2006 - 11:20 am UTC

on line 8-9

select dbms_random.value*rownum r
from dual


Tom Kyte
June 30, 2006 - 12:46 pm UTC

I stated why:

Yes, in the following rownum is relevant - you could also use the undocumented
/*+ MATERIALIZE */ hint instead, but rownum does it (materializes the view,
making rnd deterministic with a single row)

Thanks Tom.. I should have paid more attention to your first response.

Venkat., June 30, 2006 - 2:32 pm UTC


Curious why you used row by row method instead of set?

Daniel, June 30, 2006 - 3:41 pm UTC

insert into data(id)
with rnd
as
(select dbms_random.value r
from dual
connect by level <= 1000
),
data
as
(
select id, nvl( lag(hiweight) over (order by id),0) loweight, hiweight
from (
select id, sum(weight/100) over (order by id) hiweight
from t
where priority = :n
)
)
select id
from data, rnd
where loweight < rnd.r
and hiweight >= rnd.r
/

Tom Kyte
June 30, 2006 - 4:29 pm UTC

sure, it would have worked, but I still needed to do it 1,000 times.

I could have done 1,000 single row "set" inserts...

Neat!

Matthew, July 04, 2006 - 10:52 am UTC

I never thought of that!

Thanks again, Tom!

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.