I have a table with about 2000 records of males and females of different races. I would like to create 10 random sets of 50 people using the following criteria
50% of 50 (so 25) females (can be part of non-white)
30% of 50 (so 15) non-white of either gender
20% of 50 (so 10) white males
The columns are basically: PERSON_ID_NUM, GENDER (M or F), and RACE (B=Black,W=White,H=Hispanic)
Again I would like to create 10 separate data sets of 50 patients that are made up of that breakdown. Is there a good way to use SQL to do this? I can't seem to find anything that works out. It is a very small data sample so I can do it by hand but I thought the challenge was interesting. Thanks for any help/information
I'm assuming the criteria means "at least", because (for example) the first 2 criteria means you could have up to 80% females in a given group.
You could do something like the following - to get the total population sample needed in random order, and then carve them into groups. I'll make no claim that this is particularly scalable :-)
SQL> create table t (
2 id int,
3 gender varchar2(1),
4 race varchar2(1)
5 );
Table created.
SQL>
SQL> exec dbms_random.seed(0); -- just to make it repeatable
PL/SQL procedure successfully completed.
--
-- 10,000 random patients
--
SQL> insert into t
2 select
3 rownum,
4 case when dbms_random.value > 0.6 then 'M' else 'F' end,
5 case when dbms_random.value > 0.7 then 'A'
6 when dbms_random.value > 0.4 then 'B'
7 else 'C' end
8 from
9 dual
10 connect by level <= 100000;
100000 rows created.
SQL>
SQL>
SQL>
SQL> drop table results purge;
Table dropped.
--
-- get 250 females, 150 non race a, and 100 males race a at random
--
SQL>
SQL> create table results as
2 with
3 females as
4 ( select * from
5 ( select * from t where gender = 'F' order by dbms_random.value )
6 where rownum <= 250 ),
7 race_non_a as
8 ( select * from
9 ( select * from t where race != 'A'
10 and id not in ( select id from females ) order by dbms_random.value )
11 where rownum <= 150 ),
12 males_race_a as
13 ( select * from
14 ( select * from t where gender = 'M' and race = 'A'
15 and id not in ( select id from females )
16 and id not in ( select id from race_non_a ) order by dbms_random.value )
17 where rownum <= 100 )
18 select mod(rownum,10) grp, f.*
19 from females f
20 union all
21 select mod(rownum,10) grp, r.*
22 from race_non_a r
23 union all
24 select mod(rownum,10) grp, m.*
25 from males_race_a m
26 order by 1, 2;
Table created.
SQL>
SQL> select
2 grp,
3 count(case when gender = 'F' then 1 end)/ count(*) females,
4 count(case when gender = 'M' and race = 'A' then 1 end)/ count(*) males_race_a,
5 count(case when race != 'A' then 1 end)/ count(*) race_non_a
6 from results
7 group by grp
8 order by 1;
GRP FEMALES MALES_RACE_A RACE_NON_A
---------- ---------- ------------ ----------
0 .68 .2 .64
1 .68 .2 .64
2 .74 .2 .62
3 .68 .2 .62
4 .66 .2 .64
5 .68 .2 .6
6 .68 .2 .66
7 .64 .2 .66
8 .7 .2 .68
9 .64 .2 .7
10 rows selected.
SQL>
SQL>
SQL>