Doesn't this require guesstimating data distributions, skew, and the like? We can use the 'infinite dual'
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:40476301944675 and DBMS_RANDOM, if we're willing to make assumptions.
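For readers who haven't seen it, the 'infinite dual' is just a hierarchical query against DUAL that manufactures as many rows as you ask for:

```sql
-- generate five rows, numbered 1 through 5, out of thin air
select level from dual connect by level <= 5;
```

Every row-generating query below is built on that trick.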
For example, someone recently posted a question about a "query that's currently taking 2 hours to join the main table S_F_ASSET_VALUE with the Calendar table to get back one row for each day that falls between VALUATION_TYPE_DATE_FROM and VALUATION_TYPE_DATE_TO. So if VALUATION_TYPE_DATE_FROM is 01.Jan.2008 and VALUATION_TYPE_DATE_TO is 13.Jan.2008, we'd get back 13 rows. It's possible that a row is "OPEN" until the end of times (our max date is 31.12.2099) ... there's also something called the Official date, which is the last day until which dates should be cloned...
There are about 600k rows in the main table when doing the initial load. Calendar's got all the rows from the 1950's until far in the future, and the cloning produces about 1.7 million rows in total."
They provided the query. I was able to construct a similar scenario using DBMS_RANDOM and the 'infinite dual' like so:
-- **********************************************
-- SET UP TABLES AND ROWS ***********************
-- **********************************************
create table d_calendar as
select to_number(
to_char(
to_date('19491231','YYYYMMDD')+level -- all the rows from the 1950's
,'yyyymmdd')
) calendar_dw_id
, to_date('19491231','YYYYMMDD')+level calendar_dt
from dual connect by level < 150*366 -- until far in the future
/
alter table d_calendar modify
( calendar_dw_id primary key
, calendar_dt unique )
/
create table S_F_ASSET_VALUE as
Select dt AS VALUATION_TYPE_DATE_FROM
, dt+DBMS_RANDOM.VALUE(1,10)
AS VALUATION_TYPE_DATE_TO
from (
select to_date('19491231','YYYYMMDD')
+ DBMS_RANDOM.VALUE(1,150*365)
AS dt
from dual
connect by level < 600000 -- 600k rows in the main table
)
/
-- **********************************************
-- QUERY ****************************************
-- **********************************************
SELECT count(*) FROM
S_F_ASSET_VALUE,
(select 20100101 DATE_OFFICIAL from dual) DATE_OFFICIAL,
d_calendar
where (calendar_dw_id between
greatest(
to_number(
to_char(VALUATION_TYPE_DATE_FROM ,'yyyymmdd')
),20000101
)
and
least(
to_number(
to_char(
VALUATION_TYPE_DATE_TO,'yyyymmdd')
),DATE_OFFICIAL)
)
or
(calendar_dw_id=to_number(to_char(VALUATION_TYPE_DATE_FROM,'yyyymmdd'))
and calendar_dw_id<20000101)
/
Sure, it's time-consuming in terms of planning and the like, but actually generating the data is pretty quick.
March 30, 2008 - 8:33 am UTC
... Doesn't this require guesstimating data distributions, skew, and the like? ...
yes, it does, sort of; or at least it requires analyzing the existing data and inflating it using the observed values. You also have domain value ranges to obey, uniqueness rules, foreign keys, etc...
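As a sketch of what "inflating from observed values" can look like (it reuses the table and column names from the example above; the 100,000 row count is arbitrary, and the same inline-view pattern as the original setup keeps one random base date per row):

```sql
-- read the observed min/max once, then generate rows inside that range
insert into s_f_asset_value
select dt                             as valuation_type_date_from
     , dt + dbms_random.value(1,10)  as valuation_type_date_to
  from ( select d.lo + dbms_random.value(0, d.hi - d.lo) as dt
           from ( select min(valuation_type_date_from) lo
                       , max(valuation_type_date_from) hi
                    from s_f_asset_value ) d
        connect by level <= 100000 );
```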
if you just want random data, I've used this in the past:
create or replace procedure gen_data( p_tname in varchar2, p_records in number )
authid current_user
as
    l_insert long;
    l_rows   number default 0;
begin
    dbms_application_info.set_client_info( 'gen_data ' || p_tname );

    -- build one INSERT ... SELECT that generates a random value of the
    -- appropriate type for every column of the target table
    l_insert := 'insert /*+ append */ into ' || p_tname ||
                ' select ';
    for x in ( select data_type, data_length,
                      -- maxval: largest value that fits NUMBER(p,s),
                      -- or 9999999999 when no precision is defined
                      nvl(rpad( '9',data_precision,'9')/power(10,data_scale),9999999999) maxval
                 from user_tab_columns
                where table_name = upper(p_tname)
                order by column_id )
    loop
        if ( x.data_type in ('NUMBER', 'FLOAT' ))
        then
            l_insert := l_insert || 'dbms_random.value(1,' || x.maxval || '),';
        elsif ( x.data_type = 'DATE' )
        then
            l_insert := l_insert ||
               'sysdate+dbms_random.value+dbms_random.value(1,1000),';
        else
            -- everything else gets a random mixed-case string
            l_insert := l_insert || 'dbms_random.string(''A'',' ||
                                     x.data_length || '),';
        end if;
    end loop;
    l_insert := rtrim(l_insert,',') ||
                ' from all_objects where rownum <= :n';

    -- ALL_OBJECTS may hold fewer rows than requested, so keep
    -- inserting until p_records rows have been generated in total
    loop
        execute immediate l_insert using p_records - l_rows;
        l_rows := l_rows + sql%rowcount;
        commit;
        dbms_application_info.set_module( l_rows || ' rows of ' || p_records, '' );
        exit when ( l_rows >= p_records );
    end loop;
end;
/
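To call it, you'd create a table with any mix of NUMBER, DATE and VARCHAR2 columns and pass the table name and a row count (table T here is made up for illustration):

```sql
-- hypothetical demo table; gen_data reads its column definitions
-- from USER_TAB_COLUMNS and fills it with random values
create table t ( id number(10,2), created date, name varchar2(30) );

exec gen_data( 'T', 1000 )

select count(*) from t;
```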