Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Paul.

Asked: June 24, 2020 - 6:59 pm UTC

Answered by: Chris Saxon - Last updated: June 29, 2020 - 2:36 pm UTC

Category: SQL - Version: 12c

Viewed 1000+ times

You Asked

I have the following setup (test case below). How can the query be modified to INSERT N number of rows per employee_id.

I want to add something like this but can't figure how to implement this for my test case

( select level dbms_random.value(5,20) children from dual connect by <= children)


ALTER SESSION SET NLS_DATE_FORMAT = 'MMDDYYYY HH24:MI:SS';

-- create and populate an employee 
-- table with 50 rows
   
    CREATE TABLE employees AS
    SELECT level AS employee_id,
    
    CASE round(dbms_random.value(1,20)) 
            WHEN 1 THEN 'Albert' 
            WHEN 2 THEN 'Tom' 
            WHEN 3 THEN 'Anna'
            WHEN 4 THEN 'Ty' 
            WHEN 5 THEN 'Andy' 
            WHEN 6 THEN 'Thomas' 
            WHEN 7 THEN 'Alan'
            WHEN 8 THEN 'Tara' 
            WHEN 9 THEN 'Cheryl' 
            WHEN 10 THEN 'Ed' 
            WHEN 11 THEN 'Steve'
            WHEN 12 THEN 'Mel' 
            WHEN 13 THEN 'Micheal' 
            WHEN 14 THEN 'Ron' 
            WHEN 15 THEN 'Donald'
            WHEN 16 THEN 'Donny' 
            WHEN 17 THEN 'Racheal' 
            WHEN 18 THEN 'Debbie' 
            WHEN 19 THEN 'Madison'
            WHEN  20 THEN 'Danny' 
         END AS first_name,

    CASE  round(dbms_random.value(1,20)) 
            WHEN 1 THEN 'Andrews' 
            WHEN 2 THEN 'Thorton' 
            WHEN 3 THEN 'Smith'
            WHEN 4 THEN 'Jones' 
            WHEN 5 THEN 'Ott' 
            WHEN 6 THEN 'Stevens' 
            WHEN 7 THEN 'Feldman'
            WHEN 8 THEN 'Stein' 
            WHEN 9 THEN 'Ross' 
            WHEN 10 THEN 'Eden' 
            WHEN 11 THEN 'Saltzman'
            WHEN 12 THEN 'Kramer'
            WHEN 13 THEN 'Monroe' 
            WHEN 14 THEN 'Hanks' 
            WHEN 15 THEN 'Dunn'
            WHEN 16 THEN 'Dunbar' 
            WHEN 17 THEN 'Rucker' 
            WHEN 18 THEN 'Silverberg' 
            WHEN 19 THEN 'Daniels'
            WHEN  20 THEN 'Kahn' 
         END AS last_name, 
        
    dbms_random.string('X',        dbms_random.value(5, 10))  AS card_num

    FROM   dual
    CONNECT BY level <= 50;


    ALTER TABLE employees
         ADD ( CONSTRAINT employee_id_pk
       PRIMARY KEY (employee_id));

    -- create and populate a location 
    -- table with 10 rows. Randomly
    -- make some types 'A' for access
    -- 'T' for time and attendance,
    -- 'G' for guard tour.

    CREATE TABLE locations AS
    SELECT level AS location_id,
       'Door ' || level AS location_name,

    CASE round(dbms_random.value(1,3)) 
            WHEN 1 THEN 'A' 
            WHEN 2 THEN 'T' 
            WHEN 3 THEN 'G' 
         END AS location_type

    FROM   dual
    CONNECT BY level <= 25;


     ALTER TABLE locations 
         ADD ( CONSTRAINT location_id_pk
       PRIMARY KEY (location_id));


    create table access_history(
       employee_id NUMBER(6), 
       card_num varchar2(10),
       location_id number(4),
       access_date date,
       processed NUMBER(1) default 0
    );

    INSERT into access_history
     (employee_id,
       card_num,
      location_id,
       access_date)
       with all_combos as
          ( select e.*, l.*
          from   employees e, locations l
         )

        select *
         from (
           select employee_id, card_num,
                   location_id, 
                 trunc(sysdate) +     dbms_random.value (0, 2) + dbms_random.value (0, .75)
            from   all_combos
            order by dbms_random.value
      );
 


and we said...

Here's one way to get N - M rows for each row in a table:

- Create a table/generate 20 rows
- Call dbms_random for each employee to give you the N rows for that row
- Join the two where the row number for the first table <= N

For example:

CREATE TABLE employees AS
SELECT level AS employee_id,
CASE round(dbms_random.value(1,20)) 
      WHEN 1 THEN 'Albert' 
      WHEN 2 THEN 'Tom' 
      WHEN 3 THEN 'Anna'
      WHEN 4 THEN 'Ty' 
      WHEN 5 THEN 'Andy' 
      WHEN 6 THEN 'Thomas' 
      WHEN 7 THEN 'Alan'
      WHEN 8 THEN 'Tara' 
      WHEN 9 THEN 'Cheryl' 
      WHEN 10 THEN 'Ed' 
      WHEN 11 THEN 'Steve'
      WHEN 12 THEN 'Mel' 
      WHEN 13 THEN 'Micheal' 
      WHEN 14 THEN 'Ron' 
      WHEN 15 THEN 'Donald'
      WHEN 16 THEN 'Donny' 
      WHEN 17 THEN 'Racheal' 
      WHEN 18 THEN 'Debbie' 
      WHEN 19 THEN 'Madison'
      WHEN  20 THEN 'Danny' 
   END AS first_name,
CASE  round(dbms_random.value(1,20)) 
      WHEN 1 THEN 'Andrews' 
      WHEN 2 THEN 'Thorton' 
      WHEN 3 THEN 'Smith'
      WHEN 4 THEN 'Jones' 
      WHEN 5 THEN 'Ott' 
      WHEN 6 THEN 'Stevens' 
      WHEN 7 THEN 'Feldman'
      WHEN 8 THEN 'Stein' 
      WHEN 9 THEN 'Ross' 
      WHEN 10 THEN 'Eden' 
      WHEN 11 THEN 'Saltzman'
      WHEN 12 THEN 'Kramer'
      WHEN 13 THEN 'Monroe' 
      WHEN 14 THEN 'Hanks' 
      WHEN 15 THEN 'Dunn'
      WHEN 16 THEN 'Dunbar' 
      WHEN 17 THEN 'Rucker' 
      WHEN 18 THEN 'Silverberg' 
      WHEN 19 THEN 'Daniels'
      WHEN  20 THEN 'Kahn' 
   END AS last_name, 
  dbms_random.string('X',        dbms_random.value(5, 10))  AS card_num
FROM   dual
CONNECT BY level <= 50;

with rws as (
  select level rn 
  from   dual
  connect by level <= 20
), emps as (
  select /*+ materialize */e.*, round ( dbms_random.value(5,20) ) n
  from   employees e
)
  select employee_id, count (*) rw_count
  from   rws
  join   emps e
  on     rn <= n
  group  by employee_id;

EMPLOYEE_ID    RW_COUNT   
             6           9 
            14           7 
            23          17 
            27          11 
            50          17 
             1          19 
             7          20 
            15          15 
            31           7 
...


The materialize hint is to ensure the database only calls dbms_random once for each employee.

and you rated our response

  (4 ratings)

Reviews

June 25, 2020 - 3:06 pm UTC

Reviewer: Paul

Thanks for the suggestion it was helpful, especially the part about the hint, I was unaware of it.

I would prefer not to reinvent the wheel as I have a working solution, which I posted.

For academic purposes can you please explain why my solution was not modified? Was there a problem with the solution I provided.

I would really like to see where I went wrong trying to implement the random rows per employee. Thanks for your time and expertise. I await your response
Chris Saxon

Followup  

June 26, 2020 - 3:21 pm UTC

For academic purposes can you please explain why my solution was not modified? Was there a problem with the solution I provided.

Two reasons:

- Isolating the example to what you're asking for (repeat each employee's row a random number of times beween N-M) makes it easier to show and explain what's going on. This makes the question more valuable to other users of this site. Including this in your example should be relatively easy (no need to "reinvent the wheel").

- I'm not entirely clear where exactly in your code you're expecting this N row generation to go!

June 26, 2020 - 3:54 pm UTC

Reviewer: Paul

Your solution seemed to work fine. I tried creating a cartesian product between employee and locations table and was unable to get that to work so I could not go any further and create a random date.

I thought I was almost there with what I provided in my test case and was curious where it fell short.



Chris Saxon

Followup  

June 26, 2020 - 4:55 pm UTC

I need some context about what you're trying to achieve here.

Start from the beginning: what exactly are you hoping to be in each table after inserting the data?

June 26, 2020 - 5:45 pm UTC

Reviewer: Paul

For every employee in this case (50) I want a random number (1-20) rows. For each one of those rows I want to associate a random location_id from the locations table along with a random date MMDDYYYY HH24:MI:SS format from sysdate- sysdate+2.

In your example, I was unable to incorporate the locations record and the random date.

I took another approach using the code below, which appears to work fine but I suspect less efficient as your solution.

In an effort to better myself I was trying to understand the shortcomings of my original solution.

Having said that and the risk of not going too far off a tangent I recently sawcchris Sexton video on joins and was wondering if my new solution is a candidate for hash joins.

SELECT e_id, card_num, l_id,
trunc(sysdate) + dbms_random.value (0, 2) + dbms_random.value (0, .75)

FROM (
SELECT employee_id AS e_id, row_number() OVER (ORDER BY dbms_random.value) AS rn, card_num
FROM employees
) x
JOIN (SELECT location_id AS l_id, row_number() OVER (ORDER BY dbms_random.value) AS rn FROM locations) y USING (rn);







Chris Saxon

Followup  

June 29, 2020 - 12:59 pm UTC

You're not getting a random number of rows per employee though - just one.

So what are you looking for?

ONE row per employee linked to a random location?

or

1 - 20 rows for per employee, each of which are joined to a (different?) random location?

or

something else?

I recently saw Chris Sexton video on joins

You mean Chris Saxon? As in me? ;)

my new solution is a candidate for hash joins

The optimizer decides whether to use hash/nested loops/merge join.

Adding more details

June 29, 2020 - 1:08 pm UTC

Reviewer: Paul

It's been worked out. Thanks for your time


INSERT into access_history
(employee_id,
card_num,
location_id,
access_date,
processed)
WITH
emp AS (SELECT e.employee_id, e.card_num, ROWNUM rn FROM employees SAMPLE (99.999) e),
numbers AS (SELECT dbms_random.value(1,25) val, ROWNUM rn
FROM dual CONNECT BY LEVEL <= (SELECT COUNT(*) FROM emp)),
emps AS (SELECT e.employee_id, e.card_num, ROWNUM rn
, (SELECT val FROM numbers n WHERE n.rn = e.rn) nop
FROM emp e,
TABLE(CAST(MULTISET(
SELECT LEVEL lvl FROM dual
CONNECT BY LEVEL <= (SELECT val FROM numbers n WHERE n.rn = e.rn))
AS sys.odciNumberList))),
loc0 AS (SELECT location_id,
ROW_NUMBER() OVER (ORDER BY dbms_random.value) rn,
COUNT(*) OVER() cnt
FROM locations),
loc AS (SELECT TRUNC(dbms_random.value(1,m.cnt)) loc_row, ROWNUM rn
FROM (SELECT 1, cnt FROM loc0 WHERE ROWNUM = 1) m
CONNECT BY LEVEL <= (SELECT COUNT(*) FROM emps))
SELECT e.employee_id, e.card_num, l0.location_id, trunc (sysdate) + dbms_random.value (0, 2.75), 0
FROM emps e, loc l, loc0 l0
WHERE e.rn = l.rn
AND l.loc_row = l0.rn;


Chris Saxon

Followup  

June 29, 2020 - 2:36 pm UTC

OK, I'm pretty sure you can simplify that though...

More to Explore

PL/SQL

Check out more PL/SQL tutorials on our LiveSQL tool.