Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Maik.

Asked: August 14, 2003 - 8:17 am UTC

Last updated: April 17, 2009 - 9:26 am UTC

Version: 920

Viewed 10K+ times! This question is

You Asked

Hi Tom,

can u please explain, why in the first select i get different random values in A and B, whereas in
the second select these values are equal:

Verbunden mit:
Oracle9i Enterprise Edition Release 9.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production

SQL> select x.num a, x.num b from (select dbms_random.value() num from dba_extents) x
where rownum < 10;

A B
---------- ----------
,077041949 ,617012263
,407078487 ,65212234
,091738267 ,845731559
,725483693 ,492466273
,302446816 ,338863591
,610226867 ,355880581
,263611497 ,298027832
,224679802 ,573592356
,836255197 ,334109933

9 Zeilen ausgewählt.

SQL>
SQL> select x.num a, x.num b from (select rownum,dbms_random.value() num from dba_extents) x
where rownum < 10;

A B
---------- ----------
,326255537 ,326255537
,742920752 ,742920752
,580281292 ,580281292
,584769609 ,584769609
,70732845 ,70732845
,362292964 ,362292964
,127140852 ,127140852
,971983448 ,971983448
,445556903 ,445556903

9 Zeilen ausgewählt.


1000! * Thanks



and Tom said...

i spent a great deal of time on this phenomena in my book coming out this month -- "tuning with rownum".


rownum (aggregates, set operations, many other things) force us to materialize the subquery - and then use that.

the first query was "merged", that is:

select x.num a, x.num b from (select dbms_random.value() num from
dba_extents) x
where rownum < 10;

we processed as it's equivalent:

select dbms_random.value() a, dbms_random.value() b from dba_extents
where rownum < 10;


the second

select x.num a, x.num b from (select rownum,dbms_random.value() num from
dba_extents) x
where rownum < 10;

cannot do that "merging" as it would change the effects of rownum in the inner query. hence that was more like:

insert into temp ( xxxx, num ) select select rownum,dbms_random.value() num from
dba_extents;

select num, num from temp where rownum < 10;




I use this "fact" as a tuning technique. I have a couple of interesting (multi-page) examples in the book coming out to further show where you might use this fact.


Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

How does this impact tuning?

A reader, August 14, 2003 - 10:37 am UTC

Very interesting! But how do you use this merge/no-merge effect for tuning?

Thanks for the continuing education!

Tom Kyte
August 14, 2003 - 11:02 am UTC

here is one of the shorter examples from my book -- there are others, reducing the number of times a function is called, pagination (very important), top-n queries...


Join Queries

Another use of ROWNUM comes into play when you have this conundrum: You run two queries, and each query by itself runs very fast, in a couple hundredths of a second. You need to join these two queries together. This takes a really long runtime. What to do? Well, you can use this same ROWNUM trick, as follows:

Select * 
   From ( select ?., ROWNUM r1 from ?. )  query1,
        ( select ?, ROWNUM r2 from ? ) query2
  Where join_condition?


You should find the runtime to be that of the two individual queries give or take. Now, bear in mind, I generally see these sorts of issues in databases where something is wrong: Statistics are wrong or missing, or some setting is set totally inappropriately. For example, I used this technique on a system where a domain index was added to a table. That caused the CBO to be used as shown here:

ops$tkyte@ORA920> create table t ( x clob );
Table created.

ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select * from t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'T'

ops$tkyte@ORA920> create index t_idx on t(x) indextype is ctxsys.context;
Index created.

ops$tkyte@ORA920> select * from t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=82 Bytes=164164)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=82 Bytes=164164)


The developers had a text query that ran really fast. They had a relational query that ran really fast. They put them together, and it ran really slowly. The reason is that they were using the CBO against unanalyzed tables. They did not realize that by putting a domain index on that table, every query that would hit it would use the CBO. The individual queries got lucky, but when they were run together, the optimizer (given its lack of information) made a bad decision. As a stopgap measure to make the query work, ROWNUM was very useful. However, they need to start analyzing tables. At that point, removal of the ROWNUM trick would probably allow the query to run even faster.
 

NO_MERGE hint.

Kashif, August 14, 2003 - 3:04 pm UTC

Hi Tom,

The same effect can be acheived using the NO_MERGE hint:

Wrote file afiedt.buf

  1  select x.num a, x.num b from (select /*+ NO_MERGE */ dbms_random.value() num from
  2  dba_extents) x
  3*      where rownum < 10
SQL> /

         A          B
---------- ----------
.016041082 .016041082
.769042327 .769042327
.462616778 .462616778
.162651294 .162651294
.486199947 .486199947
.541907443 .541907443
.050641219 .050641219
.488757292 .488757292
.034866065 .034866065

9 rows selected.

Does rownum provide any value over the NO_MERGE hint?

Secondly, the way the subquery is executed without the rownum or NO_MERGE hints seems counter-intuitive, i.e. if I am selecting the same column from the subquery then it should return the same value. Is there a good reason why the optimizer was not designed to perform the temp segment optimization (i.e. materialize subquery and then produce results) by default? Did I miss the boat completely?

Thanks in advance.

Kashif 

Tom Kyte
August 14, 2003 - 3:12 pm UTC

yes, no_merge works in many cases.

to address your second question -- it is "not" an optimization in all cases. not merging can be deadly, we merge whenever possible.

in this case, no_merge might be the "better" choice (compare it with rownum)



Extreme!

Christo Kutrovsky, August 14, 2003 - 3:54 pm UTC

Now this is something extremelly interesting.

However in my specific case this doesn't seem to help to force oracle to materialize my subqueries.

I do not have the full example handy, but the general concept is:

select ..
from table t
where (t.v1 in (select v from table1a, table1b where ..)
or t.v2 in (select v from table2a, table2b where..)
or t.v1 = 0
)
and ...

And this is pretty expensive. Running each query separatelly, and pasting the values in the "in (..." part generates 10 times less consistent gets.



Nice Topic

Srinivas, March 25, 2004 - 7:16 am UTC

Thanks Tom for your Efforts

Is there any other way to Generate Random Numbers between a set of numbers.



Tom Kyte
March 25, 2004 - 9:20 am UTC

dbms_random.value( a,b ) does that.

Su Baba, July 21, 2006 - 2:47 am UTC

In the select statement below, it seems the subquery in both SQLs only gets executed once. Is there anyway to get it executed once per output record so the result is more random?

CREATE TABLE x (
name VARCHAR2(20),
rn NUMBER
);

INSERT INTO x VALUES ('John', 1);
INSERT INTO x VALUES ('Mary', 2);
INSERT INTO x VALUES ('Larry', 3);
INSERT INTO x VALUES ('Harry', 4);
INSERT INTO x VALUES ('Nicole', 5);

commit;

CREATE UNIQUE INDEX x_u1 ON x(rn);

-- SQL #1
SELECT rownum,
(SELECT name
FROM x
WHERE rn = TRUNC(dbms_random.value(1, 6))) name
FROM all_objects
WHERE rownum < 10;

ROWNUM NAME
---------- -----
1 Larry
2 Larry
3 Larry
4 Larry
5 Larry
6 Larry
7 Larry
8 Larry
9 Larry

-- SQL #2
SELECT b.name
FROM all_objects a,
(SELECT name, rownum
FROM x
WHERE rn = TRUNC(dbms_random.value(1, 6))) b
WHERE rownum < 10;

NAME
-----
Mary
Mary
Mary
Mary
Mary
Mary
Mary
Mary
Mary



Tom Kyte
July 22, 2006 - 5:55 pm UTC

you'd have to defeat the scalar subquery caching mechanism:

ops$tkyte%ORA10GR2> sELECT rownum,
  2        (SELECT name
  3         FROM   x
  4         WHERE  rn = TRUNC(dbms_random.value(1, 6))
  5           and all_objects.object_id > 0 ) name
  6  FROM   all_objects
  7  WHERE  rownum < 10;

    ROWNUM NAME
---------- ------------------------------
         1 Harry
         2 Mary
         3 Larry
         4 Mary
         5 Mary
         6 Nicole
         7 John
         8 Harry
         9 Larry

9 rows selected.


By sending in something unique to the scalar subquery (all_objects.object_id is the "primary key" of all_objects) we defeat the caching of scalar subqueries.

If you are interested in "scalar subquery caching" search for that quoted string on this site. 

dbms_random in subquery

dharanidhar, March 02, 2007 - 8:25 am UTC

my query :
select X from
(select dbms_random.string('x',6) X from user_objects)
where replace(translate(upper(X),'ABCDEFGHIJKLMNOPQRSTUVXYZ','AAAAAAAAAAAAAAAAAAAAAAAAAA'),'A','')is not null;

in output some rows containing
X
44LFT7
Y41I81
Q4TMHH
99F92B
IXV4N7
XXMEE5
34GCT0
V6WRS6
2LYQMD
MEI7R4
W04SQN
9H0XKW
V6CDQN
DR8D8P
Y7KUAN
YAKTO4
5VHUE7
5O758D
ONZ5KP
NHZYZS
0V09LT
G9K4S3
GNTWVX
79XKYU
G6SD5I
Y6D6FA
MJMPNV
WDWRPX
UZW1SJ
0TQJE1
XM7MC0
Q7AA6B
BN2NEX
0WIVCF
M9EV4H
UMOAYF
NBX9JC
AIQPCH
all alphabets .could please explain why i got these results
Tom Kyte
March 04, 2007 - 6:02 pm UTC

select X from
(select dbms_random.string('x',6) X from user_objects)
where replace(translate(upper(X),'ABCDEFGHIJKLMNOPQRSTUVXYZ','AAAAAAAAAAAAAAAAAAAAAAAAAA'),'A','')is not null;


is to the compiler the same as

select dbms_random.string('x',6) X
from user_objects
where replace( translate(upper(dbms_random.string('x',6),'...','....' )
is not null;

view merging.

You want to do something that'll prevent the view merging - such as:

ops$tkyte%ORA10GR2> select X
  2    from (select dbms_random.string('x',6) X
  3            from (select level l from dual connect by level <= 10)
  4         )
  5   where replace(translate(upper(X),'ABCDEFGHIJKLMNOPQRSTUVXYZ','AAAAAAAAAAAAAAAAAAAAAAAAAA'),'A','')is not null;

X
----------
R69GXC
G6XI7V
3GWTNL
BFIRAK
81CECA
FCGHJS
8316ZP
EUN6AK
YEX8OU
4Y1SU4

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3020077840

----------------------------------------------------------------------------------------
| Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |      |     1 |  2002 |     2   (0)| 00:00:01 |
|   1 |  VIEW                           |      |     1 |  2002 |     2   (0)| 00:00:01 |
|*  2 |   FILTER                        |      |       |       |            |          |
|   3 |    VIEW                         |      |     1 |       |     2   (0)| 00:00:01 |
|*  4 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   5 |      FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(REPLACE(TRANSLATE(UPPER("DBMS_RANDOM"."STRING"('x',6)),'ABCDEFGHIJ
              KLMNOPQRSTUVXYZ','AAAAAAAAAAAAAAAAAAAAAAAAAA'),'A','') IS NOT NULL)
   4 - filter(LEVEL<=10)

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> column x format a10
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with data
  2  as
  3  (select dbms_random.string('x',6) x, rownum r
  4     from (select level l from dual connect by level <= 10)
  5  )
  6  select *
  7    from data
  8   where replace(translate(upper(X),'ABCDEFGHIJKLMNOPQRSTUVXYZ','AAAAAAAAAAAAAAAAAAAAAAAAAA'),'A','')is not null;

X                   R
---------- ----------
NBKY0M              2
5JAIAE              3
WWZLKZ              4
G1675A              5
6947EM              6
G95H2H              7
5UFUWA              8
5K8JKK              9
LRRGD3             10

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1045347588

----------------------------------------------------------------------------------------
| Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |      |     1 |  2015 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                           |      |     1 |  2015 |     2   (0)| 00:00:01 |
|   2 |   COUNT                         |      |       |       |            |          |
|   3 |    VIEW                         |      |     1 |       |     2   (0)| 00:00:01 |
|*  4 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   5 |      FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(REPLACE(TRANSLATE(UPPER("X"),'ABCDEFGHIJKLMNOPQRSTUVXYZ','AAAAAAAA
              AAAAAAAAAAAAAAAAAA'),'A','') IS NOT NULL)
   4 - filter(LEVEL<=10)



see the differences in the filter...

merge still occurs under insert

David, April 02, 2007 - 1:45 pm UTC

This is on Oracle 10g 10.2.0.3.0 (Windows)

create or replace view pvt as
select level-1 as x from dual connect by level <= 1000000;


Lets generate a sparse set with a 20% chance of having a value between 0 and 4.
select a,b
  from (select p1.x as a,
               decode(mod(trunc(dbms_random.value(0, 5)), 5),
                      0,
                      trunc(dbms_random.value(0, 5))) as b
          from pvt p1
         where x < 100)
 where b is not null;
0 Rows selected in 0.016 seconds.


0 Rows? Thats unlikely. There must be a push going on here. Lets throw in a rownum to prevent it.

select a,b
  from (select p1.x as a,
               decode(mod(trunc(dbms_random.value(0, 5)), 5),
                      0,
                      trunc(dbms_random.value(0, 5))) as b,
                      rownum
          from pvt p1
         where x < 100)
 where b is not null;


gives the desired results. Lets put this in a table.
drop table test_foo;
create table test_foo (a number, b number);
insert into test_foo
  (a, b)
  (select a, b
     from (select p1.x as a,
                  decode(mod(trunc(dbms_random.value(0, 5)), 5),
                         0,
                         trunc(dbms_random.value(0, 5))) as b,
                         rownum
             from pvt p1
            where x < 100)
    where b is not null);
select * from test_foo where b is null;

15 Rows with null b?!


Why did rownum not work under the insert? I've seen similar bugs creep in with analytics and update queries. Why is the insert code path different from the select code path? This works fine if I run
create table as select



A reader, October 01, 2008 - 1:24 am UTC


RANDOM VALUE - uniform distribute?

Saurav Mukherjee, April 15, 2009 - 9:13 pm UTC

Hi Tom,
I always admire the smart solutions you provide.
I am having problem in generating true random numbers.
Basically, i have 8 to 10 million records in a table. I want to select 100 random records everytime.
here's the pl/sql block i am using.
I have data as below
probe_map_trove -- only ONE row to select the current active MAP_ID. has the TOTAL area for all
the detail records in below table
probe_map_area_trove -- has 8 to 10 milliion records as below

SITE_ID | LOCATION_ID |FRONT_START |FRONT_END | MAP_ID
1 | 11 |0 |250 |1
1 |12 |251 |450 |1
1 | 11 |451 |500 |1
1 |12 |501 |550 |1
2 |21 |551 |700 |1
2 |22 |701 |1025 |1

Total area = 250+450+500+550+700+1025 , this value is in probe_map_trove table

CREATE OR REPLACE PROCEDURE proc_random_num_ms AS
x BINARY_INTEGER;
BEGIN
DELETE FROM saurav_probe_result_us_1;
FOR i IN 1 .. 1200
LOOP
INSERT INTO saurav_probe_result_us_1
SELECT --+ ordered(A,B)
SYSDATE itime, location_id
FROM probe_map_trove a, probe_map_area_trove b
WHERE a.active = 1
AND b.map_id = a.ID
AND ROWNUM <= 100
AND (DBMS_RANDOM.VALUE (0,1)*1000000000) BETWEEN b.front_start AND b.front_end;
--AND DBMS_RANDOM.VALUE (0, 1) * a.area BETWEEN b.front_start AND b.front_end;
COMMIT;
--DBMS_LOCK.sleep(seconds => 5);
END LOOP;
END;
But, i did few test runs for 1 hr to 24 hrs periods to see if all the records are getting selected
atleast at some point in time. But, only 8.5K distinct location_id records are getting selected. But, i actually have 850K+ distinct location_ids. And i am looking to use RANDOM value to have atleast 250K+ distinct locationids to be selected in 24 hr period.

i am basically looking for RANDOM to generate "uniformly distributed" random numbers. But, it seems "normalized random" numbers?
Another thing is, in older system, we used JAVA function "nextDouble" to get this kind of random number and used the same area logic to randomize the records. Now, we are tryin in Oracle and it's not working.
Can you please help?
Tom Kyte
April 16, 2009 - 9:12 am UTC

... Basically, i have 8 to 10 million records in a table. I want to select 100
random records everytime.
...

not going to be possible if you think about it. How do you get 100 everytime? Your approach - stops at 100 records, but you are always getting the first 100 (eg: the odds you ever get rows from the 'end' of the tables is about zero percent)


Why not use the SAMPLE clause? Scanning a table that has 8-10 million rows twice in a query is bad enough, doing it 1200 times in a loop - ugh.

Just sample the data? Do you know of the sample clause?

RANDOM VALUE - uniform distribute

Saurav Mukherjee, April 16, 2009 - 4:25 pm UTC

Thanks tom for you help,
CREATE OR REPLACE PROCEDURE proc_random_num_ms1 AS
BEGIN
DELETE FROM saurav_probe_result_us_2;
FOR i IN 1 .. 1200
LOOP
INSERT INTO saurav_probe_result_us_2
SELECT --+ ordered(A,B)
SYSDATE itime, location_id
FROM probe_map_trove a,
(select location_id,front_start,front_end,map_id from probe_map_area_trove sample(0.01199417562831489028927552780369852399675))b
WHERE a.active = 1
AND b.map_id = a.ID
AND getrandom*a.area BETWEEN b.front_start AND b.front_end ;
COMMIT;
--DBMS_LOCK.sleep(seconds => 5);
END LOOP;
END;

select count(distinct location_id) from saurav_probe_result_us_2 --20921
select count(*) from saurav_probe_result_us_2 --21227


where 0.01199417562831489028927552780369852399675 represent 100 recs from probemap_area_table

Eventhough the number of distinct location ids has improved a lot but the total count of
record(21227) is much less in comparision to total number of iterations(0.12 million records).

can you please throw some light on it as in every iterations we are not
getting 100 records
Tom Kyte
April 17, 2009 - 9:26 am UTC

you are getting a RANDOM SAMPLE of "about" 0.011

random samples are never going to be "exactly 100 records", never - random-ness wouldn't be random if you got *precisely* 100 records - it is not possible.

Anything that gets *precisely* 100 records - isn't going to be random.



and I completely do not understand why you iterate 12 times, why not just SAMPLE 12 times as much data and do it *ONCE*

and lose that hint, just get rid of it, let the optimizer do the job it is supposed to do.

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.