great!
oraboy, September 15, 2005 - 5:53 pm UTC
that's amazing!!
"analytics rock, they roll. "
- can't agree more.. maybe it's time you added to the usual phrase
"analytics rock, they roll
- all the way to glory"
:)
5 stars with two thumbs-up!
Big deal
Mikito Harakiri, September 15, 2005 - 6:35 pm UTC
with a as (
select count(1) total from all_objects
) select min(object_id), max(object_id), count(*), nt from (
select object_id, floor(rownum*8/total) nt
from all_objects, a
order by object_id
)
group by nt;
Admittedly, 3 times slower
September 15, 2005 - 6:54 pm UTC
so, not nearly as good, I agree - it would be a bad approach.
NTILE rocks and rolls ;) Thanks for confirming that analytics are the coolest thing to happen to SQL since the keyword SELECT!
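For anyone new to NTILE: it deals an ordered set of rows into n buckets whose sizes differ by at most one, with the larger buckets first. A minimal Python sketch of that documented behavior (the function name here is just illustrative):

```python
def ntile(n, rows):
    """Deal an ordered list of rows into n buckets numbered 1..n,
    the way Oracle's NTILE does: bucket sizes differ by at most one,
    and the larger buckets come first."""
    total = len(rows)
    base, extra = divmod(total, n)   # first `extra` buckets get base+1 rows
    out = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        out.extend([bucket] * size)
    return out

# 10 rows into 4 buckets: sizes 3, 3, 2, 2
print(ntile(4, list(range(10))))   # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

This is why the COUNT(*) values in an NTILE query's output are all equal or differ by at most one.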
What a PITA
mdinh, September 15, 2005 - 6:44 pm UTC
Mikito Harakiri, why do you have to be such a PITA?
SYS@TIGGER> set autotrace traceonly
SYS@TIGGER> SELECT MIN (object_id), MAX (object_id), COUNT (*), nt
FROM (SELECT object_id, NTILE (8) OVER (ORDER BY object_id) nt
FROM all_objects)
GROUP BY nt;
8 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 VIEW
3 2 WINDOW (SORT)
4 3 FILTER
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'
6 5 NESTED LOOPS
7 6 TABLE ACCESS (FULL) OF 'USER$'
8 6 INDEX (RANGE SCAN) OF 'I_OBJ2' (UNIQUE)
9 4 TABLE ACCESS (BY INDEX ROWID) OF 'IND$'
10 9 INDEX (UNIQUE SCAN) OF 'I_IND1' (UNIQUE)
11 4 TABLE ACCESS (BY INDEX ROWID) OF 'OBJAUTH$'
12 11 NESTED LOOPS
13 12 FIXED TABLE (FULL) OF 'X$KZSRO'
14 12 INDEX (RANGE SCAN) OF 'I_OBJAUTH2' (NON-UNIQUE)
15 4 FIXED TABLE (FULL) OF 'X$KZSPR'
16 4 FIXED TABLE (FULL) OF 'X$KZSPR'
17 4 FIXED TABLE (FULL) OF 'X$KZSPR'
18 4 FIXED TABLE (FULL) OF 'X$KZSPR'
19 4 FIXED TABLE (FULL) OF 'X$KZSPR'
20 4 FIXED TABLE (FULL) OF 'X$KZSPR'
21 4 FIXED TABLE (FULL) OF 'X$KZSPR'
22 4 FIXED TABLE (FULL) OF 'X$KZSPR'
23 4 FIXED TABLE (FULL) OF 'X$KZSPR'
24 4 FIXED TABLE (FULL) OF 'X$KZSPR'
25 4 FIXED TABLE (FULL) OF 'X$KZSPR'
26 4 FIXED TABLE (FULL) OF 'X$KZSPR'
27 4 FIXED TABLE (FULL) OF 'X$KZSPR'
28 4 FIXED TABLE (FULL) OF 'X$KZSPR'
29 4 FIXED TABLE (FULL) OF 'X$KZSPR'
30 4 FIXED TABLE (FULL) OF 'X$KZSPR'
31 4 FIXED TABLE (FULL) OF 'X$KZSPR'
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
47893 consistent gets
0 physical reads
0 redo size
873 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
8 rows processed
SYS@TIGGER> WITH a AS
2 (SELECT COUNT (1) total
3 FROM all_objects)
4 SELECT MIN (object_id), MAX (object_id), COUNT (*), nt
5 FROM (SELECT object_id, FLOOR (ROWNUM * 8 / total) nt
6 FROM all_objects, a
7 ORDER BY object_id)
8 GROUP BY nt;
9 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 VIEW
3 2 SORT (ORDER BY)
4 3 COUNT
5 4 FILTER
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'
7 6 NESTED LOOPS
8 7 NESTED LOOPS
9 8 VIEW
10 9 SORT (AGGREGATE)
11 10 FILTER
12 11 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'
13 12 NESTED LOOPS
14 13 TABLE ACCESS (FULL) OF 'USER$'
15 13 INDEX (RANGE SCAN) OF 'I_OBJ2' (UNIQUE)
16 11 TABLE ACCESS (BY INDEX ROWID) OF 'IND$'
17 16 INDEX (UNIQUE SCAN) OF 'I_IND1' (UNIQUE)
18 11 TABLE ACCESS (BY INDEX ROWID) OF 'OBJAUTH$'
19 18 NESTED LOOPS
20 19 FIXED TABLE (FULL) OF 'X$KZSRO'
21 19 INDEX (RANGE SCAN) OF 'I_OBJAUTH2' (NON-UNIQUE)
22 11 FIXED TABLE (FULL) OF 'X$KZSPR'
23 11 FIXED TABLE (FULL) OF 'X$KZSPR'
24 11 FIXED TABLE (FULL) OF 'X$KZSPR'
25 11 FIXED TABLE (FULL) OF 'X$KZSPR'
26 11 FIXED TABLE (FULL) OF 'X$KZSPR'
27 11 FIXED TABLE (FULL) OF 'X$KZSPR'
28 11 FIXED TABLE (FULL) OF 'X$KZSPR'
29 11 FIXED TABLE (FULL) OF 'X$KZSPR'
30 11 FIXED TABLE (FULL) OF 'X$KZSPR'
31 11 FIXED TABLE (FULL) OF 'X$KZSPR'
32 11 FIXED TABLE (FULL) OF 'X$KZSPR'
33 11 FIXED TABLE (FULL) OF 'X$KZSPR'
34 11 FIXED TABLE (FULL) OF 'X$KZSPR'
35 11 FIXED TABLE (FULL) OF 'X$KZSPR'
36 11 FIXED TABLE (FULL) OF 'X$KZSPR'
37 11 FIXED TABLE (FULL) OF 'X$KZSPR'
38 11 FIXED TABLE (FULL) OF 'X$KZSPR'
39 8 TABLE ACCESS (FULL) OF 'USER$'
40 7 INDEX (RANGE SCAN) OF 'I_OBJ2' (UNIQUE)
41 5 TABLE ACCESS (BY INDEX ROWID) OF 'IND$'
42 41 INDEX (UNIQUE SCAN) OF 'I_IND1' (UNIQUE)
43 5 TABLE ACCESS (BY INDEX ROWID) OF 'OBJAUTH$'
44 43 NESTED LOOPS
45 44 FIXED TABLE (FULL) OF 'X$KZSRO'
46 44 INDEX (RANGE SCAN) OF 'I_OBJAUTH2' (NON-UNIQUE)
47 5 FIXED TABLE (FULL) OF 'X$KZSPR'
48 5 FIXED TABLE (FULL) OF 'X$KZSPR'
49 5 FIXED TABLE (FULL) OF 'X$KZSPR'
50 5 FIXED TABLE (FULL) OF 'X$KZSPR'
51 5 FIXED TABLE (FULL) OF 'X$KZSPR'
52 5 FIXED TABLE (FULL) OF 'X$KZSPR'
53 5 FIXED TABLE (FULL) OF 'X$KZSPR'
54 5 FIXED TABLE (FULL) OF 'X$KZSPR'
55 5 FIXED TABLE (FULL) OF 'X$KZSPR'
56 5 FIXED TABLE (FULL) OF 'X$KZSPR'
57 5 FIXED TABLE (FULL) OF 'X$KZSPR'
58 5 FIXED TABLE (FULL) OF 'X$KZSPR'
59 5 FIXED TABLE (FULL) OF 'X$KZSPR'
60 5 FIXED TABLE (FULL) OF 'X$KZSPR'
61 5 FIXED TABLE (FULL) OF 'X$KZSPR'
62 5 FIXED TABLE (FULL) OF 'X$KZSPR'
63 5 FIXED TABLE (FULL) OF 'X$KZSPR'
Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
95786 consistent gets
0 physical reads
0 redo size
909 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
9 rows processed
SYS@TIGGER>
Your suggestion generates 2X LIO.
To mdinh ...
Gabe, September 15, 2005 - 10:46 pm UTC
Why PITA? Seeing other points of view is good ... no? Could learn from them too!
If one is to compare/measure the two methods, maybe using a custom table rather than a complex dictionary view like all_objects would be better. Making sure the two solutions are actually equivalent also helps: note the 8 vs. 9 rows processed.
Besides, that "... pure SQL" from the original question may have referred to a non-analytical query. The NTILE solution would indeed be the "easier ... more straightforward", but maybe there is room for alternatives in this case.
I knew something (obvious!) was being missed
Gareth, September 16, 2005 - 4:41 am UTC
Tom,
Many thanks for that, it is exactly what I felt we were missing (if that makes sense)! The solution seems so obvious now, and I have no idea why I missed the NTILE function; it does EXACTLY what we were asking for - d'oh!
ntile (and analytics in general) do indeed appear to rock and roll.
Cheers,
Gareth.
Variation on original query....but not NTILE
Robert, September 19, 2005 - 1:41 pm UTC
Tom,
What if, in the original question, we wanted to group the results by the actual column VALUES and not just their COUNT?
For example...
drop table t;
create table t (c number);
insert into t values (1);
insert into t values (2);
insert into t values (3);
insert into t values (7);
insert into t values (8);
insert into t values (15);
insert into t values (17);
insert into t values (20);
select c, ntile(2) over (order by c) nt from t;
C NT
--------- ---------
1 1
2 1
3 1
7 1
8 2
15 2
17 2
20 2
What I want instead... is to divide the VALUES in the 'C' column into 2 groups... like this...
MAX(C)/<# of groups>
C NT
--------- ---------
1 1
2 1
3 1
7 1
8 1 <-- NOW 8 is in the 1st group (between 1 and 10)
15 2
17 2
20 2
Is there an analytic function to accomplish this?
Thank you,
Robert.
September 19, 2005 - 4:46 pm UTC
ops$tkyte@ORA10GR2> select min(user_id), max(user_id), count(*), grp
2 from (
3 select user_id, trunc((user_id-0.1)/ceil(max(user_id) over ()/2)) grp
4 from all_users
5 )
6 group by grp
7 order by grp
8 /
MIN(USER_ID) MAX(USER_ID) COUNT(*) GRP
------------ ------------ ---------- ----------
0 25 7 0
34 61 21 1
something like that? (assumes integers - the 0.1 trick will get 1..10 to be in group one and 11..20 to be in group two and so on)
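The 0.1 trick can be sanity-checked outside the database; a minimal Python sketch assuming positive integer keys (the function name is hypothetical):

```python
import math

def value_group(v, max_v, n):
    """Tom's expression trunc((v - 0.1) / ceil(max_v / n)): the 0.1
    nudge keeps exact multiples (10, 20, ...) in the lower group
    instead of starting the next one.  Assumes positive integers."""
    width = math.ceil(max_v / n)
    return int((v - 0.1) // width)

# max value 20, two groups: 1..10 -> group 0, 11..20 -> group 1
print([value_group(v, 20, 2) for v in (1, 10, 11, 20)])   # [0, 0, 1, 1]
```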
That seems very close
Robert, September 19, 2005 - 5:12 pm UTC
Tom,
I changed it to 10 groups and subtracted max - min to see the values in each range... it appears very even, except for grp=2, where there are far fewer rows in the group. I checked the data, and sure enough it is very sparse around that area.
So would this be the best I could expect for grouping data in this fashion? (Or am I asking for something illogical/impossible in trying to get the groups more even given a potentially unknown distribution of the data? :)
Why aren't the DIFFs even among the groups (e.g., GRP=2)?
In any event, thanks for your help!
1 select min(user_id)
2 , max(user_id)
3 , max(user_id) - min(user_id) diff
4 , count(*), grp
5 from (
6 select user_id, trunc((user_id-0.1)/ceil(max(user_id) over ()/10)) grp
7 from all_users
8 )
9 group by grp
10* order by grp
15:38:52 SQL> /
MIN(USER_ID) MAX(USER_ID) DIFF COUNT(*) GRP
------------ ------------ --------- --------- ---------
0 148 148 90 0
149 291 142 104 1
377 444 67 67 2
445 592 147 123 3
593 740 147 128 4
741 888 147 123 5
889 1036 147 111 6
1037 1184 147 123 7
1185 1332 147 139 8
1333 1476 143 136 9
10 rows selected.
September 19, 2005 - 11:35 pm UTC
if you want an even distribution -- ntile()
if you want to take (high - min)/cnt and slot values out that way, then with non-evenly distributed data you won't get even groups, by definition.
something like this?
Gabe, September 19, 2005 - 9:57 pm UTC
SQL> select c, ntile(2) over (order by c) equiheight_bucket from t;
C EQUIHEIGHT_BUCKET
---------- -----------------
1 1
2 1
3 1
7 1
8 2
15 2
17 2
20 2
8 rows selected.
SQL> with x as (select min(c) min_c, max(c) max_c from t)
2 SELECT c, width_bucket(c,min_c,max_c+1,2) equiwidth_bucket
3 from t, x;
C EQUIWIDTH_BUCKET
---------- ----------------
1 1
2 1
3 1
7 1
8 1
15 2
17 2
20 2
8 rows selected.
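WIDTH_BUCKET's arithmetic is simple enough to reproduce in a few lines; a sketch of the equiwidth logic (assuming low < high, as in Gabe's call):

```python
def width_bucket(v, low, high, n):
    """Mimic Oracle's WIDTH_BUCKET for low < high: n equal-width
    buckets over [low, high); out-of-range values get 0 or n+1."""
    if v < low:
        return 0
    if v >= high:
        return n + 1
    return int((v - low) * n / (high - low)) + 1

# Gabe's call: width_bucket(c, min_c, max_c + 1, 2) with min 1, max 20
print([width_bucket(c, 1, 21, 2) for c in (1, 2, 3, 7, 8, 15, 17, 20)])
```

Note how 8 lands in bucket 1 here (the interval [1, 11)), matching the equiwidth output above, where NTILE had put it in bucket 2.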
Thanks Gabe
Oraboy, September 20, 2005 - 5:13 pm UTC
Thanks Gabe.. I learned one more cool thing (width_bucket) from your post.
I wasn't even aware of such a function, and this is what I came up with.. the table 't' is simply a dump of all object_ids less than 10000.. and then I deleted lots of rows between 6000-7000 and 1500-2000 just to check.
SQL>l
1 With DivRange as (select (max(object_id)+1)/(&N) divrange from t)
2 select min(object_id),max(object_id), max(object_id)-min(object_id) diff , count(*),grp+1 as grp
3 from (select object_id,trunc(object_id/d.Divrange) grp
4 from t, divrange d)
5 group by grp
6* order by grp
SQL>/
Enter value for n: 10
MIN(OBJECT_ID) MAX(OBJECT_ID) DIFF COUNT(*) GRP
-------------- -------------- -------- ---------- ----------
3 865 862 803 1
866 1499 633 634 2
2001 2597 596 595 3
2598 3463 865 853 4
3464 4329 865 822 5
4330 5195 865 389 6
5196 6061 865 846 7
6062 6927 865 158 8
6928 7793 865 703 9
7794 8658 864 749 10
10 rows selected.
Elapsed: 00:00:00.00
SQL>REM Gabes sql
SQL>REM
SQL>With DivRange as (select min(object_id) minid, max(object_id) maxid from t)
2 select min(object_id), max(object_id),max(object_id) - min(object_id) diff, count(*), nt
3 from (
4 select object_id,width_bucket(object_id,minid,maxid+1,&n) nt from t ,DivRange
5 )
6 group by nt order by nt
7 /
Enter value for n: 10
MIN(OBJECT_ID) MAX(OBJECT_ID) DIFF COUNT(*) NT
-------------- -------------- -------- ---------- ----------
3 868 865 806 1
869 1499 630 631 2
2001 2599 598 597 3
2600 3465 865 853 4
3466 4330 864 821 5
4331 5196 865 389 6
5197 6062 865 846 7
6063 6927 864 157 8
6928 7793 865 703 9
7794 8658 864 749 10
10 rows selected.
Elapsed: 00:00:00.00
autotrace shows a full table scan for both approaches (with LIOs of ~30 blocks in both)
A little variation
Asim Naveed, October 02, 2005 - 10:13 am UTC
I have a slightly different problem.
I have a table with only one column.
CREATE TABLE RN (MYNUM NUMBER(10));
INSERT INTO RN (MYNUM) VALUES(1);
INSERT INTO RN (MYNUM) VALUES(2);
INSERT INTO RN (MYNUM) VALUES(3);
INSERT INTO RN (MYNUM) VALUES(4);
INSERT INTO RN (MYNUM) VALUES(5);
INSERT INTO RN (MYNUM) VALUES(7);
INSERT INTO RN (MYNUM) VALUES(8);
INSERT INTO RN (MYNUM) VALUES(9);
INSERT INTO RN (MYNUM) VALUES(15);
INSERT INTO RN (MYNUM) VALUES(16);
INSERT INTO RN (MYNUM) VALUES(17);
What I want is a query that builds ranges from the above data based on the missing numbers, returning:
RF RT
1 5
7 9
15 17
Thanks
October 02, 2005 - 11:01 am UTC
ops$tkyte@ORA9IR2> select min(mynum), max(mynum)
2 from (
3 select mynum,
4 max(grp) over (order by mynum) maxgrp
5 from (
6 select mynum,
7 case when lag(mynum) over (order by mynum) <> mynum-1
8 then row_number() over (order by mynum)
9 end grp
10 from rn
11 )
12 )
13 group by maxgrp
14 order by 1
15 /
MIN(MYNUM) MAX(MYNUM)
---------- ----------
1 5
7 9
15 17
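The query flags each row where the previous value is not value-1, carries the last flag forward with max() over (), and then groups on it. Procedurally, the same gaps-and-islands idea looks like this (a sketch of the logic, not the SQL itself):

```python
def ranges(nums):
    """Collapse sorted integers into (low, high) ranges, starting a
    new range at every gap; this is the grouping the lag() /
    row_number() / max() over () query computes in SQL."""
    out = []
    for v in sorted(nums):
        if out and v == out[-1][1] + 1:
            out[-1][1] = v        # consecutive: extend current range
        else:
            out.append([v, v])    # gap (or first value): new range
    return [tuple(r) for r in out]

print(ranges([1, 2, 3, 4, 5, 7, 8, 9, 15, 16, 17]))
```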
Very Good
PRS, October 02, 2005 - 5:33 pm UTC
Can you please explain the query.
Space for the "buckets" to grow...
MarkB..., December 04, 2005 - 11:36 am UTC
How about if the last "bucket" needs to have space to grow?
My requirement is to balance a set of SKUs into a number of ranges, and ensure that the last range has x% growth, since the surrogate key on the SKU is generated from a sequence (therefore new additions will fall into the last bucket range, which would quickly become "un-balanced").
Can you help?
December 04, 2005 - 11:54 am UTC
I don't get it?
what does it mean for a range to have "x% growth"???
ntile breaks a set of M rows (regardless of key values) into N buckets each having about M/N rows in it.
M = 1000
N = 10
each bucket would have 100 rows in it.
ops$tkyte@ORA10GR2> create table t as select dbms_random.random id from all_objects where rownum <= 1000;
Table created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select nt, min(id), max(id), max(id)-min(id) gap, count(*)
2 from (select id, ntile(10) over (order by id) nt from t )
3 group by nt
4 order by nt
5 /
NT MIN(ID) MAX(ID) GAP COUNT(*)
---------- ---------- ---------- ---------- ----------
1 -2.145E+09 -1.727E+09 417541517 100
2 -1.725E+09 -1.252E+09 472724860 100
3 -1.245E+09 -869326065 375722857 100
4 -864818826 -418490867 446327959 100
5 -411805973 39727487 451533460 100
6 41056961 492515025 451458064 100
7 493298437 852795730 359497293 100
8 853341595 1249840427 396498832 100
9 1252590379 1648824944 396234565 100
10 1649062217 2140724647 491662430 100
10 rows selected.
Shachi, December 05, 2005 - 8:17 am UTC
Hi Tom,
How do you divide rows into buckets of different sizes? E.g., 3 buckets with 70%, 20%, and 10% of the rows respectively.
How can we achieve it?
December 06, 2005 - 4:33 am UTC
ops$tkyte@ORA9IR2> select min(object_id), max(object_id), count(*), grp
2 from (
3 select object_id,
4 case when rn/cnt <= .7 then 70
5 when rn/cnt <= .9 then 20
6 else 10
7 end grp
8 from (
9 select object_id, row_number() over (order by object_id) rn, count(*) over () cnt
10 from all_objects
11 )
12 )
13 group by grp
14 /
MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(*) GRP
-------------- -------------- ---------- ----------
29496 33977 3092 10
22314 29495 6184 20
3 22313 21644 70
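The case expression just compares each row's position fraction rn/cnt against cumulative cut-offs (.7, then .9); a small sketch of that labeling (helper name hypothetical):

```python
def percent_buckets(n_rows):
    """Label row positions 1..n_rows the way the CASE expression
    does: fraction <= .7 -> bucket 70, <= .9 -> bucket 20, else 10."""
    labels = []
    for rn in range(1, n_rows + 1):
        f = rn / n_rows
        labels.append(70 if f <= 0.7 else 20 if f <= 0.9 else 10)
    return labels

lab = percent_buckets(100)
print(lab.count(70), lab.count(20), lab.count(10))   # 70 20 10
```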
Different sized buckets
Bob B, December 05, 2005 - 9:56 am UTC
Find the common divisor and find out how many buckets you need for that percent and decode.
10%, 20%, 70% - common divisor is 10% = 10 buckets
DECODE(
ntile(10) over ( order by id ),
1, 1,
2, 2,
3, 2,
3
) Bucket
Bucket 1 has ~10% of the rows
Bucket 2 has ~20% of the rows
Bucket 3 has ~70% of the rows
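Bob B's decode is just a relabeling of equal-sized tiles; a sketch under the same assumptions (an Oracle-style NTILE over already-ordered rows):

```python
def ntile_labels(total, n):
    """Oracle-style NTILE bucket numbers for `total` ordered rows:
    sizes differ by at most one, larger buckets first."""
    base, extra = divmod(total, n)
    out = []
    for b in range(1, n + 1):
        out.extend([b] * (base + (1 if b <= extra else 0)))
    return out

# tile 1 -> bucket 1 (~10%), tiles 2-3 -> bucket 2 (~20%), rest -> bucket 3 (~70%)
remap = {1: 1, 2: 2, 3: 2}
buckets = [remap.get(t, 3) for t in ntile_labels(1000, 10)]
print(buckets.count(1), buckets.count(2), buckets.count(3))   # 100 200 700
```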
December 06, 2005 - 5:01 am UTC
indeed, that is a good way. easier than what I first thought of.
Jean-Pierre, December 06, 2005 - 9:20 am UTC
Hello Tom
Say I have these data
C1 C2
-----
A 1
A 1
B 3
B 4
C 5
D 6
......
......
......
How can I get the percentage of A, B, C and so on? Like this:
C1 count(*) %
A 2 n%
B 3 n%
C 1 n%
......
I don't know how many distinct C1 values there are in my table, but of course I can know how many C2 rows there are for each distinct C1.
Thanks a lot for reply
December 06, 2005 - 9:50 am UTC
I'd show you how to use ratio_to_report, but I don't have your table or data ;(
Jean-Pierre, December 06, 2005 - 10:38 am UTC
Here is my test
create table T (C1 varchar2(1),C2 varchar2(1));
insert into T values ('A','1');
insert into T values ('A','2');
insert into T values ('A','3');
insert into T values ('A','4');
insert into T values ('B','5');
insert into T values ('B','6');
insert into T values ('B','7');
insert into T values ('D','8');
insert into T values ('D','9');
...
The problem is that I don't know how many distinct C1 values there are in table T.
The result I'd like is :
Col1 Col2 Col3
---- ---- -------
A N rows x% of total rows
B N rows x% of total rows
C N rows x% of total rows
etc...
Thanks a lot
December 06, 2005 - 2:41 pm UTC
see below :) someone got my ratio to report hint right above.......
Analytics Rock
Bob B, December 06, 2005 - 11:10 am UTC
SELECT
COL1,
COUNT(*) COL2,
COUNT(*) / SUM(COUNT(*)) OVER () COL3
FROM T
GROUP BY COL1
My question
Steve, December 06, 2005 - 12:13 pm UTC
Hi Tom,
I have a table T,
CREATE TABLE T ( dept_num, item_id ) AS
SELECT MOD(TRUNC(dbms_RANDOM.VALUE*100),12),
MOD(TRUNC(dbms_RANDOM.VALUE*1000),100)
FROM all_objects;
I like to partition table T into k (say 4) subsets,
and the item in the same dept must be in the same subset.
how to make each subset as balanced as possible?
Thanks!
December 06, 2005 - 3:20 pm UTC
it is tough in sql as it is sort of a bin-packing problem (maybe someone else wants to write the iterative solution-seeking routine using the model clause :)
you can use something close to what I use to split a table up into N chunks by extents:
ops$tkyte@ORA9IR2> select dept_num, cnt,sum(cnt) over (order by dept_num) sum_cnt,
2 ceil(sum(cnt) over ()/4)sz
3 from (
4 select dept_num, count(*) cnt
5 from t
6 group by dept_num
7 )
8 /
DEPT_NUM CNT SUM_CNT SZ
---------- ---------- ---------- ----------
0 2761 2761 7731
1 2896 5657 7731
2 2788 8445 7731
3 2683 11128 7731
4 2487 13615 7731
5 2433 16048 7731
6 2454 18502 7731
7 2538 21040 7731
8 2390 23430 7731
9 2529 25959 7731
10 2557 28516 7731
11 2406 30922 7731
12 rows selected.
<b>our perfect set size is 7731, we'll never get it, but we'll do the best we can "by deptno" (sorted by deptno)</b>
ops$tkyte@ORA9IR2> select grp, min(dept_num), max(dept_num), sum(cnt)
2 from (
3 select dept_num, cnt,
4 floor( (sum(cnt) over (order by dept_num)) / ceil(sum(cnt) over ()/4) ) grp
5 from (
6 select dept_num, count(*) cnt
7 from t
8 group by dept_num
9 )
10 )
11 group by grp
12 /
GRP MIN(DEPT_NUM) MAX(DEPT_NUM) SUM(CNT)
---------- ------------- ------------- ----------
0 0 1 5657
1 2 4 7958
2 5 7 7425
3 8 11 9882
ops$tkyte@ORA9IR2>
You could play with some fudge factors on the group size - but you'll get errors either way depending on the data input into the table...
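The floor(running_total / chunk_size) assignment can be checked against the per-dept counts shown above; a minimal Python sketch of the same computation:

```python
import math

def chunk_by_running_total(counts, n):
    """counts: (key, cnt) pairs sorted by key.  Assign each key
    group = floor(running_total / ceil(total/n)), keeping all rows
    of a key together, as Tom's query does."""
    total = sum(c for _, c in counts)
    size = math.ceil(total / n)
    running, out = 0, []
    for key, cnt in counts:
        running += cnt
        out.append((key, running // size))
    return out

# the per-dept counts from the output above (total 30922, target size 7731)
counts = [(d, c) for d, c in enumerate(
    [2761, 2896, 2788, 2683, 2487, 2433, 2454, 2538, 2390, 2529, 2557, 2406])]
print([g for _, g in chunk_by_running_total(counts, 4)])
```

The resulting groups reproduce the GROUP BY output (group sums 5657, 7958, 7425, 9882); as Tom notes, fudging the chunk size shifts the error around but can't remove it.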
to: Jean-Pierre from FRANCE
Marcio Portes, December 06, 2005 - 1:48 pm UTC
Is it what you're looking for?
SQL> select c1, count(*),
2 (ratio_to_report(count(*)) over ())*100 ratio_n
3 from t
4 group by c1
5 /
C COUNT(*) RATIO_N
- ------------- -------------
A 4 44,4444444444
B 3 33,3333333333
D 2 22,2222222222
3 rows selected.
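ratio_to_report(expr) over () is simply expr divided by the sum of expr across the whole result set; a sketch of what Marcio's query computes:

```python
from collections import Counter

def ratio_to_report(values):
    """Per-group percentage of total rows, like
    ratio_to_report(count(*)) over () * 100 after GROUP BY c1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: counts[k] * 100 / total for k in sorted(counts)}

data = ['A'] * 4 + ['B'] * 3 + ['D'] * 2
print(ratio_to_report(data))
```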
Always calling me "someone"
Marcio Portes, December 06, 2005 - 9:15 pm UTC
How to do such grouping ?
Parag J Patankar, January 31, 2006 - 7:54 am UTC
Hi Tom,
I have following data
create table emp1 as select * from scott.emp;
insert into emp1 (empno, ename) values (8000, 'PARAG');
insert into emp1 (empno, ename) values (8001, 'TOM');
commit;
so my emp1 table now has 15 records. Now I want this data grouped into 4 groups such that 3 groups will have 4 employees each and the 4th group will have 3 employees. How can I do this?
I am selecting empno, ename
select empno, ename from emp1 order by dbms_random.value;
So my desired output
EMPNO ENAME GROUP
---------- -------- -----
7934 MILLER 1
7782 CLARK 1
7566 JONES 1
7521 WARD 1
7654 MARTIN 2
7876 ADAMS 2
7698 BLAKE 2
7777 FORD 2
7369 SMITH 3
7499 ALLEN 3
8001 TOM 3
7844 TURNER 3
8000 PARAG 4
7788 SCOTT 4
7839 KING 4
15 rows selected.
How can I do this ?
regards & thanks
pjp
January 31, 2006 - 3:17 pm UTC
ops$tkyte@ORA10GR2> create table t as select * from scott.emp;
Table created.
ops$tkyte@ORA10GR2> insert into t
2 select * from scott.emp where rownum = 1;
1 row created.
ops$tkyte@ORA10GR2> update t set ename = lower(ename), empno = -empno where rownum = 1;
1 row updated.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> select empno, ename, ntile(4) over (order by dbms_random.value) grp
2 from t;
EMPNO ENAME GRP
---------- ---------- ----------
7839 KING 1
7698 BLAKE 1
7521 WARD 1
7934 MILLER 1
7566 JONES 2
7876 ADAMS 2
7844 TURNER 2
-7369 smith 2
7788 SCOTT 3
7654 MARTIN 3
7499 ALLEN 3
7902 FORD 3
7900 JAMES 4
7782 CLARK 4
7369 SMITH 4
15 rows selected.
But, what about ...
Mark, March 14, 2006 - 12:00 pm UTC
How about an 8i Standard Edition solution to this problem?
Regards,
MArk
March 15, 2006 - 9:03 am UTC
second query...
ops$tkyte@ORA9IR2> select min(object_id), max(object_id), count(*), nt
2 from ( select object_id, ntile(8) over (order by object_id) nt
3 from all_objects
4 )
5 group by nt;
MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(*) NT
-------------- -------------- ---------- ----------
3 3996 3867 1
3997 8401 3867 2
8402 12268 3867 3
12269 16137 3867 4
16138 20004 3867 5
20005 23875 3867 6
23876 28659 3867 7
28660 33936 3867 8
8 rows selected.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select min(object_id), max(object_id), count(*), nt
2 from
3 (
4 select trunc((rownum-0.1)/(select trunc(count(*)/8) from all_objects)) nt, object_id
5 from
6 (
7 select object_id
8 from all_objects
9 order by object_id
10 )
11 )
12 group by nt;
MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(*) NT
-------------- -------------- ---------- ----------
3 3996 3867 0
3997 8401 3867 1
8402 12268 3867 2
12269 16137 3867 3
16138 20004 3867 4
20005 23875 3867 5
23876 28659 3867 6
28660 33936 3867 7
8 rows selected.
A reader, April 12, 2006 - 4:35 pm UTC
Close ...
Greg, March 22, 2007 - 1:50 pm UTC
Was looking for some help with a problem I have, and this thread came sooooo close ... :)
(Of course, learned about NTILE and WIDTH_BUCKET ... how cool is that?)
Ok ... so far we've been talking about dividing a group into a number of buckets and keeping it even "by count" ... how to do it "by sum" ?
Here's a sample script and setup to help demonstrate what I'm looking for:
drop table junk;
create table junk
( name varchar2(5),
bytes number )
/
insert into junk values ( 'A', 2837456 );
insert into junk values ( 'B', 984563 );
insert into junk values ( 'C', 92346512 );
insert into junk values ( 'D', 37465 );
insert into junk values ( 'E', 10293243 );
insert into junk values ( 'F', 282374 );
insert into junk values ( 'G', 27364 );
insert into junk values ( 'H', 883745 );
insert into junk values ( 'I', 227364 );
insert into junk values ( 'J', 827345 );
insert into junk values ( 'K', 2833 );
insert into junk values ( 'L', 123 );
commit;
set pagesize 50
break on grp skip 1
compute sum of bytes on grp
select name, grp, bytes, sum(bytes) over ( partition by grp )
from ( select name, bytes,
ntile(3) over (order by bytes desc) grp from junk )
/
Results in:
NAME GRP BYTES SUM(BYTES)OVER(PARTITIONBYGRP)
------------------------------ ---------- ---------- ------------------------------
C 1 92346512 106461774
E 10293243 106461774
A 2837456 106461774
B 984563 106461774
********** ----------
sum 106461774
H 2 883745 2220828
J 827345 2220828
I 227364 2220828
F 282374 2220828
********** ----------
sum 2220828
D 3 37465 67785
K 2833 67785
G 27364 67785
L 123 67785
********** ----------
sum 67785
12 rows selected.
However, what I would prefer to see is those SUMs closer in total ... even if that means one group has only 1 record in it ...
SQL > select sum(bytes) from junk;
SUM(BYTES)
----------------
108,750,387
1 row selected.
SQL > select sum(bytes)/3 from junk;
SUM(BYTES)/3
------------
36250129
1 row selected.
Based on those queries, I'm expecting to see all 3 "buckets" with a total close to 36250129
(I know we won't hit perfect, don't want perfect, just want close).
I've done this previously with PL/SQL ... but would like to know if we can pull this off in SQL .. ;)
what if we don't have a key column
dav, April 24, 2007 - 2:13 pm UTC
How do we use ntile() to split into ranges if we don't have a key column?
I have a million-record table with account numbers repeating, along with some other columns. The table has no key column. I want to split the table into 10 ranges by account number and pass the min(account) and max(account) to a PL/SQL procedure, so that I can execute the procedure in parallel over different account number ranges.
Please let me know. Thank you
April 24, 2007 - 3:02 pm UTC
well, we have to assume that account_number repeats, so, in the following, pretend ID is account_number.
start by rolling up to account_number (id)
9 select id, count(*) cnt
10 from t
11 group by id
add to that a running total of record cnts and the total of all records:
5 select id, cnt,
6 sum(cnt) over (order by id) running_total,
7 sum(cnt) over () total
8 from (
9 select id, count(*) cnt
10 from t
11 group by id
12 )
Next, we can put everyone into a "group" by dividing:
3 select id, cnt, ceil( running_total / ( total/4 )) grp
4 from (
5 select id, cnt,
6 sum(cnt) over (order by id) running_total,
7 sum(cnt) over () total
8 from (
9 select id, count(*) cnt
10 from t
11 group by id
12 )
13 )
that used "4", you can adjust...
and now we just aggregate:
ops$tkyte%ORA9IR2> /*
DOC>drop table t;
DOC>
DOC>create table t
DOC>as
DOC>select mod(object_id,200) id
DOC> from all_objects;
DOC>*/
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select count(distinct id), count(*) from t;
COUNT(DISTINCTID) COUNT(*)
----------------- ----------
200 30398
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select min(id), max(id), sum(cnt), grp
2 from (
3 select id, cnt, ceil( running_total / ( total/4 )) grp
4 from (
5 select id, cnt,
6 sum(cnt) over (order by id) running_total,
7 sum(cnt) over () total
8 from (
9 select id, count(*) cnt
10 from t
11 group by id
12 )
13 )
14 )
15 group by grp
16 order by grp
17 /
MIN(ID) MAX(ID) SUM(CNT) GRP
---------- ---------- ---------- ----------
0 48 7539 1
49 98 7587 2
99 148 7569 3
149 199 7703 4
ops$tkyte%ORA9IR2> select min(id), max(id), sum(cnt), grp
2 from (
3 select id, cnt, ceil( running_total / ( total/6 )) grp
4 from (
5 select id, cnt,
6 sum(cnt) over (order by id) running_total,
7 sum(cnt) over () total
8 from (
9 select id, count(*) cnt
10 from t
11 group by id
12 )
13 )
14 )
15 group by grp
16 order by grp
17 /
MIN(ID) MAX(ID) SUM(CNT) GRP
---------- ---------- ---------- ----------
0 31 4933 1
32 65 5186 2
66 98 5007 3
99 131 4996 4
132 165 5127 5
166 199 5149 6
6 rows selected.
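The same ceil(running_total / (total/n)) idea, sketched in Python with hypothetical account counts (the real data above used mod(object_id, 200) as the repeating id):

```python
import math

def account_ranges(counts, n):
    """counts: (account, cnt) pairs sorted by account.  Group via
    ceil(running_total / (total / n)) so group numbers run 1..n and
    each account's rows stay together, as in the query above."""
    total = sum(c for _, c in counts)
    running, groups = 0, {}
    for acct, cnt in counts:
        running += cnt
        g = math.ceil(running / (total / n))
        lo, hi, s = groups.get(g, (acct, acct, 0))
        groups[g] = (min(lo, acct), max(hi, acct), s + cnt)
    return groups   # {grp: (min_acct, max_acct, row_count)}

# hypothetical: 8 accounts with uneven row counts, 4 target chunks
demo = list(enumerate([50, 10, 40, 60, 30, 20, 55, 35]))
print(account_ranges(demo, 4))
```

Each (min_acct, max_acct) pair is what you would hand to the PL/SQL procedure as a range.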
Works Great.
A reader, April 24, 2007 - 3:08 pm UTC
Power of Analytic functions. Thanks.
Use NTILE to split into ranges
Kailash Subbaraman, August 08, 2007 - 1:10 pm UTC
Tom,
I have been a very big fan of yours. I have solved a lot of problems with analytic functions (truly, analytics rock and analytics roll). But I have the following situation where I need your help.
I have a table with 2 columns :
Create Table A (F1 VARCHAR2(2), F2 VARCHAR2(3));
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '1');
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '2');
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '3');
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '4');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '2');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '1');
INSERT INTO A ( F1, F2 ) VALUES ( 'A4', '5');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '3');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '4');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '5');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '6');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '7');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '6');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '7');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '8');
INSERT INTO A ( F1, F2 ) VALUES ( 'A4', '5');
INSERT INTO A ( F1, F2 ) VALUES ( 'A4', '7');
I need to split these rows into, say, 3 groups (using a function such as NTILE) in such a way that all records with the same value of column F1 end up in the same group.
e.g
A1 1 1
A1 2 1
A1 3 1
A1 4 1
A2 2 2
A2 3 2
A2 4 2
A2 5 2
A2 6 2
A2 7 2
A3 1 3
A3 6 3
A3 7 3
A3 8 3
A4 5 3
A4 7 3
Thanks
franc, December 21, 2009 - 10:21 am UTC
So how do I get the whole row (the one with the max value of one column) from each group (ntile function used)?
Any ideas??
with regards
Fr.
December 21, 2009 - 3:13 pm UTC
I'm not sure what you mean here.
Unbalanced Sets ?
bc, December 23, 2009 - 11:11 am UTC
How can I split the below data into different buckets? The split should occur when there is a gap in time between the end time of the current row and the start time of the next row.
My data looks like this; the intent is to assign rows 1 thru 5 to bucket 1, rows 6 thru 10 to bucket 2, and so forth. The number of buckets can vary depending on the number of gaps in the employee's time.
ROWNUM EMPLOYEE_ID WORK_DAY START_TIME END_TIME HOURS_WORKED
---------- ----------- ----------------- ----------------------- ----------------------- ------------
1 99121331 22-Dec-2009 22-Dec-2009 12:00:00 AM 22-Dec-2009 04:00:00 AM 4.0
2 99121331 22-Dec-2009 22-Dec-2009 04:00:00 AM 22-Dec-2009 04:30:00 AM 0.5
3 99121331 22-Dec-2009 22-Dec-2009 04:30:00 AM 22-Dec-2009 08:30:00 AM 4.0
4 99121331 22-Dec-2009 22-Dec-2009 08:30:00 AM 22-Dec-2009 09:00:00 AM 0.5
5 99121331 22-Dec-2009 22-Dec-2009 09:00:00 AM 22-Dec-2009 12:00:00 PM 3.0
6 99121331 22-Dec-2009 22-Dec-2009 01:00:00 PM 22-Dec-2009 05:00:00 PM 4.0
7 99121331 22-Dec-2009 22-Dec-2009 05:00:00 PM 22-Dec-2009 05:30:00 PM 0.5
8 99121331 22-Dec-2009 22-Dec-2009 05:30:00 PM 22-Dec-2009 09:30:00 PM 4.0
9 99121331 22-Dec-2009 22-Dec-2009 09:30:00 PM 22-Dec-2009 10:00:00 PM 0.5
10 99121331 22-Dec-2009 22-Dec-2009 10:00:00 PM 23-Dec-2009 12:00:00 AM 2.0
script to create test data,
create table employee_time
(
employee_id number(10),
work_day date,
start_time date,
end_time date,
hours_worked number,
description varchar2(50)
);
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009', 'dd-mm-yyyy'), to_date('22-12-2009 04:00:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 04:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 04:30:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 04:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 08:30:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 08:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 09:00:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 09:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 12:00:00', 'dd-mm-yyyy hh24:mi:ss'), 3, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 13:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 17:00:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 17:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 17:30:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 17:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 21:30:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 21:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 22:00:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 22:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('23-12-2009', 'dd-mm-yyyy'), 2, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));
I used lag and lead to get the previous and next rows.
select a.*,
case
when ( next_start_time - end_time ) * 24 * 60 > 15 then 'Y'
else 'N'
end split
from ( select rownum,
employee_id,
to_char(work_day, 'dd-Mon-yyyy') work_day,
start_time,
end_time,
hours_worked,
lag( start_time, 1 ) over( partition by employee_id, work_day
order by start_time ) prev_start_time,
lag( end_time, 1 ) over( partition by employee_id, work_day
order by end_time ) prev_end_time,
lead( start_time, 1 ) over( partition by employee_id, work_day
order by start_time ) next_start_time,
lead( end_time, 1 ) over( partition by employee_id, work_day
order by end_time ) next_end_time
from employee_time ) a
Thanks Tom, I appreciate your help.
December 31, 2009 - 12:10 pm UTC
I don't know why you partition by work_day - not sure why work_day even exists - but if you think you do - so be it.
it would be interesting to have test data that would test your partitions - don't you think? Since boundary conditions are what kill us in real life...
ops$tkyte%ORA10GR2> select employee_id, start_time, end_time, last_value(flag ignore nulls) over (partition by employee_id, work_day order by start_time) grp
2 from (
3 select employee_id, start_time, end_time, work_day,
4 case when nvl(lag(end_time) over (partition by employee_id, work_day order by start_time),start_time-1) <> start_time
5 then row_number() over (partition by employee_id, work_day order by start_time)
6 end flag
7 from employee_time
8 )
9 /
EMPLOYEE_ID START_TIME END_TIME GRP
----------- -------------------- -------------------- ----------
99121331 22-dec-2009 00:00:00 22-dec-2009 04:00:00 1
99121331 22-dec-2009 04:00:00 22-dec-2009 04:30:00 1
99121331 22-dec-2009 04:30:00 22-dec-2009 08:30:00 1
99121331 22-dec-2009 08:30:00 22-dec-2009 09:00:00 1
99121331 22-dec-2009 09:00:00 22-dec-2009 12:00:00 1
99121331 22-dec-2009 13:00:00 22-dec-2009 17:00:00 6
99121331 22-dec-2009 17:00:00 22-dec-2009 17:30:00 6
99121331 22-dec-2009 17:30:00 22-dec-2009 21:30:00 6
99121331 22-dec-2009 21:30:00 22-dec-2009 22:00:00 6
99121331 22-dec-2009 22:00:00 23-dec-2009 00:00:00 6
10 rows selected.
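The flag/carry-forward logic in the query above (flag a row when the previous end_time does not meet its start_time, then carry the last non-null flag forward) can be sketched outside SQL. A minimal Python illustration, using hypothetical (start_hour, end_hour) pairs for one employee's day:

```python
# Illustration of the flag/carry-forward logic in the query above, using
# hypothetical (start_hour, end_hour) pairs for one employee's day.
intervals = [(0, 4), (4, 4.5), (4.5, 8.5), (8.5, 9), (9, 12),
             (13, 17), (17, 17.5), (17.5, 21.5), (21.5, 22), (22, 24)]

def group_intervals(rows):
    """Start a new group whenever the previous end does not meet the
    current start (the CASE ... row_number() flag), then carry the last
    non-null flag forward (last_value ... ignore nulls)."""
    groups, current, prev_end = [], None, None
    for i, (start, end) in enumerate(rows, 1):
        if prev_end is None or prev_end != start:
            current = i           # new island: flag = row_number()
        groups.append(current)    # carry last non-null flag forward
        prev_end = end
    return groups

print(group_intervals(intervals))  # [1, 1, 1, 1, 1, 6, 6, 6, 6, 6]
```

The gap between 12:00 and 13:00 starts a new group at row 6, matching the GRP values 1 and 6 in the SQL output.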
Sunil, December 31, 2009 - 7:26 am UTC
Try following query
SELECT employee_id,
work_day,
start_time,
end_time,
hours_worked,
description,
DENSE_RANK () OVER (ORDER BY rn) bucket
FROM (SELECT tt.*, MAX (rnum) OVER (ORDER BY start_time) rn
FROM (SELECT dd.*,
CASE
WHEN ABS(start_time
- lag (end_time) OVER (ORDER BY start_time)) > 0
THEN ROW_NUMBER () OVER (ORDER BY start_time)
WHEN ROW_NUMBER () OVER (ORDER BY start_time) = 1 THEN 1
ELSE
NULL
END rnum
FROM employee_time dd) tt) order by start_time
Awesome ...
BC, January 05, 2010 - 9:18 am UTC
Tom,
Once again, thank you so much.
As for work_day, it already exists in the table (been there for 6+ years) and is used all over the place. Seems like this system was designed by an RDBMS antagonist ...
BC
January 05, 2010 - 10:46 am UTC
... RDBMS
antagonist ...
I liked that, I've heard of "database agnostic", but database antagonistic is even better :)
how to convert time into buckets (minute into 5 minutes)
Peter Lopera, September 10, 2012 - 6:29 am UTC
Hi Tom
I am trying to convert Oracle sysmetric data from one-minute samples into 5-minute sample data.
Each minute I sample v$sysmetric and save data into a table, so when I query the table I have data such as
SAMPLE_TIMESTAMP METRIC_NAME VALUE
-------------------- ------------------------------ ----------
sep-04 1210 Database Time Per Sec 629.758744
sep-04 1211 Database Time Per Sec 405.815103
sep-04 1212 Database Time Per Sec 418.245743
sep-04 1213 Database Time Per Sec 393.123067
sep-04 1214 Database Time Per Sec 284.260073
sep-04 1215 Database Time Per Sec 257.588038
sep-04 1216 Database Time Per Sec 232.273854
sep-04 1217 Database Time Per Sec 297.157191
sep-04 1218 Database Time Per Sec 267.150952
sep-04 1219 Database Time Per Sec 349.309651
I would like to know how to query this data in intervals of 5 minutes, such as the following output:
SAMPLE_TIMESTAMP METRIC_NAME VALUE
-------------------- ------------------------------ ----------
sep-04 1210 - 1214 Database Time Per Sec 426.240546
sep-04 1215 - 1219 Database Time Per Sec 280.695937
It takes the average value from minutes 1210 to 1214 and from 1215 to 1219.
Is this possible with SQL?
Thank you
September 14, 2012 - 2:51 pm UTC
sure and I could show you how to do it if I had been given
create table
insert into table
so I could actually write the sql...
all you need to do is get an expression that works on whatever your timestamp data is (strange format there - don't know if that is really a string, a timestamp, or a date) to floor it down to the five-minute interval.
ops$tkyte%ORA11GR2> create table t
2 as
3 select to_date( 'sep-04 ' || to_char(1209+rownum,'fm0000'), 'mon-yy hh24mi') x,
4 'blah blah blah' metric,
5 dbms_random.value value
6 from dual
7 connect by level <= 15;
Table created.
ops$tkyte%ORA11GR2> select to_char(x,'mon-yy hh24mi'),
2 to_char(
3 to_date( to_char(x,'mon-yy hh24' ) || to_char( to_number(to_char(x,'mi')) - mod(to_number(to_char(x,'mi')), 5 ), 'fm00' ), 'mon-yy hh24mi')
4 , 'mon-yy hh24mi' ),
5 metric,
6 value
7 from t;
TO_CHAR(X,' TO_CHAR(TO_ METRIC VALUE
----------- ----------- -------------- ----------
sep-04 1210 sep-04 1210 blah blah blah .206367439
sep-04 1211 sep-04 1210 blah blah blah .821680086
sep-04 1212 sep-04 1210 blah blah blah .043872448
sep-04 1213 sep-04 1210 blah blah blah .621996029
sep-04 1214 sep-04 1210 blah blah blah .256090763
sep-04 1215 sep-04 1215 blah blah blah .64936993
sep-04 1216 sep-04 1215 blah blah blah .925411805
sep-04 1217 sep-04 1215 blah blah blah .99099649
sep-04 1218 sep-04 1215 blah blah blah .688817703
sep-04 1219 sep-04 1215 blah blah blah .493901283
sep-04 1220 sep-04 1220 blah blah blah .834203449
sep-04 1221 sep-04 1220 blah blah blah .136844052
sep-04 1222 sep-04 1220 blah blah blah .904672883
sep-04 1223 sep-04 1220 blah blah blah .596391313
sep-04 1224 sep-04 1220 blah blah blah .459076258
15 rows selected.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select to_char(newtime,'mon-yy hh24mi' ), metric, avg(value)
2 from (
3 select to_date( to_char(x,'mon-yy hh24' ) || to_char( to_number(to_char(x,'mi')) - mod(to_number(to_char(x,'mi')), 5 ), 'fm00' ), 'mon-yy hh24mi') newtime,
4 metric,
5 value
6 from t
7 )
8 group by newtime, metric
9 order by newtime;
TO_CHAR(NEW METRIC AVG(VALUE)
----------- -------------- ----------
sep-04 1210 blah blah blah .390001353
sep-04 1215 blah blah blah .749699442
sep-04 1220 blah blah blah .586237591
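The flooring expression above boils down to simple arithmetic: subtract the minute modulo 5. A sketch of that arithmetic in Python (illustration only, not tied to the Oracle session):

```python
from datetime import datetime, timedelta

def floor_to_5min(ts):
    """Floor a timestamp to its 5-minute boundary; the same arithmetic
    as the to_char/mod expression in the query above."""
    return ts.replace(second=0, microsecond=0) - timedelta(minutes=ts.minute % 5)

print(floor_to_5min(datetime(2004, 9, 1, 12, 13)))  # 2004-09-01 12:10:00
```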
how to convert time into buckets (minute into 5 minutes)
A reader, September 21, 2012 - 2:07 am UTC
Hi
Sorry, I didn't attach the data set; including it below.
The solution is good; however, it works if the data set starts at minute 1. If it starts at minute 3, the first bucket would include only 3 minutes of data.
I thought of using analytics and partitioning the data into 5-minute intervals, but could not find a way!
create table sysmetric
(
sample_date date,
metric_name varchar2(40),
value number(10, 2)
);
insert into sysmetric values (to_date('20120918 105300', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 629.75);
insert into sysmetric values (to_date('20120918 105400', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 405.81);
insert into sysmetric values (to_date('20120918 105500', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 418.24);
insert into sysmetric values (to_date('20120918 105600', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 393.12);
insert into sysmetric values (to_date('20120918 105700', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 284.26);
insert into sysmetric values (to_date('20120918 105800', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 257.58);
insert into sysmetric values (to_date('20120918 105900', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 232.27);
insert into sysmetric values (to_date('20120918 110000', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 297.15);
insert into sysmetric values (to_date('20120918 110100', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 267.15);
insert into sysmetric values (to_date('20120918 110200', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 349.30);
output
SAMPLE_TIMESTAMP METRIC_NAME VALUE
-------------------- ------------------------------ ----------
sep-18 1053 - 1057 Database Time Per Sec 426.240546
sep-18 1058 - 1102 Database Time Per Sec 280.695937
September 26, 2012 - 12:21 pm UTC
... The solution is good however it works if the data set start in minute 1, if it
starts in minute 3 the first bucket would include only 3 minutes data. ..
just stop - stop right now.
write down your requirements, exactly, precisely, specifically, in great detail, at such a level of detail your mother would be able to write pseudo code for it.
I answered exactly the question - exactly the question - you asked. 100%.
so, apparently you have a totally different question. Please specify it clearly, precisely. If you want 5 minute intervals based on a start time of the minimum value - say so, clearly. I think that is what you are saying.
(and if it is, you should be able to figure it out - what sort of math would you do if you had the minute of the first sample? once you have that - getting the first sample is pretty easy with a scalar subquery...)
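If the intervals really are to be anchored at the minimum sample time, the math hinted at above is floor of the elapsed minutes divided by 5. A sketch in Python using the reader's sample times (illustration of the arithmetic, not the final SQL):

```python
from datetime import datetime, timedelta

# The reader's sample times: 10:53 .. 11:02 on 18-sep-2012.
samples = [datetime(2012, 9, 18, 10, 53) + timedelta(minutes=i) for i in range(10)]

t0 = min(samples)  # in SQL this would come from a scalar subquery
buckets = [int((t - t0).total_seconds() // 60) // 5 for t in samples]
print(buckets)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In Oracle terms this corresponds roughly to grouping by floor((sample_date - min_date) * 24 * 60 / 5) with min_date from a scalar subquery; that SQL is an untested sketch.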
Ntile returns overlapping values
karma, March 29, 2013 - 4:59 pm UTC
Hi Tom:
I am trying to create buckets with non-overlapping values from a large table using ntile, but I get overlapping values. I thought ntile would give non-overlapping values. Is there a way to avoid this? I would like the next bucket's min value to start after the max value of the previous bucket.
SELECT nt
,MIN (experiment_ver_id)
,MAX (experiment_ver_id)
,COUNT (*)
FROM (SELECT experiment_ver_id, NTILE (4) OVER (ORDER BY experiment_ver_id) nt
FROM experiment_core )
GROUP BY nt
ORDER BY nt;
Results:
NT MIN (experiment_ver_id) MAX (experiment_ver_id) COUNT (*)
1 1002 53826 371902005
2 53826 64228 371902004
3 64228 635067 371902004
4 635067 8369496 371902004
March 29, 2013 - 5:59 pm UTC
http://docs.oracle.com/cd/E14072_01/server.112/e10592/functions113.htm
The number of rows in the buckets can differ by at most 1.
ntile breaks the data set into equi-sized groups.
What if you had a table with 100 rows with x=1, 200 rows with x=2, 300 rows with x=3
and then said "ntile that 4" and it didn't overlap?
In fact, what would you like to see come from such a grouping?
I guess you could just take the max() value - so get nt, max(experiment_ver_id) max_evid, and then use
select nt, nvl(lag(max_evid) over (order by nt)+1, 0) min_evid, max_evid
from (that_result)
that would give you non-overlapping ranges you can be between (the endpoints might not exist in your set, but they can be used for lower and upper bounds). Your buckets could be very uneven in counts after this (your count(*) won't count anymore ;) )
I obviously cannot give you a syntactically correct query since I don't have your data ;)
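The lag-plus-one idea above can be sketched on the bucket maxima from the question (same logic as nvl(lag(max_evid) over (order by nt) + 1, 0), shown here in Python for illustration):

```python
# Hypothetical bucket maxima taken from the NTILE result in the question.
bucket_max = {1: 53826, 2: 64228, 3: 635067, 4: 8369496}

# Equivalent of: nvl(lag(max_evid) over (order by nt) + 1, 0) as min_evid
ranges, prev_max = {}, None
for nt in sorted(bucket_max):
    lo = 0 if prev_max is None else prev_max + 1
    ranges[nt] = (lo, bucket_max[nt])
    prev_max = bucket_max[nt]

print(ranges)
# {1: (0, 53826), 2: (53827, 64228), 3: (64229, 635067), 4: (635068, 8369496)}
```

The resulting ranges no longer overlap, at the cost of very uneven row counts per bucket, as noted above.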
excellent..!!
Shah, June 14, 2015 - 10:10 am UTC
ntile was exactly what I was looking for... rocking and rolling!!
Thanks Tom