Skip to Main Content
  • Questions
  • group records by interval of 3 seconds

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, LB.

Asked: November 25, 2003 - 1:38 pm UTC

Last updated: November 12, 2018 - 3:16 pm UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

Hi Tom,

I am not able to find the exact answer to my question although I think I might need to use LAG and LEAD:

I have records like this:

Time Amount
11/22/2003 12:22:01 100
11/22/2003 12:22:03 200
11/22/2003 12:22:04 300
11/22/2003 12:22:45 100
11/22/2003 12:22:46 200
11/22/2003 12:23:12 100
11/22/2003 12:23:12 200

What I need to do is sum the amounts where the time of the records is within 3 seconds of each other.

In the case where the data is like this:

11/22/2003 12:22:03 200
11/22/2003 12:22:04 200
11/22/2003 12:22:05 200
11/22/2003 12:22:06 200
11/22/2003 12:22:07 200
11/22/2003 12:22:08 200
11/22/2003 12:22:09 200

There would only be one row with the total for all the rows.

(Basically, we are looking for "instances" where we define an instance such that all the records within the instance are no more than three seconds apart. So there can be 1 or many records all of the same instance and the resulting summation would have one summary record per instance.)

Would you please point me in the right direction?




and we said...

well, the way I decided to deal with this is:

take the first row, if the subsequent row is within 3 seconds, it stays in this group -- else, create a new group.

then, after "grouping", we sum....

Analytics rock
Analytics roll


ops$tkyte@ORA920PC> create table t ( x date, y int );
Table created.
 
ops$tkyte@ORA920PC> declare
  2          l_date date := to_date( '12:22:03', 'hh24:mi:ss' );
  3  begin
  4          for i in 1 .. 10
  5          loop
  6               insert into t values ( l_date, dbms_random.value( 0, 100 ) );
  7               l_date := l_date + 1/24/60/60;
  8          end loop;
  9
 10          l_date := l_date + 5/24/60/60;
 11          for i in 1 .. 3
 12          loop
 13               insert into t values ( l_date, dbms_random.value( 0, 100 ) );
 14               l_date := l_date + 1/24/60/60;
 15          end loop;
 16
 17          l_date := l_date + 15/24/60/60;
 18          for i in 1 .. 12
 19          loop
 20               insert into t values ( l_date, dbms_random.value( 0, 100 ) );
 21               l_date := l_date + 1/24/60/60;
 22          end loop;
 23  end;
 24  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> alter session set nls_date_format = 'hh24:mi:ss';
Session altered.
 
ops$tkyte@ORA920PC> select x, y, max(rn) over (order by x)
  2    from (
  3  select x, y,
  4         lag(x) over (order by x),
  5         case when abs(lag(x) over (order by x) - x) > 3/24/60/60 then row_number() over (order by x)
  6                  when row_number() over (order by x) = 1 then 1
  7                          else null
  8                  end rn
  9    from t
 10         )
 11  /
 
X                 Y MAX(RN)OVER(ORDERBYX)
-------- ---------- ---------------------
12:22:03         69                     1
12:22:04         62                     1
12:22:05         73                     1
12:22:06         69                     1
12:22:07         95                     1
12:22:08         87                     1
12:22:09         49                     1
12:22:10         85                     1
12:22:11         94                     1
12:22:12         57                     1
12:22:18         86                    11
12:22:19         50                    11
12:22:20         90                    11
12:22:36         63                    14
12:22:37         26                    14
12:22:38         14                    14
12:22:39         96                    14
12:22:40         18                    14
12:22:41         77                    14
12:22:42         50                    14
12:22:43         11                    14
12:22:44          1                    14
12:22:45         74                    14
12:22:46        100                    14
12:22:47         20                    14
 
25 rows selected.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> select min(x), max(x), sum(y)
  2    from (
  3  select x, y, max(rn) over (order by x) max_rn
  4    from (
  5  select x, y,
  6         lag(x) over (order by x),
  7         case when abs(lag(x) over (order by x) - x) > 3/24/60/60 then row_number() over (order by x)
  8                  when row_number() over (order by x) = 1 then 1
  9                          else null
 10                  end rn
 11    from t
 12         )
 13         )
 14   group by max_rn
 15  /
 
MIN(X)   MAX(X)       SUM(Y)
-------- -------- ----------
12:22:03 12:22:12        740
12:22:18 12:22:20        226
12:22:36 12:22:47        550
 



Rating

  (80 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Very helpful and quite impressive

LB, November 25, 2003 - 3:05 pm UTC

Wow! Thanks a lot!!!

HELP - worked in view but not stored proc

LB, December 10, 2003 - 1:44 pm UTC

Tom,

Your previous solution was most helpful and we have been using that logic as part of a view.
But then we decided instead of a view to load the data into a table every day, since the view takes a while to run.

I have created a stored procedure but for some reason it is giving me this error:

"Line 61: Pos 15: PLS-00103: Encountered the symbol "(" when expecting one of the following:

, from"

When I run the query by itself no problem, but when inserting within the procedure it gives me a problem.

Does it have to do with the case? (I am using Oracle 8.1.7)

Here is my procedure:

CREATE OR REPLACE PROCEDURE UETADMIN2.LOAD_UET_COLLECTION_DATA_PAGE

/****************************************************************************************
* By: L. Betesh
* Procedure Name: LOAD_UET_COLLECTION_DATA_PAGE
*
* Description: This procedure is called by CW2_Daily. It produces daily CW2 performance
* statistics sourced from uet_collection_DATA into UET_AVG_TEMP.
* All successful or erroneous execution messages will be logged in procedure_log.
*
* Parameters: None
*
* Dependent Tables or views:
* procedure_log - Log table
* UET_COLECTION_DATA - Raw data
* V_UET_MAPPINGS - Contains function mappings
*
****************************************************************************************/

AS

--VARIABLES FOR EXCEPTION HANDLING AND LOG REPORTING
V_NUM_ROW NUMBER :=0;
V_PROCESS_NAME VARCHAR2(50) := 'LOAD_UET_COLLECTION_DATA_PAGE';

BEGIN


INSERT INTO PROCEDURE_LOG
VALUES(v_process_name||' '||TO_CHAR(SYSDATE, 'MM/DD/YYYY FMHH24:MI:SS'), 'START');
COMMIT;

EXECUTE IMMEDIATE 'TRUNCATE TABLE UET_COLLECTION_DATA_PAGE';

INSERT INTO UET_COLLECTION_DATA_PAGE

SELECT min(req_time) AS req_time,
map_id,
ltrim(rtrim(IPADDRESS)) AS ipaddress,
user_platform,
usa_id,
SUM(T_FIRSTBYTE) AS T_FIRSTBYTE,
SUM(T_DOWNLOAD) AS T_DOWNLOAD,
SUM(T_RENDER) AS T_RENDER,
ROUND(MAX(REQ_TIME + T_OVERALL/1000) - MIN(REQ_TIME), 2)*1000 AS T_OVERALL,
ltrim(rtrim(brn_i)) AS BRN_I,
lower(ltrim(rtrim(usr_uunm_I))) AS usr_uunm_I,
usr_rmt_initr_c
FROM ( SELECT IPADDRESS, req_time, map_id, user_platform, usa_id, T_OVERALL, t_firstbyte, t_download,
t_render, brn_i, usr_uunm_I, usr_rmt_initr_c,
max(rn) over (order by req_time)
AS max_rn
FROM (
SELECT app_pg_req_m AS req_time, app_pg_I AS map_id,
usr_ip_ads_t AS ipaddress, usr_pltfrm_x AS user_platform,
ed_usa_i AS usa_id, app_1st_byt_time_n AS t_firstbyte,
app_dnld_time_n AS t_download, app_rndr_time_n AS t_render,
app_ovrll_time_n AS t_overall, LTRIM(brn_i) AS BRN_I,
LOWER(usr_uunm_I) AS USR_UUNM_I, usr_rmt_initr_c,
lag(TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS'))
/* LINE 61*/ over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS')) AS previous,
CASE
when abs(lag(app_pg_req_m)
over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS')) - app_pg_req_m) > 3/24/60/60
THEN row_number() over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS'))
when row_number() over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS')) = 1
then 1
else null
end AS rn
FROM uetadmin2.uet_collection_data, UETADMIN2.V_UET_MAPPINGS,
(SELECT COUNT(DISTINCT app_pg_i) AS PG_CNT, app_func_i
FROM uetadmin2.application_function_page
GROUP BY app_func_i) afp
WHERE uetadmin2.v_uet_mappings.APP_FUNC_I = afp.APP_FUNC_I
AND pg_cnt > 1
AND uet_collection_data.app_pg_I = V_UET_MAPPINGS.MAP_ID
)
)
GROUP BY IPADDRESS, map_id, user_platform, usa_id, brn_i, usr_uunm_I, usr_rmt_initr_c, max_rn
;

COMMIT;

v_num_row := SQL%ROWCOUNT;

INSERT INTO PROCEDURE_LOG
VALUES(v_process_name||' '||TO_CHAR(SYSDATE, 'MM/DD/YYYY FMHH24:MI:SS'), 'FINISHED. ROWS ADDED: ' || V_NUM_ROW );
COMMIT;


END;
/

THANKS!!!

-LB


Tom Kyte
December 10, 2003 - 4:16 pm UTC

case, group by rollup, group by cube, analytic functions, order by in a subquery,etc are sql constructs that are not recognized by plsql in 8i

they are in 9i

until then you must use native dynamic sql to "hide" the construct from plsql


execute immediate 'insert into .... ' using bind1, bind2, bind3, ....;



NO Case in Stored Proc

A reader, December 10, 2003 - 1:54 pm UTC

Hi Tom,

I just found another place on this site where you said:

"case is not a plsql construct until 9i. It is available in sql in 816 and 817, not plsql. "

Now I know why I am getting an error.

So the only option I can think of is to create this query as a view and then load the table as select from the view.

Am I right that that is the only option?

Thanks again!

Tom Kyte
December 10, 2003 - 4:18 pm UTC



execute immediate, dynamic sql

Question...

Kashif, December 11, 2003 - 9:41 am UTC

Hi Tom,

Excellent use of row_number and lag to achieve the required results. Though I'm a little unclear on how the 'max_rn' ( max(rn) over (order by x) ) column's value was derived, and it's my lack of understanding of how the max analytic function works. The 'rn' column has '1', followed by 9 nulls, then '11', followed by 2 nulls, and finally '14', followed 11 nulls. The 'max_rn' column though manages to substitute every null occurrence in the 'rn' column with the previous highest 'rn' value. I'm not clear on why the max function would do that, i.e. substitute the nulls in a column with the previous highest value of rn. I would've thought that if there was a null in 'rn', it would've returned a null in the 'max_rn' column. Any explanation would be helpful. Thanks.

Kashif

Tom Kyte
December 11, 2003 - 10:05 am UTC

max(rn) over (order by x)

says to take the result set (don't partition, no partition clause),

order by X (so you get the first row with 1, followed by some nulls, then 11, more nulls, then 14, more nulls)

using range between current row and unbounded preceding (that is the DEFAULT range) give me the max value. That basically means -- look for the max value in this result set starting with the current row and looking at ALL preceding rows. So, for row 1 -- that is 1 (current row plus preceding -- of which there are none). for the second row that is 1 (current row = null, preceding = 1, max is 1). for the 11th row that is 11 (current row = 11, preceding rows = NULL except for row 1 which is 1, max = 11)

and so on.

Awesome, thanks a lot Tom!

Kashif, December 11, 2003 - 1:57 pm UTC


A reader, December 11, 2003 - 2:01 pm UTC

Tom,

Out of curiosity, were you involved in the designing of Analytic functions in Oracle.


TIA.

Tom Kyte
December 11, 2003 - 2:17 pm UTC

nope.

mod_plsql.
html db to a degree..
various "i sure wish it worked like this" emails...

but no, I don't work in development really.

Wish I did think of them -- coolest things to happen to sql since 'select'

my favorite line in the reply was

peon, December 12, 2003 - 12:19 pm UTC

"the way I decided to deal with this is"

One day I hope to be able to speak or write such a phrase with confidence in that I will be able to solve any problem thrown at me with a correct and fast solution.


The only dissatisfying aspect of the masses making such comments is that the site owner never replies to such fan club clamoring... ;(


Time Based Statistics?

Raj Ganga, December 28, 2003 - 11:54 pm UTC

Tom,
i have table t

PID NUMBER
DID NUMBER
CONS_TIME DATE
CONS_FEE NUMBER

pid is the unqiue id for patient and did for Doctor
cons fee is the consoltation fee
how to find the sum of fee colloected per 5 per doctor
min on a given day.

i have tried with anlytical functions but each has its own window and sum is not reflecting correct.

Please guide me,
thank you for spending your most valuable time on my problem.

warm Regards
Raj.

Tom Kyte
December 29, 2003 - 10:27 am UTC

what is a "5 per doctor min"

no idea how to use a DATE (point in time) as a duration here. you'll really need to be a little more explicit -- an example set of inputs, the desired outputs and a "how to" go from inputs to outputs would be needed.

Timed Stastics

Rajshekhar GANGA, December 29, 2003 - 11:43 pm UTC

Tom,
table t :

PID NUMBER
DID NUMBER
CONS_TIME DATE
CONS_FEE NUMBER

Following are rows for one doctor id.

PID DID CONS_TIME CONS_FEE
--------- ---------- ------------------- ----------
202737 1 26/12/2003 09:39:36 202737
33867 1 26/12/2003 09:40:36 16934
33894 1 26/12/2003 09:41:36 11298
33895 1 26/12/2003 09:42:36 8474
34242 1 26/12/2003 09:43:36 6848
34327 1 26/12/2003 09:44:36 5721
33868 1 26/12/2003 09:45:36 4838
687891 1 26/12/2003 09:46:36 85986
220913 1 26/12/2003 09:47:36 24546
35122 1 26/12/2003 09:48:36 3512

i want to have time based statistics(per 5min.) like.
Did Total Collection
1 26/12/2003 09:44:36 252012 (202737+16934+11298+8474+6848+5721)
1 26/12/2003 09:48:36 118882
(4838+85986+ 24546+3512)

Thanks
Raj.

Tom Kyte
December 30, 2003 - 9:57 am UTC


using slightly different data to test boundaries, but here you go:


ops$tkyte@ORA9IR2> select did,
  2         cons_time,
  3             cons_fee,
  4             trunc(abs( (((cons_time-min(cons_time) over (partition by did))*24*60)-0.0001)/5 )) grp
  5    from t
  6  /
 
       DID CONS_TIME             CONS_FEE        GRP
---------- ------------------- ---------- ----------
         1 26/12/2003 09:39:36     202737          0
         1 26/12/2003 09:40:36      16934          0
         1 26/12/2003 09:41:36      11298          0
         1 26/12/2003 09:42:36       8474          0
         1 26/12/2003 09:43:36       6848          0
         1 26/12/2003 09:44:36       5721          0
         1 26/12/2003 09:45:36       4838          1
         1 26/12/2003 09:46:36      85986          1
         1 26/12/2003 09:47:36      24546          1
         1 26/12/2003 09:48:36       3512          1
         1 26/12/2003 09:49:36          0          1
         1 26/12/2003 09:50:36       3512          2
 
12 rows selected.
 
ops$tkyte@ORA9IR2> select did, min(cons_time), max(cons_time), sum(cons_fee)
  2    from (
  3  select did,
  4         cons_time,
  5             cons_fee,
  6             trunc(abs( (((cons_time-min(cons_time) over (partition by did))*24*60)-0.0001)/5 )) grp
  7    from t
  8         )
  9   group by did, grp
 10  /
 
       DID MIN(CONS_TIME)      MAX(CONS_TIME)      SUM(CONS_FEE)
---------- ------------------- ------------------- -------------
         1 26/12/2003 09:39:36 26/12/2003 09:44:36        252012
         1 26/12/2003 09:45:36 26/12/2003 09:49:36        118882
         1 26/12/2003 09:50:36 26/12/2003 09:50:36          3512
 

Test Connection

Rajashekhar GANGA, December 30, 2003 - 11:25 pm UTC

Tom,
thanks a lot

sorry for adding in response,
i have a question
we have leased lines connecting to our manufacturing units,
there are trigger and procedures which inserts rows from one location to other.
The problem is when the line is closed our procedures and triggers raises “timed out”
error
How can we check the connection to the remote computer similar to “ping” utility.

Happy New Year.

Regards
Raj.






Tom Kyte
December 31, 2003 - 9:35 am UTC

sounds like you need a new leaser -- the QOS sounds pretty bad.

basically, there is no "ping" in the database and even if there was, so ? you ping, its alive, you start and it dies. pinging buys you nothing really.

use tnsping

A reader, December 31, 2003 - 6:07 am UTC

.
Tom Kyte
December 31, 2003 - 9:49 am UTC

in a stored procedure or trigger, hmmm.

Test Connection

Raj GANAGA, December 31, 2003 - 11:52 pm UTC

i think i have not made it clear,

i have database link which connects to remote database user.

if table t has an update then the relevant table t on the remote database to be updated or inserted based on some business logic. if this logic is place in a procedure and call it from update or insert trigger,
i need to check the connectivity to the remote database before calling the procedure other wise "Connection Timed Out" will be raised after some time,
this delay in raising the exception makes catching the exception not of much use.

Thankx for reading the request.
Raj.


Tom Kyte
January 01, 2004 - 9:08 am UTC

i understood that -- there is nothing like "ping" in there -- and if there was, it would suffer the same exact "timeout issue".

Suggestion -- don't have client applications do distributed operations. Instead, have the client application schedule a job via dbms_job, have the job do the distributed operation. Now, you don't really care if it take .05 seconds of 50 seconds as no humans are sitting there waiting for it and your system will appear to be infinitely more available (as you are no longer relying on the wire to be there in real time).

Syncronous replication is a really really bad idea -- it really means "i should have only one database in the first place"

Olaf, January 05, 2004 - 7:54 am UTC

Tom,

I have a very big table with entries of patients. The data are inserted online from many mobile devises (OLTP system)
How can I select (most performant) last 5 entries for one patient (SQL query)

Thanks in advance

Olaf

Tom Kyte
January 05, 2004 - 9:27 am UTC

select *
from ( select /*+ FIRST_ROWS */ *
from T
where patient_id = :x
order by patiend_id DESC, visit_date DESC )
where rownum <= 5;


using the CBO. Consider:


big_table@ORA920PC> select count(*) from v;

COUNT(*)
----------
3500000

1 row selected.

big_table@ORA920PC>
big_table@ORA920PC> set autotrace on
big_table@ORA920PC> select *
2 from ( select /*+ FIRST_ROWS */ *
3 from v
4 where patient_id = :x
5 order by patient_id DESC, visit_date DESC )
6 where rownum <= 5;

PATIENT_ID VISIT_DAT DATA1 DATA2
------------------------------ --------- ------------------------------ ------------------
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE

5 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=58797 Card=5 Bytes=270)
1 0 COUNT (STOPKEY)
2 1 VIEW (Cost=58797 Card=106061 Bytes=5727294)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=58797 Card=106061 Bytes=4242440)
4 3 INDEX (RANGE SCAN DESCENDING) OF 'BT_OWNER_CREATED' (NON-UNIQUE) (Cost=365 Card=106061)




Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
623 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed

big_table@ORA920PC> set autotrace off


Why not "OVER (PARTITION BY VISIT_ID ORDER BY DATE_CLICK RANGE INTERVAL '1' DAY"

Lee, January 05, 2004 - 3:38 pm UTC

Tom,
In all the response for the questions that dealt with getting records within a time frame you did not mentioned:

OVER (PARTITION BY COL_1 ORDER BY DATE_COL RANGE INTERVAL '1' DAY

I found this method (with some additions and variations) to work very good for getting a sliding window of records, don’t you agree?

Thanks.


Tom Kyte
January 06, 2004 - 7:42 am UTC

that would get a sliding window -- yes.

But, we didn't WANT a sliding window. We wanted to

a) sort records by time
b) take ALL records within 3 seconds of eachother
c) turn them into a single record


Lets see you do that with a sliding window. the sliding window here would not be an appropriate tool for the question asked.

added another variable

LB, January 06, 2004 - 1:28 pm UTC

Hi Tom,

Thanks for your help before.
Now we have another variable added in and I can't figure out how to add it to the query.
Below is the data you used before plus a sample of the new variable:

X Y Z
-------- ---------- ---------------------
12:22:03 69 USER1
12:22:04 62 USER1
12:22:05 73 USER1
12:22:06 69 USER1
12:22:07 95 USER1
12:22:08 87 USER1
12:22:09 49 USER2
12:22:10 85 USER3
12:22:11 94 USER2
12:22:12 57 USER4
12:22:18 86 USER1
12:22:19 50 USER1
12:22:20 90 USER1
12:22:36 63 USER1
12:22:37 26 USER1
12:22:38 14 USER1
12:22:39 96 USER1
12:22:40 18 USER1
12:22:41 77 USER1
12:22:42 50 USER1
12:22:43 11 USER1
12:22:44 1 USER1
12:22:45 74 USER1
12:22:46 100 USER1
12:22:47 20 USER1

Basically, we only want to sum those records that are of the same user.

I have tried adding the user field to the order by of the lag but that did not work. Should I do group by z in the subquery? All I know is that I have to modify the innermost query first to get the groupings by user.

Have I explained the question clearly?

Thanks a million.

Leora

Tom Kyte
January 06, 2004 - 2:23 pm UTC

you want to PARTITION BY Z, you want to break the input query into N partitions -- one for each value of Z.


ops$tkyte@ORA920PC> alter table t add z varchar2(10);
 
Table altered.
 
ops$tkyte@ORA920PC> update t set z = 'user1';
 
25 rows updated.
 
ops$tkyte@ORA920PC> insert into t select x, y, 'user2' from t;
 
25 rows created.
 
ops$tkyte@ORA920PC> commit;
 
Commit complete.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> alter session set nls_date_format = 'hh24:mi:ss';
 
Session altered.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> select z, min(x), max(x), sum(y)
  2    from (
  3  select x, y, z, max(rn) over (partition by z order by x) max_rn
  4    from (
  5  select x, y, z,
  6         lag(x) over (partition by z order by x),
  7         case when abs(lag(x) over (partition by z
  8                                        order by x) - x) > 3/24/60/60
  9                      then row_number() over (partition by z order by x)
 10                  when row_number() over (partition by z order by x) = 1 then 1
 11                          else null
 12                  end rn
 13    from t
 14         )
 15         )
 16   group by z, max_rn
 17  /
 
Z          MIN(X)   MAX(X)       SUM(Y)
---------- -------- -------- ----------
user1      12:22:03 12:22:12        446
user1      12:22:18 12:22:20        221
user1      12:22:36 12:22:47        342
user2      12:22:03 12:22:12        446
user2      12:22:18 12:22:20        221
user2      12:22:36 12:22:47        342
 
6 rows selected.
 

Thanks

LB, January 06, 2004 - 2:28 pm UTC

Another lesson in analytics.
I think once you do it once it becomes easier to do again.

Thanks a lot.

Simple genial and simple solution

www.db-nemec.com, January 10, 2004 - 3:11 pm UTC

I have been looking for this solution since December 10, 2001 (see </code> http://asktom.oracle.com/pls/ask/f?p=4950:61:3018938560581774526::::P61_ID:2165133263446 <code>and ctrl F 'Are there limits on analytic functions'

It is nice to see that there is a analytic solution to this "session assignment" problem

Once again simple and ingenious solution

Thanks

Nemec


grouping within a window of time

bob, April 13, 2004 - 10:50 am UTC

Starting with the data from the original question the user posted and adding an attribute just for display:

here is the create and insert for a modified set of that original data.

create table test (date_time date, amount number, attrib varchar2(10));
insert into test values (to_date('11/22/2003 12:22:01','MM/DD/YYYY HH24:MI:SS'),100,'abc1');    
insert into test values (to_date('11/22/2003 12:22:03','MM/DD/YYYY HH24:MI:SS'),200,'abc2'); 
insert into test values (to_date('11/22/2003 12:22:05','MM/DD/YYYY HH24:MI:SS'),300,'abc3'); 
insert into test values (to_date('11/22/2003 12:22:08','MM/DD/YYYY HH24:MI:SS'),100,'abc1'); 
insert into test values (to_date('11/22/2003 12:22:13','MM/DD/YYYY HH24:MI:SS'),200,'abc2'); 
insert into test values (to_date('11/22/2003 12:22:26','MM/DD/YYYY HH24:MI:SS'),100,'abc1'); 
insert into test values (to_date('11/22/2003 12:22:27','MM/DD/YYYY HH24:MI:SS'),200,'abc2'); 
insert into test values (to_date('11/22/2003 12:22:28','MM/DD/YYYY HH24:MI:SS'),300,'abc3'); 
insert into test values (to_date('11/22/2003 12:22:33','MM/DD/YYYY HH24:MI:SS'),200,'abc2'); 
insert into test values (to_date('11/22/2003 12:22:35','MM/DD/YYYY HH24:MI:SS'),100,'abc1'); 
insert into test values (to_date('11/22/2003 12:22:45','MM/DD/YYYY HH24:MI:SS'),300,'abc3'); 
insert into test values (to_date('11/22/2003 12:22:48','MM/DD/YYYY HH24:MI:SS'),200,'abc2'); 
insert into test values (to_date('11/22/2003 12:22:50','MM/DD/YYYY HH24:MI:SS'),100,'abc1'); 
insert into test values (to_date('11/22/2003 12:22:55','MM/DD/YYYY HH24:MI:SS'),200,'abc2'); 

can you answer the following question with analytics?

For a time (:t), pivot the date_time and attributes for a group (100,200,300) that is within a (:x) second cluster of each other, and for which all values are before start time (:t).  Only show the group that is closest to the start time.

I have an non-analytics approach that full scans the table three times using nested loops and sometimes the index on amount (depending on other conditions in the subqueries).  I know it can be improved, and maybe I am missing a lead/lag time window approach. 

SQL> variable t varchar2(25);
SQL> variable x number;
SQL> exec :t := '11/22/2003 12:22:37';

PL/SQL procedure successfully completed.

SQL> exec :x := 5;

PL/SQL procedure successfully completed.

SQL> SELECT * FROM (
  2    SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib, 
  3           to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
  4           to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
  5    FROM 
  6    (SELECT * from test where amount=100 
  7     AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
  8    (SELECT * from test where amount=200 
  9     AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
 10    (select * from test where amount=300
 11     and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
 12    WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
 13    AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
 14    ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
 15  ) final
 16  where rownum <=1
 17  
SQL> /


A1_TIME  A1_ATTRIB  A2_TIME  A2_ATTRIB  A3_TIME  A3_ATTRIB
-------- ---------- -------- ---------- -------- ----------
12:22:26 abc1       12:22:27 abc2       12:22:30 abc3

can you answer this question with analytics similarly to how you grouped data by a time interval as you demonstrated in this thread? 


 

Tom Kyte
April 13, 2004 - 3:32 pm UTC

it is not a simple lag/lead 

the problem is there is no way to sort the data -- there is nothing saying amount=100 might happen 5 times AND THEN amount=200 happens once and then amount=300 happens (in that order) all within the input number of seconds.  No way to look forward a row, two rows, N rows -- as you don't know how many to skip over.

Or, amount=300 happens and then 100 and then 200.... so they can arrive in random order....


Also, you are doing a non-equi join, meaning each row from the 1st query can be joined to 0..N rows in the second and so on.  Analytics will preserve the cardinality -- you needed to explode it out and create all possible pairings.


But let me rephrase your logic a little based on your query:


find the record for amount=100 nearest to, but less than :T that has an amount=200 record that happened before :T within :X seconds of that amount=100 record AND an amount=300 record that happened before :T within :X seconds as well.

Report the first such record that has both an amount=200 and 300 within that time range.


Now, a nice index on T(amount,date_time,<attrib>)  -- attrib is sort of optional there, would avoid the table access by index rowid -- and the query:

ops$tkyte@ORA9IR2> select a1_time, a1_attrib,
  2         to_date(substr(a2,1,14),'yyyymmddhh24miss') a2_time, substr(a2,15) a2_attrib,
  3         to_date(substr(a3,1,14),'yyyymmddhh24miss') a3_time, substr(a3,15) a3_attrib
  4    from (
  5  select a.date_time a1_time, a.attrib a1_attrib,
  6            (select to_char(date_time,'yyyymmddhh24miss') || attrib
  7               from test b
  8                          where amount = 200
  9                and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 10                            and ABS(date_time-A.DATE_TIME) <= 1/24/60/60*:x
 11                            and rownum = 1) a2,
 12            (select to_char(date_time,'yyyymmddhh24miss') || attrib
 13               from test c
 14                          where amount = 300
 15                and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 16                            and ABS(date_time-A.DATE_TIME) <= 1/24/60/60*:x
 17                            and rownum = 1) a3
 18    from test A
 19   where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 20     and amount = 100
 21   order by amount desc, date_time desc
 22         )
 23   where a2 is not null
 24     and a3 is not null
 25     and rownum = 1
 26  /
 
A1_TIME   A1_ATTRIB  A2_TIME   A2_ATTRIB  A3_TIME   A3_ATTRIB
--------- ---------- --------- ---------- --------- ----------
22-NOV-03 abc1       22-NOV-03 abc2       22-NOV-03 abc3
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=44)
   1    0   COUNT (STOPKEY)
   2    1     VIEW (Cost=3 Card=1 Bytes=44)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=1 Bytes=29)
   4    3         INDEX (RANGE SCAN DESCENDING) OF 'TEST_IDX' (NON-UNIQUE) (Cost=2 Card=1)
   5    4           COUNT (STOPKEY)
   6    5             TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=3 Bytes=87)
   7    6               INDEX (RANGE SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=2 Card=1)
   8    4           COUNT (STOPKEY)
   9    8             TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=3 Bytes=87)
  10    9               INDEX (RANGE SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=2 Card=1)

gets it with what I think would be "minimal work"

(your query needn't full scan as it is, it didn't on my machine anyway -- make sure you are using the CBO!) 

magician

bob, April 14, 2004 - 4:57 pm UTC

I didn't think it was possible either due to the 1..N rows ahead or behind the rows could be, but I envisioned you pulling something out of that magic analytic sql hat that would first transform the set into something that could use analytics...

my 9.2 laptop wasn't Full Scanning, but the real system with stats and the cbo was.. The query might was slightly different.

Maybe if you weren't so impressive all the time, I wouldn't get the idea that there was always a better way i was missing. in this case there was as well, just not analytically.

i have seen you use the concat/substr approach to other things.. I will have to go back and review other places you have used that to better know when that is worthy of being used.

overlooked nearest row?

bob, May 10, 2004 - 8:06 pm UTC

Tom,

I think you might overlooked something, you help me tremendously, so I am not complaining at all, I am just trying to be sure I understood your solution.

you wrote: "
find the record for amount=100 nearest to, but less than :T that has an amount=200 record that happened before :T within :X seconds of that amount=100 record AND an amount=300 record that happened before :T within :X seconds as well."

How does the solution you proposed ensure that I get the "nearest" row to the amount=100 row since there is no ordering in the subqueries?

If in fact the subquery returns more than one row within :t number of seconds that has an amount=200, I will just get the first one the db returns. right?

I think I have to nest a subquery one more level deep in each subquery and order by the time delta. Is that correct?



Tom Kyte
May 10, 2004 - 9:01 pm UTC

you are right, logic would need to change a tad to do that (but -- so would yours)

we both return a random row within N seconds of the row of interest.

i see (my bad)

bob, May 10, 2004 - 8:30 pm UTC

you said nearest to time :t, i meant the nearest 200 and nearest 300 after 100 in time, where all of them are before time t and within :x seconds of each other.

I am not suggesting the original query I submitted does that.



rambling tonight

bob, May 10, 2004 - 8:51 pm UTC

please disregard the "after" aspect. the ABS says it doesn't matter if it is before or after. the only area of uncertainty for me is ensuring the row I am seeing for amount=200 and amount=300 is the nearest to the amount 100 (in the event there is more than one in the window).



Tom Kyte
May 11, 2004 - 8:12 am UTC

but your query didn't do that either?

it got a random row as well. we are "equivalent" in that respect. do you still need to change it?

yes, i do

bob, May 11, 2004 - 9:30 am UTC

can you show me the change that is necessary to make that happen?

I have tried several things, but can't get the correlated subquery to return a single row that is ordered by the time delta.

The original query(not the one I submitted) had an:
"ORDER BY date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')"
which ensured that each subquery for amount=200 and amount=300 returned the "nearest" to the amount=100 row first.

Does that fact I excluded eliminate the benefits to your approach?

I recognize that now additional LIO will have to occur to get all the rows that fit the time window and find the lowest time delta, but maybe there is other optimizations I am not aware of.

Your original solution saw how wasteful it was that the subqueries were not being short circuited with rownum=1 when there was no ordering going on. I apologize for not giving the complete picture at that point that would have been necessary to provide the best solution.

Tom Kyte
May 11, 2004 - 10:36 am UTC

that didn't do it.

SQL> SELECT * FROM (
  2    SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib, 
  3           to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
  4           to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
  5    FROM 
  6    (SELECT * from test where amount=100 
  7     AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
  8    (SELECT * from test where amount=200 
  9     AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
 10    (select * from test where amount=300
 11     and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
 12    WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
 13    AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
 14    ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
 15  ) final
 16  where rownum <=1

that ordered the data by a1.date_time -- a2 and a3 -- NOT, you got "a random row within abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x" for a2.  


Definitely did not mandate the closest row to a1!  You needed to sort A2 and sort A3 - in order to get the A2 and A3 "closest"



consider (i added a single new insert to your test case, amount=200 at time 12:22:28)

ops$tkyte@ORA9IR2> select * from test where amount = 200 order by date_time;
 
DATE_TIME               AMOUNT ATTRIB
------------------- ---------- ----------
11/22/2003 12:22:03        200 abc2
11/22/2003 12:22:13        200 abc2
11/22/2003 12:22:27        200 abc2ZZZ   <<<=== your row
11/22/2003 12:22:28        200 abc2XXX   <<<=== my row
11/22/2003 12:22:33        200 abc2
11/22/2003 12:22:48        200 abc2
11/22/2003 12:22:55        200 abc2
 
7 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT *
  2    FROM (
  3  SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
  4         to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
  5         to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
  6    FROM (SELECT * from test where amount=100 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
  7         (SELECT * from test where amount=200 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
  8         (select * from test where amount=300 and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
  9   WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
 10     AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
 11   ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
 12         ) final
 13  where rownum <=1
 14  /
 
A1_TIME  A1_ATTRIB  A2_TIME  A2_ATTRIB  A3_TIME  A3_ATTRIB
-------- ---------- -------- ---------- -------- ----------
12:22:26 abc1       12:22:28 abc2XXX    12:22:28 abc3
                    ^^^^^^^^^^^^^^^^ not the closest......

if you add this and don't see the same thing, mix up the order of the inserts.  consider:

ops$tkyte@ORA9IR2> select * from test where amount = 200 order by date_time;
 
DATE_TIME               AMOUNT ATTRIB
------------------- ---------- ----------
11/22/2003 12:22:03        200 abc2
11/22/2003 12:22:13        200 abc2
11/22/2003 12:22:27        200 abc2ZZZ
11/22/2003 12:22:28        200 abc2XXX
11/22/2003 12:22:33        200 abc2
11/22/2003 12:22:48        200 abc2
11/22/2003 12:22:55        200 abc2
 
7 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT *
  2    FROM (
  3  SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
  4         to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
  5         to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
  6    FROM (SELECT * from test where amount=100 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
  7         (SELECT * from test where amount=200 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
  8         (select * from test where amount=300 and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
  9   WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
 10     AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
 11   ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
 12         ) final
 13  where rownum <=1
 14  /
 
A1_TIME  A1_ATTRIB  A2_TIME  A2_ATTRIB  A3_TIME  A3_ATTRIB
-------- ---------- -------- ---------- -------- ----------
12:22:26 abc1       12:22:27 abc2ZZZ    12:22:28 abc3
                    ^^^^^^^^^^^^^^^^ is the right answer, all i did was 
                                     flip flop the inserts



Your query never returned the closest row (well, it never PROMISED to!!!)


but, in any case, we can modify the query to return the row "closest" to A1 that matches the other criteria easily:


ops$tkyte@ORA9IR2> select a1_time, a1_attrib,
  2         to_date(substr(a2,6,14),'yyyymmddhh24miss') a2_time, substr(a2,20) a2_attrib,
  3         to_date(substr(a3,6,14),'yyyymmddhh24miss') a3_time, substr(a3,20) a3_attrib
  4    from (
  5  select a.date_time a1_time, a.attrib a1_attrib,
  6            (select min( 
                   to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
  7                         to_char(date_time,'yyyymmddhh24miss') || attrib )
  8               from test b
  9              where amount = 200
 10                and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 11                and date_time between a.date_time-1/24/60/60*:x 
                                    and a.date_time+1/24/60/60*:x) a2,
 12            (select min( 
                   to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
 13                         to_char(date_time,'yyyymmddhh24miss') || attrib )
 14               from test c
 15              where amount = 300
 16                and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 17                and date_time between a.date_time-1/24/60/60*:x 
                                     and a.date_time+1/24/60/60*:x) a3
 18    from test A
 19   where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 20     and amount = 100
 21   order by amount desc, date_time desc
 22         )
 23   where a2 is not null
 24     and a3 is not null
 25     and rownum = 1
 26  /
 
A1_TIME             A1_ATTRIB  A2_TIME             A2_ATTRIB   A3_TIME             A3_ATTRIB
------------------- ---------- ------------------- ----------- ------------------- -----------
11/22/2003 12:22:26 abc1       11/22/2003 12:22:27 abc2ZZZ     11/22/2003 12:22:28 abc3





That'll use the index I assume is on test(amount,date_time,<attrib>) to range scan on "amount=K and date between K1 and K2", limiting the number of rows it looks at considerably. 

thanks again

bob, May 11, 2004 - 1:38 pm UTC

I am very appreciative of your efforts.

If the "ORDER BY date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS')" was repeated in each of the subqueries (a2 and a3) as well, doesn't that ensure that when the final result set is assembled from each of the subqueries that the first row in that final result set will contain the lowest delta (closest row) for a2, and a3 (because each of the tables that was used to build that result set were sorted in that order).

ie.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT *
2 FROM (
3 SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
4 to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
5 to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
6 FROM (SELECT * from test where amount=100 AND date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
7 (SELECT * from test where amount=200 AND date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0 ORDER BY ABS(date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS') asc) a2,
8 (select * from test where amount=300 and date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0 ORDER BY ABS(date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS') asc) a3
9 WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
10 AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
11 ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
12 ) final
13 where rownum <=1
14 /


regardless of whether that is true (it seemed to be true for randomly inserted data I was testing with - I can't cut and paste from the pc I am currently on to show this), your solution will be very helpful.




Tom Kyte
May 11, 2004 - 3:10 pm UTC


joining three ordered sets together does NOT imply the results are sorted by the same way they came out! a simple sort/merge join, hash join, anything will mess up the sorted order!

especially when you toss in yet another order by!! that doesn't order by the same things.

naivety

bob, May 11, 2004 - 5:04 pm UTC

The shear volume of explanation points in that last post of yours implies I really should have known better than to make such an elementary assumption.

In my defense, I did contemplate the effects of a hash join on my assumption of the final result set, but fell back on the results I was seeing based on the plan being used.

I have a lot to learn.
Thank you for creating this forum to teach me the correct way to think and examine problems.

Tom Kyte
May 11, 2004 - 9:01 pm UTC



sorry, i love the !


but really - think "sets", think "sets".... SQL is so much easier when you let lose some of the procedural thinking we seem to be born with....

The tao of sets.

format mask mistake?

bob, May 11, 2004 - 10:40 pm UTC

Tom,

The question that won't die..

round(x,5) rounds at about the minute level, but the format mask 'fm00000' makes every time difference less than half a day (rounding up to 1) appear the same "00000".  

as a result, using the following data and your query as is, I get the wrong result.

drop table test;
create table test (date_time date, amount number, attrib varchar2(10));
insert into test values (to_date('11/22/2003 12:22:31','MM/DD/YYYYHH24:MI:SS'),300,'abc3'); 
insert into test values (to_date('11/22/2003 12:22:30','MM/DD/YYYYHH24:MI:SS'),300,'abc3'); 
insert into test values (to_date('11/22/2003 12:22:27','MM/DD/YYYYHH24:MI:SS'),300,'abc3'); 
insert into test values (to_date('11/22/2003 12:22:33','MM/DD/YYYYHH24:MI:SS'),200,'abc2'); 
insert into test values (to_date('11/22/2003 12:22:35','MM/DD/YYYYHH24:MI:SS'),100,'abc1'); 

SQL> select to_char(a1_time,'HH24:MI:SS'), a1_attrib,
  2             to_char(to_date(substr(a2,6,14),'yyyymmddhh24miss'),'HH24:MI:SS') a2_time, substr(a2
,20) a2_attrib,
  3             to_char(to_date(substr(a3,6,14),'yyyymmddhh24miss'),'HH24:MI:SS') a3_time, substr(a3
,20) a3_attrib
  4             from (
  5      select a.date_time a1_time, a.attrib a1_attrib,
  6                (select min( 
  7                     to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
  8                             to_char(date_time,'yyyymmddhh24miss') || attrib )
  9                   from test b
 10                  where amount = 200
 11                   and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 12                   and date_time between a.date_time-1/24/60/60*:x 
 13                                      and a.date_time+1/24/60/60*:x) a2,
 14               (select min( 
 15                     to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
 16                            to_char(date_time,'yyyymmddhh24miss') || attrib )
 17                  from test c
 18                 where amount = 300
 19                   and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 20                   and date_time between a.date_time-1/24/60/60*:x 
 21                                       and a.date_time+1/24/60/60*:x) a3
 22       from test A
 23      where date_time <= to_date(:t,'MM/DD/YYYYHH24:MI:SS')
 24        and amount = 100
 25      order by amount desc, date_time desc
 26            )
 27      where a2 is not null
 28        and a3 is not null
 29        and rownum = 1
 30  /

TO_CHAR( A1_ATTRIB  A2_TIME  A2_ATTRIB   A3_TIME  A3_ATTRIB
-------- ---------- -------- ----------- -------- -----------
12:22:35 abc1       12:22:33 abc2        12:22:30 abc3 
                                         ^^^^^^^^ should be 12:22:31

but since the deltas are both 00000 because of the format mask, the first row in (order of insert) is the first row returned.

I suspect it is a simple format mask change to 'fm.00000' and changing a couple substr indexes.

SQL> select to_char(a1_time,'HH24:MI:SS'), a1_attrib,
  2             to_char(to_date(substr(a2,7,14),'yyyymmddhh24miss'),'HH24:MI:SS') a2_time, substr(a2
,21) a2_attrib,
  3             to_char(to_date(substr(a3,7,14),'yyyymmddhh24miss'),'HH24:MI:SS') a3_time, substr(a3
,21) a3_attrib
  4             from (
  5      select a.date_time a1_time, a.attrib a1_attrib,
  6                (select min( 
  7                     to_char(abs(round(a.date_time-date_time,5)),'fm.00000') ||
  8                             to_char(date_time,'yyyymmddhh24miss') || attrib )
  9                   from test b
 10                  where amount = 200
 11                   and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 12                   and date_time between a.date_time-1/24/60/60*:x 
 13                                      and a.date_time+1/24/60/60*:x) a2,
 14               (select min( 
 15                     to_char(abs(round(a.date_time-date_time,5)),'fm.00000') ||
 16                            to_char(date_time,'yyyymmddhh24miss') || attrib )
 17                  from test c
 18                 where amount = 300
 19                   and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 20                   and date_time between a.date_time-1/24/60/60*:x 
 21                                       and a.date_time+1/24/60/60*:x) a3
 22       from test A
 23      where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
 24        and amount = 100
 25      order by amount desc, date_time desc
 26            )
 27      where a2 is not null
 28        and a3 is not null
 29        and rownum = 1
 30  /

TO_CHAR( A1_ATTRIB  A2_TIME  A2_ATTRIB   A3_TIME  A3_ATTRIB
-------- ---------- -------- ----------- -------- -----------
12:22:35 abc1       12:22:33 abc2        12:22:31 abc3
                                         ^^^^^^^^  correct
                         
                                             
Does that accomplish what you intended, or was the mask serving another purpose? 

Tom Kyte
May 12, 2004 - 7:33 am UTC

doh, lost the v


'fmv00000'


v is implied decimal, that will return:

  1* select to_char( round(sysdate-sysdate+1/24/60,5), 'fmv00000') from dual
ops$tkyte@ORA9IR2> /
 
TO_CHA
------
00069


which is want I wanted. 

more simply, leave fm alone, multiple time deltas by 24*60*60

bob, May 12, 2004 - 7:12 am UTC

I suspect that was what you unintentionally excluded.

another good trick I acquired out of this thread..

Thanks

amazing results.

bob, May 12, 2004 - 10:19 am UTC

with a large set of test data and the real similar requirement that was using this approach you outlined in this example, LIO was reduced from 2400 to 10. fabulous!!!!! (favorable use of exclamation points. :) )

Clarification

A reader, August 09, 2004 - 6:32 pm UTC

Tom,

The explanation you have given is wonderful. However, when I ran the example you had given (to sum records with are three seconds apart), I get the records which are 9 seconds apart and not 3 as posted in the question. For e-g, I get the following results when I ran your example above.

MIN(X) MAX(X) Sum
=============================================
8/1/2004 12:22:03 PM 8/1/2004 12:22:12 PM 552
8/1/2004 12:22:18 PM 8/1/2004 12:22:20 PM 222
8/1/2004 12:22:36 PM 8/1/2004 12:22:47 PM 629

Ideally it should be

8/1/2004 12:22:03 PM 8/1/2004 12:22:06 PM xxx

where xxx can change because of the random function in your example.

However, in the above case, the records show a 9 second difference, but the difference you use in your query is only 3 second interval as per the question. Why is this different? Please clarify.

2. On a similar note, I want to do the following. I have a table T1 with say 3 columns :
CREATE TABLE T1
(
Col1 VARCHAR2(2),
Col2 Date,
Col3 Number
)

Eg values in the table :

Col1 Col2 Col3
C1 1/1/2001 100
C1 4/5/2001 200
C1 3/5/2001 800
C1 6/10/2001 200
C1 10/10/2001 200
C1 12/15/2001 150
C2 ...
C2 ....
C2 ....
C3 ....
C3 ....
C3 .....

I want to find the sum of Col3 that should break at each change in Col1 and also that for a given value of Col1 we need to find the following :
1. The first value of the column Col2 that is in the same quarter as the value of column Col2 of the current row
2. The sum of column col3 must only add those rows where the date in column Col2 should lie in the same quarter as the previous rows. If they are in different quarter, then the current row should not be included in the summation.
So per the above condition, the output should be as follows :
Prev
Col1 Col2 Col3 Qtr QTD Total
===========================================
C1 1/1/2001 100 Null 100(a)
C1 4/5/2001 200 Null 200(b)
C1 3/5/2001 800 1/1/2001 900(c)
C1 6/10/2001 200 4/5/2001 400(d)
C1 10/10/2001 200 Null 200(e)
C1 12/15/2001 150 10/10/2001 350(f)

(a) - No previous record in same quarter
(b) - No previous record in same quarter
(c) - 800 + 100 (Add Jan and Mar since they are in same qtr
and also PrevQtr is 1/1/2001)
(d) - 200 + 200 (Add Jun and Apr since they are in same qtr
and also PrevQtr is 4/5/2001)
(e) - No previous record in same quarter
(f) - 200 + 150 (Add Dec and Oct since they are in same qtr
and also PrevQtr is 10/10/2001)

Please let me know how I can acheive this in a single query. I tried to see some examples from your book (last employee hire date example)using LAG/LEAD analytic functions and couldn't find one where the range can be a numeric expression)

3. Suppose my table has a DATE column, is it possible to use a condition such as to sum up the previous rows that are in the same quarter as the date in the current row etc using the LAG/LEAD analytic functions.

Your response on this will be immensely helpful as I have to write a Rate of return calculation algorithm based on some of the logic I described above.

Thanks

Tom Kyte
August 09, 2004 - 8:57 pm UTC

no, you missed the thrust of this example

we wanted to start scanning through a set of records...

and as long as the preceding record was within three seconds -- then the current record would be in the same group.

so, if you had 1,000 records -- each 1 second apart -- there would be ONE group.
ONE group spanning 1,000 seconds

if you had 1,000 records -- each 4 seconds apart -- there would be ONE THOUSAND groups. ONE THOUSAND groups -- each spanning 4 seconds.

(and all combos in between)


if you just wanted "records between 12:00:00 and 12:00:03-" 
                   "records between 12:00:03 and 12:00:06-"
                   "records between 12:00:06 and 12:00:09-"

you would "divide and truncate" (like divide and conquer)



like this:

ops$tkyte@ORA9IR2> create table t
  2  as
  3  select sysdate+1/24/60/60*rownum dt from all_users;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @bigdate
ops$tkyte@ORA9IR2> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
 
Session altered.
 
ops$tkyte@ORA9IR2> select dt, trunc(dt) + trunc(to_char(dt,'sssss')/3)*1/24/60/60 dt2
  2    from t;
 
DT                   DT2
-------------------- --------------------
09-aug-2004 20:50:16 09-aug-2004 06:56:45
09-aug-2004 20:50:17 09-aug-2004 06:56:45
09-aug-2004 20:50:18 09-aug-2004 06:56:46
09-aug-2004 20:50:19 09-aug-2004 06:56:46
09-aug-2004 20:50:20 09-aug-2004 06:56:46
09-aug-2004 20:50:21 09-aug-2004 06:56:47
09-aug-2004 20:50:22 09-aug-2004 06:56:47
09-aug-2004 20:50:23 09-aug-2004 06:56:47
09-aug-2004 20:50:24 09-aug-2004 06:56:48
09-aug-2004 20:50:25 09-aug-2004 06:56:48
09-aug-2004 20:50:26 09-aug-2004 06:56:48
09-aug-2004 20:50:27 09-aug-2004 06:56:49
09-aug-2004 20:50:28 09-aug-2004 06:56:49
09-aug-2004 20:50:29 09-aug-2004 06:56:49
09-aug-2004 20:50:30 09-aug-2004 06:56:50
09-aug-2004 20:50:31 09-aug-2004 06:56:50
09-aug-2004 20:50:32 09-aug-2004 06:56:50
09-aug-2004 20:50:33 09-aug-2004 06:56:51
09-aug-2004 20:50:34 09-aug-2004 06:56:51
09-aug-2004 20:50:35 09-aug-2004 06:56:51
09-aug-2004 20:50:36 09-aug-2004 06:56:52
09-aug-2004 20:50:37 09-aug-2004 06:56:52
09-aug-2004 20:50:38 09-aug-2004 06:56:52
09-aug-2004 20:50:39 09-aug-2004 06:56:53
09-aug-2004 20:50:40 09-aug-2004 06:56:53
09-aug-2004 20:50:41 09-aug-2004 06:56:53
09-aug-2004 20:50:42 09-aug-2004 06:56:54
09-aug-2004 20:50:43 09-aug-2004 06:56:54
09-aug-2004 20:50:44 09-aug-2004 06:56:54
09-aug-2004 20:50:45 09-aug-2004 06:56:55
09-aug-2004 20:50:46 09-aug-2004 06:56:55
09-aug-2004 20:50:47 09-aug-2004 06:56:55
09-aug-2004 20:50:48 09-aug-2004 06:56:56
09-aug-2004 20:50:49 09-aug-2004 06:56:56
09-aug-2004 20:50:50 09-aug-2004 06:56:56
09-aug-2004 20:50:51 09-aug-2004 06:56:57
09-aug-2004 20:50:52 09-aug-2004 06:56:57
09-aug-2004 20:50:53 09-aug-2004 06:56:57
09-aug-2004 20:50:54 09-aug-2004 06:56:58
09-aug-2004 20:50:55 09-aug-2004 06:56:58
09-aug-2004 20:50:56 09-aug-2004 06:56:58
09-aug-2004 20:50:57 09-aug-2004 06:56:59
09-aug-2004 20:50:58 09-aug-2004 06:56:59
09-aug-2004 20:50:59 09-aug-2004 06:56:59
09-aug-2004 20:51:00 09-aug-2004 06:57:00
09-aug-2004 20:51:01 09-aug-2004 06:57:00
09-aug-2004 20:51:02 09-aug-2004 06:57:00
09-aug-2004 20:51:03 09-aug-2004 06:57:01
09-aug-2004 20:51:04 09-aug-2004 06:57:01
09-aug-2004 20:51:05 09-aug-2004 06:57:01
09-aug-2004 20:51:06 09-aug-2004 06:57:02
09-aug-2004 20:51:07 09-aug-2004 06:57:02
09-aug-2004 20:51:08 09-aug-2004 06:57:02
 
53 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select trunc(dt) + trunc(to_char(dt,'sssss')/3)*1/24/60/60 dt2, count(*)
  2    from t
  3   group by trunc(dt) + trunc(to_char(dt,'sssss')/3)*1/24/60/60;
 
DT2                    COUNT(*)
-------------------- ----------
09-aug-2004 06:56:45          2
09-aug-2004 06:56:46          3
09-aug-2004 06:56:47          3
09-aug-2004 06:56:48          3
09-aug-2004 06:56:49          3
09-aug-2004 06:56:50          3
09-aug-2004 06:56:51          3
09-aug-2004 06:56:52          3
09-aug-2004 06:56:53          3
09-aug-2004 06:56:54          3
09-aug-2004 06:56:55          3
09-aug-2004 06:56:56          3
09-aug-2004 06:56:57          3
09-aug-2004 06:56:58          3
09-aug-2004 06:56:59          3
09-aug-2004 06:57:00          3
09-aug-2004 06:57:01          3
09-aug-2004 06:57:02          3
 
18 rows selected.
 

Still some doubts

A reader, August 09, 2004 - 10:11 pm UTC

Tom,

I have understood the first problem (about grouping). However, I am not clear about the solution provided for my question 2 about summation of dates in the same quarter. Please explain me what you tried to provide in the other example and also a solution to the summation of dates in the same quarter

Thanks

Tom Kyte
August 10, 2004 - 7:34 am UTC

i didn't even look at the second part.

i demonstrated "3 second grouping" only.


I saw no insert intos to work with.


but you need to derive a "quarter" column

select f(date) as quarter.....


and then you can partition on col1, quarter -- and then you have first_value, lag, lead, sum -- over (partition by col1, quarter....) to do whatever you need.

What about a 'sliding' window?

Basil, March 31, 2006 - 5:34 pm UTC

Assuming I have db rows inserted at the following times:
12:00:00
12:00:10
12:00:13
12:00:15
12:00:20
12:00:25
12:01:00
12:01:01
12:01:02
12:01:03
12:01:04
12:01:45
12:02:05
12:03:00

I would like to know the maximum number of rows that exist within any 60 second window. Thus, the following shows the number of rows that are in the window beginning at each indicated time:

20060531 12:00:00 6
20060531 12:00:10 10
20060531 12:00:13 9
20060531 12:00:15 8
20060531 12:00:20 7
20060531 12:00:25 6
20060531 12:01:00 6
20060531 12:01:01 5
20060531 12:01:02 4
20060531 12:01:03 3
20060531 12:01:04 2
20060531 12:01:45 2
20060531 12:02:05 1
20060531 12:03:00 1

The window starting at 12:00:10 has the most rows for the 60 second period that would end at 12:00:09.

Here's the test data:
CREATE sequence t_seq;
CREATE TABLE t (t_id number not null primary key,
t_dt date not null);

INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:00','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:10','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:13','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:15','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:20','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:25','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:00','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:01','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:02','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:03','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:04','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:45','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:02:05','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:03:00', 'YYYYMMDD HH24:MI:SS'));

Otherwise, I've not yet been able to find a way to do this such that we consider a 60 second window that moves around.

Tom Kyte
March 31, 2006 - 6:57 pm UTC

ops$tkyte@ORA10GR2> select *
  2    from (
  3  select t_id, t_dt, cnt, max(cnt) over () max_cnt
  4    from (
  5  select t_id, t_dt,
  6         count(*) over (order by t_dt desc range 1/24/60 preceding ) cnt
  7    from t
  8         )
  9             )
 10   where cnt = max_cnt
 11   order by t_dt;

      T_ID T_DT                        CNT    MAX_CNT
---------- -------------------- ---------- ----------
         2 31-may-2006 12:00:10         10         10

 

Helpful! Now how to figure out the number of concurrent ones?

Basil, April 10, 2006 - 4:25 pm UTC

How to determine, then, the maximum number of things that occur at the same time, and which things are involved? Consider the following situation:

Event 1 13:00 -----------------15:00
Event 2 14:00-------------------------17:00
Event 3 12:00------------14:30
Event 4 12:30-12:40
Event 5 15:30-16:00


The maximum number of concurrent things is 3 (Events 1, 2, and 3,
between 14:00 and 14:30).

ALTER SESSION SET NLS_DATE_FORMAT= 'YYYYMMDD HH24:MI';
CREATE TABLE t
( id number not null primary key,
startdt date,
enddt date );

INSERT INTO t
VALUES(1, '20060501 12:00', '20060501 14:30');

INSERT INTO t
VALUES(2, '20060501 12:30', '20060501 12:40');

INSERT INTO t
VALUES(3, '20060501 13:00', '20060501 15:00');

INSERT INTO t
VALUES(4, '20060501 14:00', '20060501 17:00');

INSERT INTO t
VALUES(5, '20060501 15:30', '20060501 16:00');

COMMIT;


Tom Kyte
April 11, 2006 - 12:11 pm UTC

and what happens when the max happens more than once and things come and go over time

eg:

... time ....
e1 --------
e2 --------
e3 ----
e4 ----


there are always THREE - but a different set of three. I mean - there is no "one row" answer.

what would you expect the outcome of that query to be?



Further details

Basil, April 11, 2006 - 12:55 pm UTC

What I'm trying to get at is, for the data currently in the database, what is the maximum number of concurrent things in any given minute (or quarter hour, or hour, etc.). If the events are airline flights, how many planes were in the air at the same time, assuming the database records the actual takeoff and landing times?

If there are multiple occasions with the maximum number, that's OK. We just report the number and (perhaps) the various times it did occur. Right now, I'm happy just to know the number.

In my particular example, I'd expect the answer to the question to be 3 (which occurs between 14:00 and 14:30 for events 1,2,and 3 from the diagram, or ids 3, 4, and 1 from the SQL; sorry for the id-event mismatch).

This almost seems like a job for Spatial, but I'd like to avoid that if I can. In this case, the events can be modeled as line segments, and we're assigning weights for the subsegements that are collinear.

This could be a start?

Lars Stampe Villadsen, April 11, 2006 - 1:56 pm UTC

This might bring you a bit further:

select ts.time, count(*) from
(select startdt time from t
union
select enddt+1/86400 from t) ts,
t
where ts.time between t.startdt and enddt
group by ts.time

TIME COUNT(*)
01-05-2006 12:00:00 1
01-05-2006 12:30:00 2
01-05-2006 12:40:01 1
01-05-2006 13:00:00 2
01-05-2006 14:00:00 3
01-05-2006 14:30:01 2
01-05-2006 15:00:01 1
01-05-2006 15:30:00 2
01-05-2006 16:00:01 1

A select max() on count(*) will give you the maximum number of events and the 'start-time' for the interval where it happens. Next time gives you the end of the interval

Lars

Great, Lars!

Basil, April 11, 2006 - 2:41 pm UTC

This solution seems to work perfectly. I tried another brute force approach that consisted of creating a buckets table (startdt, enddt, count). For each event, I incremented the count for each buckets row that was between the start and end date of the event. The results there agree with yours, although the buckets approach gives a lot more output. (And takes a LOT longer, especially if one is tracking by the minute, due to the number of buckets).

Similar query... but with RUNNING TOTAL GROUPING

Robert, February 16, 2007 - 6:02 pm UTC

Tom,

This question is very similar to the original.

But instead of comparing differences between times,
I want to do a running sum of values and divide into groups based on the sum for groups of consecutive rows.

drop table x;
create table x (id number, value number);

insert into x
select rownum
, trunc(dbms_random.value(1, 10))
from dba_users
where rownum < 30
/

----------------------------
-- Group into groups whose running sum is smallest value >= 20
----------------------------

-- select * from x;

ID VALUE
--------- ---------
1 3 -- 3
2 5 -- 5 + 3 = 8
3 2 -- 8 + 2 = 10
4 1 -- 1 + 10 = 11
5 8 -- 8 + 11 = 19
6 6 -- 6 + 19 = 25 (25 >= 20 --> This row and above go in group 1
7 9 -- 9
8 3 -- 3 + 9 = 12
9 4 -- 4 + 12 = 16
10 7 -- 7 + 16 = 23 (23 >= 20 --> These rows go into group 2
11 2 -- 2
12 5 -- 5 + 2 = 7
13 9 -- 9 + 7 = 16
14 5 -- 5 + 16 = 21 (21 >= 20 --> These rows go into group 3
15 7 -- 7
16 3 -- 3 + 7 = 10
17 6 -- 6 + 10 = 16
18 2 -- 2 + 16 = 18
19 1 -- 1 + 18 = 19
20 4 -- 4 + 19 = 23 (23 >=20 --> These rows go into group 4
21 8 -- ETC, ETC.......
22 9
23 7
24 7
25 9
26 1
27 5
28 3
29 5

I know there must be an 'easy' way to do this with analytics, Tom... but it is beyond me.
(P.S. I am going to run this query against a table we are range re-partitioning to get the partitions into approx equal sizes.)

Can you help?

Thanks,

Robert.
Tom Kyte
February 18, 2007 - 9:37 am UTC

... I know there must be an 'easy' way to do this with analytics .....

laughing out loud - no, there isn't :)

but I'm glad you said what your goal was - you want to use NTILE() to break the sets of rows up.

You need to determine the number of groups you want and then we can "ntile()" them to break them into equi-sized ranges:
ops$tkyte%ORA10GR2> select nt, min(id), max(id), count(*)
  2    from (
  3  select id, value,
  4         ntile(4) over (order by id) nt
  5    from x
  6         )
  7   group by nt
  8   order by nt
  9  /

        NT    MIN(ID)    MAX(ID)   COUNT(*)
---------- ---------- ---------- ----------
         1          1          8          8
         2          9         15          7
         3         16         22          7
         4         23         29          7


One level deeper than that...

Robert, February 19, 2007 - 9:41 am UTC

Tom,

My requirement actually goes one level of complexity deeper than your reply...

The ID column in my little table is representing the 'low value' of an ACCOUNT_ID RANGE... The VALUE column is representing NUMBER OF ROWS (e.g. 5,000,000) within that account_id range.

My goal is to break up the account_id ranges into equal sized partitions based on the the number of rows within each account_id range (i.e. range partitioned).

So my original example actually shows the results I am looking for.

Can you help? Thanks, Robert.

drop table x;
create table x (id number, value number);

insert into x 
select rownum
     , trunc(dbms_random.value(1, 10))
  from dba_users
 where rownum < 30
/

----------------------------
-- Group into groups whose running sum of VALUE column is smallest value >= 20
----------------------------

-- select * from x;

       ID     VALUE
(ACCOUNT  (#ROWS
   ID      IN 
 RANGE )  ACCOUNT_ID RANGE)
--------- ---------
        1         3   -- 3
        2         5   -- 5 + 3 = 8
        3         2   -- 8 + 2 = 10 
        4         1   -- 1 + 10 = 11 
        5         8   -- 8 + 11 = 19
        6         6   -- 6 + 19 = 25 (25 >= 20 --> This row and above go in group 1
        7         9   -- 9
        8         3   -- 3 + 9 = 12
        9         4   -- 4 + 12 = 16
       10         7   -- 7 + 16 = 23 (23 >= 20 --> These rows go into group 2
       11         2   -- 2
       12         5   -- 5 + 2 = 7
       13         9   -- 9 + 7 = 16
       14         5   -- 5 + 16 = 21 (21 >= 20 --> These rows go into group 3
       15         7   -- 7
       16         3   -- 3 + 7 = 10
       17         6   -- 6 + 10 = 16
       18         2   -- 2 + 16 = 18
       19         1   -- 1 + 18 = 19
       20         4   -- 4 + 19 = 23 (23 >=20 --> These rows go into group 4
       21         8   -- ETC, ETC.......
       22         9
       23         7
       24         7
       25         9
       26         1
       27         5
       28         3
       29         5


Tom Kyte
February 19, 2007 - 9:51 am UTC

we can do the "largest less than 25", but the "smallest number greater than 25" isn't something I see being done easily.

largest <= 25 is just "divide"

RE: Group into groups whose running sum of VALUE column is smallest value >= 20

Frank Zhou, February 22, 2007 - 3:49 pm UTC

Hi Robert,

Here is a 10G SQL Model Clause solution for "Group into groups whose running sum of VALUE column is smallest value >= 20"


Frank


SQL> SELECT id, value, sum,
2 CASE WHEN tmp IS NULL
3 THEN 'This row and above belongs to one group' END flag
4 FROM
5 (
6 SELECT id, value, sum ,tmp
7 FROM (SELECT id, value, count(*) OVER ( ) counter,
8 ROW_NUMBER() OVER (ORDER BY id ) rn FROM x)
9 MODEL
10 DIMENSION BY (rn)
11 MEASURES (id, value, counter, CAST(NULL AS NUMBER) sum, value tmp)
12 RULES ITERATE (10000)
13 UNTIL (ITERATION_NUMBER>= counter[1])
14 (sum[ITERATION_NUMBER] = case when tmp[ITERATION_NUMBER-1] is null
15 then tmp[ITERATION_NUMBER]
16 else tmp[ITERATION_NUMBER] + sum[cv()-1]
17 end,
18 tmp[ITERATION_NUMBER] = CASE when sum[ITERATION_NUMBER] <21
19 then 0
20 end
21 )
22 )
23 WHERE ID IS NOT NULL
24 ORDER BY ID;

ID VALUE SUM FLAG
---------- ---------- ---------- ---------------------------------------
1 2 2
2 4 6
3 7 13
4 8 21 This row and above belongs to one group
5 3 3
6 9 12
7 3 15
8 5 20
9 6 26 This row and above belongs to one group
10 8 8
11 3 11
12 4 15
13 7 22 This row and above belongs to one group
14 5 5
15 7 12
16 3 15
17 4 19
18 4 23 This row and above belongs to one group
19 4 4
20 2 6
21 3 9
22 8 17
23 3 20
24 9 29 This row and above belongs to one group
25 2 2
26 8 10
27 1 11
28 8 19
29 6 25 This row and above belongs to one group

29 rows selected.

SQL>
SQL> spool off;

RE: Group into groups whose running sum of VALUE column is smallest value >= 20

Frank Zhou, August 07, 2007 - 2:03 pm UTC

Here is the latest SQL model clause soultion for "Group into groups whose running sum of VALUE column is smallest value >= 20"

http://oraqa.com/2007/08/02/how-to-divide-consecutive-rows-into-groups-based-on-the-value-of-the-smallest-running-total-in-sql/

Frank

RE: Group into groups whose running sum of VALUE column is smallest value >= 20

Frank Zhou, August 08, 2007 - 10:08 am UTC

Formated and fixed the previous URL.

Here is the latest SQL model clause solution:

http://oraqa.com/2007/08/02/how-to-divide-consecutive-rows-into-groups-based-on-the-value-of-the-smallest-running-total-in-sql/

Frank

Grouping response times into discreet buckets

Allen Shatzer, August 29, 2007 - 11:11 am UTC

I know that the answer to this is right in front of my nose, but for the life of me, I can't put my finger on it.

I would like to write a sql statement that would take response times for an application and count the number that occur in each .25 second interval. That is, count the number of requests that are .25 seconds or less, greater than .25 seconds and .50 seconds or less, etc.

The problem I am having is how to group the rows into .25 second ranges without having to build a driving table with ranges for each possible range of .25 second intervals.
Is that the only way that you would approach this or is there some other "trick" that could be used?

Thanks for all of the help that you provide to the Oracle developer community!
Tom Kyte
September 04, 2007 - 4:00 pm UTC

ops$tkyte%ORA10GR2> create table t as select object_id/10000 secs from all_objects;
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select trunc( (secs-0.000001)/.25 ), min(secs), max(secs), count(*)
  2    from t group by
  3   trunc((secs-0.000001)/.25) order by 1;

TRUNC((SECS-0.000001)/.25)  MIN(SECS)  MAX(SECS)   COUNT(*)
-------------------------- ---------- ---------- ----------
                         0      .0002        .25       2410
                         1      .2501         .5       2400
                         2      .5001        .75       1839
                         3      .7501      .9999       2219
                         4     1.0001       1.25       2457
                         5     1.2501        1.5       2500
                         6     1.5001       1.75       2500
                         7     1.7501          2       2500
                         8     2.0001       2.25       2500
                         9     2.2501        2.5       2499
                        10     2.5001       2.75       2500
                        11     2.7501          3       2500
                        12     3.0001       3.25       2500
                        13     3.2501        3.5       2500
                        14     3.5001       3.75       2497
                        15     3.7501          4       2498
                        16     4.0001       4.25       1586
                        17     4.2501        4.5       2464
                        18     4.5001       4.75       2008
                        19     4.7501          5       2264
                        20     5.0001      5.192       1028
                        21     5.3433      5.487         11
                        22     5.5191     5.6669          6
                        23     5.7823     5.7964          9
                        24      6.175      6.249        227
                        25     6.3285     6.4853          9
                        26      6.674     6.6746          2
                        27     6.8163     6.8164          2
                        28     7.1824       7.25        553
                        29     7.2501     7.3822        491
                        50     12.666    12.6844         13
                        51    12.7838    12.7838          1
                        52    13.2189    13.2189          1
                        53    13.4234    13.4235          2
                        55    13.8862    13.9986         22
                        56    14.0064    14.2309        146

36 rows selected.

Why 0.0001

babata, December 30, 2007 - 7:59 am UTC

Good morning Sir ,
I just want to know in the request

ops$tkyte@ORA9IR2> select did, min(cons_time), max(cons_time), sum(cons_fee)
  2    from (
  3  select did,
  4         cons_time,
  5             cons_fee,
  6             trunc(abs( (((cons_time-min(cons_time) over (partition by did))*24*60)-0.0001)/5 )) 
grp
  7    from t
  8         )
  9   group by did, grp
 10  /
 
why -0.0001 ??

thanks,


Reader, June 30, 2009 - 5:30 pm UTC

Tom,
select recid, key_date, price,val
trunc(abs( (((key_date-min(key_date) over (partition by val))*24*60)-0.0001)/5 )) grp
from config_test order by key_date

I need the grp to get aggregated values of price for 5 minute range. can you tell how to get this?

I also need to get the price_prev as shown below

At 9:35:01 (group 1) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:35:00 recid=6;price=90

At 9:40:01 (group 2) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:35:00 recid=6;price=124

Not sure if my explanation is clear. Please let me know if I am missing something or not clear.



RECID KEY_DATE PRICE VAL GRP price_prev
---------- ------------------- ---------- ---------- ---------- ---------
1 05/29/2009 09:30:00 100 A 0
2 05/29/2009 09:30:00 102 A 0
3 05/29/2009 09:30:00 104 A 0
4 05/29/2009 09:30:03 105 A 0
5 05/29/2009 09:30:03 106 A 0
6 05/29/2009 09:35:00 90 A 0
7 05/29/2009 09:35:00 80 A 0
8 05/29/2009 09:35:01 80 A 1 90
9 05/29/2009 09:35:01 80 A 1
10 05/29/2009 09:35:01 80 A 1
11 05/29/2009 09:35:02 120 A 1
12 05/29/2009 09:35:02 121 A 1
13 05/29/2009 09:35:02 123 A 1
14 05/29/2009 09:40:00 124 A 1
15 05/29/2009 09:40:00 125 A 1
16 05/29/2009 09:40:01 150 A 2 124
17 05/29/2009 09:40:01 150 A 2
18 05/29/2009 09:40:01 150 A 2
Tom Kyte
July 06, 2009 - 6:56 pm UTC

... I need the grp to get aggregated values of price for 5 minute range. can you
tell how to get this?
...

you need to define that infinitely more precisely. As written - it is not meaningful to me.

eg: do you simply want to take the date column and round down to the nearest 5 minutes and aggregate

or

do you want to take the "first row in the result and group together every row within 5 minutes of it and add up the values, then take the second row and do the same...."

or

do you want to take the first record and if the second is within 5 minutes, they are in the same group - else the second is the start of a new group. and so on and so on and so on


I can interpret your 'goal' is so so so many different ways.


and if you do not give us an interesting create table and inserts - we will never give you a select statement.

Reader, June 30, 2009 - 5:45 pm UTC

Apologies for the wrong info in the previous question


At 9:40:01 (group 2) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:40:00 recid=14;price=124

Reader, July 01, 2009 - 10:49 am UTC

Tom,
Please ignore the previous explanation.

select recid, key_date, price,val
trunc(abs( (((key_date-min(key_date) over (partition by val))*24*60)-0.0001)/5 )) grp
from config_test order by key_date

I need the grp to get aggregated values of price for 5 minute range.


can you tell how to get the following?
I need to get the price_prev as shown below

At 9:35:01 (group 1) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:35:00 recid=6;price=90

At 9:40:01 (group 2) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:40:00 recid=14;price=124

Not sure if my explanation is clear. Please let me know if I am missing something or not clear.



RECID KEY_DATE PRICE VAL GRP price_prev
---------- ------------------- ---------- ---------- ---------- ---------
1 05/29/2009 09:30:00 100 A 0
2 05/29/2009 09:30:00 102 A 0
3 05/29/2009 09:30:00 104 A 0
4 05/29/2009 09:30:03 105 A 0
5 05/29/2009 09:30:03 106 A 0
6 05/29/2009 09:35:00 90 A 0
7 05/29/2009 09:35:00 80 A 0
8 05/29/2009 09:35:01 80 A 1 90
9 05/29/2009 09:35:01 80 A 1
10 05/29/2009 09:35:01 80 A 1
11 05/29/2009 09:35:02 120 A 1
12 05/29/2009 09:35:02 121 A 1
13 05/29/2009 09:35:02 123 A 1
14 05/29/2009 09:40:00 124 A 1
15 05/29/2009 09:40:00 125 A 1
16 05/29/2009 09:40:01 150 A 2 124
17 05/29/2009 09:40:01 150 A 2
18 05/29/2009 09:40:01 150 A 2




Reader, July 07, 2009 - 12:07 pm UTC

Sorry Tom for not being clear and not sending the create/insert statements

create table tst(recid number, key_dt date, price number, range_1 number, range2 number, val varchar2(10));

insert into tst values(1, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am') ,100,99,101, 'A');
insert into tst values(2, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am') ,102,101,103,'A');
insert into tst values(3, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am') ,104,102,105, 'A');
insert into tst values(4, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am') ,105,104,106, 'A');
insert into tst values(5, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am') ,106,105,107, 'A');
insert into tst values(6, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am') ,90 ,89,91,'A');
insert into tst values(7, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am') ,80 ,79,81,'A');
insert into tst values(8, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am') ,80 ,79,81,'A');
insert into tst values(9, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am') ,80 ,79,81,'A');
insert into tst values(10, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80 ,79,81,'A');
insert into tst values(11, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),120,119,121,'A');
insert into tst values(12, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),121,120,122,'A');
insert into tst values(13, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),123,122,124,'A');
insert into tst values(14, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),124,123,125,'A');
insert into tst values(15, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),125,124,126,'A');
insert into tst values(16, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(17, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(18, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(19, to_date('07/07/2009 09:45:00 am','mm/dd/yyyy hh:mi:ss am'),200,199,201,'A');
insert into tst values(20, to_date('07/07/2009 3:55:00 pm','mm/dd/yyyy hh:mi:ss am'),400,399,401,'A');
insert into tst values(21, to_date('07/07/2009 3:55:01 pm','mm/dd/yyyy hh:mi:ss am'),3,2,4,'A');
insert into tst values(22, to_date('07/07/2009 4:00:00 pm','mm/dd/yyyy hh:mi:ss am'),8,7,9,'A');
commit;

I have to start the grouping at 5 minute interval as shown below. The group start time is 9:35:00
Records from 9:30:00 to 9:35:00 will come under 9:35:00 group
Records from 9:35:01 to 9:40:00 will come under 9:40:00 group and so on until 4:00 pm
Records from 3:55:01 pm to 4:00 pm will come under 4:00:00 pm group

Based on this grouping I need to do aggregation. For example, sum of price at each 5 minute interval group based on val.I also need to get the last value of range_1 and range_2 at each interval ordered by recid.
In the end one record per group (val, key_dt,sum,last_rng_1)
For example

val KEY_DT sum last_rng_1
-- ---------------------- --- ----------
A 07/07/2009 09:35:00 am 687 79
A 07/07/2009 09:40:00 am 853 124
A 07/07/2009 09:45:00 am 650 199
A 07/07/2009 03:55:00 pm 400 399
A 07/07/2009 04:00:00 pm 11 7

Can you advice?
Tom Kyte
July 07, 2009 - 7:02 pm UTC

basically, round to the nearest 5 minutes.

ops$tkyte%ORA10GR2> select dt2, min(dt), max(dt)
  2    from (
  3  select dt,
  4         trunc(dt,'hh') +
  5               (( decode( mod(to_number(to_char(dt,'mi')),5)+to_char(dt,'ss'), 0, 0, 1 ) +
  6                 trunc(to_number(to_char(dt,'mi'))/5) ) * 5) * (1/24/60) dt2
  7    from (select trunc(sysdate)+1/24/60/60*level dt
  8            from dual connect by level <= 24*60*60+10)
  9            )
 10   group by dt2
 11   order by dt2
 12  /



lines 4-6 brute forces a date to that 5 minute window, just use that.

Reader, July 07, 2009 - 12:31 pm UTC

Insert statements are not clear for the above question when I pasted. I am re-sending.

create table tst(recid number, key_dt date, price number, range_1 number, range2 number, val varchar2(10));

insert into tst values(1, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am'),100,99,101,'A');
insert into tst values(2, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am'),102,101,103,'A');
insert into tst values(3, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am'),104,102,105,'A');
insert into tst values(4, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am'),105,104,106,'A');
insert into tst values(5, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am'),106,105,107,'A');
insert into tst values(6, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am'),90,89,91,'A');
insert into tst values(7, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(8, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(9, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(10, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(11, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),120,119,121,'A');
insert into tst values(12, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),121,120,122,'A');
insert into tst values(13, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),123,122,124,'A');
insert into tst values(14, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),124,123,125,'A');
insert into tst values(15, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),125,124,126,'A');
insert into tst values(16, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(17, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(18, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(19, to_date('07/07/2009 09:45:00 am','mm/dd/yyyy hh:mi:ss am'),200,199,201,'A');
insert into tst values(20, to_date('07/07/2009 3:55:00 pm','mm/dd/yyyy hh:mi:ss am'),400,399,401,'A');
insert into tst values(21, to_date('07/07/2009 3:55:01 pm','mm/dd/yyyy hh:mi:ss am'),3,2,4,'A');
insert into tst values(22, to_date('07/07/2009 4:00:00 pm','mm/dd/yyyy hh:mi:ss am'),8,7,9,'A');
commit;

Reader, July 08, 2009 - 11:51 am UTC

Thank you for the query.

In regards to the above solution that you suggested, I wrote the query as below to get the data I want.

Can you suggest if this is the correct way? If not, can you please suggest how it should be done?

select *
from(
select dt_all.dt2,TST.VAL,TST.KEY_DT,price,recid,range_1
,sum(tst.PRICE) over (partition by val,dt2) price_sum
,first_value(range_1) over (partition by val,dt2 order by recid desc) last_val_rang_1
,row_number() over (partition by val,dt2 order by recid desc) rn
from tst ,
(select dt2, min(dt) min_dt, max(dt) max_dt
from(select dt,
trunc(dt,'hh') +
(( decode( mod(to_number(to_char(dt,'mi')),5)+to_char(dt,'ss'), 0, 0, 1 ) +
trunc(to_number(to_char(dt,'mi'))/5) ) * 5) * (1/24/60) dt2
from (select trunc(sysdate)+1/24/60/60*level dt
from dual connect by level <= 24*60*60+10))
where dt2 between to_date('07/08/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am')
and to_date('07/08/2009 04:00:00 pm','mm/dd/yyyy hh:mi:ss am')
group by dt2
order by dt2
) dt_all
where tst.KEY_DT between dt_all.min_dt and dt_all.max_dt
)
where rn = 1

Reader, July 09, 2009 - 11:35 am UTC

Tom,
I was wondering if you could give suggestion for the above question.
Tom Kyte
July 14, 2009 - 3:06 pm UTC

did you analyze YOUR query and determine that it answers your question?


Your output does not match what I think it should be.


For example   

val    KEY_DT            sum    last_rng_1
--    ----------------------    ---    ----------
A    07/07/2009 09:35:00 am    687    79    
A    07/07/2009 09:40:00 am    853    124
A    07/07/2009 09:45:00 am    650    199
A    07/07/2009 03:55:00 pm    400    399



you posted that. But, when I run:

ops$tkyte%ORA10GR2> select recid, key_dt,
  2         trunc(key_dt,'hh') +
  3           (( decode( mod(to_number(to_char(key_dt,'mi')),5)+to_char(key_dt,'ss'), 0, 0, 1 ) +
  4           trunc(to_number(to_char(key_dt,'mi'))/5) ) * 5) * (1/24/60) key_dt2,
  5         price, range_1, range2, val
  6    from tst
  7   order by key_dt
  8  /

     RECID KEY_DT               KEY_DT2                   PRICE    RANGE_1     RANGE2 VAL
---------- -------------------- -------------------- ---------- ---------- ---------- ----------
         1 07-jul-2009 09:30:00 07-jul-2009 09:30:00        100         99        101 A
         2 07-jul-2009 09:30:00 07-jul-2009 09:30:00        102        101        103 A
         3 07-jul-2009 09:30:00 07-jul-2009 09:30:00        104        102        105 A
         4 07-jul-2009 09:30:03 07-jul-2009 09:35:00        105        104        106 A
         5 07-jul-2009 09:30:03 07-jul-2009 09:35:00        106        105        107 A
         6 07-jul-2009 09:35:00 07-jul-2009 09:35:00         90         89         91 A
         7 07-jul-2009 09:35:00 07-jul-2009 09:35:00         80         79         81 A
         8 07-jul-2009 09:35:01 07-jul-2009 09:40:00         80         79         81 A
         9 07-jul-2009 09:35:01 07-jul-2009 09:40:00         80         79         81 A
        10 07-jul-2009 09:35:01 07-jul-2009 09:40:00         80         79         81 A
        11 07-jul-2009 09:35:02 07-jul-2009 09:40:00        120        119        121 A
        13 07-jul-2009 09:35:02 07-jul-2009 09:40:00        123        122        124 A
        12 07-jul-2009 09:35:02 07-jul-2009 09:40:00        121        120        122 A
        15 07-jul-2009 09:40:00 07-jul-2009 09:40:00        125        124        126 A
        14 07-jul-2009 09:40:00 07-jul-2009 09:40:00        124        123        125 A
        17 07-jul-2009 09:40:01 07-jul-2009 09:45:00        150        149        151 A
        16 07-jul-2009 09:40:01 07-jul-2009 09:45:00        150        149        151 A
        18 07-jul-2009 09:40:01 07-jul-2009 09:45:00        150        149        151 A
        19 07-jul-2009 09:45:00 07-jul-2009 09:45:00        200        199        201 A
        20 07-jul-2009 15:55:00 07-jul-2009 15:55:00        400        399        401 A
        21 07-jul-2009 15:55:01 07-jul-2009 16:00:00          3          2          4 A
        22 07-jul-2009 16:00:00 07-jul-2009 16:00:00          8          7          9 A

22 rows selected.



I don't see how you get your sum - my sum is:

ops$tkyte%ORA10GR2> select max(val) val,
  2         key_dt2,
  3             sum(range_1) sumr1 ,
  4             max(range_1) keep (dense_rank first order by key_dt DESC) lastr1
  5    from (
  6  select recid, key_dt,
  7         trunc(key_dt,'hh') +
  8           (( decode( mod(to_number(to_char(key_dt,'mi')),5)+to_char(key_dt,'ss'), 0, 0, 1 ) +
  9           trunc(to_number(to_char(key_dt,'mi'))/5) ) * 5) * (1/24/60) key_dt2,
 10         price, range_1, range2, val
 11    from tst
 12         )
 13   group by key_dt2
 14   order by key_dt2
 15  /

VAL        KEY_DT2                   SUMR1     LASTR1
---------- -------------------- ---------- ----------
A          07-jul-2009 09:30:00        302        102
A          07-jul-2009 09:35:00        377         89
A          07-jul-2009 09:40:00        845        124
A          07-jul-2009 09:45:00        646        199
A          07-jul-2009 15:55:00        399        399
A          07-jul-2009 16:00:00          9          7

6 rows selected.


and why is 79 the "last record", it arrived at the same time as 89. 89 is the last record I think (why do I think that? because I do, they arrived at the same time, they are both last)

Rohit, October 06, 2009 - 8:14 am UTC

Hi Tom,
This example is very useful at many places in our systems. I was wondering how would break the group to be able to report on occourances every minutes within first instance.
Here is the sample data set and expected output

drop table t
/
create table t ( x date, y int,id int )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:28:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:29:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:30:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:30:29 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:27:35 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 12, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:33:34 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 21, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:15 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 12:59:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 12:59:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:05 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:49 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:55 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
commit
/
I am expecting output like this

id x sum(y) grp
228 06-Oct-08 03.33.34.00 PM 41 0
228 06-Oct-08 03.34.40.00 PM 20 1
228 06-Oct-08 04.27.35.00 PM 22 2
228 06-Oct-08 04.29.30.00 PM 20 3
228 06-Oct-08 04.30.30.00 PM 10 4
229 07-Oct-08 12.59.40.00 PM 40 5
229 07-Oct-08 01.00.49.00 PM 20 6

Thank you
Rohit
Tom Kyte
October 08, 2009 - 7:09 am UTC

... I was wondering how
would break the group to be able to report on occourances every minutes within
first instance.
...

explain in LARGE, LONG detail what "within first instance" means?

and I see id, but it doesn't see we are to partition by ID? explain the output - IN DETAIL - in painstaking detail - take at least 5, if not 10 minutes to do this. give us a specification.

rohit, October 06, 2009 - 11:58 am UTC

Data in the table looks like this:
X Y ID
06-Oct-08 03.33.34.00 PM 21 228<=g1
06-Oct-08 03.34.15.00 PM 10 228<=g1
06-Oct-08 03.34.30.00 PM 10 228<=g1
06-Oct-08 03.34.40.00 PM 10 228<=g2
06-Oct-08 03.34.50.00 PM 10 228<=g3
06-Oct-08 04.27.35.00 PM 12 228<=g4
06-Oct-08 04.28.30.00 PM 10 228<=g5
06-Oct-08 04.29.30.00 PM 10 228<=g6
06-Oct-08 04.30.29.00 PM 10 228<=g6
06-Oct-08 04.30.30.00 PM 10 228<=g7
07-Oct-08 12.59.40.00 PM 10 229<=g1
07-Oct-08 12.59.50.00 PM 10 229<=g2
07-Oct-08 01.00.05.00 PM 10 229<=g2
07-Oct-08 01.00.30.00 PM 10 229<=g2
07-Oct-08 01.00.49.00 PM 10 229<=g3
07-Oct-08 01.00.55.00 PM 10 229<=g4


Tom Kyte
October 08, 2009 - 7:13 am UTC

that doesn't mean a thing to me.

rohit, October 11, 2009 - 7:49 am UTC

Sorry, I was not clear enough. We are getting duplicate data from the source. Requirement is to process and get only first record with in first 60 seconds.

For example these source records
X Y ID
06-Oct-08 03.33.34.00 PM 21 228
06-Oct-08 03.34.15.00 PM 10 228
06-Oct-08 03.34.30.00 PM 10 228

Output should be ignoring all following records within 60 seconds of 06-Oct-08 03.33.34.00 PM

X Y ID
06-Oct-08 03.33.34.00 PM 21 228

Same way for below records
06-Oct-08 03.34.40.00 PM 10 228
06-Oct-08 03.34.50.00 PM 10 228

I should get
06-Oct-08 03.34.40.00 PM 10 228

Problem I am facing is to determine next group of records. I could do it with PL/SQL procedure but I have to process at least 5M records every day and I believe sql would be much faster. I hope, I make my self more clear.

Thanks
rohit

Try this

Steve G, October 12, 2009 - 1:04 pm UTC

with e as (
select
x
, y
, id
, row_number() over (partition by id order by x) as rn
from t
)
select
to_char(x, 'MM/DD/YYYY HH12:MI:SS AM')
, y
, id
from e
where rn = 1

@Rohit re: group records within a minute

Stew Ashton, October 17, 2009 - 7:50 am UTC


Rohit,

I take you to mean:
- partition by id, order by x
- the first "group" starts with the first record and includes all records less than a minute later than the first record
- the next "group" starts with the next record after the previous group and includes all records less than a minute later than that next record
- and so on.

My approach is:
- assign row numbers
- assume each record is the first record of a "group" and get the row number of the last record in that "group". This gives me the good "groups", plus a lot of overlapping groups.
- use START WITH...CONNECT BY to select out the good "groups".
select id, x, last_x, rn, last_rn, y, sum_y from (
  select id, x, y, rn
  , sum(y) over(partition by id order by x 
    range between current row and numtodsinterval(59, 'second') following) sum_y
  , last_value(x) over(partition by id order by x 
    range between current row and numtodsinterval(59, 'second') following) last_x
  , last_value(rn) over(partition by id order by x 
    range between current row and numtodsinterval(59, 'second') following) last_rn
  from (
    select id, x, y, row_number() over(partition by id order by x) rn from t
  )
)
start with rn = 1
connect by (id, rn) = ((prior id, prior last_rn+1))
/
ID   X                    LAST_X               RN  LAST_RN  Y   SUM_Y
---- -------------------- -------------------- --- -------- --- -----
228  06-10-2008 15:33:34  06-10-2008 15:34:30  1   3        21  41
228  06-10-2008 15:34:40  06-10-2008 15:34:50  4   5        10  20
228  06-10-2008 16:27:35  06-10-2008 16:28:30  6   7        12  22
228  06-10-2008 16:29:30  06-10-2008 16:30:29  8   9        10  20
228  06-10-2008 16:30:30  06-10-2008 16:30:30  10  10       10  10
229  07-10-2008 12:59:40  07-10-2008 13:00:30  1   4        10  40
229  07-10-2008 13:00:49  07-10-2008 13:00:55  5   6        10  20

7 rows selected

Rohit, October 21, 2009 - 11:06 am UTC

Hi Stew,
Thanks for your response. I don't know what I did wrong. I am getting different results. Getting only one record for 10/06/09 for id 228 as suppose to 4 more records. Could you check the results.

Thanks,
Rohit

ID X LAST_X RN LAST_RN Y SUM_Y
228 10/6/2008 3:33:34.000 PM 10/6/2008 3:34:30.000 PM 1 3 21 41
229 10/7/2008 12:59:40.000 PM 10/7/2008 1:00:30.000 PM 1 4 10 40
229 10/7/2008 1:00:49.000 PM 10/7/2008 1:00:55.000 PM 5 6 10 20

@rohit re: problem with my approach

Stew Ashton, October 21, 2009 - 3:12 pm UTC


Hi again rohit,

The last line of my code is
connect by (id, rn) = ((prior id, prior last_rn+1))
not
connect by (id, rn) = ((prior id+1, prior last_rn+2))
If that is not the problem, try running what follows one select at a time to see where your results differ from mine. Please don't type this in yourself; use copy and paste.
select count(*) from t
/
COUNT(*)               
---------------------- 
16                     

1 rows selected
/
select id, x, y, row_number() over(partition by id order by x) rn from t
order by id, x
/
ID  X                         Y                      RN                     
--- ------------------------- ---------------------- ------
228 06-10-2008 15:33:34       21                     1      
228 06-10-2008 15:34:15       10                     2     
228 06-10-2008 15:34:30       10                     3   
228 06-10-2008 15:34:40       10                     4  
228 06-10-2008 15:34:50       10                     5       
228 06-10-2008 16:27:35       12                     6      
228 06-10-2008 16:28:30       10                     7   
228 06-10-2008 16:29:30       10                     8    
228 06-10-2008 16:30:29       10                     9   
228 06-10-2008 16:30:30       10                     10    
229 07-10-2008 12:59:40       10                     1          
229 07-10-2008 12:59:50       10                     2                      
229 07-10-2008 13:00:05       10                     3                      
229 07-10-2008 13:00:30       10                     4                      
229 07-10-2008 13:00:49       10                     5                      
229 07-10-2008 13:00:55       10                     6                      

16 rows selected
/
select id, x, y, rn
, last_value(rn) over(partition by id order by x 
  range between current row and numtodsinterval(59, 'second') following) last_rn
from (
  select id, x, y, row_number() over(partition by id order by x) rn from t
)
order by id, x
/
ID   X                         Y                      RN                     LAST_RN 
---- ------------------------- ---------------------- ---------------------- --------
228  06-10-2008 15:33:34       21                     1                      3       
228  06-10-2008 15:34:15       10                     2                      5        
228  06-10-2008 15:34:30       10                     3                      5        
228  06-10-2008 15:34:40       10                     4                      5      
228  06-10-2008 15:34:50       10                     5                      5    
228  06-10-2008 16:27:35       12                     6                      7      
228  06-10-2008 16:28:30       10                     7                      7      
228  06-10-2008 16:29:30       10                     8                      9   
228  06-10-2008 16:30:29       10                     9                      10   
228  06-10-2008 16:30:30       10                     10                     10   
229  07-10-2008 12:59:40       10                     1                      4  
229  07-10-2008 12:59:50       10                     2                      5 
229  07-10-2008 13:00:05       10                     3                      6 
229  07-10-2008 13:00:30       10                     4                      6   
229  07-10-2008 13:00:49       10                     5                      6    
229  07-10-2008 13:00:55       10                     6                      6 

16 rows selected
/
select id, x, rn, last_rn, y from (
  select id, x, y, rn
  , last_value(rn) over(partition by id order by x 
    range between current row and numtodsinterval(59, 'second') following) last_rn
  from (
    select id, x, y, row_number() over(partition by id order by x) rn from t
  )
)
start with rn = 1
connect by (id, rn) = ((prior id, prior last_rn+1))
/
ID  X                         RN                     LAST_RN                Y 
--- ------------------------- ---------------------- ---------------------- ---
228 06-10-2008 15:33:34       1                      3                      21 
228 06-10-2008 15:34:40       4                      5                      10    
228 06-10-2008 16:27:35       6                      7                      12 
228 06-10-2008 16:29:30       8                      9                      10   
228 06-10-2008 16:30:30       10                     10                     10  
229 07-10-2008 12:59:40       1                      4                      10
229 07-10-2008 13:00:49       5                      6                      10

7 rows selected

Rohit, October 23, 2009 - 10:49 am UTC

Hi Stew,
Thanks for your response again. I had tried again but still did not work I got the same output. Then I tried on Oracle version 9.2.0.8 and it worked. Its hard to believe but it seems that Oracle 11.1.0.7 have bug while using start with connect by.
I am checking with Oracle support if they could suggest a solution for this bug.

Thanks so much and really appreciate your help.

Regards,
Rohit

Group by interval using alternate approach

Narendra Prabhudesai, July 30, 2010 - 6:04 am UTC

Hello Tom,

I am trying to get first record within an interval of 1 minute for an ID from following resultset
X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228
06-OCT-2008 15:34:15         10        228
06-OCT-2008 15:34:30         10        228
06-OCT-2008 15:34:40         10        228
06-OCT-2008 15:34:50         10        228
06-OCT-2008 16:27:35         12        228
06-OCT-2008 16:28:30         10        228
06-OCT-2008 16:29:30         10        228
06-OCT-2008 16:30:29         10        228
06-OCT-2008 16:30:30         10        228
07-OCT-2008 12:59:40         10        229
07-OCT-2008 12:59:50         10        229
07-OCT-2008 13:00:05         10        229
07-OCT-2008 13:00:30         10        229
07-OCT-2008 13:00:49         10        229
07-OCT-2008 13:00:55         10        229

For e.g. the following set of records will be considered as a group
X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228
06-OCT-2008 15:34:15         10        228
06-OCT-2008 15:34:30         10        228

and should generate the output as
X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228

The next set of records to be considered as a group will be
X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:34:40         10        228
06-OCT-2008 15:34:50         10        228

and should generate the output as
X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:34:40         10        228

and so on.
I am able to get the desired results using the approach suggested by Stew Ashton above
SQL> select id, x, rn, last_rn, y from
  2  (select id, x, y, rn,
  3  last_value(rn) over (partition by id order by x range between current row and numtodsinterval(59, 'second') following) last_rn from
  4  (select id, x, y, row_number() over (partition by id order by x) rn from t1))
  5  start with rn = 1
  6  connect by (id, rn) = ((prior id, prior last_rn+1))
  7  /

        ID X                            RN    LAST_RN          Y
---------- -------------------- ---------- ---------- ----------
       228 06-OCT-2008 15:33:34          1          3         21
       228 06-OCT-2008 15:34:40          4          5         10
       228 06-OCT-2008 16:27:35          6          7         12
       228 06-OCT-2008 16:29:30          8          9         10
       228 06-OCT-2008 16:30:30         10         10         10
       229 07-OCT-2008 12:59:40          1          4         10
       229 07-OCT-2008 13:00:49          5          6         10

7 rows selected.

However, I was wondering if it would be possible to use the approach that you suggested at http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:13946369553642#14611583328217
and I got almost identical results except for one difference. While Stew's query gave
X              
--------------------
06-OCT-2008 16:29:30
06-OCT-2008 16:30:30

the other approach gave
X                   
--------------------
06-OCT-2008 16:29:30
06-OCT-2008 16:30:29

Can you help?
Following are the complete test case details:
SQL> select * from v$version ;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Solaris: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> create table t1 (x date, y int, id int) ;

Table created.

SQL> insert into t1 values (to_date('10/06/2008 03:34:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 03:34:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 03:34:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 04:28:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 04:29:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 04:30:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 04:30:29 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 04:27:35 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> update t1 set y = 12 where x = to_date('10/06/2008 04:27:35 PM', 'MM/DD/YYYY HH12:MI:SS AM') and y = 10 ;

1 row updated.

SQL> insert into t1 values (to_date('10/06/2008 03:33:34 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 21, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/06/2008 03:34:15 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/07/2008 12:59:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/07/2008 12:59:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/07/2008 01:00:05 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/07/2008 01:00:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/07/2008 01:00:49 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
  2  /

1 row created.

SQL> insert into t1 values (to_date('10/07/2008 01:00:55 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
  2  /

1 row created.

SQL> commit ;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'T1') ;

PL/SQL procedure successfully completed.

SQL> select count(*) from t1 ;

  COUNT(*)
----------
        16

SQL> select id, x, y, row_number() over (partition by id order by x) rn from t1 order by id, x
  2  /

        ID X                             Y         RN
---------- -------------------- ---------- ----------
       228 06-OCT-2008 15:33:34         21          1
       228 06-OCT-2008 15:34:15         10          2
       228 06-OCT-2008 15:34:30         10          3
       228 06-OCT-2008 15:34:40         10          4
       228 06-OCT-2008 15:34:50         10          5
       228 06-OCT-2008 16:27:35         12          6
       228 06-OCT-2008 16:28:30         10          7
       228 06-OCT-2008 16:29:30         10          8
       228 06-OCT-2008 16:30:29         10          9
       228 06-OCT-2008 16:30:30         10         10
       229 07-OCT-2008 12:59:40         10          1
       229 07-OCT-2008 12:59:50         10          2
       229 07-OCT-2008 13:00:05         10          3
       229 07-OCT-2008 13:00:30         10          4
       229 07-OCT-2008 13:00:49         10          5
       229 07-OCT-2008 13:00:55         10          6

16 rows selected.

SQL> select id, x, y, rn, last_value(rn) over (partition by id order by x range between current row and numtodsinterval(59, 'second') following) last_rn from (select id, x, y, row_number() over (partition by id order by x) rn from t1) order by id, x
  2  /

        ID X                             Y         RN    LAST_RN
---------- -------------------- ---------- ---------- ----------
       228 06-OCT-2008 15:33:34         21          1          3
       228 06-OCT-2008 15:34:15         10          2          5
       228 06-OCT-2008 15:34:30         10          3          5
       228 06-OCT-2008 15:34:40         10          4          5
       228 06-OCT-2008 15:34:50         10          5          5
       228 06-OCT-2008 16:27:35         12          6          7
       228 06-OCT-2008 16:28:30         10          7          7
       228 06-OCT-2008 16:29:30         10          8          9
       228 06-OCT-2008 16:30:29         10          9         10
       228 06-OCT-2008 16:30:30         10         10         10
       229 07-OCT-2008 12:59:40         10          1          4
       229 07-OCT-2008 12:59:50         10          2          5
       229 07-OCT-2008 13:00:05         10          3          6
       229 07-OCT-2008 13:00:30         10          4          6
       229 07-OCT-2008 13:00:49         10          5          6
       229 07-OCT-2008 13:00:55         10          6          6

16 rows selected.

SQL> select id, x, rn, last_rn, y from
  2  (select id, x, y, rn,
  3  last_value(rn) over (partition by id order by x range between current row and numtodsinterval(59, 'second') following) last_rn from
  4  (select id, x, y, row_number() over (partition by id order by x) rn from t1))
  5  start with rn = 1
  6  connect by (id, rn) = ((prior id, prior last_rn+1))
  7  /

        ID X                            RN    LAST_RN          Y
---------- -------------------- ---------- ---------- ----------
       228 06-OCT-2008 15:33:34          1          3         21
       228 06-OCT-2008 15:34:40          4          5         10
       228 06-OCT-2008 16:27:35          6          7         12
       228 06-OCT-2008 16:29:30          8          9         10
       228 06-OCT-2008 16:30:30         10         10         10
       229 07-OCT-2008 12:59:40          1          4         10
       229 07-OCT-2008 13:00:49          5          6         10

7 rows selected.

SQL> select /*+ gather_plan_statistics */ id, x, rn, last_rn, y from
  2   (select id, x, y, rn,
  3    last_value(rn) over (partition by id order by x range between current row and numtodsinterval
(59, 'second') following) last_rn from
  4     (select id, x, y, row_number() over (partition by id order by x) rn from t1))
  5  start with rn = 1
  6  connect by (id, rn) = ((prior id, prior last_rn+1))
  7  /

        ID X                            RN    LAST_RN          Y
---------- -------------------- ---------- ---------- ----------
       228 06-OCT-2008 15:33:34          1          3         21
       228 06-OCT-2008 15:34:40          4          5         10
       228 06-OCT-2008 16:27:35          6          7         12
       228 06-OCT-2008 16:29:30          8          9         10
       228 06-OCT-2008 16:30:30         10         10         10
       229 07-OCT-2008 12:59:40          1          4         10
       229 07-OCT-2008 13:00:49          5          6         10

7 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST')) ;

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  6mu4cggkf86g4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ id, x, rn, last_rn, y from  (select id, x, y, rn,   last_value(r
(partition by id order by x range between current row and numtodsinterval(59, 'second') following) l
from    (select id, x, y, row_number() over (partition by id order by x) rn from t1)) start with rn = 1
connect by (id, rn) = ((prior id, prior last_rn+1))

Plan hash value: 936035874

----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Me
----------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |      7 |00:00:00.01 |      42 |  2048 |  2048 | 2048  (0)|
|*  2 |   VIEW                    |      |      1 |     16 |      2 |00:00:00.01 |       7 |       |       |          |
|   3 |    WINDOW SORT            |      |      1 |     16 |     16 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW                  |      |      1 |     16 |     16 |00:00:00.01 |       7 |       |       |          |
|   5 |      WINDOW SORT          |      |      1 |     16 |     16 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   6 |       TABLE ACCESS FULL   | T1   |      1 |     16 |     16 |00:00:00.01 |       7 |       |       |          |
|*  7 |   FILTER                  |      |      5 |        |      5 |00:00:00.01 |      35 |       |       |          |
|*  8 |    HASH JOIN              |      |      5 |        |     62 |00:00:00.01 |      35 |  1269K|  1269K|  335K (0)|
|   9 |     CONNECT BY PUMP       |      |      5 |        |      7 |00:00:00.01 |       0 |       |       |          |
|  10 |     VIEW                  |      |      5 |     16 |     80 |00:00:00.01 |      35 |       |       |          |
|  11 |      WINDOW SORT          |      |      5 |     16 |     80 |00:00:00.01 |      35 |  2048 |  2048 | 2048  (0)|
|  12 |       VIEW                |      |      5 |     16 |     80 |00:00:00.01 |      35 |       |       |          |
|  13 |        WINDOW SORT        |      |      5 |     16 |     80 |00:00:00.01 |      35 |  2048 |  2048 | 2048  (0)|
|  14 |         TABLE ACCESS FULL | T1   |      5 |     16 |     80 |00:00:00.01 |      35 |       |       |          |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=PRIOR NULL)
   2 - filter("RN"=1)
   7 - filter("RN"=PRIOR NULL+1)
   8 - access("ID"=PRIOR NULL)


36 rows selected.

SQL> select x,y,id,trunc(abs(((x-min(x) over (partition by id))*24*60) - 0.0001)) grp from t1 order
by id,x;

X                             Y         ID        GRP
-------------------- ---------- ---------- ----------
06-OCT-2008 15:33:34         21        228          0
06-OCT-2008 15:34:15         10        228          0
06-OCT-2008 15:34:30         10        228          0
06-OCT-2008 15:34:40         10        228          1
06-OCT-2008 15:34:50         10        228          1
06-OCT-2008 16:27:35         12        228         54
06-OCT-2008 16:28:30         10        228         54
06-OCT-2008 16:29:30         10        228         55
06-OCT-2008 16:30:29         10        228         56
06-OCT-2008 16:30:30         10        228         56
07-OCT-2008 12:59:40         10        229          0
07-OCT-2008 12:59:50         10        229          0
07-OCT-2008 13:00:05         10        229          0
07-OCT-2008 13:00:30         10        229          0
07-OCT-2008 13:00:49         10        229          1
07-OCT-2008 13:00:55         10        229          1

16 rows selected.

SQL> select /*+ gather_plan_statistics */ min(x), min(y) keep (dense_rank first order by x) my from 
(select x,y,id, trunc(abs((((x-min(x) over (partition by id))*24*60) - 0.0001))) grp from t1) group 
by id,grp ;

MIN(X)                       MY
-------------------- ----------
06-OCT-2008 15:33:34         21
06-OCT-2008 15:34:40         10
06-OCT-2008 16:27:35         12
06-OCT-2008 16:29:30         10
06-OCT-2008 16:30:29         10
07-OCT-2008 12:59:40         10
07-OCT-2008 13:00:49         10

7 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST')) ;

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  0htwzwfguzh0r, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ min(x), min(y) keep (dense_rank first order by x) my from (sele
x,y,id, trunc(abs((((x-min(x) over (partition by id))*24*60) - 0.0001))) grp from t1) group by id,gr

Plan hash value: 3529221598

----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Us
----------------------------------------------------------------------------------------------------
|   1 |  SORT GROUP BY       |      |      1 |      2 |      7 |00:00:00.01 |       7 |  2048 |  2048 | 204
|   2 |   VIEW               |      |      1 |     16 |     16 |00:00:00.01 |       7 |       |       |          |
|   3 |    WINDOW SORT       |      |      1 |     16 |     16 |00:00:00.01 |       7 |  2048 |  2048 | 204
|   4 |     TABLE ACCESS FULL| T1   |      1 |     16 |     16 |00:00:00.01 |       7 |       |       |         
----------------------------------------------------------------------------------------------------


16 rows selected.

Tom Kyte
August 02, 2010 - 8:34 am UTC

I am trying to get first record within an interval of 1 minute for an ID from following resultset

X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228
06-OCT-2008 15:34:15         10        228
06-OCT-2008 15:34:30         10        228
06-OCT-2008 15:34:40         10        228
06-OCT-2008 15:34:50         10        228
06-OCT-2008 16:27:35         12        228
06-OCT-2008 16:28:30         10        228
06-OCT-2008 16:29:30         10        228
06-OCT-2008 16:30:29         10        228
06-OCT-2008 16:30:30         10        228
07-OCT-2008 12:59:40         10        229
07-OCT-2008 12:59:50         10        229
07-OCT-2008 13:00:05         10        229
07-OCT-2008 13:00:30         10        229
07-OCT-2008 13:00:49         10        229
07-OCT-2008 13:00:55         10        229


For e.g. the following set of records will be considered as a group

X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228
06-OCT-2008 15:34:15         10        228
06-OCT-2008 15:34:30         10        228


and should generate the output as

X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228



I stopped reading shortly after this - because - I never saw any logic or reasoning here.

why are those three records "the group"
why is that one record "the answer"

Re:Group by interval using alternate approach

Narendra, August 02, 2010 - 9:03 am UTC

Tom,

Allow me to explain again.
X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228
06-OCT-2008 15:34:15         10        228
06-OCT-2008 15:34:30         10        228
06-OCT-2008 15:34:40         10        228
06-OCT-2008 15:34:50         10        228
06-OCT-2008 16:27:35         12        228
06-OCT-2008 16:28:30         10        228
06-OCT-2008 16:29:30         10        228
06-OCT-2008 16:30:29         10        228
06-OCT-2008 16:30:30         10        228
07-OCT-2008 12:59:40         10        229
07-OCT-2008 12:59:50         10        229
07-OCT-2008 13:00:05         10        229
07-OCT-2008 13:00:30         10        229
07-OCT-2008 13:00:49         10        229
07-OCT-2008 13:00:55         10        229

In above resultset, the following records belong to same group because
06-OCT-2008 15:33:34         21        228
06-OCT-2008 15:34:15         10        228
06-OCT-2008 15:34:30         10        228

the dates "06-OCT-2008 15:34:15" and "06-OCT-2008 15:34:30" fall within one minute interval from "06-OCT-2008 15:33:34".
The next date in the resultset, "06-OCT-2008 15:34:40" is out of one minute interval from "06-OCT-2008 15:33:34" and so will start new group and act as a baseline for next group member(s), which will be "06-OCT-2008 15:34:50" since it falls within one minute from "06-OCT-2008 15:34:40" and so on.
Hope this helps.
Tom Kyte
August 02, 2010 - 10:00 am UTC

and the one row you get as output, what was that.

and should generate the output as

X                             Y         ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34         21        228

Re:Group by interval using alternate approach

Narendra, August 02, 2010 - 9:09 am UTC

I thought your following threads answer similar questions
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4222062043865
and
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2285405972624#30168486048752

The only difference between test data in those threads and my test data is that mine contains seconds as well and I am not able to use same technique (which I think is possible)

Re:Group by interval using alternate approach

Narendra, August 02, 2010 - 10:08 am UTC

and the one row you get as output, what was that.

and should generate the output as

X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228

From each group, the row(s) where x = min(x) within the "group", should be included in the output.

Tom Kyte
August 02, 2010 - 10:44 am UTC

However, I was wondering if it would be possible to use the approach that you suggested at

Now that I see what you are after - the answer would be "no, I don't think that other approach would work for you"

the other approach works with discrete ranges that have fixed start/stop times (the division of a date works that way, I'm just rounding a date value down to some start point to figure out what group it belongs in).

You want to dynamically pick the start range for the times from group to group.

Yar, June 01, 2011 - 3:12 am UTC

CREATE TABLE GRTAB
(
 ID NUMBER
 INS_DATE DATE
)
/


INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-25 23:13:32','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 02:14:19','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 04:15:30','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 05:14:31','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 07:15:19','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 10:15:50','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 13:44:46','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 15:14:54','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 16:15:01','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 17:14:38','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 19:15:36','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(2,TO_DATE('2011-05-30 11:30:17','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(2,TO_DATE('2011-05-30 14:30:22','YYYY-MM-DD HH24:MI:SS'));



I'd like to group INS_DATE every 6 hours from the first INS_DATE of every ID. And on each group, set the BASE_DATE on the first ins_date per group, like:

ID INS_DATE            BASE_DATE 
-- ------------------- ------------------- 
 1 2011-05-25 23:13:32 2011-05-25 23:13:32  
 1 2011-05-26 02:14:19 2011-05-25 23:13:32  
 1 2011-05-26 04:15:30 2011-05-25 23:13:32  
 1 2011-05-26 05:14:31 2011-05-26 05:14:31  
 1 2011-05-26 07:15:19 2011-05-26 05:14:31  
 1 2011-05-26 10:15:50 2011-05-26 05:14:31  
 1 2011-05-26 13:44:46 2011-05-26 13:44:46  
 1 2011-05-26 15:14:54 2011-05-26 13:44:46  
 1 2011-05-26 16:15:01 2011-05-26 13:44:46  
 1 2011-05-26 17:14:38 2011-05-26 13:44:46  
 1 2011-05-26 19:15:36 2011-05-26 13:44:46  
 2 2011-05-30 11:30:17 2011-05-30 11:30:17  
 2 2011-05-30 14:30:22 2011-05-30 11:30:17  


been struggling the whole time doing this, any help is appreciated
Tom Kyte
June 01, 2011 - 2:28 pm UTC

ops$tkyte%ORA11GR2> select id, ins_date, fv, hours, grp, first_value( ins_date ) over (partition by id, grp order by ins_date) new_fv
  2    from (
  3  select id, ins_date, fv, hours, trunc(hours/6) grp
  4    from (
  5  select id, ins_date, fv, trunc((ins_date-fv)*24) hours
  6    from (
  7  select id, ins_date, first_value(ins_date) over (partition by id order by ins_date) fv
  8    from grtab
  9         )
 10         )
 11         )
 12  /

        ID INS_DATE             FV                        HOURS        GRP NEW_FV
---------- -------------------- -------------------- ---------- ---------- --------------------
         1 25-may-2011 23:13:32 25-may-2011 23:13:32          0          0 25-may-2011 23:13:32
         1 26-may-2011 02:14:19 25-may-2011 23:13:32          3          0 25-may-2011 23:13:32
         1 26-may-2011 04:15:30 25-may-2011 23:13:32          5          0 25-may-2011 23:13:32
         1 26-may-2011 05:14:31 25-may-2011 23:13:32          6          1 26-may-2011 05:14:31
         1 26-may-2011 07:15:19 25-may-2011 23:13:32          8          1 26-may-2011 05:14:31
         1 26-may-2011 10:15:50 25-may-2011 23:13:32         11          1 26-may-2011 05:14:31
         1 26-may-2011 13:44:46 25-may-2011 23:13:32         14          2 26-may-2011 13:44:46
         1 26-may-2011 15:14:54 25-may-2011 23:13:32         16          2 26-may-2011 13:44:46
         1 26-may-2011 16:15:01 25-may-2011 23:13:32         17          2 26-may-2011 13:44:46
         1 26-may-2011 17:14:38 25-may-2011 23:13:32         18          3 26-may-2011 17:14:38
         1 26-may-2011 19:15:36 25-may-2011 23:13:32         20          3 26-may-2011 17:14:38
         2 30-may-2011 11:30:17 30-may-2011 11:30:17          0          0 30-may-2011 11:30:17
         2 30-may-2011 14:30:22 30-may-2011 11:30:17          3          0 30-may-2011 11:30:17

13 rows selected.



I don't agree with your sample output - you have things that are 18 and 20 hours off in the same group as something 17 hours off

A reader, June 01, 2011 - 8:20 pm UTC

Hi Tom,

Thank you very much for the feedback.

Allow me to rephrase:

We need to get the reference date starting from the first row's date value of each group (by ID). Succeeding rows within the 6 hour time will get the same base date. Once a row whose value is on the 6th hour after that base date, the base date is changed and is set to the current row's date value:

ID INS_DATE            BASE_DATE 
-- ------------------- ------------------- 
 1 2011-05-25 23:13:32 2011-05-25 23:13:32 <-1st row,gets date
 1 2011-05-26 02:14:19 2011-05-25 23:13:32 <- less than 6hr
 1 2011-05-26 04:15:30 2011-05-25 23:13:32 <- less than 6hr
 1 2011-05-26 05:14:31 2011-05-26 05:14:31 <- INS_DATE is >= preceding base_date; base_date is re-set to current row's INS_DATE
 1 2011-05-26 07:15:19 2011-05-26 05:14:31 <- less than 6hr
 1 2011-05-26 10:15:50 2011-05-26 05:14:31 <- less than 6hr
 1 2011-05-26 13:44:46 2011-05-26 13:44:46 <- INS_DATE is >= preceding base_date; base_date is re-set to current row's INS_DATE
 1 2011-05-26 15:14:54 2011-05-26 13:44:46 <- less than 6hr
 1 2011-05-26 16:15:01 2011-05-26 13:44:46 <- less than 6hr
 1 2011-05-26 17:14:38 2011-05-26 13:44:46 <- less than 6hr 
 1 2011-05-26 19:15:36 2011-05-26 13:44:46 <- less than 6hr 
 2 2011-05-30 11:30:17 2011-05-30 11:30:17 <- First record of the new ID, takes INS_DATE as BASE_DATE
 2 2011-05-30 14:30:22 2011-05-30 11:30:17 <- less than 6hr 

Tom Kyte
June 02, 2011 - 8:49 am UTC

I don't see offhand a way to do that with analytics.

@Yar re: GRTAB

Stew Ashton, June 03, 2011 - 7:36 am UTC

You can do this with a variant of the technique I proposed above to Rohit. It uses analytics, hierarchical query and a self join. Beware of the bug he ran into on V11.1.
with Q1 as (
  SELECT A.*
  ,LAST_VALUE(RN) OVER(PARTITION BY ID ORDER BY INS_DATE 
    RANGE BETWEEN CURRENT ROW AND NUMTODSINTERVAL(6, 'hour') FOLLOWING) LAST_RN
  FROM (
    SELECT ID, INS_DATE, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY INS_DATE) RN 
    FROM GRTAB
  ) A
), q2 as (
  select * from  q1
  START WITH RN = 1
  CONNECT BY (ID, RN) = ((PRIOR ID, PRIOR LAST_RN+1))
)
SELECT q1.id, q1.ins_date, q2.ins_date base_date FROM Q1, Q2
WHERE Q1.ID = Q2.ID AND Q1.RN BETWEEN Q2.RN AND Q2.LAST_RN
ORDER BY Q1.ID, Q1.INS_DATE;

Induction Solutions

Brendan, June 04, 2011 - 7:09 am UTC

Interesting question and nice solution from Stew. Before reading Stew's solution, I had got as far as seeing that the problem could be viewed as a simple induction. If c, b are the current date, base date, and p is the prior base date then:

b = c - p > 0.25 ? c : p

The problem then is that the obvious function to use to get the prior value, Lag, won't work owing to the recursion. On seeing Stew's use of Row_Number and recursion by tree-walk, I realised an induction solution could be implemented by similar techniques, although I substituted recurive subquery factoring for the tree-walk. I then thought that this might be a good case for the Model clause. Here are the two alternative solutions:

Recursive Subquery Factor
WITH ind AS (
SELECT id, ins_date, Row_Number() OVER (PARTITION BY id ORDER BY ins_date) rn
  FROM grtab
), rsq (id, rn, ins_date, base_date) AS (
SELECT id, rn, ins_date, ins_date   base_date
  FROM ind
 WHERE rn         = 1
 UNION ALL
SELECT  ind.id,
        ind.rn,
        ind.ins_date, 
        CASE WHEN ind.ins_date - rsq.base_date > 0.25 THEN ind.ins_date ELSE rsq.base_date END 
  FROM ind
  JOIN rsq
    ON rsq.rn     = ind.rn - 1
   AND rsq.id     = ind.id
)
SELECT rsq.id, rsq.ins_date, rsq.base_date
  FROM rsq
ORDER BY rsq.id, rsq.rn
/

Model
SELECT id, ins_date, base_date
  FROM grtab
 MODEL
    PARTITION BY (id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY id ORDER BY ins_date) rn)
    MEASURES (ins_date, ins_date base_date)
    RULES (
       base_date[rn = 1] = ins_date[cv()],
       base_date[rn > 1] = CASE WHEN ins_date[cv()] - base_date[cv()-1] > 0.25 THEN ins_date[cv()] ELSE base_date[cv()-1] END
    )
ORDER BY 1, 2
/

        ID INS_DATE            BASE_DATE
---------- ------------------- -------------------
         1 2011-05-25 23:13:32 2011-05-25 23:13:32
           2011-05-26 02:14:19 2011-05-25 23:13:32
           2011-05-26 04:15:30 2011-05-25 23:13:32
           2011-05-26 05:14:31 2011-05-26 05:14:31
           2011-05-26 07:15:19 2011-05-26 05:14:31
           2011-05-26 10:15:50 2011-05-26 05:14:31
           2011-05-26 13:44:46 2011-05-26 13:44:46
           2011-05-26 15:14:54 2011-05-26 13:44:46
           2011-05-26 16:15:01 2011-05-26 13:44:46
           2011-05-26 17:14:38 2011-05-26 13:44:46
           2011-05-26 19:15:36 2011-05-26 13:44:46
         2 2011-05-30 11:30:17 2011-05-30 11:30:17
           2011-05-30 14:30:22 2011-05-30 11:30:17

13 rows selected.

@Brendan re:Induction solutions

Stew Ashton, June 05, 2011 - 2:49 am UTC


Brendan, thanks for the kind words. I dislike my solution because it's not straightforward and the execution plan uses a temporary table with physical reads and writes. So I used Tom's RUNSTATS package and DBMS_XPLAN.DISPLAY_CURSOR to compare your solutions with mine.

In tests with 212K rows, both your queries avoid the temporary table but your recursive subquery factoring solution does lots of logical I/O and is a bit slower than mine. Your MODEL solution is limpid, fast and does only one full scan of the table.

I have tried the MODEL clause in the past and my efforts were complex and slow, so I had given up on it. Thanks for providing a clear, efficient use case.
Tom Kyte
June 06, 2011 - 9:31 am UTC

Thanks to both of you for going above and beyond...

I'll have to learn that model clause myself someday :)

limits of analytics

Sokrates, June 06, 2011 - 2:41 pm UTC

very interesting problem !
quite easy to solve when you think "procedural" ( so a pipelined table function solution would do the trick quite easy ) but shows the limits of analytics.

would "Comparative Window Functions" ( http://tkyte.blogspot.com/2009_11_01_archive.html ) be of any use here ?

P.S. on induction solutions

Stew Ashton, June 06, 2011 - 3:39 pm UTC


Tom, appreciate your encouragement.

Brendan, after reading the documentation I think your second rule needs an "order by"
base_date[rn > 1] order by rn =...
otherwise results are non-deterministic. I hate it when that happens. http://download.oracle.com/docs/cd/E11882_01/server.112/e16579/sqlmodel.htm#i1012511

Thank you!

Yar, June 29, 2011 - 1:39 pm UTC

Thanks Tom, Thanks Brendan! Just what i need.

Date in previous row

Tracy, September 08, 2011 - 9:09 am UTC

 SQL > create table t3 (id number, starttime date, usernumber number, description varchar2(20));

Table created.

 SQL > insert into t3 values (1,  sysdate,1,'Test');
 SQL > insert into t3 values (2,  sysdate,1,'Test');
 SQL > insert into t3 values (3,  sysdate,1,'Test');
 SQL > insert into t3 values (4,  sysdate,1,'Test');
 SQL > insert into t3 values (5   sysdate,1,'Test 2');
 SQL > insert into t3 values (6,  sysdate,1,'Test 3');
 SQL > insert into t3 values (7,  sysdate,1,'Test 4');
 SQL > insert into t3 values (8,  sysdate,2,'Test 4');
 SQL > insert into t3 values (9,  sysdate,2,'Test 4');
 SQL > insert into t3 values (10, sysdate,3,'Test 4');
 SQL > insert into t3 values (11, sysdate,3,'Test 4');
 SQL > insert into t3 values (12, sysdate,1,'Test 4');

 SQL > select id, to_char(starttime,'yyyy-mm-dd hh24:mi:ss'), usernumber, description from t3 order by id;

        ID TO_CHAR(STARTTIME,' USERNUMBER DESCRIPTION
---------- ------------------- ---------- --------------------
         1  2011-09-08 15:50:48          1 Test
         2  2011-09-08 15:51:01          1 Test
         3  2011-09-08 15:51:28          1 Test
         4  2011-09-08 15:51:32          1 Test
         5  2011-09-08 15:51:57          1 Test 2
         6  2011-09-08 15:52:11          1 Test 3
         7  2011-09-08 15:52:30          1 Test 4
         8  2011-09-08 15:52:31          2 Test 4
         9  2011-09-08 15:52:50          2 Test 4
        10  2011-09-08 15:52:52          3 Test 4
        11  2011-09-08 15:52:56          3 Test 4
        12  2011-09-08 15:53:23          1 Test 4

12 rows selected.

I want a query to return all rows grouped by usernumber and description but only where the starttime is within 5 seconds of the previous starttime. 

Given the above data I would like to return rows with Id=3,4,10,11

Tom Kyte
September 08, 2011 - 5:46 pm UTC

but this is *almost exactly* what the original question above is???? You mean to say you cannot change some column names and table names and apply the TECHNIQUE to your problem?

you need to add a partition by clause to 'group' things (break things apart, partition things by) usernumber and description - but short of that - it is the pretty much a variation on a theme here...

why does your test case (which is missing a comma by the way on one of the inserts) use SYSDATE. You know, I cannot get your example that way. Try again if you want me to do the simple changes to the above query....

tracy, September 09, 2011 - 3:51 am UTC

I realise that it is very similar to the original question (that's how I ended up here), but nevertheless I couldn't seem to get it quite right. Here are the inserts with dates specified:

SQL> drop table t3;
SQL> create table t3 (id number, starttime date, usernumber number, description varchar2(20));
SQL> 
SQL> insert into t3 values (1,  to_date( '08-sep-2011 15:50:48', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (2,  to_date( '08-sep-2011 15:51:01', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (3,  to_date( '08-sep-2011 15:51:28', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (4,  to_date( '08-sep-2011 15:51:32', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (5,  to_date( '08-sep-2011 15:51:57', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 2');
SQL> insert into t3 values (6,  to_date( '08-sep-2011 15:52:11', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 3');
SQL> insert into t3 values (7,  to_date( '08-sep-2011 15:52:30', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 4');
SQL> insert into t3 values (8,  to_date( '08-sep-2011 15:52:31', 'dd-mon-yyyy hh24:mi:ss' ) , 2 ,'Test 4');
SQL> insert into t3 values (9,  to_date( '08-sep-2011 15:52:50', 'dd-mon-yyyy hh24:mi:ss' ) , 2 ,'Test 4');
SQL> insert into t3 values (10, to_date( '08-sep-2011 15:52:52', 'dd-mon-yyyy hh24:mi:ss' ) , 3 ,'Test 4');
SQL> insert into t3 values (11, to_date( '08-sep-2011 15:52:56', 'dd-mon-yyyy hh24:mi:ss' ) , 3 ,'Test 4');
SQL> insert into t3 values (12, to_date( '08-sep-2011 15:53:23', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 4');
SQL> 
SQL> select id, to_char(starttime,'yyyy-mm-dd hh24:mi:ss'), usernumber, description from t3 order  by id;

        ID TO_CHAR(STARTTIME,' USERNUMBER DESCRIPTION
---------- ------------------- ---------- -----------------------------------------------------------------
         1  2011-09-08 15:50:48          1 Test
         2  2011-09-08 15:51:01          1 Test
         3  2011-09-08 15:51:28          1 Test
         4  2011-09-08 15:51:32          1 Test
         5  2011-09-08 15:51:57          1 Test 2
         6  2011-09-08 15:52:11          1 Test 3
         7  2011-09-08 15:52:30          1 Test 4
         8  2011-09-08 15:52:31          2 Test 4
         9  2011-09-08 15:52:50          2 Test 4
        10  2011-09-08 15:52:52          3 Test 4
        11  2011-09-08 15:52:56          3 Test 4
        12  2011-09-08 15:53:23          1 Test 4

Crikey

tracy, September 09, 2011 - 6:01 am UTC

Bazoiks !! This seems to work. I don't know much about these analytic whatsits but I tried a few different things and this one produces the right results for the above data:

select usernumber,description,starttime from (
select usernumber , description, starttime,
lead(starttime) over ( partition by usernumber, description order by starttime) le,
lag (starttime) over ( partition by usernumber, description order by starttime) la
from t3 order by starttime
)
where le-starttime < 5/24/60/60 or starttime-la < 5/24/60/60

USERNUMBER DESCRIPT TO_CHAR(STARTTIME,'
---------- -------- -------------------
1 Test 2011-09-08 15:51:28
1 Test 2011-09-08 15:51:32
3 Test 4 2011-09-08 15:52:52
3 Test 4 2011-09-08 15:52:56

Is this just a fluke, or is it correct?

Tom Kyte
September 09, 2011 - 7:28 am UTC

you have assigned to each row the prior and next starttimes within groups of unique usernumbers and descriptions after sorting by starttime.

You keep every record such that the prior or next record is within five seconds of it.

If that is what you wanted, you have it.

It is best to try to understand the technique - so you can answer whether or not it is getting the right answer.

Resetting partitioned running totals...

J, March 20, 2012 - 2:32 pm UTC

Hello! Hope all's well.
Version: Oracle 11.2.0.3
Scenario: We have a table of events with two columns: CID contains the customer id, and EDT contains the event date. We need to assign [CID+EDT] combinations to five-day groups.
Rule: If an event occurs after the fifth day from the start of a group, then it becomes the new "anchor" for the start of another five-day group. It's this "becomes a new anchor" part that we can't quite figure out.
Idea: partition by CID, order by EDT, identify the prior EDT
and determine the date difference. The first row of each partition becomes the anchor for the first group. Then if the sum of subsequent (EDT-prior EDT) rows >= 5, then that row becomes the next anchor, and the counter is reset to zero. Perhaps it's really close, but just can't make the last leap of how to reset the counter after an anchor is identified. Any advice would be most welcome, (thanks)^23!

drop table t;

create table t (cid number, edt date);

insert into t (cid,edt) values (1,trunc(sysdate)); -- s/b anchor 1
insert into t (cid,edt) values (1,trunc(sysdate)+5); -- 2
insert into t (cid,edt) values (1,trunc(sysdate)+6); --
insert into t (cid,edt) values (1,trunc(sysdate)+6); --
insert into t (cid,edt) values (1,trunc(sysdate)+7); --
insert into t (cid,edt) values (1,trunc(sysdate)+10); -- 6
insert into t (cid,edt) values (1,trunc(sysdate)+18); -- 7
insert into t (cid,edt) values (1,trunc(sysdate)+21); --
insert into t (cid,edt) values (1,trunc(sysdate)+22); --
insert into t (cid,edt) values (1,trunc(sysdate)+23); -- 10
insert into t (cid,edt) values (1,trunc(sysdate)+24); --
insert into t (cid,edt) values (1,trunc(sysdate)+25); --
insert into t (cid,edt) values (1,trunc(sysdate)+27); --
insert into t (cid,edt) values (1,trunc(sysdate)+28); -- 14
insert into t (cid,edt) values (1,trunc(sysdate)+29); --
insert into t (cid,edt) values (1,trunc(sysdate)+33); -- 16
insert into t (cid,edt) values (1,trunc(sysdate)+36); --

commit;

select cid, rn
,to_char(edt,'mm/dd/yy') edt
,to_char(p_edt,'mm/dd/yy') p_edt
,edt-p_edt dt_diff
,case -- and here's the sticky bit
when rn = 1 then rn
when edt-p_edt >= 5 then rn
else null -- ?some analytic case stmt?
end anchor
from
(select cid
,row_number () over
(partition by cid order by edt) rn
,edt
,first_value(edt) over
(partition by cid order by edt
rows 1 preceding) p_edt
from t)
order by cid, rn;

CID RN EDT P_EDT DT_DIFF ANCHOR
--- -- -------- -------- ------- ------
1 1 03/20/12 03/20/12 0 1
1 2 03/25/12 03/20/12 5 2
1 3 03/26/12 03/25/12 1
1 4 03/26/12 03/26/12 0
1 5 03/27/12 03/26/12 1
1 6 03/30/12 03/27/12 3 s/b 6 (1+0+1+3) >= 5
1 7 04/07/12 03/30/12 8 7
1 8 04/10/12 04/07/12 3
1 9 04/11/12 04/10/12 1
1 10 04/12/12 04/11/12 1 s/b 10 (3+1+1) >= 5
1 11 04/13/12 04/12/12 1
1 12 04/14/12 04/13/12 1
1 13 04/16/12 04/14/12 2
1 14 04/17/12 04/16/12 1 s/b 14 (1+1+2+1) >= 5
1 15 04/18/12 04/17/12 1
1 16 04/22/12 04/18/12 4 s/b 16 (1+4) >= 5
1 17 04/25/12 04/22/12 3

17 rows selected.

Resetting partitioned running totals - 2

J, April 11, 2012 - 1:27 pm UTC

Hello Tom.

Hmmm... Hope I didn't post a goofball or painfully inane question here. Apologies if that's the case. Has something along this line (resetting sums) in a query already been addressed, or was the question unclear? Or perhaps we need to completely rethink our approach. In any case, thanks for the fantastic site and have a great week!

Best Regards,
- J
Tom Kyte
April 11, 2012 - 2:14 pm UTC

neither goofball nor inane.

Just something that analytics are not going to be able to do. This 'reset' thing is non-trivial.

see
http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22asktom-1518271.html

and read the "grouping ranges" part... Based on that, you could write a plsql pipelined function - or use recursive SQL:

ops$tkyte%ORA11GR2> with data1
  2  as
  3  (select row_number() over (partition by cid order by edt) rn, t.*
  4     from t
  5  ),
  6  data2 (rn, cid, edt, diff, flag )
  7  as
  8  (select rn, cid, edt, 0 diff, 1 flag
  9     from data1
 10    where rn = 1
 11    UNION ALL
 12   select data1.rn, data1.cid, data1.edt,
 13          case when data2.diff + (data1.edt-data2.edt) >= 5
 14               then 0
 15               else data2.diff + (data1.edt-data2.edt)
 16           end diff,
 17          case when data2.diff + (data1.edt-data2.edt) >= 5
 18               then data2.flag + 1
 19               else data2.flag
 20           end flag
 21     from data1, data2
 22    where data1.rn = data2.rn+1
 23      and data1.cid = data2.cid
 24  )
 25  select *
 26    from data2
 27   order by cid, edt
 28  /

        RN        CID EDT             DIFF       FLAG
---------- ---------- --------- ---------- ----------
         1          1 11-APR-12          0          1
         2          1 16-APR-12          0          2
         3          1 17-APR-12          1          2
         4          1 17-APR-12          1          2
         5          1 18-APR-12          2          2
         6          1 21-APR-12          0          3
         7          1 29-APR-12          0          4
         8          1 02-MAY-12          3          4
         9          1 03-MAY-12          4          4
        10          1 04-MAY-12          0          5
        11          1 05-MAY-12          1          5
        12          1 06-MAY-12          2          5
        13          1 08-MAY-12          4          5
        14          1 09-MAY-12          0          6
        15          1 10-MAY-12          1          6
        16          1 14-MAY-12          0          7
        17          1 17-MAY-12          3          7
         1          2 11-APR-12          0          1
         2          2 16-APR-12          0          2
         3          2 17-APR-12          1          2
         4          2 17-APR-12          1          2
         5          2 18-APR-12          2          2
         6          2 21-APR-12          0          3
         7          2 29-APR-12          0          4
         8          2 02-MAY-12          3          4
         9          2 03-MAY-12          4          4
        10          2 04-MAY-12          0          5
        11          2 05-MAY-12          1          5
        12          2 06-MAY-12          2          5
        13          2 08-MAY-12          4          5
        14          2 09-MAY-12          0          6
        15          2 10-MAY-12          1          6
        16          2 14-MAY-12          0          7
        17          2 17-MAY-12          3          7

34 rows selected.



(I made your example more "interesting" by repeating the data for cid=1 as cid=2)

Or you could work out an example using the model clause as demonstrated there as well...

Read the article though as it talks about a 'con' of the above approach if there is a large set of data.

Resetting partitioned running totals - 3

J, April 11, 2012 - 11:23 pm UTC

Thanks so much! Oh, there's some interesting stuff tucked in there (it's currently 9:25pm out here and it's been a crazy-long day), so I'll tackle it with a fresh set of eyes in the morning. A very-very different approach indeed (you should write a book on "Approaches in Query Design for Oracle" - I'd buy one in a heartbeat). We'll also follow up on that link, for sure. Cheers!

Similar problem

Joanna, July 19, 2018 - 12:20 pm UTC

Hi Tom,
I have very similar problem to described above by LB.

I have records like this as well:

Time Amount
11/22/2003 12:22:01 100
11/22/2003 12:22:03 200
11/22/2003 12:22:04 300
11/22/2003 12:22:45 100
11/22/2003 12:22:46 200
11/22/2003 12:23:12 100
11/22/2003 12:23:12 200

What I need to do is sum the amounts where the time of the records is within 2 (or let it be 3) seconds of each other.

The difference is for me it doesn't matter if there are 2 or 10 rows i just need to have groups within 2 seconds.

So for me the data like this:

11/22/2003 12:22:03 200
11/22/2003 12:22:04 200

11/22/2003 12:22:05 200
11/22/2003 12:22:06 200

11/22/2003 12:22:07 200
11/22/2003 12:22:08 200

11/22/2003 12:22:09 200

should be 4 different groups. So I cannot use lag or lead - I need to lock first row, then check if there are some within 2 seconds if so then put into group if not create new group and do similar things.

Would you please point me in the right direction?
Connor McDonald
July 20, 2018 - 10:40 am UTC

Sorry - I'm lost - that sounds exactly what the first requirement was, no? We are assigning new group numbers based on the delta of seconds.

Put your data into a table, run our initial answer and see how you go.

If you're still stuck, pass on the same inserts you used and we'll work on it here

3 seconds?

Aps Reine, November 05, 2018 - 4:25 pm UTC

Can you explain Tom's original answer from 14+ years ago:

ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> select min(x), max(x), sum(y)
2 from (
3 select x, y, max(rn) over (order by x) max_rn
4 from (
5 select x, y,
6 lag(x) over (order by x),
7 case when abs(lag(x) over (order by x) - x) > 3/24/60/60 then row_number() over (order by x)
8 when row_number() over (order by x) = 1 then 1
9 else null
10 end rn
11 from t
12 )
13 )
14 group by max_rn
15 /

MIN(X) MAX(X) SUM(Y)
-------- -------- ----------
12:22:03 12:22:12 740
12:22:18 12:22:20 226
12:22:36 12:22:47 550

why there is a time difference if 9 seconds in the first row returned, 2 seconds in the second, and 11 in the 3rd row - when the condition was that the grouping would be by 3 second intervals with case when abs(lag(x) over (order by x) - x) > 3/24/60/60

I would expect the result set to be:

MIN(X) MAX(X) SUM(Y)
-------- -------- ----------
12:22:03 12:22:06 273
12:22:18 12:22:21 289
12:22:36 12:22:39 199

Basically a sum provided for every 3 seconds of grouping.

thanks,
Aps R.

Connor McDonald
November 06, 2018 - 2:22 am UTC

This is an example of where precision in the definition is so important.

"3 seconds" can be interpreted as:

1) Records fall into a single group if record "n+1" is less than 3 seconds later than record "n". So in this instance:

rec1: 09:00:00
rec2: 09:00:01
rec3: 09:00:02
rec4: 09:00:03
rec5: 09:00:04
rec6: 09:00:05
rec7: 09:00:06


all are in a *single* group.

or

2) Records fall into a single group if record "n" is less than 3 seconds later than record "0", where record "0" is the first record for this logical group. So in this instance:

rec1: 09:00:00   start group1
rec2: 09:00:01   group1
rec3: 09:00:02   group1
rec4: 09:00:03   start group2 (because rec "0" is the 'rec1' above)
rec5: 09:00:04   group2
rec6: 09:00:05   group2
rec7: 09:00:06  


etc etc


Option 2

Aps, November 06, 2018 - 10:18 am UTC

Thanks for that explanation, I see the difference now.

Looks like Tom's original post addressed Option 1 with a single group if record "n+1" is less than 3 seconds later than record "n".

I'm struggling to figure out how to make Option 2 work within the original example. Could you provide an example of how Option 2 would look?

Thanks,
Aps
Chris Saxon
November 06, 2018 - 10:39 am UTC

The easiest way is with match_recognize:

select * 
from   t
match_recognize ( 
  order by x
  measures 
    match_number() as grp
  all rows per match
  pattern ( within_3s+ ) 
  define 
    within_3s as x < first ( x ) + interval '3' second
);

X                      GRP   Y    
01-NOV-2018 12:22:03       1   25 
01-NOV-2018 12:22:04       1   64 
01-NOV-2018 12:22:05       1   79 
01-NOV-2018 12:22:06       2   94 
01-NOV-2018 12:22:07       2   52 
01-NOV-2018 12:22:08       2   77 
01-NOV-2018 12:22:09       3   80 
01-NOV-2018 12:22:10       3    8
...


You need to be on at least 12c to use this. Read more about it at:

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533477800346658909

Aps, November 06, 2018 - 11:56 am UTC

Ah, that is pretty a pretty cool feature.

Unfortunately we're still using an 11g database so I have to try to get it working based on Tom's original query.

Any hints using that original query how to restrict the window to 3 seconds as you explained in Option 2?

Many thanks,
Aps
Chris Saxon
November 08, 2018 - 3:58 pm UTC

You can do this using recursive with. Which looks something like:

with rws as (
  select t.*, row_number () over ( order by x ) rn
  from   t
), within_3s (
  x, y, grp_start, grp, rn
) as (
  select x, y, x, 1, rn from rws
  where  rn = 1
  union all
  select r.x, r.y, 
         case
           when r.x < grp_start + interval '3' second then
             grp_start
           else
            r.x
         end, 
         case
           when r.x < grp_start + interval '3' second then
             w.grp
           else
             w.grp + 1
         end, r.rn
  from   within_3s w
  join   rws r
  on     r.rn = w.rn + 1
)
  select * 
  from   within_3s;

X                      Y    GRP_START              GRP   RN   
01-NOV-2018 12:22:03     21 01-NOV-2018 12:22:03       1    1 
01-NOV-2018 12:22:04      9 01-NOV-2018 12:22:03       1    2 
01-NOV-2018 12:22:05     47 01-NOV-2018 12:22:03       1    3 
01-NOV-2018 12:22:06     70 01-NOV-2018 12:22:06       2    4 
01-NOV-2018 12:22:07     23 01-NOV-2018 12:22:06       2    5 
01-NOV-2018 12:22:08     78 01-NOV-2018 12:22:06       2    6 
01-NOV-2018 12:22:09     31 01-NOV-2018 12:22:09       3    7 
...

Brilliant! Thank you

A reader, November 12, 2018 - 3:03 pm UTC

this is fantastic, thank you. I think I'm headed in the right direction now.
Chris Saxon
November 12, 2018 - 3:16 pm UTC

Great

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.