Very helpful and quite impressive
LB, November 25, 2003 - 3:05 pm UTC
Wow! Thanks a lot!!!
HELP - worked in view but not stored proc
LB, December 10, 2003 - 1:44 pm UTC
Tom,
Your previous solution was most helpful and we have been using that logic as part of a view.
But then we decided instead of a view to load the data into a table every day, since the view takes a while to run.
I have created a stored procedure but for some reason it is giving me this error:
"Line 61: Pos 15: PLS-00103: Encountered the symbol "(" when expecting one of the following:
, from"
When I run the query by itself there is no problem, but when it is inside the procedure's INSERT it fails.
Does it have to do with the CASE? (I am using Oracle 8.1.7)
Here is my procedure:
CREATE OR REPLACE PROCEDURE UETADMIN2.LOAD_UET_COLLECTION_DATA_PAGE
/****************************************************************************************
* By: L. Betesh
* Procedure Name: LOAD_UET_COLLECTION_DATA_PAGE
*
* Description: This procedure is called by CW2_Daily. It produces daily CW2 performance
* statistics sourced from uet_collection_DATA into UET_AVG_TEMP.
* All successful or erroneous execution messages will be logged in procedure_log.
*
* Parameters: None
*
* Dependent Tables or views:
* procedure_log - Log table
* UET_COLLECTION_DATA - Raw data
* V_UET_MAPPINGS - Contains function mappings
*
****************************************************************************************/
AS
--VARIABLES FOR EXCEPTION HANDLING AND LOG REPORTING
V_NUM_ROW NUMBER :=0;
V_PROCESS_NAME VARCHAR2(50) := 'LOAD_UET_COLLECTION_DATA_PAGE';
BEGIN
INSERT INTO PROCEDURE_LOG
VALUES(v_process_name||' '||TO_CHAR(SYSDATE, 'MM/DD/YYYY FMHH24:MI:SS'), 'START');
COMMIT;
EXECUTE IMMEDIATE 'TRUNCATE TABLE UET_COLLECTION_DATA_PAGE';
INSERT INTO UET_COLLECTION_DATA_PAGE
SELECT min(req_time) AS req_time,
map_id,
ltrim(rtrim(IPADDRESS)) AS ipaddress,
user_platform,
usa_id,
SUM(T_FIRSTBYTE) AS T_FIRSTBYTE,
SUM(T_DOWNLOAD) AS T_DOWNLOAD,
SUM(T_RENDER) AS T_RENDER,
ROUND(MAX(REQ_TIME + T_OVERALL/1000) - MIN(REQ_TIME), 2)*1000 AS T_OVERALL,
ltrim(rtrim(brn_i)) AS BRN_I,
lower(ltrim(rtrim(usr_uunm_I))) AS usr_uunm_I,
usr_rmt_initr_c
FROM ( SELECT IPADDRESS, req_time, map_id, user_platform, usa_id, T_OVERALL, t_firstbyte, t_download,
t_render, brn_i, usr_uunm_I, usr_rmt_initr_c,
max(rn) over (order by req_time)
AS max_rn
FROM (
SELECT app_pg_req_m AS req_time, app_pg_I AS map_id,
usr_ip_ads_t AS ipaddress, usr_pltfrm_x AS user_platform,
ed_usa_i AS usa_id, app_1st_byt_time_n AS t_firstbyte,
app_dnld_time_n AS t_download, app_rndr_time_n AS t_render,
app_ovrll_time_n AS t_overall, LTRIM(brn_i) AS BRN_I,
LOWER(usr_uunm_I) AS USR_UUNM_I, usr_rmt_initr_c,
lag(TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS'))
/* LINE 61*/ over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS')) AS previous,
CASE
when abs(lag(app_pg_req_m)
over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS')) - app_pg_req_m) > 3/24/60/60
THEN row_number() over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS'))
when row_number() over (order by TO_CHAR(app_pg_req_m,'MM/DD/YYYY FMHH24:MI:SS')) = 1
then 1
else null
end AS rn
FROM uetadmin2.uet_collection_data, UETADMIN2.V_UET_MAPPINGS,
(SELECT COUNT(DISTINCT app_pg_i) AS PG_CNT, app_func_i
FROM uetadmin2.application_function_page
GROUP BY app_func_i) afp
WHERE uetadmin2.v_uet_mappings.APP_FUNC_I = afp.APP_FUNC_I
AND pg_cnt > 1
AND uet_collection_data.app_pg_I = V_UET_MAPPINGS.MAP_ID
)
)
GROUP BY IPADDRESS, map_id, user_platform, usa_id, brn_i, usr_uunm_I, usr_rmt_initr_c, max_rn
;
COMMIT;
v_num_row := SQL%ROWCOUNT;
INSERT INTO PROCEDURE_LOG
VALUES(v_process_name||' '||TO_CHAR(SYSDATE, 'MM/DD/YYYY FMHH24:MI:SS'), 'FINISHED. ROWS ADDED: ' || V_NUM_ROW );
COMMIT;
END;
/
THANKS!!!
-LB
December 10, 2003 - 4:16 pm UTC
case, group by rollup, group by cube, analytic functions, order by in a subquery, etc. are sql constructs that are not recognized by plsql in 8i
they are in 9i
until then you must use native dynamic sql to "hide" the construct from plsql
execute immediate 'insert into .... ' using bind1, bind2, bind3, ....;
NO Case in Stored Proc
A reader, December 10, 2003 - 1:54 pm UTC
Hi Tom,
I just found another place on this site where you said:
"case is not a plsql construct until 9i. It is available in sql in 816 and 817, not plsql. "
Now I know why I am getting an error.
So the only option I can think of is to create this query as a view and then load the table as select from the view.
Am I right that that is the only option?
Thanks again!
December 10, 2003 - 4:18 pm UTC
execute immediate, dynamic sql
Question...
Kashif, December 11, 2003 - 9:41 am UTC
Hi Tom,
Excellent use of row_number and lag to achieve the required results. Though I'm a little unclear on how the 'max_rn' ( max(rn) over (order by x) ) column's value was derived; it's my lack of understanding of how the max analytic function works. The 'rn' column has '1', followed by 9 nulls, then '11', followed by 2 nulls, and finally '14', followed by 11 nulls. The 'max_rn' column though manages to substitute every null occurrence in the 'rn' column with the previous highest 'rn' value. I'm not clear on why the max function would do that, i.e. substitute the nulls in a column with the previous highest value of rn. I would've thought that if there was a null in 'rn', it would've returned a null in the 'max_rn' column. Any explanation would be helpful. Thanks.
Kashif
December 11, 2003 - 10:05 am UTC
max(rn) over (order by x)
says to take the result set (don't partition, no partition clause),
order by X (so you get the first row with 1, followed by some nulls, then 11, more nulls, then 14, more nulls)
using range between unbounded preceding and current row (that is the DEFAULT window) give me the max value. That basically means -- look for the max value in this result set starting with the current row and looking at ALL preceding rows. So, for row 1 that is 1 (current row plus preceding -- of which there are none). For the second row that is 1 (current row = null, preceding = 1, max is 1). For the 11th row that is 11 (current row = 11, preceding rows = NULL except for row 1 which is 1, max = 11)
and so on.
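In other words, MAX ignores NULLs and the default window runs from the start of the ordered set to the current row. That carry-forward behavior can be simulated outside the database -- a minimal Python sketch (just a simulation, not Oracle itself):

```python
# Simulate MAX(rn) OVER (ORDER BY x): for each row, take the max over all
# rows from the start of the ordered set up to and including the current
# row. MAX, like other aggregates, ignores NULLs (None here) -- which is
# why every null in rn gets "filled in" with the previous highest value.

def running_max(values):
    out, best = [], None
    for v in values:
        if v is not None and (best is None or v > best):
            best = v
        out.append(best)
    return out

rn = [1, None, None, None, None, None, None, None, None, None,
      11, None, None, 14]
print(running_max(rn))
# -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 11, 11, 11, 14]
```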
Awesome, thanks a lot Tom!
Kashif, December 11, 2003 - 1:57 pm UTC
A reader, December 11, 2003 - 2:01 pm UTC
Tom,
Out of curiosity, were you involved in the designing of Analytic functions in Oracle.
TIA.
December 11, 2003 - 2:17 pm UTC
nope.
mod_plsql.
html db to a degree..
various "i sure wish it worked like this" emails...
but no, I don't work in development really.
Wish I did think of them -- coolest things to happen to sql since 'select'
my favorite line in the reply was
peon, December 12, 2003 - 12:19 pm UTC
"the way I decided to deal with this is"
One day I hope to be able to speak or write such a phrase with confidence in that I will be able to solve any problem thrown at me with a correct and fast solution.
The only dissatisfying aspect of the masses making such comments is that the site owner never replies to such fan club clamoring... ;(
Time Based Statistics?
Raj Ganga, December 28, 2003 - 11:54 pm UTC
Tom,
i have table t
PID NUMBER
DID NUMBER
CONS_TIME DATE
CONS_FEE NUMBER
pid is the unique id for patient and did for Doctor
cons fee is the consultation fee
how to find the sum of fee collected per 5 per doctor
min on a given day.
i have tried with analytical functions but each has its own window and the sum is not coming out correct.
Please guide me,
thank you for spending your most valuable time on my problem.
warm Regards
Raj.
December 29, 2003 - 10:27 am UTC
what is a "5 per doctor min"
no idea how to use a DATE (point in time) as a duration here. you'll really need to be a little more explicit -- an example set of inputs, the desired outputs and a "how to" go from inputs to outputs would be needed.
Timed Statistics
Rajshekhar GANGA, December 29, 2003 - 11:43 pm UTC
Tom,
table t :
PID NUMBER
DID NUMBER
CONS_TIME DATE
CONS_FEE NUMBER
Following are rows for one doctor id.
PID DID CONS_TIME CONS_FEE
--------- ---------- ------------------- ----------
202737 1 26/12/2003 09:39:36 202737
33867 1 26/12/2003 09:40:36 16934
33894 1 26/12/2003 09:41:36 11298
33895 1 26/12/2003 09:42:36 8474
34242 1 26/12/2003 09:43:36 6848
34327 1 26/12/2003 09:44:36 5721
33868 1 26/12/2003 09:45:36 4838
687891 1 26/12/2003 09:46:36 85986
220913 1 26/12/2003 09:47:36 24546
35122 1 26/12/2003 09:48:36 3512
i want to have time based statistics(per 5min.) like.
Did Time Total Collection
1 26/12/2003 09:44:36 252012 (202737+16934+11298+8474+6848+5721)
1 26/12/2003 09:48:36 118882 (4838+85986+24546+3512)
Thanks
Raj.
December 30, 2003 - 9:57 am UTC
using slightly different data to test boundaries, but here you go:
ops$tkyte@ORA9IR2> select did,
2 cons_time,
3 cons_fee,
4 trunc(abs( (((cons_time-min(cons_time) over (partition by did))*24*60)-0.0001)/5 )) grp
5 from t
6 /
DID CONS_TIME CONS_FEE GRP
---------- ------------------- ---------- ----------
1 26/12/2003 09:39:36 202737 0
1 26/12/2003 09:40:36 16934 0
1 26/12/2003 09:41:36 11298 0
1 26/12/2003 09:42:36 8474 0
1 26/12/2003 09:43:36 6848 0
1 26/12/2003 09:44:36 5721 0
1 26/12/2003 09:45:36 4838 1
1 26/12/2003 09:46:36 85986 1
1 26/12/2003 09:47:36 24546 1
1 26/12/2003 09:48:36 3512 1
1 26/12/2003 09:49:36 0 1
1 26/12/2003 09:50:36 3512 2
12 rows selected.
ops$tkyte@ORA9IR2> select did, min(cons_time), max(cons_time), sum(cons_fee)
2 from (
3 select did,
4 cons_time,
5 cons_fee,
6 trunc(abs( (((cons_time-min(cons_time) over (partition by did))*24*60)-0.0001)/5 )) grp
7 from t
8 )
9 group by did, grp
10 /
DID MIN(CONS_TIME) MAX(CONS_TIME) SUM(CONS_FEE)
---------- ------------------- ------------------- -------------
1 26/12/2003 09:39:36 26/12/2003 09:44:36 252012
1 26/12/2003 09:45:36 26/12/2003 09:49:36 118882
1 26/12/2003 09:50:36 26/12/2003 09:50:36 3512
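The grouping expression above can be simulated outside SQL. A Python sketch of the same logic (datetimes stand in for the DATE arithmetic; the 0.0001 fudge and abs() behave as in the query):

```python
from datetime import datetime, timedelta
import math

# Simulate the grouping expression from the query above:
#   trunc(abs((((cons_time - min(cons_time) over (partition by did))*24*60) - 0.0001)/5))
# i.e. minutes elapsed since the doctor's first row, bucketed into 5-minute
# groups. The 0.0001 fudge keeps a row landing exactly on a boundary
# (e.g. minute 5) in the earlier bucket; abs() keeps the first row at 0.

def bucket(cons_time, min_time):
    mins = (cons_time - min_time).total_seconds() / 60.0
    return math.trunc(abs((mins - 0.0001) / 5))

t0 = datetime(2003, 12, 26, 9, 39, 36)
fees = [202737, 16934, 11298, 8474, 6848, 5721, 4838, 85986, 24546, 3512]

sums = {}
for i, fee in enumerate(fees):          # one row per minute, as in the data
    grp = bucket(t0 + timedelta(minutes=i), t0)
    sums[grp] = sums.get(grp, 0) + fee
print(sums)   # {0: 252012, 1: 118882}
```

Note how the row at exactly minute 5 (09:44:36) falls into group 0, matching the output above.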
Test Connection
Rajashekhar GANGA, December 30, 2003 - 11:25 pm UTC
Tom,
thanks a lot
sorry for adding in response,
i have a question
we have leased lines connecting to our manufacturing units,
there are triggers and procedures which insert rows from one location to the other.
The problem is that when the line is down our procedures and triggers raise a timed out
error.
How can we check the connection to the remote computer, similar to the ping utility?
Happy New Year.
Regards
Raj.
December 31, 2003 - 9:35 am UTC
sounds like you need a new leaser -- the QOS sounds pretty bad.
basically, there is no "ping" in the database and even if there was, so? you ping, it's alive, you start and it dies. pinging buys you nothing really.
use tnsping
A reader, December 31, 2003 - 6:07 am UTC
.
December 31, 2003 - 9:49 am UTC
in a stored procedure or trigger, hmmm.
Test Connection
Raj GANAGA, December 31, 2003 - 11:52 pm UTC
i think i have not made it clear,
i have database link which connects to remote database user.
if table t has an update then the relevant table t on the remote database to be updated or inserted based on some business logic. if this logic is place in a procedure and call it from update or insert trigger,
i need to check the connectivity to the remote database before calling the procedure, otherwise "Connection Timed Out" will be raised after some time,
this delay in raising the exception makes catching the exception not of much use.
Thankx for reading the request.
Raj.
January 01, 2004 - 9:08 am UTC
i understood that -- there is nothing like "ping" in there -- and if there was, it would suffer the same exact "timeout issue".
Suggestion -- don't have client applications do distributed operations. Instead, have the client application schedule a job via dbms_job, and have the job do the distributed operation. Now, you don't really care if it takes .05 seconds or 50 seconds, as no humans are sitting there waiting for it, and your system will appear to be infinitely more available (as you are no longer relying on the wire to be there in real time).
Synchronous replication is a really really bad idea -- it really means "i should have only one database in the first place"
Olaf, January 05, 2004 - 7:54 am UTC
Tom,
I have a very big table with entries for patients. The data are inserted online from many mobile devices (OLTP system).
How can I select (most performantly) the last 5 entries for one patient (SQL query)?
Thanks in advance
Olaf
January 05, 2004 - 9:27 am UTC
select *
from ( select /*+ FIRST_ROWS */ *
from T
where patient_id = :x
order by patient_id DESC, visit_date DESC )
where rownum <= 5;
using the CBO. Consider:
big_table@ORA920PC> select count(*) from v;
COUNT(*)
----------
3500000
1 row selected.
big_table@ORA920PC>
big_table@ORA920PC> set autotrace on
big_table@ORA920PC> select *
2 from ( select /*+ FIRST_ROWS */ *
3 from v
4 where patient_id = :x
5 order by patient_id DESC, visit_date DESC )
6 where rownum <= 5;
PATIENT_ID VISIT_DAT DATA1 DATA2
------------------------------ --------- ------------------------------ ------------------
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
BIG_TABLE 09-JUL-03 BIG_TABLE TABLE
5 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=58797 Card=5 Bytes=270)
1 0 COUNT (STOPKEY)
2 1 VIEW (Cost=58797 Card=106061 Bytes=5727294)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=58797 Card=106061 Bytes=4242440)
4 3 INDEX (RANGE SCAN DESCENDING) OF 'BT_OWNER_CREATED' (NON-UNIQUE) (Cost=365 Card=106061)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
623 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed
big_table@ORA920PC> set autotrace off
Why not "OVER (PARTITION BY VISIT_ID ORDER BY DATE_CLICK RANGE INTERVAL '1' DAY"
Lee, January 05, 2004 - 3:38 pm UTC
Tom,
In all the responses to the questions that dealt with getting records within a time frame you did not mention:
OVER (PARTITION BY COL_1 ORDER BY DATE_COL RANGE INTERVAL '1' DAY
I found this method (with some additions and variations) to work very well for getting a sliding window of records, don't you agree?
Thanks.
January 06, 2004 - 7:42 am UTC
that would get a sliding window -- yes.
But, we didn't WANT a sliding window. We wanted to
a) sort records by time
b) take ALL records within 3 seconds of each other
c) turn them into a single record
Lets see you do that with a sliding window. the sliding window here would not be an appropriate tool for the question asked.
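Steps a/b/c above can be sketched outside SQL. A minimal Python simulation of the LAG + ROW_NUMBER + MAX() OVER grouping built earlier in the thread (plain integer seconds stand in for the DATEs, just to keep it self-contained):

```python
# (a) sort by time, (b) start a new group whenever the gap from the previous
# row exceeds 3 seconds, (c) collapse each group to (min_time, max_time, sum).
# This mirrors the fixed (non-sliding) grouping the thread's query produces.

def collapse(rows, gap=3):
    groups = []                       # each entry: (lo, hi, total)
    for t, y in sorted(rows):
        if groups and t - groups[-1][1] <= gap:
            lo, hi, total = groups[-1]
            groups[-1] = (lo, t, total + y)   # within 3s of previous row: extend
        else:
            groups.append((t, t, y))          # gap > 3s: start a new group
    return groups

rows = [(3, 69), (4, 62), (5, 73), (11, 94), (12, 57), (18, 86)]
print(collapse(rows))
# -> [(3, 5, 204), (11, 12, 151), (18, 18, 86)]
```

Unlike a sliding window, each input row lands in exactly one output group.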
added another variable
LB, January 06, 2004 - 1:28 pm UTC
Hi Tom,
Thanks for your help before.
Now we have another variable added in and I can't figure out how to add it to the query.
Below is the data you used before plus a sample of the new variable:
X Y Z
-------- ---------- ---------------------
12:22:03 69 USER1
12:22:04 62 USER1
12:22:05 73 USER1
12:22:06 69 USER1
12:22:07 95 USER1
12:22:08 87 USER1
12:22:09 49 USER2
12:22:10 85 USER3
12:22:11 94 USER2
12:22:12 57 USER4
12:22:18 86 USER1
12:22:19 50 USER1
12:22:20 90 USER1
12:22:36 63 USER1
12:22:37 26 USER1
12:22:38 14 USER1
12:22:39 96 USER1
12:22:40 18 USER1
12:22:41 77 USER1
12:22:42 50 USER1
12:22:43 11 USER1
12:22:44 1 USER1
12:22:45 74 USER1
12:22:46 100 USER1
12:22:47 20 USER1
Basically, we only want to sum those records that are of the same user.
I have tried adding the user field to the order by of the lag but that did not work. Should I group by z in the subquery? All I know is that I have to modify the innermost query first to get the groupings by user.
Have I explained the question clearly?
Thanks a million.
Leora
January 06, 2004 - 2:23 pm UTC
you want to PARTITION BY Z, you want to break the input query into N partitions -- one for each value of Z.
ops$tkyte@ORA920PC> alter table t add z varchar2(10);
Table altered.
ops$tkyte@ORA920PC> update t set z = 'user1';
25 rows updated.
ops$tkyte@ORA920PC> insert into t select x, y, 'user2' from t;
25 rows created.
ops$tkyte@ORA920PC> commit;
Commit complete.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> alter session set nls_date_format = 'hh24:mi:ss';
Session altered.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> select z, min(x), max(x), sum(y)
2 from (
3 select x, y, z, max(rn) over (partition by z order by x) max_rn
4 from (
5 select x, y, z,
6 lag(x) over (partition by z order by x),
7 case when abs(lag(x) over (partition by z
8 order by x) - x) > 3/24/60/60
9 then row_number() over (partition by z order by x)
10 when row_number() over (partition by z order by x) = 1 then 1
11 else null
12 end rn
13 from t
14 )
15 )
16 group by z, max_rn
17 /
Z MIN(X) MAX(X) SUM(Y)
---------- -------- -------- ----------
user1 12:22:03 12:22:12 446
user1 12:22:18 12:22:20 221
user1 12:22:36 12:22:47 342
user2 12:22:03 12:22:12 446
user2 12:22:18 12:22:20 221
user2 12:22:36 12:22:47 342
6 rows selected.
Thanks
LB, January 06, 2004 - 2:28 pm UTC
Another lesson in analytics.
I think once you've done it, it becomes easier to do again.
Thanks a lot.
Genial and simple solution
www.db-nemec.com, January 10, 2004 - 3:11 pm UTC
grouping within a window of time
bob, April 13, 2004 - 10:50 am UTC
Starting with the data from the original question the user posted and adding an attribute just for display:
here is the create and insert for a modified set of that original data.
create table test (date_time date, amount number, attrib varchar2(10));
insert into test values (to_date('11/22/2003 12:22:01','MM/DD/YYYY HH24:MI:SS'),100,'abc1');
insert into test values (to_date('11/22/2003 12:22:03','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
insert into test values (to_date('11/22/2003 12:22:05','MM/DD/YYYY HH24:MI:SS'),300,'abc3');
insert into test values (to_date('11/22/2003 12:22:08','MM/DD/YYYY HH24:MI:SS'),100,'abc1');
insert into test values (to_date('11/22/2003 12:22:13','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
insert into test values (to_date('11/22/2003 12:22:26','MM/DD/YYYY HH24:MI:SS'),100,'abc1');
insert into test values (to_date('11/22/2003 12:22:27','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
insert into test values (to_date('11/22/2003 12:22:28','MM/DD/YYYY HH24:MI:SS'),300,'abc3');
insert into test values (to_date('11/22/2003 12:22:33','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
insert into test values (to_date('11/22/2003 12:22:35','MM/DD/YYYY HH24:MI:SS'),100,'abc1');
insert into test values (to_date('11/22/2003 12:22:45','MM/DD/YYYY HH24:MI:SS'),300,'abc3');
insert into test values (to_date('11/22/2003 12:22:48','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
insert into test values (to_date('11/22/2003 12:22:50','MM/DD/YYYY HH24:MI:SS'),100,'abc1');
insert into test values (to_date('11/22/2003 12:22:55','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
can you answer the following question with analytics?
For a time (:t), pivot the date_time and attributes for a group (100,200,300) that is within a (:x) second cluster of each other, and for which all values are before start time (:t). Only show the group that is closest to the start time.
I have a non-analytics approach that full scans the table three times using nested loops and sometimes the index on amount (depending on other conditions in the subqueries). I know it can be improved, and maybe I am missing a lead/lag time-window approach.
SQL> variable t varchar2(25);
SQL> variable x number;
SQL> exec :t := '11/22/2003 12:22:37';
PL/SQL procedure successfully completed.
SQL> exec :x := 5;
PL/SQL procedure successfully completed.
SQL> SELECT * FROM (
2 SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
3 to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
4 to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
5 FROM
6 (SELECT * from test where amount=100
7 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
8 (SELECT * from test where amount=200
9 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
10 (select * from test where amount=300
11 and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
12 WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
13 AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
14 ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
15 ) final
16 where rownum <=1
17
SQL> /
A1_TIME A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
-------- ---------- -------- ---------- -------- ----------
12:22:26 abc1 12:22:27 abc2 12:22:30 abc3
can you answer this question with analytics similarly to how you grouped data by a time interval as you demonstrated in this thread?
April 13, 2004 - 3:32 pm UTC
it is not a simple lag/lead
the problem is there is no way to sort the data -- there is nothing saying amount=100 happens once AND THEN amount=200 happens once AND THEN amount=300 happens (in that order) all within the input number of seconds; amount=100 might happen 5 times first. No way to look forward a row, two rows, N rows -- as you don't know how many to skip over.
Or, amount=300 happens and then 100 and then 200.... so they can arrive in random order....
Also, you are doing a non-equi join, meaning each row from the 1st query can be joined to 0..N rows in the second and so on. Analytics will preserve the cardinality -- you needed to explode it out and create all possible pairings.
But let me rephrase your logic a little based on your query:
find the record for amount=100 nearest to, but less than :T that has an amount=200 record that happened before :T within :X seconds of that amount=100 record AND an amount=300 record that happened before :T within :X seconds as well.
Report the first such record that has both an amount=200 and 300 within that time range.
Now, a nice index on T(amount,date_time,<attrib>) -- attrib is sort of optional there, would avoid the table access by index rowid -- and the query:
ops$tkyte@ORA9IR2> select a1_time, a1_attrib,
2 to_date(substr(a2,1,14),'yyyymmddhh24miss') a2_time, substr(a2,15) a2_attrib,
3 to_date(substr(a3,1,14),'yyyymmddhh24miss') a3_time, substr(a3,15) a3_attrib
4 from (
5 select a.date_time a1_time, a.attrib a1_attrib,
6 (select to_char(date_time,'yyyymmddhh24miss') || attrib
7 from test b
8 where amount = 200
9 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
10 and ABS(date_time-A.DATE_TIME) <= 1/24/60/60*:x
11 and rownum = 1) a2,
12 (select to_char(date_time,'yyyymmddhh24miss') || attrib
13 from test c
14 where amount = 300
15 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
16 and ABS(date_time-A.DATE_TIME) <= 1/24/60/60*:x
17 and rownum = 1) a3
18 from test A
19 where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
20 and amount = 100
21 order by amount desc, date_time desc
22 )
23 where a2 is not null
24 and a3 is not null
25 and rownum = 1
26 /
A1_TIME A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
--------- ---------- --------- ---------- --------- ----------
22-NOV-03 abc1 22-NOV-03 abc2 22-NOV-03 abc3
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=44)
1 0 COUNT (STOPKEY)
2 1 VIEW (Cost=3 Card=1 Bytes=44)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=1 Bytes=29)
4 3 INDEX (RANGE SCAN DESCENDING) OF 'TEST_IDX' (NON-UNIQUE) (Cost=2 Card=1)
5 4 COUNT (STOPKEY)
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=3 Bytes=87)
7 6 INDEX (RANGE SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=2 Card=1)
8 4 COUNT (STOPKEY)
9 8 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=3 Bytes=87)
10 9 INDEX (RANGE SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=2 Card=1)
gets it with what I think would be "minimal work"
(your query needn't full scan as it is, it didn't on my machine anyway -- make sure you are using the CBO!)
magician
bob, April 14, 2004 - 4:57 pm UTC
I didn't think it was possible either, given that the matching rows could be 1..N rows ahead or behind, but I envisioned you pulling something out of that magic analytic sql hat that would first transform the set into something that could use analytics...
my 9.2 laptop wasn't full scanning, but the real system with stats and the cbo was. The query was slightly different.
Maybe if you weren't so impressive all the time, I wouldn't get the idea that there was always a better way I was missing. In this case there was as well, just not analytically.
i have seen you use the concat/substr approach for other things.. I will have to go back and review other places you have used it, to better know when it is worth using.
overlooked nearest row?
bob, May 10, 2004 - 8:06 pm UTC
Tom,
I think you might have overlooked something. You helped me tremendously, so I am not complaining at all; I am just trying to be sure I understood your solution.
you wrote: "
find the record for amount=100 nearest to, but less than :T that has an amount=200 record that happened before :T within :X seconds of that amount=100 record AND an amount=300 record that happened before :T within :X seconds as well."
How does the solution you proposed ensure that I get the "nearest" row to the amount=100 row since there is no ordering in the subqueries?
If in fact the subquery returns more than one row within :x seconds that has an amount=200, I will just get the first one the db returns, right?
I think I have to nest a subquery one more level deep in each subquery and order by the time delta. Is that correct?
May 10, 2004 - 9:01 pm UTC
you are right, logic would need to change a tad to do that (but -- so would yours)
we both return a random row within N seconds of the row of interest.
i see (my bad)
bob, May 10, 2004 - 8:30 pm UTC
you said nearest to time :t, i meant the nearest 200 and nearest 300 after 100 in time, where all of them are before time t and within :x seconds of each other.
I am not suggesting the original query I submitted does that.
rambling tonight
bob, May 10, 2004 - 8:51 pm UTC
please disregard the "after" aspect. the ABS says it doesn't matter if it is before or after. the only area of uncertainty for me is ensuring the row I am seeing for amount=200 and amount=300 is the nearest to the amount 100 (in the event there is more than one in the window).
May 11, 2004 - 8:12 am UTC
but your query didn't do that either?
it got a random row as well. we are "equivalent" in that respect. do you still need to change it?
yes, i do
bob, May 11, 2004 - 9:30 am UTC
can you show me the change that is necessary to make that happen?
I have tried several things, but can't get the correlated subquery to return a single row that is ordered by the time delta.
The original query(not the one I submitted) had an:
"ORDER BY date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')"
which ensured that each subquery for amount=200 and amount=300 returned the row "nearest" to the amount=100 row first.
Does the fact that I excluded it eliminate the benefits of your approach?
I recognize that now additional LIO will have to occur to get all the rows that fit the time window and find the lowest time delta, but maybe there are other optimizations I am not aware of.
Your original solution saw how wasteful it was that the subqueries were not being short-circuited with rownum=1 when there was no ordering going on. I apologize for not giving the complete picture at that point, which would have been necessary to provide the best solution.
May 11, 2004 - 10:36 am UTC
that didn't do it.
SQL> SELECT * FROM (
2 SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
3 to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
4 to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
5 FROM
6 (SELECT * from test where amount=100
7 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
8 (SELECT * from test where amount=200
9 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
10 (select * from test where amount=300
11 and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
12 WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
13 AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
14 ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
15 ) final
16 where rownum <=1
that ordered the data by a1.date_time only -- a2 and a3 were NOT ordered; you got "a random row within abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x" for a2.
Definitely did not mandate the closest row to a1! You needed to sort A2 and sort A3 in order to get the A2 and A3 "closest"
consider (i added a single new insert to your test case, amount=200 at time 12:22:28)
ops$tkyte@ORA9IR2> select * from test where amount = 200 order by date_time;
DATE_TIME AMOUNT ATTRIB
------------------- ---------- ----------
11/22/2003 12:22:03 200 abc2
11/22/2003 12:22:13 200 abc2
11/22/2003 12:22:27 200 abc2ZZZ <<<=== your row
11/22/2003 12:22:28 200 abc2XXX <<<=== my row
11/22/2003 12:22:33 200 abc2
11/22/2003 12:22:48 200 abc2
11/22/2003 12:22:55 200 abc2
7 rows selected.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT *
2 FROM (
3 SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
4 to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
5 to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
6 FROM (SELECT * from test where amount=100 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
7 (SELECT * from test where amount=200 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
8 (select * from test where amount=300 and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
9 WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
10 AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
11 ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
12 ) final
13 where rownum <=1
14 /
A1_TIME A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
-------- ---------- -------- ---------- -------- ----------
12:22:26 abc1 12:22:28 abc2XXX 12:22:28 abc3
^^^^^^^^^^^^^^^^ not the closest......
if you add this and don't see the same thing, mix up the order of the inserts. consider:
ops$tkyte@ORA9IR2> select * from test where amount = 200 order by date_time;
DATE_TIME AMOUNT ATTRIB
------------------- ---------- ----------
11/22/2003 12:22:03 200 abc2
11/22/2003 12:22:13 200 abc2
11/22/2003 12:22:27 200 abc2ZZZ
11/22/2003 12:22:28 200 abc2XXX
11/22/2003 12:22:33 200 abc2
11/22/2003 12:22:48 200 abc2
11/22/2003 12:22:55 200 abc2
7 rows selected.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT *
2 FROM (
3 SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
4 to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
5 to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
6 FROM (SELECT * from test where amount=100 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
7 (SELECT * from test where amount=200 AND date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a2,
8 (select * from test where amount=300 and date_time- to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a3
9 WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
10 AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
11 ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
12 ) final
13 where rownum <=1
14 /
A1_TIME A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
-------- ---------- -------- ---------- -------- ----------
12:22:26 abc1 12:22:27 abc2ZZZ 12:22:28 abc3
^^^^^^^^^^^^^^^^ is the right answer, all i did was
flip flop the inserts
Your query never returned the closest row (well, it never PROMISED to!!!)
but, in any case, we can modify the query to return the row "closest" to A1 that matches the other criteria easily:
ops$tkyte@ORA9IR2> select a1_time, a1_attrib,
2 to_date(substr(a2,6,14),'yyyymmddhh24miss') a2_time, substr(a2,20) a2_attrib,
3 to_date(substr(a3,6,14),'yyyymmddhh24miss') a3_time, substr(a3,20) a3_attrib
4 from (
5 select a.date_time a1_time, a.attrib a1_attrib,
6 (select min(
to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
7 to_char(date_time,'yyyymmddhh24miss') || attrib )
8 from test b
9 where amount = 200
10 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
11 and date_time between a.date_time-1/24/60/60*:x
and a.date_time+1/24/60/60*:x) a2,
12 (select min(
to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
13 to_char(date_time,'yyyymmddhh24miss') || attrib )
14 from test c
15 where amount = 300
16 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
17 and date_time between a.date_time-1/24/60/60*:x
and a.date_time+1/24/60/60*:x) a3
18 from test A
19 where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
20 and amount = 100
21 order by amount desc, date_time desc
22 )
23 where a2 is not null
24 and a3 is not null
25 and rownum = 1
26 /
A1_TIME A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
------------------- ---------- ------------------- ----------- ------------------- -----------
11/22/2003 12:22:26 abc1 11/22/2003 12:22:27 abc2ZZZ 11/22/2003 12:22:28 abc3
That'll use the index I assume is on test(amount,date_time,<attrib>) to range scan on "amount=K and date between K1 and K2", limiting the number of rows it looks at considerably.
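The pack/sort/unpack idea in that query - encode the sort key (the time delta) as a fixed-width string prefix, take MIN(), then peel the payload back off with SUBSTR - is a general pattern worth isolating. A minimal sketch of just the pattern, against a hypothetical table t(dt, payload):

```sql
-- pack a fixed-width sortable key in front of the payload; min() then picks
-- the row with the smallest key, and substr() recovers the payload at a
-- known offset (the yyyymmddhh24miss key is always 14 characters wide)
select substr( min( to_char(dt,'yyyymmddhh24miss') || payload ), 15 ) closest_payload
  from t;
```

The key must be fixed width and sort the way the data sorts (zero-padded numbers, canonical date strings), otherwise MIN() on the concatenation picks the wrong row - which is exactly the bug discussed a few posts below.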
thanks again
bob, May 11, 2004 - 1:38 pm UTC
I am very appreciative of your efforts.
If the "ORDER BY date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS')" were repeated in each of the subqueries (a2 and a3) as well, wouldn't that ensure that when the final result set is assembled from the subqueries, the first row in it contains the lowest delta (closest row) for a2 and a3 (because each of the row sources used to build that result was sorted in that order)?
ie.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT *
2 FROM (
3 SELECT to_char(a1.date_time,'HH24:MI:SS') a1_time,a1.attrib a1_attrib,
4 to_char(a2.date_time,'HH24:MI:SS') a2_time,a2.attrib a2_attrib,
5 to_char(a3.date_time,'HH24:MI:SS') a3_time,a3.attrib a3_attrib
6 FROM (SELECT * from test where amount=100 AND date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0) a1,
7 (SELECT * from test where amount=200 AND date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0 ORDER BY ABS(date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')) asc) a2,
8 (select * from test where amount=300 and date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')<= 0 ORDER BY ABS(date_time-
to_date(:t,'MM/DD/YYYY HH24:MI:SS')) asc) a3
9 WHERE abs(a2.date_time-a1.date_time) <= 1/24/60/60*:x
10 AND abs(a3.date_time-a1.date_time) <= 1/24/60/60*:x
11 ORDER by a1.date_time-to_date(:t,'MM/DD/YYYY HH24:MI:SS') desc
12 ) final
13 where rownum <=1
14 /
regardless of whether that is true (it seemed to be true for randomly inserted data I was testing with - I can't cut and paste from the pc I am currently on to show this), your solution will be very helpful.
May 11, 2004 - 3:10 pm UTC
joining three ordered sets together does NOT imply the results come out sorted the same way they went in! a simple sort/merge join, hash join, anything will mess up the sorted order!
especially when you toss in yet another order by that doesn't order by the same things!!
naivety
bob, May 11, 2004 - 5:04 pm UTC
The sheer volume of exclamation points in that last post of yours implies I really should have known better than to make such an elementary assumption.
In my defense, I did contemplate the effects of a hash join on my assumption of the final result set, but fell back on the results I was seeing based on the plan being used.
I have a lot to learn.
Thank you for creating this forum to teach me the correct way to think and examine problems.
May 11, 2004 - 9:01 pm UTC
sorry, i love the !
but really - think "sets", think "sets".... SQL is so much easier when you let go of some of the procedural thinking we seem to be born with....
The tao of sets.
format mask mistake?
bob, May 11, 2004 - 10:40 pm UTC
Tom,
The question that won't die..
round(x,5) rounds at about the second level, but the integer format mask 'fm00000' makes every time difference of less than half a day (anything that rounds to 0 rather than to 1) appear the same: "00000".
as a result, using the following data and your query as is, I get the wrong result.
drop table test;
create table test (date_time date, amount number, attrib varchar2(10));
insert into test values (to_date('11/22/2003 12:22:31','MM/DD/YYYY HH24:MI:SS'),300,'abc3');
insert into test values (to_date('11/22/2003 12:22:30','MM/DD/YYYY HH24:MI:SS'),300,'abc3');
insert into test values (to_date('11/22/2003 12:22:27','MM/DD/YYYY HH24:MI:SS'),300,'abc3');
insert into test values (to_date('11/22/2003 12:22:33','MM/DD/YYYY HH24:MI:SS'),200,'abc2');
insert into test values (to_date('11/22/2003 12:22:35','MM/DD/YYYY HH24:MI:SS'),100,'abc1');
SQL> select to_char(a1_time,'HH24:MI:SS'), a1_attrib,
2 to_char(to_date(substr(a2,6,14),'yyyymmddhh24miss'),'HH24:MI:SS') a2_time, substr(a2
,20) a2_attrib,
3 to_char(to_date(substr(a3,6,14),'yyyymmddhh24miss'),'HH24:MI:SS') a3_time, substr(a3
,20) a3_attrib
4 from (
5 select a.date_time a1_time, a.attrib a1_attrib,
6 (select min(
7 to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
8 to_char(date_time,'yyyymmddhh24miss') || attrib )
9 from test b
10 where amount = 200
11 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
12 and date_time between a.date_time-1/24/60/60*:x
13 and a.date_time+1/24/60/60*:x) a2,
14 (select min(
15 to_char(abs(round(a.date_time-date_time,5)),'fm00000') ||
16 to_char(date_time,'yyyymmddhh24miss') || attrib )
17 from test c
18 where amount = 300
19 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
20 and date_time between a.date_time-1/24/60/60*:x
21 and a.date_time+1/24/60/60*:x) a3
22 from test A
23 where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
24 and amount = 100
25 order by amount desc, date_time desc
26 )
27 where a2 is not null
28 and a3 is not null
29 and rownum = 1
30 /
TO_CHAR( A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
-------- ---------- -------- ----------- -------- -----------
12:22:35 abc1 12:22:33 abc2 12:22:30 abc3
^^^^^^^^ should be 12:22:31
but since both deltas are "00000" because of the format mask, the first row (in order of insert) is the first row returned.
I suspect it is a simple format mask change to 'fm.00000' and changing a couple substr indexes.
SQL> select to_char(a1_time,'HH24:MI:SS'), a1_attrib,
2 to_char(to_date(substr(a2,7,14),'yyyymmddhh24miss'),'HH24:MI:SS') a2_time, substr(a2
,21) a2_attrib,
3 to_char(to_date(substr(a3,7,14),'yyyymmddhh24miss'),'HH24:MI:SS') a3_time, substr(a3
,21) a3_attrib
4 from (
5 select a.date_time a1_time, a.attrib a1_attrib,
6 (select min(
7 to_char(abs(round(a.date_time-date_time,5)),'fm.00000') ||
8 to_char(date_time,'yyyymmddhh24miss') || attrib )
9 from test b
10 where amount = 200
11 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
12 and date_time between a.date_time-1/24/60/60*:x
13 and a.date_time+1/24/60/60*:x) a2,
14 (select min(
15 to_char(abs(round(a.date_time-date_time,5)),'fm.00000') ||
16 to_char(date_time,'yyyymmddhh24miss') || attrib )
17 from test c
18 where amount = 300
19 and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
20 and date_time between a.date_time-1/24/60/60*:x
21 and a.date_time+1/24/60/60*:x) a3
22 from test A
23 where date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
24 and amount = 100
25 order by amount desc, date_time desc
26 )
27 where a2 is not null
28 and a3 is not null
29 and rownum = 1
30 /
TO_CHAR( A1_ATTRIB A2_TIME A2_ATTRIB A3_TIME A3_ATTRIB
-------- ---------- -------- ----------- -------- -----------
12:22:35 abc1 12:22:33 abc2 12:22:31 abc3
^^^^^^^^ correct
Does that accomplish what you intended, or was the mask serving another purpose?
May 12, 2004 - 7:33 am UTC
doh, lost the v
'fmv00000'
v is implied decimal, that will return:
1* select to_char( round(sysdate-sysdate+1/24/60,5), 'fmv00000') from dual
ops$tkyte@ORA9IR2> /
TO_CHA
------
00069
which is what I wanted.
more simply, leave fm alone, multiply the time deltas by 24*60*60
bob, May 12, 2004 - 7:12 am UTC
I suspect that was what you unintentionally excluded.
another good trick I acquired out of this thread..
Thanks
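A sketch of bob's variant, as a drop-in replacement for the a2 scalar subquery in Tom's query above (the rest of the query is unchanged; converting the day-fraction delta to whole seconds lets the plain integer mask distinguish rows one second apart):

```sql
-- delta in days * 86400 = delta in whole seconds; 'fm00000' can now tell
-- apart deltas that the unscaled day fractions would collapse to "00000"
(select min(
         to_char( round(abs(a.date_time - date_time)*24*60*60), 'fm00000' ) ||
         to_char(date_time,'yyyymmddhh24miss') || attrib )
   from test b
  where amount = 200
    and date_time <= to_date(:t,'MM/DD/YYYY HH24:MI:SS')
    and date_time between a.date_time - 1/24/60/60*:x
                      and a.date_time + 1/24/60/60*:x) a2
```

The substr offsets in the outer query stay as they were, since the key is still five digits wide.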
amazing results.
bob, May 12, 2004 - 10:19 am UTC
with a large set of test data and the very similar real requirement that was using the approach you outlined in this example, LIO was reduced from 2400 to 10. fabulous!!!!! (favorable use of exclamation points. :) )
Clarification
A reader, August 09, 2004 - 6:32 pm UTC
Tom,
The explanation you have given is wonderful. However, when I ran the example you had given (to sum records which are three seconds apart), I get records which are 9 seconds apart, not 3 as posted in the question. For e.g., I get the following results when I ran your example above.
MIN(X) MAX(X) Sum
=============================================
8/1/2004 12:22:03 PM 8/1/2004 12:22:12 PM 552
8/1/2004 12:22:18 PM 8/1/2004 12:22:20 PM 222
8/1/2004 12:22:36 PM 8/1/2004 12:22:47 PM 629
Ideally it should be
8/1/2004 12:22:03 PM 8/1/2004 12:22:06 PM xxx
where xxx can change because of the random function in your example.
However, in the above case the records show a 9 second difference, while your query uses only a 3 second interval as per the question. Why is this different? Please clarify.
2. On a similar note, I want to do the following. I have a table T1 with say 3 columns :
CREATE TABLE T1
(
Col1 VARCHAR2(2),
Col2 Date,
Col3 Number
)
Eg values in the table :
Col1 Col2 Col3
C1 1/1/2001 100
C1 4/5/2001 200
C1 3/5/2001 800
C1 6/10/2001 200
C1 10/10/2001 200
C1 12/15/2001 150
C2 ...
C2 ....
C2 ....
C3 ....
C3 ....
C3 .....
I want to find the sum of Col3 that should break at each change in Col1 and also that for a given value of Col1 we need to find the following :
1. The first value of the column Col2 that is in the same quarter as the value of column Col2 of the current row
2. The sum of column col3 must only add those rows where the date in column Col2 should lie in the same quarter as the previous rows. If they are in different quarter, then the current row should not be included in the summation.
So per the above condition, the output should be as follows :
Prev
Col1 Col2 Col3 Qtr QTD Total
===========================================
C1 1/1/2001 100 Null 100(a)
C1 4/5/2001 200 Null 200(b)
C1 3/5/2001 800 1/1/2001 900(c)
C1 6/10/2001 200 4/5/2001 400(d)
C1 10/10/2001 200 Null 200(e)
C1 12/15/2001 150 10/10/2001 350(f)
(a) - No previous record in same quarter
(b) - No previous record in same quarter
(c) - 800 + 100 (Add Jan and Mar since they are in same qtr
and also PrevQtr is 1/1/2001)
(d) - 200 + 200 (Add Jun and Apr since they are in same qtr
and also PrevQtr is 4/5/2001)
(e) - No previous record in same quarter
(f) - 200 + 150 (Add Dec and Oct since they are in same qtr
and also PrevQtr is 10/10/2001)
Please let me know how I can achieve this in a single query. I tried some examples from your book (the last-employee-hire-date example) using the LAG/LEAD analytic functions, but couldn't find one where the range can be a numeric expression.
3. Suppose my table has a DATE column, is it possible to use a condition such as to sum up the previous rows that are in the same quarter as the date in the current row etc using the LAG/LEAD analytic functions.
Your response on this will be immensely helpful as I have to write a Rate of return calculation algorithm based on some of the logic I described above.
Thanks
August 09, 2004 - 8:57 pm UTC
no, you missed the thrust of this example
we wanted to start scanning through a set of records...
and as long as the preceding record was within three seconds -- then the current record would be in the same group.
so, if you had 1,000 records -- each 1 second apart -- there would be ONE group.
ONE group spanning 1,000 seconds
if you had 1,000 records -- each 4 seconds apart -- there would be ONE THOUSAND groups. ONE THOUSAND groups -- each spanning 4 seconds.
(and all combos in between)
if you just wanted "records between 12:00:00 and 12:00:03-"
"records between 12:00:03 and 12:00:06-"
"records between 12:00:06 and 12:00:09-"
you would "divide and truncate" (like divide and conquer)
like this:
ops$tkyte@ORA9IR2> create table t
2 as
3 select sysdate+1/24/60/60*rownum dt from all_users;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> @bigdate
ops$tkyte@ORA9IR2> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
Session altered.
ops$tkyte@ORA9IR2> select dt, trunc(dt) + trunc(to_char(dt,'sssss')/3)*1/24/60/60 dt2
2 from t;
DT DT2
-------------------- --------------------
09-aug-2004 20:50:16 09-aug-2004 06:56:45
09-aug-2004 20:50:17 09-aug-2004 06:56:45
09-aug-2004 20:50:18 09-aug-2004 06:56:46
09-aug-2004 20:50:19 09-aug-2004 06:56:46
09-aug-2004 20:50:20 09-aug-2004 06:56:46
09-aug-2004 20:50:21 09-aug-2004 06:56:47
09-aug-2004 20:50:22 09-aug-2004 06:56:47
09-aug-2004 20:50:23 09-aug-2004 06:56:47
09-aug-2004 20:50:24 09-aug-2004 06:56:48
09-aug-2004 20:50:25 09-aug-2004 06:56:48
09-aug-2004 20:50:26 09-aug-2004 06:56:48
09-aug-2004 20:50:27 09-aug-2004 06:56:49
09-aug-2004 20:50:28 09-aug-2004 06:56:49
09-aug-2004 20:50:29 09-aug-2004 06:56:49
09-aug-2004 20:50:30 09-aug-2004 06:56:50
09-aug-2004 20:50:31 09-aug-2004 06:56:50
09-aug-2004 20:50:32 09-aug-2004 06:56:50
09-aug-2004 20:50:33 09-aug-2004 06:56:51
09-aug-2004 20:50:34 09-aug-2004 06:56:51
09-aug-2004 20:50:35 09-aug-2004 06:56:51
09-aug-2004 20:50:36 09-aug-2004 06:56:52
09-aug-2004 20:50:37 09-aug-2004 06:56:52
09-aug-2004 20:50:38 09-aug-2004 06:56:52
09-aug-2004 20:50:39 09-aug-2004 06:56:53
09-aug-2004 20:50:40 09-aug-2004 06:56:53
09-aug-2004 20:50:41 09-aug-2004 06:56:53
09-aug-2004 20:50:42 09-aug-2004 06:56:54
09-aug-2004 20:50:43 09-aug-2004 06:56:54
09-aug-2004 20:50:44 09-aug-2004 06:56:54
09-aug-2004 20:50:45 09-aug-2004 06:56:55
09-aug-2004 20:50:46 09-aug-2004 06:56:55
09-aug-2004 20:50:47 09-aug-2004 06:56:55
09-aug-2004 20:50:48 09-aug-2004 06:56:56
09-aug-2004 20:50:49 09-aug-2004 06:56:56
09-aug-2004 20:50:50 09-aug-2004 06:56:56
09-aug-2004 20:50:51 09-aug-2004 06:56:57
09-aug-2004 20:50:52 09-aug-2004 06:56:57
09-aug-2004 20:50:53 09-aug-2004 06:56:57
09-aug-2004 20:50:54 09-aug-2004 06:56:58
09-aug-2004 20:50:55 09-aug-2004 06:56:58
09-aug-2004 20:50:56 09-aug-2004 06:56:58
09-aug-2004 20:50:57 09-aug-2004 06:56:59
09-aug-2004 20:50:58 09-aug-2004 06:56:59
09-aug-2004 20:50:59 09-aug-2004 06:56:59
09-aug-2004 20:51:00 09-aug-2004 06:57:00
09-aug-2004 20:51:01 09-aug-2004 06:57:00
09-aug-2004 20:51:02 09-aug-2004 06:57:00
09-aug-2004 20:51:03 09-aug-2004 06:57:01
09-aug-2004 20:51:04 09-aug-2004 06:57:01
09-aug-2004 20:51:05 09-aug-2004 06:57:01
09-aug-2004 20:51:06 09-aug-2004 06:57:02
09-aug-2004 20:51:07 09-aug-2004 06:57:02
09-aug-2004 20:51:08 09-aug-2004 06:57:02
53 rows selected.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select trunc(dt) + trunc(to_char(dt,'sssss')/3)*1/24/60/60 dt2, count(*)
2 from t
3 group by trunc(dt) + trunc(to_char(dt,'sssss')/3)*1/24/60/60;
DT2 COUNT(*)
-------------------- ----------
09-aug-2004 06:56:45 2
09-aug-2004 06:56:46 3
09-aug-2004 06:56:47 3
09-aug-2004 06:56:48 3
09-aug-2004 06:56:49 3
09-aug-2004 06:56:50 3
09-aug-2004 06:56:51 3
09-aug-2004 06:56:52 3
09-aug-2004 06:56:53 3
09-aug-2004 06:56:54 3
09-aug-2004 06:56:55 3
09-aug-2004 06:56:56 3
09-aug-2004 06:56:57 3
09-aug-2004 06:56:58 3
09-aug-2004 06:56:59 3
09-aug-2004 06:57:00 3
09-aug-2004 06:57:01 3
09-aug-2004 06:57:02 3
18 rows selected.
Still some doubts
A reader, August 09, 2004 - 10:11 pm UTC
Tom,
I have understood the first problem (about grouping). However, I am not clear about the solution provided for my question 2 about summation of dates in the same quarter. Please explain what you were showing in the other example, and also provide a solution for the same-quarter summation.
Thanks
August 10, 2004 - 7:34 am UTC
i didn't even look at the second part.
i demonstrated "3 second grouping" only.
I saw no insert intos to work with.
but you need to derive a "quarter" column
select f(date) as quarter.....
and then you can partition on col1, quarter -- and then you have first_value, lag, lead, sum -- over (partition by col1, quarter....) to do whatever you need.
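A sketch of that shape against the T1 table described above - trunc(col2,'Q') derives the first day of the quarter, and the analytics then partition by (col1, quarter). This is the partition-by-quarter idea only; whether it reproduces the reader's expected output exactly depends on tie-breaking and row order:

```sql
select col1, col2, col3,
       trunc(col2,'Q') qtr,
       -- running quarter-to-date sum within (col1, quarter), ordered by date
       sum(col3) over (partition by col1, trunc(col2,'Q')
                       order by col2) qtd_total,
       -- previous date in the same (col1, quarter); null if none
       lag(col2) over (partition by col1, trunc(col2,'Q')
                       order by col2) prev_qtr_date
  from t1
 order by col1, col2;
```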
What about a 'sliding' window?
Basil, March 31, 2006 - 5:34 pm UTC
Assuming I have db rows inserted at the following times:
12:00:00
12:00:10
12:00:13
12:00:15
12:00:20
12:00:25
12:01:00
12:01:01
12:01:02
12:01:03
12:01:04
12:01:45
12:02:05
12:03:00
I would like to know the maximum number of rows that exist within any 60 second window. Thus, the following shows the number of rows that are in the window beginning at each indicated time:
20060531 12:00:00 6
20060531 12:00:10 10
20060531 12:00:13 9
20060531 12:00:15 8
20060531 12:00:20 7
20060531 12:00:25 6
20060531 12:01:00 6
20060531 12:01:01 5
20060531 12:01:02 4
20060531 12:01:03 3
20060531 12:01:04 2
20060531 12:01:45 2
20060531 12:02:05 1
20060531 12:03:00 1
The window starting at 12:00:10 has the most rows for the 60 second period that would end at 12:01:09.
Here's the test data:
CREATE sequence t_seq;
CREATE TABLE t (t_id number not null primary key,
t_dt date not null);
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:00','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:10','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:13','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:15','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:20','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:00:25','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:00','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:01','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:02','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:03','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:04','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:01:45','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:02:05','YYYYMMDD HH24:MI:SS'));
INSERT INTO t values (t_seq.nextval, to_date('20060531 12:03:00', 'YYYYMMDD HH24:MI:SS'));
Otherwise, I've not yet been able to find a way to do this such that we consider a 60 second window that moves around.
March 31, 2006 - 6:57 pm UTC
ops$tkyte@ORA10GR2> select *
2 from (
3 select t_id, t_dt, cnt, max(cnt) over () max_cnt
4 from (
5 select t_id, t_dt,
6 count(*) over (order by t_dt desc range 1/24/60 preceding ) cnt
7 from t
8 )
9 )
10 where cnt = max_cnt
11 order by t_dt;
T_ID T_DT CNT MAX_CNT
---------- -------------------- ---------- ----------
2 31-may-2006 12:00:10 10 10
Helpful! Now how to figure out the number of concurrent ones?
Basil, April 10, 2006 - 4:25 pm UTC
How to determine, then, the maximum number of things that occur at the same time, and which things are involved? Consider the following situation:
Event 1 13:00 -----------------15:00
Event 2 14:00-------------------------17:00
Event 3 12:00------------14:30
Event 4 12:30-12:40
Event 5 15:30-16:00
The maximum number of concurrent things is 3 (Events 1, 2, and 3,
between 14:00 and 14:30).
ALTER SESSION SET NLS_DATE_FORMAT= 'YYYYMMDD HH24:MI';
CREATE TABLE t
( id number not null primary key,
startdt date,
enddt date );
INSERT INTO t
VALUES(1, '20060501 12:00', '20060501 14:30');
INSERT INTO t
VALUES(2, '20060501 12:30', '20060501 12:40');
INSERT INTO t
VALUES(3, '20060501 13:00', '20060501 15:00');
INSERT INTO t
VALUES(4, '20060501 14:00', '20060501 17:00');
INSERT INTO t
VALUES(5, '20060501 15:30', '20060501 16:00');
COMMIT;
April 11, 2006 - 12:11 pm UTC
and what happens when the max happens more than once and things come and go over time
eg:
... time ....
e1 --------
e2 --------
e3 ----
e4 ----
there are always THREE - but a different set of three. I mean - there is no "one row" answer.
what would you expect the outcome of that query to be?
Further details
Basil, April 11, 2006 - 12:55 pm UTC
What I'm trying to get at is, for the data currently in the database, what is the maximum number of concurrent things in any given minute (or quarter hour, or hour, etc.). If the events are airline flights, how many planes were in the air at the same time, assuming the database records the actual takeoff and landing times?
If there are multiple occasions with the maximum number, that's OK. We just report the number and (perhaps) the various times it did occur. Right now, I'm happy just to know the number.
In my particular example, I'd expect the answer to the question to be 3 (which occurs between 14:00 and 14:30 for events 1,2,and 3 from the diagram, or ids 3, 4, and 1 from the SQL; sorry for the id-event mismatch).
This almost seems like a job for Spatial, but I'd like to avoid that if I can. In this case, the events can be modeled as line segments, and we're assigning weights for the subsegments that are collinear.
This could be a start?
Lars Stampe Villadsen, April 11, 2006 - 1:56 pm UTC
This might bring you a bit further:
select ts.time, count(*) from
(select startdt time from t
union
select enddt+1/86400 from t) ts,
t
where ts.time between t.startdt and enddt
group by ts.time
TIME COUNT(*)
01-05-2006 12:00:00 1
01-05-2006 12:30:00 2
01-05-2006 12:40:01 1
01-05-2006 13:00:00 2
01-05-2006 14:00:00 3
01-05-2006 14:30:01 2
01-05-2006 15:00:01 1
01-05-2006 15:30:00 2
01-05-2006 16:00:01 1
A select max() on the count(*) will give you the maximum number of events and the 'start-time' for the interval where it happens. The next time gives you the end of the interval.
Lars
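Lars's endpoint-counting query can be capped with a max() as he describes - a sketch, wrapping his query as-is:

```sql
-- every event boundary (start, or end + 1 second) is a candidate moment;
-- count the events covering each moment, then take the maximum
select max(cnt) max_concurrent
  from (
        select ts.time, count(*) cnt
          from (select startdt time from t
                union
                select enddt + 1/86400 from t) ts,
               t
         where ts.time between t.startdt and t.enddt
         group by ts.time
       );
```

Keeping the inner ts.time column in the select list instead would also report when the maximum occurs.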
Great, Lars!
Basil, April 11, 2006 - 2:41 pm UTC
This solution seems to work perfectly. I tried another brute force approach that consisted of creating a buckets table (startdt, enddt, count). For each event, I incremented the count for each buckets row that was between the start and end date of the event. The results there agree with yours, although the buckets approach gives a lot more output. (And takes a LOT longer, especially if one is tracking by the minute, due to the number of buckets).
Similar query... but with RUNNING TOTAL GROUPING
Robert, February 16, 2007 - 6:02 pm UTC
Tom,
This question is very similar to the original.
But instead of comparing differences between times,
I want to do a running sum of values and divide into groups based on the sum for groups of consecutive rows.
drop table x;
create table x (id number, value number);
insert into x
select rownum
, trunc(dbms_random.value(1, 10))
from dba_users
where rownum < 30
/
----------------------------
-- Group into groups whose running sum is smallest value >= 20
----------------------------
-- select * from x;
ID VALUE
--------- ---------
1 3 -- 3
2 5 -- 5 + 3 = 8
3 2 -- 8 + 2 = 10
4 1 -- 1 + 10 = 11
5 8 -- 8 + 11 = 19
6 6 -- 6 + 19 = 25 (25 >= 20 --> This row and above go in group 1
7 9 -- 9
8 3 -- 3 + 9 = 12
9 4 -- 4 + 12 = 16
10 7 -- 7 + 16 = 23 (23 >= 20 --> These rows go into group 2
11 2 -- 2
12 5 -- 5 + 2 = 7
13 9 -- 9 + 7 = 16
14 5 -- 5 + 16 = 21 (21 >= 20 --> These rows go into group 3
15 7 -- 7
16 3 -- 3 + 7 = 10
17 6 -- 6 + 10 = 16
18 2 -- 2 + 16 = 18
19 1 -- 1 + 18 = 19
20 4 -- 4 + 19 = 23 (23 >=20 --> These rows go into group 4
21 8 -- ETC, ETC.......
22 9
23 7
24 7
25 9
26 1
27 5
28 3
29 5
I know there must be an 'easy' way to do this with analytics, Tom... but it is beyond me.
(P.S. I am going to run this query against a table we are range re-partitioning to get the partitions into approx equal sizes.)
Can you help?
Thanks,
Robert.
February 18, 2007 - 9:37 am UTC
... I know there must be an 'easy' way to do this with analytics .....
laughing out loud - no, there isn't :)
but I'm glad you said what your goal was - you want to use NTILE() to break the sets of rows up.
You need to determine the number of groups you want and then we can "ntile()" them to break them into equi-sized ranges:
ops$tkyte%ORA10GR2> select nt, min(id), max(id), count(*)
2 from (
3 select id, value,
4 ntile(4) over (order by id) nt
5 from x
6 )
7 group by nt
8 order by nt
9 /
NT MIN(ID) MAX(ID) COUNT(*)
---------- ---------- ---------- ----------
1 1 8 8
2 9 15 7
3 16 22 7
4 23 29 7
One level deeper than that...
Robert, February 19, 2007 - 9:41 am UTC
Tom,
My requirement actually goes one level of complexity deeper than your reply...
The ID column in my little table is representing the 'low value' of an ACCOUNT_ID RANGE... The VALUE column is representing NUMBER OF ROWS (e.g. 5,000,000) within that account_id range.
My goal is to break up the account_id ranges into equal sized partitions based on the number of rows within each account_id range (i.e. range partitioned).
So my original example actually shows the results I am looking for.
Can you help? Thanks, Robert.
drop table x;
create table x (id number, value number);
insert into x
select rownum
, trunc(dbms_random.value(1, 10))
from dba_users
where rownum < 30
/
----------------------------
-- Group into groups whose running sum of VALUE column is smallest value >= 20
----------------------------
-- select * from x;
ID VALUE
(ACCOUNT (#ROWS
ID IN
RANGE ) ACCOUNT_ID RANGE)
--------- ---------
1 3 -- 3
2 5 -- 5 + 3 = 8
3 2 -- 8 + 2 = 10
4 1 -- 1 + 10 = 11
5 8 -- 8 + 11 = 19
6 6 -- 6 + 19 = 25 (25 >= 20 --> This row and above go in group 1
7 9 -- 9
8 3 -- 3 + 9 = 12
9 4 -- 4 + 12 = 16
10 7 -- 7 + 16 = 23 (23 >= 20 --> These rows go into group 2
11 2 -- 2
12 5 -- 5 + 2 = 7
13 9 -- 9 + 7 = 16
14 5 -- 5 + 16 = 21 (21 >= 20 --> These rows go into group 3
15 7 -- 7
16 3 -- 3 + 7 = 10
17 6 -- 6 + 10 = 16
18 2 -- 2 + 16 = 18
19 1 -- 1 + 18 = 19
20 4 -- 4 + 19 = 23 (23 >=20 --> These rows go into group 4
21 8 -- ETC, ETC.......
22 9
23 7
24 7
25 9
26 1
27 5
28 3
29 5
February 19, 2007 - 9:51 am UTC
we can do the "largest less than 25", but the "smallest number greater than 25" isn't something I see being done easily.
largest <= 25 is just "divide"
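A sketch of that divide-and-truncate analog against the x table from the question - this yields the "largest <=" grouping Tom describes, not the "smallest >= 20" grouping Robert asked for, since group boundaries fall wherever the cumulative total crosses a multiple of the target:

```sql
-- running sum of value, integer-divided by the target group size;
-- the small epsilon keeps an exact multiple of 20 in the preceding group
select id, value,
       trunc( (sum(value) over (order by id) - 0.0001) / 20 ) grp
  from x
 order by id;
```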
RE: Group into groups whose running sum of VALUE column is smallest value >= 20
Frank Zhou, February 22, 2007 - 3:49 pm UTC
Hi Robert,
Here is a 10G SQL Model Clause solution for "Group into groups whose running sum of VALUE column is smallest value >= 20"
Frank
SQL> SELECT id, value, sum,
2 CASE WHEN tmp IS NULL
3 THEN 'This row and above belongs to one group' END flag
4 FROM
5 (
6 SELECT id, value, sum ,tmp
7 FROM (SELECT id, value, count(*) OVER ( ) counter,
8 ROW_NUMBER() OVER (ORDER BY id ) rn FROM x)
9 MODEL
10 DIMENSION BY (rn)
11 MEASURES (id, value, counter, CAST(NULL AS NUMBER) sum, value tmp)
12 RULES ITERATE (10000)
13 UNTIL (ITERATION_NUMBER>= counter[1])
14 (sum[ITERATION_NUMBER] = case when tmp[ITERATION_NUMBER-1] is null
15 then tmp[ITERATION_NUMBER]
16 else tmp[ITERATION_NUMBER] + sum[cv()-1]
17 end,
18 tmp[ITERATION_NUMBER] = CASE when sum[ITERATION_NUMBER] <21
19 then 0
20 end
21 )
22 )
23 WHERE ID IS NOT NULL
24 ORDER BY ID;
ID VALUE SUM FLAG
---------- ---------- ---------- ---------------------------------------
1 2 2
2 4 6
3 7 13
4 8 21 This row and above belongs to one group
5 3 3
6 9 12
7 3 15
8 5 20
9 6 26 This row and above belongs to one group
10 8 8
11 3 11
12 4 15
13 7 22 This row and above belongs to one group
14 5 5
15 7 12
16 3 15
17 4 19
18 4 23 This row and above belongs to one group
19 4 4
20 2 6
21 3 9
22 8 17
23 3 20
24 9 29 This row and above belongs to one group
25 2 2
26 8 10
27 1 11
28 8 19
29 6 25 This row and above belongs to one group
29 rows selected.
SQL>
SQL> spool off;
RE: Group into groups whose running sum of VALUE column is smallest value >= 20
Frank Zhou, August 07, 2007 - 2:03 pm UTC
RE: Group into groups whose running sum of VALUE column is smallest value >= 20
Frank Zhou, August 08, 2007 - 10:08 am UTC
Grouping response times into discrete buckets
Allen Shatzer, August 29, 2007 - 11:11 am UTC
I know that the answer to this is right in front of my nose, but for the life of me, I can't put my finger on it.
I would like to write a sql statement that would take response times for an application and count the number that occur in each .25 second interval. That is, count the number of requests that are .25 seconds or less, greater than .25 seconds and .50 seconds or less, etc.
The problem I am having is how to group the rows into .25 second ranges without having to build a driving table with ranges for each possible range of .25 second intervals.
Is that the only way that you would approach this or is there some other "trick" that could be used?
Thanks for all of the help that you provide to the Oracle developer community!
September 04, 2007 - 4:00 pm UTC
ops$tkyte%ORA10GR2> create table t as select object_id/10000 secs from all_objects;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select trunc( (secs-0.000001)/.25 ), min(secs), max(secs), count(*)
2 from t group by
3 trunc((secs-0.000001)/.25) order by 1;
TRUNC((SECS-0.000001)/.25) MIN(SECS) MAX(SECS) COUNT(*)
-------------------------- ---------- ---------- ----------
0 .0002 .25 2410
1 .2501 .5 2400
2 .5001 .75 1839
3 .7501 .9999 2219
4 1.0001 1.25 2457
5 1.2501 1.5 2500
6 1.5001 1.75 2500
7 1.7501 2 2500
8 2.0001 2.25 2500
9 2.2501 2.5 2499
10 2.5001 2.75 2500
11 2.7501 3 2500
12 3.0001 3.25 2500
13 3.2501 3.5 2500
14 3.5001 3.75 2497
15 3.7501 4 2498
16 4.0001 4.25 1586
17 4.2501 4.5 2464
18 4.5001 4.75 2008
19 4.7501 5 2264
20 5.0001 5.192 1028
21 5.3433 5.487 11
22 5.5191 5.6669 6
23 5.7823 5.7964 9
24 6.175 6.249 227
25 6.3285 6.4853 9
26 6.674 6.6746 2
27 6.8163 6.8164 2
28 7.1824 7.25 553
29 7.2501 7.3822 491
50 12.666 12.6844 13
51 12.7838 12.7838 1
52 13.2189 13.2189 1
53 13.4234 13.4235 2
55 13.8862 13.9986 22
56 14.0064 14.2309 146
36 rows selected.
Why 0.0001
babata, December 30, 2007 - 7:59 am UTC
Good morning Sir ,
I just want to know in the request
ops$tkyte@ORA9IR2> select did, min(cons_time), max(cons_time), sum(cons_fee)
2 from (
3 select did,
4 cons_time,
5 cons_fee,
6 trunc(abs( (((cons_time-min(cons_time) over (partition by did))*24*60)-0.0001)/5 ))
grp
7 from t
8 )
9 group by did, grp
10 /
why -0.0001 ??
thanks,
Reader, June 30, 2009 - 5:30 pm UTC
Tom,
select recid, key_date, price, val,
trunc(abs( (((key_date-min(key_date) over (partition by val))*24*60)-0.0001)/5 )) grp
from config_test order by key_date
I need the grp to get aggregated values of price for 5 minute range. can you tell how to get this?
I also need to get the price_prev as shown below
At 9:35:01 (group 1) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:35:00 recid=6;price=90
At 9:40:01 (group 2) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:35:00 recid=6;price=124
Not sure if my explanation is clear. Please let me know if I am missing something or not clear.
RECID KEY_DATE PRICE VAL GRP price_prev
---------- ------------------- ---------- ---------- ---------- ---------
1 05/29/2009 09:30:00 100 A 0
2 05/29/2009 09:30:00 102 A 0
3 05/29/2009 09:30:00 104 A 0
4 05/29/2009 09:30:03 105 A 0
5 05/29/2009 09:30:03 106 A 0
6 05/29/2009 09:35:00 90 A 0
7 05/29/2009 09:35:00 80 A 0
8 05/29/2009 09:35:01 80 A 1 90
9 05/29/2009 09:35:01 80 A 1
10 05/29/2009 09:35:01 80 A 1
11 05/29/2009 09:35:02 120 A 1
12 05/29/2009 09:35:02 121 A 1
13 05/29/2009 09:35:02 123 A 1
14 05/29/2009 09:40:00 124 A 1
15 05/29/2009 09:40:00 125 A 1
16 05/29/2009 09:40:01 150 A 2 124
17 05/29/2009 09:40:01 150 A 2
18 05/29/2009 09:40:01 150 A 2
July 06, 2009 - 6:56 pm UTC
... I need the grp to get aggregated values of price for 5 minute range. can you
tell how to get this?
...
you need to define that infinitely more precisely. As written - it is not meaningful to me.
eg: do you simply want to take the date column and round down to the nearest 5 minutes and aggregate
or
do you want to take the "first row in the result and group together every row within 5 minutes of it and add up the values, then take the second row and do the same...."
or
do you want to take the first record and if the second is within 5 minutes, they are in the same group - else the second is the start of a new group. and so on and so on and so on
I can interpret your 'goal' in so, so, so many different ways.
and if you do not give us an interesting create table and inserts - we will never give you a select statement.
Reader, June 30, 2009 - 5:45 pm UTC
Apologies for the wrong info in the previous question
At 9:40:01 (group 2) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:40:00 recid=14;price=124
Reader, July 01, 2009 - 10:49 am UTC
Tom,
Please ignore the previous explanation.
select recid, key_date, price, val,
trunc(abs( (((key_date-min(key_date) over (partition by val))*24*60)-0.0001)/5 )) grp
from config_test order by key_date
I need the grp to get aggregated values of price for a 5 minute range.
Can you tell me how to get the following?
I need to get the price_prev as shown below
At 9:35:01 (group 1) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:35:00 recid=6;price=90
At 9:40:01 (group 2) I need to get the price of the 1st record at the previous range
key_date=05/29/2009 09:40:00 recid=14;price=124
Not sure if my explanation is clear. Please let me know if I am missing something or not clear.
RECID KEY_DATE PRICE VAL GRP price_prev
---------- ------------------- ---------- ---------- ---------- ---------
1 05/29/2009 09:30:00 100 A 0
2 05/29/2009 09:30:00 102 A 0
3 05/29/2009 09:30:00 104 A 0
4 05/29/2009 09:30:03 105 A 0
5 05/29/2009 09:30:03 106 A 0
6 05/29/2009 09:35:00 90 A 0
7 05/29/2009 09:35:00 80 A 0
8 05/29/2009 09:35:01 80 A 1 90
9 05/29/2009 09:35:01 80 A 1
10 05/29/2009 09:35:01 80 A 1
11 05/29/2009 09:35:02 120 A 1
12 05/29/2009 09:35:02 121 A 1
13 05/29/2009 09:35:02 123 A 1
14 05/29/2009 09:40:00 124 A 1
15 05/29/2009 09:40:00 125 A 1
16 05/29/2009 09:40:01 150 A 2 124
17 05/29/2009 09:40:01 150 A 2
18 05/29/2009 09:40:01 150 A 2
Reader, July 07, 2009 - 12:07 pm UTC
Sorry Tom for not being clear and not sending the create/insert statements
create table tst(recid number, key_dt date, price number, range_1 number, range2 number, val varchar2(10));
insert into tst values(1, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am') ,100,99,101, 'A');
insert into tst values(2, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am') ,102,101,103,'A');
insert into tst values(3, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am') ,104,102,105, 'A');
insert into tst values(4, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am') ,105,104,106, 'A');
insert into tst values(5, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am') ,106,105,107, 'A');
insert into tst values(6, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am') ,90 ,89,91,'A');
insert into tst values(7, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am') ,80 ,79,81,'A');
insert into tst values(8, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am') ,80 ,79,81,'A');
insert into tst values(9, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am') ,80 ,79,81,'A');
insert into tst values(10, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80 ,79,81,'A');
insert into tst values(11, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),120,119,121,'A');
insert into tst values(12, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),121,120,122,'A');
insert into tst values(13, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),123,122,124,'A');
insert into tst values(14, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),124,123,125,'A');
insert into tst values(15, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),125,124,126,'A');
insert into tst values(16, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(17, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(18, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(19, to_date('07/07/2009 09:45:00 am','mm/dd/yyyy hh:mi:ss am'),200,199,201,'A');
insert into tst values(20, to_date('07/07/2009 3:55:00 pm','mm/dd/yyyy hh:mi:ss am'),400,399,401,'A');
insert into tst values(21, to_date('07/07/2009 3:55:01 pm','mm/dd/yyyy hh:mi:ss am'),3,2,4,'A');
insert into tst values(22, to_date('07/07/2009 4:00:00 pm','mm/dd/yyyy hh:mi:ss am'),8,7,9,'A');
commit;
I have to start the grouping at 5 minute intervals as shown below. The first group is labeled 9:35:00.
Records from 9:30:00 to 9:35:00 will come under the 9:35:00 group.
Records from 9:35:01 to 9:40:00 will come under the 9:40:00 group, and so on until 4:00 pm.
Records from 3:55:01 pm to 4:00:00 pm will come under the 4:00:00 pm group.
Based on this grouping I need to do aggregation. For example, the sum of price at each 5 minute interval group based on val. I also need to get the last value of range_1 and range_2 at each interval, ordered by recid.
In the end, one record per group (val, key_dt, sum, last_rng_1).
For example
val KEY_DT sum last_rng_1
-- ---------------------- --- ----------
A 07/07/2009 09:35:00 am 687 79
A 07/07/2009 09:40:00 am 853 124
A 07/07/2009 09:45:00 am 650 199
A 07/07/2009 03:55:00 pm 400 399
A 07/07/2009 04:00:00 pm 11 7
Can you advise?
July 07, 2009 - 7:02 pm UTC
basically, round up to the next 5 minute boundary.
ops$tkyte%ORA10GR2> select dt2, min(dt), max(dt)
2 from (
3 select dt,
4 trunc(dt,'hh') +
5 (( decode( mod(to_number(to_char(dt,'mi')),5)+to_char(dt,'ss'), 0, 0, 1 ) +
6 trunc(to_number(to_char(dt,'mi'))/5) ) * 5) * (1/24/60) dt2
7 from (select trunc(sysdate)+1/24/60/60*level dt
8 from dual connect by level <= 24*60*60+10)
9 )
10 group by dt2
11 order by dt2
12 /
lines 4-6 brute force a date into that 5 minute window, just use that.
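The decode/trunc expression on lines 4-6 can be hard to parse at a glance. As a sanity check only, here is a minimal Python sketch (not part of the original answer) of the same logic: truncate to the hour, then bump up to the next 5 minute boundary unless the time already sits exactly on one.

```python
from datetime import datetime, timedelta

def ceil_5min(dt: datetime) -> datetime:
    """Round dt up to the next 5-minute boundary; a time already
    sitting exactly on a boundary (ss = 0, minutes divisible by 5)
    maps to itself -- mirroring the decode(mod(mi,5)+ss, 0, 0, 1)."""
    base = dt.replace(minute=0, second=0, microsecond=0)  # trunc(dt,'hh')
    bump = 0 if (dt.minute % 5 == 0 and dt.second == 0) else 1
    return base + timedelta(minutes=5 * (dt.minute // 5 + bump))

# 09:30:03 falls into the 09:35:00 bucket; 09:35:00 stays put
print(ceil_5min(datetime(2009, 7, 7, 9, 30, 3)))   # 2009-07-07 09:35:00
print(ceil_5min(datetime(2009, 7, 7, 9, 35, 0)))   # 2009-07-07 09:35:00
```

This matches the bucketing the reader asked for: each record is labeled with the end time of its 5 minute window.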
Reader, July 07, 2009 - 12:31 pm UTC
The insert statements for the above question did not paste cleanly. I am re-sending them.
create table tst(recid number, key_dt date, price number, range_1 number, range2 number, val varchar2(10));
insert into tst values(1, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am'),100,99,101,'A');
insert into tst values(2, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am'),102,101,103,'A');
insert into tst values(3, to_date('07/07/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am'),104,102,105,'A');
insert into tst values(4, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am'),105,104,106,'A');
insert into tst values(5, to_date('07/07/2009 09:30:03 am','mm/dd/yyyy hh:mi:ss am'),106,105,107,'A');
insert into tst values(6, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am'),90,89,91,'A');
insert into tst values(7, to_date('07/07/2009 09:35:00 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(8, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(9, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(10, to_date('07/07/2009 09:35:01 am','mm/dd/yyyy hh:mi:ss am'),80,79,81,'A');
insert into tst values(11, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),120,119,121,'A');
insert into tst values(12, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),121,120,122,'A');
insert into tst values(13, to_date('07/07/2009 09:35:02 am','mm/dd/yyyy hh:mi:ss am'),123,122,124,'A');
insert into tst values(14, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),124,123,125,'A');
insert into tst values(15, to_date('07/07/2009 09:40:00 am','mm/dd/yyyy hh:mi:ss am'),125,124,126,'A');
insert into tst values(16, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(17, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(18, to_date('07/07/2009 09:40:01 am','mm/dd/yyyy hh:mi:ss am'),150,149,151,'A');
insert into tst values(19, to_date('07/07/2009 09:45:00 am','mm/dd/yyyy hh:mi:ss am'),200,199,201,'A');
insert into tst values(20, to_date('07/07/2009 3:55:00 pm','mm/dd/yyyy hh:mi:ss am'),400,399,401,'A');
insert into tst values(21, to_date('07/07/2009 3:55:01 pm','mm/dd/yyyy hh:mi:ss am'),3,2,4,'A');
insert into tst values(22, to_date('07/07/2009 4:00:00 pm','mm/dd/yyyy hh:mi:ss am'),8,7,9,'A');
commit;
Reader, July 08, 2009 - 11:51 am UTC
Thank you for the query.
In regards to the above solution that you suggested, I wrote the query as below to get the data I want.
Can you suggest if this is the correct way? If not, can you please suggest how it should be done?
select *
from(
select dt_all.dt2,TST.VAL,TST.KEY_DT,price,recid,range_1
,sum(tst.PRICE) over (partition by val,dt2) price_sum
,first_value(range_1) over (partition by val,dt2 order by recid desc) last_val_rang_1
,row_number() over (partition by val,dt2 order by recid desc) rn
from tst ,
(select dt2, min(dt) min_dt, max(dt) max_dt
from(select dt,
trunc(dt,'hh') +
(( decode( mod(to_number(to_char(dt,'mi')),5)+to_char(dt,'ss'), 0, 0, 1 ) +
trunc(to_number(to_char(dt,'mi'))/5) ) * 5) * (1/24/60) dt2
from (select trunc(sysdate)+1/24/60/60*level dt
from dual connect by level <= 24*60*60+10))
where dt2 between to_date('07/08/2009 09:30:00 am','mm/dd/yyyy hh:mi:ss am')
and to_date('07/08/2009 04:00:00 pm','mm/dd/yyyy hh:mi:ss am')
group by dt2
order by dt2
) dt_all
where tst.KEY_DT between dt_all.min_dt and dt_all.max_dt
)
where rn = 1
Reader, July 09, 2009 - 11:35 am UTC
Tom,
I was wondering if you could give suggestion for the above question.
July 14, 2009 - 3:06 pm UTC
did you analyze YOUR query and determine that it answers your question?
Your output does not match what I think it should be.
For example
val KEY_DT sum last_rng_1
-- ---------------------- --- ----------
A 07/07/2009 09:35:00 am 687 79
A 07/07/2009 09:40:00 am 853 124
A 07/07/2009 09:45:00 am 650 199
A 07/07/2009 03:55:00 pm 400 399
you posted that. But, when I run:
ops$tkyte%ORA10GR2> select recid, key_dt,
2 trunc(key_dt,'hh') +
3 (( decode( mod(to_number(to_char(key_dt,'mi')),5)+to_char(key_dt,'ss'), 0, 0, 1 ) +
4 trunc(to_number(to_char(key_dt,'mi'))/5) ) * 5) * (1/24/60) key_dt2,
5 price, range_1, range2, val
6 from tst
7 order by key_dt
8 /
RECID KEY_DT KEY_DT2 PRICE RANGE_1 RANGE2 VAL
---------- -------------------- -------------------- ---------- ---------- ---------- ----------
1 07-jul-2009 09:30:00 07-jul-2009 09:30:00 100 99 101 A
2 07-jul-2009 09:30:00 07-jul-2009 09:30:00 102 101 103 A
3 07-jul-2009 09:30:00 07-jul-2009 09:30:00 104 102 105 A
4 07-jul-2009 09:30:03 07-jul-2009 09:35:00 105 104 106 A
5 07-jul-2009 09:30:03 07-jul-2009 09:35:00 106 105 107 A
6 07-jul-2009 09:35:00 07-jul-2009 09:35:00 90 89 91 A
7 07-jul-2009 09:35:00 07-jul-2009 09:35:00 80 79 81 A
8 07-jul-2009 09:35:01 07-jul-2009 09:40:00 80 79 81 A
9 07-jul-2009 09:35:01 07-jul-2009 09:40:00 80 79 81 A
10 07-jul-2009 09:35:01 07-jul-2009 09:40:00 80 79 81 A
11 07-jul-2009 09:35:02 07-jul-2009 09:40:00 120 119 121 A
13 07-jul-2009 09:35:02 07-jul-2009 09:40:00 123 122 124 A
12 07-jul-2009 09:35:02 07-jul-2009 09:40:00 121 120 122 A
15 07-jul-2009 09:40:00 07-jul-2009 09:40:00 125 124 126 A
14 07-jul-2009 09:40:00 07-jul-2009 09:40:00 124 123 125 A
17 07-jul-2009 09:40:01 07-jul-2009 09:45:00 150 149 151 A
16 07-jul-2009 09:40:01 07-jul-2009 09:45:00 150 149 151 A
18 07-jul-2009 09:40:01 07-jul-2009 09:45:00 150 149 151 A
19 07-jul-2009 09:45:00 07-jul-2009 09:45:00 200 199 201 A
20 07-jul-2009 15:55:00 07-jul-2009 15:55:00 400 399 401 A
21 07-jul-2009 15:55:01 07-jul-2009 16:00:00 3 2 4 A
22 07-jul-2009 16:00:00 07-jul-2009 16:00:00 8 7 9 A
22 rows selected.
I don't see how you get your sum - my sum is:
ops$tkyte%ORA10GR2> select max(val) val,
2 key_dt2,
3 sum(range_1) sumr1 ,
4 max(range_1) keep (dense_rank first order by key_dt DESC) lastr1
5 from (
6 select recid, key_dt,
7 trunc(key_dt,'hh') +
8 (( decode( mod(to_number(to_char(key_dt,'mi')),5)+to_char(key_dt,'ss'), 0, 0, 1 ) +
9 trunc(to_number(to_char(key_dt,'mi'))/5) ) * 5) * (1/24/60) key_dt2,
10 price, range_1, range2, val
11 from tst
12 )
13 group by key_dt2
14 order by key_dt2
15 /
VAL KEY_DT2 SUMR1 LASTR1
---------- -------------------- ---------- ----------
A 07-jul-2009 09:30:00 302 102
A 07-jul-2009 09:35:00 377 89
A 07-jul-2009 09:40:00 845 124
A 07-jul-2009 09:45:00 646 199
A 07-jul-2009 15:55:00 399 399
A 07-jul-2009 16:00:00 9 7
6 rows selected.
and why is 79 the "last record", it arrived at the same time as 89. 89 is the last record I think (why do I think that? because I do, they arrived at the same time, they are both last)
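The `keep (dense_rank first order by key_dt DESC)` clause in the query above picks, among the rows tied on the latest key_dt within each group, the value the outer aggregate prefers. A small Python sketch of that tie-breaking (illustration only, not Oracle code; the sample rows mirror the 09:35 window of table tst):

```python
from collections import defaultdict

def last_by_keep(rows):
    """Mimics max(range_1) keep (dense_rank first order by key_dt desc)
    per bucket: among the rows sharing the latest key_dt in each group,
    take the largest range_1 -- which is why 89 beats 79 at 09:35:00."""
    buckets = defaultdict(list)
    for bucket, key_dt, range_1 in rows:
        buckets[bucket].append((key_dt, range_1))
    out = {}
    for bucket, vals in buckets.items():
        latest = max(dt for dt, _ in vals)                 # dense_rank first, DESC
        out[bucket] = max(r for dt, r in vals if dt == latest)  # outer max()
    return out

rows = [  # (bucket, key_dt, range_1) for the 09:35:00 window
    ("09:35", "09:30:03", 104), ("09:35", "09:30:03", 105),
    ("09:35", "09:35:00", 89),  ("09:35", "09:35:00", 79),
]
print(last_by_keep(rows))  # {'09:35': 89}
```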
Rohit, October 06, 2009 - 8:14 am UTC
Hi Tom,
This example is very useful in many places in our systems. I was wondering how you would break the group to be able to report on occurrences every minute within the first instance.
Here is the sample data set and expected output.
drop table t
/
create table t ( x date, y int,id int )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:28:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:29:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:30:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:30:29 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 04:27:35 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 12, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:33:34 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 21, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/06/2008 03:34:15 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 12:59:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 12:59:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:05 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:49 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
INSERT INTO T ( T.X, T.Y, T.ID )
VALUES ( TO_DATE('10/07/2008 01:00:55 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229 )
/
commit
/
I am expecting output like this
id x sum(y) grp
228 06-Oct-08 03.33.34.00 PM 41 0
228 06-Oct-08 03.34.40.00 PM 20 1
228 06-Oct-08 04.27.35.00 PM 22 2
228 06-Oct-08 04.29.30.00 PM 20 3
228 06-Oct-08 04.30.30.00 PM 10 4
229 07-Oct-08 12.59.40.00 PM 40 5
229 07-Oct-08 01.00.49.00 PM 20 6
Thank you
Rohit
October 08, 2009 - 7:09 am UTC
... I was wondering how you would break the group to be able to report on
occurrences every minute within the first instance.
...
explain in LARGE, LONG detail what "within first instance" means?
and I see id, but it doesn't seem we are to partition by ID? explain the output - IN DETAIL - in painstaking detail - take at least 5, if not 10 minutes to do this. give us a specification.
rohit, October 06, 2009 - 11:58 am UTC
Data in the table looks like this:
X Y ID
06-Oct-08 03.33.34.00 PM 21 228<=g1
06-Oct-08 03.34.15.00 PM 10 228<=g1
06-Oct-08 03.34.30.00 PM 10 228<=g1
06-Oct-08 03.34.40.00 PM 10 228<=g2
06-Oct-08 03.34.50.00 PM 10 228<=g3
06-Oct-08 04.27.35.00 PM 12 228<=g4
06-Oct-08 04.28.30.00 PM 10 228<=g5
06-Oct-08 04.29.30.00 PM 10 228<=g6
06-Oct-08 04.30.29.00 PM 10 228<=g6
06-Oct-08 04.30.30.00 PM 10 228<=g7
07-Oct-08 12.59.40.00 PM 10 229<=g1
07-Oct-08 12.59.50.00 PM 10 229<=g2
07-Oct-08 01.00.05.00 PM 10 229<=g2
07-Oct-08 01.00.30.00 PM 10 229<=g2
07-Oct-08 01.00.49.00 PM 10 229<=g3
07-Oct-08 01.00.55.00 PM 10 229<=g4
October 08, 2009 - 7:13 am UTC
that doesn't mean a thing to me.
rohit, October 11, 2009 - 7:49 am UTC
Sorry, I was not clear enough. We are getting duplicate data from the source. The requirement is to process the data and keep only the first record within the first 60 seconds.
For example these source records
X Y ID
06-Oct-08 03.33.34.00 PM 21 228
06-Oct-08 03.34.15.00 PM 10 228
06-Oct-08 03.34.30.00 PM 10 228
Output should be ignoring all following records within 60 seconds of 06-Oct-08 03.33.34.00 PM
X Y ID
06-Oct-08 03.33.34.00 PM 21 228
Same way for below records
06-Oct-08 03.34.40.00 PM 10 228
06-Oct-08 03.34.50.00 PM 10 228
I should get
06-Oct-08 03.34.40.00 PM 10 228
The problem I am facing is determining the next group of records. I could do it with a PL/SQL procedure, but I have to process at least 5M records every day and I believe SQL would be much faster. I hope I have made myself clearer.
Thanks
rohit
Try this
Steve G, October 12, 2009 - 1:04 pm UTC
with e as (
select
x
, y
, id
, row_number() over (partition by id order by x) as rn
from t
)
select
to_char(x, 'MM/DD/YYYY HH12:MI:SS AM')
, y
, id
from e
where rn = 1
@Rohit re: group records within a minute
Stew Ashton, October 17, 2009 - 7:50 am UTC
Rohit,
I take you to mean:
- partition by id, order by x
- the first "group" starts with the first record and includes all records less than a minute later than the first record
- the next "group" starts with the next record after the previous group and includes all records less than a minute later than that next record
- and so on.
My approach is:
- assign row numbers
- assume each record is the first record of a "group" and get the row number of the last record in that "group". This gives me the good "groups", plus a lot of overlapping groups.
- use START WITH...CONNECT BY to select out the good "groups".
select id, x, last_x, rn, last_rn, y, sum_y from (
select id, x, y, rn
, sum(y) over(partition by id order by x
range between current row and numtodsinterval(59, 'second') following) sum_y
, last_value(x) over(partition by id order by x
range between current row and numtodsinterval(59, 'second') following) last_x
, last_value(rn) over(partition by id order by x
range between current row and numtodsinterval(59, 'second') following) last_rn
from (
select id, x, y, row_number() over(partition by id order by x) rn from t
)
)
start with rn = 1
connect by (id, rn) = ((prior id, prior last_rn+1))
/
ID X LAST_X RN LAST_RN Y SUM_Y
---- -------------------- -------------------- --- -------- --- -----
228 06-10-2008 15:33:34 06-10-2008 15:34:30 1 3 21 41
228 06-10-2008 15:34:40 06-10-2008 15:34:50 4 5 10 20
228 06-10-2008 16:27:35 06-10-2008 16:28:30 6 7 12 22
228 06-10-2008 16:29:30 06-10-2008 16:30:29 8 9 10 20
228 06-10-2008 16:30:30 06-10-2008 16:30:30 10 10 10 10
229 07-10-2008 12:59:40 07-10-2008 13:00:30 1 4 10 40
229 07-10-2008 13:00:49 07-10-2008 13:00:55 5 6 10 20
7 rows selected
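Stew's START WITH ... CONNECT BY walk is a greedy scan: each group is anchored at the first unconsumed row and absorbs every later row of the same id within 59 seconds of that anchor. The same logic, sketched in Python purely for illustration (the sample rows mirror the first few rows of table t):

```python
from datetime import datetime, timedelta

def greedy_groups(rows):
    """Partition rows (pre-sorted by (id, x)) the way the CONNECT BY
    query does: a new group starts whenever the id changes or the row
    is more than 59 seconds after the current group's anchor row."""
    groups, anchor = [], None
    for rec_id, x, y in rows:
        if anchor is None or rec_id != anchor[0] or x - anchor[1] > timedelta(seconds=59):
            anchor = (rec_id, x)
            groups.append([rec_id, x, 0])   # [id, anchor time, running sum(y)]
        groups[-1][2] += y
    return [tuple(g) for g in groups]

rows = [
    (228, datetime(2008, 10, 6, 15, 33, 34), 21),
    (228, datetime(2008, 10, 6, 15, 34, 15), 10),
    (228, datetime(2008, 10, 6, 15, 34, 30), 10),
    (228, datetime(2008, 10, 6, 15, 34, 40), 10),
    (228, datetime(2008, 10, 6, 15, 34, 50), 10),
]
print(greedy_groups(rows))
# the 15:33:34 anchor sums to 41; 15:34:40 starts the next group (sum 20)
```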
Rohit, October 21, 2009 - 11:06 am UTC
Hi Stew,
Thanks for your response. I don't know what I did wrong, but I am getting different results: only one record on 10/06/08 for id 228, as opposed to 4 more records. Could you check the results?
Thanks,
Rohit
ID X LAST_X RN LAST_RN Y SUM_Y
228 10/6/2008 3:33:34.000 PM 10/6/2008 3:34:30.000 PM 1 3 21 41
229 10/7/2008 12:59:40.000 PM 10/7/2008 1:00:30.000 PM 1 4 10 40
229 10/7/2008 1:00:49.000 PM 10/7/2008 1:00:55.000 PM 5 6 10 20
@rohit re: problem with my approach
Stew Ashton, October 21, 2009 - 3:12 pm UTC
Hi again rohit,
The last line of my code is
connect by (id, rn) = ((prior id, prior last_rn+1))
not
connect by (id, rn) = ((prior id+1, prior last_rn+2))
If that is not the problem, try running what follows one select at a time to see where your results differ from mine. Please don't type this in yourself; use copy and paste.
select count(*) from t
/
COUNT(*)
----------------------
16
1 rows selected
/
select id, x, y, row_number() over(partition by id order by x) rn from t
order by id, x
/
ID X Y RN
--- ------------------------- ---------------------- ------
228 06-10-2008 15:33:34 21 1
228 06-10-2008 15:34:15 10 2
228 06-10-2008 15:34:30 10 3
228 06-10-2008 15:34:40 10 4
228 06-10-2008 15:34:50 10 5
228 06-10-2008 16:27:35 12 6
228 06-10-2008 16:28:30 10 7
228 06-10-2008 16:29:30 10 8
228 06-10-2008 16:30:29 10 9
228 06-10-2008 16:30:30 10 10
229 07-10-2008 12:59:40 10 1
229 07-10-2008 12:59:50 10 2
229 07-10-2008 13:00:05 10 3
229 07-10-2008 13:00:30 10 4
229 07-10-2008 13:00:49 10 5
229 07-10-2008 13:00:55 10 6
16 rows selected
/
select id, x, y, rn
, last_value(rn) over(partition by id order by x
range between current row and numtodsinterval(59, 'second') following) last_rn
from (
select id, x, y, row_number() over(partition by id order by x) rn from t
)
order by id, x
/
ID X Y RN LAST_RN
---- ------------------------- ---------------------- ---------------------- --------
228 06-10-2008 15:33:34 21 1 3
228 06-10-2008 15:34:15 10 2 5
228 06-10-2008 15:34:30 10 3 5
228 06-10-2008 15:34:40 10 4 5
228 06-10-2008 15:34:50 10 5 5
228 06-10-2008 16:27:35 12 6 7
228 06-10-2008 16:28:30 10 7 7
228 06-10-2008 16:29:30 10 8 9
228 06-10-2008 16:30:29 10 9 10
228 06-10-2008 16:30:30 10 10 10
229 07-10-2008 12:59:40 10 1 4
229 07-10-2008 12:59:50 10 2 5
229 07-10-2008 13:00:05 10 3 6
229 07-10-2008 13:00:30 10 4 6
229 07-10-2008 13:00:49 10 5 6
229 07-10-2008 13:00:55 10 6 6
16 rows selected
/
select id, x, rn, last_rn, y from (
select id, x, y, rn
, last_value(rn) over(partition by id order by x
range between current row and numtodsinterval(59, 'second') following) last_rn
from (
select id, x, y, row_number() over(partition by id order by x) rn from t
)
)
start with rn = 1
connect by (id, rn) = ((prior id, prior last_rn+1))
/
ID X RN LAST_RN Y
--- ------------------------- ---------------------- ---------------------- ---
228 06-10-2008 15:33:34 1 3 21
228 06-10-2008 15:34:40 4 5 10
228 06-10-2008 16:27:35 6 7 12
228 06-10-2008 16:29:30 8 9 10
228 06-10-2008 16:30:30 10 10 10
229 07-10-2008 12:59:40 1 4 10
229 07-10-2008 13:00:49 5 6 10
7 rows selected
Rohit, October 23, 2009 - 10:49 am UTC
Hi Stew,
Thanks for your response again. I tried again but it still did not work; I got the same output. Then I tried on Oracle version 9.2.0.8 and it worked. It's hard to believe, but it seems Oracle 11.1.0.7 has a bug with START WITH ... CONNECT BY.
I am checking with Oracle Support to see if they can suggest a solution for this bug.
Thanks so much and really appreciate your help.
Regards,
Rohit
Group by interval using alternate approach
Narendra Prabhudesai, July 30, 2010 - 6:04 am UTC
Hello Tom,
I am trying to get first record within an interval of 1 minute for an ID from following resultset
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
06-OCT-2008 15:34:15 10 228
06-OCT-2008 15:34:30 10 228
06-OCT-2008 15:34:40 10 228
06-OCT-2008 15:34:50 10 228
06-OCT-2008 16:27:35 12 228
06-OCT-2008 16:28:30 10 228
06-OCT-2008 16:29:30 10 228
06-OCT-2008 16:30:29 10 228
06-OCT-2008 16:30:30 10 228
07-OCT-2008 12:59:40 10 229
07-OCT-2008 12:59:50 10 229
07-OCT-2008 13:00:05 10 229
07-OCT-2008 13:00:30 10 229
07-OCT-2008 13:00:49 10 229
07-OCT-2008 13:00:55 10 229
For e.g. the following set of records will be considered as a group
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
06-OCT-2008 15:34:15 10 228
06-OCT-2008 15:34:30 10 228
and should generate the output as
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
The next set of records to be considered as a group will be
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:34:40 10 228
06-OCT-2008 15:34:50 10 228
and should generate the output as
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:34:40 10 228
and so on.
I am able to get the desired results using the approach suggested by Stew Ashton above
SQL> select id, x, rn, last_rn, y from
2 (select id, x, y, rn,
3 last_value(rn) over (partition by id order by x range between current row and numtodsinterval(59, 'second') following) last_rn from
4 (select id, x, y, row_number() over (partition by id order by x) rn from t1))
5 start with rn = 1
6 connect by (id, rn) = ((prior id, prior last_rn+1))
7 /
ID X RN LAST_RN Y
---------- -------------------- ---------- ---------- ----------
228 06-OCT-2008 15:33:34 1 3 21
228 06-OCT-2008 15:34:40 4 5 10
228 06-OCT-2008 16:27:35 6 7 12
228 06-OCT-2008 16:29:30 8 9 10
228 06-OCT-2008 16:30:30 10 10 10
229 07-OCT-2008 12:59:40 1 4 10
229 07-OCT-2008 13:00:49 5 6 10
7 rows selected.
However, I was wondering if it would be possible to use the approach that you suggested at
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:13946369553642#14611583328217
I got almost identical results, except for one difference. While Stew's query gave
X
--------------------
06-OCT-2008 16:29:30
06-OCT-2008 16:30:30
the other approach gave
X
--------------------
06-OCT-2008 16:29:30
06-OCT-2008 16:30:29
Can you help?
Following are the complete test case details:
SQL> select * from v$version ;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Solaris: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> create table t1 (x date, y int, id int) ;
Table created.
SQL> insert into t1 values (to_date('10/06/2008 03:34:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 03:34:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 03:34:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 04:28:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 04:29:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 04:30:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 04:30:29 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 04:27:35 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> update t1 set y = 12 where x = to_date('10/06/2008 04:27:35 PM', 'MM/DD/YYYY HH12:MI:SS AM') and y = 10 ;
1 row updated.
SQL> insert into t1 values (to_date('10/06/2008 03:33:34 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 21, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/06/2008 03:34:15 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 228)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/07/2008 12:59:40 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/07/2008 12:59:50 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/07/2008 01:00:05 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/07/2008 01:00:30 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/07/2008 01:00:49 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
2 /
1 row created.
SQL> insert into t1 values (to_date('10/07/2008 01:00:55 PM', 'MM/DD/YYYY HH12:MI:SS AM'), 10, 229)
2 /
1 row created.
SQL> commit ;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(user, 'T1') ;
PL/SQL procedure successfully completed.
SQL> select count(*) from t1 ;
COUNT(*)
----------
16
SQL> select id, x, y, row_number() over (partition by id order by x) rn from t1 order by id, x
2 /
ID X Y RN
---------- -------------------- ---------- ----------
228 06-OCT-2008 15:33:34 21 1
228 06-OCT-2008 15:34:15 10 2
228 06-OCT-2008 15:34:30 10 3
228 06-OCT-2008 15:34:40 10 4
228 06-OCT-2008 15:34:50 10 5
228 06-OCT-2008 16:27:35 12 6
228 06-OCT-2008 16:28:30 10 7
228 06-OCT-2008 16:29:30 10 8
228 06-OCT-2008 16:30:29 10 9
228 06-OCT-2008 16:30:30 10 10
229 07-OCT-2008 12:59:40 10 1
229 07-OCT-2008 12:59:50 10 2
229 07-OCT-2008 13:00:05 10 3
229 07-OCT-2008 13:00:30 10 4
229 07-OCT-2008 13:00:49 10 5
229 07-OCT-2008 13:00:55 10 6
16 rows selected.
SQL> select id, x, y, rn, last_value(rn) over (partition by id order by x range between current row and numtodsinterval(59, 'second') following) last_rn from (select id, x, y, row_number() over (partition by id order by x) rn from t1) order by id, x
2 /
ID X Y RN LAST_RN
---------- -------------------- ---------- ---------- ----------
228 06-OCT-2008 15:33:34 21 1 3
228 06-OCT-2008 15:34:15 10 2 5
228 06-OCT-2008 15:34:30 10 3 5
228 06-OCT-2008 15:34:40 10 4 5
228 06-OCT-2008 15:34:50 10 5 5
228 06-OCT-2008 16:27:35 12 6 7
228 06-OCT-2008 16:28:30 10 7 7
228 06-OCT-2008 16:29:30 10 8 9
228 06-OCT-2008 16:30:29 10 9 10
228 06-OCT-2008 16:30:30 10 10 10
229 07-OCT-2008 12:59:40 10 1 4
229 07-OCT-2008 12:59:50 10 2 5
229 07-OCT-2008 13:00:05 10 3 6
229 07-OCT-2008 13:00:30 10 4 6
229 07-OCT-2008 13:00:49 10 5 6
229 07-OCT-2008 13:00:55 10 6 6
16 rows selected.
SQL> select id, x, rn, last_rn, y from
2 (select id, x, y, rn,
3 last_value(rn) over (partition by id order by x range between current row and numtodsinterval(59, 'second') following) last_rn from
4 (select id, x, y, row_number() over (partition by id order by x) rn from t1))
5 start with rn = 1
6 connect by (id, rn) = ((prior id, prior last_rn+1))
7 /
ID X RN LAST_RN Y
---------- -------------------- ---------- ---------- ----------
228 06-OCT-2008 15:33:34 1 3 21
228 06-OCT-2008 15:34:40 4 5 10
228 06-OCT-2008 16:27:35 6 7 12
228 06-OCT-2008 16:29:30 8 9 10
228 06-OCT-2008 16:30:30 10 10 10
229 07-OCT-2008 12:59:40 1 4 10
229 07-OCT-2008 13:00:49 5 6 10
7 rows selected.
SQL> select /*+ gather_plan_statistics */ id, x, rn, last_rn, y from
2 (select id, x, y, rn,
3 last_value(rn) over (partition by id order by x range between current row and numtodsinterval
(59, 'second') following) last_rn from
4 (select id, x, y, row_number() over (partition by id order by x) rn from t1))
5 start with rn = 1
6 connect by (id, rn) = ((prior id, prior last_rn+1))
7 /
ID X RN LAST_RN Y
---------- -------------------- ---------- ---------- ----------
228 06-OCT-2008 15:33:34 1 3 21
228 06-OCT-2008 15:34:40 4 5 10
228 06-OCT-2008 16:27:35 6 7 12
228 06-OCT-2008 16:29:30 8 9 10
228 06-OCT-2008 16:30:30 10 10 10
229 07-OCT-2008 12:59:40 1 4 10
229 07-OCT-2008 13:00:49 5 6 10
7 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST')) ;
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID 6mu4cggkf86g4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ id, x, rn, last_rn, y from (select id, x, y, rn, last_value(r
(partition by id order by x range between current row and numtodsinterval(59, 'second') following) l
from (select id, x, y, row_number() over (partition by id order by x) rn from t1)) start with rn = 1
connect by (id, rn) = ((prior id, prior last_rn+1))
Plan hash value: 936035874
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Me
----------------------------------------------------------------------------------------------------
|* 1 | CONNECT BY WITH FILTERING| | 1 | | 7 |00:00:00.01 | 42 | 2048 | 2048 | 2048 (0)|
|* 2 | VIEW | | 1 | 16 | 2 |00:00:00.01 | 7 | | | |
| 3 | WINDOW SORT | | 1 | 16 | 16 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 4 | VIEW | | 1 | 16 | 16 |00:00:00.01 | 7 | | | |
| 5 | WINDOW SORT | | 1 | 16 | 16 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 6 | TABLE ACCESS FULL | T1 | 1 | 16 | 16 |00:00:00.01 | 7 | | | |
|* 7 | FILTER | | 5 | | 5 |00:00:00.01 | 35 | | | |
|* 8 | HASH JOIN | | 5 | | 62 |00:00:00.01 | 35 | 1269K| 1269K| 335K (0)|
| 9 | CONNECT BY PUMP | | 5 | | 7 |00:00:00.01 | 0 | | | |
| 10 | VIEW | | 5 | 16 | 80 |00:00:00.01 | 35 | | | |
| 11 | WINDOW SORT | | 5 | 16 | 80 |00:00:00.01 | 35 | 2048 | 2048 | 2048 (0)|
| 12 | VIEW | | 5 | 16 | 80 |00:00:00.01 | 35 | | | |
| 13 | WINDOW SORT | | 5 | 16 | 80 |00:00:00.01 | 35 | 2048 | 2048 | 2048 (0)|
| 14 | TABLE ACCESS FULL | T1 | 5 | 16 | 80 |00:00:00.01 | 35 | | | |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=PRIOR NULL)
2 - filter("RN"=1)
7 - filter("RN"=PRIOR NULL+1)
8 - access("ID"=PRIOR NULL)
36 rows selected.
SQL> select x,y,id,trunc(abs(((x-min(x) over (partition by id))*24*60) - 0.0001)) grp from t1 order
by id,x;
X Y ID GRP
-------------------- ---------- ---------- ----------
06-OCT-2008 15:33:34 21 228 0
06-OCT-2008 15:34:15 10 228 0
06-OCT-2008 15:34:30 10 228 0
06-OCT-2008 15:34:40 10 228 1
06-OCT-2008 15:34:50 10 228 1
06-OCT-2008 16:27:35 12 228 54
06-OCT-2008 16:28:30 10 228 54
06-OCT-2008 16:29:30 10 228 55
06-OCT-2008 16:30:29 10 228 56
06-OCT-2008 16:30:30 10 228 56
07-OCT-2008 12:59:40 10 229 0
07-OCT-2008 12:59:50 10 229 0
07-OCT-2008 13:00:05 10 229 0
07-OCT-2008 13:00:30 10 229 0
07-OCT-2008 13:00:49 10 229 1
07-OCT-2008 13:00:55 10 229 1
16 rows selected.
SQL> select /*+ gather_plan_statistics */ min(x), min(y) keep (dense_rank first order by x) my from
(select x,y,id, trunc(abs((((x-min(x) over (partition by id))*24*60) - 0.0001))) grp from t1) group
by id,grp ;
MIN(X) MY
-------------------- ----------
06-OCT-2008 15:33:34 21
06-OCT-2008 15:34:40 10
06-OCT-2008 16:27:35 12
06-OCT-2008 16:29:30 10
06-OCT-2008 16:30:29 10
07-OCT-2008 12:59:40 10
07-OCT-2008 13:00:49 10
7 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST')) ;
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID 0htwzwfguzh0r, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ min(x), min(y) keep (dense_rank first order by x) my from (sele
x,y,id, trunc(abs((((x-min(x) over (partition by id))*24*60) - 0.0001))) grp from t1) group by id,gr
Plan hash value: 3529221598
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Us
----------------------------------------------------------------------------------------------------
| 1 | SORT GROUP BY | | 1 | 2 | 7 |00:00:00.01 | 7 | 2048 | 2048 | 204
| 2 | VIEW | | 1 | 16 | 16 |00:00:00.01 | 7 | | | |
| 3 | WINDOW SORT | | 1 | 16 | 16 |00:00:00.01 | 7 | 2048 | 2048 | 204
| 4 | TABLE ACCESS FULL| T1 | 1 | 16 | 16 |00:00:00.01 | 7 | | |
----------------------------------------------------------------------------------------------------
16 rows selected.
August 02, 2010 - 8:34 am UTC
I am trying to get the first record within an interval of 1 minute for an ID from the following resultset
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
06-OCT-2008 15:34:15 10 228
06-OCT-2008 15:34:30 10 228
06-OCT-2008 15:34:40 10 228
06-OCT-2008 15:34:50 10 228
06-OCT-2008 16:27:35 12 228
06-OCT-2008 16:28:30 10 228
06-OCT-2008 16:29:30 10 228
06-OCT-2008 16:30:29 10 228
06-OCT-2008 16:30:30 10 228
07-OCT-2008 12:59:40 10 229
07-OCT-2008 12:59:50 10 229
07-OCT-2008 13:00:05 10 229
07-OCT-2008 13:00:30 10 229
07-OCT-2008 13:00:49 10 229
07-OCT-2008 13:00:55 10 229
For example, the following set of records will be considered as a group
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
06-OCT-2008 15:34:15 10 228
06-OCT-2008 15:34:30 10 228
and should generate the output as
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
I stopped reading shortly after this - because - I never saw any logic or reasoning here.
why are those three records "the group"
why is that one record "the answer"
Re:Group by interval using alternate approach
Narendra, August 02, 2010 - 9:03 am UTC
Tom,
Allow me to explain again.
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
06-OCT-2008 15:34:15 10 228
06-OCT-2008 15:34:30 10 228
06-OCT-2008 15:34:40 10 228
06-OCT-2008 15:34:50 10 228
06-OCT-2008 16:27:35 12 228
06-OCT-2008 16:28:30 10 228
06-OCT-2008 16:29:30 10 228
06-OCT-2008 16:30:29 10 228
06-OCT-2008 16:30:30 10 228
07-OCT-2008 12:59:40 10 229
07-OCT-2008 12:59:50 10 229
07-OCT-2008 13:00:05 10 229
07-OCT-2008 13:00:30 10 229
07-OCT-2008 13:00:49 10 229
07-OCT-2008 13:00:55 10 229
In the above resultset, the following records belong to the same group because
06-OCT-2008 15:33:34 21 228
06-OCT-2008 15:34:15 10 228
06-OCT-2008 15:34:30 10 228
the dates "06-OCT-2008 15:34:15" and "06-OCT-2008 15:34:30" fall within a one-minute interval from "06-OCT-2008 15:33:34".
The next date in the resultset, "06-OCT-2008 15:34:40", is outside the one-minute interval from "06-OCT-2008 15:33:34" and so will start a new group and act as a baseline for the next group member(s), which will be "06-OCT-2008 15:34:50" since it falls within one minute from "06-OCT-2008 15:34:40", and so on.
Hope this helps.
August 02, 2010 - 10:00 am UTC
and the one row you get as output, what was that.
and should generate the output as
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
Re:Group by interval using alternate approach
Narendra, August 02, 2010 - 10:08 am UTC
and the one row you get as output, what was that.
and should generate the output as
X Y ID
-------------------- ---------- ----------
06-OCT-2008 15:33:34 21 228
From each group, the row(s) where x = min(x) within the "group", should be included in the output.
August 02, 2010 - 10:44 am UTC
However, I was wondering if it would be possible to use the approach that you suggested at
Now that I see what you are after - the answer would be "no, I don't think that other approach would work for you"
the other approach works with discrete ranges that have fixed start/stop times (the division of a date works that way, I'm just rounding a date value down to some start point to figure out what group it belongs in).
You want to dynamically pick the start range for the times from group to group.
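The distinction Tom draws here -- fixed buckets computed from a common origin versus a start range that moves from group to group -- is easy to see procedurally. Below is a minimal Python sketch (not part of the thread; function names are illustrative), run against the id=228 timestamps from the example data. The fixed buckets reproduce the GRP column of the trunc() query above, and the dynamic anchors reproduce the groups the CONNECT BY query found.

```python
from datetime import datetime, timedelta

def fixed_buckets(times, origin, width):
    """Fixed ranges: every time maps to a bucket computed from one common
    origin (what trunc((x - min_x)*24*60) does in the SQL above).
    Note: the thread's SQL subtracts 0.0001 so rows landing exactly on a
    boundary fall into the previous bucket; plain int() does not."""
    return [int((t - origin) / width) for t in times]

def dynamic_anchors(times, width):
    """Dynamic ranges: a new group starts when a time is `width` or more
    past the current anchor, and that time becomes the new anchor.
    `times` must be sorted ascending."""
    groups, grp, anchor = [], 0, None
    for t in times:
        if anchor is None or t - anchor >= width:
            grp, anchor = grp + 1, t
        groups.append(grp)
    return groups

# id=228 timestamps from the example data, as offsets in seconds
base = datetime(2008, 10, 6, 15, 33, 34)
offsets = [0, 41, 56, 66, 76, 3241, 3296, 3356, 3415, 3416]
times = [base + timedelta(seconds=o) for o in offsets]

print(fixed_buckets(times, times[0], timedelta(minutes=1)))
# matches the GRP column: [0, 0, 0, 1, 1, 54, 54, 55, 56, 56]
print(dynamic_anchors(times, timedelta(seconds=60)))
# matches the CONNECT BY groups: [1, 1, 1, 2, 2, 3, 3, 4, 4, 5]
```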
Yar, June 01, 2011 - 3:12 am UTC
CREATE TABLE GRTAB
(
ID NUMBER,
INS_DATE DATE
)
/
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-25 23:13:32','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 02:14:19','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 04:15:30','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 05:14:31','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 07:15:19','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 10:15:50','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 13:44:46','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 15:14:54','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 16:15:01','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 17:14:38','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(1,TO_DATE('2011-05-26 19:15:36','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(2,TO_DATE('2011-05-30 11:30:17','YYYY-MM-DD HH24:MI:SS'));
INSERT INTO GRTAB VALUES(2,TO_DATE('2011-05-30 14:30:22','YYYY-MM-DD HH24:MI:SS'));
I'd like to group INS_DATE every 6 hours from the first INS_DATE of every ID, and for each group set BASE_DATE to the first ins_date of that group, like:
ID INS_DATE BASE_DATE
-- ------------------- -------------------
1 2011-05-25 23:13:32 2011-05-25 23:13:32
1 2011-05-26 02:14:19 2011-05-25 23:13:32
1 2011-05-26 04:15:30 2011-05-25 23:13:32
1 2011-05-26 05:14:31 2011-05-26 05:14:31
1 2011-05-26 07:15:19 2011-05-26 05:14:31
1 2011-05-26 10:15:50 2011-05-26 05:14:31
1 2011-05-26 13:44:46 2011-05-26 13:44:46
1 2011-05-26 15:14:54 2011-05-26 13:44:46
1 2011-05-26 16:15:01 2011-05-26 13:44:46
1 2011-05-26 17:14:38 2011-05-26 13:44:46
1 2011-05-26 19:15:36 2011-05-26 13:44:46
2 2011-05-30 11:30:17 2011-05-30 11:30:17
2 2011-05-30 14:30:22 2011-05-30 11:30:17
I've been struggling with this the whole time; any help is appreciated
June 01, 2011 - 2:28 pm UTC
ops$tkyte%ORA11GR2> select id, ins_date, fv, hours, grp, first_value( ins_date ) over (partition by id, grp order by ins_date) new_fv
2 from (
3 select id, ins_date, fv, hours, trunc(hours/6) grp
4 from (
5 select id, ins_date, fv, trunc((ins_date-fv)*24) hours
6 from (
7 select id, ins_date, first_value(ins_date) over (partition by id order by ins_date) fv
8 from grtab
9 )
10 )
11 )
12 /
ID INS_DATE FV HOURS GRP NEW_FV
---------- -------------------- -------------------- ---------- ---------- --------------------
1 25-may-2011 23:13:32 25-may-2011 23:13:32 0 0 25-may-2011 23:13:32
1 26-may-2011 02:14:19 25-may-2011 23:13:32 3 0 25-may-2011 23:13:32
1 26-may-2011 04:15:30 25-may-2011 23:13:32 5 0 25-may-2011 23:13:32
1 26-may-2011 05:14:31 25-may-2011 23:13:32 6 1 26-may-2011 05:14:31
1 26-may-2011 07:15:19 25-may-2011 23:13:32 8 1 26-may-2011 05:14:31
1 26-may-2011 10:15:50 25-may-2011 23:13:32 11 1 26-may-2011 05:14:31
1 26-may-2011 13:44:46 25-may-2011 23:13:32 14 2 26-may-2011 13:44:46
1 26-may-2011 15:14:54 25-may-2011 23:13:32 16 2 26-may-2011 13:44:46
1 26-may-2011 16:15:01 25-may-2011 23:13:32 17 2 26-may-2011 13:44:46
1 26-may-2011 17:14:38 25-may-2011 23:13:32 18 3 26-may-2011 17:14:38
1 26-may-2011 19:15:36 25-may-2011 23:13:32 20 3 26-may-2011 17:14:38
2 30-may-2011 11:30:17 30-may-2011 11:30:17 0 0 30-may-2011 11:30:17
2 30-may-2011 14:30:22 30-may-2011 11:30:17 3 0 30-may-2011 11:30:17
13 rows selected.
I don't agree with your sample output - you have things that are 18 and 20 hours off in the same group as something 17 hours off
A reader, June 01, 2011 - 8:20 pm UTC
Hi Tom,
Thank you very much for the feedback.
Allow me to rephrase:
We need to get the reference date starting from the first row's date value of each group (by ID). Succeeding rows within the 6-hour window get the same base date. Once a row's date value falls on or after the 6th hour past that base date, the base date changes and is set to the current row's date value:
ID INS_DATE BASE_DATE
-- ------------------- -------------------
1 2011-05-25 23:13:32 2011-05-25 23:13:32 <-1st row,gets date
1 2011-05-26 02:14:19 2011-05-25 23:13:32 <- less than 6hr
1 2011-05-26 04:15:30 2011-05-25 23:13:32 <- less than 6hr
1 2011-05-26 05:14:31 2011-05-26 05:14:31 <- INS_DATE is >= preceding base_date + 6 hours; base_date is re-set to current row's INS_DATE
1 2011-05-26 07:15:19 2011-05-26 05:14:31 <- less than 6hr
1 2011-05-26 10:15:50 2011-05-26 05:14:31 <- less than 6hr
1 2011-05-26 13:44:46 2011-05-26 13:44:46 <- INS_DATE is >= preceding base_date + 6 hours; base_date is re-set to current row's INS_DATE
1 2011-05-26 15:14:54 2011-05-26 13:44:46 <- less than 6hr
1 2011-05-26 16:15:01 2011-05-26 13:44:46 <- less than 6hr
1 2011-05-26 17:14:38 2011-05-26 13:44:46 <- less than 6hr
1 2011-05-26 19:15:36 2011-05-26 13:44:46 <- less than 6hr
2 2011-05-30 11:30:17 2011-05-30 11:30:17 <- First record of the new ID, takes INS_DATE as BASE_DATE
2 2011-05-30 14:30:22 2011-05-30 11:30:17 <- less than 6hr
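Because each base date depends on the previously computed base date, the rule above is inherently sequential. As a cross-check on the requirement, here is a hedged Python sketch of that scan (illustrative names, not Oracle code): rows are (id, ins_date) pairs sorted by id and date, and the base date resets whenever a row lands 6 or more hours past it, or a new id starts.

```python
from datetime import datetime, timedelta

def assign_base_dates(rows, width=timedelta(hours=6)):
    """rows: (id, ins_date) pairs sorted by id, ins_date.
    Carry the base date forward; reset it to the current row's date
    whenever the row falls on or after base_date + width, or when a
    new id starts."""
    out, cur_id, base = [], None, None
    for rid, d in rows:
        if rid != cur_id or d >= base + width:
            cur_id, base = rid, d
        out.append((rid, d, base))
    return out
```

Fed the GRTAB rows, this reproduces the BASE_DATE column requested above (e.g. 2011-05-26 05:14:31 starts a new group because it is past 23:13:32 + 6 hours).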
June 02, 2011 - 8:49 am UTC
I don't see offhand a way to do that with analytics.
@Yar re: GRTAB
Stew Ashton, June 03, 2011 - 7:36 am UTC
You can do this with a variant of the technique I proposed above to Rohit. It uses analytics, hierarchical query and a self join. Beware of the bug he ran into on V11.1.
with Q1 as (
SELECT A.*
,LAST_VALUE(RN) OVER(PARTITION BY ID ORDER BY INS_DATE
RANGE BETWEEN CURRENT ROW AND NUMTODSINTERVAL(6, 'hour') FOLLOWING) LAST_RN
FROM (
SELECT ID, INS_DATE, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY INS_DATE) RN
FROM GRTAB
) A
), q2 as (
select * from q1
START WITH RN = 1
CONNECT BY (ID, RN) = ((PRIOR ID, PRIOR LAST_RN+1))
)
SELECT q1.id, q1.ins_date, q2.ins_date base_date FROM Q1, Q2
WHERE Q1.ID = Q2.ID AND Q1.RN BETWEEN Q2.RN AND Q2.LAST_RN
ORDER BY Q1.ID, Q1.INS_DATE;
Induction Solutions
Brendan, June 04, 2011 - 7:09 am UTC
Interesting question and nice solution from Stew. Before reading Stew's solution, I had got as far as seeing that the problem could be viewed as a simple induction. If c, b are the current date, base date, and p is the prior base date then:
b = (c - p > 0.25) ? c : p
The problem then is that the obvious function to use to get the prior value, Lag, won't work owing to the recursion. On seeing Stew's use of Row_Number and recursion by tree-walk, I realised an induction solution could be implemented by similar techniques, although I substituted recursive subquery factoring for the tree-walk. I then thought that this might be a good case for the Model clause. Here are the two alternative solutions:
Recursive Subquery Factor
WITH ind AS (
SELECT id, ins_date, Row_Number() OVER (PARTITION BY id ORDER BY ins_date) rn
FROM grtab
), rsq (id, rn, ins_date, base_date) AS (
SELECT id, rn, ins_date, ins_date base_date
FROM ind
WHERE rn = 1
UNION ALL
SELECT ind.id,
ind.rn,
ind.ins_date,
CASE WHEN ind.ins_date - rsq.base_date > 0.25 THEN ind.ins_date ELSE rsq.base_date END
FROM ind
JOIN rsq
ON rsq.rn = ind.rn - 1
AND rsq.id = ind.id
)
SELECT rsq.id, rsq.ins_date, rsq.base_date
FROM rsq
ORDER BY rsq.id, rsq.rn
/
Model
SELECT id, ins_date, base_date
FROM grtab
MODEL
PARTITION BY (id)
DIMENSION BY (Row_Number() OVER (PARTITION BY id ORDER BY ins_date) rn)
MEASURES (ins_date, ins_date base_date)
RULES (
base_date[rn = 1] = ins_date[cv()],
base_date[rn > 1] = CASE WHEN ins_date[cv()] - base_date[cv()-1] > 0.25 THEN ins_date[cv()] ELSE base_date[cv()-1] END
)
ORDER BY 1, 2
/
ID INS_DATE BASE_DATE
---------- ------------------- -------------------
1 2011-05-25 23:13:32 2011-05-25 23:13:32
2011-05-26 02:14:19 2011-05-25 23:13:32
2011-05-26 04:15:30 2011-05-25 23:13:32
2011-05-26 05:14:31 2011-05-26 05:14:31
2011-05-26 07:15:19 2011-05-26 05:14:31
2011-05-26 10:15:50 2011-05-26 05:14:31
2011-05-26 13:44:46 2011-05-26 13:44:46
2011-05-26 15:14:54 2011-05-26 13:44:46
2011-05-26 16:15:01 2011-05-26 13:44:46
2011-05-26 17:14:38 2011-05-26 13:44:46
2011-05-26 19:15:36 2011-05-26 13:44:46
2 2011-05-30 11:30:17 2011-05-30 11:30:17
2011-05-30 14:30:22 2011-05-30 11:30:17
13 rows selected.
@Brendan re:Induction solutions
Stew Ashton, June 05, 2011 - 2:49 am UTC
Brendan, thanks for the kind words. I dislike my solution because it's not straightforward and the execution plan uses a temporary table with physical reads and writes. So I used Tom's RUNSTATS package and DBMS_XPLAN.DISPLAY_CURSOR to compare your solutions with mine.
In tests with 212K rows, both your queries avoid the temporary table but your recursive subquery factoring solution does lots of logical I/O and is a bit slower than mine. Your MODEL solution is limpid, fast and does only one full scan of the table.
I have tried the MODEL clause in the past and my efforts were complex and slow, so I had given up on it. Thanks for providing a clear, efficient use case.
June 06, 2011 - 9:31 am UTC
Thanks to both of you for going above and beyond...
I'll have to learn that model clause myself someday :)
limits of analytics
Sokrates, June 06, 2011 - 2:41 pm UTC
very interesting problem !
quite easy to solve when you think "procedural" (so a pipelined table function solution would do the trick quite easily) but it shows the limits of analytics.
would "Comparative Window Functions" (
http://tkyte.blogspot.com/2009_11_01_archive.html ) be of any use here ?
P.S. on induction solutions
Stew Ashton, June 06, 2011 - 3:39 pm UTC
Thank you!
Yar, June 29, 2011 - 1:39 pm UTC
Thanks Tom, thanks Brendan! Just what I need.
Date in previous row
Tracy, September 08, 2011 - 9:09 am UTC
SQL > create table t3 (id number, starttime date, usernumber number, description varchar2(20));
Table created.
SQL > insert into t3 values (1, sysdate,1,'Test');
SQL > insert into t3 values (2, sysdate,1,'Test');
SQL > insert into t3 values (3, sysdate,1,'Test');
SQL > insert into t3 values (4, sysdate,1,'Test');
SQL > insert into t3 values (5 sysdate,1,'Test 2');
SQL > insert into t3 values (6, sysdate,1,'Test 3');
SQL > insert into t3 values (7, sysdate,1,'Test 4');
SQL > insert into t3 values (8, sysdate,2,'Test 4');
SQL > insert into t3 values (9, sysdate,2,'Test 4');
SQL > insert into t3 values (10, sysdate,3,'Test 4');
SQL > insert into t3 values (11, sysdate,3,'Test 4');
SQL > insert into t3 values (12, sysdate,1,'Test 4');
SQL > select id, to_char(starttime,'yyyy-mm-dd hh24:mi:ss'), usernumber, description from t3 order by id;
ID TO_CHAR(STARTTIME,' USERNUMBER DESCRIPTION
---------- ------------------- ---------- --------------------
1 2011-09-08 15:50:48 1 Test
2 2011-09-08 15:51:01 1 Test
3 2011-09-08 15:51:28 1 Test
4 2011-09-08 15:51:32 1 Test
5 2011-09-08 15:51:57 1 Test 2
6 2011-09-08 15:52:11 1 Test 3
7 2011-09-08 15:52:30 1 Test 4
8 2011-09-08 15:52:31 2 Test 4
9 2011-09-08 15:52:50 2 Test 4
10 2011-09-08 15:52:52 3 Test 4
11 2011-09-08 15:52:56 3 Test 4
12 2011-09-08 15:53:23 1 Test 4
12 rows selected.
I want a query to return all rows grouped by usernumber and description but only where the starttime is within 5 seconds of the previous starttime.
Given the above data I would like to return rows with Id=3,4,10,11
September 08, 2011 - 5:46 pm UTC
but this is *almost exactly* what the original question above is???? You mean to say you cannot change some column names and table names and apply the TECHNIQUE to your problem?
you need to add a partition by clause to 'group' things (break things apart, partition things by) usernumber and description - but short of that - it is pretty much a variation on a theme here...
why does your test case (which, by the way, is missing a comma on one of the inserts) use SYSDATE? You know, I cannot reproduce your example that way. Try again if you want me to make the simple changes to the above query....
tracy, September 09, 2011 - 3:51 am UTC
I realise that it is very similar to the original question (that's how I ended up here), but nevertheless I couldn't seem to get it quite right. Here are the inserts with dates specified:
SQL> drop table t3;
SQL> create table t3 (id number, starttime date, usernumber number, description varchar2(20));
SQL>
SQL> insert into t3 values (1, to_date( '08-sep-2011 15:50:48', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (2, to_date( '08-sep-2011 15:51:01', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (3, to_date( '08-sep-2011 15:51:28', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (4, to_date( '08-sep-2011 15:51:32', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test');
SQL> insert into t3 values (5, to_date( '08-sep-2011 15:51:57', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 2');
SQL> insert into t3 values (6, to_date( '08-sep-2011 15:52:11', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 3');
SQL> insert into t3 values (7, to_date( '08-sep-2011 15:52:30', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 4');
SQL> insert into t3 values (8, to_date( '08-sep-2011 15:52:31', 'dd-mon-yyyy hh24:mi:ss' ) , 2 ,'Test 4');
SQL> insert into t3 values (9, to_date( '08-sep-2011 15:52:50', 'dd-mon-yyyy hh24:mi:ss' ) , 2 ,'Test 4');
SQL> insert into t3 values (10, to_date( '08-sep-2011 15:52:52', 'dd-mon-yyyy hh24:mi:ss' ) , 3 ,'Test 4');
SQL> insert into t3 values (11, to_date( '08-sep-2011 15:52:56', 'dd-mon-yyyy hh24:mi:ss' ) , 3 ,'Test 4');
SQL> insert into t3 values (12, to_date( '08-sep-2011 15:53:23', 'dd-mon-yyyy hh24:mi:ss' ) , 1 ,'Test 4');
SQL>
SQL> select id, to_char(starttime,'yyyy-mm-dd hh24:mi:ss'), usernumber, description from t3 order by id;
ID TO_CHAR(STARTTIME,' USERNUMBER DESCRIPTION
---------- ------------------- ---------- -----------------------------------------------------------------
1 2011-09-08 15:50:48 1 Test
2 2011-09-08 15:51:01 1 Test
3 2011-09-08 15:51:28 1 Test
4 2011-09-08 15:51:32 1 Test
5 2011-09-08 15:51:57 1 Test 2
6 2011-09-08 15:52:11 1 Test 3
7 2011-09-08 15:52:30 1 Test 4
8 2011-09-08 15:52:31 2 Test 4
9 2011-09-08 15:52:50 2 Test 4
10 2011-09-08 15:52:52 3 Test 4
11 2011-09-08 15:52:56 3 Test 4
12 2011-09-08 15:53:23 1 Test 4
Crikey
tracy, September 09, 2011 - 6:01 am UTC
Bazoiks !! This seems to work. I don't know much about these analytic whatsits but I tried a few different things and this one produces the right results for the above data:
select usernumber,description,starttime from (
select usernumber , description, starttime,
lead(starttime) over ( partition by usernumber, description order by starttime) le,
lag (starttime) over ( partition by usernumber, description order by starttime) la
from t3 order by starttime
)
where le-starttime < 5/24/60/60 or starttime-la < 5/24/60/60
USERNUMBER DESCRIPT TO_CHAR(STARTTIME,'
---------- -------- -------------------
1 Test 2011-09-08 15:51:28
1 Test 2011-09-08 15:51:32
3 Test 4 2011-09-08 15:52:52
3 Test 4 2011-09-08 15:52:56
Is this just a fluke, or is it correct?
September 09, 2011 - 7:28 am UTC
you have assigned to each row the prior and next starttimes within groups of unique usernumbers and descriptions after sorting by starttime.
You keep every record such that the prior or next record is within five seconds of it.
If that is what you wanted, you have it.
It is best to try to understand the technique - so you can answer whether or not it is getting the right answer.
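To make the technique concrete: the lag/lead filter Tracy stumbled onto can be restated procedurally. This is an illustrative Python sketch (not from the thread) of the same rule -- keep a row when its previous or next row, within the same (usernumber, description) partition ordered by starttime, is strictly less than 5 seconds away:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def close_neighbors(rows, gap=timedelta(seconds=5)):
    """rows: (id, starttime, usernumber, description) tuples.
    Keep each row whose previous OR next row -- within the same
    (usernumber, description) partition, ordered by starttime -- is
    strictly less than `gap` away, mirroring the lag/lead filter."""
    parts = defaultdict(list)
    for r in rows:
        parts[(r[2], r[3])].append(r)
    kept = []
    for rs in parts.values():
        rs.sort(key=lambda r: r[1])
        for i, r in enumerate(rs):
            prev_ok = i > 0 and r[1] - rs[i - 1][1] < gap
            next_ok = i + 1 < len(rs) and rs[i + 1][1] - r[1] < gap
            if prev_ok or next_ok:
                kept.append(r)
    return sorted(kept, key=lambda r: r[0])
```

Run against Tracy's t3 data this keeps exactly ids 3, 4, 10 and 11 -- the same rows her analytic query returned.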
Resetting partitioned running totals...
J, March 20, 2012 - 2:32 pm UTC
Hello! Hope all's well.
Version: Oracle 11.2.0.3
Scenario: We have a table of events with two columns: CID contains the customer id, and EDT contains the event date. We need to assign [CID+EDT] combinations to five-day groups.
Rule: If an event occurs after the fifth day from the start of a group, then it becomes the new "anchor" for the start of another five-day group. It's this "becomes a new anchor" part that we can't quite figure out.
Idea: partition by CID, order by EDT, identify the prior EDT
and determine the date difference. The first row of each partition becomes the anchor for the first group. Then if the sum of subsequent (EDT-prior EDT) rows >= 5, that row becomes the next anchor, and the counter is reset to zero. Perhaps it's really close, but we just can't make the last leap of how to reset the counter after an anchor is identified. Any advice would be most welcome, (thanks)^23!
drop table t;
create table t (cid number, edt date);
insert into t (cid,edt) values (1,trunc(sysdate)); -- s/b anchor 1
insert into t (cid,edt) values (1,trunc(sysdate)+5); -- 2
insert into t (cid,edt) values (1,trunc(sysdate)+6); --
insert into t (cid,edt) values (1,trunc(sysdate)+6); --
insert into t (cid,edt) values (1,trunc(sysdate)+7); --
insert into t (cid,edt) values (1,trunc(sysdate)+10); -- 6
insert into t (cid,edt) values (1,trunc(sysdate)+18); -- 7
insert into t (cid,edt) values (1,trunc(sysdate)+21); --
insert into t (cid,edt) values (1,trunc(sysdate)+22); --
insert into t (cid,edt) values (1,trunc(sysdate)+23); -- 10
insert into t (cid,edt) values (1,trunc(sysdate)+24); --
insert into t (cid,edt) values (1,trunc(sysdate)+25); --
insert into t (cid,edt) values (1,trunc(sysdate)+27); --
insert into t (cid,edt) values (1,trunc(sysdate)+28); -- 14
insert into t (cid,edt) values (1,trunc(sysdate)+29); --
insert into t (cid,edt) values (1,trunc(sysdate)+33); -- 16
insert into t (cid,edt) values (1,trunc(sysdate)+36); --
commit;
select cid, rn
,to_char(edt,'mm/dd/yy') edt
,to_char(p_edt,'mm/dd/yy') p_edt
,edt-p_edt dt_diff
,case -- and here's the sticky bit
when rn = 1 then rn
when edt-p_edt >= 5 then rn
else null -- ?some analytic case stmt?
end anchor
from
(select cid
,row_number () over
(partition by cid order by edt) rn
,edt
,first_value(edt) over
(partition by cid order by edt
rows 1 preceding) p_edt
from t)
order by cid, rn;
CID RN EDT P_EDT DT_DIFF ANCHOR
--- -- -------- -------- ------- ------
1 1 03/20/12 03/20/12 0 1
1 2 03/25/12 03/20/12 5 2
1 3 03/26/12 03/25/12 1
1 4 03/26/12 03/26/12 0
1 5 03/27/12 03/26/12 1
1 6 03/30/12 03/27/12 3 s/b 6 (1+0+1+3) >= 5
1 7 04/07/12 03/30/12 8 7
1 8 04/10/12 04/07/12 3
1 9 04/11/12 04/10/12 1
1 10 04/12/12 04/11/12 1 s/b 10 (3+1+1) >= 5
1 11 04/13/12 04/12/12 1
1 12 04/14/12 04/13/12 1
1 13 04/16/12 04/14/12 2
1 14 04/17/12 04/16/12 1 s/b 14 (1+1+2+1) >= 5
1 15 04/18/12 04/17/12 1
1 16 04/22/12 04/18/12 4 s/b 16 (1+4) >= 5
1 17 04/25/12 04/22/12 3
17 rows selected.
Resetting partitioned running totals - 2
J, April 11, 2012 - 1:27 pm UTC
Hello Tom.
Hmmm... Hope I didn't post a goofball or painfully inane question here. Apologies if that's the case. Has something along this line (resetting sums) in a query already been addressed, or was the question unclear? Or perhaps we need to completely rethink our approach. In any case, thanks for the fantastic site and have a great week!
Best Regards,
- J
April 11, 2012 - 2:14 pm UTC
neither goofball nor inane.
Just something that analytics are not going to be able to do. This 'reset' thing is non-trivial.
see
http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22asktom-1518271.html and read the "grouping ranges" part... Based on that, you could write a plsql pipelined function - or use recursive SQL:
ops$tkyte%ORA11GR2> with data1
2 as
3 (select row_number() over (partition by cid order by edt) rn, t.*
4 from t
5 ),
6 data2 (rn, cid, edt, diff, flag )
7 as
8 (select rn, cid, edt, 0 diff, 1 flag
9 from data1
10 where rn = 1
11 UNION ALL
12 select data1.rn, data1.cid, data1.edt,
13 case when data2.diff + (data1.edt-data2.edt) >= 5
14 then 0
15 else data2.diff + (data1.edt-data2.edt)
16 end diff,
17 case when data2.diff + (data1.edt-data2.edt) >= 5
18 then data2.flag + 1
19 else data2.flag
20 end flag
21 from data1, data2
22 where data1.rn = data2.rn+1
23 and data1.cid = data2.cid
24 )
25 select *
26 from data2
27 order by cid, edt
28 /
RN CID EDT DIFF FLAG
---------- ---------- --------- ---------- ----------
1 1 11-APR-12 0 1
2 1 16-APR-12 0 2
3 1 17-APR-12 1 2
4 1 17-APR-12 1 2
5 1 18-APR-12 2 2
6 1 21-APR-12 0 3
7 1 29-APR-12 0 4
8 1 02-MAY-12 3 4
9 1 03-MAY-12 4 4
10 1 04-MAY-12 0 5
11 1 05-MAY-12 1 5
12 1 06-MAY-12 2 5
13 1 08-MAY-12 4 5
14 1 09-MAY-12 0 6
15 1 10-MAY-12 1 6
16 1 14-MAY-12 0 7
17 1 17-MAY-12 3 7
1 2 11-APR-12 0 1
2 2 16-APR-12 0 2
3 2 17-APR-12 1 2
4 2 17-APR-12 1 2
5 2 18-APR-12 2 2
6 2 21-APR-12 0 3
7 2 29-APR-12 0 4
8 2 02-MAY-12 3 4
9 2 03-MAY-12 4 4
10 2 04-MAY-12 0 5
11 2 05-MAY-12 1 5
12 2 06-MAY-12 2 5
13 2 08-MAY-12 4 5
14 2 09-MAY-12 0 6
15 2 10-MAY-12 1 6
16 2 14-MAY-12 0 7
17 2 17-MAY-12 3 7
34 rows selected.
(I made your example more "interesting" by repeating the data for cid=1 as cid=2)
Or you could work out an example using the model clause as demonstrated there as well...
Read the article though as it talks about a 'con' of the above approach if there is a large set of data.
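The recursive branch above carries exactly two values from row to row: a running difference that resets to zero when it reaches 5, and a group flag that increments at each reset. As a sanity check on that induction, here is the same logic in a few lines of Python (an illustrative sketch, not the SQL itself), fed the day gaps between consecutive events of one cid:

```python
def reset_groups(deltas, limit=5):
    """deltas: day gaps between consecutive events of one cid,
    i.e. edt(n) - edt(n-1). Carry a running difference that resets
    to 0 -- and bump the group flag -- whenever it would reach
    `limit`; this mirrors the CASE logic in the recursive branch."""
    diff, flag = 0, 1
    out = [(diff, flag)]          # the rn = 1 seed row
    for d in deltas:
        if diff + d >= limit:
            diff, flag = 0, flag + 1
        else:
            diff += d
        out.append((diff, flag))
    return out
```

With the gaps from the example data (5, 1, 0, 1, 3, 8, 3, 1, 1, 1, 1, 2, 1, 1, 4, 3) this reproduces the DIFF and FLAG columns shown in the output above.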
Resetting partitioned running totals - 3
J, April 11, 2012 - 11:23 pm UTC
Thanks so much! Oh, there's some interesting stuff tucked in there (it's currently 9:25pm out here and it's been a crazy-long day), so I'll tackle it with a fresh set of eyes in the morning. A very-very different approach indeed (you should write a book on "Approaches in Query Design for Oracle" - I'd buy one in a heartbeat). We'll also follow up on that link, for sure. Cheers!
Similar problem
Joanna, July 19, 2018 - 12:20 pm UTC
Hi Tom,
I have a very similar problem to the one described above by LB.
I have records like this as well:
Time Amount
11/22/2003 12:22:01 100
11/22/2003 12:22:03 200
11/22/2003 12:22:04 300
11/22/2003 12:22:45 100
11/22/2003 12:22:46 200
11/22/2003 12:23:12 100
11/22/2003 12:23:12 200
What I need to do is sum the amounts where the time of the records is within 2 (or let it be 3) seconds of each other.
The difference is that for me it doesn't matter if there are 2 or 10 rows; I just need to have groups within 2 seconds.
So for me the data like this:
11/22/2003 12:22:03 200
11/22/2003 12:22:04 200
11/22/2003 12:22:05 200
11/22/2003 12:22:06 200
11/22/2003 12:22:07 200
11/22/2003 12:22:08 200
11/22/2003 12:22:09 200
should be 4 different groups. So I cannot use lag or lead - I need to lock the first row, then check if there are rows within 2 seconds of it; if so, put them into the group, if not, create a new group and repeat.
Would you please point me in the right direction?
July 20, 2018 - 10:40 am UTC
Sorry - I'm lost - that sounds exactly like what the first requirement was, no? We are assigning new group numbers based on the delta of seconds.
Put your data into a table, run our initial answer and see how you go.
If you're still stuck, pass on the same inserts you used and we'll work on it here
3 seconds?
Aps Reine, November 05, 2018 - 4:25 pm UTC
Can you explain Tom's original answer from 14+ years ago:
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> select min(x), max(x), sum(y)
2 from (
3 select x, y, max(rn) over (order by x) max_rn
4 from (
5 select x, y,
6 lag(x) over (order by x),
7 case when abs(lag(x) over (order by x) - x) > 3/24/60/60 then row_number() over (order by x)
8 when row_number() over (order by x) = 1 then 1
9 else null
10 end rn
11 from t
12 )
13 )
14 group by max_rn
15 /
MIN(X) MAX(X) SUM(Y)
-------- -------- ----------
12:22:03 12:22:12 740
12:22:18 12:22:20 226
12:22:36 12:22:47 550
why is there a time difference of 9 seconds in the first row returned, 2 seconds in the second, and 11 in the third row - when the condition was that the grouping would be by 3-second intervals with case when abs(lag(x) over (order by x) - x) > 3/24/60/60
I would expect the result set to be:
MIN(X) MAX(X) SUM(Y)
-------- -------- ----------
12:22:03 12:22:06 273
12:22:18 12:22:21 289
12:22:36 12:22:39 199
Basically a sum provided for every 3 seconds of grouping.
thanks,
Aps R.
November 06, 2018 - 2:22 am UTC
This is an example of where precision in the definition is so important.
"3 seconds" can be interpreted as:
1) Records fall into a single group if record "n+1" is less than 3 seconds later than record "n". So in this instance:
rec1: 09:00:00
rec2: 09:00:01
rec3: 09:00:02
rec4: 09:00:03
rec5: 09:00:04
rec6: 09:00:05
rec7: 09:00:06
all are in a *single* group.
or
2) Records fall into a single group if record "n" is less than 3 seconds later than record "0", where record "0" is the first record for this logical group. So in this instance:
rec1: 09:00:00 start group1
rec2: 09:00:01 group1
rec3: 09:00:02 group1
rec4: 09:00:03 start group2 (because rec "0" is the 'rec1' above)
rec5: 09:00:04 group2
rec6: 09:00:05 group2
rec7: 09:00:06 start group3
etc etc
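The two interpretations give different groupings on the same data. A small Python sketch (illustrative only, not from the thread) makes the contrast concrete, using the rec1..rec7 timestamps above as seconds 0..6:

```python
def chain_groups(secs, gap=3):
    """Interpretation 1: compare each record to its immediate
    predecessor; a chain of small steps stays in one group."""
    out, grp = [], 1
    for i, s in enumerate(secs):
        if i and s - secs[i - 1] >= gap:
            grp += 1
        out.append(grp)
    return out

def anchor_groups(secs, gap=3):
    """Interpretation 2: compare each record to the first record of
    its group; when the window is exceeded the anchor jumps forward.
    `secs` must be sorted and non-empty."""
    out, grp, anchor = [], 1, secs[0]
    for s in secs:
        if s - anchor >= gap:
            grp, anchor = grp + 1, s
        out.append(grp)
    return out

# rec1..rec7 at 09:00:00 .. 09:00:06, one second apart
secs = list(range(7))
print(chain_groups(secs))    # [1, 1, 1, 1, 1, 1, 1]  -- one big group
print(anchor_groups(secs))   # [1, 1, 1, 2, 2, 2, 3]  -- groups of three
```

Interpretation 1 is what Tom's original lag-based answer implements; interpretation 2 is what the match_recognize answer below implements.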
Option 2
Aps, November 06, 2018 - 10:18 am UTC
Thanks for that explanation, I see the difference now.
Looks like Tom's original post addressed Option 1 with a single group if record "n+1" is less than 3 seconds later than record "n".
I'm struggling to figure out how to make Option 2 work within the original example. Could you provide an example of how Option 2 would look?
Thanks,
Aps
November 06, 2018 - 10:39 am UTC
The easiest way is with match_recognize:
select *
from t
match_recognize (
order by x
measures
match_number() as grp
all rows per match
pattern ( within_3s+ )
define
within_3s as x < first ( x ) + interval '3' second
);
X GRP Y
01-NOV-2018 12:22:03 1 25
01-NOV-2018 12:22:04 1 64
01-NOV-2018 12:22:05 1 79
01-NOV-2018 12:22:06 2 94
01-NOV-2018 12:22:07 2 52
01-NOV-2018 12:22:08 2 77
01-NOV-2018 12:22:09 3 80
01-NOV-2018 12:22:10 3 8
...
You need to be on at least 12c to use this. Read more about it at:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533477800346658909
Aps, November 06, 2018 - 11:56 am UTC
Ah, that is a pretty cool feature.
Unfortunately we're still using an 11g database so I have to try to get it working based on Tom's original query.
Any hints using that original query how to restrict the window to 3 seconds as you explained in Option 2?
Many thanks,
Aps
November 08, 2018 - 3:58 pm UTC
You can do this using recursive WITH (recursive subquery factoring, available from 11.2). Which looks something like:
with rws as (
select t.*, row_number () over ( order by x ) rn
from t
), within_3s (
x, y, grp_start, grp, rn
) as (
select x, y, x, 1, rn from rws
where rn = 1
union all
select r.x, r.y,
case
when r.x < grp_start + interval '3' second then
grp_start
else
r.x
end,
case
when r.x < grp_start + interval '3' second then
w.grp
else
w.grp + 1
end, r.rn
from within_3s w
join rws r
on r.rn = w.rn + 1
)
select *
from within_3s;
X Y GRP_START GRP RN
01-NOV-2018 12:22:03 21 01-NOV-2018 12:22:03 1 1
01-NOV-2018 12:22:04 9 01-NOV-2018 12:22:03 1 2
01-NOV-2018 12:22:05 47 01-NOV-2018 12:22:03 1 3
01-NOV-2018 12:22:06 70 01-NOV-2018 12:22:06 2 4
01-NOV-2018 12:22:07 23 01-NOV-2018 12:22:06 2 5
01-NOV-2018 12:22:08 78 01-NOV-2018 12:22:06 2 6
01-NOV-2018 12:22:09 31 01-NOV-2018 12:22:09 3 7
...
Brilliant! Thank you
A reader, November 12, 2018 - 3:03 pm UTC
this is fantastic, thank you. I think I'm headed in the right direction now.
November 12, 2018 - 3:16 pm UTC
Great
Great answer but ...
Krzysztof Żwirek, May 06, 2022 - 11:51 am UTC
Thank you for your answer, it was very helpful, but I have one question.
I have just used your solution to split a time series into groups; each group should consist of records within a 10-minute (or 3, 5, 8, ...) window.
The table:
CREATE TABLE "DANE" ("TIME" DATE, "VALUE" NUMBER);
The data:
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:01:57','DD/MM/YYYY HH24:MI:SS'),1);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:02:03','DD/MM/YYYY HH24:MI:SS'),2);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:02:09','DD/MM/YYYY HH24:MI:SS'),3);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:02:15','DD/MM/YYYY HH24:MI:SS'),4);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:02:21','DD/MM/YYYY HH24:MI:SS'),5);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:23:23','DD/MM/YYYY HH24:MI:SS'),6);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:23:29','DD/MM/YYYY HH24:MI:SS'),7);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:23:35','DD/MM/YYYY HH24:MI:SS'),8);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:23:41','DD/MM/YYYY HH24:MI:SS'),9);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:23:47','DD/MM/YYYY HH24:MI:SS'),10);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:23:53','DD/MM/YYYY HH24:MI:SS'),11);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:10','DD/MM/YYYY HH24:MI:SS'),12);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:16','DD/MM/YYYY HH24:MI:SS'),13);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:22','DD/MM/YYYY HH24:MI:SS'),14);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:28','DD/MM/YYYY HH24:MI:SS'),15);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:34','DD/MM/YYYY HH24:MI:SS'),16);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:40','DD/MM/YYYY HH24:MI:SS'),17);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:35:46','DD/MM/YYYY HH24:MI:SS'),18);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 21:39:09','DD/MM/YYYY HH24:MI:SS'),19);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:06:11','DD/MM/YYYY HH24:MI:SS'),20);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:06:17','DD/MM/YYYY HH24:MI:SS'),21);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:06:23','DD/MM/YYYY HH24:MI:SS'),22);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:06:29','DD/MM/YYYY HH24:MI:SS'),23);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:06:35','DD/MM/YYYY HH24:MI:SS'),24);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:06:41','DD/MM/YYYY HH24:MI:SS'),25);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:19:50','DD/MM/YYYY HH24:MI:SS'),26);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:26:12','DD/MM/YYYY HH24:MI:SS'),27);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:26:30','DD/MM/YYYY HH24:MI:SS'),28);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:29:50','DD/MM/YYYY HH24:MI:SS'),29);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:30:22','DD/MM/YYYY HH24:MI:SS'),30);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:33:29','DD/MM/YYYY HH24:MI:SS'),31);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 22:33:35','DD/MM/YYYY HH24:MI:SS'),32);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 23:08:46','DD/MM/YYYY HH24:MI:SS'),33);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 23:08:52','DD/MM/YYYY HH24:MI:SS'),34);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 23:12:28','DD/MM/YYYY HH24:MI:SS'),35);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 23:12:34','DD/MM/YYYY HH24:MI:SS'),36);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 23:31:07','DD/MM/YYYY HH24:MI:SS'),37);
Insert into DANE (TIME,VALUE) values (to_date('05/04/2022 23:31:13','DD/MM/YYYY HH24:MI:SS'),38);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:11:39','DD/MM/YYYY HH24:MI:SS'),39);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:11:45','DD/MM/YYYY HH24:MI:SS'),40);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:11:51','DD/MM/YYYY HH24:MI:SS'),41);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:11:57','DD/MM/YYYY HH24:MI:SS'),42);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:15:02','DD/MM/YYYY HH24:MI:SS'),43);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:15:08','DD/MM/YYYY HH24:MI:SS'),44);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:15:14','DD/MM/YYYY HH24:MI:SS'),45);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 00:15:20','DD/MM/YYYY HH24:MI:SS'),46);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 01:01:32','DD/MM/YYYY HH24:MI:SS'),47);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 01:01:38','DD/MM/YYYY HH24:MI:SS'),48);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 01:01:44','DD/MM/YYYY HH24:MI:SS'),49);
Insert into DANE (TIME,VALUE) values (to_date('06/04/2022 01:01:50','DD/MM/YYYY HH24:MI:SS'),50);
The query:
select time, value, max(rn) over (order by time)
from (
select time, value,
lag(time) over (order by time),
case when abs(lag(time) over (order by time) - time) > 600/24/60/60 then row_number() over (order by time)
when row_number() over (order by time) = 1 then 1
else null
end rn
from dane
);
The result is almost ok, but ...
records 26 to 29 (05/04/2022 22:19:50, 05/04/2022 22:26:12, 05/04/2022 22:26:30, 05/04/2022 22:29:50) should be in one group,
and records 30 to 32 (05/04/2022 22:30:22, 05/04/2022 22:33:29, 05/04/2022 22:33:35) should be in another group,
but they are not. :(
I do not know how to deal with that problem.
Please help.
May 09, 2022 - 2:44 pm UTC
The query groups any rows which are within 10 minutes of the previous row, not within 10 minutes of the first row in the group.
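To see why, here is a sketch of that gap-based logic in Python (an illustration only; `gap_groups` is a hypothetical helper, not part of the original query). Every consecutive gap among values 26 through 32 is under 10 minutes, so the gap rule lumps them all into a single group:

```python
from datetime import datetime, timedelta

def gap_groups(times, window=timedelta(minutes=10)):
    # The asker's logic: start a new group only when the gap to the
    # *previous* row is at least `window`.
    groups, prev = [], None
    for t in times:
        if prev is None or t - prev >= window:
            groups.append((groups[-1] + 1) if groups else 1)
        else:
            groups.append(groups[-1])
        prev = t
    return groups

# TIME values 26 through 32 from the DANE data above
rows = [datetime(2022, 4, 5, 22, 19, 50),   # value 26
        datetime(2022, 4, 5, 22, 26, 12),   # value 27
        datetime(2022, 4, 5, 22, 26, 30),   # value 28
        datetime(2022, 4, 5, 22, 29, 50),   # value 29
        datetime(2022, 4, 5, 22, 30, 22),   # value 30
        datetime(2022, 4, 5, 22, 33, 29),   # value 31
        datetime(2022, 4, 5, 22, 33, 35)]   # value 32
print(gap_groups(rows))  # [1, 1, 1, 1, 1, 1, 1] -- every gap is under 10 minutes
```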
One way to define a fixed window size is this:
select * from dane
match_recognize (
order by time
measures
match_number() as grp
all rows per match
pattern ( init within_10* )
define
within_10 as time < init.time + interval '10' minute
);
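A Python sketch of the same fixed-window rule (a hypothetical helper, not the SQL itself) shows how the disputed rows split. Note the strict `<` in the `define` clause: value 29, which falls exactly 10 minutes after value 26, opens the next window, so the split comes out as 26-28 / 29-32 rather than 26-29 / 30-32.

```python
from datetime import datetime, timedelta

def fixed_window_groups(times, window=timedelta(minutes=10)):
    # Start a new group whenever a row is >= `window` after the first row
    # of the current group (mirroring the strict '<' in the define clause).
    groups, start = [], None
    for t in times:
        if start is None or t - start >= window:
            start = t  # this row opens a new window
            groups.append((groups[-1] + 1) if groups else 1)
        else:
            groups.append(groups[-1])
    return groups

rows = [datetime(2022, 4, 5, 22, 19, 50),   # value 26
        datetime(2022, 4, 5, 22, 26, 12),   # value 27
        datetime(2022, 4, 5, 22, 26, 30),   # value 28
        datetime(2022, 4, 5, 22, 29, 50),   # value 29: exactly 10 min after value 26
        datetime(2022, 4, 5, 22, 30, 22),   # value 30
        datetime(2022, 4, 5, 22, 33, 29),   # value 31
        datetime(2022, 4, 5, 22, 33, 35)]   # value 32
print(fixed_window_groups(rows))  # [1, 1, 1, 2, 2, 2, 2]
```

If you want a row falling exactly on the boundary to stay in the current group, change the comparison to `<=` in the SQL `define` (and `>` in the sketch).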
Other alternatives...
Rajeshwaran Jeyabal, May 10, 2022 - 3:52 pm UTC
Here are a few other alternatives: one using match_recognize in summary form, and one using the Model clause for inter-row calculations...
demo@XEPDB1> select *
2 from t
3 match_recognize(
4 order by time
5 measures
6 match_number() mno,
7 count(*) as cnt,
8 min(time) as start_time,
9 max(time) as end_time
10 pattern( init diff_by_10_mins* )
11 define
12 diff_by_10_mins as time < init.time + numtodsinterval(10,'minute') )
13 /
MNO CNT START_TIME END_TIME
---------- ---------- ----------------------- -----------------------
1 5 05-APR-2022 09:01:57 pm 05-APR-2022 09:02:21 pm
2 6 05-APR-2022 09:23:23 pm 05-APR-2022 09:23:53 pm
3 8 05-APR-2022 09:35:10 pm 05-APR-2022 09:39:09 pm
4 6 05-APR-2022 10:06:11 pm 05-APR-2022 10:06:41 pm
5 3 05-APR-2022 10:19:50 pm 05-APR-2022 10:26:30 pm
6 4 05-APR-2022 10:29:50 pm 05-APR-2022 10:33:35 pm
7 4 05-APR-2022 11:08:46 pm 05-APR-2022 11:12:34 pm
8 2 05-APR-2022 11:31:07 pm 05-APR-2022 11:31:13 pm
9 8 06-APR-2022 12:11:39 am 06-APR-2022 12:15:20 am
10 4 06-APR-2022 01:01:32 am 06-APR-2022 01:01:50 am
10 rows selected.
demo@XEPDB1> select grp, count(*) ,min(x1) as start_time, max(x1) as end_time
2 from (
3 select *
4 from t
5 model
6 dimension by ( row_number() over(order by time) r )
7 measures( time as x1, 0 grp, cast(null as date) as x2 )
8 rules(
9 x2[any] order by r = case when cv(r)=1 then x1[cv(r)]
10 when x1[cv(r)] < x2[cv(r)-1] + interval '10' minute
11 then x2[cv(r)-1] else x1[cv(r)] end ,
12 grp[any] order by r = case when cv(r)=1 then 1
13 when x1[cv(r)] < x2[cv(r)-1] + interval '10' minute
14 then grp[cv(r)-1] else grp[cv(r)-1]+1 end )
15 )
16 group by grp
17 /
GRP COUNT(*) START_TIME END_TIME
---------- ---------- ----------------------- -----------------------
1 5 05-APR-2022 09:01:57 pm 05-APR-2022 09:02:21 pm
2 6 05-APR-2022 09:23:23 pm 05-APR-2022 09:23:53 pm
3 8 05-APR-2022 09:35:10 pm 05-APR-2022 09:39:09 pm
4 6 05-APR-2022 10:06:11 pm 05-APR-2022 10:06:41 pm
5 3 05-APR-2022 10:19:50 pm 05-APR-2022 10:26:30 pm
6 4 05-APR-2022 10:29:50 pm 05-APR-2022 10:33:35 pm
7 4 05-APR-2022 11:08:46 pm 05-APR-2022 11:12:34 pm
8 2 05-APR-2022 11:31:07 pm 05-APR-2022 11:31:13 pm
9 8 06-APR-2022 12:11:39 am 06-APR-2022 12:15:20 am
10 4 06-APR-2022 01:01:32 am 06-APR-2022 01:01:50 am
10 rows selected.
demo@XEPDB1>
May 11, 2022 - 5:41 am UTC
nice stuff