Skip to Main Content
  • Questions
  • Finding date ranges using analytic/aggregate functions

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Rajesh.

Asked: May 11, 2023 - 5:09 am UTC

Last updated: May 23, 2023 - 4:55 am UTC

Version: 19c

Viewed 10K+ times! This question is

You Asked

Hi

I have a table with project entry and exit dates for employee.

EMPNO PROJECT_NUMBER PROJECT_IN_DATE PROJECT_OUT_DATE
001 P1000 15-JAN-23 06-JUN-23
001 P1000 30-JUN-23 17-JUL-23


Now I have an upcoming project 'P1001' that has the same starting and ending date as for project 'P1000'

PROJECT_NUMBER START_DATE END_DATE
P1000 01-JAN-23 31-DEC-23
P1001 01-JAN-23 31-DEC-23


Against the given data, I want to get date slots for employee '001' for project 'P1001' like below.
EMPNO PROJECT_NUMBER PROJECT_IN_DATE PROJECT_OUT_DATE
001 P1001 01-JAN-23 14-JAN-23
001 P1001 07-JUN-23 29-JUN-23
001 P1001 18-JUL-23 31-DEC-23


I have a PL/SQL block in place to achieve this. Now, as I started using analytical functions frequently, wondering whether the above results are achievable using both aggregate and analytical functions.

Thank you!


with LiveSQL Test Case:

and Chris said...

So you want to find all the dates an employee is NOT assigned to a project and assign them to the other one?

Here's something to get you started:

- First generate all the days in the time period
- The outer join in the employee assignments for these days. Make this a partitioned outer join on employee ID to get a row per employee per day
- Combine the assigned/unassigned date ranges together using your favourite method. I've used pattern matching here:

create table projectsm 
    (project_number varchar2(30), 
    start_date date, 
    end_date date) ;
    
create table empprojects 
(empno varchar2(30), 
    project_number varchar2(100), 
    project_in_date date, 
    project_out_date date) ;

insert into projectsm values('P1000','01-JAN-2023','31-DEC-2023');

insert into projectsm values('P1001','01-JAN-2023','31-DEC-2023');

insert into empprojects values ('001','P1000','15-JAN-2023','06-JUN-2023');

insert into empprojects values ('001','P1000','30-JUN-2023','17-JUL-2023');

commit;

with date_range as (
  select min ( start_date ) start_date, 
         max ( end_date ) - min ( start_date ) + 1 days
  from   projectsm
), project_dates as (
  select start_date + level - 1 dt 
  from   date_range
  connect by level <= days
), emp_project_days as (
  select * 
  from   project_dates p
  left join empprojects e
    partition by ( empno )
  on   ( 
    p.dt between project_in_date and project_out_date 
  )
)
  select * from emp_project_days
    match_recognize (
      order by dt
      measures first ( dt ) st, last ( dt ) en, empno emp, project_number pnum
      pattern ( init grp* ) 
      define
        grp as prev ( project_number ) = project_number
          or coalesce ( prev ( project_number ), project_number ) is null
    );
    
ST                EN                EMP                            PNUM                                                                                                
----------------- ----------------- ------------------------------ ------
01-JAN-2023 00:00 14-JAN-2023 00:00 001                            <null>                                                                                              
15-JAN-2023 00:00 06-JUN-2023 00:00 001                            P1000                                                                                               
07-JUN-2023 00:00 29-JUN-2023 00:00 001                            <null>                                                                                              
30-JUN-2023 00:00 17-JUL-2023 00:00 001                            P1000                                                                                               
18-JUL-2023 00:00 31-DEC-2023 00:00 001                            <null>


Having got the date ranges, you can then assign free slots as needed. One way to do this is with a subquery like this:

case when pnum is null then
  ( select project_number from projectsm p
    where  not exists (
      select * from empprojects e
      where  emp = e.empno
      and    p.project_number = e.project_number
    )
  ) 
end proj


To finalize this you'll need to decide how to handle assignments when there are many overlapping projects - how do you choose which unassigned project to place the employee on? What if they have time assigned to every project, but still have free slots?

Rating

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Need more time to digest and understand your solution

Rajesh, May 12, 2023 - 5:46 am UTC

Hi Chris
Thank you very much for the solution. I need some time to digest the solution. I am pretty new to measures, match_recognize etc. I need bit more time to understand the whole thing!

Thank you

Rajesh, May 16, 2023 - 4:06 am UTC

Hello Chris, thank you! Basically, this code portion goes into a planning where the employees are assigned to projects by the project planner. I worked on this many years back with Iudith Menzel and had to switch to PL/SQL because I couldn't figure out few issues.
I cannot say I did understand everything here either! Regardless, your solution is clean and much efficient.
Would you clarify what did you mean by "using your favourite method"? What could be used other than pattern matching?
Thanks once again
Chris Saxon
May 16, 2023 - 9:00 am UTC

There's a good chance you'll need an iterative algorithm to solve a scheduling problem like this. This is often easiest - or indeed only possible! - in PL/SQL or other non-SQL languages.

The Tabibitosan method is another common way to find consecutive rows.

Alternative - avoiding separate rows for each employee/date combo

mathguy, May 16, 2023 - 6:31 pm UTC

Generating one row for each employee/date combination, then excluding those where an employee is already assigned to another project, and then recombining the remaining rows into ranges, has the advantage of simplicity, and would work even if the inputs aren't "normalized" (by which I mean that the pre-existing, already-assigned ranges for each employee are not overlapping). However, this approach doesn't scale well.

An alternative approach is to work directly with ranges. Set operations (union, intersection, complement) defined on finite disjoint unions of intervals can be implemented directly in SQL, for example using analytic functions - as your intuition suggested.

One common question is, given a finite union of possibly overlapping intervals, write it as a disjoint union of non-adjacent intervals. (This includes stitching together intervals like 1-JAN to 13-JAN and 14-JAN to 23-JAN into a single interval, 1-JAN to 23-JAN. In the extreme, it includes creating a single range out of a set of consecutive, single dates.) This is known as "merging" intervals, and has a very nice solution using MATCH_RECOGNIZE.

In your problem, it seems that it is already assumed that the existing assignments for each employee form a disjoint union. This simplifies the solution to the question you posed; otherwise, the first step would be to "normalize" the existing assignments, before looking for availability for a new project.

You said the new project has the same start and end dates as the existing one. That doesn't seem to be relevant to the problem. Rather, the input should be a table of existing assignments (to possibly more than one project!) and a new project with given start and end dates. You must find availability for every employee, for the span of the new project.

In the code below I re-create the input tables. You need also an "employees" table showing all the available employees. In my sample, employee '444' is not currently assigned to any project; if you only look at table "empprojects", you would never know that this employee even existed.

In the sample, projects P1000, P1002, P1003 and P1004 are pre-existing, and I show a few employees assigned to these projects. P1001 is the new project. Notice also a few special cases I included: employee '002' is already assigned for the entire duration of project P1001, so he won't be included in the output. Employee '444' is available for the entire duration. And employee '003' is available for just one day in a few cases - and notice also what happens when he is assigned from one project to another on adjacent dates (there is no slack there for him to work on P1001). You always need to include such special cases for thorough testing. (It is very likely that I still missed a few other possible special situations.)

Importantly, note that the reference to project P1001 is a red herring. Really what the query does is to find available employee/date combos for a given date range (interval), in this case between 1 JAN 2023 and 31 DEC 2023. It doesn't matter what that range represents.

drop table projectsm purge;
drop table empprojects purge;

create table empm     (
  empno varchar2(30),
  ename varchar2(40)  );

create table projectsm (
  project_number varchar2(30), 
  start_date     date, 
  end_date       date  );
    
create table empprojects (
  empno            varchar2(30), 
  project_number   varchar2(100), 
  project_in_date  date, 
  project_out_date date  );
  
insert into empm values ('001', 'Emp 001');
insert into empm values ('002', 'Emp 002');
insert into empm values ('003', 'Emp 003');
insert into empm values ('444', 'Emp 444');    

-- always do this if you give dates as strings:
alter session set nls_date_format='dd-MON-yyyy';

insert into projectsm values ('P1000','01-JAN-2023','31-DEC-2023');
insert into projectsm values ('P1001','01-JAN-2023','31-DEC-2023');
insert into projectsm values ('P1002','01-MAR-2023','31-OCT-2023');
insert into projectsm values ('P1003','01-JAN-2022','31-DEC-2024');
insert into projectsm values ('P1004','15-APR-2023','20-JAN-2024');

insert into empprojects values ('001','P1000','15-JAN-2023','06-JUN-2023');
insert into empprojects values ('001','P1000','30-JUN-2023','17-JUL-2023');
insert into empprojects values ('002','P1003','15-JAN-2022','31-DEC-2023');
insert into empprojects values ('003','P1000','02-JAN-2023','29-APR-2023');
insert into empprojects values ('003','P1000','01-MAY-2023','14-MAY-2023');
insert into empprojects values ('003','P1002','15-MAY-2023','06-JUN-2023');
insert into empprojects values ('003','P1004','30-AUG-2023','30-DEC-2023');

commit;

with
  proj_dates (start_date, end_date) as (
    select start_date, end_date
    from   projectsm
    where  project_number = 'P1001'
  )
, prep (empno, project_in_date, project_out_date) as (
    select empno, null, start_date - 1
    from   empm cross join proj_dates
    union all
    select empno, project_in_date, project_out_date
    from   empprojects
    where  project_out_date >= (select start_date from proj_dates)
           and
           project_in_date  <= (select end_date   from proj_dates)
  )
, comp (empno, project_in_date, project_out_date) as (
    select empno,
           project_out_date + 1,
           lead(project_in_date - 1, 1, (select end_date from proj_dates))
               over (partition by empno order by project_out_date)
    from   prep
  )
select empno, project_in_date, project_out_date
from   comp
where  project_in_date <= project_out_date
order  by empno, project_in_date
;


EMPNO PROJECT_IN_DATE  PROJECT_OUT_DATE
----- ---------------- ----------------
001   01-JAN-2023      14-JAN-2023     
001   07-JUN-2023      29-JUN-2023     
001   18-JUL-2023      31-DEC-2023     
003   01-JAN-2023      01-JAN-2023     
003   30-APR-2023      30-APR-2023     
003   07-JUN-2023      29-AUG-2023     
003   31-DEC-2023      31-DEC-2023     
444   01-JAN-2023      31-DEC-2023 

Chris Saxon
May 17, 2023 - 12:59 pm UTC

Thanks for sharing

Understanding the requirement from my requirement

Rajesh, May 17, 2023 - 3:21 pm UTC

Hi @mathguy
Well, I have to start with "You said the new project has the same start and end dates as the existing one. That doesn't seem to be relevant to the problem. Rather, the input should be a table of existing assignments (to possibly more than one project!) and a new project with given start and end dates. You must find availability for every employee, for the span of the new project", that is not my requirement. We are HVAC contractor with more than 300 villa projects every year & an assessment of available man power is made, based on the skills required. Hence, we never need to know who all are available when we are doing the exercise to find out when a particular skill is available for a particular slot. I believe the requirement was clearly presented and sorry, if I didn't explain well with my previous posts.

Let me be further clear with an example

Employee A is a fabricator, B is a welder. For my new project I don't need a welder, just a fabricator. So, I need to see what are the date ranges A is not assigned in other projects, and that is what exactly I tried to explain while posting the query.

I have tested Chris's solution within a limited scope & I will be more than happy to test your solution as well.

Thank you very much mathguy! You helped me with another complex situation couple of years back. Appreciate your helps always.
Chris Saxon
May 18, 2023 - 12:23 pm UTC

As mathguy says below, it should be possible to adapt either solution to solve this requirement

Different skills

mathguy, May 17, 2023 - 5:06 pm UTC

Hi @Rajesh,

What you describe isn't too different from what I posted. The additional detail is in an unrelated direction anyway. What I said about "the date range given as input doesn't need to be related to a project" still stands; what you do need, in addition to the date range (which may come from a potential project, but where it comes from has no relevance to how to solve the problem), is also to know what skills you need. So, if you need fabricators, but not welders, then from your table of all available workers, you will filter on skill, so only fabricators are included. You should still include all fabricators from your "employees" table, just in case there's one who is not yet assigned to ANY project in the "empprojects" table.

This assumes I understood your task, with the additional clarification. Rephrasing: Here is my table with all my workers and their skills, and another table with all open projects and the ranges (date intervals) each worker is already assigned to various projects. Now I have a new potential project, with a given date range when we must work on it, and here is the skill (or several skills) needed. Show me, by "skill needed", which of my workers with the required skills are available (not already assigned to other projects), at any time during the date range for the new project.

If this is correct, the EMP table must include a column for "skill", the inputs - in addition to the new time range - must also include skill or skills needed, the output must include an additional column for skill (can be picked up from EMP with no extra effort), and you need an additional WHERE filter so you only look at the skill or skills you need for this project. It should be easy to adapt what I posted (and similar for Chris's approach).
Chris Saxon
May 18, 2023 - 12:23 pm UTC

Well said

Current Tables vs History Tables

rob squire, May 18, 2023 - 7:09 pm UTC

Please permit me to make the following observations.

Looking at your schema, i notice some interesting things.

The empm table is a relational / 'current view' table. It has a primary key empno and a potential unique key (ename).

However, the projectsm table is not relational. By adding start_date and end_date columns it becomes a 'History of Projects'.
This means that there is no meaningful unique key for that table.

The empprojects table is an intersection between empm and projectsm and is also a history table. The history of the when employees are in projects i.e. a history of the relationship between empm and projectsm. Again, this table not relational since does not have a meaningful unique key.

Looking at the data, there is no date when an emp is in more than one project at the same time. This means that the relationship between empm and projectsm is one:many and in the current view would be modelled as follows

create table empm (
empno varchar2(30)
ename varchar2(40)
project_number varchar2(30) );

create table projectsm
project_number varchar2(30) );

alter table empm ADD (CONSTRAINT
empm_projectsm_fk FOREIGN KEY
(project_number) REFERENCES projectsm
(project_number));

The problem is that you are trying to use relational joins on non-relational History tables which is very probelmatic (as the above answers show).

SQL joins only work in the current view i.e. when there is uniqueness in the data.

The answer is to model your data in the relational current view, and populate history tables as the data changes (with triggers that check for referential integrity between empm and projectsm) and then find a way to project the current view (for a known date) from your stored history data.

Oracle 23c has recently introduced JSON-Relational duality views which hold data in a relational manner. Unfortunately, they only store the last known state (not the history). If it did then time series type of reporting like the above requirement – which is common to most enterprise applications – would genuinely be “app simpler” to develop!

Chris Saxon
May 19, 2023 - 10:45 am UTC

Why exactly is projectsm "not relational"?

Granted there's no primary/unique key in the table defined, but presumably one could be added over either projectno project_number or ( project_number , start_date ), which case the table is normalized up to fifth normal form as far as I can tell.

JSON-Relational duality views do not store "last known state". They create a JSON representation of the underlying relational tables. You can then read/write JSON objects using the view. The database translates the writes to the relational tables.

This works with history tables. Here's a quick example to record employee job history:

create table employees (
   employee_id integer primary key
);

create table job_history (
   employee_id integer, job_id varchar2(30), start_date date, end_date date,
   primary key ( employee_id, start_date )
);
  
create or replace json duality view job_hist_v as 
select json {
         'empId' : e.employee_id,
         'jobs' : [
           select json { 'emp' : h.employee_id, 'job' : h.job_id, 'startDate' : start_date, 'endDate' : end_date }
           from   job_history h with insert update
           where  ( e.employee_id = h.employee_id )
         ]
       }
from   employees e with insert;

insert into job_hist_v values ( '{"empId":42,"jobs":[{"emp":42,"job":"DEVELOPER","startDate":"2023-01-21T00:00:00"}]}' );

select * from employees;
/*
EMPLOYEE_ID
-----------
         42
*/
select * from job_history;
/*
EMPLOYEE_ID JOB_ID                         START_DATE        END_DAT
----------- ------------------------------ ----------------- -------
         42 DEVELOPER                      21-JAN-2023 00:00 <null> 
*/        
update job_hist_v
set    data = '{"empId":42,"jobs":[       
  {"emp":42,"job":"DEVELOPER","startDate":"2023-01-21T00:00:00","endDate":"2023-05-19T00:00:00"},
  {"emp":42,"job":"SENIOR_DEVELOPER","startDate":"2023-05-19T00:00:00"}
  ]
}' ;
  
select * from job_history;
/*
EMPLOYEE_ID JOB_ID                         START_DATE        END_DATE         
----------- ------------------------------ ----------------- -----------------
         42 DEVELOPER                      21-JAN-2023 00:00 19-MAY-2023 00:00
         42 SENIOR_DEVELOPER               19-MAY-2023 00:00 <null>           
*/



Interesting it gets!

Rajesh, May 22, 2023 - 5:44 am UTC

Well, I've already mentioned, this is simplest of data representation that I could have thought of while posting the query. The tables used for the application have much detailed information & my current PL/SQL block is "quite" capable of listing the empty slot for employee or interest.
I think I can avoid a number of confusions in the future by including the actual tables structures..

I've found that, Chris's solution has some gaps, when I add one more project to the table. I will post the whole situation soon. Running late on few projects, apologies.

History tables continued ...

Rob Squire, May 22, 2023 - 9:40 am UTC

Hi Connor

You said ....
"Granted there's no primary/unique key in the table defined, but presumably one could be added.."

I say ....
If you try to create a unique key which includes both start_date and end_date you still can't stop gaps and overlaps in the data

You also said ....
"table is normalized up to fifth normal form as far as I can tell."

I say ....
Adding start_date and end_date actually breaks 3NF/BCNF


So, we were taking about the empm, projectm and empprojects tables and I pointed out that projectsm and empprojects are history tables because they have from / to dates in them. Creating useful Unique keys on such tables is problematic and creating meaningful joins TO such tables is not possible.

The same is true of your job_history table but your duality view example does not have any joins or foreign keys in it and so doesn't address the point.

However, in job_history, with the current defined primary key, I can for example ...

insert into job_history (employee_id, job_id, start_date, end_date)
values (42, 'PROJECT MANAGER', to_date('22-JAN-2023','DD-MON-YYYY'), null);


and so
on ANY date from 23-JAN-2023, the employee is both a DEVELOPER and a PROJECT MANAGER
on ANY date from 19-MAY-2023, the employee is both a SENIOR DEVELOPER and a PROJECT MANAGER

The end_date is important.

This is a trivial example but If you just try to populate and join to history tables defined as discussed above then you will see for yourself.

Also, you may also need to correct these dates, so using them in the primary key is not a good idea.
Chris Saxon
May 22, 2023 - 12:55 pm UTC

If you try to create a unique key which includes both start_date and end_date you still can't stop gaps and overlaps in the data

It is possible to make constraints to prevent gaps & overlaps, it's just fiddly/impractical with constraints as they exist at this time:



Adding start_date and end_date actually breaks 3NF/BCNF

Which functional dependencies does this break?

creating meaningful joins TO such tables is not possible

How so? The DV example above includes a join (from employees to job_history) - how is this "not meaningful"? What exactly is a "meaningful" join?

Constraints

rob squire, May 22, 2023 - 4:36 pm UTC

Which functional dependencies does this break?

I mean that in 3NF there are no transitive dependencies for the non-prime attributes i.e. the various columns in the record are independent.
Start and End dates are not independent - either within the row or between rows.

How so? The DV example above includes a join (from employees to job_history)
Your DV example above doesn't actually JOIN TO the history, it simply filters job_history rows by employee_id in the where clause.

What exactly is a "meaningful" join?
By 'meaningful' join I mean a join that provides a correct set of rows in the result and you haven't shown a JOIN TO a history table yet.
Also what if the employees table also has start and end dates and there is also a departments table with start / end dates?
Try joining employees, departments and job-history together. I have tried and I know it doesn't work.

Your presentation on 'advanced constraints' was very nice, but you only show a technique for ensurimg start/end dates are not overlapping across rows. You also mention several times that the technique is limited / doesnt work in practice.

Also you don't show a JOIN TO a table with start / end dates.

This is my main point Joining TO a 'history' tables doesnt actually work.

Chris Saxon
May 22, 2023 - 4:59 pm UTC

Start and End dates are not independent - either within the row or between rows.

There is a constraint between these values - the end has to be after the start. But this is not a functional dependency (for every value of start date the end date must be the same).

(Technically there could be, but I would find it very strange to say, e.g. every time the start date is 1 Jan 2023, the end date must be 31 Dec 2023)

Normal forms only concern themselves with functional dependencies.

Your DV example above doesn't actually JOIN TO the history, it simply filters job_history rows by employee_id in the where clause.

You could easily rewrite the query as a join though, right?

Also what if the employees table also has start and end dates and there is also a departments table with start / end dates?
Try joining employees, departments and job-history together. I have tried and I know it doesn't work.


Show us what you've tried!

This is my main point Joining TO a 'history' tables doesnt actually work.

You've stated this many times, yet not given a concrete example where this "doesn't work" for us to discuss.

Joining to History Tables

rob squire, May 22, 2023 - 9:56 pm UTC

You've stated this many times, yet not given a concrete example where this "doesn't work" for us to discuss.

As you suggest, I have posted a new question into Ask Tom called 'Joining to History Tables' which has the DDL/DML for an example.

Perhaps you could show me how you would use joins to get the correct results

Powerpoint

rob squire, May 22, 2023 - 10:04 pm UTC

Is there any way to send a ppt to ask tom with the re;evant information to get with the DDL and DML?
Connor McDonald
May 23, 2023 - 4:55 am UTC

You can always email us (with the question ID as the subject) to

asktom_us@oracle.com

As per legal terms on the site, anything you email to us should never contain sensitive information etc, and we will destroy it once the question has been answered.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.