Skip to Main Content
  • Questions
  • Getting lowest record from duplicates

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Parthasaradhi.

Asked: April 11, 2019 - 2:57 pm UTC

Last updated: April 15, 2019 - 4:28 am UTC

Version: 12.1

Viewed 1000+ times

You Asked

Hi Tom,

Following is the View definition.

SELECT     C.VERSION,
           C.DOW,
           C.DELV_TYPE_CODE,
           C.CURR_DELVPT_SYS_ID,
           C.EMP_ID,
           C.ZIP5,
           C.ZIP4,
           C.ZIP2,
           C.RTE,
           C.EVENT_DATE,
           TO_CHAR (
                 TRUNC (SYSDATE)
               + (  GREATEST (
                        (B.EVENT_SECONDS + C.DELIVERY_SECONDS),
                        0)
                  / 86400),
               'HH24MI')
               DELIVERY_TIME,
           GREATEST (
               FLOOR ( (B.EVENT_SECONDS + C.DELIVERY_SECONDS) / 60),
               0)
               DELIVERY_MINUTES,
           C.DELIVERY_SECONDS      RELATIVE_DELIVERY_SECONDS,
           C.DELIVERY_SECONDS / 60 RELATIVE_DELIVERY_MINUTES,
           NULL                         REGULAR_ROUTE,
           NULL                         DEVIATION,
           TO_CHAR (
                 TRUNC (SYSDATE)
               + (  GREATEST (
                        (B.EVENT_SECONDS + C.DELIVERY_SECONDS),
                        0)
                  / 86400),
               'HH24MISS')
               DELIVERY_COMMUTE,
           GREATEST ( (B.EVENT_SECONDS + C.DELIVERY_SECONDS), 0)
               DELIVERY_SECONDS,
           C.DELIVERY_SECONDS_LIST,
           C.DELIVERY_TIME_FLAGS,
           C.MID_POINT_STATUS,
           TRUNC (B.EVENT_DTM)      DEPART_DATE,
           B.EVENT_DTM              DEPART_TIME,
           B.EVENT_SECONDS          DEPART_SECONDS,
           C.SEQ_NBR,
           C.DELIVERY_SEQ_RATIO,
           C.DELIVERY_SEQ_PERCENT,
           B.CREATE_DATE            MODIFIED_DATE
      FROM A.GEO_RTE_OFD_PRESENT_T  B
           JOIN A.GEO_RTE_C C
               ON     C.ZIP5 = B.ZIP5
                  AND C.RTE = B.RTE
                  AND C.EMP_ID = B.EMP_ID
                  AND C.DOW = TO_NUMBER (TO_CHAR (B.EVENT_DTM, 'D'))
                  AND C.VERSION = 12 ;


When we use the above view along with extra conditions to get the lowest sequence number from the duplicate records taking more time.

select v1.curr_delvpt_sys_id,
                   v1.emp_id,
                   v1.rte,
                   v1.zip5,
                   v1.zip5||v1.zip4||v1.zip2,
                   v1.relative_delivery_seconds,
                   v1.delivery_seconds,
                   v1.depart_time,
                   v1.depart_seconds,
                   v1.delivery_seq_ratio,
                   v1.delivery_seq_percent
              from a.geo_rte_exdw_present_v v1
             where v1.modified_date >= to_date('2019-02-08', 'YYYY-MM-DD')
               and v1.modified_date < (to_date('2019-02-08', 'YYYY-MM-DD') + 1)
               and v1.delivery_seconds = (select min(v2.delivery_seconds) 
                                            from a.geo_rte_exdw_present_v v2 
                                           where v2.modified_date >= to_date('2019-02-08', 'YYYY-MM-DD')
                                             and v2.modified_date < (to_date('2019-02-08', 'YYYY-MM-DD') + 1)
                                             and v2.zip5||v2.zip4||v2.zip2 = v1.zip5||v1.zip4||v1.zip2); 


QUESTION:

The query pulls data from this view which holds delivery information by route/ZIP11/modified_date with the caveat that ZIP11 is ZIP5|ZIP4|ZIP2 – as you noticed.
The problem is that the view holds multiple rows for the same route/ZIP11/modified_date combination and I only must have one… the one with the lowest DELIVERY_SECONDS.
There will only be one record per route/ZIP11/modified_date combination with the same DELIVERY_SECONDS. So DELIVERY_SECONDS is like a sequence for the rows by route/ZIP11/modified_date.
I need the one record with the lowest sequence.
In other words: I need to dedupe the view and only get the one record from the list of (route/ZIP11/modified_date) with the lowest DELIVERY_SECONDS value.
This is basically a dedupe operation.


I really appreciate your help on this.

-Partha

and Connor said...

Try something like this

select *
from (
select v1.curr_delvpt_sys_id,
                   v1.emp_id,
                   v1.rte,
                   v1.zip5,
                   v1.zip5||v1.zip4||v1.zip2,
                   v1.relative_delivery_seconds,
                   v1.delivery_seconds,
                   v1.depart_time,
                   v1.depart_seconds,
                   v1.delivery_seq_ratio,
                   v1.delivery_seq_percent,
                   row_number() over ( partition by v1.zip5,v1.zip4,v1.zip1 order by v1.delivery_seconds ) as seq
              from a.geo_rte_exdw_present_v v1
             where v1.modified_date >= to_date('2019-02-08', 'YYYY-MM-DD')
               and v1.modified_date < (to_date('2019-02-08', 'YYYY-MM-DD') + 1)
)
where seq = 1


Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Getting lowest sequence number dedupes

Partha, April 12, 2019 - 2:56 am UTC

This is most useful query to get the lowest sequence number dedupes.


Connor McDonald
April 15, 2019 - 4:28 am UTC

glad to help

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.