Performance Issue with Function with multiple update statements

Chris Saxon

Thanks for the question, Mohammed Imran.

Asked: November 12, 2015 - 5:53 am UTC

Last updated: November 23, 2015 - 11:04 am UTC

Version: 11.2.0.3.0


You Asked

Hi,

I need a suggestion on how to improve the performance of a function that executes multiple update statements in a loop over 1.5 million records. Below are more details.

The function has a loop that traverses 1.5 million records in a PL/SQL table (collection) fetched from a table (let's assume the table name is tab1):

FOR indx IN coll_var.FIRST .. coll_var.LAST LOOP


In each iteration, 2 select statements are executed on the same table tab1.
In each iteration, 4-5 update statements are executed on the same table tab1.
A commit is issued only after every 200 iterations are completed.

Currently the function takes around 7-8 hours to execute (much higher than expected). The AWR and ADDM reports suggest increasing memory (which cannot be done) and show the update statements as the most CPU-consuming statements (the SQL spent 95% of its database time on CPU, I/O and Cluster waits), though they are all using index range scans. The SQL with the most CPU time looks like:
CPU Time (s)  Executions  CPU per Exec (s)  %Total  Elapsed Time (s)  %CPU   %IO   SQL Id         SQL Module  SQL Text
------------  ----------  ----------------  ------  ----------------  -----  ----  -------------  ----------  --------
2,449.86      128,359     0.02              42.14   2,544.85          96.27  0.14  3g600mxfzbd4x  SQL*Plus    UPDATE ...
Though the CPU per execution is only 0.02 seconds, the time across all executions is high. Hence I guess there is nothing wrong with the update statement itself; the problem is the number of times it is executed.
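Rough arithmetic bears this out: 128,359 executions x 0.02 s is roughly 2,550 seconds of CPU for this one statement alone, and with 1.5 million iterations each running 6-7 select/update statements, the total statement-execution count runs into the millions.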

Another observation I made is about memory consumption (from v$process) while the function is executing. I am not sure whether the values below are higher than they should be.
PGA_USED_MEM  PGA_ALLOC_MEM  PGA_FREEABLE_MEM  PGA_MAX_MEM  Notes
------------  -------------  ----------------  -----------  ------------------------------------------
837328        952632         0                 952632       when the function was not executing
129350184     130124088      393216            130124088    when the function was executing
41326496      130124088      88276992          130124088    some time later, function still executing
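For reference, the figures above can be pulled with a query along these lines (:sid being the session running the function; v$session.paddr joins to v$process.addr):

select p.pga_used_mem, p.pga_alloc_mem, p.pga_freeable_mem, p.pga_max_mem
from   v$process p
where  p.addr = (select s.paddr
                 from   v$session s
                 where  s.sid = :sid);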

I guess the memory consumption is high due to the collection variable, which holds 200 records at a time. Could this be the cause of the poor performance? As an alternative I changed the logic to loop over a cursor instead of a collection, but found no improvement, and the memory stats also looked the same.

Current parameter values:
pga_aggregate_target 1000341504
sga_target 3003121664
sga_max_size 3003121664

I tested various functions with small alterations to the logic mentioned above but found no solution, hence I request you to suggest how to proceed further in resolving the issue. Please let me know if you need more information.

Here the way the nth row is treated in the function depends upon how the prior (n-1) rows were processed/updated. Below is the algorithm for your better understanding.

FUNCTION f1 RETURN NUMBER IS
  CURSOR c1 IS SELECT * FROM tab1;

  buffer_lines_count NUMBER := -100;
  -- (l_collection_var and l_tab1_rec_type are declared in the full body.)
BEGIN
  -- Fetch the data in bulk (200 records at a time) into the collection variable.
  OPEN c1;
  LOOP
    FETCH c1 BULK COLLECT INTO l_collection_var LIMIT 200;

    FOR indx IN 1 .. l_collection_var.COUNT LOOP
      /* Logic, 1st half */
      BEGIN
        SELECT *
        INTO   l_tab1_rec_type
        FROM   tab1
        WHERE  lng     = l_collection_var(indx).lng_s
        AND    lat     = l_collection_var(indx).lat_s
        AND    rank    = l_collection_var(indx).rank
        AND    line_no < -100;
      EXCEPTION
        WHEN NO_DATA_FOUND THEN
          l_tab1_rec_type := NULL;
      END;

      IF l_tab1_rec_type.line_no IS NOT NULL THEN
        UPDATE tab1
        SET    line_no     = DECODE(line_seq_no, 1, NULL, line_no),
               line_seq_no = l_tab1_rec_type.pnt_cnt + 1,
               pnt_cnt     = l_tab1_rec_type.pnt_cnt + 1
        WHERE  line_no = l_collection_var(indx).line_no;

        UPDATE tab1
        SET    line_no     = l_collection_var(indx).line_no,
               line_seq_no = DECODE(l_tab1_rec_type.line_seq_no, 1,
                               l_tab1_rec_type.pts_cnt - line_seq_no + 1,
                               line_seq_no),
               pnt_cnt     = pnt_cnt + 1
        WHERE  line_no = l_tab1_rec_type.line_no;
      END IF;

      /* The 2nd half of the logic is almost the same as the 1st, with changes
         to the select query's where-clause input column(s) and minor changes
         to the update statements. */

      IF /* some condition */ THEN
        buffer_lines_count := buffer_lines_count - 1;

        UPDATE tab1
        SET    line_no = buffer_lines_count
        WHERE  line_no = l_collection_var(indx).line_no;
      END IF;
    END LOOP;

    COMMIT;
    EXIT WHEN l_collection_var.COUNT < 200;
  END LOOP;

  RETURN 0;
END;

I think I made the function look simpler by giving its algorithm; let me know if you need the entire body.

Also, a bulk operation is not possible while updating the data because row-by-row processing needs to be done.

Thanks in advance.
Imran.

and Chris said...

There are a couple of bad practices in the code:

- row by row (slow by slow) processing
- checking if a row exists before doing the update

For a first pass, you could:

- Remove the select. Modify the updates to do this directly
- Take these out of the inner loop (indx IN 1 .. l_collection_var.COUNT) and use bulk processing (forall) to do the updates

These would now look something like:

forall indx IN 1 .. l_collection_var.COUNT
  Update tab1 t
  set line_no   = decode(line_seq_no,1,null,line_no),
    (line_seq_no, pnt_cnt) = (Select pnt_cnt+1, pnt_cnt+1
                              from tab1 s
                              where lng   = l_collection_var(indx).lng_s
                              and lat     = l_collection_var(indx).lat_s
                              and rank    = l_collection_var(indx).rank
                              and line_no < -100)
  where exists (Select line_no
                from tab1 s
                where lng   = l_collection_var(indx).lng_s
                and lat     = l_collection_var(indx).lat_s
                and rank    = l_collection_var(indx).rank
                and line_no < -100);
                
forall indx IN 1 .. l_collection_var.COUNT
  Update tab1
  set line_no   = l_collection_var(indx).line_no,
    line_seq_no = decode(l_tab1_rec_type.line_seq_no,1,l_tab1_rec_type.pts_cnt - line_seq_no + 1,line_seq_no), 
    pnt_cnt     = pnt_cnt+1
  where line_no in (Select line_no
                    from tab1
                    where lng   = l_collection_var(indx).lng_s
                    and lat     = l_collection_var(indx).lat_s
                    and rank    = l_collection_var(indx).rank
                    and line_no < -100);


You may be able to go even further and remove the explicit cursor and do all the updates directly. I'm not clear how you would do this because all the tables are called tab1. I'm guessing this isn't the case in reality.


Comments

Current Logic is as per requirement

Mohammed Imran Pasha, November 16, 2015 - 6:22 am UTC

There are a couple of bad practices in the code:

- row by row (slow by slow) processing ==> Row-by-row processing is required in this case because the way the current row is processed depends upon the updates made to all previous rows (as mentioned in my question as well).
- checking if a row exists before doing the update ==> As you suggested, removing the select statement will require changing the subsequent update to include a select subquery, so a select query will still be executed even after the change; why do you think this change would improve performance?

I guess telling you more about why this function is required will help you evaluate it better. Below are the requirements and the expected output of the function.

The function should join the line segments in table tab1 by comparing their coordinates (longitude, latitude). Below is an example:

LINE_NO  LINE_SEQ_NO  PTS_CNT  LONGITUDE  LATITUDE
-------  -----------  -------  ---------  ----------
      1            1        2   81910000  -122430000
      1            2        2   81921000  -122429000
      2            1        2   81910000  -122430000
      2            2        2   81902000  -122428000

The example above shows 2 line segments (1 and 2). Each record corresponds to an end point of a line, hence there are 4 records shown for the 2 line segments. LINE_SEQ_NO represents the order/direction of the line: in the example above, line_seq_no = 1 indicates the start point of the line and 2 (line_seq_no = pts_cnt) indicates the end point. Initially table tab1 will have million(s) of such line segments, and the function is responsible for joining them without any limitation on the number of segments a line can contain; for example, a line can have thousands of segments, which means line_seq_no/pts_cnt values in the thousands. Further, in the example above the records with line_no 1 & 2 and line_seq_no 1 & 1 respectively have the same coordinates, hence segments 1 and 2 can be merged, so the line after the update would look like:

LINE_NO  LINE_SEQ_NO  PTS_CNT  LONGITUDE  LATITUDE
-------  -----------  -------  ---------  ----------
                   2        3   81910000  -122430000
      1            3        3   81921000  -122429000
      1            2        3   81910000  -122430000
      1            1        3   81902000  -122428000

The 1st record's line_no is updated to NULL since its coordinate already exists on the other segment, so it is not required, and line_seq_no and pts_cnt are updated accordingly. Here the direction of the line is not important; I mean the line_seq_no values being 1,2,3 or vice versa makes no difference.

Currently my logic consists of a loop which traverses line by line and merges segments by comparing each line's coordinate at line_seq_no 1, then updating, and at line_seq_no 2, then updating. Hence the current logic processes row by row (which could be the cause of the slow performance) and always updates the same table tab1. At the end of the loop, after the updates are done, if the start and end coordinates of the newly formed line are the same, the line is saved as is; otherwise it is moved to a buffer (by updating line_no to a number less than -100), on the assumption that there is scope to merge the new line with others.

Before writing the logic for the function I considered using a collection instead of the table (tab1) for processing, since the time to select/update table data could be saved, and later, after the entire processing, the collection data could be bulk inserted into tab1; but I found that searching and changing data in the collection consumes almost the same or more time. So I continued using the table to implement the logic. Therefore I request you to look at the issue from a higher level; any suggestions that require an entire code rewrite are also welcome.

Let me know if you need more information.

Connor McDonald
November 23, 2015 - 11:04 am UTC

You can almost always convert row-by-row to set based. Sometimes you need to think carefully however ;)

Performance isn't the real issue with separate statements. The potential problem is read consistency: if there are other users, the data can change between you reading it and executing the update. Placing the select within the update reduces the number of statements you execute, so you're saving on parsing. It's possible, however, that moving it into one query will result in a slower execution plan.
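For example, a sketch using the columns from your function (:lng and :lat standing in for the collection values):

-- Two statements: another session can change or delete the matched row
-- between the select and the update.
select line_no into l_match from tab1 where lng = :lng and lat = :lat;
update tab1 set pnt_cnt = pnt_cnt + 1 where line_no = l_match;

-- One statement: the lookup and the update share the same
-- read-consistent view of the data.
update tab1
set    pnt_cnt = pnt_cnt + 1
where  line_no = (select line_no from tab1
                  where  lng = :lng and lat = :lat);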

Here's how I would approach this:

- Find all the duplicate lat/longs and set all but the original to null. You can do this with a statement like:

update t t
set    lseq = null
where  rowid not in (select min(rowid)
                     from   t
                     group  by lat, lng);


If there are three or more lines that share co-ordinates you may end up with all lines having null sequences. If this is the case you'll need to tweak this query for the rest of this process to work.

- Reassign the line numbers for those where there exists a null sequence. You could do this with something like the statement below. It finds all the line numbers for the rows where there is a duplicate lat/long, but the sequence wasn't set to null:

update t t
set    lno = (select distinct lno from t 
              where  rowid in 
               (select min(rowid)
                from   t
                group  by lat, lng)
              and not exists (select * from t s where s.lno = t.lno and s.lseq is null)
             )
where  exists (select * from t s where s.lno = t.lno and s.lseq is null)
and    lseq is not null; 


There may be better ways of doing this in your environment. You'll need to test to see what works.

- Update the counts and the sequence numbers as appropriate.

update t t
set    (pts_c, lseq) = (select p, rn from (
                          select lno, lseq, lat, lng,
                                 count(*) over( partition by lno) p, 
                                 row_number() over ( partition by lno order by lat, lng) rn
                          from   t
                          where  lseq is not null
                          ) v
                        where  v.lno = t.lno
                        and    v.lat = t.lat
                        and    v.lng = t.lng
                       )
where  lseq is not null;


Alternatively, as you're working with geometry you may want to take a look at Oracle Spatial. It may have the features you need.
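For instance (a sketch only, assuming the segments were loaded into a hypothetical table seg with an SDO_GEOMETRY column called geom), the SDO_AGGR_CONCAT_LINES aggregate concatenates touching line segments:

-- Returns the connected segments joined into longer line strings.
select sdo_aggr_concat_lines(geom)
from   seg;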

Full example of the above code:

create table t (lno int, lseq int, pts_c int, lng int, lat int);

insert into t values (1, 1, 2, 81910000, -122430000); 
insert into t values (1, 2, 2, 81921000, -122429000); 
insert into t values (2, 1, 2, 81910000, -122430000); 
insert into t values (2, 2, 2, 81902000, -122428000);

commit;

update t t
set    lseq = null
where  rowid not in (select min(rowid)
                     from   t
                     group  by lat, lng);
        
update t t
set    lno = (select distinct lno from t 
              where  rowid in 
               (select min(rowid)
                from   t
                group  by lat, lng)
              and not exists (select * from t s where s.lno = t.lno and s.lseq is null)
             )
where  exists (select * from t s where s.lno = t.lno and s.lseq is null)
and    lseq is not null; 

update t t
set    (pts_c, lseq) = (select p, rn from (
                          select lno, lseq, lat, lng,
                                 count(*) over( partition by lno) p, 
                                 row_number() over ( partition by lno order by lat, lng) rn
                          from   t
                          where  lseq is not null
                          ) v
                        where  v.lno = t.lno
                        and    v.lat = t.lat
                        and    v.lng = t.lng
                       )
where  lseq is not null;

select * from t;

       LNO       LSEQ      PTS_C        LNG        LAT
---------- ---------- ---------- ---------- ----------
         1          1          3   81910000 -122430000
         1          2          3   81921000 -122429000
         2                     2   81910000 -122430000
         1          3          3   81902000 -122428000

Could using the CONNECT BY clause be one possible solution?

Mohammed Imran Pasha, March 18, 2016 - 2:51 pm UTC

Hi,

Thanks for bearing with all the queries and sorry for the long delay. I am back to the performance issue which I left halfway.

As an alternate solution to the problem I am trying to use the CONNECT BY clause to merge the lines together.
Below are scripts with the table and data. Let me know if you need more information.

CREATE TABLE tab1
(
   line_no          NUMBER
  ,line_seq_no      NUMBER
  ,line_direction   CHAR(1 BYTE)
  ,free_end_ind     CHAR(1 BYTE)
  ,from_longitude   NUMBER
  ,from_latitude    NUMBER
  ,to_longitude     NUMBER
  ,to_latitude      NUMBER
  ,unique_num       NUMBER
);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14641, 1, 'S', 'B', 168.35207, 
    -46.41314, 168.35368, -46.41314, 6);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14642, 1, 'S', 'B', 168.35368, 
    -46.41314, 168.35693, -46.41314, 7);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14643, 1, 'S', 'B', 168.35693, 
    -46.41314, 168.357, -46.41314, 10);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14644, 1, 'S', 'B', 168.357, 
    -46.41314, 168.35805, -46.41313, 12);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14645, 1, 'S', 'B', 168.35805, 
    -46.41313, 168.3615, -46.41312, 13);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14646, 1, 'S', 'B', 168.3615, 
    -46.41312, 168.36422, -46.41306, 15);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14650, 1, 'S', 'B', 168.36422, 
    -46.41306, 168.3671, -46.41246, 29);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14652, 1, 'S', 'B', 168.3671, 
    -46.41246, 168.37116, -46.41148, 33);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14656, 1, 'S', 'B', 168.37116, 
    -46.41148, 168.37849, -46.40972, 43);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14658, 1, 'S', 'B', 168.37849, 
    -46.40972, 168.37894, -46.40961, 47);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14660, 1, 'S', 'B', 168.37894, 
    -46.40961, 168.38088, -46.40914, 50);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14662, 1, 'S', 'B', 168.38088, 
    -46.40914, 168.38215, -46.40883, 54);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14664, 1, 'S', 'B', 168.38215, 
    -46.40883, 168.38494, -46.40817, 61);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14666, 1, 'S', 'B', 168.38494, 
    -46.40817, 168.38649, -46.40778, 65);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14670, 1, 'S', 'B', 168.38649, 
    -46.40778, 168.39041, -46.40686, 72);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14672, 1, 'S', 'B', 168.39041, 
    -46.40686, 168.39186, -46.40651, 77);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14674, 1, 'S', 'B', 168.39186, 
    -46.40651, 168.39337, -46.40608, 81);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14680, 1, 'S', 'B', 168.39337, 
    -46.40608, 168.39545, -46.40539, 92);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14682, 1, 'S', 'B', 168.39545, 
    -46.40539, 168.39834, -46.40442, 95);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14685, 1, 'S', 'B', 168.39834, 
    -46.40442, 168.39887, -46.40419, 98);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14686, 1, 'S', 'B', 168.39887, 
    -46.40419, 168.41102, -46.40086, 102);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14691, 1, 'S', 'B', 168.41102, 
    -46.40086, 168.41779, -46.39924, 112);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14692, 1, 'S', 'B', 168.41102, 
    -46.40086, 168.41777, -46.3992, 114);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14693, 1, 'S', 'B', 168.41779, 
    -46.39924, 168.42015, -46.39869, 116);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14694, 1, 'S', 'B', 168.41777, 
    -46.3992, 168.42012, -46.39862, 117);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14695, 1, 'S', 'B', 168.42015, 
    -46.39869, 168.42278, -46.39801, 121);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14696, 1, 'S', 'B', 168.42012, 
    -46.39862, 168.42278, -46.39801, 122);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14702, 1, 'S', 'B', 168.42278, 
    -46.39801, 168.44123, -46.39219, 133);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14705, 1, 'S', 'B', 168.44123, 
    -46.39219, 168.44198, -46.39179, 137);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14708, 1, 'S', 'B', 168.44198, 
    -46.39179, 168.44537, -46.38961, 144);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14709, 1, 'S', 'B', 168.44123, 
    -46.39219, 168.44537, -46.38961, 145);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14714, 1, 'S', 'B', 168.44537, 
    -46.38961, 168.45362, -46.3846, 155);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14927, 1, 'S', 'B', 168.45362, 
    -46.3846, 168.47328, -46.38075, 171);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14928, 1, 'S', 'B', 168.47328, 
    -46.38075, 168.48361, -46.37876, 175);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14931, 1, 'S', 'B', 168.48361, 
    -46.37876, 168.49542, -46.37647, 180);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14936, 1, 'S', 'B', 168.49542, 
    -46.37647, 168.50514, -46.37459, 189);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14937, 1, 'S', 'B', 168.50514, 
    -46.37459, 168.50588, -46.37446, 191);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14938, 1, 'S', 'B', 168.50588, 
    -46.37446, 168.51841, -46.37183, 194);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14942, 1, 'S', 'B', 168.51841, 
    -46.37183, 168.52407, -46.36903, 199);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14947, 1, 'S', 'B', 168.52407, 
    -46.36903, 168.54147, -46.36018, 208);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14949, 1, 'S', 'B', 168.54147, 
    -46.36018, 168.54751, -46.35709, 212);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14950, 1, 'S', 'B', 168.54751, 
    -46.35709, 168.54802, -46.35679, 213);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14952, 1, 'S', 'B', 168.54802, 
    -46.35679, 168.55038, -46.35564, 216);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14953, 1, 'S', 'B', 168.54802, 
    -46.35679, 168.55031, -46.35558, 217);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14954, 1, 'S', 'B', 168.55038, 
    -46.35564, 168.55081, -46.35541, 219);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14955, 1, 'S', 'B', 168.55031, 
    -46.35558, 168.55075, -46.35535, 221);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14956, 1, 'S', 'B', 168.55075, 
    -46.35535, 168.55217, -46.35407, 225);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14957, 1, 'S', 'B', 168.55081, 
    -46.35541, 168.55217, -46.35407, 226);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14958, 1, 'S', 'B', 168.55217, 
    -46.35407, 168.55249, -46.35364, 227);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14964, 1, 'S', 'B', 168.55249, 
    -46.35364, 168.56585, -46.34044, 243);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14965, 1, 'S', 'B', 168.56585, 
    -46.34044, 168.57014, -46.33746, 245);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14967, 1, 'S', 'B', 168.57014, 
    -46.33746, 168.5801, -46.33057, 248);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14971, 1, 'S', 'B', 168.5801, 
    -46.33057, 168.60184, -46.31736, 259);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14972, 1, 'S', 'B', 168.60184, 
    -46.31736, 168.60187, -46.31735, 261);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14973, 1, 'S', 'B', 168.60187, 
    -46.31735, 168.60245, -46.31715, 262);
Insert into TAB1
   (LINE_NO, 
    LINE_SEQ_NO, LINE_DIRECTION, FREE_END_IND, FROM_LONGITUDE, FROM_LATITUDE, 
    TO_LONGITUDE, TO_LATITUDE, UNIQUE_NUM)
 Values
   (14977, 1, 'S', 'B', 168.60245, 
    -46.31715, 168.61233, -46.3144, 269);
COMMIT;


Now, my goal is to merge all the lines in tab1 by comparing their longitudes and latitudes, and since the earlier solution (discussed in the previous questions above) has a huge performance impact, the 2nd solution that I could think of is to merge the lines using the CONNECT BY clause.

Though the query below with CONNECT BY merges the lines, there are 2 problems, and I need your help in resolving them.

Problem 1. The NEW_LINE_SEQ value 22 in the result set is shown twice because Oracle found two records (unique_num 114 and 112) satisfying the condition in the CONNECT BY clause; however I want Oracle to join with any one record. How can this be done by altering the query below?

Problem 2. The total number of records in table tab1 is 56, but the query below shows a total count of 167 because Oracle is reusing the records. For example, the unique_num 155 record is shown in new_line_seq 27 and 28; however it should be shown only once, in any one new_line_no and new_line_seq combination. How can this be done by altering the query below?

If no solution to problem 2 is found, then I may have to manually fetch the repeating points, pass them to the query below, and update table tab1 in a loop, which could be a performance bottleneck again! Please suggest.

SELECT CONNECT_BY_ROOT line_no AS "NEW_LINE_NO"
      ,LEVEL AS "NEW_LINE_SEQ"
      ,from_longitude
      ,from_latitude
      ,to_longitude
      ,to_latitude
      ,DECODE(
          LEVEL
         ,CONNECT_BY_ISLEAF, free_end_ind
         ,DECODE(DECODE(LEVEL, 1, -1, DECODE(CONNECT_BY_ISLEAF, 1, -2, 0))
                ,-1, DECODE(free_end_ind, 'B', 'E', 'S', 'N')
                ,-2, DECODE(free_end_ind, 'B', 'T', 'T', 'N')
                ,'N'))
          AS "free_end_ind"
FROM   tab1
WHERE  free_end_ind IN ('B', 'S')
CONNECT BY PRIOR to_longitude = from_longitude
       AND PRIOR to_latitude  = from_latitude
START WITH from_longitude = 168.35207 AND from_latitude = -46.41314
ORDER BY 1, 2;


I have separate logic written to identify repeating (duplicate) points, and only such points will be used in the START WITH clause, since they have another line with the same point to which they will be merged.

The data in tab1 and the values in the START WITH clause above are used as an example. The production DB will have millions of records to be merged.
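For illustration, the repeating points could be found with a query along these lines (this sketch only covers start points shared by more than one segment; start-to-end matches would need a similar comparison of the from_* columns against the to_* columns):

-- Coordinates that appear as the start point of more than one segment.
SELECT from_longitude, from_latitude
FROM   tab1
GROUP  BY from_longitude, from_latitude
HAVING COUNT(*) > 1;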

Thanks,
Imran.




