I have a table '
raw_traffic' with approximately 35 columns titled
id,
status,
insert_date,
change_date,
device_name,
device_port,
class,
prev_id,
col_1,
col_2,
col_3,
.
.
.
col_N
Whenever I receive fresh traffic data, I need to find the previous row id with a specific combination of device_name, device_port, class, status and some other columns
The approach that we currently have is to run the following query to get the last inserted id.
SELECT MAX(id) FROM raw_traffic WHERE device_name='___', device_port='___', class='___', col_1='___', col_2='___' and status='___'
After retrieving this id we are storing it in the prev_id column of the new row and doing an insert.
It is important to mention that this table at any given point of time has around half a million records and around 10 indexes (one on each of these searchable columns). Consequently the select and the insert operations have become intolerably slow and deteriorating by the day.
In order to improve it I was thinking of putting the retrieve last id function in a PL/SQL block or even a before_insert trigger in the db.
What would the best approach for handling such a scenario. Moreover what should be the most efficient index strategy for such a table ?
If you want to find the largest value in a column for all the rows that equal some values in other columns, you can create an index on:
- all the columns in your where
- followed by the column you want the max value from
The database can then do a nice efficient min/max scan of the index.
For example, the following searches the table for all rows where the first three columns = 0. And gets the max value for the last one:
create table t as
select mod ( level, 2 ) c1,
mod ( level, 4 ) c2,
mod ( level, 8 ) c3,
level c4
from dual
connect by level <= 100;
create index i on t ( c1, c2, c3, c4 );
set serveroutput off
select /*+ gather_plan_statistics */
max ( c4 )
from t
where c1 = 0
and c2 = 0
and c3 = 0;
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 1t8zzb3s0r1fr, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max ( c4 ) from t where
c1 = 0 and c2 = 0 and c3 = 0
Plan hash value: 3623458378
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
| 2 | FIRST ROW | | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| I | 1 | 1 | 1 |00:00:00.01 | 1 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("C1"=0 AND "C2"=0 AND "C3"=0)
This only accesses one entry in the index. So is about as efficient as you can get.
This only works if you've got equality (=) on all the leading columns of the index columns. If you have any inequalities (<, >=, etc.) you won't get this scan.
For example, the following changes the predicate on c2 to < 1. Instead of = 0.
Although for these data the effect is the same, the database does a normal range scan of the index. Which reads 12 entries instead of 1:
select /*+ gather_plan_statistics */
max ( c4 )
from t
where c1 = 0
and c2 < 1
and c3 = 0;
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 4sd3havabynyu, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ max ( c4 ) from t where
c1 = 0 and c2 < 1 and c3 = 0
Plan hash value: 163676535
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 2 | INDEX RANGE SCAN| I | 1 | 12 | 12 |00:00:00.01 | 1 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C1"=0 AND "C3"=0 AND "C2"<1)
filter("C3"=0)
What would the best approach for handling such a scenarioHard to say without understanding
why you want to get the previous id for these values. What's the purpose of this column? How are you calculating the id values?