Most efficient way to UNPIVOT a table with large # of columns


Question and Answer

Chris Saxon

Thanks for the question, Allen.

Asked: December 08, 2016 - 2:28 pm UTC

Last updated: May 10, 2019 - 8:18 am UTC

Version: 11.2.0.4

Viewed 10K+ times

You Asked

Database is 11.2.0.4. At this point we cannot use the full-featured FLASHBACK ARCHIVE with CONTEXT... but I think it would have the same issues anyway.

ENVIRONMENT
I have a change logging table (e.g. CHANGESTABLE) that is basically the column-wise mirror image of an actual base table (e.g. BASETABLE), augmented with an archive ID GUID, a timestamp and a DML flag. The change logging table is populated by AFTER row triggers in response to DML operations (the standard approach). The usefulness of the archive ID GUID is still subject to debate...
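
For reference, each logging trigger follows the usual pattern, roughly like this (column list abbreviated; the names here are illustrative):

create or replace trigger basetable_aud
after insert or update or delete on basetable
for each row
begin
  if deleting then
    -- deletes: log the old image of the row
    insert into changestable (arc_gd, arc_ts, dml_flag, id, col1, col2 /* ... col200 */, who_did_it)
    values (sys_guid(), systimestamp, 'D', :old.id, :old.col1, :old.col2, :old.who_did_it);
  else
    -- inserts and updates: log the new image of the row
    insert into changestable (arc_gd, arc_ts, dml_flag, id, col1, col2 /* ... col200 */, who_did_it)
    values (sys_guid(), systimestamp, case when inserting then 'I' else 'U' end,
            :new.id, :new.col1, :new.col2, :new.who_did_it);
  end if;
end;
/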

The PROBLEM:
The result set from the UNPIVOT grows very large very quickly and becomes unusable for any query covering more than a few original ID values. Is there a more efficient way to generate this across an entire set of BASETABLE records? In my particular case, the table has about 20K records (tiny), with ~155 columns of various sorts.

EXAMPLE:

BASETABLE organization:

ID, col1, col2, col3, .... col155,...col200, WHO_DID_IT;

CHANGESTABLE Change Logging table organization:
ARC_GD, ARC_TS, DML_FLAG, ID, col1, col2, col3, .... col155,...col200, WHO_DID_IT ;

where ARC_GD is a GUID, ARC_TS is a timestamp, and DML_FLAG is one of 'I', 'U' or 'D'.

I need to show changes to BASETABLE from CHANGESTABLE over time by column to reveal changes by 'WHO_DID_IT', 'ARC_TS', 'ID' and combinations thereof, including date windows.

UNPIVOT provides what I need in terms of showing columns as rows in a result set, which I can then sort, for example, by ID and ARC_TS and COL_NAME to show the chronology of changes to COL_VALUE accounting for different data types as needed.

SELECT * FROM ( SELECT * FROM CHANGESTABLE )
 Unpivot Include NULLS (Col_Value FOR Col_Name IN( col1, col2, col3, .... col155,...col200 ) )


-- no where clause yet...

So the output is:

ID, ARC_TS, WHO_DID_IT, COL_NAME, COL_VALUE
1, timestamp1, A, COL1, 1
1, timestamp2, B,  COL1, 2
1, timestamp3, B, COL1, 2
1, timestamp1, C, COL2, NULL
1, timestamp2, C, COL2, 2
1, timestamp3, C, COL2, 3
...
...
...


etc. (sort order and WHERE clause depend on the reporting requirement).

So for timestamp1 -> timestamp2, COL1 was changed by 'B' from 1 to 2,
timestamp2 -> timestamp3, COL2 was changed by 'C' from 2 to 3, etc.
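
A reporting query over a date window then looks something like this (the bind values and ordering are just an example, and the column list is abbreviated):

SELECT * FROM ( SELECT * FROM CHANGESTABLE
                WHERE ARC_TS >= :start_ts AND ARC_TS < :end_ts )
 Unpivot Include NULLS (Col_Value FOR Col_Name IN( col1, col2, col3 /* ... col200 */ ) )
ORDER BY ID, ARC_TS, Col_Name;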

This works essentially instantaneously for a tiny set of IDs but is unresponsive for the entire inventory.
So what is the best way to do this UNPIVOT over a large number of logged records to generate this output? Or should this become a PL/SQL operation that periodically generates a transposed table for reporting? That table would hold the right information, but it loses the benefit of analysing changes immediately in response to a query, a concern, etc.

Or is it ultimately best to just get the CHANGESTABLE records of interest and then UNPIVOT that subset every time? The result might still be huge.


Thank you.
Allen

and Chris said...

So I built a test case to reproduce your situation to see what's going on:

declare
  stmt varchar2(32767);
begin
  stmt := 'create table changestable (
  arc_gd varchar2(10), arc_ts   timestamp, dml_flag varchar2(1)';
  
  for i in 1 .. 200 loop
    stmt := stmt || ', col'|| i || ' varchar2(10)';
  end loop;
  
  execute immediate stmt || ')';
  
  stmt := 'insert into changestable
  select mod(rownum, 26)+65, sysdate+rownum, 
  case mod(rownum, 3) when 0 then ''I'' when 1 then ''U'' when 2 then ''D'' end';
  
  for i in 1 .. 200 loop
    stmt := stmt || ', ''' || i || '''';
  end loop;
  
  stmt := stmt || ' from dual connect by level <= 20000';

  execute immediate stmt;
end;
/

commit;


So we've got a table with 200+ columns and 20,000 rows. I then unpivoted five of the columns:

SQL> SELECT * FROM CHANGESTABLE
  2  Unpivot Include NULLS (
  3    Col_Value FOR Col_Name IN( col1, col2, col3, col155, col200 ) );

100000 rows selected.

Elapsed: 00:02:42.90


Over 2 and a half minutes. That's quite some time...

To find out where it's all going, I repeated the test. This time I traced the query:

alter session set tracefile_identifier = chris;
exec DBMS_monitor.session_trace_enable(null, null, true, true);

set autotrace trace 
SELECT * FROM CHANGESTABLE  
Unpivot Include NULLS (
  Col_Value FOR Col_Name IN( col1, col2, col3, col155, col200 ) );

exit
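
To locate the trace file afterwards, something like this does the job (the file names below are placeholders):

-- from the traced session: full path of its trace file (11g and later)
select value from v$diag_info where name = 'Default Trace File';

-- then, at the OS prompt, format it with TKPROF:
-- tkprof ORCL_ora_12345_CHRIS.trc chris_unpivot.txt sys=no sort=exeela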


I then found the tracefile and ran TKPROF on it to see what was going on with the query. This gave the following execution stats:

SELECT * FROM CHANGESTABLE
Unpivot Include NULLS (
  Col_Value FOR Col_Name IN( col1, col2, col3, col155, col200 ) )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     1001      0.87       0.84       2001       3004          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1003      0.87       0.85       2001       3004          0      100000


Less than 1 second elapsed?! So where did the other ~160s go?

Looking at the bottom of the file, I found:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    1015        0.00          0.00
  SQL*Net message from client                  1015        1.12        167.15
  SQL*Net more data to client                     1        0.00          0.00
  Disk file operations I/O                        2        0.00          0.00
  direct path read                              256        0.00          0.01
  log file sync                                   1        0.00          0.00


Wow! 167s on "SQL*Net message from client". Nearly all the time for the query was spent outside the database, waiting for SQL*Plus!

You see, when a client fetches data, it has to do something with it. This takes time. If you're unpivoting all 200 columns for 20k rows, that becomes 4 million+ rows in your output!

No matter how you do this, it'll take a while for your client to process all these rows.
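
You can see this for yourself in SQL*Plus by taking the client work out of the picture and rerunning the query (a sketch; TERMOUT OFF only takes effect when the script is run with @, and most of the elapsed time should disappear because the rows are no longer rendered):

set arraysize 500   -- fetch more rows per SQL*Net round trip
set termout off     -- still fetch the rows, but don't display them

SELECT * FROM CHANGESTABLE
Unpivot Include NULLS (
  Col_Value FOR Col_Name IN( col1, col2, col3, col155, col200 ) );

set termout on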

So the real question is:

Why are you returning all the data? A human is never going to trawl through millions of rows of data.

If someone is inspecting these data, implement some form of top-N/pagination. Let them filter and fetch a subset of the rows as they need.


Comments

Elaborate

A reader, May 07, 2019 - 8:33 am UTC

Hi Chris
"If someone is inspecting these data, implement some form of top-N/pagination. Let them filter and fetch a subset of the rows as they need. "

Could you please take your own example above and show, with an example, how to implement that advice?
Chris Saxon
May 10, 2019 - 8:18 am UTC

The point is - why is the OP running this query? Who is going to look at its output and why?

It's infeasible for a person to read millions of rows and notice anything meaningful.

So instead of returning all the data, create a search screen to get the first N rows. And add pagination so they can traverse the data set.

For example, with fetch first:

select * from ...
fetch first 10 rows only;
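
Applied to the unpivot example above, a first page might look something like this (the ordering is arbitrary, and FETCH FIRST/OFFSET needs 12c or later; on 11.2 you'd wrap the ordered query and filter on ROWNUM instead):

SELECT * FROM CHANGESTABLE
Unpivot Include NULLS (
  Col_Value FOR Col_Name IN( col1, col2, col3, col155, col200 ) )
order by arc_ts, col_name
offset 0 rows fetch next 10 rows only;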
