Concatenation performance problem
Vlado, January 20, 2005 - 10:10 am UTC
Tom,
I'm concatenating columns in a table and then sending them to a checksum function (I do this for two tables, and then compare the checksums to confirm that the data in the tables is identical). The query is slow, and I was able to narrow the problem down to the concatenation (at least I think so). The first trace result is from a query that does the concatenation and inserts the data into a table. The second inserts the same amount of data into the table, but with no concatenation. The first query executes in 4.78 secs, the second in 0.42. So, to me it seems that the insert is not the problem, but the concatenation (or the to_char function?). If the problem is in the concatenation, any ideas on how to tune it?
I apologize if you think that this is not a valid follow-up question. If so, just let me know and I'll submit a new question...
insert
into h1
(select EVENT_ITEM_PRICE_ID
, substr(TO_CHAR(EVENT_ITEM_PRICE_ID) || '|' || TO_CHAR(EVENT_PRICE_LIST_ID) || '|' || TO_CHAR(COMPANY_ID)
  || '|' || TO_CHAR(MASTER_ITEM_ID) || '|' || TO_CHAR(SORT_ORDER)
  || '|' || TO_CHAR(LAST_UPDATE_DATE, 'DD-MON-YYYY HH24:MI:SS') || '|' || TO_CHAR(LAST_UPDATED_BY)
  || '|' || TO_CHAR(CREATION_DATE, 'DD-MON-YYYY HH24:MI:SS') || '|' || TO_CHAR(CREATED_BY)
  || '|' || TO_CHAR(INACTIVE_FLAG) || '|' || TO_CHAR(INACTIVE_DATE, 'DD-MON-YYYY HH24:MI:SS')
  || '|' || TO_CHAR(COMPANY_ITEM_ID) || '|' || ITEM_NUMBER || '|' || ITEM_DESCRIPTION
  || '|' || TO_CHAR(ITEM_PRICE) || '|' || TO_CHAR(RETAIL_PRICE) || '|' || TO_CHAR(WHOLESALE_PRICE)
  || '|' || TO_CHAR(OVERSELL_PERCENT) || '|' || TO_CHAR(START_TIME_DATE, 'DD-MON-YYYY HH24:MI:SS')
  || '|' || TO_CHAR(END_TIME_DATE, 'DD-MON-YYYY HH24:MI:SS') || '|' || TO_CHAR(DEFAULT_ORDER_QUANTITY)
  || '|' || QTY_UNIT_OF_MEASURE_CODE || '|' || TO_CHAR(MIN_ORDER_QUANTITY) || '|' || TO_CHAR(MAX_ORDER_QUANTITY)
  || '|' || TO_CHAR(MAX_EVENT_QUANTITY) || '|' || TO_CHAR(NO_AUTOSHIP_FLAG) || '|' || TO_CHAR(KIT_ONLY_FLAG)
  || '|' || TO_CHAR(MULTI_PAYMENT_FLAG) || '|' || TO_CHAR(MIN_NUMBER_PAYMENTS) || '|' || TO_CHAR(MAX_NUMBER_PAYMENTS)
  || '|' || TO_CHAR(SHIPPING_AMOUNT) || '|' || TO_CHAR(HANDLING_AMOUNT) || '|' || IMAGE_FILENAME
  || '|' || SMALL_IMAGE_FILENAME || '|' || SMALL_IMAGE_DESCRIPTION || '|' || TO_CHAR(BASE_ITEM_PRICE)
  || '|' || TO_CHAR(BASE_RETAIL_PRICE) || '|' || TO_CHAR(BASE_WHOLESALE_PRICE) || '|' || TO_CHAR(BASE_SHIPPING_AMOUNT)
  || '|' || TO_CHAR(BASE_HANDLING_AMOUNT) || '|' || TO_CHAR(EXCHANGE_RATE)
  || '|' || TO_CHAR(EXCHANGE_DATE, 'DD-MON-YYYY HH24:MI:SS') || '|' || LOCAL_CURRENCY_CODE
  || '|' || BASE_CURRENCY_CODE || '|' || TO_CHAR(SCRIPT_ID)
  || '|' || TO_CHAR(SHOW_START_TIME_DATE, 'DD-MON-YYYY HH24:MI:SS')
  || '|' || TO_CHAR(SHOW_END_TIME_DATE, 'DD-MON-YYYY HH24:MI:SS')
  || '|' || TO_CHAR(SHOW_GRACE_END_TIME_DATE, 'DD-MON-YYYY HH24:MI:SS')
  || '|' || SHOW_HOST || '|' || TO_CHAR(FORECAST_SALES_QUANTITY) || '|' || TO_CHAR(FORECAST_SALES_AMOUNT)
  || '|' || TO_CHAR(ACTUAL_SALES_QUANTITY) || '|' || TO_CHAR(ACTUAL_SALES_AMOUNT) || '|' || COMMENT_TEXT
  || '|' || ATTRIBUTE01 || '|' || ATTRIBUTE02 || '|' || ATTRIBUTE03 || '|' || ATTRIBUTE04 || '|' || ATTRIBUTE05
  || '|' || ATTRIBUTE06 || '|' || ATTRIBUTE07 || '|' || ATTRIBUTE08 || '|' || ATTRIBUTE09 || '|' || ATTRIBUTE10
  || '|' || ATTRIBUTE11 || '|' || ATTRIBUTE12 || '|' || ATTRIBUTE13 || '|' || ATTRIBUTE14 || '|' || ATTRIBUTE15
  , 1, 32)
from x.coe_event_item_prices
where rownum < 10000)
call     count  cpu   elapsed  disk  query  current  rows  misses
-------  -----  ----  -------  ----  -----  -------  ----  ------
Parse        1  0.02     0.02     0      0        0     0       1
Execute      1  4.81     4.77    22    446      221  9999       0
-------  -----  ----  -------  ----  -----  -------  ----  ------
total        2  4.83     4.78    22    446      221  9999       1
Rows  Row Source Operation
----  ---------------------------------------------------
9999  COUNT STOPKEY (cr=268 pr=22 pw=0 time=0.20)
9999    TABLE ACCESS FULL COE_EVENT_ITEM_PRICES (cr=268 pr=22 pw=0 time=0.18)
Event waited on                                 Times   Count      Max.    Total    Blocks
                                                Waited  Zero Time  Wait    Waited   Accessed
----------------------------------------------  ------  ---------  ------  -------  --------
db file scattered read (multiblock full scan)        5          0    0.01     0.03        18
db file sequential read (single block scan)          4          0    0.01     0.01         4
SQL*Net message from client (idle)                   1          0    0.00     0.00
SQL*Net message to client (idle)                     1          0    0.00     0.00
----------------------------------------------  ------  ---------  ------  -------  --------
total                                               11          0    0.01     0.04        22
insert
into h1
(select EVENT_ITEM_PRICE_ID
, '11111111111111111111111111111111'
from x.coe_event_item_prices
where rownum < 10000)
call     count  cpu   elapsed  disk  query  current  rows  misses
-------  -----  ----  -------  ----  -----  -------  ----  ------
Parse        1  0.01     0.01     0      2        0     0       1
Execute      1  0.40     0.41     0    189      231  9999       0
-------  -----  ----  -------  ----  -----  -------  ----  ------
total        2  0.41     0.42     0    191      231  9999       1
Rows  Row Source Operation
----  ---------------------------------------------------
9999  COUNT STOPKEY (cr=13 pr=0 pw=0 time=0.06)
9999    INDEX FULL SCAN COE_EVENT_ITEM_PRICES_PK (cr=13 pr=0 pw=0 time=0.04)
Event waited on                     Times   Count      Max.    Total    Blocks
                                    Waited  Zero Time  Wait    Waited   Accessed
----------------------------------  ------  ---------  ------  -------  --------
SQL*Net message from client (idle)       1          0    0.00     0.00
SQL*Net message to client (idle)         1          0    0.00     0.00
----------------------------------  ------  ---------  ------  -------  --------
total                                    2          0    0.00     0.00         0
January 20, 2005 - 10:51 am UTC
something else is different -- see the lio's going way down. what else changed...
oh, the plan.
get it using the same plan please before you make any assumptions....
Concatenation performance problem generic script
Vlado, January 20, 2005 - 11:03 pm UTC
I changed the execution plans so they match, but the slowdown persists on the same scale.
Below is a test case you can execute yourself (it uses sys.argument$, based on 8.1.7).
I ran this in two different databases (8.1.7) on two different servers; same problem. BTW, the query from my first question ran on a third database (9iR2). Is it the number of columns being concatenated that is causing the slowdown?
create table h1 (pk varchar2(255), checksum varchar2(32));
insert
into h1
(select OBJ#, substr(to_char(OBJ#) || '|' || to_char(PROCEDURE$) || '|' || to_char(OVERLOAD#) || '|' || to_char(POSITION#) || '|'
|| to_char(SEQUENCE#) || '|' || to_char(LEVEL#) || '|' || to_char(ARGUMENT) || '|' || to_char(TYPE#) || '|'
|| to_char(CHARSETID) || '|' || to_char(CHARSETFORM) || '|' || to_char(DEFAULT#) || '|' || to_char(IN_OUT)
|| '|' || to_char(LENGTH) || '|' || to_char(PRECISION#) || '|' || to_char(SCALE) || '|' || to_char(RADIX)
|| '|' || to_char(DEFLENGTH) || '|' || to_char(TYPE_OWNER) || '|' || to_char(TYPE_NAME)
|| '|' || to_char(TYPE_SUBNAME) || '|' || to_char(TYPE_LINKNAME) || '|' || to_char(PLS_TYPE)
|| '|' || to_char(OBJ#) || '|' || to_char(PROCEDURE$) || '|' || to_char(OVERLOAD#) || '|' || to_char(POSITION#) || '|'
|| to_char(SEQUENCE#) || '|' || to_char(LEVEL#) || '|' || to_char(ARGUMENT) || '|' || to_char(TYPE#) || '|'
|| to_char(CHARSETID) || '|' || to_char(CHARSETFORM) || '|' || to_char(DEFAULT#) || '|' || to_char(IN_OUT)
|| '|' || to_char(LENGTH) || '|' || to_char(PRECISION#) || '|' || to_char(SCALE) || '|' || to_char(RADIX)
|| '|' || to_char(DEFLENGTH) || '|' || to_char(TYPE_OWNER) || '|' || to_char(TYPE_NAME)
|| '|' || to_char(TYPE_SUBNAME) || '|' || to_char(TYPE_LINKNAME) || '|' || to_char(PLS_TYPE)
|| '|' || to_char(OBJ#) || '|' || to_char(PROCEDURE$) || '|' || to_char(OVERLOAD#) || '|' || to_char(POSITION#) || '|'
|| to_char(SEQUENCE#) || '|' || to_char(LEVEL#) || '|' || to_char(ARGUMENT) || '|' || to_char(TYPE#) || '|'
|| to_char(CHARSETID) || '|' || to_char(CHARSETFORM) || '|' || to_char(DEFAULT#) || '|' || to_char(IN_OUT)
|| '|' || to_char(LENGTH) || '|' || to_char(PRECISION#) || '|' || to_char(SCALE) || '|' || to_char(RADIX)
|| '|' || to_char(DEFLENGTH) || '|' || to_char(TYPE_OWNER) || '|' || to_char(TYPE_NAME)
|| to_char(SEQUENCE#) || '|' || to_char(LEVEL#) || '|' || to_char(ARGUMENT) || '|' || to_char(TYPE#) || '|'
|| to_char(CHARSETID) || '|' || to_char(CHARSETFORM) || '|' || to_char(DEFAULT#) || '|' || to_char(IN_OUT)
|| '|' || to_char(LENGTH) || '|' || to_char(PRECISION#) || '|' || to_char(SCALE) || '|' || to_char(RADIX)
|| '|' || to_char(DEFLENGTH) || '|' || to_char(TYPE_OWNER) || '|' || to_char(TYPE_NAME)
|| '|' || to_char(TYPE_SUBNAME) || '|' || to_char(TYPE_LINKNAME) || '|' || to_char(PLS_TYPE)
|| '|' || to_char(OBJ#) || '|' || to_char(PROCEDURE$) || '|' || to_char(OVERLOAD#) || '|' || to_char(POSITION#) || '|' ,1,32)
from sys.ARGUMENT$
where rownum < 10000);
commit;
truncate table h1;
insert
into h1
(select /*+ FULL(h) */ obj#
, '11111111111111111111111111111111'
from sys.ARGUMENT$ h
where rownum < 10000);
commit;
drop table h1;
January 20, 2005 - 11:58 pm UTC
on my system, that nasty gnarly huge concatenation, with lots of to_chars to convert -- to build a really big string -- took.........
7.4007e-5 cpu seconds per row extra...
.000074007 cpu seconds, per row..
7 hundred-thousandths of a second.....
seems pretty darn good when stated that way, no?
0.82 vs 0.08 -- you have to expect some cost -- and if you do something that is really fast a lot, it adds up.
don't see any silver bullets here.
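For the original table-comparison goal, one alternative worth noting (a sketch, not something from this thread) is to hash each concatenated row and aggregate the hashes per table, rather than keeping substr(..., 1, 32) -- the substr silently discards everything past 32 characters, so two different rows can easily compare equal. DBMS_UTILITY.GET_HASH_VALUE exists back to 8i; the column names below are from Vlado's test case, and the base/hash_size values are arbitrary choices:

```sql
-- Hedged sketch: per-row hash, summed into one order-independent value per table.
-- Run the same query against both tables and compare the sums.
-- Note: hash collisions are possible, so equal sums suggest (not prove) equality.
select sum(dbms_utility.get_hash_value(
             to_char(obj#) || '|' || to_char(type#) || '|' || to_char(position#),
             0,            -- base  (arbitrary)
             1073741824))  -- hash_size = 2^30 (arbitrary)
       as tab_checksum
  from sys.argument$;
```

The concatenation cost itself remains, of course -- this only removes the truncation problem, not the per-row to_char/concatenate work Tom measured above.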
String concatenation
Anders, January 21, 2005 - 5:37 am UTC
I think Vlado does have a point.
There are different ways of implementing the concatenation. Let's say, for the sake of argument, that we want to concatenate n strings each containing 1 character:
* A naive way of doing it would be, for each concatenation operator, to allocate a new string just large enough to hold the two strings being concatenated. The first concatenation needs two copy operations, the second three, and so forth: a total of something like n*(n+1)/2-1 copy operations (some would say O(n^2)).
* If you are stuck with the inefficient copy operator you can still do a little better by changing the order of the concatenations, since the naive way ends up concatenating very long strings. By first concatenating all the neighbors we get n/2 strings 2 characters long; concatenating the new neighbors gives us n/4 strings 4 characters long, and so forth. For each of the log2(n) steps we copy n characters, for a total of n*log2(n) copy operations.
* The most efficient way would probably be to allocate a new n-character string and copy all the strings into it. This requires copying only n characters.
All the bla-bla aside, it turns out Oracle is rather naive regarding concatenation. I don't think Oracle will let us do it the most efficient way, so I'll try the second solution. First I ran Vlado's statement with a max to force all the concatenations but return only one row:
select max(to_char(OBJ#) || '|' || ... || (to_char(POSITION#) || '|')
from sys.ARGUMENT$
where rownum < 10000;
Then I added some parentheses to the query:
select max(
(((((((to_char(OBJ#)||'|')||(to_char(PROCEDURE$)||'|'))||
((to_char(OVERLOAD#)||'|')||(to_char(POSITION#)||'|')))||
(((to_char(SEQUENCE#)||'|')||(to_char(LEVEL#)||'|'))||
((to_char(ARGUMENT)||'|')||(to_char(TYPE#)||'|'))))||
((((to_char(CHARSETID)||'|')||(to_char(CHARSETFORM)||'|'))||
((to_char(DEFAULT#)||'|')||(to_char(IN_OUT)||'|')))||
(((to_char(LENGTH)||'|')||(to_char(PRECISION#)||'|'))||
((to_char(SCALE)||'|')||(to_char(RADIX) ||'|')))))||
(((((to_char(DEFLENGTH)||'|')||(to_char(TYPE_OWNER)||'|'))||
((to_char(TYPE_NAME)||'|')||(to_char(TYPE_SUBNAME)||'|')))||
(((to_char(TYPE_LINKNAME)||'|')||(to_char(PLS_TYPE)||'|'))||
((to_char(OBJ#)||'|')||(to_char(PROCEDURE$)||'|'))))||
((((to_char(OVERLOAD#)||'|')||(to_char(POSITION#)||'|'))||
((to_char(SEQUENCE#)||'|')||(to_char(LEVEL#)||'|')))||
(((to_char(ARGUMENT)||'|')||(to_char(TYPE#)||'|'))||
((to_char(CHARSETID)||'|')||(to_char(CHARSETFORM)||'|'))))))||
((((((to_char(DEFAULT#)||'|')||(to_char(IN_OUT)||'|'))||
((to_char(LENGTH)||'|')||(to_char(PRECISION#)||'|')))||
(((to_char(SCALE)||'|')||(to_char(RADIX)||'|'))||
((to_char(DEFLENGTH)||'|')||(to_char(TYPE_OWNER)||'|'))))||
((((to_char(TYPE_NAME)||'|')||(to_char(TYPE_SUBNAME)||'|'))||
((to_char(TYPE_LINKNAME)||'|')||(to_char(PLS_TYPE)||'|')))||
(((to_char(OBJ#)||'|')||(to_char(PROCEDURE$)||'|'))||
((to_char(OVERLOAD#)||'|')||(to_char(POSITION#)||'|')))))||
(((((to_char(SEQUENCE#)||'|')||(to_char(LEVEL#)||'|'))||
((to_char(ARGUMENT)||'|')||(to_char(TYPE#)||'|')))||
(((to_char(CHARSETID)||'|')||(to_char(CHARSETFORM)||'|'))||
((to_char(DEFAULT#)||'|')||(to_char(IN_OUT)||'|'))))||
((((to_char(LENGTH)||'|')||(to_char(PRECISION#)||'|'))||
((to_char(SCALE)||'|')||(to_char(RADIX)||'|')))||
(((to_char(DEFLENGTH)||'|')||(to_char(TYPE_OWNER)||'|'))||
((to_char(TYPE_NAME)||(to_char(SEQUENCE#)||'|')))))))||
(((((((to_char(LEVEL#)||'|')||(to_char(ARGUMENT)||'|'))||
((to_char(TYPE#)||'|')||(to_char(CHARSETID)||'|')))||
(((to_char(CHARSETFORM)||'|')||(to_char(DEFAULT#)||'|'))||
((to_char(IN_OUT)||'|')||(to_char(LENGTH)||'|'))))||
((((to_char(PRECISION#)||'|')||(to_char(SCALE)||'|'))||
((to_char(RADIX)||'|')||(to_char(DEFLENGTH)||'|')))||
(((to_char(TYPE_OWNER)||'|')||(to_char(TYPE_NAME)||'|'))||
((to_char(TYPE_SUBNAME)||'|')||(to_char(TYPE_LINKNAME)||'|')))))||
(((((to_char(PLS_TYPE)||'|')||(to_char(OBJ#)||'|'))||
((to_char(PROCEDURE$)||'|')||(to_char(OVERLOAD#)||'|')))||
(((to_char(POSITION#)||'|')))))))))
from sys.ARGUMENT$
where rownum < 10000;
The first query took 1.000 seconds; the second query took 0.563 seconds. Of course a minor improvement, so maybe I'm just nitpicking. Further, the parenthesis-to-benefit ratio isn't so great, but most of the benefit can be had by just splitting the expression into 4 groups.
Explanation?
Frank, January 21, 2005 - 7:24 am UTC
Couldn't the difference in Vlado's results come from the fact that the concatenation query selects a ton of columns from the originating table, whereas the other query just selects a constant string?
January 21, 2005 - 8:40 am UTC
that is part of it too, yes. about 7/100,000ths of a second per row to get all of the columns from the block, format them, and concatenate the strings.
Concatenation performance problem
Vlado, January 22, 2005 - 7:29 pm UTC
Well, I was surprised that the concatenations & to_chars increase the execution time by a factor of approx. 10.
Even when I execute this with more rows (50000, 100000 & 500000) the factor is between 5 and 10.
I guess I'm used to data access taking most of the time, not data transformations.
Thanks for your help.
OK
A reader, August 24, 2005 - 8:00 am UTC
Hi Tom,
I would like to split a sentence and store it in an array, with each word in its own array element.
How do I do that?
SQL>declare
type t is table of varchar2(30)
index by pls_integer;
chartab t;
str varchar2(100) := 'How Now Brown Cow';
How do I proceed further?
Could you please help out?
August 24, 2005 - 2:07 pm UTC
instr and substr are your friends.
search this site for str2tbl; you'll see a function that parses a string around delimiters and returns the pieces as a table type -- just modify it to stick them into an array instead.
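A minimal sketch along the lines of str2tbl (illustrative, not the site's exact code -- the type and function names here are assumptions) walks the string with instr/substr and appends each piece to a collection:

```sql
create or replace type vc2tab as table of varchar2(4000);
/
create or replace function str2tbl( p_str   in varchar2,
                                    p_delim in varchar2 default ',' )
  return vc2tab
as
  -- appending a trailing delimiter means every piece ends in one,
  -- which keeps the loop simple
  l_str  varchar2(32767) := p_str || p_delim;
  l_n    number;
  l_tab  vc2tab := vc2tab();
begin
  loop
    l_n := instr( l_str, p_delim );
    exit when nvl( l_n, 0 ) = 0;
    l_tab.extend;
    l_tab( l_tab.count ) := substr( l_str, 1, l_n - 1 );
    l_str := substr( l_str, l_n + length(p_delim) );
  end loop;
  return l_tab;
end;
/
```

For example, str2tbl('How Now Brown Cow', ' ') would return a four-element collection; to fill a PL/SQL index-by array instead, assign l_tab(i) elements in the same loop rather than extending a nested table.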
OK
A reader, August 25, 2005 - 2:23 am UTC
SQL> set serverout on
SQL> create or replace type vc2arr as table of varchar2(30)
2 /
Type created.
SQL> declare
2 str varchar2(30) := 'Hello Nice World';
3 l_n number;
4 l_arr vc2arr;
5 begin
6 loop
7 l_n := instr(str,' ');
8 select cast(substr(str,1,l_n-1) as vc2arr) into l_arr from dual;
9 exit when nvl(l_n,0) = 0;
10 str := substr(str,l_n+1);
11 end loop;
12 for i in 1..l_arr.count loop
13 dbms_output.put_line(l_arr(i));
14 end loop;
15 end;
16 /
select cast(substr(str,1,l_n-1) as vc2arr) into l_arr from dual;
*
ERROR at line 8:
ORA-06550: line 8, column 13:
PL/SQL: ORA-00932: inconsistent datatypes: expected - got CHAR
ORA-06550: line 8, column 1:
PL/SQL: SQL Statement ignored
August 25, 2005 - 3:33 am UTC
you cannot cast a scalar into an array type like that -- and you don't need to. you would
select <scalar>
BULK COLLECT into l_arr
from dual;
OK
A reader, August 25, 2005 - 7:05 am UTC
Hi Tom,
Thanks for that response.
Still I am not getting any output.
SQL> set serverout on
SQL> declare
2 str varchar2(30) := 'How Now Brown Cow';
3 l_n number;
4 l_arr vc2arr;
5 begin
6 loop
7 l_n := instr(str,' ');
8 select substr(str,1,l_n) bulk collect into l_arr from dual;
9 exit when (nvl(l_n,0) = 0);
10 str := substr(str,l_n+1);
11 end loop;
12 for i in 1..l_arr.count loop
13 dbms_output.put_line(l_arr(i));
14 end loop;
15 end;
16 /
PL/SQL procedure successfully completed.
What may be the reason??
August 25, 2005 - 8:35 am UTC
well, you are NOT ADDING to the array -- each bulk collect replaces its contents. why are you bulk collecting at all?
loop
...
l_arr(l_arr.count+1) := substr( str, 1, l_n );
....
perhaps that is what you mean? (assuming vc2arr is a PL/SQL index-by table; if not, you would:
l_arr := l_arr();
loop
...
l_arr.extend;
l_arr(l_arr.count) := substr( str, 1, l_n );
....
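Putting Tom's fragments together with the reader's loop -- assuming vc2arr is the nested table type created earlier in the thread -- one complete version might look like this (a sketch, with the last word handled explicitly since instr returns 0 when no blank remains):

```sql
declare
  str   varchar2(30) := 'How Now Brown Cow';
  l_n   number;
  l_arr vc2arr := vc2arr();
begin
  loop
    l_n := instr( str, ' ' );
    l_arr.extend;
    -- take up to the blank, or the whole remainder when no blank is left
    l_arr( l_arr.count ) :=
      substr( str, 1, case when l_n = 0 then length(str) else l_n - 1 end );
    exit when nvl( l_n, 0 ) = 0;
    str := substr( str, l_n + 1 );
  end loop;
  for i in 1 .. l_arr.count loop
    dbms_output.put_line( l_arr(i) );   -- How / Now / Brown / Cow
  end loop;
end;
/
```

The reader's original block completed "successfully" with no output because the final bulk collect ran after str had been whittled down, overwriting everything collected before.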
VALUE corresponding to Maximum value???
ramis, September 26, 2005 - 2:44 pm UTC
Hi,
i have a table in which there are three numeric columns
CREATE TABLE t (id NUMBER, col1 NUMBER, col2 NUMBER)
/
INSERT INTO t VALUES (1,23,1);
INSERT INTO t VALUES (2,211,18);
INSERT INTO t VALUES (3,99,1);
INSERT INTO t VALUES (4,199,9);
INSERT INTO t VALUES (1,45,1);
ID COL1 COL2
1 23 1
2 211 18
3 99 1
4 199 9
1 45 1
...so on
NOW
SQL> SELECT ID, MAX(COL1) FROM T
GROUP BY ID;
ID MAX(COL1)
1 45
2 211
3 99
4 199
what I want is that if the MAX(COL1) value has COL2 = 1 in the original table, then I want an asterisk ('*') concatenated to the GROUP BY output, in this format:
ID MAX(COL1)
1 45*
2 211
3 99*
4 199
as you can see, for ID = 1 the MAX(COL1) is 45, and in the original table COL2 = 1 where COL1 = 45.
In other words, if the MAX(COL1) value does not have COL2 = 1 in the original table then it should come out as-is in the group by query, i.e. 211;
otherwise, if the MAX(COL1) value has COL2 = 1 in the original table then it should come out concatenated with an asterisk, i.e. 45*, 99*.
please tell me the shortest and easiest way to achieve this, as I already have a very long GROUP BY query in which I want to include this maximum-column result.
i am using 9i
regards,
ramis.
September 27, 2005 - 9:45 am UTC
ops$tkyte@ORA10G> column max_c1 format a10
ops$tkyte@ORA10G> select id, max(col1) || case when max(col1) = max(case when col2=1 then col1 end) then '*' end max_c1
2 from t
3 group by id
4 /
ID MAX_C1
---------- ----------
1 45*
2 211
3 99*
4 199