Question and Answer

Tom Kyte

Thanks for the question, Lars.

Asked: May 29, 2005 - 3:26 pm UTC

Last updated: May 30, 2005 - 2:45 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Tom,

We have an OLTP environment where we store calculation results. The 'data model' is (simplified) as follows:

SQL> desc DEVDAT.CALCFIGURESUB   -- VALUES table
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------
 CALCFIGIK                                 NOT NULL NUMBER(10)
 CALCFIGURE                                         NUMBER(21,8)
 CALCFIGSUBDATIK                           NOT NULL NUMBER(10)

SQL> desc DEVDAT.CALCFIGSUBDATA   -- DATA table
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------
 CALCFIGSUBDATIK                           NOT NULL NUMBER(10)
 LINEIK                                    NOT NULL NUMBER(10)
 CALCITEMTYPE                              NOT NULL NUMBER(5)
 SEGIK                                     NOT NULL NUMBER(10)
 TRANSSEG                                  NOT NULL NUMBER(10)
 COSTIK                                    NOT NULL NUMBER(5)
 PTRIK                                     NOT NULL NUMBER(5)
 SIGN                                      NOT NULL NUMBER(5)
 CALCRULE                                  NOT NULL VARCHAR2(1000)
 REFLINE                                   NOT NULL NUMBER(10)
 SORT                                      NOT NULL NUMBER(5)
 ROUNDVAL                                  NOT NULL NUMBER(13,9)
 ROUNDTRUNC                                NOT NULL NUMBER(5)
 FILENAME                                  NOT NULL VARCHAR2(255)
 REFKEY                                    NOT NULL VARCHAR2(12)
 << 17 other NUMBER columns >>
 PERFSPLITNAME                             NOT NULL VARCHAR2(50)
 << 13 other NUMBER columns >>

The column CALCFIGSUBDATIK in CALCFIGSUBDATA (from now on the DATA table) is the primary key of that table, and the same column in CALCFIGURESUB (the VALUES table) is a foreign key referencing the DATA table. The two tables used to be one, but because of the high number of 'redundant' values in the DATA part we tried to 'normalize' those columns into a separate table and reuse the existing rows in the DATA table for each new insert into the VALUES table.

In an ideal world we would like to enforce uniqueness across all the columns in the DATA table (all except the primary key), but we have a problem: the VALUES (and DATA) tables are filled (calculated) by a large number of parallel calculation servers. The problem we have seen very often is the following:

Time  Process A                      Process B
----  -----------------------------  -----------------------------
  0   Finds a new record R that
      has to be inserted into the
      DATA table.
  1                                  Finds that the same record R
                                     should be inserted into the
                                     DATA table.
  2   Inserts record R into the
      DATA table.
  3                                  Inserts record R into the
                                     DATA table.
  4   Commits
  5                                  Commits

Record R is not visible to process B before time=4, so the two processes will insert a 'redundant' record into the DATA table (with a new primary key). We have tried a number of things:

1) Add a unique constraint to the columns in the DATA table. Process B would then fail with ORA-00001 and could go back to before time=1 and reuse the existing row (see the sketch after this list). This of course does not work for more than 32(!) columns in a unique constraint, and as you can see we have something like 40+ columns!

2) Use an autonomous transaction that locks the full DATA table (and the VALUES table, since the foreign keys could not be validated otherwise!). The autonomous transaction would start around time=2 above and verify whether R is already in the table; if not, it would be inserted and committed. Process B was forced to wait until time=4 before it could execute its autonomous transaction. This serialized our access to the DATA and VALUES tables and was abandoned for that specific reason.

3) Allow 'duplicates' in the DATA table (as we have elected to do now). But this can give us up to one 'redundant' copy of record R per calculation server.
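
For reference, the reuse-on-ORA-00001 pattern that 1) describes would look roughly like the sketch below. It is written against a hypothetical two-column DATA table (the names data_t, data_seq and get_or_create_datik are made up), since the real 40+ columns exceed the 32-column limit:

create sequence data_seq;

create table data_t (
    datik number(10) primary key,
    col1  number(10) not null,
    col2  varchar2(50) not null,
    constraint data_t_uk unique ( col1, col2 )
);

create or replace function get_or_create_datik(
    p_col1 in number,
    p_col2 in varchar2 ) return number
is
    l_datik number(10);
begin
    insert into data_t ( datik, col1, col2 )
    values ( data_seq.nextval, p_col1, p_col2 )
    returning datik into l_datik;
    return l_datik;
exception
    when dup_val_on_index then
        -- another server inserted (and committed) the same row first;
        -- "go back to before time=1" and reuse its key
        select datik into l_datik
          from data_t
         where col1 = p_col1
           and col2 = p_col2;
        return l_datik;
end;
/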

We have also looked into compressed tables (where we join the DATA and VALUES tables back together), and we see considerable space savings if we do a CTAS, but all subsequent inserts (via OCI array inserts) make the table explode in size (some of our installations have more than 85,000,000 rows in the VALUES table).
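
This matches how block compression works in 9i: only direct-path operations store rows compressed, while conventional-path inserts (such as our OCI array inserts) store them uncompressed. A quick illustration, where big_table is a stand-in name:

create table t_comp compress
as
select * from big_table;            -- direct path: rows stored compressed

insert into t_comp                  -- conventional path: rows stored
select * from big_table;            -- uncompressed, so the table grows

insert /*+ append */ into t_comp    -- direct-path insert: rows stored
select * from big_table;            -- compressed again
commit;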

Although we think we have exhausted our possibilities, we would like to hear whether you see an option we have overlooked that can be used in an OLTP environment and does not have the characteristics of 3) above.

Sincerely,

Lars Stampe Villadsen


and Tom said...

Seems like a classic case for messaging, queues. Instead of processes A, B, C, ... "finding" rows, they instead have the rows pushed to them in a message queue and process them. Message queues (AQ in Oracle) are designed to be highly concurrent and to make sure a message is delivered at least once and at most once.
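
A minimal sketch of that approach, with made-up names (calc_msg_t, calc_qt, calc_q) and the "record R" payload cut down to two columns:

create type calc_msg_t as object (
    col1 number(10),
    col2 varchar2(50)
);
/

begin
    dbms_aqadm.create_queue_table(
        queue_table        => 'calc_qt',
        queue_payload_type => 'CALC_MSG_T' );
    dbms_aqadm.create_queue(
        queue_name  => 'calc_q',
        queue_table => 'calc_qt' );
    dbms_aqadm.start_queue( queue_name => 'calc_q' );
end;
/

-- producer: instead of inserting record R, enqueue it
declare
    l_enq_opts  dbms_aq.enqueue_options_t;
    l_msg_props dbms_aq.message_properties_t;
    l_msgid     raw(16);
begin
    dbms_aq.enqueue(
        queue_name         => 'calc_q',
        enqueue_options    => l_enq_opts,
        message_properties => l_msg_props,
        payload            => calc_msg_t( 42, 'some value' ),
        msgid              => l_msgid );
    commit;
end;
/

-- consumer (one of the A, B, C processes): dequeue and do the inserts;
-- by default the call blocks until a message arrives
declare
    l_deq_opts  dbms_aq.dequeue_options_t;
    l_msg_props dbms_aq.message_properties_t;
    l_msgid     raw(16);
    l_msg       calc_msg_t;
begin
    dbms_aq.dequeue(
        queue_name         => 'calc_q',
        dequeue_options    => l_deq_opts,
        message_properties => l_msg_props,
        payload            => l_msg,
        msgid              => l_msgid );
    -- ... insert into the DATA and VALUES tables here ...
    commit;
end;
/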


Short of that, A and B (and C...) could take the data they have, concatenate it into a delimited string and hash it. A unique constraint on the hash will prevent duplicates, and using an MD5 hash the odds of a false positive are extremely small.


scott@ORA9IR2> create or replace function hash_emp_row( p_rec in emp%rowtype ) return raw
is
    l_data long;
begin
    -- concatenate every column into one delimited string...
    l_data := p_rec.empno || '/' || p_rec.ename || '/' ||
              p_rec.job || '/' || p_rec.mgr || '/' ||
              to_char(p_rec.hiredate,'yyyymmddhh24miss') || '/' ||
              p_rec.sal || '/' || p_rec.comm || '/' || p_rec.deptno;

    -- ...and return its 16 byte MD5 hash
    return dbms_obfuscation_toolkit.md5( input => utl_raw.cast_to_raw(l_data) );
end;
/

Function created.

scott@ORA9IR2>
scott@ORA9IR2> begin
    for x in ( select * from emp )
    loop
        dbms_output.put_line( x.empno || ' hashed(' || hash_emp_row(x) || ')' );
    end loop;
end;
/
7369 hashed(96C7AF1AED37858C2D38CA955975069D)
7499 hashed(68EBC1E1B2B47F1BEBB0FC4BCB6E3408)
7521 hashed(D3A52FD8A5405A26370066A226A9C437)
7566 hashed(78B82C7CFAD7955CC4F2BAB58F54CBFC)
7654 hashed(3A99DDF9C17F035E633FBA085C941CF5)
7698 hashed(EE21B46747B90BA605C36723C2DA1557)
7782 hashed(9C982E8673B37F3EE865F8EBD74C0669)
7788 hashed(80B89547CEF50BF29989962FC2DF6384)
7839 hashed(243A88DA780E7AF590C2D1F945891A50)
7844 hashed(F2BCD4F03869C4AAD16D848937B993BD)
7876 hashed(4785D553EDFE95B5D5832215A10AA36B)
7900 hashed(324D92832667B0102C12204751998514)
7902 hashed(C571808ABC6659F44626DDD3B7BAA67E)
7934 hashed(48C27BED93FC95F851B46C7E25E03C73)

PL/SQL procedure successfully completed.



So you would add a RAW(16) column to the table with a unique constraint; upon insert you would compute the hash, and it would provide the uniqueness checking.
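
Applied to the DATA table, that could look like the following sketch; the column, constraint and trigger names (row_hash, calcfigsubdata_hash_uk, calcfigsubdata_hash_trg) are made up, and only the first few of the 40+ columns are shown:

alter table calcfigsubdata add ( row_hash raw(16) );

alter table calcfigsubdata
  add constraint calcfigsubdata_hash_uk unique ( row_hash );

create or replace trigger calcfigsubdata_hash_trg
before insert on calcfigsubdata
for each row
begin
    -- concatenate every non-key column, exactly as in hash_emp_row above
    :new.row_hash := dbms_obfuscation_toolkit.md5(
        input => utl_raw.cast_to_raw(
            :new.lineik       || '/' ||
            :new.calcitemtype || '/' ||
            :new.segik        || '/' ||
            :new.transseg     /* || '/' || ...the remaining columns */ ) );
end;
/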


Two ideas for you.


Comments

More than one process saving?

Lars Villadsen, May 30, 2005 - 1:57 pm UTC

Tom,

Thanks for your prompt response.

Just to make sure that I have understood your suggestion: make the calculation servers write to a queue instead of directly to the DATA and VALUES tables, and add one or more server processes that receive the data, calculate the hash value for the concatenated columns, and let a unique constraint enforce the "uniqueness"!

Would you suggest that we have more than one process saving, or should we 'serialize' by simply using one? (The error handling for possible violations of the unique constraint with extra save processes would be 'slightly' more complicated.)

Thanks

Tom Kyte
May 30, 2005 - 2:05 pm UTC

no, I was suggesting that the guys that create the "new record R" that A and B discover -- push a message onto a queue rather than create the "new record R"

But what about foreign keys then?

Lars Villadsen, May 30, 2005 - 2:18 pm UTC

But if we push these DATA columns onto a queue, will we not be unable to save into the VALUES table before we know the value of the foreign key to the DATA table, i.e. do A and B need to wait for the queue to be processed?

Or should we simply use the calculated hash value as the key of the DATA table and then compute the hash value before we push the values?

Thanks

Tom Kyte
May 30, 2005 - 2:45 pm UTC

I don't have enough knowledge of your processing to answer. My point is: the thing that "makes the R records appear" maybe doesn't make the R records appear anymore, but instead queues a message with the R record and asks the team of A, B, C processes to "make them appear and do the processing (rollup or whatever) on them".
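
For what it is worth, the second variant from the follow-up could fit together as in the sketch below. The reduced tables data_t and values_t are made up, and the RAW(16) hash itself is the key of the DATA table, so a producer knows the foreign key value before it saves (or enqueues) anything:

create table data_t (
    row_hash raw(16) primary key,
    col1     number(10) not null,
    col2     varchar2(50) not null
);

create table values_t (
    calcfigik  number(10) primary key,
    calcfigure number(21,8),
    row_hash   raw(16) not null references data_t
);

create or replace procedure save_value(
    p_calcfigik  in number,
    p_calcfigure in number,
    p_col1       in number,
    p_col2       in varchar2 )
is
    l_hash raw(16) := dbms_obfuscation_toolkit.md5(
                          input => utl_raw.cast_to_raw( p_col1 || '/' || p_col2 ) );
begin
    begin
        insert into data_t ( row_hash, col1, col2 )
        values ( l_hash, p_col1, p_col2 );
    exception
        when dup_val_on_index then
            null;  -- another server created the same DATA row; just reuse it
    end;

    insert into values_t ( calcfigik, calcfigure, row_hash )
    values ( p_calcfigik, p_calcfigure, l_hash );
end;
/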
