Hi Chris/Connor,
I had a disagreement with someone on the use of straight SQL vs SQL calling PL/SQL function.
Here is the scenario....
In a data warehouse environment....
We have few external tables that are used to populate fact tables. External table to fact table is a one to one relation.
And each external table has individual SQL jobs to populate the corresponding fact table.
When populating the fact table, we need to get dimension data(dimension table's primary key) based on values from external table.
I said, it's better to use straight SQL to read the external table and then join dimension table to get it's primary key.
The other person said, it's better to use PL/SQL function with RESULT CACHE enabled to get the dimension table's primary key because it's modular and can be used by other programs.
External table can have around 300K to 500K records, dimension table may be around 2,000 to 10,000 and the load process runs around 10 times a day.
This dimension lookup may be used by 3 or 4 different programs.
The PL/SQL function is just a straight SQL, meaning no procedural code involved.
If it was you , would you use and if you can explain the benefit of one over the other
1)
SELECT ext.*, d1.id dim1_id, d2.id dim2_id
FROM external_table ext, dimension1_table d1, dimension2_table d2
WHERE ext.dim1_val1 = d1.dim1_val1(+)
AND ext.dim1_val2 = d1.dim1_val2(+)
AND ext.dim2_val1 = d2.dim2_val1(+)
AND ext.dim2_val2 = d2.dim2_val2(+)
AND ext.dim2_val3 = d2.dim2_val3(+);
OR
2)
CREATE FUNCTION DIM1_ID_FUNC(I_dim1_val1 VARCHAR2,
I_dim1_val2 VARCHAR2)
RETURN NUMBER RESULT_CACHE RELIES_ON (DIMENSION1_TABLE) AS
L_dim1_id NUMBER(10);
BEGIN
SELECT d1.id
INTO L_dim1_id
FROM dimension1_table d1
WHERE d1.dim1_val1 = I_dim1_val1
AND d1.dim1_val2 = I_dim1_val2;
---
RETURN L_dim1_id;
END DIM1_ID_FUNC;
CREATE FUNCTION DIM2_ID_FUNC(I_dim2_val1 VARCHAR2,
I_dim2_val2 VARCHAR2,
I_dim2_val3 VARCHAR2)
RETURN NUMBER RESULT_CACHE RELIES_ON (DIMENSION2_TABLE) AS
L_dim2_id NUMBER(10);
BEGIN
SELECT d2.id
INTO L_dim2_id
FROM dimension2_table d2
WHERE d2.dim2_val1 = I_dim2_val1
AND d2.dim2_val2 = I_dim2_val2
AND d2.dim2_val3 = I_dim2_val3;
---
RETURN L_dim2_id;
END DIM1_ID_FUNC;
SELECT ext.*, DIM1_ID_FUNC(d1.dim1_val1, d1.dim1_val2) dim1_id, DIM2_ID_FUNC(d2.dim2_val1, d2.dim2_val2, d2.dim2_val3) dim2_id
FROM external_table ext;
Thank you
Either will probably work just fine, but I'm with you. Simple SQL.
SQL's from within PL/SQL functions that are being called from SQL have read consistency challenges (which may not apply to you if your dimensions are static during the load).
There's also the cost of jumping back and forth between the sql and plsql engines. And as your dimensions grow over time...eventually you might start running into dramas with not enough result cache memory ... and things start to go very poorly after that (It costs a lot more to - look in cache, find its not there, age something else out, put the new stuff in).