Skip to Main Content
  • Questions
  • Performing aggregation through the hierarchical data

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Muhammad.

Asked: April 15, 2010 - 2:36 am UTC

Last updated: April 19, 2010 - 9:58 am UTC

Version: 11.1.0

Viewed 1000+ times

You Asked

Hi Thomas Kyte!

I have the description of following tables in (General Ledger) GL Schema:
GL_COA and SL_COA tables are used to manage chart of accounts in the hierarchy up to 5 levels. Out of these 5, 4 levels are described in GL_COA table and the final (5th) level is described in SL_COA. The first three levels in GL_COA are for summary purpose and are not used in financial transactions directly, whereas level 4 and 5 are used in transactions. The financial transactions are posted in GL_ENTRIES table. The SL_COA table is the child table of GL_COA. Example of Chart of Account in hierarchy is as:
+Total Assets (Level 1)
+Fixed Assets
+Vehicles
+Hiace
+ICT MF (989) MERCEDES
+ICT FB (989) MERCEDES
+Current Assets (Level 2)
+Bank Balances (Level 3)
+Askari Bank, Scheme III, RWP. (Level 4)
+1123343-909PKR
+1123343-910PKR
CREATE TABLE gl_coa
(gl_account_id VARCHAR2(20),
gl_account_title VARCHAR2(50),
gl_base_account_id VARCHAR2(20),
CONSTRAINT gl_coa_pk PRIMARY KEY(gl_account_id),
CONSTRAINT gl_coa_fk1 FOREIGN KEY(gl_base_account_id) REFERENCES gl_coa(gl_account_id)
);

The data of GL_COA table is as:
INSERT INTO gl_coa VALUES('11','Total Assets',NULL);
INSERT INTO gl_coa VALUES('1102','Current Assets','11');
INSERT INTO gl_coa VALUES('110201','Bank Balances','1102');
INSERT INTO gl_coa VALUES('1102010001','Askari Bank, Scheme III, RWP.','110201');
INSERT INTO gl_coa VALUES('1102010002','Askari Bank, F-10, ISB.','110201');
INSERT INTO gl_coa VALUES('1102010003','Bank of Punjab, Scheme III RWP.','110201');
INSERT INTO gl_coa VALUES('1101','Fixed Assets','11');
INSERT INTO gl_coa VALUES('110101','Vehicles','1101');
INSERT INTO gl_coa VALUES('110101001','Hiace','110101');
INSERT INTO gl_coa VALUES('110101002','Coasters','110101');
INSERT INTO gl_coa VALUES('110101003','VANs','110101');

The hierarchy can be retrieved using the following query.
SELECT
level,
LPAD(gl_account_title,LENGTH(gl_account_title)+ (level*2) - 2,'-') gl_account_title
FROM gl_coa
START WITH gl_base_account_id IS NULL
CONNECT BY PRIOR gl_account_id = gl_base_account_id;

CREATE TABLE sl_coa
(sl_account_id NUMBER,
sl_account_title VARCHAR2(50),
gl_account_id VARCHAR2(20),
CONSTRAINT sl_coa_pk PRIMARY KEY(gl_account_id,sl_account_id),
CONSTRAINT sl_coa_fk1 FOREIGN KEY(gl_account_id) REFERENCES gl_coa(gl_account_id)
);

The data of SL_COA table is as:
INSERT INTO sl_coa VALUES(1,'1123343-909PKR','1102010001');
INSERT INTO sl_coa VALUES(2,'1123343-910PKR','1102010001');
INSERT INTO sl_coa VALUES(1,'1123343-911PKR','1102010002');
INSERT INTO sl_coa VALUES(1,'1123343-911PKR','1102010003');
INSERT INTO sl_coa VALUES(1,'ICT MF (989) MERCEDES ','110101001');
INSERT INTO sl_coa VALUES(2,'ICT FB (989) MERCEDES ','110101001');

The full chart of account hierarchy up to 5 levels can be retrieved using the following SQL:
SELECT
level,
LPAD(gl_account_title,LENGTH(gl_account_title)+ (level*2) - 2,'-') gl_account_title
FROM (SELECT gl_account_id, gl_account_title, gl_base_account_id FROM gl_coa
UNION ALL
SELECT TO_CHAR(sl_account_id), sl_account_title, gl_account_id FROM sl_coa)
START WITH gl_base_account_id IS NULL
CONNECT BY PRIOR gl_account_id = gl_base_account_id;

Financial transactions are posted in GL_ENTRIES table.
CREATE TABLE gl_entries
(journal_id NUMBER PRIMARY KEY,
journal_date DATE,
gl_account_id VARCHAR2(20),
sl_account_id NUMBER,
journal_amount NUMBER,
CONSTRAINT gl_entries_fk01 FOREIGN KEY(gl_account_id,sl_account_id) REFERENCES sl_coa(gl_account_id,sl_account_id)
);

The data of GL_ENTRIES table is as:
INSERT INTO gl_entries VALUES(100,SYSDATE,'1102010001',1,5000);
INSERT INTO gl_entries VALUES(101,SYSDATE,'1102010001',2,15000);
INSERT INTO gl_entries VALUES(102,SYSDATE,'1102010002',1,7000);
INSERT INTO gl_entries VALUES(103,SYSDATE,'110101001',1,30000);
INSERT INTO gl_entries VALUES(104,SYSDATE,'110101001',2,5000);

Now I want to retrieve trial of these accounts using the following SQL:
SELECT
LPAD(gl_account_title,LENGTH(gl_account_title)+ (level*2) - 2,'-') gl_account_title,
(
SELECT NVL(SUM(journal_amount),0) je_voucher_amount
FROM gl_entries
WHERE gl_account_id IN
(
SELECT gl_account_id FROM gl_coa
START WITH gl_base_account_id = outer.gl_account_id
CONNECT BY PRIOR gl_account_id = gl_base_account_id
)
OR gl_account_id = outer.gl_account_id
)
Outstanding
FROM gl_coa outer
START WITH outer.gl_base_account_id IS NULL
CONNECT BY PRIOR outer.gl_account_id = outer.gl_base_account_id;

The result is
GL_ACCOUNT_TITLE OUTSTANDING
-------------------------------------------------- -----------
Total Assets 62000
--Fixed Assets 35000
----Vehicles 35000
------Hiace 35000
------Coasters 0
------VANs 0
--Current Assets 27000
----Bank Balances 27000
------Askari Bank, Scheme III, RWP. 20000
------Askari Bank, F-10, ISB. 7000
------Bank of Punjab, Scheme III RWP. 0

The result retrieved by above query is exactly what is required but the problem lies with the scalar sub-query used in the SQL. It is taking too long to get executed because our production database has almost 32,674 chart of account entries in GL_SOA and SL_COA tables and a total of 500k of financial entries. When I execute this query it kills the server performance. My questions here are that:
1. is it possible to rewrite the same query without scalar sub-query proficiently?
2. is it possible to compute the summaries of level 4 from the GL_ENTRIES table for once whereas for the remaining of every n-1 level, n level should acts as the base data source in the SQL query?

I hope that I may describe the whole problem. Sorry again for such a long question. Thanks in advance for any advise or guidance.
With best regards.
Bilal

Excellent! Thanks for the response. Its great. The report based on this query was taking too much time before. The actual query is as:
SELECT
gl_account_id,
DECODE(level,1,UPPER(gl_account_title),outer.gl_account_title) gl_account_title,
(SELECT SUM(je_voucher_line_amount) je_voucher_amount
FROM erp$gl_journal_lines c JOIN erp$gl_journal_entries p USING (university_id, institute_id, je_voucher_id)
WHERE (gl_account_id IN
(
SELECT gl_account_id FROM erp$gl_chart_of_accounts
WHERE university_id=SYS_CONTEXT('erp$security_mgmt_ctx','university')
AND institute_id=SYS_CONTEXT('erp$security_mgmt_ctx','institute')
START WITH gl_base_account_id = outer.gl_account_id
CONNECT BY PRIOR gl_account_id = gl_base_account_id
)
OR gl_account_id = outer.gl_account_id)
AND university_id=SYS_CONTEXT('erp$security_mgmt_ctx','university')
AND institute_id=SYS_CONTEXT('erp$security_mgmt_ctx','institute'))
gl_account_withheld_amount
FROM erp$gl_chart_of_accounts outer
WHERE university_id=SYS_CONTEXT('erp$security_mgmt_ctx','university')
AND institute_id=SYS_CONTEXT('erp$security_mgmt_ctx','institute')
START WITH outer.gl_base_account_id IS NULL
CONNECT BY PRIOR outer.gl_account_id = outer.gl_base_account_id
....
1269 rows selected.

Elapsed: 27:23:39.71

The modified query is as:
WITH list_of_charts
AS
(
SELECT DISTINCT university_id, institute_id, CONNECT_BY_ROOT gl_account_id l4_gl_account_id, gl_account_id
FROM erp$gl_chart_of_accounts
WHERE university_id=SYS_CONTEXT('erp$security_mgmt_ctx','university')
AND institute_id=SYS_CONTEXT('erp$security_mgmt_ctx','institute')
CONNECT BY PRIOR gl_account_id = gl_base_account_id AND university_id = university_id AND institute_id=institute_id),
list_of_witheld
AS
(
SELECT l.university_id, l.institute_id, l.l4_gl_account_id, SUM(TO_NUMBER(DECODE(je_voucher_line_dc,'D',je_voucher_line_amount,0))) -
SUM(TO_NUMBER(DECODE(je_voucher_line_dc,'C',je_voucher_line_amount,0))) je_voucher_amount
FROM list_of_charts l JOIN erp$gl_journal_lines c ON (l.university_id=c.university_id AND l.institute_id=c.institute_id AND l.gl_account_id=c.gl_account_id) JOIN erp$gl_journal_entries p ON (p.university_id = c.university_id AND p.institute_id=c.institute_id ANd p.je_voucher_id=c.je_voucher_id AND p.je_voucher_category=c.je_voucher_category)
WHERE p.je_voucher_date <= CAST(PD$date AS DATE)
AND p.university_id=SYS_CONTEXT('erp$security_mgmt_ctx','university')
AND p.institute_id=SYS_CONTEXT('erp$security_mgmt_ctx','institute')
GROUP BY l.university_id, l.institute_id, l.l4_gl_account_id
)
SELECT gl_account_id,
DECODE(level,1,UPPER(gl_account_title),gl_account_title) gl_account_title,
w.je_voucher_amount
FROM erp$gl_chart_of_accounts e JOIN list_of_witheld w ON (w.university_id=e.university_id AND w.institute_id=e.institute_id AND w.l4_gl_account_id=e.gl_account_id)
WHERE level <= PN$level
AND e.university_id=SYS_CONTEXT('erp$security_mgmt_ctx','university')
AND e.institute_id=SYS_CONTEXT('erp$security_mgmt_ctx','institute')
START WITH e.gl_base_account_id IS NULL
CONNECT BY PRIOR e.gl_account_id = e.gl_base_account_id AND e.university_id = e.university_id AND e.institute_id=e.institute_id
1126 rows selected.

Elapsed: 00:00:02.78

It greatly improved the response time of query. Thanks a lot of your guidance and support.
With best regards
Bilal

and Tom said...

27 hours to 2 seconds :)

Greatly improved indeed! Thanks for the feedback...


what about this:

with data
as
(
SELECT distinct connect_by_root gl_account_id top_gl_account_id, gl_account_id
FROM gl_coa
CONNECT BY PRIOR gl_account_id = gl_base_account_id
),
data2
as
(select top_gl_account_id, sum(journal_amount) sum_amt
from data, gl_entries
where data.gl_account_id = gl_entries.gl_account_id
group by top_gl_account_id
)
SELECT LPAD(gl_account_title,LENGTH(gl_account_title)+ (level*2) - 2,'-') gl_account_title,
data2.sum_amt
FROM gl_coa outer, data2
where outer.gl_account_id = data2.top_gl_account_id
START WITH outer.gl_base_account_id IS NULL
CONNECT BY PRIOR outer.gl_account_id = outer.gl_base_account_id;



data builds the entire hierarchy in bulk (instead of doing it slow by slow in the subquery) and then joins that to the gl_entries in bulk and aggregates to get the total rollups and then joins that to the small hierarchy.

How does that work for you?

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

CONNECT BY PRIOR moved from "where" to "from" fixed performance for me too

Bill Murphy, April 19, 2010 - 1:11 pm UTC


I had a similar problem with performance that was also solved this way.

Much faster execution after moving the "connnect by prior" from the "where" to the "from" portion of the query. For me it took a 5-hour query to a 12 minute one, so not as dramatic as seen above, but still a good technique to try if connect by prior in the "where" is hurting performance.

(I try not to post "me-too" to the internet, but this generalized answer would have saved me a day or more of investigating how to speed up a query had I seen it a few weeks ago... so in this case "me-too!")

--Bill