Skip to Main Content
  • Questions
  • Collections, Associative array or nested table for huge volume of data processing

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, syed.

Asked: April 27, 2010 - 11:31 am UTC

Last updated: July 29, 2019 - 4:05 pm UTC

Version: 10.2.0

Viewed 10K+ times! This question is

You Asked

Hi Tom,

At the outset let me thank you for your help.

I have been working with collections since past 3 year. I have few doubts, it would be great if you could clarify.
1) when i am processing huge volum of data in PLSQL, which collection to be used. Nested table or Associative array? and why? Assume that i am not going to put processed data back into database.
2) Few docs of oracle have mentioned that oracle maintains Btree structure like representation to store the data of Associative array in memory. How about nested table, whether oracle uses same type of indexes here as well.?

Thanks

and Tom said...

1) trick question, the only answer is:

neither, you do not process huge volumes of data in memory, in collections/index by tables (plsql index by tables are all associated arrays, collections are the other kind of array).

You process a bit of data at a time, not tens of thousands, not millions of entries at a time.

So, when doing it right - a piece at a time, a few hundred/thousand (low thousands) - you use which ever one is easier.

I myself prefer index by tables, they work much more like an array - you don't have to .extend them. They are nice also in that they can be sparse if it makes sense - and it gives you a really easy way to look up something - either by a number or a string.


2) index by tables are sparse - that much is documented. That internally they are a b*tree isn't so relevant to us. That they are sparse - is. What if you had to store index elements -2000000000 and +2000000000 for whatever reason.

ops$tkyte%ORA10GR2> declare
  2          type array is table of date index by binary_integer;
  3          l_data array;
  4  begin
  5          l_data(-2000000000) := sysdate;
  6          l_data(+2000000000) := sysdate+1;
  7  end;
  8  /

PL/SQL procedure successfully completed.


that is nice, it doesn't allocate all 4,000,000,000 elements - just the two it needs.

If you use a collection for that - think about what you would have to .extend to :)

collections in plsql variables are not sparse.

Whether that impacts you or not - is up to your design, your implementation.

Most of the times, I use plsql index by tables - as arrays.

Collections are nice if you want to reference the collection in SQL directly - if that is necessary, the collections would be the way to go.


So, for much of my code - plsql index by tables - no .extend, sparse - but no direct interaction with SQL, no easy way to initialize them..

Collections - not sparse, have to allocate them, direct interaction with SQL - and a nice initialization method.

Rating

  (7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

syed zaheerulla, April 28, 2010 - 12:56 pm UTC

Thanks a lot for your valuable information.

I have a query regarding the indexes of Associative array and nested tables. If oracle is using indexing mechanism for associative array, it has to use same kind of mechanish for nested table also right? It is logical that it has to retrieve the value from both nested table and associative array and some kind of indexing is required for both types.
I mean its not that orace is using indexing only with associatve array, it is using index with nested table also right?
Tom Kyte
April 28, 2010 - 1:07 pm UTC

... If oracle is using indexing mechanism for associative array, it has to use same kind of mechanish for nested table also right? ...

how did you make that jump in logic?


If A uses method X for Y, it has to use method X for Z as well.


If "a doctor" uses "amputation" for "gangrene", it (the doctor) has to use "amputation" for "a wart"



Why would you use the same algorithm for

o storing a sparse array, indexed by "anything" (plsql index by tables are all associative arrays)

o storing a dense array, indexed by only a numeric offset


I myself would be tempted to use - something completely different for the two, wouldn't you?

Array/Collection naming

Charlie 木匠, April 28, 2010 - 4:33 pm UTC

Too many names. ^_^

PL/SQL table = Associative Array (or index-by table)

-- Associative array type
TYPE population IS TABLE OF NUMBER INDEX BY VARCHAR2(64);

--Declaring Nested Tables
TYPE nested_type IS TABLE OF VARCHAR2(30);

invoke EXTEND method to add elements later

--Collection of ADT = UDT, Abstract datatype, User defined datatype:

CREATE OR REPLACE TYPE INVDB.NUMBER_TAB_TYPE is table of number;
/

select ... from TABLE(ADT_Table);


Tom Kyte
April 28, 2010 - 5:44 pm UTC

just use two

plsql index by tables, which are sparse associative arrays

collections, which are dense arrays indexed by some integer

Syed Zaheerulla, April 29, 2010 - 5:17 am UTC

I would follow you tom, I will also be tempted to use different alogorithm for both, I think I didnt put my question right, Actually i wanted to ask whether oracle is using some kind of indexing mechanism for nested table as well? It cannot be the same as associative array but oracle is indexing the nested table elements as well with different algorithm.
Thanks
Tom Kyte
April 29, 2010 - 7:44 am UTC

think of them as you would think of any dense array in a programming language. They are dense - internally they might be arrays of lists, list of lists, hash tables, x, y, z - but they work and are addressable as dense arrays would be in any other language.


In order words, we don't really care or need to know - beyond the fact that there are

a) sparse arrays
b) dense arrays


Which method should be used

Shaji, April 30, 2010 - 2:53 am UTC

Recently one of our PL/SQL package started throwing the following error
ORA-04030: out of process memory when trying to allocate
16408 bytes (koh-kghu call ,kollalo2)
ORA-06500: PL/SQL: storage error
ORA-04030: out of process memory when trying to allocate 16408 bytes (koh-kghu
call ,pmuccst: adt/record)
Oracle suggested not to use PL/sql table type for large volume of data as you mentioned. But funniest part is it's giving error for 1600 records. The application structure we have here is 3 tier architecture where the Portal speaks to oracle Fusion Middleware and which in turn talks to PL/SQL api. How can the PL/sql API send this many records to FMW in this scenario?
Tom Kyte
April 30, 2010 - 7:08 am UTC

use a result set - not an in memory collection.

pipelined function, return a ref cursor


ref cursor with a defining query would be best.

pipelined function would be second best.

How to determine the size of array for LIMIT

Kuldeep, October 03, 2011 - 7:18 am UTC

Hi,

I am using LIMIT with BULK COLLECT to process records. How can I determine what should be the correct size of my array for a chunk of data I am processing?

For example I have a table whose record size is 1024bytes and I am using LIMIT=1024 for an array of type table's record type. So this makes size of my array 1Mb.

How can I verify that this 1Mb size if is an appropriate size, not oversized or undersized. Agains which memory structure I should compare it with. Like if my shared pool is 100Mb then I should not exceed my array size more than 5% of my shared pool i.e. 5Mb. Is there any recommendation?

Thanks in advance.
Tom Kyte
October 03, 2011 - 10:30 am UTC

Make it a parameter (configurable)

start with 100. 100 seems like a really good number in my experience. Sometimes as much as 500.


Do not compare "memory structures". compare real world performance.


Your plsql memory is not allocated from the shared pool. It is in your UGA memory - which when using dedicated servers is in the PGA (process memory, not in the SGA at all) or the LARGE_POOL when using shared server.

Ankit, June 27, 2012 - 4:32 am UTC

Hi Tom
Thanks for your knowledge sharing.

I am curious about knowing:

1) How much data can we store in a collection, is there any limit ?
2) Is there any data dictionary view which shows current collections being used in currently running programs, with their number of records,bytes taken by them etc ?
3) Where is collection data stored in memory (UGA/PGA) ?

Thanks.
Tom Kyte
June 27, 2012 - 9:38 am UTC

1) only limited by disk space. If you are dealing with the collection as a variable in a program, then you might also be limited by memory

2) no, they are just variables, we don't track such minutiae

3) it would be in the UGA which can either be in the PGA (dedicated server) or SGA (shared server)

Oracle

A reader, July 29, 2019 - 9:13 am UTC

Can anyone give the ans of below mentioned scenario.
If i have a table having total no of rows 10Cr.
Now i need to process the data using collection(Not going to isert data into database again).
So which collection(Associative or Nested) i should use.?
What is the benefit of index by clause in Associative array.?
Tell me a scenario where i can use Associative array but not nested and vice versa?
Which one takes more memory associative or nested?

Chris Saxon
July 29, 2019 - 4:05 pm UTC

Tom summarizes the differences between these in the answer right at the top of this page.

Which part of the answer are you struggling with?

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library