Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, gajanan.

Asked: November 12, 2008 - 1:22 am UTC

Last updated: November 18, 2008 - 5:53 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Dear tom,

I have a question, how does multi-pass sort work, with reference to temporary tablespace.
the following sizes are included to make the example clear.
Suppose the sort_area_size is 1 mb and i have a table t of 1gb.
I give the query as select * from t order by sal;
There are no indexes on this table.
then it will break the table in 1000 parts and then take one at a time in sort_area and then sort and keep it in temporary tablespace, so it will have 1000 such parts.
Now suppose that sal start from 1 and ends with 10,00,000, with no duplicate values. When it takes the first part to sort, then it will be having lets say
values of sal from 20 to 100, next pass will have from 1 to 30, third will have 200 to 1000.
1) Now the temporary tablespace has 1000 parts, but they are not sorted from 1 to 10,00,000. The first one is sorted from 20 to 100, second 1 to 30 and thrid 200 to 1000. so does it sort again all the 1000 parts to get from 1 to 10,00,000. Please explain

Also when the following query is given
select * from t where empno=7499;
suppose this gets 1000 blocks in the sga, then it will process it to get only the rows from this 1000 blocks to get the result and then put it in pga.
1) Can you explain the sequence of steps that happens behind this.
2) Which part of pga, is this result stored.

thank you .

and Tom said...

Ok, in short

in memory sort Everything fits in memory, no disk is needed, everything done in ram.

one pass sort We read some amount of data and sort it, the sort workarea fills up, we write that to disk. We do it again (read more, sort it) - this too spills to disk. We do this over and over until we've read the entire set of data and sorted each of the chunks. Now we need read a bit of each of the chunks we've sorted on disk and start returning data from there. In the one pass sort, we can read a bit of each chunk and start returning data. If we cannot read a bit of EVERY chunk then....

we are in the realm of the multi-pass sort. We read and merge the first few chunks - resulting in a new set. We read and merge another set of chunks - resulting in a new set. And so on - until we've processed all of the output from the first pass and created a "second pass" of more merged (more sorted) data. Then we merge those in the sort area and start returning data.



Jonathan Lewis has a good write up of this in Cost Base Oracle Fundamentals, and a presentation of his gives you pictures to go with this:
http://www.nocoug.org/download/2003-08/how_cbo_works.ppt


The PGA is used in dedicated server, the UGA (stored in the SGA) for shared server. When using manual memory management - the sort_area_retained size bits of data are stored in the PGA/UGA (just the pga/uga, there isn't really a 'part' to point to), the sort_area_size-sort_area_retained is kept on disk and read in as needed.



As for 'select * from t where empno = 7499', the only answer is "it depends", there will be very little probably in the pga at any moment in time, we fetch data on request when we can. If this used an index - we would just get the N rows the client fetched - at a time. And the data need not persist in the pga, it is only there while we get it from the cache and send it back to the client.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

gajanan, November 13, 2008 - 6:04 am UTC

I did go through the Jonathan Lewis document, on page 32 it shows huge sort(i think this will be multi-pass) it shows small sorts are done and then they are merged and then these bigger units are merged, is it possible( in some cases atleast), that these small units after being merged, needs another sort, before being merged?

Also in cases where the data needs to be in pga, which part of it does it reside, like (private sql area, sort area etc).
Also I, read the oracle documents for pga, but it does not give details of each component e.g. ( it will say "The run-time area, which is freed when the execution is terminated." but for what purpose it is used, when it is allocated etc is not given)
So can you please provide a link or document which has some extra information.
Tom Kyte
November 14, 2008 - 4:34 pm UTC

a multipass sort is just that, yes, can be multiple passes.

It resides in the sort workarea of the PGA.

The PGA is process memory that is managed by the OS as a heap (malloc()'ed for C programmers) or dynamically attached and detached using memmap() for some workareas (sort workareas using automatic memory management can do that if the OS permits). That is all - the PGA is just process memory, you don't and we don't have to assign a "name" to everything, it isn't managed like the SGA at all (which is a fixed size shared data structure, PGA memory is just allocated as needed on the fly)

You have access to everything you need already.

There are no "each component" of the PGA to study in such detail, it is just process memory - allocated as needed, freed when done...

(truth be told, if you want to understand it better - program in C for a couple of years.... Or assembler, if you know how to program in C, you know all about the PGA)

A reader, November 14, 2008 - 8:38 am UTC

Hi Tom,
In your example of a one-pass sort, after the first pass has created several chunks of sorted data, why (ie under what conditions) would Oracle not be able to "read a bit of each chunk", therby requiring another pass? Can you give a simple numeric example such as,

Chunk1: 2,4,5,8,13
Chunk2: 1,3,7,9,10
Chunk3: 6,11,12,14

Would Oracle now have to sort the 3 chunks in a 2nd pass before it can return 1,2,3,...,13,14? But if it can do that then why the need to break it up into chunks in the first place?
Tom Kyte
November 18, 2008 - 5:53 pm UTC

the condition would be:

a bit of each chunk - of EACH chunk - doesn't fit in the sort area size. We cannot merge unless and until we read each and every chunk.