Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Pradeep.

Asked: March 20, 2017 - 2:34 pm UTC

Last updated: March 23, 2017 - 3:22 am UTC

Version: 11.2.0.4

Viewed 1000+ times

You Asked

Dear Team,

I have one analytic question, please help me to understand this.

Suppose I have
SGA_TARGET=5 GB
DB_CACHE_SIZE= 2 GB
PGA_SIZE= 2 GB
TABLE SIZE= 15 GB (e.g. table_name employee)

If I fire command like "select * from employee order by sal"

definitely it go for full table scan, my doubt is if my db_cache_size id 2GB and temp size also 5 GB.
how it going to hold all data and sort to give output.

for update we know after db_cache full checkpoint occur and all uncommitted data written to datafile and make db buffer free to hold next bunch of data but for select how it going to work.

In sort how large full table scan with sort works internally,which are the pools comes in picture to perform this operation.

Please advice me with and without index on table.

Thank's
Pradeep





and Connor said...

If your temporary tablepspace is *capped* at 5G, then you will likely have trouble sorting a 15G table, because yes, we do need space to hold both intermediate and final sorting results.

How *much* space you need depends on various factors. For example,

a) You might have a 2G PGA, but we dont (by default) allow a single session to consume all of that. There are limits per session so that everyone gets a "fair chance" at having some pga mem available to them.

b) We dont just grab 15G and sort it. We will scan a portion of the data, sort it in memory, and then dump that to disk if we run out of pga mem, and repeat for the next portion etc. If we have enough memory than to merge these sorted pieces as we send the final result back to the calling program...then this is a 1-pass sort. But if we cant do that, we might need to take (say) 3 of the sorted portions, merge them, and then dump that back to disk. Then the next 3 portions etc etc... And *then* we take these "double-sorted" portions and merge them to give them back to the client. This is called a multi-pass sort (and obviously will burn a lot of I/O resources writing and reading back and forth to the temp space).

c) We dont *have* to grab all of the row data. If you just do: select a,b,c from emp, then we wont bring along columns e,f,g for no reason.

An index doesn't really change the *how* we sort, it simply potentially changes

i) whether we have to sort at all. If you are ordering by the same sequence as your index definition, we might be able to simply read the data in index order, and thus, it comes out already sorted.

ii) the amount of data we need to sort. We might be able to scan just the index and sort that, rather than sort the entire table.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

followup

A reader, March 22, 2017 - 10:24 am UTC

b) "double-sorted" portions :
where are seated these portions? if we will divide to conquer and thus not one-shot sorting (parts in mem and parts in Temp) the whole data , still the sets may be somewhere !

i) we move the prob. still the index creation need mem an temp right? so what advantage ? taking for instance the original need , is creating an index on SAL will free me from from using ORDEY BY clause? elaborate please.
Connor McDonald
March 23, 2017 - 3:22 am UTC

b) Yes, they will be stored in pga, or if they cannot fit, then in TEMP

i) if you want sorted results, you *always* need the ORDER BY. But if an index is in place, the optimizer may decide to read the data via the index (which is *stored* in sorted order already) and thus subsequent sorting may be avoided.

Gr8 way to explain task

Pradeep prajapati, March 22, 2017 - 2:08 pm UTC

Thanks for this gr8 explanation, Now I'm quite clear with things.

Many thanks
Pradeep