Peter -- Thanks for the question regarding "How to query a large table with cpu throttling", version 10.2.0
Submitted on 19-Feb-2008 1:04 Central time zone
Last updated 20-Feb-2008 12:31
You Asked
My requirements:
1. I have a large table, in the range of 1 billion records.
2. But the machine is a relatively slow one. Therefore I would not
like to clog the machine when querying this large table.
My question:
A. Supposed I need to extract two fields of ALL the record - but the
full table scan can afford to take its time. Therefore, how do I do
a full table scan, with cpu throttling (eg, sleeping or slowing down
once in a while) so as to let other jobs continue? (any SQL hints
available?)
Ideally I thought it should not involved any indexes, as that will
incur additional lookups, plus possibly some sorting etc, because the
target is a FTS anyway - is my analysis correct?
B. Supposed there are no solution to the above, so alternatively is
to do slicing. My method is find the min and max of the rowid, and
then do a direct rowid access:
select * from table where rowid = 'xxxx'.
where the xxxx correspond to the rowid to be enumerated by some yet-to-be-discovered algorithm. This statement does not generate any temporary sorting, or depending on any prior indexes. Problem is how to derive all the rowids for this particular table?
C. Currently, I found that select rowid from a_large_table where
rownum < 100 and rowid > 'xxxx' return almost immediately, whereas a
"select count(*)" can take about 20mins or more to return just a
number.
This therefore is the 3rd way to get data - 100 records at a time.
Similar in design to (2) above. Very fast it seemed. But there is no way to successively get 100 rows at a time, without incuring the cost of sorting on rowid.
I have run out of ideas.
and we said...
a) the resource manager. It can be used to limit the resources you are using WHEN AND IF anyone else wants them (eg: if no one else is making any demands on the machine, you get the entire thing - no reason to throttle you - you cannot put CPU and IO in the bank, save it up and use it later)
full scan is what you want, no indexes
http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/dbrm.htm#i1010776
b) nah, that would not have been needed, all you would have had to have done if this didn't exist would be:
open cursor;
loop
array fetch 500 records;
print them out somewhere
exit when you have no more data to fetch.
sleep a bit
end loop;
c) see b) you wouldn't do it that way, but really see a)
INDEX FAST FULL SCAN?
February 19, 2008 - 5pm Central time zone
Reviewer: Kashif from Reston, VA
Tom,
If you're only selecting two columns from a billion row table, why would you not create an index on
the two columns and let the optimizer do an index fast full scan? Why FTS and no indexes?
Kashif
Followup February 20, 2008 - 8am Central time zone:
if there is a general purpose need for this index elsewhere - sure, IF at least one of the fields were NOT NULL (that would be a requirement for the index to be used as well)
so, if
a) having this index on this billion row table makes sense in general
b) at least one of the columns is NOT NULL
c) there is no where clause used to select the rows of interest that access other columns
d) the table is fat to begin with
then the index could be used as a skinny version of the table.
full scans are not evil, indexes are not good
February 20, 2008 - 1am Central time zone
Reviewer: Karthick Pattabiraman
Kashif you should understand how index works.
When you have 1000 records in a table and you are going to read all the 1000 records there is no
use of going to the index. Even if you have index on that table CBO will not use that and it will
go for a full scan.
What index scan means
1. Read the index find the rowid
2. Fetch the data for that rowid from the table.
This is effective only when you are selecting a small number of data from a big table. But the
poster wants to get all the records so in that case reading index will be a evil action.
Remember and say it to yourself
INDEX IS NOT GOD AND FULL TABLE SCAN IS NOT EVIL.
Under such cases Full Table scan is the Best and that is what we should be going for.
This below link contains one of my favorite answer given by tom. Rocking stuff may be of your
interest...
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9422487749968
Actually an index would help
February 20, 2008 - 7am Central time zone
Reviewer: Tom
In this particular case the poster has specified that they only need 2 fields from the billion row
table. If they were to create an index on these two fields then the optimizer could choose to full
scan the index rather than the table. Since the content of the index would be considerably smaller
than that of the table this could reduce the amount of logical and physical IO considerably.
Of course, in order to do this you'd need to create the index which would itself pretty much need
to full scan the table....
Followup February 20, 2008 - 9am Central time zone:
see above for the other considerations...
To : Karthick
February 20, 2008 - 7am Central time zone
Reviewer: A reader from UK
Hi Karthick,
You might need to re-read the documentation about 'INDEX FAST FULL SCAN'. Index Fast full scan will not access the table via rowid.
SQL> create table tab_idx_ffs1 as select * from all_objects;
Table created.
SQL> create index idx_ffs1 on tab_idx_ffs1(object_name,subobject_name);
Index created.
SQL>BEGIN
DBMS_STATS.GATHER_TABLE_STATS (
OwnName => user
,TabName => 'TAB_IDX_FFS1'
,Estimate_Percent => 0
,Block_sample => TRUE
,Method_Opt => 'FOR ALL INDEXED COLUMNS SIZE 1 '
,Degree => NULL
,Cascade => TRUE
,No_Invalidate => FALSE);
END;
/
PL/SQL procedure successfully completed.
SQL> set autotrace on
SQL> set timing on
SQL> select /*+ full (tab_idx_ffs) */ object_name, subobject_name from tab_idx_ffs where
object_name > '0';
980136 rows selected.
<b>Elapsed: 00:01:11.62</b>
Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1422K| 33M| 5867 (2)|
| 1 | TABLE ACCESS FULL| TAB_IDX_FFS | 1422K| 33M| 5867 (2)|
----------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
85693 consistent gets
6236 physical reads
348 redo size
25232830 bytes sent via SQL*Net to client
457632 bytes received via SQL*Net from client
130688 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
980136 rows processed
SQL> select /*+ index_ffs(t idx_ffs) */ object_name, subobject_name from tab_idx_ffs t where
object_name > '0';
980136 rows selected.
<b>Elapsed: 00:01:04.98</b>
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1422K| 33M| 2783 (2)|
| 1 | INDEX FAST FULL SCAN| IDX_FFS | 1422K| 33M| 2783 (2)|
---------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
75205 consistent gets
0 physical reads
0 redo size
8099883 bytes sent via SQL*Net to client
457632 bytes received via SQL*Net from client
130688 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
980136 rows processed
SQL> set autotrace off
RE: IF at least one of the fields were NOT NULL.
February 20, 2008 - 10am Central time zone
Reviewer: Kashif from Reston, VA
I remember you posting about using a virtual column such as a 0 in your index definition to ensure that even entirely null combinations of your indexed columns are indexed. E.g.:
a@CASST> create table ao as select * from all_objects;
Table created.
a@CASST> @desc ao
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED NOT NULL DATE
LAST_DDL_TIME NOT NULL DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
a@CASST> create index ao_idx on ao(SUBOBJECT_NAME, DATA_OBJECT_ID, 0);
Index created.
a@CASST> set autotrace traceonly
a@CASST> select SUBOBJECT_NAME, DATA_OBJECT_ID from ao;
51975 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=36 Card=56163 Byte
s=1684890)
1 0 INDEX (FAST FULL SCAN) OF 'AO_IDX' (INDEX) (Cost=36 Card=5
6163 Bytes=1684890)
Statistics
----------------------------------------------------------
84 recursive calls
0 db block gets
3680 consistent gets
113 physical reads
0 redo size
430515 bytes sent via SQL*Net to client
24527 bytes received via SQL*Net from client
3466 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
51975 rows processed
a@CASST>
Also, not sure what you mean by the following?
"c) there is no where clause used to select the rows of interest that access other columns "
Thanks.
Karthick:
Index fast full scans can be a good way to cut down on the time you spend scanning a table if you need a few columns from a wide table.
Kashif
Followup February 20, 2008 - 12pm Central time zone:
sure, you would be making a 3 attribute index - at least one of the attributes in the index must be not null.
In general, for all we know, the query could be
select c1, c2 from t where c3 like '%X%';
I agree a fast full index scan can be good - have written that many times, Just stating
a) must have non-null attribute
b) in this case, I would hope the index can be used for something/many other things
c) no where clause is used that accesses things other than C1, C2 (as index starts to grow now...)
d) the table is fat to begin with.
February 21, 2008 - 12pm Central time zone
Reviewer: karthick from India
Yes i got that wrong. Thanks kashif...