Skip to Main Content
  • Questions
  • How to query a large table with cpu throttling

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Peter.

Asked: February 19, 2008 - 1:04 am UTC

Last updated: February 20, 2008 - 12:31 pm UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

My requirements:

1. I have a large table, in the range of 1 billion records.

2. But the machine is a relatively slow one. Therefore I would not
like to clog the machine when querying this large table.

My question:

A. Supposed I need to extract two fields of ALL the record - but the
full table scan can afford to take its time. Therefore, how do I do
a full table scan, with cpu throttling (eg, sleeping or slowing down
once in a while) so as to let other jobs continue? (any SQL hints
available?)

Ideally I thought it should not involved any indexes, as that will
incur additional lookups, plus possibly some sorting etc, because the
target is a FTS anyway - is my analysis correct?

B. Supposed there are no solution to the above, so alternatively is
to do slicing. My method is find the min and max of the rowid, and
then do a direct rowid access:

select * from table where rowid = 'xxxx'.

where the xxxx correspond to the rowid to be enumerated by some yet-to-be-discovered algorithm. This statement does not generate any temporary sorting, or depending on any prior indexes. Problem is how to derive all the rowids for this particular table?

C. Currently, I found that select rowid from a_large_table where
rownum < 100 and rowid > 'xxxx' return almost immediately, whereas a
"select count(*)" can take about 20mins or more to return just a
number.

This therefore is the 3rd way to get data - 100 records at a time.
Similar in design to (2) above. Very fast it seemed. But there is no way to successively get 100 rows at a time, without incuring the cost of sorting on rowid.

I have run out of ideas.

and Tom said...

a) the resource manager. It can be used to limit the resources you are using WHEN AND IF anyone else wants them (eg: if no one else is making any demands on the machine, you get the entire thing - no reason to throttle you - you cannot put CPU and IO in the bank, save it up and use it later)

full scan is what you want, no indexes

http://docs.oracle.com/cd/B19306_01/server.102/b14231/dbrm.htm#i1010776

b) nah, that would not have been needed, all you would have had to have done if this didn't exist would be:

  open cursor;
  loop
      array fetch 500 records;
      print them out somewhere
      exit when you have no more data to fetch.

      sleep a bit
  end loop;


c) see b) you wouldn't do it that way, but really see a)

Rating

  (7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

INDEX FAST FULL SCAN?

Kashif, February 19, 2008 - 5:28 pm UTC

Tom,

If you're only selecting two columns from a billion row table, why would you not create an index on the two columns and let the optimizer do an index fast full scan? Why FTS and no indexes?

Kashif

Tom Kyte
February 20, 2008 - 8:15 am UTC

if there is a general purpose need for this index elsewhere - sure, IF at least one of the fields were NOT NULL (that would be a requirement for the index to be used as well)

so, if

a) having this index on this billion row table makes sense in general
b) at least one of the columns is NOT NULL
c) there is no where clause used to select the rows of interest that access other columns
d) the table is fat to begin with

then the index could be used as a skinny version of the table.

full scans are not evil, indexes are not good

Karthick Pattabiraman, February 20, 2008 - 1:56 am UTC

Kashif you should understand how index works.

When you have 1000 records in a table and you are going to read all the 1000 records there is no use of going to the index. Even if you have index on that table CBO will not use that and it will go for a full scan.

What index scan means

1. Read the index find the rowid
2. Fetch the data for that rowid from the table.

This is effective only when you are selecting a small number of data from a big table. But the poster wants to get all the records so in that case reading index will be a evil action.

Remember and say it to yourself

INDEX IS NOT GOD AND FULL TABLE SCAN IS NOT EVIL.

Under such cases Full Table scan is the Best and that is what we should be going for.

This below link contains one of my favorite answer given by tom. Rocking stuff may be of your interest...

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9422487749968


Actually an index would help

Tom, February 20, 2008 - 7:46 am UTC

In this particular case the poster has specified that they only need 2 fields from the billion row table. If they were to create an index on these two fields then the optimizer could choose to full scan the index rather than the table. Since the content of the index would be considerably smaller than that of the table this could reduce the amount of logical and physical IO considerably.

Of course, in order to do this you'd need to create the index which would itself pretty much need to full scan the table....
Tom Kyte
February 20, 2008 - 9:29 am UTC

see above for the other considerations...

To : Karthick

A reader, February 20, 2008 - 7:56 am UTC

Hi Karthick,
You might need to re-read the documentation about 'INDEX FAST FULL SCAN'. Index Fast full scan will not access the table via rowid.

SQL> create table tab_idx_ffs1 as select * from all_objects;

Table created.

SQL> create index idx_ffs1 on tab_idx_ffs1(object_name,subobject_name);

Index created.

SQL>BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
      OwnName        => user
     ,TabName        => 'TAB_IDX_FFS1'
    ,Estimate_Percent  => 0
    ,Block_sample      => TRUE
    ,Method_Opt        => 'FOR ALL INDEXED COLUMNS SIZE 1 '
    ,Degree            => NULL
    ,Cascade           => TRUE
    ,No_Invalidate     => FALSE);
END;
/

PL/SQL procedure successfully completed.

SQL> set autotrace on
SQL> set timing on

SQL> select /*+ full (tab_idx_ffs) */ object_name, subobject_name from tab_idx_ffs where object_name > '0';

980136 rows selected.

<b>Elapsed: 00:01:11.62</b>

Execution Plan
----------------------------------------------------------                      
                                                                                
----------------------------------------------------------------------          
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)|          
----------------------------------------------------------------------          
|   0 | SELECT STATEMENT  |             |  1422K|    33M|  5867   (2)|          
|   1 |  TABLE ACCESS FULL| TAB_IDX_FFS |  1422K|    33M|  5867   (2)|          
----------------------------------------------------------------------          
                                                                                
Statistics
----------------------------------------------------------                      
          1  recursive calls                                                    
          0  db block gets                                                      
      85693  consistent gets                                                    
       6236  physical reads                                                     
        348  redo size                                                          
   25232830  bytes sent via SQL*Net to client                                   
     457632  bytes received via SQL*Net from client                             
     130688  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
     980136  rows processed                                                     

SQL> select /*+ index_ffs(t idx_ffs) */ object_name, subobject_name from tab_idx_ffs t where object_name > '0';

980136 rows selected.

<b>Elapsed: 00:01:04.98</b>

Execution Plan
----------------------------------------------------------                      
                                                                                
---------------------------------------------------------------------           
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)|           
---------------------------------------------------------------------           
|   0 | SELECT STATEMENT     |         |  1422K|    33M|  2783   (2)|           
|   1 |  INDEX FAST FULL SCAN| IDX_FFS |  1422K|    33M|  2783   (2)|           
---------------------------------------------------------------------           
                                                                                
Statistics
----------------------------------------------------------                      
          1  recursive calls                                                    
          0  db block gets                                                      
      75205  consistent gets                                                    
          0  physical reads                                                     
          0  redo size                                                          
    8099883  bytes sent via SQL*Net to client                                   
     457632  bytes received via SQL*Net from client                             
     130688  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
     980136  rows processed                                                     

SQL> set autotrace off




RE: IF at least one of the fields were NOT NULL.

Kashif, February 20, 2008 - 10:21 am UTC

I remember you posting about using a virtual column such as a 0 in your index definition to ensure that even entirely null combinations of your indexed columns are indexed. E.g.:

a@CASST> create table ao as select * from all_objects;

Table created.

a@CASST> @desc ao
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 OBJECT_NAME                               NOT NULL VARCHAR2(30)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                   NOT NULL DATE
 LAST_DDL_TIME                             NOT NULL DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

a@CASST> create index ao_idx on ao(SUBOBJECT_NAME, DATA_OBJECT_ID, 0);

Index created.

a@CASST> set autotrace traceonly
a@CASST> select SUBOBJECT_NAME, DATA_OBJECT_ID from ao;

51975 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=36 Card=56163 Byte
          s=1684890)

   1    0   INDEX (FAST FULL SCAN) OF 'AO_IDX' (INDEX) (Cost=36 Card=5
          6163 Bytes=1684890)





Statistics
----------------------------------------------------------
         84  recursive calls
          0  db block gets
       3680  consistent gets
        113  physical reads
          0  redo size
     430515  bytes sent via SQL*Net to client
      24527  bytes received via SQL*Net from client
       3466  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      51975  rows processed

a@CASST> 


Also, not sure what you mean by the following?

"c) there is no where clause used to select the rows of interest that access other columns "

Thanks.

Karthick:

Index fast full scans can be a good way to cut down on the time you spend scanning a table if you need a few columns from a wide table.

Kashif

Tom Kyte
February 20, 2008 - 12:31 pm UTC

sure, you would be making a 3 attribute index - at least one of the attributes in the index must be not null.


In general, for all we know, the query could be

select c1, c2 from t where c3 like '%X%';


I agree a fast full index scan can be good - have written that many times, Just stating

a) must have non-null attribute
b) in this case, I would hope the index can be used for something/many other things
c) no where clause is used that accesses things other than C1, C2 (as index starts to grow now...)
d) the table is fat to begin with.

karthick, February 21, 2008 - 12:17 pm UTC

Yes i got that wrong. Thanks kashif...

A reader, December 26, 2012 - 4:22 am UTC