SQL*Loader, Resource Manager, Locally Managed TS's


Question and Answer

Tom Kyte

Thanks for the question, Andre.

Asked: November 27, 2000 - 9:42 pm UTC

Last updated: August 24, 2007 - 1:50 pm UTC

Version: 8.1.6

Viewed 1000+ times

You Asked

Hi Tom !!!

A bunch of questions. I hope you don't mind... (If you can shed some light on them, I'll be very glad. I bet the answers are on the tip of your tongue)

- What is the use of the FILLER keyword in a SQL*Loader ctl ?

- In Resource Manager, how does multi-level CPU usage work (emphasis method)? It's not clear whatsoever in the on-line books. It is said that when one level is "unable" to use the resources of one level, then the next level is utilized.

- What's the actual difference between the V$TEMP_EXTENT_MAP and the V$TEMP_EXTENT_POOL views, regarding locally managed temporary TS's ?

- Using the MIGRATION utility, what is the use of the MULTIPLIER parameter? I know it is used to size the new 8i index, but how is it done?

- Is there any "reasonable" explanation why the QUERY_REWRITE_ENABLED parameter should be set to true in order
to use function-based indexes ?

- I understand dimensions help query rewrites. But how can constraints be used for query rewrites ?

- I've read about one limitation of stored outlines. You can use plans created by the CBO and manipulate them -- especially transporting them. But you cannot "create" your own plan. That would be a deficiency of this new feature. Is this approach correct ?

- In version 8.1.6 Oracle discourages the use of the JAVAUSERPRIV, JAVASYSPRIV and JAVADEBUGPRIV roles, and encourages the use of direct privileges. Why is that? These roles make sense to me and enable a considerable amount of control over the Java resources. How can I grant these isolated privileges explicitly?

- I've read there are server-side JDBC drivers in order to translate the Java source code previously published to PL/SQL, in a sort of "recursive" way. That is, the JVM must communicate with the SQL engine through this "internal" driver. Is that correct?

Thanks for your attention and patience.

Warmest regards from sunny Rio de Janeiro, Brazil !



and Tom said...

o the FILLER keyword maps fields that appear in the raw input record to non-database fields. One of its primary uses is to skip fields in a delimited file (for example, if you want to load fields 1, 3 and 5, skipping over 2 and 4). See

http://asktom.oracle.com/~tkyte/SkipCols/index.html

for more info.
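As a minimal sketch (the table and field names here are made up, not from the original example), a control file that loads fields 1, 3 and 5 of a five-field delimited record while skipping fields 2 and 4 could look like this:

LOAD DATA
INFILE 'data.dat'
INTO TABLE t
FIELDS TERMINATED BY ','
( c1,
  skip_field2 FILLER,   -- second field in the file, mapped to no table column
  c3,
  skip_field4 FILLER,   -- fourth field in the file, mapped to no table column
  c5
)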



o the meaning of "one level is unable to use the resources" is more simply "when one level is not requesting those resources, they will be given to the lower levels".

If a consumer-group does not use its CPU-percentage, the remainder falls through to the next level.

This is superior to a single-level plan, whereby I would allocate X% of the CPU to a given group and that is all it would get, regardless of the usage by other groups (even if there were plenty of leftover resources, you would not get them).  A multi-level plan allows resources to be fully utilized whenever possible.
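For illustration, here is a sketch of a two-level emphasis plan built with DBMS_RESOURCE_MANAGER (the plan and group names are made up): OLTP_GROUP gets 80% of the CPU at level 1, and whatever level 1 does not ask for falls through to level 2, where OTHER_GROUPS can use all of it.

begin
    dbms_resource_manager.create_pending_area;

    dbms_resource_manager.create_plan(
        plan    => 'TWO_LEVEL_PLAN',
        comment => 'emphasis method, two CPU levels' );

    dbms_resource_manager.create_consumer_group(
        consumer_group => 'OLTP_GROUP',
        comment        => 'online users' );

    dbms_resource_manager.create_plan_directive(
        plan             => 'TWO_LEVEL_PLAN',
        group_or_subplan => 'OLTP_GROUP',
        comment          => '80 percent of CPU at level 1',
        cpu_p1           => 80 );

    dbms_resource_manager.create_plan_directive(
        plan             => 'TWO_LEVEL_PLAN',
        group_or_subplan => 'OTHER_GROUPS',
        comment          => 'level 2 -- gets whatever level 1 leaves over',
        cpu_p2           => 100 );

    dbms_resource_manager.validate_pending_area;
    dbms_resource_manager.submit_pending_area;
end;
/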



o for the meaning of v$temp_extent_* views:

V$TEMP_EXTENT_MAP 
Information for all extents in all locally managed temporary tablespaces.  
 
V$TEMP_EXTENT_POOL 
For locally managed temporary tablespaces: the state of temporary space cached and used by each instance
 
One is a map of the extents; the other shows cache information.  The number of rows in these views will typically differ, as the cached information comes and goes.
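For example (just a sketch using the documented columns of these views), you could compare the two like this:

-- one row per extent in every locally managed temporary tablespace
select tablespace_name, count(*) extents, sum(blocks) blocks
  from v$temp_extent_map
 group by tablespace_name;

-- one row per temp file, showing the extents cached and used by this instance
select tablespace_name, file_id, extents_cached, extents_used
  from v$temp_extent_pool;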


o The MULTIPLIER option specifies the initial size of the Oracle8 i_file#_block# index relative to the Oracle7 i_file#_block# index.  For example, MULTIPLIER=30 triples the initial size when the index is created.  When no MULTIPLIER is specified, the default value of 15 is used,
creating an index for Oracle8 that is 1.5 times larger than the Oracle7 i_file#_block# index.


o QUERY_REWRITE_ENABLED is what tells us to "trust" other information.  We use it with materialized views, for example, to trust that if we rewrite the query to use a summary table -- the data in the summary table is OK and current -- we'll get the same answer.

With function-based indexes, the same thing is true.  You are explicitly telling us to "trust" that your function is deterministic, that it does the right thing.  It is a protection feature -- you must explicitly enable it; you must be sure you want to use it.

You could, for example, write a function "random" that returns a random number.  You could create an index on a table using this random() function.  You could then run a query:

select random from T where random = 5;

RANDOM
---------
10000


That happens because random is not deterministic; the answer is "apparently" wrong, but it is in fact consistent with the index: at the time the index entry was made for that row, random returned 5.  This time it returned 10000.
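Coming back to the setup itself, here is a sketch of what is needed in 8i before a function-based index can be created and used (the table T, column X and function name are made up for illustration; you also need the QUERY_REWRITE privilege):

create or replace function my_upper( p_str in varchar2 ) return varchar2
deterministic   -- your promise that the same input always produces the same output
as
begin
    return upper( p_str );
end;
/

alter session set query_rewrite_enabled   = true;
alter session set query_rewrite_integrity = trusted;

create index t_func_idx on t ( my_upper(x) );

select * from t where my_upper(x) = 'HELLO';   -- now eligible to use T_FUNC_IDX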

o See
http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76994/qr.htm#35006
for when primary key, foreign key, not null constraints are needed for query rewrite with MV's.

(quote)
....
To determine this, the optimizer may depend on some of the data relationships declared by the user via constraints and dimensions. Such data relationships include hierarchies, referential integrity, and uniqueness of key data, and so on.
....

o It is true you cannot by hand CREATE your own plan.  I do not myself view this as a deficiency, as many of the generated plans for non-trivial queries have dozens, if not hundreds, of pieces to them (e.g., it is really HARD to write the plan).

What you do with Query Plan Stability is stabilize plans to protect against different plans upon an upgrade, or when you just plain don't want them to change.  They are a mechanism to "freeze" the existing plan, not to tell the optimizer how to do the query.  That is what HINTS are for (and you can use hints to generate the plan you want and "freeze" that plan).
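As a sketch of that hint-then-freeze approach (the outline, category, table and hint are made up):

-- get the plan you want via hints, then freeze exactly that plan
create or replace outline my_outline
for category app_outlines
on
select /*+ index( t t_idx ) */ * from t where x = :bv;

-- have sessions pick up outlines from that category
alter session set use_stored_outlines = app_outlines;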


o the JAVAUSERPRIV and JAVASYSPRIV roles are a lot like RESOURCE and DBA -- very broad, very easy to use, but they convey SO much power.

The new privilege model follows that used by Java and is much more granular.  See
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:952229840241
for an example using it.
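For instance, a single Java permission can be granted directly with DBMS_JAVA; a sketch (the grantee, path and actions here are purely illustrative):

begin
    dbms_java.grant_permission(
        grantee           => 'SCOTT',
        permission_type   => 'SYS:java.io.FilePermission',
        permission_name   => '/tmp/*',
        permission_action => 'read,write' );
end;
/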

o I don't fully follow this question.  There is a server-side JDBC driver, the kprb driver.  You can read about it at:

http://docs.oracle.com/cd/A81042_01/DOC/java.816/a81354/advanc4.htm#1001042

It is an optimized JDBC driver that resides in the database, used by Java code running inside the database to interact with the SQL engine.





Comments

SQL*Loader, direct path, Locally Managed TS extent allocation

Gerrold Kuijpers, August 25, 2005 - 9:06 am UTC

Tom,

We are loading lots of files (49,000+) using SQL*Loader in parallel, direct path mode into an 8.1.7.4 database.
Our tablespaces are locally managed, with the extent size defined as 4MB.

The load files vary in size, but average around 380KB.

What we see happening is that Oracle allocates a new extent for each file loaded, instead of 'using up' available space in existing extents (number of extents per table = number of load files per table), and in doing so we blow our allocated TS size.

When we first concatenate the files into one single (big) file per table, the extent usage is far more efficient.
E.g. from not fitting into a 16GB tablespace, it now only used up 3.1GB.

As a nice side effect, the total load also took 50% less time by first concatenating the load files. (From > 4 hours to 1 hour concatenation + 30 minutes loading)

Can you explain how SQL*Loader allocates extents when used in parallel & direct path mode?

Thanks,
Gerrold



Tom Kyte
August 25, 2005 - 1:58 pm UTC

Quote from a forthcoming book (yeah, mine ;)  Bear in mind, this was tested on 9i and 10g only -- but the logic hasn't changed to my knowledge since 8i

<quote>
Extent Trimming and Locally-Managed Tablespaces

Enter locally-managed tablespaces. Here we have two types: UNIFORM SIZE, whereby every extent in the tablespace is always precisely the same size, and AUTOALLOCATE, whereby Oracle decides how big each extent should be using an internal algorithm. Both of these approaches solve nicely the problem of the 99MB free space, followed by 1MB used space, followed by 99MB free space, and so on, resulting in lots of free space that cannot be used. However, they each solve it very differently. The UNIFORM SIZE approach obviates extent trimming from consideration all together. When you use UNIFORM SIZEs, Oracle cannot perform extent trimming. All extents are of that single size—none can be smaller (or larger) than that single size. AUTOALLOCATE extents, on the other hand, do support extent trimming, but in an intelligent fashion. They use a few specific sizes of extents and have the ability to use space of different sizes—that is, the algorithm permits the use of all free space over time in the tablespace. Unlike the dictionary-managed tablespace, where if you request a 100MB extent, Oracle will fail the request if it can find only 99MB free extents (so close, yet so far), a locally-managed tablespace with AUTOALLOCATE extents can be more flexible. It may reduce the size of the request it was making in order to attempt to use all of the free space.

Let’s now look at the differences between the two locally-managed tablespace approaches. To do that, we need a real-life example to work with. We’ll set up an external table capable of being used in a parallel direct path load situation, which is something that we do frequently. Even if you are still using SQL*Loader to parallel direct path load data, this section applies entirely—you just have manual scripting to do to actually load the data. So, in order to investigate extent trimming, we need to set up our example load and then perform the loads under varying conditions and examine the results.

Setting Up

To get started, we need an external table. I’ve found time and time again that I have a legacy control file from SQL*Loader that I used to use to load data. One that looks like this for example:

LOAD DATA
INFILE '/tmp/big_table.dat'
INTO TABLE big_table
REPLACE
FIELDS TERMINATED BY '|'
(
id ,owner ,object_name ,subobject_name ,object_id
,data_object_id ,object_type ,created ,last_ddl_time
,timestamp ,status ,temporary ,generated ,secondary
)

We can convert this easily into an external table definition using SQL*Loader itself:

$ sqlldr big_table/big_table big_table.ctl external_table=generate_only
SQL*Loader: Release 10.1.0.3.0 - Production on Mon Jul 11 14:16:20 2005
Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Notice the parameter EXTERNAL_TABLE passed to SQL*Loader. It causes SQL*Loader in this case not to load data, but rather to generate a CREATE TABLE statement for us in the log file. This CREATE TABLE statement looked as follows (this is an abridged form; I’ve edited out repetitive elements to make the example smaller):

CREATE TABLE "SYS_SQLLDR_X_EXT_BIG_TABLE"
(
  "ID" NUMBER,
  ...
  "SECONDARY" VARCHAR2(1)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'big_table.bad'
    LOGFILE 'big_table.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "|" LDRTRIM
    REJECT ROWS WITH ALL NULL FIELDS
    (
      "ID" CHAR(255)
        TERMINATED BY "|",
        ....
      "SECONDARY" CHAR(255)
        TERMINATED BY "|"
    )
  )
  location
  (
    'big_table.dat'
  )
)REJECT LIMIT UNLIMITED

All we need to do is edit that a bit to name the external table the way we want; change the directories, perhaps; and so on:

ops$tkyte@ORA10GR1> create or replace directory my_dir as '/tmp/'
  2  /
Directory created.

And after that, all we need to do is actually create the table:

ops$tkyte@ORA10GR1> CREATE TABLE "BIG_TABLE_ET"
  2  (
  3    "ID" NUMBER,
…
 16    "SECONDARY" VARCHAR2(1)
 17  )
 18  ORGANIZATION external
 19  (
 20    TYPE oracle_loader
 21    DEFAULT DIRECTORY MY_DIR
 22    ACCESS PARAMETERS
 23    (
 24      RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
 25      READSIZE 1048576
 26      FIELDS TERMINATED BY "|" LDRTRIM
 27      REJECT ROWS WITH ALL NULL FIELDS
 28    )
 29    location
 30    (
 31      'big_table.dat'
 32    )
 33  )REJECT LIMIT UNLIMITED
 34  /
Table created.

Then we make this table parallel enabled. This is the magic step—this is what will facilitate an easy parallel direct path load:

ops$tkyte@ORA10GR1> alter table big_table_et PARALLEL;
Table altered.


Note    The PARALLEL clause may also be used on the CREATE TABLE statement itself. Right after the REJECT LIMIT UNLIMITED, the keyword PARALLEL could have been added. I used the ALTER statement just to draw attention to the fact that the external table is, in fact, parallel enabled.


Extent Trimming with UNIFORM vs. AUTOALLOCATE Locally-Managed Tablespaces

That’s all we need to do with regard to setting up the load component. Now, we would like to investigate how space is managed in a locally-managed tablespace (LMT) that uses UNIFORM extent sizes, compared to how space is managed in an LMT that AUTOALLOCATEs extents. In this case, we’ll use 100MB extents. First we create LMT_UNIFORM, which uses uniform extent sizes :

ops$tkyte@ORA10GR1> create tablespace lmt_uniform
  2  datafile '/u03/ora10gr1/lmt_uniform.dbf' size 1048640K reuse
  3  autoextend on next 100m
  4  extent management local
  5  uniform size 100m;
Tablespace created.

And next we create LMT_AUTO, which uses AUTOALLOCATE to determine extent sizes:

ops$tkyte@ORA10GR1> create tablespace lmt_auto
  2  datafile '/u03/ora10gr1/lmt_auto.dbf' size 1048640K reuse
  3  autoextend on next 100m
  4  extent management local
  5  autoallocate;
Tablespace created.

Each tablespace started with a 1GB data file (plus 64KB used by locally-managed tablespaces to manage the storage; it would be 128KB extra instead of 64KB if we were to use a 32KB blocksize). We permit these data files to autoextend 100MB at a time. We are going to load this file:

$ ls -lag big_table.dat
-rw-rw-r--    1 tkyte    1067107251 Jul 11 13:46 big_table.dat

which is a 10,000,000-record file. It was created using the big_table.sql script found in the “Setting Up” section at the beginning of this book and then unloaded using the flat.sql script available on 
http://asktom.oracle.com/~tkyte/flat/index.html
Next, we do a parallel direct path load of this file into each tablespace:

ops$tkyte@ORA10GR1> create table uniform_test
  2  parallel
  3  tablespace lmt_uniform
  4  as
  5  select * from big_table_et;
Table created.

ops$tkyte@ORA10GR1> create table autoallocate_test
  2  parallel
  3  tablespace lmt_auto
  4  as
  5  select * from big_table_et;
Table created.

On my system, which has four CPUs, these CREATE TABLE statements executed with eight parallel execution servers and one coordinator. I verified that was the case by querying one of the dynamic performance views related to parallel execution, V$PX_SESSION, while these statements were running:

sys@ORA10GR1> select sid, serial#, qcsid, qcserial#, degree
  2  from v$px_session;

       SID    SERIAL#      QCSID  QCSERIAL#     DEGREE
---------- ---------- ---------- ---------- ----------
       137         17        154        998          8
       139         13        154        998          8
       141         17        154        998          8
       150        945        154        998          8
       161        836        154        998          8
       138          8        154        998          8
       147         15        154        998          8
       143         41        154        998          8
       154        998        154

9 rows selected.

Note    In creating the UNIFORM_TEST and AUTOALLOCATE_TEST tables, we simply specified “parallel” on each table, with Oracle choosing the degree of parallelism. In this case, I was  the sole user of the machine (all resources available) and Oracle defaulted it to 8 based on the number of CPUs (four) and the PARALLEL_THREADS_PER_CPU parameter setting, which defaults to 2. 

The SID,SERIAL# are the identifiers of the parallel execution sessions, and the QCSID,QCSERIAL# is the identifier of the query coordinator of the parallel execution. So, with eight parallel execution sessions running, we would like to see how the space was used. A quick query against USER_SEGMENTS gives us a good idea:

ops$tkyte@ORA10GR1> select segment_name, blocks, extents
  2  from user_segments 
  3 where segment_name in ( 'UNIFORM_TEST', 'AUTOALLOCATE_TEST' );
 
SEGMENT_NAME        BLOCKS    EXTENTS
--------------- ---------- ----------
UNIFORM_TEST        204800         16
AUTOALLOCATE_TEST   145592        714

Since we were using an 8KB blocksize, that shows a difference of about 462MB, or looking at it from a ratio perspective, AUTOALLOCATE_TEST is about 70 percent the size of UNIFORM_TEST as far as allocated space goes. If we look at actual used space

ops$tkyte@ORA10GR1> exec show_space('UNIFORM_TEST' );
Free Blocks.............................          59,224
Total Blocks............................         204,800
Total Bytes.............................   1,677,721,600
Total MBytes............................           1,600
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               6
Last Used Ext BlockId...................               9
Last Used Block.........................          12,800
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR1> exec show_space('AUTOALLOCATE_TEST' );
Free Blocks.............................              16
Total Blocks............................         145,592
Total Bytes.............................   1,192,689,664
Total MBytes............................           1,137
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               8
Last Used Ext BlockId...................              41
Last Used Block.........................               8
PL/SQL procedure successfully completed.

 Note    The SHOW_SPACE procedure is described in the “Setting Up” section at the beginning of this book.

we can see that if we take away the blocks on the freelist for UNIFORM_TEST—59,224 of them—the tables consume about the same amount of actual space, but the amount of space needed by the UNIFORM tablespace is considerably more. This is all due to the extent trimming that did not take place. If we look at UNIFORM_TEST, we see this clearly:

ops$tkyte@ORA10GR1> select segment_name, extent_id, blocks
  2  from user_extents where segment_name = 'UNIFORM_TEST';
 
SEGMENT_NAME     EXTENT_ID     BLOCKS
--------------- ---------- ----------
UNIFORM_TEST             0      12800
UNIFORM_TEST             1      12800
UNIFORM_TEST             2      12800
UNIFORM_TEST             3      12800
UNIFORM_TEST             4      12800
UNIFORM_TEST             5      12800
UNIFORM_TEST             6      12800
UNIFORM_TEST             7      12800
UNIFORM_TEST             8      12800
UNIFORM_TEST             9      12800
UNIFORM_TEST            10      12800
UNIFORM_TEST            11      12800
UNIFORM_TEST            12      12800
UNIFORM_TEST            13      12800
UNIFORM_TEST            14      12800
UNIFORM_TEST            15      12800
 
16 rows selected.

Each extent is 100MB in size. Now, it would be a waste of paper to list all 714 extents, so let’s look at them in aggregate:

ops$tkyte@ORA10GR1> select segment_name, blocks, count(*)
  2  from user_extents
  3  where segment_name = 'AUTOALLOCATE_TEST'
  4  group by segment_name, blocks
  5  /
 
SEGMENT_NAME          BLOCKS   COUNT(*)
----------------- ---------- ----------
AUTOALLOCATE_TEST          8        128
AUTOALLOCATE_TEST        128        504
AUTOALLOCATE_TEST        240          1
AUTOALLOCATE_TEST        392          1
AUTOALLOCATE_TEST        512          1
AUTOALLOCATE_TEST        656          1
AUTOALLOCATE_TEST        752          5
AUTOALLOCATE_TEST        768          1
AUTOALLOCATE_TEST       1024         72
 
9 rows selected.

This generally fits in with how locally-managed tablespaces with AUTOALLOCATE are observed to allocate space. The 8, 128, and 1,024 block extents are “normal”; we will observe them all of the time with AUTOALLOCATE. The rest, however, are not “normal”; we do not usually observe them. They are due to the extent trimming that takes place. Some of the parallel execution servers finished their part of the load—they took their last 8MB (1,024 blocks) extent and trimmed it, resulting in a spare bit left over. One of the other parallel execution sessions, as it needed space, could use this spare bit. In turn, as these other parallel execution sessions finished processing their own loads, they would trim their last extent and leave spare bits of space. 
So, which approach should you use? If your goal is to direct path load in parallel as often as possible, I suggest AUTOALLOCATE as your extent management policy. Parallel direct path operations like this will not use space under the object's HWM—the space on the freelist. So, unless you also do some conventional path inserts into these tables, UNIFORM allocation will permanently have additional free space in it that it will never use. Unless you can size the extents for the UNIFORM locally-managed tablespace to be much smaller, you will see what I would term excessive wastage over time, and remember that this space is associated with the segment and will be included in a full scan of the table.
To demonstrate this, let’s do another parallel direct path load into these existing tables, using the same inputs:

ops$tkyte@ORA10GR1> alter session enable parallel dml;
Session altered.
 
ops$tkyte@ORA10GR1> insert /*+ append */ into UNIFORM_TEST 
  2 select * from big_table_et;
10000000 rows created.
 
ops$tkyte@ORA10GR1> insert /*+ append */ into AUTOALLOCATE_TEST 
  2 select * from big_table_et;
10000000 rows created.
 
ops$tkyte@ORA10GR1> commit;
Commit complete.

If we compare the space utilization of the two tables after that operation as follows:

ops$tkyte@ORA10GR1> exec show_space( 'UNIFORM_TEST' );
Free Blocks.............................         118,463
Total Blocks............................         409,600
Total Bytes.............................   3,355,443,200
Total MBytes............................           3,200
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               6
Last Used Ext BlockId...................         281,609
Last Used Block.........................          12,800

PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR1> exec show_space( 'AUTOALLOCATE_TEST' );
Free Blocks.............................              48
Total Blocks............................         291,184
Total Bytes.............................   2,385,379,328
Total MBytes............................           2,274
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               8
Last Used Ext BlockId...................         140,025
Last Used Block.........................               8
 
PL/SQL procedure successfully completed.

we can see that as we load more and more data into the table UNIFORM_TEST using parallel direct path operations, the space utilization gets worse over time. We would want to use a significantly smaller uniform extent size, or use AUTOALLOCATE. AUTOALLOCATE may well generate more extents over time, but the space utilization is superior due to the extent trimming that takes place.
</quote> 

Thank you!

Gerrold Kuijpers, August 26, 2005 - 6:27 am UTC

Thanks Tom. I had not expected such an extensive explanation. Given the constraints of the project (version 8 database, vendor-determined definitions and a few more) we'll stick with concatenating the files first, but it was very useful to have it explained.

What is the title of the new book, so I can pre-order it?

Tom Kyte
August 26, 2005 - 8:47 am UTC

I'm going to hit my head with a hammer

Serge Shmygelsky, March 31, 2006 - 9:21 am UTC

I've spent almost an entire day looking for the solution to this problem until I found this post. I should finish your book. I should have finished it the day I bought it!!!
Just in case: are there any drawbacks to using AUTOALLOCATE?

As I remember, when we're scanning a table, we cannot cross extent boundaries, e.g. even if we specify DB_FILE_MULTIBLOCK_READ_COUNT=16, we won't be able to read 16 blocks per I/O from an 8-block extent. So obviously we'll need more I/Os. I'm not sure it is a big deal, but maybe there are other 'gotchas'?

Tom Kyte
March 31, 2006 - 12:26 pm UTC

only the first few extents are 8 blocks - I would not be overly concerned about that.

the drawback to autoallocate is lack of control - if you cannot live with the fact that you cannot dictate the extent size, you will not like it.

Need to load multiple files into one table.

Russell Flower, November 15, 2006 - 10:27 pm UTC

Tom,

We are using Oracle 9i on Sun Solaris.

We have the requirement to load approx 7000 files per day into the database (we currently have approx 53K files of back data to load). Each file contains approx 288 records.

The files all contain data like this:
1163509500,79314313.7066667
1163509800,79314336.2933333
1163510100,79314214.3466667
1163510400,79314326.8

Where column 1 is the unix epoch of the measurement and column 2 is the actual measurement (may be NULL, 0 or a number).

There is a requirement to show summaries by network component (each of the 7000 files is a component). Therefore I need to load the component name as part of the database table. Looking around, I found a thread of yours that mentioned scripting a file and feeding the place as a CONSTANT to SQL Loader. I also convert the epoch to a date/time as part of the dataload process.

I have come up with the following unix shell script:

for FILENAME in ${HOME}/rsf/BPlive/*.csv
do
FILERUN=`echo ${FILENAME} | sed 's/^.*\///'`

echo load data > test_langy.ctl
echo into table TEMP_TEST_LANGY append >> test_langy.ctl
echo FIELDS TERMINATED BY \',\' >> test_langy.ctl
echo TRAILING NULLCOLS >> test_langy.ctl
echo "(" >> test_langy.ctl
echo PLACE CONSTANT '"'${FILERUN}'"' , >> test_langy.ctl
echo DATADATA CHAR '"'TO_DATE'('\'01-JAN-1970\',\'DD-MON-YYYY\'')'+:datadata/86400'"', >> test_langy.ctl
echo MEASUREMENT CHAR '"'TO_NUMBER'('RTRIM'(':Measurement')'')''"' >> test_langy.ctl
echo ")" >> test_langy.ctl

sqlldr UID/PW ${HOME}/rsf/test_langy.ctl data=${FILENAME} >> test_langy.out

done


which results in a CTL file created like:

load data
into table TEMP_TEST_LANGY append
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
PLACE CONSTANT "woo-core6506-2_mem5minUsed_processor.csv" ,
DATADATA CHAR "TO_DATE('01-JAN-1970','DD-MON-YYYY')+:datadata/86400",
MEASUREMENT CHAR "TO_NUMBER(RTRIM(:Measurement))"
)

As an example, the filenames look like:
20061116_woo-core6506-2_ifOutPkts_gigabitethernet3_17.csv
20061116_woo-core6506-2_ifOutPkts_gigabitethernet3_18.csv
20061116_woo-core6506-2_ifOutPkts_gigabitethernet3_19.csv
20061116_woo-core6506-2_ifOutPkts_gigabitethernet3_20.csv
20061116_woo-core6506-2_ifOutPkts_gigabitethernet3_8.csv
20061116_woo-core6506-2_ifOutPkts_tengigabitethernet1_1.csv

the table structure is as follows:
CREATE TABLE TEMP_TEST_LANGY
(
PLACE VARCHAR2(100 BYTE),
DATADATA DATE,
MEASUREMENT NUMBER
)


The process took approx 40 minutes to load the 7000 files. I know this is OK, but I am thinking there could be a better / more efficient way to do this.

I have read a bit about direct path / parallel / and external tables, but not sure how I could implement any of these concepts.

Do you have any ideas or tips that may help here?

Is it possible to create an external table that looks at all CSV files in a directory?

Can the filenames be picked up somehow as part of that external table?

Thanks again.

Russell

Tom Kyte
November 16, 2006 - 3:08 pm UTC

288 records - no direct path for you, too small unless you concatenate lots of them together.

so, why not do a bigger file

or run many in parallel -- run them 10 or so at a time, wait, and then do 10 more.
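A sketch of the "bigger file" idea, building on your own script (your paths and table; the exact commands are illustrative and untested): tag each record with its file name while concatenating, then run a single load, reading PLACE as an ordinary first field instead of generating a CONSTANT per file:

for FILENAME in ${HOME}/rsf/BPlive/*.csv
do
  FILERUN=`echo ${FILENAME} | sed 's/^.*\///'`
  # prefix every record with its file name: "woo-core6506-2_....csv,1163509500,79314313.70"
  sed "s/^/${FILERUN},/" ${FILENAME} >> ${HOME}/rsf/all_bplive.dat
done

sqlldr UID/PW control=test_langy.ctl data=${HOME}/rsf/all_bplive.dat

with the control file changed so PLACE is simply the first delimited field:

load data
into table TEMP_TEST_LANGY append
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
PLACE,
DATADATA CHAR "TO_DATE('01-JAN-1970','DD-MON-YYYY')+:datadata/86400",
MEASUREMENT CHAR "TO_NUMBER(RTRIM(:Measurement))"
)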

Proving value of AUTOALLOCATE

Stewart W. Bryson, July 19, 2007 - 4:22 pm UTC

Tom:

I'm architecting a data warehouse for a client of mine, and this environment is very OLTP-centric, and the DBA's want to manage the data warehouse as if it were an OLTP database. The ETL is doing parallelized, direct-path inserts into tables, and I have argued and argued for AUTOALLOCATE tablespaces, but to no avail. I've brought up the value of extent trimming, to which I hear the response "space is cheap, don't worry." The loads are UPDATE and DELETE free... just pure inserts. The dimension tables are "updated" every load with "INSERT /*+ APPEND */... SELECT..." statements into staging tables which are then partition exchanged back in. The fact tables are loaded with direct-path parallelized inserts into the current partition.

The DBA is concerned about fragmentation... he fails to see that there would never be any with AUTOALLOCATE with our loading paradigm, since all the data would be packed in incredibly tight. As a matter of fact, UNIFORM extent sizing will be the reason our data is not packed in tight.

Since the concern with wasted space is falling on deaf ears, I wanted to take the performance route. Is there any way to demonstrate how many half-empty blocks are being returned in a full table scan, for instance?

Thanks very much.
Tom Kyte
July 19, 2007 - 7:31 pm UTC

what is their concern with "fragmentation"? What are they worried about? (remember - disk is cheap, they said so, so it cannot be wasted space, of which they by definition HAVE A TON - so it must be for some other reason - what is it?)

it won't be half empty blocks, we'll stop scanning when we hit the high water mark on the extents....

if you have access to my book Expert Oracle Database Architecture - in the parallel chapter, I go into this in detail - we reprinted it recently in the magazine too:

https://www.oracle.com/technetwork/issue-archive/2007/07-may/o37asktom-101781.html

that might help

AUTOALLOCATE continued

Stewart W. Bryson, July 20, 2007 - 10:31 am UTC

I've read the article and the book... I went looking for the question here so I could follow-up. :)

One thing you mentioned in the article compared with one of your comments here confuses me a little.

Listed here:
>> it won't be half empty blocks, we'll stop scanning when we hit the high water mark on the extents....
From the article:
>> you will see what I would term excessive wasted space over time -- and remember that this space is associated with the segment and will be included in a full scan of the table.


As to the why on "fragmentation"... it's the "M" word you're so familiar with: "MYTH". You know the one... it goes like this:

"Fragmentation is bad"! End of story.

For one thing... they don't believe me when I tell them the loads are UPDATE and DELETE free. I guess it's a foreign concept. They asked the question: "What if there's a mistake and you have to correct it?" My response: "Then we'll CTAS and replace it." They don't get it.

Project is on a tight deadline, so I have to be careful with my time. Do you think that I will see actual performance gains in addition to the wasted space if I take your example and run a full gamut of performance tests doing FTS's against the segments in the different tablespaces?

Thanks much for your help.


Tom Kyte
July 20, 2007 - 5:00 pm UTC

sorry, the "half full blocks" bit tripped us up - they won't be half full - but anything below the high water mark (which these will be after a parallel direct in the uniform) will be scanned - negatively impacting the full scan - yes.


AUTOALLOCATE

Stewart W. Bryson, July 21, 2007 - 5:58 pm UTC

I meant to say "half full extents"... completely changes the meaning. And really, I shouldn't say "half-full"... I should say "less than full extents".

Any way to see the "less than full extents"... those that would have been trimmed had these inserts been done in an AUTOALLOCATE tablespace, as opposed to a UNIFORM tablespace?

Thanks very much.
Tom Kyte
July 22, 2007 - 11:08 am UTC

In the referenced article - I use my script "show space" to show free space, eg:

Code Listing 5: Show space used

SQL> exec show_space('UNIFORM_TEST' );
Free Blocks........................................................59,224
Total Blocks......................................................204,800
Total Bytes.................................................1,677,721,600
Total MBytes........................................................1,600
Unused Blocks...........................................................0
Unused Bytes............................................................0
Last Used Ext FileId....................................................6
Last Used Ext BlockId...................................................9
Last Used Block....................................................12,800
PL/SQL procedure successfully completed.
 
SQL> exec show_space('AUTOALLOCATE_TEST' );
Free Blocks............................................................16
Total Blocks......................................................145,592
Total Bytes.................................................1,192,689,664
Total MBytes........................................................1,137
Unused Blocks...........................................................0
Unused Bytes............................................................0
Last Used Ext FileId....................................................8
Last Used Ext BlockId..................................................41
Last Used Block.........................................................8
PL/SQL procedure successfully completed.



every block that is on the freelist here represents an allocated block that will be scanned but has NO DATA ON IT (because we only load - the load fills a block and goes on - all blocks on the freelist are actually empty!)

Now, I did the test again using ASSM (automatic segment space management) and here it is even more clear:

ops$tkyte%ORA10GR2> exec show_space( 'UNIFORM_TEST' );
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................          43,024
Full Blocks        .....................         364,952
Total Blocks............................         409,600
Total Bytes.............................   3,355,443,200
Total MBytes............................           3,200
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              24
Last Used Ext BlockId...................         345,609
Last Used Block.........................          12,800

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec show_space( AUTOALLOCATE_TEST' );
ERROR:
ORA-01756: quoted string not properly terminated


ops$tkyte%ORA10GR2> exec show_space( 'AUTOALLOCATE_TEST' );
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................              23
Full Blocks        .....................         364,953
Total Blocks............................         367,256
Total Bytes.............................   3,008,561,152
Total MBytes............................           2,869
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              25
Last Used Ext BlockId...................         368,777
Last Used Block.........................              64

PL/SQL procedure successfully completed.



the FS4 (75% to 100% EMPTY) blocks are under the high water mark and will be scanned - and the uniform one in general will have lots more.

Excellent Tom... thanks

Stewart W. Bryson, July 24, 2007 - 10:55 am UTC

That's what I was looking for.

As always, thanks.

Data segment compression

Stewart W. Bryson, July 24, 2007 - 1:07 pm UTC

One more follow-up... does data segment compression have any effect on this? I'm seeing some partitions that I believe should have less than full blocks, but there aren't any. The only difference is that these are in a fact table and I'm compressing the fact table partitions.

Thanks.
Tom Kyte
July 26, 2007 - 8:39 am UTC

Interesting, it does seem to have an effect, the blocks remain unformatted.

ops$tkyte%ORA10GR2> exec show_space('UNIFORM_TEST' );
Unformatted Blocks .....................          81,425
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................         122,551
Total Blocks............................         204,800
Total Bytes.............................   1,677,721,600
Total MBytes............................           1,600
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              24
Last Used Ext BlockId...................         128,009
Last Used Block.........................          12,800

PL/SQL procedure successfully completed.

An example

Stewart W. Bryson, July 24, 2007 - 1:11 pm UTC

Here's an example of the point above:

SQL> exec show_space('AR_TRANSACTION_FACT','WHDATA','TABLE PARTITION','AR_TRXN_2007P5');
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................           8,924
Total Blocks............................           9,600
Total Bytes.............................     314,572,800
Total MBytes............................             300
Unused Blocks...........................             608
Unused Bytes............................      19,922,944
Last Used Ext FileId....................              46
Last Used Ext BlockId...................         325,125
Last Used Block.........................              32

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.31
SQL> 


There's a difference between full blocks and total blocks, but that difference isn't showing up in the FSN buckets.

segment-switch loading

Stewart W. Bryson, August 22, 2007 - 2:57 pm UTC

Tom:

I have a question regarding scenarios where "segment-switching" is used to load large amounts of data into a table, either through partition exchanging, table renaming, etc.

Which of the following methods is better from a space utilization perspective? For instance, assuming both segments are in the same tablespace and I am performing a partition exchange, which would be superior:
1) Create a new segment, insert the required records into that segment, exchange in the new segment, and drop the old exchanged-out segment.
2) Insert the required records into an already existent and truncated staging segment, exchange in that new segment, and then truncate the old, exchanged-out segment for reuse the next time this process runs.

Number 1) seems better from a space utilization perspective. It seems that keeping the extents of each segment together will lead to larger blocks of contiguous space when the staging segment is dropped.

Since I am using locally managed tablespaces and direct-path inserts, it might be that neither is better than the other.

Thanks as always.

Tom Kyte
August 23, 2007 - 10:42 am UTC

contiguous - smiguous

they are not going to be physically contiguous - file systems do not store things that way and with RAID and other things - the bits and bytes are all over the place anyway.

even if they were contiguous, that disk is servicing everyone - the heads, they are a-moving (the old "do it contiguous because the heads will be at the right place" - that is not happening, ever)

and if you use locally managed tablespaces (with system allocated extent sizes so we can do extent trimming if space utilization is your key concern) - space will be used as efficiently as you'll get.

contiguous extents - not something to really be shooting for, not relevant.

Thanks

Stewart W. Bryson, August 23, 2007 - 12:22 pm UTC

I'd love to be a fly on the wall watching you answer these questions. Are you laughing? Are you crying? Both at the same time?

By the way... is the blog dead?

Tom Kyte
August 24, 2007 - 1:50 pm UTC

the blog will be resurrected in September...

Switching jobs here at Oracle - taking most of August "off" (not entirely off, but mostly off - spending time with the kids :) )
