Database, SQL and PL/SQL

On Partitions, Lookups, and Integrity

 

Our expert partitions, looks up with a PL/SQL table, and does everything with referential integrity.

By Tom Kyte

March/April 2003

 

We would like to convert all our regular tables into partitioned tables. These regular tables might have millions of rows. The following information shows what we have now:

TableName   Structure
---------   ----------------
TABLE_1
TABLE_2     Same as TABLE_1
TABLE_3     Same as TABLE_1
TABLE_4
TABLE_5     Same as TABLE_4
TABLE_6     Same as TABLE_4
TABLE_7     Same as TABLE_4
TABLE_8     Same as TABLE_4
TABLE_9
TABLE_10    Same as TABLE_1
TABLE_11    Same as TABLE_1
TABLE_12
...         ...

As you can see, several tables have the same structure as other tables. Now, we want partitioned tables as follows:

TableName        Comments
---------        ------------------------
TABLE_1          will also have TABLE_2, TABLE_3, TABLE_10 & TABLE_11 data
TABLE_4          will also have TABLE_5, TABLE_6, TABLE_7 & TABLE_8 data
TABLE_12         only its data

How do we convert all regular tables into partitioned tables? Note that we are using one of the date columns as a partition key. What is the fastest and safest way to migrate?

This is pretty easy to do. In the following, pretend that t1, t2, and t3 are your existing tables. Table t is the table you want (you can drop t1 and rename t to t1 later if needed). I'll use the following approach:

SQL> create table t1
as select sysdate dt,
all_objects.* from all_objects;
Table created.
SQL> create table t2
as select add_months(sysdate,-12) dt,
all_objects.* from all_objects;
Table created.
SQL> create table t3
as select add_months(sysdate,-24) dt,
all_objects.* from all_objects;
Table created.
SQL> create table t (dt, OWNER,
OBJECT_NAME, SUBOBJECT_NAME,OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED,
LAST_DDL_TIME, TIMESTAMP, STATUS,
TEMPORARY, GENERATED, SECONDARY)
partition by range(dt) (
partition part2000 values less than
(to_date('01-jan-2001','dd-mon-yyyy')),
partition part2001 values less than
(to_date('01-jan-2002','dd-mon-yyyy')),
partition part2002 values less than
(to_date('01-jan-2003','dd-mon-yyyy'))
) as
select sysdate dt, all_objects.*
from all_objects where 1=0;
Table created.

I just created an empty partitioned table, partitioned by date, that is structured exactly like your existing table. The ranges are such that t1 fits in one partition, t2 in another, and so on. Now, what you want to do is swap that empty partition part2000 with your full table t3 (the one holding the oldest data), as follows:

SQL> alter table t
  2  exchange partition part2000
  3  with table t3
  4  without validation
  5  /
Table altered.

Do the same for the remaining table/partition pairs: swap t2 with part2001 and t1 with part2002. Now, t is a table with partitions formerly known as tables t1, t2, and t3.
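Spelled out, those remaining steps might look like the following. This is only a sketch following the same pattern as above; the drop and rename at the end are the optional cleanup mentioned at the start, and you could drop the now-empty t2 and t3 the same way:

alter table t
exchange partition part2001
with table t2
without validation;

alter table t
exchange partition part2002
with table t1
without validation;

drop table t1;
rename t to t1;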


All About Lookups

I was told that instead of repeatedly querying a lookup table for lookup values (like querying a description table for a description using description_id), it is much more efficient to load a PL/SQL table and do a lookup on it. Can you show me how to load a PL/SQL table? How do you perform a lookup using a PL/SQL table? If I want to load a PL/SQL table with values from more than one column, can I do it? Aren't PL/SQL tables single-dimensional?

The answer to this is "it all depends." Let's say you have the following code:

For x in ( select * from t )
Loop
   select description
     into l_description
     from lookup
    where id = x.description_id;
   insert into another_table ...

Given this code, using a PL/SQL lookup table might be faster; however, it will not be as fast as doing it the right way. Following my mantra of "if you can do it in a single SQL statement, do it," you should use a join. The following example compares an insert that does the lookups procedurally, with a PL/SQL table, against one that does them in a single SQL statement. First, the lookup table:

SQL> create table lookup
  2  (id primary key,
  3   description )
  4  organization index
  5  as
  6  select object_id, object_name
  7    from all_objects;
Table created.

That will be the lookup table: a table of codes (id) and their descriptions. In a moment, you'll see how to load that into a PL/SQL index-by table with which to perform lookups. But first, I'll generate another table of test data—the input data on which I want to do lookups:

SQL> create table t
  2  as
  3  select object_id
  4    from all_objects;
Table created.

And then I'll create a table to "load into." I'll be simulating a typical data load, which includes reading some data, doing lookups, and loading into another table:

SQL> create table target
  2  (id number, descript varchar2(30));
Table created.

I'll start with the PL/SQL lookup table approach. First, I have to load up the PL/SQL table type; that is accomplished in lines 8 through 12. Then, I iterate over the data to be processed, do the lookup, and insert it into the target table:

SQL> declare
  2    type lookupTableType is table of
  3     lookup.description%type index by
  4     binary_integer;
  5
  6     l_lookup lookupTableType;
  7  begin
  8     for x in (select * from lookup)
  9     loop
 10        l_lookup(x.id) :=
 11           x.description;
 12     end loop;
 13
 14     for x in (select * from t)
 15     loop
 16        begin
 17           insert into target
 18           values
 19           ( x.object_id,
 20             l_lookup(x.object_id) );
 21        exception
 22           when no_data_found then
 23              insert into target
 24              values
 25              ( x.object_id, null );
 26        end;
 27     end loop;
 28  end;
 29  /
PL/SQL procedure successfully completed.
Elapsed: 00:00:13.24

As you can see, it took a little over 13 seconds to do the lookup. Can we do better? Definitely:

SQL> begin
  2     insert into target
  3     select t.object_id, l.description
  4       from t, lookup l
  5      where t.object_id = l.id;
  6  end;
  7  /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.43

This single insert does the same exact thing in a mere fraction of the time. Almost any time you can remove procedural code and do the same thing in a single SQL statement, you'll be better off. There's less code to type and maintain, and it's generally much faster, to boot.
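As for the last part of the question (holding values from more than one column per entry): a PL/SQL table is indexed by a single key, but each element can be a record, so one entry can carry several fields. Here is a minimal sketch; the record type, its loaded_on field, and the names used are purely illustrative:

declare
    type lookup_rec is record
       ( description  lookup.description%type,
         loaded_on    date );
    type lookupTableType is table of lookup_rec
         index by binary_integer;
    l_lookup  lookupTableType;
    l_rec     lookup_rec;
begin
    for x in ( select id, description from lookup )
    loop
        -- build the record, then store it under the id key
        l_rec.description := x.description;
        l_rec.loaded_on   := sysdate;
        l_lookup(x.id)    := l_rec;
    end loop;
end;
/

The same advice applies, though: if the whole job can be expressed in a single SQL statement, the join is still the better choice.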


Is Referential Integrity Slower?

We have a table, city , consisting of different cities where our clients' offices are located. The city table consists of metadata: column values that are not likely to change at all. We have some employee allowances calculated based on the city where the employee works.

We have a Visual Basic form for inserting employee-related details. We have a foreign key on the city column in the employee table, and the parent key is the city table.

One of our consultants recommended that we discard checking the validity of the city entered and maintain all city validation checking through front-end coding. The reasons cited were that referential integrity checking at the back end is too time-consuming and slows down the data-entry processing jobs.

Initially, we would have a lot of data entry, and the number of cities in the city table is 25. I wasn't convinced by the consultant's recommendations. Is the argument valid?

You are wise; the consultant is wrong.

Doing this checking on the client will not be faster. The client has to make a round trip to the database to do the check anyway, unless the lookup table is cached on the client (that is, unless the entire table is first dragged back over the network to the client). And if the client does cache this information, it will succeed only in logically corrupting the data at some point, when you eventually do update the city table, unless you first ask all your users to exit the program! I suppose that is what the consultant is thinking here: "cache it" and "cache is faster."

You must ask yourself: Is this the only application that will access this data forever? (This is a rhetorical question; history proves that the answer is no). So, if you hide all the rules of how the data relates in the application, what then? In a year, you have to dig it all out again and reimplement it over and over.

Did the consultant benchmark this to see the impact on performance? Let's do a quick check. I'll use the following simple schema:

drop table cities;
create table cities as
select username city from all_users
where rownum <= 50;
alter table cities add constraint cities_pk primary key(city);
create table t1 ( x char(80), city varchar2(30)
references cities );
create table t2 ( x char(80), city varchar2(30) );

Now for the benchmark. I'll insert heavily into t1 with declarative referential integrity enabled and into t2 without it. To do that, I'll use the following code:

alter session set sql_trace=true;
declare
    type array is table of varchar2(30)
           index by binary_integer;
    l_data array;
begin
    select * BULK COLLECT into l_data
      from cities;
    for i in 1 .. 1000
    loop
        for j in 1 .. l_data.count
        loop
            insert into t1
            values ('x', l_data(j) );
        end loop;
    end loop;
end;
/

Copy this PL/SQL block and change t1 to t2 for the second test. Now, using the TKPROF tool to format the SQL_TRACE output, you discover the following (some information edited out for space):

INSERT into t1
values
 ('x', :b1 )
call    count   cpu     query   current
-----   -----   ----    -----   -------
Parse   1       0.00    2       0
Execute 37000   9.25    479     78969
Fetch   0       0.00    0       0
-----   -----   ----    -----   -------
total   37001   9.26    481     78969
INSERT into t2
values
 ('x', :b1 )
call    count   cpu     query   current
----    -----   ----    -----   -------
Parse   1       0.00    0       0
Execute 37000   7.87    484     41900
Fetch   0       0.00    0       0
----    -----   ----    -----   -------
total   37001   7.88    484     41900
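For reference, the reports above come from formatting the raw trace file with the TKPROF utility at the operating system prompt. A minimal sketch, assuming the trace file (whose real name will differ) sits in the directory named by the USER_DUMP_DEST parameter and that ri_report.txt is just the output file name you choose:

$ tkprof ora_12345.trc ri_report.txt sys=no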

So, with referential integrity we get 0.00025 CPU seconds (9.25/37,000) per single row insert and 0.000212703 CPU seconds (7.87/37,000) per single row insert without it. Will your end users notice? Will you notice? The vast majority of the end user wait experience will be from the VB program painting screens. Do you think this consultant's lookup routine will execute in .000037297 seconds? If not, you are better off in the database.

Not only that, but suppose you decide to run an ad hoc query on this application data later. The tools will have the referential integrity to guide them. When you go to extend this application later, no new application can destroy the integrity, which it could otherwise do easily. And when you change the list of cities, you won't have to first shut down all clients (to clear their caches) and then restart them. You'll be able to sleep at night knowing your database is protected.

The benefits you get—data integrity, ease of maintenance, and self-documenting code, to name a few—far outweigh any perceived time issues here.


Queries for DDL Actions

I need to know how to identify the data definition language (DDL) actions performed in the last 24 hours. If a developer overwrites some function, is there any way to get the old code from SYSTEM tables? Also, how can I identify the dependent objects and get the list of other objects dependent on the object in all schemas?

You want to enable auditing. It won't help you for the last 24 hours, but tomorrow you'll be set to go. Once someone overwrites the code, the old code is gone, wiped out, and just not there anymore. Short of your restoring the database and doing a point-in-time recovery to before the time someone dropped the procedure, that code is gone.
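A minimal sketch of what enabling that audit might look like, assuming the AUDIT_TRAIL initialization parameter has been set (to DB, for example) and that you query from an account with access to DBA_AUDIT_TRAIL; the one-day window below is only an example:

SQL> audit procedure;

SQL> select username, obj_name,
            action_name, timestamp
       from dba_audit_trail
      where timestamp > sysdate - 1
      order by timestamp;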

You could use a DDL trigger (before CREATE) to capture the code and do the audit as well. This would work in so many cases that I'll expand on the idea and implement it here. I'll start with a table to hold the old code (and multiple versions of it) and record who overwrote it and when:

SQL> create table old_code
  2  as
  3  select user username, 0 version,
            sysdate date_changed,
            user_source.*
  4    from user_source
  5   where 1=0
  6  /
Table created.
SQL> create sequence version_seq;
Sequence created.

Now I can use this event trigger before CREATE on a schema to save any "old code" before it is overwritten:

SQL> create or replace
         trigger create_trigger
  2  before create on schema
  3  declare
  4    l_date  date := sysdate;
  5    l_ver   number;
  6  begin
  7    if (ora_dict_obj_type in
              ( 'PACKAGE',
                'PACKAGE BODY',
                'PROCEDURE',
                'FUNCTION' ) )
  8    then
  9       select version_seq.nextval
            into l_ver
            from dual;
 10
 11       insert into old_code
 12       select user, l_ver,
               l_date, user_source.*
 13         from user_source
 14        where name = ora_dict_obj_name
 15         and type = ora_dict_obj_type;
 16    end if;
 17  end;
 18  /
Trigger created.

To show this trigger in action, the following example walks through a case where I replace a function in the database:

SQL> create or replace
     function f return number
  2  as
  3  begin
  4          return 0;
  5  end;
  6  /
Function created.
SQL> create or replace
     function f return date
  2  as
  3  begin
  4          return sysdate;
  5  end;
  6  /
Function created.
SQL> select line, text from
     old_code order by line;
LINE    TEXT
----  -------------------------
   1    function f return number
   2    as
   3    begin
   4    return 0;
   5    end;

The original code is safe and secure. I kept track of who overwrote it and when (so I can yell at that person if need be), and I have the source in the event that I need to restore it.
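For example, to see who overwrote a given unit and when, one row per saved version (F here is the function from the example above):

SQL> select distinct username, date_changed, version
       from old_code
      where name = 'F'
        and type = 'FUNCTION'
      order by version;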

As for getting the dependencies, the USER_DEPENDENCIES, ALL_DEPENDENCIES, and DBA_DEPENDENCIES data dictionary views contain all that for you.
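For example, the following sketch lists everything that depends on the function F; add an OWNER predicate, or switch to DBA_DEPENDENCIES, to search across all schemas:

SQL> select owner, name, type
       from all_dependencies
      where referenced_name = 'F'
        and referenced_type = 'FUNCTION';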


Getting the Record Count of an OS File

We load between 4 and 15 flat files from two different legacy systems into our data warehouse nightly. As part of our regular daily audit of the data loads, we need to verify the record count of the files (sent via FTP). All files are fixed-length. Record lengths vary from 100 characters to more than 3,500 characters. All files are in the US7ASCII character set. The number of records for the "narrower" files ranges from less than 50 to 250,000; the number of records for the "wider" files ranges from 10,000 to 4 million. Overall file sizes range from several kilobytes to over 100MB. Executing "wc -l" on our largest files typically runs in about a minute. Getting the actual OS flat-file record count is the stumper. I've seen the examples for getting a directory listing into a database table via Java and examples of executing OS commands from PL/SQL via Java as well, but I'm not a Java programmer, and I need to combine these two example cases so that I can get a record count of all files in a given directory and have the results stored in a database table. Any suggestions?

Since they are fixed-length files, all you need to do is get the file size and then divide by the record size. That is easy. Using a BFILE, you can query a file's size, and you should already know the logical record length (LRECL) for the files so you can divide by it:

SQL> create or replace directory
  2  MY_FILES as '/tmp/';
Directory created.
SQL> create or replace
  2  function number_of_records
              ( p_filename in varchar2,
                p_lrecl in number )
  3  return number
  4  as
  5   l_bfile    bfile;
  6   l_recs     number;
  7  begin
  8   l_bfile :=
        bfilename('MY_FILES',
                  p_filename);
  9   dbms_lob.fileopen( l_bfile );
 10   l_recs :=
         dbms_lob.getlength(l_bfile) /
           p_lrecl;
 11   dbms_lob.fileclose( l_bfile );
 12   return l_recs;
 13  end;
 14  /
Function created.
SQL> !wc -l /tmp/foo.dat
     18 /tmp/foo.dat
SQL> exec dbms_output.put_line
    (number_of_records( 'foo.dat', 20 ));
18
PL/SQL procedure successfully completed.

If for whatever reason you do not know the LRECL, you could go a step further:

SQL> create or replace
  2  function number_of_records
              ( p_filename in varchar2 )
  3  return number
  4  as
  5   l_bfile    bfile;
  6   l_lrecl    number;
  7   l_recs     number;
  8  begin
  9   l_bfile :=
       bfilename( 'MY_FILES',
                  p_filename );
 10   dbms_lob.fileopen( l_bfile );
 11   l_lrecl :=
         dbms_lob.instr
         ( l_bfile,
           utl_raw.cast_to_raw(chr(10)));
 12   l_recs :=
           dbms_lob.getlength( l_bfile )/
                   l_lrecl;
 13   dbms_lob.fileclose( l_bfile );
 14   return l_recs;
 15  end;
 16  /
Function created.
SQL> exec dbms_output.put_line
         (number_of_records('foo.dat'));
18
PL/SQL procedure successfully completed.

This function looks for the first newline character (chr(10)) in the file and assumes its position is the LRECL for the file (that is, the record length including the line terminator).
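To fold this into the nightly audit and have the results stored in a database table, a small PL/SQL block can call the function for each expected file and record the counts. A minimal sketch, assuming a results table named file_counts and that the list of file names is known to the load job (bar.dat is just a placeholder here):

SQL> create table file_counts
     ( filename  varchar2(255),
       counted   date,
       recs      number );

SQL> begin
       insert into file_counts values
         ( 'foo.dat', sysdate,
           number_of_records('foo.dat') );
       insert into file_counts values
         ( 'bar.dat', sysdate,
           number_of_records('bar.dat') );
       commit;
     end;
     /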


Next Steps

ASK Tom
 asktom.oracle.com
Oracle Vice President Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

READ
More Tom

 asktom.oracle.com

Oracle documentation
/documentation

 
