

Question and Answer

Connor McDonald

Thanks for the question, Munzer.

Asked: December 08, 2001 - 10:04 pm UTC

Last updated: March 22, 2018 - 5:25 pm UTC

Version: 8i


You Asked


1. I am designing a database for order entry. I have a table PO (po_no, po_date, po_status) and a child table (po_no, stock_no, stock_desc, qty, unit). Because a user may not find a stock_no in the reference table (of all stock numbers), he can enter '999999' for any stock number and then enter his own description for it.

Since I had to have a primary key, I added an "item_no" for each item on the PO so I can have (po_no, item_no, stock_no) as the primary key. Is this a good design, or is there a better way?

The problem with this is that the user has to manually enter the item number on the form, since I cannot have a sequence that increments per PO -- meaning if there are 10 items on the PO it should go from 1 to 10 and then reset to start back at 1 for the next PO.

Would it be better just to have a (one-column) sequence number as a unique identifier in the child table?

2. I am letting the user enter the info using an HTML form created by a PL/SQL procedure. Is there a way, when the user selects the stock number from a drop-down list, to have the description pop up automatically in the description field? Or do you have to submit the form with all data entered, find the description for that stock number, and return all the values again from another procedure? I do not think that is a good way of doing it.

and Tom said...

1) since this item_no is meaningless, only present to ensure uniqueness, why does it have to go from 1..10? Why isn't it good enough just to be UNIQUE???

Use a sequence, it matters not if it goes from 1..10. You can always get 1..10 when you query if you want:

select child_table.*, row_number() over ( order by item_no )
from child_table
where po_no = :bind_variable;
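A sketch of that approach, with a hypothetical child table and sequence: the stored item_no values only need to be unique, and row_number() renumbers them 1..N at query time.

```sql
-- hypothetical names; the sequence guarantees uniqueness only, not 1..N
create sequence po_item_seq;

create table po_items
( po_no    number       not null,
  item_no  number       not null,  -- populated from po_item_seq.nextval
  stock_no varchar2(10),
  qty      number,
  constraint po_items_pk primary key (po_no, item_no)
);

-- present the items as 1..N per PO, whatever the stored values are
select p.*,
       row_number() over (order by item_no) as line_no
  from po_items p
 where po_no = :bind_variable;
```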

2) see





Database Design for Web Site

A reader, June 09, 2004 - 4:15 pm UTC

Hi Tom,

We have a web site, parts of which (applications) need a database. Some applications' data is static and some is not, with transactions every minute! So in general we have two different approaches for tuning and backup/recovery policy: one part needs a standby database with hot backups every hour, and the other a cold backup once a week. They even have different audiences: kids and teenagers mostly log on to the static part, and adults log on to the News part, which is dynamic.

I was thinking of having one database as the development database, but two different databases on the production server!

What do you think? Please let me know your valuable idea.

Thank you again.

Tom Kyte
June 09, 2004 - 4:46 pm UTC

i think it all belongs in a single database, easier to manage, easier everything.

if you have a standby, why would you take a hot backup every hour? even without a standby -- you just need the archives. taking a hot backup every hour would fall into the "overkill" category. run 2 standbys if you are truly paranoid (cascading -- prod -> standby -> standby)

but don't even think about two databases, that would not be a "good thing" (tm)

Thank you Tom

A reader, June 10, 2004 - 9:59 am UTC

Can you name some of the disadvantages of having two databases?

A reader, June 10, 2004 - 10:12 am UTC

Hi Tom,

I was thinking the same. But another DBA here thinks it's better to have two databases instead of one, and he has reasons like:
- We can have two different Pfiles, backup policies,...

I intuitively think one database is better, but I couldn't name any specific reason for it! Can you name some?

Thank you again,

Tom Kyte
June 10, 2004 - 5:02 pm UTC

you can have different backup policies if you like with a single database (but tell me, how different would they be really? but if you like, you backup TABLESPACES belonging to applications on whatever schedule you want)

2 different pfiles sounds like a problem more than a "feature" or something "desirable".

ask them to give details.

I find two databases to be

o harder to manage, there are two, or N, not one. one is simple

o totally impossible to tune. how do you stop database A from consuming 100 percent of the resources? at least if both are one and the same -- I can control things

o totally impossible to keep up to date. my *favorite* feature of a single instance is that either both applications are up to date or NEITHER is. I can assure you of one thing: if you have N databases, you will have N versions of Oracle to maintain. I like "one" again.

o backup is easier -- one thing (one again).

I cannot think of a thing that is "harder"
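The tablespace-level backup idea mentioned above can be sketched in RMAN (all tablespace names here are hypothetical):

```sql
-- RMAN: back up just the tablespaces belonging to one application,
-- on that application's own schedule
BACKUP TABLESPACE news_data, news_indexes;

-- and data deemed not worth backing up can be excluded from
-- whole-database backups entirely
CONFIGURE EXCLUDE FOR TABLESPACE game_data;
```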

Redo Generation For Standby Database!

A reader, June 14, 2004 - 3:00 pm UTC

Hi Tom,

Thank you for your reply. One thing that I may not have mentioned clearly: as I already said, we've set up a standby database for our primary database, which is the dynamic part (news). By dynamic here I mean the data will change and it is important to back it up. The other part, which I called "static", is not actually static; it even has more transactions than the first one. But the data isn't worth backing up! It's game data that kids play with on the website.

So now I think: if I consolidate the two databases for this one website (absolutely two different and separate applications in terms of type of user, data, backup policy, tuning, types of queries, ...), then, because the standby database automatically transfers and applies redo logs, it will create network traffic and also a storage problem on my standby database for those unwanted redo logs. What do you think? Please, as always, let me know your valuable idea.


Tom Kyte
June 15, 2004 - 8:00 am UTC

You can use a logical standby -- and only do the segments of interest, instead of the entire thing.
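One way to restrict a logical standby to the segments of interest is the DBMS_LOGSTDBY.SKIP procedure, run on the standby with SQL Apply stopped (the schema name here is hypothetical):

```sql
-- on the logical standby:
alter database stop logical standby apply;

begin
  -- skip apply of all DML for the schema that isn't worth maintaining
  dbms_logstdby.skip( stmt        => 'DML',
                      schema_name => 'GAMES',
                      object_name => '%' );
end;
/

alter database start logical standby apply;
```

Note (as Tom says below) this does not stop the primary from generating or shipping the redo; it only stops the standby from applying it.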

A reader, June 15, 2004 - 11:58 am UTC

Data Guard Concept and Administration
“A logical standby database is logically identical to the primary database and can be
used as a production database if the primary database is taken offline for routine
maintenance or is damaged through human error, corruption, or a disaster.”

“ A physical standby database must be physically identical to the primary database. Put another way,
standby on-disk database structures must be identical to the primary database on a block-for-block basis, because a recovery operation applies changes block-for-block using the physical row ID.”

Hi Tom,

I read about standby databases but still have some confusion! What is the major difference between a logical and a physical standby database? Is it correct if I say: with a logical standby database, you can have selective datafiles for the standby? Does it mean the primary database doesn't create archive logs for the datafiles that haven't been selected for the standby database?

In general, how can I prevent generating redo for part of my primary database? How can I solve the issue that I mentioned before about having two databases instead of one?

Could you please elaborate this more?

Thanks in advance,

Tom Kyte
June 15, 2004 - 4:16 pm UTC

a logical standby is maintained by turning REDO into inserts/updates/deletes.

A physical standby is maintained by applying redo to copies of datafiles.

so, a logical standby LOGICALLY contains the same data -- but it might (and probably will) be in different blocks, slots on blocks, and files.

a physical standby is a bit for bit copy of the primary.

A reader, June 15, 2004 - 4:33 pm UTC

Thank Tom.

Sorry if I am asking a stupid question! But I didn't quite get it!

So, how does a logical standby database avoid generating unwanted redo logs? If you remember, the issue with having two databases instead of one was the different backup-policy needs of the different parts of the database. And I said, if we have one database, then because of the standby database it will create too many unnecessary redo logs and may create network traffic. And then you suggested a 'logical standby' for my standby database. I still don't know how it helps!

Thank you for your time,

Tom Kyte
June 16, 2004 - 11:14 am UTC

there are no "unwanted" redo logs.

the primary will generate redo for all objects (no avoiding that no MATTER WHAT).

the redo will only be applied to the standby if the redo is for some object you care about.

all of the redo would be shipped this is true -- but I'd rather (this is me, opinion, the way I would do it -- do what you will) size for the entire thing to be managed in a single database. It just makes everything easier.

table design

mo, June 25, 2004 - 5:12 pm UTC


When you design a database and you have one parent and 4 child tables, do you usually add timestamps to all 5 tables or just the parent?

(Creation_Date  DATE,
 Created_User   VARCHAR2(10),
 Modify_Date    DATE,
 Modify_User    VARCHAR2(10));

I am thinking there is no reason to add these to the 4 child tables unless we insert/update the children separately. If these are all part of one application form, do we still need them?

On the other hand it may be easier to query one table to look at when record was created/modified.

What is your opinion on this?


Tom Kyte
June 26, 2004 - 1:11 pm UTC

only your business rules can decide that. If you ALWAYS update the parent and child record "as if it were one record" -- that would indicate to me that this is really a 1:1 optional relationship (eg: a parent has a SINGLE child).

I pretty much go for:

create table the_table
( common_attributes......,


that is, one table.
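For reference, one common shape for the audit columns from the question: column defaults cover inserts, and a small stamp-only trigger covers updates (table and trigger names are hypothetical; note Tom's general caution elsewhere in this thread about putting real logic in triggers).

```sql
create table po
( po_no         number primary key,
  po_date       date,
  po_status     varchar2(10),
  creation_date date         default sysdate,  -- set automatically on insert
  created_user  varchar2(30) default user,
  modify_date   date,
  modify_user   varchar2(30)
);

create or replace trigger po_audit_bu
before update on po
for each row
begin
  -- stamp the row; no business logic here
  :new.modify_date := sysdate;
  :new.modify_user := user;
end;
/
```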

database design

mo, October 25, 2004 - 6:15 pm UTC


I would like to take your opinion on an oracle application design.

We have an internal web application that stores data on student applications. Applicants personal information is stored in an "STUDENT" table and application data in "APPLICATION" table. A student can submit more than one type of application.

All applications have been mailed in by students and entered by internal staff. Each student was assigned a unique "Student_ID" to uniquely identify him/her. The database has over 2000 students already.

Now we want to post the HTML application forms externally on the web for students to enter data directly into the database. This will take the load off the internal staff.

The problem is: when someone decides to fill out an application on the web, how do you verify that he already has a record in the "STUDENT" table, and not create a duplicate record?

I was thinking of letting each student register before he can fill out an application. Then the system can assign him/her a userid/password (like Yahoo) so he can login next time and be identified. After he registers we can check (first name,mi, last name, city, state) against the database to see if he possibly might be there. We should inform him whether this was him or not. If he answers yes, then we update the record in "STUDENT" table, and if he says no or system could not find a match we insert a new record.

I see a few flaws in this design: he may have changed his address, people could have the same names, or he could misspell a letter in his name.

Do you have any good ideas to implement this?

Tom Kyte
October 25, 2004 - 7:49 pm UTC

How do the people that manually enter the data now determine "which student record" to update?

database design

mo, October 25, 2004 - 10:55 pm UTC


When they get an application, they do a "Search for student" which does a soundex search on last name and first name. If anything is found, they check the list to see if a match is there (address info) and then add/update records for him. Otherwise they create a new record.

If you are thinking of doing the same thing for the student himself, it might be dangerous because he would be able to update other people's records that come up in the list. I know there might not be a "perfect" solution, but I'd like to see what you think the best solution is.

Tom Kyte
October 26, 2004 - 7:48 am UTC

any of your solutions will let the student update "other people's records"

if you do this:

I was thinking of letting each student register before he can fill out an
application. Then the system can assign him/her a userid/password (like Yahoo)
so he can login next time and be identified. After he registers we can check
(first name,mi, last name, city, state) against the database to see if he
possibly might be there. We should inform him whether this was him or not. If
he answers yes, then we update the record in "STUDENT" table, and if he says no
or system could not find a match we insert a new record.

what would prevent me from giving you a false fname/lname etc and being "whomever I wanted"

Only thing that would 'work' would be to assign each student an identity and password -- and then notify them that "this is your identity". anything that lets them identify themselves to you will be subject to 'spoofing'.
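For reference, the soundex-based "search for student" the internal staff use (described above) might look like this -- table, column, and bind names are assumptions from the thread:

```sql
-- find candidate matches by how the names sound, not how they're spelled
select student_id, first_name, last_name, city, state
  from student
 where soundex(last_name)  = soundex(:p_last_name)
   and soundex(first_name) = soundex(:p_first_name);
```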

database design

mo, October 26, 2004 - 9:15 am UTC


I did not quite understand what you mean.

Are you saying that I need to create a userid/password for every student that already exists in the database and for every student filling out an application on the web? "Students can be anyone in the world".

But then, as you said, how would this validate whether a new applicant is already in the database or not (insert or update transaction)?
I guess it is also impossible to prevent someone from faking a name too?
Would you suggest saving the records coming from the web into a separate table from "STUDENT" and then having someone match/move the records into the "STUDENT" table?

Tom Kyte
October 26, 2004 - 9:17 am UTC

i'm not suggesting anything, i have insufficient data.

all i'm saying is that any approach that lets the student input information to hopefully find their record is subject to spoofing. What you say you are worried about is 100% obtainable if you let them do that.

If that is an issue, I can really only see "starting over". each student is assigned an account -- a fixed known account -- and given access to that specific account.

Table Design

mo, November 29, 2004 - 9:38 am UTC


I have created a table to track PO information. I have PO_Number and PO_FY as the primary key.

Now I see a need: a user may make a typo and have to re-enter the PO_Number or PO_FY, as in any system.

This means that I have to update the primary key of the record.

1. Do you see any issues with that?

I myself see some issues if the PK that needs to be updated is being used in another table. This means that you have to update the values in the other tables too, which means more work. Using a sequence number eliminates this.

2. Do you recommend using a unique sequence instead?

Thank you,

Tom Kyte
November 29, 2004 - 3:07 pm UTC

1) yes, probably means that po_number, po_fy isn't the primary key, but rather a unique key with a not null constraint on it.

2) yes I do, yes others would not, yes this is a debatable thing that has been debated back and forth and back and forth.

Me, if the primary key is going to be updated -> it is not the primary key.

database design

mo, November 29, 2004 - 4:31 pm UTC


I will go with your advice and add a new sequence number (PO_ID) to the PO table. However, if I have another table "INVOICE" that has (PO_NUMBER, PO_FY) as part of the invoice, since the user selects those when he creates an invoice, would you replace these fields with PO_ID in the invoice table, or would you keep them as-is and update them whenever an update occurs in the PO table for a PO number?

Thank you,

Tom Kyte
November 29, 2004 - 6:36 pm UTC

the entire point is to have a primary key that can safely be propagated to child tables.

eg: the table invoice uses po_id as its sole foreign key column to this table.

else, why bother.
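A sketch of that arrangement (names assumed from the thread): the immutable surrogate key is what child tables reference, while the updatable natural key is enforced with a unique constraint.

```sql
create sequence po_seq;

create table po
( po_id     number primary key,          -- from po_seq.nextval; never updated
  po_number varchar2(20) not null,
  po_fy     number       not null,
  constraint po_natural_uk unique (po_number, po_fy)  -- typo fixes happen here
);

create table invoice
( invoice_id number primary key,
  po_id      number not null references po (po_id)
  -- no po_number/po_fy here: correcting a typo in PO touches one row only
);
```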

designing the database

riyaz, November 30, 2004 - 7:12 am UTC

After seeing threads related to "design" in this great forum, I am using the following methodology for my new project.

- Always only update MAIN BASE TABLES.
- e.g., in an "invoice generation screen" only the invoice header and invoice detail tables should be updated through the front end (i.e., through Java)

The following things would be UPDATED through TRIGGERs written on the base tables:

- tables which are used for MIS reports (summary data tables)
- transaction log, used to monitor the transaction (when entered, who entered)
- updating the status back on the source document (e.g., the sales order will be updated with invoiced quantity and status when invoices are raised)
- status updates (e.g., Pending, Over, or Cancelled)
- history tables (keeping each status of a modification / a replica of the base tables)
- workings like "tax calculation" are also done in a procedure and called from the front end

Here, only the base tables (header and detail tables) will be updated through the front end (JSP, or a GUI like VB).

All other information will be updated using triggers written on the base tables, mostly on the detail table.
(The triggers will be very short and will call procedures available within a package.)

Please confirm my approach and add your views also.

Tom Kyte
November 30, 2004 - 8:03 am UTC

I don't understand what "always only update MAIN BASE TABLES" means.

I'd rather use stored procedures (transactional API's) and never let the front end do an insert/update/delete personally.

that would remove the need for automagical triggers firing in the background as well. (not a huge fan of triggers containing any sort of complex logic, gets really confusing)

Thanks, your final word pl.

riyaz, December 02, 2004 - 12:00 am UTC

So you mean to say it is like this:

From Java,
Case - I
1. Write a stored procedure to insert into the base tables; this procedure will be called from Java/JSP or a GUI.
2. Triggers written on the base tables will update the cross tables.

Case - II
Put everything in the stored procedure and call it from the front end (Java or any GUI), so that all the tables get inserted and all cross-table updates also happen.

Which one do you prefer? And also confirm whether we can have all business logic (tax calculation etc.) in stored procedures and call them from the front end whenever required.

Tom Kyte
December 02, 2004 - 7:30 am UTC

case II!!!! definitely.

confirmed. you can.
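Case II might be sketched like this (all table, sequence, and column names are hypothetical): one procedure owns the whole transaction, so the front end never issues DML directly and no triggers are needed.

```sql
create or replace procedure create_invoice
( p_cust_id in number,
  p_item_id in number,
  p_qty     in number )
as
  l_inv_no number;
begin
  -- base tables
  insert into invoice_header (inv_no, cust_id, inv_date)
  values (inv_seq.nextval, p_cust_id, sysdate)
  returning inv_no into l_inv_no;

  insert into invoice_detail (inv_no, item_id, qty)
  values (l_inv_no, p_item_id, p_qty);

  -- cross-table work done here, in the same transaction, not in triggers
  update stock
     set on_hand = on_hand - p_qty
   where item_id = p_item_id;
end;
/
```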


riyaz, December 02, 2004 - 12:05 am UTC

In a Purchase order module

Base tables: po_head and po_det
Cross tables: stock, purchase_summary etc..

Single Table or Many Table

Tony, December 02, 2004 - 11:46 pm UTC

I am in the process of designing a database for an OLTP system. The application is large, and hence consists of several modules such as Sales Order Processing, Vendor PO, Vendor Development, Material Request, Goods Delivery, Goods Return, Inward Quality Check (IQC), Outward Quality Check (OQC), Stores Issue and Receipt, Production Planning, Assembly, Packing, Shipment and Invoice. Across some of the modules/activities, most of the fields (attributes) are the same. For example, the IQC and OQC modules have similar fields except for one or two. The modules are transaction intensive, and will have concurrent access by many users. I have three approaches for designing the tables for modules having similarity. With my little knowledge, I have explained the pros and cons of each approach.

1). Design a single table for modules/activities having similarity. For example, IQC and OQC will share a single table with an additional flag column to identify the module/activity.

Pros:- No need to UNION/JOIN many tables to pull out data for reports, and thus the reports will run faster.

Cons:- Number of concurrent hits on single table will be high, and this might reduce the performance of concurrent inserts/updates/deletes.

2).Design single table with module-wise partition for modules/ activity having similarity.

Pros:- Each module will access only the corresponding partition in the table. So, there will not be any concurrency issues.

Cons:- Partition might not be suitable for OLTP systems. And partitions and global/local index maintenance might be a problem in OLTP system.

3).Design separate table for each module having similarity.
For example, IQC and OQC modules/ activity will have separate table.

Pros:- Number of concurrent hits on the tables will be less. This will improve the performance of concurrent inserts/updates/deletes.

Cons:- Need to UNION/ JOIN many tables to pull out data for reports, and this might slow down the performance of reports.

Tom, please advise me on the right approach. Please correct me if my assumptions (pros and cons) are wrong.

Thanks a lot in advance.

Tom Kyte
December 03, 2004 - 7:43 am UTC

1) partition if you are concerned about the cons. (after testing -- the database is fairly resilient in this manner)

2) see number 1

I would tend towards simplicity, a single table seems to be called for. there are techniques you can use such as partitioning to physically segregate the data if need be.
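For example, the single-table-with-flag design could be list-partitioned on the module flag so each module's rows occupy their own segment (all column names are assumptions):

```sql
create table quality_check
( qc_id   number primary key,
  module  varchar2(3) not null,   -- 'IQC' or 'OQC'
  item_no varchar2(20),
  qc_date date,
  result  varchar2(10)
)
partition by list (module)
( partition p_iqc values ('IQC'),
  partition p_oqc values ('OQC')
);
```

Reports still see one table; concurrent DML for each module lands in a separate physical segment.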

ALT, May 07, 2005 - 12:31 am UTC


Tom Kyte
May 07, 2005 - 8:02 am UTC


see the administrator's guide.

you can use dbms_fga to capture all SELECTS (and in 10g, all modification statements too)

you can use triggers to capture their sql during modifications in 9i if you really like.

v$sql might be more than enough information.

you use grants and auditing to deny access (grants/revokes to deny, auditing to discover)
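A minimal DBMS_FGA policy capturing SELECTs against one table might look like this (schema, table, and policy names are hypothetical; in 9i a policy audits SELECT by default):

```sql
begin
  dbms_fga.add_policy( object_schema => 'APP',
                       object_name   => 'EMP',
                       policy_name   => 'AUDIT_EMP_SELECT' );
end;
/
-- captured statements then appear in DBA_FGA_AUDIT_TRAIL
```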

Can name some of disadvantages of having two database?

abc, May 17, 2005 - 10:20 pm UTC

Hi,
I found it useful, but I have a question: how do I retrieve employees that have the same first name and last name but different IDs? The result should show only the employees who share a name; the remaining employees shouldn't be displayed.

Tom Kyte
May 18, 2005 - 8:45 am UTC

select *
  from ( select emp.*,
                count(*) over (partition by first_name, last_name) cnt
           from emp )
 where cnt > 1;

Optimal Database Design

Mohamed Abd El Mawla, June 15, 2005 - 5:06 am UTC

Hi Tom

We are designing a new database that will have many financial modules.

I'd like to ask about the optimal database design; I have an argument with the developers about this.

They want to have a single database user that owns all database objects, but I want to separate the objects according to their specialty.

i.e.: user1 for module1 (GL as an example)
      user2 for module2 (AR as an example)
      and so on

I've been asked about the benefits of each method.

So, what is your opinion about this?

A single database user, or multiple database users managed by synonyms?

What are the benefits of each?

Tom Kyte
June 15, 2005 - 9:50 am UTC

the benefits of multiple schemas will be in the areas of security (you can secure things "more easily" as there is no single "super user") and modularity (you can have separate components).

I like segmentation like that myself. Makes it easier to organize things and see relationships between things

Optimal Database Design

Mohamed Abd El Mawla, June 18, 2005 - 3:03 am UTC

Thanks Tom

Your words are always very useful for me.

The topic of security is the most compelling answer: one day you may have to distinguish between senior and junior developers by using multiple schemas.

That goal cannot be reached with a single database schema, where all developers would have the same privileges on the objects.

But I'd like to ask about performance here:

Is there any performance issue with a single schema versus multiple schemas using synonyms?

There was a response from you before about public database synonyms and their performance issues.


Tom Kyte
June 18, 2005 - 7:15 pm UTC

why would you use synonyms?

private synonyms have nominal overhead.

do not use public synonyms, not for performance reason but because globals in general are "bad" and you'll prevent consolidation of systems down the road.

which book?

Alexander, June 18, 2005 - 8:26 pm UTC


what book on Oracle database design would you recommend for
someone with 10 years of experiences with other databases?

Tom Kyte
June 18, 2005 - 8:57 pm UTC

The server concepts guide

read it, retain 10% of it, and you'll already know 90% more about oracle than most.

MOST IMPORTANT THING TO "get": Multi-versioning, read consistency. by far.

did you mean Database Concepts?

Alexander, June 20, 2005 - 11:02 am UTC


did you mean Database Concepts? The one we download in PDF format?

Part I. What Is Oracle?

Tom Kyte
June 20, 2005 - 12:43 pm UTC

yes, that one.

Application design

yeshk, December 09, 2005 - 3:02 pm UTC

Hi Tom,
Thanks a lot for your time and effort.

I am redesigning an application. This application mainly inserts into the database, with a few updates but not much fetching of data. Which is the better option?
1) Should I create a (very) normalized database?
2) What is the implication of having a lot of columns in one table? I do not want to split the data into two different tables when the relationship is one-to-one. Example: an applicant's data with his address information.
Do you think an applicant's personal information, like name and DOB, should be maintained separately from the address?
3) I am planning to create a table for each object, like borrower, loan information, or user information. Is that a good idea? Your inputs please.

This application is done in Java, so the team doesn't want to do anything other than storing and fetching information from the database. I tried telling them that manipulating dates is faster in Oracle, but they want to fetch from the database and then parse and split the date as day/month/year etc. How do I make them understand that handling/validating/managing data in the database is the best way to go?


Tom Kyte
December 10, 2005 - 4:46 am UTC

1) cannot be answered without understanding the application.

2) any table with more than 255 columns will be stored in multiple row pieces, if you are not read intensive - it will not impact you too much.

3) see #1

well, you just lost me on that last paragraph. Actually, interesting last paragraph, seems to be implying this rule:

JAVA DEVELOPMENT => database is not a thing of relevance

Guess the only way to convince them is to let them program for a couple of years and get a little age on themselves.

When they code the world's slowest, least manageable application some number of times, they'll get the point.

Another way to "prove" your point

Bob B, December 12, 2005 - 9:51 am UTC

I'd say, have a contest to do something small with java and the database. Spec out a tiny app (perhaps an app to add, view all, view one, and update contacts) and then have a challenge amongst the developers you want to persuade. Each developer, including yourself, tries to make the app. The apps are then rated by correctness, time to complete, and response time. After that's done, you could add a step 2 for a change request and see how long the maintenance/addition takes.

Tom Kyte
December 12, 2005 - 10:13 am UTC

well, that is not entirely fair, but it would demonstrate this sort of concept:

there are tools that can efficiently do simple, straightforward things easily (eg: to do your contest, I would choose htmldb - in the time it takes to type this sentence, I would be done - and would have lost-update detection, CRUD screens, everything done).

there are tools that can do anything - but not everything simply. sometimes not ANYTHING simply, but they can be made to do anything.

there are tools that can do some very specialized things better than anything else.

there are tools that .......... <and so on>.........

The contest

Bob B, December 12, 2005 - 11:17 am UTC

The point of the contest was not to use the best tool for the job. It was to say, ok we all have access to these two tools (java and oracle). Each of us will try to do x with these tools and we'll compare implementations to see which implementation style is best for us as a team.

I was actually thinking of html db when I said contacts app, because I believe you can download a written and tested html db contacts application.


mo, January 19, 2006 - 10:34 pm UTC


I have an inventory management system that stores inventory information on stock items. The stock items get replenished using a different financial system where purchase orders are entered. Currently we do not have access to this financial system.

My plan is for a warehouse person who receives a package to look up the PO numbers from the financial system and import that PO's item information.

My questions:

1. The financial system holds other information (payroll) and we may not be able to get a read-only account or link to it. What would you do in this case? (i.e., create views, or another instance that mirrors the financial database and is refreshed nightly)

2. If they give us access, can we get limited read access on certain tables/fields/records? Or can't you create an Oracle account that limits access at that level?

3. What is your way of solving this?


Tom Kyte
January 20, 2006 - 9:32 am UTC

I'll assume the other system is remote and you'll be using a dblink.

You can certainly:

a) create views that limit the rows/columns the user can see
b) grant them select on the needed tables or views that they must have access to

create your database link using that account. You'll have read only access to precisely the data you need and nothing more.
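A sketch of that setup, with every name hypothetical: the read-only account, the limiting view (owned here by an assumed application schema FINAPP), and a link that connects as that account.

```sql
-- on the financial system, as a privileged user:
create user inv_reader identified by some_password;
grant create session to inv_reader;

-- as the application schema owner (FINAPP): expose only what is needed
create view po_for_inventory as
  select po_no, stock_no, qty
    from po_items;
grant select on po_for_inventory to inv_reader;

-- on the inventory system: the link connects as the limited account
create database link fin_link
  connect to inv_reader identified by some_password
  using 'FINPROD';

select * from finapp.po_for_inventory@fin_link;
```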


mo, January 20, 2006 - 6:24 pm UTC


1. Since they may not allow even read access to the production system, and might tell us to use a data warehouse instance, would you ask them to create the read-only views in that other database? If this is feasible, would the views be updated automatically via a link?

2. Since I am linking from the inventory system into the other system, I would be creating the database link in my system. Do they have to do anything on their side for this?

Tom Kyte
January 20, 2006 - 6:33 pm UTC

if you don't have even read only access, you'll have to call them and ask them to read it to you or something - I mean, that would be "no access"

they would have to set up the account, grant it the privs, create the views.

Database Design

Reader, January 21, 2006 - 1:21 am UTC

We have a legacy system which feeds data to another system (OLTP). During the night we refresh our interface tables, which takes about 15 minutes. The timing is not a big concern today. The problem is:
1. These interface tables have a lot of dependencies, i.e., FKs. So we disable these constraints, refresh the data, and then enable the constraints (novalidate).
Sometimes, if there is a delay and some tables are still active, the constraints cannot be disabled and the refresh hangs.

Can you suggest an alternate solution?


Tom Kyte
January 21, 2006 - 10:53 am UTC

do not disable the foreign key. That is my only suggestion - regardless. Even if it makes it take longer, that is my only suggestion.

I think you'll find it to be pretty efficient (foreign key checking). And when the data starts getting rejected - you'll be notified immediately instead of finding it as a strange "impossible to be true" bug later.
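One way to refresh with the foreign keys left enabled is simply to order the work: delete children before parents, insert parents before children (table and link names are hypothetical):

```sql
-- no constraint disabling needed if the DML respects the FK order
delete from child_interface;
delete from parent_interface;

insert into parent_interface select * from parent_source@legacy_link;
insert into child_interface  select * from child_source@legacy_link;

commit;
```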


mo, January 24, 2006 - 12:46 pm UTC


As a follow-up to the previous message: the financial system owners agreed to copy the needed records for us from their production instance into another warehouse instance (which they own) on a daily basis.

1. If you were the owner of the inventory system, would you create tables and copy the data over on a daily basis, or would you create materialized views based on those tables?

2. How would you handle the daily updates of the data? Do you schedule a daily job that deletes the data and copies over the tables again?

3. Their copy will merge all the tables that make up a record into one. This means we will have repeating information in some fields. Would this be a problem? Would you break it up and save it into separate tables, or would you parse the data in your application code?

4. What is a legacy system? I hear the term but do not know what it means.

Thank you,

Tom Kyte
January 24, 2006 - 8:52 pm UTC

a legacy system is an existing system. Immediately after a new system goes production, it is a "legacy" (existing, in place) system.

I cannot design a warehouse refresh right here, with this level of detail. #1 for example, you cannot say if you can really use a materialized view effectively, there is no mention of volumes, no mention of materialized view logs, not really much to go on (and this is not the place for really new questions...)

database design

mo, January 24, 2006 - 10:34 pm UTC


I could not understand what you meant in your second paragraph. Are you saying that you need more info to see whether a materialized view solution is viable?

Would not this:
delete from table;
insert into table select * from table_in_warehouse@db_link;

work like a materialized view?

Do you also mean this subject should be under a new question?

Tom Kyte
January 25, 2006 - 1:23 pm UTC

sorry, I have extremely limited information here - you ask "design system", I respond "insufficient data".

Your two lines of code would be logically equivalent to a complete refresh materialized view, yes.
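For reference, a complete-refresh materialized view doing the same job as those two lines might be declared like this (the object names are placeholders, not from the thread):

```sql
create materialized view local_copy
  refresh complete on demand
  as
  select * from table_in_warehouse@db_link;

-- the nightly job is then a single call; by default the refresh is atomic
-- (delete + insert in one transaction), so readers never see an empty table
begin
   dbms_mview.refresh( 'LOCAL_COPY' );
end;
/
```

The practical difference from the hand-rolled delete/insert is that the database owns the definition and the refresh mechanics, so the copy cannot silently drift from the defining query.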

Database sizing

Nikunj, January 25, 2006 - 12:14 am UTC

Hi Tom,

1. How can we size the database? For example, I would like to develop a financial accounting system.

2. Pros and cons of putting each application on a different tablespace versus every application on a single tablespace.
I.e. Financial Accounting system on a Financial tablespace,
HR on an HR tablespace,
Distribution on a Distribution tablespace,

or Financial, HR, and Distribution all on a USER tablespace?


database design

mo, January 25, 2006 - 9:12 pm UTC


Sorry for the limited info, but I will try to explain it here briefly. It is a simple thing, but I want to find the best way of doing it.

We have 3 databases:

DB_A (Financial sytem)
DB_B (Data Warehouse for Financial system)
DB_C (Inventory System)

DB_A has several tables A,B,C that contain requisition data and tables E,F,G for purchase order data.

The DBA of DB_A will consolidate all data in tables A,B, and C into one table for requisitions data access and the same thing for PO data. He will do it every night.

I need to capture the data from those tables in DB_B into DB_C and update the data nightly.

1. How would you do this?
2. Would you copy the data over to the inventory system (since we own it), or would you just keep using the database link and read the data from the data warehouse?
3. Would consolidating the data in several tables into one table create a problem for parsing or not?

Thank you,

database design

mo, January 28, 2006 - 9:22 pm UTC


Do you think the above explanation of what I need to do was too limited as well?

Tom Kyte
January 29, 2006 - 8:26 am UTC

all I can do is tell you (as I have in many places on this web site) how you can do it. there are materialized views and here is how they work.... there is dump and load and here is how it works.... you can write code to move it and here is how that works.....

it is up to you to evaluate and most importantly - understand - each so you can pick the one that is right for you.

I don't know if you have the ability to set up mv logs (nor do I really want to know, you need to know - you need to evaluate...)

I don't know if you would benefit from incremental (fast) refreshes or if complete would be best.

I don't know the volume of data here.

I don't know if you can pull just changes (eg: they put a timestamp in place)

I can only tell you the tools you have available, you have to decide which tool is best for you in this case, remembering that the next time you do something similar, a different method might be best because some variable changed.

Database design

Eugene Yen, March 02, 2006 - 11:44 am UTC

Hi Tom:

Your comments throughout these threads have been really helpful to me. There is a design issue slightly different from those discussed above, and I would like to know your opinion.

We are designing a database that will manage the data of many customers. All of the customers will use the same web site to communicate with the database. Most customers request that their data be kept as separate as possible from the others', and they also need to do offline bi-directional data exchange between this database and their own databases (possibly through web services).

Basically the data structure is the same. The number of customers, however, can potentially grow to a thousand, and each customer's data volume is roughly mid-range.

I have already opted not to create separate schemas or tables for each customer. Instead I think it is easier to separate the data at the record level. But it is still quite a challenge. I was told that many databases in the stock market have encountered similar challenges - each user wants their data kept separate from the others'.

My initial approach is to mark the transaction data (records) with the customer ID in a column like "CUSTOMER_IDENTIFIER" and manage each user's data by this flag.

What are your suggestions?


Tom Kyte
March 02, 2006 - 1:13 pm UTC

I'm so glad you didn't go down the multiple schema/tables approach. That would be the wrong approach.

having that customer_identifier is probably the right approach (given what I know from above).

You can either create a view that adds "where customer_id = sys_context( 'your_ctx', 'cust_id' )" or use fine grained access control to control access to the data.

You would, upon identifying the user, set the application context with the customer id so that when they query any table - they only can see or touch THEIR data.
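A minimal sketch of the application-context approach described above (the context, procedure, view, and column names here are invented for illustration):

```sql
-- hypothetical names throughout; a real system would use its own
create or replace context cust_ctx using set_cust_id;

create or replace procedure set_cust_id( p_cust_id in number )
as
begin
   -- called right after the user is identified at login
   dbms_session.set_context( 'cust_ctx', 'cust_id', p_cust_id );
end;
/

-- every query against the view is silently restricted to that customer
create or replace view customer_orders
as
select *
  from orders
 where customer_identifier = sys_context( 'cust_ctx', 'cust_id' );
```

Fine grained access control (the DBMS_RLS package) achieves the same effect by attaching the predicate to the base table itself, so even direct table access is filtered.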

DB Design issue

Eugene, March 02, 2006 - 2:10 pm UTC

Hi Tom:

Thank you for the reply which points me to a good direction and gives me a good starting point.

Appreciate that.


design question,

A reader, September 21, 2006 - 2:24 pm UTC

We are writing an application to extract data from a multi-million row table on which we have only SELECT privilege.

Transactions on that table happen quite frequently - INSERTs, UPDATEs, and DELETEs. Basically we don't have any control over that table.

We extract from the table at, say, 2:00pm with a condition like "where update_timestamp between 1:00pm and 2:00pm".

If a transaction on the base table starts at 1:50pm and ends at 2:05pm, I would not be extracting that data with my 1:00pm-to-2:00pm condition.
The next time, when I extract with the condition "between 2:01pm and 3:00pm", I miss the records inserted into the base table between 1:50pm and 2:05pm, because their sysdate value is 1:50pm (when the transaction started).

This is a serious issue for us. Right now we are thinking of extracting 4-6 hours of data from the base table; there will be duplicate records, but we will handle that in OUR environment.

Any other ideas, considering data integrity and performance as top priorities?


Tom Kyte
September 22, 2006 - 2:26 am UTC

why wouldn't you use change data capture, streams or some other builtin capability???

but for your logic, instead of you remembering "we pulled data last at 2:00pm", you must remember:

select min(dt)
from (select sysdate from dual
union all
select start_date from v$transaction);

that way - you will remember "1:50pm" - the start date of the oldest transaction currently running in the system rather than 2:00pm.


still have a question,

A reader, September 22, 2006 - 6:32 pm UTC

thanks for the suggestion, but now I may be missing something in the concepts here. If a transaction starts at 1:50pm, the start_date in v$transaction would be 1:50pm, and my query (which starts at 2:00pm) now checks from 1:50pm onwards instead of 2:00pm. But it does not extract the data from the transaction that started at 1:50pm, since it is not committed yet (that happens at 2:05pm). At 2:05pm the information in v$transaction is perhaps gone, right?

The next time I start a similar query, at 3:00pm, if there is no transaction happening at that moment it would consider 3:00pm as the starting time and would not extract the records committed at 2:05pm.

As for your other suggestion of using Streams: it is a very good idea, but the bureaucrats in our company would not agree to it without profound testing.


Tom Kyte
September 24, 2006 - 1:22 pm UTC

in order to accomplish your goal, the standard logic is:

a) remember what time it is RIGHT NOW
b) look up the last remembered time
c) pull all data since (b)
d) commit;

repeat. but you need to modify a) to be "remember what time it is right now OR the oldest transaction time"

So, let us say you start the very very very first copy of your data at 8am today. And you refresh every "hour"

You would:

a) remember the minimum of 8am, start time of all current transactions in some table. suppose this in fact is 8am (no other transactions)
b) pull all data.
c) commit;

one hour later you would

a) remember the minimum of 9am, start time of all current transactions (say this result is 8:55am).
b) query up the last remembered value, that is 8am, from a) above.
c) pull all data with a timestamp >= 8am
d) commit;

one hour later at 10am you would

a) remember the minimum of 10am, start........
b) query up the last remembered value, that is 8:55am from a) above.
c) ...
d) ...


I would agree that nothing be done without profound testing, however, you chose the hardest, most manual, I will impose a negative performance impact on everything, we will be debugging and maintaining this code forever approach. You might want to reconsider your approach before it goes into production and becomes rather permanent.
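Sketched as PL/SQL, the watermark logic Tom walks through above might look like this (the watermark table, staging table, and source-table names are invented for illustration):

```sql
declare
   l_this_pull  date;
   l_last_pull  date;
begin
   -- a) remember "now", or the start of the oldest open transaction if earlier
   select min(dt)
     into l_this_pull
     from ( select sysdate dt from dual
            union all
            select start_date from v$transaction );

   -- b) look up the time remembered by the previous run
   select last_pull into l_last_pull from extract_watermark;

   -- c) pull everything stamped on or after that remembered time
   --    (duplicates are possible and must be handled downstream)
   insert into my_staging_table
   select *
     from big_source_table
    where update_timestamp >= l_last_pull;

   -- save the new watermark for the next run
   update extract_watermark set last_pull = l_this_pull;

   commit;  -- d)
end;
/
```

Because the watermark is the start of the oldest in-flight transaction, a row committed after the pull but stamped before it will still fall inside the next run's window.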

great tips,

A reader, September 25, 2006 - 10:08 am UTC

That was wonderful and very clear. We will try your tips - we had no idea about this approach before.


Good One...

Vishnu Vardhan Reddy Singireddy, September 25, 2006 - 1:49 pm UTC

Good One

Vishnu Vardhan Reddy Singireddy


DB design

sam, November 21, 2006 - 7:11 pm UTC


1. Would you store derived values in a table column, or would you always calculate those values in your code?

2. Let us say you offer free cassettes and CDs to 100 customers each month.

You assign each customer a number of cassettes and a number of CDs. This quota-based number is shown to the customer when he enters a quantity. He can use the whole number on one title or spread it across different titles, but he cannot exceed the quota. Every month, the initial number assigned to the customer is added to this "quota allowed" number, along with any leftover (unused) quantity. The admin can also change the initial number assigned to a customer from month to month.

How would you compute the amount available for each customer every month? I have two tables:



Would you store multiple records per customer in Customer_Quota so you can track the old quantities assigned per month, since you need those to compute the quantity available now?

Tom Kyte
November 22, 2006 - 4:48 pm UTC

1) or maybe a function based index - only reasonable answer is "it depends"

give me an example :)

2) sorry, did not follow - and does not seem to be a follow up but a new question

Application design NOT Database

Arindam Mukherjee, November 29, 2006 - 9:53 am UTC

Respected Mr. Tom,

Today please let me ask you one offbeat question. I have already read your book “Effective Oracle by Design”. My question: is there any website where I can read about the designs of various kinds of applications that are already running? It is my conviction that, after designing the application, I will go for the database design, module by module, for that application.

db design

sam, July 16, 2007 - 3:06 pm UTC


I want to verify some database design issue with you.

I learned from several DB books and normalization rules that you do not save "Calculated" values in the database, such as "Invoice Total" and "Inventory Level" for a stock item.

Do you agree with this?

I think the reasoning behind this was that every time you update a record (e.g unit price or quantity) you have to make sure you updated the calculated value and thus they recommend to calculate the value real-time on demand.

There is an issue with this design though that I found out.

I have an inventory system with 4 tables: inventory, orders, receipts, and shipments.

Let us say you define in the "inventory" table what the stock item you want to track and the Initial Quantity.

Then to calculate the "inventory on hand" you take the starting value from the maximum-dated record, add all receipt quantities after that date, and subtract all shipment quantities after that date.

This works fine when you have a few hundred records. However, when the tables start getting big, the function that calculates this becomes slow because it is scanning thousands or millions of records.

A common-sense question arises: why does the system have to go back through years' worth of data to calculate the inventory on hand for a stock item? Should this be done on a day-by-day basis and stored in a field?

What is your opinion on this and how would you design such a system?

Also, do you use any references/books for ideas on a good data model for a specific application, or do you design the thing from scratch? E.g., it might be helpful to look at an existing banking data model if you are developing a banking application.

Thank you,
Tom Kyte
July 17, 2007 - 11:38 am UTC

... I learned from several DB books and normalization rules that you do not save "Calculated" values in the database, such as "Invoice Total" and "Inventory Level" for a stock item. ....

Nope, I don't agree - if the values are maintained by the system itself. Think about it - an index is repeated values that must be maintained in two places isn't it. Therefore we should not use indexes because it duplicates data. But of course we do because they are "transparent" to us. So, using materialized views - maintaining this data is quite doable and smart. I do - my home page tells you some interesting numbers - I do NOT count those rows every time you hit that page however (my server would have to be a lot bigger if I did). Rather, I use a materialized view maintained by the system to transparently maintain these "copies" for me.

in a data warehouse that statement is absolutely WRONG....

db design

Sam, July 17, 2007 - 2:15 pm UTC


1. OK, I got half of the answer, which implies you would save the calculated "inventory on hand" for each stock item in a materialized view. I guess you sometimes favor saving calculated values in the database for immediate/faster access. Is this correct?

2. Now, the second half is how you implement this design, and why I need to use a materialized view. I might as well save this in my "Stock_Item" table.

Would you update this calculated field every time a new receipt, shipment, or inventory adjustment occurs, and every time an existing record is updated?
Tom Kyte
July 17, 2007 - 2:39 pm UTC

1) sure

2) you want it to be transparent - done by the "database", like an index. So you know it is done. And done correctly.

Trying to maintain your own rollups and dependent data like that is where many a developer "goes wrong" as they many times try to do this via triggers and what not and get it totally wrong.

besides, we all know you only write code when forced to - so, if you don't have to do it yourself, you wouldn't! that is why you buy packaged bits of technology - so you are not reinventing the wheel!!

why they are NOT doing this out there?

Lajos, July 18, 2007 - 2:00 am UTC

Hi Tom,
I'm in the field of consulting on technical issues in the financial industry. Account balances could be a perfect example for your case: a "balance of an account on a given day" is not a separate entity, but simply the sum of credits minus debits for the account which happened not later than the given day. So it could be a candidate for your materialized view idea (transparent, done by the database etc.).
Yet I've never seen a CBS (Core Banking System) which does that, even those on the Oracle platform. They always have some form of end-of-day processing (EOD) and when I ask "what does your EOD processing do?", the answer always contains "we build the balances during EOD". And of course there are different (and sometimes awkward-looking) solutions to the problem to answer queries against balances anytime, even during EOD. Do you see something in this area that explicitly prevents implementing your idea?
Tom Kyte
July 18, 2007 - 10:26 am UTC

other than the fact that many developers - who spend copious amounts of time learning the fascinating intricacies of their programming languages - do not know what the database can do....


database design

Sam, July 23, 2007 - 6:38 pm UTC


As a followup to the July 17 response, I am a little confused about how to create this MV.

Right now I have several functions that compute total inventory for a stock number for a given organization (several warehouses), for a given warehouse, or for a given storage location.

Each of those functions is a PL/SQL program.

Do I keep everything the same and just create a materialized view that tracks the various numbers? What would the materialized view look like - or would you create 3 materialized views?

stock_no, org_total, warehouse, warehouse_total, bin, bin_total

2. Can you refer me to a schema that implements your solution for a banking, credit card, or inventory application, to see how they do it?


for SAM

Lajos, September 17, 2007 - 9:50 am UTC

maybe something like this:
create table accounts(accno number not null primary key);

create table bookings(accno      number not null references accounts(accno)
                     ,value_date date not null
                     ,debit      number not null
                     ,credit     number not null);
insert into accounts values(1);
insert into accounts values(2);
insert into accounts values(3);

insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070101','yyyymmdd'),100,0);
insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070101','yyyymmdd'),0,20);
insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070102','yyyymmdd'),0,10);
insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070102','yyyymmdd'),50,0);
insert into bookings(accno,value_date,debit,credit) values(2,to_date('20070102','yyyymmdd'),0,200);
insert into bookings(accno,value_date,debit,credit) values(2,to_date('20070102','yyyymmdd'),50,0);

select distinct accno,value_date,sum(debit-credit) over(partition by accno order by value_date) balance
from bookings
order by accno,value_date;
---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         2 02-JAN-07       -150

create materialized view account_balances
build deferred
refresh on demand
as select distinct accno,value_date,sum(debit-credit) over(partition by accno order by value_date) balance
from bookings;

Materialized view created.

select * from account_balances
order by accno,value_date;

no rows selected

exec dbms_mview.refresh('ACCOUNT_BALANCES')

PL/SQL procedure successfully completed.

select * from account_balances
order by accno,value_date;

---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         2 02-JAN-07       -150

insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070103','yyyymmdd'),30,0);
1 row created.
Commit complete.

select * from account_balances
order by accno,value_date;

---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         2 02-JAN-07       -150

exec dbms_mview.refresh('ACCOUNT_BALANCES')

PL/SQL procedure successfully completed.

select * from account_balances
order by accno,value_date;

---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         1 03-JAN-07        150
         2 02-JAN-07       -150

This is far from real-world usage of course. I'm still wondering about putting something like this into real use, mainly because I personally fail to see how to answer questions like "what is the balance for this account as of this date?". The mview cannot be refresh-on-commit, because we don't want to serialize inserts into bookings.


Sam, September 19, 2007 - 11:08 pm UTC


I want to take your opinion on designing this in the DB to see which is the best alternative.

I have 50 customers who log in to a web system and order magazines once per month. The magazines come in four media (paper, cassette, DVD, CD), and a customer can select a combination of these media on the same order. I want to impose new annual quota rules on ordering: each customer will be assigned an annual quota (allowance), which will be displayed/calculated on a monthly basis (annual/12). There will also be carryovers from month to month.

It seems there are two ways to design the quota reference table.

With the first, I will have one row per customer:

customer_id (100)
Year (2007)
paper_quota (1000)
cassette_quota (2000)
dvd_quota (3000)
cd_quota (4000)
Effective_Date 01-JAN-2007

An alternative design is to have 4 rows, one per media type:

customer_id (100)
Year (2007)
Media Paper
Quota (1000)
Effective Date (01-JAN-2007)

customer_id (100)
Year (2007)
Media DVD
Quota (1000)
Effective Date (01-JAN-2007)

One screen requirement is to display 50 rows for customer editing and display. I am not sure whether that affects your decision.

Another requirement is that the quota for some customers may be adjusted throughout the year. This will result in a new row with a new effective date; I need to freeze the old row, as it is used in a formula.

Which design do you think is better, and why? I think both are normalized to 1NF, 2NF, and 3NF.

If a customer made a typo in a number, would you update the erroneous value, or would you insert a new row and mark the first one "Terminated"?
Tom Kyte
September 24, 2007 - 7:01 am UTC

why not just have an orders table that you can query and figure out how much they've ordered year to date?

you already have everything (you must if you have an existing system whereby they can order this stuff)


sam, September 20, 2007 - 7:17 am UTC


I forgot to mention that for the above tables I will create a one-column sequence (Quota_No) for the primary key, instead of using 3- or 4-column natural keys.

The natural key for table 1 would be (customer_id, year, effective_date); for table 2 it would be (customer_id, year, media, effective_date). A sequence for the PK would be much easier to use in this case.


A reader, September 24, 2007 - 8:41 am UTC


I already have an orders table for capturing orders.

However, my question was about enforcing quotas on monthly orders. I was asking which table design you think is better for defining the quota numbers for each media type, and why.
Tom Kyte
September 26, 2007 - 1:38 pm UTC

I'm not sure why you want another table to duplicate that which you ALREADY HAVE.

Your specification is not detailed enough to suggest anything.


A reader, September 26, 2007 - 6:11 pm UTC


I have an orders transaction table. This captures the orders entered by customers.

Now, I want to put limits on orders, sort of like the credit limits on charges you make to a credit card. This is a reference lookup table. So when a customer goes to place an order, I tell him he has 100 DVDs available to order, 200 cassettes, and so on. When he saves the order, I validate the selected quantities against the monthly available quota before I accept or reject the order.

Is it clear now? I am trying to see how you would design this lookup table. Would you have one row per customer (for all media) or 4 rows (one per media type) per customer, as explained above?
Tom Kyte
September 26, 2007 - 10:14 pm UTC

again, I say


why do you need yet another table to see if their new order would exceed their quota?

modeling perils ...

Gabe, September 26, 2007 - 9:02 pm UTC


Your model might consist of tables like CUSTOMER, ORDER, ORDER_ITEM, PRODUCT, CUSTOMER_PRODUCT_QUOTA.

A CUSTOMER can place many ORDERs, but an ORDER belongs to one and only one CUSTOMER. An ORDER may have many ORDER_ITEMs, but has to have at least one. Any ORDER_ITEM belongs to one and only one ORDERs and is for one and only one PRODUCT. The ORDER_ITEM has attributes like Quantity, Price, etc. A CUSTOMER has a QUOTA allowance for a PRODUCT over a period of time ... this would be the CUSTOMER_PRODUCT_QUOTA table:

{customer_id, product_id, start_date, thru_date, quota_quantity, ...}

You would have to implement a mechanism for ensuring there are no overlapping time periods for the same CUSTOMER and PRODUCT.

You would also have to decide how to implement Quota=0 and Quota=unlimited. For instance, if there isn't a row in CUSTOMER_PRODUCT_QUOTA for a given CUSTOMER and PRODUCT, does that mean ...
1. The CUSTOMER cannot order the PRODUCT?
2. The CUSTOMER can order an unlimited number of PRODUCTs?

An ORDER typically has many DATEs associated with it: Entry/Received Date, Accepted Date, Shipping Date, Closed Date, etc. You'd have to decide which of these Dates should be used to compare against the period of time within a CUSTOMER_PRODUCT_QUOTA.

When placing/modifying an ORDER for a CUSTOMER the system should check that, for any PRODUCT being ordered, the SUM of all Quantities on ORDER_ITEMs for this PRODUCT belonging to active/closed [but not cancelled] ORDERs of this CUSTOMER, having the "whatever date" within the time period of the relevant/current CUSTOMER_PRODUCT_QUOTA, does not exceed the quota_quantity of the said CUSTOMER_PRODUCT_QUOTA row.

Now, these are just some hints ... modeling without clear, in-depth knowledge of the particular problem domain is a perilous undertaking. For this reason, it is unlikely you'd get modeling solutions here ... the other reason could be that, what one presents is already indicative of ambiguity ... enough ambiguity that the prospect of entertaining a lengthy exchange to establish a common language/terminology, clarify what is said and discover what hasn't been said becomes a quick show-stopping decision.

Anyway, if you want a quick opinion, on your terms, no questions asked ... 4 rows (one per media) per customer.


A reader, September 26, 2007 - 10:27 pm UTC


It seems it is still not clear to you. I am not creating another table; I am asking your opinion on which of the two possible designs I listed above is the best way to design the QUOTAS table, and why.

Tom Kyte
September 27, 2007 - 6:48 am UTC

ahh, ok - the quota table should be something like


if no row exists, they have no quota; if a row does exist, they have quota. the verification logic would look something like:
add_order( .... );
select * from customer where custid = ? for update; <<=== serialize at custid

select ordered, quota_amount, media_type
  from (select custid, media_type, count(*) ordered
          from orders
         where custid = ?
         group by custid, media_type) a,
       quotas b
 where a.custid = b.custid
   and a.media_type = b.media_type
   and a.ordered > b.quota_amount;

if you get rows back, they are in violation of the quota.

Argh ...

Gabe, September 27, 2007 - 8:13 am UTC


On second thought, there is nothing in what you described to suggest a need for a separate QUOTA table. If the system is all about these 4 products then an ORDER table with 4 Quantities columns and a Customer table having 4 quota columns should keep everything nice and simple.

Sorry about the detour.


A reader, September 27, 2007 - 10:05 pm UTC


yes, but why did you pick that one?



customer, media_1, quota_amount, media_2, quota_amount, media_#, quota_amount, effective_date

Are not both valid normalized designs, or is the first design more flexible and easier to code in case a new media type comes out?

2. The above is the same as when you have 3 phones for a customer (cell, home, office). Do you add those to the same customer table (cell_phone, home_phone, office_phone), or do you create a child table, create 3 rows, and link them to the master customer table?

3. When you design transaction systems, do you always normalize to 3NF (no transitive dependencies) or not? 3NF does not allow storing derived/calculated values in the DB, such as an invoice total, because that depends on price and quantity instead of the primary key.

Tom Kyte
September 28, 2007 - 5:26 pm UTC

1) because I've been doing relational databases long enough to know that you will have 5 media types tomorrow.

2) and in 1985, we didn't have "cell" because cell phones were just coming out. So, in 1985 you would have had "home, office". In 2007, you have home, office, mobile, pager, whatever. And it'll change.

I would have a table of phone numbers, yes.

3) typically not, but not always "not"
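The table of phone numbers mentioned in (2) might be sketched like this (all names here are illustrative, not from the thread):

```sql
create table phone_types
( phone_type varchar2(10) primary key   -- 'HOME', 'OFFICE', 'MOBILE', ...
);

create table customer_phones
( customer_id  number       not null references customers,
  phone_type   varchar2(10) not null references phone_types,
  phone_number varchar2(20) not null,
  primary key ( customer_id, phone_type )
);
```

When the next kind of phone comes along, supporting it is an INSERT into phone_types rather than an ALTER TABLE plus code changes.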

Data model

Mike, September 28, 2007 - 8:31 am UTC

To those who would suggest 'an ORDER table with 4 Quantities columns and a Customer table having 4 quota columns': Don't do it.

This is so bad that if I heard a data-modeler candidate suggest a design like this during an interview, that would be the end of that candidate's chances with me.

'Are not both valid normalized designs...?' No, it violates first-normal form (no repeating groups). There are situations when it is OK to deviate from first-normal form, but this is certainly not one of them.

Acceptable scenarios for non-1NF designs generally have specific, stable meanings for members of the repeating group. The example of home phone, business phone, cell phone is an example of this (although 'stability' may be questionable there). Or if you want to store balances for each month of a year in a single row, you can have a column for January balance, February balance, etc., as we can be confident that we will continue to have 12 months per year for the foreseeable future.

What if the business adds a product? Or drops one? Or both? These are normal things that businesses do, but your design will not allow them to do that without you getting in the way.

And these are just the tip of the iceberg, as far as problems likely to arise from a non-1NF design.

You should really have a positive reason for your denormalization - not just a lack of reasons not to.

OK to disagree ...

Gabe, September 28, 2007 - 12:52 pm UTC


This is so bad that if I heard a data-modeler candidate suggest a design like this during an interview, that would be the end of that candidate's chances with me.

That's fine ... tell me where you work and I won't bother applying ;)

No, it violates first-normal form (no repeating groups)

I think you should review what repeating groups actually means.
Look, "what if this" and "what if that" quickly take one into universal models. They are valid questions to ask the business as part of the modeling exercise ... in a forum, you hardly have access to the business, and going back and forth is perilous. (Note to self: stick to the decision to keep away in the future). So, where do you draw the line? Well, you can only go with what was actually said, and I don't see anything here to justify Quota as a separate entity rather than 4 distinct attributes of Customer. Not every "web system" with "50 customers" selling exactly 4 things needs an ERP.

There are systems where the wheels of a vehicle are instances of a Parts entity ... and there are systems where the wheels of a vehicle are attributes of the Car entity: right front, left front, right back, left back.

But it is OK if we disagree.

db design

A reader, September 28, 2007 - 6:32 pm UTC


<because I've been doing relational databases long enough to know that you will have 5 media types tomorrow.>

but you can also add columns for extra media types in the same quota table:


I assume that when you have a row per media type, instead of adding new columns, you do not have to change the database structure, and it is much easier to query. Correct?

I think both designs are valid, but one is much more scalable than the other.
Tom Kyte
October 03, 2007 - 1:21 pm UTC

putting data in a record like that is almost never "valid". Adding columns because you added a new media type is very "newbie" like. Not the way to do it.

A customer has one or more media types associated with them. Media types are an entity to themselves.

db design

A reader, October 09, 2007 - 9:52 pm UTC


1. If there is a requirement to store a dynamic variable in a table that an application uses to display the number of rows on a screen, do you usually create a one-column table for that like this, or do you create a different structure?

Number_of_Rows number(3)

2. Let us say I have a formula that computes something

MV(A) = MAX(SUM(AB[x]/TM - TC, TO - A);

I can create a function to implement this. However, would I be able to store this formula in a table and then change it in the table so the function changes accordingly?

If i code the function to implement the formula I would not be able to do that.


A reader, November 10, 2007 - 9:08 pm UTC


I have a 4 small questions on db design.

1. Is it really possible to store a formula in a table and have a function implement this formula, so that when the formula changes the function will change accordingly?

I can't grasp how it can be done, but I heard some talking about it. The function will have to be coded to implement the formula.

2. For a quota table like

would you add a sequence (quota_id) for the PK and add a unique constraint on the 3 columns, or have the 3 columns (customer_id, year, effective_date) as the PK?

Which would be easier for querying/coding?

3. Is it OK to have one validation table with one column for PK like




4. If I want to store variables for application code in a table, like "number of rows on a page", so we can change them later without changing any code, would you store one variable per table, or can you have one big table with all the variable values in it?

Tom Kyte
November 11, 2007 - 7:39 pm UTC

1) what is a "formula" ?

2) insufficient data to answer anything. There is no cut and dried answer to the "surrogate key" question. It depends on the referring objects and whether this quota row is "never changing" as far as the key goes.

I can say, if you have effective date, you do not need YEAR do you, you have the dates already.

3) sure

4) one index organized table would do it.


A reader, November 11, 2007 - 11:34 pm UTC


1. Excellent point. One customer can have multiple quota rows per year with different effective dates. However, he can update effective_date, so this rules out using the natural key (customer_id, quota_year, effective_date). I have to use an Oracle-generated sequence. Correct?

I would like to store "quota_year number(4)" even though it can be derived from effective_date, because a lot of screens query by that column (quota_year) and the user selects it too on the data entry screen. Is that an issue?

2. A formula is a mathematical equation. Something like this

MV(A) = MAX(SUM(AB[x]/TM - TC, TO - A);

and the variables are calculated from other transaction tables (i.e. sum of monthly orders prior to current month x, sum of monthly shipments prior to current month).

I am going to create a function to implement the formula. But someone was saying you can store the formula in the database so the function can implement the change automatically.

I can't see how this can be done by storing the formula in the database. I would still have to go to the function and update it based on the changed formula. I doubt there is an automatic way where you change a formula in the database and the function changes automatically. What do you think?

Tom Kyte
November 16, 2007 - 12:33 pm UTC

you do not "have to", primary keys should be immutable but if you wanted to use an index organized table to have all of the data for a given customer physically colocated together (clustered together) for retrieval purposes you would use the natural key.

where trunc(dt,'y') = to_date('0101'||:x,'mmddyyyy')
where to_char(dt,'yyyy') = :x

(trunc is more efficient) is what you want to do - you DO NOT want to store the year, no.

You can index trunc(dt,'y')/to_char(dt,'yyyy') as well if you like.

At most you would

create view V as select t.*, trunc(dt,'y') yyyy from t;

(or to_char)

2) Well, gee - a formula is a mathematical equation :) laughing out loud at that - I guess I got exactly what I asked for.

Too bad there is no "mathematical equation" datatype really.

implement the function, the function is stored in the database, just like data - function changes, so what, change the function.

You would have to do a ton of dynamic SQL if you store a string in the database and call it a formula.


A reader, November 17, 2007 - 4:20 pm UTC


1. What exactly do you mean by "index organized table"? Is this a validation/lookup table, or is it a table created by creating an index on a data table?

I assume that even if I use a sequence+unique index I can get the same result, right?

2. what do you mean by this?
<You would have to do a ton of dynamic SQL if you store a string in the database and call it a formula.>

You mean if I want to implement a dynamic function where, if the formula changes, the code automatically changes?

David, November 19, 2007 - 8:38 pm UTC

@A reader

In answer to 'what is an index organised table?'

I suggest you read one of Tom's books or the Oracle concepts guide
( ) for more
info, they all explain it better than I can! ;-)

In a 'normal' (aka heap) table the data is generally stored fairly randomly, it's just a heap of data, hence the name. We might index the data to allow us to lookup specific values without having to scan the entire table.

If we only use the table to perform lookups then we would include the result of the lookup as an extra column in the index to avoid the subsequent table access by ROWID after getting the ROWID from the index.

For example, a table such as


might have an index created on both key and value otherwise we'd have to get the rowid for the key we asked for and then go lookup the value in that row in the table.

This extra column also has the advantage that the keys and values are sorted together so if we're performing a lookup on a range then the values can be rapidly returned from the index without potentially jumping around the disorganised underlying heap table to get the rows (if the index blocks indicate that the rows in the table for consecutive keys are in different table blocks).

The downside is that if we add the extra column in the index then we've effectively stored all the data twice, once in the table and once in the index. For large or many lookup tables this overhead might become significant.

If we declare the table as an index organised table then we dispense with the heap structure for the underlying table. The index becomes the table and we perform all lookups directly from the index.

For the above example..

create table lookup (
key number not null primary key,
value varchar2(100) not null
) organization index;
Note you can still add secondary indexes on the IOT in addition to the primary key.
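To make the idea concrete outside Oracle: SQLite's WITHOUT ROWID tables work on the same principle as an index-organized table (the rows live in the primary-key B-tree itself, with no separate heap), so the concept can be sketched from Python with the standard sqlite3 module. The table name and data here are invented for the example.

```python
import sqlite3

# SQLite's WITHOUT ROWID tables store rows in the primary-key B-tree
# itself - the same principle as an Oracle index-organized table:
# there is no separate heap, lookups go straight to the "index".
con = sqlite3.connect(":memory:")
con.execute("""
    create table lookup (
        key   integer not null primary key,
        value text    not null
    ) without rowid
""")
con.executemany("insert into lookup values (?, ?)",
                [(1, "Active"), (2, "Terminated")])

# a lookup is satisfied entirely from the primary-key structure
rows = con.execute("select value from lookup order by key").fetchall()
# rows -> [('Active',), ('Terminated',)]
```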

For Q2, just think how hard it would be to perform some SQL based on some arbitrary function that is supplied as a string. Your SQL would need to parse that string, understand any operators and variable placeholders in the string. It would have to understand and obey operator precedence and parentheses to get the correct ordering of operations. It would have to do this dynamically, hence a lot of dynamic SQL code...

Much easier, safer and reliable to expose as a function and change the implementation of the function itself to alter the behaviour. You could implement multiple versions of the function and employ overloading to access the different versions.

db design

A reader, November 21, 2007 - 11:30 pm UTC


Is it OK to add a small validation table


where A = Active and T = Terminated

for all data tables that need a record status.

Does this add any overhead on the join? Someone mentioned that, and I doubt it.

2. Would adding only a column "active_or_terminated" to the data table with a check constraint, without having a validation table, be a better design with lower overhead?

Tom Kyte
November 26, 2007 - 10:30 am UTC

is it ok?
sure, yes, definitely. primarily because tomorrow the status will be

a = active
t = terminated
m = maternity leave
s = sick leave
v = vacation

and so on....

there is no need for a "join" if you don't want - you put the foreign key on so we can validate the data upon insert and for retrieval, just use a view:

create view v
as
select t.*, decode( status, 'A', 'Active', 'T', 'Terminated', 'UNKNOWN' ) status_desc
  from t;
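Here is a hedged sketch of that decode-in-a-view approach, run in SQLite from Python. SQLite has no DECODE, so a CASE expression (its standard-SQL equivalent) plays that role; the table and status rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (id integer primary key, status text not null)")
con.executemany("insert into t values (?, ?)", [(1, "A"), (2, "T"), (3, "X")])

# CASE plays the role of Oracle's DECODE: no join to a lookup table,
# data integrity would still come from a foreign key on t.status
con.execute("""
    create view t_v as
    select t.*,
           case status when 'A' then 'Active'
                       when 'T' then 'Terminated'
                       else 'UNKNOWN' end as status_desc
      from t
""")
rows = con.execute("select id, status_desc from t_v order by id").fetchall()
# rows -> [(1, 'Active'), (2, 'Terminated'), (3, 'UNKNOWN')]
```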


A reader, November 26, 2007 - 4:39 pm UTC


The "Active"/"Terminated" status refers to the record and not the person. When a new quota record is added, you terminate the old one and add a new one with "Active" status.

I just do not believe that a join will increase overhead. Are we talking 1/1000000 of a second?
Tom Kyte
November 27, 2007 - 3:08 pm UTC

somewhere between no overhead and a lot.

as in "it depends"

but I already gave you a valid approach that

a) enforces data integrity
b) removes the need for a join if it is a problem


A reader, November 28, 2007 - 10:00 pm UTC


thanks, but i am curious on that "depends".

Does it depend on the number of rows in the transactional table? Let us say you have 100,000 indexed records and you join that to a lookup table which has 2 records.

I guess we can benchmark it/time it to see if it does really make any difference or not unless there are other factors?
Tom Kyte
November 29, 2007 - 8:32 am UTC

if you have 100,000 records and join to 2 we would expect a plan like:


the small table (2 row table) would be read and turned into a hash table.

then, for each row in the big table, we would hash the key - and probe the hash table.

eg: pseudo code:

for every record in small_table
    x := hash(join_key);
    place into hash table based on X
end loop;

for every record in big_table
    y := hash(join_key);
    access hash table(y) to find match (or not)
    if match found 
        output record
    end if;
end loop

as opposed to (with a DECODE or case)

for each record in large table
    evaluate simple decode/case
    output record
end loop;

will that matter? likely not in most cases. But the hash computation for each record and the probe into the hash table would likely be a little more expensive than evaluation of a case/decode.
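The build/probe pseudo code above can be turned into a small runnable sketch; this is only an illustration of the two phases, with made-up table contents.

```python
# build/probe hash join: hash the small table once, then scan the
# big table once, probing the hash table for each row
def hash_join(small_table, big_table):
    # build phase: one pass over the small (e.g. 2 row) table
    buckets = {}
    for key, small_payload in small_table:
        buckets.setdefault(key, []).append(small_payload)

    # probe phase: one pass over the big table
    result = []
    for key, big_payload in big_table:
        for small_payload in buckets.get(key, []):
            result.append((key, big_payload, small_payload))
    return result

statuses = [("A", "Active"), ("T", "Terminated")]   # the 2 row lookup table
big = [("A", 1), ("T", 2), ("A", 3)]                # (join_key, payload) rows
joined = hash_join(statuses, big)
# joined -> [('A', 1, 'Active'), ('T', 2, 'Terminated'), ('A', 3, 'Active')]
```

Each probe is a hash computation plus a dictionary lookup per big-table row, which is the small extra cost Tom contrasts with evaluating a DECODE/CASE inline.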

db design

A reader, December 07, 2007 - 12:11 am UTC


I have two questions on db design:

1. If you are creating a table for authentication, do you add the session id and expiration time to the users table, or to a separate table, where you terminate old sessions every time you create a new session?


2. If you have a book that may or may not exist on 4 servers, and it goes through two tests, would you track the status of that book on each server by adding one column per server, like

serverD_exist_YN

or would you do it in the data itself?




Tom Kyte
December 10, 2007 - 10:32 am UTC

1) typically users can have more than one session. hence, it would not make sense to have it be attributes of the users table. But, maybe that doesn't apply for you - so not possible to say really.

I don't use tables for authentication - I use the database or a directory service (single sign on)

2) huh?

but in general, if you are tempted to have columns named after servers (things, dates, observations, whatever) just STOP.

today you have 4 servers, tomorrow you have 5. Next year you have 10. Two years from now you have 1.

Using columns for that is a disaster.


A reader, December 10, 2007 - 2:07 pm UTC


1. Why do you think adding a column for a derived status is a disaster? There are many schemas that have "Phone1", "Email1", "Total Quantity of Item".

If there are more servers added (which I doubt but may happen), you add another column?

What do you recommend here? Creating a 3-column table and using data instead, or creating one "status code" and building a huge table for all combinations of possible statuses?



2. How can you create database accounts if you have 100 users and are using mod_plsql? Do you create 100 DADs, or do you let them key in userid/password in the DAD?

Tom Kyte
December 10, 2007 - 2:21 pm UTC

1) i already stated why

because there will be a phone2, phone3, phoneN someday.

The correct way to approach this is fundamentally simple. You have a one to many relationship.

Answer this question: find me the person with the phone number (123)-456-7890

I don't care if it is phone1 or phone100 - find it.

find it fast
find it efficiently

select * from person where person_id in (select person_id from phones where phone=:x);

I'm done - person_id is the primary key, is nicely indexed.

Now, how do you do it?

That is just one reason - the correct way to model and implement a 1:M relationship is via a parent/child table.

2) They can supply the userid and password, the DAD does not need to store one.

One DAD.
They all use it.


A reader, December 10, 2007 - 5:12 pm UTC


Actually I do have a table, "Book_server", to store the 1:M relationship between a book and a server. It stores several details about when a book moves in or out of a server.

But what I want to decide on is how to store the last (derived) status of that book on each server. For example, I may have 100 records that move a book in and then out. I want to store the final status for the existence of that book on that server (in or out).

I think it is similar to inventory calculation where you have hundreds of receipts and shipments and you store the "final stock qty" in the master stock table.

2. I assume that when you use DAD authentication like you say, I have to create a static login page first and then pass that to the DAD, because I can't have the login page generated from the database unless I use some kind of dummy account.

I also think that if you want access control, meaning showing users different links on a main menu web page, you still have to create a table that assigns links or roles to db users? Right?
Tom Kyte
December 10, 2007 - 6:16 pm UTC

... But what I want to decide on, is how to store the last status (derived) of that
book on each server. ...

select * 
  from (select * from book_transactions where book_id = :x order by book_id, trans_dt desc )
where rownum = 1;

you don't store it "special", you just query it out.

or you have a SINGLE attribute - "what_server_am_i_on" that you set to the server name - not an attribute for each server it might be on.

2) no, if you choose to not store the user/pass with the DAD, the browser will prompt you. You create no page.

grant - you just "grant", then you have roles and privileges and they are stored in a table for you, the dictionary.


A reader, December 10, 2007 - 5:27 pm UTC


By the way, I liked your example about the query on the phone number, but I do not see how it applies here. Each server is a different object.

How can you rephrase the question to ask "Find me the book that exists on server 2".

I would do

select book_no from books where server2_in='Y';

AM i wrong?
Tom Kyte
December 10, 2007 - 6:20 pm UTC

... Each server is a different object. ...

that makes no sense. I don't know what you mean.

if you have this transaction table....

select *
from (select a.*, row_number() over (partition by book_id order by trans_date desc) rn from book_trans a )
where rn = 1 and server_in = 'server2';

keep the most current transaction record for each book, and find the books such that the most current record says "I am on server2"

you seemed to be saying before "I have a book, I would like to know what server to find it on", that is easily answered as well.
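Tom's row_number() query runs essentially unchanged in SQLite (3.25+, which ships with modern Python builds), so it can be sketched and checked from Python; the sample transactions below are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table book_trans (book_id int, server_in text, trans_date text);
    insert into book_trans values
        (1, 'server1', '2007-12-01'),
        (1, 'server2', '2007-12-05'),   -- book 1's most recent move
        (2, 'server2', '2007-12-02'),
        (2, 'server1', '2007-12-06');   -- book 2's most recent move
""")

# keep only the most current transaction per book, then filter on it
rows = con.execute("""
    select book_id from (
        select a.*,
               row_number() over (partition by book_id
                                  order by trans_date desc) rn
          from book_trans a)
     where rn = 1 and server_in = 'server2'
""").fetchall()
# rows -> [(1,)]  (only book 1's latest record says server2)
```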


A reader, December 10, 2007 - 7:10 pm UTC


The queries that will mostly be done are:

What books do I have on server 1?
What books do I have on server 2?
What books do I have on server 3?
What books are on Server 1 and not on Server 2?


1. Having a single attribute like you suggest would be too complex, because you would have to define all the combinations of book existence on each server and create a unique code for each. Right?

2. Are you suggesting to query it out every time from the transaction table and NOT storing the final Status in a table?

select *
from (select * from book_transactions where book_id = :x order by book_id, trans_dt desc )
where rownum = 1;

If yes, would you create an MV, or one view per server status, for ad hoc users to do a simple query to know the status? It would be nice if I had one view that computes the status of each server.

Tom Kyte
December 10, 2007 - 8:11 pm UTC

If a book can have more than one server - it should be such that there is a 1:M table that expresses "servers I am on"

A "transaction" record can be processed so as to remove or add a record to this table, but it would still be a 1:M relation

book 1:M book_transactions (your existing table)
+- 1:M book_to_servers_I_am_on

Now you can:

select book_id from book_to_servers_i_am_on where server = '1'; - to find those on server 1

select book_id
from book_to_servers_i_am_on
where server in ( '1', '2' )
group by book_id
having count( case when server = '1' then 1 end ) = 1
and count( case when server = '2' then 1 end ) = 0

to find those on server 1 not on server 2.
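These conditional-count aggregates are standard SQL and work verbatim outside Oracle too; a quick SQLite check from Python, with made-up book ids:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table book_to_servers_i_am_on (book_id int, server text);
    insert into book_to_servers_i_am_on values
        (1, '1'), (1, '2'),   -- book 1: on both servers
        (2, '1'),             -- book 2: on server 1 only
        (3, '2');             -- book 3: on server 2 only
""")

# count(case ...) counts only the non-null branches, so the HAVING
# clause reads: "exactly one server-1 row and zero server-2 rows"
rows = con.execute("""
    select book_id
      from book_to_servers_i_am_on
     where server in ('1', '2')
     group by book_id
    having count(case when server = '1' then 1 end) = 1
       and count(case when server = '2' then 1 end) = 0
""").fetchall()
# rows -> [(2,)]  (book 2 is on server 1 but not server 2)
```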


A reader, December 10, 2007 - 10:30 pm UTC


Thank you for the excellent reply.

1. Just to confirm, is "books_to_servers_i_am_on" a "Book_Server_Status" table that you keep inserting records into (using a trigger) after a transaction is stored into "Book_transactions"?

2. Would this not be redundant, since you can compute the same result by creating a view against "Book_transactions" that looks for the last created date for a server/book transaction?

3. An ad hoc user can never write a SQL query like the one you wrote to find a book on server 1 and not on server 2. They also might want to check different things, like how many books are not on server 1, not on server 2, and on server 3.

How do you make those queries easy for them with a simple command (i.e. as if you had one column per status (Y/N))?

4. If a book goes through several cycles of the same two tests on server1 only, would you use the same table "books_To_Servers_iam_on" to track the final result of each test (P/F), or would you create another similar status table, or would you just use a view against the transactional table where all the incoming messages are stored?

Tom Kyte
December 11, 2007 - 7:43 am UTC

1) NO TRIGGERS, use well formed TRANSACTIONS.

2) you have "transactions", you have other "data", some attributes of your data includes the servers they exist on. your transactions table is like an audit trail, the REAL data is in the table I'm suggesting.

3) you write user interfaces, you provide API's. Most users cannot write a query to save their lives, that is what we do for a living.

4) you'll need to flesh this out - I don't know what "several cycles" is, means or implies.

db design

A reader, December 11, 2007 - 10:57 am UTC


1. Why do you prefer to write a transaction instead of a trigger? I think you mean do this:

--For a move transaction
insert into Book_transactions (trans_seq_no, book_no, server, status, col5, col6, col7, date_created)
values (1, 'ISBN1000', 'Server1', 'IN', val5, val6, val7, sysdate);

--For the final status transaction
insert into Books_to_servers_i_am_on (book_no, server) values ('ISBN1000', 'Server1');

instead of defining an (after insert) trigger on Book_transactions.

2. Are you saying that there is no issue in saving the same data (book, server) in two tables?

Table 1 is an audit trail of all the transactions and Table 2 will have the final server/book status record?

or are you saying to not save that piece in Table 1 but only in Table 2.

3. What I meant by several cycles is that if a book goes through "Test1" and it fails, it can go through it again until it passes. So you can have tests like this:

Book1, test1, Fail, 1/1/2007
Book1, test1, Pass. 1/2/2007
Book1, test2, Fail, 1/3/2007
Book1, test2, Fail, 1/4/2007
Book1, test2, Pass, 1/5/2007

The status message for this data is also stored in the "Books_servers" table. Do you think it is appropriate to save the last final status of the test in the same "books_to_servers_i_am_on" table, or to create a similar test status table like "book_tests_results" that will have the last activity for each test?

or would you create a view that derives status from last datetime stamp record for each test.
Tom Kyte
December 11, 2007 - 11:25 am UTC

... Why do you prefer to write a transaction instead of a trigger ...

because triggers are evil black magic that cause no end of issues over time as people just aren't expecting "magic side effects that just happen by accident". Every single time I see a trigger implementation - there is a war story involved that goes with it.

If you can do something:

a) in a straight forward, easy to understand, easy to document, easy to maintain way


b) by magic, stuff just happens, no one sees it happening, automagical

I choose (A) every single time.

2) we are not saving the same data it would appear. Your "transaction" table is an AUDIT TRAIL as far as I'm concerned, a historical record. The transaction in my opinion is:

a) update or insert a record into the "books to server i am on"
b) create an audit trail record stating you did this

the audit trail is historical, the table in (a) is your data.


A reader, December 12, 2007 - 11:08 pm UTC


You convinced me with your excellent db design skills. just a few small questions

1. If I have a screen used by "Testing" where they can mark a failed book to be moved from server 1 to server 2. Another process/screen allows them to flag a passed book to be moved from server 2 to server 3. A 3rd process/screen allows them to flag a book to move from server 3 to 4.

I have created 3 tables to track each transaction:




would you create individual tables or ONE big table with another column that tells you the "Type of MOVE"?


2. Do you have any opinion on #3 above, on whether you would also store book test results in the same table "which_server_I_am_on", or create a separate table, or create a view that derives the values?

3. How do you usually decide between creating a view that looks up the last activity on a book in the transactions table (which answers "which servers am I on") versus creating the small table "books_to_server_I_am_on" you advised?

Tom Kyte
December 13, 2007 - 9:42 am UTC

1) why three tables. why three tables named after servers? what happens when you add the next server - which you say probably won't happen, but could. what happens when server 2 disappears and you need to move from s1 to s3.

think about this please - it is the same sort of situation as using columns in a table - instead of a child table.

You have different transaction types, ONE transaction table is more than sufficient, not only more than sufficient - the only way to do it really.

3) your transaction table does not support the query in question - if it did, we would just have used that. You cannot just pull the last transaction as I had originally presumed. Your transaction table is an audit trail really and in this case would not have been useful.

db design

A reader, December 13, 2007 - 5:02 pm UTC


1. Actually the tables are named after processes (not servers). We have 3 processes that move a book from one location to another; each process moves the book between a different pair of locations.

I think what you are saying is that the more locations there are, the more tables I will need, which becomes unmanageable at some point in the future.

However, is not db design similar to organizing information on your desk? If you are in charge of moving stuff around, would you not create 3 different file folders and store each type of request in one?

I like your idea though. Would you add one column "Move_Type" or two columns that define name of server "from_server" and "to_server" to implement this?

2. Actually the transaction table does support the query in question since it is derived from it. Here is an example of the data that goes into each table.

Book_transactions (trans_no (PK),
book_status_code (in/out))

Books_to_servers_i_am_on (book_no (PK),
server_id (PK),
trans_no (FK, optional))

For example, I can receive 100 messages into "Book_transactions" on a book/server, but the last one will show whether it moved in or out of the server. That last status will be inserted into "books_to_server_i_am_on". So I will have one record in "books_to_server_i_am_on".

So doing a query on "Book_transactions" like you said before should also give me the final status, which matches the record in "books_to_server_i_am_on".

select *
from (select a.*, row_number() over (partition by book_id order by trans_date desc) rn from book_transactions a )
where rn = 1 and server_in = 'server2';

Thank you,

Tom Kyte
December 14, 2007 - 12:53 pm UTC

do not confuse your desktop with a database.

No, it is not similar. think of the files as rows. think of your desktop as your table - if you must.

db design

A reader, December 14, 2007 - 4:17 pm UTC


Do you agree with #2 above, that the status can also be derived from the transactional table too?
Tom Kyte
December 14, 2007 - 4:24 pm UTC

as long as what you say is true - that the "last" record is the one that you want.

db design

A reader, December 14, 2007 - 4:45 pm UTC

I think you are saying I can still have a status table instead of a view that derives it from the transactional table. The view might take time if the transactional table has thousands of records.

I think this is more like tracking "Stock Quantity" in a retail or inventory system. An item gets received and shipped every day. You compute the "Stock Quantity" each time and save it in a table so users can see it.
Tom Kyte
December 17, 2007 - 10:46 am UTC

You see, I would not tend to store stock quantity that way - because typically you are looking for the quantity of a single ITEM - and computing that is not expensive.

Same for individual books - I would not store this redundantly - I would have used the query with rownum way above:

select *
from (select * from book_transactions where book_id = :x order by book_id, trans_dt desc )
where rownum = 1;

extremely cheap to execute.

It was only when you said that you frequently needed this for ALL BOOKS at the same time - that we'd even consider storing it "special"

db design

A reader, December 14, 2007 - 5:09 pm UTC


If a book moved "out" of server 1 and "in" to server 2.

If you receive one status message per the above, is it OK to enter 2 records into "Book_transactions" to indicate the "out" and the "in"?

OR would you log the status message (one record) as you receive it in "Book_transactions" for the audit trail, and then parse it and enter the in/out results into other transaction tables?

Start your design with a data model

Mike, December 17, 2007 - 9:58 pm UTC

It seems you have spent a lot of time saying 'how about this design', 'how about that design'. But these seem to be unconnected attempts to score a hit, not the fruit of a good design process.

A good database design starts with a data model: a representation of the business data requirements. You should start with an understanding of the entities: 'what are the things I need to keep information about?' From what I read (hopefully all of the anonymous postings are really from the same person about the same problem):
- You have Books.
- You have Servers, which may also be called Processes.
- You have Tests, which (given that Servers may actually be Processes) may or may not also be suspected to be variations of processes.
- You have events which are Moves among servers.
- You have events which are Test Results.

For each entity, an identifier (natural or contrived) is identified.

Then you identify relationships between them. A Book is located at one (or more? evidently) Servers. A Move involves a Book, a 'from' Server and a 'to' Server. A Test Result involves a Test and a Book.

Then you identify the rest of the attributes, or descriptive data elements, like names, times, quantities, etc.

From these requirements, described by entities, relationships, and attributes, you have a data model. The starting point for your database design would generally be a table for each entity type, primary-key columns for identifiers, foreign-key columns and constraints for the relationships (or associative tables for many-to-many relationships), and columns for the remaining attributes. All good designs are derived from this starting point.

I have simplified here, because of the sketchy information and the limited time and space. But this is the process that you must do in order to arrive at the correct design. If you do not have an accurate data model, you do not have anything against which you can judge design proposals.

The book I learned from (and have recommended to others in the past) is Handbook of Relational Database Design, by Fleming & von Halle. It is nearly 20 years old but it is still listed on Others may recommend other books. I suggest that you invest some time in learning the fundamentals of database design.

(This whole scenario sounds like it could be an academic assignment....hmmm)
Tom Kyte
December 18, 2007 - 1:19 pm UTC

(thank you...)


A reader, December 18, 2007 - 12:11 am UTC


1. What did you exactly mean by this statement?

<It was only when you said that you frequently needed this for ALL BOOKS at the same time - that we'd even consider storing it "special" >

2. You said that you would "compute" instead of "store" the stock qty. However, does it make sense to compute thousands of records every time you want to know the "Stock Qty" of an item?

For example, let us say we did an inventory for "Pepsi" cans in a supermarket and counted 1000. Now we enter this value in the database and then we will have transactions "Shipments" and "Receipts". Your computation will always be:

Last inventory count (1000) - all shipments since that date + all receipts since that date.

After 6 months (assuming we did not do another count), you will be scanning records within a 6-month time frame.

Does it not make sense to create another table

and store that every night or on every transaction.

Tom Kyte
December 18, 2007 - 1:31 pm UTC

1) I meant precisely what I said.

I made the reasonable assumption initially that you wanted to find the status of A BOOK. I showed you how.

later you said "I want it for all books", and that changes radically the question being asked of the data - so the foregone assumption I made was not valid and the answer changed.

this is why we gather requirements and then design.

2) if I want it for a single stock item, I'd compute it when and IF I needed it (because my experience is you rarely need it in real life; the expense is not worth storing it for each modification - the extra code, the extra testing, the extra concurrency issues - don't want to deal with it)
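The reader's running-quantity formula above reduces to simple arithmetic when computed on demand for one item; a tiny sketch with invented figures:

```python
# current qty = last physical count - shipments since + receipts since
last_count = 1000                       # the "Pepsi" count from the example
shipments_since_count = [120, 80, 50]   # made-up shipment quantities
receipts_since_count = [200, 100]       # made-up receipt quantities

qty = last_count - sum(shipments_since_count) + sum(receipts_since_count)
# qty -> 1050
```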


A reader, December 19, 2007 - 10:42 pm UTC


OK, I think I understand what you are saying, which is:

if the requirement is to know the status of one book, create a view to compute it.

If the requirement is to "find all the books on server A" or "find all books on servers A and C", then store it in a table and update the status on every transaction, as it will take time to compute it in real time on request. This rule applies even in the case where you have a low number of transactions per book (like 10-20 records per book).


2. In your experience with banking applications and retail chains, do they mostly store stock quantity or customer balance, or compute it? It seems to me it is stored in a table but might be updated nightly?

3. If you are creating a table to store input from an administrative screen, where the admin logs in, selects a function (6 radio buttons for different functions) and presses submit: the table will be an audit log (uneditable), meaning that every time he hits "submit" a new row is inserted.

Would you create a "Reference table" for all functions and store a function_id in the transaction table, or would you store the actual name of the function and just have one table? Function length varies from 1 word to 4 words.

Which is better?

Thank you,
Tom Kyte
December 20, 2007 - 10:00 am UTC

I think you need to

a) define your requirements

b) model a schema that fully supports efficiently your requirements

that is a general way of saying what I'm saying.

if one way were universally better, it would be hard coded into the software and you would not be able to do it any other way.


A reader, December 20, 2007 - 11:26 am UTC


I did both of what you said. It is just that one small thing left: deciding whether to store the derived value or create a view.
My requirement is to know "all books that exist on one server" or "all books that are on one server and not on another server", etc.

Based on this I concluded from your discussion that I should store these in a status table (server, book) that would be updated based on each incoming transaction.

Am I wrong in my interpretation of what you are saying?
Tom Kyte
December 20, 2007 - 2:34 pm UTC

that would be correct. You want "all book status" frequently.


A reader, December 20, 2007 - 6:32 pm UTC


Would you do a), b) or c), described below, for this requirement?

I want to store the action that an administrator performs on a screen where he selects a function (6 radio buttons for different functions) and presses submit. The table will be an audit log (uneditable), meaning that every time he presses "submit" a new row is inserted.

a. Would you create a "reference table (function_id, function_desc)" for all functions and store only the function_id in the transaction table, or

b. would you store the actual name of the function in each record and have just one table, or

c. would you use two tables, but store both the function_id and the function_desc in the audit log table for ease of viewing and querying?

Function length varies from 1 word to 4 words, or 10 bytes to 30 bytes.

Tom Kyte
December 21, 2007 - 2:48 pm UTC


will the name of the function change over time?
does the name of the function need to be constrained to a specific set?
is this function referenced in other tables?


A reader, December 21, 2007 - 11:01 pm UTC


I think you need to write a DB design book - it seems you are a master of that, just as with Oracle.

a) No, the name of a function will stay the same.
b) I am not sure what you mean. A function name is something like "Lock table", "close weekly job", "unlock table", "run monthly report", etc.

All of these are radio buttons on the screen, so I guess yes - he is constrained to one of these options.

c) No, this function is not referenced in other tables; it is just a history log of the actions he took.

Tom Kyte
December 22, 2007 - 11:47 am UTC

a) are you *sure* :) things change, think long and hard about that.

b) do you need to make sure "Lock Table" is put into this table - or would "lock table" be OK, or "Lock-Table", or .....

do you want to constrain this attribute to a specific domain? (probably - you want clean data)

You can accomplish that via a check constraint for a small set of values, or via a lookup table with a single column containing the list of valid values.

If (a) is "no, the name of the function will stay the same forever", you want either

1) a check constraint to constrain the valid values
2) or a single-column lookup table containing them
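As a rough sketch of the two options (the function names and column lengths here are made up for illustration):

```sql
-- Option 1: a check constraint for a small, fixed set of values.
create table audit_log (
  seq_no number primary key,
  fname  varchar2(30) not null
         check (fname in ('Lock table','Unlock table','Close weekly job'))
);

-- Option 2: a single-column lookup table enforced via a foreign key.
create table valid_functions (
  fname varchar2(30) primary key
) organization index;

create table audit_log2 (
  seq_no number primary key,
  fname  varchar2(30) not null references valid_functions
);
```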


A reader, December 22, 2007 - 11:50 pm UTC


I would like your recommendation in the 2 cases.

1. Assume the function name will not change. You said to create a check constraint or a lookup table. But which of the solutions described above - a), b) or c) - do I implement?
Would you store the function_name in the transactional table, or just a function_id, or both?

2. Assume the function name will change. How would the solution differ from the answer to #1?

Thank you,
Tom Kyte
December 23, 2007 - 5:47 pm UTC

1) what is "function_id"? - it sounds like an unnecessary surrogate - hence I would not have it.

2) then you have a lookup whereby you assign a surrogate key (ID) to this name; the name is unique, the ID is the primary key, and you use the ID in your tables.

If the name changes, update it in this lookup table and it is changed everywhere.


A reader, December 23, 2007 - 9:57 pm UTC


I can't see how a function name changes. Do you mean: if you have a function called "Backup Asktom" and next week you decide to call it "Back up Asktom website" - is that a function change? You are only changing the "label".

Function_id is a unique number for each function. I think what you are saying is:

1. If the function name does not change, store the function name in the transactional table, even though the name will repeat every time the admin performs it:

Audit ( Seq_no (PK), Function_name )

2. If the function name will change over time, then do this:

Audit ( Seq_no (PK), Function_id (FK) )
Lookup ( Function_id (PK), Function_name )

Is this correct?

Don't you think #1 would be unnormalized, as the function name will repeat in different rows? Also, I assume I can't search by function name unless I spell out the full name.

Tom Kyte
December 24, 2007 - 8:25 am UTC

that is what I meant - if the function name changes in the future. Forget this particular example, think "primary key"

If the attribute I choose as my primary key is subject to possible change in value in the future, it is NOT my primary key. That is all I'm trying to say here.

I am saying you will have a "lookup table" in ALL CASES.

function name is a constrained value, subject to validation.

You will either

a) use it as a 'natural key' in a 'lookup table' - either you will have:

create table audit ( fname varchar2(30) check (fname in ('a','b','c')) );

OR you will have

create table lookup ( fname varchar2(30) primary key ) organization index;
create table audit ( fname varchar2(30) references lookup );

b) use it with a surrogate key:

create table lookup ( id number primary key, fname varchar2(30) ) organization index;

create table admin ( function_id references lookup, ... );

Your first case is MISSING a lookup table. And your question doesn't make sense - just because a value repeats does NOT MEAN "you are not normalized". Think about it for a minute: if you use the lookup table with function_id/function_name, doesn't the function ID now repeat just as often as the name would have? Tell me what the difference would be, and why one is more "normal" than the other.


A reader, December 24, 2007 - 8:16 pm UTC

1. In both cases I am not planning to use "Function_id" or "Function_name" as part of the primary key in the audit table - I already have an auto-generated sequence for the transactional/audit table for that.
So I can constrain the function name with either a check constraint or a lookup table.
The question remains: do I store the function name, the function id, or both in my audit/transaction table?
Currently, the screen reflects the functions the customer requested. In the future it may or may not change - I can't really predict. In a few months/years they might ask to add new functions. Changing the existing labels - I doubt it, although it can happen.

2. Concerning normalization, is this not similar to storing the customer name all the time in a transactional system? You always store the customer ID, not the name. Would that not create the same repeating-values issue?

1NF = no repeating groups or multivalued fields.
2NF = every non-key column depends on the whole primary key.
3NF = no transitive dependencies (no derived columns).

Tom Kyte
December 25, 2007 - 10:42 pm UTC

1) it would not be a primary key in the audit trail, at most it would be a foreign key to a lookup table OR constrained to a set of values by a check constraint.

I do not see anywhere where someone said otherwise?

If function name is immutable, it is the NATURAL KEY OF THE LOOKUP TABLE (which might be enforced via a check constraint as easily as a lookup table).

there is no function "id" if function name is a primary key (not the primary key of some audit trail table - a primary key of a lookup table that is "the set of functions")

2) no, you have totally missed the definition of normalization. Entirely.

We might use a customer ID because..... customer names change - or they might even repeat (I can certainly foresee two John Smiths). The customer name does not uniquely identify a customer. So, we use a surrogate in this case.

Do you see that you repeat your customer_id as much as you would repeat your customer_name - please tell me why one would be "normalized" and the other would not. Please tell me the difference between the two???? (hint: there aren't any - we are not talking about normalization here at all - we are talking about what is appropriate for a key value - just because you see the same data repeated row after row in a CHILD (hint - keyword CHILD) table does not mean "you are not normal" - it just means you have a foreign key)

IF (your function name is immutable)
   It is as good, if not better, than some meaningless number and should be used as the KEY of a "function" entity - this entity may or may not be physically present as a lookup table, we might just use a CHECK CONSTRAINT instead of a foreign key - but the end result is the SAME
end if

db design

A reader, December 30, 2007 - 11:17 pm UTC


OK, I got it.

I think what you are saying is:

In both cases I need either a lookup table or a check constraint so that only valid functions are stored in the audit table.

If the function name will not change, then store the name itself in the audit table instead of a meaningless key.

If the function name will change, then create a key and store that in the audit table.


But how would I - or you - know whether the function name will be changed in the future? We are only talking about labels here, too.

Would not storing a function ID be safer, and also easier for indexing/querying?

Thank you,

Tom Kyte
January 01, 2008 - 6:04 pm UTC

If your natural key is a true key (doesn't change) then yes, you would not make up some surrogate for it.

you would use the natural key (the function name in your case)

... But how would I or you know if function name will be changed in the future. We
are only talking labels here too....

YOU have to answer that.

If these are labels - they will PROBABLY change - because next year, someone will not like the labels you picked. They'll want them different.


A reader, January 02, 2008 - 11:28 am UTC


One function name is:

"Close monthly order (all customers)"

That would be kind of long to index/query, even assuming it would not change. Don't you think so?

Tom Kyte
January 02, 2008 - 2:40 pm UTC

that is not a function name (please, just look at this stuff and apply a reasonableness test). That is obviously a description, and obviously will be subject to change - next month it could be "Close Monthly Orders (for all customers)"


A reader, January 03, 2008 - 3:19 pm UTC


yes, it is a description. Does that change anything in your proposed solution above? I won't use it as a PK, for sure.

But I assume you would still use a lookup table/check constraint to validate it, and assign a surrogate key to each description.

Or would you just save the description in each transaction stored in the audit table?
Tom Kyte
January 03, 2008 - 3:47 pm UTC

a description is not a primary key value (obvious isn't it?)

The description sure seems to me to be "not immutable".

This is a column in a lookup table. You would have a small code pointing to it.


A reader, January 05, 2008 - 11:37 pm UTC


Suppose you are designing a simple system to store all the HTTP messages coming in to your application and all the XML responses that the stored procedure sent out to the client program. I need to check the response that went out for each message coming in.

Would you create two tables, the second one like this:

mesgout_no (PK)
mesg_no (FK)

OR one table?

I think you are going to ask whether it is a one-to-one or a one-to-many relationship. Correct me if I am wrong, but I do not think a stored procedure can generate multiple HTTP responses to one HTTP request.

Tom Kyte
January 07, 2008 - 7:59 am UTC

this is absolutely a single table.

there is a one-to-one relationship between the request and the response.

think about it.... You have a set of inputs, you do something, you formulate a response and send it back.


A reader, January 08, 2008 - 7:22 pm UTC


So you are saying to have one ROW for the URL coming in and the XML message going out?

You can never have multiple messages out for one HTTP request?

But by your logic, if you have a shop and you only ever ship an ORDER once, would you create one table for the order and shipment information? It seems to me two tables would be much cleaner, even with a one-order-to-one-shipment relationship. Is it not?
Tom Kyte
January 08, 2008 - 8:00 pm UTC

A stored procedure

a) receives a set of inputs
b) produces a set of outputs

you are saying you want to log that, therefore, you have a single table to log that.

Now, whether you need a table per procedure is AN ENTIRELY DIFFERENT question, however, if all of your procedures

a) take a single xml input
b) produce a single xml output

you are back to one table.
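Since each call produces exactly one response, the log collapses to one row per call; a possible shape (all names here are illustrative, not from the thread):

```sql
-- One row per stored-procedure call: request and response together,
-- because the relationship is strictly one to one.
create table proc_message_log (
  msg_no       number primary key,
  proc_name    varchar2(61) not null,
  request_xml  clob not null,
  response_xml clob,
  received_at  timestamp default systimestamp not null,
  responded_at timestamp
);
```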


A reader, January 09, 2008 - 2:07 pm UTC


When you populate validation/lookup tables, do you use a number sequence or an abbreviation code for the PK?

For example, for this description:

Close Monthly Orders (for all customers)

do you use a code like "Close" or "CMO",
or a number like "1"?

Tom Kyte
January 10, 2008 - 2:13 pm UTC


for something well defined like a state lookup, I'd use the state codes

VA - Virginia
NY - New York

For something with established codes, you would use those.

For something like you just described, though, it sounds like you are "making up codes" - in which case any surrogate will do; users will never see these values - numbers would work fine and would be easier to add over time.

What would you do in the future when you have not only

CMO - Close Monthly Orders

but also

??? - Clone Monthly Overstock

db design

A reader, January 25, 2008 - 8:01 pm UTC


If you need to store a software version in a db column - like version 1 release 2 as "1.2" -

would you store it as number(5,2) or varchar2(5)?

I need to compare this number to another, user-provided number to see if it is larger or smaller.

Tom Kyte
January 28, 2008 - 7:04 am UTC

depends on your versioning scheme, doesn't it.

will your software always be major_release.patch_number

and they are both numbers so that 1.2 makes "sense" - it would be 1.2, not 1.20, not 001.20, but just 1.2.

or will you have a 1.2a, 1.2b, and could 1.2z come before 1.2j (there was a 6.1 before 6.0.36 of Oracle...)

In general, version numbers should likely be held in a table with a "date of release" for sorting purposes and then some string that represents the version. 5 characters seems rather small.
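That suggestion could look something like this (table and column names are hypothetical):

```sql
-- Versions sort by release date, not by parsing the version string.
create table app_versions (
  version     varchar2(30) primary key,
  released_on date         not null
);

-- "Which release is newest?" becomes a date comparison:
select version
  from app_versions
 order by released_on desc;
```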

RE: "software version in db column"

Duke Ganote, January 28, 2008 - 10:10 am UTC


A reader, January 28, 2008 - 11:52 am UTC


the requirement is to maintain the current Oracle app version/release. There is no need to maintain history, and there is no alphabet involved.

When the client logs in, he submits his version number and we compare that number to the current one to make a decision.

My point is you can't do comparisons (<, >, =) with varchar2 columns - the column has to be a number type, or be converted to a number, when doing comparisons.
Tom Kyte
January 29, 2008 - 2:47 am UTC

... the requirement is to maintain the current oracle app version/release. No need
for maintaining history and there is no alphabet involved. ...

but your next paragraph belies that

... When the client makes a login, he submits his version number and we compare
that number to the current one to make a decision. ..

in order to know ">" or "<", you need to understand the "history" - and yes, today perhaps there are no letters, but think about it.

Also, how could a client have a version greater than your current one? So, it would seem it could be:

a) check to see if it is the same - then they are current
b) else they are not current, hence they have an older release

You can do >, <, = on varchar2 columns - why would you say otherwise?

Many times people stuff a date into a string - using yyyymmddhh24miss for the format; that absolutely lets you use >, <, = to compare the strings "as if they were dates"

You can do lots of things, but in this case if you always have "major version.patch level", using a number would suffice.
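The string-comparison point is easy to demonstrate; for example:

```sql
-- Fixed-format yyyymmddhh24miss strings compare correctly with <, >, =
-- because the most significant fields (year, month, ...) come first.
select case
         when '20080131120000' < '20080201000000' then 'earlier'
         else 'not earlier'
       end as result
  from dual;
```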

Can we have multiple instances in one database?

Dawar Naqvi, January 31, 2008 - 9:14 pm UTC

Can we have multiple instances in one database?

Tom Kyte
February 04, 2008 - 3:33 pm UTC

an instance mounts and opens exactly ONE database.
a database (with RAC - real application clusters) may have more than one instance associated with it.

instance => to a single => database
database => accessed by one or more => instances

instance = memory + process
database = set of files.

instances are not "in" databases, instances are processes and memory you use to access a database.

@Dawar Naqvi RE: multiple instances in one database

Duke Ganote, February 01, 2008 - 3:03 pm UTC

The Oracle Concepts Manual gives definitions, saying:

"An Oracle database server consists of an Oracle database and an Oracle instance. Every time a database is started, a system global area (SGA) is allocated and Oracle background processes are started. The combination of the background processes and memory buffers is called an Oracle instance.

"Real Application Clusters: Multiple Instance Systems
Some hardware architectures (for example, shared disk systems) enable multiple computers to share access to data, software, or peripheral devices. Real Application Clusters (RAC) takes advantage of such architecture by running multiple instances that share a single physical database."

db design

A reader, April 01, 2008 - 10:50 pm UTC


as a follow-up to the above question, let us say you have a client sending messages that tell the database the status of books (i.e. moved onto server A, moved off server A, etc.).

I am saving all the incoming messages into an audit trail table, "book_trans", and then, based on the message content, I am inserting a status record into the book status table. If a book exists on server A, I enter this record, and I store the trans_no that generated it. If a message says that the book was deleted, I delete the record from the book server status table.

status (in/out)


My question to you: I need to track 3 things for a book - whether the book exists on a specific server and, if yes, which directory it is in.

1. Do I add "directory" to "book_to_server_iam_on", or always get it via the trans_no column from the book_trans table?
2. Is it OK to store the "trans_no" in "book_to_server_iam_on"? I added it as a sort of debugging aid, to see the last message that generated this status record.
Tom Kyte
April 02, 2008 - 6:26 am UTC

1) you already have it stored, indexed, easily accessible in book_trans, I see no reason to duplicate that information.

2) seems like a foreign key to me in this case - back to the relevant information for this book.

db design

A reader, April 02, 2008 - 12:15 pm UTC


well, "book_trans" might have 1000 transactions per book, of which the last action/transaction result (derived) is stored in book_status. So you are saying not to store the "book directory" value of that last message again - just store the last transaction number in "book_status" and get the directory from "book_trans" using the trans_id - instead of querying only one table ("book_status") for which books I have on which servers and where they are located. Is this correct?

Tom Kyte
April 02, 2008 - 12:24 pm UTC

I'm not saying anything really - we've had this discussion over the course of many many weeks/months - and I don't remember all of the details.

store data normalized, do not replicate data. do not be afraid to just join.

I would have to assume that trans_no is the primary key - so it doesn't matter how many rows are in that table, we are only getting one.

roles in hierarchy.

Amir Riaz, April 05, 2008 - 12:02 pm UTC

Hi tom

I am designing a database. I have a requirement to maintain a hierarchy among users. For example, the top-down hierarchy is:

manager              chief engineer
financial manager    marketing engineer
accountant           product engineer

The requirement is that each user can see all the data below him in the hierarchy. There is also a chief administrator and his administrators: a chief administrator manages administrators, and an administrator manages the manager roles and all the roles below them. These roles are static, meaning that once an administrator assigns one to a user, only an administrator can change it.

The chief engineer and the roles below it are dynamic, i.e. any person higher in the hierarchy can change the users below him - for example, a chief engineer can make a marketing engineer a product engineer, and a product engineer anything lower in the hierarchy.

I have created a user entity and a role entity with a many-to-many relationship (my God, another constraint), and I have entered all the above hierarchies as roles. What I am wondering is how to represent the hierarchy in the role table. I have thought of two options so far:

use numbers, with the highest number representing the top role

use a self-join on the role entity.

Should I use another entity for dynamic roles, like dynamic_user, and join it to roles? It has the same attributes as the user entity.

I am using FGAC for security purposes.

Amir Riaz
Tom Kyte
April 07, 2008 - 8:53 am UTC

... use numbers with the highest number representing the top role ...

dismiss this thinking - the "highest number" thinking - there is no such thing as a "highest number".

I cannot answer your question - the details are a bit "fuzzy". Normally, a hierarchy would require a single two-column table:

ROLE    PARENT_ROLE
------  ------------------
CEO     <null>

and role would be the primary key, and you would have a foreign key pointing to it.
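That two-column table can be sketched with a self-referencing foreign key, and walked with CONNECT BY (the role names are illustrative):

```sql
create table roles (
  role        varchar2(30) primary key,
  parent_role varchar2(30) references roles
);

-- Parents must exist before children are inserted.
insert into roles values ('CEO',              null);
insert into roles values ('Chief Engineer',   'CEO');
insert into roles values ('Product Engineer', 'Chief Engineer');

-- Everything at or below a given role:
select role
  from roles
 start with role = 'Chief Engineer'
connect by prior role = parent_role;
```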

I did not get the "many to many" bit

Thanks for reply

Amir Riaz, April 07, 2008 - 12:32 pm UTC

Hi tom

I did not get the "many to many" bit

what I mean is: a user can have multiple roles, and a role can be assigned to multiple users.

So far I have two solutions. In the first, I have created an entity named users with user_id (PK), an entity named roles with role_id (PK), and an association table resolving the many-to-many relationship, with (role_id, user_id) as its primary key. But again, how can I form hierarchies in that?

In the second solution, I would have a users table and subtype tables like ceo, chief_engineer and product_engineer - so, for example, the chief_engineer table would have a primary key of user_id and chief_eng_id, and similarly for the other tables. Also, since you have made the vital point that for hierarchies we should use one table with two columns, as in your example above, I am thinking that with the second solution I would create another table named user_hierarchies and arrange the data as you did in your example. I hope it will work - what is your opinion?

With which solution can the hierarchy be formed more easily?
Also, somewhere on this site I have seen an FGAC example in which you demonstrated how a top user can view the data of all the users below him, but I have failed to find it now.

Amir Riaz
Tom Kyte
April 09, 2008 - 9:01 am UTC

well, then users (if they are many-to-many with roles) could, and likely will, lead to recursive loops in your data - what then? What happens when the CEO manages the managers who manage the clerk - and the CEO is also a clerk...

Head spinning

Amir Riaz, April 09, 2008 - 12:43 pm UTC

The roles CEO, financial manager and account manager are static, meaning they are assigned by an administrator for logging in to the site and cannot be changed afterwards. The roles remain fixed to users and cannot be changed without an administrator login. It is actually a one-to-many relationship, but to keep the design simple I am planning to keep it many-to-many.

The roles product engineer, marketing engineer and chief engineer are dynamic, meaning they are initially assigned by the administrator, but afterwards any person higher in the hierarchy can change them, giving his position to anyone else. A person cannot be both product engineer and chief engineer on the same machine part; however, he can be product engineer on one part and chief engineer on another. So there is no cyclic data, and it also gives a many-to-many relationship between users and roles.

What is your opinion about the second solution? Or do you feel there are still conflicts in the requirements?
How can I detect conflicts in requirements?

Tom Kyte
April 09, 2008 - 2:59 pm UTC

... Actually it's a one-to-many relationship, but to keep the design simple I am planning to keep it many-to-many ...

sorry, now MY head is spinning

how would using an entirely inappropriate model make things 'simple'?

it seems to me it would lead to no end of confusion, bad data and bugs..

db design

A reader, April 17, 2008 - 7:48 pm UTC


A quick question on the above.

Book_trans is the table that stores the incoming messages recording whether a book was copied to a server or not.

status (in/out)

book_to_server_iam_on is a derived status table based on the transactions coming into book_trans. When a book moves to a server, I save a record into book_trans (the audit trail) and insert a record into book_to_server_iam_on.


Some of the transactions are "MOVE" rather than "COPY", meaning they involve an "IN" record for server B and an "OUT" record for server A. There is other information that the message sends.

My question: do you think it is good design to insert two records into book_trans if the message was a MOVE type, or should I insert only one record and update my other table, "book_to_server_iam_on"?

Which design do you think is best?
Tom Kyte
April 17, 2008 - 10:06 pm UTC

we've had this conversation ad nauseam


A reader, April 17, 2008 - 10:30 pm UTC


I did understand; the previous discussion was related to a different issue.

Do you not have a preference for how you would do it? Would you take one message and generate two records in the trans table, or just one?
Tom Kyte
April 17, 2008 - 10:51 pm UTC

go back to your requirements, for your system.

do you need to record a MOVE for your audit trail or not.

I don't have enough information (my view of what you are doing is actually very hazy - you keep moving the cheese - I have no idea what your goal is - you do though, so... you get to decide)


A reader, April 17, 2008 - 11:17 pm UTC


sorry, maybe I do not explain things well.

The requirements are not very clear on this either; they just state to store the incoming message.

But if you were designing a banking system, and a user moved money from account A to account B, and you were building an audit trail of his transactions, and some client sent you one message about this transaction -

would you store that as one transaction in the Oracle audit trail table, or as two transactions: one for the deposit into account B and one for the debit from account A?
Tom Kyte
April 18, 2008 - 8:24 am UTC

... the requirements are not very clear on this either; they just state to store the incoming message. ...

perfect - store the message, but also store the DATA CORRECTLY. No materialized view, just store the data correctly.

... would you store that as one transaction in the Oracle audit trail table, or as two transactions: one for the deposit into account B and one for the debit from account A? ...

you tell me - what are the data requirements of YOUR system?


A reader, April 18, 2008 - 9:10 am UTC


the data requirements are to track the history of the moves of a book. So if I went to the DB, I would need to see that book 1 moved "in" to server "A" today. This automatically implies that the book moved "out" of server "B".

The incoming message (which triggers the API) only records the move in, and the code should automatically create the "out" record. I am just wondering whether it is OK to create two records in the "audit" trail table for that one incoming message, or whether the second record is considered derived/concluded and should not be stored there.

How would you design it?
Tom Kyte
April 18, 2008 - 10:20 am UTC

You may do it however you want to, what pleases you the most.


A reader, April 18, 2008 - 3:19 pm UTC


I always like to get the opinion of the best in the world - this is how you learn new things correctly.

I interpreted your advice
"No materialized view, just store the data correctly."

to mean: take the one message and store two records in the audit trail, to reflect the OUT from one server and the IN to the other.

This is my plan.

BTW, is there some "brain exercise gym" you visit? I have always wondered how you got that vast knowledge - is it "reading", "practicing" or "genetic intelligence"?

What is the secret?


supriya, April 21, 2008 - 2:32 am UTC

Hi Tom,

I am developing a small application for maintaining student information using Forms 6i, Oracle 9 and Windows.
I have a sample version of the Oracle software, downloaded from the Oracle site (and am using it right now).
Here are my questions:

1) Can I use the sample version of the database for a real application?
2) Will it create any problems after some time?

Right now I have created one new user (admin) by connecting as the system user, and have started creating tables, etc.

3) I am not very familiar with database maintenance concepts - I mean:

how to maintain tablespaces/extents once they are full
how to extend the extents/segments/tablespaces
is there an autoextend feature? If yes, where do I set this parameter?

The people who are going to use this application do not have a DBA.

What steps do I need to take to handle the database without a DBA?

Transaction-related info: roughly 5000 to 6000 transactions daily in a table having around 10 columns.
The maximum column size is varchar2(<30)...

The application has hardly 15 to 20 tables.

I would appreciate a quick response.

Thanks in advance,

Tom Kyte
April 23, 2008 - 4:32 pm UTC

1) if you use Oracle Database XE (Express Edition), you are bound by the licensing agreement that goes with it (4GB limit on data, 1 CPU, and so on).

If you use Oracle Database SE or EE (Standard/Enterprise Edition), you have to purchase it.

2) ... what steps do I need to take to handle the database without a DBA? ...

probably: do not use the tools you are using - use HTML, and host the database for them (eg: use APEX - Application Express - for demos).

I seriously doubt you want to use Forms and Oracle 9i (old) for this without some IT staff to deploy it.

A reader, April 22, 2008 - 5:45 pm UTC

I have 10 users in each group - finance, hr, research.
The 10 users in each group will have similar privileges.

Do you recommend creating a database user for each person? That would add up to 10*3 = 30 users, and this might increase in the future.
Or do you recommend creating user groups - fin, hr and res - and granting privileges to those?

Which is better in terms of user management in the Oracle database?
Tom Kyte
April 23, 2008 - 6:00 pm UTC

especially with such small numbers of users (anything less than many many thousands being small), I'd want an account per user, for accountability reasons.
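Note that individual accounts and group-level privilege management are not mutually exclusive: a role per group keeps the grants manageable while every person keeps an identifiable login. A rough sketch (the role, table and user names are illustrative):

```sql
-- One role per group carries the privileges...
create role fin_role;
grant select, insert on orders to fin_role;

-- ...while each person still has an identifiable account.
create user jsmith identified by "change_me";
grant create session to jsmith;
grant fin_role to jsmith;
```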

db design

A reader, April 27, 2008 - 4:19 pm UTC


supriya, May 05, 2008 - 5:32 am UTC

Please refer to the following for my previous question and the followup you gave...
Followup April 23, 2008 - 4pm US/Eastern:

oracle version details ...

SQL*Plus: Release - Production on Mon May 5 14:41:57 2008
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release - Production

OS : windows XE

For this small application, do I need to go ahead and purchase the same? This application is just for internal use and is installed on only one machine - the application/database will be accessed by one machine only.

The database and application are installed at their site; in a few days they are going to start using it...
with online support...

Here are my questions:

1) once the datafile is over, how do I extend it?
Are there any other precautions I need to take regarding DBA activities?
2) do any initialization parameters need to be set?

Thanks in Advance

Tom Kyte
May 05, 2008 - 10:17 am UTC

1) the datafile is over WHAT?

Please check out the 2 day DBA guide that comes with XE - it covers the basics of administration.

2) you can accept the out of the box ones if you like - the ones that need be set (db block size for example) already are, else the database would not start.

db design

A reader, May 07, 2008 - 12:06 am UTC


Say you have database "A" that all developers work with. You create a new database "B" that interfaces with an external client. Database A needs to access some tables in database B, but you want to keep database B secure.

Would you give developers direct access to the tables in B over a link, or would you create views in A for the tables in B (also over a link), or would you use synonyms?

If views are OK, do you make those read-only (if they only select data)?

Tom Kyte
May 07, 2008 - 1:34 am UTC

probably create a new user in A and B to support this. Let us call this user "RO" for read only.

In database B, the RO user is granted select on the necessary objects.

In database A, the RO user creates a non-public database link to B, and creates views that are select * from table@remote. It matters not if you use read only - because RO only has select on these tables.

RO grants select on the views to whomever.
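That setup might look like the following - the object, link and password names are invented for illustration:

```sql
-- in database B, as the application owner: RO may only read
grant select on app_owner.some_table to ro;

-- in database A, connected as RO: a private link to B ...
create database link b_link
  connect to ro identified by changeme using 'B';

-- ... one view per remote table ...
create view some_table as
  select * from app_owner.some_table@b_link;

-- ... exposed read-only to whoever needs it
grant select on some_table to app_user;
```

Because RO holds only SELECT on the remote objects, nothing granted through it can ever modify B.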


A reader, May 07, 2008 - 11:05 am UTC

1. Why create a second user in A? Just create the views in RO in B and give select-only access to the main account in A; that is what they use.

2. Another way we implemented it is to create local views in A for some tables in B (the view was created over a db link). Do you think this is bad design, or is it OK? The only problem with this is that every time you add or delete a column in a table in B you have to recreate the view in A; it does not refresh automatically. However, that would be true even if the view was in B.

I guess we can grant select directly on the table to avoid the view thing. Not sure if this is bad or not.
Tom Kyte
May 08, 2008 - 3:17 am UTC

1) because you asked my opinion. Because I would have a user in database A and B that was set up for just this purpose and nothing else. Because I like small, bite-sized, manageable things. Because I like the concept of least privileges. Because it is easier to verify that what we want accomplished (read-only access to just this set of tables) is true. Because it is the way I would do it.

You asked, I responded.

Let me start asking YOU questions - why would you want to "avoid this view thing".

and I do not think that every time you add a column in B, you have to recreate the view in A - why would A need access to this new column, unless the code on A was going to be modified to support it? In which case - in your TEST SYSTEM - you would have created a new view and a new set of code, and then you would have rolled out the new functionality in a controlled fashion.

This is why I would choose to USE a view - so that a change on the other system does not kill me, I can plan for the change, test it, roll it out when I want to.

Database Design Book

anonymous, June 03, 2008 - 1:36 am UTC

Hi Tom,

Can you refer me to a good book on Database Design?


Question reading separating tables design from the application source:

steve, June 12, 2008 - 6:07 am UTC

Hi Tom

I have just started working with a new company, and their database procedures require two schemas for each PL/SQL application.

They have one schema holding metadata for configuring the software - let's call it APP_DATA - that is locked down (I believe so that no one can do any DDL). They then have another - let's call it APP_SOFTWARE - that holds just the PL/SQL packages. This means all the PL/SQL packages are separated from the configuration data tables that the application will be writing to and reading from.
NOTE: the customer data they are processing is stored in separate schemas.

My questions are:

Have you seen this before (Is this a "best practice")?

Can you guess at what security worries would require them to separate the PL/SQL from its own configuration tables, especially when the source code has the rights to modify these tables?

Tom Kyte
June 12, 2008 - 8:10 am UTC

this is great.

You cannot mess with the constraints.
You cannot mess with the triggers.
You cannot mess with the physical schema.

and if they want to, some tables can be read only, others could be read/update only, yet others read/update/insert, yet others ......

The source code does not have to have that right.

Separation is good - helps to document the set of least privileges needed. The application (source code) should not have the ability to mess with the structures (the tables) and this accomplishes it very nicely.

Not a bad idea to break it down even further, using schemas to hold different components of a large application - nothing wrong with that.

RE: separating tables design from the application

Duke Ganote, June 12, 2008 - 5:24 pm UTC

I can verify that separate schemas for data structures and code are common enough. I first saw that approach at my former employer for our "roll your own" ETL code and data warehouse tables. It made perfect sense because the data architect wanted strict controls on the data structures even in dev, while the developers had latitude with the code itself.

Most recently, I separated the ETL code (with some configuration tables) from the actual data mart. That way, the BI report writers only saw the star schema itself, without any distracting temp or configuration tables.


A reader, June 14, 2008 - 12:43 am UTC


I have two databases AAA and BBB.

DB AAA interacts with http client (external world) and DB BBB supports production users mostly.

An external program does a few tests on a book: test1, test2, test3 and sends the 3 xml files to schema A so production users can view. A book can have many sets of tests.

How would you design this?

I am thinking of storing the received data via http message into an audit trail table and then insert the files into schema B.

database AAA

trans_no Number(10)
book_no number(10)
test1_xml CLOB
test2_xml CLOB
test3_xml CLOB

Database BBB

book_no number(10)
test1_xml clob
test2_xml clob
test3_xml clob

If I receive 10 test result sets, there will be 10 records in A_TRANS_TABLE and only the latest record in B_TESTS_TABLE.

1. Is this a good design? Would this be unnormalized, since I am sometimes storing duplicate data - the same data in two places? Would you let users access a table directly in AAA instead?

2. Would I be able to insert the XML file from AAA into the BBB database over a link? I think a CLOB has a limitation. How would I accomplish that?

3. Would you use a CLOB or an XML column type in 9i? Is there any advantage to each in queries or parsing?


Tom Kyte
June 16, 2008 - 11:32 am UTC

you and I have talked books a lot on many many other threads. It keeps coming back to this basic fact:

given a short blurb of text, no one can 'design' anything.

I have no clue what a 'test' is in your mind.


A reader, June 16, 2008 - 4:36 pm UTC


An external program reads a copy of a book on server A and runs a spelling check.
It generates an XML file with the test results. It sends the XML file via HTTP POST to database A, which interfaces with the outside world and logs (saves) it into A_TRANS_TABLE,

and then forwards the file to the internal production database (B) to be stored in B_TESTS_TABLE, accessed by another department.

Do you see any duplication here? I do not see it; I just want to check with you.

Also, how would you insert the file into database B from A? I do not think a DB link will allow that - correct?

Do you use another HTTP call in a stored procedure in A, or do you call a stored procedure in B from A?

thank you,
Tom Kyte
June 16, 2008 - 4:46 pm UTC

I'll never understand why you keep wanting to interface to the database using http... It isn't very reliable - unless you start adding tons of protocols on top of it and then i just say "why didn't you just log in and use sql and do it easy"

If A can forward to B, then the process that puts the data in A could put it in B directly and be done with it, the database on A doesn't need be involved.

Of course a dblink would let you do it, but why even bother, just have the process insert into B.


A reader, June 16, 2008 - 8:00 pm UTC


It is not me - the architecture was designed by someone else.
I think they decided to use HTTP because of security and firewall issues. The external client was not given access to "log in and use sql and do it easy".

However, we do have another production application that transfers files from over 200 customer sites to one database using HTTP. It has been working fine for years.

This is not a heavily used system (a few transactions per day), so we may not have any issues. The only issues that might pop up are error-handling issues.

1. The reason the process cannot insert directly into B is that "B" is not open to the web while "A" is.

I just tried saving a file from DB "A" to DB "B" over a link and it did work. For some reason, I thought you could not. Maybe because I tried to create a view in B for a table in A that has a CLOB, and it did not allow it.

The only thing is that it copied a reference into the Documents table, like
F1331/sample.xml. How can I get the whole file into that CLOB column in B?

2. The best way for me to describe the design is that "A" is like a post office.
It logs all the incoming mail messages and then forwards the data into "B". B will keep only one record with the latest information for a specific book.

Do you see any duplication, normalization, or design issues here?


A reader, August 28, 2008 - 8:46 am UTC

<<If A can forward to B, then the process that puts the data in A could put it in B directly and be done with it, the database on A doesn't need be involved. >>

"A" has a web server and is open to the outside world; "B" does not. If you open up B, didn't you just open a security hole?

The idea is to let A receive, validate and process the data received, and then decide to accept it (update B) or reject it.

2. HTTP was selected as the method because A is behind a firewall and they want to keep it flexible, meaning if they relocate the server, port 80 will always be open.

I can see that HTTP was not really created for app-to-app communication; it is more for human-to-app communication. But I think for low-transaction systems it will work. If you post some data and do not get an ACK back, you just keep sending the request again. You might end up with duplicate data in the DB.

Tom Kyte
August 29, 2008 - 10:26 pm UTC

... The idea is let A receive, validate and process data received and then decide to
accept it (update B) or reject it.

if that is the idea, then they may certainly code it, there will be no "magic" and that is what they were looking for.

And if A can forward to B, then B is accessible via A and A is accessible via the internet and hence B is too.

Typically if you have:

internet <-> A <- B

where internet is outside, A is in a DMZ (in and out of the corporate system) and B is in the corporate system - then A cannot forward to B.

For example, I can from within Oracle log on to asktom.

I cannot, on asktom, do much of anything into corporate - it is a one-way connect capability from inside to the DMZ. If asktom could initiate a conversation into corporate, then it would be a place to launch an attack - and it would be attackable. So, if A can forward to B, they have already lapsed security.

... You might end up with duplicate data in the DB.

you will, not might - always replace "might" with "will" when you write a sentence like that.

"In noarchivelog mode you MIGHT lose all data since your last full backup someday"
that is wrong, it is correct to say:
"In noarchivelog mode you WILL lose all data since your last full backup someday"


A reader, September 09, 2008 - 10:52 pm UTC


1. Why do you think HTTP is not reliable? Many systems use it for communications and file transfer these days. Is it because it is stateless, and you prefer something stateful for a batch-program-to-DB interface?

2. DB "A" has a web connector open (mod_plsql). DB "B" does not. Even though there is a connection from A to B, you can limit that access to one procedure. Most of B is still protected from web access or any other access (tables or code).

Isn't this correct?
Tom Kyte
September 11, 2008 - 10:36 am UTC

1) if you connect to a web server and use tcp to send a request over to the server... and then lose the connection, did it

a) work
b) not work

and how can you find out?

Has nothing to do with stateful/stateless - has to do with the fact it is just a wire protocol, it doesn't have any thing in it for transactional control, recovery etc - OTHER PROTOCOLS written on top of it DO

2) the network path is set up, that means machine 'a' can talk to machine 'b', which means if I compromise 'a', I can readily compromise 'b' - it is already insecure. forget about the database, the network rules are in place that allow 'b' to be accessed already.

db design

A reader, September 13, 2008 - 1:45 pm UTC


1. I would not know if it worked or not unless I get an ACK from the other side.
In that case I would post the transaction again.

Or you can send a "message sequence number", and after your connection fails, you query the DB for that sequence number. If it is in the DB, then it worked; if it is not, then it did not.

It is sort of like sending regular mail: you have no idea the other party received your envelope unless you request an ACK signature back, but most mail is done without that. If you did not receive the confirmation back, you send another copy of the letter.

Here is how the interface works:
- The batch program sends a message to the DB, asking "which books do I need to process?". The DB responds with a list of books. The batch program does its work for each book and then sends a status update to the DB to say "I completed the work for this book".

Two things can happen now:
a) The DB status was successfully updated and the work-complete flag was reset to OFF. In this case, the batch client does not care about seeing an ACK, because the next time it sends the query message the book will not be listed.
b) The DB status was not updated (some failure happened) and the flag for the book stayed ON. In this case, the batch will pick up the book again on the next run a few minutes later and process it again. If it fails again, you log the issue to a text error file or mark it in the DB, and some manual intervention has to be made.

2. What protocols are commonly used for batch-program-to-DB interfaces? There are millions of transactions occurring every day over tcp/http.

You told me once to have the client program connect directly to the DB via the Perl DBI interface and run the SQL. But the same scenario could happen: the batch does its work and the SQL fails, or the connection to the DB fails - do I stop batch processing?

3. You are saying that in order to have complete security, "server B" should be on a separate network from "server A", or, if they are on the same network, there should be no db link.

But can't you have them both on the same network and apply all your security measures: db accounts, firewalls, etc.? In most web setups you have a web server outside the firewall that talks to the DB machine inside the firewall. They still can talk to each other.
Tom Kyte
September 16, 2008 - 9:41 pm UTC

and what you just started laying out there is.......

a protocol

which you build on top of http

to which both ends of the conversation have to agree on

and support the feature/functions thereof.

Eg: you just built web services - ta-dah.

So, if you like reinventing already made industry standard API's that if you used instead of growing your own - you'd be ready to interface to other things in the future - hey, go for it (you know - smk - you and I have *had* this discussion so many times, so so so many times)

People do not in general use http for 'batch', in fact, I don't really know of anyone. That is like asking "who is using tcp/ip for batch".

I'm going to give up responding to this one - we (you and I directly) have had this discussion over and over and over again. The answer will always - ALWAYS be "http is not a transactional protocol, you would have to build something on top of it - but then IT IS NOT HTTP ANYMORE, it would be your own proprietary protocol. Fine, go for it, bad idea given that there are standards and software libraries to do it better already - but *go for it*"

2) you have the ability to catch, detect, log, handle errors. There is a big difference between using a documented database API and flinging URLs - unless and until you build a big API around your URL flinging, and by then - you know what? You've just built yourself an API - it is *not* http anymore. http is just the transport (like the US mail - I do have certified mail, I can track mail, I get mail back if it bounces, etc.... those are all higher level protocols on top of the standard "here is an envelope and a stamp").

3) most people - assuming the layout is:

internet <-> server A ..... server B

would have:

internet <-> server A <- server B

You want to have

internet <-> server A <-> server B

that is, B can initiate a conversation to A, but A cannot initiate to B - because you see, A is exposed - it is a stepping stone to B.

I've said it over and over - exactly the same way - to you - on this site...

Most people would not do that - making A a stepping stone is making A a very weak link.

Reader, September 29, 2008 - 4:37 pm UTC

At the NYOUG conference held on Sept 9th 2008, you showed some examples of date field being stored as string/number. Can you please let me know if that presentation slide is on your website?

Tom Kyte
September 29, 2008 - 5:08 pm UTC

click on files tab and look for (2nd page currently if you sort by created on)

table design

sam, September 29, 2008 - 9:26 pm UTC


If you are creating an app for application-user password management and you want to store the values of these attributes to be used by the authentication functions,

would you create one table with 6 columns (password_life_time number(3), password_grace_time number(2), ...)
and a row for the values,

OR would you create a two-column table (password_variable, password_value) and enter those as variables?


Which is better design?
Tom Kyte
September 29, 2008 - 10:07 pm UTC

I wouldn't - I'd use the builtin capabilities.

You are free to model your data however best meets your needs.

I would not use an EAV (entity attribute value) model when I knew what attributes I wanted - they belong as columns.
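A side-by-side sketch of the two shapes being discussed - the table and column names are invented. The column version can be typed and constrained; the EAV version cannot:

```sql
-- columns: one typed, constrainable row per policy
create table password_settings
( password_life_time   number(3) not null,
  password_grace_time  number(2) not null,
  password_reuse_max   number(3) not null
);

-- EAV: every value degenerates to a string, and nothing stops
-- a missing, duplicated or misspelled "attribute"
create table password_settings_eav
( password_variable  varchar2(30) primary key,
  password_value     varchar2(100)
);
```

With the column version, `password_life_time` can never be 'ninety days' or absent; with the EAV version, both are possible and every read needs a lookup-and-convert.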

db design

A reader, September 30, 2008 - 11:57 am UTC


Yes, I will use the built-in capabilities if we migrate to one DB account per user. Otherwise, if we continue with application accounts (one big application model), I have to do something similar to what the DB has. I know it is reinventing the wheel.

<I would not use an EAV (entity attribute value) model when I knew what attributes I wanted - they belong as columns>

But I did not quite understand your comment. I think you are suggesting a table column for each attribute. But aren't the attributes always known in both cases? Even if you create "variables", you know these could be columns.

I am trying to see the cases where you would use variable/value versus columns.

Tom Kyte
September 30, 2008 - 1:03 pm UTC

... I am trying to see the cases where you use variable/value versus column. ...

I don't do that, that is what I'm saying.

do not use the EAV for storing what properly belongs in columns - you asked

would you create one table with 6 columns (password_life_time number(3),password_grace_time
number(2), ...)
and a row for the values

OR you would create a two column table (password_variable, password_value) and enter those as

and I said "COLUMNS" - not the EAV stuff.


A reader, October 06, 2008 - 9:45 pm UTC


Does this mean that you never use the EAV structure in your databases?

For example, to store different system control variables (i.e. number of rows displayed on a screen, number of days to display records within), do you always create a table with one column per attribute?
Tom Kyte
October 07, 2008 - 1:15 pm UTC

I did it once, when I myself invented it (all developers invent this model, over and over and over and over again..... It is just "so cool")

It was almost 20 years ago...

for something as trivial as what you state - system control variables - it would be

a) ok, since it would not really affect overall performance
b) a bad idea since your system control parameters are well known at compile time - you do not add them on the fly - your program is EXPECTING them to be there.

So here - it would "work", but "so what" - why would it be beneficial?

Do I use EAV? Yes, every day. I use application contexts - so I do. But, for my real data? For stuff I persist? No, I do not. I use attributes (flex attributes if needed) so I can index and substr/cast in views to get the right types/lengths and what not...

Design question

just reader, December 17, 2008 - 1:15 pm UTC


I am trying to redesign a data model and I need your advice on how best to approach it.

The table names are obviously fictitious, but the semantics are the same. I am trying to merge the mgr and emp tables, since all other attributes are identical, without breaking the many-to-many relationship.

The relationship has to be maintained: one employee can report to many managers, and a manager can have many employees.
Could you hint at the better way to accomplish this? I am pretty sure you've seen this before, but I never have.

create table emp (empid number primary key, empname varchar2(10));
create table mgr (mgrid number primary key, mrgname varchar2(10));
create table emp_mgr (empid number, mgrid number);

insert into emp values ( 1, 'john' );
insert into emp values ( 2, 'kevin' );
insert into emp values ( 3, 'david' );

insert into mgr values ( 1, 'king' );
insert into mgr values ( 2, 'alex' );

insert into emp_mgr ( empid, mgrid ) values ( 1, 1 );
insert into emp_mgr ( empid, mgrid ) values ( 1, 2 );
insert into emp_mgr ( empid, mgrid ) values ( 2, 2 );
insert into emp_mgr ( empid, mgrid ) values ( 3, 2 );
insert into emp_mgr ( empid, mgrid ) values ( 3, 1 );
Tom Kyte
December 29, 2008 - 12:30 pm UTC

Since managers are employees - it doesn't make sense to me to have a separate manager table.
create table emp
( empid number primary key, 
  .... whatever ...
);

create table emp_mgr
( empid number references emp,
  mgrid number references emp,
  primary key(empid, mgrid)
);
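With that model, listing each employee alongside each of their managers is a self-join through the association table - a sketch, assuming emp keeps an empname column as in the original posting:

```sql
-- each employee with each of their managers
select e.empname as employee,
       m.empname as manager
  from emp     e
  join emp_mgr em on em.empid = e.empid
  join emp     m  on m.empid  = em.mgrid
 order by e.empname, m.empname;
```

The composite primary key on (empid, mgrid) also guarantees the same reporting line cannot be recorded twice.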

re-Design question

daily reader, December 18, 2008 - 11:54 am UTC


Sorry, I noticed a typo in my posting above, in the create mgr table command. Here is the correct one.

create table mgr (mgrid number primary key, mgrname varchar2(10));

Table design question

Salman Syed, February 05, 2009 - 4:09 pm UTC


Here is a stripped-down version of the problem I am facing.

There is an entity opportunity that is associated with one and only one process. A process can have 3-6 phases.

So we have three tables

1. Opportunity (opportunity_id, <attributes>, process_id)
2. Process (process_id, process_name)
3. Process_phase (process_phase_id, phase_name, phase_number)

We have to record the progress of each opportunity as to what the status of each phase is for the process that the opportunity is tied to. Each phase can have one of 4 statuses ('C', 'P', 'N', 'U')

One way would be to have a table Opportunity_process_phase which has the following columns:

(Opportunity_id, process_phase_id, completion_status). It will be an IOT with opportunity_id and process_phase_id as primary key.

The second option would be to have extra columns in the opportunity table (phase_1_status, phase_2_status, phase_3_status, phase_4_status, phase_5_status, phase_6_status) and just store the information over there.

Most of the queries will want the phase completion status data as columns, so if we store it in a separate table the queries will require pivoting.

Which approach is better in your opinion? Or is there another approach you would recommend?
Tom Kyte
February 05, 2009 - 4:46 pm UTC

something is missing to tie process with process_phase isn't there?

today a process has 3-6 phases
tomorrow it will be seven.

I like the parent/child relationship

Performance issue

Salman Syed, February 05, 2009 - 7:06 pm UTC


My bad - there should have been a process_id column in the Process_phase table description.

OK, that is what I have as well. However, I am facing some performance problems. The most frequently asked question is: give me a list of opportunities and, along with each opportunity, the status of each phase for that opportunity.

So we have to join opportunity to the opportunity_process_phase table and then pivot. Instead of pivoting, one option I am considering is creating a function-based index to give a comma-separated list of statuses and using that on the front end. That would eliminate the join as well as make query writing much simpler.

What is your opinion about that approach?

Tom Kyte
February 05, 2009 - 8:11 pm UTC

you cannot do a function based index.

A function based index must be on a deterministic function, yours would NOT be.

Are you solving a problem that is "hypothetical". Why are you so afraid of a join? Joins are what RDBMS's do for a living.

Have you actually benchmarked this? You are talking about a join from opp to opp_phase_process (an IOT - 3, maybe 4, IOs to pick up all of the related data).
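For reference, the join-plus-pivot the question is worried about is a classic conditional aggregation (which works on 9i, since the PIVOT clause only arrived in 11g) - a sketch against the tables as described, assuming phase_number runs 1..6:

```sql
-- one row per opportunity, one column per phase status
select o.opportunity_id,
       max(decode(pp.phase_number, 1, opp.completion_status)) as phase_1_status,
       max(decode(pp.phase_number, 2, opp.completion_status)) as phase_2_status,
       max(decode(pp.phase_number, 3, opp.completion_status)) as phase_3_status
       -- ... and so on through phase 6
  from opportunity o
  join opportunity_process_phase opp on opp.opportunity_id = o.opportunity_id
  join process_phase pp on pp.process_phase_id = opp.process_phase_id
 group by o.opportunity_id;
```

Wrapping this in a view would give the front end its columnar shape without denormalizing the base tables.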

db design

A reader, February 11, 2009 - 11:33 am UTC


Would a one-table design make sense to you in this case, and what problems do you see in it?

Vendor A will make shipments and key them into a web form (or send an XML file).
Vendor B will receive the shipment and log it as received and/or enter a comment about a bad CD/DVD.
Vendor A will send an invoice for that shipment.

The customer wants to see the shipment, receipt and invoice in one table for easier access, and to approve or disapprove payment.
I see each as a different entity requiring a separate table (even if it is one to one).
A view would serve his purpose for reporting and viewing.
Tom Kyte
February 11, 2009 - 1:11 pm UTC

customer wants an application to display and interact with data.

customer doesn't care about the tables.

you, as a developer, should implement the physical design that meets the requirements.


A reader, February 11, 2009 - 4:37 pm UTC


Actually, they want to use their ad hoc query tool (not an application) to query the data, export it to a spreadsheet, and do their calculations.

Does the one-table design still make sense?
Tom Kyte
February 12, 2009 - 10:49 am UTC

that is what a view would be used for then.

always do the design right, if you have the right physical design - any question the end users would want to ask will be answerable.

I'm not saying this should use N tables or 1 table - frankly, I didn't look at the system you are building - I'm just saying:

no matter what they say, you will use the correct physical schema design. You can, from that physical schema, materialize any data they desire - data that is clean, correct and has data integrity.


A reader, February 16, 2009 - 2:44 pm UTC


If you have a columnar spreadsheet:
(1) Agency (2) Product (3) Ship date (4) No. of items (5) Ship from (6) Ship to (7) Ship method (8) BOL no (9) Shipment comment
(10) Receipt date (11) Receipt comment (12) Invoice no (13) Invoice date (14) Invoice amount

Supplier fills out 1-9 (high-level data for a shipment)
Production fills out 10-11 (receipt data)
Customer fills out 12-14 (invoice data)

The need is simple. Just basic insert and update of data so customer can see shipments made and validate receipts before payments are made.

if you have one to one mapping would you go with one table for all?
If you have one-to-many between would you go with 3 tables: shipment, receipt, and invoice?
Tom Kyte
February 16, 2009 - 5:22 pm UTC

smk, how many designs have I done for you personally..

I've written many times - if you have a series of 1-1 relationships, you have a single table. Breaking it up into multiple tables is not useful.

I think that you and I have actually had this discussion, about breaking a bunch of 1-1 relationships up and my thoughts that "that is a bad idea, single table good"

Now that said, there are lots of other missing tables in this 'design', like the customer table, the supplier table, other lookups and such.

As for "If you have one-to-many between would you go with 3 tables: shipment, receipt,
and invoice?"

well, you didn't tell me who is one-to-many to whom - it could be they are all one-to-many with each other, and hence there would be more than three tables. Insufficient data

But please - don't ask unless there is a one to many...

db design

sam, February 22, 2009 - 7:06 pm UTC


All I need to track is that a supplier made a shipment of products and the other vendor received it in full and without any problems.

The supplier now ships one product "DVD-1" size (1 G bytes) but there "might" be additional
different sizes later on (DVD-2, DVD-3).

I need to track the shipment master data (i.e ship date, ship from, ship via)
and product details (product_Id, Qty).

A production vendor that copies the files to blank DVDs will receive the shipment and enter the date received, comments,
and quantity received for each shipment made. Mostly it is one-to-one (a shipment will be received once),
but there is a possibility that a shipment may get received on two different dates.

The supplier can enter on a web page that he shipped 1000 DVD-1s today. The production vendor goes to the web page and marks them received, with some comments, when he receives them. That is all I need to know. However, he may get 600 today and 400 tomorrow, so there may be two receipts.

Since the requirement is to come up with a simple design, and some columnar screen to be used by the supplier and vendor, is there a way other than creating SEVEN tables? One table (a spreadsheet design) will result in redundant data.

Is this how you would design the DB?

Shipment_Master(shipment_id,shipment_date, shipped_via,BOL_NO, comments)
Suppliers(suppliar_id, supplier_name,...)
Products(product_id, name,...)

Would you also do some kind of columnar screen (like a spreadsheet) or a form layout?



A reader, March 10, 2009 - 11:32 pm UTC

I have two tables and one job. A video can have the same number in different media.

When a video is assigned to a vendor, a record is created in VPROD with stage "PR".
Then, when the vendor ships it, he updates the ship date of that record.

The Oracle job runs every Monday at 7:00 PM. It meets a requirement to generate a text report of all the videos shipped by the vendor in the previous week.

select vidno from video a, vprod b where a.vidno=b.vidno and media = 'DVD' and b.stage='CP'
and ship_date between sysdate-7 and sysdate



The problem happens when a vendor enters shipment dates for videos he shipped last week after the job runs on Monday (between 8 PM and 12 PM, or during the next few days).
These will never be included in the shipped report generated on Monday. What is the best way to get around this?
I am thinking of creating a per-media flag in VIDEO and marking it when the video is printed in the report using UTL_FILE.

Then changing the query to this - is this a good idea, or are there better ways?

select vidno from video a, vprod b where a.vidno=b.vidno and media = 'DVD' and b.stage='CP'
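A sketch of the flag-based approach - here the flag lives on the VPROD row itself (the column name reported_flag is hypothetical), so a row is picked up whenever its ship date finally arrives, and is marked once it has been reported:

```sql
-- report query: everything shipped but not yet reported, whenever it was entered
select vidno
  from video a, vprod b
 where a.vidno = b.vidno
   and media = 'DVD'
   and b.stage = 'CP'
   and b.ship_date is not null
   and b.reported_flag is null;

-- after the report file is written successfully, in the same job:
update vprod
   set reported_flag = 'Y'
 where stage = 'CP'
   and ship_date is not null
   and reported_flag is null;
```

Driving the report off the flag instead of a sysdate window makes the job restartable and catches late-entered ship dates on the next run.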

Facts table with invalidable data

Eduard, May 19, 2009 - 1:53 pm UTC

Hello Tom,

First of all thanks for your patience!

I was thinking about how to design a database with a facts table, by date, where we calculate some values that could become invalid because of master data changes.

Keeping track of the master data changes is the first point. But then how can we control which data was calculated before the master data change and which after, why the value has changed, etc.?

The facts table would be used as the source for some drill-down reports (probably using materialized views)

Something like best practices for maintaining facts table with invalidable data...

For example let's focus in the next scenario.

As master data we have InvoiceItems, Products, Customers and Conditions.

-An InvoiceItem has an InvoiceItemId, a ProductId, a CustomerId, a DueDate, and a GrossValue.

-A Condition has a PercentageToApply, DateFrom, DateThru, a set of product ids and a set of customer ids

We want to calculate incurred amounts (facts table).

BatchId, DueDate, ProductId, CustomerId, ConditionId, InvoiceItemId, IncurredAmount

For example if the input data is:

InvoiceItemId, ProductId, CustomerId, DueDate, GrossValue
Inv1, PrdA, CustA, 17/05/2009, 200$
Inv2, PrdA, CustB, 18/05/2009, 300$

ConditionId, PctToApply, DateFrom, DateThru, set of product ids, set of customer ids
Cond1, 3%, 01/05/2009, 15/05/2009, {PrdA, PrdB}, {CustA, CustB}
Cond2, 5%, 15/05/2009, 31/05/2009, {PrdA, PrdB}, {CustA}

The output data (facts table) would be:

BatchId, DueDate, ProductId, CustomerId, ConditionId, InvoiceItemId, IncurredAmount
Batch1, 17/05/2009, PrdA, CustA, Cond2, Inv1, 10$

If each day we receive new InvoiceItems, and new Conditions we will keep adding rows in our IncurredAmounts table.

But what can we do if our existing Conditions can be modified - for example, the percentage or the date from? Let's imagine Products and Customers have a parent-child hierarchy by date, a single Condition could apply to a whole hierarchy, hierarchies could change, etc.

We would like to create some drill-down reports (maybe using materialized views) using the IncurredAmounts data. The reports should take the date into account. I mean, they must show the data (and the hierarchies) as it was on that date, not the current values.

The question is how can we design our tables and processes in order to avoid recalculating too many rows?
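For reference, the calculation described above could be sketched like this (all table and column names are assumptions based on the example data; the sets of product ids and customer ids are assumed to live in association tables):

```sql
-- Hypothetical sketch: compute incurred amounts by matching each invoice
-- item to the condition whose date range covers the item's due date
select i.due_date, i.product_id, i.customer_id,
       c.condition_id, i.invoice_item_id,
       i.gross_value * c.pct_to_apply / 100 as incurred_amount
  from invoice_items i
  join conditions c
    on i.due_date between c.date_from and c.date_thru
 where exists (select null from condition_products p
                where p.condition_id = c.condition_id
                  and p.product_id   = i.product_id)
   and exists (select null from condition_customers cu
                where cu.condition_id = c.condition_id
                  and cu.customer_id  = i.customer_id);
```

If each fact row also records which condition (and which version of it) produced the amount, a condition change only invalidates the fact rows joined to that condition, which limits how much must be recalculated.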



A reader, October 29, 2009 - 10:26 pm UTC


I have a master table BOOK with a derived Y/N column S1_APPROVED. Initially, if no review has been done, it is set to NULL. If the review approves, it is set to 'Y', and if the review rejects, it is set to 'N'.

One query is to list all books that are waiting for S2 approval, which is
select * from book where s1_approved = 'Y' and (s2_approved is null or s2_approved = 'N')

Another query is to list books that are waiting for S1 approval, which is
select * from book where s1_approved is null or s1_approved = 'N'

Is it normal to use the value NULL, since NULLs are not indexed and the above query might do a full scan?

The table is new, so there are not a whole lot of records yet, but I expect about 500 records per month.

What is a good common practice for this?
Would you use 'U' instead of NULL, or default it to 'N' and only have Y or N? My understanding is that an index won't help anyway, since the column has low cardinality.

bkno        number(10) PK,
title       varchar2(100),
author      varchar2(30),
s1_approved varchar2(1),
s2_approved varchar2(1)

Tom Kyte
October 30, 2009 - 10:11 am UTC

... Is it normal to use the value NULL since NULLS are not indexed and the above
query might do a full scan ...

bzzzt wrong, very wrong. totally wrong.

but anyway, instead of

select * from book where s1_approved is null or S1_approved = 'N'

you would

ops$tkyte%ORA10GR2> create table t ( x varchar2(1), data char(200) );

Table created.

ops$tkyte%ORA10GR2> insert into t
  2  select decode( mod(rownum,100), 0, 'N', 1, null, 'Y' ), 'x'
  3    from all_objects;

50375 rows created.

ops$tkyte%ORA10GR2> create index t_idx on t(x,0);

Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns x size 254' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where x is null union all select * from t where x = 'N';

Execution Plan
Plan hash value: 708709259

| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT             |       |   992 |   197K|    55  (42)| 00:00:01 |
|   1 |  UNION-ALL                   |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T     |   575 |   114K|    32   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T_IDX |   575 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T     |   417 | 85068 |    23   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T_IDX |   417 |       |     2   (0)| 00:00:01 |

Predicate Information (identified by operation id):

   3 - access("X" IS NULL)
   5 - access("X"='N')

ops$tkyte%ORA10GR2> set autotrace off
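A note on why the index above is created on (x, 0) rather than just (x): Oracle B*Tree indexes do not store entries whose key is entirely NULL, so an index on X alone can never answer "X IS NULL". Appending a constant second column makes every key non-null, so every row - including those with NULL in X - gets an index entry. Illustrative sketch (index names here are just for contrast):

```sql
-- Index on X alone: rows where X is null are simply absent from the
-- index, so "where x is null" cannot use it.
create index t_only_x on t(x);

-- Index on (X, 0): the constant guarantees no key is entirely null,
-- so "where x is null" can be answered with an index range scan,
-- as the plan's access("X" IS NULL) predicate shows.
create index t_x_const on t(x, 0);
```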

A reader, October 30, 2009 - 10:33 am UTC

Excellent Reply !!!

data sharding

James Su, October 30, 2009 - 4:39 pm UTC

Hi Tom,

There's a question in your blog:

Wollaton DBA said....
Hi Tom,

I came across a web site this weekend which details architectures for scaling web 2.0 social network web sites. A common practice in scaling the backend appears to be the use of "data sharding". Is this one of the worst design practices being rolled out right now?

As soon as I saw this, I thought I'd love to get Tom Kyte's view on it, as he is always quite passionate about avoiding the "take advantage of nothing" approach when using databases.

Off the top of my head, it appears to me that this falls apart if you need to aggregate data above the 'shard' layer. It also seems to me that there is the potential for reinventing, in the application tier, all the good stuff that is available in most database offerings.
When most of the major players in the RDBMS market have spent the best part of the last 20 years or so refining and honing their products, why would anyone think they could do a better job on one project?

I look forward to getting your take on this.


Mon Jan 19, 07:15:00 AM EST

What is your opinion about data sharding? Thanks.
Tom Kyte
November 09, 2009 - 11:46 am UTC

... this falls apart if you need to
aggregate data above the 'shard' layer...

correct, it works well for a very specific type of application - and not much above that.

.. It also seems to me that there is the
potential for reinventing all the good stuff that is available in most database
offerings in the application tier. ...

again, true.

... why would anyone
think they could do a better job on one project ?.

for a very specific type of application, on the level of a facebook, certain parts of yahoo, a shopping cart at amazon - they are able to do well without an rdbms. However, what do you think even these systems use when you get to ordering a product, using your credit card to perform a transaction and so on? To pay their people? To manage them?

There is more than one tool out there because there is more than one job. And most people are not building facebook or twitter (thank goodness, hope some of them go the way of myspace - oblivion - but some of them are neat like reddit)

Unless and until I had the problem (of a facebook/twitter - where the data isn't mission critical and is very simple), and didn't need things like transactions, concurrency, security, flexibility and all that - I wouldn't really be looking at them.

11g rocks

Mike Kutz, November 03, 2009 - 9:58 am UTC

Oracle 11g should be able to do it better.

Hash Partitioning on "master" table spreads out the data.
REF Partitioning on the "detail" tables groups the relevant data together.
ASM comes into play here somewhere
RAC for load balancing
RAC for redundancy
RAC for easy "add-a-node to increase performance"
RMAN for hot-backups of the data
SET AUTOTRACE ON to figure out where your serialization is.

and you get:
the ability to do one query to pull out all the data across all "shards".
very good uptime
and the ability to sleep at night.
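A rough sketch of the hash + REF partitioning combination from the list above (table names, columns and the partition count are illustrative only):

```sql
-- "master" table hash-partitioned to spread the data across partitions
create table master (
    id   number primary key,
    name varchar2(100)
)
partition by hash (id) partitions 8;

-- "detail" table inherits the parent's partitioning via REF partitioning
-- (11g): each detail row is equipartitioned with its master row, which
-- keeps the related data physically grouped together
create table detail (
    id        number primary key,
    master_id number not null,
    payload   varchar2(4000),
    constraint detail_fk foreign key (master_id) references master
)
partition by reference (detail_fk);
```

Note that REF partitioning requires the foreign key column to be NOT NULL, since each child row must map to exactly one parent partition.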

application dilemma.. - need advice.

A reader, February 18, 2010 - 3:43 am UTC

hi tom,

i am on 10g / linux.

- my current schema design serves a few different providers.
- each provider has different requirements and needs

It is a tough job generalizing the whole schema and yet serving the individual differences among the providers.

Now I have this problem:
when we provision a "customer" into the system,
a few tables must be filled and several procedures will be invoked.

Procedure#1 (para1, para2, para3)

Procedure#2 (para4, para5, para6)

Procedure#3 (para7, para8, para9, etc.)

etc etc

Some providers will require invocation of
P#1 and P#2

and some providers will require invocation of
P#1 and P#3

some will need P#1, 2, 3

Currently the invoking of these procedures is done externally by the WEB tier.
However, I am trying to consolidate this subscriber provisioning process into the database..

So i thought -- create a main procedure for each provider..



but I realize that, at the end of the day, I would have tons of provisioning procedures, one for each different provider..

So I thought I would do this..

-- looks ok

but then again, if I were to use the earlier design, I would have to cram all the parameters together, which would be so long ..

PROVISION_USER_PROVIDER(PROVIDER_ID, para1, para2, para3, para4, para5, para6, para7, para8, para9) -- So long~
@p1 (para1, para2, para3)
@p2 (para4, para5, para6)
@p3 (para7, para8, para9)

q1) Have you ever encountered this? What is your resolution?

help gurus!

Tom Kyte
February 18, 2010 - 9:35 am UTC

use named parameter notation

create procedure abc( provider_id in ...,
                      p1 in ... := null,
                      p2 in ... := null,
                      p3 in ... := null,
                      p9 in ... := null )
as
begin
   if provider_id in ('xxx','yyy') then
      proc1( p1, p2, p3 );
      proc2( p4, p5, p6 );
   end if;
   if provider_id in ('yyy') then
      proc3( p7, p8, p9 );
   end if;
end;

and call it:

abc( 'xxx', p1 => ?, p2 => ?, p3 => ?, p4 => ?, p5 => ?, p6 => ? );

abc( 'yyy', p1 => ?, p2 => ?, p3 => ?, p4 => ?, p5 => ?, p6 => ?, p7 => ?, p8 => ?, p9 => ? );

and so on.

I frankly don't care how "long" (how many) the inputs are to a procedure. What does that have to do with doing what you have to do?

application dilemma.. - need advice.

A reader, February 18, 2010 - 11:34 pm UTC

hi tom,

it matters to my IT manager a lot, because I have to admit a long parameter list looks daunting.. and confusing sometimes..

but nevertheless, I will heed your advice.

one instance

A reader, March 15, 2010 - 11:27 am UTC


Did you write any articles, here or in your books, on the pros/cons of having one database versus several databases?

In what situations would you consider separate instances - for workload/performance reasons, or for different applications like HR and payroll or a website? Or do you just stick everything into one DB as separate schemas all the time?

We already have 3-4 9i databases on the same server with links in between. Someone brought up the issue of merging them in the 11g migration, but that may require a lot of code changes (removing the links) and testing.

Tom Kyte
March 15, 2010 - 11:38 am UTC

... Did you write any articles here or in your books on the pros/cons of having one
database versus several databases. ...

I write that all of the time.

If you want more than one production database on a single machine, you should configure multiple hosts for that single machine (virtualization)

How about this DESIGN?

Parthiban Nagarajan, October 22, 2010 - 4:33 am UTC

Hi Tom

May I have your comments about the following database analysis?
What do you think about this kind of metadata design?
Will it perform well in the long run?

SQL> col data_type format a20
SQL> col col_name format a30
SQL> col column_name format a30
SQL> set pages 40
SQL> select char_length,count(*)
  2    from user_tab_columns
  3   where data_type='VARCHAR2'
  4     and table_name like 'SI%T'
  5   group by char_length
  6   order by 1 desc;

CHAR_LENGTH   COUNT(*)
----------- ----------
       4000       9775
       2000          2
       1000         45
        500          8
        300         14
        255          3
        200          1
        120          6
        100       1261
         80          8
         60          2
         50          4
         40          2
         30         10
         20          3
         10          5
          6          1
          1       1269

18 rows selected.

SQL> select count(*) from user_tables;


SQL> select count(*) from user_tables where translate(table_name, 'A0123456789_', 'A')  = 'SIT';


SQL> select count(*) from user_tables where translate(table_name, 'A0123456789_', 'A') != 'SIT';


SQL> select constraint_type, count(*)
  2    from user_constraints
  3   group by constraint_type;

C   COUNT(*)
- ----------
R         11
U         25
P         62
C        316

SQL> select count(*) from user_tables
  2   where not exists (select null from user_constraints
  3                      where table_name = user_tables.table_name);


SQL> select column_name, data_type, data_length, count(*)
  2    from user_tab_columns
  3   where column_name = 'XML'
  4      or column_name = 'COMMENTS'
  5   group by column_name, data_type, data_length;

COLUMN_NAME                    DATA_TYPE            DATA_LENGTH   COUNT(*)
------------------------------ -------------------- ----------- ----------
XML                            VARCHAR2                    4000        307
COMMENTS                       VARCHAR2                    2000          2
XML                            VARCHAR2                    2000          2
COMMENTS                       CLOB                        4000          8
COMMENTS                       VARCHAR2                    1200          3
XML                            VARCHAR2                    1200          2
COMMENTS                       VARCHAR2                    4000        359
XML                            CLOB                        4000          3

8 rows selected.

SQL> select column_name, count(*)
  2    from user_tab_columns
  3   group by column_name
  4  having count(*) > 2500
  5   order by 2 desc;

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
CREATED_BY                           3213
CREATION_DATE                        3209
METRIC_ID                            2928
PROCESS_INSTANCE_ID                  2917
INSTANCE_ID                          2847
LATEST_FLAG                          2801
INSTANCE_REC_NUM                     2798
PROCESS_FLOW_STATUS                  2792
METRIC_COMPLETED_DATE                2791
METRIC_RUN_DATE                      2791
MULTIROW_GROUP_NAME                  2790
ELECTRONICALLY_SIGNED                2790
MULTIROW_REGION_ID                   2788

13 rows selected.

SQL> select translate(column_name, 'A0123456789', 'A') col_name, data_type, count(*)
  2    from user_tab_columns
  3   where data_type   = 'VARCHAR2'
  4     and data_length = 4000
  5     and (column_name like '%CUSTOM%'
  6      or column_name like '%COL%'
  7      or column_name like '%NULL%'
  8      or column_name like '%DUMMY%')
  9     and column_name not like '%COLOR%'
 10   group by translate(column_name, 'A0123456789', 'A'), data_type
 11  having count(*) >= 15
 12   order by 3 desc;

COL_NAME                       DATA_TYPE              COUNT(*)
------------------------------ -------------------- ----------
DUMMY                          VARCHAR2                   1802
COL                            VARCHAR2                   1154
CUSTOM_FIELD                   VARCHAR2                    873
CUSTOM_TEXT                    VARCHAR2                    698
CUSTOM_COL                     VARCHAR2                    141
NULL                           VARCHAR2                    106
CUST_COL                       VARCHAR2                    103
DUMMY_VAR                      VARCHAR2                     70
CUSTOM_COLUMN                  VARCHAR2                     58
CUSTOM_ISSUE                   VARCHAR2                     55
CUSTOM_TEST_RESULTS            VARCHAR2                     55
CUSTOM_ACTION                  VARCHAR2                     55
CUSTOM_INFORMATION_LOV         VARCHAR2                     48
CUSTOM_COLUMN_                 VARCHAR2                     42
CUSTOM                         VARCHAR2                     40
COLUMN                         VARCHAR2                     38
DUMMY_VARCHAR                  VARCHAR2                     30
CUSTOM_ISSUES_MR               VARCHAR2                     24
CUSTOM_FIELD_                  VARCHAR2                     24
COMMENTS_COLUMN                VARCHAR2                     23
TEST_CUSTOM                    VARCHAR2                     21
CUSTOM_MR                      VARCHAR2                     20
AE_CUSTOM                      VARCHAR2                     18
CUSTOM_DUMMY                   VARCHAR2                     18
ACTION_ITEM_CUSTOM_FREE_TEXT   VARCHAR2                     18
DUMMY_                         VARCHAR2                     16

27 rows selected.

SQL> select table_name, column_name
  2    from user_tab_columns
  3    join user_tables
  4   using (table_name)
  5   where instr(column_name, 'SQL') <> 0;

TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------
SI_METRICS_T                   METRIC_SQL
SI_100135_T                    INFOLET_SQL
SI_100133_T                    SQL_TEXT
SI_100131_T                    SQL_HASH
SI_100115_T                    METRIC_SQL
SI_100102_T                    SQL_STATEMENT

8 rows selected.

SQL> select count(*), count(metric_sql),
  2         sum(case when instr(lower(metric_sql), 'select') <> 0 then 1 else 0 end) as select_cnt,
  3      sum(case when instr(lower(metric_sql), 'from'  ) <> 0 then 1 else 0 end) as from_cnt,
  4      sum(case when instr(lower(metric_sql), 'where' ) <> 0 then 1 else 0 end) as where_cnt
  5    from si_metrics_t;

  COUNT(*) COUNT(METRIC_SQL) SELECT_CNT   FROM_CNT  WHERE_CNT
---------- ----------------- ---------- ---------- ----------
      1388              1111       1111       1111        838

SQL> select column_name, count(*)
  2    from user_indexes
  3    join user_ind_columns
  4   using (index_name)
  5   group by column_name
  6  having count(*) > 30;

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
PROCESS_INSTANCE_ID                  1269
INSTANCE_ID                          2519
METRIC_COMPLETED_DATE                1209

SQL> select substr(argument_name,instr(argument_name, '_')+1) col_name, data_type, in_out, count(*)
  2    from user_arguments
  3   where instr(argument_name, 'ERROR') <> 0
  4   group by substr(argument_name,instr(argument_name, '_')+1), data_type, in_out
  5   order by 3, 4;

COL_NAME                       DATA_TYPE            IN_OUT      COUNT(*)
------------------------------ -------------------- --------- ----------
DB_ERROR_CODE                  NUMBER               IN                 1
ERROR_TYPE                     VARCHAR2             IN                 1
ERROR_TEXT                     VARCHAR2             IN                 1
ERROR_HANDLE_ID                NUMBER               IN                 1
ERROR_SOURCE_ID                NUMBER               IN                 1
ERROR_NAME                     VARCHAR2             IN                 1
ERROR_SOURCE_NAME              VARCHAR2             IN                 1
ERROR_TIME                     DATE                 IN                 2
ERROR_CODE                     NUMBER               IN                 3
ERROR_MESSAGE                  VARCHAR2             IN                 5
ERROR_TYPE                     VARCHAR2             IN/OUT             1
ERROR_SEQUENCE                 NUMBER               IN/OUT             1
ERROR_DETAILS                  VARCHAR2             IN/OUT             2
ERROR_CODE                     VARCHAR2             IN/OUT            77
ERROR_SEQ                      NUMBER               IN/OUT           101
ERROR_HANDLE_ID                NUMBER               IN/OUT           103
ERROR_CODE                     NUMBER               IN/OUT           658
ERROR_MESSAGE                  VARCHAR2             IN/OUT           732
ERROR_NUMBER                   NUMBER               OUT                1
ERROR_TEXT                     VARCHAR2             OUT                4
ERROR_MSG                      VARCHAR2             OUT                5
ERROR_HANDLE_ID                NUMBER               OUT               13
ERROR_CODE                     VARCHAR2             OUT               17
ERROR_MESSAGE                  VARCHAR2             OUT              636
ERROR_CODE                     NUMBER               OUT              680

25 rows selected.

SQL> spool off;

Tom Kyte
October 25, 2010 - 5:23 pm UTC

so most string attributes are varchar2(4000), seems highly unlikely to be true. Most likely a sign that no one gave any thought to what the underlying data really meant.

the table names are all meaningless, SI_A54153124_T or something like that. Not nice.

1900 tables, 62 primary keys... As Spock would say, fascinating.

lots of XML columns, implying no thought to storing data again.

27 column names for 1900 tables. The most popular being "dummy" (hah, I see a joke hidden in there)

sql stored in a table, ugly.

I'd be afraid. I cannot comment about performance since we don't know what the schema really looks like and what the system needs to do

but I do know I'd not really want to touch the schema as it exists - it would appear that not much thought, if any, was given to the design.

RE: How about this DESIGN?

Parthiban Nagarajan, October 28, 2010 - 3:05 am UTC

Hi Tom

Thanks for your comments

(1) You wonder about the VARCHAR2(4000) columns.
I just wonder about the other VARCHAR2 columns.
Because we create the tables with an internal tool (a Java desktop application).
It has no provision for specifying the sizes of the columns.
The tool was created with flexibility in mind, I guess.
Today, the user may enter only a few chars.
Think of tomorrow. What if the user enters thousands of chars?!!
The same goes for the NUMBER datatype too.
Please see the following.
    SQL> select data_length, data_precision, data_scale, nullable, count(*)
      2    from user_tab_columns
      3   where data_type = 'NUMBER'
      4   group by data_length, data_precision, data_scale, nullable
      5   order by count(*) desc;

    DATA_LENGTH DATA_PRECISION DATA_SCALE N   COUNT(*)
    ----------- -------------- ---------- - ----------
             22                           Y      36251
             22              4          0 Y        223
             22                           N        173
             22              4          1 Y         95
             22             20          0 Y         31
             22              3          0 Y         26
             22              1          0 Y         23
             22              2          0 Y         20
             22             10          0 Y         17
             22                         0 Y         15
             22              5          0 Y          6
             22             38          0 Y          3
             22              6          0 Y          2
             22             10          3 Y          2
             22             10          0 N          2
             22             22          0 Y          2
             22             16          0 Y          2

    17 rows selected.

You might wonder about the 1st record.
I wonder about the rest of the records. :)

(2) Yes, you are correct - well, almost.
The table names look like SI_1xxxxx_T.
The table-creating tool generates the table names from a sequence starting at 100000.
But we do not refer to those table names in the stored procedures as-is.
We create a view on top of each SI_1xxxxx_T table, giving it a meaningful name (thank god, I say).
And not even SELECT * FROM SI_1xxxxx_T, but SELECT <all_col_names> FROM SI_1xxxxx_T.
I should really thank them for not creating PUBLIC synonyms.
(In my previous company, we used to create a PUBLIC synonym for almost every database object.)

(3) No, we do not only store complete SQL statements in a table.
We also have SQL expressions (parts of WHERE clauses), stored in a separate table, to accompany the SQL statements.
    SQL> select substr(lower(ltrim(expression, '(')), 1, 5) as exp, count(*)
      2    from si_XXXXX_t  -- Just Masked the Table Name
      3   group by substr(lower(ltrim(expression, '(')), 1, 5)
      4  having count(*) >= 40
      5   order by 2 desc;

    EXP     COUNT(*)
    ----- ----------
    trim(        948
    trunc         40

And there you can see the use of the TRIM and TRUNC functions, which make the CBO ignore the indexes.

(4) I have also seen some columns storing comma-separated values.
E.g., each DEPTNO in the DEPT table having all its EMPNOs concatenated together with ','.

(5) Yes, we should not say anything about the performance without looking into the schema and production data.
But still, we do have some evidence of bad performance:
(a) VARCHAR2(4000)
(b) usage of TRIM and TRUNC
(c) few constraints
(d) and many more ...

Aren't these things enough to predict?

Tom Kyte
October 28, 2010 - 12:09 pm UTC


Today, the user may enter only a few chars.
Think tomorrow. What if the user enters thousands of chars?!!

funny thing that.

We can EASILY alter the table and modify the column to make it larger to support the new requirement.

we CANNOT easily alter the table to assign the correct and proper datatype.

So, if you want flexibility - then use the right length - we can make it larger without any work at all later. We cannot however make it smaller very easily when we want to do it right.
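Tom's point can be illustrated with a tiny hypothetical table: widening a VARCHAR2 is a trivial dictionary change, but shrinking it below existing data fails:

```sql
create table t2 ( x varchar2(30) );
insert into t2 values ( rpad('x', 30, 'x') );

-- making the column larger later: trivial, metadata-only
alter table t2 modify x varchar2(100);

-- making it smaller than existing data: not possible
alter table t2 modify x varchar2(10);
-- ORA-01441: cannot decrease column length because some value is too big
```

So declaring everything VARCHAR2(4000) "for flexibility" gets the trade-off backwards: the cheap direction is the one you gave up.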

2) the tables should have proper names as well - IF SOMEONE DESIGNED THEM - but I have a feeling no one designs anything in this schema, they just abuse the database and create whatever they feel like whenever they feel like.

Two years from now you'll be back asking me the famous question "how can I tell what tables are really being used". No design here.

3) you are storing SQL in the table, ugh, hate it. I don't know what your point is here. You asked me what I thought and I think it all looks pretty scarily bad.

4) hah, it just got worse.

5) this looks like a mess, yes.

RE: How about this DESIGN?

Parthiban Nagarajan, October 28, 2010 - 6:21 am UTC


I forgot to mention ...

Regarding point (3): developers are given a tool to create those expressions. So, when you choose a column COL which is of VARCHAR2 type, the tool automatically wraps it with TRIM(COL).
Similarly, if COL is of DATE type, the tool automatically wraps it with TRUNC(COL) ...

Thanks Tom :)

RE: How about this DESIGN?

David Weigel, October 29, 2010 - 7:17 am UTC

Time for a new tool, Parthiban. (And you ask what Tom thinks about the design -- I don't see any at all, actually.)

RE: How about this DESIGN?

Parthiban Nagarajan, November 04, 2010 - 12:56 am UTC

@David Weigel

... about the design -- I don't see any at all, actually ...

That is the Cool Comment ... ;) I like it ...
And regarding the tool - it was invented to assist the developers. Though it serves its purpose in some ways, I think it actually restricts the developers.
And it seems that no database-savvy developer was involved in building the tool.

Generic design question

Geoff Duniam, March 21, 2011 - 4:40 am UTC

Hi Tom,

We're modeling an MDM system - one logical entity (Saleable Inventory) has three types - Business Type, Business Function and Expansion Term. All share the same (or, in future, similar) attributes, so they're just type instances of the same entity. However, there are hierarchical relationships between these types - all Business Functions are grouped under a parent Business Type, and there is an equivalence relationship between Expansion Terms and Business Types.

Here's the question - because I'd like to have the database enforce these relationships - without triggers - I wanted your advice on whether to physically model the subtypes separately and enforce the hierarchical and equivalence relationships with PK-FK referential integrity, or whether this is overkill and all these subtypes should be modelled as one physical entity. Personally, I'd prefer to have the database enforce the relationships, but I was just wondering what your advice would be.

As always, great site, thanks for all your help.

Tom Kyte
March 21, 2011 - 10:08 am UTC

... However there are hierarchical relationships between these
types - all business functions are grouped under a parent business type, and
there is a equivalence relationship between Expansion terms and Business types. ...

Not sure what that means - can you explain that a bit better, what would the logical model look like


Geoff Duniam, March 22, 2011 - 5:50 am UTC

Hi Tom,

Thanks for the quick response - sorry I wasn't clear. All types (Business Type, Business Function and Expansion Term) are "terms" of the same "class" of "SALEABLE INVENTORY" terms. (In this case, "class" and "terms" are business definitions.)

The business relationships between these three types of SALEABLE INVENTORY terms are, logically -

1. A Business Type is the "parent" of none or many Business Functions.
2. A Business Function has one and only one Business Type parent.
3. A Business type can map (or "expand to" - business definition) to none or many Expansion Terms, and an Expansion Term can map to none or many (“expand from”) Business Types

The data is being provided as an XML feed – the “CLOTHES SHOPS” Business Type record looks like this

<?xml version="1.0" encoding="UTF-8"?>
<termName>CLOTHES SHOPS</termName> <!-- This is the term name -->
<termNote label="Saleable Inventory ID">200002908</termNote>
<termNote label="BTID">0000000000</termNote>

<!-- These attributes identify the term type. The attributes below indicate a Business Term -->
<!-- that expands out to Expansion Terms. A Business Function term would only have -->
<!-- a "Business function" termAttribute label -->
<termAttribute label="Business function"/>
<termAttribute label="Business type"/>
<termAttribute label="Multi-class term"/>

<!-- relationships between the terms are defined here -->
<!-- This is an example of the M-M to Expansion Terms -->
<relationType>X-Expands to</relationType> <termId>OMOO2033OO4565350V4rGKL829jnfPI1TkShI</termId>
<termName>WEDDING DRESSES</termName>
<!-- These are examples of the Parent-Child to Business Function relationship -->

... <!-- more relationships -->

Business Function record looks like this
<termCategory>SALEABLE INVENTORY</termCategory>
<termNote label="Saleable Inventory ID">200002938</termNote>
<termAttribute label="Business function"/>
<termAttribute label="Multi-class term"/>

Expansion Term record looks like this
<termName>COMPUTER SHOPS</termName>
<termCategory>SALEABLE INVENTORY</termCategory>
<termNote label="Saleable Inventory ID">200002909</termNote>
<termAttribute label="Multi-class term"/>
<termAttribute label="Expansion term"/>

Hence the question: as I want these relationships enforced in the database, and the term types are attributes of SALEABLE INVENTORY, do we model these separate terms as individual entities? Or is this a heavyweight solution to a lightweight problem, and should we model it as a single entity? It's also a requirement that a Business Function can change its Business Type parent, and the relationships between Expansion Terms and Business Types will also change from time to time. (I'm assuming we'd need an additional entity to enforce the many-to-many relationship between Business Types and Expansion Terms.)

Thanks as always for your time, really appreciate it.

Tom Kyte
March 22, 2011 - 8:39 am UTC

1. A Business Type is the "parent" of none or many Business Functions.
2. A Business Function has one and only one Business Type parent.
3. A Business type can map (or "expand to" - business definition) to none or
many Expansion Terms, and an Expansion Term can map to none or many (“expand
from”) Business Types

I'm not going to parse XML - it is ugly and verbose and not very useful....

but, sounds like each of the three is a table (type,function,terms).

type is 1:m with function
type is m:m with terms

So there are four tables type, function, terms, and the association table type_to_terms
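The four tables Tom describes might look like this (names and types are assumptions, not from the original post):

```sql
create table business_type (
    type_id   number primary key,
    type_name varchar2(100) not null
);

-- rules 1 and 2: each function has exactly one parent type (1:m)
create table business_function (
    function_id   number primary key,
    function_name varchar2(100) not null,
    type_id       number not null references business_type
);

create table expansion_term (
    term_id   number primary key,
    term_name varchar2(100) not null
);

-- rule 3: association table enforcing the m:m between types and terms
create table type_to_terms (
    type_id number not null references business_type,
    term_id number not null references expansion_term,
    primary key (type_id, term_id)
);
```

A Business Function changing its parent is then just an update of TYPE_ID; tracking hierarchy changes over time would need effective-dating columns on top of this.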


Geoff Duniam, March 22, 2011 - 12:26 pm UTC

Hi Tom,

Thanks for the clarification, makes sense. Really appreciate your time.

Sorry about the XML, I don't much like it either!

Identifying Relationship Question

sumi menon, May 13, 2011 - 6:01 am UTC

Hi tom,
Any candidate key or alternate key of the parent table can be the foreign key of a child table. But my question is: can a unique, not-null alternate key of the parent table (which is not the primary key) be part of the child table's primary key, i.e., in an identifying relationship?
Tom Kyte
May 13, 2011 - 12:58 pm UTC

If you are asking

can a unique constraint (null/not null is not relevant) be used and referenced by a foreign key?

the answer is "yes"

If further the question goes on to ask "can that foreign key set of attributes be part of a child's primary key"

the answer is "yes"

That is my interpretation of what you've written.
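In DDL, that combination looks something like the sketch below (names are illustrative): the child's FK references the parent's unique alternate key rather than its primary key, and that FK column is reused in the child's composite primary key.

```sql
CREATE TABLE parent (
  id       NUMBER PRIMARY KEY,
  alt_code VARCHAR2(10) NOT NULL UNIQUE   -- the alternate key
);

CREATE TABLE child (
  alt_code VARCHAR2(10) NOT NULL,
  seq_no   NUMBER NOT NULL,
  PRIMARY KEY (alt_code, seq_no),                        -- FK attribute is part of the PK
  FOREIGN KEY (alt_code) REFERENCES parent (alt_code)    -- references the unique key, not the PK
);
```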


owolabi joshua, October 29, 2011 - 2:25 pm UTC

Hi Tom,
My supervisor said I should combine the admin, staff and student tables into one table, drawing their conceptual, logical, physical and ER diagrams. Please help me out.
Tom Kyte
October 31, 2011 - 10:57 am UTC

SQL Developer has an ER diagrammer and it is free; you can download it.

design help,

A reader, November 07, 2011 - 3:35 pm UTC

We have two databases, A1 and B1. Some data spread across multiple tables in A1 needs to be in B1 as well. In other words, the data for B1 comes from A1 (for that set of tables only).

The challenge is knowing when a data change happens in A1 so that the exact changes can be applied in B1 as well.

The process we have defined is: create a new schema in B1, put the same set of tables there, and set up one-way replication between A1 and B1 (original tables in A1 to the tables in the new schema on B1). When replication sends an updated or new record to B1 (to the new schema), a trigger on the table fires and lets Informatica know that a change happened. Informatica later accesses the table in the new schema, compares it against the table in the original schema, and synchronizes the data. All this has to happen in less than a second.

We cannot create a same trigger in A1 due to other restrictions.

I hope I conveyed my requirement and the current design we have.

How can Oracle be used to eliminate any layer here to speed up the process?

Tom Kyte
November 07, 2011 - 5:31 pm UTC

what does informatica have to do with replication?

The challenge is knowing when a data change happens in A1 so that the
exact changes can be applied in B1 as well.

that is not a challenge, that is the definition of "replication" and you can just do that using the database. informatica doesn't come into the picture. Why is it in your picture?
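One database-only way to do this is a read-only materialized view refreshed over a database link; a minimal sketch (the link, table, and job names are hypothetical):

```sql
-- On A1: a materialized view log captures changes so fast refresh works
CREATE MATERIALIZED VIEW LOG ON orders;

-- On B1: a fast-refreshable copy pulled over a database link to A1
CREATE MATERIALIZED VIEW orders_mv
  REFRESH FAST ON DEMAND
  AS SELECT * FROM orders@a1_link;

-- Refresh on whatever schedule is needed (or put it in a refresh group)
EXEC DBMS_MVIEW.REFRESH('ORDERS_MV');
```

With this in place, the database moves the changed rows itself; no middleware is needed to detect or apply the changes.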

Derived column based on column from another table

Lal Cyril, December 20, 2011 - 5:31 am UTC


One of our applications uses a table called trips, which stores the details for each trip. It has a trip start time and a trip end time stored as UTC values (date columns), and the PK for this table is tripid.

One trip can be allocated to a group of persons. The application uses a person_allocation table to store the trips allocated to each person. This table has the persid and the tripid of the trip.

Each person has a base city code denoting the person's base city.

I need to write a query to fetch the details of persons and trips for a given date range. The input date range should be converted to each person's local time and then the data fetched. The following is the query:

select b.persid, a.tripid, a.start_time, a.end_time
from trips a, person_allocation b
where a.tripid = b.tripid
and fun_ltc(a.start_time) between to_date('01-jan-2011','dd-mon-yyyy') and to_date('02-jan-2011','dd-mon-yyyy');

The problem with this query is that the function will be run for all rows before any filtering can be applied.

I feel that to solve this it is better to store the start_time and end_time of the trip as local time in the person_allocation table.

To derive the local time columns, is there any Oracle feature I can use other than a trigger, such as a function-based index or a virtual column?

I read that virtual columns can only be based on columns of the same table.

Tom Kyte
December 20, 2011 - 8:19 am UTC

To derive the local time columns, is there any Oracle feature I can use
other than a trigger, such as a function-based index or a virtual column?

well, there is always the ability to just insert the data, isn't there? I mean, when you insert a row into this person allocation table - you insert the person id and trip id - why not just insert the other bits as well?

And wrap that in a stored procedure - have the client call a procedure that passes in person id, trip id and it does the rest.

And wrap the updates of the trip table in a stored procedure as well - since if someone updates a date in that table - you'll have to cascade that update to this table.
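That wrapper procedure might look something like the sketch below. The extra local-time columns and the single-argument form of fun_ltc are assumptions; the thread does not show fun_ltc's full signature, and the real function likely also needs the person's base city.

```sql
-- Illustrative only: assumes person_allocation has been extended with
-- local_start_time / local_end_time columns derived at insert time.
CREATE OR REPLACE PROCEDURE allocate_trip (
  p_persid IN NUMBER,
  p_tripid IN NUMBER
) AS
BEGIN
  INSERT INTO person_allocation (persid, tripid, local_start_time, local_end_time)
  SELECT p_persid,
         p_tripid,
         fun_ltc(t.start_time),  -- assumed signature; may also need the person's base city
         fun_ltc(t.end_time)
  FROM   trips t
  WHERE  t.tripid = p_tripid;
END allocate_trip;
```

The client then calls allocate_trip(persid, tripid) and never touches the derived columns directly.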

but short of that, you can optimize what you have now. For starters, you could code:

where a.start_time between to_date('01-jan-2011','dd-mon-yyyy')-1 and to_date('02-jan-2011','dd-mon-yyyy')+1
and fun_ltc(a.start_time) between
to_date('01-jan-2011','dd-mon-yyyy') and to_date('02-jan-2011','dd-mon-yyyy');

just broaden the search range by a day or so (pretty much covers all timezones) and then apply the function - we'd be able to use an index if sensible to retrieve a smaller set of data and filter it down - instead of every row.

and you might consider

where a.start_time between to_date('01-jan-2011','dd-mon-yyyy')-1 and to_date('02-jan-2011','dd-mon-yyyy')+1
and ( SELECT fun_ltc(a.start_time) FROM DUAL )
between to_date('01-jan-2011','dd-mon-yyyy') and to_date('02-jan-2011','dd-mon-yyyy');

read this:

to see why

Query on a cross schema FK constraint

Sameer, January 06, 2012 - 1:54 am UTC

Hi Tom,

I would like to know if there is any downside to having a cross-schema foreign key constraint (meaning an FK on a table whose parent table resides in a different schema).

Tom Kyte
January 10, 2012 - 9:22 pm UTC

the downside is that 'seeing' it is there can be more difficult. There is nothing 'technically' wrong with it - it is doable and it is done on occasion.

Create Database on Windows from remote machine

Nikhilesh, February 14, 2012 - 7:15 am UTC

Dear Tom,
As we have moved to a Database-as-a-Service architecture (I don't know anything about it), we don't have direct access to the box. Now the provider says they have configured an Oracle 11.2 service on it and we need to create our database.
So my questions are:
1) Is it possible to use DBCA from a remote machine to create the database? My understanding is "NO".
2) If we want to create the database from the command line, we will still need to create the INSTANCE on the box using "oradim", isn't that true?
3) Now, if we have a service created on the box and know the directory structure, is it possible to create the database from a remote machine? How do we connect to the remote instance as SYSDBA?

Thanks in advance.

Tom Kyte
February 14, 2012 - 9:01 am UTC

1) you need to run dbca on the machine you want the database on. You can get dbca to create the scripts to create a database and then run the scripts later.

2) correct

3) you'd need the instance to be created and have the necessary connection information. Your "provider" would have to provide you all of that. And if they are a provider, they would know exactly what needs to be done and would be telling you that. If they cannot tell you the necessary steps - please do find another provider as soon as possible :) !

If they say they have the services set up, then it sounds like they have an instance going and have set up your parameter file and such. They must have the listener running with static registration (so you can connect and startup/shutdown remotely). They need to give you the tns connect information for this instance being serviced by that listener.

Thank you!!

Nikhilesh, February 15, 2012 - 5:20 am UTC

Dear Tom,
I have no words to thank you. Oracle is lucky to have you. Gaining knowledge with a gifted brain is possible for anyone, but sharing that knowledge with "not so gifted" people (like me), free of cost and without any attitude, makes you a great human being. You will always stay in our hearts.

Thank you again..

As you said, yes, they have provided the TNS file. We have raised a request for the SYS password and soon we will create our database.

Well said Nikhilesh!

Robert, April 27, 2012 - 11:23 am UTC

relational database

bhupesh, July 27, 2012 - 5:59 pm UTC

Hi Tom,

I am a newbie and would like to learn about the relational database concepts. Could you please suggest which is the best book to understand more about it? I am planning to become an oracle developer and am sure without the knowledge of relational concepts it would be very hard to write queries.

Thanks a lot!!!
Tom Kyte
July 30, 2012 - 11:47 am UTC

I would start with something more basic: how the database works:

DB Design

Ahmad Al-Sallal, April 23, 2013 - 9:37 am UTC

Hi Tom,
we are a start-up company, and we are trying to decide how to build our applications and database.
Here are our questions:
1) What is the better approach:
(a) embed our DML inside the application, or
(b) create a table that stores DML statements, with an FK to a TRANSACTIONS table (e.g. the "Medical Production Posting" transaction has 3 DMLs)?
And if it is the latter, how can we pass parameters to the DMLs from screen fields?

2) We need a general way to return a serial number for any column, so I suggested creating the following table to handle all serializing (instead of using sequences or MAX(ID)):
1 table_owner
2 table_name
3 column_name
4-6 company/office/branch IDs
7 period_id -- to indicate whether this column resets at the end of each day/week/month/quarter or year
8 next_serial#
and then creating a function (pragma autonomous_transaction) to return the serial.
Is this a good idea? Any advice?

3) Do you have a book that summarizes development ideas and tips?

Thanks a lot.
Tom Kyte
April 23, 2013 - 1:03 pm UTC

1) use stored procedures

do NOT put your sql into tables, ugh, that would be among the worst of the worst of the worst of approaches.

and I'm not a fan of non-database coders writing database code - so you can get database coders (that understand how to spell database and how to program transactions correctly) to do stored procedures and your GUI programmers can invoke that. Tuning, maintenance, reusability, performance - everything will be massively improved using this technique.

2) I don't know what you mean? I didn't get that question at all. I have no idea what you are trying to ask for.

3) the Oracle Server Concepts guide would be starting place #1.

You should also hire a lead developer that has successfully built a large scale database (focus on SUCCESSFULLY) and learn from them.

New Book on Oracle 12c Architecture

AL Swamy, July 05, 2013 - 7:30 pm UTC


As Oracle 12c has major changes in its architecture, can we expect a new book from you on Oracle 12c?

Tom Kyte
July 16, 2013 - 2:18 pm UTC

yes, I was just talking with my editor about that today in fact. Work has begun on updating Expert Oracle Database Architecture to include that.

Good to hear

AL SWAMY, July 16, 2013 - 6:26 pm UTC


Good to hear that you are already working on the new Oracle 12c book. Hope you will release a Kindle version also.


A reader, August 12, 2016 - 12:22 pm UTC

How do we maintain a price column when, for example, we first purchase a raw material at $100 and the next day the price rises to $150?

Case 2)
Suppose we purchase 10 items at $50; the vendor sends 4 items and the next day sends the remaining 6, but by then the price has risen to $80. How do we manage this in the table, or should we add another column?
Chris Saxon
August 12, 2016 - 1:29 pm UTC

I'm not sure what you're asking...

If you want to know what price you paid for a particular product, record the payment details with the product!


Restricted access to the rows

Arvind, March 22, 2018 - 9:16 am UTC

I have two questions:

1. How do I design a database for multiple business users where each business user can access only its own data?

2. Is it better to keep the tables shared, or to create a different database user with a separate set of tables for each business user?


Chris Saxon
March 22, 2018 - 5:25 pm UTC

1. You can use VPD to control which rows users can see

2. There's no fixed answer here. Separate schemas mean you have more work when deploying table changes. But they do make it easier to guarantee that each user can only see their own data, and to have custom attributes for each user. You have to decide which trade-offs you're willing to make and why.

Note if you have multitenant on 12.2+, you could use application containers to get the best of both worlds. Each user gets their own PDB, ensuring data security. And you deploy changes to an application root, so you don't have to do N releases.
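A minimal sketch of the VPD approach (schema, table, policy, and column names are illustrative): a policy function returns a predicate that the database silently appends to every statement against the table.

```sql
-- Policy function: must accept schema and object name and return a predicate
CREATE OR REPLACE FUNCTION user_rows_only (
  p_schema IN VARCHAR2,
  p_table  IN VARCHAR2
) RETURN VARCHAR2 AS
BEGIN
  -- hypothetical owner_id column matched against the connected user
  RETURN 'owner_id = SYS_CONTEXT(''USERENV'', ''SESSION_USER'')';
END;
/

BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'ORDERS',
    policy_name     => 'orders_per_user',
    function_schema => 'APP',
    policy_function => 'user_rows_only',
    statement_types => 'SELECT, INSERT, UPDATE, DELETE');
END;
/
```

After the policy is added, a query such as SELECT * FROM app.orders behaves as if the predicate were part of the WHERE clause, regardless of which tool issues it.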

Data model for Case details for Inventory case management system

Rajeshwaran, Jeyabal, April 09, 2020 - 5:39 am UTC


We got a requirement to build an inventory report using the details available in the case and case_history tables.

A user is assigned to a unit.
A unit can be mapped to a department.
A user can create any number of cases (available in the case table), and each case can go through various states (in progress, assigned, received, closed, etc.); all these changes are available in the case_history table.

With the details in the case_history table, how can I prepare a reporting data model such that the inputs to the report are a start and end date along with a department name, and we show aggregate counts at the user level: the in-progress count, assigned count, received count, closed count, etc.?

1) How should the fact table for this report look?
2) Do we need to store the grain at the case level or the user level?
3) How can we populate this fact table so that it also supports back-dated reports, along with the previous date's outstanding balances (something like how we get account statements from a bank: when I request the statement for 04/02 to 04/06, we need to carry over the balance from 04/01, show all the transactions between 04/02 and 04/06, and the final balance at the end)?

Kindly advise.
