Home>Question Details



Munzer -- Thanks for the question regarding "Database Design", version 8i

Submitted on 8-Dec-2001 22:04 Central time zone
Last updated 9-Nov-2009 11:46

You Asked

Tom:

1.  I am designing a database for order entry.  I have a table PO (po_no, po_date, 
po_status) and another child table (po_no, stock_no,stock_desc,qty, unit).  Because a 
user may not find a stock_no in a reference table (for all stock numbers) he can enter a 
'999999' for any stock number and then enter his own description for it.  

Since I had to have a primary key I added an "item_no" for each item on the po so i can 
have (PO_no, item_no,stock_no) as primary key.  Is this a good design or there is a 
better way?

Problem with this is that user has to manually enter the item number on the form sicne i 
can not have a sequence that increases per PO, meaning if there are 10 items on the po it 
will go from 1 to 10 and then reset to start back again at 1.

Would it be better just to have  a (one column) sequence number as a unqiue identified in 
the child table?

2.  I am letting the user enter the info using an HTML form creatd by PL/SQL procedure.  
IS there a way for a user when he selects the stock number from a drop-down list the 
description will pop out automatically in the description field.  DO you have to submit 
for with all data entered, find a description for that stock number and return all values 
again when running another procedure.  I do not think this is a good way of doing it?

 

and we said...

1) since this item_no is meaningless, only present to ensure uniqueness, why does it have 
to go from 1..10.  Why isn't it good enough just to be UNIQUE???

Use a sequence, it matters not if it goes from 1..10.  You can always get 1..10 when you 
query if you want:

select child_table.*, row_number() over ( order by item_no )
  from child_table
 where po_no = :bind_variable;

2) see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:234614318676
...
 

Reviews    
4 stars Database Design for Web Site   June 9, 2004 - 4pm Central time zone
Reviewer: A reader 
Hi Tom,

We have a web site which some parts of it (applications), need database. Some application's data 
are static and some parts are not and have transaction every minute! So in general we have two 
different approaches for tuning and backup/recovery policy! One needs to have a Standby database 
and taking hot backup every hour! and the other one cold backup once a week. Another thing is that 
they even have different audience! Kids and teenagers more logon into the static part and adults 
logon into News part, which is dynamic.

I was thinking of having one database as the development database, but two different databases on 
Production server!

What do you think? Please let me know your valuable idea.

Thank you again.
 


Followup   June 9, 2004 - 4pm Central time zone:

i think it all belongs in a single database, easier to manage, easier everything.

if you have a standby, why would you take a hot backup every hour?  even without a standby -- you 
just need the archives.  taking a hot backup eveyr hour would fall into the "overkill" category.  
run 2 standbys if you are truly paranoid (cascading -- prod -> standby -> standby)

but don't even thing about two databases, that would not be a "good thing" (tm) 

4 stars Thank you Tom   June 10, 2004 - 9am Central time zone
Reviewer: A reader 


4 stars Can name some of disadvantages of having two database?   June 10, 2004 - 10am Central time zone
Reviewer: A reader 
Hi Tom,

I was thinking the same. But here is another DBA who thinks it's better to have two databases 
instead one and he has reasons like : 
- We can have two different Pfiles, backup policies,...

I intuitively think one database is better but I couldn't name any specific reason for that! Can 
you name some of them?

Thank you again,
Arash

 


Followup   June 10, 2004 - 5pm Central time zone:

you can have different backup policies if you like with a single database (but tell me, how 
different would they be really?  but if you like, you backup TABLESPACES belonging to applications 
on whatever schedule you want)

2 different pfiles sounds like a problem more than a "feature" or something "desirable".

ask them to give details.  

I find two databases to be

o harder to manage, there are two, or N, not one.  one is simple

o totally impossible to tune.  how to stop database A from consuming 100 percent of the resources?  
at least if both are one in the same -- i can control things

o totally impossible to keep up to date.  my *favorite* feature of single instance is that either 
both applications are at 9.2.0.5 or NEITHER is.  I can assure you of one thing, if you have N 
databases, you will have N versions of Oracle to maintain.  I like "one" again.

o backup is easier -- one thing (one again).

I cannot think of a thing that is "harder" 

4 stars Redo Generation For Standby Database!   June 14, 2004 - 3pm Central time zone
Reviewer: A reader 
Hi Tom,

Thank you for your reply. One thing that I may not mentioned clearly is as I already said we’ve set 
up a standby database for our primary database which is a dynamic part (news). By dynamic here I 
mean the data will change and it is important to have its backup. The other part which I said 
"Static", is not actually static. Even it has more transaction than the first one. But the data 
isn't worth it to backup! It's Game's Data that kids play on the website.

So now I think if I consolidate two databases that is for one website but absolutely two different 
and separate application regarding type of user, Data, backup policy, tuning, types of queries,..., 
Because of the standby database which automatically transfer and apply redo logs to it, will create 
network traffic and also storage problem on my standby database for those unwanted Redo Logs. What 
do you think ? Please as always let me know your valuable idea.

Thanks,
Arash
 


Followup   June 15, 2004 - 8am Central time zone:

You can use a logical standby -- and only do the segments of interest, instead of the entire thing.


 

4 stars   June 15, 2004 - 11am Central time zone
Reviewer: A reader 
Data Guard Concept and Administration
“A logical standby database is logically identical to the primary database and can be
used as a production database if the primary database is taken offline for routine
maintenance or is damaged through human error, corruption, or a disaster.”

“ A physical standby database must be physically identical to the primary database. Put another 
way,
standby on-disk database structures must be identical to the primary database on a block-for-block 
basis, because a recovery operation applies changes block-for-block using the physical row ID.”

Hi Tom,

I read about Standby database but still have some confusion! What is the major different of Logical 
and Physical Standby Database? Is this correct if I say: With Logical Standby database, you can 
have selective datafiles for standby database? Does it mean, the primary database doesn’t create 
Archive log for those datafiles haven’t  been selected in Standby database? 

In general, how can I prevent creating Redo Log for part of my primary database? How can I solve 
the issue that I mentioned before for having two databases instead of one?

Could you please elaborate this more?

Thanks in advance,
Arash
 


Followup   June 15, 2004 - 4pm Central time zone:

a logical standby is maintained by turning REDO into inserts/updates/deletes.

A physical standby is maintained by applying redo to copies of datafiles.


so, a logical standby LOGICALLY contains the same data -- but it might(probably) be in different 
blocks, slots on blocks, and files.

a physical standby is a bit for bit copy of the primary. 

4 stars   June 15, 2004 - 4pm Central time zone
Reviewer: A reader 
Thank Tom.

Sorry if I am asking stupid question! But I didn't get it well!

So, how does logical standby database will avoid generating unwanted redo logs? If you remember, 
the issue of having two database instead one was, different needs of backup policy for different 
part of database. and I said, if we have one database, because of having a Standby database, it 
will create two many unnecessary redo  logs and may create network traffic. and then you suggested 
use 'logical Standby' for my standby database. Still I don't know how it helps!

Thank you for your time,
Arash
 


Followup   June 16, 2004 - 11am Central time zone:

there are no "unwanted" redo logs.

the primary will generate redo for all objects (no avoiding that no MATTER WHAT).  

the redo will only be applied to the standby if the redo is for some object you care about.

all of the redo would be shipped this is true -- but I'd rather (this is me, opinion, the way I 
would do it -- do what you will) size for the entire thing to be managed in a single database.  It 
just makes everything easier. 

5 stars table design   June 25, 2004 - 5pm Central time zone
Reviewer: mo 
Tom:

WHen you usually design a database, and you have one parent and 4 children tables, do you usually 
add timestamps to 5 tables or just the parent.

(Creation_Date  Date,
 Created_User   VARCHAR2(10),
 Modifiy_Date   DAte,
 Modify_User    VARCHAR2(10) );

I am thinking there is no reason to add these to 4 tables unless we insert/update children 
separately. If these are all part of an application form then do owe still need it.

On the other hand it may be easier to query one table to look at when record was created/modified.

What is your opinion on this?

THanks,
 


Followup   June 26, 2004 - 1pm Central time zone:

only your business rules can decide that.  If you ALWAYS update the parent and child record "as if 
it were one record" -- that would indicate to me that this is really a 1:1 optional relationship 
(eg: a parent has a SINGLE child).

I pretty much go for:

create table the_table
( common_attributes......,

  optional_set_of_attributes_one,
  optional_set_of_attributes_two,
  optional_set_of_attributes_three,
  optional_set_of_attributes_four
)
/

that is, one table.   

5 stars database design   October 25, 2004 - 6pm Central time zone
Reviewer: mo 
Tom:

I would like to take your opinion on an oracle application design.

We have an internal web application that stores data on student applications.  Applicants personal 
information is stored in an "STUDENT" table and application data in "APPLICATION" table. A student 
can submit more than one type of application.

All applications have been mailed by students and entered by internal staff.  Each student was 
assigned a unique "Student_ID" to unquely identify him/her. Database has over 2000 students 
already.


Now, We want to post the HTML application forms externally on the web for students to enter to 
directly to the database. This will take the load off the internal staff.

Problem is when someone decides to fill an application on the web how do you verify that he already 
has a record in the "STUDENT" table and not create a duplicate record.

I was thinking of letting each student register before he can fill out an application.  Then the 
system can assign him/her a userid/password (like Yahoo) so he can login next time and be 
identified.  After he registers we can check (first name,mi, last name, city, state) against the 
database to see if he possibly might be there.  We should inform him whether this was him or not. 
If he answers yes, then we update the record in "STUDENT" table, and if he says no or system could 
not find a match we insert a new record.

I am seeing a few flaws in this design like he may have changed his address or people could have 
same names, or he can misspell a letter in his name.

Do you have any good ideas to implement this? 


Followup   October 25, 2004 - 7pm Central time zone:

How do the people that manually enter the data now determine "which student record" to update? 

5 stars database design   October 25, 2004 - 10pm Central time zone
Reviewer: mo 
Tom:

When they get an application, they do a "Search for student" which does a soundex search on last 
name and first name.  If anything was found, they check the list to see if a match is there 
(address info) and then they add/update  records for him.  Otherwise they create a new record.

If you are thinking of doing the same thing for the student himself then it might be dangerous 
because he  will be able to update other people records that come on the list.  I know there might 
not be a "perfect" solution, but I like to see what you think the best solution for doing this. 


Followup   October 26, 2004 - 7am Central time zone:

any of your solutions will let the student update "other peoples records"


if you do this:

...
I was thinking of letting each student register before he can fill out an 
application.  Then the system can assign him/her a userid/password (like Yahoo) 
so he can login next time and be identified.  After he registers we can check 
(first name,mi, last name, city, state) against the database to see if he 
possibly might be there.  We should inform him whether this was him or not. If 
he answers yes, then we update the record in "STUDENT" table, and if he says no 
or system could not find a match we insert a new record.
..........


what would prevent me from giving you a false fname/lname etc and being "whomever I wanted"


Only thing that would 'work' would be to assign each student an identity and password -- and then 
notify them that "this is your identity".  anything that lets them identify themselves to you will 
be subject to 'spoofing'.


 

3 stars database design   October 26, 2004 - 9am Central time zone
Reviewer: mo 
Tom:

I did not quite understand what you mean.

Are you saying that I need to create a userid/password for every student that already exists in the 
database and every student filling out an application on the web.  "Students can be anyone in the 
world".

But then as you said how would this validate if a new applicant is already in the database or not 
(insert or update transaction)?
I guess it is also impossible to prevent someone from faking a name too?
Would you suggest saving the records coming from the web into separate table from "STUDENT" and 
then have someone match/move the record into the "STUDENT" table? 


Followup   October 26, 2004 - 9am Central time zone:

i'm not suggesting anything, i have insufficient data.

all i'm saying is that any approach that lets the student input information to hopefully find their 
record is subject to spoofing.  What you say you are worried about is 100% obtainable if you let 
them do that.

If that is an issue, I can really only see "starting over".  each student is assigned an account -- 
a fixed known account -- and given access to that specific account.    

4 stars Table Design   November 29, 2004 - 9am Central time zone
Reviewer: mo 
Tom:

I have created a table to track PO information.  I have PO_Number and PO_FY as Primary Key.

Now I see a need that a user may enter a typo and have to re-enter the PO_Number or PO_FY as any 
system.

This means that I have to update the primary key of the record.

1.  Do you see any issues with that?

I myself see some issues if the PK that needs to be updated is being used in other table. This 
means that you have to update the values in other tables too which means more work.  Using a 
seqence number elimnates this.

2. Do you recommend using a unique sequence instead?

Thank you, 


Followup   November 29, 2004 - 3pm Central time zone:

1) yes, probably means that po_number, po_fy isn't the primary key, but rather a unique key with a 
not null constraint on it.

2) yes I do, yes others would not, yes this is a debatable thing that has been debated back and 
forth and back and forth.


Me, if the primary key is going to be updated -> it is not the primary key. 

5 stars database design   November 29, 2004 - 4pm Central time zone
Reviewer: mo 
Tom:

I will go with your advice and a add a new sequence number (PO_ID) to the PO table.  However if I 
have another table "INVOICE" that has (PO_NUMBER, PO_FY) as part of the invoice since the user 
selects those when he creates an invoice, would you be replacing these fields with PO_ID in invoice 
table, or you would keep them as is and update them whenever an update occurs in PO table for a po 
number.

Thank you, 


Followup   November 29, 2004 - 6pm Central time zone:

the entire point is to have a primary key that can safely be propagated to child tables.

eg: the table invoice uses po_id as its sole foreign key column to this table.


else, why bother. 

5 stars designing the database   November 30, 2004 - 7am Central time zone
Reviewer: riyaz from south India
After seeing threads related to "design" in this great forum, I am using the following methodology 
for my new project.

- Always only update MAIN BASE TABLES. 
- e.g.,, In an "invoice generation screen" only invoice header and invoice detail table should be 
updated through Front End (ie., through Java)

The following things would be UPDATED through TRIGGER, written on base tables.

- tables which are used for MIS reports (summary data tables)
- transaction log, used to monitor the transaction. (when entered, who entered)
- updating back the status on source document. (e.g., Sales order will be updated back with 
invoiced quantity and status when invoices raised.)
- Status updation (e.g, Pending, Over or Cancel)
- History tables (for having each status of modification information / replicate of base tables)
- The workings like,  "Tax calculation" is also being done in procedure and called in Front End.

Here, only base tables  (header and detail tables) only will be updated through Front end Jsp or 
using GUI like VB.

And, other than the base table data, all other information will be updated using TRIGGER which has 
been written on base table, mostly on detail table.
(Trigger will be very short and will be calling procedure, which is available with in package)

Please confirm my approach and add your views also.
 


Followup   November 30, 2004 - 8am Central time zone:

I don't understand what "always only update MAIN BASE TABLES" means.

I'd rather use stored procedures (transactional API's) and never let the front end do an 
insert/update/delete personally.

that would remove the need for automagical triggers firing in the background as well.  (not a huge 
fan of triggers containing any sort of complex logic, gets really confusing) 

5 stars Thanks, your final word pl.   December 2, 2004 - 12am Central time zone
Reviewer: riyaz from South India
so you mean to say, it is like this

From Java, 
Case - I
1. Write store procedure to insert into base tables and this procedure will be called from Java/jsp 
or GUi.
2. Trigger to be written on the base tables, will update the cross tables

Case - II
Put everything in the store procedure and call from Front end. (java or any gui) SO that all the 
tables will get inserted and all cross table updation will also happen.

Which one you prefer? And also confirm me whether we can have all business logic (tax calcuation 
etcc) in stored procedure and call from Front end whenever required.






  


Followup   December 2, 2004 - 7am Central time zone:

case II!!!!  definitely.


confirmed.  you can. 

5 stars contd   December 2, 2004 - 12am Central time zone
Reviewer: riyaz from South India
In a Purchase order module

Base tables: po_head and po_det
Cross tables: stock, purchase_summary etc.. 


5 stars Single Table or Many Table   December 2, 2004 - 11pm Central time zone
Reviewer: Tony from India
I am in the process of designing a database for an OLTP system. The application is large in size, 
and hence consists of several modules such as Sales Order processing, Vendor PO, Vendor 
Development, Material Request, Goods Delivery, Goods Return, Inward Quality Check(IQC), Outward 
Quality Check(OQC), Stores Issue and Receipt, Production Planning, Assembly, Packing, Shipment and 
Invoice.  Across some of the modules/activities, most of the fields (attributes) are same. For 
example, IQC and OQC modules have similar fields except one or two fields.  The modules are 
transaction intensive, and will heave concurrent access by many users. I have three approaches for 
designing the tables for modules having similarity. With my little knowledge, I have explained the 
pros and cons of each approach.

1).Design single table for modules/ activity having similarity. For example, IQC and OQC modules 
will have single table with additional flag column to identify the modules/activity.
   
Pros:- No need to UNION/JOIN many tables to pull out data for reports, and thus the reports will 
run faster.

Cons:- Number of concurrent hits on single table will be high, and this might reduce the 
performance of concurrent inserts/updates/deletes.

2).Design single table with module-wise partition for modules/ activity having similarity. 

Pros:- Each module will access only the corresponding partition in the table. So, there will not be 
any concurrency issues.
   
Cons:- Partition might not be suitable for OLTP systems. And partitions and global/local index 
maintenance might be a problem in OLTP system.

3).Design separate table for each module having similarity. 
For example, IQC and OQC modules/ activity will have separate table.

Pros:- Number of concurrent hits on the tables will be less. This will improve the performance of 
concurrent inserts/updates/deletes.

Cons:- Need to UNION/ JOIN many tables to pull out data for reports, and this might slow down the 
performance of reports.
  
 
Tom, please advice me to find the right approach. Please correct me if my assumptions (pros and 
cons) are wrong.

Thanks a lot in advance. 


Followup   December 3, 2004 - 7am Central time zone:

1) partition if you are concerned about the cons. (after testing, the database is fairly resiliant 
in this manner)

2) see number 1


I would tend towards simplicity, a single table seems to be called for.  there are techniques you 
can use such as partitioning to physically segregate the data if need be. 

5 stars   May 7, 2005 - 12am Central time zone
Reviewer: ALT from indaia
HOW DO I KNOW WHAT ALL SQL STATEMENTS ARE FIRED BY THE DATABAS USERS IN MY DATABSE  on daily basis
AND 
HOW CAN I DENIED THE USER ACCESE TO MY DATABASE(THAT USER IS DOING SOME ILLEGAL ACTIVITES WITH MY 
DATABSE) 


Followup   May 7, 2005 - 8am Central time zone:

AUDITING.

see tha administrators guide.

you can use dbms_fga to capture all SELECTS (and in 10g, all modification statements too)

you can use triggers to capture their sql during modifications in 9i if you really like.

v$sql might be more than enough information.


you use grants and auditing to deny access (grants/revokes to deny, auditing to discover) 

2 stars Can name some of disadvantages of having two database?   May 17, 2005 - 10pm Central time zone
Reviewer: abc from farmington,MI,48335
hi ,
i found it useful but i have a question i.e 
how to retrieve employees data that have same first name,last name but have different ID's  from a 
table i mean the result should show only the employees who have same first name & remaing employees 
shouldn't be displayed 


Followup   May 18, 2005 - 8am Central time zone:

select *
  from ( select emp.*, count(*) over (partition by first_name,last_name) cnt
           from emp
       )
having cnt > 1;

 

4 stars Optimal Database Design   June 15, 2005 - 5am Central time zone
Reviewer: Mohamed Abd El Mawla from Egypt - Cairo
Hi Tom

We are designing a new database that 'll has many finnancial modules .

I'd like to ask about the optimal database design where i have an argument with developers about 
this .

They want to have a single database user that owns all database objects but i want to separate all 
objects according to their speicality .

i.e : user1 for module1(Gl as an example) 
      user2 for module2(AR as an example)
and so on

I'd been asked a question about the benefits of each method .

So, What is your openion about this ? 

Single Database user / Multi Database users managed by synonyms .

What are the benefits of each ???
thanks

 


Followup   June 15, 2005 - 9am Central time zone:

the benefits of multiple schemas will be in the areas of security (you can secure things "more 
easily" as there is no single "super user" -- and modularity, you can have separate components.


I like segmentation like that myself.  Makes it easier to organize things and see relationships 
between things 

5 stars Optimal Database Design   June 18, 2005 - 3am Central time zone
Reviewer: Mohamed Abd El Mawla from Egypt - Cairo
Thanks Tom

Your words are always very useful for me.

The topic of security is the more effective answer where oneday you have to distinguish between 
senior and jonior developers by using multiple schemas .

This target can not be reached with single database schema where all developers will have the same 
privileges on the objects .

But i'd like to ask about performance issue here ??

Is there any performance issue with single schema versus multiple schemas using synonyms ??

There was a response from you before about public database synonyms and its performance issues .

thanks 


Followup   June 18, 2005 - 7pm Central time zone:

why would you use synonyms?

private synonyms have nominal overhead.

do not use public synonyms, not for performance reason but because globals in general are "bad" and 
you'll prevent consolidation of systems down the road. 

5 stars which book?   June 18, 2005 - 8pm Central time zone
Reviewer: Alexander 
Tom,

what book on Oracle database design would you recommend for 
someone with 10 years of experiences with other databases? 


Followup   June 18, 2005 - 8pm Central time zone:

The server concepts guide

read it, retain 10% of it, and you'll already know 90% more about oracle than most.


MOST IMPORTANT THING TO "get": Multi-versioning, read consistency.  by far. 

5 stars did you mean Database Concepts?   June 20, 2005 - 11am Central time zone
Reviewer: Alexander 
Tom,

did you mean Database Concepts? The one we download in PDF format?

Part I. What Is Oracle? 


Followup   June 20, 2005 - 12pm Central time zone:

yes, that one. 

4 stars Application design   December 9, 2005 - 3pm Central time zone
Reviewer: yeshk from St.louis,MO
Hi Tom,
  Thanks a lot for your time and effort.

I am redesigning an application. This application is mainly inserting into the database. A little 
updates but not much fetching data from database. Which is a better option to go. 
1) Should I create a normalized(very) database? 
2) What is the implication of having a lot of columns in one table? I do not want to split the data 
into two different tables when the information is one to one. Example : A applicant data with his 
address information. 
Do you think that an applicat personal information like Name an DOB should be maintained 
differently than address?
3) I am planning to create a table for each object like a borrower, loan information or user 
information etc. Is that a good idea? your inputs please.

This application is done in Java. So the team doesn't want to do anything other than storing and 
fetching information from database. I tried telling that manipulating dates is faster in Oracle. 
but they want to fetch from database and then parse and split the date as day/month/year etc. How 
do I make them understand that handling/validating/managing data in the database is the best way to 
go. 

Thanks
yeshk 


Followup   December 10, 2005 - 4am Central time zone:

1) cannot be answered without understanding the application.

2) any table with more than 255 columns will be stored in multiple row pieces, if you are not read 
intensive - it will not impact you too much.

3) see #1


well, you just lost me on that last paragraph.  Actually, interesting last paragraph, seems to be 
implying this rule:


  JAVA DEVELOPMENT => database is not a thing of relavancy

Guess the only way to convince them is to let them program for a couple of years and get a little 
age on themselves.  

When they code the worlds slowest, least manageable application some number of times, they'll get 
the point. 

5 stars Another way to "prove" your point   December 12, 2005 - 9am Central time zone
Reviewer: Bob B from Mechanicsburg, PA
I'd say, have a contest to do something small with java and the database.  Spec out a tiny app 
(perhaps an app to add, view all, view one, and update contacts) and then have a challenge amongst 
the developers you want to persuade.  Each developer, including yourself, tries to make the app.  
The apps are then rated by correctness, time to complete, and response time.  After that's done, 
you could add a step 2 for a change request and see how long the maintenance/addition takes. 


Followup   December 12, 2005 - 10am Central time zone:

well, that is not entirely fair, but it would demonstrate this sort of concept:

there are tools that can efficiently do simple, straightforward things easily (eg: to do your 
contest, I would choose htmldb - in the time it takes to type this sentence, I would be done - and 
would have lost update detection, CRUD screens, everything done).

there are tools that can do anything - but not everything simply.  sometimes not ANYTHING simply, 
but they can be made to do anything.

there are tools that can do some very specialized things better than anything else.

there are tools that .......... <and so on>......... 

5 stars The contest   December 12, 2005 - 11am Central time zone
Reviewer: Bob B from Mechanicsburg, PA
The point of the contest was not to use the best tool for the job.  It was to say, ok we all have 
access to these two tools (java and oracle).  Each of us will try to do x with these tools and 
we'll compare implementations to see which implementation style is best for us as a team.

I was actually thinking of html db when I said contacts app, because I believe you can download a 
written and tested html db contacts application. 


5 stars design   January 19, 2006 - 10pm Central time zone
Reviewer: mo 
Tom:

I have an inventory management system that stores inventory information on stock items.  The stock 
items get replenished using a different financial system where purchase orders are entered.  
Currently we do not have access to this financial system.

my plan is for a warehouse person who receives a package is to lookup the po numbers from the 
financial system and import that po items information.

My questions:

1.  The financial system had other information (payroll) and we may not be able to get a read 
account or link to it.  What would you do in this case? (i.e create views or another instnace that 
mirrors the financial database and refresh it nightly)

2.  If they give us access, Can we get limited read access on certain tables/fields/records?  Or 
you can't create an oracle account that limits access at that level.

3. What is your way of solving this?

THanks,

 


Followup   January 20, 2006 - 9am Central time zone:

I'll assume the other system is remote and you'll be using a dblink.

You can certainally

b) create views that limit the rows/columns the user can see
c) grant them select on the needed tables or views that they must have access to

create your database link using that account.  You'll have read only access to precisely the data 
you need and nothing more. 

5 stars account   January 20, 2006 - 6pm Central time zone
Reviewer: mo 
Tom:

1.  Since they may not allow even a read acces to a production system, and might tell us to use a 
data warehouse instance, would you ask them to create the read-only views into that another 
database. If this is feasible would the views be updated automatically via a link?

2. Since I am linking from the inventory system into the other system I would be creating the 
database link in my system. Do they have to do anything at their side for this?
 


Followup   January 20, 2006 - 6pm Central time zone:

if you don't have even read only access, you'll have to call them and ask them to read it to you or 
something - I mean, that would be "no access"


they would have to set up the account, grant it the privs, create the views.
 

3 stars Database Design   January 21, 2006 - 1am Central time zone
Reviewer: Reader from US
We have a legacy system which is feeding Data to another system (OLTP) .During the night we refresh 
our interface tables which takes about 15 mins.The timing is not a big concern today .The problems 
are :
1.On these interface tables we have a lot of dependency ie FK .So we disable these contraints 
refresh the data and then enable the constraints (novalidate) 
Sometime if there is delay and some tables are still active then the constraints cannot be disabled 
and refresh hangs .

Can you suggest an alternate solution ?

Thanks   


Followup   January 21, 2006 - 10am Central time zone:

do not disable the foreign key.  That is my only suggestion - regardless.  Even if it makes it take 
longer, that is my only suggestion.

I think you'll find it to be pretty efficient (foreign key checking).  And when the data starts 
getting rejected - you'll be notified immediately instead of finding it as a strange "impossible to 
be true" bug later. 

5 stars interface   January 24, 2006 - 12pm Central time zone
Reviewer: mo 
Tom:

As a followup to previous message, the financial system owners agreed to copy needed records for us 
from their production instance into another warehouse instance (they own) on a daily basis.

1.  If you are the owner of the inventory system, would you create tables to copy the data over on 
a daily basis or would you create materialize views based on those table.

2. How would you handle the daily updates of the data. Do you schedule a daily job that deletes the 
data in records and copy over the tables again?

3. Their copy will merge all tables that make up a record into one. This means we will have 
repeating information for some fields. Would this be a problem? Would you break it up and save it 
into separate tables or you would parse the data in your application code.

4.  What is a legacy system? I hear the term but do not know what it means?

Thank you, 


Followup   January 24, 2006 - 8pm Central time zone:

a legacy system is an existing system.  Immediately after a new system goes production, it is a 
"legacy" (existing, in place) system.

I cannot design a warehouse refresh right here, with this level of detail.  #1 for example, you 
cannot say if you can really use a materialized view effectively, there is no mention of volumes, 
no mention of materialized view logs, not really much to go on (and this is not the place for 
really new questions...) 

5 stars database design   January 24, 2006 - 10pm Central time zone
Reviewer: mo 
Tom:

I could not understand what you mean by your second paragraph.  Are you saying that you need more 
info to see if a materialized view solution is viable?

Would not this 
delete table;
insert into table select * from table_in_warehouse@db_link;

work like a materialized view.

Do you also mean this subject should be under a new question? 


Followup   January 25, 2006 - 1pm Central time zone:

sorry, I have extremely limited information here - you ask "design system", I respond "insufficient 
data".

Your two lines of code would be logically equivalent to a complete refresh materialized view, yes.  
 

3 stars Database sizing   January 25, 2006 - 12am Central time zone
Reviewer: Nikunj from India
Hi Tom,

1. How we can size the database for example I would like to develop financial accounting system.

2. Proc and crons of each application on different tablespace or every application on single 
tablespace.
I.e. Financial Accounting system on Financial tablespace.
Hr on HR Tablespace
Distribution on Distribution Tablespace

Or Financial, HR, Distribution on USER tablespace ?

Regards, 


5 stars database design   January 25, 2006 - 9pm Central time zone
Reviewer: mo 
Tom:

sorry for the limted info. But I will try to explain it here briefly.  It is a simple thing but I 
want to find the best way of doing it.

We have 3 databases:

DB_A  (Financial sytem)
DB_B  (Data Warehouse for Financial system) 
DB_C  (Inventory System)

DB_A has several tables A,B,C that contain requisition data and tables E,F,G for purchase order 
data.

The DBA of DB_A will consolidate all data in tables A,B, and C into one table for requisitions data 
access and the same thing for PO data. He will do it every night.

I need to capture the data from those tables in DB_B into DB_C and update the data nightly.

1.  How would you do this?
2.  Would you copy the data over to the inventory system (since we own it) or you just keep using 
the database link and read the data from the data warehouse?
3.  Would consolidating the data in several tables into one table create a problem for parsing or 
not?

Thank you, 


4 stars database design   January 28, 2006 - 9pm Central time zone
Reviewer: mo 
Tom:

do you think the above explanation of what i need to do was very limited too? 


Followup   January 29, 2006 - 8am Central time zone:

all I can do is tell you (as I have in many places on this web site) how you can do it.  there are 
materialized views and here is how they work.... there is dump and load and here is how it 
works.... you can write code to move it and here is how that works.....

it is up to you to evaluate and most importantly - understand - each so you can pick the one that 
is right for you.

I don't know if you have the ability to set up mv logs (nor do I really want to know, you need to 
know - you need to evaluate...)

I don't know if you would benefit from incremental (fast) refreshes or if complete would be best.

I don't know the volume of data here.

I don't know if you can pull just changes (eg: they put a timestamp in place)

I can only tell you the tools you have available, you have to decide which tool is best for you in 
this case, remembering that the next time you do something similar, a different method might be 
best because some variable changed. 

4 stars Database design   March 2, 2006 - 11am Central time zone
Reviewer: Eugene Yen from USA
Hi Tom:

You comments are really helpful to me through the threads.  There is a design issue that is 
slightly different from the discussed above and I like to know your opinions.  

We are designing a database that will manage the data of many customers.  All of the customers will 
use a same web site to communicates to the database.  There are the requests of most customers that 
they like to have their data as separated as possible from the others and they also need to do 
offline bi-direction data exchange to this database from their databases( possibly through web 
services).  

Basically the data structure is the same.  THe customers number, however, can go upto a thousand 
potentially.  And each of their data volumn is about in the middle range. 
 
I have already opted not to create separate schemas or tables for each customers. Instead I think 
it is easier to separate data at the records level.  But it is quite a challenge still.  I was told 
that in stock market many databases has encounterd similar challenges – each user like to have 
their data separate form the others.     

My initial approach is to mark the transaction data (records) by the customer ID in columns like 
“CUSTOMIER_IDENTIFIER”.  And manage each user’s data by this flag.

 
What is your suggestions?

Eugene

 


Followup   March 2, 2006 - 1pm Central time zone:

I'm so glad you didn't go down the multiple schema/tables approach.  That would be the wrong 
approach.

having that customer_identifier is probably the right approach (given what I know from above).  

You can either create a view that adds "where customer_id = sys_context( 'your_ctx', 'cust_id' )" 
or use fine grained access control to control access to the data.  

You would, upon identifying the user, set the application context with the customer id so that when 
they query any table - they only can see or touch THEIR data. 

4 stars DB Design issue   March 2, 2006 - 2pm Central time zone
Reviewer: Eugene from usa
Hi Tom:

Thank you for the reply which points me to a good direction and gives me a good starting point.

Appreciate that.

Eugene 


4 stars design question,   September 21, 2006 - 2pm Central time zone
Reviewer: A reader 
We are writing an application to extract data from a multi million row table on which we have only 
SELECT privilege.

The transations on the multi million row table happens quite frequently and it could INSERT, UPDATE 
OR DELETE.  Basically we don't have any control over that table.

We extract the table say at 2:00pm by giving condition like "where update_timestamp between 1:00pm 
and 2:00pm".

If there is a transaction on the base table starting at 1:50pm and ends at 2:05pm, I would not be 
extracting those data in my 1:00pm to 2:00pm condition.
Next time when I issue with a condition "between 2:01pm and 3:00pm" then I miss those records that 
were inserted in the base table between 1:50pm and 2:05pm.  The sysdate value will be 1:50pm (that 
is when the transaction starts).

This is a serious issue for as.  Right now we are thinking of extracting 4-6 hours of data from the 
base table. there will be duplicate records but we will handle that in OUR environment.  

Any other idea considering data integrity and performance as top priority?

Thanks,
 


Followup   September 22, 2006 - 2am Central time zone:

why wouldn't you use change data capture, streams or some other builtin capability???

but for your logic, instead of you remembering "we pulled data last at 2:00pm", you must remember:

select min(dt)
 from (select sysdate from dual
       union all
       select start_date from v$transaction);


that way - you will remember "1:50pm" - the start date of the oldest transaction currently running 
in the system rather than 2:00pm.

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:7636765105002

 

4 stars still have a question,   September 22, 2006 - 6pm Central time zone
Reviewer: A reader 
thanks for the suggestion, but now I may be missing in understanding the concepts here.  If a 
transaction starts at 1:50pm the start_date in v$transaction would be 1:50pm and my query (which 
starts at 2:00pm) now checks for 1:50pm onwards instead of 2:00pm.  But it does not extract the 
data which started at 1:50pm since it is not commited yet (which is at 2:05pm).  At 2:05 pm the 
information in v$transaction is perhaps gone right?

Next time when I start a similar query at 3:00pm and if there is no transaction happening at that 
time, then it would consider 3:00pm as starting time and it does not extract records committed at 
2:05pm.

As per your other suggestion of using streams, it is a very good idea but the buerocrats in our 
company would not agree that without profound testing.

Thanks,


 


Followup   September 24, 2006 - 1pm Central time zone:

in order to accomplish your goal, the standard logic is:


a) remember what time it is RIGHT NOW
b) look up the last remembered time
c) pull all data since (b)
d) commit;


repeat.  but you need to modify a) to be "remember what time it is right now OR the oldest 
transaction time"


So, let us say you start the very very very first copy of your data at 8am today. And you refresh 
every "hour"

You would:

a) remember the minimum of 8am, start time of all current transactions in some table.  suppose this 
in fact is 8am (no other transactions)
b) pull all data.
c) commit;

one hour later you would

a) remember the minimum of 9am, start time of all current transactions (say this result is 8:55am).
b) query up the last remembered value, that is 8am, from a) above.
c) pull all data with a timestamp >= 8am
d) commit;

one hour later at 10am you would

a) remember the minimum of 10am, start........
b) query up the last remembered value, that is 8:55am from a) above.
c) ...
d) ...

repeat.


I would agree that nothing be done without profound testing, however, you chose the hardest, most 
manual, I will impose a negative performance impact on everything, we will be debugging and 
maintaining this code forever approach.  You might want to reconsider your approach before it goes 
into production and becomes rather permanent. 

5 stars great tips,   September 25, 2006 - 10am Central time zone
Reviewer: A reader 
That was wonderful and very clear.  We will try your tips which we never had an idea before.

Thanks,
 


5 stars Good One...   September 25, 2006 - 1pm Central time zone
Reviewer: Vishnu Vardhan Reddy Singireddy from USA
Good One

Thanks 
Vishnu Vardhan Reddy Singireddy 


5 stars Good One   September 25, 2006 - 1pm Central time zone
Reviewer: Vishnu Vardhan Reddy Singireddy from USA
Good One
Thanks 
Vishnu Vardhan Reddy Singireddy
vishnu.singireddy@gmail.com 


4 stars DB design   November 21, 2006 - 7pm Central time zone
Reviewer: sam 
Tom:

1.  Would you store derived values in a table column? or you always calculate those values in your 
code?

2.  Let us say you offer free cassettes and CDs to 100 customers each month.

You assign each customer a number of books and a number of CDs. This number will be shown to 
customer when he enters the quantity (quota based number). He can use that whole number on one song 
or select different song titles. He can not exceed that quota. Every month, the inital number 
assigned to that customer is added to this "Quota allowed" number. Also all the leftovers (unused 
quantity) will be added to the initial number assigned for him.  The admin can also change that 
initial number assigned to a customer from month to month.

How would you compute the amount available for each customer every month. I have two tables:

Customer_Quota
------------------
Quota_id
Customer_ID
Qty_Cassette_Assign
Qty_CD_Assign
Creation_Date

Customer_Selection
-------------------
Customer_ID
Title_ID
Qty_selected

Would you be storing multiple records per customer in the Customer_Quota so you can track the old 
quantites assigned per month as you need to track that to compute available quantity now.

 


Followup   November 22, 2006 - 4pm Central time zone:

1) or maybe a function based index - only reasonable answer is "it depends"

give me an example :)

2) sorry, did not follow - and does not seem to be a follow up but a new question 

5 stars Application design NOT Database   November 29, 2006 - 9am Central time zone
Reviewer: Arindam Mukherjee from Kolkata, India
Respected Mr. Tom,

Today you please let me ask you one offbeat question. I have already read your book “Effective 
Oracle by Design”. My question – is there any website where I can read various designs on different 
kind of applications those already are running? It’s my conviction, after designing the application 
I will go for database design module wise for that application.
 


5 stars db design   July 16, 2007 - 3pm Central time zone
Reviewer: sam 
Tom:

I want to verify some database design issue with you.

I learned from several DB books and normalization rules that you do not save "Calculated" values in the database, such as "Invoice Total" and "Inventory Level" for a stock item.

Do you agree with this?

I think the reasoning behind this was that every time you update a record (e.g unit price or quantity) you have to make sure you updated the calculated value and thus they recommend to calculate the value real-time on demand.

There is an issue with this design though that I found out.

I have an inventory system with 4 table, inventory, orders, receipts and shipments.

Let us say you define in the "inventory" table what the stock item you want to track and the Initial Quantity.

Then to calculate the "Inventory on Hand" you take that starting value for the maximum dated record, and add all receipt quantities after that date and subtract all shipment quantities after that date.

This works fine when you have a few hundred records. However, when your tables starting getting big, the function that caculates become slow because it is scanning thousands or millions of records.

Some common sense quesions that arise, is why does the system have to go back and look years back worth of data to calculate the inventory on hand for a stock item. This should be done on a day by day basis and stored in a field?

What is your opinion on this and how would you design such a system?

ALso, do you use any references/book for ideas on a good data model for speicifc application or you design the thing from scratch? e.g, It might be helpful to look at existing banking data model if you are developing a banking application.

Thank you,

Followup   July 17, 2007 - 11am Central time zone:

... I learned from several DB books and normalization rules that you do not save "Calculated" values in the database, such as "Invoice Total" and "Inventory Level" for a stock item. ....


Nope, I don't agree - if the values are maintained by the system itself. Think about it - an index is repeated values that must be maintained in two places isn't it. Therefore we should not use indexes because it duplicates data. But of course we do because they are "transparent" to us. So, using materialized views - maintaining this data is quite doable and smart. I do - my home page tells you some interesting numbers - I do NOT count those rows every time you hit that page however (my server would have to be a lot bigger if I did). Rather, I use a materialized view maintained by the system to transparently maintain these "copies" for me.

in a data warehouse that statement is absolutely WRONG....



5 stars db design   July 17, 2007 - 2pm Central time zone
Reviewer: Sam 
TOm:

1. OK i got half of the answer which implies you will save the caculated "Inventory on hand" for each stock item in a materialized view. I guess you sometimes favor saving calculated values in the database for immediate/faster access. is this correct

2. Now, the second half is how you implement this design and why do I need to use a materialized view. I might as well save this with my "Stock_Item" table.

Would you be updating this caculated field evertime a new receipt, shipment, adjustment of inventory occurs and everytime an existing record is updated?

Followup   July 17, 2007 - 2pm Central time zone:

1) sure

2) you want it to be transparent - done by the "database", like an index. So you know it is done. And done correctly.

Trying to maintain your own rollups and dependent data like that is where many a developer "goes wrong" as they many times try to do this via triggers and what not and get it totally wrong.

besides, we all know you only write code when forced to - so, if you don't have to do it yourself, you wouldn't! that is why you buy packaged bits of technology - so you are not reinventing the wheel!!
5 stars why they are NOT doing this out there?   July 18, 2007 - 2am Central time zone
Reviewer: Lajos from Hungary
Hi Tom,
I'm in the field of consulting on technical issues in the financial industry. Account balances could be a perfect example for your case: a "balance of an account on a given day" is not a separate entity, but simply the sum of credits minus debits for the account which happened not later than the given day. So it could be a candidate for your materialized view idea (transparent, done by the database etc.).
Yet I've never seen a CBS (Core Banking System) which does that, even those on the Oracle platform. They always have some form of end-of-day processing (EOD) and when I ask "what does your EOD processing do?", the answer always contains "we build the balances during EOD". And of course there are different (and sometimes awkward-looking) solutions to the problem to answer queries against balances anytime, even during EOD. Do you see something in this area that explicitly prevents implementing your idea?

Followup   July 18, 2007 - 10am Central time zone:

other than the fact that many developers - whom spend copious amounts of time learning the fascinating intricacies of their programming languages - do not know what the database can do....

no.
5 stars database design   July 23, 2007 - 6pm Central time zone
Reviewer: Sam 
Tom:

as a followup to the july 17 response, I am confused a little bit on how you create this MV.

Right now i have several functions that computer total inventory for a stock number for a give 
organization (several warehoueses), for a given warehouse, or for a given storage location

EAch one of those functions is a pl/sql program.

Do I keep everything the same, and just create a mterialized view that tracks the various numbers.  
How would the materialize view look like or you would create 3 materialize views:

stock_no, org_total, warehouse, warehouse_total, bin, bin_total


2.  Can you refer me to a schema where it implements you solution for a banking, credit card, or 
inventory application to see how they do it.

Thanks,


5 stars for SAM   September 17, 2007 - 9am Central time zone
Reviewer: Lajos from Hungary
maybe something like this:
create table accounts(accno number not null primary key)
/

create table bookings(accno      number not null references accounts(accno)
                     ,value_date date not null
                     ,debit      number not null
                     ,credit     number not null)
/
insert into accounts values(1);
insert into accounts values(2);
insert into accounts values(3);

insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070101','yyyymmdd'),100,0);
insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070101','yyyymmdd'),0,20);
insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070102','yyyymmdd'),0,10);
insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070102','yyyymmdd'),50,0);
insert into bookings(accno,value_date,debit,credit) values(2,to_date('20070102','yyyymmdd'),0,200);
insert into bookings(accno,value_date,debit,credit) values(2,to_date('20070102','yyyymmdd'),50,0);
commit;

select distinct accno,value_date,sum(debit-credit) over(partition by accno order by value_date) 
balance
from bookings
order by accno,value_date;
     ACCNO VALUE_DAT    BALANCE
---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         2 02-JAN-07       -150

create materialized view account_balances
build deferred
refresh on demand
as select distinct accno,value_date,sum(debit-credit) over(partition by accno order by value_date) 
balance
from bookings;

Materialized view created.

select * from account_balances
order by accno,value_date;

no rows selected

exec dbms_mview.refresh('ACCOUNT_BALANCES')

PL/SQL procedure successfully completed.

select * from account_balances
order by accno,value_date;

     ACCNO VALUE_DAT    BALANCE
---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         2 02-JAN-07       -150

insert into bookings(accno,value_date,debit,credit) values(1,to_date('20070103','yyyymmdd'),30,0);
1 row created.
commit;
Commit complete.

select * from account_balances
order by accno,value_date;

     ACCNO VALUE_DAT    BALANCE
---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         2 02-JAN-07       -150

exec dbms_mview.refresh('ACCOUNT_BALANCES')

PL/SQL procedure successfully completed.

select * from account_balances
order by accno,value_date;

     ACCNO VALUE_DAT    BALANCE
---------- --------- ----------
         1 01-JAN-07         80
         1 02-JAN-07        120
         1 03-JAN-07        150
         2 02-JAN-07       -150



This is far from real-world usage of course. I'm still wondering on putting something like this into real usage, mainly because I personally fail to see how to answer questions like "what is the balance for this account as of this date". The mview cannot be on commit refresh, because we don't want to serialize inserts into bookings.

4 stars design   September 19, 2007 - 11pm Central time zone
Reviewer: Sam 
Tom:

I want to take your opinion on designing this in the DB to see which is the best alternative.

I have 50 customers that log in to a web system and order magazines once per month. The magazines 
have four media (paper,cassette, DVD, CD).  Customer can select combination of these medias on same 
order.  I want to impose new annual quota rules regarding the ordering, so each customer will be 
assigned an annual quota (allowance) and that will be displayed/calculated on a monthly basis 
(annual/12). There will also be carryovers from month to month.

It seems there are two ways to design the quota reference table

with this I will have one row per customer.

customer_id  (100)
Year         (2007)
paper_quota   (1000)
cassette_quota (2000)
dvd_quota      (3000)
cd_quota       (4000)
Effective_Date  01-JAN-2007



I think an alternative design is to have 4 rows one per media:

customer_id   (100)
Year          (2007)
Media         Paper
Quota         (1000)
Effective Date (01-JAN-2007)


customer_id   (100)
Year          (2007)
Media         DVD
Quota         (1000)
Effective Date (01-JAN-2007)
etc.

One screen requirement is to display a screen with 50 rows for customer editing and display. I am 
not sure if that affects your decision or not.

Another requirement is that customer may adjust the quota for some customers through out the year. 
So this will result in a new ROW with a NEW effective date. I need to freeze the old row as that is 
used in a formula.

Which design do you think is better and why? I think both are normalized to 1NF, 2NF, and 3 NF.

If customer made a typo in a number, would you update the errored value or would you insert a new 
row and mark the first "Terminated".


Followup   September 24, 2007 - 7am Central time zone:

why not just have an orders table that you can query and figure out how much they've ordered year to date?

you already have everything (you must if you have an existing system whereby they can order this stuff)
4 stars design   September 20, 2007 - 7am Central time zone
Reviewer: sam 
Tom:

I forgot to mention for the above tables that i will create a one column sequence (Quota_No) for a 
primary key
instead of using 3 column or 4 column natural keys.

natural keys for table 1 would be (customer_id,year,effective_Date). Table 2 would be 
(customer_id,year,media,effective_Date). Sequence for PK would be much easier to use in ths case.



2 stars design   September 24, 2007 - 8am Central time zone
Reviewer: A reader 
TOm:

I already have an orders table for capturing orders.

However my questions was about enforcing quotas for monthly orders. I was checking about which 
table design you think is better for defining the quota numbers for each media and why.


Followup   September 26, 2007 - 1pm Central time zone:

I'm not sure why you want another table to duplicate that which you ALREADY HAVE.

Your specification is not detailed enough to suggest anything.
3 stars design   September 26, 2007 - 6pm Central time zone
Reviewer: A reader 
Tom:

I have an orders transaction table. This captures the orders entered by customers.

Now, I want to put limits sort of your credit card limits on charges you make.  This is a reference 
lookup table. So when customer goes to place an order I tell him you have 100 DVDs available to 
order and 200 Cassettes and so on. When he saves the order I validate the quantities selected 
against the monthly available quota before I accept or reject the order.

is it clear now. I am trying to see how you would design this lookup table. Would you have one row 
per customer (for all media) or 4 rows (one per media) per customer as explained above.


Followup   September 26, 2007 - 10pm Central time zone:

again, I say

YOU HAVE THEIR ORDERS
YOU HAVE THEIR QUOTAS

why do you need yet another table to see if their new order would exceed their quota?
4 stars modeling perils ...   September 26, 2007 - 9pm Central time zone
Reviewer: Gabe 
Sam:

Your model might consist of tables like CUSTOMER, ORDER, ORDER_ITEM, PRODUCT, CUSTOMER_PRODUCT_QUOTA.

A CUSTOMER can place many ORDERs, but an ORDER belongs to one and only one CUSTOMER. An ORDER may have many ORDER_ITEMs, but has to have at least one. Any ORDER_ITEM belongs to one and only one ORDERs and is for one and only one PRODUCT. The ORDER_ITEM has attributes like Quantity, Price, etc. A CUSTOMER has a QUOTA allowance for a PRODUCT over a period of time ... this would be the CUSTOMER_PRODUCT_QUOTA table:

{customer_id, product_id, start_date, thru_date, quota_quantity, ...}

You would have to implement a mechanism for ensuring there are no overlapping time periods for the same CUSTOMER and PRODUCT.

You would also have to decide how to implement Quota=0 and Quota=unlimited. For instance, if there isn't a row in CUSTOMER_PRODUCT_QUOTA for a given CUSTOMER and PRODUCT, does that mean ...
1. The CUSTOMER cannot order the PRODUCT?
Or
2. The CUSTOMER can order an unlimited number of PRODUCTs?

An ORDER typically has many DATEs associated with it: Entry/Received Date, Accepted Date, Shipping Date, Closed Date, etc. You'd have to decide which of these Dates should be used to compare against the period of time within a CUSTOMER_PRODUCT_QUOTA.

When placing/modifying an ORDER for a CUSTOMER the system should check that, for any PRODUCT being ordered, the SUM of all Quantities on ORDER_ITEMs for this PRODUCT belonging to active/closed [but not cancelled] ORDERs of this CUSTOMER, having the "whatever date" within the time period of the relevant/current CUSTOMER_PRODUCT_QUOTE, does not exceed the quota_quantity of the said CUSTOMER_PRODUCT_QUOTE row.


Now, these are just some hints ... modeling without clear, in-depth knowledge of the particular problem domain is a perilous undertaking. For this reason, it is unlikely you'd get modeling solutions here ... the other reason could be that, what one presents is already indicative of ambiguity ... enough ambiguity that the prospect of entertaining a lengthy exchange to establish a common language/terminology, clarify what is said and discover what hasn't been said becomes a quick show-stopping decision.

Anyway, if you want a quick opinion, on your terms, no questions asked ... 4 rows (one per media) per customer.


3 stars design   September 26, 2007 - 10pm Central time zone
Reviewer: A reader 
Tom:

It seems it still not clear to you
<
YOU HAVE THEIR ORDERS 
YOU HAVE THEIR QUOTAS 
>
I am not creating another table. I am taking your opinion on which is the best way to design the 
QUOTAS table from the two possible design i listed above and why.




Followup   September 27, 2007 - 6am Central time zone:

ahh, ok - the quota table should be something like


customer
media_type
quota_amount


if no row exists, they have no quota, if row does exist, they have quota. the verification logic would look something like:
add_order( .... );
select * from customer where custid = ? for update; <<=== serialize at custid

select ordered, quota_amount, media_type
  from (select custid, media_type, count(*) ordered from orders where custid=?)a,
       quotas b
 where a.custid = b.custid
   and a.ordered > b.quote_amount;

if you get rows back, they are in violation of the quota.
4 stars Argh ...   September 27, 2007 - 8am Central time zone
Reviewer: Gabe 
Sam:

On second thought, there is nothing in what you described to suggest a need for a separate QUOTA 
table. If the system is all about these 4 products then an ORDER table with 4 Quantities columns 
and a Customer table having 4 quota columns should keep everything nice and simple.

Sorry about the detour.


5 stars table   September 27, 2007 - 10pm Central time zone
Reviewer: A reader 
Tom:

yes but why you picked this

customer 
media_type 
quota_amount 
effective_Date

over

customer,media_1,quota_amount, media_2,quota_amount, media_#, quota_amount,effective_Date

Are not both valid normalized designs or the first design is more flexible and easier to code in 
case a new media comes out.

2.  The above is the same as when you have 3 phones for a customer (cell,home,office). do you add 
those to same customer table (cell_phone,home_phone,office_phone) or you create a child table and 
create 3 rows and link them to master customer table.

3. when you design transaction systems do you always normalize to 3NF (no transitive dependencies) 
or not? 3NF does not allow storing derived/calculated values in DB such as invoice total because it 
depends on price and quantity instead of primary key.


Followup   September 28, 2007 - 5pm Central time zone:

1) because I've been doing relational databases long enough to know that you will have 5 media types tomorrow.

2) and in 1985, we didn't have "cell" because cell phones where just coming out. So, in 1985 you would have had "home,office". In 2007, you have home, office, mobile, pager, whatever. And it'll change.

I would have a table of phone numbers, yes.

3) typically not, but not always "not"
4 stars Data model   September 28, 2007 - 8am Central time zone
Reviewer: Mike from Cleveland, OH USA
To those who would suggest 'an ORDER table with 4 Quantities columns and a Customer table having 4 
quota columns': Don't do it. 

This is so bad that if I heard a data-modeler candidate suggest a design like this during an 
interview, that would be the end of that candidate's chances with me. 

'Are not both valid normalized designs...?' No, it violates first-normal form (no repeating 
groups). There are situations when it is OK to deviate from first-normal form, but this is 
certainly not one of them.

Acceptable scenarios for non-1NF designs generally have specific, stable meanings for members of 
the repeating group. The example of home phone, business phone, cell phone is an example of this 
(although 'stability' may be questionable there). Or if you want to store balances for each month 
of a year in a single row, you can have a column for January balance, February balance, etc., as we 
can be confident that we will continue to have 12 months per year for the foreseeable future. 

What if the business adds a product? Or drops one? Or both? These are normal things that businesses 
do, but your design will not allow them to do that without you getting in the way. 

And these are just the tip of the iceberg, as far as problems likely to arise from a non-1NF 
design.

You should really have a positive reason for your denormalization - not just a lack of reasons not 
to.


4 stars OK to disagree ...   September 28, 2007 - 12pm Central time zone
Reviewer: Gabe 
Mike:

This is so bad that if I heard a data-modeler candidate suggest a design like this during an interview, that would be the end of that candidate's chances with me.

That's fine ... tell me where you work and I won't bother applying ;)

No, it violates first-normal form (no repeating groups)

I think you should review what repeating groups actually means.

http://www.dbdebunk.com/page/page/622301.htm

Look, what if this and that takes one quickly into universal models. They are valid questions to ask business as part of the modeling exercise ... in a forum, you hardly have access to the business and going back and forth is perilous. (Note to self: stick to the decision to keep away in the future). So, where do you draw the line? Well, you can only go with what was actually said and, I don't see anything in here to justify Quota as a separate entity rather than 4 distinct attributes of Customer. Not every "web system" with "50 customers" selling exactly 4 things needs an ERP.

There are systems where the wheels of a vehicle are instances of a Parts entity ... and there are systems where the wheels of a vehicle are attributes of the Car entity: right front, left front, right back, left back.

But it is OK if we disagree.


5 stars db design   September 28, 2007 - 6pm Central time zone
Reviewer: A reader 
Tom:


<because I've been doing relational databases long enough to know that you will have 5 media types 
tomorrow.> 

but you can also add columns for extra media in the same Quota table:

Media_4
QUota_4
MEdia_5
Quota_5

I assume when you have a row per media instead of adding new columns you do not have to change the 
database structure and it is much easier for querying. correct?

I think both design are valid but one is much more scalable than the other.


Followup   October 3, 2007 - 1pm Central time zone:

putting data in record is almost never "valid". Adding columns because you added a new media type is very "newbie" like. Not the way to do it.

A customer has one or more media types associated with them. Media types are an entity to themselves.
5 stars db design   October 9, 2007 - 9pm Central time zone
Reviewer: A reader 
Tom:

1.  If there is a requirement to store a dynamic variable in a table that anapplication uses to 
display the number of rows on a screen, do you usually create one column table for that like this 
or you create a differetn structure?

SCREEN_ROWS
-----------
Number_of_Rows  number(3)

2.  Let us say I have a formula that computes something

MV(A) = MAX(SUM(AB[x]/TM - TC, TO - A);

I can create a function to implent this. However would i be able to store this formula in a table 
and then change it in table and function will change accordingly?

If i code the function to implement the formula I would not be able to do that.


5 stars design   November 10, 2007 - 9pm Central time zone
Reviewer: A reader 
Tom:

I have a 4 small questions on db design.

1. Is it really possible to store a formula in a table and have a function to implement this 
formula. Then when formula changes the function will change accordingly?

I can't grasp it can be done but I heard some talking about it. The function will have to be coded 
to implement the formula.

2.  For a quota table like
Quota(customer_id,year,efective_Date,..other_details)

would you add a sequence (quota_id) for PK and add a unique constraint on the 3 columns,or have 3 
columns (customer_id,year,effective_date) as a PK.

Which would be easier for querying/coding.


3. Is it OK to have one validation table with one column for PK like

Calendar_year
-------------
2008
2009
2010

and

States
---------
CA
CO
AL

4. If i want to store variables for application code in a table like "number of rows on a page" so 
we can change later without changing any code, would you store one variable per table or can have 
one big tables with all variables values in it

variables
-------------------------
variable_name,variable_value


Followup   November 11, 2007 - 7pm Central time zone:

1) what is a "formula" ?

2) insufficient data to answer anything. There is no cut and dried answer to the "surrogate key" question. It depends on the referring objects and whether this quota row is "never changing" as far as the key goes.

I can say, if you have effective date, you do not need YEAR do you, you have the dates already.

3) sure

4) one index organized table would do it.
5 stars design   November 11, 2007 - 11pm Central time zone
Reviewer: A reader 
Tom:

1. Excellent point. One customer can have multiple quota rows per year with different effective 
dates. However, he can update effective_date, so this rules out using natural key for 
(customer_id,quota_year,effective_date). I have to use an oracle generated sequence. correct?


I would like to store the "quota_year  number(4)" even though it can be derived from effective_date 
because a lot of screens query by that column (quota_year) and user defines selects that too on the 
data entry screen. Is that an issue?

2.  A formula is a matematical equation. something like this

MV(A) = MAX(SUM(AB[x]/TM - TC, TO - A);

and the variables are caclulated from other transaction tables (i.e Sum monthly orders prior to 
current month x, sum monthly shipments prior to current month).

I am going to create a function to implement the formula. But someone was saying you can store the 
formula in the database so the function can implement the change automatically.

I can't see how this can be done by storing the formula in the database. I still have to go to the 
function and update it based on the changed formula.  I doubt there is an automatic way where you 
change a formula in a database and function changes automatically. what do you think?





Followup   November 16, 2007 - 12pm Central time zone:

you do not "have to", primary keys should be immutable but if you wanted to use an index organized table to have all of the data for a given customer physically colocated together (clustered together) for retrieval purposes you would use the natural key.


where trunc(dt,'y') = to_date('0101'||:x,'mmddyyyy')
or
where to_char(dt,'yyyy') = :x

(trunc is more efficient) is what you want to do - you DO NOT want to store the year, no.

You can index trunc(dt,'y')/to_char(dt,'yyyy') as well if you like.

At most you would

create view V as select t.*, trunc(dt,'y') yyyy from t;

(or to_char)



2) Well, gee - a formula is a mathematical equation :) laughing out loud at that - I guess I got exactly what I asked for.

Too bad there is no "mathematical equation" datatype really.

implement the function, the function is stored in the database, just like data - function changes, so what, change the function.

You would have to do a ton of dynamic SQL if you store a string in the database and call it a formula.





5 stars design   November 17, 2007 - 4pm Central time zone
Reviewer: A reader 
Tom:

1. What do you exactly mean by "index organized table"? is this a validation/lookup table or it is 
a table created by creating an index on a data table?

I assume that even if I use a sequence+unique index I can get the same result, right?

2.  what do you mean by this?
<You would have to do a ton of dynamic SQL if you store a string in the database and call it a 
formula.>

You mean if i want to implement a dynamic function where if function changes code automatically 
changes.


3 stars   November 19, 2007 - 8pm Central time zone
Reviewer: David from London
@A reader

In answer to 'what is an index organised table?'

I suggest you read one of Tom's books or the Oracle concepts guide 
(http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/schema.htm#sthref1044) for more 
info, they all explain it better than I can! ;-)


In a 'normal' (aka heap) table the data is generally stored fairly randomly, it's just a heap of 
data, hence the name.  We might index the data to allow us to lookup specific values without having 
to scan the entire table.

If we only use the table to perform lookups then we would include the result of the lookup as an 
extra column in the index to avoid the subsequent table access by ROWID after getting the ROWID 
from the index.

For example, a table such as

KEY  VALUE
----------
1      RED
2     BLUE
3    GREEN

might have an index created on both key and value otherwise we'd have to get the rowid for the key 
we asked for and then go lookup the value in that row in the table.

This extra column also has the advantage that the keys and values are sorted together so if we're 
performing a lookup on a range then the values can be rapidly returned from the index without 
potentially jumping around the disorganised underlying heap table to get the rows (if the index 
blocks indicate that the rows in the table for consecutive keys are in different table blocks).

The downside is that if we add the extra column in the index then we've effectively stored all the 
data twice, once in the table and once in the index.  For large or many lookup tables this overhead 
might become significant.

If we declare the table as an index organised table then we dispense with the heap structure for 
the underlying table.  The index becomes the table and we perform all lookups directly from the 
index.

For the above example..

create table lookup (
  key   number not null primary key,
  value varchar2(100) not null
)
organization index
/

Note you can still add secondary indexes on the IOT in addition to the primary key.



For Q2, just think how hard it would be to perform some SQL based on some arbitrary function that 
is supplied as a string.  Your SQL would need to parse that string, understand any operators and 
variable placeholders in the string.  It would have to understand and obey operator precedence and 
parantheses to get the correct ordering of operations.  It would have to do this dynamically, hence 
a lot of dynamic SQL code...

Much easier, safer and reliable to expose as a function and change the implementation of the 
function itself to alter the behaviour.  You could implement multiple versions of the function and 
employ overloading to access the different versions.


5 stars db design   November 21, 2007 - 11pm Central time zone
Reviewer: A reader 
Tom:

Is it OK to add a small validation table

ref_active_status
------------------
status_code
Status_desc

where A = Active and T = TErminated

for all data tables that need a record status.

Does this add any overhead on the join? Someone mentioned that and I doubt that.

2.  would adding only a column "active_or_terminated" to the data table and a check constraint 
wihtout haveing a validation table be a better design and lower overhead.

thanks,


Followup   November 26, 2007 - 10am Central time zone:

is it ok?
sure, yes, definitely. primarily because tomorrow the status will be

a = active
t = terminated
m = maternity leave
s = sick leave
v = vacation

and so on....


there is no need for a "join" if you don't want - you put the foreign key on so we can validate the data upon insert and for retrieval, just use a view:

create view 
as
select t.*, decode( status, 'A', 'Active', 'T', 'Terminated', 'UNKNOWN' )
  from t;



5 stars design   November 26, 2007 - 4pm Central time zone
Reviewer: A reader 
Tom:

The "Active", "Terminated" refers to the Record and not the person. When a new quota record is 
addded you termintate the old one and add a new one with "Active" status.

I just do not beleive that a join will increase overhead. Are we talking 1/1000000 of a second.


Followup   November 27, 2007 - 3pm Central time zone:

somewhere between no overhead and a lot.

as in "it depends"

but I already gave you a valid approach that

a) enforces data integrity
b) removes the need for a join if it is a problem


5 stars design   November 28, 2007 - 10pm Central time zone
Reviewer: A reader 
Tom:

thanks, but i am curious on that "depends".

Does it depend on the number of rows in transactional table. Let us say you have 100,000 indexed 
records and you join that to the lookup table which has 2 records.

I guess we can benchmark it/time it to see if it does really make any difference or not unless 
there are other factors?


Followup   November 29, 2007 - 8am Central time zone:

if you have 100,000 records and join to 2 we would expect a plan like:


HASH JOIN
  TABLE ACCESS FULL t1
  TABLE ACCESS FULL t2


the small table (2 row table) would be read and turned into a hash table.

then, for each row in the big table, we would hash the key - and probe the hash table.

eg: psuedo code:


for every record in small_table
loop
    x := hash(join_key);
    place into hash table based on X
end loop;

for every record in big_table
loop
    y := hash(join_key);
    access hash table(y) to find match (or not)
    if match found 
    then
        output record
    end if;
end loop



as opposed to (with a DECODE or case)

for each record in large table
loop
    evaluate simple decode/case
    output record
end loop;



will that matter? likely not in most cases. But the hash computation for each record and the probe into the hash table would likely be a little more expensive than evaluation of a case/decode.




5 stars db design   December 7, 2007 - 12am Central time zone
Reviewer: A reader 
Tom:

 I have two questions on db design:

1.  If you are creating a table for authentication. Do you add the session id and expiration time 
to the users table or in a separate table and you terminate old sessions everytime you create a new 
session.

users
------
user_id
password
session_id
expiration_time

2.  If you have a book that may or may not exist on 4 servers, and goes through two tests would you 
track the status of that book on each server by adding one column per server or you do it in the 
data itself like

Books
------
Book_no
serverA_exist_YN
serverB_exist_YN
serverC_exist_YN
serverD_exist _YN
Test1_PF
Test2_PF

OR

Book_status
-----------
Book_no
Status_type
Status_code

100,Test1,P
100,Server1,Y
100,Server2,N


Followup   December 10, 2007 - 10am Central time zone:

1) typically users can have more than one session. hence, it would not make sense to have it be attributes of the users table. But, maybe that doesn't apply for you - so not possible to say really.

I don't use tables for authentication - I use the database or a directory service (single sign on)

2) huh?

but in general, if you are tempted to have columns named after servers (things, dates, observations, whatever) just STOP.

today you have 4 servers, tomorrow you have 5. Next year you have 10. Two years from now you have 1.

Using columns for that is a disaster.
5 stars db   December 10, 2007 - 2pm Central time zone
Reviewer: A reader 
Tom:

1. Why do you think adding column for a derived status is a disaster. There are many schemas that 
have "Phone1", "Email1", "Total Quantity of Item".

If there are more servers added (which I doubt but may happen), you add another column?

What do you recommend here? creating a 3 column table and use data instead or create one "Status 
code" and build a huge table for all combinations of possible statuses.

MAG_NO
STATUS_TYPE
STATUS

1000,s1,Y
1000,S2,Y
1000,S3,N


2. How can you create database accounts if you have 100 users and using mod_plsql. DO you create 
100 DADs or you let them key in userid/password in the DAD?

thanks,


Followup   December 10, 2007 - 2pm Central time zone:

1) i already stated why

because there will be a phone2, phone3, phoneN someday.

The correct way to approach this is fundamentally simple. You have a one to many relationship.


Answer this question: find me the person with the phone number (123)-456-7890

I don't care if it is phone1 or phone100 - find it.

find it fast
find it efficiently


select * from person where person_id in (select person_id from phones where phone=:x);


I'm done - person_id is the primary key, phones.phone is nicely indexed.

Now, how do you do it?

That is just one reason - the correct way to model and implement a 1:M relationship is via a parent/child table.


2) They can supply the userid and password, the DAD does not need to store one.

One DAD.
They all use it.
5 stars db   December 10, 2007 - 5pm Central time zone
Reviewer: A reader 
Tom:

Actually I do have a table to store the 1:M relationhip between a book and server "Book_server". It 
stores several details about when a book moves in or out of a server.

But what I want to decide on, is how to store the last status (derived) of that book on each 
server. For example I may have 100 records that move a book in and then out. I want to store the 
final status for existence of that book on that server (in or out).  

I think it is similar to inventory calculation where you have hundreds of receipts and shipments 
and you store the "final stock qty" in the master stock table. 


2.  I assume when you use the DAD authentication like you say, I have to create a static login page 
first and then pass that to the DAD because I cant have the login page generated from database 
unless I use some kind of dummy account.

I also think that if you want access control meaning showing users different links on a main menu 
web page you still have to create a table that assigns links or roles to db users? right


Followup   December 10, 2007 - 6pm Central time zone:

... But what I want to decide on, is how to store the last status (derived) of that
book on each server. ...


select * 
  from (select * from book_transactions where book_id = :x order by book_id, trans_dt desc )
where rownum = 1;



you don't store it "special", you just query it out.


or you have a SINGLE attribute - "what_server_am_i_on" that you set to the server name - not an attribute for each server it might be on.


2) no, if you choose to not store the user/pass with the DAD, the browser will prompt you. You create no page.

grant - you just "grant", then you have roles and privileges and they are stored in a table for you, the dictionary.
5 stars db   December 10, 2007 - 5pm Central time zone
Reviewer: A reader 
Tom:

By the way, I liked your example about the query on the phone number but I do not see it apply 
here. Each server is different object.

How can you rephrase the question to ask "Find me the book that exists on server 2".

I would do

select book_no from books where server2_in='Y';

AM i wrong?


Followup   December 10, 2007 - 6pm Central time zone:

... Each server is different object. ...

that makes no sense. I don't know what you mean.


if you have this transaction table....


select *
from (select a.*, row_number() over (partition by book_id order by trans_date desc) rn from book_trans a )
where rn = 1 and server_in = 'server2';


keep the most current transaction record for each book, and find the books such that the most current record says "I am on server2"

you seemed to be saying before "I have a book, I would like to know what server to find it on", that is easily answered as well.
5 stars db   December 10, 2007 - 7pm Central time zone
Reviewer: A reader 
Tom:

The queries that will mostly be done are:

What books do I have on server 1?
What books do I have on server 2?
What books do I have on server 3?
What books are on Server 1 and not on Server 2?

etc.

1.  Having a single attribute like you suggest would be too complex because you have to define all 
the combinations of book existence on each server and create a unique code for it. right?


2.  Are you suggesting to query it out every time from the transaction table and NOT storing the 
final Status in a table?

select * 
  from (select * from book_transactions where book_id = :x order by book_id, trans_dt desc )
where rownum = 1;

If yes, would you create an MV or one view per server status for ad hoc users to do a simple query 
to know the status? It would be nice if I have one view that computes the status of each server.



Followup   December 10, 2007 - 8pm Central time zone:

If a book can have more than one server - it should be such that there is a 1:M table that expresses "servers I am on"

A "transaction" record can be processed so as to remove or add a record to this table, but it would still be a 1:M relation


book 1:M book_transactions (your existing table)
|
+- 1:M book_to_servers_I_am_on


Now you can:

select book_id from book_to_servers_i_am_on where server = '1'; - to find those on server 1

select book_id
from book_to_servers_i_am_on
where server in ( '1', '2' )
group by book_id
having count( case when server = '1' then 1 end ) = 1
  and count( case when server = '2' then 1 end ) = 0



to find those on server 1 not on server 2.
5 stars db   December 10, 2007 - 10pm Central time zone
Reviewer: A reader 
Tom:

Thank you for the excellent reply.

1.  Just to confirm, is "books_to_servers_i_am_on" a "Book_Server_Status" table that you keep 
inserting records into (using a trigger) after there a transaction is stored into 
"Book_transactions"?

2.  Would not this be redundant since you can compute the same result by creating a view against 
"Books_transactions" that looks for the last created date for a server/book transaction.


3.  An ad hoc user can never write a SQL query like the one you wrote to find a book on server 1 
and not on server 2. They also might want to check different things like how many books are not on 
server1, not on server2 and on server 3.

 How do you make those queries easy for them with a simple command (ie as if you have one column 
per status(Y/N) )?

4. If a book goes through several cycles of same two tests on server1 only, would you use the same 
table "books_To_Servers_iam_on" to track the final result of each test (P/F) or you would create 
another similar status table or you would just use a view against the transactional table where all 
the messages coming in are stored.


Followup   December 11, 2007 - 7am Central time zone:

1) NO TRIGGERS, use well formed TRANSACTIONS.

2) you have "transactions", you have other "data", some attributes of your data includes the servers they exist on. your transactions table is like an audit trail, the REAL data is in the table I'm suggesting.

3) you write user interfaces, you provide API's. Most users cannot write a query to save their lives, that is what we do for a living.

4) you'll need to flesh this out - I don't know what "several cycles" is, means or implies.
5 stars db design   December 11, 2007 - 10am Central time zone
Reviewer: A reader 
Tom:

1. Why you do you prefer to write a transaction instead of a trigger. I think you mean do this:

--For a move transaction
insert into Book_transactions(trans_seq_no,book_no,server,status,col5,col6,col7,date_created) 
values (1,ISBN1000,Server1,IN,val5,val6,col7,sysdate);

--For the final status transaction
insert into Books_to_servers_i_am_on(book_no,server) values (ISBN1000,Server1)


instead of defning (After insert trigger) on Book_transactions.

2.  Are you saying that there is no issue in saving same data (book,server) in two tables?

Table 1 is an audit trail of all the transactions and Table 2 will have the final server/book 
status record?

or are you saying to not save that piece in Table 1 but only in Table 2.


3.  What I meant by several cycles is that if Book goes through "Test1" and it Fails, it can go 
through it again until it Passes. So you can have tests like this:

Book1, test1, Fail, 1/1/2007
Book1, test1, Pass. 1/2/2007 
Book1, test2, Fail, 1/3/2007
Book1, test2, Fail, 1/4/2007
Book1, test2, Pass, 1/5/2007

The status message for this data is also stored in "Books_servers" table.  Do you think it is 
appropirate to save the last final status of the test in same "books_to_servers_i_am_on" or create 
a smilar test status table like
"book_tests_results" that will have the last activity for each test?

or would you create a view that derives status from last datetime stamp record for each test.


Followup   December 11, 2007 - 11am Central time zone:

... Why you do you prefer to write a transaction instead of a trigger ...

because triggers are evil black magic that cause no end of issues over time as people just aren't expecting "magic side effects that just happen by accident". Every single time I see a trigger implementation - there is a war story involved that goes with it.

If you can do something:

a) in a straight forward, easy to understand, easy to document, easy to maintain way

or

b) by magic, stuff just happens, no one sees it happening, automagical

I choose (A) every single time.



2) we are not saving the same data it would appear. Your "transaction" table is an AUDIT TRAIL as far as I'm concerned, a historical record. The transaction in my opinion is:


a) update or insert a record into the "books to server i am on"
b) create an audit trail record stating you did this

the audit trail is historical, the table in (a) is your data.


5 stars db   December 12, 2007 - 11pm Central time zone
Reviewer: A reader 
Tom:

You convinced me with your excellent db design skills. just a few small questions

1. If i have a screen used by "Testing" where they can mark a failed book to be moved from server 1 
to server 2.  Another process/screen allows them to flag a passed book to be moved from server 2 to 
server 3. A 3rd process/screen allows to them to flag a book to move from server 3 to 4.

I have created 3 tables to track each transaction:

s1_s2_move
-----------
book_no
user_id
creation_Date

s2_s3_move
-----------
book_no
user_id
creation_Date

etc....

would you create individual tables or ONE big table with another column that tells you the "Type of 
MOVE"?

all_moves
---------
book_no
type_of_move
user_id
creation_Date

2.  Do you have any opinion on #3 above on whether you would also store book test reults in same 
table "which_server_I_am_on" or create a separate table or create a view that derives the values.

3. How do you usually decide between creating a view that looks up the last activity on a book in 
the transactions table which answers (which servers I am on) versus creating the small table 
"books_to_server_I_am_on" you advised? 


Followup   December 13, 2007 - 9am Central time zone:

1) why three tables. why three tables named after servers? what happens when you add the next server - which you say probably won't happen, but could. what happens when server 2 disappears and you need to move from s1 to s3.

think about this please - it is the same sort of situation as using columns in a table - instead of a child table.

You have different transaction types, ONE transaction table is more than sufficient, not only more than sufficient - the only way to do it really.


3) your transaction table does not support the query in question - if it did, we would just have used that. You cannot just pull the last transaction as I had originally presumed. Your transaction table is and audit trail really and in this case would not have been useful.
5 stars db design   December 13, 2007 - 5pm Central time zone
Reviewer: A reader 
Tom:

1. Actually the tables are named after processes (not servers). We have 3 processes that move a 
book from one location to another location.  Each process moves the book from one location to a 
different location.

I think what you are saying is that the more locations increase the more tables I will need which 
becomes unmanageable at some point in the future.

However, is not db design similar to organizing information on you desk. If you are in charge of 
moving stuff around, would not you create 3 different file folders and store each type of request 
in one.

I like your idea though.  Would you add one column "Move_Type" or two columns that define name of 
server "from_server" and "to_server" to implement this?


2.  Actually the transaction table does support the query in question since it is derived from it. 
Here is an example of the data that goes into each table.

book_transactions 
(trans_no (PK)
 book_no,
 server_id,
 directory,
 book_status_Code (in/out)
 date_in
 date_out
 created_Date,
 other_details.........)
 
 
 books_to_server_i_am_on
 (book_no (PK),
 server_id (PK),
 trans_no (FK) optional)
 )
 
 For example, I can received 100 messages into the "Book_transactions" on a book/server but the 
last one will show whether it moved in or out of server. That last status will be inserted into 
"books_to_server_i_am_on". So I will have one record in "books_to_server_i_am_on".
 
So doing a query on "Book_transactions" like you said before should also give me final status which 
matches the record on book_transactions.

select * 
from (select a.*, row_number() over (partition by book_id order by trans_date desc) rn from 
book_transactions a ) 
where rn = 1 and server_in = 'server2'; 

Thank you,


Followup   December 14, 2007 - 12pm Central time zone:

do not confuse your desktop with a database.

No, it is not similar. think of the files as rows. think of your desktop as your table - if you must.
5 stars db design   December 14, 2007 - 4pm Central time zone
Reviewer: A reader 
Tom:

Do you agree with #2 above that the status can also be derived for the transactional table too.


Followup   December 14, 2007 - 4pm Central time zone:

as long as what you say is true - that the "last" record is the one that you want.
5 stars db design   December 14, 2007 - 4pm Central time zone
Reviewer: A reader 
I think you say I can still have a status table instead of a view that derives it from 
transactional table. The view might take time if transactional table has thousands of records.

I think this is more like tracking "Stock Quantity" in a retail or inventory system. An item gets 
received and shipped every day. You compute the "Stock Quantity" each time and save it in a table 
so users can see it.


Followup   December 17, 2007 - 10am Central time zone:

You see, I would not tend to store stock quantity that way - because typically you are looking for the quantity of a single ITEM - and computing that is not expensive.

Same for individual books - I would not store this redundantly - I would have used the query with rownum way above:

select *
from (select * from book_transactions where book_id = :x order by book_id, trans_dt desc )
where rownum = 1;

extremely cheap to execute.

It was only when you said that you frequently needed this for ALL BOOKS at the same time - that we'd even consider storing it "special"
5 stars db design   December 14, 2007 - 5pm Central time zone
Reviewer: A reader 
Tom:

If a book moved "out" of server 1 and "in" to server 2.
 
IF you receive one status message per above, is it OK to enter 2 records into the 
"Books_transactions" to indicate the "out" and "in"?


OR you would log in the status message (one record) as you receive it in "Books_transctions" for 
audit trail and then parse it and enter the in/out results into other transaction tables?



4 stars Start your design with a data model   December 17, 2007 - 9pm Central time zone
Reviewer: Mike from Cleveland, OH USA
It seems you have spent a lot of time saying 'how about this design', 'how about that design'. But 
these seem to be unconnected attempts to score a hit, not the fruit of a good design process.

A good database design starts with a data model: a representation of the business data 
requirements. You should start with an understanding of the entities: 'what are the things I need 
to keep information about?' From what I read (hopefully all of the anonymous postings are really 
from the same person about the same problem):
- You have Books.
- You have Servers, which may also be called Processes.
- You have Tests, which (given that Servers may actually be Processes) may or may not also be 
suspected to be variations of processes.
- You have events which are Moves among servers.
- You have events which are Test Results.
etc.

For each entity, an identifier (natural or contrived) is identified. 

Then you identify relationships between them. A Book is located at one (or more? evidently) 
Servers. A Move involves a Book, a 'from' Server and a 'to' Server. A Test Result involves a Test 
and a Book.

Then you identify the rest of the attributes, or descriptive data elements, like names, times, 
quantities, etc.

From these requirements, described by entities,  relationships, and attributes, you have a data 
model. The starting point for your database design would generally be a table for each entity type, 
primary-key columns for identifiers, foreign-key columns and constraints for the relationships (or 
associative tables for many-to-many relationships), and columns for the remaining attributes. All 
good designs are derived from this starting point.

I have simplified here, because of the sketchy information and the limited time and space. But this 
is the process that you must do in order to arrive at the correct design. If you do not have an 
accurate data model, you do not have anything against which you can judge design proposals.

The book I learned from (and have recommended to others in the past) is Handbook of Relational 
Database Design, by Fleming & von Halle. It is nearly 20 years old but it is still listed on 
Amazon.com. Others may recommend other books. I suggest that you invest some time in learning the 
fundamentals of database design.

(This whole scenario sounds like it could be an academic assignment....hmmm)


Followup   December 18, 2007 - 1pm Central time zone:

(thank you...)
5 stars design   December 18, 2007 - 12am Central time zone
Reviewer: A reader 
Tom:

1. What did you exactly mean by this statement?

<It was only when you said that you frequently needed this for ALL BOOKS at the same time - that 
we'd even consider storing it "special" >

2. you said that you would "compute" instead of "Store" the Stock qty. However, does it make sense 
to compute thousands of records every time you want to know the "Stock Qty" of an item

For example, let us say we did an inventory for "Pepsi" cans in a supermarket and counted 1000.  
Now we enter this value in the database and then we will have transactions "Shipments" and 
"Receipts".  Your computation will always be:

Last Inventory Date (1000) - All shipments from that date + All Receipts from that date.

After 6 months (Assuming we did not do another count), you will be scanning records within 6 months 
time frame.

Does it not make sense to create another table
(Stock_Number,
 Total_Quantity)

and store that every night or every transaction.

thanks,


Followup   December 18, 2007 - 1pm Central time zone:

1) I meant precisely what I said.

I made the reasonable assumption initially that you wanted to find the status of A BOOK. I showed you how.

later you said "I want it for all books", that changes radically the question needed to be asked of the data - so the forgone assumption I made was not valid and the answer changed.

this is why we gather requirements and then design.

2) if I want it for a single stock item, I'd compute it when and IF i needed is (because my experience is you rarely need it in real life, the expense is not worth storing it for each modification - the extra code, the extra testing the extra concurrency issues - don't want to deal with it)


5 stars design   December 19, 2007 - 10pm Central time zone
Reviewer: A reader 
TOm:

OK I think I understand what you saying which is:

if the requirement is to know the status of one book create a view to compute it.

If the requirement is to "find all the books on server A or find all books on server A and C" then 
store it in a table and update the status in every transaction as it will take time to compute it 
real time on requests. This rule applies even in the case you have low number of transactions per 
book (like 10-20 records per book).

Correct?

2.  In your experience with Banking applications and retail chains, do they mostly store stock 
quantity or customer balance or compute it. It seems to me it is stored in a table but might be 
updated nightly?

3.  If you are creating a table to store an administrative input screen where admin logs in, select 
a function (6 radio buttons for different functions) and press submit. The table will be an audit 
log (uneditable) meaning that every time he does "submit" a new row ins inserted. 

Would you create a "Reference table" for all functions and store a function_id in the transaction 
table or you would create the actual name of the function and just have one table. Function length 
varies from 1 word to 4 words.

Which is better?

Thank you,


Followup   December 20, 2007 - 10am Central time zone:

I think you need to

a) define your requirements

b) model a schema that fully supports efficiently your requirements

that is a general way of saying what I'm saying.


if one way were universally better, it would be hard coded into the software and you would not be able to do it any other way.
5 stars design   December 20, 2007 - 11am Central time zone
Reviewer: A reader 
Tom:

I did both of what you said. IT is just that little small thing that is left on deciding whether to 
store derived value or crete a view.  
My requirement is to know "all books that exist on one server" or "what all books that are on one 
server and not on other server", etc.

Based on this i concluded from your discussion, that I might want to store these in a status table 
(server,book) that would be updated based on the transaction coming.

Am i wrong in my intepretation of what you are saying.


Followup   December 20, 2007 - 2pm Central time zone:

that would be correct. You want "all book status frequently"
5 stars design   December 20, 2007 - 6pm Central time zone
Reviewer: A reader 
Tom:

WOuld you do a) or b) or c) described below for this requirement?

I want to store the action that an administrative does on a screen where he select a function (6 
radio buttons for different functions) and press submit. The table will be an audit log 
(uneditable) meaning that every time he does "submit" a new row is inserted. 

a.  Would you create a "Reference table(function_id,function_Desc)" for all functions and store a 
function_id only in the transaction table or 

b.  you would store the actual name of the function in each record and just have one table. 

c.  Or use two tables but store function_id and 
function_desc in the audit log table for viewing ease and query ease.


Function length varies from 1 word to 4 words or 10 bytes to 30 bytes.


Followup   December 21, 2007 - 2pm Central time zone:

depends

will the name of the function change over time?
does the name of the function need to be constrained to a specific set?
is this function referenced in other tables?


5 stars design   December 21, 2007 - 11pm Central time zone
Reviewer: A reader 
Tom:

I think you need to write a DB design book. it seems you are a master in that just like oracle.


a) No, name of function will stay the same.
b) i am not sure what you mean. name of function is like "Lock table", "close weekly job", "unlock 
table", "run monthly report",  etc.

all of these are radio buttons on the screen. So i guess yes he is constrained with one of these 
options.

c) No, this function is not referenced in other tables. just a history log of the action he took.



Followup   December 22, 2007 - 11am Central time zone:

a) are you *sure* :) things change, think long and hard about that.

b) do you need to make sure "Lock Table" is put into this table or would "lock table" be OK or would "Lock-Table" or .....

do you want to constraint this attribute to a specific domain (probably, you want clean data)

You can accomplish that via a check constraint for a small set of values, you can accomplish that with a lookup table that has a single column containing the list of valid values.

If (a) is "no, name of function will stay the same forever", you either want

1) check constraint to constraint the valid values
2) or a single column lookup table with them


4 stars design   December 22, 2007 - 11pm Central time zone
Reviewer: A reader 
Tom:

I would like you recommendation in the 2 cases.

1. assume function name will not change. you said I will create a check constraint or lookup table. 
But which solution do I implement described above a), b), c).
Would you store the function_name in the transactional table ot just function_id or both.

2.  assume function name will change. how would the solution change from answer in #1.

Thank you,


Followup   December 23, 2007 - 5pm Central time zone:

1) what is "function_id" - sounds like an unnecessary surrogate - hence I would not see it.

2) then you have a lookup whereby you assign a surrogate key (ID) to this name, the name is unique, the ID is the primary key, you use the ID in your tables.

If name changes, update name in this lookup table and it is changed everywhere.
5 stars design   December 23, 2007 - 9pm Central time zone
Reviewer: A reader 
Tom:

I can't see how function name changes.  Do you mean if you have a function called "Backup Asktom" 
and next week you decided to call it "Back up Asktom website", is that a function change. You are 
only changing the "Label".

Function_id is a unique number for each function. I think what you are saying is that:

If function name does not change store the function name in transactional table even though the 
name will be repeated everytime the admin performs it

Admin_Table
-----------
Seq_no (PK)
Function_name 
Created_Date

If function name will change over time then do this

Admin_Table
-------------
Seq_no (PK)
Function_id (FK)
Created_Date

Ref_functions_table
-------------------
Function_id (PK)
Function_name


Is this correct?


Do not you think #1 would be unnormalized as function name will repeat in different rows. Also, I 
assume I can't search by function name unless I spell out the full name.





Followup   December 24, 2007 - 8am Central time zone:

that is what I meant - if the function name changes in the future. Forget this particular example, think "primary key"

If the attribute I choose as my primary key is subject to possible change in value in the future, it is NOT my primary key. That is all I'm trying to say here.

I am saying you will have a "lookup table" in ALL CASES.


function name is a constrained value, subject to validation.

You will either

a) use it as a 'natural key' in a 'lookup table' - either you will have:

create table audit( fname varchar2() check (fname in ( 'a','b','c' ))

OR you will have

create table lookup (fname varchar2() primary key ) organization index;
create table audit( fname references lookup );


b) use it with a surrogate key:

create table lookup( id number primary key, fname varchar2() ) organization index;

create table admin( function_id references lookup, ... );




Your first case is MISSING a lookup table. Your question doesn't make sense - just because a value repeats does NOT MEAN "you are not normalized". Think about this for a minute, if you use the lookup table with function_id, function_name - does not the function ID now repeat just as often as the name would have - tell me what the difference would be and why one is more normal than the other.


5 stars design   December 24, 2007 - 8pm Central time zone
Reviewer: A reader 
Tom:
1.  In both cases I am not planning to use ¿Function_id¿ or ¿Function_name¿ as part of the Primary 
Key in the audit table.. I already have an Auto Generated Sequence for the transactional/audit 
table for that.
So I can constrain the function name with either a check constraint or a lookup table.
The question remains do I store the function name or function id or both in my audit/transaction 
table?
Currently, the screen will reflect the functions the customer requested. In the future it may/may 
not change. I can¿t really predict. In a few months/years they might ask to add new functions. 
Changing the existing labels I doubt it although it can happen.



2.    Concerning normalization, is not this is similar to storing customer name all the time in a 
transactional system. You always store Customer ID and not name? Would not that make repeating 
values issue.

1NF = No repeating groups or multivalued fields.
2NF = Every column should be dependent only on primary key.
3NF = No transitive dependencies (no derived columns).




Followup   December 25, 2007 - 10pm Central time zone:

1) it would not be a primary key in the audit trail, at most it would be a foreign key to a lookup table OR constrained to a set of values by a check constraint.

I do not see anywhere where someone said otherwise?



If function name is immutable, it is the NATURAL KEY OF THE LOOKUP TABLE (which might be enforced via a check constraint as easily as a lookup table).

there is no function "id" if function name is a primary key (not the primary key of some audit trail table - a primary key of a lookup table that is "the set of functions")


2) no, you have totally missed the definition of normalization. Entirely.

We might use a customer ID because..... customer names change - or they might even repeat (I can certainly foresee two John Smiths). The customer name does not uniquely identify a customer. So, we use a surrogate in this case.

Do you see that you repeat your customer_id as much as you would repeat your customer_name - please tell me why one would be "normalized" and the other would not. Please tell me the difference between the two???? (hint: there aren't any - we are not talking about normalization here at all - we are talking about what is appropriate for a key value - just because you see the same data repeated row after row in a CHILD (hint - keyword CHILD) table does not mean "you are not normal" - it just means you have a foreign key)

IF (your function name is immutable)
THEN
   It is as good, if not better, than some meaningless number and should be used as the KEY of a 
"function" entity - this entity may or may not be physically present as a lookup table, we might 
just use a CHECK CONSTRAINT instead of a foreign key - but the end result is the SAME
end if

5 stars db design   December 30, 2007 - 11pm Central time zone
Reviewer: A reader 
Tom:

OK, I got it.

I think what you saying is that:

Both cases I need wither a lookup table or check constraint to only allow valid functions be stored 
in the audit table.

If the function name will not change, then store that in the audit table instead of a meaningless 
key. 

If function name will change then create a key and store that in audit table.

Correct?

But how would I or you know if function name will be changed in the future. We are only talking 
labels here too.

Would not storing a function ID be safer and also easier for indexing/querying?

Thank you,


Followup   January 1, 2008 - 6pm Central time zone:

If your natural key is a true key (doesn't change) then yes, you would not make up some surrogate for it.

you would use the natural key (the function name in your case)

... But how would I or you know if function name will be changed in the future. We
are only talking labels here too....

YOU have to answer that.

If these are labels - they will PROBABLY change - because next year, someone will not like the labels you picked. They'll want them different.


5 stars design   January 2, 2008 - 11am Central time zone
Reviewer: A reader 
Tom:

One function name is:

"Close monthly order (all customers)"

That would be kind of long to index/query even assuming it would not change. Do not you think so? 



Followup   January 2, 2008 - 2pm Central time zone:

that is not a function name (please, just look at this stuff and use some reasonable-ness test). That is obviously a description, and obviously will but subject to change - next month it should be "Close Monthly Orders (for all customers)"


5 stars design   January 3, 2008 - 3pm Central time zone
Reviewer: A reader 
Tom:

yes it is a description. Does that change anything in your proposed solution above. I wont use it 
as a PK for sure.

But I assume you would still use a lookup table/check constraint to validate it and assign a 
surrogate key for each description.

Or you would just save the description in each transaction saved in the audit table?


Followup   January 3, 2008 - 3pm Central time zone:

a description is not a primary key value (obvious isn't it?)

The description sure seems to me to be "not immutable".

This is a column in a lookup table. You would have a small code pointing to it.
5 stars design   January 5, 2008 - 11pm Central time zone
Reviewer: A reader 
Tom:

If you are designing a simple system to store all the http messages coming in to your application 
and all the XML responses that the stored procedure sent out to the client program. I need to check 
out the response that went out for each message coming in. 

would you create two table like this:

MESG_IN_LOG
----------
mesg_no(PK)
message

MESG_OUT_LOG
-----------
mesgout_no (PK)
mesg_no (FK)
message

OR one table 

MSG
-------
msg_no
message_in
message_out

I think you are going to ask if it is one to one realtionhip or one to many relationship. Correct 
me if I am wrong I do not think a stored procedue can generate multiple http responses to one http 
request.



Followup   January 7, 2008 - 7am Central time zone:

this is absolutely a single table.

there is a one to one relationship between

a) INPUTS
b) OUTPUTS

think about it.... You have a set of inputs, you do something, you formulate a response and send it back.
5 stars design   January 8, 2008 - 7pm Central time zone
Reviewer: A reader 
Tom:

SO you are saying have one ROW for URL coming in and XML message going out?

You can never have multiple messages out for one http request?

But using your logic, if you have a shop and you only ship one ORDER once, would you create one 
table for the order and shipment information. It seems to me two tables would me much cleaner even 
with one order to one shipment relationship. is not?


Followup   January 8, 2008 - 8pm Central time zone:

A stored procedure

a) receives a set of inputs
b) produces a set of outputs

you are saying you want to log that, therefore, you have a single table to log that.

Now, whether you need a table per procedure is AN ENTIRELY DIFFERENT question, however, if all of your procedures

a) take a single xml input
b) produce a single xml output

you are back to one table.
5 stars design   January 9, 2008 - 2pm Central time zone
Reviewer: A reader 
TOm:

When you populate Validation/Lookup tables, do you use a number sequence or a abbreviation code for 
PK.

like for this description 

Close Monthly Orders (for all customers)

do you use a code like "Close" or "CMO"
or a number like "1"





Followup   January 10, 2008 - 2pm Central time zone:

depends.

for something well defined like a state lookup, I'd use the state lookups

VA - Virginia
NY - New York


For something with established codes, you would use those.


For something like you just described though, it sounds like you are "making up codes", in which case any surrogate will do, users will never see these values - numbers would work fine and be easier to add over time.

What would you do in the future when you not only

CMO - Close Monthly Orders
??? - Clone Monthly Overstock

5 stars db design   January 25, 2008 - 8pm Central time zone
Reviewer: A reader 
Tom:

If you need to store a software version in db colum like
version 1 release 2 as

1.20 

would you store it as number(5,2) or varchar2(5).

I need to compare this number to another user provided number to see if it is larger or smaller 
than that.



Followup   January 28, 2008 - 7am Central time zone:

depends on your versioning scheme, doesn't it.

will your software always be major_release.patch_number

and they are both numbers so that 1.2 makes "sense" - it would be 1.2, not 1.20, not 001.20, but just 1.2.

or will you have a 1.2a, 1.2b, and could 1.2z come before 1.2j (there was a 6.1 before 6.0.36 of Oracle...)

In general, version numbers should likely be held in a table with a "date of release" for sorting purposes and then some string that represents the version. 5 characters seems rather small.
3 stars RE: "software version in db column"   January 28, 2008 - 10am Central time zone
Reviewer: Duke Ganote from Batavia Township, Clermont County, Ohio USA
Maybe something like Tom's own:
http://tinyurl.com/26pkxu
which just is a short link to:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:516338600346572071#5242544003466
36247


5 stars version   January 28, 2008 - 11am Central time zone
Reviewer: A reader 
Tom:

the requirement is to maintain the current oracle app version/release. No need for maintaining 
history and there is no alphabet involved.

When the client makes a login, he submits his version number and we compare that number to the 
current one to make a decision.

My point is you cant do comparisons (<, >, =) with varchar2 columns. The column has to be a number 
type or converted to number when doing comparisons.


Followup   January 29, 2008 - 2am Central time zone:

... the requirement is to maintain the current oracle app version/release. No need
for maintaining history and there is no alphabet involved. ...

but your next paragraph belays that

... When the client makes a login, he submits his version number and we compare
that number to the current one to make a decision. ..

in order to know ">" or "<", you need to understand the "history" and yes, today perhaps there is no letters, but think about it



Also, how could a client have a version greater than your current? So, it would seem it could be:

a) check to see if it is the same - then they are current
b) else they are not current, hence they have an older release



You can do >, <, = on varchar2 columns - why would you say otherwise?

Manytimes people stuff a date into a string - using yyyymmddhh24miss for the format, that absolutely lets you use >, <, = to compare the strings "as if they were dates"

You can do lots of things, but in this case if you always have "major version.patch level", using a number would suffice.

4 stars Can we have multiple instances in one database?   January 31, 2008 - 9pm Central time zone
Reviewer: Dawar Naqvi from LA, CA -- USA
Can we have multiple instances in one database?


Followup   February 4, 2008 - 3pm Central time zone:

an instance mounts and opens exactly ONE database.
a database (with RAC - real application clusters) may have more than one instance associated with it.


instance => to a single => database
database => accessed by one or more => instances


instance = memory + process
database = set of files.


instances are not "in" databases, instances are processes and memory you use to access a database.
3 stars @Dawar Naqvi RE: multiple instances in one database   February 1, 2008 - 3pm Central time zone
Reviewer: Duke Ganote from Amelia, OH USA
The Oracle Concepts Manual gives definitions, saying:

"An Oracle database server consists of an Oracle database and an Oracle instance. Every time a 
database is started, a system global area (SGA) is allocated and Oracle background processes are 
started. The combination of the background processes and memory buffers is called an Oracle 
instance.

"Real Application Clusters: Multiple Instance Systems
Some hardware architectures (for example, shared disk systems) enable multiple computers to share 
access to data, software, or peripheral devices. Real Application Clusters (RAC) takes advantage 
of such architecture by running multiple instances that share a single physical database."

http://pages.citebite.com/a2w0k7l2hsim


5 stars db design   April 1, 2008 - 10pm Central time zone
Reviewer: A reader 
Tom:

as a followup to above question let us say you have a client sending messages that tell database 
status of books (i.e  moved into server A, moved out from server A, etc.).

I am saving all the incoming messaging into an audit trail table "book_trans", and then based on 
the message content I am inputing a status record into book_status table. If a book exists on 
server A, I enter this record amd i store the trans no that generated this record. Of a message 
says that book was deleted I delete the record from book server status.

book_trans
------------
trans_no
book_no
server_no
status (in/out)
directory
length
author
etc...


book_to_server_iam_on
-------------------
Book_no
Server_no
trans_no

My question to you, is that I need to track 3 things for a book: whether a book exists on specific 
server and if yes which directory it is on.

1. Do i add "directory" to "book_to_server_iam_on" or always get it using the trans_no column from 
book_trans table.
2. is it ok to store the "trans_no" in "book_to_server_iam_on"". I added it to sort of debugging to 
see the last message that generated this status record. 


Followup   April 2, 2008 - 6am Central time zone:

1) you already have it stored, indexed, easily accessible in book_trans, I see no reason to duplicate that information.

2) seems like a foreign key to me in this case - back to the relevant information for this book.
4 stars db design   April 2, 2008 - 12pm Central time zone
Reviewer: A reader 
Tom:

well "book_trans" might have 1000 transactions of which the last action/transaction result 
(derived) is stored in book_status. So you are saying not to store the "book directory" value of 
that last message again and just store the last transaction number in "book_status" and get it from 
"booK_trans"  using trans_id instead of querying one table only "book_status" for which books i 
have on servers and where they are located. is this correct?


Followup   April 2, 2008 - 12pm Central time zone:

I'm not saying anything really - we've had this discussion over the course of many many weeks/months - and I don't remember all of the details.

store data normalized, do not replicate data. do not be afraid to just join.

I would have to assume that trans_no is the primary key - so it doesn't matter how many rows are in that table, we are only getting one.
5 stars roles in hierarchy.   April 5, 2008 - 12pm Central time zone
Reviewer: Amir Riaz 
Hi tom

I am designing a database. I have a requirement in which i have to maintain a hierarchy amoung 
users. for example the top user is 

            ceo 
 manager,              chief engineer, 
     financial manager        marketing engineer
                   account            Product engineer


the requirement is each user can see all the data below his hierarchy. there is also a chief 
administrator and its administrators. a chief administrator manages administrators. an 
administrator manages managers roles and all the roles below it. these role are static means once 
an administrator assign to a user. only administrator can change it

while chief engineer and its below roles are dynamic. i.e any person in the top hierarchy can 
change the users below it for example chief engineer can make a marketing engineer a product 
engineer and a product engineer to any one low in hierarchy

I have created a user entity and a role entity with many to many relationship(My God another 
contraint). i have entered all the above hierarchies as role in it. what i think is how can i 
represent hierarcy in role table. I have thought two options so far.

use numbers with the hieghtest number representing the top role

use self join with role entity.

should i use another entity for dynamic roles like dynamic_user and join it to roles. it has same 
attributes as user entity has. 

I am using FGAC for security purposes.

thanks 
regards
Amir Riaz


Followup   April 7, 2008 - 8am Central time zone:

... use numbers with the hieghtest number representing the top role ...

dismiss this thinking - the "highest number" thinking - there is no such thing as a "highest number".

I cannot answer your question - the details are a bit "fuzzy". Normally, a hirearchy would require a single two column table

ROLE    PARENT_ROLE
------  ------------------
CEO     <null>
MANAGER CEO
CHIEF   CEO
....


and role would be the primary key, and you would have a foreign key to point to it.

I did not get the "many to many" bit

5 stars Thanks for reply   April 7, 2008 - 12pm Central time zone
Reviewer: Amir Riaz 
Hi tom

I did not get the "many to many" bit

what i mean is a user can have multiple roles and a role can be assigned to multiple users.

so far i have two solutions. In one solution i have created a entity named users with user_id(pk) and roles with role_id(pk) and an associated table to resolve the many to many relationship have role_id and user_id as primary key. But again how can i form hierarchies in them.

Second solution i will have a users table and have subtype tables like ceo, chief engineer, product engineer. so for example chief engineer table will have primary of user_id and chief_eng_id and similarly with other tables of product engineer. also as You have pinpoint a very vital point that in hierarchies we should use one table and two columns like your example above I am thinking that with second solution i create another table named user_hierarchies and try to arrange data a you did in your example I hope it will work what your opinion.

with which solution hierarchy can be formed easily.
also Some where on this site i have seen FGAC example in which you demonstrated how a top user can view the data of all the users below it. but i am failed to found it now

thanks
Regards
Amir Riaz

Followup   April 9, 2008 - 9am Central time zone:

will, then users (if they are many to many with roles) could and likely will lead to recursive loops in your data - what than? What happens when the ceo manages the managers who manage the clerk - and the ceo is also a clerk...
5 stars Head spining   April 9, 2008 - 12pm Central time zone
Reviewer: Amir Riaz 
The role Ceo ,manager financial, manager account are static means that they are assigned by 
administrator to login into the site and cannot be changed after words. The roles remain fixed to 
users and cannot be changed without administrator login. Actually its a one to many relationship 
but to keep the design simple i am planing to keep them in many to many. 

the roles Product engineer, marketing engineer, chief engineer are dynamic means that they are 
initially assigned by the administrator but after words any person in higher hierarchy can change 
it to lower giving his position to anyone else. A person cannot be product engineer and chief 
engineer in the same machine part. however he can be product engineer in one part and chief 
engineer in an other part. so there is no cyclic data and it also gives a many to many relationship 
between users and roles.

what's your opinion about the second solution? or do you feel that there are still conflict in 
requirements? 
How can i detect conflict in requirements?



Followup   April 9, 2008 - 2pm Central time zone:

... Actually its a one to many relationship but to keep the
design simple i am planing to keep them in many to many.
..

sorry, now MY head is spinning

how would using an entirely inappropriate model make things 'simple'

seems to me - it would lead to no end of confusion, bad data and bugs..
5 stars db design   April 17, 2008 - 7pm Central time zone
Reviewer: A reader 
Tom:

A quick question on the above.

Book_trans is the table that stores incoming messages for whether a book was copied to a server or 
not.

book_trans
------------
trans_no
book_no
server_no
status (in/out)
directory
length
author
etc...

This is a derived status table based on transactions coming in into book_trans. When a book moves 
to a server, I save a recod into book_trans (audit trail) and inser a record into 
book_to_server_iam_on.

book_to_server_iam_on
-------------------
Book_no
Server_no
trans_no


Some of the transactions are "MOVE" versus "COPY" meaning it involves "IN" record into server B and 
"OUT" record from server A. There is other information that the message sends.

My question, do you think it is a good design to insert two records into book_trans if the message 
was a MOVE type or I should only insert one record and update my other table 
"book_to_server_iam_on".

Which design you think is best?


Followup   April 17, 2008 - 10pm Central time zone:

we've had this conversation ad nauseum
5 stars design   April 17, 2008 - 10pm Central time zone
Reviewer: A reader 
Tom:

I did understand. the previous discussion was related to different issue. 

you do not have a preference on how you would do it. Would you take one message and generate two 
records in the trans table or just one.


Followup   April 17, 2008 - 10pm Central time zone:

go back to your requirements, for your system.

do you need to record a MOVE for your audit trail or not.


I don't have enough information (my view of what you are doing is actually very hazy - you keep moving the cheese - I have no idea what your goal is - you do though, so... you get to decide)
5 stars design   April 17, 2008 - 11pm Central time zone
Reviewer: A reader 
Tom:

sorry maybe I do not explain things well.

requirements are not very clear on this too. requirement just state to store the incoming message.

But if you are designing a banking system and user moved money from account A to account B and you 
are building an audit trail of his transaction, and some client send you one message about this 
transaction.

would you store that as one transaction in oracle audit trail table or two transactions " one for 
deposit into account B and one debit into account A.


Followup   April 18, 2008 - 8am Central time zone:

... requirements are not very clear on this too. requirement just state to store
the incoming message.
...

perfect, store the message, but also store the DATA CORRECTLY. No materialized view, just store the data correctly.


...
would you store that as one transaction in oracle audit trail table or two
transactions " one for deposit into account B and one debit into account A.
....

you tell me, what are the data requirement needs of YOUR system. 
5 stars design   April 18, 2008 - 9am Central time zone
Reviewer: A reader 
Tom:

date requriements are to track a history of the moves of a book. so if i went to DB i need to see 
that book 1 has moved "in" to server "A" today. This automatically implies that book moved "out" of 
server "B".

The incoming message (which triggers the API) only emphasize the move in and the code should create 
automatically the "out" record. Just wondering if it s OK to create two records in "audit" trail 
table for that one incoming message, or the seond record is considered to be derived or concluded 
and should not be stored there.

how would you design it.


Followup   April 18, 2008 - 10am Central time zone:

You may do it however you want to, what pleases you the most.
5 stars db   April 18, 2008 - 3pm Central time zone
Reviewer: A reader 
Tom:

I always like to take opinion of the best in the world. This is how you learn new things correctly.

I interpreted your advice 
<No materialized view, just store the data correctly. >

to mean take one message and store two records in audit trail to reflect the OUT from one server 
and IN to another server..

This is my plan.

BTW, is there like a "brain exercise gym" you visit. I always wondered how you get that vast 
knowledge. is it "reading" or "practicing", or "genetic intelligence".

What is the secret.


5 stars DATABASE ARCHITECTURE FOR APPLICATION   April 21, 2008 - 2am Central time zone
Reviewer: supriya from INDIA
Hi Tom,

I am developing a small application for maintaining student information using forms6i,oracle9 and 
os-windows.
i have sample version of oracle software which is downloaded from oracle site (right now using the 
same).
here my question is ...

1) can i use sample version of database for real application?
2)is it creates any problem after sometime?

right now i have created one new user (admin) by connecting to system user and started creating 
tables,...etc

3) I am not much familiar with concepts of maintainance of database...i mean 

how to maintain table-spaces/extents once it's full
how to extent the extents/segments/tablespaces
any auto extent feature? if yes, where to set this parameter?

the people who are going to use this application doesn't have any DBA person.

what are the steps need to take to handle databse without DBA?

Transaction related info : daily nearly 5000 to 6000 transactions in atable having around 10 
columns in a table.
column max size is varchar2(<30)...

application has hardly 15 to 20 tables.

appreciate quick response.


Thanks in advance,
Supriya



Followup   April 23, 2008 - 4pm Central time zone:

1) if you use Oracle Database XE (express edition) you are bound by the licensing agreement that goes with that (4gb limit on data, 1 cpu, and so on).

If you use Oracle Database SE, and EE (standard/enterprise) you have to purchase that.

2) ... what are the steps need to take to handle databse without DBA? ...

probable to not use the tools you are using, to use HTML and to host the database for them (eg: use APEX - application express - apex.oracle.com for demos).

I seriously doubt you want to use forms, Oracle 9i (old) for this without having some IT staff to deploy
5 stars   April 22, 2008 - 5pm Central time zone
Reviewer: A reader 
Tom,
I have 10 users in each group like finance, hr, research.
10 users in each group will have similar privileges.

Do you recommend creating user for each of the user in the database? So it will add up to 10*3=30 
users. This might increase in the future.
OR
Do you recommend creating user groups - fin, hr and res and grant privileges to these users?

Which is better in terms of user management in Oracle database?


Followup   April 23, 2008 - 6pm Central time zone:

especially with such small numbers of users (anything less than many many thousands being small), I'd want an account per user, for accountability reasons.
5 stars db design   April 27, 2008 - 4pm Central time zone
Reviewer: A reader 


5 stars DATABASE ARCHITECTURE FOR APPLICATION   May 5, 2008 - 5am Central time zone
Reviewer: supriya from INDIA
Please refer the follwing for my previous question and followup given by you...
**************************************
Followup   April 23, 2008 - 4pm US/Eastern:
***************************************

oracle version details ... 

*************************
SQL*Plus: Release 9.2.0.1.0 - Production on Mon May 5 14:41:57 2008
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
***************************

OS : windows XE

for this small application do i need to go purchase of the same.This application i am using just 
for internal purpose and installed only on 1 machine...application/database will be accessed by one 
machine only.

database and application is inatlled in their systems...in a few days there are going to start 
using it...
with online support...

here my question is 

1)once the datafile is over how to extend that?
like this any other precautions need to take regarding DBA activities...
2)any initilization parameters need to set?

Thanks in Advance
Supriya



Followup   May 5, 2008 - 10am Central time zone:

1) the the datafile is over WHAT?

Please check out the 2 day DBA guide that comes with XE - it covers the basics of administration.


2) you can accept the out of the box ones if you like - the ones that need be set (db block size for example) already are, else the database would not start.
5 stars db dsign   May 7, 2008 - 12am Central time zone
Reviewer: A reader 
Tom:

If you have database "A" where all developers work with. You create a new database "B" that 
interfaces with another external client. database A needs to access some tables in database B but 
you want to keep database B secure.

Would you give developers direct access to the tables in A over a link or would you create views in 
A for the tables in B (also over a link) or would you use synonyms?

If views are OK, do you make those Read-only (if they only select data).



Followup   May 7, 2008 - 1am Central time zone:

probably create a new user in A and B to support this. Let us call this user "RO" for read only.

In database B, the RO user is granted select on the necessary objects.

In database A, the RO user creates a non-public database link to B, and creates views that are select * from table@remote. It matters not if you use read only - because RO only has select on these tables.

RO grants select on the views to whomever.


4 stars design   May 7, 2008 - 11am Central time zone
Reviewer: A reader 
1.  why create a second user in A. just create view in RO in B and give select only access to the 
main account in A. this is what they use.

2.  Another way we implemented is create Local views in A for some tables in B (view was created 
using db link). do you think this is bad design or it is OK. the only problem with this is that 
everytime you add or delete a column in table in B you have to recreate the view in A. it does not 
refresh automatically. However, that would be true too even if the view was in B. 

I guess we can grant direct select on table directly to avoid the view thing. not sure if this is 
bad or not.


Followup   May 8, 2008 - 3am Central time zone:

1) because you asked my opinion. Because I would have user in database A and B that was set up for just this purpose and nothing else. Because I like small bite sized, manageable things. Because I like the concept of least privileges. Because it is easier to verify that what we want accomplished (just read only access to this set of tables is true). Because it is the way I would do it.

You asked, I responded.


Let me start asking YOU questions - why would you want to "avoid this view thing".


and I do not think that every time you add a column in B, you have to recreate the view in A - why would A need access to this new column, unless the code on A was going to be modified to support it, in which case - in your TEST SYSTEM you would have created a new view, a new set of code and then you would have rolled out the new functionality in a controlled fashion

This is why I would choose to USE a view - so that a change on the other system does not kill me, I can plan for the change, test it, roll it out when I want to.
5 stars Database Design Book   June 3, 2008 - 1am Central time zone
Reviewer: anonymous 
Hi Tom,

Can you refer me to a good book on Database Design?

Thanks!


4 stars Question reading separating tables design from the application source:   June 12, 2008 - 6am Central time zone
Reviewer: steve from UK
Hi Tom
 
I have just started working with a new company and their Database procedures require that two 
schemas for each PLSQL application.

They have one schema holding Meta data for configuring the software lets call it APP_DATA that is 
locked down (I believe so no one can do any DDL). They then have another lets call it APP_SOFTWARE 
that holds just the PLSQL packages. This means all the PLSQL packaged are separated from the 
configuration data tables that the application will be writing to and reading from.  
NOTE: the Customer data they are processing is stored in separate schemas.


My questions are:

Have you seen this before (Is this a "best practice")?

Can you guess at what security worries would require them to separate the PLSQL from it's own 
configuration-tables, especially when the source code has the rights to modify these tables?
  


Followup   June 12, 2008 - 8am Central time zone:

this is great.

You cannot mess with the constraints.
You cannot mess with the triggers.
You cannot mess with the physical schema.

and if they want to, some tables can be read only, others could be read/update only, yet others read/update/insert, yet others ......

The source code does not have to have the right

Separation is good - helps to document the set of least privileges needed. The application (source code) should not have the ability to mess with the structures (the tables) and this accomplishes it very nicely.

Not a bad idea to break it down even further, using schema's to hold different components of a large application - nothing wrong with that.
4 stars RE: separating tables design from the application   June 12, 2008 - 5pm Central time zone
Reviewer: Duke Ganote from Amelia, Ohio USA
I can verify separate schemas for data structures and code is common enough.  I first saw that 
approach at my former employer for our "roll your own" ETL code and data warehouse tables.  Made 
perfect sense because the data architect wanted strict controls on the data structures even in dev, 
while the developers had latitude for the code itself.

Most recently, I separated the ETL code (with some configuration tables) from the actual data mart. 
 That way, the BI report writers only saw the star schema itself, without any distracting temp or 
configuration tables.


5 stars desig   June 14, 2008 - 12am Central time zone
Reviewer: A reader 
Tom:

I have two databases AAA and BBB.

DB AAA interacts with http client (external world) and DB BBB supports production users mostly.

An external program does a few tests on a book: test1, test2, test3 and sends the 3 xml files to 
schema A so production users can view. A book can have many sets of tests.

How would you design this?

I am thinking of storing the received data via http message into an audit trail table and then 
insert the files into schema B.

database AAA

A_trans_table
--------------
trans_no    Number(10)
book_no    number(10)
test1_xml  CLOB
test2_xml  CLOB
test3_xml  CLOB

Database BBB

B_tests_table
--------------
book_no    number(10)
test1_xml  clob
test2_xml  clob
test3_xml  clob

If i received 10 test result sets there will be 10 records in A_TRANS_TABLE and the last record in 
B_TESTS_TABLE.

1.  is this a good design. would this be unnormalized since i am storing duplicate data sometimes. 
same data in two places. would you let user access a table directly in AAA instead.

2.  would i be able to insert the XML file from AAA into BBB database over a link. I think a CLOB 
has a limitation. How would i accomplish that?

3.  Would you use a CLOB or XML column type in 9i.  any advantage to each in queries or parsing.

thanks,


Followup   June 16, 2008 - 11am Central time zone:

you and I have talked books a lot on many many other threads. It keeps coming back to this basic fact:

given a short blurb of text, no one can 'design' anything.

I have no clue what a 'test' is in your mind.


5 stars design   June 16, 2008 - 4pm Central time zone
Reviewer: A reader 
Tom:

An external program reads a copy of a book on Server A, and runs a spelling check.
It generates an XML file with the test results. It sends the xml file via http post to database A 
who interface with outside world and logs it into (save it) A_TRANS_TABLE

and then forward the file to internal production database (B) to be stored into B_TESTS_TABLE 
accessed by another department.

Do you see any duplication here? I do not see it. i just want to check with you

Also, How would you insert the file into database B from A. I do not think a DB link will allow 
that? correct

Do you use another http call in stored procedure in A or you call a stored procedure in B from A?

thank you,


Followup   June 16, 2008 - 4pm Central time zone:

I'll never understand why you keep wanting to interface to the database using http... It isn't very reliable - unless you start adding tons of protocols on top of it and then i just say "why didn't you just log in and use sql and do it easy"

If A can forward to B, then the process that puts the data in A could put it in B directly and be done with it, the database on A doesn't need be involved.

Of course a dblink would let you do it, but why even bother, just have the process insert into B.
5 stars db   June 16, 2008 - 8pm Central time zone
Reviewer: A reader 
Tom:

It is not me. The architecture was designed by someone else.
I think they decided to use http because of security and firewalls issue. The external client was 
not given access to log in and use sql and do it easy.

However, we do have other production application that transfers files from over 200 customer sites 
to one database using http. It has been working fine for years. 

This is not a heavily used system (a few transaction per day) so we may not have any issues. The 
only issues that might pop out are error handling issues.

1.  The reason the process can not insert directly into B is that "B" is not open for the WEB while 
"A" is.

I just tried saving a file from DB "A" to DB "B" over a link and it did work. for some reason, I 
thought you could not. Maybe because I tried to create a view in B for a table in A that has a CLOB 
and it did not allow it.

The only thing is that it copied the reference in the Documents table like 
F1331/sample.xml. How can i get the whole file into that CLOB column in B.

2.  The best way for me to describe the design is that "A" is like a post office.
    It logs in all the incoming mail messages and then forward the data into "B". B will only one 
record with the latest information for a specific book. 
    
    Do you see any duplicate or normalization or design issues here.
    


4 stars http   August 28, 2008 - 8am Central time zone
Reviewer: A reader 
<<If A can forward to B, then the process that puts the data in A could put it in B directly and be 
done with it, the database on A doesn't need be involved. >>

"A" has a web server and open to the outside world and "B" does not. If you open up B did not you 
just open up a security hole. 

The idea is let A receive, validate and process dta received and then decide to accept it (update 
B) or reject it.

2.  The "http" was selected as a method because A is behind a firewall and they want to keep it 
flexible meaning if they relocated the server port 80 will always be open.

I can see that http is not really created for app to app communication and more of human to app 
communication. But, i think for low transaction systems it will work. IF you post some data and you 
did not get an ACK back you just keep sending the request again. You might end up with duplicate 
data in the DB.




Followup   August 29, 2008 - 10pm Central time zone:

... The idea is let A receive, validate and process dta received and then decide to
accept it (update B) or reject it.
...

if that is the idea, then they may certainly code it, there will be no "magic" and that is what they were looking for.

And if A can forward to B, then B is accessible via A and A is accessible via the internet and hence B is too.

Typically if you have:

internet <-> A <- B

where internet is outside, A is in a DMZ (in and out of the corporate system) and B is in the corporate system - then A cannot forward to B.

For example, I can from within Oracle log on to asktom.

I cannot on asktom do much of anything into corporate - it is a one way connect capability from inside to the DMZ. If asktom could initiate a conversation to inside of corporate, then it would be a place to launch an attack - and it would be attackable. So, if they have A can forward to B, they have already lapsed security.

... You might end up with duplicate data in the DB.
...

you will, not might - always replace "might" with "will" when you write a sentence like that.


"In noarchivelog mode you MIGHT lose all data since your last full backup someday"
that is wrong, it is correct to say:
"In noarchivelog mode you WILL lose all data since your last full backup someday"


4 stars web   September 9, 2008 - 10pm Central time zone
Reviewer: A reader 
tom:

1.  why do you think http is not reliable? many systems use that as a way for communications and 
file transfer these days. is it because it is stateless and you prefer stateful for batch program 
to DB interface.

2.  DB "A" has a web connector open (mod_plsql). DB "B' does not. Even though there is a connection 
from A to B, you can limit that access to one procedure. Most of B is still protected from the web 
access or any other access (tables or code)

is not this correct


Followup   September 11, 2008 - 10am Central time zone:

1) if you connect to a web server and use tcp to send a request over to the server... and then lose the connection, did it

a) work
b) not work

and how can you find out?

Has nothing to do with stateful/stateless - has to do with the fact it is just a wire protocol, it doesn't have any thing in it for transactional control, recovery etc - OTHER PROTOCOLS written on top of it DO

2) the network path is set up, that means machine 'a' can talk to machine 'b', which means if I compromise 'a', I can readily compromise 'b' - it is already insecure. forget about the database, the network rules are in place that allow 'b' to be accessed already.
5 stars db design   September 13, 2008 - 1pm Central time zone
Reviewer: A reader 
Tom:

1.  I would not know if it worked or not unless i get an ACK from the other side.
In this case i would post the transaction again. 

or you can send a "message sequence number" and after your connection fails, you query the DB for 
that seq number. if it is in DB then it worked. if it is not then it did not.

sort of like sending regular mail. you have no idea that the other party received your envelope 
unless you request an ACK signature back but most mail is done without that. if you did not receive 
the confirmation back you send another copy of the letter.


Here is how the interface works: 
-  Batch program sends a message to DB, asking "which book i need to process". DB responds back 
with a list of books. Batch program does its work for each book and then sends a status update to 
DB to tell it "i completed the work for this book". 

There are two things that can happen now:
a) DB status was "successfully" updated and flag for work complete was reset to OFF. In this case, 
the batch client does not care about seeing an ACK because the next time it will send the query 
message the book will not listed.
b) DB status was not updated (some failure happened) and flag for book stayed ON. In this case the 
batch will pick up the book again on the next run after a few minutes and process again. If it 
fails again you log that issue into a text error file or mark it in DB and some manual intervention 
has to be made.


2.  What common protocols are usually used for batch program to DB interfaces? there are millions 
of transactions occuring every day over tcp/http.

you told me once to have client program connect directly to DB via Perl DBI interface and run the 
sql. But the same scenario could happen, batch does it work and the sql fails or connection to DB 
fails, do i stop batch processing.


3. You are saying that in order to have complete security "server B" should be on separate network 
of "server A" or if they are on same netowrk there should be no db link.

but i mean cant you have them both on same network, and apply all your security stuff: db accounts, 
firewalls, etc.most web setups you have a web server outside the firewall that talks to the DB 
machine inside the firewall. They still can talk to each other.


Followup   September 16, 2008 - 9pm Central time zone:

and what you just started laying out there is.......

a protocol

which you build on top of http

to which both ends of the conversation have to agree on

and support the feature/functions thereof.

Eg: you just built web services - ta-dah.


So, if you like reinventing already made industry standard API's that if you used instead of growing your own - you'd be ready to interface to other things in the future - hey, go for it (you know - smk - you and I have *had* this discussion so many times, so so so many times)



People do not in general use http for 'batch', in fact, I don't really know of anyone. That is like asking "who is using tcp/ip for batch".

I'm going to give up responding to this one - we (you and I directly) have had this discussion over and over and over again. The answer will always - ALWAYS be "http is not a transactional protocol, you would have to build something on top of it - but then IT IS NOT HTTP ANYMORE, it would be your own proprietary protocol. Fine, go for it, bad idea given that there are standards and software libraries to do it better already - but *go for it*"



2) you have the ability to catch, detect, log, handle errors. There is a big difference between using a documented database api and fling URLs - unless and until you build a big API around your URL flinging and by then - you know what? You've just built yourself an API - it is *not* http anymore. http is just the transport (like the US mail - I do have certified mail, I can track mail, I get mail back if it bounces, etc.... they are all higher level protocols on top of the standard "here is an envolope and a stamp").


3) most people - assuming the layout is:

internet <-> server A ..... server B

would have:

internet <-> server A <- server B

You want to have

internet <-> server A <-> server B

that is, B can initiate a conversation to A, but A cannot initiate to B - because you see, A is exposed - it is a stepping stone to B.

I've said it over and over - exactly the same way - to you - on this site...

Most people would not do that - making A a stepping stone is making A a very weak link.
5 stars   September 29, 2008 - 4pm Central time zone
Reviewer: Reader 
Tom,
At the NYOUG conference held on Sept 9th 2008, you showed some examples of date field being stored 
as string/number. Can you please let me know if that presentation slide is on your website?

Thanks


Followup   September 29, 2008 - 5pm Central time zone:

click on files tab and look for OOW_WorstPractices.zip (2nd page currently if you sort by created on)
5 stars table design   September 29, 2008 - 9pm Central time zone
Reviewer: sam 
Tom:

If you are creating an app for application user password management and you want to store the 
values of these attributes to be used by the authenticating fucntions,

would you create one table with 6 columns (password_life_time number(3),password_grace_time 
number(2), ...)
and a row for the values 

OR you would create a two column table (password_variable, password_value) and enter those as 
variables.


 PASSWORD_LIFE_TIME = 365
   PASSWORD_GRACE_TIME = 10
   PASSWORD_REUSE_TIME = UNLIMITED
   PASSWORD_REUSE_MAX=  0
   FAILED_LOGIN_ATTEMPTS = 3
   PASSWORD_LOCK_TIME = UNLIMITED;

Which is better design?


Followup   September 29, 2008 - 10pm Central time zone:

I wouldn't - I'd use the builtin capabilities.

You are free to model your data however best meets your needs.

I would not use an EAV (entity attribute value) model when I knew what attributes I wanted - they belong as columns.
5 stars db design   September 30, 2008 - 11am Central time zone
Reviewer: A reader 
Tom:

yes i will use the builtin if we migrate to one db account per user. otherwise, if we continue with 
application accounts (one big application model) i have to do something similar to what DB has. I 
know it is reinventing the wheel.

<I would not use an EAV (entity attribute value) model when I knew what attributes I wanted - they 
belong as columns>


But I did not quite understand your comment. I think you are suggesting to have a table column for 
each. But, are not attributes always known in both cases. Even if you create "variables" you know 
these can be columns.

I am trying to see the cases where you use variable/value versus column.



Followup   September 30, 2008 - 1pm Central time zone:

... I am trying to see the cases where you use variable/value versus column. ...

I don't do that, that is what I'm saying.

do not use the EAV for storing what properly belongs in columns - you asked

would you create one table with 6 columns (password_life_time number(3),password_grace_time
number(2), ...)
and a row for the values

OR you would create a two column table (password_variable, password_value) and enter those as
variables.


and I said "COLUMNS" - not the EAV stuff.
5 stars design   October 6, 2008 - 9pm Central time zone
Reviewer: A reader 
Tom:

Does this mean that you never use the EAV structure in your databases.

For example , to store different system control variables (i.e. number of rows displayed on a 
screen, no of days to display records within) you always create a table and have one column per 
attribute


Followup   October 7, 2008 - 1pm Central time zone:

I did it once, when I myself invented it (all developers invent this model, over and over and over and over again..... It is just "so cool")

It was almost 20 years ago...


for something as trivial as what you state - system control variables - it would be

a) ok, since it would not really affect overall performace
b) a bad idea since your system control parameters are well known at compile time - you do not add them on the fly - your program is EXPECTING them to be there.


So here - it would "work", but "so what" - why would it be beneficial?


Do I use EAV? Yes, every day. I use application contexts - so I do. But, for my real data? For stuff I persist? No, I do not. I use attributes (flex attributes if needed) so I can index and substr/cast in views to get the right types/lengths and what not...
5 stars Design question   December 17, 2008 - 1pm Central time zone
Reviewer: just reader from LA, CA
Tom,

I am trying to redesign data model and I need you advice on how to better approach it. 

Table names are obviously fictitious but semantics the same. I am trying to merge mgr and emp 
tables since all other attributes are identical, without breaking many-to-many relationship. 

The relationship has to be maintained: one employee can report to many managers and manger can have 
many employees. 
Could you hint me on what is the better way to accomplish this? I am pretty sure you've seen this 
before but I never.

create table emp (empid number primary key, empname varchar2(10));
create table mgr (mgrid number primary key, mrgname varchar2(10));
create table emp_mgr (empid number, mgrid number);

insert into emp values ( 1, 'john' );
insert into emp values ( 2, 'kevin' );
insert into emp values ( 3, 'david' );

insert into mgr values ( 1, 'king' );
insert into mgr values ( 2, 'alex' );

insert into emp_mgr ( empid, mgrid ) values ( 1, 1 );
insert into emp_mgr ( empid, mgrid ) values ( 1, 2 );
insert into emp_mgr ( empid, mgrid ) values ( 2, 2 );
insert into emp_mgr ( empid, mgrid ) values ( 3, 2 );
insert into emp_mgr ( empid, mgrid ) values ( 3, 1 );


Followup   December 29, 2008 - 12pm Central time zone:

Since managers are employee's - it doesn't make sense to me to have a separate manager table.
create table emp
( empid number primary key, 
  name 
  .... whatever ...
  mgr_attributes_if_any_will_be_null_if_not_mgr
)
/

create table emp_mgr
( empid references emp,
  mgrid references emp,
  primary key(empid,mgrid)
)
/



5 stars re-Design question   December 18, 2008 - 11am Central time zone
Reviewer: daily reader from LA, CA
Tom,

Sorry, I noticed type in my above posting in the create mgr table command. Here is the correct one.

create table mgr (mgrid number primary key, mgrname varchar2(10));


5 stars Table design question   February 5, 2009 - 4pm Central time zone
Reviewer: Salman Syed from Pittsburgh, PA
Tom,

Here is a skimmed down version of the problem I am facing.

There is an entity opportunity that is associated with one and only one process. A process can have 
3-6 phases.

So we have three tables

1. Opportunity (opportunity_id, <attributes>, process_id)
2. Process (process_id, process_name)
3. Process_phase (process_phase_id, phase_name, phase_number)

We have to record the progress of each opportunity as to what the status of each phase is for the 
process that the opportunity is tied to. Each phase can have one of 4 statuses ('C', 'P', 'N', 'U')

One way would be to have a table Opportunity_process_phase which has the following columns:

(Opportunity_id, process_phase_id, completion_status). It will be an IOT with opportunity_id and 
process_phase_id as primary key.

The second option would be to have extra columns in the opportunity table (phase_1_status, 
phase_2_status, phase_3_status, phase_4_status, phase_5_status, phase_6_status) and just store the 
information over there.

Most of the queries will want to get the phase completion status data as a column so if we store is 
in a separate table the queries will require pivoting.

Which approach is better in your opinion? Or is there another approach you would recommend?


Followup   February 5, 2009 - 4pm Central time zone:

something is missing to tie process with process_phase isn't there?

today a process has 3-6 phases
tomorrow it will be seven.

I like the parent/child relationship
5 stars Performance issue   February 5, 2009 - 7pm Central time zone
Reviewer: Salman Syed from Pittsburgh, PA
Tom,

My bad. There should have been a process_id column in the Process_phase table description.

Ok that is what I also have. However, I am facing some performance problems. The most frequently 
asked question regarding is give me a list of opportunities and along with each opportunity, show 
me what is the status of each phase for that opportunity.

So, we have to join opportunity to the opportunity_process_phase table and then pivot. Instead of 
pivoting, one option I am thinking is creating a Function-based index to give a comma separated 
list of statuses and use that on the front-end. That will eliminate the join as well as make query 
writing much simpler.

What is your opinion about that approach?

Thanks!


Followup   February 5, 2009 - 8pm Central time zone:

you cannot do a function based index.

A function based index must be on a deterministic function, yours would NOT be.

Are you solving a problem that is "hypothetical". Why are you so afraid of a join? Joins are what RDBMS's do for a living.

Have you actually benchmarked this - you are talking about a join from opp to opp_phase_process (an IOT, 3 maybe 4 IO's to pick up all fo the related data)
5 stars db design   February 11, 2009 - 11am Central time zone
Reviewer: A reader 
Tom:

Would one table design make sense to you in this case and what problems you see in it.

Vendor A will make shipments and key it into a web form (or send an XML file).
Vendor B will receive the shipment and log it in as Received and/or enter a comment about bad 
CD/DVD.
Vendor A will send an invoice for that shipment.

Customer wants to see the shipment, receipt and invoice in one table for easier access and approve 
or disaaprove payment.
I see each as different entity and requires a separate table (even if it is one to one).
A view would serve his purpose for reporting and viewing.


Followup   February 11, 2009 - 1pm Central time zone:

customer wants an application to display and interact with data.

customer doesn't care about the tables.

you, as a developer, should implement the physical design that meets the requirements.
5 stars db   February 11, 2009 - 4pm Central time zone
Reviewer: A reader 
Tom:

Actually they want to use their ad hoc query tool (not an application) to query the data. Export it 
to a spreadsheet and do their calculations.

does the one table design still make it valid.


Followup   February 12, 2009 - 10am Central time zone:

that is what a view would be used for then.

always do the design right, if you have the right physical design - any question the end users would want to ask will be answerable.

I'm not saying this should use N tables or 1 table - frankly, I didn't look at the system you are building - I'm just saying:

no matter what they say, you will use the correct physical schema design. You can, from that physical schema, materialize any data they desire - data that is clean, correct and has data integrity.
5 stars design   February 16, 2009 - 2pm Central time zone
Reviewer: A reader 
Tom:

If you have a columnar spreadsheet 
(1)Agency (2)Product (3)Ship Date (4)No of Items (5)ship from (6)ship to (7)Ship method (8)BOL No 
(9) shipment comment
(9)receipt date (10) Receipt comment (11) invoice no (12)invoice date (13) invoice amount

Supplier fills out  1- 8 -(High level data for a shipment)
Production fills out 9 - 10 - (Receipt data)
Customer fills out 12-13 - (Invoice data)

The need is simple. Just basic insert and update of data so customer can see shipments made and 
validate receipts before payments are made.

if you have one to one mapping would you go with one table for all?
If you have one-to-many between would you go with 3 tables: shipment, receipt, and invoice?


Followup   February 16, 2009 - 5pm Central time zone:

smk, how many designs have I done for you personally..

I've written many times - if you have a series of 1-1 relationships, you have a single table. Breaking it up into multiple tables is not useful.

I think that you and I have actually had this discussion, about breaking a bunch of 1-1 relationships up and my thoughts that "that is a bad idea, single table good"

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#4516771788453


http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#17626768579621


http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#56102792586897


http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#56102792586897


Now that said, there are lots of other missing tables in this 'design', like the customer table, the supplier table, other lookups and such.



As for "If you have one-to-many between would you go with 3 tables: shipment, receipt,
and invoice?"

well, you didn't tell me who is one to many to whom - it could be they are all one to many with eachother and hence there would be more than three tables. Insufficient data

But please - don't ask unless there is a one to many...
5 stars db design   February 22, 2009 - 7pm Central time zone
Reviewer: sam 
Tom:

All i need to track is that a supplier made a shipment of products and the other vendor received it 
without any problems and in full.

The supplier now ships one product "DVD-1" size (1 G bytes) but there "might" be additional
different sizes later on (DVD-2, DVD-3).

I need to track the shipment master data (i.e ship date, ship from, ship via)
and product details (product_Id, Qty).

A production vendor that copies the files to blank DVDs will receive the shipment and enter date 
received, comments
and quantity received for each shipment made. Mostly it is one to one (a shipment will be received 
once) 
but there is a possibility a shipment made may get received on two different dates.

Supplier can enter he shipped 1000 DVD-1 today on a web page. Production vendor goes to the web 
page and mark those received when he receives it with some comments. that is all what i need to 
know  However, he may get 600 today and 400 tomorrow so he may have two receipts

Since the requirement is to come up with simple design, and some columnar screen to be used for 
supplier and vendor is there a way other than creating SEVEN tables. One table (spreadsheet design) 
will result in redundant data.

is this how you would design the DB?


Shipment_Master(shipment_id,shipment_date, shipped_via,BOL_NO, comments)
Shipment_Line_Items(shipment_id,product_id,Qty)
Receipt_Master(receipt_id,shipment_id,receipt_date,Qty)
Receipt_Line_Items(receipt_id,product_id,Qty)
Suppliers(suppliar_id, supplier_name,...)
Products(product_id, name,...)
Vendors(vendor_id,name,...)


Would you also do some kind of columar screen (like a spreadsheet) or form layout.

THanks,


5 stars design   March 10, 2009 - 11pm Central time zone
Reviewer: A reader 
I have two tables and one job. A video can have same number in different media.

When a video is assigned to a vendor, a record is created in VPROD with stage "PR".
Then when the vendor ships it he updates the ship date of that record.

The oracle job runs every monday at 7:00 PM. It meets a requirement to generate a text report of 
all the videos that were shipped in the previous week by the vendor.

select vidno from video a, vprod b where a.vidno=b.vidno and media = 'DVD' and b.stage='CP' 
  and ship_date between sysdate-7 and sysdate 

VIDEO
------
VIDNO   NUMBER(10)   PK
MEDIA   VARCHAR2(3)  PK
other_details
DVD_REPORT_FLAG??

VPROD
---------
VIDNO   NUMBER
MEDIA   VARCHAR2(3)
STAGE   VARCHAR2(2)
SHIP_DATE DATE
other_details

Problem happens when a vendor enters shipment dates for videos he shipped last week after the job 
runs on monday (between 8 PM and 12 PM or the next few days for shipment dates of last week)
These will be never be included in the shipped report generated monday. What is the best way to get 
around this?
I am thinking of creating a flag per media in VIDEO for each media and marking this when it is 
printed in the report using UTL_FILE.

Then changing the query to this - is this a good idea or there are better ways.

select vidno from video a, vprod b where a.vidno=b.vidno and media = 'DVD' and b.stage='CP' 
  and ship_date IS NOT NULL and DVD_REPORT_FLAG IS NULL 


4 stars Facts table with invalidable data   May 19, 2009 - 1pm Central time zone
Reviewer: Eduard from Spain
Hello Tom,

First of all thanks for your patience!

I was thinking in how we can design a database with a facts table by date where we calculate some 
values that could be invalid because of master data changes.

Keeping track of the master data changes is the first point. But then how can we control which data 
was calculated before the master data change and after and why the value has changed, etc...

The facts table would be used as the source for some drill-down reports (probably using 
materialized views)

Something like best practices for maintaining facts table with invalidable data...  



For example let's focus in the next scenario.

As master data we have InvoiceItems, Products, Customers and Conditions.

    -An InvoiceItem has an InvoiceItemId, a ProductId, a CustomerId, a DueDate, and a GrossValue. 

    -A Condition has a PercentageToApply, DateFrom, DateThru, a set of product ids and a set of 
customer ids


We want to calculate incurred amounts (facts table).

IncurredAmounts:
BatchId, DueDate, ProductId, CustomerId, ConditionId, InvoiceItemId, IncurredAmount


For example if the input data is:

InvoiceItems: 
InvoiceItemId, ProductId, CustomerId, DueDate, GrossValue
Inv1, PrdA, CustA, 17/05/2009, 200$
Inv2, PrdA, CustB, 18/05/2009, 300$

Conditions:
ConditionId, PctToApply, DateFrom, DateThru, set of product ids, set of customer ids
Cond1, 3%, 01/05/2009, 15/05/2009, {PrdA, PrdB], {CustA, CustB}
Cond2, 5%, 15/05/2009, 31/05/2009, {PrdA, PrdB], {CustA}

The output data (facts table) would be:

IncurredAmounts:
BatchId, DueDate, ProductId, CustomerId, ConditionId, InvoiceItemId, IncurredAmount
Batch1, 17/05/2009, PrdA, CustA, Cond2, Inv1, 10$

If each day we receive new InvoiceItems, and new Conditions we will keep adding rows in our 
IncurredAmounts table.

But what can we do if our existing Conditions could be modified, for example the percentage or the 
date from? Let's imagine Products and Customers have a parent-child hierachy by date, and a single 
Condition could apply to a whole hierarchy, hierarchies could change, etc...

We would like to create some drilldown reports (maybe using materialized views) using 
IncurredAmounts data. The reports should take into account the date. I mean it must show the data 
(and the hierarchies) as it was in this date. Not the current values.

The question is how can we design our tables and processes in order to avoid recalculating too many 
rows?


Thanks,
Eduard


5 stars table   October 29, 2009 - 10pm Central time zone
Reviewer: A reader 
Tom:

I have a master table BOOK with a derived Y/N column S1_approved. Intially if no review was done it 
is set to NULL. if review is approved it is set to "Y" and if review rejects it is set to "N".

One query Is to list all books that are waiting for S2 approval which is 
select * from book where S1_Approved = 'Y' and (S2_approved is null or S2_approved = 'N')

another query is to list books that are waiting for S1 approval which is
select * from book where s1_approved is null or S1_approved = 'N'


Is it normal to use the value NULL since NULLS are not indexed and the above query might do a full 
scan.

The table is new so there is not a whole lot of records yet but it expect a 500 records per month.

What is a good common practice for this. 
Would you use "U" instead of NULL or default it to "N" and only have Y or N. My understanding an 
index wont help anyway since the column has low cardinality.

book
------
bkno   number(10)  PK,
title  vachar2(100),
author  varchar2(30),
s1_Approved varchar2(1)
s2_approved varchar2(1) 
 

 
    


Followup   October 30, 2009 - 10am Central time zone:

... Is it normal to use the value NULL since NULLS are not indexed and the above
query might do a full scan ...

bzzzt wrong, very wrong. totally wrong.

http://tkyte.blogspot.com/2006/01/something-about-nothing.html



http://tkyte.blogspot.com/2009/10/use-null-for-unknonw-data.html


but anyway, instead of

select * from book where s1_approved is null or S1_approved = 'N'

you would


ops$tkyte%ORA10GR2> create table t ( x varchar2(1), data char(200) );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t
  2  select decode( mod(rownum,100), 0, 'N', 1, null, 'Y' ), 'x'
  3    from all_objects;

50375 rows created.

ops$tkyte%ORA10GR2> create index t_idx on t(x,0);

Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns x 
size 254' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where x is null union all select * from t where x = 'N';

Execution Plan
----------------------------------------------------------
Plan hash value: 708709259

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   992 |   197K|    55  (42)| 00:00:01 |
|   1 |  UNION-ALL                   |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T     |   575 |   114K|    32   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T_IDX |   575 |       |     3   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T     |   417 | 85068 |    23   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T_IDX |   417 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("X" IS NULL)
   5 - access("X"='N')

ops$tkyte%ORA10GR2> set autotrace off


5 stars   October 30, 2009 - 10am Central time zone
Reviewer: A reader 
Excellent Reply !!!


4 stars data sharding   October 30, 2009 - 4pm Central time zone
Reviewer: James Su from Toronto
Hi Tom,

There's a question in your blog:

http://tkyte.blogspot.com/2009/01/this-should-be-fun-to-watch.html#5651108719482160308

Wollaton DBA said.... 
Hi Tom,

I came across this web site this weekend which details architectures for scaling web 2.0 social 
network web sites. A common practice in scaling the backend appears to be use "Data sharding": 
http://highscalability.com/unorthodox-approach-database-design-coming-shard. Is this one worse 
design practices being rolled out right now ?.

As soon as I saw this, I thought, I'd love to get Tom Kyte's view on this as he is always quite 
passionate about avoiding the take advantage of nothing approach when using databases. 

Off the top of my head it appears to me that this falls apart if you need to aggregate data above 
the 'shard' layer. It also seems to me that there is the potential for reinventing all the good 
stuff that is available in most database offerings in the application tier. 
When most of the major players in the RDBMS market have spent the best part of the last 20 years or 
so refining and honing their products, why would anyone think they could do a better job on one 
project ?.

I look forward to getting your take on this.

Chris

Mon Jan 19, 07:15:00 AM EST   

What is your opinion about data sharding? Thanks.


Followup   November 9, 2009 - 11am Central time zone:

... this falls apart if you need to
aggregate data above the 'shard' layer...

correct, it works well for a very specific type of application - and not much above that.

.. It also seems to me that there is the
potential for reinventing all the good stuff that is available in most database
offerings in the application tier. ...

again, true.

... why would anyone
think they could do a better job on one project ?.
...

for a very specific type of application, on the level of a facebook, certain parts of yahoo, a shopping cart at amazon - they are able to do well without an rdbms. However, what do you think even these systems use when you get to ordering a product, using your credit card to perform a transaction and so on? To pay their people? To manage them?

There is more than one tool out there because there is more than one job. And most people are not building facebook or twitter (thank goodness, hope some of them go the way of myspace - oblivion - but some of them are neat like reddit)


Unless and until I had the problem (of a facebook/twitter - where the data isn't mission critical and very simple), and I needed things like transactions, concurrency, security, flexibility and all - I wouldn't really be looking at them.
3 stars 11g rocks   November 3, 2009 - 9am Central time zone
Reviewer: Mike Kutz from High Point, NC
Oracle 11g should be able to do it better.

Hash Partitioning on "master" table spreads out the data.
REF Partitioning on the "detail" tables groups the relevent data together.
ASM comes to play in here somewhere
RAC for load balancing
RAC for redundancy
RAC for easy "add-a-node to increase performance"
RMAN for hot-backups of the data
SET AUTOTRACE ON to figure out where your serialization is.
etc.
etc.

and you get:
the ability to do one query to pull out all the data across all "shards".
very good uptime
and the ability to sleep at night.




Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement