maximum no of columns per table
nvssk, July 04, 2002 - 3:19 pm UTC
If an entity is having say 100 fields, Is it suggested to create one single table? or split into multiple tables based on the most frequently accesses columns? Do we need to consider data block size?
If we keep them in one single table what is the impact on performance?
If an entity is having LONG and/or CLOB columns also, what all should be considered for creating tables ?
Do we need to decide on storage parameters for each transaction table ?
July 04, 2002 - 3:41 pm UTC
if an entity has 100 attributes, it has 100 attributes -- it'll be a single table. hopefully the average row (or max row actually) fits on a block -- but that is true whether you have 10, 100 or more columns
A full scan will take longer -- but you should be asking "what is the performance impact if we split them out". Well, you'll have two (or N) primary key indexes to maintain, not just 1. You'll have n*(m+1) additional IO's for every row retrieve (where n = number of tables you choose, m = height of your index). It could really impact performance badly.
do no use longs, period.
when you use clobs, they are moved out of the table into their own segment when they exceed 4000 bytes, the work is done for you, no need to factor them out special.
I size objects by small, medium, large and extra large, using a locally managed tablespace with uniform extents. Alternatively, you can use a LMT with system manged extents and not even worry about it yourself.
Thanks
nvssk, July 04, 2002 - 3:50 pm UTC
Thank you very mcuh.
Primary key
nvssk, July 04, 2002 - 4:46 pm UTC
Hi Tom,
Thanks for answering the above questions.
If a table D depends on table C ( D cant exist without C)
and table C depends on table B,
and table B depends on table A.
What is the best way to design the keys so that it results in better performance ? Can we composite primary keys for the above scenario ? what are the other alternatives ?
Do we need to take any diversion, if table A have composite primary key in the above example?
July 04, 2002 - 5:02 pm UTC
You can certainly use composite keys if you choose.
The smaller the key, the faster the compare -- however, you have to outweigh the costs of using a surrogate key and maintaining two unique indexes if you decide to 'shorten' the key by using a single ID field populated via a sequence for example. (there are tradeoffs).
I tend to go with natural keys if the natural key is 1 to 3 columns (and NEVER changes), surrogate keys (id field populated via sequence) otherwise.
How about using clusters?
Ian Matyssik, July 04, 2002 - 11:48 pm UTC
Hello,
I understand that relative information better be in the same table. But wouldn't it be better if we will separate our LOBs into different table, I say that because LOBs are usially big and some times it takes time to store them them or update them. During that time I understand that the ROW will be locked and other information will be impossible to update. I hope you understand what I am trying to say. So I was thinking if it would be any better to use clusters as a PK on all related tables (separated for logical or organizational reasons) . Could you please advice on that.
July 05, 2002 - 10:02 am UTC
As I said before:
...
when you use clobs, they are moved out of the table into their own segment when
they exceed 4000 bytes, the work is done for you, no need to factor them out
special
.......
It doesn't matter that lobs are big, they are not stored in the table segment.
If someone is updating the LOB, yes the row will be locked -- but that is the way it SHOULD BE.
I believe this is one table, with one row per thing. LOBS do not take an inordinate amount of time to modify -- the row is protected from other updates, but reads of the data are not blocked. I do not view this as an issue.
I would not introduce the complexity you describe, no.
Should LOBs be stored in different Tablespace
Jerry, July 05, 2002 - 10:47 am UTC
LOB Should be stored in different Tablespace , I think that is what Oracle recommends . If that will be the case then there would be need for two tables .
July 05, 2002 - 11:14 am UTC
NO, you are missing the point of how lobs are stored.
they are stored in 3 pieces
o in the table, there is a lob locator (pointer)
o in the same tablespace as the table, there is a lob index (more pointers to all of the chunks)
o in another segment which may be in another tablespace (or not, upto you), are the lob chunks.
When you create a table with a LOB, there are 3 segments (objects) created -- the table itself, the lob index (which will be in the same tablespace) and the lob segment (which you control)
We already "broke" all of this stuff out physically. There is NEVER a need for two tables here.
Look at the create table statement for a lob as shown by exp/imp:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int, y clob );
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> !exp userid=/ tables=t
Export: Release 8.1.7.3.0 - Production on Fri Jul 5 11:12:50 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production
Export done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table T 0 rows exported
Export terminated successfully without warnings.
ops$tkyte@ORA817DEV.US.ORACLE.COM> !imp userid=/ full=y indexfile=t.sql
Import: Release 8.1.7.3.0 - Production on Fri Jul 5 11:13:02 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
Connected to: Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production
Export file created by EXPORT:V08.01.07 via conventional path
import done in WE8ISO8859P1 character set and WE8ISO8859P1 NCHAR character set
. . skipping table "T"
Import terminated successfully without warnings.
ops$tkyte@ORA817DEV.US.ORACLE.COM> !cat t.sql
REM CREATE TABLE "OPS$TKYTE"."T" ("X" NUMBER(*,0), "Y" CLOB) PCTFREE 10
REM PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 524288
REM FREELISTS 1 FREELIST GROUPS 1) TABLESPACE "USERS" <b>LOB ("Y") STORE AS
REM (TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10
REM NOCACHE NOLOGGING STORAGE(INITIAL 524288 FREELISTS 1 FREELIST GROUPS
REM 1)) ;</b>
REM ... 0 rows
You already have all of the control you need over the location of the lob, one table -- period.
You are the Real Oracle GURU.
A reader, July 05, 2002 - 12:23 pm UTC
Wonderful and Thanks for the detailed explanation , Once again you are the best in the business.
Easier understanding of table design
nvssk, July 06, 2002 - 1:35 pm UTC
Hi Tom,
How can we achieve, easier developer understanding of the table design, without one explaining the table design document / ER Model.
The problem reported is difficulty in understanding the keys and relationship.
When an application is going to be developed in Java/JSP in bottom-up approach( table design first), Other than ER modeling, what specific features OO can be used in oracle ? what are the performance impacts ?
July 06, 2002 - 1:56 pm UTC
#1) telepathy I suppose?
#2) I would get database people and teach them Java/JSP. Or I would get database people -- let them do the database logic in finely tuned stored procedures and have the java/jsp guys just call the plsql code (NO SQL in java)... (of course, the java jsp programmers will forget to use bind variables and refuse to use prepared/callable statements more then once but hey -- at least the sql would run good)
If you are setting out to write a high performance database application, write a database application. Don't write a "cool java/jsp" that happens to use a database -- the database is going to be where you succeed or fail, don't work AROUND it, with with it.
Network data transfer
nvssk, July 08, 2002 - 7:17 am UTC
Hi Tom,
If the table contains say 100 columns,and you access the whole object, Wont it be lot of data on network,especially when its a web based application.
What is the performance impact due to this heavy data transfer ?Or Does it matter at all?
Can it be solved by creating separate views( of the required columns) for the usage based on the requirement and pickup the data only from the views.
Is there any impact on the joins?
Thanks for your help.
July 08, 2002 - 8:28 am UTC
don't access the whole object then -- don't be lazy, don't code "select *" (which in my humble opinion is a BUG in your code if you do! The fastest way to a broken program is to code:
select * <<<<==== use of "*" should be limited to sqlplus testing
insert into t values ( <<<==== didn't list the columns
insert into t select ... <<<<===== ditto
a simple drop and re-create with the columns in a different order or the addition or deletion of a column in the table BREAKS your application)
SELECT only that data which you actually NEED. Don't use "*" and this entire conversation is rendered MOOT.
A reader, July 08, 2002 - 3:10 pm UTC
Excuse the stupid question. Will Oracle not read the entire data block where the row is stored? What is the difference in the way Oracle read the data for a select * and for select col1, col2..etc?
July 08, 2002 - 7:36 pm UTC
It isn't a stupid question -- it is actually more complex then you probably anticipate and takes a couple of points to fully answer
o first, there is the "BUG" inherit in the "select *" code. A simple addition or deletion of a column will generally "break" an application. One of the glaring exceptions to this is PLSQL with implicit cursors.
for x in (select * from t)
will not break but is unnecessarily inefficient.
o second, there are different paths the optimizer can take to answer a query and it is very dependent on the columns you select. For example:
big_table@ORA817DEV.US.ORACLE.COM> select object_name from big_table where rownum < 100000;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=212 Card=1459904 Bytes=36497600)
1 0 COUNT (STOPKEY)
2 1 INDEX (FAST FULL SCAN) OF 'OBJECT_NAME_IDX' (NON-UNIQUE) (Cost=212 C
ard=1459904 Bytes=36497600)
Statistics
----------------------------------------------------------
0 recursive calls
45 db block gets
583 consistent gets
0 physical reads
0 redo size
3228183 bytes sent via SQL*Net to client
2534 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
99999 rows processed
big_table@ORA817DEV.US.ORACLE.COM> select * from big_table where rownum < 100000;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=577 Card=1459904 Bytes=150370112)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=577 Card=1459904 Bytes=1503
70112)
Statistics
----------------------------------------------------------
0 recursive calls
18 db block gets
1397 consistent gets
0 physical reads
0 redo size
11972872 bytes sent via SQL*Net to client
2534 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
99999 rows processed
big_table@ORA817DEV.US.ORACLE.COM>
Look at the huge differences in consistent gets (blocks processed). This is due entirely to the one query being answered via an index and the other full scanning. If you include columns in your select list that FORCE us to goto the base table -- when you don't need to -- you are forcing a less optimal query plan on us
o third, look at the difference in data transfered!. Almost 4x as much -- that could be huge.
o and the last thing I'll mention will be chained rows... The impact from this is subtle. If you have chained rows -- and you do a select *, we'll be forced to construct the entire row (which will take additional random IO's to find the "end" block for your row - we have to piece it together). Now, if you select only the columns you need AND these columns are on the head rowpiece, we'll never assemble the entire row -- we do just what we need to do to get your data.
So, yes, Oracle reads an entire block -- but by selecting only the columns you need you can:
o avoid reading that block in the first place (skipping the table access by rowid step in an index range scan for example)
o avoid reading multiple blocks to construct the entire row
o open up better query plans, more query plans
o have less buggy code -- code that is much more resilient to change
o transmit lots less data over the network
and so on....
Anil Sekhar, July 09, 2002 - 12:14 pm UTC
Great answer. Thanks Tom.
So if Oracle reads the entire datablock (when the data is not available from an index) where the row is stored, why are you against limiting a table to the frequently accessed attributes of an entity? If an entity has 100 attributes and only 20 are accessed frequently, is it not a waste to read 100 columns when you actually need only 10-15 of them? Sorry if I am beating a dead horse.
Also can you please tell more about the select * bug?
Thanks.
July 09, 2002 - 12:48 pm UTC
We read a block -- if you have 20 columns or 100 columns we read the block.
If you only need 20 of them, and we need to read the block, we'll still read the block. We'll only process 20 of them.
The select * bug is that if you have a program that does a select *, the * is resolved at runtime. If the number, order or types of the columns CHANGE over time -- your program breaks. If you selected ONLY those columns of interest -- only those changes that really truly affect you -- in fact affect you. We can ADD columns and you do not break. If we drop a column and you don't select that column -- you don't break -- if you did a SELECT * however, you would have in your FETCH command a place for this missing column and you would break.
Parent child relationships
nvssk, July 11, 2002 - 6:08 pm UTC
Lets say, There are tables P, C1, C2, C3, C4 and P is the parent of C1 ..C4.
If the application need to show C1,C2,C3,C4 togather for each row in P, what is the best way to model it?
i.e,
P1
C11
C12
C21
C22
C31
C32,etc.
1. If you manage as Parent child, one need to write UNION between queries i.e,
select p.p1,c1.c11
from P,C1
where P.pc1=C.pc1
UNION
select p.p1,c2.c21
from P,C1
where P.pc1=C.pc1
Is this an efficient way?
OR
2.Can we have a table C in between P and C1,C2,C3,C4
where P maps to C (1:n) and C maps C1(1-1),C maps to C2(1-1) etc.
This way we can get all the required data in one query, no need of unions. But, we will not be able to put referential constrains as C stored all primary keys of C1,C2,C3,C4 in one column.
What is the right approach in both of the above ?
July 12, 2002 - 8:10 am UTC
If P has a 1-1 relationship with 3 of the C* tables and either a 1-1 or 1-m (one to many) with 1 of the C* tables, you can write it as:
select *
from p, c1, c2, c3, c4
where p.key = c1.fkey
and p.key = c2.fkey
and p.key = c3.fkey
and p.key = c4.fkey
if any of the relationships are OPTIONAL, add the outer join (+) to the query.
If more then 1 of the child tables is 1-m with P, then you cannot do it with a join (since if C1 has two rows that match with P and C2 has 3 rows -- you'll end up with 6 rows -- a cartesian production). You'll want to use a UNION ALL -- better then UNION
Data Model
Mark, September 13, 2002 - 12:28 pm UTC
Tom,
I have the following requirements and could you please tell me whether I can achive this using the Logical model.
It is a technical Membership info Design.
1.One business Type can have multiple comanies under it.
2.One company Must Belong to only on Business Type.
3.One company can have multiple employees
4.we are interested in only the employees with Tech Membership and they have to enroll every year.(we need the Historical Info also).
5.Each company Appoints multiple officers.
6.Each Business Type will have a Director who is selected from the Officers table.
7.Director for each business type should be from that business type only.
I have designed the tables for the above requirements but I am having the problems in enforcing the Relation between Business Type and Company and the requirement 7 at the same time.
Table1:
Business Type
Columns
Bus_Type (Primary Key)
Table2:
Company
Columns
Company Code (Primary Key)
Bus_Type (Foreign Key (table1))
Table3:
Member
Columns
Member Id PK
Enrollment Year PK
Company Code PK (FK to Table 2)
Table4
Officer Table
Officer Id PK (FK to table3)
Year PK (FK to Table3)
Company Code PK (FK to Table3)
Officer Role (Non Key Column)
Directors
Bus_Type PK (FK to Table1)
Year PK (FK to Table4)
Director Id (FK to Table4 column Officer Id)
Directors Company (FK to Table4 Column Company Code)
But In the above model I can not enforce the Requirement 7
which is Director of the bus type Must belong to that bussiness Type.
Could you please tell me how to fit all the above requirements in One logical model.
One or two tables?
John, January 10, 2003 - 10:06 am UTC
Get back to how to design tables:
I have a system where we need to generate alerts. The alert will be stored in an alert table, and end user will see them based on queries. For each alert, we need some basic information, such as alert code, and create date. We also need to store the detailed information for the event that causes the alert, and that detailed information varies depending on the alert. In another words, different kinds of detailed information need to be captured for different alert. I am now considering three options, but not sure what is the best way to handle this.
Alerts are generated based on ALERT_DEFINITION table:
|alert_definition|-------<|alert_record|
(1 to M relationship)
The 3 options I am considering are:
1. Create one ALERT_RECORD table, and create columns that will be needed to capture different types of alert detail information. For columns that don't apply to a particular type of alert, the value will be null.
The problem is that the information is not stored in a logical way. For each row (alert) in the table, there will be lots of columns that are not applicable to it, and therefore have a value of NULL. There may well be confusions down the road as to which column apply to which type of alert.
2. Create one ALERT_RECORD table for each alert type. Information will be stored in a logical way, but I will end up with 20 plus tables just for alert detail information. That will also require extra Java programming work to present those information.
3. Create one ALERT_RECORD table, but only use it to store the basic alert information (alert code, and create date). Create another table called ALERT_RECORD_DETAIL to store the detailed information. The alert details will be stored as Display_heading and display_value. For alert detail like this:
display1 display2 display3
value1 value2 value3
will be stored in ALERT_RECORD_DETAIL as 3 records:
display_heading display_value
-------------- --------------
display1 value1
display2 value2
display3 value3
and I probably need another column to store the display order of the heading.
This design will require lots of parsing in the Java code to present the information.
|alert_definition|-------<|alert_record|--------<|alert_rec_detail|
Is there any other better way to handle this situation? Thanks.
January 10, 2003 - 10:18 am UTC
No version info. Ok, assuming you are using the latest software:
option four
create table alert_record( alert_code int, alert_date date, alert_info XMLTYPE )
or, in earlier releases
option four
create table alert_record( alert_code int, alert_date date, alert_info clob )
if you don't need to have the data structured. You can use Oracle text (intermedia) to index the xmltype or clob for searching if that is needed.
Mirjana, January 29, 2003 - 3:01 am UTC
Thanks.
John, February 04, 2003 - 3:05 pm UTC
We are trying to design our database to accomendate Oracle 8.1.7 and up. So I guess the XMLTYPE option is out.
The alert detail information I need to store and display later to users are: product_id, packing_list, acknowledge_date, etc. Those information have to be stored in a structured way. I don't know how can I display them if I use XMLTYPE or clob?
February 04, 2003 - 4:02 pm UTC
original question was:
...
I have a system where we need to generate alerts. The alert will be stored
in an alert table, and end user will see them based on queries. For each alert,
we need some basic information, such as alert code, and create date. We also
need to store the detailed information for the event that causes the alert, and
that detailed information varies depending on the alert. In another words,
different kinds of detailed information need to be captured for different alert.
I am now considering three options, but not sure what is the best way to
handle this.
1. Create one ALERT_RECORD table, and create columns that will be needed to
capture different types of alert detail information. ...
2. Create one ALERT_RECORD table for each alert type....
3. Create one ALERT_RECORD table, (then a table with variable name/value pairs).....
#1 is the easiest
#2 can be thought of as a variation on #1 -- think "views"
#3 would be avoided by me if at all possible. People are going to want to query "where product_id = 55 and acknowledge date between A and B or packing_list = C". Try doing that with variable name/values
So, I guess I vote for #1 with the #2 variation on a theme (views) to remove the ambiguity of who needs what data...
store as segname
Mark Rynbeek, May 30, 2003 - 11:45 am UTC
Hi Tom,
thanks for the CLOB info I can find here. I have a question regarding the "store as segname" option for CLOBS.
We have to design a table with biological data, proteins. These proteins have a sequence of amino-acids, represented by a long letter string. On average, they're not really long (only .05% is longer than 4000 characters) but still you need the CLOB.
There seems to be some consensus here that CLOB data should be in its own tablespace, physically but not logically. So you get something like :
CREATE TABLE protein
( x INTEGER
, Y VARCHAR2(50)
, z CLOB
)
TABLESPACE t1
lob (z)
STORE AS SEGNAME (TABLESPACE t2
PCTVERSION 0
CACHE READS
NOLOGGING
)
;
Does that make sense to you? What would be the benefit - can you tune t2 differently than t1 because you only store CLOBS in there?
I would think, regarding the percentage that most data would fit in the row itself anyway that we should not bother about all this (but I'm lazy by nature)
Cheers,
Mark.
May 30, 2003 - 12:03 pm UTC
you'd have to ask the people wanting the separate tablespace "why".
the goal is to achieve even IO over all devices. If segrating this way helps that, that is good.
That nologging is a bit dangerous no? do you not care about these proteins? (there is a word I never thought I would use myself in a sentence here ;)
Question answered elsewhere
Mark Rynbeek, May 30, 2003 - 12:06 pm UTC
Survey Design
mo, July 21, 2003 - 4:18 pm UTC
Tom:
Can you give me your opinion on the following: I have a survey form that I want to put on the web for users. The paper form has a lot questions and checkboxes. A user can check all checkboxes if he wants. For example, one question can have
Languages Spoken?
English
French
Spanish
Chinese
Japanese
Other _________
Here is my question. If you are designing a table to store all this info is it better to have one table with columns like:
create table survey as
(survey_id,
language1,
language2,
language3,
language4,
language5
language_other );
or
You would do 3 tables as (one main table and one lookup table and one child data table that can have multiple records for each language:
create table survey as (
survey_id,
language )
create table language as (
lang_Code,
language)
create table survey_language as (
survey_id,
lang_code,
lang_other)
2. The second question is that I have a lot of these questions so if I design it in one table it will be huge. If I break them into lookup tables and child tables like optoin two, would it be better to put all code like languages, colors, etc. into one tables of codes or you would break it up to several lookup tables one for colors and one for languages etc.
3. Can you tell me the advantage/disadvantages of each approach from oracle stand point of view?
July 22, 2003 - 7:59 am UTC
depends -- do you know for 100% sure how many lanaguages there will be.
surveys are an example where the "funky data model" as I call it -- where you have this very generic model -- might make sense.
It allows very flexible storage -- it stinks for querying - you might have to denormalize for quering later -- so you would have columns spanish, english, french (not lang1, lang2, lang3). so you can query "select count(*) from t where survey_id = 55 and english = 1 and spanish = 1" to find all surveys filled out by people that speak spanish and english.
table design
mo, July 22, 2003 - 4:01 pm UTC
Tom:
Do I understand that you are in favor of option 1 for one big table for the whole survey where i define one column for each checkbox value. The table may end up with more than 100 columns.
I was planning to create two other tables, one is a lookup table for all languages, and other is where i store the survey_id and language code. The query would then be a two table join or three table join. Is there a way oracle recommends doing survey kind of design?
2. ALso, if you have several lookup tables would you recommend combining all of it into one (code,description) or create one table for each category.
July 22, 2003 - 8:11 pm UTC
1) do it in the funky data model and then DENORMALIZE if possible for reporting.
2) a table per lookup type.
another table design question
A reader, August 11, 2003 - 12:46 pm UTC
We have a situation where there are different types of data
(type being a categorization of a trow in a table) - let us
say these are "type1", type2.(in our case there are 10
or so such types)
Now each of these have same set of information stored
in a table (same columns - col1, col2, ....col20)
Is it better to store them all in one table (with a type
column in addition to the above columns ) or is it
better to separate them into 10 different tables (with
col1, col2, col3 etc..)
Most of the times these tables need not be queried together
in a join.
What are the pros/cons of creating multiple tables of
the form
type1_table (col1, col2, ...),
type2_table (col1, col2, ...) etc..
versus
one table
single_table( type, col1, col2...)
Thanx!
August 11, 2003 - 2:10 pm UTC
if they all store the same set of information -- then your "type" column is nothing more then yet another attribute -- no more or less relevant then col1, col2, col3
so, unless there is more then meets the eye here -- one table.
thanx!
A reader, August 11, 2003 - 2:11 pm UTC
multi user problem
A reader, October 14, 2003 - 6:02 am UTC
Hi Tom
For report query i am using temporary table.I am taking required information from different table then inserting into a temporary table then report generate.If anyone want to use this report then error message.so my question is
1)what is useful process to generate report in my mentioned criteria.
2)If my system is to generate serial number.what procedure should i take?
say i have to generate serial number for different person.in my form the operator fillup the whole screen then he click in the generate button.
Everything is for multi user.
with regards,
October 14, 2003 - 7:13 am UTC
use a real global temporary table
no concurrency issues whatsoever
read about them in the sql reference guide "create global temporary ...."
Column allocation and performance issues.
Alvin, October 21, 2003 - 11:39 pm UTC
I'm in a system where table columns are defined as
Create table T
( c1 varchar2(2056)
, c2 varchar2(2056)
);
when the data that's actually stored are never more than 100 characters.
1. Is there any performace issue on such allocations ? Or columns being varchar2 will be internally optimized by oracle.
2. How do i measure the impact of such allocations as compared defining the table as such
Create table T2
( c1 varchar2(200)
, c2 varchar2(200)
);
October 22, 2003 - 6:18 am UTC
1) i wrote about this in my new book "Effective Oracle By Design". Some of the things I said about such nonsense was:
...
In addition to using the proper base datatype such as number, date, or string, you should also use the most specific type you can. For example, use VARCHAR2(30) for a field that is up to 30 characters in length; do not use VARCHAR2(4000).
Ask Tom
"I work with a modelers group. My modeler would like to define every VARCHAR2 field with the maximum length, which means that a table with 20 VARCHAR2 fields will all be defined with a maximum of 2000 or 4000 bytes. I tried to talk to him about the reasons we identify data with correct lengths and names in order to understand what we have in our database. He told me that it doesn't matter, since Oracle just stores the length, etc., and there is no overhead. I don't believe this is true, but have been jumping between so many databases that I cannot find a document on the internals of Oracle storage. Can you help me out here with this question?"
My gut response was, "This is your data modeler, my goodness!" They are the ones who are supposed to be telling you that it is vital to use the appropriate length when defining fields! That is their job. Let's just forget about things like storage for a minute, why don't we ask him:
. What is going to happen when users pull this up in a query tool that formats each field based on the width of the column in the database? They'll see one column and need to scroll way over to see the second, the third, and so on.
. Say the code prepares a query that selects ten columns that are VARCHAR2. The developers, for performance, would like to array fetch (very important). They would like to array fetch say 100 rows (very typical). So, you have 4,000 ´ 10 ´ 100 = almost 4MB of RAM the developers must allocate! Now, consider if that were ten VARCHAR2(80) fields (it's probably much smaller than that). That's about 78KB. Ask the data modeler how much RAM he is willing to kick in for this system.
. Now, the developers start to build a data-entry screen to put data into the database. Wow, that code field can be 4,000 characters long and that first name can be 4,000 characters long. How the heck is anyone going to know what sort of data can really go in there?
Tell your data modeler group members that they need to consider the length as a constraint. Just as they use primary and foreign keys, they should use the proper and correct length on fields. You can always expand a field via a command like alter table t modify c varchar2(bigger_number). There is no reason to use the maximum length everywhere. It will hurt the applications you develop, because they will mistakenly allocate many megabytes of RAM. Just think of the array fetch example with an application server. Now, it's not just 4MB; it's 4MB ´ number of connections. You are talking some real memory here for a single query, and you'll be doing a lot of them at the same time.
End Ask Tom
A CHAR(2000) will consume 2,000 bytes of storage whether you put in the letter a, the string `hello world', or 2,000 characters. A CHAR is always blank-padded. Additionally, are you thinking about using an occasional index in your system? If so, beware of storage-related problems. Consider the following:
tkyte@ORA817.US.ORACLE.COM> create table t ( a varchar2(4000), b varchar2(4000));
Table created.
tkyte@ORA817.US.ORACLE.COM> create index t_idx on t(a);
create index t_idx on t(a)
*
ERROR at line 1:
ORA-01450: maximum key length (3218) exceeded
I've noticed that 9i seems to have magically improved upon this CM
NOTE: In Oracle9i, the maximum key length is larger - but the restriction still exists. For example, an index on T(a,b) would raise: ORA-01450: maximum key length (6398) exceeded in Oracle9i.
My system has an 8KB block size. I would need to use at least a 16KB block size to index a single column, but even then, if I tried to create a concatenated index on T(A,B), it would fail there!
The same holds true for your numbers and the new Oracle9i TIMESTAMP datatypes: When appropriate, use scales and precisions on those fields in order to better define your data integrity and to give applications that much more information about the data itself.
In short, never be tempted to use anything other than a DATE or TIMESTAMP datatype to hold a date value, and never be tempted to use a VARCHAR2 to hold a number. Use the appropriate and correct type for each to ensure maximum performance, and more important, to protect your data integrity.
2) it won't affect raw performance, the costs of doing such a thing are found elsewhere.
Order of column in a table
Ashiq Shamsuden A, October 22, 2003 - 6:49 am UTC
Hi Tom,
When you create a table , how you design it ? For example if I've table and it's going to have 100 columns and the table will have all kinds of datatypes. It'll be having composite primary key(of three columns).
Total of 100 columns ,the datatype what i'm going to have is
44 --> varchar2's
20 --> date's
35 --> number's
1 --> clob
How to order the columns for the creation , so I should get benefit by design properly.
October 22, 2003 - 7:00 am UTC
put the most frequently accessed columns "first"
put the columns most likely to be null "last"
but most importantly -- don't over analyze it. it won't make a tremendous amount of difference at the end of the day.
I just always put the primary keys first (convention) and the follow with the columns as they "make sense". when I think of a person for example:
person( unique_id primary key,
first_name,
last_name,
middle_initial,
address,
.....
the data just follows in the "natural order" i see it in my head.
Middle?
A reader, October 22, 2003 - 8:45 am UTC
Tom,
I know one third party application that has all the key columns exactly at the middle of the column list, in a table. The application is supposed to work with any database. How would you rate this?
Thanks
October 22, 2003 - 12:07 pm UTC
confusing but not a big deal
Thanks.
Alvin, October 23, 2003 - 1:57 am UTC
I have both your books and i just read the first chapter of the new book.
Followup on my earlier question.
How would oracle store varchar2(2000) and varchar2(30) in the block buffer cache ? won't the former entail additional overhead of any kind ?
My developers are a tad greedy on allocations and i want to show them the effects of such (if any).
October 23, 2003 - 8:11 am UTC
they will not be stored any differently, no.
the effects are all in the client, in the loss of data integrity, in the loss of meaning, in the sloppiness of the implementation.
cool info on column ordering thanks
Kevin Meade, October 23, 2003 - 10:52 am UTC
Thanks Tom, may I offer my 2 cents; I would note regarding column ordering these clarifications: 1) if the reader has read this thread, you have explained why primary key comes first (some modelers also suggest foreign keys come next (eh..)), 2) also you have explained why "most used" columns come second, 3) I would mention that "likely to be null" columns at the end of a row is as I recall because null columns at the end of a row take no space on the row but null columns between columns with values take 1 byte (ooh boy jumping at the space savings).
As always though you hit the nail on the head with "don't over analyze, do what makes sense". I can't say I every paid much attention to most of this as there are always more important fish to fry. Perfection is never the goal (ooh, did I say never?).
My I also suggest, there is always alphabetical order (hmmm...).
October 23, 2003 - 1:02 pm UTC
i remember once working as a coder -- doing some Ada programming.
the "senior reviewers" wanted me to code all subroutines in alphabetical order, that last comment reminded me of that. I hated that! (i made them really made by using "renames" in ada to give the appearance of compliance without really reordering any code -- i just added AA_ AB_ AC_ to the front of each routine and then renamed AA_proc1 to proc1 :)
clob clarification
Rob, October 23, 2003 - 11:53 am UTC
Tom:
When you say:
when you use clobs, they are moved out of the table into their own segment when
they exceed 4000 bytes
Does this mean that all clobs from each row are moved, or just the ones where the the clob in a particular row exceeds 4000 bytes.
Thanks,
Rob
October 23, 2003 - 1:18 pm UTC
just the ones that are bigger then 4000
Storing Real Numbers and Pseudo-Numbers
Dave, October 23, 2003 - 12:53 pm UTC
Quote: "... never be tempted to use a VARCHAR2 to hold a
number"
When storing such entities as Zip codes, Social Security Numbers, credit card numbers, telephone numbers etc., I have always used Char(), or VarChar2(), because my view was that these are not "real" numbers.
They are codes for which all of the characters happen to be numeric, and any operation that the system is going to perform will be character-based. For example, formatting a telephone number as (999) 999-9999, or viewing the last four of an SSN. We're certainly never going to perform any algebraic op's on them, like adding one zip code to another, and we can easily apply check constraints to ensure that ...
Translate(SSN,'0123456789','0000000000')='000000000'
... for example.
What's your opinion, Tom? Does this seem like a valid distinction, "real" and "pseudo" numbers?
October 23, 2003 - 1:28 pm UTC
they are not numbers as you said. zip codes in particular are "postal codes" in general and are alpha numeric in reality.
so, since they are not numbers, store in a string -- you'll never treat them as numbers so it's ok.
Length of primary key
Kunjan Mehta, October 31, 2003 - 3:37 am UTC
Hi Tom,
We are using surrogate keys as the primary keys. We are using oracle sequences to generate surrogate keys.
I believe, we can generate upto Number(27) through the sequences. is it correct? If not, is there any limit on the largest number being generated by sequence? What is it?
Is there any desired maximum length for the primary key? (e.g. it should not be more than 19 digits or something like that)?
We are desinging a customized ERP. Is it advisable to use a single sequence for all the surrogate keys across the entire application? (approx 1000 tables in total) or should each table have one dedicated sequence or each logical group (say tables for Accounts, tables for materials etc.) should use a dedicated sequences each?
(we're using 9iR1 as database.)
October 31, 2003 - 7:44 am UTC
ops$tkyte@ORA920LAP> select rpad('9',27,'9')/60/60/24/366/1000 from dual;
RPAD('9',27,'9')/60/60/24/366/1000
----------------------------------
3.1623E+16
27 9's is a big big number. that shows it would take 3.16e16 YEARS to exhaust the sequence generating 1000 numbers a second
the number of digits is not going to have much bearing on anything. it'll be just fine to use a single number column as big as it gets.
You should not have a single sequence. see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2985886242221
not exhausting sequence, but what about latches
freek, October 31, 2003 - 8:06 am UTC
Tom,
Using 1 sequence for an entire application wouldn't exhaust the sequence, but wouldn't it be still better to use 1 sequence per table / column to reduce waits?
October 31, 2003 - 8:17 am UTC
you know what, i didn't read the entire thing. thanks -- i'll finish my answer now :)
What about Table APIs
A P Clarke, November 03, 2003 - 7:49 am UTC
Towards the top of this thread you said:
>> SELECT only that data which you actually NEED.
This is my preferred approach (got burned by SELECT * a long while back). Howvever, there are a number of respected practioners out there who recommend using Table APIs i.e. PL/SQL encapsulation packages that act as the sole method of interacting with a table. The advantages are security and (allegedly) performance - no multiple versions of the same SELECT statement knocking around, just one "tuned" one.
However, it seems to me that the SLCT() functions in Table APIs - such as the ones generated by Oracle Designer - must always select every columns, unless we code a SLCT() function for every single permutation of column, which I think is impractical.
I have survived a long time without using Table APIs but I got doubts when I read Steven Feuerstein's PL/SQL Best Practices book. What do you think? Can table APIs be reconciled with "only SELECT what you need"?
Cheers, APC
P.S. I'm not trying to set up some PL/SQL version of "Clash of the Titans" here :P
November 03, 2003 - 8:12 am UTC
well, table API's to me are somewhat weak in the first place -- or just have special use cases (simple CRUD screens/data entry).
I mean, NONE of my applications -- none of them -- use simplistic "select ... from t where ....". They all access multiple tables -- all of the time.
How many times are your needs actually met by "select * from emp"?
This page you are looking at is built from some 7 different tables (if I don't include the html db stuff that happens as a side effect).
A simple table API in my world is virtually useless.
I use API's -- but, they are all special purpose API's (asktom is built on such an API). The queries are specificially designed to answer a specific question. The API's to modify all touch multiple tables.
I've personally never had a use for single table API packages -- maybe others can comment and change my mind (but i doubt it -- beyond simple single table CRUD screens they only seem to make things harder)
Phew! I thought it was just me
A P Clarke, November 03, 2003 - 9:04 am UTC
>> How many times are your needs actually met by "select * from emp"?
Um, almost never. That was a rhetorical question, right?
The only time I have dabbled with something like Table APIs was writing some database procedures that used REF CURSORs to assemble resultsets for processing by an EJB web site. But, like you said about this site here, the queries were "specificially designed to answer a specific question". Apart from anything else, such an approach strikes me as a sensible division of labour - I knew Oracle, they knew Java.
Anyway, thanks for putting my mind at rest with regards to generic Table APIs.
Cheers, APC
Table APIs
Tony Andrews, November 03, 2003 - 11:53 am UTC
Makes sense to me, but unfortunately not to those I report to at the moment. I just wrote a table API for our project (my first for this one). My procedures were based on business logic. For example, when you create a new (let's say) salary record for an employee, you have to "close" the previous record by setting the end date to the day prior to the new record's start date. So my routine did that. It was rejected by my team leader as being "not modular enough". She said I should write one procedure to insert the new record, and another to update the old record. The client application will then call them in sequence...
November 03, 2003 - 5:57 pm UTC
team leader needs to read the next chapter in their programming book....
;)
it doesn't seem to make SENSE to do what she is describing.
This must be some new definition of the word "modular"
A P Clarke, November 03, 2003 - 12:17 pm UTC
Maybe it's like playground slang, where children take a normal word and pronounce with such a sneer as to make it mean "stupid" or some variant thereof. As in, "You still like Girls Aloud? You must be really _modular_!"
Seriously, the major defensible reason for building Table APIs is to encapsulate business logic. To design a Table API that is dependent on the client to enforce the rules is, er, counter-intuitive.
Good luck, APC
Exactly the opposite is better
Kevin, November 16, 2003 - 5:58 pm UTC
Hi Tom, I'm enjoying the thread, can't blame the manager for wanting to be modular, but can't say I like the approach at all either. If I may interject my thoughts too:
Tony, your manager should consider that hiding the behaviour is often the best approach. Requiring an app to do "redundant" work is a prescription for failure. In my shop I require all DBAs and Developers to hide things like this. We prefer to have the database automatically recongnize the need to update these kinds of related changes. Perhapes you should consider constructing the two modular components as she requests but then instead of requiring the client to call both, point out that the client's job is to tell the database it wants a new record, and that it is really the database's job to automatically close the prior one. After all, this is a data behaviour built into your database design and as long as you decided you have to have this behaviour, you might as well let the database enforce it. Should you decide to change this design, you would need to change client apps too and that is always bad. If the database is in control, then there is no way for the client to forget it, do it different, or do it wrong, and you won't need to recode everything multiple times for each app that might decide in the future it wants to create new records because the data rule is in only one place, the data base. This means, always enforced, always right, always the same, no developer required (I didn't mean that like it sounds).
In particular, we enjoy liberal use of views and instead of triggers in our goal of database managed rules. Our client apps are much lighter this way and there are vastly fewer data goofups to boot. Good luck.
table design question
Sonali, January 28, 2004 - 12:56 pm UTC
We have a resource table with 60 columns and 4 of them are varchar2(255) and rest of them are varchar2(60) and below and some number columns. Should the row size be of concern ? Would you call this a big table as far as number of columns goes ?
We were asked if splitting this table into 2 and moving the addresses, email adresses to other table would improve the performance as these columns would not be that extensively used. We are careful in not using select * in the queries.
then would this make it any more efficient as it will be returning half the amount of data if 100's of times a resource is read. I feel that if we are not going to do select * and select all the columns then it would not make it half the data if we split the table, would it ?
Whats your openion on this ?
Thanks
January 28, 2004 - 2:37 pm UTC
if you index read and the rows fit on a block -- what you have is fine.
if you full scan and don't need all of the columns, factoring out will help.
but, if you are using indexes -- doesn't matter really.
Design Question
A reader, February 09, 2004 - 12:41 pm UTC
Hi Tom,
We have a design question here. We are developing an application for multiple clients, and in some metadata tables some data are the same for all clients and some are different. We have two choices here:
1). Reserve a section of IDÂ’s in each base table for base data (e.g. 0-999) and start the auto-sequencer for user data above this reserved section.
2). Duplicate the base tables (1 for base data, 1 for user data).
Which one you think is the right way?
February 09, 2004 - 12:48 pm UTC
why do you need to do either of #1 or #2? i'm confused on that point.
re: Design Question
A reader, February 09, 2004 - 1:00 pm UTC
One pointed I missed is that our application is tightly built on these metadata tables. For example, in the application it may be hard-coded that:
if (grouptypeid ==1)
{...}
Here grouptypeid 1 is the base data and must be the same for all clients.
However each client can defined their own group types which could be different from each other.
Thanks for your comments.
February 09, 2004 - 1:23 pm UTC
sure, start the sequence at some large number then. no problem with that.
(don't like the id of hard coded values like that, but thats another story)
Storage Clause
A Reader, February 09, 2004 - 11:30 pm UTC
Dear Tom,
Can you please explain that how to determine different values of the parameters in storage clause of Create Table statement. Secondly what issues should be taken care while creating the table.
Thanks
Regards
February 10, 2004 - 6:32 am UTC
You should ONLY use
freelist
freelist groups
buffer pool
Use LMT's and forget about the rest -- they are not worth our time. Using an LMT makes it easy.
freelists would be set to non-default values for objects having highly concurrent modifications against them (inserts especially, updates that migrate rows).
buffer pool for doing the last mile of tuning.
check constraint on raw column
sonali, March 12, 2004 - 11:22 am UTC
I have created a table for storing salary..
SQLWKS> CREATE TABLE test_Salary
(
Sal_ID number(10,0) PRIMARY KEY,
Sal_Amount raw(16) ,
Sal_Start date NULL,
Sal_Finish date NULL,
constraint chk_sal check (Sal_Amount> 0)
);
constraint chk_sal check (Sal_Amount> 0)
*
ORA-00932: inconsistent datatypes: expected BINARY got NUMBER
I want to check if the salary amount entered is valid integer or not. I have this column as RAW for encryption but the person entering the value should not be anything but integer. How do I do this ? It seems this table is going to be populated from flat file so I cannot have validation there.. they will have their own program to load this so I cannot edit it. I have to do this in this table or in my encryption procedure, which is a better place to do it ?
Thanks
March 12, 2004 - 11:43 am UTC
it is raw, it is not a number at all -- it is whatever 16 bytes of "stuff" you stuff in there.
if it is encrypted "stuff", there is 0% chance that you can "check constrain" it -- you have no idea what the value is.
Will this work then ?
sonali, March 12, 2004 - 1:08 pm UTC
Suppose I create a trigger ( before insert on this table)..
fn_add is the function to encrypt this sal_amount column.
So my plan was to have a trigger that converts the number entered by some user or program to character using this trigger and then encrypt it right away..
SQLWKS> CREATE or REPLACE TRIGGER tr_ETsalInsert
2> BEFORE INSERT ON Salary
3> For Each Row
4>
5> Begin
6> :NEW.Sal_amount := fn_add(to_char(:NEW.Sal_amount));
7> commit;
8> End tr_ETsalInsert;
9> /
Statement processed.
SQLWKS> show errors
No errors for TRIGGER TR_ETSALINSERT
SQLWKS> insert into salary
2> values( Salary_Seq.nextval, 10, 400, null, null)
3>
values( Salary_Seq.nextval, 10, 400, null, null)
*
ORA-00932: inconsistent datatypes: expected BINARY got NUMBER
I can see this approach isn't working but thought this would be better than adding one more numeric column, have this trigger do the conversion and delete value from that column right after the conversion.
Thanks
March 12, 2004 - 5:25 pm UTC
you have a RAW -- a raw is a raw. The trigger would take a RAW as input and do whatever it wants.
You do NOT want to make this "transparent" like that. the application will
insert into salarly( s.nextval, 10, fn_add(sal_amount), null, null );
that'll be the only realistic way to do this. You could play games with instead of triggers and views but that would only
a) make it slower
b) make it more obscure
this does not work either
sonali, March 12, 2004 - 2:13 pm UTC
CREATE TABLE Salary
(
Sal_ID number(10,0) PRIMARY KEY,
Sal_ETResID number(10,0),
Sal_Amount raw(16),
Sal_TempAmount number(10,0),
Sal_Start date NULL,
Sal_Finish date NULL,
constraint chk_sal check (Sal_TempAmount> 0)
);
CREATE or REPLACE TRIGGER tr_ETsalInsert
BEFORE INSERT ON Salary
For Each Row
DECLARE strSal_amount varchar2(20);
Begin
strSal_amount := to_char(:NEW.Sal_TempAmount);
:NEW.Sal_amount := fn_add(strSal_amount);
Update salary
set Sal_TempAmount =0
where sal_id = :NEW.sal_id;
commit;
End tr_ETsalInsert;
/
show errors
SQLWKS> insert into salary
2> values( Salary_Seq.nextval, 10, null,50, null, null)
3>
1 row processed.
SQLWKS> select SAL_AMOUNT, sal_tempamount from salary
2>
SAL_AMOUNT SAL_TEMPAM
-------------------------------- ----------
67BADC900CB03168 600
09016723543F1792 50
4 rows selected.
This does not update the sal_tempamount to 0.. I would guess thats because I have before insert trigger, but then if I have after insert trigger then I will not be able to set the values of :NEW.. if I split this code into 2 parts before insert and after insert (Doing this update, it will not work either as it will be mutating).. how do I do this ?
March 12, 2004 - 7:46 pm UTC
see above
Table Design
Scot Armstrong, April 01, 2004 - 11:48 am UTC
We are creating an application that will have 100 plus fields one of them being a clob. All of the fields needs to be searchable.
From a design and performance standpoint, what would be the best approach:
1. Create the table with all of the columns and use an Oracle Text index with a multi_column_datastore with all of the columns.
2. Create the table with the primary key columns, and store the rest in a clob and use Oracle Text.
3. Create the table with the primary key columns, and store the rest in an xmltype inserting the data in xml format and use Oracle Text. The thought for this option was in case we needed to selectively search specific fields in the future.
Table Design
Scot Armstrong, April 06, 2004 - 9:51 am UTC
Thanks for the help.
I've created a function and indexed in other solutions involving a single table. I've looked at the article for indexing multiple tables as well, but am still unclear how to set up the tables and the function. Would it be possible to walk through an example of specifically showing it with multiple tables?
Also regarding solution #3. Based on how the design specifications come from our business users, they will eventually want to know what column the results from the query are coming from. Would solution #3 be the most viable one for such a requirement or is there a better one?
April 07, 2004 - 8:33 am UTC
you can query anything you want at all in the function.
Take this simple example -- in asktom my pages are made up from three tables -- the Q&A table (with question and answer) and the review table (with your comments -- a detail table) and the followup table (with my response).
anytime the Q&A table is updated -- that row gets re-indexed. that row has its primary key which is a foreign key in the other tables. My logic that adds a review or followup also 'tickles' this parent table -- updating the timestamp column. that triggers the re-index of that row -- that function gets the rowid of the parent row -- reads the Q&A from there, gets the primary key which it then uses to query the details and gather up their text.
Table Design
A reader, April 08, 2004 - 9:27 am UTC
Thanks for the architecture insight!
That solves my issue of how to deal with updates in the other tables and getting the Oracle Text index to be updated.
Multiple LOB Table Design
Scot Armstrong, April 21, 2004 - 8:57 am UTC
I have another design question. We are trying to standardize our table designs regarding lobs (i.e blobs and clobs).
There is one camp that will put multiple lobs columns used for different purposes throughout thier relational tables.
Another camp prefers to have one central table for clobs, and another for blobs, and have the other tables in the schema that need to use these datatypes relationally join to these central tables to store and retrieve the data.
Which design choice is overall the best? Which design has the best performance? Which design has the best scalability?
Thanks for all of your help!
Scot
April 21, 2004 - 8:25 pm UTC
the first guys get my vote.
a lob is just a datatype like number. do they put all of their numbers in a separate table?
the second approach was valid perhaps in the days of LONGs and LONG RAWs (but then I don't think i'd want a SINGLE table -- a table per table perhaps)..
Data Model
Shams, July 02, 2004 - 12:03 pm UTC
Your site is the greatest resource !!!
I work for a insurance company which has ten different departments,
currently we have separate schema for each department. I am planning to combine all the departments data (each department has millions of records) into a single schema
Lets say for examaple I have the following tables in each schema
person - contains persons information like name, date of birth etc.(20 columns)
person employer -person employer information(10 columns)
-- may or may not change when a new
claim is filed
-- if the information is changed,
it is just updated to the new one.
contacts - contains person contacts at the claim level
(10 columns)
-- may or may not change when
a new claim is filed
-- if the information is changed,
it is just updated to the new one.
claims - claim information (60 columns)
insured persons - contains the primary insured person
information at the claim level
(15 columns)
may or may not change claim filed
In every table we have a POLICY_NO column.
employer
Y
|
|
person -------------< claims ------------< contacts
|
|
/\
Insured_Person
I want to combine the design , what is the best approach, thinking the data volume and performance of the queries which joins claim level data all the times.
Do I need to denormalize the design and combine the claims, employer , contacts, insured person table in one and add DEPT_ID in each table to identify the departments.
Thanks
Shams
July 02, 2004 - 12:22 pm UTC
Off the top of my head:
I would rename each table to TABLE_NAME_T,
I would add a DEPT_ID column to each column,
I would have a trigger maintain this DEPT_ID column (by plopping the sys_context( 'myctx', 'dept-id') value into it,
I would create views as TABLE_NAME selecting everything but DEPT_ID,
I would add DEPT_ID to the leading edge of all indexes,
create the policy that adds the predicate "and dept_id = sys_context.....
Add a database login trigger or call logic after connecting/getting a session to set the application context and call it a day.
you'd be done more or less. As long as your system did not do any full scans, you'd be finished and should have more or less the same sort of performance characteristics you do now.
Another approach would be to set up a partition for each dept (still adding that dept_id column), you would use local indexes for everything. You would not need to add dept_id to any of them (it would be implied). That would solve any full scan issues as well (as they would full scan ONLY their partition)
data model
Shams, July 02, 2004 - 2:03 pm UTC
Thanks for the fast response
Are you saying the denormalize approach to keep only two tables
person and claims( claims combined with employer , contacts and insured_person tables)
will not increase the performance in the combined design (for the queries selecting the data in the one FAT table) as compared with individual schema design( where queries join the three different tables) .
if it is true then it would be better to have the design normalized and PARTITION all the tables with DEPT_ID , as you suggested.
Can you confirm.
Thanks,
Shams
July 02, 2004 - 2:40 pm UTC
I'm saying "you have an existing application, if you want to merge schemas - you can do so like this"
I cannot say if denormalizing is a good thing or bad thing here, I don't know all of the requirements of the application.
data model
shams, July 02, 2004 - 5:14 pm UTC
Again thanks for the fast response
If I totally want to redesign the process from scratch.
What will be your best approach, still you think having all the three tables (big tables with millions of records) data elements in one table in a denormalized condition to avoid mutilple joins in the queries.
Anyhow I have to partition the tables whether I have three tables or one table.
what is the good approach of having one big table with partition on dept_id or three tables with partition on dept_id in a normalized way
Thanks,
Shams
July 02, 2004 - 7:03 pm UTC
like I said, i don't know your requirements... all i know is you have an insurance application you want to build.
with all of the "many" relations -- start normalized and only denormalized when the performance of your queries is just "way too bad".
if you are building from scratch and dept_id is your natural partition key for security reasons, it would be a very good candidiate, yes
storage of BLOB and XML in single table
A reader, July 03, 2004 - 4:27 am UTC
Hi Tom,
I have a single table in my application which stores all the files of the application.
totally it has five columns.
3 xml columns
1 blob columns.
1 key column.
xml columns are basd on xml schema, so they are stored as object relational .
So what would be the storage option you would advice me.
Woould it be better to store all the four columns in seperate tablespace or xml columns in one tablespace and blob in another.
your advice is highly appreciated.
POssible if you can demonstrate with an example would be very nice.
thank you.
July 03, 2004 - 10:20 am UTC
I have no problem with everything being in a single tablespace.
Use the tablespace concept to make your life easier. To make administration easier. I'm assuming you have file systems built using typical technologies like striping and such so that IO distribution is "just being done for you". Use the tablespace concept when it behooves you - when it appears to make something "easier" for you or you are more pleased with the results.
A reader, July 14, 2004 - 2:25 pm UTC
Tom,
This may be bit off the thread. But I don't find a way to submit the question on your site. Have you heard something like "Regressive relationship". What does it mean? I came across this from an interview. I was not able to answer this
July 14, 2004 - 10:14 pm UTC
never heard of it really.
"Regressive relationship"
Tony Andrews, July 15, 2004 - 7:32 am UTC
Maybe it was "reflexive relationship"? i.e. a foreign key that references the same table, like emp.mgr?
Table storage question
sonali, July 25, 2004 - 9:36 pm UTC
I know this question will sound silly, but we have some need for this kind of design
Is there any way I can make table store only specific number of rows..say 5 rows in a table.
We have a system table that we populate during installation and people are not supposed to insert any rows afterwords. But they do it everynow and then and we have big problems.
Because they are still doing the setup during installation we cannot make that table read only..
Thanks
July 26, 2004 - 7:23 am UTC
ops$tkyte@ORA9IR2> create table five_rows ( x int );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table cnt ( cnt int not null constraint only_five_rows_ok check (cnt<=5) );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace trigger five_rows_only
2 before insert or delete on five_rows for each row
3 declare
4 l_inc number default 1;
5 begin
6 if (deleting)
7 then
8 l_inc := -1;
9 end if;
10 merge into cnt
11 using (select l_inc inc from dual) d
12 on (1=1)
13 when matched then update set cnt = cnt+d.inc
14 when not matched then insert (cnt) values (d.inc);
15 end;
16 /
Trigger created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
1 row created.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
1 row created.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
1 row created.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
1 row created.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
1 row created.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
insert into five_rows values ( 1 )
*
ERROR at line 1:
ORA-02290: check constraint (OPS$TKYTE.ONLY_FIVE_ROWS_OK) violated
ORA-06512: at "OPS$TKYTE.FIVE_ROWS_ONLY", line 8
ORA-04088: error during execution of trigger 'OPS$TKYTE.FIVE_ROWS_ONLY'
ops$tkyte@ORA9IR2> delete from five_rows where rownum = 1;
1 row deleted.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
1 row created.
ops$tkyte@ORA9IR2> insert into five_rows values ( 1 );
insert into five_rows values ( 1 )
*
ERROR at line 1:
ORA-02290: check constraint (OPS$TKYTE.ONLY_FIVE_ROWS_OK) violated
ORA-06512: at "OPS$TKYTE.FIVE_ROWS_ONLY", line 8
ORA-04088: error during execution of trigger 'OPS$TKYTE.FIVE_ROWS_ONLY'
Wow, you have solution for almost any problem !!
sonali, July 26, 2004 - 9:35 am UTC
not very secure
Gabe, July 27, 2004 - 10:02 am UTC
<quote>We have a system table that we populate during installation and people are not supposed to insert any rows afterwords. But they do it everynow and then and we have big problems.</quote>
If they add records in the table when they are explicitly told not to then they will find a way around the trigger as well (they will temporarily disable it) ... I've seen that before. One way to do it properly is through security ... put the 5-row table in a separate schema (don't publish the password) and grant only select access to it ... very much like dual.
Short of that, a row-level before ins/upd/del trigger created/enabled right after installation which just does raise_application_error should do it.
an I vatchen
A reader, July 28, 2004 - 3:09 pm UTC
schema design question
A reader, July 30, 2004 - 1:30 pm UTC
hi tom
suppose you have a design where there is a generic
table ( I know you dont like them but sometimes
they are a good solution)
e.g.
create table t1
(
common_col1 varchar2(10),
type_of_data varchar2(20),
generic_col varchar2(256),
generic_col1 varchar2(256),
generic_col2 varchar2(256)
);
and so on.
In the generic columns (generic_col*) we store
differnt types of data. Then we work on views
created on top of these.
create or replace view view1
as
select common_col1, type_of_data,
decode( type_of_data, 'type1', generic_col1, null)
as specific_col_name,
...
Now in this case if I new that the specific_col_name
length can never exceed 50, then is it a good idea to
do a substr in the decode as
create or replace view view1
as
select common_col1, type_of_data,
decode( type_of_data, 'type1',
substr(generic_col1,1,50), null)
as specific_col_name,
...
?
I think this will benefit the clients that would
look at the data and allocated storage based on
the max length of varchar2.
What do you think?
Thanx!
July 30, 2004 - 5:44 pm UTC
the substr would be benefical - yes. a view would know "50 characters", a fetch would too -- a report can describe the query and get meaningful information.
thank you!
A reader, July 30, 2004 - 5:48 pm UTC
Steve, August 31, 2004 - 12:22 pm UTC
We have a table that records the activities of our salesforce. Each activity leads to an outcome and a further activity to be undertaken in the future.
When an activity row is created the "outcome" column remains null until the activity is complete at which point the outcome column is populated and a follow-up activity row created. The outcome column is queried heavily for reporting and for creating "diary" type reports for those activities still outstanding (ie the outcome remains null). Now I'm wondering whether it might be better to have a default value for outcome until the activity is complete or create a separate table for outcomes which comprises of the PK from the activity table and the outcome of that activity. Your thoughts would be appreciated. Thanks
August 31, 2004 - 1:35 pm UTC
NULL is perfectly valid. NULL is perfectly fine.
NULL is in fact the default "default value", it is in a sense "a value"
I would leave it be. Why do you think it would be better otherwise? what is the thought behind the premise?
Designing a Table with XML Type Column with Constraints
Suraj, August 31, 2004 - 4:08 pm UTC
Thanks!The Article provides insight on design aspect.
I have a different scenario with me. I have a table where i have planned to store the data in an XMLTYPE column as a Clob Storage. I am unable to find options to indulge constraints ( Primary, unique, referential and check) on different elements of XML with other regular table's columns in the Data Model.
Can you please let me know whether is there any way to apply data integrity on a xml elements stored in XMLType of Clob Storage(Unstructured Storage).
August 31, 2004 - 4:53 pm UTC
(thanks in large part to Sean Dillon....)
ops$tkyte@ORA9IR2> create table t ( x int, y xmltype );
Table created.
ops$tkyte@ORA9IR2> create or replace trigger t before insert on t for each row
2 declare
3 no_nulls_you_lose exception;
4 pragma exception_init( no_nulls_you_lose, -1400 );
5 begin
6 if ( :new.y.extract( '//bar/text()' ).getNumberVal() > 5 )
7 then
8 raise_application_error( -20000, 'you lose' );
9 end if;
10 if ( :new.y.EXISTSNODE('//foo/text()' ) = 0 )
11 then
12 raise no_nulls_you_lose;
13 end if;
14 end;
15 /
Trigger created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create unique index t_idx on t( y.extract( '//foo/text()' ).getStringVal() );
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( 1, xmltype('<x><foo>x</foo><bar>5</bar></x>') );
1 row created.
ops$tkyte@ORA9IR2> insert into t values ( 1, xmltype('<x><foo>x</foo><bar>5</bar></x>') );
insert into t values ( 1, xmltype('<x><foo>x</foo><bar>5</bar></x>') )
*
ERROR at line 1:
ORA-00001: unique constraint (OPS$TKYTE.T_IDX) violated
ops$tkyte@ORA9IR2> insert into t values ( 1, xmltype('<x><foo></foo><bar>5</bar></x>') );
insert into t values ( 1, xmltype('<x><foo></foo><bar>5</bar></x>') )
*
ERROR at line 1:
ORA-01400: cannot insert NULL into ()
ORA-06512: at "OPS$TKYTE.T", line 11
ORA-04088: error during execution of trigger 'OPS$TKYTE.T'
ops$tkyte@ORA9IR2> insert into t values ( 1, xmltype('<x><foo>y</foo><bar>55</bar></x>') );
insert into t values ( 1, xmltype('<x><foo>y</foo><bar>55</bar></x>') )
*
ERROR at line 1:
ORA-20000: you lose
ORA-06512: at "OPS$TKYTE.T", line 7
ORA-04088: error during execution of trigger 'OPS$TKYTE.T'
Steve, September 01, 2004 - 6:12 am UTC
The premise being that we have many queries which search for unfinished activities ie the outcome is null. Now it's not a large table (about 1 million rows or so) but each query will use a full table scan when only a small %age of activities are outstanding. So, presumably (as it's 9.0.1.4) I could use a function index like decode(<column>,null,'N',<column>) and then change the search criteria to "where decode(<column>,null,'N',<column>)='N'?
September 01, 2004 - 8:30 am UTC
seems like a data model issue really.
You have an "outcome column"
<quote>
When an activity row is created the "outcome" column remains null until the
activity is complete at which point the outcome column is populated and a
follow-up activity row created.
</quote>
you are using outcome as a flag to indicate "open" or "closed"
seems you missed a column -- "status" with values "open" and "closed"?
you are using this column for multiple (undocumentable) purposes. Seems you need an open/closed column.
Of course!!
Steve, September 01, 2004 - 10:22 am UTC
It's so obvious now you say it - I was trying to overcomplicate the issue. Thanks
Wanted your inputs on table design and query building
Saheli, December 15, 2004 - 2:50 am UTC
There is a need to develop an application which will help the doctors do an extensive field level search on the patient records. The requirement is to search on multiple fields ( number not known - in the sense that if there are 400 different fields in a patient record - the doctor should be able to serach on all the fields simultaneously - though it may never happen in reality and logically) and the query has to be built dynamically to include all the fields that the doctor wants to search on.
One patient record is the complete medical record of the patient, including all the various tests undergone, reports taken and the like - you are very well aware of.
For example, if there is a medical health check casesheet report, it has sections like physical examination, systemic review, investigations etc. Under systemic review there can be multiple sub sections like cardio vascular system, respiratory system, CNS etc. Each of these can in turn have sub sections. The final node will be the attributes of each lowest level subsection. The doctor should be able to search on all these fields.
I would appreciate your inputs and comments as to how to approach this in terms of table designs and qeury building on the fly.
This is a sample of how the data can be is given below. This is only part of the data. It will include details about the entire medical record:
<?xml version="1.0" encoding="UTF-8" ?>
- <patientinfo>
+ <personal>
<uhid>00110010101</uhid>
<regdate>dd-mm-yyyy</regdate>
<pname>Text</pname>
<relname>Text</relname>
<gender>Text</gender>
<age>Numeric</age>
<mstatus>Text</mstatus>
<addr>Text</addr>
<city>Text</city>
<pin>Text</pin>
<state>Text</state>
<country>Text</country>
<phone>Text</phone>
<religion>Text</religion>
<occup>Text</occup>
</personal>
- <pataddm type="mhc">
- <mhc>
<mhcvisitdate>17-10-1999</mhcvisitdate>
<checkuptype>ABC HEALTH CHECK</checkuptype>
<chiefcomp>Known Hypertensive - 1 Year.; on regular medication.; Year</chiefcomp>
<hispresentillness>TEXT</hispresentillness>
<mhcimpression>Systemic Hypertension,; Old ASMI,; Obesity</mhcimpression>
+ <mhcadvice>
<advice>Salt restricted diet,; Weight reduction.</advice>
- <mhcmedication>
- <mhcmedi>
<type>Tab.</type>
<name>Atenolol 50 mg.</name>
<seq>1 daily</seq>
</mhcmedi>
</mhcmedication>
</mhcadvice>
- <casesheet>
<casedate>dd-mm-yyyy</casedate>
<consult>Dr. ABC</consult>
- <systemreview>
- <cardiosystem>
<chestpain>TEXT</chestpain>
<breath>Yes on exertion - 1 year</breath>
<palpitation>TEXT</palpitation>
<sweling>TEXT</sweling>
<hypertension>TEXT</hypertension>
- <anymedication>
- <sysmedi>
<type>Tab.</type>
<name>Crocine</name>
<seq>1-0-0</seq>
</sysmedi>
- <sysmedi>
<type>Tab.</type>
<name>Crocine</name>
<seq>1-0-0</seq>
</sysmedi>
</anymedication>
</cardiosystem>
- <respiratorysys>
<sneezing>TEXT</sneezing>
<runningnose>TEXT</runningnose>
<noseblock>TEXT</noseblock>
<cough>TEXT</cough>
<sputum>TEXT</sputum>
<dyspnoea>TEXT</dyspnoea>
<wheezing>TEXT</wheezing>
</respiratorysys>
+ <dentaldetail>
<dentist>TEXT</dentist>
<dental>TEXT</dental>
</dentaldetail>
+ <gastrosys>
<appetite>Normal</appetite>
<abdominalpain>TEXT</abdominalpain>
<flatulance>TEXT</flatulance>
<bowelhabit>TEXT</bowelhabit>
<papticulcers>TEXT</papticulcers>
<moutnulcers>TEXT</moutnulcers>
<piles>TEXT</piles>
</gastrosys>
+ <genitourinary>
<frequency>Normal</frequency>
<burning>TEXT</burning>
<urgency>TEXT</urgency>
<hesitency>TEXT</hesitency>
<dribbling>TEXT</dribbling>
<overflow>TEXT</overflow>
<prostate>TEXT</prostate>
<hydrocele>TEXT</hydrocele>
<gyn>TEXT</gyn>
</genitourinary>
+ <nervoussys>
<headache>Yes</headache>
<giddiness>TEXT</giddiness>
<subjective>TEXT</subjective>
<objective>TEXT</objective>
<aura>TEXT</aura>
<memory>Normal</memory>
<neuritis>TEXT</neuritis>
<sleep>Normal</sleep>
</nervoussys>
+ <eyes>
<vision>Normal</vision>
<colorvision>TEXT</colorvision>
<palpebral>TEXT</palpebral>
<ptoris>TEXT</ptoris>
<cornea>TEXT</cornea>
<fundus>TEXT</fundus>
<opticdis>TEXT</opticdis>
<conjunctiva>TEXT</conjunctiva>
<pupil>TEXT</pupil>
<squint>TEXT</squint>
<glass>TEXT</glass>
</eyes>
+ <ears>
<hearing>Normal</hearing>
</ears>
<weight>Weight Steady / Weight Lose / Weight Gain</weight>
<spinejoints>TEXT</spinejoints>
+ <skin>
<sculy>TEXT</sculy>
<hypopigmentation>TEXT</hypopigmentation>
<hyperpigmentation>TEXT</hyperpigmentation>
<dermatitis>TEXT</dermatitis>
<fungus>TEXT</fungus>
<eczema>TEXT</eczema>
<discolouration>TEXT</discolouration>
<acne>TEXT</acne>
<warts>TEXT</warts>
<rashes>TEXT</rashes>
<urticaria>TEXT</urticaria>
<skinlesions>TEXT</skinlesions>
<xanthalesma>TEXT</xanthalesma>
<scars>TEXT</scars>
<skinothers>TEXT</skinothers>
<dry>TEXT</dry>
<scratchmark>TEXT</scratchmark>
</skin>
<generalsymp>TEXT</generalsymp>
+ <presentmedications>
+ <premedi>
<type>Tab.</type>
<name>Atenolol 50 mg.</name>
<seq>1 daily</seq>
</premedi>
+ <premedi>
<type>Tab.</type>
<name>Captonil 12.5 mg</name>
<seq>1 daily</seq>
</premedi>
+ <premedi>
<type>Tab.</type>
<name>Monotrate 20.mg</name>
<seq>1 daily</seq>
</premedi>
+ <premedi>
<type>Tab.</type>
<name>Persantin 25 mg</name>
<seq>1 daily</seq>
</premedi>
</presentmedications>
+ <pasthistory>
<medical>Took Treatment in Srilanka</medical>
<surgical>TEXT</surgical>
</pasthistory>
+ <personalhist>
<maritialstatus>Married</maritialstatus>
<habits>Yes, TEXT</habits>
<tobaccoandsnuff>Yes, TEXT</tobaccoandsnuff>
<noofchildren>One</noofchildren>
+ <meals>
<diet>Veg/Non-veg</diet>
</meals>
<alcohol>Yes, Occational</alcohol>
<physicalactivity>Active</physicalactivity>
</personalhist>
<drugallergies>TEXT</drugallergies>
+ <familyhistory>
<father>Alive 61 Yrs</father>
<mother>Alive-52 Yrs.</mother>
<siblings>1-Brother and 1 Sister</siblings>
<diabetes>Yes, TEXT</diabetes>
<heart>Yes, TEXT</heart>
<allergies>Yes, Father</allergies>
<cancer>Yes, TEXT</cancer>
<familyhistbp>Yes,TEXT</familyhistbp>
<cva>Yes, TEXT</cva>
<epilepsy>Yes, TEXT</epilepsy>
<endocrine>Yes, TEXT</endocrine>
</familyhistory>
</systemreview>
- <physicalexam>
+ <general>
<build>Obese +</build>
<height>Numeric</height>
<phexamweight>Numeric</phexamweight>
<anaemia>Yes, TEXT</anaemia>
<icterus>Yes, TEXT</icterus>
<cyanosis>Yes, TEXT</cyanosis>
<clubbing>Yes, TEXT</clubbing>
<oedema>Yes, TEXT</oedema>
<glands>Yes, TEXT</glands>
<bmi>Numeric</bmi>
</general>
+ <hent>
<entdrname>TEXT</entdrname>
<ent>TEXT</ent>
<headneck>TEXT</headneck>
</hent>
- <cvs>
<heartrate>Numeric</heartrate>
- <bp>
- <supine>
<systolic>140</systolic>
<diastolic>100</diastolic>
</supine>
- <sitting>
<systolic>Numeric</systolic>
<diastolic>Numeric</diastolic>
</sitting>
- <standing>
<systolic>Numeric</systolic>
<diastolic>Numeric</diastolic>
</standing>
<chestshape>TEXT</chestshape>
</bp>
<heartsound>Normal</heartsound>
<murmurs>Nil</murmurs>
<thrills>Nil</thrills>
</cvs>
+ <rs>
<rate>Numeric</rate>
<breath>Vesicular</breath>
</rs>
+ <abdomen>
<appearance>TEXT</appearance>
<liver>TEXT</liver>
<spleen>TEXT</spleen>
<kidney>TEXT</kidney>
<tenderness>TEXT</tenderness>
<bowelsounds>TEXT</bowelsounds>
<fluid>TEXT</fluid>
<hernia>TEXT</hernia>
<additional>TEXT</additional>
</abdomen>
+ <surgicaldetails>
<surgeonname>TEXT</surgeonname>
<phimosis>TEXT</phimosis>
<hydocele>TEXT</hydocele>
<hernia>TEXT</hernia>
<prostate>TEXT</prostate>
<paintestis>TEXT</paintestis>
<penis>TEXT</penis>
</surgicaldetails>
+ <gynaecfindings>
<gynadrname>TEXT</gynadrname>
<checkup>TEXT</checkup>
<findings>TEXT</findings>
<breasts>TEXT</breasts>
<externalgentalia>TEXT</externalgentalia>
<vagina>TEXT</vagina>
<cervix>TEXT</cervix>
<uterus>TEXT</uterus>
<fornix>TEXT</fornix>
<pv>TEXT</pv>
<papsmear>TEXT</papsmear>
<advice>TEXT</advice>
<vault>TEXT</vault>
<tenderness>TEXT</tenderness>
<discarge>TEXT</discarge>
<pod>TEXT</pod>
</gynaecfindings>
+ <cns>
<cranialnerves>Intact</cranialnerves>
<sensorysys>Normal</sensorysys>
<motorsys>Normal</motorsys>
<reflexes>Bil.equal</reflexes>
</cns>
+ <phyeyes>
<vision>Normal</vision>
<colorvision>TEXT</colorvision>
<palpebral>TEXT</palpebral>
<ptoris>TEXT</ptoris>
<cornea>TEXT</cornea>
<fundus>TEXT</fundus>
<opticdis>TEXT</opticdis>
<conjunctiva>TEXT</conjunctiva>
<pupil>TEXT</pupil>
<squint>TEXT</squint>
<glass>TEXT</glass>
</phyeyes>
<phyfundus>TEXT</phyfundus>
+ <physkin>
<sculy>TEXT</sculy>
<hypopigmentation>TEXT</hypopigmentation>
<hyperpigmentation>TEXT</hyperpigmentation>
<dermatitis>TEXT</dermatitis>
<fungus>TEXT</fungus>
<eczema>TEXT</eczema>
<discolouration>TEXT</discolouration>
<acne>TEXT</acne>
<warts>TEXT</warts>
<rashes>TEXT</rashes>
<urticaria>TEXT</urticaria>
<skinlesions>TEXT</skinlesions>
<xanthalesma>TEXT</xanthalesma>
<scars>TEXT</scars>
<skinothers>TEXT</skinothers>
<dry>TEXT</dry>
<scratchmark>TEXT</scratchmark>
</physkin>
<extremities>TEXT</extremities>
+ <musculoskeletalsystem>
<mssextremeties>TEXT</mssextremeties>
<mssspine>TEXT</mssspine>
<mssjoints>TEXT</mssjoints>
<mssbone>TEXT</mssbone>
</musculoskeletalsystem>
</physicalexam>
December 15, 2004 - 1:40 pm UTC
we really don't need 5 pages of "XML" to get the gist...
you don't mention how the data is maintained here, is it read mostly (eg: bitmap indexes are possible) or very much read write.
The data is maintained as read only
Saheli, December 15, 2004 - 10:44 pm UTC
The data here will have to be parsed from the xml and uploaded into the database and will be read only.. Used only for search.
How should the tables be designed and what others need to be taken care of. Appreciate your timely help.
December 16, 2004 - 8:03 am UTC
then the data should not be persisted as XML
the data should be rows and columns
and bit map indexes will be the "word of the day"
you would/could bitmap index the attributes they want to search on. then a query like:
where (c1 = :x or c2 = :y) and c3 > :z
can simply AND/OR together the bitmaps to find the rows.
Short of that, you could store the XML "as is", using Oracle TEXT, index it and then teach your end users how to "section search" (or write them an application to do it)
</code>
http://docs.oracle.com/docs/cd/B10501_01/text.920/a96517/csection.htm#33034 <code>
table design
A reader, December 29, 2004 - 12:31 pm UTC
tom, I have a desing issue.
I have a user table and user_audit table exactly same structure except the primary key
product:
-----
prod_id number primary key,
prod_name varchar2(100),
prod_desc varchar2(100),
prod_entry_user_code (references user_code),
prod_entry_ts date
prod_hist:
-----------
prod_id number primary key,
prod_name varchar2(100),
prod_desc varchar2(100),
prod_entry_user_code (references user_code),
prod_entry_ts date
primary key(prod_id,prod_entry_ts)
----------------------------------
i have another set of tables
order:
------------
order_id number primary key
order_product_id number,
order_price number(9,2),
order_entry_user_id number,
order_entry_ts date
order_hist
-----------
order_id number ,
order_product_code varchar2(12),
order_price number(9,2),
order_entry_user_id number,
order_entry_ts date
primary key(order_id,order_entry_ts)
*--------**------------
now the question is HOW DO I
JOIN the prod_histto The ORDER_hist ?
for perticular order and product mapping as products an be updated with out change of prod_id? (iknnow this is Not good but i have it.)
can you suggets a better way to join may be by adding a new column with the surrogate key, but then how can i migrate a key from prod_hist to order_hist ?
TIA,
December 29, 2004 - 7:32 pm UTC
insufficient data.
you don't say what the inputs are. too many ways to intrepret this.
I don't even see how you could join prod_hist to order_hist, seems to be missing columns.
joins
A reader, December 30, 2004 - 11:00 am UTC
thanks for reply tom.
what do you mean by what is the input ?
1.) product and product_hist tables.
product is moved to prod_hist when it is expired or upated.
2.) order and order_hist tables
order is moved to order_hist when it is executed or upated.
so for the history, the join I want to make is
--------------------------------
select prod_id,order_id,order_price
from order_hist, product_hist
where order_hist.order_product_id = product_hist.product_id
----------
but obviously this does not work (does not yield required results) as can be updated with out change of prod_id
so in history tables I can not find that
what price was placed against exatcly which version of this product.
I need help to desing thie senario.
Thanks for your help
December 30, 2004 - 11:41 am UTC
do you join every row in prod_hist
or do you get a predicate "for this product"
do you query every row in orders
or do you get a predicate "show me this order"
there are different answers for BULK queries vs "get me this one"
order_hist didn't have a order product id.
order_hist
A reader, December 30, 2004 - 12:13 pm UTC
----------------------------------
i have another set of tables
order:
------------
order_id number primary key
order_product_id number,
order_price number(9,2),
order_entry_user_id number,
order_entry_ts date
order_hist
-----------
order_id number ,
order_product_code varchar2(12),
order_price number(9,2),
order_entry_user_id number,
order_entry_ts date
order_product_id number, <---- Sorry my mistake
primary key(order_id,order_entry_ts)
*--------**------------
1.) every row to count what product had what orders/prices
2.) for specific product also.
2.-- for specific product as
even if the product changes/updated
(only in specific cases like promotion date increased)
the product_id does not change)
so I get a crossproduct of all the different products
but same product ids.
December 30, 2004 - 2:18 pm UTC
1) For this big bulk operation -- just join and use analytics. (you are getting a "something like this" answer because there are no create and no inserts -- NOT that I want them now, you should be able to take it from here)
select *
from (
select ...,
max( case when o.order_ts_entry_date >= p.prod_entry_ts
then p.prod_entry_ts
end )
over (partition by o.order_product_number ) max_dt
from order_hist o, prod_hist p
where o.order_product_number = p.product_number
)
where prod_entry_ts = max_dt
that is, join order to prod by product number, and compute the max prod_entry_ts date such that the prod_entry_ts date is less than the order date (find the DATE of the record that was in place when the order was made) -- and keep just those records.
don't know what #2 means exactly.
the answer
A reader, December 30, 2004 - 3:27 pm UTC
Hi tom,
Thanks for the query, but what I am looking for is the design
answer. I have this crazy design I want to change this so I want to know how do I design the tables(order_hist and prod_hist) so that It can be queried easyly, and fairly junier developer can write the query.
the question is how should I design the history tables
so that I can refer product_hist tabels primary key in to the orders_hist tables
I hope this will make it clear.
December 30, 2004 - 3:48 pm UTC
ease of use? perhaps you want to peek at:
</code>
http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96628/toc.htm <code>
you can create version enabled history tables, which you could point in time query by using dbms_wm.gotodate().
otherwise, the "easiest" for query purposes may be "denormalize". you don't version product, you version orders perhaps -- and instead of a foreign key to a product history table that you would have to query and look for the "most current correct record", you would have it as attributes.
Or, as you version orders, you don't save just the product number, you save the entire key to the product (including the date) so you can just join.
and I just thought of something -- the record you need to join to in prod_hist from order_hist? it might actually be in PROD mightn't it? I mean, you version the order -- but the product hasn't changed -- it won't be in the history table will it. it just got harder I think.
thanks
A reader, December 30, 2004 - 4:06 pm UTC
thanks for your reply tom.
I want to work and achive the 2 pint you said.
"Or, as you version orders, you don't save just the product number, you save the
entire key to the product (including the date) so you can just join.
"
on join with date (8.1.7) ?? that is the question.
and for the product not being in the product_hist table
will not occur because as soon as we put anything (new)in
product with new product_id it will go to product_hist table. It is just a handling of chages that si not well desinged.
Thank for your time and valuable advice.
December 30, 2004 - 4:34 pm UTC
"on join with date (8.1.7)"
what do you mean by that?
what I meant was, order_hist has (order_product_number,product_entry_ts), not just order_product_number -- so you can join perfectly. You have the entire key to the prod_hist table in order_hist, not a partial key.
not null&default vs nvl
VA, January 27, 2005 - 10:28 am UTC
Which one is better
1. define a column as NOT NULL with a DEFAULT value
2. define it as nullable and NVL() whatever value in a view
I like the latter. The former will use up storage due to the default value.
Comments? Any performance considerations? Thanks
January 27, 2005 - 10:59 am UTC
well, it really comes down to "what is the data"
I mean, the semantics of the column dictate what you do here.
i don't know why you would NVL() it in a view? NULL is an OK value.
A reader, January 27, 2005 - 11:16 am UTC
Well, lets say it is a column called "weight". The business rule is that weight can never be null, it has to be a positive number or zero. So I have the 2 options I mentioned earlier, which one would you prefer and why? Thanks
In other words, if NULL is NOT an ok value as per the business rule for that column, does it make sense to store the "default" value in the table or NVL() it to the needed value in a query/view?
Thanks
January 27, 2005 - 11:22 am UTC
<quote>
The business rule is that weight
can never be null
</quote>
I think that rather sums it all up rather nicely. It is NOT NULL.
A reader, January 27, 2005 - 11:44 am UTC
Right, but there are 2 ways to implement a not null rule. a) Put a NOT NULL constraint on the column and define a default value. This uses storage. or b) restrict all access to the table via a view with a NVL() on the column. This doesnt use storage.
Assuming that these are not-access columns i.e. CBO doesnt make any decisions based on the not-null constraint, why would I want to choose option a) which uses more storage space?
January 27, 2005 - 11:46 am UTC
no, there is only one way to implement a not null rule
a) put a not null constraint on the column
having a default is an option
the nvl() with a view trick is not (in this persons opinion/experience) something you want to do. it is a trick, the data is not stored correctly. disk costs cents per gigabyte. correct data is priceless.
CBO sure does make decisions based on a not null constraint -- it absolutely does.
disk is cheap.
data integrity, priceless.
check constraints in XML ?
Ramki, March 24, 2005 - 2:24 am UTC
Hello Tom,
We doing a project on operation & maintenance on telecom device
we are keeping every device configuration data in XML file,
Device configuration data are like DeviceType, and its Default value, MinValue,MaxValue,IsMandatory,
MaxLength (in case of string data) AllowedValues (this will be shown as drop down in user interface)
Every device will have its own Min, Max, default value, all device will be store in a same table
there will be a row for each device. so check constraint on column is not possible
How to archive this ? one way I can think is write before insert trigger
Tables store device data is
SQL> desc DEVICERECORD;
Name Null? Type
----------------------------------------- -------- -------------
DEVICEID NOT NULL NUMBER(10)
DEVICECONTAINERID NUMBER(10)
IPADDRESS VARCHAR2(20)
DEVICENAME NOT NULL VARCHAR2(20)
DEVICETYPE VARCHAR2(20)
DEVICEVERSION VARCHAR2(20)
DEVICESPECIFICINFO VARCHAR2(50)
LOCATIONINFORMATION VARCHAR2(50)
SNMPVERSION VARCHAR2(20)
SNMPRETRIES NUMBER(10)
SNMPTIMEOUT NUMBER(10)
SNMPREMOTEPORT NUMBER(10)
NUMBEROFVR NUMBER(10)
sample of a Device XML:
</DeviceType>
<DeviceType name="SGSN_EXO" nodeimage="ips3300fl.gif" helpid="526" CardType="SGSN_EVO" FullDeviceType="SGSN_EXO">
<Attr Name="DevName" Default="SGSN_EXO_" MaxLength="20" IsMandatory="true"/>
<Attr Name="IPAddress" IsMandatory="true" MaxLength="15"/>
<Attr Name="NumberofVR" Default="23" MinValue="0" MaxLength="5"/>
<Attr Name="SNMXVersions" AllowedValues="Auto-detect?#?SNMPv1?#?SNMPv2c?#?SNMPv3"/>
<Attr Name="SNMXRetries" MinValue="0" MaxLength="9"/>
<Attr Name="SNMXTimeOut" MinValue="0" MaxLength="9"/>
<Attr Name="SNMXRemotePortId" MinValue="0" MaxLength="9"/>
</DeviceType>
</DeviceType>
<DeviceType name="7206_SLB" nodeimage="ips3300fl.gif" helpid="535" CardType="7206_SLB" FullDeviceType="7206_SLB">
<Attr Name="DevName" Default="7206_SLB_" MaxLength="20" IsMandatory="true"/>
<Attr Name="HWVersions" MaxLength="20"/>
<Attr Name="IPAddress" IsMandatory="true" MaxLength="15"/>
<Attr Name="NumberofVR" Default="23" MinValue="0" MaxLength="5"/>
<Attr Name="SNMPVersions" AllowedValues="SNMPv2c"/>
<Attr Name="SNMPRetries" Default="2" MinValue="0" MaxLength="9"/>
<Attr Name="SNMPTimeOut" Default="6" MinValue="0" MaxLength="9"/>
<Attr Name="SNMPRemotePortId" Default="0" MinValue="0" MaxLength="9"/>
<Attr Name="Slot" MaxLength="20"/>
</DeviceType>
These XML data will be read in UI, and all this check is done in UI.
I feel these check need to be done in DB as check constraints & triggers before storing.
I am not able convince my team developer, they feel XML is right place to do this.
Reason for using XML are :
1. XML is easy to write
2. easy to correct these checks
3. 3rd party tools are available to read XML so less code than doing it DB.
4. This check are need to be done in UI layer, since Error msg should be shown to the user in case of wrong entry.
If I am correct will please tell me strong reason why these check need to be in DB.
If sorry, if am giving too much detail for simple question
Best regards
Ramakrishnan.A
March 24, 2005 - 8:43 am UTC
this is a new question for some time, not a quick "followup and review"
ask them to give you a report (for #3) on the number of tools that
a) read and process xml vs
b) can use sql to report on data in relational databases
it would be interesting.
the checks need to be done in the UI and the database -- for theirs is not the last application to ever use this data (that is the only thing I am 100% certain of here)
check constraints in XML
Ramki, March 24, 2005 - 8:25 am UTC
I am clear with my previous question ?
In sort what I need is check constrain based on key
Say we have DeviceType_Tbl & Device_Tbl has following value
DeviceType_Tbl will define what is device and its Min , Max & mandatory values
DeviceType_Tbl :
Device MINVal MaxVal Mandatory
DT1 1 99 Yes
DT2 99 123 No
DT3 1 90 Yes
DT4 11 16 No
DT5 9 99 Yes
DT6 13 23 No
While creating device in Device_Tbl, say DT1 is created it has to check Min , Max & mandatory defined for that device
How to implement this?
Thanks
data and Index tablespace same disk
j, March 24, 2005 - 9:40 am UTC
Dear Tom,
sorry if this question is not 100% about the initial subject but I didnt find a better place.
If I am correct oracle suggest to put a tablespace for data files and another tablespace for index files, but I am not sure if oracle recommed also to put these tables in differents disks to avoid contention. Is the second statement correct?. If not, would you recommend to put all tablespaces in one drive?. including redo log files?
thanks
tables and indexes in same disk drive
j, March 25, 2005 - 4:47 am UTC
Dear Tom,
thanks for your answer,
Look in these 2 scenarios
1) 10 HD in raid 1E, including all tablespaces and redologfile
or
2) 5 RAID-1 mirrored volumes, using one for tables, one for indexes, one for redolog files etc..
Which one will produce better perfomance
thanks
March 25, 2005 - 7:52 am UTC
take all 10 and do 0+1 for "best over all spread of IO to avoid hot spots"
separating tables from indexes never did anything for you. striping over many disks - does.
design table with lots of update
June, April 13, 2005 - 2:16 am UTC
Hi Tom,
I need to design a table. We have APPSTATUS table, which is appended daily with new accounts information and any changed information on existing account. Average daily new accounts are about 90k, with 50% of them will go through further application process, and 10% of them will be approved at the end. the average approval time for account is about 3 months (varies though, as one account may be closed but re-opened after 90 days with same account ID, < 1%). during this 3 months, the account may be changed to any status, and business would like to have one table, say, B, with all accounts with most current information (sort of snapshot of the day).
We already have more than 2-years data (more than 56mm accounts and is still growing).
To capture the snapshot of the day, we will keep updating 3-months data (90k*90*0.5=4050000, which is 4.5mm accounts), while rest are relative static (may have exception for < 1%, as mentioned above.)
You can imagine that data will be fragmented, and will not be efficient as well due to the fact that only small portion of data will be updated.
I am thinking to split into two tables, one(B1) holds all static information for accounts (I still have to find out if it is possible to find out the cut to identify the static), with other one(B2) holds those accounts before get into static status (approval or whatever). data in B2
will roll into B1, and build one view to union B1 and B2 for user query. Since B2 will be controlled in much smaller size, table rebuild will be quick and easy to do.
Is this a feasible solution, assume we could work out the business criteria for freezing account information at some stage? or is there any suggestion / live example for this type of table build? For an environment with lots of update activity, how could I avoid table fragmentation? I am talking about table with more than 100mm records and size could be more than 10GB. User couldn't avoid updating due to different sources for data in same table.
we are in 9.2.0.4, Sun Solaris.
Thanks in advance for any feedback.
April 13, 2005 - 9:06 am UTC
without really understand why the business wants that extra table, I cannot really comment. What is the precieved goal of that? lots of full scanning going on?
Nice
Kumar, May 24, 2005 - 1:18 pm UTC
Dear Tom,
What is the necessity of having a Secondary key??
Can't primary keys alone suffice??
May 24, 2005 - 1:50 pm UTC
if you mean surrogate keys, they are typically used when you are not 100% sure that the natural key should be used (eg: the natural key might actually change over time)
multiple tabels or sigle table?
Steve, June 10, 2005 - 12:10 pm UTC
Hi Tom,
I have 3 big transaction tables partitioned by month,
t_order, t_receive and t_cancel. There are many common
fields among 3 tables. If I merge the 3 tables
into one table and add a column trans_type to identify the transaction, partioned by month and subpartioned by trans_type. Do you think one-table method is better than 3-table method from performance point of view?
Thanks!
Steve
June 10, 2005 - 3:43 pm UTC
I don't know, I don't know what you actually do with these tables.
If you frequently full scan t_order in order to generate a report on orders, well, putting all three together is going to make that go slower (for example)
You need to understand the "physics" behind how you use the tables before you could say "good or bad" idea.
If I was union all'ing these tables constantly, could be good.
If I was querying individual tables only, could be bad.
modify...
sikanthar, July 06, 2005 - 11:10 pm UTC
Tom,
I want to add a new column into the MIDDLE of existing table. How can i do that..!
July 07, 2005 - 9:06 am UTC
you don't, you'd have to REBUILD the entire table.
Views allow you to order the columns any way you would like, use a view.
ops$tkyte@ORA9IR2> create table t ( x int, z int );
Table created.
ops$tkyte@ORA9IR2> alter table t add y int;
Table altered.
ops$tkyte@ORA9IR2> desc t;
Name Null? Type
----------------------------------- -------- ------------------------
X NUMBER(38)
Z NUMBER(38)
Y NUMBER(38)
ops$tkyte@ORA9IR2> alter table t rename to t_table;
Table altered.
ops$tkyte@ORA9IR2> create or replace view t as select x, y, z from t_table;
View created.
ops$tkyte@ORA9IR2> desc t;
Name Null? Type
----------------------------------- -------- ------------------------
X NUMBER(38)
Y NUMBER(38)
Z NUMBER(38)
Now, T looks like a table -- and has the columns in "your preferred order"
Great..!
sikanthar, July 07, 2005 - 10:02 pm UTC
Tom,
Thanks for your info.
but, instead of creating VIEW as select X,Y,Z why don't we go for creating TABLE as select X,Y,Z..
Am i right!
July 08, 2005 - 7:26 am UTC
why do you want to rebuild the table when a simple
a) add column
b) rename
c) create view
which moves zero bytes can achieve the same goal.
need..!
sikanthar, July 08, 2005 - 10:01 pm UTC
Tom,
client wants segment_type as TABLE with the same name..!
Any suggestion welcome..
July 09, 2005 - 8:52 am UTC
then client will pay the price of you reorganizing the table.
Make sure you explain to client the costs of doing so. I personally find that when people understand the cost of what they are asking for, people frequently become much more flexible in their requests. Not always -- but without the understanding of the cost - there would be no reason for them to change their mind.
"Mr/Mrs client - I can add your column 'in the middle' with 5 seconds of downtime, logically you will have precisely what you asked for, no one will be much the wiser. This is very low risk. This will cost you $5 of my time.
Or, we can schedule a maintainence window of a couple of hours of downtime (depends on size of object, number of indexes, etc). What we'll need to do is create a brand new table with the column in the middle. We'll need to copy the existing data over. Then we have to drop the old table and rename this new table. Then we have to create all of the indexes, implement any triggers, add constraints, verify all of the grants are in place, and test to make sure we didn't miss anything. This is considered high risk. This will cost you $500 of my time."
In 10gr1, using dbms_redefinition, we could mitigate the risk and downtime in the rebuild.
In 9ir1, using dbms_redefinition, we could mitigate the downtime in the rebuild (but not so much the risk as you are responsible for indexes, triggers, grants, constraints, etc)
In both cases, it would still be more than $5 of your time.
Your call.
Great..!
sikanthar, July 10, 2005 - 12:05 am UTC
Tom,
Good explanation around the work.Obviously, the COST and DOWNTIME is too high.The table size 1468006 KB and having 6 indexes.
I'll suggest the client and i do explain about the cost and work around.
will get back with feedback....
Column missing...
Reader, July 21, 2005 - 11:10 pm UTC
Tom,
I did recreate the table due to column adding in the middle. but, instead of creating view i created with create table..
1.Rename the table.
2.Added column in old table.
3.Create table as select ... from old_table.
The table created in the desired order, but the column which newly added in middle of the table contains data..
the thing happened is LAST column data's in old table
inserted into NEW column of new table,
then you may ask what happen to the LAST column in new table? It becomes NULL. Instead inserting into LAST column
of new table its inserted into NEW column of new table..
hope u understand the scenario... Thanks for ur patience..
let me know in detail a abour this//
July 22, 2005 - 8:47 am UTC
sorry, I did not follow this at all. Not sure if you are saying "job accomplished" or "we have a problem"
They messed up
Quadro, August 03, 2005 - 9:06 pm UTC
[quote]
sorry, I did not follow this at all. Not sure if you are saying "job accomplished" or "we have a problem"
[/quote]
They messed up with column orders:
SQL> create table t (x number, z number);
Table created.
SQL> insert into t values (1,2);
1 row created.
SQL> alter table t rename to t_old;
Table altered.
SQL> alter table t_old add (y number);
Table altered.
SQL> select * from t_old;
X Z Y
---------- ---------- ----------
1 2
SQL> create table t (x, y, z) as select * from t_old;
Table created.
SQL> select * from t;
X Y Z
---------- ---------- ----------
1 2
So - Y became Z and Z became Y.
They should do:
create table t (...)
insert /*+ append */ into t ( <list columns order here> )
select <same column order> from t_old
Or another way
Quadro, August 03, 2005 - 9:10 pm UTC
Would be:
create table t (x, y, z) as select x, y, z from t_old;
Both cases - they need explicitly defined column order.
RESOLVE : ORA-00932 : Inconsistent Datatypes : EXPECTED NUMBER GOT BINARY
VIKAS SANGAR, August 22, 2005 - 5:47 am UTC
Dear Mr. Kyte
I have a table T1 in my database with most of its feilds of the datatype RAW, but now i have created a sismilar table T2 with some of its feild as Varchar2(), Date and Number datatype.
All these feilds with changed datatype in T2 will be getting the data of the type RAW from T1 using...
SQL> insert into T2 select* from T1.
On this i am getting the Error...
ORA-00932 : Inconsistent Datatypes : EXPECTED NUMBER GOT BINARY.
Pls tell me how an i over come this problem, why not internal typecast of raw to other datatypes is taking palce here?
What shall i do, to solve this problem? I have tried using To_Char, To_Number etc, but its not working.
Pls Help.
Best Regards...
Vikas.
August 22, 2005 - 7:39 am UTC
what is in these "raw" fields. undoubtedly they are NOT oracle number types (0 to 22 bytes) and not oracle date types (7 byte fixed width)
so, what do you put into these "raw" fields. (I don't even want to know why you were using raw, seems pretty strange)
RESOLVE : ORA-00932 - FEED BACK
VIKAS, August 22, 2005 - 10:35 am UTC
RAW DATATYPES OR NOT SOMETHING INVENTED BY ME, THEY ARE VERY MUCH THERE IN ORACLE.
WE ACTUALLY MIGRATED OUR DATABASE FROM MS SQL SERVER TO ORACLE 9I DATABASE, USING ORACLE MIGRATION WORK BENCH AND IT WAS AT THAT TIME, THAT THE MIGRATED DATABASE WAS CREATED WITH SOME OF THE TABLES HAVING RAW DATATYPE OF SOME OF THEIR FEILDS (COLUMNS).
IT IS DUE TO THIS MATTER, I WANT TO RECREATE THE TABLES WITH MODIFIED DATATYPES FROM THE PREVIOUSLY MIGRATED DATABASE TABLES WHICH HAVE RAW DATATYPE TYPE IN SOME OF ITS TABLES IT.
CREATE TABLE (COLS..) SELECT (COLS....)FROM.
ALTHOUGH I HAVE CREATED THESE TABLES BUT I AM NOT ABLE TO INSERT THE MIGARTED DATA INTO THESE,
INSERT INTO .... SELECT ....FROM.
AND I AM GETTING THE
ORA-00932 ERROR.
TAKE CARE, REGARDS...
August 23, 2005 - 4:01 am UTC
YOU ARE SHOUTING. CAPSLOCK IS SHOUTING. IT IS UNIVERSALLY RECOGNIZED AS SUCH.
anyway, my point was -- you choose yourself to store data that is NOT binary data (numbers, dates, strings) as RAW data, you made that choice and that choice would not be the correct choice. You defeat many features of databases that way.
You will have to create conversion functions that can take your RAW data (whatever it is! for you see, it is just binary bits and bytes, I have no clue what you put in there, the database has no clue -- only you do. Only you can write these conversion routines)
jim, August 22, 2005 - 4:19 pm UTC
You don't need to shout. (Typing is caps is shouting)
If you have raw columns and you want to create a new table that has columns of a different data type then you need to explicitly map the raw columns to whatever types you want in the new columns. You may need to write a function that does this and use the function in your insert statement.
insert col1,col2 into mynew table (select my_function_raw_to_varchar2(col1_old),
my_function_raw_to_date(col2_old)... from my_old_table...)
FEED BACK
VIKAS SANGAR, August 23, 2005 - 12:58 am UTC
HI JIM,
THANX FOR UR SUGGESTION, I HAVE ALREADY RESOLVED MY PROBLEM.
I AM NOT SHOUTING, AND I WOULD LIKE TO SUGGEST YOU TO THINK TWICE BEFORE MAKING ANY FACTLESS ALLIGATIONS AGAINST ANY ONE.
ITS ONLY THAT THE CAPS LOCK KEY OF MY KEY BOARD HAD GOT STUCK DUE TO WHICH I AM COMPELLED TO USE IT WITH CAPS KEY ON. CERTAINLY I'LL REPLACE IT TODAY, AND THEN REVERT BACK TO YOU. AND YES, MY SICERE APOLOGIES, IF SPECIALLY, YOU FELT SO BAD ABOUT IT.
ANYWAYS... THANX TO YOU ONCE AGAIN.
TAKE CARE, REGARDS..
VIKAS
Apology
VIKAS, August 25, 2005 - 3:57 am UTC
Dear Mr. Kyte,
Got my Key Board replaced, I hope now it is fine.
Even you think, that i was shouting, well then My sincere apologies to you too.
Take care, regards.
August 25, 2005 - 4:02 am UTC
sorry, I did not see your other comment when I got the one IN UPPERCASE.
The uppercase stuff does come across as "shouting".
What if, if we have to add 200 columns.
Dinesh Kumar Dubey, November 07, 2005 - 10:26 am UTC
Hi Tom,
Your answers are great and useful.
we have one requirements.
We have to add 200 fields to the existing system.Now this can be done in two ways.
1. To add all these new extra fields to an existing table.
2.To create a new table for these extra fields.
But which method is better 1 or 2,pls suggest.
Thanks and Regds
Dinesh
November 07, 2005 - 8:16 pm UTC
the answer is one of the following
a) 1 is right.
b) 2 is right.
c) neither 1 nor 2 is completely right.
I can sum it up as "it depends".
tell us more about these 200 fields (wow, 200 new fields? seems like "a lot")
Thanks Tom
Dinesh Kumar Dubey, November 07, 2005 - 11:58 pm UTC
Thanks Tom,
Means it is a multiple choice question.Means we have to decide on the basis of our situation.
And regarding those fields 80 fields are number(2),and 120 fields are varchar2 of maximum 16.
Thanks again.
Regds
Dinesh
November 08, 2005 - 9:55 pm UTC
sounds like you missed out on normalization of this data? should these not be rows in a detail table?
when you "add them", will they take on values for existing rows over time? If so, sounds like a rebuild with a generous pctfree to allow for the future updates.
Thanks Tom
Dinesh Kumar Dubey, November 09, 2005 - 1:05 am UTC
Thanks Tom,
For Your reply.
Yes tom these fields will go into the existing rows.But they will have data from future insert.It is really that we should consider pctfree, in this case.
Thanks And Regds
Dinesh
November 09, 2005 - 9:28 am UTC
I did not understand.
1) When you add the columns, will they have defaults?
2) when you add the columns, will you be updating them for many of these rows in the future?
If (1), you want to rebuild using create table as select to supply the defaults.
If (2), you want to rebuild using a generous pctfree to avoid row migration in the future, but then set the pctfree back down after the rebuild (if NEWLY added rows will already have these values)
Hybrid table design
Samson, November 13, 2005 - 1:20 pm UTC
Hi Tom,
We are developing a debt management system for a bank. We've a big fat customer table, that gets populated from the host system daily (incremental uploads). Number of attributes have increased to 373.
About 300 of these are readonly during the day for OLTP operations and get updated only through the batch uplad process run every night, but rest of them get updated through OLTP.
The problems that we are facing with this table is that it is the same table which is accessed for OLTP and batch reports (70% OLTP during the day, 30% report generation. This is the rough guesstimate, we received from operations team). Since its too fat, full table scans take longer. There are certain search screens, where we use pagination queries (FIRST_ROWS_N hint), but since none of the indexes can be satisfied, this table has to be full scanned. This alone takes approximately 17 seconds, which is too slow for a search screen. We thought of using your rowid (wwc_fast_people) search method, but then our search screens have parameters which are updated during the day.
We are not sure how to approach this problem, hence require your opinion on this.
Do you think splitting the table into two tables would benefit ? i.e readonly fields in one table, which can be compressed, and rest 73 (OLTP fields) along with 10 other parameters, which are used in search screens in another table along with rowid of first table.
I know you don't like to guess, but then could you pls share some of experiences with such hybrid system designs. May be you have already come across such scenarios, and could share it with us.
Thanks
PS: your design principals have really helped me a lot in my previous SQL Server project (did i say sql server..?). Indexes vs FTS, bind variables, Single query vs loop etc apply almost across most of the databases. more of it on your blog some day
November 13, 2005 - 5:08 pm UTC
does the data have to be searchable IMMEDIATELY after a commit - or is a couple of seconds lag OK?
do you have a set of N columns (where N is reasonably small) that cover your search columns pretty much? (eg: of the 373, we search say "10" of them most of the time)
Hybrid table design
Samson, November 13, 2005 - 11:06 pm UTC
Hi Tom,
Ideally users want it IMMEDIATELY to be on commit, but couple of seconds lag is ok and is something, they can live with.
N=20 (12 static columns and 8 columns updated through OLTP)
Thanks
PS: Cold you pls share some of your experiences with hybrid systems(may be on your blog). I really liked your rowid search methodology and many others
November 14, 2005 - 7:59 am UTC
All of this should be benchmarked - but:
1) you could create a concatenated index on some subset of the 20 columns (putting the "always" used ones in it). Not sure I want 20 columns in an index (this will impact modifications obviously).
then you can use something like:
ops$tkyte@ORA9IR2> create table t
2 as
3 select * from all_objects;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx on t(object_name,object_type,owner,status,last_ddl_time);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', cascade=>true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select /*+ first_rows(100) */ *
2 from t
3 where rowid in ( select /*+ index_ffs( t t_idx ) */ rowid
4 from t
5 where owner = 'SCOTT'
6 and status = 'INVALID' );
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=108 Card=100 Bytes=11300)
1 0 NESTED LOOPS (Cost=108 Card=100 Bytes=11300)
2 1 INDEX (FAST FULL SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=8 Card=639 Bytes=12780)
3 1 TABLE ACCESS (BY USER ROWID) OF 'T' (Cost=1 Card=1 Bytes=93)
ops$tkyte@ORA9IR2> set autotrace off
The index_ffs (fast full scan) will use the index as if it were a skinny table and full scan it using multi-block IO. It is very much like the "people" example you referred to but the index is maintained in real time.
This allows for immediate querying of the data.
2) using Oracle Text - index the 20 columns into a single index and have the index maintained in the background (less impact on the modifications, the grunt work takes place "offline" as far as they are concerned).
You can either index an XML representation of the data (index a function in text, your function gets called with the primary key and you query out the record and return a tagged xml document) and then using section searching - efficiently search over the many fields in a single index.
or, you might be able to use the ctxcat index.
The index maintenance would be delayed.
Samson, November 14, 2005 - 8:39 am UTC
Thanks Tom,
I will benchmark these. Seems just a little bit of lateral thinking, isn't it?.
Anyway, "Do you think splitting the table into two tables would benefit ? i.e readonly
fields in one table, which can also be compressed, and rest 73 (OLTP fields) along
with 10 other parameters, which are used in search screens in another table
along with rowid of first table." Would it really help?
Do you think, its a good practise to store readonly and read/write information in one table ?
I am sure you have many interesting experiences with hybrid systems, like the one I am working on. Could you pls share some of them ?
I know I am being insistent, but if it gets me some more pearls of wisdom, then why not...
November 14, 2005 - 1:20 pm UTC
I doubt splitting the table would help, I could see it hurting actually (joining would be required to put the data back together)
Yes, storing read/write and 'read only' data together is OK - perfectly OK. You wouldn't really gain anything but complexity splitting it up since they are *queried* together.
I'm giving you the "pearls" here - if the scan is the issue, we need to either
a) remove scan (text)
b) make scan faster (skinny index)
Samson, November 14, 2005 - 1:41 pm UTC
Thanks Tom. I was actually looking for this reassurance from you.
Design Question on Text Search Column
A reader, November 17, 2005 - 1:49 pm UTC
Hi Tom,
We are building a data warehouse application using star schema. Our users want to search for text (whole-word-match) in the fact table and also the related dimension tables. (Normally they gave us a list of columns to be searched).
Our approach right now is build a text search column in the fact table containing the concatenated string of the fact text columns and all dimension columns. Right now the column is defined as 'VARCHAR2(1000)'.
My questions:
1. Is this a good approach?
2. Is it a good approach if the column size is VARCHAR2(4000)?
3. Any difference if the column is defined as CLOB?
Thanks,
Fan Liu
November 18, 2005 - 9:54 am UTC
1) it is similar to what I have done in the past - definitely. A single index that can search a complex hierarchy.
2/3) if you can do what you need in a varchar2(4000) stick with it. Use a lob if you cannot fit into 4000 characters. The varchar2 has certain efficiencies over the lob.
asking a question!
haseeb aslam, January 10, 2006 - 12:39 pm UTC
i want to ask a question!please answer it.
how to produce a design to find out largest number from five numbers entered by the user.
im waiting of your reply
thanks a lot for your cooperation!
haseebabbi@hotmail.com
January 10, 2006 - 1:10 pm UTC
asking a question!
haseeb aslam, January 10, 2006 - 12:43 pm UTC
how to write a simple program to find out "add" and "even" numbers from first 100 numbers?
January 10, 2006 - 1:11 pm UTC
that would involve your editor to write the code, and sqlplus to compile it. (smells like homework)
Existing design for search
Rajeswari, January 11, 2006 - 5:48 am UTC
Our existing design is like this for any search methods
we have a java wrapper object (set and get methods) related to our input. In java method we check each attribute of the object for null and if it is not null we will set the value for predicate condition. Select Query is formed dynamically. Bind variables are used. Same input java object is used by many modules.
The existing code itself well written eventhough it is dynamic. (It will use indexes similar to one of dynamic cursor approach in asktom). The number of records also limited using rownum.
Whether We will get any benefit if we move logic to PL/SQL. I am not getting any solid reasons.
January 12, 2006 - 10:17 am UTC
one benefit you would achieve would be better code reuse in the future, you know, when some other language becomes "the coolest thing ever".
Talbe Design issue
Jack, January 26, 2006 - 1:59 pm UTC
Hi Tom,
It is not clear to me from this thread about a table design.
1) We have one table containing about 100 columns. This table is used by our online applicaiton and batch application. At any time, only few columns are querired by each application. In this condition is it better to split the table into multiple tables so that batch process can run faster by selecting more rows(as with the split design more rows can be inserted into the block compare to the 100 columns).
2) what is advantage or disadv. of having this table with a bigger block size than the rest of the tables (db will be in multiple block size mode). Please let me know from performance point of view.
Thanks
January 27, 2006 - 8:14 am UTC
1) no, I seriously doubt that would cause anything but heartburn and pain.
2) probably none, unless an individual row is not fitting onto a block.
Talbe Design issue
jack, January 27, 2006 - 2:07 pm UTC
>>>1) no, I seriously doubt that would cause anything but heartburn and pain.
--Can you elobarate with example if possible. As some people cliam that by splitting and moving not frequently used columns to seperate table, we can fit more rows in the block. why do you think this will give heartburn and pain.
Thanks
January 28, 2006 - 12:38 pm UTC
sure, you can fit more rows per block, but then, well, you sort of got to put them back together again someplace else don't you.
putting them back together - heartburn, pain.
inserting bits and pieces into multiple tables - heartburn, pain.
Talbe Design issue
jack, January 30, 2006 - 1:29 pm UTC
>>>>> sure, you can fit more rows per block, but then, well, you sort of got to put them back together again someplace else don't you.
---Yes, I need to. But by placing them in different tables, my batch process can run faster as they can fetch more rows from each table.
>>>>>>>>>>>
putting them back together - heartburn, pain.
inserting bits and pieces into multiple tables - heartburn, pain.
----- Why do you think so. I can make this easily by just calling one stored procedure and the stored procedure should take care of multiple inserts.
**** I am not still convinced with your answers ***
January 30, 2006 - 3:35 pm UTC
That is ok, you don't have to be. Only I do ;)
You asked for my opinion.
I delivered.
You have not delivered any compelling evidence to convince me to change my mind either.
Go ahead and break up the tables and 6 to 12 months - come back and tell us "good idea" or "bad idea".
I like a single table for, well, what properly belongs in a single table. That is all.
store tables
rs, February 28, 2006 - 2:09 pm UTC
What is pupose of "store tables" in oracle . I opened sqlnavigator I saw in a object/folder . Could you please explain this
March 01, 2006 - 7:56 am UTC
Nope, never heard of "store tables" myself.
store tables
sat, March 01, 2006 - 8:27 am UTC
Below is explanation I got from the sqlnavigator
What is a Store Table?
Oracle stores nested table data out-of-line in a store table, which is a system-generated database table associated with the nested table. Store tables are represented in the Oracle Data Dictionary, and therefore SQL Navigator represents them graphically in the DB Navigator tree.
Store tables are created and dropped as needed by Oracle, and cannot be altered by users.
SQL Navigator is a registered trademark of Quest Software Inc.
Table Design for Parent Child Group
A reader, May 11, 2006 - 6:46 am UTC
I have the business entities like Market and Market Group.
One Market Group can have multiple market. And one market may belong to multiple market group.
Here is the design we have in mind
MARKET_MARKETGROUP (
ID INTEGER,(PK)
TYPE_ID INTEGER NOT NULL, (1 if market , 2 if group)
NAME VARCHAR ( 50 ) NOT NULL,
....Other related fields
)
MARKETGROUP_ELEMENT (
MARKET_GROUP_ID, (FK to market id , those will be the element with type 2)
MARKET_ID (FK to market id , those will the element with type 1)
)
Please let me know your comments on this , if it is not good what will be the best way to implement this.
May 11, 2006 - 7:28 pm UTC
I do not see the two entities market and market_group
I see a potentially really bad idea with "market_marketgroup" but it is hard to say since the level of detail here is very "little"
you would have:
a) a market table
b) a market_group table with a foreign key to market (so a market_group belongs to a market, but a market has 0, 1 or more groups)
Not exactly what I am looking for.
A reader, May 12, 2006 - 6:13 am UTC
Tom,
This is not exactly what I am looking for.
I am looking for a market to be belongs to one or many market group. Like below:
Market
A
B
C
D
MARKET GROUP
G1 (Member market A,B)
G2 (Member market A,B,C)
G3 (Member market B,C)
And the member market of a group is not fixed , id can be added or removed.
May 12, 2006 - 9:33 am UTC
market
a
b
c
d
market group
g1 a
g1 b
g2 a
g2 b
g2 c
g3 b
g3 c
there might be a master table "market_group_parent" with g1, g2, g3 in it - that would make sense - and then this is a MANY TO MANY relationship - a market group has many markets, a market may belong to many market groups.
market( market_id primary key )
market_group_parent( group_id primary key )
market_to_market_group( market_id, group_id, primary key(both of them)
Primary Key
reader, May 15, 2006 - 1:24 am UTC
how many primary key we can able to create on table ?
May 15, 2006 - 6:47 am UTC
one
maximum no of columns per table
Thirumaran, May 15, 2006 - 8:52 am UTC
Hi Tom,
This is a similar question like the one answered above
"maximum no of columns per table".
As you say for performance reason we will have 100 columns in a single table. suppose i have columns which stores Warranty types & details,Contract types & details, Network types & details , Depreciation details .
A) if i have data of different types should i create it as separate entities as it's not Normalized or can i store these data as a single entity (club all columns into one table)
separate entities one table for
1) Depreciation
2) Warranty
3) contract
4) network
B) what are the pro's and cons' w.r.t to performance & Normalization
Thanks
Thirumaran
May 15, 2006 - 9:43 am UTC
a) why would this not be 'normalized'. Single table.
b) I don't think you have normalization defined here.
Problem with Multiple Indexes in a table.
Sujit Mondal, May 15, 2006 - 9:51 am UTC
Tom,
I need your commnets for the follwing situations.
Table T1(Total 30 columns , 5 of them has FK)
Table T2(Total 35 columns , 7 of them has FK)
Table T3(Total 40 columns , 10 of them has FK)
Application need to have a search on those FK column so for performance improvemnet we like to put Btree index on all those FK column.
relation of T1,T2,T3 are one to one. As you have suggested generally for 1-1 relationship we should have one table. But with that we will have more number of Btree index in this table. Can you please explain me how the performance will be impacted in this situation. If the number of btree indexes increases how the performance will degrade?
May 15, 2006 - 10:03 am UTC
So what if you have
3 tables with 1 index each
versus
1 table with 3 indexes
??? Indexes are maintained with the underlying data is modified. If you don't modify the data (or insert entirely null keys into them) the index is not touched.
I don't understand where the problem would be here?
why would this not be 'normalized'. Single table.
thirumaran, May 16, 2006 - 3:07 am UTC
Hi Tom,
your question
a) why would this not be 'normalized'. Single table.
Single Table with columns
===================
Network_code
network_type_desc
Depreciation_code
Depreciation_desc
Depreciation_date
Warranty_code
Warranty_period
warranty_in_month
Warranty_end_date
if i store all these columns in a single table then
example:
network_type_desc partially dependent on network_code
similarly for Depreciation & Warranty.
it violates second normal form because the column network_type_desc depends only on the value of Network_code
is it recommended to store this in single entity ?
Thanks in adv
Thirumaran
May 16, 2006 - 7:04 am UTC
you just changed the entire problem. You have some dimensions there - sure, factor that out.
but this is entirely different from the original problem description of:
...
As you say for performance reason we will have 100 columns in a single table.
suppose i have columns which stores Warranty types & details,Contract types &
details, Network types & details , Depreciation details .
........
It sure sounded to me like you wanted to have separate tables to store the "code" (or type) and the details.
The *_desc type stuff - sure, look up tables.
The *_details stuff - that is specific to a record in your single table.
Unselected column
Yoav, May 16, 2006 - 10:45 am UTC
Hi Tom
I have table with 400 column.
There are tens of repors that running against this table.
1. Is there a way to know if there are some columnes that
never been selected?
2. Is there a way to know what was the last time each column had been selected ?
Thank You
May 16, 2006 - 10:58 am UTC
1) only if you enable auditing (fine grained auditing in this case to capture the sql's executed against this table)
2) see #1
Column Order
Sanji, June 06, 2006 - 2:51 pm UTC
June 06, 2006 - 3:25 pm UTC
yes, it would seem the author (Steve Adams) provides information in a fashion similar to me.
With evidence that what he says is true.
Do you see something contradictory here?
Column Order
Sanji, June 06, 2006 - 3:31 pm UTC
Nothing contradictory, but i was wondering about the fact that the rows are not stored in an order in the table. To retrieve in a particular sequence, we use "ORDER BY".
Then how and why would the column order influence performance ?
If there is an idex with a healthy clustering factor, then yes, it's understood that there would be a performance boost for a query using that index.
I am slightly confused with the column ordering phenomenon.
Thanks
Sanji
June 06, 2006 - 3:34 pm UTC
did you read the article? It said why the order of the columns could have an impact on performance.
Order of columns is very very very different from row ordering.
Steve gave two clear examples there?
Column order
A reader, June 06, 2006 - 3:40 pm UTC
Correct. I was confusing row and column orders. Makes lot of sense now.
Thanks
Sanji
Designing generic tables and ad hoc query tables
Senthil Kumar, June 12, 2006 - 2:21 pm UTC
Tom,
We have an application that has tables defined with lot of "generic" column names and lot of indicators which act as a procedural logic for the application code. When I read your book on effective design on oracle, you had mentioned it's not good to go for "generic" design since there is no such thing as generic design which addresses all kinds of requirements or scenarios.
Coming back to the tables we have, most of the tables have columns called "Key columns" which start with key1_name, key2_name...keyn_name. And these columns are supposed to be part of the non-aggregate columns like dept_no,dept_name. And the way to know how many key columns are involved is to look at another column key_level and based on the value on these key columns, we have to look up another table for the actual data value. Like key1_name = 'Dept_no' and we go to table "Dept" which is also decoded from another column and select Dept_no.
And we also have some tables to store the actual table details like column_name,data type etc for using with ad hoc report functionality where the user selects tables from GUI front end and the report module dynamically looks at each aggregate and non-aggregate columns in the "definition table" and performs the summation
Needless to say this data model design has been so complex for most of our programmers and most of the time we have very poor performance on the queries.
My question to you is
1. What is your opinion on this design ?
2. For creating ad hoc report design do you suggest any books or ideas ?
Thanks a lot for your time.
samples from one of our table
PRESN_ID NUMBER(4)
WEBID VARCHAR2(12)
EXEC_PRESN_SEQ_NUM NUMBER(2)
FILE_SEQ_NUM_IND CHAR(1)
PRESN_DATA_TBL VARCHAR2(30)
DATA_KEY_LVL NUMBER(1)
KEY1_DDL_NAME VARCHAR2(30)
MAIN_KEY1_IND CHAR(1)
KEY1_HEADER VARCHAR2(30)
KEY1_HD_LINK_IND CHAR(1)
KEY1_HD_LINK_NUM NUMBER(2)
KEY1_HD_PARM_IND CHAR(1)
KEY1_DATA_LINK_IND CHAR(1)
KEY1_DATA_LINK_NUM NUMBER(2)
KEY1_PARM_IND CHAR(1)
KEY1_DESC_IND CHAR(1)
KEY1_MOUSE_OVER_NUM NUMBER(2)
KEY2_DDL_NAME VARCHAR2(30)
KEY2_HEADER VARCHAR2(30)
KEY2_HD_LINK_IND CHAR(1)
KEY2_HD_LINK_NUM NUMBER(2)
KEY2_HD_PARM_IND CHAR(1)
KEY2_DATA_LINK_IND CHAR(1)
KEY2_DATA_LINK_NUM NUMBER(2)
KEY2_PARM_IND CHAR(1)
KEY2_DESC_IND CHAR(1)
KEY2_MOUSE_OVER_NUM NUMBER(2)
KEY3_DDL_NAME VARCHAR2(30)
KEY3_HEADER VARCHAR2(30)
KEY3_HD_LINK_IND CHAR(1)
KEY3_HD_LINK_NUM NUMBER(2)
KEY3_HD_PARM_IND CHAR(1)
KEY3_DATA_LINK_IND CHAR(1)
KEY3_DATA_LINK_NUM NUMBER(2)
KEY3_PARM_IND CHAR(1)
KEY3_DESC_IND CHAR(1)
KEY3_MOUSE_OVER_NUM NUMBER(2)
June 13, 2006 - 10:44 am UTC
1) you already have that in writing in Effective Oracle by Design. My opinion is pretty "clear" there. The chapter subheading is something like "DON'T DO THIS"
2) for ad-hoc recports, no book - but a tool. Our offering in this area is "Discoverer" - there are many others out there as well. Not sure what a book on "ad hoc" reporting would look like..
Suggestions for trigger design
Ajums TT, June 19, 2006 - 10:14 am UTC
Hi Tom,
I wish to design an audit-trail trigger for my table.
The structure of the table is
Code varchar2(3)
Seq varchar2(3)
Amount number(8,2)
LeaseNo varchar2(6)
Last_Update_Date date
Last_Updated_by varchar2(9)
There are no primary keys defined against the table. However the identifier is Code + Seq + LeaseNo
The interesting thing about this table is that it does not have any updates fired against it.
The maintenance application using this table behaves in the following way.
1> Deletes all the records associated with the LeaseNo.
2> Inserts all the records associated with the LeaseNo.
We are using Oracle 8.1.7 for our application.
What would be the easiest way of doing this?
Regards,
Ajums TT
Please review this design,
SNagaraj, June 21, 2006 - 12:15 pm UTC
design table
I have been thinking of a solution to make the ETL transactions fast and reduce the
consumption of database resource.
The situation is:
There are two tables. BASE TABLE (called as T1) and INC TABLE (called as T2)
T1 contains the following columns
order_num
order_desc
order_date
.
.
many other columns.
Table T2 is exactly similar in structure as T1.
The table T1 contains information of orders and it has more than 300 million rows in it. All of them
are unique. So there is no way of partitioning the table based on Order number.
The table T2 contains information of orders of a particular day and has around 2-3 million rows in it.
The table T1 can have multiple indexes since lot of reports run against them and indexes are necessary.
The orders in T2 can have old orders and new orders. About 20-30 percent of orders in table T2
may already exist in table T1.
The goal is to replace the orders in T1 by the orders in T2 and insert the remaing new orders in T1.
This is being achieved by delete and insert operations (merge is also the other option too)
Whatever the option we try, I think performance sucks. As we have seen one of the 10 hour job
has delete run for 9 hours and insert takes 1 hour. Delete is the big culprit.
I have a solution in a different way but I want to know your opinion.
Create an additional column called seq_no and have oracle sequence generate the value to it.
so now my T1 looks like
seq_num
order_num
order_desc
.
.
Say today there are 300 million rows in T1 (all the order numbers are unique so far) and table T2
has 2 million (200,000 order numbers already exist in T1).
Insert all the 2 million rows to T1 (no delete here). This will probably take 1 hour and we are
saving 9 hours of time by avoiding delete statement.
Next day, again insert all the rows from T2 to T1.
Following day, do the same thing.
Say after 30 days, I have 200K times 30 = 6 million order numbers that are duplicate.
Now create a view on T1 by having a select statement to select the most recent order based on seq_num.
Here analytical functions probably is a good choice.
This view will be the source object for all the reports/jobs that access T1.
So my table name should be changed to T1_tab and my view name should be T1. How would the select
statement in my view perform?
Once in 30 days, perform the delete operation on T1_tab (during a down time) by keeping
the order numbers for the most recent seq_num.
Do you think this makes sense? The goal here is to avoid doing deletes everyday which consumes
90% of the DML time.
What other recommendations do you think you can give?
I don't have a benchmark test to achieve this but I guess this design probably work.
Thanks a lot,
In Response to Designing generic tables and ad hoc query tables
Senthil Kumar, June 21, 2006 - 7:05 pm UTC
generic design
an avid reader, June 25, 2006 - 2:44 am UTC
Tom,
We have a requirement to extract data from various data sources(data warehouses) and create a report for the client. My team also wants to store the extracted data for future purposes. We have to create an extraction job without any user interface to do the job. We were also told to create the data model in such a way that the extraction program should be generic enough so that it can run for another client without any code change.
This leaves us only one option which is to create some kind of meta data data model where we have to store the different data source names, the corresponding table names and the different report fields to be extracted. Also since the data warehouses have normalized tables if I have to store the data from those tables I need to do the same which means I need a table that defines not only the fields but also the relationships between them
I know you are against the so called "generic" design, how do we approach this and how do you think we should create the data model ?
a detailed reply please...
June 25, 2006 - 11:39 am UTC
...
We were also told to create the data model in such a
way that the extraction program should be generic enough so that it can run for
another client without any code change.
......
hehehe, good luck with that. While you can do that, no one is going to be happy. Not you, not your management, and particularly not your end users.
You already know my opinion on the matter. You have all of the details I'm willing to give. 'Good luck' is about the only thing I'll add.
This'll be one of my items:
</code>
http://asktom.oracle.com/Misc/what-did-i-decide-on.html <code>
Just to give you a hint of things I might say about it:
o make sure to only use varchar2(4000) for everything. It is after all the most flexible.
o better yet, have a table that only has number columns - 1000 of them. These numbers will be surrogate keys that point to a table with a number (primary key) and a varchar2(4000). We all know that full scanning a wide table is bad, so the numbers will make it skinny to full scan (and a couple dozen joins to pick up the values should be super fast since we are using numbers - they join quick)
o better better yet, a table with an "instance_id, attribute_id" and a number stored in a column called "value". No longer need we be limited by the pesky 1,000 column limitation!! It'll be very cool. You know you only need three tables:
create table object ( instance_id, attribute_id, value );
create table object_links (instance_id1, instance_id2 );
create table attributes ( attribute_id, attribute_name, attribute_datatype );
yeah, that'll do it. Better yet, be sure to "normalize" the value out of object - adding a fourth table.
Never create another table ever again!
not just "one option" ...
Gabe, June 25, 2006 - 6:11 pm UTC
<quote>
We were also told to create the data model in such a way that the extraction program should be generic enough so that it can run for another client without any code change.
</quote>
Nothing wrong with this requirement Â… though it sounds like a requirement for the metadata repository of an ETL tool. ETL tools take time to build Â… buying one is an option too.
June 25, 2006 - 6:50 pm UTC
Depends on the application, the source data, the needs of the client...
It all depends.
And the model would not necessarily be "generic" in that sense.
confused
Reader, June 25, 2006 - 10:20 pm UTC
Tom,
We are not trying to create a one generic model that supports every client but just to have some Meta data tables which can be used to form the query and support our data extraction.(probably a miniature ETL), the purpose of this Meta data requirement is because there is a distinct posibility of similar client needing similar data in future
so aside from the sarcasm if I read your reply correctly are you saying we have to create "similar" tables like the source tables every time we need details on such tables?
We were thinking of creating the Meta data tables only for query formation but for the data storage we will create the tables exactly as the source tables. So everytime there is a new table and new fields we insert rows in to the meta data tables which are in normalized form. And create the data storage table(s) for the new fields.
June 26, 2006 - 7:29 am UTC
The sarcasm came because you said "I know you don't like generic, but tell me how to do it anyway"???
I don't know what you need - the details here are few and far far far between. Designing a system takes a bit more information.
please help
reader, June 26, 2006 - 12:57 pm UTC
Ok Tom, our requirement is to extract monthly data from oracle and non-oracle warehouse systems
which then will be used to create flat file. The extract is filtered on the basis of
certain accounts which are specific to the client. (say client_id = 10 or account in
(:number of accounts that identifies the client records)).
There are different types of data we need like for example Sales, Finance, Billing etc.
The warehouse systems has normalized tables for each of these data types.
Sales will have say 3 tables: Sales_Account, Sales_subAccount and Sales_details which are
in the order of parent, child and grand child. (There will be other tables which may be
related only to the parent table and doesn't need to be in this chain relationship)
In other words, we pull data from 2 or more data warehouse systems locally and create
a flat file out of that information. We also would like to store the data extracted locally
in our tables for an year.
We can design the model exactly as the data warehouse system tables and load the filtered
data and create the report. My challenge is when my architect says build the data model in
such a way we can re-use the model for future client without major code changes. That way
we don't need to reinvent the wheel again for the future client and spend less money.
Right now only this client needs this kind of report for their internal processing.
Since the data is alread there on the data warehouse systems for all the clients, there might be
similar request for the other "like" clients who more or less has similar fields(99%) in our data
warehouse system.
Source tables for one data type:
TABLES FIELDS REQUIRED
---------------------------------------
ACCESS_BILL - CUSTOMER_CD
ACCESS_BILL - STATE_CD
ACCESS_BILL - DATA_MONTH_DT
ACCESS_BILL - MONTHLY_CHARGES_TOTAL_AMT
ACCESS_BILL - MONTHLY_CHARGES_INTEREST_AMT
ACCESS_BILL - USAGE_CHARGES_TOTAL_AMT
ACCESS_BILL - TAXES_AMT
ACCESS_NAME_ADDR - CUSTOMER_NAME
ACCESS_NAME_ADDR - BILL_NAME1_ADDR
ACCESS_NAME_ADDR - BILL_NAME2_ADDR
ACCESS_NAME_ADDR - BILL_NAME3_ADDR
ACCESS_NAME_ADDR - BILL_NAME4_ADDR
ACCESS_ACCOUNT - STATE
ACCESS_ACCOUNT - CUSTOMER_CD
ACCESS_ACCOUNT - TYPE_OF_SERVICE
ACCESS_ACCOUNT - SPECIAL_ACCOUNT_IND
ACCESS_ACCOUNT_CKT - CKT_ID
ACCESS_ACCOUNT_CKT - CHANNEL_CD
ACCESS_ACCOUNT_CKT - ACTIVITY_DT
ACCESS_ACCOUNT_CKT - SERVICE_ESTABLISH_DT
ACCESS_ACCOUNT_CKT_DTL - CKT_DTL_ID
ACCESS_ACCOUNT_CKT_DTL - PRODUCT_NM
ACCESS_ACCOUNT_CKT_DTL - PRODUCT_QTY
ACCESS_ACCOUNT_CKT_DTL - INITIAL_RT
ACCESS_ACCOUNT_CKT_DTL - UNIT_RT
ACCESS_ACCOUNT_CKT_DTL - REVENUE_AMT
On the tables above, ACCESS_BILL AND ACCESS_NAME_ADDR is related to ACCESS_ACCOUNT with STATE and CUSTOMER_CD.
ACCESS_ACCOUNT_CKT AND ACCESS_ACCOUNT_CKT_DTL are child and grand-child tables for ACCESS_ACCOUNT respectively.
I hope this much detail is enough. Please let me know.
Thanks.
June 26, 2006 - 2:32 pm UTC
I'm not sure why you need a "model" to create a flat file? You need a "query", but an entire data model to simply hold what you already have? I must be missing something...
More details
reader, June 30, 2006 - 6:47 pm UTC
We need this data to be stored in our team since we use this extracted information for creating customized reports at many intervals like weekly or monthly or if the client needs it as an ad hoc. Also this stored data will be used for future expansion like auditing controls on the data.
Also some of the data are interlinked between 2 warehouse systems so it would be better we bring down the data to our tables and we process over that.
July 01, 2006 - 7:48 am UTC
same comment, I don't see why you need a "model" yet, just a query.
your comments please....
snagaraj, July 02, 2006 - 9:04 am UTC
For the design question I posted on June 21st (userid SNagaraj), I appreciate if you can share your comments.
Thanks a lot,
Reply to : SNagaraj from austin,tx
Charlie Zhu, August 10, 2006 - 7:50 pm UTC
I went throught all the post, only this one is not replied.
So here are my suggested options
1) partiton T1(history) table with date (daily etc.)
2) For all TX on orders, run INSERT on T2 and UPDATE on T1.
-- so remove the need to update T1 from T2.
3) Did you benchmarke MERGE performance?
(Delete + Insert will generate more redo logs when you have many indexes, and merge only update columns with few indexes on)
table design
Lakshmi, September 13, 2006 - 8:35 am UTC
Hi Tom
Can you please suggest some links /resources to learn about "how to do table design"?
Thanks
table and schema design
Joe, October 23, 2006 - 5:53 am UTC
Hi Tom -
I've been following this thread for a while now and was looking for an additional opinion relating to the very first question asked in the thread.
I'm designing a database for web users to access and store various types of content exclusively in Oracle database via an ASP .Net front-end. Right now, each individual user will have his/her own schema (5 tables, 5 sequences, 10 views per schema) and potentially I will have 5,000+ schemas (one per registered user, each user will have their own personal content to store in their 5 tables...) and can store up to 5GB of data. If a schema owner wishes to grant access to their personal content to another user, it will be done by granting select access to a view created on the table containing the content.
I've never worked with a database that has more than 200 schemas - and was wondering if you could point out any potential performance problems associated with having 5,000+ schemas in a database (using 15,000 tables ..) or if you have any experience to share about another way to handle this many schemas in a database. Or if you've seen databases with 5,000+ schemas before and haven't noticed any negative side-effects related to that many schemas.
October 23, 2006 - 10:14 am UTC
are these tables all the same size and shape????
I've databases with 10's of thousands of schemas - however, only a few contain database objects (applications have tables, users run applications)
yes, all the same
A reader, October 23, 2006 - 1:49 pm UTC
Thanks Tom. Yes, all the tables (4 total) will be basically the same structure:
CREATE TABLE <schema_name_here>.image(
imageID NUMBER PRIMARY KEY,
image ORDSYS.ORDImage,
thumb ORDSYS.ORDImage,
folderName VARCHAR2(40),
imageDesc VARCHAR2(2000),
imageComments VARCHAR2(2000),
upload_date_time TIMESTAMP,
imageTitle VARCHAR2(128),
imageArtist VARCHAR2(128),
imagePublisher VARCHAR2(128),
imagePrice NUMBER(6,2),
imageFilePath VARCHAR2(128),
imageThumbPath VARCHAR2(128)
LOB (image.source.localData)
STORE AS(TABLESPACE IMAGE_BLOB STORAGE (INITIAL 100M NEXT 100M) CHUNK 24K NOCACHE LOGGING);
And basically the same structure for audio, video, and content (replacing "image" in column names with audio, video... and with ORDaudio, ORDvideo, and ORDdoc where appropriate).
Some users may only be uploading images (and have nothing or only one or two items in video/audio/content); some may have everything in video, some may have a bit in each. But each user can have up to 5GB of storage space, which could be 1 row containing a 5GB video file, or 5,000,000 rows of 1K files.
[Lightbulb goes on here, ding]:
Since most of the actual data will be stored out-of-line as LOB in another tablespace, all that will be in the table will be a pointer, so the table size won't actually grow to 5GB per user. Doh! It seems clear to me now that having 1 schema might be the better choice (or am I still mising something). I'm also thinking about just using ORDDoc instead and storing everything, regardless of what it is, in the same table. So 1 table, 1 schema for 3000+ users accessing potentially over 100TB of data and millions of rows from 1 table. Is that crazy talk? Or have I just been staring at this too long?
Unless you think having 3000+ schemas and 15,000 tables might provide a better solution?
Thank again Tom for your valuable advice.
October 23, 2006 - 1:56 pm UTC
one schema - definitely just ONE SCHEMA.
you can use features like partitioning to "divide and conquer" the largeness of the table, but you really do not want to have a set of tables per user (you would end up with a shared pool the size of alaska)
3,000 users, 5gb of data each, I see 14/15 tb max, where do you get 100tb?
but one schema, THE APPLICATION, hash partition it if there is nothing reasonable to partition by (you'll have an OWNER column using this scheme, maybe you can hash on that so all of my data ends up in a partition)
many thanks
A reader, October 23, 2006 - 2:22 pm UTC
As always Tom, your wisdom and advice is very much appreciated.
100TB? wishful thinking I guess :-) Hopefully the + in 3000+ will be pushing us into 100TB territory, some day.
Enjoy your week at the Open World Conference. And thanks again
strong and weak entities
A Reader, November 14, 2006 - 10:01 pm UTC
Hi Tom,
you mentioned parent/child relations between these tables:
select *
from p, c1, c2, c3, c4
where p.key = c1.fkey
and p.key = c2.fkey
and p.key = c3.fkey
and p.key = c4.fkey
Assuming there is 1:1 optional association between P table and each of C tables, how can we write a query to report which particular C table and particular fkey in it (if any) are not present in P table?
e.g. if c2 contains fkey FK2 and c4 contains fkey FK4 and P contains neither FK2 nor FK4 as primary key, then report should look like:
"FK2 in c2 not present in P
FK4 in c4 not present in P"
November 15, 2006 - 7:03 am UTC
select p.key,
decode(c1.fkey,null,'not there', 'there'),
decode(c2.fkey,null,'not there', 'there'),
....
from p, c1, c2, c3, c4
where p.key = c1.fkey(+)
and p.key = c2.fkey(+)
.....
For some reason
A Reader, November 15, 2006 - 12:21 pm UTC
this query takes forever to complete (in my case pkey and fkey are composites of one VARCHAR2 and one NUMBER, not sure that is the reason though). The fkey is superset of pkey, i.e. fkey includes pkey but also have two additional key parts
select *
from p, c1, c2, c3, c4, c5
where p.key = c1.fkey (+)
and p.key = c2.fkey (+)
and p.key = c3.fkey (+)
and p.key = c4.fkey (+)
and p.key = c5.fkey (+)
Is there an alternative solution which is faster? e.g. something like UNION of (c1 MINUS p), (c2 MINUS p), etc.?
thanks!!
November 16, 2006 - 3:59 am UTC
define forever and give some context here, tables are of what size and scale - and are your expectations even in the realm of "reasonable" given the size of the data.
I would expect 5 full scans and 4 hash outer joins - if you do not see that, something is wrong.
Query
A Reader, November 15, 2006 - 2:53 pm UTC
Besides, your query is wrong, it produces report even when the key is found, but I asked if possible to produce the query ONLY for records that are missing:
FK2 in C2 missing in P
FK4 in C4 missing in P
and no other rows besides these two
select p.key,
decode(c1.fkey,null,'not there', 'there'),
decode(c2.fkey,null,'not there', 'there'),
....
from p, c1, c2, c3, c4
where p.key = c1.fkey(+)
and p.key = c2.fkey(+)
.....
November 16, 2006 - 8:19 am UTC
and you cannot figure out the obvious solution to that?????????
hmm, ok, hint: add a where clause
and (c1.fkey is null or c2.fkey is null or ..... )
seems pretty straightforward, no?
and besides, the query is only "wrong" after you add this requirement which was actually not specified in the original "asking"
you missed the point
A reader, November 16, 2006 - 10:35 pm UTC
FK2 in C2 missing in P
FK4 in C4 missing in P
what does that imply? your solution gives:
......PK2 in P missing in C2 ...
...................................PK4 in P missing in C4 ...
November 17, 2006 - 3:21 am UTC
you missed the point, you are getting what you asked for, if you want to print it down the page - go for it, you are talking about a report, just do it.
I got you the data, now you print it pretty.
well, I would
A Reader, November 17, 2006 - 1:41 pm UTC
but I can't. I got wrong data from the query. Any report I make will be wrong because the data are wrong. Now I could re-organize rows, columns etc. but essential information is missing. If a column value is missing from table P is totally different fact then the fact that a column value is missing from table C2 or C4 or both at the same time. I could reverse the query and outer join with P five times instead but that is not allowed by syntax rules
November 20, 2006 - 2:24 am UTC
umm, you really need to be a tad more clear then. I answered precisely what you asked for.
...
Assuming there is 1:1 optional association between P table and each of C tables,
how can we write a query to report which particular C table and particular fkey
in it (if any) are not present in P table?....
that is exactly what my query shows.
ops$tkyte%ORA10GR2> select * from c1;
FKEY
----------
1
2
ops$tkyte%ORA10GR2> select * from c2;
FKEY
----------
2
3
ops$tkyte%ORA10GR2> select p.key,
2 decode(c1.fkey,null,'not there', 'there'),
3 decode(c2.fkey,null,'not there', 'there')
4 from p, c1, c2
5 where p.key = c1.fkey(+)
6 and p.key = c2.fkey(+)
7 /
KEY DECODE(C1 DECODE(C2
---------- --------- ---------
2 there there
3 not there there
1 there not there
Now, what is "wrong" - and be precise, pretend you are explaining the problem to your mom, be that detailed.
which ways is better to design this table
Yuna, November 30, 2006 - 2:07 pm UTC
Hi Tom,
Parent A have one to many relationship with B;
1) method 1 to create a table T1 as following:
create table T1(
index number(9) not null primary key,
aName varchar(128) not null,
level number(1) not null,
parameter number(9))
Both A and B will be in this table.
if A, level will be 0 and the parameter will be null;
if B, level will be 1 and the parameter will be the index of A.
2) method 2 to create a table T2 as following:
create table TA(
index number(9) not null primary key,
aName varchar(128) not null));
create table TB(
index number(9) not null primary key,
BName varchar(128) not null));
create table TAB(
index number(9) not null primary key,
A_ID number(9) not null));
B_ID number(9) not null));
A and B will be in different tables. and there are another third table to represent their ralationship.
From performance's point of view, I think method is good. but my colleagues don't like method 1 because they think it is too complicated to code.
Can you give comment about this two methods?
Thanks in advanced.
November 30, 2006 - 3:34 pm UTC
there is only one reasonable way to even consider doing this.
create table a
( key datatype PRIMARY KEY,
....
)
create table b
( key datatype PRIMARY KEY,
fkey references A,
.....
)
neither of your approaches would be used. This is fairly basic relational design - I'm not sure why you would even consider the above two approaches?
which ways is better to design this table (2)
Yuna, November 30, 2006 - 5:08 pm UTC
Sorry, I think I did not state the question very well. A is not a really parent to B.
The relation ship between A and B is like medication to Dosage.A( medication) have 1,2,3,4 specific dosage (B).
and we have to get medication and its corresponding dosage to be connected. the simple relationship table can't do that.
many thanks in advance!
November 30, 2006 - 5:55 pm UTC
umm, explain why not.
Medication ( medication_id primary key, ..... );
Dosage( medication_id references medication, dosage_id,
primary key(medication_id,dosage_id) );
A really is a parent to B.
which ways is better to design this table (3)
Yuna, December 01, 2006 - 10:11 am UTC
Hi Tom,
Thank you very much for the clarification. I'm not quite confident with all the terms myself.
In your last followup table b using dosage_id and there would be another column dosage name inside. Since lots of medication have the same dosage, so if we keep a seperate table for dosage and have a third relationship table make more sense?
can you give a little bit comment about Mothod (1)?
Still many thanks!
Yuna
December 01, 2006 - 10:17 am UTC
Now, if you are telling me:
there is an entity "medication" (let's call it med)
there is an entity "dosage"
then, you have a MANY TO MANY relationship between these two entities and are in need of an association object:
med_to_dosage ( med_key references med, dosage_key references dosage, primary key(med_key,dosage_key) );
one column or more column?
Yuna, December 01, 2006 - 11:16 am UTC
Hi Tom,
Thank you very much for your wisdom.
There is an entity T have about 200 attributes (a,b,c,d...). There are two way to design the table:
a)
create table T(
aID primary key;
a type;
b type;
..
..
column200... type)
b)create table T(
aID primary key;
columnname varchar2(50);
columnvalue number(12))
which way is better? since method b will have about 200 records for each insert, whether it will cause performace issue in the future?
Many thanks!
)
December 01, 2006 - 12:44 pm UTC
I have no idea what you mean here.
if you have two hundred attributes, you have two hundred attributes (columns).
but - you don't really share your actual problem (same problem we just had above here, it was like pulling teeth to figure out what you were really doing)
Which approach is better - IT DEPENDS ENTIRELY ON THE REAL PROBLEM.
one column or more column(2)
Yuna, December 01, 2006 - 1:53 pm UTC
Sorry, I will restate the problem:
A patient has 200 atriibute : height, weight, blood pressure......about 200 attributes.
One way to design the table as
a)
create table patientA(
id_patientA number(9) primary key not null,
patient_id number (9) not null reference tableC,
,
height number(3),
weight double,
....
bloodpressure number(3))
the records in patientA about patient 100 would be:
(1, 100, 108,163.......88)
b)
create table patientB(
patientB_id number(9) primary key not null,
patient_id number (9) not null reference tableC,
atrributename varchar2(128),
atrributevalue varchar2(128))
the records in patientB about patient 100 would be:
(1, 100, weight, 108)
(2, 100, height, 163)
......
(200, 100, bloodpressure 88)
One of my colleague think that be is better in normalizing table design and easy for code. I would concern about future performance since the records number in patientB would increase fast.
Please give comment.
Thank you very much!
December 01, 2006 - 9:43 pm UTC
there is but one way to design this table.....
a)
if your colleague believes that "in order to normalize, you need to do this funky attribute value pair JUNK", they need to re-read the books.
Normalization does not mean "more than one table"
you have an entity (patient)
it has well defined attributes (height, weight, etc)
you are only properly served by using a)
one column or more column(3)
Yuna, December 04, 2006 - 10:52 am UTC
Thank you very much for your precious input!
one column or more columns(4)
A reader, December 04, 2006 - 12:17 pm UTC
Hi Tom,
If we have a web form about a patient. The atrributes ( about 100) of the webform might change as requested by the user. Whether B can be used for those kind of webform design?
Thank you very much!
Yuna
Re: SNagaraj - online T2 and history T1
Charlie Zhu, December 04, 2006 - 2:21 pm UTC
Please ignore point (2) of my previous reply.
Here is an update one, make a rolling partition on T2:
1) partiton T1(history) table by date range ( such as daily)
2) partiton T2(OLTP) table by date range too
3) schedule a job after mid-night, move T2 data to T1.
3.1)
TRUNCATE last day partition on T1,
and copy last day partition data on T2 to T1
(INSERT into T1 SELECT ... from T2 PARTITION (p_last_day);
you may use +append hint with direct load INSERT options)
3.2) DROP last day partition on T2, create tomorow day partition on T2.
Re: A reader - one column or more columns(4)
Charlie Zhu, December 04, 2006 - 2:58 pm UTC
If your business user freqently asked for changing the logical data model, you have issues with you data modeler doing the data modeling job.
The requirement is not complete, or business pattern is not mature.
The outcome document for model should include
what the new designed data model can do and can not do in 1 or 5 years,
and what changes should be done for the next 5 years with the new business model.
A complete, nonRedundancy, Enforce business rule,Data Reusable, flexible and stable data, Elegant model.
Re: Yuna - changing attributes
Charlie Zhu, December 04, 2006 - 3:07 pm UTC
You may employ XMLType data type for the changing attributes, group them into one column.
But the performace and Data Reusablility is bad,
we're suffering it for our book attributes, such as hardcover, Dust Jacket, publish year ...
changing attributes
Yuna, December 05, 2006 - 11:13 am UTC
Thank you very much Charlie! I do appreciate that!
I am trying to persuade my colleagues about the shortage of using method (b) in designing tables with changing attributes, they don't buy my points. Is there any way that can satisfy both the performance and the flexibility?
Thanks a lots!
Yuna
Basic Table Design
Maverick, December 05, 2006 - 2:01 pm UTC
Our project is currently in physical design phase and we are working on some table structures. Have couple of questions related to that
1) We have a table with 60-70 columns , most of them are flags [Y or N on front end],
Creating those columns as Integers and while reporting changing back to 'Y' /'N' is better or keep then Varchar2(1) with 'Y' or 'N' as only allowed values?
2) Some of those columns have multiple option
eg: Insurance is a column and user can select Dental/Health/Vision or all three.
Which is better option and why?
creating seperate columns like insurance_dental,insurance medical,insurance_vision
or
Creating Nested tables for insurance column?
Appreciate, any insights you can give us on this
Thank you
December 06, 2006 - 9:25 am UTC
1) no, it would not be better. use the right type to store the data you want to store.
integers are numbers, numbers are varying width fields in Oracle.
2) so it sounds like you have
a) dental
b) health
c) vision
d) all of the above
and a single attribute doesn't it?
no nested tables, don't use them to persist data - use a proper child table in a master/detail relationship when you need them.
Basic Table Design
A reader, December 06, 2006 - 11:37 am UTC
Thanks Tom. I'll follow that. But Can you give me reasons why it's a bad idea..on both issues?
December 07, 2006 - 8:44 am UTC
1) you always use the right type to store your data, period. why would you even consider obscuring something in a number and decoding it later????
tell you what - you tell us why you think it would be sensible, good - and we'll tell you why "it just isn't so"
2) nested tables are syntactic sugar. If you:
create type mytable as table of some_type;
create table t
( x int primary key,
y mytable
)
nested table y store as y_tab;
you'll get:
create table t ( x int primary key, HIDDEN_COLUMN raw(16) UNIQUE );
create table y_tab ( HIDDEN_COLUMN raw(16), <attributes of my table> );
then you have to remember to index HIDDEN_COLUMN in y_tab yourself.
so, a pair of tables with an extra unique index and a 16 byte raw
when what you should have is:
create table t ( x int primary key );
create table y_tab( x references t, <attributes of y> );
create index y_tab_idx on y(x);
how to design table for sustainable system
A reader, December 07, 2006 - 1:49 pm UTC
Hi Tom,
Our development group want to design sustainable system, which means that they hope the user can maintain the system by themselves: The system is a three tires web application. The user can control what will be displayed in the interface by adding, deleting or modifying the attributes of the entity to the database.
Based on this idea, lots of the tables will be designed as
create table t(
index number,
attributeName, varchar(128),
attributeValue, varchar(128)).
This is a great idea, but I think there are some disadvantages too:
1) too much rows for one insert;
2) too much data converting in coding;
3) hard to use oracle database's feature such as partition and index for the values.
Hi Tom, can you give some insight for this?
Thank you very much!
Row Level Trigger with Bulk Insert
Arindam Mukherjee, December 29, 2006 - 12:37 am UTC
Sir,
As I am now at client¿s office and going through a technical meeting, I could not access my database. I have two questions. Please respond to with your easy example. Database is Oracle 9i R2.
Question 1: is it possible to put into practice ROW-LEVEL trigger for BULK insert like FOR ALL. I am in doubt how to handle :NEW and :OLD for bulk insert.
Question 2: How does ROW-LEVEL trigger work when data is loaded through SQL Loader?
Regards,
December 29, 2006 - 9:40 am UTC
gee, I hope you don't expect that I just sort of sit here to answer 24x7.
q1: of course, triggers fire independent of the type of sql statement firing them. for a forall i in 1 .. 100 - you would have 100 fires of the before, before for each row, after for each row, after triggers of course.
q2: sqlldr (non-direct path) just uses inserts like the rest of the world. Nothing special about it.
a direct path load however will not fire triggers from sqlldr.
Sterling Explanation
Arindam Mukherjee, December 30, 2006 - 12:06 am UTC
Imperious!! Please accept my heartfelt thanks and regards for your impeccable response.
Designing the database
Khater Ali, January 02, 2007 - 10:57 am UTC
How Tom,
One of our Customer wants to create a database.Our Sun Admin asked how much space need for creating and managing the Database.So that he is allocating space.After it goes to production that is a static one..How can we measure it.he wants the full size.(Once allocated he cant be able to extend)
January 04, 2007 - 9:44 am UTC
you need to size your data set I would suppose.
And that is something only YOU are capable of doing.
Table design
Muthu, January 03, 2007 - 11:59 am UTC
Tom,
We have a table designed as below :
Entity_type NUMBER(2) Valid Values :
0-ACCESS
1-CKT_TO_LOCATION
2-PORT
Entity_id NUMBER(11) Primary key of Entity based on Entity_type
Entity_type ==> is the table listed with enumerated values
Entity_id ==> will be used a single field to store a table (entity's primary key).
Appreciate your comment on this design. What are the drawbacks from database perspective ?
I believe that it would be effecient and right way to list all the foreign key columns to the table.. rather than storing in one single generic field (which could endup with storing invalid data due to no forign keys.
January 05, 2007 - 8:24 am UTC
you show me a single table.
and don't really say anything about it.
How can one comment on it?
I don't see how your last paragraph relates to anything in the prior ones?
do a view needs disk space
A reader, January 04, 2007 - 10:16 am UTC
Hi Tom,
As far as a view represents only a select on an existing tables, I would like to know if a view (create a view) needs or occupy a disk space.
Thanks for your always precious answers
January 05, 2007 - 9:05 am UTC
the view definition resides in the system tablespace as rows in various tables.
the data "behind" the view resides in the source tables, it is not copied out and stored again.
There are "materialized" views where this is not true, with a materialized view - as the name implies - the view data is physically materialized and stored on disk.
So, normal view -> no storage, it is just a stored query.
Materialized view -> uses storage, it physically stores the data copied from the base tables.
Table Design
muthu, January 05, 2007 - 7:55 pm UTC
Tom, sorry for not providing with more details.
<<
Entity_type NUMBER(2) Valid Values :
0-ACCESS
1-CKT_TO_LOCATION
2-PORT
Entity_id NUMBER(11) Primary key of Entity based on Entity_type
Entity_type ==> is the table listed with enumerated values
Entity_id ==> will be used a single field to store a table (entity's primary key).
>>
Here is how the table looks like.
SQL> desc ckt_loc_details
Name Null? Type
----------------------------------------- -------- ----------------------------
CKT_LOC_DETAILS_ID NOT NULL NUMBER(11)
ENTITY_TYPE NOT NULL NUMBER(2)
ENTITY_ID NOT NULL NUMBER(11)
NETWORK_INTERFACE NUMBER(2)
NETWORK_INTERFACE_JACK NUMBER(2)
entity_type ==> refers the various tables with valid values
entity_id ==> stores the primary key of those tables.
I want to know is, Is it the right way to design (it is more generic).. please advise.
SQL>
January 06, 2007 - 10:57 am UTC
that is about the same amount of information you already provided - isn't it.
Ok, so we have a single table.
no clue what you do with it or why.
Muthu
Muthu, January 06, 2007 - 12:57 pm UTC
<< Ok, so we have a single table.
no clue what you do with it or why.
Sorry, i will try to provide additional info as much as i can.
We use this table to store relationships across three tables mentioned above. In other words, we create a record in 3 tables and store those primary keys (entity_id) with those entity_type values in multiple record(s). It is 1:1 relationship.
At the time of retrieval, this table is joined with the other tables by relation.
Just want to know your input on this table design.
My question was, Should we specify entity_id as individual relational table columns (by foreign keys) instead of just storing everything in one column (entity_id) as generic. Will it make any difference.
Let me know if i missed any useful info you are looking for.
January 07, 2007 - 8:01 pm UTC
if you have a 1:1 relationship between two tables, there should be
well, just one table - not two, no joins and definitely NOT a third table to muddy the waters further.
Muthu, January 07, 2007 - 8:33 pm UTC
Thanks Tom.
the reason i was told to create this new table was,
"no need to duplicate the following columns to 3 different tables (ACCESS, CKT_TO_LOCATION, PORT)"
==
NETWORK_INTERFACE NUMBER(2)
NETWORK_INTERFACE_JACK NUMBER(2)
==
Having said that, Should we specify entity_id as individual relational table columns (by foreign keys) instead of just storing everything in one column (entity_id) as generic. Will it make any difference.
January 08, 2007 - 11:53 am UTC
then why is this called "entity" and not "network_information"
Muthu, January 08, 2007 - 12:44 pm UTC
<< then why is this called "entity" and not "network_information"
But table was named as "ckt_loc_details" to make it generic.
Can you answer this one with respect to my previous question:
Should we specify entity_id as individual relational table columns (by foreign keys) instead of just storing everything in one column (entity_id) as generic. Will it make any difference.
January 08, 2007 - 1:25 pm UTC
you should use proper primary key/foreign keys - I don't know what you mean by "generic" in this sense.
You should be specific, not generic whenever possible.
Is there a way to rollback DDL?
John Pingel, January 30, 2007 - 11:10 am UTC
As most Oracle shops, we received large batches of ddl from our Data Modelers. Unfortunately due to tool issues or human error the scripts we apply to the database fail somewhere in the middle. Best case is that we would rollback all changes and have the data modeler send us a revised copy.
Our goal is to have a 100% end-to-end tested DDL script before we move to our next environment. Flash back is an incredible tool, but I do not believe you can Flash back a database permanently and then reapply DDL.
I see the following options
1) hot backup
2) cold backup
Do you have any other ideas?
January 31, 2007 - 10:38 am UTC
you can flashback a database permanently and reapply DDL - that is exactly what flashback database does!
Rollback DDL
S, February 01, 2007 - 11:36 am UTC
These are the first few sentences from the SQL Reference guide for version 9i (I believe it was there even before 9i, but am too lazy to check) for the CREATE SCHEMA statement:
Use the CREATE SCHEMA to create multiple tables and views and perform multiple grants in a single transaction.
To execute a CREATE SCHEMA statement, Oracle executes each included statement. If all statements execute successfully, Oracle commits the transaction. If any statement results in an error, Oracle rolls back all the statements.
You could use schema for testing DDL if you do not want them in production.
February 01, 2007 - 1:22 pm UTC
create schema is in my experience far too limited as to the types of ddl it can do, create table and grant do not really cover much ground.
and it would not really be useful for "testing DDL" as stated - what did you mean by that?
Testing DDL
s, February 01, 2007 - 1:37 pm UTC
John said "Our goal is to have a 100% end-to-end tested DDL script before we move to our next environment". If they're testing creation of new tables/views they could ask the modeler to generate the scrpit containing only create table/view statements and wrap the script in create schema statement. There will be no changes to the database unless every single create table/view works.
February 01, 2007 - 3:25 pm UTC
I make the rather reasonable assumption here that it would include alters and such of existing structures.
Help with a query
AD, February 01, 2007 - 5:21 pm UTC
Hi Tom,
First create table scripts and data:
CREATE TABLE A
(
ID NUMBER,
X VARCHAR2(6 BYTE),
Y VARCHAR2(6 BYTE),
FIELD_1 NUMBER,
Z NUMBER
);
CREATE TABLE T
(
ACC_NO NUMBER,
FIELD_1 NUMBER,
FIELD_2 NUMBER
);
insert into t values (100, 10, 1);
insert into t values (200, 10, 2);
insert into t values (300, 10, 3);
insert into a values(1, 5, 6, 10, 1);
insert into b values (1, 5, 6, 20, 1);
insert into b values (1, 5, 6, 10, 2)
SQL> select * from t;
ACC_NO FIELD_1 FIELD_2
---------- ---------- ----------
100 10 1
200 10 2
300 10 3
SQL> select * from a;
ID X Y FIELD_1 Z
---------- ------ ------ ---------- ----------
1 5 6 10 1
SQL> select * from b;
ID X Y FIELD_1 Z
---------- ------ ------ ---------- ----------
1 5 6 20 1
1 5 6 10 2
The problem is something like this:
For each acc_no in table t, we will check the value of field_1 and field_2.
If field_2=1 then we select table a for the query
if field_2=2 then we select table b for query
if field_2=3 then we select table c for query
There are around 15 such tables with differnt combinations.
Can you please let me how the queries will be written for this purpose. I assume it has to be some sort of ref cursors.
For example
when acc_no =100 we will select table a
Then the next query will be something like this:
1 (select * from t, a
2 where a.field_1=t.field_1
3* and t.acc_no =100)
SQL> /
ACC_NO FIELD_1 FIELD_2 ID X Y FIELD_1 Z
---------- ---------- ---------- ---------- ------ ------ ---------- ----------
100 10 1 1 5 6 10 1
when acc_no =200 we will select table b
etc.
Many thanks
February 02, 2007 - 10:27 am UTC
For each acc_no in table t, we will check the value of field_1 and field_2.
If field_2=1 then we select table a for the query
if field_2=2 then we select table b for query
if field_2=3 then we select table c for query
ouch, that hurts doesn't it. I am reminded of a bit I wrote in Effective Oracle By Design:
<quote>
Design Your Data Model for Efficiency
You should design your data model to answer your most common queries as efficiently as possible. It should be clear by now that, unless you design your system to efficiently and effectively answer the questions your end users will be asking, your system will fail miserably.
Ask Tom
¿We have this query we need to run hundreds of times a minute and it takes forever, bringing our system to its knees, Please help!¿
if type = `A¿ join to tablea
if type = `B¿ join to tableb
¿
A query like this does conditional joining. (A similar problem is a query that needs to do where exists type processing four or five levels deep.) It is obvious that the data model was not set up to answer the question they ask hundreds of times per minute.
End Ask Tom
When I set about building a system, the first thing I want to figure out is how we can store the data in such a way that the most frequently executed, performance-critical, important queries are answered as fast as humanly possible. This takes place well before the tables are created, and then we create very specific tables to handle that requirement. We choose very specific indexing technologies. We design the system to perform from day one. Tuning will happen only because I made an incorrect assumption during the design and implementation, not because, ¿Well, you always tune after you deploy.¿
The tuning-after-deploying approach just doesn¿t work. Why? Because the system is already deployed. Can you change your physical structures after you¿ve deployed system? Not really. Can you change the fundamental architecture after you¿ve deployed the system? No, not really. Those changes require another version of the product. They are certainly not things you are going to be able to fix overnight.
</quote>
this is so sad to see - you've got something here that'll never perform.
Your test case isn't 100% there - you refer to table b and table c, but I do not see them???
I see no primary keys.
I don't know your join conditions.
Test case is utterly incomplete.
I don't know the relations between T and A, T and B (1 to 1 or 1 to many or 1 to 0/1 or 1 to 0/1/more or what).
Cannot be answered with information provided. I can say with 100% confidence:
a) this will never perform good
b) this data model is botched
c) if it is a 1-1 or 1-1 optional relationship between T and A, B, C - there should not be more than one table here and that needs to be fixed.
Help with a query
AD, February 01, 2007 - 5:21 pm UTC
Hi Tom,
First create table scripts and data:
CREATE TABLE A
(
ID NUMBER,
X VARCHAR2(6 BYTE),
Y VARCHAR2(6 BYTE),
FIELD_1 NUMBER,
Z NUMBER
);
CREATE TABLE T
(
ACC_NO NUMBER,
FIELD_1 NUMBER,
FIELD_2 NUMBER
);
insert into t values (100, 10, 1);
insert into t values (200, 10, 2);
insert into t values (300, 10, 3);
insert into a values(1, 5, 6, 10, 1);
insert into b values (1, 5, 6, 20, 1);
insert into b values (1, 5, 6, 10, 2)
SQL> select * from t;
ACC_NO FIELD_1 FIELD_2
---------- ---------- ----------
100 10 1
200 10 2
300 10 3
SQL> select * from a;
ID X Y FIELD_1 Z
---------- ------ ------ ---------- ----------
1 5 6 10 1
SQL> select * from b;
ID X Y FIELD_1 Z
---------- ------ ------ ---------- ----------
1 5 6 20 1
1 5 6 10 2
The problem is something like this:
For each acc_no in table t, we will check the value of field_1 and field_2.
If field_2=1 then we select table a for the query
if field_2=2 then we select table b for query
if field_2=3 then we select table c for query
There are around 15 such tables with differnt combinations.
Can you please let me how the queries will be written for this purpose. I assume it has to be some sort of ref cursors.
For example
when acc_no =100 we will select table a
Then the next query will be something like this:
1 (select * from t, a
2 where a.field_1=t.field_1
3* and t.acc_no =100)
SQL> /
ACC_NO FIELD_1 FIELD_2 ID X Y FIELD_1 Z
---------- ---------- ---------- ---------- ------ ------ ---------- ----------
100 10 1 1 5 6 10 1
when acc_no =200 we will select table b
etc.
Many thanks
Help with a query
AD, February 02, 2007 - 5:59 pm UTC
Hi Tom,
First of all thanks for your reply.
I must apologize for two reasons 1) for more than one posting - I did not recognise that I was able to post in first instance as after clicking the submit button I was encountered with "page not found" page.
2) for incomplete specification
I am trying explain the situation again.
CREATE TABLE T
(
acc_no NUMBER,
code VARCHAR2(2),
cycle NUMBER,
option_1 number,
option_2 number
);
CREATE TABLE a
(
cycle number,
avg_field1 number,
avg_field2 number
);
CREATE TABLE b
(
cycle number,
weighted_field1 number,
weighted_field2 number,
weighted_field3 number
);
CREATE TABLE c
(
cycle number,
field1 number,
field2 number,
field3 number,
field4 number,
field5 number
);
insert into t values (100, 'SE', 0, 10, null);
insert into t values (101, 'DE', 1, 12, 20);
insert into t values (102, 'SE', 0, -1, 30);
insert into t values (103, 'RE', 0, 1, 30);
insert into a values (0, 10, 20);
insert into a values (1, 12, 22);
insert into b values (0, 25, 25,35);
insert into b values (1, 35, 45,55);
insert into c values (0, 5, 15, 20, 25, 30)
insert into c values (1, 10, 20, 30, 40, 50)
insert into c values (2, 20, 30, 40, 50, 60)
SQL> select * from t;
ACC_NO CO CYCLE OPTION_1 OPTION_2
---------- -- ---------- ---------- ----------
100 SE 0 10
101 DE 1 12 20
102 SE 0 -1 30
103 RE 0 1 30
SQL> select * from a;
CYCLE AVG_FIELD1 AVG_FIELD2
---------- ---------- ----------
0 10 20
1 12 22
SQL> select * from b;
CYCLE WEIGHTED_FIELD1 WEIGHTED_FIELD2 WEIGHTED_FIELD3
---------- --------------- --------------- ---------------
0 25 25 35
1 35 45 55
SQL> select * from c;
CYCLE FIELD1 FIELD2 FIELD3 FIELD4 FIELD5
---------- ---------- ---------- ---------- ---------- ----------
0 5 15 20 25 30
1 10 20 30 40 50
2 20 30 40 50 60
The problem is like this :
For each acc_no in table t we first check the code
if code ='SE' then we check option_1 field if option_1 > 0
then we have to join table b else
we have to join table a
the field "cycle" is the only common field between these two tables.
For example, if account_no =100
Here we have option_1>0 so we should be joining with table b.
now when we join table t with table b for acc_no =100 on cycle we get the record set as follows
select * from t, b
where t.cycle=b.cycle
and t.acc_no =100
/
SQL> /
ACC_NO CO CYCLE OPTION_1 OPTION_2 CYCLE WEIGHTED_FIELD1 WEIGHTED_FIELD2 WEIGHTED_FIELD3
---------- -- ---------- ---------- ---------- ---------- --------------- --------------- ---------------
100 SE 0 10 0 25 25 35
There is further calculation that will then be based on the fields returned from the above record.
If acc_no =102 then code ='SE' and option_1 <0 then we will select table a for the join.
1 select * from t,a
2 where t.cycle=a.cycle
3* and t.acc_no =102
SQL> /
ACC_NO CO CYCLE OPTION_1 OPTION_2 CYCLE AVG_FIELD1 AVG_FIELD2
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
102 SE 0 -1 30 0 10 20
Similarly when code ='RE' then we would select table c regardless option_1, option_2 fields
So for example when acc_no=103, in this case code='RE'
1 select * from t,c
2 where t.cycle=c.cycle
3* and t.acc_no =103
SQL> /
ACC_NO CO CYCLE OPTION_1 OPTION_2 CYCLE FIELD1 FIELD2 FIELD3 FIELD4 FIELD5
---------- -- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
103 RE 0 1 30 0 5 15 20 25 30
The tables a, b, c are sort of reference tables. They do not have too many rows but each has varying number of columns. The particular reference table is selected based on the code, option_1, option_2 etc. from table t. The only common field between t and rest of the reference tables is cycle. Are you suggesting to combine these reference tables to one?
Many thanks
February 03, 2007 - 7:31 pm UTC
you do not say what the relationship between the tables are - 1 to 1, 1 to 0 or 1, 1 to many.
and if you say "1 to 1" and/or "1 to 0 or 1" - then there should be not more than one table.
Design a volatile table
Lois, February 15, 2007 - 6:13 pm UTC
Hi Tom,
Please give me some insights on the table structures for one that stores very volatile data. As in a temp table in which one process to insert and two other processes to fetch and later delete them after successful processing. FIFO based on record creation date, with a seqnum primary key. Size should be about 5000 records, snapshot.
Should I create a new temporary tablespace for this new table? How about the storage, freelists, management, pctincrease. Should I be bothered with creating indexes? How to balance between avoiding table scan without indexes, and high indexes maintenance?
Too many queries in my mind. In brevity, how to design a volatile table with high performance.
Thanks a lot!
February 16, 2007 - 1:53 pm UTC
i would be thinking along the lines of "queue" table - not regular table.
do you know of AQ - advanced queues, specifically designed for the producer/consumer application you describe.
Checking out AQ
Lois, February 21, 2007 - 10:42 pm UTC
Cool. Never know AQ exists until now. Glad to learn something new. Checking it out.
Does AQ works like table? Although records are accessed accordingly by creation_date(default), but not always so, depending on the SQL result returned. Not always a straightforward FIFO.
Also, some records will stay in the table longer(needs updating status field), until the status cleared from READY -> IN PROGRESS -> SUCCESS. AQ still a better choice?
Thanks.
February 22, 2007 - 8:38 am UTC
AQ is a message queue. You would queue messages in there and have a process dequeue them. You likely would not "update them" but rather the "in progress" stuff would likely be records in a traditional table (if they exist at all - I question the need for "in progress", but that is up to you and your design - they might just be more messages in a queue, the same queue or another one)
Design issues
Maverick, March 27, 2007 - 2:17 pm UTC
Couple of design questions:
1) Table Design:
Need to design a cross mapping table for two objects
Employer: Id, name
Status : Id,Description
Emp_status: Emp_id, status_id
For one status [say status_id=6, we need to capture some description from user]
my question is where can i have this description field?
a) Comments column in Emp_status table or
b) Comments column in Employer table?
Which one makes sense?
2) Code design:
In every procedure we write we are using some variables [local] that are exactly same and since we are repeating it in every procedure [eg: Error_mssg variable to capture message from sqlerrm] we thought of using them as global variables [declared in one package and used in many procedures/functions etc..]
Do you see any problems with this approach performance wise or otherwise?
Thanks for any insights you can provide on both these issues.
March 28, 2007 - 11:13 am UTC
1) this description sounds like it is an attribute of the emp_status relation. It does not belong in the employer table - it is a description about the status=6 relation in the emp_status table for that employer
2) global variables are evil in general and should be avoided in practice. This is true in pretty much every programming language.
Design Issues
Prasad, March 28, 2007 - 7:54 am UTC
Hi Tom,
In my data mart I have the following table design issues:
Present Status:-
I used the star schema design method. The Data mart has a dimension table called "claim_dim" and the fact table called "claim_fact". The "claim_dim" table contains the reporting variables and the by variables related to claim. These variables will helps in filtering. The "claim_fact" table contains the list of analysis variables. The above mentioned two tables has a 1-to-1 relationship between them. In other words both the tables will always contains exact number of records at any point of time. In addition, the "claim_fact" table is surrounded by many other dimensions. The "claim_dim" is a confirmed dimension. Its dimension key shared across all other facts in the data mart. The claim_fact table's granularity is lower than the other facts in the mart.
Question:-
1) Shall I combine the claim_dim and claim_fact table?
Advantage: One extra join can be avoided.
Disadvantage: The fact table will contains more number of descriptive columns.
2) Shall I leave the design as it is?
Please do suggest me which one is the best choice?
Thanks in advance.
March 28, 2007 - 12:11 pm UTC
if it is 1:1, it is not a dimension table really anymore.
it would seem you would want to consolidate this, especially if you almost always put a predicate on these columns.
Design issues
Maverick, March 28, 2007 - 12:39 pm UTC
Tom, Thanks for responding for my issue above..
But can you tell me why do you think Global variables are Evil? Do they effect the performance? I was mainly looking at Maintanance point of view and thought using global variables makes maintanance easier..like changing size etc..
March 29, 2007 - 10:40 am UTC
because globals are evil - you should have learned that in programming school.
so, you use a single variable to store an error message - and every bit of code writes to it. That - that is a recipe for disaster.
Your code becomes non-reentrant, maintaining it becomes very very hard (left hand forgets to tell right hand what it is doing).
Avoid globals at all costs.
degenerate ................ the dimension, that is
Duke Ganote, March 28, 2007 - 1:35 pm UTC
A case example...
Nathan, March 28, 2007 - 5:28 pm UTC
Tom - interesting discussion.
I have a case study where I received much better performance by splitting a single logical table into multiple parts...
This is in a moderate to large data Warehouse in Oracle 10.2.0.5 on a decent sized Sun server (8 * 1.25 GHz CPUs, 16 Gig RAM).
We have a single (de-normalized) partitioned reporting table that is used by the majority of our client queries. It is essentially a materialized view, but we maintain it directly (for performance reasons). (Partitioning is designed to speed data load time - it does not assist signficantly with client querying.)
This table was 21 Gig, and contained 207 columns and about a dozen indexes. Since it was a general-purpose table, it was difficult to predict how clients would query it (as they querried from many different angles against many different fields).
Although some queries were running well against the indexes, clients would frequently hit the table with Full Table Scans, requiring over an hour (with multiple parallel processes running) to return results in some cases. Our developers were also spending a significant amount of time trying (sometimes in vain) to optimize our clients' queries.
After analyzing the table, we came to the conclusion that data access would be much more efficient if we split it into three separate tables:
1. A smaller "INDEX" table, which contains all of the fields (66) required to perform 90% of the querying. (Aprox 4 Gig in size) All old indexes are now created against this table.
2. A second "ANALYSIS" table, which contains all of the remaining analysis fields (128) which answer 90% of the remainder of queries. (Approx 6 Gig in size). Only has a PRIMARY KEY index (uses the same PK as the INDEX table).
3. A third "TEXT" table, which contains the 13 full-text fields required to complete the picture (e.g. VARCHAR2s with note text). (Approx 11 Gig) Only has a PRIMARY KEY index (uses the same PK as the INDEX table).
With this construction, we were able to do the following:
- Reduce our index rebuild time significantly. Non-local indexes can now be re-built by scanning the INDEX table only.
- Direct users to only the INDEX table when possible, cutting FTS times from > 1 hour to approx 10 minutes.
- Direct users to a view between the INDEX and ANALYSIS tables which cut run times significantly when a smaller number of rows are selected (can use the INDEX table to identify the necessary Primary Keys and then Nested Loop to the ANALYSIS table) and still have reasonable performance when a larger number of rows are returned.
- Create a view which combines all three tables, but restrict access to those who really need it. Queries against this resulting view also generally perform well because they use Nested Loops to access the (massive) TEXT table.
Obviously this is a significant amount of work to maintain, and we tried numerous other approaches before coming to this one. However, there are some situations where splitting a table can be very benifical...
Continued
Nathan, March 28, 2007 - 5:47 pm UTC
Just a note - non-Local indexes were re-built due to the partition drop / rebuilds done to populate the tables.
Have taken your advice to heart and no longer perform generic index re-builds unless there is a specific performance problem...
Thanks for all your input Tom!
Help with the design
Rajkumar, May 04, 2007 - 5:32 am UTC
Hi Tom,
we have a table namely PROD_PACK which holds product level information
and their prices for the current month AND NOT FOR PREVIOUS MONTHS.
This table gets loaded from a weekly file.
On each load this table would be truncated and repopulated with appended data.
Now we have a plan to maintain a history of products and price related information for all
months by creating a new table PROD_PACK_HIST with the following idea.
1) Table PROD_PACK gets loaded weekly.
2) Insert into PROD_PACK_HIST by selecting the INSERTED rows from PROD_PACK
by having an insert trigger on PROD_PACK
3) Delete duplicate rows from PROD_PACK_HIST
Is this a good and valid idea?
Is using a trigger worth enough for this situation or can it be handled
in some other better way?
May 04, 2007 - 1:06 pm UTC
why not just use one table and partition by date?
Designing the Table with XMLTYPE
Ed from Dallas, June 12, 2007 - 5:45 pm UTC
I'm trying to create a table that contains an XMLTYPE column. As with all LOB datatypes (which an XMLTYPE is), I want to force the data storage from this column to it's own tablespace - with no luck.
SQL> CREATE TABLE WS_LOG1
2 ( WS_LOG_ID NUMBER(12) CONSTRAINT ws_log1_PK_NN NOT NULL,
3 SOAP_MESSAGE CLOB NULL,
4 MACHINE_NAME VARCHAR2(64) NULL
5 )
6 TABLESPACE NDTS_DATA1
7 LOB (SOAP_MESSAGE) STORE AS WS_LOG_LOB_SEG1 (
8 TABLESPACE CLOB_DATA
9 ENABLE STORAGE IN ROW
10 STORAGE (
11 INITIAL 1M
12 NEXT 1M
13 )
14 CHUNK 8
15 CACHE);
Table created.
However, if I create a new, basically identical table, and change the SOAP_MESSAGE datatype to XMLTYPE, I get the following message:
SQL> CREATE TABLE WS_LOG2
2 ( WS_LOG_ID NUMBER(12) CONSTRAINT ws_log2_PK_NN NOT NULL,
3 SOAP_MESSAGE XMLType NULL,
4 MACHINE_NAME VARCHAR2(64) NULL
5 )
6 TABLESPACE NDTS_DATA1
7 LOB (SOAP_MESSAGE) STORE AS WS_LOG_LOB_SEG2 (
8 TABLESPACE CLOB_DATA
9 ENABLE STORAGE IN ROW
10 STORAGE (
11 INITIAL 1M
12 NEXT 1M
13 )
14 CHUNK 8
15 CACHE);
LOB (SOAP_MESSAGE) STORE AS WS_LOG_LOB_SEG2 (
*
ERROR at line 7:
ORA-00904: "SOAP_MESSAGE": invalid identifier
If the XMLTYPE datatype is inherently a CLOB/NCLOB datatype (with a Well-Formedness check wrapper), why can I not force the allocation of the XMLTYPE column data specifically to a tablespace I desire like with all other LOB datatypes?
Am I doing something wrong, is there a workaround, or is this still a limitation for XMLTYPE columns?
June 12, 2007 - 5:49 pm UTC
ops$tkyte%ORA9IR2> CREATE TABLE WS_LOG2
2 ( WS_LOG_ID NUMBER(12) CONSTRAINT ws_log2_PK_NN NOT NULL,
3 SOAP_MESSAGE XMLType NULL,
4 MACHINE_NAME VARCHAR2(64) NULL
5 )
6 TABLESPACE users<b>
7 LOB (SOAP_MESSAGE.xmldata) </b>STORE AS WS_LOG_LOB_SEG2 (
8 TABLESPACE example
9 ENABLE STORAGE IN ROW
10 CHUNK 8
11 CACHE);
Table created.
storage clauses are so 1990's - use locally managed tablespaces with autoallocate...
Designing the Table with XMLTYPE
Ed from Dallas, June 13, 2007 - 12:19 pm UTC
As always, thank you so much, Tom.
About an hour or so after I posted here, I got this DDL to execute as well. I'm not sure yet of the difference between them.
SQL> CREATE TABLE WS_LOG2
2 ( WS_LOG_ID NUMBER(12) CONSTRAINT ws_log2_PK_NN NOT NULL,
3 SOAP_MESSAGE XMLType NULL,
4 MACHINE_NAME VARCHAR2(64) NULL
5 )
6 TABLESPACE NDTS_DATA1
7 XMLTYPE COLUMN SOAP_MESSAGE STORE AS CLOB (
8 TABLESPACE CLOB_DATA
9 ENABLE STORAGE IN ROW
10 STORAGE (
11 INITIAL 1M
12 NEXT 1M
13 )
14 CHUNK 8192
15 CACHE);
You note that I should be using locally managed tablespaces with autoallocate. I am. For the sake of brevity I left out 10 other columns that would be in this table. Should I not attempt to keep the table data in one tablespace while having another manage the XMLTYPE (CLOB) data via the (so 90's) storage clause? I guess I'm getting a mixed message on how to properly manage clob data (LOB storage) based on this site and others. There seems to be considerable documentation on the subject of LOB storage yet I'm inferring from you that none of this is necessary reading if I've implemented locally managed tablespaces with autoallocate. Sorry for my confusion.
My other reference was:
http://www.oracle.com/technology/products/database/application_development/pdf/lob_performance_guidelines.pdf
June 13, 2007 - 2:38 pm UTC
if you are using LMT, drop the initial and next clauses!
you don't need a storage clause to separate the segments, just the tablespace clause will do.
Designing the Table with XMLTYPE
Ed from Dallas, June 14, 2007 - 4:29 pm UTC
DOH!! My bad (dawn just broke on Marblehead). You couldn't have been any clearer with your first response. Sometimes I can't see the forest for the trees. Best regards and thank you so much, Tom.
Staging tables for modified data
Tsang, July 16, 2007 - 2:30 pm UTC
Hi Tom,
I have 1 equipment table and 7 component tables (each stores data for each of 7 different equipment component¿s details).
The application will allow users to modify the data in these tables. The modified data needs to be approved by manager before it is updated to the tables. Some updates are through views (created using 1 or more of these tables).
The approach was to create 8 tables (8 replicas of their respectable tables without the data). They are served as the holding place for the modified data. Once the manager approved the changes, the data will be deleted from these holding tables and the actual tables will get updated with the approved changes.
What would be the best approach in this situation?
Thanks!
July 17, 2007 - 11:34 am UTC
your approach is viable. You didn't give me anything else to choose from...
AQ (advance queues) would probably be overkill for something as simple as this. What you have - these interface staging tables - works.
Related Question.. Not answered in this thread
Raam, September 21, 2007 - 1:04 am UTC
Question : What are the pros and cons of modeling a single table in place of multiple tables we currently have?
Scenario: There is a table called "Properties" that has different types of properties and its attributes in it. The tables I am designing is dependent on this table.
My tables are
Customer (Header) This table intersects with "Properties" table in different logical usages.
Customer --- Owned Properties
Customer --- Interested Properties
Customer --- Locale Properties
Customer --- Rejected Properties
Each of this association captures additional information. There may be two to six additional fields for each association. Currently we have created separate tables for each association. We currently have no requirements to get joined information. So they are logically different sets of data.
Customers - May have 500,000 records
Interested Properties - May have 20 to 30 per Customer
Rest of the tables may not have as many for each customer.
Now from functional point of view the customer who will be using this application may want to create a few more association that we have not provided. So a suggestion made by function person is that we should have one table with different roles. So it can be used by (deploying) customers to add more association if necessary instead of creating new tables. This table will be implemented in java layer as polymorphic objects.
I would like to know your opinion. What is best from application functionality and performance point of view?
September 26, 2007 - 7:41 am UTC
sounds like customers have relations to properties.
and this association might have different 'flavors' to it.
there is no reason I see not to have:
customer 1:M cust_prop_relations M:1 properties
where cust_prop_relations has the primary key of customer and properties and then attributes for each of the relation types you envision. eg: a set of attributes that will be populated for 'interested' relations, another set for 'owned' and so on.
alhnuf
alhnuf, October 07, 2007 - 8:39 am UTC
Table design to generate repeating data
Rajeswari, November 03, 2007 - 8:00 am UTC
Requirement:
Approve Refer Decline
R_LOCALITY SLUM AREA Y
R_LOCALITY UNDER DEVELOPED AREA Y
R_LOCALITY HOUSING COMPLEX Y
R_LOCALITY RESIDENTIAL COMPLEX Y
R_SURROUND COMMERICAL BLDG Y
R_SURROUND SMALL / MEDIUM SIZE SHOPS Y
R_SURROUND OTHERS Y
R_ACCESS EASY TO LOCATE Y
R_ACCESS DIFFICULT TO LOCATE Y
R_ACCESS UNABLE TO LOCATE Y
R_ACCOTYP CHAWL Y
R_ACCOTYP SLUM Y
R_ACCOTYP FLAT Y
R_ACCOTYP INDEPENDENT HOUSE Y
R_ACCOTYP BLOCK Y
R_ACCOTYP OTHERS Y
R_ACCOTYP GUESTHOUSE Y
R_ACCOTYP ROW-HOUSE Y
R_ACCOTYP HOTEL/LODGE/HOSTEL Y
Currently I planned to have normal table.
create table baseconfig(cname varchar2(10), cvalue number(3), desc varchar2(100));
insert into baseconfig values('R_LOCALITY',1,'SLUM AREA');
insert into baseconfig values('R_ACCOTYP',3,'FLAT');
Similarly we have data for all the configuration
For decision we have table like below
create table cpv_config(locality number(3),surround number(3),accessb number(3),
accomtype number(3),decision varchar2(1));
Total attributes for this table is 30.
insert into cpv_config values(3,1,1,3,'A');
insert into cpv_config values(4,1,1,3,'A');
insert into cpv_config values(3,2,1,3,'A');
insert into cpv_config values(4,2,1,3,'A');
insert into cpv_config values(3,1,2,3,'A');
insert into cpv_config values(4,1,2,3,'A');
insert into cpv_config values(3,2,2,3,'A');
insert into cpv_config values(4,2,2,3,'A');
Total =2*2*2*3 = 24 records
Similarly I have records for refer and decline.
Detail table similar to cpv_config, but for each application
create table cpv_det(applnno varchar2(9),locality number(3),surround number(3),accessb number(3),
accomtype number(3));
insert into cpv_det('071102001',1,1,1,1);
I will match this detail with cpv_config and decision is taken.
My problem is maintaining cpv_config table.
Number of records increases as the combination increases,
as I have around 30 attributes, it will lead to around 50000 records
It is difficult to maintain these many maintenance details. For User we planned to provide
list box(multi select) for each column, so that within 4 or 5 maintenance they can do the maintenance
On Submit I need convert the single record to as many records.
I am in dilemma whether it is right approach or I should use VARRAY or Nested table
Please provide your feedback.
November 05, 2007 - 11:20 am UTC
i have no idea why you have the cpv_config table.
generating repeated data
Rajeswari, November 06, 2007 - 12:05 am UTC
Oracle 9.2.0.8
CPV - Contact Point Verification
cpv_config table used for to store the decision. It is one time configuration table where user provides the decision based on the different parameters (say locality, surrounding, etc). Maximum 2 years once they change the decision, so they delete existing record and maintain new one.
Current manual logic they follow is if number of Approve > Number of refer = final decision is Approve, similarly if number of refer > no. of decline = refer
So I planned to have separate table like cpv_config for decisioning. I am storing only final decision, if each column data satisfies with cpv_det I will take the decision.
As number of combination increases as each attribute value is independent, I am not sure whether the table design is correct or not.
November 06, 2007 - 9:30 am UTC
it just seems to be a cartesian join though, what use is that.
Regarding dynamic solutions
Vinay Chandrakant, November 24, 2007 - 3:50 am UTC
Hi Tom,
Having read through the posts above, and all your books, I have a fair idea of your thoughts behind highly dynamic and generic solutions. However, I would like to present a requirement that I have and ask you for your opinion on this. What I would like to know is, is there any approach you would go for, other than design a generic solution, given the requirement that I have:
We have a data-warehouse 'kind' of application where, in a staging area, we perform all data manipulation, transformation etc. and then feed the data to an operational data store (ODS). The users want the following:
There should be 'checks' in place that do certain basic validation and verification of data, before transferring that data to the ODS. Apart from fulfilling the purpose of checks, this should also serve as a 'reporting'/'visualization' tool wherein users can see information about the data. For example, they might want to see the sum of quantities in a certain table grouped by two-three attributes, or more. The kind of reports they would want to generate might change over time (they might need to aggregate across more dimensions tomorrow, they might want to check an altogether new table the next day)..
Considering this, I felt that a dynamic solution was the way to go here - build an application that takes the table name and the various dimensions as input and generate appropriate queries (rollup? analytic?) on the fly, so that the users can have their desired reporting whenever they want.
Does this fall into the class of ad-hoc query requirements? Is this a 'sane' approach? Can this be done 'statically' at all?
I know that I have not provided actual 'data' and detailed 'requirements' to you. In general, however, do you see this as a valid case where a dynamic/generic solution is the only way to go?
Thanks,
Vinay
November 26, 2007 - 12:01 pm UTC
... Does this fall into the class of ad-hoc query requirements? Is this a 'sane'
approach? Can this be done 'statically' at all? ..
this is ad-hoc query
is it 'sane'
and it isn't done statically...
The generic stuff I'm talking about is junk like
NOTE THIS CODE IS JUNK, DO NOT COPY, DO NOT USE, DO NOT DO THIS - sql injection, bad idea, horrible all around - nothing good about it :
function get_col_value
( p_tname in varchar2, p_keyname in varchar2,
p_cname in varchar2, p_val in varchar2 ) return varchar2
as
l_data long;
begin
execute immediate
'select ' || p_cname || ' from ' || p_tname || ' where ' ||
p_keyname || ' = :x' using p_val into l_data;
return l_data;
end;
or using a 'entity attribute model' (table of objects, table of links, table of object-attributes) and the like.
ad-hoc query in a warehousing environment - sure, happens all of the time. But that is not "generic code" really.
'Generic' code
Vinay Chandrakant, November 27, 2007 - 7:32 am UTC
Hi Tom,
Thanks for the response, and also the example of 'junk' code ;-)
My requirement dictates that the table name, the fields (column names) and the aggregation performed (count, sum, avg..) is not known till runtime. I would request you for some hints on how best to proceed with the design. But I fear I might be going a little 'off-the-thread' here. I will post my query as a new question, or as a follow up to an existing dynamic sql discussion, if it is relevant to that discussion.
Thanks,
Vinay
November 27, 2007 - 3:46 pm UTC
well, you are just talking about an ad-hoc query tool - for which I suggest YOU BUY, not build in the year 2007.
There are just way too many good ones out there to consider building one on your companies time.
Reg. Ad-hoc query tools
Vinay Chandrakant, November 27, 2007 - 11:57 pm UTC
Hi Tom,
True - that seems a more sensible thing keeping the overall picture in mind. But what we're looking for here is not really a complete 'ad-hoc query tool' with all the advanced product features. The scope and usage is kind-of limited. Additionally, the results of the 'queries' etc. need to be integrated with the existing schema and data model the client has in place already.
More on this in a new question that I will post soon. I tried to post one yesterday but you had a backlog. I guess I'll wait till you're accepting new questions..
And moreover, the decision of buying/introducing new products is not mine - I'm just a poor lil' developer you see.. :-(
Thanks,
Vinay
ROW TO COLUMN CONVERSION
ROOPA, April 30, 2008 - 5:27 am UTC
HOW TO CONVERT YEARLY DATA INTO MONTHLY DATA?
April 30, 2008 - 10:32 am UTC
BY MAKING IT UP I GUESS?
coverting data from row to column wise
ROOPA, April 30, 2008 - 6:00 am UTC
table1 format
MONTH AMOUNT_PAID
01.12.2006 00:00:00 5395
01.11.2006 00:00:00 567
01.11.2006 00:00:00 1974
01.04.2007 00:00:00 2462
01.04.2007 00:00:00 1974
01.11.2006 00:00:00 5395
01.02.2008 00:00:00 5395
table2 format
MONTH JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
01-DEC-2006 0 0 0 0 0 0 0 0 0 0 0 5395
01-FEB-2007 0 5395 0 0 0 0 0 0 0 0 0 0
01-NOV-2006 0 0 0 0 0 0 0 0 0 0 5395 0
how to convert table1 format into table2 format i.e yearly data to monthly data.
April 30, 2008 - 10:52 am UTC
http://asktom.oracle.com/Misc/question-of-day.html no idea what 01.12.2006 is, is that jan 12 2006 or dec 1 2006.
why use 5,395 three times - just to make it more confusing?
why not use a month with more than one observation - like 01.11.2006 with two rows - isn't that the "interesting bit"
what is the point of having 11 zeroes on each output line?
why is 01.02.2008 in the input, but 01-feb-2007 is in the output.
how is table 1 "yearly" data. Not sure what it is (either data at the DAY level, like jan 12th, jan 11th and so on or data at the monthly level)
no create table
no insert
do you expect me to create a table and reformat your data (which I'm not sure how to do, your dates are ambiguous)?
row to column conversion
Roopa, May 01, 2008 - 12:30 am UTC
hi tom
i have sent a question on"row to column conversion".what i would like to say is:
I have a table called
table1:
MONTH AMOUNT_PAID
01.12.2006 5395
01.11.2006 567
01.11.2006 1974
01.04.2007 2462
01.04.2007 1974
01.11.2006 5395
01.02.2008 5395
the above table is of year wise data.here two columns one is month and the other is the amount paid in that year.
I want this data to be reformatted like
table2:
MONTH JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
01-DEC-2006 0 0 0 0 0 0 0 0 0 0 0 5395
01-FEB-2007 0 5395 0 0 0 0 0 0 0 0 0 0
01-NOV-2006 0 0 0 0 0 0 0 0 0 0 5395 0
here the table1 data has been reformmated to month wise.i.e the amount paid in the respective month.
Is there any query to obtain this result?
May 01, 2008 - 9:33 am UTC
fascinating, I do not believe you read a thing - not a thing - that I wrote.
not
a
thing
please read it, take it to heart, then supply the required information.
Foreign relation wthi where condition
Sawsan, June 02, 2008 - 2:42 am UTC
Hi Tom,
I have this question
suppose I have table like this
sql> create table Comp (
comno varchar2(7),comname varchar2(200),comparent varchar2(7)
);
/
To enter the company structure like this
group of companies----> companies -----> branch of companies suppose 3 level only
comno comname comparent
01 oilcompanygroup1
0101 oilcompany1 01
0102 oilcompany2 01
0101001 oilbrachcomp1 0101
0102001 oilbrachcomp2 0102
.
.
.
My questions
1- if i want to know the groupcompany and company of oilbrachcomp1,I need to use substr is substr is faster to get form database?
2- suppose i have other table X and i wnat to get relation from COMP table as a foreign but for GROUPCOPMNY only there is any way to get foreign relation whith where conditin
June 02, 2008 - 11:19 am UTC
I don't know what you mean by "groupcompany" in this aspect. I didn't really follow this very easily at all - it is not clear.
You have to understand - to you, this data is obvious (it should be, it is YOUR DATA). To us, it is just bits and bytes on the screen, with no context.
Be detailed. I have no clue why you would need substr for anything here.
provided INSERTS if you hope for an example back.
A Reader, June 02, 2008 - 4:55 pm UTC
Identifier | Company ID | Security ID | code | Action ID | Mnemonic
| Flag |Company Name | ID-type | ID | Currency
| Description|Unique ID|Ann-date | Eff-date |Amd-date| Nfields
| fld-mnemo1 | Value-1 | ¿ |fld-mnemoN | Value-N |
If there is CHG_NAME in <Mnemonic> field, there will be three fields with labels in the file as shown below
CP_OLD_NAME - old name
CP_NEW_NAME - new name
CP_NOTES - Additional text information
D Equity|1500000|0|0|25000000|CHG_NAME
|N|Digital Inc| N.A.|N.A.|N.A.
| | |20070206|20070207|N.A.|3
|CP_OLD_NAME|Digital Systems Inc|CP_NEW_NAME|Allied Innovations|CP_NOTES|Allied new information |
If there is CHG_DOM <Mnemonic> field, there will be six fields with labels in the file as shown below
A Equity|9750000|0|0|24900000|CHG_DOM
|N|ATS Corp|N.A.|N.A.|N.A.
| | |20070129|20070130|N.A.|6
|CP_OLD_STATE|NY|CP_NEW_STATE|VA|CP_ADDRESS|1234 Madison Ave, NY| CP_OLD_COUNTRY|US|CP_NEW_COUNTRY|US|CP_NOTES|N.A.|
there might be some other different values for <Mnemonic> field. I just showed two samples above.
Can you please let me know how to design the tables and load this to database?
June 02, 2008 - 5:20 pm UTC
hah. very funny.
everything one would need for a "design"
except for.............
well, requirements...
Reader, June 02, 2008 - 5:32 pm UTC
Tom,
I do not have any requirements as such. I just have this file and I need to load this to database. I was wondering if you could help.
June 03, 2008 - 10:37 am UTC
if you have no requirements you are by definition done.
come on, please - someone is going to use this data right? If not, don't load it. If they are - what do they need to do with it.
You can "load" it simply with sqlldr or an external table, but so what?
You throw out tidbits like this:
...
If there is CHG_NAME in <Mnemonic> field, there will be three fields with labels in the file as
shown below
CP_OLD_NAME - old name
CP_NEW_NAME - new name
CP_NOTES - Additional text information
..
as if this data has some sort of meaning... I don't know what else to say here.
Reader, June 02, 2008 - 5:54 pm UTC
With regards to above question -
File:
D Equity|1500000|0|0|25000000|CHG_NAME|N|Digital Inc| N.A.|N.A.|N.A.| | |20070206|20070207|N.A.|3
|CP_OLD_NAME|Digital Systems Inc|CP_NEW_NAME|Allied Innovations|CP_NOTES|Allied new information |
A Equity|9750000|0|0|24900000|CHG_DOM
|N|ATS Corp|N.A.|N.A.|N.A.| | |20070129|20070130|N.A.|6
|CP_OLD_STATE|NY|CP_NEW_STATE|VA|CP_ADDRESS|1234 Madison Ave, NY| CP_OLD_COUNTRY|US|CP_NEW_COUNTRY|US|CP_NOTES|N.A.|
design
A reader, June 02, 2008 - 11:43 pm UTC
Reader, June 03, 2008 - 3:58 pm UTC
Field Description Field Type
Identifier identifier in the file
Company ID company ID NUMBER
Security ID Identifies a security Number
Code Return code. A return of 300
indicates that no action is returned and
no further fields will be delivered in that record NUMBER
Action ID A unique identifier assigned to actions NUMBER
Mnemonic Mnemonic for action Character
Flag Status of the action at the moment of the request. Character
Company Name Name of the company Character
sec_ID-type Type of security identifier Character
sec_ID The security ID Character
Currency Currency of security Character
Description The name of the market sector Character
Unique ID A unique number assigned to all securities.
This field will not populate for
company-level actions.
Character
Ann-date Date when the action is announced
Date
Eff-date Date when the action becomes effective. Date
Amd-date Date when the action was updated. Date
Nfields Number of fields NUMBER
< fld-mnemo1 >...< fld-mnemoN > Mnemonic identifying data elements #1 ... #N Varies by action
<Value-1>...<Value-Nfields> Data element #1...#N Varies by action
Name Change: MNEMONIC - CHG_NAME (Change name)
Mnemonic Definition Field Type
CP_OLD_NAME old name of the company Char (30)
CP_NEW_NAME New name of the company Char (30)
CP_NOTES Additional text information. Char (320)
For this, the MNEMONIC field has value of CHG_NAME (Change name)
fld-mnemo1 | Value-1 |fld-mnemo2| Value-2|fld-mnemo-3 | Value-3 | will have the following:
CP_OLD_NAME|Digital Systems Inc|CP_NEW_NAME|Allied Innovations|CP_NOTES|Allied new information |
Example from the file:
D Equity|1500000|0|0|25000000|CHG_NAME
|N|Digital Inc| N.A.|N.A.|N.A.
| | |20070206|20070207|N.A.|3
|CP_OLD_NAME|Digital Systems Inc|CP_NEW_NAME|Allied Innovations|CP_NOTES|Allied new information |
Domicile Change: A change in a company's country/state
Mnemonic Definition Field Type
CP_OLD_STATE The state where an issuer was Char (4)
CP_NEW_STATE The state where an issuer is now Char (4)
CP_ADDRESS The address where an issuer is now located Char (136)
CP_OLD_COUNTRY Old Country Name Char (4)
CP_NEW_COUNTRY New Country Name Char (4)
CP_NOTES Additional text information. Char (320)
For this, the MNEMONIC field has value of CHG_DOM (Change domicile)
fld-mnemo1 | Value-1 |fld-mnemo2| Value-2 .. |fld-mnemo-6 | Value-6 | will have the following:
CP_OLD_STATE|NY|CP_NEW_STATE|VA|CP_ADDRESS|1234 Madison Ave, NY|
CP_OLD_COUNTRY|US|CP_NEW_COUNTRY|US|CP_NOTES|N.A.|
Example from the file:
A Equity|9750000|0|0|24900000|CHG_DOM
|N|ATS Corp|N.A.|N.A.|N.A.
| | |20070129|20070130|N.A.|6
|CP_OLD_STATE|NY|CP_NEW_STATE|VA|CP_ADDRESS|1234 Madison Ave, NY|
CP_OLD_COUNTRY|US|CP_NEW_COUNTRY|US|CP_NOTES|N.A.|
There are some more different vales for mnemonic. It varies by action. I have to load this to a table. Please let me know, how to do this.
June 03, 2008 - 5:21 pm UTC
read about sqlldr.
<quote src = Expert Oracle Database Architecture>
How Do I Load Delimited Data?
Delimited data, or data that is separated by some special character and perhaps enclosed in quotes, is the most popular data format for flat files today. On a mainframe, a fixed-length, fixed-format file would probably be the most recognized file format, but on UNIX and NT, delimited files are the norm. In this section, we will investigate the popular options used to load delimited data.
The most popular format for delimited data is the comma-separated values (CSV) format. In this file format, each field of data is separated from the next by a comma. Text strings can be enclosed within quotes, thus allowing for the string itself to contain commas. If the string must contain a quotation mark as well, the convention is to double up the quotation mark (in the following code we use "" in place of just "). A typical control file to load delimited data will look much like our first example earlier, but the FIELDS TERMINATED BY clause would generally be specified like this:
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
It specifies that a comma separates the data fields, and that each field might be enclosed in double quotes. If we were to modify the bottom of this control file to be
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(DEPTNO, DNAME, LOC )
BEGINDATA
10,Sales,"Virginia,USA"
20,Accounting,"Va, ""USA"""
30,Consulting,Virginia
40,Finance,Virginia
when we run SQLLDR using this control file, the results will be as follows:
ops$tkyte@ORA10G> select * from dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia,USA
20 Accounting Va, "USA"
30 Consulting Virginia
40 Finance Virginia
Notice the following in particular:
* Virginia,USA in department 10: This results from input data that was "Virginia,USA". This input data field had to be enclosed in quotes to retain the comma as part of the data. Otherwise, the comma would have been treated as the end-of-field marker, and Virginia would have been loaded without the USA text.
* Va, "USA": This resulted from input data that was "Va, ""USA""". SQLLDR counted the double occurrence of " as a single occurrence within the enclosed string. To load a string that contains the optional enclosure character, you must ensure the enclosure character is doubled up.
Another popular format is tab-delimited data, which is data separated by tabs rather than commas. There are two ways to load this data using the TERMINATED BY clause:
* TERMINATED BY X'09' (the tab character using hexadecimal format; in ASCII, 9 is a tab character)
* TERMINATED BY WHITESPACE
The two are very different in implementation, as the following shows. Using the DEPT table from earlier, we¿ll load using this control file:
LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY WHITESPACE
(DEPTNO, DNAME, LOC)
BEGINDATA
10 Sales Virginia
It is not readily visible on the page, but there are two tabs between each piece of data here. The data line is really
10\t\tSales\t\tVirginia
where the \t is the universally recognized tab escape sequence. When you use this control file with the TERMINATED BY WHITESPACE clause as previously, the resulting data in the table DEPT is
ops$tkyte@ORA10G> select * from dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia
TERMINATED BY WHITESPACE parses the string by looking for the first occurrence of whitespace (tab, blank, or newline), and then it continues until it finds the next non-whitespace character. Hence, when it parsed the data, DEPTNO had 10 assigned to it, the two subsequent tabs were considered as whitespace, Sales was assigned to DNAME, and so on.
On the other hand, if you were to use FIELDS TERMINATED BY X'09', as the following modified control file does:
¿
FIELDS TERMINATED BY X'09'
(DEPTNO, DNAME, LOC )
¿
you would find DEPT loaded with the following data:
ops$tkyte@ORA10G> select * from dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales
Here, once SQLLDR encountered a tab, it output a value. Hence, 10 is assigned to DEPTNO, and DNAME gets NULL since there is no data between the first tab and the next occurrence of a tab. Sales gets assigned to LOC.
This is the intended behavior of TERMINATED BY WHITESPACE and TERMINATED BY <character>. Which is more appropriate to use will be dictated by the input data and how you need it to be interpreted.
Lastly, when loading delimited data such as this, it is very common to want to skip over various columns in the input record. For example, you might want to load fields 1, 3, and 5, skipping columns 2 and 4. To do this, SQLLDR provides the FILLER keyword. This allows you to map a column in an input record, but not put it into the database. For example, given the DEPT table and the last control file from earlier, we can modify the control file to load the data correctly (skipping over the tabs) using the FILLER keyword:
LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY x'09'
(DEPTNO, dummy1 filler, DNAME, dummy2 filler, LOC)
BEGINDATA
10 Sales Virginia
The resulting DEPT table is now as follows:
ops$tkyte@ORA10G> select * from dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 Sales Virginia
</quote>
Reader, June 04, 2008 - 5:48 pm UTC
Tom,
With regards to the above question, should I be creating one big table, load the data and then split it into different tables?
June 05, 2008 - 9:28 am UTC
do whatever you want, you have no requirements, no design, no need to apparently process this data - so maybe you just load it into a single table and let it be. you've achieved your stated goal "load data into database", you are done as far as I can tell.
table design
ram, June 05, 2008 - 8:24 am UTC
Tom,
We have a table with millions of rows and there is one specifc query which is always interested in a particular row which selects it first and then updates the previous value with a new value. Now as the table is huge this goes for a full scan of the table.Since the table is huge would you recommend to create an index to support the predicate values or will it be better to create a seperate look up table so that the full scan cost is avoided?
Thanks
June 05, 2008 - 10:18 am UTC
... Now as the table is huge this goes for a
full scan of the table. ...
no, you have that wrong. What you mean to type was:
Now, the table is huge and we did not index it with a unique index on the columns we use in the where clause and there are no other appropriate indexes - so it has no choice but to full scan because of what we haven't done (create indexes appropriate for our needs)
Data Model Design Question
Faisal, July 09, 2008 - 4:52 pm UTC
Tom,
I have a data modeling question.
I have three entities.
Zone --> Region --> District (Restricted Heirarchy)
Now I have another entity; store.
The requirement is that store could be in district or Region or Zone. If in distict then must be in valid distict/region/zone. If in region without district then must be in valid region/zone. if in zone without region and district then must have be in valid zone. Store in zone and district without region is not valid.
What could be the best table design in this case.
Thanks
Faisal
July 09, 2008 - 5:19 pm UTC
I don't know why a store, which has a fixed location, wouldn't always be in a district.
This will be nightmarish to "join", it'll be a join to one of three tables.
basically, store would need three attributes
zone, region, district
and you would want to make sure that you have a check constraint that says only one of them can be filled in at a time. If you have a district, we know the region and zone - you would not populate them (else you have to make sure the district and region and zone all "match", so you would just use district)
likewise with region and zone. You would only ever have ONE of the three attributes filled in.
It will be *a mess*
and since a store has a physical location, it seems they all would have districts.
Design the table
Faisal, July 10, 2008 - 3:53 am UTC
Tom,
First of all, thanks for the quick reply.
Actually, my requirement is what I exactly mentioned but my example was not good. You are right, in case of store there is always a physical location but assume I have this requirment so based on your logic; will join be a mess? Can we inforce using FK constraints only? Thanks
July 10, 2008 - 7:31 am UTC
what you call a requirement I call a request.
And I think your request is a really completely bad idea.
And when I see bad ideas in the formation stage, I point them out.
Look - you are in design mode right here, right now. If your design is wrong, bad, incorrect - you want to fix it before it becomes the legacy system.
If you use this three attribute approach, you'll have to outer join your store table to three other tables to figure out where the heck the store is. This falls soundly under the category of a really bad design decision.
be a rebel - point out how bad of an idea this is.
as a developer start thinking of requirements as 'possible feature requests'. When we call them requirements - it makes them sound so "required". When a so called requirement would force you to implement something so horrendously bad -something we know will never perform or be manageable in real life - it is time to question the so called requirement.
In order to do your 'requirement', you would have the three table, three attributes in store (mutually exclusive, if you have a district - then you do not have a zone nor a region and so on), and a 4 table outer join to figure out where the heck something is.
Table Design
Faisal, July 11, 2008 - 6:22 am UTC
Tom,
I totally agree with you but sometime business request or requirment force you to do things out of the way.
How about if I will make store table child of District and create one additional record in District table with description "Not Applicable" for all the regions without Districts and the store without district will point to this record? I think this way there will be no join issues and the whole design will be compact and easy to understand. What do you think?
Zone --> Region --> District --> store
Thanks,
Faisal
July 11, 2008 - 8:25 am UTC
and it is your job to explain to the business "this will cost you 5 billion dollars to implement, are you sure this is a requirement, have you THOUGHT this through, do you realize it does not make sense"
I did not understand your last paragraph there. How would this fix anything?
Now your stores would either belong to a district
Or to nothing - no region, no zone.
and you'd have a dummy district record that does not join to region or zone (orphaned child)
Suggestion re: Zones - Regions - Districts
Stew Ashton, July 28, 2008 - 3:38 pm UTC
Just a thought: do Zones - Regions - Districts have to be distinct entities? Suppose there were just Locations in a hierarchy? Then each store could be assigned to a Location no matter where it stood in the hierarchy, and that hierarchy could be queried using START WITH ... CONNECT BY.
This seems similar to an organizational hierarchy, where employees can be assigned to any organizational unit, not just to the low-level departments.
Not at work so can't illustrate with an example, sorry.
Adding columns dynamically to a table
Sandy, August 07, 2008 - 10:58 pm UTC
Hi Tom,
We have a following requirement:
1) We need to to check for each and every table in one of our schema, certain columns (column1,column2,column3) If these columns are missing in the table, we should alter the table and add these columns dynamically.)
2) this needs to be a generic script as this can be used in future too.
How do i check the missing columns for each table & dynamically alter the table as the table list can change any time?
August 08, 2008 - 1:07 pm UTC
holy cow.
how does one get into a situation whereby they do not actually already know what version of their application (and hence - whether these magical columns exist or not) already? Without even going to a database to ask it?
I don't get it. How could you have a "need" like this.
It is pretty easy to do.
ops$tkyte%ORA10GR2> create table t1 ( a int, b int, c int, d int );
Table created.
ops$tkyte%ORA10GR2> create table t2 ( x int );
Table created.
ops$tkyte%ORA10GR2> create table t3 ( a int, c int, x int );
Table created.
ops$tkyte%ORA10GR2> create table t4 ( b int, d int );
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name,
2 'alter table ' || table_name ||
3 case when has_a is null then ' add a int' end ||
4 case when has_b is null then ' add b int' end ||
5 case when has_c is null then ' add c int' end cmd
6 from (
7 select table_name,
8 max(decode(column_name,'A',column_name)) has_a,
9 max(decode(column_name,'B',column_name)) has_b,
10 max(decode(column_name,'C',column_name)) has_c
11 from user_tab_columns
12 group by table_name
13 having count( case when column_name in ( 'A', 'B', 'C' ) then 1 end ) <> 3
14 )
15 /
TABLE_NAME CMD
------------------------------ ------------------------------------------------------------------------
T2 alter table T2 add a int add b int add c int
T4 alter table T4 add a int add c int
T3 alter table T3 add b int
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
2 for x in (
3 select table_name,
4 'alter table ' || table_name ||
5 case when has_a is null then ' add a int' end ||
6 case when has_b is null then ' add b int' end ||
7 case when has_c is null then ' add c int' end cmd
8 from (
9 select table_name,
10 max(decode(column_name,'A',column_name)) has_a,
11 max(decode(column_name,'B',column_name)) has_b,
12 max(decode(column_name,'C',column_name)) has_c
13 from user_tab_columns
14 group by table_name
15 having count( case when column_name in ( 'A', 'B', 'C' ) then 1 end ) <> 3
16 ) ) loop
17 execute immediate x.cmd;
18 end loop;
19 end;
20 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name,
2 'alter table ' || table_name ||
3 case when has_a is null then ' add a int' end ||
4 case when has_b is null then ' add b int' end ||
5 case when has_c is null then ' add c int' end cmd
6 from (
7 select table_name,
8 max(decode(column_name,'A',column_name)) has_a,
9 max(decode(column_name,'B',column_name)) has_b,
10 max(decode(column_name,'C',column_name)) has_c
11 from user_tab_columns
12 group by table_name
13 having count( case when column_name in ( 'A', 'B', 'C' ) then 1 end ) <> 3
14 )
15 /
no rows selected
Excellent
Sandy, August 08, 2008 - 3:27 pm UTC
That was excellent as usual.
design
A reader, August 09, 2008 - 4:35 pm UTC
this is genius work
Excellent
Sandeep, August 12, 2008 - 1:58 pm UTC
What if i have different column types, it gives me following error.
SQL> begin
2 for x in (
3 select table_name,
4 'alter table ' || table_name ||
5 case when has_a is null then ' add created_on_date date' end ||
6 case when has_b is null then ' add created_by_user varchar2(40)' end ||
7 case when has_c is null then ' add created_by_process varchar2(50)' end cmd
8 from (
9 select table_name,
10 max(decode(column_name,'created_on_date',column_name)) has_a,
11 max(decode(column_name,'created_by_user',column_name)) has_b,
12 max(decode(column_name,'created_by_process',column_name)) has_c
13 from user_tab_columns
14 group by table_name
15 having count( case when column_name in ( 'created_on_date', 'created_by_user', 'created_by_proc
ess' ) then 1 end ) <> 3
16 )) loop
17 execute immediate x.cmd;
18 end loop;
19 end;
20 /
begin
*
ERROR at line 1:
ORA-01735: invalid ALTER TABLE option
ORA-06512: at line 17
August 18, 2008 - 8:51 am UTC
you cannot debug this?
do you see that you used LOWER CASE on the column names and column names are NOT lower case - it is trying to add the same column twice.
14 group by table_name
15 having count( case when column_name in ( 'A', 'B', 'C' ) then 1 end ) <> 3
16 ) ) loop
see how mine used upper case, that is how the data is stored.
Nothing to do with datatypes and if you just run the query to see what is being produced - you would have seen it right off - that it was ALWAYS trying to add three columns....
Design a table
Dakshinamurty Y., August 25, 2008 - 8:09 am UTC
There is a "Order" table. It has "Status" as one of the columns which can be either 'Open' or 'Closed' or 'Error'. The table will always be accessed by giving value to "Status".
The 'Open' records only will be queried for all Update purposes until the record gets 'Closed'
The 'Error' records only will be fetched to rectify erronious order and making it again 'Open'
The 'Closed' records only will be queried for a report purpose and should be kept in the database for 3 years.
The Orders will be coming at the rate of 0.3 million per day. They will get closed with in 7 days.
I am planning to design the table with three list partitions on "Status" column, with local indexing.
Based on the performance for given conditions, I have few questions:
1. Will it be a best design for that table?
2. Will row movement between table rows be hampering performance?
August 26, 2008 - 8:43 pm UTC
...The table will always be accessed by
giving value to "Status". ....
if that is true, you sound OK so far... local indexes will be OK since status='value' will always be used to do partition elimination. You might consider never binding against status column (all others - yes bind, not that one) so that you get the right plan for the right partition.
you might subpartition by date - letting the closed partition have many partitions and the others having just one. To make aging easier.
1) looks ok so far.
2) it will just involve a delete+insert when you modify the key, plan for that. It is neither bad nor good, it is just a fact.
RE: Design a table
York, October 27, 2008 - 5:58 pm UTC
This is an excellent thread. Thanks TOM.
Dakshinamurty Y posted a question related partition on Aug 25, 2008. He asked if it is OK to partition on status column.
My answer is NO, if the status column needs be to updatd (from open to closed / error). When the partition key is updated, you could get following error: "ORA-14402: updating partition key column would cause a partition change "
-------------------
create table t(c1 varchar(10), c2 varchar2(20))
partition by list (c1)
(
partition p01 values('A'),
partition p02 values('B'),
partition p03 values(DEFAULT)
);
insert into t values('A','1');
insert into t values('A','2');
insert into t values('B','100');
commit;
select * from t;
update t set c1='B' where c2='1';
-- error message: "ORA-14402"
October 27, 2008 - 8:41 pm UTC
enable row movement.
ops$tkyte%ORA11GR1> create table t(c1 varchar(10), c2 varchar2(20))
2 enable row movement
3 partition by list (c1)
4 (
5 partition p01 values('A'),
6 partition p02 values('B'),
7 partition p03 values(DEFAULT)
8 );
Table created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> insert into t values('A','1');
1 row created.
ops$tkyte%ORA11GR1> insert into t values('A','2');
1 row created.
ops$tkyte%ORA11GR1> insert into t values('B','100');
1 row created.
ops$tkyte%ORA11GR1> commit;
Commit complete.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select 'p01', t.* from t partition (p01) union all
2 select 'p02', t.* from t partition (p02) union all
3 select 'p03', t.* from t partition (p03)
4 /
'P0 C1 C2
--- ---------- --------------------
p01 A 1
p01 A 2
p02 B 100
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> update t set c1='B' where c2='1';
1 row updated.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select 'p01', t.* from t partition (p01) union all
2 select 'p02', t.* from t partition (p02) union all
3 select 'p03', t.* from t partition (p03)
4 /
'P0 C1 C2
--- ---------- --------------------
p01 A 2
p02 B 100
p02 B 1
design
A reader, October 28, 2008 - 1:29 am UTC
Tom:
If you are tracking information on customers, vendors, suppliers, etc and they all have same information (name, email, address ,phone ,etc) would you create one table for all with a flag to specifiy the record type (C, V, S) or would you create one table per entity.
I prefer one table each but i have seen many designs with the first approach.
October 28, 2008 - 7:06 am UTC
I would go with the option, "none of the above" really
You have a super type - that has all of the common attributes (name, email, address, phone, etc)
You have subtypes - they have the attributes specific to a customer, a vendor, a supplier.
You do not use an attribute to say "this is of type X", you use subtypes. The problem is - I can be a customer, a vendor and a supplier simultaneously.
The subtypes are modeled as separate entities and may be physically implemented either as separate entities (tables) or as optional attributes of the supertype (since there is a one to one relationship here)
Eg, either:
create table t ( KEY, name, email, address, phone,
<customer attributes>, <vendor attributes>, <supplier attributes> );
or
create table t ( KEY, name, email, address, phone )
create table customer( KEY fk to t and primary key, .... );
create table vendor( KEY fk to t and primary key, .... );
.......
table design
sam, October 29, 2008 - 11:17 pm UTC
Your one table solution with optianal attributes for each subtype matches with what I said (one big table)
But how are you going to filter all supplier or all customers if you do not have that "record type" flag?
We use that design in an old database but i find it sometimes diffult to deal with and we ended up with 100 column table.
Is it bad design when you have one table for customers, one table for suppliers. one table for employees, etc.
October 30, 2008 - 8:10 am UTC
with optianal attributes for each subtype matches with
what I said (one big table)
Nope, you said "have a flag", there will be no "flag"
...
But how are you going to filter all supplier or all customers if you do not
have that "record type" flag?
......
A customer must have some attribute that will not be null if they are a customer (else, all of the customer attributes will be null)
We use that design in an old database but i find it sometimes diffult to deal
with and we ended up with 100 column table.
then use the supertype/subtype tables - that was presented as option 2
Is it bad design when you have one table for customers, one table for
suppliers. one table for employees, etc.
not if you do it as I described - with a subtype/supertype table - you want the common attributes in a single table, do not be afraid of joins.
design
sam, October 30, 2008 - 10:29 pm UTC
Tom:
What do you mean by this:
<<A customer must have some attribute that will not be null if they are a customer (else, all of the customer attributes will be null) >>
Yes i have org_id (Sequence no) for customer. But how does this tell that he is a customer.
I am using ORG_ID for customer id, supplier id, employee id, etc.
I need ORG_TYPE = C to tell me this is a customer
ORG_TYPE = S to tell me this is a supplier
2. Can you tell me the problem with this design. is it from a reporting prespective.
customer (cust_id,cust_name,cust_email,cust_addr,cust_city)
supplier(supp_id,supp_name,supp_email,supp_addr,supp_city)
Assuming that a customer will never be a supplier and vice versa.
November 02, 2008 - 4:22 pm UTC
in a single table, you would have
create table whatever
( attributes_all_have,
attributes_of_a_customer, -- there must be a not null attribute of a customer
that is what I mean. there must be something that says "I am a customer", not a flag, some attribute of a customer.
what happens when a person is a supplier, customer and employee all at once?
when I went to college I was
a) a student
b) an employee
at the same time....
Primary keys selection
pranav, March 04, 2009 - 10:43 am UTC
Hi Tom,
I was not sure where to post a new question. So I am adding it here to the existing one.
I was trying to find what is the ideal selection of Primary keys -
Are numbers better than strings, if so why?
Does the length of number or strings matter when used as primary keys?
From a performance point if i use string as primary keys my number of join decreases, and if i use a surrogate key the number of join increases as my query criteria is string -
for example
table a - aid primary key, aname varchar2(30)
table b - bid primary key, aid foreign key (to table a), bname varchar2(40)
now the query comes for aname = 'abc' find all the b information. if i do it by one query than i need to join to table a by aid (which includes a join), however if i have the string as primary key (aname here), i don't need a join!
Hope it is clear
Thanks
Pranav.
March 04, 2009 - 1:41 pm UTC
If you have a natural key, use it in general (unless it is obviously not useful as the key - eg: it has 32 attributes, that would be not sensible). String or number or date - I don't care - just use it.
If you do not have a natural key, your choices would be
a) number - sequence populated.
b) raw(16) - sys_guid populated
probably (a)
In your example, is ANAME a natural key? That is
a) not null
b) unique
c) IMMUTABLE
if so, then the choice of a surrogate was wrong. It one of a,b,c is not true, then the choice of aid was right.
Oleksandr, March 05, 2009 - 3:42 am UTC
Natural keys have to be used with big caution - virtually any natural key may mutate with time, causing serious maintenance problems, especially when such mutations involve all or nearly all rows (e. g. to all taxpayers are issued identification numbers of a new kind, as this year in Germany).
March 05, 2009 - 1:50 pm UTC
... virtually any natural key may
mutate with time ...
nope
create table date_dim ( the_date date primary key, the_year, the_qty, the_fiscal_year, the_fiscal_year_qtr, the_day_in_the_year, .... );
the_date will never 'mutate'
create table doc( id number primary key, .... );
create table doc_version( id number, version, primary key(id,version) );
doc_version has a natural key....
There are many natural keys that do not change - and it was pointed out that the natural key must be immutable, never changing in value - if there is a chance it does, you do not have a natural key.
Reg. selection of primary keys
pranav, March 05, 2009 - 9:29 am UTC
Hi,
Thanks for the prompt response. Appreciated your time
Rgrds
Pranav.
the_date will never 'mutate' ?
A reader, April 14, 2009 - 5:10 am UTC
Yes, this is why I have written not "any", but "virtually any" - to underline that some rare exception may exist.
But under some circumstances even a date used as natural key may mutate - for example assume that data were timestamped by some device (e.g. cash register) with a internal clock and it appeared that clock was misaligned and so date values need to be corrected. Not a common scenario, but not completely impossible.
As for your second example - if a version number is generated by a system (an application or DB), then it is not a "natural key" in a strict sense, rather a surrogate key exposed to a user. If it is produced externally, then there is no guarantee that it would not change with the time (e.g. user has created versions numbers not in accordance with the company policies and was ordered to align them). As well not very common scenario - but not completely impossible.
April 14, 2009 - 11:40 am UTC
for a date dimension - it would NEVER CHANGE.
Think about it, think about the date dimension table, which is what we are talking about here.
You have a primary key of THE_DATE, other columns would be "THE_WEEK_YEAR", "THE_MONTH_YEAR", "THE_QTR_YEAR", "THE_YEAR" etc (things DERIVED from THE_DATE, functionally dependent data)
If the wrong "the_date" was entered, it was entered in the fact table, you would be updating a foreign key in the fact table, you would not be touching the dimension (except MAYBE to add this new THE_DATE value if it did not exist)
Table Design
Ravi, April 15, 2009 - 12:01 pm UTC
The required design is to store the addresses of a person.
Can you please explain the advantages & disadvantages of these two approaches
in terms of design, performance, maintenance, best practice ..etc.,
(pls add if I am missing any ...)
Approach#1 :
create table person(person_id number(3) primary key, name varchar2(10));
create table person_email_address(person_id number(3), address_id number(3), email varchar2(50))
alter table person_email_address add primary key (person_id,address_id)
alter table person_email_address add constraint fk_person_id foreign key(person_id) references person(person_id)
create table person_postal_address(person_id number(3), address_id number(3), postal_address varchar2(50))
alter table person_postal_address add primary key (person_id,address_id)
alter table person_postal_address add constraint fk_pperson_id foreign key(person_id) references person(person_id)
Approach#2
create table person(person_id number(3) primary key, name varchar2(10));
create table addresses (person_id number(3), address_id number(3), address_type_id number(3), address varchar2(500))
alter table addresses add primary key (person_id,address_id)
create table address_type(type_id number(3) primary key, name varchar2(50))
alter table addresses add constraint fk_type_id foreign key(address_type_id) references address_type(type_id)
Thanks so much....
Ravi
April 15, 2009 - 2:13 pm UTC
... The required design is to store the addresses of a person. ...
that is entirely insufficient to design anything.
but #2 looks like it more likely closer to correct - the first one, with a table per type of address - not a chance.
Why is this insufficient? Because there are likely more requirements - for example, you must store addresses AND a person must have at least one and at most one 'default' address, the address to which we mail stuff to. If that is the real requirement, I'd have different ways to do that.
Many to many design - performance consideration
A reader, July 27, 2009 - 3:43 am UTC
Tom,
In your recommended "many to many" design;
"med_to_dosage ( med_key references med, dosage_key references dosage, primary
key(med_key,dosage_key) );"
If I need to access this table both ways i.e. sometimes going in by med_key only, sometimes going in on dosage_key only, would you say make this table into an IOT based on the current PK and have an extra UNIQUE index on (DOSAGE_KEY,MED_KEY) in order to prevent having access to the "table" at all?
Or is there a better table structure you would recommend?
July 27, 2009 - 5:54 am UTC
probably not - an IOT is good if the underlying data either
a) is relatively static/read mostly as you would find in a report/warehouse database
b) doesn't need a secondary index
here you would need a secondary index and the data would be changing frequently. Secondary indexes would need constant attention to ensure they do not turn into two range scans for each range scan.
In an IOT, rows do not have rowids - secondary indexes on IOT's do not always have a rowid in the classic sense, they can become somewhat inefficient if the underlying data changes a lot.
How to balance the in-line and out-line storage for a LOB object?
jim Qian, July 27, 2009 - 2:50 pm UTC
Hi Tom,
There is a large column in my table. this column is RAW datatype and its size is 2000 bytes. I think if I change this column to BLOB datatype and put it outside the table
(out-line storage), I can get better performance because there is some related DML action with this table. Do you think this is a good idea?
July 27, 2009 - 8:30 pm UTC
doubt it - the lob locator is relatively large - not 2000 bytes, but large. And then you would have a lob index to maintain and a lob segment which would be huge and wasting space since each lob uses at least one CHUNK and the smallest chunk is a single block. So, instead of getting multiple raws on a single block, you would get a raw per block, a new index to maintain, and the work of traversing from the table block to the lob index to the lob segment to retrieve.
In the end - it comes down to this:
benchmark it before you do anything.
How to balance the in-line and out-line storage for a LOB object?
Jim Qian, July 28, 2009 - 12:29 pm UTC
Hi Tom,
Thanks for the prompt response. Appreciated your time.
I am continuing the same question.
I should give you more details:
The RAW datatype column is infrequently accessed. It is inserted and copied to other tables and then deleted. The Values of the raw column are not changeable and values are same in many tables (duplicated columns). It is only accessed by some applications.
July 28, 2009 - 8:53 pm UTC
did you read my response above? read it again, particularly the last line. I'll only repeat myself if I answer again.
Nice knowledge.....
Apisit, July 29, 2009 - 6:20 am UTC
more thoughts
Mark Wooldridge, July 31, 2009 - 4:22 pm UTC
Responding to Ravi's question of having a seperate table for different address types....
The values for a postal address are different than the values of an email address (i.e. the postal address will probably include street names, city, state and zip codes while an email address has a name and a domain).
Just something to think about when needing to query the data and answer questions like how many domains do I have in emails addresses or how many people by state do are there.
August 04, 2009 - 12:26 pm UTC
? I'm not sure how to rectify "email address" and "postal address" as to me they are as different as "first name" and "date of birth"
Expert Oracle By Design: Design for Writes or Reads
A reader, January 22, 2010 - 4:20 am UTC
If i understood correctly you specified at page 38 of Oracle by Design that when every user inputs his appointment you created that many records in tables for example if user inputs appointment from 05-02-10 to 19-02-10 with repeat type weekly and interval 1 you will create three records for dates 05,12,19.
Now if you have created these three records then when user comes to update his appointment and wanted to change interval to 2 then he has to update all three records or your application will internally update all these three records.
January 25, 2010 - 1:45 am UTC
of course, we would update them.
the user can either update the current record only, the current record and all future records or all records - just like on the palm pilot. We do the rest.
A reader, January 25, 2010 - 7:55 am UTC
then if the user has created appointment for a year and he wanted to update it then we have to update 365 records?
January 29, 2010 - 8:22 am UTC
yes, so?
You update an appointment once, you read them thousands of time.
The major thrust of what was said in the document was that - this system needs to be optimized for READS, and it is running on a system that has tons of memory and scads of diskspace.
The point was that the storage technique you used on a palm pilot to store repeating appointments was great for the palm pilot but it STANK for most anything else. on the palm you had
a) very very very small memory - in kilobytes or sometimes a whopping megabyte.
b) very very slow processors, but all dedicated to a single user
c) no need to in an ad-hoc fashion
d) a LIMITED amount of history - the data for old appointments was rapidly disappeared
On the database you had
a) plenty of memory
b) pretty fast processors
c) the need to ad-hoc query CONSTANTLY
d) the need to maintain history forever
e) the need to show what was happening today, this week, this month billions of times per day (not a single user)
f) the need to schedule groups
g) and a lot lot more
The needs were extremely different! Try to write a SQL query to give you all of the appointments this month - in rows - if what you have is a single record? It is hard, it is inefficient.
The right approach was to take the single record upon insert and..... make it be as many as you needed. Turn it into rows. No, display is trivial, search is trivial. Finding out what 20 people have scheduled at 3pm next tuesday - easy as can be.
Store a single row for repeating things and all of the above become a nightmare (enter the middle tier, we'll just do everything there and we'll need 2,000 of them to do it!!!)
Updating 365 rows - takes how long? (answer - as fast as you blink).
Querying data that isn't there (because you hid it in a fancy dancy structure)? (answer - longer than you can imagine)
David Aldridge, January 27, 2010 - 3:08 am UTC
I like the "create every occurance" method, but one problem would be the difficulty of supporting repeating events with no end date (however unrealistic that is in real life, it is often supported as a user convenience -- nobody wants to give a date at which they want to stop being reminded of their grandfather's birthday for example), or very frequent events (eg. brush teeth three times a day for the next 80 years). I think that it leads to a disproportionate amount of storage usage -- imagine that a user testing the system creates an event with a few hundred thousand occurances. that's your high water mark pushed out a fair bit.
Also, each event record still needs to be associated with a record that gives the specification of the original event and its repetition pattern so that they can all be updated together if the original specification is changed.
Storing every occurance would do for a first pass, but I think a more complex solution would be needed for a full-featured calendar.
January 29, 2010 - 3:04 pm UTC
... Storing every occurance would do for a first pass, but I think a more complex
solution would be needed for a full-featured calendar. ...
but it didn't, it worked for years for thousands of users. We set a reasonable maximum of "calendar goes out 10 years" and then had a job that would fill out the results into the future on a periodic basis.
and since it is a database, running such a job inside the database was trivial :)
My current palm pilot goes for a maximum of 2031 - 10 years floating out wasn't a problem at all.
A reader, January 27, 2010 - 5:55 pm UTC
Hi Sir;
Is there a side effect for using "number" datatype for "date" column?
If so, what is the disadvantage?
January 29, 2010 - 3:24 pm UTC
yes, it ensures you will someday have the date 29-feb-2001 in your database
which of course isn't a valid date.
please think about this, what GOOD could come from using the wrong datatype?
Maybe I'll use a varchar2 for your salary field - just yours. So I can stick in there 'one million dollars', which looks really neat but tends to get you a zero paycheck.
To "a reader"
Adam Hawkes, January 29, 2010 - 10:20 am UTC
Data integrity would be the #1 reason to use the proper data type. That's kind of important, you know. There are probably a million "bad" side effects, and ZERO positive ones.
Heck, why not make everything VARCHAR2(4000)?
sigh
A reader, February 11, 2010 - 4:58 pm UTC
Is there any performance overhead of definining
using bigger datatypes, such as varchar2(4000) rather than varchar2(40)
February 16, 2010 - 11:12 am UTC
sure
sometimes you cannot index them - we presume they are too large. That could be a problem.
a client program given:
select c1, c2, c3, ... c10 from t
where the average length of c1..c10 would be 40 characters (some are 120, some are 10 and so one) would have a maximum of 40x10 = 400 bytes. Using an array fetch size of 100 rows (typical size), that would be 40,000 bytes of fetch buffer. Now, do the math for a 4000 character average size (4000*10*100 = 4,000,000). That could have an effect on you.
a client program describes the query and sets up a report based on the size - that could be a problem
forget the performance argument though, I would laugh at you if you brought it up as a reason to use or not use 4000 for everything.
If you used it as an argument for using 4000 (premise: no performance hit, let's use 4000), then I would know you've never really done data processing before, you've never maintained code over the long haul - in short, you haven't had any experience yet.
If you used it as an argument for NOT using 4000 (premise: there is a performance hit, do not use 4000 for that reason), then I would know you've never really done data processing before, you've never maintained code over the long haul - in short, you haven't had any experience yet.
(yes, same reason.... for both)
Because the real answer is: you use the right type for your data, you use constraints to validate your data - and a datatype is nothing more than a constraint and your scale/precisions (lengths) are too. You do it because it is the proper, professional, correct, useful thing to do.
Remember, you can always make the length larger with a simple alter in no time at all. You cannot do the other way (make them smaller, correct the mistake) without a reorganization in most every case. And, if you make them 4000 characters, you'll have 4000 characters in some field, some day.
table design
A reader, February 16, 2010 - 11:54 am UTC
Excellent!
Variant record tables and PKs
Gav, April 13, 2010 - 11:05 pm UTC
Tom,
Some previous advice in this thread where John from New York had 'variant' type records asked whether it was better to:
1. Create one ALERT_RECORD table, and create columns that will be needed to
capture different types of alert detail information. ...
2. Create one ALERT_RECORD table for each alert type....
Your advice:
#1 is the easiest
#2 can be thought of as a variation on #1 -- think "views"
Is it valid to set up one main table and a view for each variant, but without a PK on the main table?
The views can be given 'virtual' PKs as follows:
CREATE TABLE record_table
(variant_type,
variant1_id,
variant2_id,
variant3_id,
variant1_data,
variant2_data,
variant3_data)
Create a check constraint to make sure that there is a valid record id for every record:
CONSTRAINT record_table_constraint CHECK
(CASE
WHEN variant_type in (variant1_type) and variant1_id IS NOT NULL then 1
WHEN variant_type in (variant2_type) and variant2_id IS NOT NULL then 1
WHEN variant_type in (variant3_type) and variant3_id IS NOT NULL then 1
ELSE 0
END =1),
Create a unique index on each of the variant_ids. A unique index will make sure that each variant_id is not repeated, however since it is only based on 1 column then it will allow NULL values for when the data is of a different variant.
CONSTRAINT variant1_ix UNIQUE (variant1_id) USING INDEX,
CONSTRAINT variant2_ix UNIQUE (variant2_id) USING INDEX,
CONSTRAINT variant2_ix UNIQUE (variant2_id) USING INDEX;
Create views on this table to show the data for each particular variant.
CREATE VIEW variant1_view AS SELECT
variant1_id, variant1_data
FROM record_table WHERE
variant1_id IS NOT NULL;
repeat for 2 & 3...
There is no PK on the main table, but each view has a unique, non-repeating key field.
The only real PK that could be used would be a surrogate one, but I can't see the point in the overhead if it's never going to be used. Most guidance says to put a PK in, but is it valid to omit it here?
Thanks,
Gav
April 14, 2010 - 9:03 am UTC
My point was - always 1 table, if you need the illusion of more than one table - use views on that single table.
But every table - every. single. table. will have a primary key.
I don't understand what you are trying to accomplish with this "virtual" thing?
why would you have variant ids????????????
You are missing the design, the table would be:
alert_records(
primary_key ...,
common_data ...,
data_for_alert_1_type,
data_for_alert_2_type,
.....
)
there would be ONE key, one common set of data and then the data bits would be filled in as appropriate.
Variant Record Views and PKs
Gav, April 14, 2010 - 6:30 pm UTC
Sorry Tom, I thought I'd explained everything well enough..but it's easy with hindsight to see where the holes in my explanation are.
The application provides natural ids (the variant_ids) for every record - they are guaranteed unique and so would be enough to be the primary key if each record type were in its own table. So if we were using a table for each record type we would not need surrogate keys.
But when all the record types are in 1 table (which they are, with a structure much as you suggest), there is no natural primary key - I would have to create a surrogate one.
However, when the views are set up, they still have their own ids present and so every row in each view has a not-null, unique identifier (what I called a 'virtual PK') even though the underlying model does not actually have a primary key. I was wondering whether that would be enough without putting a surrogate primary on the underlying table.
I can certainly create a surrogate key....and given your statement (every.single.table) it sounds like it would be wise to do so.
Thanks,
Gav
April 15, 2010 - 8:05 am UTC
... The application provides natural ids (the variant_ids) for every record - they ..
then, the application is coded incorrectly - they are variant records, they are all of the same type. If you did this with many tables, it would look like this:
create table alert_common( id number primary key, data );
create table alert_1( id references alert_common primary key, data );
create table alert_2( id references alert_common primary key, data );
and so on - they would all be pointing to a common table by its primary key.
so, the model you are considering is wrong in many ways - the key would be unique across ALL the subtypes - because they are subtypes of a supertype and the supertype is the holder of the key.
table design
sam, April 15, 2010 - 11:23 am UTC
Tom:
it depends (like you always say).
There are some cases that provide a good reason to split 1-1 table.
For example, if you have a master EMPLOYEE table with 500,000 records.
A new requirement for NEW EMPLOYEES needs 30 more fields that that does not apply to old employees
It make sense to create another 1-1 table and preserve current performance in queries.
Another consideration is data that isn't present on all records. This is tough to judge sometimes though, because different database systems handle missing data in different ways. So it could be less efficient to add extra columns to the main table if many of them will be empty, rather than adding a new table for that data. But it might not save much either.
I find it sometimes very confusing to look at a 200 column table. I understand you can create views, but we mostly update tables directly. We do not use views for inserts/updates. If you add views, there is more maintenace and more objects and you may end up with hundreds of VIEWS for inserts/updates.
April 15, 2010 - 12:31 pm UTC
it would not make sense for performance reasons.
The existing records would be 100% unaffected by this - they would not change at all, trailing null columns consume 0 bytes of storage.
All you would achieve by using two tables would be to damage performance going forward. You would have to JOIN to this new table - presumably for both OLD and NEW employees since you don't know before you GET a record if they are old or new. And - one would presume that going forward - OLD records will be treated like NEW records - that is, eventually many or most of the records will have all fields
So no, I 100% disagree with you on this.
And please remember, I did say "in almost all cases",
never say never
never say always
I always say
but - your example - no, it demands a single table.
Another consideration is data that isn't present on all records.
by definition, in our example above, that is definitely
a) the case (the data isn't present on all records, that is the CASE)
b) the point (that is what we were talking about)
and for a subtype/supertype relation - you have that.
I find it sometimes very confusing to look at a 200 column table. I understand
you can create views, but we mostly update tables directly.
I'm always confounded by statements like this. It is akin to saying:
We know there is a way to solve our problem, but we purposely refuse to do it.
If you add views, there is more maintenace and more
objects and you may end up with hundreds of VIEWS for inserts/updates.
did you think about that before you wrote it???
What do you think might be harder to manage?
a) hundreds of tables with their associated storage, an ongoing maintenance thing
b) hundreds of bits of metadata that need no looking at after they are created
I don't know about you - but I would choose (b) in a heartbeat.
Somehow, I knew this was Sam without even reading the name...
table design
Upkar, May 17, 2010 - 3:37 am UTC
Tom,
I have a similar problem as "Gav from Australia" and "John from New York":
The table structure suggested in earlier posts is:
alert_records(
primary_key ...,
common_data ...,
data_for_alert_1_type,
data_for_alert_2_type,
.....
)
My problem is in a different domain(items/tasks), but explained using the alerts analogy. Here's what is required
1) In my application, users need to be able to perform SQL operations from the UI using a Seach screen. The SQLs can be on common fields as well as on alert specific fields (in the latter case users will choose alert type on screen drop down). These SQLs need to be fast and storing alert specific data in XML may not work out simply because of the XML parsing overhead involved (for the search).
2) There are 11-12 known alert types at this stage in design, each of which can have 30-40 unique fields.
Will the sum total count of alert specific fields required in my application go over 1000 in future? Well, it might in future if more alert types are added...
3) I have UIs where common attributes are displayed to the users. Also, user can select an alert id, and will then be able to drill down on individual element's details (each alert detail page on the UI will be configured differently).
4) A few of the alert types may have more fields in common than others.
In such a scenario, what would you suggest other than using any of the three approaches below:
1) Separate tables for each alert type
2) The alert_common option
create table alert_common( id number primary key, data );
create table alert_1( id references alert_common primary key, data );
create table alert_2( id references alert_common primary key, data );
3) Have generic attributes - attribute_1, attribute_2 etc. each attribute has a different context for a different alert type. But then, i'm forced to have large number of columns with generic data types and lengths (with their original meaning obscured).
I am thinking of having alert_common/alert_1/alert_2... tables kind of design. From my understanding of your replies earlier, we should always have one table - and use views on that single table. But, that may not be useful here. Please let me know your thoughts.
Thanks
May 24, 2010 - 7:35 am UTC
1) xml doesn't have to be parsed at runtime, you can in fact index it - store it in binary parsed form - etc.
But you know what this ultimately means? It means you must have some idea at application compile time (install time) what to index etc - that is, you might have to actually figure out what elements you need to store...
2) ok, that would ultimately force you because of a 1000 column limit into multiple tables. Unless you multiplex those 30/40 attributes into an xmltype which can be (see #1)
3) ok, not relevant to the discussion
4) ok, see #3
Now, to the 'questions'
1) I'd use a single table for as long as I could. As stated.
2) see #1
3) not a fan
Alter table
satin satin, June 08, 2010 - 2:14 pm UTC
Hello Tom,
I dont know it is right place to post this question.
We are using Oracle 10 g. My question is there any strong reason we cannot add the column in the middle of table. It In any way it is good for readbilty sake.
Let us say I have cust_name,addresss ,city all together and cust_state at the end after 60 columns and added this column after it deployed to production.
Withoout using select or drop table clause can we re-arrange them ? Why cannot we do this in oracle
Thanks
June 10, 2010 - 10:24 am UTC
there is a strong, good, valid reason. It is simply:
because no such facility exists.
This is what views are all about. First - there is really no such concept as "order of columns in a table" - it just happens that 'select * from t' will resolve the columns in order of creation (by accident, you cannot really rely on that, it is not a documented fact, it is by empirical observation that we notice this)...
Second, you use views to present to users whatever 'view' of reality you would like.
create table t ( x int, z int );
alter table t add y int;
create view v as select x,y,z from t; <<<<=== let people use this.
if you are worried that the table name should stay the table name - then just:
rename table t to t_table;
create view t as select x,y,z from t_table;
and they'll never know.
reader
A reader, June 09, 2010 - 5:33 pm UTC
Few years ago at this site, I found a dictionary view ( I think ALL_*) that lists information very similar to "desc table_name" format for all the tables (not dba_tab_columns)
I am not sure if this is the right thread, but I do not seem to remember the view
I like to find the name of the view, if you could help
Thanks
Design a table
A reader, March 29, 2011 - 1:14 am UTC
Hi Tom,
I have a situation where i have to design a table ,where in future there will be lots of insert and delete but no updates, so what things i need to consider while designing a table.or we can take 3 different cases .
a) design a table for lots of insert and delete only.
b) design a table for lots of selects only .
c) design a table for lots of insert,delete,updates only but no selects .
Can you give me the few points for each scenerio.
Thanks.
March 29, 2011 - 4:25 am UTC
a) if the table is going to hit a steady state size, nothing much needs to be taken into consideration, nothing special. If the number of inserts far exceeds the deletes, you probably want to consider partitioning from the beginning for ease of administration as the table will get larger and larger over time.
b) nothing - beyond the normal. Do you want to partition, do you want to cluster, do you want to IOT it, do you want to compress it, etc. The same thing you do for every table of course :)
c) don't keep that table, it is not useful - you never select from it????
The considerations are pretty much the same for all of them - you think about pretty much everything, all of the time.
Need more info
Rick, March 30, 2011 - 9:54 am UTC
We get it, lots of inserts and deletes; however, what is anticipated select activity. What is the *main* type of access to the table your business will use? And remember: You can write fast, you can read fast, but you can only do both half fast! If main use will insert/delete, then fewer indexes and selects will be slowed somewhat. If main use is select, then more indexes and insert/delete will be slowed somewhat.
April 12, 2011 - 9:14 am UTC
you can read and write fast? why couldn't you?
design tables
sam, June 27, 2011 - 4:54 pm UTC
Tom:
I want to store invoice information. I am creating 2 tables: one for invoice header data (i.e invno, date, contract, status, total amount)
and invoice_line_items (i.e invno, magno, prstage, unit cost, units, total_amount)
Is it ok to store unit costs and units in the detail record after lookup from a contract lookup table or it is more standard to link to the lookup table for this data?
I am concerend the data may change during the year and I want to store the values at the time invoice was created.
Also, is it ok to store Invoice Line Item Amount in the detail table and invoice total in the invoice header table or that has to always be calculated since it is derived data?
June 28, 2011 - 10:54 am UTC
Is it ok to store unit costs and units in the detail record after lookup from a
contract lookup table or it is more standard to link to the lookup table for
this data?
it depends.
IF the unit_cost and units are to be "as of the point in time you query" (meaning, if the invoice was created today - and you read that invoice out next year - the unit_cost should reflect the price as of that point in time - a year from now), then you should leave the unit cost and units in the lookup table (to avoid having to update cascade)
IF the unit_cost and/or units are to be fixed as of the point in time the invoice is filed - then you would want to carry them over - so that a year from now, when the price of something goes up - you have saved the value that really applies to the invoice.
So, probably, you need to carry that stuff over as the prices are probably fixed as of the point in time you cut the invoice - whereas the lookup reference price will probably fluctuate over time.
table design
sam, June 28, 2011 - 3:02 pm UTC
Tom:
The customer claims it hardly ever changes but I am pretty sure they will change it in a few months. so i am going to carry it over to the invoice table as a safety net.
I can't see how UNITS and UNIT COSTS would not be "Fixed" at the time you invoice unless there is a case for correcting those data elements later on.
What about the Line Item Total and Total invoice amount? Is it ok to store those derived values or you have to compute that everytime someone queries the invoice?
June 28, 2011 - 4:16 pm UTC
if it changes *ever* - and you need the OLD value to be applied to the invoice, the value that was in place should be copied.
You could take a slowly changing dimension with effective dating as well, but I think that is unnecessarily complicated.
Knowing what I know - meaning, I only know what you've told me so far - I'd be inclined to move the cost information into the invoice.
I'd change my mind if the requirement became
"we sometimes change a price and need that change to be reflected in the existing invoice, and we sometimes change a price and do not want that change reflected in existing invoices".
If that were true, then we'd look at effective dating the records and doing a join to get the unit cost.
design table
sam, June 28, 2011 - 10:24 pm UTC
Tom:
Great help on the desing but what about this
What about the Line Item Total and Total invoice amount? Is it ok to store those derived values or
you have to compute that everytime someone queries the invoice?
June 29, 2011 - 12:01 pm UTC
I would tend to compute those as needed as invoices are typically updatable. Derived information, in general, should not be pre-computed and stored in a live system. In a warehouse, in a reporting system - maybe (materialized views are good for that)
pls think
Venkata, June 29, 2011 - 5:26 am UTC
@Sam:
Just think before you shoot a query. precomputed values need columns to store and so is the disk space required.
@Tom:
Does the query performance vary depending on the no.of columns in a select statement/ a table
June 29, 2011 - 12:13 pm UTC
well - the space isn't the issue, space is cheap. The fact that the underlying data is mutable is what you need to be concerned with. There are pro's and con's to precomputing the values - and in this particular case, I would lean towards not precomputing - but in others I might lean towards it. It depends.
Yes, the query performance may vary widely - that should be obvious. If you need to full scan a table with 1,000 char(2000) columns - that'll take longer than full scanning a table with 10 number columns. If you need to select the last column from a table with 1000 columns via an index - that'll take longer than fetching the first column from that table with 1000 columns (we have to read the entire row to get the last column - we only read the first column to get the first column). If you need to return 1,000 columns from the server to a client - that'll take longer than retrieving just the first column. If you replace a 'select * from table' with a 'select just,the,columns,you,need from table' - you might find that we can get everything we need from an index instead of having to go to the table. And so on.
Table Design for Address Entities
Nand Kishore Sagi, June 29, 2011 - 6:11 am UTC
Hi Tom,
While on the table design, we have a situation where we need to store multiple addresses (Minimum:1 Address Maximum:8 Addresses to be stored). For example I need to store these two addresses
(1.)500 Oracle Parkway,
Redwood Shores, CA
94065
(2.)1910 Oracle Way,
Reston, VA
20190-4733
The addresses belong to a single individual by name Oracle :). The tables needed to store these addresses can be designed in at least two different ways as shown below.
Option 1CREATE TABLE ACCT_ADDRESS
(
accountId NUMBER NOT NULL ,
addressType NUMBER ,
zip VARCHAR2(10 CHAR) NOT NULL ,
zip4 VARCHAR2(4 CHAR)
);
CREATE TABLE ACCT_ADDRESS_LIST
(
accountId NUMBER NOT NULL ,
addressType NUMBER,
addressSeq NUMBER NOT NULL ,
address VARCHAR2(100 CHAR) NOT NULL
);
The two addresses when stored would look like this.
ACCT_ADDRESS Data1,1,'94065',NULL --HQ Address
1,2,'20190','4733' --BR Address
ACCT_ADDRESS_LIST Data1,1,1,'500 Oracle Parkway' --First entry of the HQ address
1,1,2,'Redwood Shores' --Second entry of the HQ address
1,1,3,'CA' --Third entry of the HQ address
1,2,1,'1910 Oracle Way' --First entry of BR Address
1,2,2,'Reston' --Second entry of the BR address
1,2,3,'VA' --Third entry of the BR address
Option 2create table ACCT_ADDRESS
(
ACCT_ID number not null,
ADDRESS_TYPE number not null,
STREET varchar2(100 char) not null,
CITY varchar2(30 char) not null,
STATE varchar2(2 char) not null,
ZIP varchar2(5 char) not null,
ZIP4 varchar2(4 char),
constraint UK_ACCT_ADDRESS UNIQUE (ACCT_ID, ADDRESS_TYPE)
);
In this scenario the data in ACCT_ADDRESS would look like this.
ACCT_ADDRESS Data1,1,'500 Oracle Parkway','Redwood Shores','CA','94065',NULL
1,2,'1910 Oracle Way','Reston','VA','20190','4733'
The apparent advantage with option 1 is that it can accommodate non standard addresses also but the disadvantage would be on processing the data. Whereas with option 2 processing of data would be easier but the flexibility of storing non standard addresses is lost.
The source of this data is a new GUI screen that is yet to be developed for our application.
I would like to know as to which of the above two options is the optimal solution from application performance perspective that is while storing as well as retrieving the data. Can you suggest any other better alternative? We are currently using Oracle 10GR2 as our database.
Thanks a lot for your time and help Tom.
This site has been really helpful to me over the period of years.
Thanks and Regards
Nand Kishore Sagi
June 29, 2011 - 12:16 pm UTC
typically the attribute you call "street" in the above would be used to store non-standard stuff like PO box, whatever.
I would think the second approach is the one you would be looking at.
Address tables
Carsten, June 29, 2011 - 10:47 am UTC
Hi Nand Kishore Sagi,
your first option is known as EAV model. Search for it and you will know which option and why Tom will only suggest for real applications (Hint: it is not #1).
Regards
Carsten
design table
A reader, June 29, 2011 - 4:37 pm UTC
Tom:
Thanks for the advise on not storing the invoice total (derived data) in a live system. There should not be any performance hits in deriving that since calculation is only per invoice and I dont have to check other invoice transaction like calculating Quantity on Hand where you have to scan thousands of old transactions sometimes to get your answer.
The comment someone made about not storing a derived column because it takes disk space was very stupid. Oracle 11g has a new column type for derived columns.
If this invoice is submitted to you and you review it and timestamp it and approve it and assign your own internal voucher/report unique number to it, would you store the data for this transaction in separate table "vouchers" or you would ammend those columns to the INVOICE table.
The relationship is ONE-TO-ONE but I think both designs are valid. Can you confirm?
June 29, 2011 - 5:38 pm UTC
Oracle 11g has a new column type for derived
columns.
that is a virtual column and it does not consume storage - it doesn't precompute any thing - it is very much like having an expression in a view.
So, your rebut to them isn't really a valid one.
I would probably put the attributes in the invoice table. I tend to not use two tables for a one to one relation in general
table
A reader, June 29, 2011 - 10:32 pm UTC
TOm:
I was not trying to rebut them on the storage issue.
IMO, it is very stupid to make a decision on whether to store derived data or not because of storage. Someone else can come in and say calculations use CPU cycles, RAM, disk reads, etc so why run the calculation 1000 times every time the invoice is queries when you can do it once and store it.
It is similar to telling someone to not give birth because the newborn will need water, food, and oxygen or to not drive a car because that consumes gas which costs money and increases pollution.
thanks for your help!
table
A reader, June 29, 2011 - 10:39 pm UTC
TOm:
<<<If you need to select the last column from a table with 1000 columns via an index - that'll take longer than fetching the first column from that table with 1000 columns (we have to read the entire row to get the last column - we only read the first column to get the first column). >>>
it is interesting to know that. Why do you have to read the entire row when i select one column only if reading from the table?
basically then the performance for
T1
-----
col1
...
col1000
T2
-------
col1
.....
col10
might be different for this query
select col2,col3 from T1
versus
select col2,col1000 from T2
assuming they are not read from an index.
June 30, 2011 - 1:43 pm UTC
Why do you have to read the entire row when i
select one column only if reading from the table?
because we don't store rows in fixed width records, they are stored sort of like this:
<null indicator|length_field>data<null indicator|length_field>data........
In order to get the 2nd column you have to read over the first - we know where the row starts and ends - we don't know where the i'th column is until we parse over the (i-1) columns.
actually the performance of
select col2, col3 from t2
could be different from
select col2, col1000 from t2
you don't even need two tables. And, when you compare:
select col2,col3 from T1
versus
select col2,col1000 from T2
you cannot tell - just given the information you gave - which will be faster, you need more data points :)
Re: Address Tables
Nand Kishore Sagi, June 30, 2011 - 4:50 am UTC
Hi Tom,
Thanks a lot for your inputs. I have forwarded it on and hopefully should know regarding the change or no change soon.
@Carsten: Thanks a lot for the pointer on researching EAV model. I just looked at EAV details in Wikipedia and I totally agree that address details are not clinical medical records which actually benefit a lot from EAV model. A new thing learned for the day :). Thanks
Thanks and Regards
Nand Kishore Sagi
you are correct
Venkata, June 30, 2011 - 5:33 am UTC
Dear Sam/Reader,
you are correct to have a virtual column but as Tom said it generally depends on the need. no hard feelings please
Response to "A reader"
Centinul, June 30, 2011 - 7:31 am UTC
> it is interesting to know that. Why do you have to read the entire row when i select one column
only if reading from the table?
Essentially, there is no "index" that points to the exact location of a column in a row. You should review the "Logical Storage Structures" section of the Concepts Guide, specifically section on "Data Block Format"
http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/logical.htm#CIHEIFJC When you access a row in a block there is a row directory that contains the location in the block of the row you are interested in. Once you locate the row you have to traverse through all the columns prior to the one you are after because:
1. There is no "column index" as I mentioned
2. The length of each column can be different.
The row format looks like this for columns
<column 1 length><column 1 data><column 2 length><column 2 data><etc>
So we only know how to get to column 2 based on the length of column 1. Column 3 based on the lengths of column 1 and column 2.
Technically speaking it's more resource intensive to access the 1000th column versus the 1st but I wouldn't think it would be noticeable. Christian Antognini discusses this exact scenario in his excellent book: "Troubleshooting Oracle Performance." Here is a link to a Google Book Preview for this section of the book:
http://books.google.com/books?id=b3DIkYO2gBQC&pg=PA527&lpg=PA527&dq=Christian+Antognini+%22column+order%22&source=bl&ots=yUPyEEbevG&sig=lT4oQCVsNEJ-5Se3MoLQLbl_T-M&hl=en&ei=mmsMTtX1M6Hj0QGDnqWXDg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBkQ6AEwAA#v=onepage&q&f=false
Hopefully I've helped shed some light on this.
History table design,
A reader, July 27, 2011 - 3:58 pm UTC
Hello,
In our data model, all of our FACT tables have HISTORY tables. We need to store any changes made to a record in FACT along with its timestamp. We also want to know fow how long record remained with that change.
For example (consider sales and sales history table):
sales_id=1001
region=US
zone=central
state=tx
zip=77777
The same record exists in History as well. However, the plan is to have two extra columns in history table.
sales_id=1001
change_id=1
region=US
zone=central
state=tx
zip=77777
curr_timestamp=12-jul-2011 12:24:34
next_timestamp=null
Say, the sales ID 1001 record gets updated on 14-jul-2011 12:24:34
The existing record's next_val is changed to 14-jul-2011 12:24:34 and a new record with changes gets inserted into hist table. This record will have next_timestamp as null until the next change in the fact table happens.
The application team wants two date attributes in each history record to find out the time duration for each historical changes (in the above case change_id 1 remained unchanged for 2 days).
There will be tens and millions of DMLs on FACt table each day. Considering the enoromous amount of transaction, the database will perform 2 IOs on Hist table: 1 for insert and 2 for updating the next_ts for the most recent record. The advantage of this method is the reports that runs out of history table find easy to get the difference between curr_ts and next_ts since they both are in the same record. Over all for every DML happens in FACT, there are 3 transactions - 1 on FACT table (update statement), 2nd on History table (inserting the changed record) and 3rd updating the next_ts of the previous record. History will eventually contain billions of rows.
Consideing performance bottleneck in mind, is this a good idea or any other proposal you suggest?
Thanks,
Re: History Table Design
Narendra, July 29, 2011 - 5:21 am UTC
August 01, 2011 - 11:32 am UTC
Can you be more specific. I didn't (at a glance) see anyone saying "do not use"
I see someone questioning the use of it for MOST ALL queries - asking "why is that the case"
multiple lines insert into same row
A reader, August 02, 2011 - 9:02 am UTC
Dear Tom,
In our production database, Client has a specific requirement of having a column for "Remarks", where there a data should not be in a single line, but in multiple lines. Example:
Table: My_data
Empid Salary Remarks
1112 20000 This is special customer,
and he is good.
1113 30000 This is premium customer,
and he is clever.
August 02, 2011 - 11:30 am UTC
ok, so what is the issue?
Just insert text with newlines in it - it works fine. It is up to the client application to correctly display the text - but most should work OK if you just leave newlines in it.
multiple lines insert into same row
A reader, August 02, 2011 - 9:02 am UTC
Dear Tom,
In our production database, Client has a specific requirement of having a column for "Remarks", where there a data should not be in a single line, but in multiple lines. Example:
Table: My_data
Empid Salary Remarks
1112 20000 This is special customer,
and he is good.
1113 30000 This is premium customer,
and he is clever.
How to do it?
multiple lines insert into same row
A reader, August 02, 2011 - 11:49 am UTC
Dear Tom,
We even tried for it. But we are able to retrieve the row from development database running in an windows server but not from our production database running in Linux server. Please help.
August 02, 2011 - 11:52 am UTC
you'll need to be much more precise.
you don't say how you retrieve this data.
you don't say what client you use.
you don't say what is presenting this data.
and you do know that newlines are different on windows than on unix right? You might have to accommodate for that in your code - nothing the database can do - it is purely a windows thing.
use dump() on the data.
ops$tkyte%ORA11GR2> create table t ( x varchar2(30) );
Table created.
ops$tkyte%ORA11GR2> insert into t values ( 'hello
2 world' );
1 row created.
ops$tkyte%ORA11GR2> select dump(x,16) from t;
DUMP(X,16)
-------------------------------------------------------------------------------
Typ=1 Len=11: 68,65,6c,6c,6f,a,77,6f,72,6c,64
^ that is a newline, you are looking for A
see if your data has the newline.
multiple lines insert into same row
A reader, August 02, 2011 - 12:02 pm UTC
Dear Tom,
We are using a .Net application with oracle database and we are using the same insert/update statement for both Linux and windows database in same type of table structure. But whenever we try to retrieve the data it says "unable to retrieve". But if we specify a single line entry in "Remarks" column, then only it can be retrievable from that Linux. What could be the reason?
August 02, 2011 - 2:01 pm UTC
unable to retrieve is NOT an error message of ours, you'll need to diagnose this a bit further - this is your code now hiding the true error from us.
Hopefully your developers log these real errors for you somewhere.
get into sqlplus, dump your data, see what you see - this isn't a database issue - this is a bug in your developed application.
virtual column implementation
Kannan, August 13, 2011 - 7:19 pm UTC
Dear Tom
Just to clarify , is it true , that "virtual columns are always implemented as an index on the expression used ", and as such an update penalty is implicit . If this statement is true , by creating another index on this virtual column , we only favour sql down the line so as to use this column by name.
As a rider on the above , can the update penalty be reduced if we use a cached function to implement the virtual column , and the data we have favours it .
For example a pure deterministic function taking three arguments , would always return a value , and group by operation performed on the arguments , results in small resultsets counts as a percentage of the whole table, and is there a way to estimate the result cache size that would be required for a result set with known characteristics.
August 14, 2011 - 8:31 pm UTC
is it true , that "virtual columns are always implemented as
an index on the expression used "
NO, nothing could be further from the truth. Where did you read such a statement?
virtual columns are simply stored expressions, they consume no storage - they impose no insert/update/delete penalty.
A reader, August 15, 2011 - 1:29 pm UTC
so will the store expression evaluated on fly, when running query which use that expression?
thanks
August 16, 2011 - 1:58 am UTC
yes, when that column in a given row is accessed - it is evaluated.
blocks and latches,
A reader, September 07, 2011 - 5:09 pm UTC
Hello,
We have a table which is small in size (few thousand rows) but the number of users accessing the table are large (again in few thousands at a given point of time - this is a table belonging to an online app).
From performance point of view, one set of people are suggesting to have very few rows per block (one row per block) that way the contention in a block is reduced drastically. I am on the side that suggests fewer blocks would work better.
There are two things that I can think of: 1. row level locking when a user updates a record. Irrespective of number of blocks alloted for the table, the other user has to wait to make a DML on the same record until the previous user commits or rollback. Here number of rows per block doesn't even matter 2. Number of latches. If there are few blocks in a table, the user process would be waiting for latches held by previous users but if data is spread in lot more blocks, the chaces of latch wait comes down
What would you suggest? Fewer blocks or lot of bloks considering huge number of users accessing the table for DML purposes at the same time.
Thanks,
September 08, 2011 - 5:11 pm UTC
1) correct
2) maybe. Blocks are held in lists, the lists are held in an array. We take a data block address (DBA = file.block) and hash it, that tells us what slot in the array has the list that our block will be found on. There is a chance that all blocks in the sparse table could hash to the same list while none of the blocks in the small table do. there is a chance that the blocks in the large table are spread evenly over the lists - while the small table is concentrated in a single list (neither of those scenarios are likely).
I would just probably set INITRANS high for the table - that'll make it so we can have lots of concurrent transactions and would reserve a large segment of the block for this information. that would reduce the number of rows/block we could have - while maximizing the number of concurrent transactions. Sort of a happy middle ground.
Read only Table
Rajeshwaran, Jeyabal, September 09, 2011 - 1:38 pm UTC
Tom:
We have a table in production, where the data will be inserted into that table from Application and Batch programs (no more delete and update) the table is around 350MB with 100K records. Now we dont want the Application and Batch programs to insert data into it.
1) So we did the code change in Application & Batch program *NOT* to insert into that table any more. But still we will have that table in production for 6 more months. Do you think any other database centric approach for disabling the Inserts into that table?
I know in 11g we can alter table to read only. but we are on 10GR2
Would you suggest to go for a Trigger ;) !?!?!?
September 10, 2011 - 3:00 pm UTC
Well, the simplest thing that pops into mind would be "revoke insert on t from schema", simply remove their ability to insert into that table.
short of that
ops$tkyte%ORA11GR2> create table t ( x int );
Table created.
ops$tkyte%ORA11GR2> insert into t select rownum from all_users;
45 rows created.
ops$tkyte%ORA11GR2> alter table t add constraint "You cannot do that" check(1=0) novalidate;
Table altered.
ops$tkyte%ORA11GR2> insert into t values ( 10000 );
insert into t values ( 10000 )
*
ERROR at line 1:
ORA-02290: check constraint (OPS$TKYTE.You cannot do that) violated
storing multiple values in a single column
kirthi, December 14, 2011 - 8:50 am UTC
We have a requirement to store a certain number of passwords used in the past by a user. We currently have oldpwd1, oldpwd2, oldpwd3 etc in an "accounts" table. Now they want to increase the number of passwords being stored; one of the developers requested that a separate table be created to store the id of the user and one column to store all the encryted passwords. Each of these passwords would be separated by a delimited. I was thinking storing all these passwords in one column does not sound like a good idea as it would make it harder to query these and to maintain it. As everytime someone changes their password this column needs to be queried and manipulated to store the most recent 5 passwords for example. I would like to know your thoughts on this and suggest if there is a better way to do it. Thank you.
existing structure:
create table accounts_tbl(userid varchar2(10), currenpassword varchar2(200), oldpwd1 varchar2(200), oldpwd2 varchar2(200), oldpwd3 varchar2(200));
Proposed new table structure:
create table accounts_tbl(userid varchar2(10), currenpassword varchar2(200));
create table user_pwd_history(userid varchar2(10), pwdhistory varchar2(4000));
December 15, 2011 - 7:35 am UTC
I was thinking storing all these
passwords in one column does not sound like a good idea
Yeah, it makes about as much sense as it did to use three columns - that was a bad decision too.
You should have a table accounts_tbl( userid primary key, current_HASH_OF_THEIR_PASSWORD )
and a table accounts_table_history( userid references accounts_tbl, date_of_change date, old_hash_of_their_password );
Note that I wrote "HASH_OF_THEIR_PASSWORD" - since your developer obviously must realize that storing the password - even if encrypted - is an utterly bad idea right... they don't really store the password, they are storing a hash of the password right??
A reader, December 15, 2011 - 11:25 am UTC
yes, we are using hash and not encryption for the passwords. Can you please say what the disadvantages/problems of storing all the old passwords in one column (passwords will be separated by a delimiter) would be, so I can try to convince my team that it would be more beneficial to use your suggested approach. I suggested your approach by saying it would be more easy to manitain and query versus storing all the old passwords in one column. The response from the team was, why would you store more information which we would not need (like change_date - this will be the date when this old password was changed, we do not need to maintain all their old passwords but only a certin number of passwords).
December 16, 2011 - 6:24 am UTC
the disadvantages/problems of storing all the old passwords in one column
(passwords will be separated by a delimiter) would be
you already did, yourself, you know the answer, you wrote it above.
I was thinking storing all these passwords in one
column does not sound like a good idea as it would make it harder to query these and to maintain
it. As everytime someone changes their password this column needs to be queried and manipulated to
store the most recent 5 passwords for example.
You do need the change date - since you want to be able to order the data.
Ask them - why would you store more data than you need - you are storing the delimiter - we don't need that, we need a date or a sequence
Tell them they are using a relational database and you use parent child relations to physically implement things like this, you do not
a) use multiple columns as they did in the past for the reason they are encountering now - 3 isn't going to remain 3, it'll grow
b) hide multiple attributes in a single column. See your own comments and #1. What happens when 5 becomes "forever" or "for 100" or whatever.
Use rows.
design
A reader, December 16, 2011 - 8:50 am UTC
<one of the developers requested that a separate table be created to store the id of the user and one column to store all the encryted passwords.>
Kind of strange Tom did not say "Fire the so-called developer and hire a developer who knows Database Design 101".
I get amazed every day about how many system admins/developers who have no clue about database design or development. They always like to use flat file approach or treat the DB as a data sink/black box which Tom calls "Black Box approach".
Based on my experience, These applications keep getting rewritten and rewritten for ever, never scale, or get you in a maintenance nightmares for ever.
December 16, 2011 - 5:44 pm UTC
I was feeling nice this morning.
Tablespace sizing
A reader, January 18, 2012 - 1:02 am UTC
Hi Tom,
I want to know if we can calculate the size of tablespace when we design the database.
1. If I have a table which has 5 columns as
create table test
(id varchar2(13), header varchar2(25), data_val varchar2(3000), flag varchar2(1), created date);
If the table is loaded with 25000 records per day, how can we determine the size of table. The data_val column contains 2000-3000 bytes for each record.
Is the following estimation correct if I have to retain the data for maximum 10 years:
select 25000 * 365 * 4/1024/1024 * 10 as "gb" from dual;
GB
----------
348.091
So this table will require ~350 GB and similarly calculation needs to be done for other tables. Am I correct?
2. Also I assume that breaking the table across four-five tablespace can help me in better administration and I/O.
Please correct me if I was wrong in size estimation.
January 18, 2012 - 7:23 am UTC
1) i get the
Another way to look at this is, you'll get 2 rows per block most of the time in an 8k block size database. so that is 12,500 blocks/day.
ops$tkyte%ORA11GR2> select 12500 * 8 * 365 * 10/1024/1024 from dual;
12500*8*365*10/1024/1024
------------------------
348.091125
so, sure, if the rows are that wide AND you do not use compression, over time, it would probably take about 350gb.
2) it will do nothing for performance, tablespaces are NOT about performance. What it would allow you to do is
a) partition the table, you'd have to to get it spread across more than one tablespace
b) set the old tablespaces 'read only' after they are full so you don't have to keep backing them up over time.
So, maybe you have 10 tablespaces ultimately - one for each year and the oldest 9 would be read only - and only the current one is read/write.
Table design
A reader, February 01, 2012 - 1:24 am UTC
Hi Tom,
I have a requirement like the example given below. Could you please suggest me how can we design a table for the given structure. The number of RATES columns for a given plan can decrease or increase with time so I can't think of them as columns. It would be great if we could accommodate them in a table or two?
We can have multiple regions. We can have 4 or more group names. We can have only predefined 4 age_groups.
I have to select the value corresponding to a respective region+group+plan+rate+age_group
How can we make it simple for user to load the data ?
Region 1
PLAN_1 PLAN_2
--------------------- ----------------------
Group_Name Age_Group Rate_1 Rate_2 Rate_3 Rate_1 Rate_2 Rate_3
Member <30 100 200 300 700 800 900
>30 and <50 101 201 301 701 801 901
>50 and <75 102 202 302 702 802 902
>75 103 203 303 703 803 903
Member+Spouse <30 104 204 304 704 804 904
>30 and <50 105 205 305 705 805 905
>50 and <75 106 206 306 706 806 906
>75 107 207 307 707 807 907
Member+Family <30 108 208 308 708 808 908
>30 and <50 109 209 309 709 809 909
>50 and <75 110 210 310 710 810 910
>75 111 211 311 711 811 911
February 01, 2012 - 7:23 am UTC
technically you would have a parent table:
create table top
( region,
group,
age_group,
primary key(region, group, age_group)
)
and a child table
create table child
( region, group, age_group,
plan_name,
rate_name,
value,
foreign key(region,group,age_group) references top,
primary key(region,group,age_group,plan,rate)
)
but you don't "need" top physically as it does not have any other attributes apparently.
so you would be inserting things like
insert into child (region,group,age_group,plan_name,rate_name,value)
values
('1', 'member', '<30', 'plan_1', 'rate_1', 100 );
design a table to store the property combination of auctions
phoenixbai, May 17, 2012 - 10:16 am UTC
Hi tom,
Requirement:
suppose I have an auction and its corresponding property and its values, as below:
auc_name: 'video_1'
property_name value
----------------------
terminal ipad
resolution high
seri_num 15
there could be many kinds of values for each property,for example, for property 'terminal', we have values: ipad, pc, android, etc.
With specified auc_name and the combination of any number of property:value pairs, it could uniquely identify(points to) an url that meets the user request.for example:
auc_name property:value list url
--------- ----------------------- ----
video_1 terminal:pc;resolution:high;seri_num:15;
http://abc.com/video_1_pc_high_15.html Issues:
My question is, how should I design the table?
Currently:
I come up with below:
1) create a table with property and its value combination:
columns: prop_id, prop_name, value_id, value_name,....
2) create second table, storing the relationship between auction, property/value combination list and the unique url:
columns: auc_name, features, url, ....
PS: features is a list of property:value pairs.
what is the pros and cons of this design?
Is there any other way to design the table, so that it would be easier to do the select based on the auction name, and the given list of property/value pairs?
Thanks in advance!
May 18, 2012 - 2:31 am UTC
another option, just store an XML document with the stuff tagged in it. Gives you the ability to index if you need.
and then you have all of the xml functions to query it, search it, process it.
you are right, XML is way better!
phoenixbai, May 18, 2012 - 10:04 pm UTC
Thank you very much for the suggestion, I think XML is way better then the key:value pairs!
A reader, February 24, 2014 - 10:19 pm UTC
keep the blob column in the same table. You dont need another table to store blob. You will have to join two tables in that case. If you have one table you dont need to join un-necessarily
encrypt XMLTYPE store as securefile binary xml
Branka, March 07, 2014 - 5:30 pm UTC
Is it possible to encrypt XMLTYPE store as securefile binary xml?
Query Again on Design of Table
Mohammed Imran Pasha, November 12, 2015 - 7:36 am UTC
Hi Tom,
Thanks a lot for answering all the questions related to oracle.
My question to you is:
I have a table into which millions of rows will be inserted and later selected(only). nation_code column is part of primary key of the table.
My manager is suggesting to create different tables for depending on nation_code, this way size of the table can be controlled and also we will get performance benefits when selecting data from table. But I am worried about the maintenance, that is we need to maintain 7-8 tables instead of one.
In your opinion which approach is best single table or multiple tables to get good performance results while inserting and selecting data? Also please give reasons so I can convince my manager.
also there will be at least a million rows for each nation.
DB Version: 11.2.0.3.0
Thanks,
Imran.
November 12, 2015 - 4:15 pm UTC
Please ensure you keep followups related to the original question. Ask new questions as appropriate (I can see you already have done here).
Ooo
Amritanshu, August 06, 2017 - 1:56 am UTC
In short to recognize a particular document in huge table
Ordering Columns in a Table DB 10g, 11g
Abubaker, October 05, 2019 - 12:29 pm UTC
Hi Tom,
Is there solution in making column sequence reordered other than creating a view, in DB 10.2g and 11g.
Thanks in advance.
Table creation with default value based on reference column value
Rajasekhar, September 04, 2020 - 7:40 am UTC
Hello Chris,
it is possible to create a table, which defaults column value based on certain conditions(reference column values) instead of assigning it when it is null?
Create table test_default (id number, name varchar2(30), dept number when id=2 default 10);
Thanks in advance
September 07, 2020 - 1:38 am UTC
Unfortunately not. You'd need a trigger for that.