Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Partha.

Asked: December 30, 2008 - 1:24 am UTC

Last updated: January 05, 2009 - 6:57 pm UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

We have a single table that contains all the list of values (not master tables) for each language. This leads to a lot of self joins. Is there a better way to do handle this?

Example:
CREATE TABLE SYSTEM_CONSTANTS
(
  SYS_CODE   VARCHAR2(10),
  SYS_DESC   VARCHAR2(50),
  SYS_LANG   VARCHAR2(10),
  SYS_DISP_ENG VARCHAR2(100),
  SYS_DISP_LANG VARCHAR2(100),
  SYS_VALUE  NUMBER,
  PRIMARY KEY(SYS_CODE, SYS_LANG, SYS_VALUE)
) ;

INSERT INTO SYSTEM_CONSTANTS(SYS_CODE, SYS_DESC, SYS_LANG, SYS_DISP_ENG, SYS_DISP_LANG, SYS_VALUE) VALUES
( 'GENDER', 'Gender of the person', 'jp', 'Male', 'Male in Japanese character', 0) ;
INSERT INTO SYSTEM_CONSTANTS(SYS_CODE, SYS_DESC, SYS_LANG, SYS_DISP_ENG, SYS_DISP_LANG, SYS_VALUE) VALUES
( 'GENDER', 'Gender of the person', 'jp', 'Female', 'Female in Japanese character', 1) ;
INSERT INTO SYSTEM_CONSTANTS(SYS_CODE, SYS_DESC, SYS_LANG, SYS_DISP_ENG, SYS_DISP_LANG, SYS_VALUE) VALUES
( 'YN', 'Yes or No Indicator', 'jp', 'Yes', 'Yes in Japanese', 0) ;
INSERT INTO SYSTEM_CONSTANTS(SYS_CODE, SYS_DESC, SYS_LANG, SYS_DISP_ENG, SYS_DISP_LANG, SYS_VALUE) VALUES
( 'YN', 'Yes or No Indicator', 'jp', 'No', 'No in Japanese', 1) ;
INSERT INTO SYSTEM_CONSTANTS(SYS_CODE, SYS_DESC, SYS_LANG, SYS_DISP_ENG, SYS_DISP_LANG, SYS_VALUE) VALUES
( 'TRG_TYPE', 'Training Type', 'jp', 'Oracle Training', 'Oracle Training in Japanese', 1) ;
INSERT INTO SYSTEM_CONSTANTS(SYS_CODE, SYS_DESC, SYS_LANG, SYS_DISP_ENG, SYS_DISP_LANG, SYS_VALUE) VALUES
( 'TRG_TYPE', 'Training Type', 'jp', 'Microsoft Training', 'Microsoft Training in Japanese', 2) ;
-- ..................... more inserts ....... --
COMMIT ;


The idea is whenever we use a list of value we select from the system constants with the relevant sys_code like:

SELECT DECODE(:P10_LANGUAGE, NULL, SYS_DISP_ENG, SYS_DISP_LANG), SYS_VALUE
  FROM SYSTEM_CONSTANTS
 WHERE SYS_CODE = :P10_CODE
   AND SYS_LANG = NVL(:P10_LANGUAGE, SYS_LANG) ;


The question is we usually have search screens where the data to be displayed is the SYS_DISP_LANG. So, the more list of values data we have in the search results, the more self joins we need. Does it make any difference if we have each of the LOV in a table by itself or to have one table which has all the list of values (select lists)?

CREATE TABLE my_emp
( EMP_ID       NUMBER,
  ENAME        VARCHAR2(50),
  GENDER       NUMBER,
  TRAINING_ID  NUMBER,
  SELECTED_IND NUMBER,
  TRG_IND      NUMBER,
  PRIMARY KEY(EMP_ID)
) ;

-- Value of Gender, training_id, selected_ind, trg_ind is from the client as select lists from the system_constants table --
-- For completeness inserting the values directly --
INSERT INTO my_emp VALUES (100, 'AAA', 1, 2, 1, 0) ;
INSERT INTO my_emp VALUES (101, 'BBB', 0, 1, 0, 1) ;
INSERT INTO my_emp VALUES (102, 'CCC', 1, 1, 1, 0) ;
INSERT INTO my_emp VALUES (103, 'DDD', 1, 1, 1, 0) ;
INSERT INTO my_emp VALUES (104, 'EEE', 0, 2, 0, 1) ;
INSERT INTO my_emp VALUES (105, 'FFF', 0, 2, 1, 0) ;
COMMIT ;

SELECT ENAME, 
       DECODE(:P10_LANGUAGE, NULL, gender.SYS_DISP_ENG, gender.SYS_DISP_LANG) gender,
       DECODE(:P10_LANGUAGE, NULL, trg.SYS_DISP_ENG, trg.SYS_DISP_LANG) training_type, 
       DECODE(:P10_LANGUAGE, NULL, sel.SYS_DISP_ENG, sel.SYS_DISP_LANG) selected_flag, 
       DECODE(:P10_LANGUAGE, NULL, trg_comp.SYS_DISP_ENG, trg_comp.SYS_DISP_LANG) training_completed_flag
 FROM my_emp, 
      system_constants gender, system_constants trg, system_constants sel,  
      system_constants trg_comp
WHERE my_emp.gender = gender.sys_value
  AND gender.sys_code = 'GENDER'
  AND gender.sys_lang = :P10_LANGUAGE
  --
  AND my_emp.training_id = trg.sys_value
  AND trg.sys_code = 'TRG_TYPE'
  AND trg.sys_lang = :P10_LANGUAGE
  --
  AND my_emp.selected_ind = sel.sys_value
  AND sel.sys_code = 'YN'
  AND sel.sys_lang = :P10_LANGUAGE
  --
  AND my_emp.trg_ind = trg_comp.sys_value
  AND trg_comp.sys_code = 'YN'
  AND trg_comp.sys_lang = :P10_LANGUAGE ;


This becomes more complicated if one of the values like training_id or selected_ind is NULL. Then the code becomes even more complicated with inner queries and outer joins. Because of this complication, the developers have created a function called get_description(vc_code, vc_value, vc_language) that returns the description in the desired language and using it everywhere. So, the above query becomes,
SELECT ENAME, get_description('GENDER', gender, :P10_LANGUAGE) gender,
       get_description('TRG_TYPE', training_id, :P10_LANGUAGE) training_type, 
       get_description('YN', selected_ind, :P10_LANGUAGE) selected_flag, 
       get_description('YN', trg_ind, :P10_LANGUAGE) training_completed_flag
 FROM my_emp ;


This query leads to context switching.

What is the better way of doing this?

and Tom said...

DO NOT DO NOT DO NOT do the last one.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:48704116042682#1377078900346297424



In fact, do NOT do what you've already done. You should use a lookup table per type of lookup, not a 'generic' lookup for everything.


Use outer joins, use a view. And use a separate table for EACH lookup.


So,

a) use a separate table for each lookup
b) hide the "complexity" if you need in a view
c) DO NOT use the plsql 'getter' functions as you have

There are two approaches to your view, one would be:


create view..
select ename, 
      (select descript from gender_lookup where code = EMP.gender) gender,
      (select descript from training_type_lookup where code = EMP.tt_code) tt,
      ....
 from EMP



or

create view ..
select emp.ename, gl.descript gender, tt.descript training_type, ....
  from EMP, gender_lookup gl, training_type_lookup tt, ...
 where <outer joins>


the first one works nicely if you typically get a small number of rows from EMP in your application at a time, the second works nicely if you tend to get many/most/all of the rows from emp - so you might use BOTH views in different places.





Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Generic Lookup Not the Problem

Greg, January 02, 2009 - 3:39 pm UTC

My organization originally created one table per lookup, but abandoned that practice after a while. It doesn't make sense to create hundreds of tables with less than ten rows and the same two columns (id and value) for each table. You may want to reconsider your advice.

We do use the first view option you presented below with outer joins, which seems to work quite well.
Tom Kyte
January 05, 2009 - 10:43 am UTC

... It doesn't make sense to create hundreds of tables
with less than ten rows and the same two columns (id and value) for each table. ...

one term for you Greg:

data integrity.


You might want to rethink YOUR approach.


You do not have any type information for 'value' (lack of data integrity, I'm sure some of the lookups are numbers, some are strings of length 10 or less, others are strings of length 1, some are strings of length 80 or less, some must be unique)...

You do not have referential integrity.

And you have completely botched the optimizers ability to correctly estimate cardinalities. You will be fighting the optimizer every step of the way.


You will certainly rethink your position. I base mine on data integrity, referential integrity and performance.

You base yours (apparently) on "we cannot be bothered"

Generic Lookup Not the Problem

Greg, January 02, 2009 - 4:51 pm UTC

Clarification: I meant to say we use the second view method with the outer joins, not the one with the correlated queries. Correlated queries are a no-no with us, but we may want to reconsider this since you are recommending it.
Tom Kyte
January 05, 2009 - 10:56 am UTC

... Correlated queries are a no-no
with us ...

why? do you hate having good performance in many cases?


Why would anyone have a blanket "do not use" rule?

Use with knowledge, but don't rule things out


and they are called scalar subqueries when used the way I used them. A subquery that returns a single column and zero or one rows. Correlated subqueries are subtly different

Generic Lookup Not the Problem

Greg, January 02, 2009 - 5:06 pm UTC

Shoot! Sorry about the multiple posting, but I forgot the the original poster might want to know how my org would write the query of the generic lookup table.

SELECT ENAME,
gender,
trg.SYS_DISP AS training_type,
sel.SYS_DISP AS selected_flag,
trg_comp.SYS_DISP AStraining_completed_flag,
etc
FROM my_emp,
(SELECT sys_value, DECODE(:P10_LANGUAGE, NULL, SYS_DISP_ENG, SYS_DISP_LANG) AS sys_disp FROM
system_constants WHERE sys_code = 'GENDER') gender,
(SELECT sys_value, DECODE(:P10_LANGUAGE, NULL, SYS_DISP_ENG, SYS_DISP_LANG) AS sys_disp FROM
system_constants WHERE sys_code = 'TRG_TYPE') trg,
(SELECT sys_value, DECODE(:P10_LANGUAGE, NULL, SYS_DISP_ENG, SYS_DISP_LANG) AS sys_disp FROM
system_constants WHERE sys_code = 'YN') sel,
(SELECT sys_value, DECODE(:P10_LANGUAGE, NULL, SYS_DISP_ENG, SYS_DISP_LANG) AS sys_disp FROM
system_constants WHERE sys_code = 'YN') trg_comp
WHERE my_emp.gender = gender.sys_value(+)
AND my_emp.training_id = trg.sys_value(+)
AND my_emp.selected_ind = sel.sys_value(+)
AND my_emp.trg_ind = trg_comp.sys_value(+)

do not

Alex Fatkulin, January 02, 2009 - 11:49 pm UTC

Greg, you may find this article to be of some use perhaps

http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/


re: do not

Greg, January 03, 2009 - 9:25 am UTC

Nice article, very amusing. I always wondered how far you could push this idea.

For those who don't want to bother to read, this is a case where they lumped all the tables in the database into a single table that contained a superset of all the columns, with the aim of having the db be completely customizable by the end user I suppose.

They are combining the main data tables together, with I assume 1mil+ rows each and a large amount of completely disjoint columns. The difference between this and what was discussed above is that the logical lookup tables are small with less than 100 rows on avg, the columns are few and almost completely overlap (id, value, etc), and are rarely updated. We keep the main data tables separated.
Tom Kyte
January 05, 2009 - 11:09 am UTC

see above, there are other excellent reasons to do it correctly...

greg should be reconsidering

anonymous, January 04, 2009 - 5:13 pm UTC

why does it matter if you have 50 tables instead of one table? Are 50 tables harder to manage than one? if so, how? data types, check constraints, queries, etc.. are all much more clear/easy if you seperate. If you separate via views anyway from your one master table, why not just physically separate in the first place? You are willing to reconsider the later part based on tom's advice, but feel his initial advice of "Don't do that", can be disqualified?


love that article...

Darren, January 04, 2009 - 10:47 pm UTC

I loved that article when i read it a number of years ago, and it still amuses me.

Mainly it amuses me, because it reminds me of an Oracle based product that I had to support for a number of years. Not so bad maybe, but let's just say, not scalable... to state the obvious

re: greg should be reconsidering

Greg, January 05, 2009 - 12:16 am UTC

Hey, I was just describing what we do in practice, which happened to coincide with Tom's advice for the outer join view.

I started out just like you, strongly arguing for a pure logical design with one table for each entity based on what I learned in textbooks. The number of tables started proliferating and a more experienced developer sat me down and showed me the pros and cons. After a lot of arguing I came around to his point of view and now we find it a lot easier to develop and maintain with a single lookup table. As I mentioned, no one would even think of doing this for our main data tables.
Tom Kyte
January 05, 2009 - 11:30 am UTC

what pro's - put them out here.

We are somewhat experienced as well - but we are willing to listen.

convince us.


how do you do referential integrity?
how do you handle data integrity (strong typing, correct datatypes)
how often do you find yourself 'fighting' with the optimizer.

what are the "pros"? The only pros I can think of are actually not a "pro". It usually comes back as "find it a lot easier to develop and maintain with a single
lookup table" - meaning "we can be lazy and do half the work - we forgo data and referential integrity because we are not willing to do it right.

We do it too . . .

Bryan, January 05, 2009 - 6:53 pm UTC

To be honest, we do it too. We have a code/code_detail with detail columns type (number), value (number), text (varchar2(100) - only descriptions). There is an unique index on type/value. The value is tagged on child tables. There are about 1500 records in the code table. There are about 100 types. There is a line which separates if new values are put in the code table versus implementing a separate table. First, if the entity only has a value and a description than it is a candidate. Second, how many records are we talking? For us, more than 99% of the time we are talking <10 records. (Obviously there are a couple of exceptions since the prior numbers would not add up.) Finally, it has been growing over, say, 8 years . . . and the vast majority were inserted at inception. I would estimate we add 5 types max on average per year and maybe 30-40 records in the detail table . . . again on a good year.

So, would I rather have 100 separate tables where I would have to name when I create and parse through when I am searching for one particular table? Or would I rather have one table I can scan through when I am searching for a particular key? I'm with Greg. I used to always put lookups in separate tables until I first saw something similar to what is described. (Which by-the-way, is an Oracle product now.)

Typically, these descriptions are used to fill in descriptions in a form. The type is coded directly into the form. The other possibility is it is used in a report. Again the type is coded directly into the report. In both cases it is a true and glorified unique lookup.

I read the "simple-talk" link from Alex's post. Very funny, but I am not at all talking about a gross implementation that was tried there.

Am I wrong? Technically, maybe. Is there at least a perception of simplification? Not always - I've had to explain this to one of our coders more than once - but in general yes, there is at least a perception of simplification. Will I change my ways, probably not. That would take money and a cathartic revelation. At this time I don't perceive to understand the vast performance hit I am taking by implementing in this way. Have I tested it? Admittedly not. In my opinion, squeezing out the last drops of performance vs the status quo is not worth it . . . although I am open to a lesson. (I have a rough enough time convincing the coders to use bind variables, not to use an asterisk, and other unpleasant and lazy code.)

I will now step down and make my way to the guillotine next to Greg.

Regards,
BLC
Tom Kyte
January 05, 2009 - 6:57 pm UTC

tell me how you do

data integrity
referential integrity

Bryan, January 05, 2009 - 7:20 pm UTC

Referential Integrity - Physically, none. The child table theoretically could have any numeric value. Practically, forms only display the available values based on the type. Any other batch applications the types and values are coded directly in the application. Obviously that doesn't stop any bonehead back-end update statements.

Data Integrity - Physically, none. But, by-in-large, I am the only person who is allowed to perform dml on the 2 code tables. Technically, updates and deletes ought never be performed. These are static descriptions. Maybe an update for clarification. So practically, only inserts are performed. However, I do have a trigger on the back-end which logs all dml activity.

So, there are some admitted holes that are not available in some organizations.

BLC

Re: We do it too

Greg, January 05, 2009 - 7:58 pm UTC

Thanks for the handoff Bryan, let me give this a shot.

Data Integrity

All of our lookups are text strings, ('Big', 'Small', etc) of less than 250 characters length so the column is just VARCHAR(255). We don't have lookups for any other data type (date, number, lob, etc). I feel pretty confident about this but I don't know if this will satisfy you.

Referential Integrity

Err ok embarrassed here... So we use two column foreign key constraints (you could call them lookup_type_id, lookup_id). That means every time we refer to the a lookup table we need to refer to two columns instead of one in the case of having a separate table for each lookup.

I consider this OK in our case cause we are primarily a datamart team and we only use these lookups for storing mappings or UI, but this would be obviously unacceptable for transactional databases which is probably the perspective the original poster was coming from. We denormalize the lookups when we load our dimension tables so we just don't have ref int issues there. I remember asking about transactional tables and the answer I got was if it was really needed do it in the UI.

I think I have been living in the DM world for too long. Original advice was probably spot on just not what I do so of course it must be wrong, right?


not working

tosha, January 12, 2009 - 1:27 am UTC

why is my myspace not woking