Thanks for the question, Luis.

Asked: May 27, 2008 - 10:36 am UTC

Last updated: May 29, 2008 - 1:19 pm UTC

Version: 10.2.0.1.0

Viewed 1000+ times

You Asked

Hi Tom,

I've tried looking on your answers but couldn't find a good solution to my case.

In my company we use EJBs and plain java and connect to the database using oracle's thin JDBC. Basically what I have to do is an end-of-day procedure that does the following:

1) From the n tables that exist with daily values, drop the oldest,
2) Create a new empty table exactly like the n-1 tables (down to the indexes)
based on the newest table before the created table.

So if I have:
TBL_IDX_TESTER_20081017
TBL_IDX_TESTER_20081018
TBL_IDX_TESTER_20081019

after running the procedure I should have
TBL_IDX_TESTER_20081018
TBL_IDX_TESTER_20081019
TBL_IDX_TESTER_20081020

TBL_IDX_TESTER_20081017 is dropped, and an empty TBL_IDX_TESTER_20081020 is created with everything (down to indexes) imported from the last available table (TBL_IDX_TESTER_20081019).

Now, the problem is that this can't be done directly inside the java code because of a client's constraint - NO DDL INSIDE THE JAVA APPLICATION. The user that connects to db from the java code has no right to do anything else then select, update, delete and insert.

So I've thought of creating stored procedures that had the right of doing the required work (i.e., dropping and creating the tables) and then GRANTing EXECUTE to the user inside the java application to make the work. That would safeguard that no DDL code is inside the java application, but on the other hand I could do the required work.

My inquiries here on Ask Tom show that you strongly disapprove of that approach.

Is there any way of doing that? I don't mind using anything else, I just don't seem to find another solution.

Here's the table creation commands:

create tablespace testind
logging
  datafile '/export/home/oracle/databases/ora10/testind.dbf'
  size 32m 
  autoextend on 
  next 32m maxsize 2048m
  extent management local;

CREATE TABLE TBL_IDX_TESTER_20081017( 
    FOREIGN_ID CHAR(16) NOT NULL,
    KEYWRD VARCHAR2(40) NOT NULL,
    INDEX_NO INTEGER NOT NULL,
    VALUE VARCHAR2(200) NOT NULL
) TABLESPACE TESTIND;

ALTER TABLE TBL_IDX_TESTER_20081017 ADD CONSTRAINT TBL_IDX_TESTER_20081017_PK PRIMARY 
KEY(INDEX_NO);

INSERT INTO TBL_IDX_TESTER_20081017 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('ABCDE', 'KEYWRD1', 1, 'TESTVALUE1');
INSERT INTO TBL_IDX_TESTER_20081017 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('BCDEF', 'KEYWRD2', 2, 'TESTVALUE2');
INSERT INTO TBL_IDX_TESTER_20081017 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('CDEFG', 'KEYWRD3', 3, 'TESTVALUE3');

CREATE INDEX TBL_IDX_TESTER_20081017_IDX1 ON TBL_IDX_TESTER_20081017 (KEYWRD, 
FOREIGN_ID);


CREATE TABLE TBL_IDX_TESTER_20081018( 
    FOREIGN_ID CHAR(16) NOT NULL,
    KEYWRD VARCHAR2(40) NOT NULL,
    INDEX_NO INTEGER NOT NULL,
    VALUE VARCHAR2(200) NOT NULL
) TABLESPACE TESTIND;


ALTER TABLE TBL_IDX_TESTER_20081018 ADD CONSTRAINT TBL_IDX_TESTER_20081018_PK PRIMARY 
KEY(INDEX_NO);

INSERT INTO TBL_IDX_TESTER_20081018 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('ABCDE', 'KEYWRD1', 1, 'TESTVALUE1');
INSERT INTO TBL_IDX_TESTER_20081018 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('BCDEF', 'KEYWRD2', 2, 'TESTVALUE2');
INSERT INTO TBL_IDX_TESTER_20081018 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('CDEFG', 'KEYWRD3', 3, 'TESTVALUE3');

CREATE INDEX TBL_IDX_TESTER_20081018_IDX1 ON TBL_IDX_TESTER_20081018 (KEYWRD, 
FOREIGN_ID);

CREATE TABLE TBL_IDX_TESTER_20081019( 
    FOREIGN_ID CHAR(16) NOT NULL,
    KEYWRD VARCHAR2(40) NOT NULL,
    INDEX_NO INTEGER NOT NULL,
    VALUE VARCHAR2(200) NOT NULL
) TABLESPACE TESTIND;


ALTER TABLE TBL_IDX_TESTER_20081019 ADD CONSTRAINT TBL_IDX_TESTER_20081019_PK PRIMARY 
KEY(INDEX_NO);

INSERT INTO TBL_IDX_TESTER_20081019 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('ABCDE', 'KEYWRD1', 1, 'TESTVALUE1');
INSERT INTO TBL_IDX_TESTER_20081019 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('BCDEF', 'KEYWRD2', 2, 'TESTVALUE2');
INSERT INTO TBL_IDX_TESTER_20081019 (FOREIGN_ID, KEYWRD, INDEX_NO, VALUE) VALUES 
('CDEFG', 'KEYWRD3', 3, 'TESTVALUE3');

CREATE INDEX TBL_IDX_TESTER_20081019_IDX1 ON TBL_IDX_TESTER_20081019 (INDEX_NO, 
FOREIGN_ID);

Note that I've changed the index TEST_INDEX_20081019_IDX1 to stress the point that one (or more) indexes may be different on the tables even though they are the same structure. On a real case, the last day table may have been changed to have more or less indexes than the previous ones, and we need it replicated to the new table to be created.

Thanks a lot for your help :)

and Tom said...

...
My inquiries here on Ask Tom show that you strongly disapprove of that approach.
.....

correct, it is a very very bad, horrible idea to do DDL like this on a production system in an ongoing fashion. One wonders what "design" (anti-design if you ask me) could have gotten us to this.

Basically, you are saying the java application dynamically generates all of it's sql based on the day?!?!?!?!

....
So I've thought of creating stored procedures that had the right of doing the required work (i.e., dropping and creating the tables) and then GRANTing EXECUTE to the user inside the java application to make the work. That would safeguard that no DDL code is inside the java application, but on the other hand I could do the required work.

......

it sounds like your client had a reasonable request - NO DDL. So they said it wrong by stating "in a java program", but only probably because you said 'we will develop in java'. You are missing their point - they know how bad DDL in an ongoing system would be.

do not drop and create.

If you must go with this really bad "table of the day" idea - forcing all sql to be 'new' every day, day in day out - just

a) truncate oldest
b) alter table oldest rename to newest

that way, you don't have to worry about copying indexes, grants, data types, any new things added to it over time.

but seriously, rethink this "design", give the "design" a bit more thought - I cannot imagine why you would desire this sort of schema, it would be nightmarish to code to. What does your code do around midnight?

Rating

(7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Bad design

Jim, May 27, 2008 - 3:37 pm UTC

Tom,
Great answer. I would have asked how they query the daily data and what happens if they try to query data that is so old that it has been removed? This sounds like a transaction log that needs to be treated as an aged queue.

Some clarifying

Luis Matos, May 28, 2008 - 10:14 am UTC

First, Tom, thanks a lot for your answer. Some things have to be clarified so that I make the next question (you can skip all of those and go to the end too if you wish - questions are tagged with QUESTIONS: in the end of this post).

1) My company makes a software whose data is very volatile. That means that data from the last 5 days is very important, followed by data of the last month (which is less important but still maybe important), and data from more than one month is only very seldom used.

2) Even though I agree completely that a better design could be achieved with the Oracle database and its inherent strengths, the application was done with other database originally. We're actually migrating to Oracle now, and of course some things that might be thought as bizarre in Oracle were result of the constraints that my company had before (when using a less-full-of-features db).

3) The request from the customer is a general one based on guidelines - since they use Oracle for other applications already (and they're a customer big enough to force my company spend money into migrating to Oracle), we have to comply to many guidelines - one of which is "no DDL inside java code".
---------------------------------------------------------------------
I've setup the Oracle DB for testing in the following way:

1 user created the db, that's the "dbowner" user
1 user has access to select/update/delete/insert on the database, that's the "dbuser" user.

For the moment, I can't really re-design the whole way that the application runs because it still has to run on the old database and in Oracle. Since I've got the hint that we're going in the future to move completely to Oracle, I would like to know if someone reading this (you Tom or any other power Oracle user) could find a better solution. On the QUESTIONS: section of this post, I've made a proposal (right at the end). Maybe that would be a solution?

Note: "message" in the context below is a trade (buy/sell shares, transfer of money, future market, and so on) related to Stock Exchange.

Basically the flow is like this:

A) We receive thousands and thousands of messages per day coming from the Stock Exchange. Those are saved in the "daily_table" of that day. Most of those messages have to be seen/answered/treated in the next 5 days. Therefore, they are saved separetely on 5 "daily_tables"
B) Every day, on the end-of-day only, messages older than 5 days are moved to the "recent_messages" table. Also, messages older than 1 month are moved to the, let's say for example, "history_messages_200805" table.

So we have:

- table_20080528
- table_20080527
- table_20080526
- table_20080525
- table_20080524

- "daily_table" view created with the above 5 tables

- table_recent_messages (messages from 2008-04-23 until 2008-05-23)
- table_history_200804 (messages from 2008-04-01 until 2008-04-22)

then we have the end-of-day on 2008-05-28 and the following happens:

1) "daily_table" view with the 5 days is dropped
2) table_20080524's contents go to table_recent_messages
3) table_20080524 is dropped
4) table_20080529 is created using the indexes of table_20080528
5) messages from the day 2008-04-23 from table recent_messages go to table_200804
6) view is re-created with the current 5 days

final result:
table_20080529
table_20080528
table_20080527
table_20080526
table_20080525

daily_table view created with the above 5 tables

table_recent_messages (messages from 2008-04-24 until 2008-05-24)
table_history_200804 (messages from 2008-04-01 until 2008-04-23)
------------------------------------------------------------------------------------------
QUESTIONS:
Now for the technical questions:

when you say:

TOM: "truncate oldest, rename oldest to newest"
ME: Are the commands (ALTER TABLE, truncate, etc) possible to run inside a stored procedure, so that I would be able to call this procedure from the non-privileged java user, by giving it grant execute?

TOM: "Basically, you are saying the java application dynamically generates all of it's sql based on the day?!?!?!?! "
ME: No, the sql actually is done over a view. The view contains always the latest 5 days (i.e. latest 5 tables). So the SQL to see the data is not changed at all. Of course, the SQL to insert into that "table of the day" is regretfully changed every day. This table has 99.99% inserts; deletes and updates are very rare, and selects are done over the view.

And, for the future, when we get rid of the legacy database:
I think that Oracle solves that kind of problem by allowing inside a table the concept of partitions. Instead of having a view with the latest 5 tables, I would create one table with 5 partitions where the key to the partition would be the day. At end of day I would move the "oldest of 5 days" partition to the table_recent_messages, move the partition which is 1 month old to the monthly history table and then create the "newest day" partition on the daily_table. Would it then be the correct approach to you (or to any casual reader of this thead)?

May 28, 2008 - 10:52 am UTC

you will fail miserably unless and until you recognize:

there are different databases existing in the world because they do things very differently from each other. If they did not, there would only be one database out there. You need to employ different tactics, different methods for each or you will simply have an application that works moderately ok on one platform and runs slower than Molasses in Maine in February on all of the rest.

... We're actually migrating to Oracle now, any had before (when using a less-full-of-features
db). ....

I have zero patience for that. I will not help in that area at all. You use the stuff correctly (WHATEVER database you have), there is no excuse.

.... 3) The request from the customer is a general one based on guidelines - since
they use Oracle for other applications already (and they're a customer big
enough to force my company spend money into migrating to Oracle), we have to
comply to many guidelines - one of which is "no DDL inside java code".
....

Now it is obvious, this customer has been burned by these database independent applications many times in the past. They KNOW that doing DDL like that is the kiss of death, not to be done. They are telling you DO NOT DO THAT. You are trying to be sneaky here, stop it.

They actually know what they are talking about - now, follow their guidelines. I don't see how you have a choice, they are paying for your time, don't do something they have explicitly said "DO NOT DO"

And yes, partitioning for the aging and purging and archiving of data is the correct approach.

yes, you can put alters and truncates into procedures. execute immediate 'string';

.,..
No, the sql actually is done over a view. The view contains always the
latest 5 days
.....

well, same difference - say goodbye to shared sql, you are doing tons of ddl - you are recreating this view :( ugh.

Thanks

Luis Matos, May 28, 2008 - 11:30 am UTC

Hi Tom,

please don't get worked up because of me trying to find a solution the way I wrote to you. The reason why I write is exactly because I'm not comfortable with it and I need your opinion.

If you feel that using the stored procedure to make the DDL code is sneaky, well, that wasn't my intention. I understood that the danger of having DDL on Java is exactly that it's not exposed to the Database Administrator (so the Java code does whatever it wants). By creating the stored procedures and giving only execute rights to the java user, my idea was exposing to the database administrator of that company what is that the application wants to do, so that ONLY that stuff was allowed. My sincere idea on this matter is not being sneaky, since all that stored procedures would be clearly spotted in the db.

Well, I stand corrected. I gather that the strong reaction you have makes impractical for me even to think doing it that way.

Anyways, I still have a last question: the commands to move partitions around is still DDL code. So if I do it that way - I would still have to be sneaky as you defined. Anyways, I'll make an enquiry with the customer whether this approach with the partitions is acceptable.

So my question is: is it for you based on the information you've got, the approach with the partitions acceptable?

Thanks

May 28, 2008 - 3:48 pm UTC

... and I need your opinion. ...

my opinion is that if you take an application and simply try to migrate it and the goal is "minimal to no change" from database A to database B (where A is ANY database and B is any other database), you will *fail*.

with partitioning, there would not be any reason to move anything anywhere. I would not be moving the data at all. You have achieved the segregation you desire (physically separate).

You would simply use a predicate to retrieve monthly data or whatever - we would eliminate from consideration any partition that was not relevant.

Andrew, May 28, 2008 - 2:57 pm UTC

CREATE TABLE TBL_IDX_TESTER_20081017( 
    FOREIGN_ID CHAR(16) NOT NULL,
    KEYWRD VARCHAR2(40) NOT NULL,
    INDEX_NO INTEGER NOT NULL,
    VALUE VARCHAR2(200) NOT NULL
) TABLESPACE TESTIND;

While the original poster didn't ask about the table design, this looks a lot like an EAV schema, so they should probably read the thread at:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

May 28, 2008 - 3:54 pm UTC

ugh, I didn't even look at that - but yes, that looks quite possibly true.

Chuck Jolley, May 28, 2008 - 7:27 pm UTC

I might go even more basic
[quote]
I think that Oracle solves that kind of problem by allowing inside a table the concept of
partitions.
[/quote]

This indicates a misunderstanding of the concept of "column", of all things, to me.
And compound primary key.

What's wrong with adding a date column to the logical primary key if you have one?

Isn't partitioning more of a data volume question?

May 29, 2008 - 7:28 am UTC

... What's wrong with adding a date column to the logical primary key if you have
one?
...

nothing, that is what I'm suggesting (not that the date would be part of the key however, that is not necessary). The date should be an attribute and then the queries would use the date in the predicate to select information they wanted. And if that wasn't desired - we have these things called views to hide that predicate from the application....

... Isn't partitioning more of a data volume question? ...

depends - you partition for many reasons, ease of administration, higher availability and sometimes performance. Here it would be an ease of administration and possibly a performance feature.

Re: Thanks

Luis Matos, May 29, 2008 - 9:55 am UTC

Hi Andrew, Chuck,

thanks for your answers, but the table in question is not the one I've used in the example. That was meant to be an example only. The real table has log queues, and those would be the key for partitions.

As much as reading the EAV dicussion was very interesting - thanks Andrew - I failed to see what that has to do with the matter in question, i.e.:

- You have volatile, immense amount of data every day
- Data after 5 days (for example) is very seldom needed
- You need to have data from today and the last 5 days quickly accessible, while at the same time leaving the older data separately for historic purposes (access to the historic data can be slower).

So that criticisms (if any) come more to the point, I'll draw roughly what the table is:

MESSAGES_TABLE:
MESSAGE_ID CHAR(16)
... (many not important fields for discussion)
QUEUE VARCHAR2(20)
MSG_DATA VARCHAR2(2900)

the "many not important fields for discussion" are mostly header fields of the message that is sent, meant to be used as search criteria. The message_id is the unique key of a message received or sent, while MSG_DATA is the message in XML format, and QUEUE defines whether a message is to be kept on the current searchable messages or not.
The amount of messages received per day can go more than 1 million all received automatically via in-link. 77% of the queries done on messages on the database are done on messages from "today" (i.e. messages which are not in the log). 19% is done on messages in the last 5 days (i.e. messages belong to the last 5 days' log queues), while 2% only is done on messages older than that. Important to say, those 2% remaining queries are done in huge amounts of data, and that's the threshold we want to avoid. So messages older than 5 days are archived.

From what I've understood so far, with Oracle my best bet would be something like this:

MESSAGE_TABLE
PARTITION TODAYMSG - MESSAGES WHERE LOG_QUEUE NOT LIKE 'log%'
PARTITION 20080529 - MESSAGES WHERE log_queue = 'log_20080529'
PARTITION 20080528 - MESSAGES WHERE log_queue = 'log_20080528'

... like this until 20080524.

on the next day end-of-day procedure, I would have:

messages where log queue = 'log_20080524' go to archive (i.e. the respective partition is moved to the archive table)

partition 20080530 is created for messages whose log queue = 'log_20080530'.

That would solve the issue, I guess. Sorry for misleading comments by giving a different table for you.

May 29, 2008 - 10:09 am UTC

... I
failed to see what that has to do with the matter in question ...

they were pointing out that the EAV model, much beloved by young developers worldwide, is a horrible thing to implement in real life. That was all.

Database independence
"migrate" with as few code changes as possible from database A to B
EAV's

almost every red flag possible is present here for the perfect storm. When you couple that with:

....
The request from the customer is a general one based on guidelines - since they use Oracle for
other applications already (and they're a customer big enough to force my company spend money into
migrating to Oracle ), we have to comply to many guidelines - one of which is "no DDL inside java
code".
..........

(being dragged kicking and screaming into Oracle...)

You would have a date in your table.

then you would use

     PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
     PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
.....

Hmmmm

Luis Matos, May 29, 2008 - 12:02 pm UTC

I do have several dates in the table, between other things. The one that would be important to this discussion if it was to be used would be in_time, meaning the time where the data enters the table.

But the data that has to be searched quickly is not only the "today" data. Some exceptional messages stay more days into what would be considered "today" data because - exceptionally - they don't reach their final state in one day. So the date itself can't be used as a criteria to the partition, the log_queue is used instead, meaning "final state reached". I.E., a message enters the table in another queue but during the day (in 99% of the cases) it moves to the log queue. Message moved to log queue = message that can after 5 days be archived.

My company actually doesn't mind the move to Oracle, since it's being sponsored, so it's not like they were very upset by having to make this project. My own opinion is that I like Oracle as a DB - it has much more features and well used it will allow for a much more performant application. I've already done a DBA course for Oracle, I'm actually going to do another, I am reading books, asking here questions, but of course thinking in a different way takes time.

May 29, 2008 - 1:19 pm UTC

you won't be partitioning on "like log%"

you'll almost certainly want that "queue name" to be a date, a date that is used to control the partition it is in (we'll move it if you update it). Easier to range partition with dates than strings - and you'll know for sure the date is actually a valid date - you'll never have 20080542 as a 'queue name' that is.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Bad design

Some clarifying

Thanks

Re: Thanks

Hmmmm

More to Explore

PL/SQL demos

PL/SQL docs

DDL on stored procedures? Or an alternative

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Bad design

Some clarifying

Thanks

Re: Thanks

Hmmmm

More to Explore

PL/SQL demos

PL/SQL docs