How can we design a big database, e.g. a terabyte database?

Question and Answer

Tom Kyte

Thanks for the question, Awan.

Asked: October 19, 2002 - 12:41 am UTC

Last updated: April 12, 2011 - 1:14 pm UTC

Version: 8.1.7

Viewed 1000+ times

You Asked

Dear Kyte:

You are really doing a great job, and I really love the way you explain the questions, because it makes things clear.

Well my question is:


1) Sometimes we are asked in interviews how we would design a very big database, e.g. of terabytes or maybe more than this. How will you configure the SGA? How many tablespaces will you make? How many datafiles will you make? How many rollback segments will be created, etc.? So how can we answer this question?

2) And if we have a powerful machine with multiple processors and many physical hard disks, will the database be designed differently with respect to the multiple processors?

Thanks and Regards,
Awan


and Tom said...

1) and I would look at them like they were crazy.

I would ask them -- "ok, so tell me how you would configure the SGA and I'll give you 5 examples of tbyte+ instances where your method would be totally messed up."

"tell me how many tablespaces you would have" -- and I'll give you 5 examples where your method would be totally messed up.

same with the datafiles

and rollback segments -- ESPECIALLY this one (well, the SGA question is a close second).


You can only talk in generalities here. You size the SGA based on the needs of the application. Is this a terabyte transactional system or a terabyte data warehouse (we do both)? The needs would be totally different. How much RAM is on the machine? How many users (wanna leave a bit of space for the PGA, you know)? Anything else going on on this machine we need to be aware of? My first response would be "well, since this database doesn't exist yet, we should not even for a second consider using 817 but rather 9iR2 (or whatever is the latest production release of Oracle as you are reading this), since it has about a gazillion features for doing big databases that 817 didn't".
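For illustration only, the knobs in question might look like this (the parameter values here are placeholders, not recommendations -- as the answer says, they depend entirely on workload, RAM and user count):

```sql
-- Illustrative only: SGA/PGA parameters whose right values are
-- workload-driven. Sizes below are made-up placeholders.
ALTER SYSTEM SET sga_max_size         = 8192M SCOPE = SPFILE; -- needs restart
ALTER SYSTEM SET db_cache_size        = 4096M;
ALTER SYSTEM SET shared_pool_size     = 1024M;
ALTER SYSTEM SET pga_aggregate_target = 4096M; -- leave room for the PGA
```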

How many tablespaces? Well, won't that be a function of the number of objects -- and you are using locally managed tablespaces, so the right answer might be "I want three -- small, medium and large". For a system like this, you won't be micromanaging storage -- you'll be letting the hardware dudes do that at the storage array you'll be using. So you don't need to segregate objects by tablespace, and GIVEN that we are using 9i, we'll have block level recovery, which means we can recover from corruptions keeping the datafile and tablespaces online and available, so we don't need thousands of tablespaces. (The answer won't be three, it'll be some number greater than three, but that is about all I can say.)
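The "small, medium and large" model mentioned here might be sketched like so (the names, paths, sizes and extent sizes are invented for illustration):

```sql
-- Hypothetical sketch of the small/medium/large uniform-LMT model.
-- Objects go into whichever tablespace fits their expected size,
-- with no per-object storage clauses.
CREATE TABLESPACE ts_small
  DATAFILE '/u01/oradata/big/ts_small01.dbf' SIZE 1024M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 128K;

CREATE TABLESPACE ts_medium
  DATAFILE '/u02/oradata/big/ts_medium01.dbf' SIZE 8192M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 4M;

CREATE TABLESPACE ts_large
  DATAFILE '/u03/oradata/big/ts_large01.dbf' SIZE 16384M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 128M;
```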

How many datafiles? Depends on the OS, doesn't it? We'll be using the biggest we can -- generally in the 16 to 32 gig range, most likely.

Rollback segments -- well, we are going to use 9i, so RBS configuration is not done by "number of rbs's and size of each" but rather "what is the retention period we need to specify", and that is based solely on the length of the longest running query, so..... insufficient data to compute!
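In 9i terms that means automatic undo management, where the only sizing question left is the retention period (the 2-hour value below is an assumption, not a recommendation):

```sql
-- Automatic undo management (9i+): no manual rollback segment sizing.
-- UNDO_MANAGEMENT requires an instance restart to take effect.
ALTER SYSTEM SET undo_management = AUTO SCOPE = SPFILE;
-- Retention is driven by the longest-running query; 7200s is a guess.
ALTER SYSTEM SET undo_retention = 7200;
```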


2) I've never heard of taking the number of CPU's into consideration when designing a physical schema personally.

We might use different features (lots more PARALLEL operations for loading, indexing, bulk updates, etc) but I've never thought about doing something in my physical design based on the number of cpu's -- no. (the one thing that'll change is the number of cpu's - we add more, we change hardware -- #cpu's is not a constant. in fact, it can change while the system is up and running these days!)
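The sort of feature usage meant here, sketched against a hypothetical ORDERS table (all names are invented):

```sql
-- Parallelism as a *feature* choice, not a physical-schema choice.
ALTER TABLE orders PARALLEL;          -- let the optimizer parallelize scans
ALTER SESSION ENABLE PARALLEL DML;

-- Parallel direct-path bulk load from a staging table:
INSERT /*+ APPEND PARALLEL(orders, 8) */ INTO orders
SELECT * FROM orders_staging;

-- Parallel index build:
CREATE INDEX orders_date_ix ON orders (order_date) PARALLEL 8;
```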


Comments

Ghulam Rasool, October 20, 2002 - 6:18 am UTC

Awan asked a good question and as usual YOU replied well. I wish the people who ask these kinds of questions would read this.


Many thanks for such a nice answer.

Awan, October 20, 2002 - 6:58 am UTC

Dear Kyte:

Well, it is a very good answer indeed. You are smart. I am happy that you explain things in a very proper manner.

excellent advice

Eric D. Peterson, January 19, 2004 - 11:28 am UTC

A coworker and our DBA are recommending that we go from multiple tablespaces to just two (one for tables and one for indexes). This is part of our move from 8i to 9i. I've been trying to explain to them the usefulness of a variety of tablespaces with different extent sizes (your small, medium, large model), but to no avail. Can you provide more fodder for this discussion? I've been sending them various articles from here but would like some more concrete examples of why one tablespace may not be the best idea.

I haven't heard them say, but I'd guess they are thinking of one tablespace with varying extent sizes. I've seen your comments on how that can lead to fragmented available space. Would this lead to regularly scheduled coalescing of the tablespace to defragment it?

Thanks much for any help you can provide.
Eric

Tom Kyte
January 19, 2004 - 12:59 pm UTC

I use two per application at most myself (usually just one -- both tables and indexes go into it).

Using LMTs with autoallocate extents is just too easy to pass up. I've grown over the years to become very fond of that storage mechanism (I wrote about it in my last book). Even easier than small, medium and large.
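An autoallocate LMT of the kind described is a one-liner (the tablespace name and datafile path are invented):

```sql
-- Locally managed tablespace with system-chosen (autoallocated) extent
-- sizes; objects created here need no storage clauses at all.
CREATE TABLESPACE app_data
  DATAFILE '/u01/oradata/db/app_data01.dbf' SIZE 4096M AUTOEXTEND ON
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE;
```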

how i handle bad interview questions

Ryan Gaffuri, January 19, 2004 - 2:09 pm UTC

I routinely get asked interview questions where the interviewer is looking for a buzzword response and they clearly have not even read the documentation. The most generic of these is 'How do you tune a query?'. They are looking for these buzzwords: explain plan, 'cost', full table scans, and cartesian joins.

The best way I have found to answer this is to do the following:

'I'm not sure what you are looking for. So first I'll give you the generic response, then I'll tell you how I do it and why it is better'.

The guys I talked to argued with me over ignoring 'cost', but after a little bit they realized I knew what I was doing. Occasionally you will get the true 'know it all' interviewing you who insists what is incorrect is correct. The vast majority of time people will realize you know more than they do. Since the people interviewing you have not read the documentation, that isn't saying much...

I get that SQL Tuning question almost every time.

Tom-- The OFA recommends using 7 tablespaces. Should it be re-written? Isn't that the official Oracle 'statement' on how to set up your tablespaces and datafiles?

I'm assuming by two tablespaces you mean undo, data, and then a third for system, right?

Tom Kyte
January 19, 2004 - 2:35 pm UTC

I'm talking at the "application level"


someone comes to me and says "hey, i have this new app we are going to put into your database that we just wrote. It does 'x'. Can you give me some space"

Well, I'll create them a single tablespace, with locally managed extents, set to autoallocate and tell them to leave off ALL storage clauses and use that single tablespace. At most, they might get two -- to separate tables from indexes, but not usually.




Thanks much for such clarity

Eric D. Peterson, January 19, 2004 - 3:03 pm UTC

Thanks, that will help in our discussion. The current discourse is on the advantages of each side, i.e. three tablespaces = better performance, better management of tables, versus one tablespace = ease of use, better management by letting the database handle it, easier coding.

But what would you say are the downsides to using three tablespaces versus one?

Thanks.
Eric



Tom Kyte
January 19, 2004 - 3:55 pm UTC

There really are no downsides in general to either implementation. It starts to come down to a matter of choice.

It is easier to judge growth with uniform-sized LMTs.

if you don't know the ultimate size of all objects (and hence cannot figure out "s", "m", and "l"), autoallocate looks really good.

How to design VLDB (version 11gR2)

JUN GU, April 06, 2011 - 10:40 am UTC

The database is OLTP with heavy writing (inserting, merging) -- 5 million order messages per day (db size increases 6G per day). The goal is to handle 50 million orders per day.

The tables are partitioned by month and ASM is used to spread out I/O. We only keep two months of data online, so every month we retire the data from two months ago.

Currently it runs fine. But with the goal of 50 million orders per day (60G of writing), performance will become a problem. If the tables were partitioned by day, maintenance would be a nightmare.

Also, because of the heavy writing, lots of redo logs are generated (even after I turned off logging on some big tables). Currently the redo log size is 1G, and a log switch happens every 3-5 minutes during peak time. With 50m orders, how can I handle the redo logs? How big can redo logs be?

I sometimes see index contention; would it be better to make the indexes (on partitioned tables) local instead of global?

Thanks,
Jun
Tom Kyte
April 12, 2011 - 1:14 pm UTC

If the tables were partitioned by day, maintenance would be a nightmare.

why?
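(For what it's worth, in 11gR2 daily partitioning can be largely self-maintaining. A hypothetical sketch -- not something the answer prescribes, and all names are invented:)

```sql
-- Interval partitioning: Oracle creates each daily partition
-- automatically as rows arrive for a new day.
CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE
)
PARTITION BY RANGE (order_date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(PARTITION p0 VALUES LESS THAN (TO_DATE('2011-01-01', 'YYYY-MM-DD')));

-- Retiring old data is then a per-partition metadata operation, e.g.:
-- ALTER TABLE orders DROP PARTITION
--   FOR (TO_DATE('2011-02-01', 'YYYY-MM-DD'));
```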

(even after I turned off logging on some big tables)

I doubt that did anything for you. It would ONLY affect things like create table as select, sqlldr direct=y, insert /*+ append */, alter table t move and the like. It does NOTHING for insert, update, delete, merge. It does nothing for index maintenance.
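To illustrate the distinction (table and column names are assumed for the example):

```sql
ALTER TABLE big_orders NOLOGGING;

-- Direct-path operations CAN skip redo for the data
-- (unless the database/tablespace is in FORCE LOGGING mode):
INSERT /*+ APPEND */ INTO big_orders SELECT * FROM orders_staging;

-- Conventional DML generates full redo regardless of NOLOGGING:
INSERT INTO big_orders (order_id, order_date) VALUES (1, SYSDATE);
UPDATE big_orders SET order_date = SYSDATE WHERE order_id = 1;
MERGE INTO big_orders b
USING orders_staging s ON (b.order_id = s.order_id)
WHEN MATCHED THEN UPDATE SET b.order_date = s.order_date;
```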



But with the goal of 50 million orders per day (60G of writing), performance will become a problem.

why - what reasons do you see for that? What are the technical issues you are anticipating? Given you have sufficient CPU and I/O bandwidth (NOT storage, I/O capabilities), this isn't going to be astronomical.


With 50m orders, how can I handle the redo logs?

Again, what issues do you anticipate? What do you have to do to "handle" redo logs other than

a) have the space
b) have the IO capabilities
c) have the plan in place to handle archiving, which you already have a need for
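"Handling" a bigger redo volume is mostly a matter of re-sizing the log groups; an illustrative sketch (group numbers, paths and the 4G size are invented, not a recommendation):

```sql
-- Add larger redo log groups:
ALTER DATABASE ADD LOGFILE GROUP 5
  ('/u04/oradata/db/redo05a.log') SIZE 4096M;
ALTER DATABASE ADD LOGFILE GROUP 6
  ('/u04/oradata/db/redo06a.log') SIZE 4096M;

-- Switch past the old 1G groups, then drop each one once it is
-- inactive and archived:
ALTER SYSTEM SWITCH LOGFILE;
ALTER DATABASE DROP LOGFILE GROUP 1;
```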


I sometimes see index contention; would it be better to make the indexes (on partitioned tables) local instead of global?


"it depends". If the index were on LAST_NAME, for example, it would probably be a really bad idea to use local indexes, since a query of the form:

where last_name = ?

would have to scan each and every local index partition. It would make more sense to globally partition that index for retrieval AND for reducing insert contention.


An index on the partition key would be different, of course; it would be local. However, an index on your date column (a monotonically increasing value) will be a point of contention! It will be a right-hand-side index, always growing.
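The two cases, sketched against a hypothetical ORDERS table (names and the partition count are invented):

```sql
-- Global hash-partitioned index: "WHERE last_name = ?" probes one index
-- partition, and hashing also spreads inserts across 16 index segments.
CREATE INDEX orders_lname_ix ON orders (last_name)
  GLOBAL PARTITION BY HASH (last_name) PARTITIONS 16;

-- Local index on the partition key: maintenance and pruning follow the
-- table partitions one-for-one.
CREATE INDEX orders_date_ix ON orders (order_date) LOCAL;
```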

How to design VLDB (version 11gR2)

Martin, April 12, 2011 - 11:16 pm UTC

Hi Jun,
I am interested to know what kind of business you work for that handles 50 million orders per day?