
Question and Answer

Chris Saxon

Thanks for the question, Stuart.

Asked: October 24, 2004 - 5:27 pm UTC

Last updated: February 14, 2023 - 5:37 pm UTC

Version: 9.2

Viewed 10K+ times

You Asked

Tom

Is there a limit, either internal to Oracle or to its maintainability, on the maximum number of tables a database can hold? I have heard that the maximum is about 10,000. Is there any basis in fact for such a limit? The particular project I am working on has a requirement to hold many logical tables (between 10,000 and 100,000). We are thinking of employing a generic database design where the contents of these logical tables will be held in a single format with their metadata held in table clones of the data dictionary. The argument for this approach is to defeat the max-tables problem before it bites us (that is, if it is a real and not an imagined problem).

A penny for your thoughts.

Stuart

and Tom said...

there are no maximums as far as numbers of tables go.

however, a schema design that has 100,000 tables would be something I would seriously question. I'm not aware of any team of humans that could possibly design, implement, or maintain such a thing personally.




Rating

  (18 ratings)


Comments

What the...?

Jon, October 25, 2004 - 5:20 am UTC

Many a Q&A is read that raises the eyebrows - a few even provoke a response. This one is right up there. My curiosity has got the better of me...

Stuart, what on earth are you modelling that could potentially require 100,000 tables?

I've asked a few people in our project team - no developer, DBA, designer or architect has ever come across such a design before...



Re: What the...?

Tony Andrews, October 25, 2004 - 8:22 am UTC

Let me guess: there are 10,000 customers and someone decided to build a separate order table per customer? For performance reasons of course ;-)

"generic database design" says it all, I suppose...

Kashif, October 25, 2004 - 11:34 am UTC


Perhaps it's not so strange after all?

Stuart Dentt, October 25, 2004 - 4:34 pm UTC

The schema that I have in mind which needs to hold so many tables is designed to store the survey responses as part of an all-purpose campaign management system. The questions asked in each survey are completely different, and hence the responses from each survey comprise a logical table. I am wondering if it is sensible to hold the responses from each survey as a separate table (many tables - high performance) or to go for the generic (few tables - poor performance) database design.

Tom Kyte
October 25, 2004 - 4:39 pm UTC

the questions asked might be different -- but so what?

you have 3 or 4 types of questions.

You have a survey, made up of questions; questions have one of 4 sets of optional attributes.


to me, this clearly sounds like 2 tables, not 100,000


a survey table
a survey questions table
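
For concreteness, a minimal sketch of those two tables might look something like the following (table and column names here are illustrative assumptions, not from the thread; in practice a third child table would hold the actual responses, keyed by survey, question and respondent):

CREATE TABLE surveys (
  survey_id    NUMBER        CONSTRAINT surveys_pk PRIMARY KEY,
  survey_name  VARCHAR2(200) NOT NULL
);

CREATE TABLE survey_questions (
  survey_id     NUMBER         NOT NULL
                CONSTRAINT survey_questions_fk REFERENCES surveys,
  question_no   NUMBER         NOT NULL,
  question_text VARCHAR2(4000) NOT NULL,
  question_type VARCHAR2(20)   NOT NULL,  -- e.g. TEXT, NUMBER, DATE, CHOICE
  CONSTRAINT survey_questions_pk PRIMARY KEY (survey_id, question_no)
);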




A reader, October 25, 2004 - 5:50 pm UTC

Haha,
10,000 customer tables for storing customer attributes?
What great thinking. You must get a PhD in Relational Database Management.

Hello Stuart

Kamran, October 26, 2004 - 2:37 am UTC

You may think of a design like:
Surveys(SurveyID, SurveyDescription, ..., constraint PK primary key(SurveyID))
Survey_details(SurveyID, Question, Response, ...,
constraint FK foreign key(SurveyID) references Surveys(SurveyID))
Note: Managing fewer objects is easy, whereas as the number of objects (tables etc.) keeps increasing it may become impossible to manage.
Query: How can you say that performance would increase if you used a separate table for each case (survey or whatever)?

A reader, October 28, 2004 - 12:25 pm UTC

It's not just about keeping your administration simple. Having that many tables for an "Entity" called Survey is against relational database design. With an Entity, you are dealing with the problem of grouping information related to a real-life thing. For example, Customer: it stores all the relevant attributes of a customer. If you had 10,000 customer tables, they would be customer_1, customer_2 and so on. How would one build code for an invoice that has a customer attribute? Strange thinking.



Might make sense

Hugh Winkler, May 14, 2010 - 9:35 am UTC

Tom,

A human or team of them cannot manage 100,000 tables, but I'm sure the OP means for a computer program to manage them.

What's striking about the snarky, pedantic responses from the commenters is that they describe no real functional disadvantages of the proposed design -- the best they can do is say it would be "against the relational database design".

If you have an entity table that you anticipate would have billions of rows, and you typically query only for rows associated with one or a few rows in a foreign table, then you can vastly reduce the number of rows that full table scans or index range scans have to search by splitting the billions of rows across 10^5 tables. It's poor man's partitioning.

Is that reasonable?

Tom Kyte
May 24, 2010 - 7:17 am UTC

.. A human or team of them cannot manage 100,000 tables, but I'm sure the OP means
for a computer program to manage them.
...

then they likely made a huge mistake.


why would anyone full scan a billion row table in a transactional application?

There is partitioning, and if you had some logical thing to partition by, you would use partitioning.

There is a saying, "penny wise, pound foolish": you may choose where to spend your money - on software that does something for you, or on people to write, debug, maintain and enhance developed software that does something for you.

Me, I would buy a Human resources application in the 21st century, not build one. But there are those that would build one saying "it would be free!!!!". Free is a funny term.


How about you give a concrete example demonstrating what you mean and then we can address it.

It's the cost. Partitioning option costs 10x

Hugh Winkler, May 26, 2010 - 8:56 am UTC

Thanks Tom. We have a problem that partitions trivially. Approximately:

---- TABLE samples ----
*container VARCHAR2(64)
*logid VARCHAR2(64)
*dt INTEGER
values VARRAY(1000) OF SampleValue


The primary key is (container, logid, dt). Within a container there are zero or more "logs". Each log has values at many distinct times; each value is an array of name/value pairs, likely to have 50 to 200 or so entries. For each log there may be hundreds of thousands, or even millions, of distinct dt values.

Queries *always* confine themselves to one container. So we can partition by container. Alternatively, we can write code to manage creating one table per container. This code isn't all that hard to write. We'd need a metatable where we map container id to a synthesized table name, and our stores and queries would dynamically create the sql statement.
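
For concreteness, the partition-by-container option would look roughly like this (a sketch only: the VARRAY element type and the partition count are assumptions, and the collection column is renamed because VALUES is a reserved word in Oracle):

CREATE TYPE sample_value AS OBJECT (name VARCHAR2(64), val VARCHAR2(2000));
/
CREATE TYPE sample_value_list AS VARRAY(1000) OF sample_value;
/
CREATE TABLE samples (
  container     VARCHAR2(64) NOT NULL,
  logid         VARCHAR2(64) NOT NULL,
  dt            INTEGER      NOT NULL,
  sample_values sample_value_list,
  CONSTRAINT samples_pk PRIMARY KEY (container, logid, dt) USING INDEX LOCAL
)
PARTITION BY HASH (container) PARTITIONS 64;

Because every query supplies the container, the optimizer can prune to a single partition and its local index.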

While the one-table-per-container approach is an extra programming headache, it seems manageable, because our case is pretty simple. We are ISVs and our software currently is capable of running on SE 1, although most customers do use Enterprise Edition. Using Oracle partitioning increases the Oracle cost by an order of magnitude. Here are the current (May 2010) processor license costs:

SE 1: $5,800
EE + Partitioning: $59,000

So using the Oracle partitioning strategy means our customers see a minimum Oracle cost at least $50,000 higher. For some of our prospective customers, that puts the solution out of reach.

Nevertheless, we can adopt a business strategy that uses Oracle partitioning for those customers that can afford it, and for the lower cost customers, does not attempt to support the very large databases supported by the partitioning strategy.

So, when we're making this decision, we need to understand the true cost of managing the partitioning ourselves. What problems would we encounter, and would we really get similar benefits to partitioning?








Tom Kyte
May 26, 2010 - 9:13 am UTC


what might cost 10x the amount would be the salaries of the developers and dbas that constructed such a thing and have to maintain/enhance it over time.

... So, when we're making this decision, we need to understand the true cost of managing the partitioning ourselves. What problems would we encounter, and would we really get similar benefits to partitioning? ....

flooding the shared pool with hundreds of thousands of SQL statements, for one - that would be the first problem; it doesn't scale very well at all.

then having millions of indexes to manage and maintain - flooding the library cache (shared pool again)

You'll have to almost certainly forgo using the most efficient language ever for manipulating data in the database - plsql, as you would have to use dynamic sql for everything.

You will have serious scaling issues with a multi-table approach. No real ability to use plsql and other features.
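
To make the PL/SQL point concrete, here is a hedged sketch (the SAMPLES_000123-style table name, the naming scheme and the helper procedure are purely illustrative). With a single SAMPLES table, the query is static SQL, parsed once and shared by every container:

SELECT dt, sample_values
FROM   samples
WHERE  container = :container
AND    logid     = :logid;

With one table per container, the table name cannot be a bind variable, so every access becomes dynamic SQL, and each distinct table name produces a separate statement in the shared pool:

CREATE OR REPLACE PROCEDURE get_log_dates (
  p_table_name IN  VARCHAR2,             -- e.g. 'SAMPLES_000123' (hypothetical naming scheme)
  p_logid      IN  VARCHAR2,
  p_dates      OUT SYS.ODCINUMBERLIST
) AS
  l_sql VARCHAR2(4000);
BEGIN
  -- the statement text differs per container table, so each variant
  -- is hard parsed and cached separately in the shared pool
  l_sql := 'SELECT dt FROM ' || dbms_assert.simple_sql_name(p_table_name)
        || ' WHERE logid = :logid ORDER BY dt';
  EXECUTE IMMEDIATE l_sql BULK COLLECT INTO p_dates USING p_logid;
END;
/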

Well I guess I have to follow your advice, now

Hugh Winkler, June 10, 2010 - 5:04 pm UTC

Heck. Now it's documented on the interwebs forever, that I have Asked Tom whether creating many tables would be a good substitute for partitions -- and that Tom said NO. If I implement that strategy in spite of the advice, and something goes wrong, I get fired!

Just kidding, Tom. Thanks for this advice. We've implemented a partitioning strategy and early testing shows a significant reduction in I/O for common queries -- so we're getting what we need.


Tom Kyte
June 11, 2010 - 7:12 am UTC

but I gave reasons - not just "I say so"

here are some of the reasons right from this page:

o I'm not aware of any team of humans that could possibly
design, implement, or maintain such a thing personally.


o to me, this clearly sounds like 2 tables, not 100,000 (*meaning - your design is overly complex, let me show you a different way)

o flooding the shared pool with hundreds of thousands of SQL statements, for one - that would be the first problem; it doesn't scale very well at all.

o You'll have to almost certainly forgo using the most efficient language ever for manipulating data in the database - plsql, as you would have to use dynamic sql for everything.

o No real ability to use plsql and other features.

NoSql

Mani, June 15, 2010 - 7:44 am UTC

From what I understand, your requirement (online survey, polling system, etc.) could possibly be better designed with a NoSQL DB, rather than a relational DB.
Tom Kyte
June 22, 2010 - 10:20 am UTC

to collect the survey - easy, you just need a "bag" to put stuff in.

Now, try to analyze the results.

Campaign management system

Sudip, June 18, 2010 - 10:27 am UTC

I did come across a campaign management system based on an Oracle 9i database. The system used to create a separate set of tables for each new ad campaign.
The list of pains in this kind of design hugely exceeds the list of gains.

Sounds like an Oracle eBS flexfield...

Matt McPeak, June 18, 2010 - 10:45 am UTC

How about this instead of 100,000 tables...

1) Create a table SURVEY_RESPONSES with enough generic columns to hold all the responses from your biggest possible survey.

primary_key_info...
survey_id
response1_number NUMBER
response1_text VARCHAR2(4000)
response1_date DATE
response2_number NUMBER
response2_text VARCHAR2(4000)
response2_date DATE
etc, etc.

2) Create a SURVEYS and SURVEY_QUESTIONS table (parent-child) to define what the surveys are, and what the questions are.

3) Create a process to create 100,000 views -- one for each survey -- against the SURVEY_RESPONSES table. A typical view would be:

CREATE OR REPLACE VIEW SURVEY_18201_RESULTS_V AS
SELECT response1_number AS responders_age, response2_text AS political_party, etc.
FROM SURVEY_RESPONSES
WHERE SURVEY_ID = 18201;

I think I'd rather have 100,000 program-generated views than 100,000 tables.
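
The generation itself could be a short PL/SQL routine; a sketch (assuming SURVEY_QUESTIONS carries a question number, a column alias and a response type - those column names are illustrative, not defined above):

CREATE OR REPLACE PROCEDURE create_survey_view (p_survey_id IN NUMBER) AS
  l_cols VARCHAR2(32767);
BEGIN
  -- build one aliased column per question, mapping the response type
  -- (NUMBER/TEXT/DATE) onto the matching generic column
  FOR q IN (SELECT question_no, column_alias, response_type
            FROM   survey_questions
            WHERE  survey_id = p_survey_id
            ORDER  BY question_no)
  LOOP
    l_cols := l_cols || ', response' || q.question_no || '_'
                     || LOWER(q.response_type)
                     || ' AS ' || dbms_assert.simple_sql_name(q.column_alias);
  END LOOP;

  EXECUTE IMMEDIATE
       'CREATE OR REPLACE VIEW survey_' || p_survey_id || '_results_v AS '
    || 'SELECT survey_id' || l_cols
    || ' FROM survey_responses WHERE survey_id = ' || p_survey_id;
END;
/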

Thoughts?

- Matt



Variable Data

GH, October 29, 2010 - 9:58 pm UTC

Here is the scenario where I think 1,000+ tables would be a better option than a generic table (but still fewer than 10,000):
1. Data is variable (field names, number of fields, data types) per scenario within a customer, and there are many such customers
2. Columns should support up to 2K size per column
3. Business users would be defining these tables per business requirement
4. Data is queried several thousand times per minute for transaction processing (high performance required)
5. Most entities contain fewer than 1,000 records, but a few percent contain several million records, so putting them all in a generic table would affect query performance for all



My points for going with a table per entity/scenario are:
1. Just to support 2K size for some scenarios, creating a generic table with all columns sized at 2K would be overkill
2. Program APIs would have to fetch metadata to construct the query before querying the data. With entity-specific tables, you query directly and Oracle does the job of constructing the query column list (SELECT *)
3. SELECT * provides a result set with columns referenced by their functional names. The same can be achieved by aliasing the generic fields, but that again requires #2 above
4. The key reason is performance

Not difficult to create 100,000 tables, but is there an alternate solution?

Surajit Biswas, March 07, 2016 - 2:28 pm UTC

Hi,
As to whether any developer can make 100,000 tables or not -- my answer is that it's not that tough to implement. Rather, I have some questions for you. Suppose an application is used by 100,000 users, and every user is given a specified work area. In that work area, suppose there are many objects (which users create themselves) such as images, text boxes and paragraphs, and all of these need to be recorded in a DB table. So for every user's work area there would be a table, and if the user count increases, a new table is required to store that user's workspace data.

Is there any alternative you can suggest...?
(other than altering a single shared table and adding columns to store those definitions)


Chris Saxon
March 08, 2016 - 1:39 am UTC

Often the *implementation* of these things is easy, but the next question is then "What now?".

By this I mean, let's say we had 100,000 user accounts, and within each one they could create any structure they wanted. The database should be fine with that (it's basically like a tenanted arrangement) because in such a free-form model, we can assume that no user will ever want to query any other user's stuff (because they have no knowledge of the structures in anyone else's account).

Yes, there will be significant demands on memory and other resources *if* all of those users are in active use, and that would require some prudent testing and benchmarking.

But that's when we get to the "What now?" moment. Either that data is not really that important... in which case, why did we bother? Or that data *is* important, but now we're in a world of pain, because we potentially have 100,000 different structures that someone might want to analyze, consolidate, etc. etc... and that's near impossible. One user's definition of "sales" might be a 4-column table with 1 row per sale; another user's definition might be the same data in an Excel sheet stored as a BLOB. Free form = incredible challenges to get value from the data.

So often... once the implementation is done... only *then* do people start thinking, "Hmmm... how do we organize this data?". That leads to more common structures, and suddenly you don't need each user having their own tables... they can share structures. But by this point, you've got 100,000 data migrations to perform :-(

Hope this helps.

But is there a technical reason to limit tables?

Rich Barone, February 11, 2023 - 1:39 pm UTC

I know this thread is old, but we have a developer who creates point-in-time copies of data and names them with the date. At the moment, we have 80,000 of these tables, with 31,000 indexes. They are rarely queried, and I've suggested that rather than so many tables of the same data, we use several tables with a timestamp column and eliminate 99.9% of these tables.
It is a point of aggravation to me, but I need to know: other than the performance of querying the data dictionary views, is there a performance issue, or any other technical reason, I can give the developer's manager to get them to halt this practice?
Connor McDonald
February 13, 2023 - 4:35 am UTC

Besides the size of the data dictionary (which you've already noted), it's unlikely to cause any issues. For example, our own Applications suite, when fully installed, has tens of thousands of database tables/indexes.

But a question worth asking that developer is: to what purpose? What are you planning to do with "table2218343"? I'm struggling to find a justification.

The mystery of why BAs do what they do

Rich Barone, February 14, 2023 - 2:47 pm UTC

Re: What are they doing with table "table2218343"?

I believe they're keeping them so that if the Business Analyst ever wants to know what the data looked like six months ago, she can find the right table and look in it. I have no idea what business need this meets; it points to poor business practices, IMHO. For years, I have tried to get this particular BA to describe her processes for purposes of automation, and have been met with resistance. Worst of all, we maintain a history of everything that happens in our database, so on the rare occasion she needs to look back, we could write a report query to get this information.
At the moment, we have 1T in these tables, when the rest of the database is only 7T.
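
A rough way to quantify that, assuming the copies all follow a date-suffix naming convention (the trailing 8-digit pattern below is an assumption about the names), is to sum their segments in the data dictionary:

SELECT owner,
       COUNT(*)                         AS segments,
       ROUND(SUM(bytes)/1024/1024/1024) AS gb
FROM   dba_segments
WHERE  segment_type IN ('TABLE', 'INDEX')
AND    REGEXP_LIKE(segment_name, '[0-9]{8}$')
GROUP  BY owner
ORDER  BY gb DESC;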

Clarification

Rich Barone, February 14, 2023 - 2:52 pm UTC

To clarify my comment:
"I've suggested rather than soo many tables of the same data, use several tables with a timestamp, and eliminate 99.9% of these tables."

Instead of
TABLE_NAME20220213
TABLE_NAME20220214
TABLE_NAME20220215
. . . .
Have one table, with a column designating the date to which it applies, like
TABLE_NAME with a column POINT_IN_TIME_DATE
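
In SQL terms, that amounts to something like the following (table and column names here are illustrative only; the real columns would be whatever the source table holds):

CREATE TABLE table_name_history (
  point_in_time_date DATE NOT NULL,
  some_key           NUMBER,          -- placeholders for the source table's real columns
  some_value         VARCHAR2(100)
);

-- each "copy" becomes one insert, tagged with its capture date
INSERT INTO table_name_history (point_in_time_date, some_key, some_value)
SELECT TRUNC(SYSDATE), some_key, some_value
FROM   table_name;

-- and "what did the data look like on 13-Feb-2022?" becomes a filter, not a table lookup
SELECT *
FROM   table_name_history
WHERE  point_in_time_date = DATE '2022-02-13';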
Chris Saxon
February 14, 2023 - 5:37 pm UTC

I agree!

That said, you mentioned that you're already capturing change history - so presumably the data they're storing is already available in some form.

Does the analyst know this? Can they access it easily? With simple SQL queries?

I suspect the analyst is doing this out of some fear, e.g. they'll lose data/be unable to reproduce reports/... Understanding their reasoning may help you show how they can do it properly with what the system already does. Or perhaps show that there are gaps in how it functions that need to be addressed!

It's also worth bearing in mind that while "disk is cheap", it's not free. If I've understood right this data accounts for ~12.5% of the total size of the database. The ~1 terabyte in the production database will be replicated in backups, DR sites, potentially test/uat databases, etc. Removing it could lead to a big drop in the storage needed.

So there could be a cost argument to be made for encouraging the analyst to stop this!