Design issue - Multiple schemas / additional fields: which is better?

Question and Answer

Tom Kyte

Thanks for the question.

Asked: January 05, 2005 - 9:35 pm UTC

Last updated: September 27, 2012 - 8:31 am UTC

Version: 9.2.0.6

Viewed 1000+ times

You Asked

Dear Tom,

Your advice would be much appreciated for the following:

There are about 100 tables in a schema presently. Some of the tables are partitioned by date range. Some are huge tables, with about 6 million rows inserted per day.

The application, which was developed to cater for one company, is now being enhanced to support more than one company (it might go up to 50 companies). When a user logs in to the application, he will have to provide the company id he belongs to. Though he will mainly be dealing with data pertaining to his own company, he might also need to access other companies' data in a few instances.

To achieve this we have two choices:
Choice 1: Include an additional column called company_id in all the tables. Create appropriate views.

Choice 2: Create separate schemas for each company with the same table structures. Give appropriate access rights.

Which of these, in your opinion, will be easier to maintain and also perform well?


Thanks,
Albert Nelson A.

and Tom said...

choice #1 is the ONLY way to go -- period.

Having a separate schema for each company would increase your shared pool needs 50 times. It would increase your hard parses 50 times. If you use PL/SQL (you'd better be!), it would make you maintain 50 copies of the code (invoker's rights would not be appropriate). It would be a nightmare.


What I would suggest is:

a) you add that company_id to the tables (and the leading edges of indexes, etc). Make it part of your partitioning scheme as well, if appropriate.

b) rename table T to T_TABLE (for each table). Create a view T as select <all but the company_id> from t_table where company_id = sys_context( 'my-ctx', 'company-id' );

c) when the user logs in, set their company -- now all code that accesses T will see only their company's data.

d) set up another set of views that have a predicate

where company_id in ( select * from gtt )

and for people that can see cross-company -- let them have access to these views. Populate the gtt (don't let the users have INSERT or even SELECT on the gtt!) for them. They can use these views to see across the companies they are allowed to see.
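Putting (a) through (c) together -- a minimal sketch, using a hypothetical ORDERS table (every object name below is illustrative, not from the original system; the cross-company views of step (d) are sketched further down the thread where the gtt is discussed):

-- (a) add the company column; put it on the leading edge of indexes too
alter table orders add company_id number;

-- (b) rename the table and hide it behind a view filtered by context
rename orders to orders_table;

create or replace view orders as
select order_id, customer_id, order_date   -- every column except company_id
  from orders_table
 where company_id = sys_context( 'my_ctx', 'company_id' );

-- the context is settable only by the procedure named in its USING clause
create or replace context my_ctx using set_company;

-- (c) at login, your trusted code sets the session's company
create or replace procedure set_company( p_company_id in number )
as
begin
   dbms_session.set_context( 'my_ctx', 'company_id', p_company_id );
end;
/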





Comments

How large?

A reader, January 06, 2005 - 10:12 am UTC

Oracle is known to be a very scalable database, able to support tables with millions of rows.

Partitioning and compression are the two major features Oracle provides to handle really large tables.

Just curious... what is the size (number of rows/MB) of the largest table out there? At some point, doesn't the darn thing just become too unmanageable?

Tom Kyte
January 06, 2005 - 11:15 am UTC

I'm aware of a table with double-digit BILLIONS of rows (over 50 billion).

it is partitioned, each segment is sizable -- but the table itself is huge.

You manage segments. A table is not a segment -- so tables can get incredibly large and still be "manageable", only because you do not manage the table.

A reader, January 06, 2005 - 11:22 am UTC

Well, a table is not a segment only for partitioned tables.

In your 50-billion-row partitioned table example, unless all operations are done strictly on a per-partition basis (in which case it's simply N tables with a UNION ALL view on them), there would still be global indexes, would there not?

Any kind of DDL/DML that would need to update the global index would take hours!

Also, any kind of FTS or hash join involving the full table would take hours.

By manageable, I am referring more to the use of the table by apps, not so much the DBA aspect of it.

Thanks

Tom Kyte
January 06, 2005 - 11:47 am UTC

would it?

the DDL that operates on a partition operates on the rows in that partition only.

If I drop a partition (assuming they even do that), and the partition is sufficiently small, it would take only as long as it takes to delete that many rows from the global indexes.

Again, it is not the SIZE of the total table here but the SIZE of the individual SEGMENTS.

If your partitions contain 1,000,000 rows, it'll take half the time (approximately) that it would if they contained 2,000,000 rows. It is, again, a function of the size of the individual segments -- not the table.

(meaning, if I have a 5 MILLION row partitioned table and a 5 BILLION row partitioned table and I drop a partition with 1 or 2 million rows, it'll take about the same amount of time against either table)
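For example, with a hypothetical range-partitioned table (the names here are invented), the drop can maintain the global indexes in the same statement, and the work is proportional to the rows in that one partition:

-- drops one partition; global index maintenance touches only the
-- entries for the rows in p_2004_12, not the rest of the table
alter table big_table drop partition p_2004_12 update global indexes;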

And -- as to the FTS question -- the fact is, if you HAVE to have that much data, you HAVE to have that much data, period. The size of your table should be exactly:

a) as small as it can be
b) as large as it needs to be


Your business drives the size of the table, nothing else. Of course, if you don't need 50 billion rows online, don't keep them online.

about d) answer

Marcio, January 06, 2005 - 7:45 pm UTC

Tom, I'm concerned about item d) above in your first answer. Could you elaborate a little on how the gtt comes into play in that fashion?

Thanks and regards,

Tom Kyte
January 06, 2005 - 7:54 pm UTC

99% of the time (assumption on my part) people want/need:

where company_id = VALUE


The best way for that is the first set of views.


1% of the time "he might also need to access other companies' data in a few instances."

So for that, you would (instead of setting the context) insert the company ids they are allowed to see into a global temporary table (gtt), and they would query the "views that expose company id as an attribute" but also use "where company_id in (select * from gtt_that_they_cannot_read_or_write_to)" in the predicate.
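A sketch of that arrangement, continuing the hypothetical ORDERS_TABLE naming from the sketch above (the gtt and view names are assumptions):

-- session-private list of visible companies; trusted code fills it at
-- login, and end users get no grants on it at all
create global temporary table allowed_companies
( company_id number )
on commit preserve rows;

-- the cross-company view exposes company_id as an attribute and filters
-- through the gtt (the view's owner reads it, not the end user)
create or replace view orders_xcompany as
select company_id, order_id, customer_id, order_date
  from orders_table
 where company_id in ( select company_id from allowed_companies );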

sys_context ..

Reader, January 07, 2005 - 12:02 am UTC

can you please elaborate on the use of b)?

Create a view T as select <all but the company_id> from t_table where company_id = sys_context( 'my-ctx', 'company-id' );

thanks,



Tom Kyte
January 07, 2005 - 9:03 am UTC

search for sys_context on this site to see many examples.


The concept is your code uses:

dbms_session.set_context( 'my_ctx', 'company_id',
variable_with_company_id_they_are_allowed_to_access );


an application context is settable ONLY by a specific piece of code (a package or procedure you name at creation time), hence the end user cannot set the company id to just "any value" they want. So, your code -- implementing whatever logic it wants to set the company id -- will do so, and the view will silently let only those rows through.

See also:
http://asktom.oracle.com/~tkyte/article2/index.html



What about FGAC?

Dushan, January 07, 2005 - 5:23 am UTC

Would it not be better to use FGAC instead of the view approach in this case?

Tom Kyte
January 07, 2005 - 9:18 am UTC

not for something "so simple" -- the predicate is constant here:

where company_id = sys_context()


If the predicate were frequently different, it would be FGAC. But for something so simple, you don't need the extras that FGAC has.

Thanks

Albert Nelson A, January 07, 2005 - 7:53 am UTC

Thanks Tom for your valuable advice.

Regards,
Albert Nelson A.

to: READER FROM US

Marcio, January 07, 2005 - 8:52 am UTC

I suggest you read both
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1288401763279
http://asktom.oracle.com/~tkyte/article2/index.html

And a little sample of playing with views and contexts.

ops$mportes@FCCUAT9I> create table t as
2 select user_id, username, account_status
3 from dba_users
4 /

Table created.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> create or replace context ctx using p;

Context created.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> create or replace procedure p( p_name in varchar2, p_value in varchar2 )
2 is
3 begin
4 dbms_session.set_context('ctx', p_name, p_value);
5 end;
6 /

Procedure created.

ops$mportes@FCCUAT9I> show error
No errors.
ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> create or replace view t_view
2 as
3 select username, account_status
4 from t
5 where account_status = sys_context('CTX', 'ACCOUNT_STATUS')
6 /

View created.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> select account_status, count(*)
2 from t
3 group by account_status
4 /

ACCOUNT_STATUS                       COUNT(*)
-------------------------------- -------------
EXPIRED                                      2
EXPIRED(GRACE)                               5
OPEN                                        58

3 rows selected.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> select * from t_view;

no rows selected

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> exec p('account_status','EXPIRED')

PL/SQL procedure successfully completed.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> select * from t_view;

USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
OPUAT_READ                     EXPIRED
OPUAT_INDER                    EXPIRED

2 rows selected.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> exec p('account_status','EXPIRED(GRACE)')

PL/SQL procedure successfully completed.

ops$mportes@FCCUAT9I>
ops$mportes@FCCUAT9I> select * from t_view;

USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
AKUMAR_DBA                     EXPIRED(GRACE)
BOCHC_READ                     EXPIRED(GRACE)
PYCHC_READ                     EXPIRED(GRACE)
OPCHCONTROL                    EXPIRED(GRACE)
PYCHCONTROL                    EXPIRED(GRACE)

5 rows selected.

rgds,
Marcio

About FGAC

Vikas Sharma, January 21, 2005 - 7:06 am UTC

Hi Tom,

If the requirement is to filter/restrict a company's data within its users, then FGAC would be the only option. Am I correct?


Thanks,

Vikas Sharma

Tom Kyte
January 21, 2005 - 8:32 am UTC

no, you are not. FGAC is one approach -- a very transparent approach.

But it is not the only approach; you can use views, stored procedures, and other means.

Company ID + Natural key primary key

Anil, January 21, 2005 - 2:44 pm UTC

Hi Tom

Referring to the first part of the question and its answer, I would like to get a clarification. Since we are adding company_id to all the tables in the application, we naturally have to add this key to the primary key as well. In this case, is it advisable to have a surrogate key and a unique constraint on the company_id plus the actual primary key?

Rgds
Anil


Tom Kyte
January 21, 2005 - 7:49 pm UTC

<quote>
we naturally have to add this key to the primary key as well.
</quote>

that is not necessarily true..... it totally depends on the nature of the primary key.

But if you have a natural key that is your primary key, and there could be conflicts across companies -- it would make sense to add company_id to the primary key; it would not make sense to surrogate it all (that would add yet another column to the table and yet another index, neither of which is really necessary).
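For instance (a sketch, with the hypothetical names used earlier):

-- the natural key alone could collide across companies, so company_id
-- becomes the leading column of the composite primary key
alter table orders_table add constraint orders_pk
   primary key ( company_id, order_id );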

Company ID with Primary key

Anil, January 22, 2005 - 1:25 am UTC

Tom ,

<<
totally depends on the nature of the primary key
>>

Yes Tom, it does depend on the primary key, but experience shows that in most cases, if you have a business-code primary key, it has to be combined with the company id; otherwise there will be conflicts when another company uses the same key in their data. I have many cases in the application, especially in the master tables (base data), where we have to add company id to the primary key, thereby having a composite PK. Since these are masters and not very transactional, I think I can afford the extra index that would be created for the surrogate key!!

Also, will keeping company id at the leading edge of all the indexes have any implication on index access? Will it be the same as accessing the index without the leading-edge column?


Rgds
Anil







Tom Kyte
January 22, 2005 - 9:54 am UTC

I agree with that -- I was pointing out that to say:

<quote>
we naturally have to add this key to the primary key as well.
</quote>

is not accurate. Me, I use surrogates a lot; many of my "primary keys" are in fact populated via a single sequence for that table. Therefore, I would look at that statement and go "huh? why would anyone say that?"

I do not see how adding a surrogate key AND the company id here is beneficial. So the question is not whether you can afford it; the question is "what does it buy you, what do you get for your additional work?" You know, if an insert into a table takes 1 unit of time, that insert with a single index will take about 2 units of time, with 2 indexes 3 units of time, and so on.

Since all queries in this system will have "where company_id = sys_context()", having the company id in the index (it doesn't have to be on the leading edge, but that probably makes the most sense in most cases) would be advised. I cannot say that it will be "the same", since it won't -- but it'll still be able to use the index to access the data by company.
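For example (again a sketch with invented names):

-- company_id on the leading edge: every query supplies it through the
-- view, so index access is confined to that company's slice
create index orders_cust_ix on orders_table( company_id, customer_id );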

What about full table scans?

A reader, March 02, 2005 - 1:47 pm UTC

What about the case when a query doesn't hit an index?

With all the data from 50 companies in one table these queries would seem to take much longer.

This might be compounded by the inability to customize indexes for a particular customer's favorite strange query.

Tom Kyte
March 02, 2005 - 4:52 pm UTC

partition if you feel that is going to be an issue.

50 partitions is pretty small, very manageable.

Then, no worries about "indexes" -- they are just local, and all queries hit a single partition.

(and what inability would you foresee with indexing even without partitioning?)


Anything but multiple schemas -- anything but.
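A sketch of that layout, assuming numeric company ids (all names here are hypothetical):

create table orders_table (
   company_id  number  not null,
   order_id    number  not null,
   order_date  date    not null
)
partition by list ( company_id )
( partition p_company_1 values ( 1 ),
  partition p_company_2 values ( 2 )
  -- one partition per company, added as companies sign up
);

-- local index: each partition carries its own index segment, so every
-- query filtered by company touches exactly one of them
create index orders_ix on orders_table( company_id, order_date ) local;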

multiple clients issue,

sns, March 02, 2005 - 5:25 pm UTC

The company_id issue reminds me of a scenario in our workplace. Currently we have a data mart on its own Linux box for a particular client. We extract 3 years' worth of data from a data warehouse for that client and store it in the data mart. The client runs ad-hoc queries on the data mart for their reporting purposes. The size of the data mart is 1 TB.

The data warehouse is a single instance that stores data for 8 different clients on its own Linux box.

Two more clients are signing up with our company to have a similar kind of data mart. The data is quite similar across the different clients.

The debate is: do we need a Linux box for each data mart, OR do we put all the data in one database and allow each client to access their respective information using a client ID?

There are three things to be considered: PERFORMANCE, HARDWARE EXPENSES and MAINTENANCE.

What is your thought on this?

Thanks



Tom Kyte
March 02, 2005 - 5:44 pm UTC

if you partition by client_id, it'll be as if they have their own schema.

if you have a sufficiently sized machine, they will have their performance.

you will minimize the hardware needed (as fewer databases need less hardware). Also, client "A" can use client "B's" spare CPU because A and B do not have private CPUs (this can also be considered a negative by customer B).

Maintenance of fewer databases is easier than of many databases.


It sounds like you already have experience with a single instance for all, though?

thanks,

sns, March 02, 2005 - 6:12 pm UTC

Well, I have experience putting all the information in a single instance -- the warehouse, which does not get clogged due to SQL queries. However, the data mart does.

I agree with your point that client A can use more CPU when client B is inactive. However, if both are active at the same time, then I don't know.

I think you kind of prefer to have all the client info in the same table (partitioned) and within the same database.

What if the size of each client is 5 TB and the number of clients is 8?

Thanks,


Tom Kyte
March 02, 2005 - 7:02 pm UTC

resource manager can be used to ensure "fairness".....

50 TB and 8,000 clients, or 500 MB and 8 -- I'd be looking for as few databases to manage as possible.

(you did catch my thought correctly -- what I prefer)

Been there done that

dharma, March 02, 2005 - 8:47 pm UTC

We had the same problem in deciding between schemas and partitions; we tried a schema for each company and it was a total nightmare. Never even try that. We stuck to partitions and VPD, since we had other security concerns.

The reason we went with the schema approach in the first place is because of some great so-called gurus who recommended that approach to management, who wouldn't listen to us. After burning their fingers, they had to settle for the nicer approach.

We did it in 8.1.7, and I didn't know much about Tom then, and wasted 4 months trying some nut's ideas.

binds for the customer_id?

Gabe, March 02, 2005 - 9:57 pm UTC

Regarding the question raised by sns from austin.tx ...

Assuming one would go for the partitioned-by-client data mart on one single box ... would you not change the mechanism of the views?

Rather than one set of views like "where company_id = sys_context()" ... have n sets of views like "where company_id = <literal>".

Different login schemas for the n customers on the box, each with their own relevant set of views ... a bit more management of the objects, but no issues with the binds. For OLTP, the sys_context() being a bind is perfect, as you mentioned: smaller shared pool needs, fewer hard parses, etc. ... For the DM those are not as much of an issue ... the potential (though remote) problem I see with sys_context() is bind peeking for customer A, who has substantially less [or more] data than customer B, etc. ...

How remote a possibility? Since they rent/host the DM data, they may decide to rent/host an analytical app on top of it as well (talk about competitive advantage). Maybe ... do you think it is worth avoiding the sys_context() for the DM?

Thanks.

Tom Kyte
March 03, 2005 - 7:18 am UTC

it would depend on the number of client ids -- for you would defeat the SQL sharing; it would be a trade-off.

On the plus side, you would get the manageability of a single schema (code changes, and so on).

On the down side, you would have as much SQL in the shared pool as you would with separate schemas.


In OLTP -- you are right, I would not; in a data mart/warehouse, you very well might.

In fact, you could have a policy that was:

if ( sys_context( '...', 'application_type' ) = 'OLTP' )
then
   return 'client_id = sys_context()';
else
   return 'client_id = ' || sys_context();
end if;

to get sort of the best of both.
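Written out as a complete FGAC policy function, that might look like the following sketch (the context name and the application_type attribute are assumptions carried over from the fragment above):

create or replace function client_predicate( p_schema in varchar2,
                                             p_object in varchar2 )
return varchar2
as
begin
   if sys_context( 'app_ctx', 'application_type' ) = 'OLTP'
   then
      -- bind-friendly: one shared cursor no matter how many clients
      return 'client_id = sys_context( ''app_ctx'', ''client_id'' )';
   else
      -- literal: the optimizer sees the actual client_id (warehouse)
      return 'client_id = ' || sys_context( 'app_ctx', 'client_id' );
   end if;
end;
/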

few extra things to be said ...

Gabe, March 03, 2005 - 10:03 am UTC

Thanks.

Actually, the bind peeking caveat would be an issue to consider even for multi-customer OLTP apps.

Typically I see issues related to bind peeking with Search functionality (frequently invoked but going against large data sets). The other two categories of functionality:
a. Discrete Selects (frequently used, but getting discrete data sets)
b. Operational Reports (infrequently used, but sifting through large data sets)
are manageable: definitely binds for a) ... easier to make a case-by-case choice between binds or literals for b).

A multi-customer OLTP app with disproportionate customer data volumes would exacerbate the bind peeking issue. So for the original case presented in the thread (an existing app with 100 tables, some of them with millions of records added per day), I hope nobody would just follow the steps you listed, maybe increase the buffer cache, etc., deploy, and just add another brand-new customer hoping everything will work just the same. The technique is definitely sound, but thorough scalability testing would be paramount.

I'm starting to have some misgivings about incrementally adding customers, especially when the application was built just for one. Even when built from the ground up as multi-customer, disproportionate data volumes would be something to seriously consider. I don't like the separate-schema-for-each-customer-within-the-same-database solution ... budget permitting, I would rather have a separate box for each customer (those costs would typically be passed on to the customer anyway) and only put a few customers on the same box when safe to do so.

Hope it makes sense.


Tom Kyte
March 03, 2005 - 10:06 am UTC

OLTP is characterized by queries returning predictable, small sets of data -- I would hope they would follow what I said above.

If you run queries per second -- bind.

If you run queries that take many seconds and you don't run them a ton -- don't bind.


Those operational reports may well fall into the latter category -- definitely.

data mart,

sns, March 03, 2005 - 10:41 am UTC

Thanks Gabe and Tom for your inputs.
I was just not sure how Oracle handles large volumes of data in the data mart scenario. Views and SYS_CONTEXT are the facilities for isolating each client's data when we have more than one client within the same table.

I haven't seen an Oracle database bigger than 10 TB so far. I am quite sure of the way Oracle handles multiple queries in a pretty big DB.

I have heard of a company that replaced Oracle with Teradata since their size is now up to 25 TB. A lot of end users query the database, and they felt Oracle was not quite capable of handling such a large database.

I still don't believe them, but I cannot argue with them either.

Anyway, Oracle is great.



Tom Kyte
March 03, 2005 - 10:57 am UTC

we've got cases where people replace Teradata with Oracle, DB2 with Oracle, and Oracle with XXX as well. And bigger than 10 TB, for sure.

http://www.tpc.org/tpch/results/tpch_perf_results.asp

(it is just a benchmark though..)

clarification on my part ...

Gabe, March 03, 2005 - 4:19 pm UTC

<quote>
OLTP is characterized by queries returning predictable, small sets of data -- I would hope they would follow what I said above.
</quote>

Yes ... absolutely. As I said, the technique you presented is definitely sound and it would be my first choice to consider. I hope my comments were not interpreted otherwise ... I only wanted to flesh out some of the potential issues/risks:
1. bind peeking with disproportionate customer volumes
2. thorough scalability testing being paramount
3. risks associated with having too many eggs in the same basket

I realize those are "duh" kind of things ... still, I deemed them worth mentioning.

Cheers.


Tom Kyte
March 03, 2005 - 5:43 pm UTC

Yes, the whole "to bind or not to bind, that is the question" along with sound development/TEST practices...



Back to the original question ...

Paul, March 04, 2005 - 7:34 am UTC

Tom - back to the question about a schema originally produced for a single company being enhanced to cater for multiple companies.
What if the companies have their own business rules that need to be reflected in the database? Would your recommendation then be to move to separate schemas?

Tom Kyte
March 04, 2005 - 8:29 am UTC

give me a "for example".

(and "no" is probably going to be the answer)

... and the example is.

Paul, March 04, 2005 - 9:05 am UTC

Tom - let's say you have a housing landlord management system that has many different tables - people, tenancies, transactions (rent raised weekly, payments received, adjustments), property details, etc. The scenario would be that you want to split this portfolio of properties into separate groups and allow these to be managed on your behalf by different organisations whilst you still retain ownership of the properties. Each of these organisations may have different policies to deal with, say, people falling into arrears with their rent payments. The separate organisations would also need to be able to 'see' only their data, whilst you would need to be able to look at the complete database.

Tom Kyte
March 04, 2005 - 9:38 am UTC

why would you need to look at the complete database?

Need to report on the whole picture

Paul, March 04, 2005 - 9:51 am UTC

Tom - I'll keep plugging away... There will still be a requirement to report on performance indicators like tenancy turnover, arrears figures, number and value of repairs, etc. across the whole portfolio. As legislation and management requirements change, so will the performance indicators.

Tom Kyte
March 04, 2005 - 10:18 am UTC

ok, so I'm missing something?

and how would multiple schemas solve the problem I'm not understanding totally?

A reader, March 04, 2005 - 10:32 am UTC

I am thinking VPD is what that guy wants -- just restrict the data they can see via a code or something.

Then he can look at all the data in one query.

Tom Kyte
March 04, 2005 - 11:07 am UTC

me too. There is a policy that most people get, to see "their data", and for some set of users the policy returns the equivalent of "1=1".
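A sketch of such a policy function (the see_all flag and the context name are assumptions):

create or replace function company_policy( p_schema in varchar2,
                                           p_object in varchar2 )
return varchar2
as
begin
   if sys_context( 'my_ctx', 'see_all_companies' ) = 'Y'
   then
      return '1=1';   -- this user sees every company
   end if;
   return 'company_id = sys_context( ''my_ctx'', ''company_id'' )';
end;
/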

Business rules

A reader, March 04, 2005 - 11:26 am UTC

What about the business rules? I think your reader wanted to implement different business rules in the schema related to the different organizations.

Tom Kyte
March 04, 2005 - 11:33 am UTC

for example please.

Insert Check

Jeff, July 22, 2005 - 4:46 pm UTC

I think I'm missing something.

Let's say you have an application with 10 tables. You add a column to each, e.g. company_id. Your context is set:

procedure set_company( p_company_id in number )
as
   l_ctx varchar2(255) default 'my_Ctx';
begin
   dbms_session.set_context( l_ctx, 'company_id', p_company_id );
end;


and your predicate is (simplified):

function policy_function( p_schema in varchar2,
                          p_object in varchar2 )
return varchar2
as
   l_app_ctx   varchar2(30)   default 'my_Ctx';
   l_predicate varchar2(1024) default NULL;
begin
   l_predicate := 'company_id = sys_context('''||l_app_ctx||''',''company_id'')';
   return l_predicate;
end;

You add this same policy to each table, for inserts, updates, selects, and deletes.

What's the best way to handle inserts? Do all of the inserts need to go back and be changed to include sys_context... etc.?

The app is already in place and I've been asked to "make it work" for many other companies... it's in HTML DB, if that matters.

Am I doing this all wrong?

Thanks,



Tom Kyte
July 22, 2005 - 6:51 pm UTC

you have a trigger to set it:

:new.company_id := sys_context( 'my_ctx', 'company_id' );
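Spelled out -- a sketch, reusing the T_TABLE naming from the original answer:

create or replace trigger t_table_bi
before insert on t_table
for each row
begin
   -- stamp every new row with the session's company from the context
   :new.company_id := sys_context( 'my_ctx', 'company_id' );
end;
/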




A reader, July 23, 2005 - 3:20 pm UTC

Not sure what I was doing incorrectly before. I had tried the trigger, but was still getting the policy violation error. Things are working well now. Thanks

Some Clarifications

A reader, January 27, 2006 - 12:55 pm UTC

Tom,

Thanks a lot for this one. It is very helpful. However, our team has the following questions:

1. What are the extras of FGAC that you refer to here?
2. We have a need in our research database (not a huge database) where some information needs to be separated by research study. Only about 10 tables need to be restricted; the others can be kept open. In this case, would you recommend FGAC, or this approach with one view per restricted table?
Given the following:
- We don't have to deal with terabytes of data
- There is no need for partitions

Since the team could not figure out the costs of this approach (identifying the "extras", plus the additional programmer work), FGAC seems a faster, more reliable way to them. Your comments, please.

The number of studies could grow to, say, 10 in the next couple of years. The predicate is simple, like in this case here.
Thanks

Tom Kyte
January 28, 2006 - 12:22 pm UTC

1) I don't know what you mean

2) FGAC isn't about "size" or "number of tables" or "partitioning". FGAC is about securing data in a flexible manner.

IF a single view is all you need, that would likely be the way to go - if you can have a single view that works for everyone. It is when you start adding a view per "role" that FGAC starts making a ton of sense.



To clarify my last question

A reader, January 30, 2006 - 11:10 am UTC

Tom,

I was referring to one of your answers; I have pasted it below. Could you help us understand better what the costs of doing FGAC are? The earlier question and answer:

"Reviewer: Dushan from Czech rep.
Would it not be better to use FGAC instead of the view approach in this case?

Followup:
not for something "so simple" -- the predicate is constant here:
where company_id = sys_context()

If the predicate were frequently different, it would be FGAC. But for something so simple, you don't need the extras that FGAC has."

What are these "extras" being referred to here?
Thanks

Tom Kyte
January 30, 2006 - 3:17 pm UTC

the ability to have a different predicate returned based on any variable you decide to take into consideration.

FGAC allows you to return a custom predicate based on whether it is raining outside, who is asking the question, what time it is, what day of the week it is, what IP address the client is using -- whatever you feel like.

Versus a static view that has one predicate, for everyone, all of the time.

If you don't need the flexibility of FGAC (and the overhead that'll come with it - the running of your code when we parse), you wouldn't use it - you'd use a simple view with a single where clause for everyone.

Multiple Client Instances

J, January 23, 2007 - 6:20 pm UTC

Hello Tom.

Here's another variant: we're in a purchased-COTS situation where each client has an exact copy of the same complete database structure. Client A has an instance, Client B has an instance, Client C, etc. So there are dozens of cookie-cutter instances, and most of them are small -- less than one or two gig. Conceptually easy, but the maintenance overhead is NOT. Oh, and the COTS is one of those that has the table relationships dynamically embedded within the application -- we can't create our own tables, and can't modify the tables (like adding a Client_ID column), indexes, etc.

So... my Cunning Plan (semi-formed) is to explore combining these clients -- one at a time -- into a single instance, as different schema owners, with separate LMTs. Do you think it's feasible? This scenario has been discussed above, but what's "better" (yeah, I know I shouldn't use that term): multiple schemas or multiple instances? Your thoughts and insights would be appreciated.

Thanks and Regards.

Single schema scalable to a large number of companies?

Denis Roy, October 02, 2008 - 9:09 am UTC

Would you say the advice given to the original poster is still valid if his system serviced 1,000s or 10,000s of companies?
Tom Kyte
October 02, 2008 - 9:23 am UTC

especially then -- the more "companies" there are, the more vital it would be to use a single schema.

Imagine 10,000 copies of

select * from t;

in the shared pool -- because you have 10,000 different table T's out there. It would be a disaster to have a schema per company -- and the disaster's effect goes up exponentially as the number of companies does.

schemas

A reader, October 03, 2008 - 5:29 pm UTC


How to copy company data to create a new one?

Romain, September 26, 2012 - 3:28 am UTC

Dear Tom,

What do I do if all companies are in one single schema and I want to create a copy of one of them?

Is there a way to do that -- with all PKs referenced by FKs -- without disabling constraints during the process or writing a very, very long script?

Thanks a lot

Tom Kyte
September 27, 2012 - 8:31 am UTC

you would have a primary key that includes the company name AND the id -- just like you do right now (you are storing the company name in the "schema" right now; it should be part of your table data!)
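With the company id leading every key, copying a company can then be a straight insert-as-select per table, parents before children -- a sketch with hypothetical tables and bind placeholders:

-- parent first: copied rows keep their original ids, only the company
-- changes, so the child foreign keys still line up afterwards
insert into customers ( company_id, customer_id, customer_name )
select :new_company_id, customer_id, customer_name
  from customers
 where company_id = :old_company_id;

-- then each child table, same pattern
insert into orders ( company_id, order_id, customer_id, order_date )
select :new_company_id, order_id, customer_id, order_date
  from orders
 where company_id = :old_company_id;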

Silver bullet in 12c?

Apurva, January 28, 2014 - 8:16 am UTC

Dear Tom,

My question is pretty much the same as the one asked right at the beginning of this thread - just want to check if the answer is any different 9 years later :-)


Your advice would be much appreciated for the following:

There are about 260 tables in a schema presently.
All tables are refreshed on a weekly basis.
Some tables are huge, with about 50 million rows inserted per week.
The application, which was developed to cater for one company, is now being enhanced to support more than one company (it might go up to 45 companies).

To achieve this we have three choices:
Approach 1: Include an additional column called company_id in all the tables. Potentially list-partition on company_id to be able to selectively truncate/load data for a specific company.
- Pros: Maintenance is easy; if I need to change an attribute, I will have to change it in only one place
- Cons: performance can go out of the window, especially on large tables -- particularly if I have some BI reports running on top of them
- Cons: my ETL code will be a little more complex

Approach 2: Create separate schemas for each company with the same table structures.
- Pros: ETL jobs will be easy; schema creation is easy
- Cons: Maintenance will be a nightmare (to say the least)

Approach 3: Does the multitenancy feature in 12c have something new to offer? I have read about CDBs and PDBs, but I don't see them as a better alternative to 'Approach 1'. I would love to stand corrected. Does 12c have a silver bullet to solve this flavor of the "multi-tenancy" problem?

Thank you very much in advance, Tom.

Warm regards,
Apurva
