The Tom Kyte Blog: Building a better Website and Logic...

Saturday, October 13, 2007

Building a better Website and Logic...

I read this article on 19 things NOT to do when building a website. I pretty much agree - I mostly avoid websites that do many of those things (although I find pull down menus annoying, I don't think they should be outlawed entirely - if they work correctly - meaning they work just like tabs and the menus can be ignored ;) ). Do not resize or move my window, do not make me sit there thinking the slow loading home page is the home page only to discover it is some animated graphic you find amazing (and I find to be a yawn), don't try to impress me with your flash capabilities (you are presenting your site to a guy that has disabled animated gifs in the browser - I don't like pages that "move"), don't play music for me - I'm already playing music and mine is better than yours, if you don't work in Firefox - I'll use IE tab to see if there is a form to complain - but that is about it!

In short, a pretty good collection of "things not to do" - sort of like "worst practices". I do a worst practices talk about databases, it is much easier to do than a "best practices" - for the simple reason that worst practices are pretty much universally wrong - whereas a best practice applies just a little more than 50% of the time; in many or most cases - but not all. With best practices you have to do more work and caveat them - describing when not to use them as well as when to use them. Worst practices, so much easier!

I stumbled upon this fake motivational poster shortly after reading the 19 things article. The two seem to go together! Do many of those 19 things and you should just get that poster framed and hang it above your monitor.

Onto logic and false conclusions... I was reading this Dilbert blog entry which points to this NY Times article. Just because a set of experts in a field say something is true, you might still want to ask about the science. I thought the analogy to "Who wants to be a millionaire" with the "ask the audience" question was excellent. When the audience is polled at the same time for what they think to be true - they generally get the right answer. But, what if we polled the audience one by one asking for their opinion, what would happen then... It is highly likely that the slowly building consensus would take over and sway the audiences answer as each in turn spoke theirs out loud. Makes you question the "but everyone knows" statements - or at least it should :)

I'm still dumfounded at how many times people still say "separate indexes from tables into separate tablespaces for performance". Sigh, but everyone knows you should... So, there you go.

22 Comments:

Stew said....: On web usability and design, I read regularly the "Alertbox" columns on www.useit.com: Jakob Nielsen's Website. Lots of common sense, some discoveries and real studies to back it all up.; Sun Oct 14, 03:05:00 PM EDT
Markku Uttula said....: ...everyone knows you should *what*?

Can you please finish that line? Because I've been seeing a lot of "we've separated indexes into another tablespace for performance" lately, and I was hoping to have something other to say than "ah, ok, I'll just cope with that".; Sun Oct 14, 11:22:00 PM EDT
Anonymous said....: Markku:

... because, the sentence would make sense when instead of "separate indexes for performance", it would have said "distribute your files on multiple physical devices".

There is, of course, the theoretical possibility that you would run into contention problems on file handle / db file header level; however, this minuscule compared to disk latency. Your database would become unusable for other reasons long before you even start noticing the file-level contention.; Mon Oct 15, 03:48:00 AM EDT
Thomas Kyte said....: Markku Uttula said...

I just recently wrote in an email:

In 1990, there is a chance this "made sense", back when a big oracle database was 200megabytes and you had 10-20mb disks and no striping. today, it is just silly.

Ask your DBA's to sit down and think about it - challenge them to show why this is so - coming up with the arguments for and against.

If you are using an index to access data, you are doing single block IO - both on the index and the table (meaning, the method used to read the index is IDENTICAL to that used to read the table, the table and index are treated precisely the same). Further, the IO is sequential in nature, read index root, read index branch, read index leaf, get rowid, read table block. What difference would it make if that 4th IO is against a different tablespace on another device? answer: none. Tell them to think about this...

The goal is even IO over all devices. In 1990 one way to do that would be "put some stuff on disk1, put other stuff on disk2". Some stuff on disk 1 - indexes. Some other stuff on disk 2 - tables. About half the IO would be against index, the other half against the table (because the index root and branches are very cachable - we read just the leaves everynow and then - the table, some is cached, some is not - it worked out on average to be a way to split IO out)

Now that you have hundreds or thousands of indexes and hundreds or thousands of tables - IT JUST DOESN'T MATTER - you do IO tuning at the aggregate, you stripe, you use massive volumes, you use your SAN/NAS and hardware to spread it out.

Anonymous said after Markku

And if there were data file header contention, you would just use more smaller files in the construction of your tablespaces. A tablespace is a collection of files.

If we were to create a tablespace with many files - each from many devices and use small extents - so as to have Oracle spread the data for the segments in this tablespace round and round - we would have invented raid striping before it was cool!! (well, we did)

but people got so hung up on another myth - that of a "single extent is best", that no one did that and we ended up separating indexes from tables just to try to spread IO out.... sigh, one myth creating another myth.; Mon Oct 15, 06:31:00 AM EDT
Anonymous said....: While my DBAs are not saying seperate table/index tablespaces for performance, they are arguing that it allows them to restore from some bad disk issue. One time long ago, they had a datafile go bad, so they had to restore a tablespace. Since it had both tables and indexes it took a very long time to restore. Now their thinking is that if they seperate, then rebuilding indexes will be faster. And if there is a disk failure then there would be a 50/50 chance it will only hit the index and not actual data.

I find this argument bogus and am unable to get the DBAs to consider any other design. At least we got rid of the large/medium/small tablespaces. So instead of 6 tablespaces/user, we have 2.; Mon Oct 15, 10:50:00 AM EDT
Thomas Kyte said....: they are arguing that it allows them to restore from some bad disk issue.

and they are 100% correct.

and they get it.

DBA's should use tablespaces to make their life better, more enjoyable, more relaxing.

Tablespaces are an organizational thing. They are not about performance.

Now, I doubt that in general, rebuilding all of the indexes would be faster than recovering a single datafile. If they kept their DATAFILE sizes down - they would achieve the same mean time to repair (the time to repair a damaged file - regardless of what is in it)

If they have a 32gb datafile - that'll take longer to restore and roll forward than a 2gb file.; Mon Oct 15, 11:10:00 AM EDT
Anonymous said....: Regarding the conventional wisdom, I work as a database developer and development DBA for a large company that sells some expensive software (among a lot of other things).

We've recently implemented our product at a new client. The DBA assigned by the customer to work with our team on the implementation sent a laundry list of questions to us after the first install: Shouldn't tables and indexes be in different tablespaces? Why are these all in the same filesystem? Why are the tables created with just one partition? And on it went.

It took two or three phone calls before I convinced the guy that his conventional wisdom about tables and indexes and tablespaces was obsolete, that his massive RAID setup would cover most of our filesystem i/o balancing issues, that we create/drop partitions in the app as we need them, etc. But he was about to need therapy before we got through to him. Intervention!

And this, in 2007.; Mon Oct 15, 11:56:00 AM EDT
Markku Uttula said....: First, a great big thank you on the answer. I was quite certain that this was caused by FUD based on long since outdated "knowledge", and I always like when it turns out I was more or less correct.

On the original subject, the linked article ended up on my bookmarks. Those are also the kinds of things I'm repeatedly forced to explaining to customers (and sometimes to coworkers, luckily they are much faster to get the points across:).; Mon Oct 15, 04:50:00 PM EDT
Jeffrey said....: Tom,

I've been reading your work for quite a while and agree with your logic about data and index tablespaces. However, one reason why people may still be following the separate table and indexes myth is that Oracle is still printing that dogma in many of it's training documents.

I went and confirmed that in the "Oracle OCP Oracle 9i Database: Performance Tuning Exam Guide", Charles A Pack, Editor, Oracle Press, Chapter 7, page 147:

"Table and Index Segments - Keep index segments separate from their associated table segments."

The chapter states the idea at least three times. If you want to kill this myth, you might want to ask the other branches of the Oracle corporation to stop printing it.; Mon Oct 15, 04:55:00 PM EDT
Anonymous said....: Hi!

Suppose I have table that later will have lot of Update and Select operations.
Does it make sense to create table data into tablespace that have small block size for example 2k , and index into tablespace with the block size of 16k?
I have done some performance measures and result seems promising!?..

respect,
Grinalds from LV; Tue Oct 16, 04:23:00 AM EDT
Thomas Kyte said....: how did you do performance measures and what seemed promising about it?

moving table data into a 2k block - would increase concurrency as you store fewer rows per block, less people going after the same bits, but then - putting the index into a 16k block massively decreases that since you get so many more index entries per block.

I would not use non-default block sizes, no.; Tue Oct 16, 06:18:00 AM EDT
Anonymous said....: Let's hear that
evil laugh Tom!; Tue Oct 16, 10:02:00 AM EDT
FirstYearAndSurviving said....: not a question.
promise.

http://www.schoolrack.com/MsLarson/

did i follow the rules?; Tue Oct 16, 11:50:00 PM EDT
Thomas Kyte said....: yes, MsLarson, you did :)

http://tkyte.blogspot.com/2007/05/been-while.html#comments; Wed Oct 17, 12:17:00 AM EDT
Byte64 said....: Hi Tom,
i think that many of those best practices fall in the common sense area, which doesn't mean necessarily that everybody is willing to follow the advice, on the contrary :-)
For instance, much of the hype about Ajax seems to be related to the do not reload the whole page! commandment, allegedly for performance reasons. Clearly it all depends on how heavy is the page to be reloaded and how many real concurrent users there are. Personally i tend to use Ajax as little as possible, because it is introducing complexity and it reduces the chances that *every* browser on earth can correctly interact with the page (IE mobile is one of such crippled browsers for instance). Moreover i basically *hate* Javascript, especially when it comes to debugging.
Ruling out Ajax from my pages implies having a lightweight page structure with as few images as possible and a reasonable amount of data. However i recognize that in certain situations Ajax could improve the user experience, so i am not against it in principle.
For instance the new features under development for Apex 4.0 seem pretty amazing to me.
As always i believe it's a matter of finding the right balance between special effects and development/maintenance costs.
Cheers,
Flavio; Wed Oct 17, 04:10:00 AM EDT
Anonymous said....: > how did you do performance
> measures and what seemed
> promising about it?

in my case it was database that have massive Update operations and it was the main bottleneck. Database have default 8k block size and most of tables were Index organized tables.
I found that Update operation on heap table is faster rather than on IOT. And update on 2k block tablespace is faster than on 8k (default). This was reason to put table data and index on separate tablespaces.

I was doing performance tests with your Runstats package like it is written in your book.
If you like I can send you all the scripts of performance tests. They are quite a huge to put into this blog.

Is there any other internal issues about such setup - different block size tablespaces? Why we should not use that?
Thank you in advance!

Grinalds from LV; Tue Oct 23, 05:17:00 AM EDT
Thomas Kyte said....: I would expect a heap to outperform and IOT for updates. an IOT is a really fat index, indexes are complex structures - takes a lot more work to maintain than a disorganized HEAP of data does.

You just need to describe better the set up. for example - when you put the index into a separate tablespace - did that have the secondary effect of evening IO out (eg: you put indexes not only into a separate tablespace, but on a totally different device).

And what were the *results*, you say "it was faster", but what are we talking about here.

Having different sized blocks is a maintenance issue, is an optimizer issue (changes formulas used for multi-block reads), it makes you "different" and harder to diagnose (because you are "different"); Tue Oct 23, 08:43:00 AM EDT
Anonymous said....: thank you very much for answer.
This example was from my previous project and currently is not actual for me anymore.
There were just idea to play with different block size tablespaces trying to solve issue that we have massive Updates and huge Selects on the same database. (better should be different servers - oltp and datawarehouse but not everybody can afford that).
Can you suggest case when it would be good to use that feature - different block size tablespace?

I have all your books and have not seen more description about this topic.

Thank you in advance!
Grinalds; Wed Oct 24, 04:20:00 AM EDT
Thomas Kyte said....: Can you suggest case when it would be good to use that feature - different block size tablespace?

this feature was added to support transporting data between databases with different blocksizes.

It is there to facilitate movement of tablespaces from database 1 with blocksize X to database 2 with blocksize Y.

The thought was you would

a) transport (say from OLTP)
b) attach (say to warehouse)
c) perform SQL on the migrated data to move it from the transported tablespace to new structures

In short, it is an ETL (extract transform load) tool - that lets you skip the E and the L bit.; Wed Oct 24, 08:30:00 AM EDT
Anonymous said....: ok, transportable tablespace, now I understand, thank you very much, you are my superhero! ;-)

Grinalds; Thu Oct 25, 03:34:00 AM EDT
Grinalds said....: thank you for answer!; Thu Oct 25, 03:45:00 AM EDT
edelstein schmuck said....: Good Job! :); Wed Jul 30, 06:59:00 AM EDT