Question and Answer

Tom Kyte

Thanks for the question, Goh.

Asked: February 17, 2003 - 8:52 pm UTC

Last updated: September 24, 2013 - 10:57 pm UTC

Version: 8.1.7

Viewed 10K+ times

You Asked

Dear Tom,

I was asked to come up with a performance tuning process which will be used as a general guideline when conducting performance tuning for customers. I understand that there are no fixed rules for performance tuning. I hope you can provide some views on these. Thanks in advance.

Step 1 : Baseline statistic collection.
Step 2 : Define performance tuning Goals.
Step 3 : Determine the area and the period of the bottleneck
(when the slowness occurs; whether a particular program or the
overall system slows down at certain times).
Step 4 : Identify the main constraints
- System Level
- Database Level (Statspack report)
- Program Logic Level (Statspack report)
- Network Level
- Client Resource Level
Step 5 : Remove Constraints
Step 6 : Review and Repeat until Tuning goals met.

Regards,
SHGoh


and Tom said...

step 2 is really step 0 -- before anyone gets on a plane, train or automobile.


step 3 assumes it is a bursty problem; many are not.


The problem with tuning is steps 5 and 6. If we tune before we deploy -- well, then you never get to 5 and 6.


Each and every exercise in post-mortem tuning is an experience. Every time I do it -- it is different. I guess if I had to categorize, I would:

a) isolate, to the best of your ability, which of the 15 tiers people run the problem is happening on. Eg: if you have a 48 cpu database server and it is not breaking a sweat -- the database isn't broken. (I've been there where they are jumping up and down "database slow, database slow", I look at the database -- database is doing SQUAT, nothing, no work. Database is fast.)

b) make sure all of the code you ever write, your company writes, is instrumented to the gills. Good code has as much debug/trace code in it as "real" code. Consider what would happen if Oracle didn't have:

sql_trace=true -- that is instrumented code
v$ tables -- that is instrumentation
events -- instrumentation on steroids

how would you do anything? You need to make sure every byte of code that comes out of your shop is as instrumented, if not more so, than the database. Else you are blindly finding your way through a maze. Just try to pinpoint a bottleneck in a 14 tier really cool system if the only instrumented piece of code is the database and the database is doing fine!
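As a small illustration of what that kind of instrumentation can look like in PL/SQL -- a sketch only, using the real built-in DBMS_APPLICATION_INFO package but a made-up procedure and table:

create or replace procedure process_orders
as
begin
    -- tag the session so v$session (module/action) shows what this code is doing
    dbms_application_info.set_module( module_name => 'process_orders',
                                      action_name => 'scanning orders' );
    for x in ( select order_id from orders )  -- "orders" is a hypothetical table
    loop
        -- per-row progress, visible in v$session.client_info while it runs
        dbms_application_info.set_client_info( 'processing order ' || x.order_id );
        null;  -- the real work would go here
    end loop;
    dbms_application_info.set_module( null, null );
end;
/

With that in place, a DBA can see exactly which module and action a slow session is in, straight from v$session -- the same way the database exposes its own work through the v$ tables.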

c) make sure you understand the problem -- is it a single user having an issue with a single report? is it everyone on a single screen? is it some users with the entire system? is it all users with the entire system? does it only happen at lunch on wednesday? does it only happen at 9am, 1pm, 5pm?

d) using the instrumented trace dumps you get from each layer -- figure out where the bottleneck might be. This might be sql_trace=true for a single user in a single module. This might be statspack at the database level, this might be some diagnostic tool for apache in the middle tier, this might be a java source code profiler, whatever.


e) remember -- each experience will be different from the last and totally different from the next. They are mysteries. If you could codify "do steps 1, 2, 3, 4, 5 and repeat", software would do this for us. It requires some level of intuition and experience. You have to learn to ignore some things (eg: those guys that want to move data files around to increase performance -- tell them this: Ok, you go over there and play around, just remember where everything is RIGHT NOW and be prepared to put it all back when we say so. Then, ignore them and get back to work ;). Oh, I can count on one hand without any fingers how many times moving datafiles did the trick -- but it is what everyone does for some reason.


sorry -- this is a non-answer answer. If you have my book Expert One on One Oracle -- in there I relate a tuning story where it was 100% a user interface issue. "Hey, the database is hung, stupid database is broken." It turned out the users were "using the application in a manner the developers did not foresee". The logic the programmers had was:


o user swipes timecard
o that selects a record for update
o user hits enter key to confirm and walks away from terminal
o record committed.

What users did:

o hit enter key to clear screen from the last guy
o record committed
o user swipes timecard
o user walks away


So, guess what happens when the last guy from lunch comes in? They leave their record locked. Then, a batch process wakes up in the background and tries to post the records -- so it does a select for update on all of the records -- gets some but gets stuck on the record this last guy didn't commit. Now the people that have their records locked by the batch process are stuck in line - cannot swipe their card. queues build up, people scream, then -- all of a sudden, all by itself -- problem goes away.

It took two days to figure it out. We had to go to the warehouse floor and observe how people used the system. It was a client program issue, nothing to do with the database. But everyone was 100% sure "database is problem".

Think outside the box.


Comments

Performance Tuning Process

A reader, February 18, 2003 - 8:55 pm UTC

Dear Tom,

Thanks for the answer. Agreed -- performance tuning requires a lot of experience, and each problem might need a different way of handling it. You have brought up a very good example (a client program issue, or it could be client resources). Performance can be affected not only by the database but by system resources, network congestion, client hardware resources and "the way the program is developed and used". Since we are DBAs, we always start by looking at the DATABASE layer and the system resources layer. I wonder if it is a good approach to collect the timing for a particular complex SQL statement at the database console. For instance, when a user complains the system is slow, I will run the complex SQL at the console and collect the timing. I will then compare the collected timing against the timing that I collected when the system was running smoothly. With this performance indicator, I would be able to conclude whether the DATABASE is having a problem. I am not sure if this is workable or not. Need your view. Thanks.

Rgds,
SHGoh




Tom Kyte
February 19, 2003 - 8:40 am UTC

You need to keep a history of

o how the database has performed in the past
o how the application has performed in the past


In order to diagnose if it is doing "ok" today. Unfortunately, when called in to tune, the two things people invariably NEVER have are histories of what it looked like when the system was "good".


I don't know what a "database console" is. I don't know why you would collect the timing for a particularly complex sql statement there - I mean -- it would sound like you have the problem pre-identified here, assuming that there is a problem sql statement. It could be a trivial sql statement that is executed 15,000 times with the same inputs (and 15,000 iterations of something that takes 1/1000th of a second is 15 seconds of time!)
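To catch that case -- a cheap statement with a huge execution count -- one would sort the shared pool by executions rather than by per-execution cost. A sketch (the v$sqlarea columns are real; the top-10 cutoff is arbitrary):

select *
  from ( select sql_text, executions, buffer_gets
           from v$sqlarea
          order by executions desc )
 where rownum <= 10;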


No, I don't think you can use your approach to see if the database is "OK". It is a little more complex than that; a couple more moving pieces are involved. You are looking for a simple answer -- like those little green lights that flash on those pretty GUI monitors when "things are ok" -- little red lights when they are bad. Problem is -- I've seen some excessively poorly tuned systems with all green lights and some awesome systems with red lights flashing all over.

For the people with red lights -- I just tell them to lower their thresholds in the GUI -- makes them feel better. They go green.

For the people happy with green lights, I leave them be. Hardware vendors love them.

For people who don't use the GUI's and ignore the lights -- well, now we can get somewhere ;)



cachitok@hotmail.com, February 19, 2003 - 9:00 am UTC

Let me give a quote from my experience.

I think you only get the best tuning with experience.

1. Study

You must know what is good, what is acceptable and what is wrong.

To know it you must have experience with the hardware, the Oracle version, the size of your tables and the type of query you run. But first you must have knowledge (read).

I am never going to know how much faster my software would have been in 7.x, because at that time I did things the way I knew Oracle.

So the first thing is to get some books (including Tom Kyte's book -- not because he answers the question, but because it's really a good book) and try to get an OCP.

When you create the database under the best conditions -- for example auto-managed files, uniform extent sizes, automatic undo management -- and know the tools and features you have, you can tell whether you are doing something wrong or missing some feature that could increase your performance, for example materialized views in data warehousing.

Then, and only then, can you have an idea of what the performance should be.
For example, I solved a problem with backup time by moving old data to a read-only tablespace. If you don't know that this exists, you can't use it.

2. Experience

You also need experience, because you must know how Oracle Forms locks records in the database, and the other specific conditions (people, software, hardware, network, etc.) that exist only in your own shop.
For example, we recently had a problem with an Oracle Forms client. The reason was a router that was disconnecting the client: Word had no problem, but Oracle Forms did. We demonstrated it by disconnecting the network cable and reconnecting, and the same error message appeared. Then they fixed it.

3. Understand what is important

One thing Tom helped me understand is how you can increase performance, at least in Oracle. He stopped my ideas about setting the exact size of every variable, for example NUMBER(8,3), and showed me what was really important -- where I should look for performance problems.

4. Practice

Even though I got my OCP, I have oceans of doubts -- not because I don't understand, but because until you use a feature or see something, you don't understand how important it is or is not.

Until I used the hint /*+ NO_MERGE */ in a query that used views, and reduced the time from 30:00 to 9 seconds, I didn't understand how important hints were. I understood what hints do, but I didn't understand how useful they could be.

5. Ask Tom

If you have a problem and you can make it understandable, ask Tom.
For example, the explanation Tom gave me about the parameters
OPTIMIZER_INDEX_COST_ADJ = 5
OPTIMIZER_INDEX_CACHING = 90
saved me days of work, not only hours,
and solved a very important performance problem.

:)
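For what it is worth, both parameters are session-modifiable, so they can be tried for a single session before touching the init.ora -- the values below are simply the ones quoted above, not general recommendations:

alter session set optimizer_index_cost_adj = 5;
alter session set optimizer_index_caching = 90;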

redo log space request

Reader, August 08, 2003 - 7:05 pm UTC

Tom, the online doc describes the subject statistic as how many times a server process had to wait for space in the online redo log, not for space in the redo log buffer. Could you please help me understand in what situations in Oracle a server process has to wait for space in the online redo log files? LGWR is the one that writes to the online redo log files from the log buffer. When does a server process access the online redo logs? Thanks.

Tom Kyte
August 10, 2003 - 12:12 pm UTC

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96536/apc2.htm#27146

lgwr is the server process in question here.
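The statistic itself is easy to watch; for example:

select name, value
  from v$sysstat
 where name = 'redo log space requests';

A steadily climbing value suggests sessions are waiting for log switches to complete before redo can be generated.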

A reader, April 29, 2004 - 4:29 pm UTC

Hi Tom, my question is regarding "logons current". Does it impact performance and resources? I have the following query which I run frequently every day:

select class, name, value from v$sysstat
where name like 'session cu%'
or name like 'open%'
or name like 'parse c%'
or name like 'logon%'
order by 1;

It generated the following output:
CLASS NAME VALUE
---------- ---------------------------------------- ----------
1 logons cumulative 2142278
1 logons current 1.8447E+19
1 opened cursors cumulative 23518633
1 opened cursors current 1.8447E+19
8 opens requiring cache replacement 0
8 opens of replaced files 0
64 session cursor cache hits 2388840
64 session cursor cache count 1597
64 parse count (total) 28092107
64 parse count (hard) 33597
64 parse count (failures) 10673


As you can see, the number of users is HIGH, as are the opened cursors current. I want to know the reason why. The open_cursors parameter is set to 500 and sessions is set to 170, so how is it possible for it to reach such a high number?
Also, could you please suggest a way to resolve this issue, as the number of sessions from v$session is less than 30 at maximum.
Thanks in advance.

Tom Kyte
April 29, 2004 - 4:56 pm UTC

looks like something "is wrong", those numbers are "not right". I doubt you have 1.8e19 users logged in.

please contact support if you'd like to find out why.

A reader, June 08, 2004 - 2:02 pm UTC

Tom,

In our production database, the process of making the ODS data factory involves several pro *c, pl/sql programs that extracts data from the text files, cleans it and loads it into Oracle tables.

Last month, we moved the database to a different Unix box which has more number of CPU's and more RAM than the previous one.

We were expecting the runtime of the production jobs will come down (the amount of data this month is almost same as last month) because of the better machine. However, it took almost double the amount of time than last month.

I don't know from where to start investigating this matter.

Can you help me out in this?

Thanks,

Tom Kyte
June 08, 2004 - 3:38 pm UTC

well, suppose you were bottlenecked on IO.

You give a system bottlenecked on IO more CPU, more RAM.

You know what happens? You get even more bottlenecked on IO than ever before.

Why? Because you spend less time doing other non-IO related things now.


Do you have statspacks from BEFORE (before the change)? If not, you either need to:

a) move back, and figure out what your bottleneck really was and then fix that resource or

b) tune the entire system as if it were a brand new system, never touched before.


To do B) recommend you get:
http://www.amazon.com/exec/obidos/tg/detail/-/059600527X

and use the approach he outlines (application level tuning, bottom up).


A reader, June 08, 2004 - 3:58 pm UTC

I did not completely understand your first 4 lines. Can you please elaborate?

Other than IO, do you think any other hurdle could cause the performance bottleneck?

My manager took the report of CPU consumption for the entire month of May and found that CPU usage was pretty high during the production jobs.
I am pretty sure the production jobs, which do not do any kind of sorting (just ETL), use very little CPU. But he has proof in his hand.

Unfortunately, I do not have a statspack report from the previous machine. Maybe I have to consider your option B.
Have you covered the method of application tuning in any of your books? I own a copy of both of your books.

Thanks a lot.

Tom Kyte
June 08, 2004 - 4:12 pm UTC

Have you ever been on a toll road? The traffic is backed up for miles and miles at the tollbooth. So, traffic is moving slowly. You could do a couple of things to fix it:

a) get faster cars for everyone, faster cars go -- well faster.

b) take the curves and hills out of the road and make it straight. straight roads without hills let people drive faster

c) add more lanes to the highway, more lanes = better traffic flow

d) add more toll booths

if you do any of A, B, or C (all reasonable -- consider them "add RAM", "add CPU", "make network faster" for example), you will more than likely cause longer backups. Why? the cars arrive at the scarce resource (the tollbooth) even faster, the queues get longer and take more effort to clear out. A car now sits in a 2 mile backup instead of just a 0.5 mile one.

Problem here was no one analyzed "why the stuff was slow", they just added resources.


Let's take the analogy one step further. We have proof that cars were moving -- we measured it (they were "using CPU"). Therefore, all we need to do is make them move faster, right? Wrong. Making them move faster caused a *real* bottleneck.

everything could cause bottlenecks, anything.


The reason I want you to get Cary's book is because he spells this out like no one else I know.


His approach falls in line with chapter 10 of my Expert One on One Oracle. Time to tkprof your applications with 10046 level 12 traces and see what they are waiting on.


So what if the CPU was being used -- you *want* it to be used. What needs to be done is to figure out "what am I waiting for, what am I starved for" and feed that.

this is actually "classic". People don't believe it happens but it happens all of the time.
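For reference, a minimal sketch of capturing and formatting a 10046 level 12 trace (binds and waits); the trace file name is a placeholder:

alter session set events '10046 trace name context forever, level 12';
-- run the application code to be measured, then:
alter session set events '10046 trace name context off';

-- then, at the operating system prompt:
-- tkprof ora_12345.trc report.prf sys=no sort=prsela,exeela,fchela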




A reader, June 08, 2004 - 4:15 pm UTC

Excellent example!

A reader, June 08, 2004 - 4:23 pm UTC

Wow!!!! Your logic is just great. It makes sense and explains clearly.


Performance Class

daniel, June 08, 2004 - 5:02 pm UTC

Hi. I want to take a class on performance optimization/tuning. I'm thinking about taking either the Hotsos class (Diagnosing Oracle Performance Problems - http://www.hotsos.com/courses/PD101.php ) or the Orapub class (Advanced Reactive Performance Management for Oracle Based Systems - http://www.orapub.com/cgi/genesis.cgi?p1=sub&p2=pm_course ). Does anybody have any experience with them, maybe even both? Which one would you recommend?

A reader, June 08, 2004 - 5:10 pm UTC

Tom,

Actually I have a copy of Oracle Performance book too. Which particular chapter explains about application tuning, bottom up?

Thanks

Tom Kyte
June 08, 2004 - 5:25 pm UTC

it is the entire *book*, it is what the book is all about.

A reader, June 09, 2004 - 11:20 am UTC

Ok, more CPU and more RAM could have caused an IO bottleneck. Does that mean less CPU and less memory are ideal for a job that is more IO-oriented?

Tom Kyte
June 09, 2004 - 11:46 am UTC

no, no, no.

it means that "if you have a resource issue, you must identify what the resource issue is and fix that problem -- for fixing something that isn't broken could lead to even worse (or better or not any different) performance"


It means you need to fix that which is broken.

Fixing that which is not broken just changes your systems characteristics, generally for the worse.

Use that car analogy above. Suppose you did add more tollbooths, lots more.

Now you may well find that you

a) have no backups
b) have tollbooths that are empty, performing no work.



So now making the cars faster, adding more lanes, and straightening the road out would make sense -- get the cars to the tollbooth faster please.


Up to a point -- at some point, cars will start arriving too fast for the available tollbooths again! Then you need more tollbooths.


Note in analogy:

o tollbooths are disks, hardware, controllers, etc
o straight road is network
o faster cars are faster/more CPUs
o more lanes is RAM

You need to add them in balance, after determining which one needs tuning. Tuning the wrong one is not only a waste of time (how many hours did you spend doing this massive system change?), but will generally lead to the outcome you experienced -- either no change, or much worse.




A reader, June 09, 2004 - 12:04 pm UTC

For a class on performance optimization/tuning, I prefer Hotsos.


More than an analogy?

Vladimir Andreev, June 11, 2004 - 9:21 am UTC

Hi Tom,

I've been using the highway analogy quite a lot (maybe because I tend to spend quite some time on the Autobahn driving to various customers' offices :-), and it strikes me that it is so close to what really happens in an Oracle database (or any multitasking system, for that matter) that I begin to seriously consider applying the results of scientific research in the road traffic management area to computer system performance.
I have very little knowledge in road traffic management, but I know, for example, that they use hydrodynamics to model traffic flows - something I have never seen applied to computer performance.
Do you know of any author that has published any results in this direction? I assume computer scientists have tried this approach already, haven't they?

By the way, I recently tried to explain the significance of the log buffer size to a friend, and I came up with this analogy:
The LGWR is your friend driving in front of you, leading the way. You are the user session(s) producing redo. You will not overtake LGWR, because you don't know the way. LGWR will always slow down as necessary to keep a maximum distance of log_buffer meters, so that you don't lose him/her in the other traffic. The amount of safety distance is obviously irrelevant for the total performance (time needed to get to your friend's place).

Further observations:
If you get too close to LGWR, you must slow down ('log buffer space' wait event). If you fail to slow down in time (because of a bug), you crash (ORA-600). Slowing down and accelerating frequently is known to consume more petrol (context switching) and therefore should be avoided, so you should not reduce the safety distance too much.

Regards,
Flado

Tom Kyte
June 11, 2004 - 4:18 pm UTC

Cary Millsap -- it is all about queuing and such. His book delves into this -- maybe not hydrodynamics (way, way beyond me) but in that direction -- yes.


analogies are easy to abuse, but your lgwr looks OK. Connor McDonald from Australia has a nice one too -- it involves a rope and his wife (done live, on stage) but that is another story entirely :)

10g tuning

John, September 04, 2004 - 12:59 pm UTC

What is the use of the ACTIVE_SESSION_HISTORY view? How, as a DBA, could I use the view to tune the database? Do you have an example to illustrate the usefulness of active session history info?

I see the value of v$session_wait. However, I would like to understand more about how I can use the new V$ACTIVE_SESSION_HISTORY view. Thanks.

Tom Kyte
September 04, 2004 - 1:13 pm UTC

v$session_wait is a point in time thing. It shows you "right now, as of this very instant, this is what we are waiting on".

That is not at all useful for "hey, you know, this batch that is running for an hour -- it isn't doing so well -- what is wrong with it?". ASH might be able to tell you "oh, that batch has spent 59 minutes waiting on a lock; it is not waiting right now, but apparently it was just totally blocked". A point in time view would not tell you anything.

or "you know, our middle tier connection pooled application is running really slow today". ASH could help you determine:

a) probably not a database issue, you haven't been waiting on anything in your connections for the last 2 hours. Let's start looking elsewhere

b) oh, geez, massive IO waits for the last 2 hours -- lets try to figure out what changed to cause that to happen


Historically the database has had a short term memory "what is happening right now". However, when you come back from lunch, it is generally the case that the message on your phone is "hey, at 12:30pm (it is 1:15pm now) it was slow -- why and what are you going to do about it". Unless you were sitting there at 12:30 (and you weren't -- you were eating) taking notes about what was going on -- there is no way you can answer those questions. Now, with the database constantly benchmarking itself, keeping a long term memory -- you can go a long way towards answering those questions.

It is more knowledge (not even going to call it data, it is knowledge) for you to use.
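As a sketch of the kind of question ASH answers -- "what was everyone waiting on while I was at lunch?" (ASH samples active sessions roughly once a second, so the counts approximate seconds of wait):

select event, count(*) samples
  from v$active_session_history
 where sample_time > sysdate - 1/24
   and session_state = 'WAITING'
 group by event
 order by count(*) desc;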

Millsap Vs. TKPROF & StatsPack

Richard, May 26, 2005 - 9:32 am UTC

I'm currently reading (and weeping at my inadequate Oracle tuning abilities, compared to those of the author) *Optimizing Oracle Performance*, by Cary Millsap.

In that (superb) book, he effectively tramples TKPROF and StatsPack underfoot, as they are, apparently, not nearly detailed enough for *proper* evaluation of Oracle/System problems.

Fighting talk, I'd have thought - what say you?

Tom Kyte
May 26, 2005 - 10:14 am UTC

I've been pretty successful with tkprof, less so with statspack (you can get lucky), but have my own method of looking at a trace in more detail using SQL.

I picked up some nuggets from that book.

In general, most of the time I find tkprof more than adequate. I find statspack useful. However, the things I'm called in to look at sometimes go beyond that (when tkprof and statspack cannot answer the question).

An interesting one was the case of "missing time" in the trace file -- it was nothing that showed up in the tkprof report -- but we had big gaps of time. We had to process the raw trace files -- with lots of additional instrumentation, and a couple of extra events, to finally track it down. But that is the exception, not the rule.

Many performance issues are so egregiously obvious that they literally fall out of a tkprof report.

Your own method of looking at a trace in more detail using SQL.

Richard, May 26, 2005 - 10:39 am UTC

Is your own method of *looking at a trace in more detail using SQL* in an existing book, or might it be in your next (Tolkien-esque*) dual-tome publication?


* TK = Gandalf; DKB = Saraman; M$ SQL Server = Sauron! :)

Tom Kyte
May 26, 2005 - 10:49 am UTC

my method of looking at it is so "unspeakably unstructured" that I haven't tried yet to put it into writing.

Meaning, I don't know what I look for until I start looking for it -- hard to put it into writing. I could document the object types and the views (it was a neat exercise to simply create a view that mapped each and every trace record to an object type and then to parse each record into the appropriate type) but to describe how to use it... Hmm, that is where the custom queries come in. I don't have a library of queries I use against it; I write them on the fly and toss them out as I haven't found them to be very "reusable".

So I guess I've never talked about it because I don't have anything concrete to say about it - other than "if you really know analytics and the contents of a trace file, you can ask *any question* you want"

Saruman, not Saraman! LOTR fans will kill me!

Richard, May 26, 2005 - 10:43 am UTC


UML Type Approach

Richard, May 26, 2005 - 10:59 am UTC

In the strange world of UML, patterns (templates, really) are widely used (GOF, etc.) to identify and solve problems. The Perl Cookbook is, in a way, based upon the same idea.

Could such a patterns-based approach to Oracle problem solving ever be sane? I suppose I'm trying to say that whilst any individual Oracle system is unique, might it be right to document scenarios with "... so if you see this, it MIGHT well be worth considering xyz"? Or is this just another step into the wilderness of *Silver Bullets* ?

Tom Kyte
May 26, 2005 - 12:48 pm UTC

Oh no -- we have many patterns and that is where tkprof and statspack come in handy.

In the early days of my tuning journey, my favorite trick was:

- turn on sql_trace (usually greeted with blank stares, "what is sql_trace")
- run tkprof (usually greeted with, "did you write that")
- grep '^total' tk.prof | more

and page through it looking for a big number, vi the file, search for big number, tune query and walk away....

Now "everyone knows" about tkprof so I don't get to do that very often...

Later -- statspack-like reports with waits at the system level. Sometimes, very infrequently, you get lucky (direct path read and writes are huge, ummm, so what is your sort area size anyway -- what!! make it bigger. Bye!)...

Now "everyone knows" about statspack/awr reports and I don't get to do that very often...


So now it is the exceptional things I get to look at -- the patterns are well known and in use. The software is even automating them. I find them to be somewhat boring since it is mostly done for us. The advisors, the waits -- everything sort of spoon fed and waiting to be diagnosed and acted on. Those are well written about.

It is the strange thing, the tkprof that shows NO wait times, but there is a ton of missing time (uninstrumented code in the kernel is the most frequent culprit -- but not the only one).

That is where Cary's book comes in as it goes beyond that which is "obvious" and into the area where we are mostly engaged. It has been a long time since I've been able to go into an environment where people have been looking at a problem for day(s) and simply run a tkprof to fix it.

TKPROF-savvy Sites

Richard, May 26, 2005 - 1:23 pm UTC

I have worked for three companies over ten years; only one of them uses Oracle. But the common factor for all three was: tuning the system(s) was simply not done in any meaningful way... apart from my attempts, which, I fear, were looked upon as *meddling*. "More, better and faster hardware, that's what we need" was the usual cry. They usually got it, too.

Am I right to infer, from your previous reply, that most sites you help out with DO tune their system(s) (or at least attempt to)? If so, do you suspect that this is the norm? Or could there be an *underclass* of Oracle/SQL Server/Whatever installations that are, at best, tinkered with from time-to-time?

Sometimes, I do have my doubts.

Tom Kyte
May 26, 2005 - 2:26 pm UTC

I find most sites these days are aware of, or have at least one champion who understands, tkprof, statspack, tuning.

BUT -- I don't work with the one/few person shops very often. I do tend to see "larger customers". And that is a direct result of the section I work in here at Oracle.

And to me, books like Effective Oracle by Design and Expert One on One Oracle (yes, they are mine) are targeted right at that crowd. They say "here are the tools, here is how to use them, go forth and do good stuff using these techniques"

And many times, in my case, before I get onsite -- someone else has been there and wasn't able to diagnose the issue using those techniques.

My point was -- "we" (eg: the Carys of the world) do not get called into the simple stuff -- others have been there before us. That is perhaps why he feels let down by the other tools (not trying to put words into his mouth at all). They do what they do, but they do not do any more than what they do. They have limits, and by the time we look at something, those limits have probably already been passed. (Sometimes I wish a grep ^total on a tk.prf would solve the problem -- that'd be an early lunch day!)

Motive for the method

Cary Millsap, May 26, 2005 - 3:39 pm UTC

The dominant motive for what we do is that a performance diagnostic Method needs a single entry point. Yes, tkprof will help you well to solve several types of problems. In my experience, people who use tkprof have far fewer performance problems than people who don't. But for some types of problems, tkprof won't help you at all. The cases in the final chapter of the book illustrate what I mean by this.

Furthermore, sometimes tkprof will show you that you've implemented a poor practice, but it's terrible at explaining the total cost (in response time, of course) of that mistake. For example, tkprof might alert you that you're "parsing too much" (e.g., parse count == exec count), but it's possible that though this is happening, it's not a significant enough contributor to response time to really matter, and therefore it's not where you should be applying your investment.

The real issue, to my mind, is that I want a deterministic problem-solving method that works all the time. Maybe tkprof helps company X solve 90% of their problems but only 10% of the problems that company Y has. What do you do about the remainder? You have to choose another "method." How do you do that? In my opinion, the answer is to begin the analysis by looking at the right data: a SPANNING, NON-OVERLAPPING account of exactly your problem response time. Tkprof doesn't do that.

Two tkprof deficiencies led to our more thorough investigation of raw trace data: (1) There are some problems that tkprof can't help you solve, and (2) tkprof doesn't tell you whether your problem is one that it can solve versus one that it cannot. This second point is what sometimes leads to those performance improvement projects where analysts try everything they can think of for months without ever really getting their finger on the real root cause.

As you said in your post, this (long, failed performance improvement projects) is generally the state of the world by the time I enter a situation. I believe that the benefit of avoiding these situations far exceeds the small amount of pain required to actually do performance improvement projects in the Method R way from the very beginning.

Performance issues in production

Thiru, November 09, 2005 - 4:14 pm UTC

Hi Tom,

a. Can you please outline some steps/strategies that you would suggest if there has been a degradation of performance on a production box? One fine day the database drags and the users need some quick fixes. How does one go about it? I know these things should have been anticipated during the dev stage itself, but they do happen.

b. Let's say a particular segment of the application is clearly showing up performance problems and we know that a particular package is being called at that time. Given this information, would there be any change in the strategies that you have outlined for (a) above?

Thanks Tom for all that you do.



A reader, March 14, 2006 - 10:06 am UTC

Hi Tom, could you please add a very short example for each step in the tuning process? I'm a bit confused.
Thank you.

Step 1: Tune the Business Rules
Step 2: Tune the Data Design
Step 3: Tune the Application Design
Step 4: Tune the Logical Structure of the Database
Step 5: Tune Database Operations
Step 6: Tune the Access Paths
Step 7: Tune Memory Allocation
Step 8: Tune I/O and Physical Structure
Step 9: Tune Resource Contention
Step 10: Tune the Underlying Platform(s)

A reader, April 04, 2006 - 1:41 pm UTC

Hi Tom, you didn't make any comment on the previous process; I expected some. It appears in the documentation up to 8i, but starting with 10g I don't see that tuning process in the documentation. I was curious whether you had some comment about it.
:)
thank you

Tom Kyte
April 04, 2006 - 7:31 pm UTC

I'm not going to write a book here, I don't reply to every comment - not enough time in the universe for that.

A reader, April 05, 2006 - 9:26 am UTC

Thank you

How does the PARALLEL_THREADS_PER_CPU recommendation fit in?

Naresh, April 20, 2006 - 7:04 pm UTC

Hi Tom,

I understand the general principle in this discussion is to put more capacity where the bottleneck is -- and I totally agree.

I was reading Douglas Burns's paper on "How Many PX Slaves?".

The paper quotes the 10.2 documentation with regards to PARALLEL_THREADS_PER_CPU:

"PARALLEL_THREADS_PER_CPU enables you to adjust for hardware configurations with I/O subsystems that are slow relative to the CPU speed and for application workloads that perform few computations relative to the amount of data involved. If the system is neither CPU-bound nor I/O-bound, then the PARALLEL_THREADS_PER_CPU value should be increased. This increases the default DOP and allow better utilization of hardware resources. The default for PARALLEL_THREADS_PER_CPU on most platforms is two. However, the default for machines with relatively slow I/O subsystems can be as high as eight. "

The idea in the documentation seems to be to increase the number of processes using the CPU when the IO subsystem is slow. How does that fit with the "add resources where there is a bottleneck" approach? It seems opposite to it -- enabling more threads would ultimately only mean MORE requests going to the IO system.

What am I missing here?

Thanks,
Naresh


Tom Kyte
April 21, 2006 - 7:23 am UTC

there is "add more resource"

and there is "utilize what you have"

I don't see any sort of conflict or confusion between them?

One of the risks with increasing parallel threads per cpu in this case, however, is that if your IO system is already slow -- then quadrupling the workload (going from 2 to 8) could make it even slower. It assumes that the IO subsystem can scale as the number of parallel execution servers increases.

Access to v$active_session_history

A reader, May 10, 2006 - 4:06 am UTC

Hi Tom,

if I do a select statement such as "SELECT * FROM v$active_session_history", it fails with "table or view does not exist". Is there any way I can get the results? And can I know who the owner of the above view is?
Thanks for the same.

Tom Kyte
May 10, 2006 - 8:06 am UTC

dictionary stuff like that is owned by "sys".



the view is v_$active_session_history


You would ask your DBA for access to this.
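A sketch of the grant the DBA would run (the grantee is a placeholder; after the grant, the public synonym v$active_session_history resolves normally):

connect / as sysdba
grant select on v_$active_session_history to scott;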

How to tune a database you can't connect to?

Graham, December 12, 2006 - 5:17 am UTC

Hi Tom,

I've been given the enviable task of trying to tune a database / application that I'm not allowed to connect to. I can get someone else to run commands for me but the timezone difference is 9 hours so going back and forth like that is going to be a nightmare.

I will be allowed to connect to a UAT system but that obviously doesn't have the same data (at the moment we don't even know if it has the same code, schema etc.).

I've asked a bunch of questions but the answers are all VERY vague. It's pretty much 'sometimes the system runs slow'.

The db is version 8.1.7

Here's what I've suggested, have I missed anything glaringly obvious - or would you suggest a different route?

1. What I've asked the onsite guy to do is to run statspack for a 5 minute snap over the slow period, and also for a 5 minute snap during 'normal' operation.
2. Provide me with details of stats collection (this is a rapidly growing db so I'm hoping stale stats may be part of the problem)
3. Provide the onsite guy with some sql worst offenders code so that he can send me back the output.
4. Get the UAT system up to date code and schemawise.
5. Export the statistics from production and import them to the uat system (I know you don't like this idea but copying the data is not an option)
6. Using the UAT system attempt to tune the worst offending code using the cost from the explain plans (again I know it's not a great option but I don't see what other choices I have).

Thanks
Graham


Tom Kyte
December 12, 2006 - 7:12 am UTC

This is in all seriousness:

get another job.



you'd want statspacks AND tkprofs of the affected applications.

You would then be able to speak in very broad generalities about the 50,000 hypotheses you will be able to make up from this stuff.




It did cross my mind!

Graham, December 12, 2006 - 9:12 am UTC

In all fairness to the company in question, they did offer to fly me there the next day - my other half decided this was not an option, so I'm stuck with it.

Looks like I'd best get hypothesizing!




whether we need to tune the database

A reader, December 14, 2006 - 2:09 pm UTC

Hi Tom,

We have a three-tier application. Oracle 8i is used for the database. Now we experience timeout issues (from the application log file) almost every day. Most of the time, the system is very slow.

I got the information below from V$system_event.
I am very new to system tuning. Does this database's response look ok with this kind of statistics? If not, I would like you to give me a suggestion on what I should do about the database.

Many thanks,


Thu Dec 14
System-wide Wait Analysis
for current wait events

Event                                         Total       Total
Name                                          Waits    Timeouts
---------------------------------------- ----------- -----------
null event                                         1           1
latch free                                    13,484      13,478
pmon timer                                   273,516     272,088
rdbms ipc reply                                4,390           0
rdbms ipc message                          1,421,940   1,145,272
enqueue                                        4,302           6
reliable message                                   1           0
control file sequential read                 176,526           0
control file parallel write                  324,022           0
refresh controlfile command                       17           0
local write wait                                 558           0
buffer busy waits                             32,807           4
log file sequential read                      56,750           0
log file single write                          8,384           0
log file parallel write                      148,912           0
LGWR wait for redo copy                           59           7
log buffer space                              70,206           1
log file switch (checkpoint incomplete)        1,071         545
switch logfile command                         1,410           0
log file switch completion                     3,097         443
log file sync                                 59,280          64
db file sequential read                    6,830,195           0
db file scattered read                     9,829,549           0
db file parallel write                         7,847           0
direct path read                           7,408,664           0
direct path write                          2,619,359           0
direct path read (lob)                     4,030,001           0
direct path write (lob)                        1,275           0
instance state change                              1           0
smon timer                                     2,739       2,732
library cache pin                                  3           0
file identify                                 21,002           0
file open                                    167,006           0
virtual circuit status                        27,313      27,313
dispatcher timer                              13,661      13,661
SQL*Net message to client                 82,049,522           0
SQL*Net more data to client               28,064,378           0
SQL*Net message from client               82,049,497           0
SQL*Net more data from client                  6,594           0
SQL*Net break/reset to client                 52,752           0

Tom Kyte
December 15, 2006 - 8:29 am UTC

trace your application using sql_trace, see what it spends its time doing.
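On 8i, a sketch of two common ways to do that -- the sid and serial# values are placeholders you would look up in v$session:

-- from within the application's own session:
alter session set sql_trace = true;

-- or, to trace some other session (dbms_system is unsupported but widely used):
exec sys.dbms_system.set_sql_trace_in_session( 1234, 56789, true );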

Tuning Batch Programs

VELU, January 01, 2007 - 7:00 pm UTC

Hi Tom
Wish you a happy new year. I am in the process of tuning a batch operation program. The program was taking about 1 hour 15 minutes to complete. I rewrote the program in PL/SQL; pretty much all the calculations are done through SQL statements and a few global temporary tables.
1. In development and testing it was taking about 30 minutes.
2. Installed in production, it is taking about 1 hour 45 minutes to complete. There are other CPU-intensive programs running at the same time.

I would like to troubleshoot. I am wondering where I should start and what I should look for. (Tracing 10046 events slows down the program execution drastically -- almost three times slower -- and the row source operations for some cursors are not in the raw trace file.)

I have read three of your books and am in the process of reading Optimizing Oracle Performance by Mr. Millsap.

Your guidance would be highly helpful.

Thank you very much
Velu
Tom Kyte
January 01, 2007 - 7:53 pm UTC

tracing is about it - you will see where you spend your time.

compare the trace from "test" to "production" and see what is different.

A reader, March 05, 2007 - 10:26 am UTC

Thanks a lot for the valuable information we can find here. I have all your books. I read your book "Expert One on One" again and again, and I enjoy it each time.
I am working on an application where our BOSS has decided that there will be no direct select/update on tables. He instead decided that if you want to select some information then you need to call an Oracle package, passing an object type as input, and you get your output in the form of an object type also. Then it's up to you to get the information you need from this output object type. All needed information is mapped in this output object type.
He said all requests to our database will pass through this service.
However, I've realised since starting the development that I am writing more and more PL/SQL code to do what a simple direct select/update from table t would do for me very quickly. We then started having performance problems when selecting/updating a huge amount of information in the database.
You could not imagine my reaction when I re-read this in your book:
"The same 'keep it simple' argument applies at the broader architecture level. I would urge people to think carefully before adopting very complex implementations. The more moving parts you have in your system, the more things you have that can go wrong and tracking down exactly where that error is occurring in an overly complex architecture is not easy. It may be really 'cool' to implement using umpteen tiers, but it is not the right choice if a simple stored procedure can do it better, faster and with less resources"
It's exactly my actual situation.
Could you please give me a few words about your opinion? Were databases made to not be selected from directly??
Thanks


Tom Kyte
March 05, 2007 - 2:14 pm UTC

but you already read my opinions (well, facts mostly) in my book?? You know what I think.

They should be using REF CURSORS - not objects, encapsulation is good - but doing this object in/object out is just a waste for most things.

And you should have transactional API's - plsql code that takes inputs - does a transaction - and returns outputs. But you don't need to do everything via "objects"

Virtualization and Tuning

Paulo, March 08, 2007 - 12:46 pm UTC

Tom,

Today's (and especially tomorrow's) data centers are increasingly using virtualization techniques for lower costs and all the other advantages virtualization (supposedly) brings.

The question is: how does virtualization affect the performance tuning process, if at all? Isn't it insane to try and tune a product (database or otherwise) without knowing beforehand where it will run? You can move a server image from one box to another with a few clicks, but what will happen to your performance then? Are your performance tuning efforts lost?

What are your thoughts on the subject? Do you really think virtualization is this panacea some vendors are trying to sell?
Tom Kyte
March 08, 2007 - 1:02 pm UTC

Virtualization has been around *forever*

It is how mainframes work. LPARs (logical partitions) are virtualized machines.

I'm a big fan of virtualization - makes many things easier. Makes resource allocation and reallocation easier.

Sure, you can do "not smart things" with it, but you can do not smart things with anything.

What happens to performance? Same thing that happens when you make any change to the operating environment - performance will either

o improve
o go way down
o not change at all

Same as if you add or take away memory, add or take away CPU (from a domain - eg: an LPAR)

v$active_session_history time measurements

A reader, May 03, 2007 - 4:17 pm UTC

Hi Tom

ASH is very cool, but I wonder how the times are measured -- in seconds?

TIA

Malone

Tuning in production

Karthik, September 04, 2007 - 3:58 pm UTC

Tom,
I agree that collecting and comparing statspack reports from normal and bad periods will give us an idea about where we have constraints in the application. But in production, if I need to tune, what are the possible methods when we don't have the option to touch SQL code? Is it limited to adding/removing indexes (scenario 1), adding memory (scenario 2), changing the cursor_sharing parameter (scenario 3), etc.?
My question simply put: yes, I have a statspack report where I can find a lot of bottlenecks. But what next, when I am not touching SQL code?

Tom Kyte
September 05, 2007 - 1:55 pm UTC

you have SQL profiles, and different statistics that can be gathered.

suggestion: use 10g and the AWR/ADDM tools, they were designed to do this.
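For example, on 10g (with the appropriate packs licensed) the reports come from shipped scripts, run from SQL*Plus as a privileged user:

@?/rdbms/admin/awrrpt.sql     -- AWR report for a chosen snapshot interval
@?/rdbms/admin/addmrpt.sql    -- ADDM findings and recommendations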

tkprof

Aru, April 22, 2008 - 10:36 pm UTC

Hi Tom,
Since this is a performance tuning thread, I hope I'm not wrong in posting this here.
We have a problem: in one test database server an import takes 4 hours, but in the other it takes 24 hours.
The hardware, according to the sysadmins, is exactly the same.
We traced it (10046, level 12) and have taken the tkprof output, which is below. Please can you shed some light on what low hanging fruit you can see in it?



$ less avanp1_import.txt

TKPROF: Release 10.2.0.3.0 - Production on Wed Apr 23 13:51:10 2008

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Trace file: avanp1_ora_22056.trc
Sort options: default

********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************

INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "WORKTASKSAFETY_B" ("OWNER_OI",
"BLOBID", "BLOBLEN", "BLOB")
VALUES
(:1, :2, :3, :4)


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 549 0.86 0.89 0 6 1702 549
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 549 0.86 0.89 0 6 1702 549

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 67

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 549 0.00 0.00
SQL*Net message from client 549 0.00 0.08
********************************************************************************

SELECT /*+NESTED_TABLE_GET_REFS+*/ "BLOB"
FROM
"WORKTASKSAFETY_B" WHERE ROWID = :1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 549 0.26 0.21 0 0 0 0
Fetch 549 0.09 0.08 0 549 0 549
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1098 0.35 0.30 0 549 0 549

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 67

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 549 0.00 0.00
SQL*Net message from client 549 0.00 0.18



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1098 1.12 1.11 0 6 1702 549
Fetch 549 0.09 0.08 0 549 0 549
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1647 1.21 1.20 0 555 1702 1098

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
log file sync 550 0.19 3.45
SQL*Net message to client 2745 0.00 0.01
SQL*Net message from client 2745 0.00 0.74
SQL*Net more data from client 117452 0.03 2.59
direct path write 1121 0.00 0.06
db file sequential read 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 9 0.00 0.00 0 0 0 0
Execute 9 0.02 0.02 0 26 6 6
Fetch 6 0.00 0.00 0 12 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 24 0.02 0.02 0 38 6 9

Misses in library cache during parse: 3
Misses in library cache during execute: 3

2 user SQL statements in session.
9 internal SQL statements in session.
11 SQL statements in session.
********************************************************************************
Trace file: avanp1_ora_22056.trc
Trace file compatibility: 10.01.00
Sort options: default

1 session in tracefile.
2 user SQL statements in trace file.
9 internal SQL statements in trace file.
11 SQL statements in trace file.
5 unique SQL statements in trace file.
150912 lines in trace file.
3459 elapsed seconds in trace file.


Thanks Tom,
Regards,
RahulD.

Tom Kyte
April 23, 2008 - 6:05 pm UTC

... The hardware, according to the sysadmins, is exactly the same. ...

hah. ok.

that tkprof shows it takes seconds, that is not very useful is it....

Thanks

Aru, April 23, 2008 - 6:51 pm UTC

Hi Tom,
How should I go about trying to analyze the above issue?
What else can I check if even the tkprof does not help? Should I trace it for the whole duration of 24 hours?
Initially I thought that there would be something different about the server hardware, OS and storage, but am told no.
Please help...
Regards,
ARU
Tom Kyte
April 28, 2008 - 9:44 am UTC

I don't know what issue you have frankly, you posted a tkprof that shows extremely fast processing.


why are you using imp? why not expdp and impdp - both are much faster (especially impdp) than the legacy exp/imp tools.

yes, a much longer trace would be called for - but might not be useful. It could be that ONE THING is taking a huge amount of time, much longer in one system than the other - and if you traced for an hour - you might not see it. So, a full trace would be 'best'

but you can also use the V$ tables to see what is happening in that session - do you have any tools (diagnostic, tuning packs) to use?

v$active_session_history

A reader, May 15, 2008 - 6:36 pm UTC


Hi Tom,
I would like to ask how relevant the details provided in v$active_session_history are.
I have been using it extensively for some time now to gather historic data on database users and the time they spent on each wait event. I am assuming that the value for time_waited is in seconds, if not milliseconds, right?

But what I would like to know is how good the values provided by the columns wait_time and time_waited are.
Although the sessions had executed for close to 2 hours and had a lot of wait time on db file sequential read, gc buffer waits etc., wait_time is always 0, yet the time_waited column shows some high value. If I do a sum(time_waited) grouped by event name, I get a value which far exceeds the time the session lasted.

Any thoughts are welcome always.

Thanks.

Database 10.2.0.3 and 6 Node RAC Cluster.

Tom Kyte
May 19, 2008 - 1:17 pm UTC

how relevant?

depends on what you want to know as to relevance doesn't it. It is very relevant for tuning, for asking "what was happening then", "what bottlenecks did I encounter then". If you want to know how much Bob makes per hour, it is not relevant. If you would like to understand what a session was doing at a prior point in time - very relevant.

You share with us no queries, so I'm not sure how you are using the data, so I'm not sure what you might be doing right or wrong when querying it.

Here is the query I use....

A reader, May 20, 2008 - 11:13 pm UTC


Hi Tom,
Thanks for the update. Sorry to have missed out on that vital piece of information.....

Here is the SQL which I use:

Set pages 999 lin 250
Col sql_text for a50 word_wrapped
SELECT SQ.SQL_TEXT,
ET.NAME Event_name,
COUNT(*),
SUM(ASH.TIME_WAITED) Total_time_waited
FROM V$ACTIVE_SESSION_HISTORY ASH,
V$EVENT_NAME ET,
V$SQLAREA SQ
WHERE to_char(ash.sample_time,'DD-MON-YYYY') > '15-MAY-08 09:00:00' AND
ASH.EVENT# = ET.EVENT#
and ash.session_id = 3236
and ash.session_serial# = 49139
AND ASH.SQL_ID = SQ.SQL_ID
GROUP BY SQ.SQL_TEXT, ET.NAME
order by 4 desc;

Cheers.

Tom Kyte
May 21, 2008 - 8:02 am UTC

you know this does NOT WORK right:

to_char(ash.sample_time,'DD-MON-YYYY') > '15-MAY-08 09:00:00'


why would you convert a date to a string (which is not 'sortable'; if you use dd-mon-yyyy on a date and order by it, the strings are not sorted by date) and compare it to a string?

why would you not convert the string into a date.


You know '16-APR-08' is greater than '15-MAY-08 09:00:00'

ops$tkyte%ORA10GR2> select * from dual where '16-APR-08' > '15-MAY-08 09:00:00';

D
-
X




the v$ tables are very transient - a single point in time "as of right now" view. If the query was not waiting right then, right there, at that moment, you would not see any wait.

I think you wanted to use the DBA views which contain actual histories over time.
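For completeness, the predicate from the posted query, rewritten to convert the string into a date rather than the date into a string:

where ash.sample_time > to_date( '15-MAY-2008 09:00:00', 'DD-MON-YYYY HH24:MI:SS' )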

Sorry, I had realised that..

A reader, May 22, 2008 - 11:45 pm UTC


Sorry, I had realised that a long time ago, but forgot to delete that one. I had modified the query but pasted the wrong one. Thanks for pointing that out :)

Based on your comments, I should go for DBA_HIST_ACTIVE_SESS_HISTORY rather than v$ACTIVE_SESSION_HISTORY, correct?

Thanks.

Tom Kyte
May 23, 2008 - 8:13 am UTC

depends on what you are looking for... sure.

time_waited in dba_hist_active_sess_history and v$active_session_history

A reader, July 30, 2008 - 12:36 pm UTC

1 SELECT to_char(max(sample_time),'DD-MM-YYYY HH24:MI') end_sample_time , C.SQL_TEXT,
2 EVENT,
3 seq#,
4 SUM(TIME_WAITED/60000000) minute
5 FROM V$ACTIVE_SESSION_HISTORY A,
6 V$SQLAREA C
7 WHERE A.SAMPLE_TIME BETWEEN to_date('30-07-2008 19:00','DD-MM-YYYY HH24:MI') AND
8 to_date('30-07-2008 23:00','DD-MM-YYYY HH24:MI') AND
9 SESSION_STATE='WAITING'
10 and A.SQL_ID = C.SQL_ID
11* GROUP BY C.SQL_TEXT,EVENT, seq#
LUCOTUS at orcl> /

END_SAMPLE_TIME SQL_TEXT EVENT SEQ# MINUTE
---------------- ------------------------------ ------------------------------ ---------- ----------
30-07-2008 19:55 LOCK TABLE LUCOTUS.TEST IN EXC enq: TM - contention 64 .16565335
LUSIVE MODE

30-07-2008 20:11 update test set object_id = 0 enq: TM - contention 71 2.86739578
30-07-2008 19:52 update test set object_id = 0 enq: TM - contention 47 1.84333278
30-07-2008 19:45 update test set object_id = 0 enq: TM - contention 37 1.92530068
30-07-2008 20:02 update test set object_id = 0 enq: TM - contention 61 1.87855473

---

select to_char(maxsample, 'DD-MM-YYYY HH24:MI') end_sample_time, sql_text, event, seq#, m
from   dba_hist_sqltext,
       ( select sql_id, max(sample_time) maxsample, event, seq#, sum(time_waited/60000000) m
           from dba_hist_active_sess_history ash
          where SAMPLE_TIME BETWEEN to_date('30-07-2008 19:00','DD-MM-YYYY HH24:MI') AND
                                    to_date('30-07-2008 23:00','DD-MM-YYYY HH24:MI')
            and sql_id is not null
            and session_state='WAITING'
          group by sql_id, event, seq#
       ) wait
where  wait.sql_id = dba_hist_sqltext.sql_id
/

END_SAMPLE_TIME SQL_TEXT EVENT SEQ# M
---------------- --------------------------------------------- -------------------- ---------- ----------
30-07-2008 20:11 update test set object_id = 0 enq: TM - contention 71 .292966983
30-07-2008 19:52 update test set object_id = 0 enq: TM - contention 47 .195311267
30-07-2008 19:45 update test set object_id = 0 enq: TM - contention 37 .195320667
30-07-2008 20:02 update test set object_id = 0 enq: TM - contention 61 .1953105
30-07-2008 19:55 LOCK TABLE LUCOTUS.TEST IN EXCLUSIVE MODE enq: TM - contention 64 0


Seems the result (time_waited) from v$active_session_history and dba_hist_active_sess_history is not the same. Am I missing something here?

What is IMP/IMPDP read wait event?

Robert, April 17, 2009 - 12:15 pm UTC

Hi Tom,

What wait event does IMP/IMPDP wait on when it is reading from the (i.e. flat) import file?
My Statspack data shows the main wait event is "db file sequential read", but that doesn't seem to make sense since it is reading from a plain old flat file on the server.

Would you please explain this?

Thank you,

Robert.

Tom Kyte
April 17, 2009 - 3:55 pm UTC

that wait (the reading of the OS file) isn't reported in the database as a wait event by import.

If you see import waiting on "db file sequential read", that is probably index maintenance (you are importing into a table that has indexes enabled already).
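
(To see where a given session's database time actually goes, a minimal sketch - assuming you know the import session's SID - against v$session_event:

select event, total_waits, time_waited
  from v$session_event
 where sid = :sid
 order by time_waited desc;

the OS-file read time will simply not appear there, for the reason above.)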

But where do the O/S reads show up?

Robert, April 17, 2009 - 5:11 pm UTC

Tom,

Can we identify stats for these O/S reads anywhere in AWR, STATSPACK, etc. ?

Thanks,

Robert.
Tom Kyte
April 20, 2009 - 10:49 am UTC

something to ask your OS vendor. To us, as I said, it is just a physical IO - we don't really know what magic the OS did to get us that information - we just called "read()" and it responded.

tuning process

Samuel, April 30, 2009 - 10:13 am UTC

Is it a good practice to use a stand-alone server for tuning a query - where there are no other sessions except the one session we use to run the query and produce a trace file for tuning? By the way, the production server is on a Unix platform and the server used for tuning is on Windows; however, the Oracle database version is 9.2.0.7.0 on both.

OR do you recommend the query to be tuned on the production database server itself?

Thanks
Tom Kyte
April 30, 2009 - 11:09 am UTC

... OR do you recommend the query to be tuned on the production database server
itself?
...

no - for when you are tuning sometimes you seriously UNTUNE before you tune. And you need the ability to test things out - a new index, a different structure, a reorganization perhaps - things you cannot do in production.

So, you typically tune in isolation (using the FULL set of data) first, then you test in a multi-user situation (drive a simulated load) if at all humanly possible, and then you can deploy to production.

Monster Child Records

Rupali, October 02, 2009 - 8:46 pm UTC

Dear Tom,

In the process of query tuning, can you please throw some light on how to handle a scenario where a child table can have an unexpectedly huge number of child records?

This is the scenario from a data mart. We have a child table which is supposed to have at most 4000 child records, but we got a strange case wherein this table is carrying 67000 child records for a single parent. When I include this record in the query, it runs forever.

What stats gathering mechanism do we need to adopt to handle such scenarios? Or do we have any other method to handle this?

Please help.
Tom Kyte
October 07, 2009 - 3:55 pm UTC

insufficient data to say anything.

no queries, no definition of what "forever" is

sorry - cannot really say "a thing" about this.

performance

A reader, December 01, 2009 - 9:45 pm UTC

Tom:

I have a mystery I need your help on. I think you can guide me to pinpoint the issue here.

I have a small package with a few procedures that are called via mod_plsql.
A Visual Basic app calls the Oracle stored procedure, which delivers results in XML format. Users complain that the database seems to sleep sometimes or the app is slow in getting data. I ran a few tests in the database and found it to be true sometimes.

It seems to happen first thing in the morning when you first call the procedure. It does happen sometimes during the day too. When I run autotrace and test the XML query, it completes in 0.1 seconds with excellent stats (see below).



Could it be one of the following:

1) hard parse since the execution plan is not in the shared pool
2) the time it takes Oracle to start a unix process to handle the query
3) calling a procedure in a package takes more time the first time because it loads the whole package into memory
4) the database needs tuning
5) some big job runs in the morning

Any suggestions for troubleshooting (i.e. statspack report in the morning, explain plan/autotrace, TKPROF)? Any others?






MESSAGE_NO PACKAGE_NAME PROCEDURE_NAME DATE_IN TIME
---------- ------------------------------ ------------------------------ -------------------- ------
2178 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:00;31 42.3 seconds.
2179 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;10 17.34 seconds.
2180 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;31 1.6 seconds.
2181 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;37 1.52 seconds.
2182 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;42 1.24 seconds.
2183 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;47 1.12 seconds.
2184 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;53 .91 seconds.
2185 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:01;57 .06 seconds.
2186 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:02;02 1.41 seconds.
2187 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:02;07 .47 seconds.
2188 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:02;11 .09 seconds.
2189 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:02;16 .95 seconds.
2190 MAGAAZINES MAGAZINES_QUERY_BYMAGANO 20-nov-2009 06:02;21 1.55 seconds.


When i run the SQL in MAGAZINES_QUERY_BYMAGANO

Elapsed: 00:00:00.04


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
3033 bytes sent via SQL*Net to client
2176 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Tom Kyte
December 01, 2009 - 10:36 pm UTC

Sam/SMK

you cannot just run the sql - you have to run the procedure.

run the procedure and trace the procedure, if this is a "first thing in the morning", I'd guess you shutdown your database every night (bad bad horrifically bad idea) and start fresh. So, the database has to load everything it needs in from disk.


trace it. Like you've seen me do a million times here (you must have read that advice before on some other page? you should have seen it coming)


trace it with wait events.
see what it does, that'll show you if it is "sql issues".

if the tkprof shows "sql was fast, procedure was slow", time to look at the machine and figure out what was chewing up all of the cpu to make the code run slower in the morning than in the afternoon.
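
(a minimal sketch of tracing with wait events - the procedure name is hypothetical:

alter session set events '10046 trace name context forever, level 12';
exec pkg.my_procedure
alter session set events '10046 trace name context off';

then run tkprof on the resulting trace file and look at the wait events under each statement.)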

performance

A reader, December 02, 2009 - 10:40 am UTC

Tom:

Thanks for your reply.

I read most of Chapter 2 of your book on performance tuning over the weekend. Great substance!


I do not think they start up the test database every night. Would not this confirm it?

SELECT startup_time FROM v$instance;

19-OCT-2009



When you say trace the procedure, shall I do it at the session level at any time (not just at 6:00 AM), or turn on sql tracing for the whole database?

Here is my plan using sql*plus for windows.

1. Initialize mod_plsql
declare
    nm      owa.vc_arr;
    vl      owa.vc_arr;
begin
    nm(1) := 'WEB_AUTHENT_PREFIX';
    vl(1) := 'WEB$';
    owa.init_cgi_env( nm.count, nm, vl );
end;
/

2. SQL> SET Long 10000
3. SQL> SET serveroutput on
4. SQL> SET AUTOTRACE on

5.  SQL> alter session set sql_trace=true;

6.  SQL> execute pkg.my_procedure;

7.  SQL> exec owa_util.showpage();

8.  Go to the server disk and check the trace file created for my session. Run TKPROF and print the report.
    Check the report to see how long it took.


Would not autotrace give a similar result? If it is the procedure, I would think it would always be slow though.
Would a statspack report in the morning be of any help in this case too, or is that after we determine it is not the procedure?





Tom Kyte
December 04, 2009 - 9:17 am UTC

always try to trace as selectively as possible.

autotrace would not give any report for a plsql routine.
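
(one way to trace selectively at the session level in 9i - a sketch, with hypothetical sid and serial# values:

exec sys.dbms_system.set_sql_trace_in_session( 123, 456, true );

that enables sql_trace in just that one session, nothing else.)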



... If it is the procedure, I would think it would
always be slow though.
..

why? what if the machine was 100% cpu saturated in the morning (I'm guessing, remember - I don't really have any of the information you have). Then I would expect it to run much slower, wouldn't you?


statspacks would be useful to YOU too, yes.

trace

A reader, December 04, 2009 - 12:05 pm UTC

Tom:

How do you set this selective trace?

Do you do it in the procedure itself, or do you change the database parameter SQL_TRACE=TRUE for a day, or how?

I noticed it happens sometimes during the day too.

Is there a way in Oracle to capture CPU saturation stats to determine if it is a CPU issue, or would the trace report tell you that?

When I run the sql in sql*plus the 1st time it shows me
recursive calls = 18
redo size = 212
consistent gets = 18

then on subsequent calls it resets to 0 (except consistent gets = 3).

Kind of strange to get redo size for a SELECT statement using DB links? I wonder if those DB links have something to do with this issue. The query uses tables in another database on the same machine though.

Tom Kyte
December 04, 2009 - 5:32 pm UTC

if you are using 10g and above and have access to AWR - you have access to the cpu utilization. Otherwise, you'd have to capture this yourself.




... using DB links? ...

would have been sort of NICE TO KNOW THAT. You are reliant on the other system responding to you in a reasonable amount of time - good luck with that. You don't have good insight into what your own database is doing, let alone someone else's.



nothing strange about redo on a select (why don't you think it is strange it did logical io's locally? it is doing work locally). redo happens on selects all of the time (deferred block cleanouts from prior transactions)
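
(a rough sketch demonstrating that - it assumes 10g or later for the flush command:

create table t as select * from all_objects;
update t set object_name = lower(object_name);
alter system flush buffer_cache;  -- age the dirty blocks out before the commit
commit;                           -- commit cleanout can only touch cached blocks
set autotrace traceonly statistics
select count(*) from t;           -- note the nonzero "redo size"

the select has to clean out the blocks the commit never got to - and logs that.)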



to selectively trace, you would have your application instrumented to trace. It would be a switch you could just throw on - like APEX would give you the ability to do, just add a switch to a URL and tracing is on...
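
(a minimal sketch of such a switch - the parameter and procedure names are made up:

procedure my_procedure( p_trace in varchar2 default 'N' )
is
begin
    if ( p_trace = 'Y' )
    then
        execute immediate
        'alter session set events ''10046 trace name context forever, level 12''';
    end if;
    -- ... the real work ...
end;

every entry point gets the same switch, so any single request can be traced on demand.)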

performance

A reader, December 04, 2009 - 8:33 pm UTC

Tom:

it is 9i and 7 years old. I can't use apex either until we go 11g in a few months.

Now, the selective tracing you mention - I already did that.
I basically inserted a record into a debug table and measured the number of seconds it takes for each step in the code.
That would *not* show me the execution plan, stats, etc.

It did not answer the main question though. Most of the time consumed is in opening the cursor.

The remote database is sitting right next to this database. Does Oracle handle things differently with DB links? Would the other database cache the execution plan, or does it have to hard parse many times with db links?
I have to do it this way because this database has a DAD and is open to the web while the other one that has the data is not. I do not think we want to open the other database to the web (security issue).

I can turn SQL_TRACE on in this database for a day and then TKPROF the report. Do you think that would help me figure out what is going on? Or do I need SQL_TRACE on the other database that I query from using a link?
Tom Kyte
December 07, 2009 - 2:10 pm UTC

... I can't use apex ...

that has never been true, Apex was initially released in 9iR2. We've had this discussion Sam/SMK. more than once.


... Now, the selective tracing you mention, ...

that isn't sql_trace, that is your own instrumentation which is good, but not what we were looking for.


... It did not answer the main question though....

I could have told you that. I wanted you to use sql_trace - you know TRACING - in a selective fashion, you know, on the SQL we execute.

... The remote database is sitting right next to this database ...

how wasteful is that.

.. Does Oracle handle
things differently with DB links...

of course, differently than if the data was all local in that database. Of COURSE it does, how could it not?

the remote system would be using shared sql like everything else.


... I do not think we want to open the
other database to the web (security issue).
...

why not, apex.oracle.com (that is, asktom.oracle.com) sure doesn't work that way. No one I know would work that way. It doesn't make sense to work that way.


... I can turn SQL_TRACE on in this database for a day..

hahaha, have fun with that. If you thought you were slow before, just wait. And then you'll have gigabytes of data - but no knowledge, because you won't be able to figure out a darn thing. Too much stuff to look at.



Effect of SDU parameter

Dhairyasheel Tawade, December 10, 2009 - 2:20 am UTC

Hi Tom

apart from all the tuning approaches mentioned above,
I've read that the default SDU size in Oracle is 2048 bytes.
On some databases, setting it to a value of 1461 has actually resulted in around a 40% improvement for me.
I had an rman backup which completed in almost half its regular time after setting the SDU to 1461.

Can you elaborate on how this SDU functions and how to judge what value to set for the SDU on a database server?

Tom Kyte
December 10, 2009 - 2:53 pm UTC

are you sure about that?

http://docs.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmarchi.htm#i1008030

given that the rman stuff happens on the database machine, without sqlnet and all.


SDU

Dhairyasheel Tawade, December 11, 2009 - 12:51 am UTC

Hi Tom,

In the rman backup example I gave, I was taking the rman backup from a Windows machine with the Oracle client. The database server was on Redhat Linux. The database was in noarchivelog mode. I had set the SDU to 1461 on both the client and the database server.

Regards.
Tom Kyte
December 11, 2009 - 7:25 am UTC

did you read the architecture overview of rman I pointed you to?!?

...
The RMAN client application directs database server sessions to perform all backup and recovery tasks. The meaning of "session" in this sense depends on the operating system. For example, on UNIX, a server session corresponds to a server process, while on Windows it corresponds to a thread within the database service.

The RMAN client itself does not perform backup, restore, or recovery operations. When you connect the RMAN client to a target database, RMAN allocates server sessions on the target instance and directs them to perform the operations. The RMAN client uses internal, undocumented PL/SQL packages to communicate with the target database and recovery catalog.
....



when you were done with your backup, where did the backup file exist?

performance

A reader, December 11, 2009 - 8:12 pm UTC

Tom:

apex is not supported on 9.2.0.2. Oracle supports it on 9.2.0.4 and later. But they also told me to wait for 11g. It is not under my control.

I did a sql trace at the session level and TKPROF. I found a high level of recursive sql and memory sorts on the 1st call.

My query was selecting from a VIEW which had a join.
I changed that so it selects from the tables (instead of the view) and it seems things became much better. Does the Oracle optimizer deal differently with a VIEW than a table? Do you think that might cause a large number of sorts and recursive sql? What normally causes that, in your opinion?

Statistics
----------------------------------------------------------
1055 recursive calls
1 db block gets
236 consistent gets
11 physical reads
212 redo size
3638 bytes sent via SQL*Net to client
2972 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
46 sorts (memory)
0 sorts (disk)
1 rows processed

Elapsed: 00:00:00.06


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
3307 bytes sent via SQL*Net to client
2913 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

Tom Kyte
December 14, 2009 - 8:08 am UTC

... it is not under my control.
...

since 9.2.0.8 is out, and lots of pretty "necessary" fixes have been made - including security related ones and so on.... Maybe it is under the control of someone like you. Did you ever think to bring it up?




your query took 0.06 seconds. Yawn, move on.


Get an example where it takes noticeably longer SAM/SMK - come on, this isn't useful. You know that. You need an *example where you experience the problem*. An example where you don't experience the issue is useless.



... Does oracle optimizer deal differently with a VIEW than a
table....

If the query against the view is different than the query against the table - then yes, we obviously would.

Before you read that sentence too quickly, make sure you read it and understand what it says - if the query against the VIEW is SEMANTICALLY DIFFERENT than the query against the TABLE - then YES we will process it differently.


If you get radically different plans, I would sit there and read and read and read those two queries over and over and over (and one more time over) to make sure they are actually *the same*. Because, usually they are not, they give different answers.



But again, this is useless to look at this stuff here, there is no performance issue.

performance

A reader, December 17, 2009 - 4:25 pm UTC

Tom:

Performance is good (subsecond) except for the first call in the morning.

The DBA tells me it is normal for the database to be slow on the first request in the morning as it has not warmed up yet. I guess they need to get that morning coffee and stretch a bit before they start servicing requests!

He mentioned that they do a nightly export that might be flushing the cache as it uses it.

Can this be accurate?

I did flush the cache and ran it - it took a few more seconds (nothing compared to the 50 seconds I get on the 1st request).

But I read the mod_plsql manual and found an interesting thing. The connection pool is at the apache process level, and if the process does not service any requests in 15 minutes, mod_plsql will kill the pooled connection.

So could it be that creating new connections takes a lot of time (I doubt it)? Would TKPROF show you if it is actually creating a new connection (instead of using a pooled one)?


performance

A reader, December 23, 2009 - 10:53 pm UTC

Mr Tom:

Merry Christmas! 

Can you tell me what is possibly be going on here.
I have a simple verify-login function that takes a user-input password and runs a java stored procedure to compute the sha1 string and compare it to the value in a small 100-user table.

The java class is this

http://www.anyexample.com/programming/java/java_simple_class_to_compute_sha_1_hash.xml


I usually run it over a db link, where the remote database is next to the local one. But then I tested it in the local database to see if it is the link, and it also takes 30 seconds on the first request and then runs very fast afterwards. Then the next day, or after some idle time, it takes 30-40 seconds and then it's a snap. The function uses p_password as a bind variable.

1. Why does it take that long? Should not Oracle be caching it all the time, so that it would always be a subsecond response? The fetch here is taking 30 seconds. Is it fetching the whole ticuser table, which is very tiny?

2. How would TKPROF usually show you if a db link is the issue?

SQL> edi
Wrote file afiedt.buf

  1  SELECT COUNT (*)
  2          FROM ticuser
  3          WHERE userid = 'mike'
  4*         AND password = SHA1_PKG.F_SHA1('secret')
SQL> /

  COUNT(*)
----------
         1

1 row selected.

Elapsed: 00:00:28.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=18)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'TICUSER' (Cost=1 Card
          =1 Bytes=18)

   3    2       INDEX (UNIQUE SCAN) OF 'PKTICUSER' (UNIQUE)




Statistics
----------------------------------------------------------
        275  recursive calls
          0  db block gets
        141  consistent gets
          1  physical reads
          0  redo size
        198  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         21  sorts (memory)
          0  sorts (disk)
          1  rows processed


SELECT COUNT (*)
        FROM ticuser
        WHERE userid = 'mike'
        AND new_passwrd = SHA1_PKG.F_SHA1('secret')

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       1.30          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        4      0.38      25.40          1          2          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        8      0.39      26.70          1          2          0           2

SQL> /

  COUNT(*)
----------
         1

1 row selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=18)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'TICUSER' (Cost=1 Card
          =1 Bytes=18)

   3    2       INDEX (UNIQUE SCAN) OF 'PKTICUSER' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        215  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed



Tom Kyte
December 31, 2009 - 2:32 pm UTC

look in the tkprof and see what those 275 recursive sql's are and what they do, they account for the additional consistent gets and the time.

... it takes
also 30 seconds on first request and then it runs very fast afterwards. Then
next day or after some idle time it takes 30-40 seconds and then a snap. ...

sounds like "getting the cache loaded up with stuff" for the first time and then "reloading into the cache after it got unloaded from the cache" for the second time.

Automatic SQL Tuning in Oracle Database 10g

Abhisek, January 02, 2010 - 9:59 am UTC

Hi Tom,

I would like to ask about the usefulness of Automatic SQL Tuning in Oracle Database 10g. How effective and reliable is it when compared to human analysis, like we do?


Tom Kyte
January 04, 2010 - 10:33 am UTC

given the level of most "analysts" - it is light years ahead. It never gets bored, it asks for permission in most cases before making a change, it is conservative and predictable.


it gathers statistics that go beyond the norm - something an analyst could not do - in the form of sql profiles or dynamic samples.

it suggests the obvious - "index this, materialize view that"

it identifies the low hanging fruit for you and points it out.

it should be your first course of action, then second would be "review the ALGORITHMS", this is where the human comes into play.

Consider:

declare
    cursor c is select * from t;
    type array_t is table of t%rowtype;
    array array_t;
begin
    open c;
    loop
       fetch c bulk collect into array limit 1000;
       forall i in 1..array.count insert into t2 values array(i);
       exit when c%notfound;
    end loop;
    close c;
end;


let the database tune the sql - select * from t; and insert into t2


however, look at that horrible algorithm, look at that very bad approach. You, the human analyst, would at this point call in the development team and teach them about how to properly do sql and demonstrate that the above should simply be:

insert into t2 select * from t;

nothing more, nothing less - maybe an /*+ append */ hint, maybe enable parallel dml and do it parallel, but a single SQL statement.


The stuff we need to tune are the approaches - not the sql so much.


performance

A reader, January 04, 2010 - 8:41 am UTC

Tom:

Thanks, I did not see your response until today.

Would this trace report show you whether the time is wasted in fetching one record from this very small user table OR in the sqlnet wait?

When I check the cache (V$DB_OBJECT_CACHE) the package/sql is there, but still, it should not take Oracle 30 seconds to load a simple statement. If it is a caching issue, would you check why it is getting flushed, OR schedule a morning job (3:00 AM) to load all of these into the CACHE every day?


TKPROF: Release 9.2.0.2.0 - Production on Mon Jan 4 08:55:58 2010

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Trace file: tic_ora_13742342.trc
Sort options: default

********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************

The following statement encountered a error during parse:

alter session set_sql_trace=false

Error encountered: ORA-00922
********************************************************************************

ALTER SESSION set events '10046 trace name context forever, level 12'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.04 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1 0.00 0.04 0 0 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 419

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 21.69 21.69
********************************************************************************

SELECT COUNT (*)
FROM ticuser
WHERE userid = 'mike'
AND password = SHA1_PKG.F_SHA1('Secret')

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 1.75 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.41 24.13 0 2 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.43 25.89 0 2 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 419

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=135 r=0 w=0 time=24741940 us)
1 TABLE ACCESS BY INDEX ROWID TICUSER (cr=135 r=0 w=0 time=24741917 us)
1 INDEX UNIQUE SCAN PKTICUSER (cr=1 r=0 w=0 time=42883 us)(object id 234051)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 18.06 30.97
SQL*Net break/reset to client 2 0.01 0.01
********************************************************************************

select longname
from
javasnm$ where short = :1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.10 0 0 0 0
Execute 10 0.00 0.00 0 0 0 0
Fetch 10 0.00 0.29 0 40 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 21 0.00 0.39 0 40 0 10

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)
********************************************************************************

select longdbcs
from
javasnm$ where short = :1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.04 0 0 0 0
Execute 0 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1 0.00 0.04 0 0 0 0

Misses in library cache during parse: 0
Parsing user id: SYS (recursive depth: 1)
********************************************************************************

SELECT max(version)
FROM
"SYS"."JAVA$POLICY$SHARED$TABLE"


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.01 0.04 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.05 0 6 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.01 0.09 0 6 0 2

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=3 r=0 w=0 time=51547 us)
2 TABLE ACCESS FULL JAVA$POLICY$SHARED$TABLE (cr=3 r=0 w=0 time=51509 us)

********************************************************************************

select privilege#,level
from
sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with
(grantee#=:1 or grantee#=1) and privilege#>0


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.01 0 0 0 0
Execute 3 0.00 0.07 0 0 0 0
Fetch 180 0.01 0.07 0 87 0 177
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 186 0.01 0.16 0 87 0 177

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
59 CONNECT BY WITH FILTERING (cr=29 r=0 w=0 time=50778 us)
24 NESTED LOOPS (cr=26 r=0 w=0 time=19745 us)
24 CONCATENATION (cr=2 r=0 w=0 time=122 us)
0 INDEX RANGE SCAN I_SYSAUTH1 (cr=1 r=0 w=0 time=57 us)(object id 110)
24 INDEX RANGE SCAN I_SYSAUTH1 (cr=1 r=0 w=0 time=46 us)(object id 110)
24 TABLE ACCESS BY USER ROWID SYSAUTH$ (cr=24 r=0 w=0 time=19586 us)
31 NESTED LOOPS (cr=3 r=0 w=0 time=314 us)
45 BUFFER SORT (cr=0 r=0 w=0 time=140 us)
45 CONNECT BY PUMP (cr=0 r=0 w=0 time=50 us)
31 INDEX RANGE SCAN I_SYSAUTH1 (cr=3 r=0 w=0 time=91 us)(object id 110)

********************************************************************************

alter session set sql_trace=false


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.10 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.10 0 0 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 419



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.02 1.85 0 0 0 0
Execute 3 0.00 0.04 0 0 0 0
Fetch 2 0.41 24.13 0 2 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.43 26.03 0 2 0 1

Misses in library cache during parse: 1

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 21.69 52.66
SQL*Net break/reset to client 2 0.01 0.01


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 7 0.01 0.21 0 0 0 0
Execute 15 0.00 0.07 0 0 0 0
Fetch 192 0.01 0.41 0 133 0 189
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 214 0.02 0.70 0 133 0 189

Misses in library cache during parse: 0

3 user SQL statements in session.
7 internal SQL statements in session.
10 SQL statements in session.
********************************************************************************
Trace file: tic_ora_13742342.trc
Trace file compatibility: 9.00.01
Sort options: default

1 session in tracefile.
3 user SQL statements in trace file.
7 internal SQL statements in trace file.
10 SQL statements in trace file.
7 unique SQL statements in trace file.
391 lines in trace file.



Tom Kyte
January 04, 2010 - 12:10 pm UTC

Sam - look at the other place you posted pretty much the same thing.

It's the recursing package call

Santosh Kompella, January 04, 2010 - 1:31 pm UTC

SELECT COUNT (*)
FROM ticuser
WHERE userid = 'mike'
AND new_passwrd = SHA1_PKG.F_SHA1('secret')

In the above statement, the SHA1_PKG.F_SHA1('secret') call is executed at least once for the rows where userid = 'mike', or worse, for all the rows in the table. Oracle may be spending a lot of time doing context switching between the SQL and PL/SQL engines. There are a couple of ways of reducing it.
1. Get the value of SHA1_PKG.F_SHA1('secret') into a variable and use the variable in the SQL.
2. Define the F_SHA1 function as deterministic in the package. This will reduce the number of executions to at most two.

Hope this helps.
Tom Kyte
January 04, 2010 - 2:08 pm UTC

no, it should call it twice at most to get the start/stop keys for the index range scan (if a function based index is in place) or once if the index is just on username and the function is applied during the filter phase.

do you see that in subsequent executions, there are no recursive calls.

And just calling a package/function would not imply recursive SQL - only if it did SQL would it...

consider:

ops$tkyte%ORA10GR2> @test
ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create or replace function foo(x in number) return number
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> begin
ops$tkyte%ORA10GR2>         dbms_application_info.set_client_info( to_number(userenv('client_info'))+1 );
ops$tkyte%ORA10GR2>         return x;
ops$tkyte%ORA10GR2> end;
ops$tkyte%ORA10GR2> /
ops$tkyte%ORA10GR2> drop table t;
ops$tkyte%ORA10GR2> create table t as select * from all_users where rownum <= 5;
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_application_info.set_client_info(0);

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> set autotrace traceonly statistics;
ops$tkyte%ORA10GR2> select * from t where foo(user_id)=user_id;


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        674  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

ops$tkyte%ORA10GR2> set autotrace off
ops$tkyte%ORA10GR2> select userenv('client_info') from dual;

USERENV('CLIENT_INFO')
----------------------------------------------------------------
5




that function was called five times, recursive calls - zero.

but it is using the index, the index is hitting the table ONCE, the function (the java function - it is the cause of the large INITIAL, one time delay) is called once or twice at most in this example.

Past history of execution plans of the query

A reader, February 17, 2010 - 10:51 pm UTC

Tom

Database: 10gR2

Is there any way to find out the past execution plans for a specific query? I want to see what the actual execution plan was 3 days back at a specific time. I have AWR snapshot data configured for 14 days.
Thanks!
Tom Kyte
February 18, 2010 - 8:43 am UTC

from the command line, you can:

@?/rdbms/admin/awrsqrpt

you can also "time travel" in enterprise manager and look at SQL over time in the GUI as well.
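
(a minimal sketch of pulling the historical plans straight from the AWR views - the sql_id is a placeholder you would substitute:

select * from table( dbms_xplan.display_awr( '&sql_id' ) );

that returns every plan AWR captured for that statement within your 14 days of snapshots.)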

tuning

sam, February 01, 2011 - 4:27 pm UTC

Tom:

When I check for the worst performing SQL that uses a lot of resources in V$SQL, I get this, which shows high disk reads and buffer gets.

I do not have this SQL in my application. This must be sent by MOD_PLSQL. Is there some bad SQL there, or is there anything I can do about this?

EXECUTIONS BUFFER_GETS DISK_READS FIRST_LOAD_TIME
---------- ----------- ---------- ------------------- SQL_TEXT
-----------------------------------------------------------------------------------------------------------------
675 1040227 423033 2011-01-30/13:03:13
declare rc__ number; simple_list__ owa_util.vc_arr; complex_list__ owa_util.vc_arr; function match_pattern ( p_string in varchar2, p_simple_pattern in owa_util.vc_arr, p_complex_pattern in owa_util.vc_arr, p_use_special_chars in boolean default true ) return boolean is special_chars constant varchar2(27) := ' @*()+-/=\<>;:"|&?{}[]''' || chr(9) || chr(10) || chr(12) || chr(13); trans_chars constant varchar2(27) := '***************************'; l_string varchar2(512) := upper(p_string); l_count integer := p_simple_pattern.count; begin if (p_use_special_chars) then if (instr(translate(l_string, special_chars, trans_chars),'*') > 0) then return true; end if; end if; for i in 1..l_count loop if (l_string like upper(p_simple_pattern(i)) escape '\') then return true; end if; end loop; l_count := p_complex_pattern.count; for i in 1..l_count loop if (owa_pattern.match(line => l_string, pat =>
upper(p_complex_pattern(i)),flag

431 405998 327606 2011-01-30/08:00:14



Tom Kyte
February 01, 2011 - 6:22 pm UTC

that is the code block that runs your code. The buffer gets and reads are cumulative and reflect YOUR sql that is called ultimately by this block of code

Andrew, February 12, 2011 - 12:43 pm UTC

Tom,
quick question about an optimizer choice. I'm running 10.2.0.4 on Sun. I don't have system stats on. The tables in question are loaded and stats are collected. These are regular tables; nothing "fancy" or special about them.

Take 2 tables and define referential integrity between them. The only note here - the FK on the child is on its PK column.
If I issue the following statement:

select c.a from parent p, child c
where p.pk = :1 and p.pk = c.pk

in the plan it looks up the parent table PK before proceeding with access to the child.
But if I think about what the sql is doing, that seems to be a redundant operation - since all the information is present in the child to begin with.
Am I missing something?

Tom Kyte
February 14, 2011 - 7:38 am UTC

It could have removed the parent from the query altogether - if you change the predicate to be c.pk = :1, it does. It is an optimization that could be made but as of 11.2.0.2 isn't yet made.
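
(a minimal sketch of the rewrite, using the same hypothetical tables:

select c.a from child c where c.pk = :1;

the foreign key guarantees a matching parent row exists, so the join adds nothing - and with the predicate moved to the child, the optimizer skips the parent entirely.)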

query on large tables

Ravi B, February 22, 2011 - 6:08 pm UTC

Hi Tom,

We have a system where our researchers constantly enter and query data in our database as they find information throughout the day. It is a multi-user environment - OLTP, if you will.

Our architecture is such that all the application logic resides in an xml/java layer. Believe it or not, the application has "built-in" concurrency control, constraint management, cascade updates, cascade deletes, dynamic SQL generation, pagination and everything else that goes with an application. No stored procedures, packages or any procedural objects - nothing except tables, views and indexes.
Everything is meta-data driven. The database (oracle 10g) is used primarily as a bucket which holds data.
"yikes" is an understatement :(

I was brought in to address performance issues for this system.

I did some preliminary checks with explain plan/tkprof and found that the "sql generator engine" is generating some "abusive" sqls which are very difficult to fix. I fix one thing and it breaks another. It also does things like count(*) on big tables to get the total number of pages for pagination queries, etc. The application developer/DBA who worked on this before me tried various partition schemes, indexes etc. but it seems they didn't work out.

For example: we have a big table of about 7148354 rows (about 1.5 gb) which has about 37 columns. Researchers can bring up the information on the UI (big table, joining several tables) and query on any combination of these 37 columns to narrow their search, then delete/update information as appropriate. This search is always a wild card search. The SQL engine always does a LIKE operation.

Also, the query is always on "enddate is null", meaning all the rows which are active. Inactive rows are "end dated" with sysdate. I tried several combinations of concatenated indexes, function based indexes, bitmap indexes etc; nothing worked, as the optimizer always wants to do a FTS/hash join - understandably so, because that would be less expensive than an index search. The distribution on the table is something like

total rows in the table: about 7148354
rows with enddate is null: 6967034
rows with enddate is not null: 181320

I tried interMedia Text but it didn't work out because we had to rebuild the indexes each time to make the search current.

The performance is painfully slow - I mean very, very slow.

Please let me know if I can do anything else before giving it up! Please do let me know if you need more information.

Rewriting this application is ruled out in near future.

Thanks in advance!
Tom Kyte
February 23, 2011 - 8:33 pm UTC

.. Our architecture is such that all the application logic resides in xml/java
layer. ...

I predicted what you would say when I read that.

I was wrong, it was worse than I thought.

I was brought in to address performance issues for this system.


I feel bad for you (I'm making these comments as I read your post.... not as an after thought)


Please let me know if I can do anything else before giving it up!

In the words of Monty Python - "run away, run away"


Rewriting this application is ruled out in near future.


bzzzzt - wrong answer, game over player one. They lose.

There is no hope here, I don't typically use the phrasing "there is NO", I usually use "there is probably not", but there is NO applies here.


I thought so...

Ravi B, February 24, 2011 - 11:31 am UTC

Well, as an FTE I can't run away so quickly :) I would definitely have done that if I were a contractor. Anyway, I always feel good when I validate my thoughts with you. Asktom has never let me down so far. Thanks a million for your service!

performance design

sam, March 23, 2011 - 10:59 pm UTC

Tom:

I have a web application with a slow main menu page (3-5 seconds). The menu design is such that the page is about 20 KB in size, because it dynamically builds a list of vendors to view/edit data for, a list of contacts, etc. Also, all the application links are listed on the main menu.
It also has a few derived numbers that are calculated in real time (i.e. open orders), so every time a user navigates to the main menu we run the sql again.

I did a SQL trace and checked the tkprof report, and found that two SQL commands that use a dblink to look up an organization and a contact table sometimes cause the delay. It could be a network or machine issue.

The tables in the remote db, which is on the same machine, do not get updated that often. I am also looking up only a few columns in those tables.

I am thinking of copying those remote tables locally to speed things up, and refreshing the local copies nightly from the remote tables.

Do you agree? and how would you implement this.

Would you

a) create a local table with the same structure, and a procedure and job to DELETE the data and then INSERT the new data from the remote table
b) run DDL (CREATE TABLE AS SELECT * FROM table@link) in the local database to create the local table from the remote table every night
c) create a materialized view for the local table and schedule it to refresh nightly from the remote table

Can you advise?


Thanks,

Tom Kyte
March 24, 2011 - 8:14 am UTC

materialized view would be the only logical way to do it

I use materialized views on my own home page here - to show the counts at the top of the page.

performance

Sam, March 24, 2011 - 8:26 am UTC

Tom:

1) How often do you refresh your MV? Are the stats updated in real time (i.e. if someone closes an order and goes back to the main menu)?

2) What about looking up the data from the remote table? I only need a few columns (i.e. last name, first name, organization type).

Would you create a local table or an MV for this too?

If I ran DDL to create the table, I would need to create the index and update the STATS. Even if I delete the data and reinsert it, I would think I would need to update the stats so the optimizer can use the index.

Tom Kyte
March 24, 2011 - 8:46 am UTC

1) as often as I like. My period of refresh is not relevant to your problem Sam. ONLY YOU know how current your information needs be. Think about it.


2) I would never use a dblink in 'real time' myself. It would add a lot to the runtime and makes my system 100% dependent on the remote system. ALL of my remote accesses are done in the background, not in a user transaction. Right now the availability of your system is much lower than it needs be since your database has to be up AND their database has to be up. Also, their response time to your queries affects your performance and you have NO CONTROL over their response times.

And as I wrote, the only logical choice would be a materialized view. I've already answered that question


If I ran DDL to create the table, I would need to create the index and update
the STATS. Even if I delete the data and reinsert it, I would think I would need to
update the stats so the optimizer can use the index


Sam, think that entire thought process through please - think it through and work it out yourself. I know you can - just take the time to do it and don't expect everything to be spoon fed to you.

Ask yourself "if before I delete and re-insert the data, my queries used the index - would I need to update the stats to keep the optimizer using my index?"

think it through.

performance tuning

sam, March 25, 2011 - 5:53 pm UTC

Tom:

To answer your question first: "it depends".

If the source table has 100 rows and the remote table has 100 rows, then you don't need to update stats; but if your remote table has 10000 and you loaded those into the local table or MV, then yes, you would need to update them.



You know what I concluded yesterday? Using DB links in real time in an application is as dangerous as some nasty virus sleeping in a human body. You never know when it will wake up and do damage. If you think triggers are bad, DB links are worse.


DB links are very tricky. When I run the SQL in sql*plus to test, I get excellent performance times and logical/physical statistics, but when I analyzed many TKPROF reports to see what was causing all these delays, it was all related to the DB link. It is very unpredictable.

Even a simple SQL like "select fname, lname from contact@hr_link where contactid='john'" would sometimes take 1-2 seconds.

I should have listened to you before, but sometimes you don't believe a thing until you see it live.
For all remote tables I created READ-ONLY MATERIALIZED VIEWS and got rid of all the DB links, and now the application runs faster than a high speed train. What a difference that made!

My main menu sometimes took 5 seconds; now it runs in 0.05 seconds.

The people who originally architected this thing were creating a database/instance per application. And now we have applications and links all over the place, creating all kinds of choke points.

1) Why do you think DB links are sometimes slow? Is it a networking bottleneck, or connection overhead, or the other instance being tied up? I also noticed it always PARSES the SQL with every call when using a link.



2) This is what I did.

On the remote (master) database, where the base table lives:

CREATE MATERIALIZED VIEW LOG ON <tab>;

On the local database:

CREATE MATERIALIZED VIEW <mv_name>
BUILD IMMEDIATE
REFRESH FAST START WITH SYSDATE NEXT TRUNC(SYSDATE) + 1 + (1/24)
AS SELECT * FROM <tab>@<dblink>;

That should build a read-only materialized view that is refreshed daily at 01:00 AM.


This solution works well for all REMOTE SELECTS/READS. Do you see anything I did wrong?



3) It is not as easy to get rid of the REMOTE WRITES that use a link and are done in real time too.
How do you usually handle those? Do you write locally to a table or MV and push the records later somehow?
Can you explain how best to handle UPDATES/DELETES/INSERTS that use a db link?

Would you create a local table and save the data there, and have an hourly or nightly job that checks the table, copies the data to the remote database, and then deletes it locally?


I think the best thing for the long term is to consolidate the databases into one and create different schemas, but that requires more work, management approvals and funding.







Tom Kyte
March 29, 2011 - 2:34 am UTC

1) Why do you think DB links are sometimes slow? Is it a networking bottleneck,
or connection overhead, or the other instance being tied up? I also noticed
it always PARSES the SQL with every call when using a link.


we only parse when YOU MAKE US parse. You - you and you alone - control the number of times a statement is parsed. Your statement that 'it always parses' is false - it does not.


ops$tkyte%ORA11GR2> @mystat 'parse count (total)'
ops$tkyte%ORA11GR2> set echo off

NAME                        VALUE
---------------------- ----------
parse count (total)            85

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> declare
  2          cursor c is select * from dual@loopback@ora11gr2;
  3          l_rec c%rowtype;
  4  begin
  5          for i in 1 .. 100
  6          loop
  7                  open c;
  8                  fetch c into l_rec;
  9                  close c;
 10          end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> @mystat2
ops$tkyte%ORA11GR2> set echo off

NAME                        VALUE DIFF
---------------------- ---------- ------------------
parse count (total)            92                7


if it did, parse count would be over 100....


Now, why are they sometimes slow...

o when you first touch them, you have to open a connection and session to the remote database - that is a lot of work

o you added latency, network round trips

o you are relying on the remote database to perform to your expectations and most times you cannot even really 'see' what the load is

o you've added a ton of extra work - which if you don't have the capacity right now to perform, you will wait.



2&3) I avoid creating more than one database if they actually need to share data, else if I HAVE to, I use an MV to replicate (ONE WAY, no update anywhere crud) just what I need


I think the best thing for the long term is to consolidate the databases into
one and create different schemas, but that requires more work, management
approvals and funding.


I would say "but that requires less administrative overhead and work ultimately, less cost, more benefit, and the approval of an intelligent management chain"

performance

sam, March 30, 2011 - 9:44 pm UTC

Tom:

I guess I was wrong in thinking that if the remote database runs on the same machine, you would not have many network or connection issues.


But you did not explain to me how I would handle WRITES with MVs.

For example, I have a situation now where the local user creates new VENDORS and CONTACTS before he can place an order.

The VENDORS and CONTACTS tables are in a remote database, so I INSERT the data using a db link.

The issue here is that the READ ONLY MVs I created for the VENDORS and CONTACTS tables in the local database will not have this new vendor until the MV is refreshed at night. The user may need to select this vendor on the order form.

How do you handle this issue? Would you

a) schedule a JOB to fast refresh the MV in the transaction that inserts the record into the remote DB
b) make the local MV writeable and insert the record in both the remote and local databases
c) save the vendor and contact data locally into the local MV and push it to the remote DB via a job

What do you advise?
Tom Kyte
April 12, 2011 - 9:55 am UTC

I would only do D)

d) you obviously want a single database, it is so painfully obvious. There is no way this should be more than a single database.



I seriously have no other comment. Making a user wait for distributed stuff, making your application dependent on TWO databases being simultaneously available - something I cannot even imagine doing in real life.

(but since you already have the database link open, and since you already know where the data is, why not just select it from the remote database...)

Index on foreign keys

Rakesh, August 01, 2011 - 1:29 am UTC

Hi Tom,

My database is experiencing a lot of enq: TM contention. PPls say that we need to create an index on the foreign key on the child tables. Why is it required? What is the data flow between parent and child tables when DMLs are fired?

It would be a great help,if u can explain me with an example
Tom Kyte
August 01, 2011 - 11:56 am UTC

ppls? no clue what that is.
u? I don't know who U is, why would you come here to ask someone named U to explain something?

see http://www.oracle.com/technetwork/issue-archive/2010/10-nov/o60asktom-176254.html and read the last part of that article for a discussion on the locking behavior of unindexed foreign keys.
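
(a rough sketch of spotting candidates - simplified, it only checks that each FK's leading column is the leading column of some index on the child table:

select c.table_name, c.constraint_name
  from user_constraints c
 where c.constraint_type = 'R'
   and not exists
       ( select null
           from user_cons_columns cc, user_ind_columns ic
          where cc.constraint_name = c.constraint_name
            and cc.position = 1
            and ic.table_name = cc.table_name
            and ic.column_name = cc.column_name
            and ic.column_position = 1 );

a hit means DML on the parent can take the blocking child-table locks that article describes.)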

performance issue after database moved to high end server

sikki, January 10, 2012 - 12:34 am UTC

Tom,

The batch process which used to run in 9 hours is now taking 14 hours to complete on the new server, which has a higher configuration in terms of CPU and memory. This is affecting the business, as we are unable to deliver the report.

Could you please tell me why it takes more time on the new server, which is a higher capacity, high end server? Ideally it should at least give the same performance as the old server, but it gives poorer performance.

The things I tried:

I looked at the AWR report and found a few problematic queries; if I use a specific index hint in the few queries which have the problem, the total job run comes close to 12 hours.

I gathered stats with auto_sample_size across the database.

If I disable parallelism on a few tables, the query runs better - why is this on the new server, with better resource availability?

and many more...

My questions are:

1. If the sql's have to be tuned, why do the same ones run well on the old server, even though I copied the stats from old to new?
2. The new machine has double the capacity of the old server in terms of CPU and memory - why does the sql get a poor plan?
3. If I gather stats with an estimate of 100% with dbms_stats.gather_database_stats, it should run well, but the job never completes even after 15 hours, and I had to copy the stats from the old to the new server to bring it close to 14 hours. Why is this the case?
4. Do you suspect any other problem?
5. Do I really need to do SQL tuning of all the problematic queries to fix this? Again, doubt arises since the same queries run well if I point the connect string back to the old server.

Your expert advice is really appreciated.

performance tuning using partitioning

A Reader, June 29, 2012 - 10:27 am UTC

Tom,
I just need your advice on partitioning as a way of improving things.
I read your book's chapter covering the partitioning bits a few years back.

We have a real example/task on partitioning in hand and just wanted to share it with you for your expert opinion. ;)


facts :
1. table T is range partitioned on column (c1 number)
2. the c1 column is not unique.
3. the composite (c1, c2, c3) forms a unique combination.
4. partition p1 has the range up to 1000000000, p2 up to 2000000000, ..., partition p10 up to 10000000000, and the rest is maxvalue. --- I didn't change 1000000000 and the like to 1,000,000,000 to make it more readable, as in one of your posts you mentioned 1,000,000,000 is not the number; sorry about that :)
5. we have seen that all the data is going to the p1 partition - as of today, max(c1) is far below the ceiling figure of 1000000000, and the table has been in use for the last 2 years or so.
6. suddenly one day it is found that the partitioning is not proper (we see pruning would not be happening).
7. we are in the process of assessing whether the partitioning is proper or not - because someone did the partitioning a few years ago.
8. we are at 10gR2 (10.2.0.4)



Next immediate step:
a) Decided to split partition the p1 ( making it smaller)leaving the other p2... p10 as it is - thus inviting the split partiton again again :( ...- also the million dollar question would be - how to choose the magic figure of splitted parition size..

My approach :
a) Understand the question, whether partitioning is really required or not ? ( this would lead to first looking into what we want to achive from partitioning...
I see in our case it is only the Performance ( one of the corner of the tringle you mentioned in your book) to be achived.
b) If the queries happens to have the predicate on columns c1 , then choose c1 as the candidate column for partitioning else chose other column/s.
c) re-visit the paritioning scheme , may be hash(c1) would be more appropriate here...as Oracle would do equal distribution of the data across the paritions
d) Keep the parition across the different tablespaces.. ( availablity corner of the triangle), But this is really rarely required in 21st century databases .. we have redudndancy, robustness built at every level ( disks spindles/ controller/ IO borads etc) so things usually wont fail...it is something like all will fail ( disaster ) or nothing would fail :)
e) I would see if Administration ( last corner of the triangle ) is required to achived in this case.. may be if someone getting rid of inactive data frequently.. or someone doing exchange paritition etc using this table..


Question/s
q1) your comments on the give facts avove, next immediate step and my approach of analysing the same?
q2) how does hash paritioning internally works? .. some pseduo code would give more insight.. ( i have read many psuedo code from you , which makes the things crystal clear.. the one which always runs of top of my mind is pseudo of 'rollback' which starts with commit statement ;) )
q3) Do the inserts would get benifitted from partitioning ( may be parallel inserts .. ?) , I feel in 20th Century this might have been, but now a days it hardly matters.. as LMT, ASSMS are there today.
q4) If performance gain is the only aim...then most of the queries which involves the c1 column , would defintely get benifitted from paritioning pruning c1 column ( in case Full table scan .. it would be small table now to full table scan .. and in case of index range scan .. it would be local index range scan..).?
q5) querris which involves predicate other then the partition key column (c1) , we can safely create the global indexes and for those querries it would be as if table is not paritioned.. ?
q6) How to estimate the time taken for - changing the existing paritioning scheme or creating partitons from the existing partition and say this would take some <n> hrs.. I would do benchmark to reach on such figures .... ? what do you say ...I read some article " Partition 4TB tbale in 20 mins " - is there something we can exploit within Oracle to reduce th outage during partioning implementation or we have to go for DIY ( customization approach ) to really make the outage to minimal during partioning implementation.?
q7) I see paritioning implementation requires good understaning of performance tuning? which in turn requires basic understanding on everything how Oracle works ;)

Many thanks for your time and happy weekend ahead.


Tom Kyte
July 02, 2012 - 6:41 am UTC

What kind of queries do you pose against this data - what are the predicates, in general?

How many rows, on average, would an individual query need to retrieve from this table in order to satisfy those queries?


q1) first step would be to see if partitioning could help you achieve your goal of "better performance". Unless you are using full scans a lot - the answer is likely "no, not really, be careful when partitioning to not make it slower"

q2) hash partitioning is easy - you have a table in N partitions. It is hash partitioned on a key X. When you insert - we take the value of X and hash it into a number between 1 and N. That tells us what partition to put it in. When you say "where x = ?", we take the ? value at runtime, hash it to that same number 1..N and go to that partition to retrieve it.
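
For illustration only - a minimal sketch of the DDL (the table and column names are made up, and ora_hash is used just to mimic the bucket-picking conceptually):

create table t
( c1 number, c2 number, c3 number )
partition by hash ( c1 ) partitions 8;   -- a power of two gives the most even spread

-- conceptually, "where c1 = :x" picks the one partition to visit, much like:
select ora_hash( 12345, 7 ) + 1 as bucket from dual;   -- a number between 1 and 8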

q3) inserts usually run slower when you have partitioning - think about it - you have to do extra work (you have to figure out where they go and then put them there, as opposed to "just put them in the table"). There are rare cases where partitioning a table that undergoes heavy concurrent modifications will improve insertion rates since they reduce contention (on hot right hand side indexes populated by sequences for example)

q4) only if you are doing full scans would it be hugely beneficial. A local partition index is still an index, and indexes take work to navigate. It might be true that a locally partitioned index is smaller than a non-partitioned index on the same table, but the difference is usually negligible.

q5) probably

q6) you can reorganize this table entirely online without an outage using dbms_redefinition.

otherwise, you'd have to understand what your *sustained* IO rates are - you'd be doing direct path reads and writes - how fast can you read that data and write it again? Can you use parallelism? (do you have the resources to exploit it? remember, if ONE process is using 100% of your IO bandwidth, having two processes trying to do that at the same time will run a lot slower than the one did by itself - parallel <> fast=true unless you have the resources to give it)
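
For the online route, a rough sketch with dbms_redefinition - the object names are hypothetical, and the copying of indexes/constraints/grants and all error handling are omitted for brevity:

-- interim table in the desired new shape, e.g. hash partitioned
create table t_interim
( c1 number, c2 number, c3 number )
partition by hash ( c1 ) partitions 8;

begin
   dbms_redefinition.can_redef_table( user, 'T' );
   dbms_redefinition.start_redef_table( user, 'T', 'T_INTERIM' );
   -- copy dependent objects with dbms_redefinition.copy_table_dependents here
   dbms_redefinition.finish_redef_table( user, 'T', 'T_INTERIM' );
end;
/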

q7) not really, it requires a sense of the physics behind it all. You can see how to use partitioning as a performance tool without knowing all of the nuances behind tuning in general.


GC Buffer Busy

Reader, July 04, 2012 - 6:49 am UTC

Hi Tom,

I came across a blog which teaches how to use dba_hist_active_sess_history to pinpoint a RAC DB issue.

After his analysis, the author attributes the problem to the fact that segment space management for the tablespace was set to MANUAL.

My question to you is: Oracle recommends that segment space management be set to AUTO all the time. From your experience, can you describe to me any scenario where we would need segment space management set to MANUAL?



Tom Kyte
July 05, 2012 - 7:13 am UTC

not in the year 2011, no.

Index contention

Anand, July 04, 2012 - 11:40 pm UTC

Hi Tom,

We have a big table in production which is used extensively online, and in batch we insert many rows into it as well. The problem is that during batch processing we see a lot of index contention (because of the parallel processing in the batch), but we need the index online, otherwise it will be very slow. So every day before the batch we drop this index and in the morning we re-create it. Is there any solution to handle this kind of case?
Tom Kyte
July 05, 2012 - 8:39 am UTC

what is wrong with what you are doing?

Instead of doing slow-by-slow processing in multiple threads you control (doesn't perform well, doesn't scale well), you could try using big, bulk, pure SQL statements run in parallel...

It depends on what this index is on and what you are doing with it. Describe to me the contents of this column and how you use it in where clauses in general.
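
As a rough sketch of the "big, bulk SQL" idea - the table names are made up, and this assumes the batch work can be expressed as a single set-based statement:

-- one parallel, set-based statement instead of many app-managed threads
alter session enable parallel dml;

insert /*+ append parallel( t, 8 ) */ into big_table t
select *
  from staging_table;

commit;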

performance issue

A Reader, July 06, 2012 - 11:27 am UTC

Tom,
we have the following query:

MERGE /*+ PARALLEL(b,2) NOLOGGING */
INTO t_load b
USING
( SELECT a.SERVICE_KEY SKEY, a.ACCOUNT_KEY AKEY,
         a.PERIOD_KEY PKEY, a.USAGE_TYPE_KEY UKEY,
         COUNT(*) CNT
    FROM t1 a
   WHERE a.Key = :B1
   GROUP BY a.SERVICE_KEY, a.ACCOUNT_KEY,
            a.PERIOD_KEY, a.USAGE_TYPE_KEY
) A
ON ( b.SERVICE_KEY = A.SKEY AND b.ACCOUNT_KEY = A.AKEY
     AND b.PERIOD_KEY = A.PKEY AND b.USAGE_TYPE_KEY = A.UKEY
     AND b.Key = :B1 )
WHEN MATCHED THEN
UPDATE SET b.TOTAL_CALL_CNT = A.CNT

and the plan is (sorry Tom, I tried to format the plan below to be more readable, but did not succeed):

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name                        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT                                 |                             |       |       |       |   637 (100)|          |       |       |        |      |          |
|   1 |  PX COORDINATOR                                 |                             |       |       |       |            |          |       |       |        |      |          |
|   2 |   PX SEND QC (RANDOM)                           | :TQ10002                    |   585 | 54405 |       |   637   (1)| 00:00:09 |       |       |  Q1,02 | P->S | QC (RAND)  |
|   3 |    INDEX MAINTENANCE                            | t_load           |       |       |       |            |          |       |       |  Q1,02 | PCWP |          |
|   4 |     PX RECEIVE                                  |                             |   585 | 54405 |       |   637   (1)| 00:00:09 |       |       |  Q1,02 | PCWP |          |
|   5 |      PX SEND RANGE                              | :TQ10001                    |   585 | 54405 |       |   637   (1)| 00:00:09 |       |       |  Q1,01 | P->P | RANGE    |
|   6 |       MERGE                                     | t_load                      |       |       |       |            |          |       |       |  Q1,01 | PCWP |          |
|   7 |        BUFFER SORT                              |                             |       |       |       |            |          |       |       |  Q1,01 | PCWC |          |
|   8 |         PX RECEIVE                              |                             |   585 | 54405 |       |   637   (1)| 00:00:09 |       |       |  Q1,01 | PCWP |          |
|   9 |          PX SEND HYBRID (ROWID PKEY)            | :TQ10000                    |   585 | 54405 |       |   637   (1)| 00:00:09 |       |       |        | S->P | HYBRID (ROW|
|  10 |           VIEW                                  |                             |       |       |       |            |          |       |       |        |      |          |
|  11 |            SORT GROUP BY                        |                             |   585 | 54405 | 82000 |   637   (1)| 00:00:09 |       |       |        |      |          |
|  12 |             PARTITION RANGE ALL                 |                             |   585 | 54405 |       |   627   (1)| 00:00:09 |     1 |    12 |        |      |          |
|* 13 |              HASH JOIN                          |                             |   585 | 54405 |       |   627   (1)| 00:00:09 |       |       |        |      |          |
|  14 |               PARTITION LIST SINGLE             |                             |   585 | 40950 |       |    36   (0)| 00:00:01 |       |       |        |      |          |
|  15 |                TABLE ACCESS BY LOCAL INDEX ROWID| t_load                      |   585 | 40950 |       |    36   (0)| 00:00:01 |   KEY |   KEY |        |      |          |
|  16 |                 BITMAP CONVERSION TO ROWIDS     |                             |       |       |       |            |          |       |       |        |      |          |
|* 17 |                  BITMAP INDEX SINGLE VALUE      | bitmap_ndx1                 |       |       |       |            |          |   KEY |   KEY |        |      |          |
|  18 |               PARTITION LIST SINGLE             |                             | 10475 |   235K|       |   591   (1)| 00:00:09 |       |       |        |      |          |
|  19 |                TABLE ACCESS BY LOCAL INDEX ROWID| t1                          | 10475 |   235K|       |   591   (1)| 00:00:09 |   KEY |   KEY |        |      |          |
|  20 |                 BITMAP CONVERSION TO ROWIDS     |                             |       |       |       |            |          |       |       |        |      |          |
|* 21 |                  BITMAP INDEX SINGLE VALUE      | bitmap_ndx                  |       |       |       |            |          |   KEY |   KEY |        |      |          |

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  13 - access("b"."PERIOD_KEY"="a"."PERIOD_KEY" AND "b"."SERVICE_KEY"="a"."SERVICE_KEY" AND "b"."ACCOUNT_KEY"="a"."ACCOUNT_KEY" AND
              "b"."USAGE_TYPE_KEY"="a"."USAGE_TYPE_KEY")
  17 - access("b"."Key"=:B1)
  21 - access("a"."Key"=:B1)


facts:
a) during the long-running operation we saw the hash join doing a lot of work:

  ID   SID SERIAL# USERNAME START_TI SQL_ID        MESSAGE                                   Minutes
---- ----- ------- -------- -------- ------------- ---------------------------------------- --------
   2  2054     753 OLAP     06.07.12 apvkedjeg2tbb Hash Join:  : 2778396 out of 17174512      187.57
                                                   Blocks done

17174512 blocks = 17174512 * 16 * 1024 bytes (16k block size) = 262 GB!
That is far too much - t_load is only 28 GB and t1 just 2 GB.

b) we saw most of the work being done here, and hence most of the wait time.

c) there is a spill to disk here.
Currently this session has workarea_size_policy set to manual, but sort_area_size is just 64k and no hash_area_size is set; we will bump these two *_area parameters to 2GB.

d) we see the merge taking hours to complete.

e) we are on 11.2.0


Question
a) Why is the hash join doing this huge amount of work here? A bug?
b) We see "INDEX MAINTENANCE" in the plan, but we are not updating the indexed columns that are used in the plan. Would Oracle be waiting here?
c) Would the update actually happen only when the first rows flow out of the hash join?
d) Any other suggestions?


Tom Kyte
July 11, 2012 - 10:33 am UTC

you do know that NOLOGGING is not a hint, right?

NOLOGGING is an attribute of a table; if a table is nologging, then DIRECT PATH operations against it will not be logged (as long as the database is not forcing logging!). And to do a direct path operation with a merge you would use APPEND - but it will ONLY apply to the insert component!
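
A minimal illustration of the point, with made-up tables - note that it is the APPEND hint, not NOLOGGING, that requests direct path, and only the insert branch of the merge can use it:

create table src ( id number, val number );
create table tgt ( id number, val number ) nologging;

merge /*+ append */ into tgt t
using src s
on ( t.id = s.id )
when matched then
   update set t.val = s.val                     -- conventional path, always logged
when not matched then
   insert ( id, val ) values ( s.id, s.val );   -- this branch can go direct path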

1) look at the units in v$session_longops before jumping to conclusions about what those numbers mean! You are assuming blocks - check what the UNITS column says!
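
For example, something along these lines (binding the sid/serial# of the session you are watching):

select opname, target, sofar, totalwork, units
  from v$session_longops
 where sid = :sid
   and serial# = :serial
 order by start_time;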

b) I can imagine, it is a lot of data

c) use automatic pga memory management, please...
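
That is roughly the following - the values are illustrative only:

-- instance level: give Oracle an overall PGA budget to manage...
alter system set pga_aggregate_target = 4g;

-- ...and make sure sessions use automatically sized work areas
alter session set workarea_size_policy = auto;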



a) why hash join? because it is *brutally efficient* for large data sets.

tell me, are those row values in the plan close to reality? If so, why are you using parallel??????

d) but you ARE updating indexed columns - that is what the index maintenance is about

c) correct


performance issue..contd

A reader, July 06, 2012 - 11:48 am UTC

Tom
further,

e) Would "INDEX MAINTENANCE" be happening while the hash operation is in progress? That is, as the first candidate rows flow out of the hash join they are taken up by the next operation - so would "INDEX MAINTENANCE" be adding to the hash join long-operation statistics shown above?

regards

Tom Kyte
July 11, 2012 - 10:38 am UTC

index maintenance is just us maintaining indexes as a result of an update to an indexed column or an insert into a table.

Performance tuning process

A Reader, January 28, 2013 - 1:22 am UTC

Tom,
Sorry, I asked the same question by mailing you directly some time ago.
I know that is not a scalable solution - sorry about that.
:-(


I am working on a performance analysis of a database. I have identified the problem areas; at a high level, they are below.

1. Concurrency issues
parse/executions = 1 - many parses and many executions of the same SQL (a soft parse every time). Due to this, the top 5 wait events in a 1-hour AWR report sometimes show 60% concurrency waits on latch: library cache. (A query sketch for spotting this pattern follows this list.)

2. Sweeper indexes: index height of 5 in many cases; index size > table size for an index on a single number column, where the table has 6 columns.

3. Index contention (enq) waits sometimes popping up in AWR, accounting for 10% of concurrency waits.

4. Stale statistics on all schema objects - statistics not gathered for many years. There have been around 8 billion DMLs against the tables (now 9 billion in total).

5. Improper partitioning.
Range-based partitioning on sequence-populated columns; one table has all of its data in a single partition (the 1st partition, with a very high partition range value) and others have started seeing most of their data land in the maxvalue partition (low partition range values).

6. Improper indexing scheme on partitioned tables.
All non-partition-key indexes are created as LOCAL indexes. More IO.

7. Some of the tables (say t1 and t2) have no primary key enabled. Reason: these tables are partitioned and the primary key column has a LOCAL partitioned index created on it. The index on the primary key column is not a composite index (it does not include the partition key column). This issue is leading to logical corruption of the data.
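
(For issue 1, a quick way to spot the pattern - this assumes access to v$sqlarea and is only a rough sketch:)

select sql_id, parse_calls, executions,
       round( parse_calls / greatest( executions, 1 ), 2 ) parse_to_exec_ratio
  from v$sqlarea
 where executions > 100          -- arbitrary cut-off for illustration
 order by parse_calls desc;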



I have evidence of the above issues and have recommended fixes.


Now the questions:

a) If we fix all of the above, how much % performance gain would there be?

b) Looking at the issues listed above, when would a system breakdown occur (future date/time)?

c) If we do all of the above exercises, how far can we push out the system breakdown date? I mean: if, on one hand, we do not do the above fixes, the system would break at date_1_time_1; on the other hand, if we do the fixes today, how far can we push date_1_time_1 out to another date, date_2_time_2 (where date_2_time_2 > date_1_time_1)?

d) If answers to the above questions really exist, what methodology should we use?




Tom Kyte
January 30, 2013 - 2:29 pm UTC

a) how long is a piece of string?

seriously - no one can estimate that given this information - it would take research and benchmarking *on your system* to even think about estimating this.

b,c,d) see (a)


...Performance tuning process

A Reader, January 30, 2013 - 5:08 pm UTC

thanks a lot Tom.

I am also of the same view.

clarification about a few lines in Expert One-on-One

Ranjan, September 17, 2013 - 10:17 am UTC

Hi Tom,

hope you are doing great.

In your book Expert One-on-One, on page 541, you wrote the following:
############
There wasn't an index in the world that could help us here - so we built our own. Every
night as we refreshed the data from our HR system (a complete refresh of data), we would
also issue:
CREATE TABLE FAST_EMPS
PCTFREE 0
CACHE
AS
SELECT upper(last_name)||'/'||upper(first_name)||'/' .... || '/' || ...................
.
.
.
.
.
and on page 543 you wrote the following:
######
Before settling on this design for the above application, we tried a couple of alternative
approaches. We tried using a fast full scan on a function-based index (close, but not as fast),
we tried interMedia Text (not useful due to the %ABC% requirement),
#######

First of all I want to say: very nice book.
I have a doubt regarding the above excerpt.
##############
On page 541 you wrote that there was no index in the world that could help you, and on page 543 you wrote that you tried a fast full scan on a function-based index (referring to the text on page 541).

Could you please clarify whether the function-based index concept existed while no normal index concept did, or whether page 543 is wrong in saying that you tried a fast full scan of a function-based index.

Thanks as always.


Tom Kyte
September 24, 2013 - 10:57 pm UTC

the attempt at indexing proved out that there was no index that was going to help us.

we tried them, they did not help.


think of it this way:

there was no one that could help Bob. He tried Mary, she couldn't help. He tried Sam - Sam couldn't help him either. So Bob did it himself.


Bob is the problem in this case.
Sam and Mary are indexes.

a bit disappointed by your answer, but it's ok

Ranjan., September 25, 2013 - 5:39 am UTC

Hi Tom,

You don't know how much we love you.
I may know only a fraction of what you know about Oracle (even less than that), but I always believe in you.

But here your Bob, Sam and Mary example made me a bit upset.
You wrote, "We tried using a fast full scan on a function-based index (close, but not as fast)".

How can it be close, but not as fast, if it did not exist at all?

think of it this way:

there was no one that could help Bob. He tried Mary, she couldn't help. He tried Sam - Sam couldn't help him either. So Bob did it himself.

I didn't think of it like that; I only thought of it like this:

There was no one that could help Bob. He couldn't even try Mary, because Mary doesn't exist. He couldn't try Sam, because Sam doesn't exist either.

Sorry if this hurts you.

Thanks as always.
Long live

query tuning

babu, February 19, 2014 - 7:53 am UTC

Hi tom,
below is the query I tuned from the old one, but it still takes 15 minutes. Is there any way to tune it further, or is this as good as it gets? (The tables contain rows in the lakhs, nearly millions.) Can you tell me... please please...

SELECT /*+ leading(r.run r.rsk )*/
R.Uprice,
R.Aprice,
R.Pprice,
R.exeID,
R.exeDATE,
R.SID,
R.PVprice,
R.PV,
R.PREMIUMPV,
R.BOOK,
R.Makeup,
R.MAkeupprice,
T.businessTYPE,
T.REASON,
T.businessTABLE,
T.businessID,
'20 Dec 2013' AS ARC_date,
'LDN' AS ARC_place,
'2013-12-23 04.52.56' AS ARC_DATETIME
FROM
OWNER.totalRISK R,
OWNER.TRADE T,
(SELECT Name
FROM OWNER.BOOK
WHERE BookGroupYN = 'N'
START WITH Parent IN ('FX ACE ' , 'harward ' , 'harvard_LTFX' , 'FX Utility' , 'GCS FX' , 'HRE London' , 'MGT London' , 'proper London' , 'Prop Strategy' , 'Spot ' , 'SYSTEMATIC')
CONNECT BY PRIOR Name=Parent
) b
WHERE
R.SID = T.SID
AND T.VERSION = R.VERSION
AND R.Book = b.name
AND R.Department='FX Cash Controlling'
AND R.Location = 'London'
AND R.RunType = 'EOD'
AND R.RUNDATE = '20 Dec 2013'
