Skip to Main Content
  • Questions
  • Oracle Resource Manager ('overhead')

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Robert.

Asked: April 14, 2008 - 12:58 pm UTC

Last updated: May 23, 2012 - 10:02 am UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

Hello Tom,

I am looking into using Oracle Resource Manager to limit resources by AppDev folks on the production database so as allow them access yet not hurt production performance.

1) Is Resource Manager a good tool to help achieve my goal, above?
2) As I am testing with Resource Manager functionality, I think I am seeing something disturbing, but I havn't nailed it down definitivley yet: It appears that there is a lot of CPU 'overhead' associated with Oracle's 'administration' of a Resource Plan. For example I have 2 groups APPLICATIONS and APPDEV. I have APPLICATIONS getting 100% of CPU at 'level 1' and APPDEV getting 100% CPU at 'level 2'.

We have 4 CPUs on the server.

Here is what I think I am seeing:
When I kick off 4 CPU intensive APPLICATIONS user jobs simultaniously with 4 CPU intensive APPDEV user jobs, my Linux 'top' output shows the 4 APPLICATIONS user jobs getting most of the CPU which is what I would expect...HOWEVER... the 4 APPDEV user jobs getting a NON-insignificant amount of CPU... looking 'inside' the database they do not appear to be getting much CPU (the resource plan is 'working')... but yet the server appears to be spending a good bit of CPU managing these low-cpu processes.
In other words using the resource plan is backfiring... the 'cure' seems to be having its own undesirable side effect!?

Can you please shed some light on this?

Thanks,

Robert.

and Tom said...

1) You would realistically allow *development* accounts access to *production*. That is not a truly good idea. If they say "we need to in order to debug", then I would say "they wrote the worst code ever" (how often does an OS kernel developer or a database kernel developer appear at your site asking to debug your live system - they don't, they can debug from afar as they have actually instrumented their code).

The resource manager can make it so that when the machine is nearing 100% utilization - categories of users can be given a prioritized slice of the machine. That is, if the machine is idle - your development folks will be able to get 100% of it - it'll only be when they try to get 100% of it AND there are application users on that the resource manager will kick in.

But - you really should be looking at

a) getting the code instrumented so this evil practice can stop
b) at the very least, using a backup restored to test to - well - test with


2) when you are near 100% utilization - everything is going to be bogged down. The implementation of a resource plan is going to take some resources itself (of course). You do not quantify at all what "non-insignificant" is - so I will not address that (everyone has a different definition for that...). You did not describe how you measured things - neither in or out of the database - so it is not really possible to comment.

Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Have Used Resource Manager before-good results

Jim, April 16, 2008 - 10:44 am UTC

I have used Resource manager before (in 9i and 10G) and have had excellent results. Most of the time no one notices it.

I have used it to fence users from low priority batch jobs. Most of the time the batch jobs run in the normal amount of time. If the users are busy then the batch jobs can take about 2X as long, but the users experience excellent response.

I think this is one of those valuable Oracle features that is often overlooked.

More information to my original question...

Robert, April 16, 2008 - 12:29 pm UTC

Tom,

Both your points are well taken.

1) Usually the main reason they need access to production is because of the freshness of the data.
(It is rarely for debugging of their in-house (vs. 3rd party) applications). 
Sometimes we copy/export a production schema/application into test environment so they can look at it there.

2) Ok. Here is more light on meaning of "not-insigificant"...
Linux RHEL3 server, 4 CPUs.
10.2.0.4 database
The server/database is more or less 'quiet' except for my testing, 
and I have run enough tests that I will recognize any skews in my results/observations.
Basically I have 2 primary resource groups in my active plan: HIGH_PRIORITY ('applications') and LOW_PRIORITY ('appdev')
I am using a 'simple' resource plan, where HIGH PRIORITY gets 100% CPU, then LOW PRIORITY gets 100% CPU
To be precise:
Level 1 - SYS GROUP - 100%
Level 2 - HI PRIORITY ('applications') GROUP - 100%
Level 3 - LO PRIORITY ('appdev') GROUP - 100%
Level 4 - OTHER - 100%

Only real activity on the server/database during my testing are my HI PRIORITY and LO PRIORITY test sessions.

Here is the query I am running to test with:

select /*+ full(a) full(b) */ count(*)
  from rg_load_test a
     , rg_load_test b
/

(rg_load_test = copy of dba_objects (47,473 rows))

I am using 'top', OEM Grid Control, extended sql tracing, and timing inside my scripts to monitor/record my tests

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HERE ARE SIMPLE WALL CLOCK TIMES, WHICH ARE SHOCKING!
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Average time for 4 HI PRIORITY jobs without any low priority jobs    : ~1500 seconds
Average time for 4 HI PRIORITY jobs WITH 2 low priority jobs running : ~2100 seconds

Resource Plan says my HI priority jobs were supposed to get 100% of CPU before any low priority jobs get CPU
When in fact the presence of other jobs caused my hi priority jobs to run 40% slower!?

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HERE ARE 'top' SNAPSHOTS WHICH SHED SOME LIGHT ON THINGS
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

=============================================================
Here is a 'top' snapshot which shows very typical numbers for these tests
There are 4 HIGH_PRIORITY sessions running
      and 0 (none) LOW_PRIORITY sessions running
NOTE the 24+% CPU utilization for the 4 oracle processes below (almost 100% total CPU usage)
This is exactly how I would expect these to run
=============================================================
 08:45:02  up 37 days, 18:14,  3 users,  load average: 4.20, 4.02, 2.86
347 processes: 340 sleeping, 5 running, 1 zombie, 1 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   96.8%    0.0%    2.9%   0.2%     0.0%    0.0%    0.0%
           cpu00   96.6%    0.0%    2.5%   0.7%     0.0%    0.0%    0.0%
           cpu01   97.4%    0.0%    2.3%   0.1%     0.0%    0.0%    0.0%
           cpu02   96.4%    0.0%    3.5%   0.0%     0.0%    0.0%    0.0%
           cpu03   96.8%    0.0%    3.1%   0.0%     0.0%    0.0%    0.0%
Mem:  6159420k av, 6126444k used,   32976k free,       0k shrd,  146860k buff
                   3311992k actv, 1910420k in_d,  136616k in_c
Swap: 2763172k av,  870300k used, 1892872k free                 5458076k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 3065 oracle    25   0 27256  26M 24976 R    24.9  0.4  17:12   3 oracle <- HI PRIORITY RG
 3066 oracle    25   0 27276  26M 25000 R    24.8  0.4  17:15   2 oracle <- HI PRIORITY RG
 3067 oracle    25   0 27240  26M 24960 R    24.8  0.4  17:13   0 oracle <- HI PRIORITY RG
 3068 oracle    25   0 27316  26M 25040 R    24.7  0.4  17:14   1 oracle <- HI PRIORITY RG
 2833 oracle    16   0  1364 1364   884 R     0.1  0.0   0:08   2 top


=============================================================
Here is a 'top' snapshot which shows very typical numbers for these tests
There are 4 HIGH_PRIORITY sessions running
      and 2 LOW_PRIORITY sessions running

NOTE how much less CPU the HI PRIORITY processes are getting than they were, above

The resource plan is THROTTLING my HI PRIORITY JOBS!!!

=============================================================
 09:35:36  up 37 days, 19:05,  4 users,  load average: 4.12, 1.44, 0.72
360 processes: 352 sleeping, 6 running, 1 zombie, 1 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   94.4%    0.0%    5.0%   0.1%     0.0%    0.0%    0.2%
           cpu00   94.8%    0.0%    4.7%   0.0%     0.0%    0.0%    0.3%
           cpu01   94.2%    0.0%    5.1%   0.0%     0.0%    0.1%    0.3%
           cpu02   93.8%    0.0%    5.5%   0.5%     0.0%    0.0%    0.0%
           cpu03   94.8%    0.0%    4.7%   0.0%     0.0%    0.0%    0.3%
Mem:  6159420k av, 6139856k used,   19564k free,       0k shrd,  160996k buff
                   3351476k actv, 1885296k in_d,  147144k in_c
Swap: 2763172k av,  869388k used, 1893784k free                 5431560k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
21850 oracle    25   0 27260  26M 24984 R    21.8  0.4   1:13   3 oracle <- HI PRIORITY RG
21853 oracle    25   0 27304  26M 25024 R    21.4  0.4   1:21   0 oracle <- HI PRIORITY RG
21848 oracle    25   0 27336  26M 25060 R    19.1  0.4   1:16   1 oracle <- HI PRIORITY RG
21852 oracle    25   0 27288  26M 25008 R    17.7  0.4   1:17   2 oracle <- HI PRIORITY RG
21847 oracle    25   0 27336  26M 25060 R     9.4  0.4   0:31   1 oracle <- LO PRIORITY RG
21844 oracle    18   0 27276  26M 25000 S     8.4  0.4   0:36   0 oracle <- LO PRIORITY RG
 9844 oracle    16   0 30244  28M 26960 S     0.6  0.4   0:12   2 oracle
 2833 oracle    15   0  1368 1368   884 R     0.2  0.0   0:27   1 top


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
REPRESENTATIVE TKPROF REPORTS 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
==========================================================================================
==========================================================================================
Here is tkprof report for 1 of 4 hi priority sessions with no low priority sessions running.
==========================================================================================
==========================================================================================

select /*+ full(a) full(b) */ count(*)
  from rg_load_test a
     , rg_load_test b

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2   1477.79    1470.92          0       1314          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4   1477.79    1470.92          0       1314          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 146

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1314 pr=0 pw=0 time=1470923911 us)
2253116089   MERGE JOIN CARTESIAN (cr=1314 pr=0 pw=0 time=6759651845 us)
  47467    TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=190012 us)
2253116089    BUFFER SORT (cr=657 pr=0 pw=0 time=2253572357 us)
  47467     TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=1234153 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3        0.00          0.00
********************************************************************************


==========================================================================================
==========================================================================================
Here is tkprof report for 1 of 4 hi priority sessions with 2 low priority sessions running.

~15% more CPU than above
~40% more TIME than above !!!

==========================================================================================
==========================================================================================

select /*+ full(a) full(b) */ count(*)
  from rg_load_test a
     , rg_load_test b

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2   1613.49    2134.03          0       1314          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4   1613.49    2134.03          0       1314          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 146

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1314 pr=0 pw=0 time=2134035600 us)
2253116089   MERGE JOIN CARTESIAN (cr=1314 pr=0 pw=0 time=9012505686 us)
  47467    TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=189987 us)
2253116089    BUFFER SORT (cr=657 pr=0 pw=0 time=4506391307 us)
  47467     TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=47502 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3        0.00          0.00
  resmgr:cpu quantum                           4147        0.28         16.50
  latch free                                      5        0.12          0.22
********************************************************************************


Tom, so would you agree that resource manager is not doing a very good job of administrating the CPU resources for my jobs according to my active resource plan?
I would expect some little bit of overhead to manage this, but this seems very excessive.

I thought resource mananager was supposed to make sure my high priority resource groups got all the cpu/resources they needed? What is going on here?

Please explain.

Thanks!

Robert.

simple 90 10 split

Jim, April 16, 2008 - 12:51 pm UTC

I used a 90% 10 split. 90 % for the class for high priority and 10 % for the rest.

To Jim and Tom

Robert, April 16, 2008 - 2:38 pm UTC

Jim,

But I want my low priority users to get 100% of cpu when no high priority users are needing cpu... and I want my high priority users to get 100% of cpu whenever possible.

Tom, why is my resource plan not working as expected?

Thanks,

Robert.
Tom Kyte
April 16, 2008 - 4:38 pm UTC

as stated above by others - you have a misconception as to how this works.

resource management basically does nothing until a resource is totally maxed out (you cannot put cpu in the bank and use it later). So even if you say "100% to group A", unless group A actually needs to consume 100%.

resource management is not a "fence", but rather an attempt to assure you that if group A makes a demand for X%, it will be given to them.

Jim, April 16, 2008 - 3:15 pm UTC

Robert,
I think there is a disconnect in how the Resource Manager works. If the machine isn't maxed out then there isn't any throttling of resources. So if a set of low priority users is using 10% of the CPU and a set of high priority users is using 50% of the CPU (both in aggregate) then neither one will be throttled back and giving 100% to high priority users won't get them their results any faster.

If you want high priority users to have 100% then give them 100% and the low priority users 0%. (that might cause thrashing because the CPU usage isn't a constant value and that may be what you are seeing.)

Try it in a test environment.

I think that is what I'm doing...

Robert, April 16, 2008 - 4:45 pm UTC

Jim,

Here is what my plan looks like

Resource Group    Level 1  Level 2  Level 3  Level 4
--------------    -------  -------  -------  -------
SYS GROUP             100        0        0        0
HIGH GROUP              0      100        0        0
LOW GROUP               0        0      100        0
OTHER GROUP             0        0        0      100

Also, all my tests are with the server CPU maxed out with the only/main processes running being my test processes.

My understanding is that when CPU is maxed out only processes running are from HIGH or LOW group, that HIGH group processes should get all the CPU they can handle and then if any is left over the LOW group will get the rest of it, all things being equal.

What am I missing?

Thanks,

Robert.


But CPU is maxed out.....

Robert, April 16, 2008 - 4:53 pm UTC

Tom,

That is the point of my tests...
...the CPU IS MAXED OUT in each of the tests.

HIGH group running querys which maxes out all CPU.
Then a LOW group process is kicked off (while CPU is still maxed out from HIGH group processes).

What SHOULD occur is that HIGH group processes will not be affected... SHOULD still run at 100% CPU
What DOES occur is that HIGH group processes are throttled down considerably (see my second post).

I don't think this is rocket surgery(?)

Please explain.

Thanks,

Robert.
Tom Kyte
April 16, 2008 - 9:33 pm UTC

I cannot really read the stuff from above, it is sort of "not pre tagged". Can you format (even when I cut and paste, cannot read it as html wiped out the whitespace altogether)

CPUs maxed out

Bill, April 17, 2008 - 8:25 am UTC

You say that your CPUs are maxed out processing your high priority queries, but why? Surely there is I/O going on there as well (and other processing / waits etc). Why do you think your jobs should be 100% CPU bound?
Tom Kyte
April 17, 2008 - 9:16 am UTC

select /*+ full(a) full(b) */ count(*)
from rg_load_test a
  , rg_load_test b

call  count    cpu  elapsed    disk    query  current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse    1    0.00    0.00      0      0      0      0
Execute    1    0.00    0.00      0      0      0      0
Fetch    2  1477.79  1470.92      0    1314      0      1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    4  1477.79  1470.92      0    1314      0      1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 146

Rows  Row Source Operation
------- ---------------------------------------------------
    1 SORT AGGREGATE (cr=1314 pr=0 pw=0 time=1470923911 us)
2253116089  MERGE JOIN CARTESIAN (cr=1314 pr=0 pw=0 time=6759651845 us)
47467  TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=190012 us)
2253116089  BUFFER SORT (cr=657 pr=0 pw=0 time=2253572357 us)
47467  TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=1234153 us) 



he did a large in memory cartesian join.


but frankly - when machine is 100% utilized and there are other things going on (the 4 low priority sessions will get some runtime, that is just going to happen) - the 100% of cpu to this group isn't 'reasonable', everyone gets a bit of time at some point....


maybe a better approach (sort of GETTING THAT GROUP OFF OF THE MACHINE) would be to limit the maximum number of active sessions in the lower priority group - that is an easier thing to implement and manage and would basically restrict the maximum load they would be able to impose.

Thanks Tom and everyone

Robert, April 17, 2008 - 10:56 am UTC

Tom,

1) I'm not sure how to format it to keep white space... I created my large post as a text document and then copied/pasted using the "code" tags (?)... When I view it on your web site I am seeing it nicely formated in fixed font with all whitespace (?).

2) But thanks for your help. Your suggestion about getting them off the database is well taken... and I have been playing around with limiting the number of active sessions for the 'low' group.

But you know, I'm wondering if my *extreme* tests are realistic in this case. This is basically a transactional system with an occasional report type query. Thanks to the give and take from this posting I am going to consider testing with more sessions using LESS cpu each...not maxing out CPU... Then, maybe resource manager won't even kick in(?)... and I will compare times of 'HIGH PRIORITY' sessions with various of these 'low priority' loads.

Your feedback would be most appreciated...

Thanks!

Robert.
Tom Kyte
April 17, 2008 - 11:53 am UTC

consider the max concurrent users rule as well - it seems to be something you would benefit from.

"max concurrent users rule"?

Robert, April 17, 2008 - 1:42 pm UTC

Tom,

Not finding any resource plan directive, profile setting, or anything else which is like "max concurrent users" rule (?)
Is this 11g?

Thanks,

Robert
Tom Kyte
April 17, 2008 - 4:31 pm UTC

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14231/dbrm.htm#sthref3237

.... Create an active session pool. This pool consists of a specified maximum number of user sessions allowed to be concurrently active within a group of users. Additional sessions beyond the maximum are queued for execution, but you can specify a timeout period, after which queued jobs will terminate. ....


DBA_RSRC_PLAN_DIRECTIVES view columns

aliyar, January 18, 2012 - 2:55 pm UTC

Dear Tom,

appreciate your help for DBA world.

can you please explain me the following column's description and how to understand

Database : 11gR2. this is a 8 node rac. 
o/s : Linux server

SQL> select PLAN , MGMT_P8,MGMT_P7,MGMT_P6,MGMT_P5,MGMT_P4,MGMT_P3,MGMT_P2,MGMT_P1 from DBA_RSRC_PLAN_DIRECTIVES where GROUP_OR_SUBPLAN='DEFAULT_CONSUMER_GROUP';

PLAN                              MGMT_P8    MGMT_P7    MGMT_P6    MGMT_P5    MGMT_P4    MGMT_P3    MGMT_P2    MGMT_P1
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
DW_NORMAL_PLAN                          0          0          0        100          0          0          0          0

from above result , mgmt_p5 is having 100 and rest of them zero. what it says? how to understand above output.

thanks once again for your Help


Tom Kyte
January 18, 2012 - 3:39 pm UTC

they are the parameters used when creating a resource plan directive

see
http://docs.oracle.com/cd/E11882_01/server.112/e25494/dbrm005.htm#ADMIN11868

for information on doing that. Someone created this plan and gave 100 to the fifth argument. It is assigning 100% cpu to the dw_normal_plan at level 5 (plans can have many levels)

Question about cpu usage distribution when using Resource Consumer group

AMAR SINGH, May 23, 2012 - 5:16 am UTC

Hi Tom,
I have one confusion about resource consumer group for a long time and what better place to post your confusion than here.

My question is that if on a 8 cpu machine, if I run a sql that will run analyze index commands for 100 indexes from a single session:

1) I suppose that for the above command only 1 cpu will be used out of 8.

2) Under what conditions can the above sql use more than 1 processor?

3) In case the user executing the above sql is part of resource consumer group and is being allocated 25% of cpu then on a 100% cpu load, will the above sql really use 2 servers(ie 25% of 8 cpus?). If not then will it only use actually 25% of 1 cpu?

I hope I am able to present my question properly.

Thanks
Amar
Tom Kyte
May 23, 2012 - 10:02 am UTC

1) if you don't use parallel, correct
2) using parallel
3) it would use UP TO 25% of the cpu (total cpu) resources on the machine.


More to Explore

Backup/Recovery

Check out the complete guide to all of the Backup & Recovery techniques in the Oracle Database.