Have Used Resource Manager before-good results
Jim, April 16, 2008 - 10:44 am UTC
I have used Resource manager before (in 9i and 10G) and have had excellent results. Most of the time no one notices it.
I have used it to fence users from low priority batch jobs. Most of the time the batch jobs run in the normal amount of time. If the users are busy then the batch jobs can take about 2X as long, but the users experience excellent response.
I think this is one of those valuable Oracle features that is often overlooked.
More information to my original question...
Robert, April 16, 2008 - 12:29 pm UTC
Tom,
Both your points are well taken.
1) Usually the main reason they need access to production is because of the freshness of the data.
(It is rarely for debugging of their in-house (vs. 3rd party) applications).
Sometimes we copy/export a production schema/application into test environment so they can look at it there.
2) Ok. Here is more light on meaning of "not-insigificant"...
Linux RHEL3 server, 4 CPUs.
10.2.0.4 database
The server/database is more or less 'quiet' except for my testing,
and I have run enough tests that I will recognize any skews in my results/observations.
Basically I have 2 primary resource groups in my active plan: HIGH_PRIORITY ('applications') and LOW_PRIORITY ('appdev')
I am using a 'simple' resource plan, where HIGH PRIORITY gets 100% CPU, then LOW PRIORITY gets 100% CPU
To be precise:
Level 1 - SYS GROUP - 100%
Level 2 - HI PRIORITY ('applications') GROUP - 100%
Level 3 - LO PRIORITY ('appdev') GROUP - 100%
Level 4 - OTHER - 100%
Only real activity on the server/database during my testing are my HI PRIORITY and LO PRIORITY test sessions.
Here is the query I am running to test with:
select /*+ full(a) full(b) */ count(*)
from rg_load_test a
, rg_load_test b
/
(rg_load_test = copy of dba_objects (47,473 rows))
I am using 'top', OEM Grid Control, extended sql tracing, and timing inside my scripts to monitor/record my tests
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HERE ARE SIMPLE WALL CLOCK TIMES, WHICH ARE SHOCKING!
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Average time for 4 HI PRIORITY jobs without any low priority jobs : ~1500 seconds
Average time for 4 HI PRIORITY jobs WITH 2 low priority jobs running : ~2100 seconds
Resource Plan says my HI priority jobs were supposed to get 100% of CPU before any low priority jobs get CPU
When in fact the presence of other jobs caused my hi priority jobs to run 40% slower!?
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HERE ARE 'top' SNAPSHOTS WHICH SHED SOME LIGHT ON THINGS
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=============================================================
Here is a 'top' snapshot which shows very typical numbers for these tests
There are 4 HIGH_PRIORITY sessions running
and 0 (none) LOW_PRIORITY sessions running
NOTE the 24+% CPU utilization for the 4 oracle processes below (almost 100% total CPU usage)
This is exactly how I would expect these to run
=============================================================
08:45:02 up 37 days, 18:14, 3 users, load average: 4.20, 4.02, 2.86
347 processes: 340 sleeping, 5 running, 1 zombie, 1 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 96.8% 0.0% 2.9% 0.2% 0.0% 0.0% 0.0%
cpu00 96.6% 0.0% 2.5% 0.7% 0.0% 0.0% 0.0%
cpu01 97.4% 0.0% 2.3% 0.1% 0.0% 0.0% 0.0%
cpu02 96.4% 0.0% 3.5% 0.0% 0.0% 0.0% 0.0%
cpu03 96.8% 0.0% 3.1% 0.0% 0.0% 0.0% 0.0%
Mem: 6159420k av, 6126444k used, 32976k free, 0k shrd, 146860k buff
3311992k actv, 1910420k in_d, 136616k in_c
Swap: 2763172k av, 870300k used, 1892872k free 5458076k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
3065 oracle 25 0 27256 26M 24976 R 24.9 0.4 17:12 3 oracle <- HI PRIORITY RG
3066 oracle 25 0 27276 26M 25000 R 24.8 0.4 17:15 2 oracle <- HI PRIORITY RG
3067 oracle 25 0 27240 26M 24960 R 24.8 0.4 17:13 0 oracle <- HI PRIORITY RG
3068 oracle 25 0 27316 26M 25040 R 24.7 0.4 17:14 1 oracle <- HI PRIORITY RG
2833 oracle 16 0 1364 1364 884 R 0.1 0.0 0:08 2 top
=============================================================
Here is a 'top' snapshot which shows very typical numbers for these tests
There are 4 HIGH_PRIORITY sessions running
and 2 LOW_PRIORITY sessions running
NOTE how much less CPU the HI PRIORITY processes are getting than they were, above
The resource plan is THROTTLING my HI PRIORITY JOBS!!!
=============================================================
09:35:36 up 37 days, 19:05, 4 users, load average: 4.12, 1.44, 0.72
360 processes: 352 sleeping, 6 running, 1 zombie, 1 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 94.4% 0.0% 5.0% 0.1% 0.0% 0.0% 0.2%
cpu00 94.8% 0.0% 4.7% 0.0% 0.0% 0.0% 0.3%
cpu01 94.2% 0.0% 5.1% 0.0% 0.0% 0.1% 0.3%
cpu02 93.8% 0.0% 5.5% 0.5% 0.0% 0.0% 0.0%
cpu03 94.8% 0.0% 4.7% 0.0% 0.0% 0.0% 0.3%
Mem: 6159420k av, 6139856k used, 19564k free, 0k shrd, 160996k buff
3351476k actv, 1885296k in_d, 147144k in_c
Swap: 2763172k av, 869388k used, 1893784k free 5431560k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
21850 oracle 25 0 27260 26M 24984 R 21.8 0.4 1:13 3 oracle <- HI PRIORITY RG
21853 oracle 25 0 27304 26M 25024 R 21.4 0.4 1:21 0 oracle <- HI PRIORITY RG
21848 oracle 25 0 27336 26M 25060 R 19.1 0.4 1:16 1 oracle <- HI PRIORITY RG
21852 oracle 25 0 27288 26M 25008 R 17.7 0.4 1:17 2 oracle <- HI PRIORITY RG
21847 oracle 25 0 27336 26M 25060 R 9.4 0.4 0:31 1 oracle <- LO PRIORITY RG
21844 oracle 18 0 27276 26M 25000 S 8.4 0.4 0:36 0 oracle <- LO PRIORITY RG
9844 oracle 16 0 30244 28M 26960 S 0.6 0.4 0:12 2 oracle
2833 oracle 15 0 1368 1368 884 R 0.2 0.0 0:27 1 top
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
REPRESENTATIVE TKPROF REPORTS
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
==========================================================================================
==========================================================================================
Here is tkprof report for 1 of 4 hi priority sessions with no low priority sessions running.
==========================================================================================
==========================================================================================
select /*+ full(a) full(b) */ count(*)
from rg_load_test a
, rg_load_test b
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1477.79 1470.92 0 1314 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1477.79 1470.92 0 1314 0 1
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 146
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=1314 pr=0 pw=0 time=1470923911 us)
2253116089 MERGE JOIN CARTESIAN (cr=1314 pr=0 pw=0 time=6759651845 us)
47467 TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=190012 us)
2253116089 BUFFER SORT (cr=657 pr=0 pw=0 time=2253572357 us)
47467 TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=1234153 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 0.00 0.00
********************************************************************************
==========================================================================================
==========================================================================================
Here is tkprof report for 1 of 4 hi priority sessions with 2 low priority sessions running.
~15% more CPU than above
~40% more TIME than above !!!
==========================================================================================
==========================================================================================
select /*+ full(a) full(b) */ count(*)
from rg_load_test a
, rg_load_test b
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1613.49 2134.03 0 1314 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1613.49 2134.03 0 1314 0 1
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 146
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=1314 pr=0 pw=0 time=2134035600 us)
2253116089 MERGE JOIN CARTESIAN (cr=1314 pr=0 pw=0 time=9012505686 us)
47467 TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=189987 us)
2253116089 BUFFER SORT (cr=657 pr=0 pw=0 time=4506391307 us)
47467 TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=47502 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 0.00 0.00
resmgr:cpu quantum 4147 0.28 16.50
latch free 5 0.12 0.22
********************************************************************************
Tom, so would you agree that resource manager is not doing a very good job of administrating the CPU resources for my jobs according to my active resource plan?
I would expect some little bit of overhead to manage this, but this seems very excessive.
I thought resource mananager was supposed to make sure my high priority resource groups got all the cpu/resources they needed? What is going on here?
Please explain.
Thanks!
Robert.
simple 90 10 split
Jim, April 16, 2008 - 12:51 pm UTC
I used a 90% 10 split. 90 % for the class for high priority and 10 % for the rest.
To Jim and Tom
Robert, April 16, 2008 - 2:38 pm UTC
Jim,
But I want my low priority users to get 100% of cpu when no high priority users are needing cpu... and I want my high priority users to get 100% of cpu whenever possible.
Tom, why is my resource plan not working as expected?
Thanks,
Robert.
April 16, 2008 - 4:38 pm UTC
as stated above by others - you have a misconception as to how this works.
resource management basically does nothing until a resource is totally maxed out (you cannot put cpu in the bank and use it later). So even if you say "100% to group A", unless group A actually needs to consume 100%.
resource management is not a "fence", but rather an attempt to assure you that if group A makes a demand for X%, it will be given to them.
Jim, April 16, 2008 - 3:15 pm UTC
Robert,
I think there is a disconnect in how the Resource Manager works. If the machine isn't maxed out then there isn't any throttling of resources. So if a set of low priority users is using 10% of the CPU and a set of high priority users is using 50% of the CPU (both in aggregate) then neither one will be throttled back and giving 100% to high priority users won't get them their results any faster.
If you want high priority users to have 100% then give them 100% and the low priority users 0%. (that might cause thrashing because the CPU usage isn't a constant value and that may be what you are seeing.)
Try it in a test environment.
I think that is what I'm doing...
Robert, April 16, 2008 - 4:45 pm UTC
Jim,
Here is what my plan looks like
Resource Group Level 1 Level 2 Level 3 Level 4
-------------- ------- ------- ------- -------
SYS GROUP 100 0 0 0
HIGH GROUP 0 100 0 0
LOW GROUP 0 0 100 0
OTHER GROUP 0 0 0 100
Also, all my tests are with the server CPU maxed out with the only/main processes running being my test processes.
My understanding is that when CPU is maxed out only processes running are from HIGH or LOW group, that HIGH group processes should get all the CPU they can handle and then if any is left over the LOW group will get the rest of it, all things being equal.
What am I missing?
Thanks,
Robert.
But CPU is maxed out.....
Robert, April 16, 2008 - 4:53 pm UTC
Tom,
That is the point of my tests...
...the CPU IS MAXED OUT in each of the tests.
HIGH group running querys which maxes out all CPU.
Then a LOW group process is kicked off (while CPU is still maxed out from HIGH group processes).
What SHOULD occur is that HIGH group processes will not be affected... SHOULD still run at 100% CPU
What DOES occur is that HIGH group processes are throttled down considerably (see my second post).
I don't think this is rocket surgery(?)
Please explain.
Thanks,
Robert.
April 16, 2008 - 9:33 pm UTC
I cannot really read the stuff from above, it is sort of "not pre tagged". Can you format (even when I cut and paste, cannot read it as html wiped out the whitespace altogether)
CPUs maxed out
Bill, April 17, 2008 - 8:25 am UTC
You say that your CPUs are maxed out processing your high priority queries, but why? Surely there is I/O going on there as well (and other processing / waits etc). Why do you think your jobs should be 100% CPU bound?
April 17, 2008 - 9:16 am UTC
select /*+ full(a) full(b) */ count(*)
from rg_load_test a
, rg_load_test b
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1477.79 1470.92 0 1314 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1477.79 1470.92 0 1314 0 1
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 146
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=1314 pr=0 pw=0 time=1470923911 us)
2253116089 MERGE JOIN CARTESIAN (cr=1314 pr=0 pw=0 time=6759651845 us)
47467 TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=190012 us)
2253116089 BUFFER SORT (cr=657 pr=0 pw=0 time=2253572357 us)
47467 TABLE ACCESS FULL RG_LOAD_TEST (cr=657 pr=0 pw=0 time=1234153 us)
he did a large in memory cartesian join.
but frankly - when machine is 100% utilized and there are other things going on (the 4 low priority sessions will get some runtime, that is just going to happen) - the 100% of cpu to this group isn't 'reasonable', everyone gets a bit of time at some point....
maybe a better approach (sort of GETTING THAT GROUP OFF OF THE MACHINE) would be to limit the maximum number of active sessions in the lower priority group - that is an easier thing to implement and manage and would basically restrict the maximum load they would be able to impose.
Thanks Tom and everyone
Robert, April 17, 2008 - 10:56 am UTC
Tom,
1) I'm not sure how to format it to keep white space... I created my large post as a text document and then copied/pasted using the "code" tags (?)... When I view it on your web site I am seeing it nicely formated in fixed font with all whitespace (?).
2) But thanks for your help. Your suggestion about getting them off the database is well taken... and I have been playing around with limiting the number of active sessions for the 'low' group.
But you know, I'm wondering if my *extreme* tests are realistic in this case. This is basically a transactional system with an occasional report type query. Thanks to the give and take from this posting I am going to consider testing with more sessions using LESS cpu each...not maxing out CPU... Then, maybe resource manager won't even kick in(?)... and I will compare times of 'HIGH PRIORITY' sessions with various of these 'low priority' loads.
Your feedback would be most appreciated...
Thanks!
Robert.
April 17, 2008 - 11:53 am UTC
consider the max concurrent users rule as well - it seems to be something you would benefit from.
"max concurrent users rule"?
Robert, April 17, 2008 - 1:42 pm UTC
Tom,
Not finding any resource plan directive, profile setting, or anything else which is like "max concurrent users" rule (?)
Is this 11g?
Thanks,
Robert
April 17, 2008 - 4:31 pm UTC
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14231/dbrm.htm#sthref3237 .... Create an active session pool. This pool consists of a specified maximum number of user sessions allowed to be concurrently active within a group of users. Additional sessions beyond the maximum are queued for execution, but you can specify a timeout period, after which queued jobs will terminate. ....
DBA_RSRC_PLAN_DIRECTIVES view columns
aliyar, January 18, 2012 - 2:55 pm UTC
Dear Tom,
appreciate your help for DBA world.
can you please explain me the following column's description and how to understand
Database : 11gR2. this is a 8 node rac.
o/s : Linux server
SQL> select PLAN , MGMT_P8,MGMT_P7,MGMT_P6,MGMT_P5,MGMT_P4,MGMT_P3,MGMT_P2,MGMT_P1 from DBA_RSRC_PLAN_DIRECTIVES where GROUP_OR_SUBPLAN='DEFAULT_CONSUMER_GROUP';
PLAN MGMT_P8 MGMT_P7 MGMT_P6 MGMT_P5 MGMT_P4 MGMT_P3 MGMT_P2 MGMT_P1
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
DW_NORMAL_PLAN 0 0 0 100 0 0 0 0
from above result , mgmt_p5 is having 100 and rest of them zero. what it says? how to understand above output.
thanks once again for your Help
Question about cpu usage distribution when using Resource Consumer group
AMAR SINGH, May 23, 2012 - 5:16 am UTC
Hi Tom,
I have one confusion about resource consumer group for a long time and what better place to post your confusion than here.
My question is that if on a 8 cpu machine, if I run a sql that will run analyze index commands for 100 indexes from a single session:
1) I suppose that for the above command only 1 cpu will be used out of 8.
2) Under what conditions can the above sql use more than 1 processor?
3) In case the user executing the above sql is part of resource consumer group and is being allocated 25% of cpu then on a 100% cpu load, will the above sql really use 2 servers(ie 25% of 8 cpus?). If not then will it only use actually 25% of 1 cpu?
I hope I am able to present my question properly.
Thanks
Amar
May 23, 2012 - 10:02 am UTC
1) if you don't use parallel, correct
2) using parallel
3) it would use UP TO 25% of the cpu (total cpu) resources on the machine.