Detecting dead sesions
Om, October 27, 2001 - 9:07 pm UTC
I got exactly what I was looking for.You are
simply great Tom.The ORACLE world in IT just
cannot do anything without people like you.
Many Thanks
Om
Great !
Andre Whittick Nasser, October 29, 2001 - 9:15 am UTC
Thanks Tom, that was one of the "blind spots" I had in mind regarding connections.
By the way... When you kill a user, its status is marked KILLED in V$SESION.
I was told, or read somewhere that, from time to time, PMON wakes up to clean these entries. But I had a system running during weeks and the entries were still there.
1) Is that true ?
2) Is there any way to really remove those "dirty" entries ?
3) What is the bevahior of Dead Client Detection towards the entry in V$SESSION. Is it marked KILLED ?
Thanks again !
October 29, 2001 - 10:22 am UTC
1) no. PMON will roll them back. PMON will not make them "go away". When they are killed, their resources (locks) have been released but they are still technically connected.
See
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1058832409881 <code>
they stay connected until their client application tries to access the database at which point they get the "your session has been killed" error and the session goes away then. Otherwise, they would get something like ORA-3113 EOF on communication channel or "packet write failure" -- rather nasty looking errors.
2) not any official way -- when the client session touches the database ,they are removed.
3) With DCD, the session is cleaned up is the CLIENT is no longer there. If a killed client is still there, DCD won't do anything (PMON has already released the resources, nothing to do)
DCD not working.
Arindom Kanti Dam, February 08, 2002 - 10:56 am UTC
Hello,
I am Sun Solaris 8 , 8.1.7.0.0
Have tried getting DCD to work ; both with MTS and dedicated server mode.
Have these entries in my sqlnet.ora:
sqlnet.expire_time=1
trace_level_server=16
trace_directory_server=/disk01/oracle/product/8.1.7/network/trace
However on reastarting the listener; the following on the *.trc files shows :
grep -i dead *.trc
svr_2051.trc:niotns: Dead connection detection being turned off.
Also noticed that 'dead' client connections are not being cleaned. It is a pain ; though managable in a direct connect mode ; though a showstopper in MTS....
Is this something version specific or does DCD really work?
Thanks.
February 08, 2002 - 4:38 pm UTC
1) make sure you are using the right sqlnet.ora
2) I tried this on 8172 on solaris and cannot reproduce at all. only when I don't have the expire time set do I get
/tmp/svr_13616.trc:niotns: Not enabling dead connection detection.
Perhaps it was an issue with 8170 and patching it to the current level will correct it.
In fact, there was a known issue with DCD and MTS that was fixed in 8171 (it did not work properly, would not get all of the sessions)
What are these process ??
A reader, February 09, 2002 - 1:20 am UTC
Hi Tom,
Apart from the BG process Running in my System i have got two process which mimic the BG process in the sense that the username, OSuser all are null.
But the lOGON_Time Of These sessions are Geting updated to the current time...
What are these process ???
Thanks.
Regards,
Ganesh Raja
February 09, 2002 - 11:51 am UTC
Sorry, my crystal ball is in the shop this week and visualizing the other needed information (like the PROGRAM name for example among others like OS, versions, etc etc, etc) is a little hazy.
Sorry About That ...
A reader, February 10, 2002 - 12:02 am UTC
Hi Tom,
Sorry about that ...
Should have given u more information.
OS : WIN 2000 Sp2
Oracle : 8.1.7.2
Output from v$session
SID SERIAL# LOGON_TIME PROGRAM STATUS
---------- ---------- ------------------- -------------------- --------
1 1 09.02.2002 17:05:33 ORACLE.EXE ACTIVE
2 1 09.02.2002 17:05:33 ORACLE.EXE ACTIVE
3 1 09.02.2002 17:05:34 ORACLE.EXE ACTIVE
4 1 09.02.2002 17:05:34 ORACLE.EXE ACTIVE
5 1 09.02.2002 17:05:35 ORACLE.EXE ACTIVE
6 1 09.02.2002 17:05:36 ORACLE.EXE ACTIVE
7 22345 10.02.2002 09:02:47 ACTIVE
8 22345 10.02.2002 09:02:47 ACTIVE
9 1 09.02.2002 17:05:36 ORACLE.EXE ACTIVE
10 20193 10.02.2002 08:47:13 sqlplus.exe ACTIVE
As i told u the other values like the OSUSER, TERMINAl, MACHINE and all are null..
What are these...
Again a Sorry for the previous Post !!!
Regards,
Ganesh R
February 10, 2002 - 1:30 pm UTC
do you have job queue processes = 2? Those look like they might be the snp processes/threads.
Thanks Tom !!!
A reader, February 10, 2002 - 11:51 pm UTC
From time to time
Alex, September 05, 2002 - 1:11 pm UTC
A simple doubt:
Oracle Documentation (Oracle Concepts) says:
"Like SMON, PMON wakes up regularly to check whether it is needed, and can be called if another process detects the need for it."What does "regularly" mean? Every seconnd? Milisecond?
Thank you very much!!!
September 05, 2002 - 9:35 pm UTC
about every 3 seconds.
no sqlnet.ora
atul, September 05, 2002 - 10:22 pm UTC
Sir,
I am using 8.0.4 and would i be able to set DCD for this version..
ALSO i didn't found sqlnet.ora in my server..
Only Tnsnames and listener files are present...
So how anyone can connect without sqlnet.ora?
And if i have to set DCD have i have to create sqlnet.ora?
Thanks.
atul
September 05, 2002 - 11:25 pm UTC
Yes, DCD was in 8.0
sqlnet.ora is an OPTIONAL file.
yes, you would create it.
Detecting dead connections and Database shutdowns
Nihal, December 30, 2002 - 8:42 pm UTC
Tom,
We are using Oracle 9.1, on windows NT which at times when shuting down normally (using immediate option) gives waiting for active calls errors, on investigation we have concluded that this is due to oracle making calls (selects) to non-oracle systems using 'hsodbc' and if the non-oracle system is down the session hangs, since all calls to non-oracle systems start a transaction, rollback segment is used even for selects, which is not being rolled back as the session is hung state and the database cannot be brought down with the immediate option with a session in hanged state.
we have to use abort option to shutdown the database in such situations.
If we use DCD whats are the chances that the database will go down smoothly, i.e. if the database is being brought down with immediate option, will DCD be used on hanged/dead connections to rolled them back. I mean does it come in play when the database is in the process of being shutdown but can't shutdown as some sessions is hung up.
We are also using Jobs to do some batch processing, does DCD in any way affect job processes.
As far as the need to shutdown goes, We do need to shutdown the database everyday for Backup, Company policy is for cold backup and it will remain so untill the disastor and recovery using hot backups have been throughly tested by the org, which can be quite a few months
Thanks in advance
Nihal
December 30, 2002 - 8:52 pm UTC
You'll have to test -- i've no experience with that.
I doubt it would work - DCD is for client connections - the server pings the client. With hsodbc -- the server is the client. there is no Oracle net on the other end to ping back -- they do not do our protocol.
Company policy is STUPID.
Cold backups & Hot backups in archive log mode are *the same*. you take the *same* exact steps to recover with them. if this takes months -- well -- don't know who you are working for but anyway....
Detecting dead connections and database shutdowns
Nihal, January 02, 2003 - 6:17 pm UTC
Hi Tom,
Is there a way to detect if the DCD has been enabled, i have put the expire_time in the sqlnet .ora, but how do i asertain that what i have done is right, what i need to know is that is there a command which can be executed in sqlplus or thru lsnrctl which will tell us that expire_time is 1 minute.
The connection architecture is as follows Using jdbc to connect to Oracle which in turns connects to remote DB using hsodbc or using pl/sql to connect to Oracle which in turn connects to hsodbc, there is a client in both the cases which is not the server that will execute the sql statement, for the remote db Oracle server is the client, the lock is at the oracle database level hence if the client (Java/plsqlplus does not respond ) then oracle should rollback the process.
January 02, 2003 - 6:35 pm UTC
kill the client.
See if the process in the database goes away.
Perfect
Evan, June 25, 2003 - 8:43 am UTC
Dead on to what I needed. Thanks
EXPIRE_TIME is late...
Cefers.br, August 29, 2003 - 8:46 am UTC
Hi Tom,
I´ve configured DCD, but it seems to be working a litle late.
My SQLNET.EXPIRE_TIME is 30 min but in the tests I´ve done, the dead connections went away after about 50-60 min (quite after the expected).
I´m running Oracle 9iR2 on Linux Advanced Server 2.0. The tests were done in clients running on Win 98 (Oracle Client 8) and Win XP (Oracle Client 9).
My tests were just like that:
* Connect on SqlPlus
* Powerdown the client machine (direct in the power button)
* Trace the dead client session in V$SESSION every 5 minutes, until it´s gone
My guess is there is some kind of problem between Windows clients on Linux Server.
Do you have any idea? Have you seen this before?
Once again, thanks very much.
August 29, 2003 - 9:40 am UTC
everything is done in "big windows of time"..
solution for you would be to specify a significantly smaller window of time, 5 or 10 minutes is usually sufficient and it will very much be "oh, around 5/10 minutes give or take"
some doubts about kill -9 and about dead connections ..
Js, May 28, 2004 - 9:16 am UTC
Hi ...
When we kill a session then it marked "killed" in v$session will be removed
when client will try to connect again ...
I have some doubts ...
1. After killing session ' alter system kill session '
if I abort the client application forcely .. then when will the entry be removed from
v$session.
2. Why, we need to use kill -9 after killing the session from oracle.
3. what is dead connections.
Thanks,
Js
May 28, 2004 - 11:36 am UTC
1) maybe never.
2) you don't
3) when the client disappears, windows blue screen for example. we have dead client detection you can enable at the sqlnet layer to detect this and clean up the dead sessions.
DCD over dblink
Dilip, June 17, 2004 - 12:30 pm UTC
Tom,
I believe sqlnet.expire_time is useful for identify and kill client-server connections. What if the connection is server-server via DB Link. Would sqlnet.expire_time still be useful to disconnect sessions.
Thanks
Dilip.
June 17, 2004 - 2:54 pm UTC
yes, dcd packets are flung in a dblink environment as well. the server you connect to would be sending them periodically. if you "die", they eventually would not recieve a response...
kishor, April 07, 2005 - 1:47 am UTC
Tom,
Sorry for my simple question. But I want to clear my fundas.
here is the question :
PMON wakes up after 3 seconds to clean up terminated / killed process. But suppose at a moment a process got killed who was holding locks and resources. at same time other session (withing fraction of second after killed 1st session) need the locks and resources of 1st session.
But PMON will wake up after 3 seconds.
Will 2nd session waits for 3 seconds for PMON to wake up.
OR
second session will initiate the PMON to clean up the session .
OR
second session will get the message of resource busy ?
Please explain.
Thanks a lot in advance.
Kishor
April 07, 2005 - 9:14 am UTC
when a session is killed, the session itself rolls back. It would be a session that was terminated unexpectedly (not via a alter system) that pmon would be responsible for.
the blocked session in all cases has to wait for the resources to be freed. If you alter system kill the session, they will be released after the session rolls back. If the session died an unexpected death, it'll be after pmon does the same.
assuming the session requesting the lock was doing so in a fashion that would block and wait, else it would get the resource busy.
Detecting dead sessions
Alex, May 16, 2005 - 3:25 pm UTC
Tom,
I've setup a profile to "snipe" idle sessions couple days ago and monitor sessions' activity in the database. Today I've noticed about 25 "sniped" sessions that have been idle for quite some time. I called one of the users and asked if he can try to do something within application to make his session to go away, but he said he doesn't have it open at all (TOAD). I've asked him to open TOAD again and see if his old sessions are still there, and they were not. He also told me that he clicked on "x" on the top-right corner to close his app. I have SQLNET.EXPIRE_TIME set, but looks like DCD didn't work in this case. I searched your site and found your reply to one of the people that states:
"Followup:
by closing the window, the server has no idea the connection is broken, that
there is no client (windows tcp/ip doesn't tell the server "i've gone away")"
Is this true that when "x" is used to close the app, then server doesn't know that the client is gone, and if that is, then what else can I do to detect/disconnect those sessions?
Thanks.
May 16, 2005 - 5:03 pm UTC
if you have sqlnet expire time setup and the client isn't there anymore, that would have killed the session.
I'd ask the user to use taskmgr and see if toad isn't still there
and if not, recheck the configuration on the server and make sure dcd is really on.
reader
A reader, May 17, 2005 - 7:28 am UTC
After dcd detection, and pmon releases the resources,
what will be the status of the session as seen in v$session
May 17, 2005 - 9:12 am UTC
gone, not there.
the goal is to detect a dead client and get rid of it.
sql net expire time
sam, January 02, 2007 - 4:07 pm UTC
TOm:
How dod you check this parameter sql net expire time and how do you set it up? Is this why we are getting there are too many database connections message when there is no one logged in? a lot of dead sessions?
thanks
dead connections
sam, January 04, 2007 - 11:33 am UTC
Tom:
1. Is this correct? when you have a web application using mod_plsql connecting to oracle 9i database configured with number of processes=150, every time a web user logs in (aplication authentication) and calls a page it is considered a connection. Does that mean that after calling 150 pages the number of connections will be exceeded and then system will fail? Can you explain how does that work?
2. In a web system (stateless) what does constitue a database connection is "alive" or "dead".
3. Does the parameter "processes" include a count of all dead sessions?
4. I am trying to find the best solution to resolve the error "ORA-00020: Number of Connections exceeded" in a web app.
If you read this link
http://www.experts-exchange.com/Databases/Oracle/Q_20164850.html it suggests that the EXPIRE_TIME only works for client/server environment.
Also, ccreating a profile which sets the IDLE_TIME does not seem the best solution.
Would creating a procedure to kill sesssion and schedule it nightly be the best solution?
Create or replace procedure KillSession(idleTime in Number)
as cursor c1 is
select sid,serial#,trunc((last_call_et - trunc(last_call_et/3600,0)*3600) / 60,0) TMIN from
v$session
where status='INACTIVE' and username not in ('SYS');
VString varchar2(50);
begin
for c1rec in c1 loop
if c1rec.TMIN > idleTime then
Vstring := 'alter system kill session
'||chr(39)||c1rec.sid||','||c1rec.serial#||chr(39);
execute immediate(Vstring);
end if;
end loop;
end;
THank you,
January 05, 2007 - 9:08 am UTC
1) that is false.
2) they are all alive, they just might be inactive
3) depends on how the dead sessions connected, shared server - no.
4) do you understand the concurrency of your application? How many users hit the "go" button at the same time, you need to size for that.
the web is "client server", the client is your application server.
dead connection
sam, January 05, 2007 - 11:01 am UTC
TOm:
Can you briefly explain how:
1. what consitute a connection in a oracle web system. the mod_plsql is the client for the database.
2. What is the best solution to avoid the "Too many connections" issue in a web environment?
Thanks,
January 06, 2007 - 8:25 am UTC
1) the same thing that constitutes a connection in any configuration. The connection is initiated by a piece of software called mod_plsql running inside of apache on some server.
2) simply put: "not to have too many connections?", that is the only way to avoid "too many".
You need to understand how many you need to support your application - and to ensure you can support that. You can control the number of mod_plsqls (apache settings), you can control how many connections your database supports (sizing hardware, setting init.ora correctly)
for example, here on asktom, my maximum degree of concurrency is approximately 20, I need support for about 20 connections - round up for "safety" (a burst of activity) and there you go.
connections
sam, January 06, 2007 - 11:13 am UTC
Tom:
1. My question was what is a "Connection" physcially. When i call a page in asktom do i create a connection for one second and it goes away. Do I have an allocated connection for me? when someone else calls does that create another connection?
2. You say the best solution is to avoid too many connections. What about all these solutions for deleting dead connections from DCD to IDL_TIME to procedure deleting dead session?
Is there any reference in you book for how to resolve this.
January 07, 2007 - 8:00 pm UTC
1) no, mod_plsql connection pools - there are a pool of connections it uses (allocates and deallocates over time)
2) in a 3 tier environment, the chances of a dead connection are "much smaller", it typically does not come into play. they were mostly a problem in client server days with blue screens of death and client applications crashing and people powering off their machines.
dead connection
sam, January 09, 2007 - 6:16 pm UTC
TOm:
BUt we are working in 3 tier environment and it did happen? Is there a permanent solution for it instead of "Avoiding it" which I am not sure how.
Disconnected session not detected for long time
Subhasis Gangopadhyay, March 14, 2007 - 4:54 am UTC
Hi Tom,
My database version is 10.2.0.1.
I have a very unusual situation.
I am listing sequence of events done -
1. One user logged in to database with sqlplus from a client.
2. He updates a table and did not commit.
3. I disconnected the client machine by removing network connection.
4. The session is still in v$session and lock are shoing in v$lock/dba_locks.
5. This had taken about 2.5 hours to release.Client machine is still disconnected.
But my question is should or should not PMON have cleared this quickly(3 secs/1 min)..at least released resources(locks).I know about dead client detection(with sqlnet.ora) but I could not use it here.I also tried to use 'oradebug wakeup <pid of pmon>', but still it didn't help.
I am eagerly waiting for your inputs.
Thanks
Subhasis
March 14, 2007 - 7:55 am UTC
pmon would not clear this.
you WOULD use sqlnet dead client detection here - that is precisely what you have.
tcp is not really going to tell use the connection is broken, the server is waiting for the client to tell them to do something, client is never going to do that.
you have a dead client, if you want that detected, the server will have to ping clients to ask "still there", for that you use sqlnet dead client detection.
so the thing you say "you cannot use" is in fact what you need to use.
Re: Disconnected session not detected for long time
Subhasis Gangopadhyay, March 15, 2007 - 10:55 am UTC
Hi Tom,
Thanks a lot for your valuable inputs!!!!
Now, continuing more in this regard, we have observed if the connecting client is Redhat linux AS-4(in place of windows XP), then the connection did not die when the network came back.
Let me explain a bit further.
1. Opened a sqlplus session from a linux client.
2. Disconnected network.
3. Issued a sql...it hanged(network still disconnected).
4. After 15-20mins, reconnected network with the client.
5. The select statement gave its output.
But it case of windows client platform it starightway gave 'end of file communication channel' when a sql was fired from a disconnected session.
I hope this a due to different TCP protocol stack behaviour for different OS(linux and windows).How do I can make this behaviour in sync, as my client environment can be heterogeneous(linux/windows).
Can you please throw some light on this.
Thanks
Subhasis
March 15, 2007 - 12:33 pm UTC
throw light on what?
how to configure tcp/ip in different OS's? Sorry, I don't do that.
Alexander, April 17, 2009 - 4:30 pm UTC
Tom,
What do you have to do to enable dead client detection? Do you just edit the sqlnet.ora file and that's it? Do we have to bounce anything?
April 17, 2009 - 4:34 pm UTC
the dedicated server will read the sqlnet.ora when it starts - so it'll only affect future things. You do not need a restart of the database.
captwiggum, April 27, 2009 - 7:42 pm UTC
"""
tcp_disconnect.py
Echo network data test program in python. This program easily translates to C & Java.
By TCP rules, the only way for a server program to know if a client has disconnected,
is to try to read from the socket. Specifically, if select() says there is data, but
recv() returns 0 bytes of data, then this implies the client has disconnected.
But a server program might want to confirm that a tcp client is still connected without
reading data. For example, before it performs some task or sends data to the client.
This program will demonstrate how to detect a TCP client disconnect without reading data.
The method to do this:
1) select on socket as poll (no wait)
2) if no recv data waiting, then client still connected
3) if recv data waiting, the read one char using PEEK flag
4) if PEEK data len=0, then client has disconnected, otherwise its connected.
Note, the peek flag will read data without removing it from tcp queue.
To see it in action: 0) run this program on one computer 1) from another computer,
connect via telnet port 12345, 2) type a line of data 3) wait to see it echo,
4) type another line, 5) disconnect quickly, 6) watch the program will detect the
disconnect and exit.
I hope this is helpful to someone. John Masinter, 17-Dec-2008.
"""
import socket
import time
import select
HOST = '' # all local interfaces
PORT = 12345 # port to listen
# listen for new TCP connections
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))
s.listen(1)
# accept new conneciton
conn, addr = s.accept()
print 'Connected by', addr
# loop reading/echoing, until client disconnects
try:
conn.send("Send me data, and I will echo it back after a short delay.\n")
while 1:
data = conn.recv(1024) # recv all data queued
if not data: break # client disconnected
time.sleep(3) # simulate time consuming work
# below will detect if client disconnects during sleep
r, w, e = select.select([conn], [], [], 0) # more data waiting?
print "select: r=%s w=%s e=%s" % (r,w,e) # debug output to command line
if r: # yes, data avail to read.
t = conn.recv(1024, socket.MSG_PEEK) # read without remove from queue
print "peek: len=%d, data=%s" % (len(t),t) # debug output
if len(t)==0: # length of data peeked 0?
print "Client disconnected." # client disconnected
break # quit program
conn.send("-->"+data) # echo only if still connected
finally:
conn.close()
April 27, 2009 - 9:46 pm UTC
what are you trying to show?
if you want us to ping the client to see if is alive, we do that in the sqlnet.ora via the expire time (dead client detection). But, I'm not really sure what relevance this post has to anything?
(and a disconnect can be quite different from "we aborted, goodbye", disconnect implies "nice and neat")
Enable SQLNET.EXPIRE_TIME for multiple DB Servers
David, February 10, 2010 - 1:01 pm UTC
Hello Tom:
Is it possible to set the SQLNET.EXPIRE_TIME for multiple database hosts from a singler central location or do you have to make the entry in the SQLNET.ORA file on each host?
I use a group policy in active directory to set the TNS_ADMIN parameter to point to single TNS_NAMES file on the LAN. Would this work for the SQLNET.ORA? Is there a better way?
February 16, 2010 - 8:04 am UTC
the sqlnet.ora is found via the TNS_ADMIN setting as well, yes.
Or I could just go read the manual...
David, February 10, 2010 - 1:42 pm UTC
"By default, sqlnet.ora is located in the $ORACLE_HOME/network/admin directory on UNIX operating systems and the ORACLE_HOME\network\admin directory on Windows operating systems. sqlnet.ora can also be stored in the directory specified by the TNS_ADMIN environment variable."
Oracle® Database Net Services Reference Guide
Hopscotch
David, February 11, 2010 - 10:11 am UTC
Hello Tom:
Will DCD work if a client connects to a database then queries an HSODBC data source across a data base link?
For example I connect with toad to the Oracle DB then issue a query to the HSODBC database (SQLServer) then shutdown Toad without logging out of the session.
February 16, 2010 - 10:10 am UTC
the oracle database would figure out that the client has disappeared and terminate that session - which in turn would terminate any connection to sqlserver made by that session.
jaan
john, March 14, 2012 - 3:44 pm UTC
very good efforts
Dead Client and Bequeathed connections
Al Ricafort, June 08, 2012 - 7:58 pm UTC
Hi Tom,
DCD has a limitation:
"Dead connection detection is not allowed on bequeathed connections."
I try testing this by connecting using sqlplus. 'ps' shows that my connection is bequeathed'
oracle 11504 0.1 0.6 1206848 26672 ? Ss 08:42 0:00 oraclemynewdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
I then killed my sqlplus using the 'kill' command. When I checked if my session is still there, it is gone.
So how did my session get killed?
I am using version 11.2.0.1.0.
Than you.
June 09, 2012 - 6:38 am UTC
just because the client goes away - does not mean that the server process will definitely stick around. with bequeat, the server process is a child of the client process using IPC (inter process communication) to send data back and forth. when the client is killed, the child must be getting some sort of signal that lets is know "parent is gone, might as well exit, they won't be asking me to do anything else"
How long will the session stays?
Al Ricafort, June 11, 2012 - 2:34 am UTC
Hi Tom,
If the client does not gracefully ends it session(say it is killed) how long before its corresponding session is removed? Or will the session stays there forever?
Thank you.
June 11, 2012 - 3:55 am UTC
it depends.
it depends on the tcp/ip implementation - do we get a signal after a period of time?
do you have DCD dead client detection setup?
what was the client doing when it died - if it was waiting for a response from the server - the server would get an error writing the answer back and would figure out the client was dead?
and so on.