Ensuring subprocesses are dead on exiting Python program - python

Is there a way to ensure all created subprocesses are dead at exit time of a Python program? By subprocess I mean those created with subprocess.Popen().
If not, should I iterate over all of them issuing kills and then kill -9? Anything cleaner?

You can use atexit for this, and register any clean up tasks to be run when your program exits.
atexit.register(func[, *args[, **kargs]])
In your cleanup process, you can also implement your own wait, and kill the process when your desired timeout occurs.
>>> import atexit
>>> import sys
>>> import time
>>>
>>> def cleanup():
...     timeout_sec = 5
...     for p in all_processes:  # list of your processes
...         p_sec = 0
...         for second in range(timeout_sec):
...             if p.poll() == None:
...                 time.sleep(1)
...                 p_sec += 1
...         if p_sec >= timeout_sec:
...             p.kill()  # supported from python 2.6
...     print 'cleaned up!'
...
>>> atexit.register(cleanup)
>>>
>>> sys.exit()
cleaned up!
Note -- Registered functions won't be run if this process (parent process) is killed.
The following Windows method is no longer needed for Python >= 2.6.
Here's a way to kill a process in Windows. Your Popen object has a pid attribute, so you can just call it with success = win_kill(p.pid) (needs pywin32 installed):
def win_kill(pid):
    '''kill a process by specified PID in windows'''
    import win32api
    import win32con

    hProc = None
    try:
        hProc = win32api.OpenProcess(win32con.PROCESS_TERMINATE, 0, pid)
        win32api.TerminateProcess(hProc, 0)
    except Exception:
        return False
    finally:
        if hProc != None:
            hProc.Close()

    return True

On *nix's, maybe using process groups can help you out - you can catch subprocesses spawned by your subprocesses as well.
if __name__ == "__main__":
    os.setpgrp()  # create new process group, become its leader
    try:
        # some code
    finally:
        os.killpg(0, signal.SIGKILL)  # kill all processes in my group
Another consideration is to escalate the signals: from SIGTERM (default signal for kill) to SIGKILL (a.k.a kill -9). Wait a short while between the signals to give the process a chance to exit cleanly before you kill -9 it.
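A minimal sketch of that escalation, using Popen's own terminate()/kill() helpers (available since Python 2.6); the sleep command is just a stand-in for your child:
import subprocess
import time

p = subprocess.Popen(["sleep", "60"])   # example child

p.terminate()                 # polite: sends SIGTERM
for _ in range(10):           # give it up to ~5 seconds to exit cleanly
    if p.poll() is not None:
        break
    time.sleep(0.5)
else:
    p.kill()                  # still alive: escalate to SIGKILL (kill -9)
p.wait()                      # reap the child so it doesn't linger as a zombie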

The subprocess.Popen.wait() is the only way to assure that they're dead. Indeed, POSIX OS's require that you wait on your children. Many *nix's will create a "zombie" process: a dead child for which the parent didn't wait.
If the child is reasonably well-written, it terminates. Often, children read from PIPE's. Closing the input is a big hint to the child that it should close up shop and exit.
If the child has bugs and doesn't terminate, you may have to kill it. You should fix this bug.
If the child is a "serve-forever" loop, and is not designed to terminate, you should either kill it or provide some input or message which will force it to terminate.
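As a rough sketch of that hint-and-wait idea (cat is just a stand-in for a child that reads from a pipe):
import subprocess

# A well-behaved child that reads from stdin will exit once its input closes.
p = subprocess.Popen(["cat"], stdin=subprocess.PIPE)
p.stdin.close()   # the "close up shop" hint to the child
p.wait()          # reap it so no zombie is left behind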
Edit.
In standard OS's, you have os.kill( PID, 9 ). Kill -9 is harsh, BTW. If you can kill them with SIGABRT (6?) or SIGTERM (15) that's more polite.
In Windows OS, you don't have an os.kill that works. Look at this ActiveState Recipe for terminating a process in Windows.
We have child processes that are WSGI servers. To terminate them we do a GET on a special URL; this causes the child to clean up and exit.

Here is a solution for Linux (without installing python-prctl):
import ctypes
import signal
import subprocess

def _set_pdeathsig(sig=signal.SIGTERM):
    """helper function to ensure that once the parent process exits, its child
    processes will automatically die
    """
    def callable():
        libc = ctypes.CDLL("libc.so.6")
        return libc.prctl(1, sig)
    return callable

subprocess.Popen(your_command, preexec_fn=_set_pdeathsig(signal.SIGTERM))

Warning: Linux-only! You can make your child receive a signal when its parent dies.
First install python-prctl==1.5.0 then change your parent code to launch your child processes as follows
subprocess.Popen(["sleep", "100"], preexec_fn=lambda: prctl.set_pdeathsig(signal.SIGKILL))
What this says is:
launch subprocess: sleep 100
after forking and before exec of the subprocess, the child registers for "send me a SIGKILL
when my parent terminates".

orip's answer is helpful but has the downside that it kills your process and returns an error code to your parent. I avoided that like this:
class CleanChildProcesses:
    def __enter__(self):
        os.setpgrp()  # create new process group, become its leader
    def __exit__(self, type, value, traceback):
        try:
            os.killpg(0, signal.SIGINT)  # kill all processes in my group
        except KeyboardInterrupt:
            # SIGINT is delivered to this process as well as the child processes.
            # Ignore it so that the existing exception, if any, is returned. This
            # leaves us with a clean exit code if there was no exception.
            pass
And then:
with CleanChildProcesses():
    # Do your work here
Of course you can do this with try/except/finally but you have to handle the exceptional and non-exceptional cases separately.

I needed a small variation of this problem (cleaning up subprocesses, but without exiting the Python program itself), and since it's not mentioned here among the other answers:
import os, subprocess

p = subprocess.Popen(your_command, preexec_fn=os.setsid)
os.killpg(os.getpgid(p.pid), 15)  # 15 == signal.SIGTERM
setsid will run the program in a new session, thus assigning a new process group to it and its children. Calling os.killpg on that group therefore won't bring down your own Python process as well.

poll()
Check if child process has terminated.
Returns the returncode attribute.
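For example, a minimal polling loop (the sleep command is just an illustration):
import subprocess
import time

p = subprocess.Popen(["sleep", "2"])
while p.poll() is None:   # None means the child is still running
    time.sleep(0.5)
print p.returncode        # the exit code, once terminated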

A solution for Windows may be to use the Win32 job API, e.g. How do I automatically destroy child processes in Windows?
Here's an existing python implementation
https://gist.github.com/ubershmekel/119697afba2eaecc6330
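For reference, a rough sketch of the job-object idea with pywin32 (my own illustration, not the linked gist; the names come from the win32job/win32api modules, so double-check against the links above):
import subprocess
import win32api
import win32con
import win32job

# Create a job object that kills every assigned process when its handle is
# closed, i.e. when this Python process exits for any reason. Keep a
# reference to `job` so the handle isn't closed early.
job = win32job.CreateJobObject(None, "")
info = win32job.QueryInformationJobObject(job, win32job.JobObjectExtendedLimitInformation)
info['BasicLimitInformation']['LimitFlags'] |= win32job.JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
win32job.SetInformationJobObject(job, win32job.JobObjectExtendedLimitInformation, info)

# Start the child and assign it to the job.
p = subprocess.Popen(["notepad.exe"])
handle = win32api.OpenProcess(
    win32con.PROCESS_TERMINATE | win32con.PROCESS_SET_QUOTA, False, p.pid)
win32job.AssignProcessToJobObject(job, handle)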

Is there a way to ensure all created subprocesses are dead at exit time of a Python program? By subprocess I mean those created with subprocess.Popen().
You could violate encapsulation and test that all Popen processes have terminated by doing
subprocess._cleanup()
print subprocess._active == []
If not, should I iterate over all of them issuing kills and then kill -9? Anything cleaner?
You cannot ensure that all subprocesses are dead without going out and killing every survivor. But if you have this problem, it is probably because you have a deeper design problem.

I actually needed to do this, but it involved running remote commands. We wanted to be able to stop the processes by closing the connection to the server. Also, if, for example, you are running in the Python REPL, you can select to run as foreground if you want to be able to use Ctrl-C to exit.
import os, signal, time

class CleanChildProcesses:
    """
    with CleanChildProcesses():
        Do work here
    """
    def __init__(self, time_to_die=5, foreground=False):
        self.time_to_die = time_to_die  # how long to give children to die before SIGKILL
        self.foreground = foreground  # If user wants to receive Ctrl-C
        self.is_foreground = False
        self.SIGNALS = (signal.SIGHUP, signal.SIGTERM, signal.SIGABRT, signal.SIGALRM, signal.SIGPIPE)
        self.is_stopped = True  # only call stop once (catch signal xor exiting 'with')

    def _run_as_foreground(self):
        if not self.foreground:
            return False
        try:
            fd = os.open(os.ctermid(), os.O_RDWR)
        except OSError:
            # Happens if process not run from terminal (tty, pty)
            return False
        os.close(fd)
        return True

    def _signal_hdlr(self, sig, frame):
        self.__exit__(None, None, None)

    def start(self):
        self.is_stopped = False
        """
        When running out of remote shell, SIGHUP is only sent to the session
        leader normally, the remote shell, so we need to make sure we are sent
        SIGHUP. This also allows us not to kill ourselves with SIGKILL.
        - A process group is called orphaned when the parent of every member is
          either in the process group or outside the session. In particular,
          the process group of the session leader is always orphaned.
        - If termination of a process causes a process group to become orphaned,
          and some member is stopped, then all are sent first SIGHUP and then
          SIGCONT.
        consider: prctl.set_pdeathsig(signal.SIGTERM)
        """
        self.childpid = os.fork()  # return 0 in the child branch, and the childpid in the parent branch
        if self.childpid == 0:
            try:
                os.setpgrp()  # create new process group, become its leader
                os.kill(os.getpid(), signal.SIGSTOP)  # child fork stops itself
            finally:
                os._exit(0)  # shut down without going to __exit__

        os.waitpid(self.childpid, os.WUNTRACED)  # wait until child stopped after it created the process group
        os.setpgid(0, self.childpid)  # join child's group

        if self._run_as_foreground():
            hdlr = signal.signal(signal.SIGTTOU, signal.SIG_IGN)  # ignore since would cause this process to stop
            self.controlling_terminal = os.open(os.ctermid(), os.O_RDWR)
            self.orig_fore_pg = os.tcgetpgrp(self.controlling_terminal)  # sends SIGTTOU to this process
            os.tcsetpgrp(self.controlling_terminal, self.childpid)
            signal.signal(signal.SIGTTOU, hdlr)
            self.is_foreground = True

        self.exit_signals = dict((s, signal.signal(s, self._signal_hdlr))
                                 for s in self.SIGNALS)

    def stop(self):
        try:
            for s in self.SIGNALS:
                # don't get interrupted while cleaning everything up
                signal.signal(s, signal.SIG_IGN)

            self.is_stopped = True

            if self.is_foreground:
                os.tcsetpgrp(self.controlling_terminal, self.orig_fore_pg)
                os.close(self.controlling_terminal)
                self.is_foreground = False

            try:
                os.kill(self.childpid, signal.SIGCONT)
            except OSError:
                """
                can occur if process finished and one of:
                - was reaped by another process
                - if parent explicitly ignored SIGCHLD
                  signal.signal(signal.SIGCHLD, signal.SIG_IGN)
                - parent has the SA_NOCLDWAIT flag set
                """
                pass

            os.setpgrp()  # leave the child's process group so I won't get signals
            try:
                os.killpg(self.childpid, signal.SIGINT)
                time.sleep(self.time_to_die)  # let processes end gracefully
                os.killpg(self.childpid, signal.SIGKILL)  # in case process gets stuck while dying
                os.waitpid(self.childpid, 0)  # reap zombie child process
            except OSError as e:
                pass
        finally:
            for s, hdlr in self.exit_signals.iteritems():
                signal.signal(s, hdlr)  # reset default handlers

    def __enter__(self):
        if self.is_stopped:
            self.start()

    def __exit__(self, exit_type, value, traceback):
        if not self.is_stopped:
            self.stop()
Thanks to Malcolm Handley for the initial design. Done with python2.7 on linux.

You can try subalive, a package I wrote for similar problem. It uses periodic alive ping via RPC, and the slave process automatically terminates when the master stops alive pings for some reason.
https://github.com/waszil/subalive
Example for master:
from subalive import SubAliveMaster
# start subprocess with alive keeping
SubAliveMaster(<path to your slave script>)
# do your stuff
# ...
Example for slave subprocess:
from subalive import SubAliveSlave
# start alive checking
SubAliveSlave()
# do your stuff
# ...

It's possible to get some more guarantees on Windows by spawning a separate process to oversee the destruction.
import subprocess
import sys
import os

def terminate_process_on_exit(process):
    if sys.platform == "win32":
        try:
            # Or provide this script normally.
            # Here just to make it somewhat self-contained.
            # see https://stackoverflow.com/a/22559493/3763139
            # see https://superuser.com/a/1299350/388191
            with open('.process_watchdog_helper.bat', 'x') as file:
                file.write(""":waitforpid
tasklist /nh /fi "pid eq %1" 2>nul | find "%1" >nul
if %ERRORLEVEL%==0 (
    timeout /t 5 /nobreak >nul
    goto :waitforpid
) else (
    wmic process where processid="%2" call terminate >nul
)""")
        except:
            pass

        # After this spawns we're pretty safe. There is a race, but we do what we can.
        subprocess.Popen(
            ['.process_watchdog_helper.bat', str(os.getpid()), str(process.pid)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL
        )

# example
class DummyProcess:
    def __init__(self, pid):
        self.pid = pid

terminate_process_on_exit(DummyProcess(7516))

This is what I did for my posix app:
When your app exits, call the kill() method of this class:
http://www.pixelbeat.org/libs/subProcess.py
Example use here:
http://code.google.com/p/fslint/source/browse/trunk/fslint-gui#608

Relevant help from the Python docs:
http://docs.python.org/dev/library/subprocess.html#subprocess.Popen.wait
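A minimal sketch of that call (the command is just an example):
import subprocess

p = subprocess.Popen(["sleep", "1"])
returncode = p.wait()   # blocks until the child exits and reaps it
print returncode
(If the child writes a lot of output to a PIPE, the docs recommend communicate() rather than wait() to avoid a deadlock.)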

Related

multiprocessing produces defunct process

I use Tornado as a web server. Users can submit a task through the front-end page; after auditing, they can start the submitted task. In this situation, I want to start an asynchronous subprocess to handle the task, so I wrote the following code in a request handler:
def task_handler():
    # handle task here

def start_a_process_for_task():
    p = multiprocessing.Process(target=task_handler, args=())
    p.start()
    return 0
I don't care about the subprocess; I just start a process for the task, return to the front-end page, and tell the user the task has started. The task itself runs in the background and records its status or results to the database, so the user can view them on the web page later. So I don't want to use p.join(), which is blocking, but without p.join(), after the task finishes the subprocess becomes a defunct process, and since Tornado runs as a daemon and never exits, the defunct process never disappears.
Does anyone know how to fix this problem? Thanks.
The proper way to avoid defunct children is for the parent to gracefully clean up and close all resources of the exited child. This is normally done by join(), but if you want to avoid that, another approach could be to set up a global handler for the SIGCHLD signal on the parent.
SIGCHLD will be emitted whenever a child exits, and in the handler function you should either call Process.join() if you still have access to the process object, or even use os.wait() to "wait" for any child process to terminate and properly reap it. The wait time here should be 0 as you know for sure a child process has just exited. You will also be able to get the process' exit code / termination signal so it can also be a useful method to handle / log child process crashes.
Here's a quick example of doing this:
from __future__ import print_function
import os
import signal
import time
from multiprocessing import Process

def child_exited(sig, frame):
    pid, exitcode = os.wait()
    print("Child process {pid} exited with code {exitcode}".format(
        pid=pid, exitcode=exitcode
    ))

def worker():
    time.sleep(5)
    print("Process {pid} has completed it's work".format(pid=os.getpid()))

def parent():
    children = []
    # Comment out the following line to see zombie children
    signal.signal(signal.SIGCHLD, child_exited)
    for i in range(5):
        c = Process(target=worker)
        c.start()
        print("Parent forked out worker process {pid}".format(pid=c.pid))
        children.append(c)
        time.sleep(1)
    print("Forked out {c} workers, hit Ctrl+C to end...".format(c=len(children)))
    while True:
        time.sleep(5)

if __name__ == '__main__':
    parent()
One caveat is that I am not sure if this process works on non-Unix operating systems. It should work on Linux, Mac and other Unixes.
You need to join your subprocesses if you do not want to create zombies. You can do it in threads.
This is a dummy example. After 10 seconds, all your subprocesses are gone instead of being zombies. It launches a thread for every subprocess. The threads themselves do not need to be joined or waited on: each thread executes the subprocess, joins it, and then exits as soon as the subprocess completes.
import multiprocessing
import threading
from time import sleep

def task_processor():
    sleep(10)

class TaskProxy(threading.Thread):
    def __init__(self):
        super(TaskProxy, self).__init__()

    def run(self):
        p = multiprocessing.Process(target=task_processor, args=())
        p.start()
        p.join()

def task_handler():
    t = TaskProxy()
    t.daemon = True
    t.start()
    return

for _ in xrange(0, 20):
    task_handler()

sleep(60)

how to handle the commands that are hung indefinitely [duplicate]

Is there any argument or options to setup a timeout for Python's subprocess.Popen method?
Something like this:
subprocess.Popen(['..'], ..., timeout=20) ?
I would advise taking a look at the Timer class in the threading module. I used it to implement a timeout for a Popen.
First, create a callback:
def timeout(p):
    if p.poll() is None:
        print 'Error: process taking too long to complete--terminating'
        p.kill()
Then open the process:
proc = Popen( ... )
Then create a timer that will call the callback, passing the process to it.
t = threading.Timer( 10.0, timeout, [proc] )
t.start()
t.join()
Somewhere later in the program, you may want to add the line:
t.cancel()
Otherwise, the python program will keep running until the timer has finished running.
EDIT: I was advised that there is a race condition that the subprocess p may terminate between the p.poll() and p.kill() calls. I believe the following code can fix that:
import errno

def timeout(p):
    if p.poll() is None:
        try:
            p.kill()
            print 'Error: process taking too long to complete--terminating'
        except OSError as e:
            if e.errno != errno.ESRCH:
                raise
Though you may want to clean the exception handling to specifically handle just the particular exception that occurs when the subprocess has already terminated normally.
subprocess.Popen doesn't block so you can do something like this:
import subprocess
import time

p = subprocess.Popen(['...'])
time.sleep(20)
if p.poll() is None:
    p.kill()
    print 'timed out'
else:
    print p.communicate()
It has a drawback in that you must always wait at least 20 seconds for it to finish.
import subprocess, threading

class Command(object):
    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None

    def run(self, timeout):
        def target():
            print 'Thread started'
            self.process = subprocess.Popen(self.cmd, shell=True)
            self.process.communicate()
            print 'Thread finished'

        thread = threading.Thread(target=target)
        thread.start()

        thread.join(timeout)
        if thread.is_alive():
            print 'Terminating process'
            self.process.terminate()
            thread.join()
        print self.process.returncode

command = Command("echo 'Process started'; sleep 2; echo 'Process finished'")
command.run(timeout=3)
command.run(timeout=1)
The output of this should be:
Thread started
Process started
Process finished
Thread finished
0
Thread started
Process started
Terminating process
Thread finished
-15
where it can be seen that, in the first execution, the process finished correctly (return code 0), while in the second one the process was terminated (return code -15).
I haven't tested this on Windows, but aside from updating the example command, I think it should work, since I haven't found anything in the documentation that says thread.join or process.terminate is not supported.
You could do
from twisted.internet import reactor, protocol, error, defer

class DyingProcessProtocol(protocol.ProcessProtocol):
    def __init__(self, timeout):
        self.timeout = timeout

    def connectionMade(self):
        @defer.inlineCallbacks
        def killIfAlive():
            try:
                yield self.transport.signalProcess('KILL')
            except error.ProcessExitedAlready:
                pass

        d = reactor.callLater(self.timeout, killIfAlive)

reactor.spawnProcess(DyingProcessProtocol(20), ...)
using Twisted's asynchronous process API.
A python subprocess auto-timeout is not built in, so you're going to have to build your own.
This works for me on Ubuntu 12.10 running python 2.7.3
Put this in a file called test.py
#!/usr/bin/python
import subprocess
import threading

class RunMyCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = subprocess.Popen(self.cmd)
        self.p.wait()

    def run_the_process(self):
        self.start()
        self.join(self.timeout)

        if self.is_alive():
            self.p.terminate()  # if your process needs a kill -9 to make
                                # it go away, use self.p.kill() here instead.
            self.join()

RunMyCmd(["sleep", "20"], 3).run_the_process()
Save it, and run it:
python test.py
The sleep 20 command takes 20 seconds to complete. If it doesn't terminate in 3 seconds (it won't) then the process is terminated.
el@apollo:~$ python test.py
el@apollo:~$
There is three seconds between when the process is run, and it is terminated.
As of Python 3.3, there is also a timeout argument to the blocking helper functions in the subprocess module.
https://docs.python.org/3/library/subprocess.html
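For example, a minimal sketch using that timeout argument (Python 3.3+):
import subprocess

try:
    subprocess.check_call(["sleep", "60"], timeout=20)
except subprocess.TimeoutExpired:
    # the child is killed and waited for, then the exception is re-raised
    print("timed out")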
Unfortunately, there isn't such a solution. I managed to do this using a threaded timer that would launch along with the process that would kill it after the timeout but I did run into some stale file descriptor issues because of zombie processes or some such.
No, there is no timeout. I guess what you are looking for is to kill the subprocess after some time. Since you are able to signal the subprocess, you should be able to kill it too.
A generic approach to sending a signal to a subprocess:
import os, signal, subprocess, sys, time

proc = subprocess.Popen([command])
time.sleep(1)
print 'signaling child'
sys.stdout.flush()
os.kill(proc.pid, signal.SIGUSR1)
You could use this mechanism to terminate after a time out period.
Yes, https://pypi.python.org/pypi/python-subprocess2 will extend the Popen module with two additional functions,
Popen.waitUpTo(timeout=seconds)
This will wait up to a certain number of seconds for the process to complete, and otherwise return None.
also,
Popen.waitOrTerminate
This will wait up to a point, and then call .terminate(), then .kill(), one or the other or some combination of both; see the docs for full details:
http://htmlpreview.github.io/?https://github.com/kata198/python-subprocess2/blob/master/doc/subprocess2.html
For Linux, you can use a signal. This is platform dependent so another solution is required for Windows. It may work with Mac though.
def launch_cmd(cmd, timeout=0):
    '''Launch an external command

    It launches the program redirecting the program's STDIO
    to a communication pipe, and appends those responses to
    a list. Waits for the program to exit, then returns the
    output lines.

    Args:
        cmd: command line of the external program to launch
        timeout: time to wait for the command to complete, 0 for indefinitely
    Returns:
        A list of the response lines from the program
    '''
    import subprocess
    import signal

    class Alarm(Exception):
        pass

    def alarm_handler(signum, frame):
        raise Alarm

    lines = []
    if not launch_cmd.init:
        launch_cmd.init = True
        signal.signal(signal.SIGALRM, alarm_handler)
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    signal.alarm(timeout)  # timeout sec
    try:
        for line in p.stdout:
            lines.append(line.rstrip())
        p.wait()
        signal.alarm(0)  # disable alarm
    except:
        print "launch_cmd taking too long!"
        p.kill()
    return lines
launch_cmd.init = False

Python in Linux: kill processes and sub-processes using the shell

Q: Given an ever-running Python program that runs another Python program as its child, how can one kill the processes using the shell [i.e. by fetching the processes' pids and then executing kill -9 <pid>]?
In more details:
I have a script as follows:
from subprocess import *

while True:
    try:
        Popen("python ...").wait()  # some script
    except:
        exit(1)
    try:
        Popen("python ...").wait()  # some script
    except:
        exit(1)
Now when I want to kill this process and its children, I:
Run "ps -ef | grep python" to fetch the pids.
Run kill -9 <pid> to kill the processes.
The result: the processes keep on running after being assigned new pids.
Is there a graceful way to enable the processes to gracefully exit when killed?
Is there a graceful way to enable the processes to gracefully exit when killed?
There isn't when you kill -9. Kill with SIGINT (-2) or SIGTERM (-15), and catch that using the signal module by registering a cleanup function that handles the graceful exit.
import sys
import signal

def cleanup_function(signal, frame):
    # clean up all resources
    sys.exit(0)

signal.signal(signal.SIGINT, cleanup_function)
In the code below, the parent waits for the child's exit status; only once it gets that exit status does it proceed to the next iteration.
Also, you can't catch SIGKILL (SIGKILL and SIGSTOP are uncatchable signals).
-9 means SIGKILL.
You can implement a signal handler for any other signal:
import os
import time

def my_job():
    print 'I am {0}, son/daughter of {1}'.format(os.getpid(), os.getppid())
    time.sleep(50)
    pass

if __name__ == '__main__':
    while True:
        pid = os.fork()
        if pid > 0:
            expired_child = os.wait()  # if the child is killed, returns a tuple containing its pid and exit status indication
            if expired_child:
                continue
        else:
            my_job()

python daemon thread exits but process still run in the background

I am using Python 2.7, and a Python thread doesn't kill its process after the main program exits (checked with the ps -ax command on an Ubuntu machine).
I have the below thread class,
import os
import threading

class captureLogs(threading.Thread):
    '''
    initialize the constructor
    '''
    def __init__(self, deviceIp, fileTag):
        threading.Thread.__init__(self)
        super(captureLogs, self).__init__()
        self._stop = threading.Event()
        self.deviceIp = deviceIp
        self.fileTag = fileTag

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

    '''
    define the run method
    '''
    def run(self):
        '''
        Make the thread capture logs
        '''
        cmdTorun = "adb logcat > " + self.deviceIp + '_' + self.fileTag + '.log'
        os.system(cmdTorun)
And I am creating a thread in another file sample.py,
import logCapture
import os
import time
c = logCapture.captureLogs('100.21.143.168','somefile')
c.setDaemon(True)
c.start()
print "Started the log capture. now sleeping. is this a dameon?", c.isDaemon()
time.sleep(5)
print "Sleep tiime is over"
c.stop()
print "Calling stop was successful:", c.stopped()
print "Thread is now completed and main program exiting"
I get the below output from the command line:
Started the log capture. now sleeping. is this a dameon? True
Sleep tiime is over
Calling stop was successful: True
Thread is now completed and main program exiting
And the sample.py exits.
But when I use below command on a terminal,
ps -ax | grep "adb"
I still see the process running. (I am killing them manually now using the kill -9 17681 17682)
Not sure what I am missing here.
My question is,
1) why is the process still alive when I already killed it in my program?
2) Will it create any problem if I don't bother about it?
3) is there any other better way to capture logs using a thread and monitor the logs?
EDIT: As suggested by @Bug Killer, I added the below method in my thread class,
def getProcessID(self):
    return os.getpid()
and used os.kill(c.getProcessID(), SIGTERM) in my sample.py. The program doesn't exit at all.
It is likely because you are using os.system in your thread. The spawned process from os.system will stay alive even after the thread is killed. Actually, it will stay alive forever unless you explicitly terminate it in your code or by hand (which it sounds like you are doing ultimately) or the spawned process exits on its own. You can do this instead:
import atexit
import subprocess
deviceIp = '100.21.143.168'
fileTag = 'somefile'
# this is spawned in the background, so no threading code is needed
cmdTorun = "adb logcat > " + deviceIp +'_'+fileTag+'.log'
proc = subprocess.Popen(cmdTorun, shell=True)
# or register proc.kill if you feel like living on the edge
atexit.register(proc.terminate)
# Here is where all the other awesome code goes
Since all you are doing is spawning a process, creating a thread to do it is overkill and only complicates your program logic. Just spawn the process in the background as shown above and then let atexit terminate it when your program exits. And/or call proc.terminate explicitly; it should be fine to call repeatedly (much like close on a file object) so having atexit call it again later shouldn't hurt anything.

Catch KeyboardInterrupt or handle signal in thread

I have some threads running, and one of those threads contains an object that will be spawning subprocesses. I want one such subprocess to be able to kill the entire application. The aforementioned object will need to save some state when it receives this signal. Unfortunately I can't get the signal to be handled in the thread that causes the kill.
Here is some example code that attempts to replicate the situation.
parent.py: starts a thread. that thread runs some subprocesses, one of which will try to kill the parent process.
#!/usr/local/bin/python3
import subprocess, time, threading, random

def killer_func():
    possible_cmds = [['echo', 'hello'],
                     ['echo', 'world'],
                     ['/work/turbulencetoo/tmp/killer.py']
                     ]
    random.shuffle(possible_cmds)
    for cmd in possible_cmds:
        try:
            time.sleep(2)
            subprocess.check_call(cmd)
            time.sleep(2)
        except KeyboardInterrupt:
            print("Kill -2 caught properly!!")
            print("Here I could properly save my state")
            break
        except Exception as e:
            print("Unhandled Exception: {}".format(e))
        else:
            print("No Exception")

killer_thread = threading.Thread(target=killer_func)
killer_thread.start()

try:
    while True:
        killer_thread.join(4)
        if not killer_thread.is_alive():
            print("The killer thread has died")
            break
        else:
            print("Killer thread still alive, try to join again.")
except KeyboardInterrupt:
    print("Caught the kill -2 in the main thread :(")

print("Main program shutting down")
killer.py, a simple program that tries to kill its parent process with SIGINT:
#!/usr/local/bin/python3
import time, os, subprocess, sys
ppid = os.getppid()
# -2 specifies SIGINT, python handles this as a KeyboardInterrupt exception
cmd = ["kill", "-2", "{}".format(ppid)]
subprocess.check_call(cmd)
time.sleep(3)
sys.exit(0)
Here is some sample output from running the parent program:
$ ./parent.py
hello
Killer thread still alive, try to join again.
No Exception
Killer thread still alive, try to join again.
Caught the kill -2 in the main thread :(
Main program shutting down
No Exception
world
No Exception
I've tried using signal.signal() inside killer_func, but it doesn't work in a sub thread.
Is there a way to force the signal or exception to be handled by the function without the main thread being aware?
The main thread of your program will always be the one that receives the signal. The signal module documentation states this:
Some care must be taken if both signals and threads are used in the
same program. The fundamental thing to remember in using signals and
threads simultaneously is: always perform signal() operations in the
main thread of execution. Any thread can perform an alarm(),
getsignal(), pause(), setitimer() or getitimer(); only the main thread
can set a new signal handler, and the main thread will be the only one
to receive signals (this is enforced by the Python signal module, even
if the underlying thread implementation supports sending signals to
individual threads). This means that signals can’t be used as a means
of inter-thread communication. Use locks instead.
You'll need to refactor your program such that the main thread receiving the signal doesn't prevent you from saving state. The easiest way is use something like threading.Event() to tell the background thread that the program has been aborted, and let it clean up when it sees the event has been set:
import subprocess
import threading
import random

def killer_func(event):
    possible_cmds = [['echo', 'hello'],
                     ['echo', 'world'],
                     ['/home/cycdev/killer.py']
                     ]
    random.shuffle(possible_cmds)
    for cmd in possible_cmds:
        subprocess.check_call(cmd)
        event.wait(4)
        if event.is_set():
            print("Main thread got a signal. Time to clean up")
            # save state here.
            return

event = threading.Event()
killer_thread = threading.Thread(target=killer_func, args=(event,))
killer_thread.start()

try:
    killer_thread.join()
except KeyboardInterrupt:
    print("Caught the kill -2 in the main thread :)")
    event.set()
    killer_thread.join()

print("Main program shutting down")
Signals are always handled in the main thread. When you receive a signal, you don't know where it comes from. You can't say "handle it in the thread that spawned the signal-sending-process" because you don't know what signal-sending-process is.
The way to solve this is to use Condition Variables to notify all threads that a signal was received and that they have to shut down.
import threading

got_interrupt = False  # global variable

def killer_func(cv):
    ...
    with cv:
        cv.wait(2)
        interrupted = got_interrupt  # Read got_interrupt while holding the lock
    if interrupted:
        cleanup()
    ...

lock = threading.Lock()
notifier_cv = threading.Condition(lock)

killer_thread = threading.Thread(target=killer_func, args=(notifier_cv,))
killer_thread.start()

try:
    ...
except KeyboardInterrupt:
    with notifier_cv:
        got_interrupt = True
        notifier_cv.notify_all()
