I have a Python process which is spawning another process from a separate thread, e.g.
import subprocess
import threading
import unittest

class MyClass(unittest.TestCase):
    def setup(self):
        def spawn_proc():
            subprocess.call("test_process")
        thread = threading.Thread(target=spawn_proc, args=(), daemon=True)
        thread.start()

    def cleanup(self):
        # ### kill test_process
So calling MyClass.setup() means test_process will be spawned in a second thread.
What I want is a way to kill test_process from the first thread. I've tried saving a reference to the process in spawn_proc(), but this is inaccessible in the first thread, as spawn_proc() is executed in the second thread.
What is the best way to do this? Or is this approach incorrect from the off?
What does work is another call to subprocess to look up the PID from the OS, then a further call to kill it, but I'm not sure if there is a better way.
The problem is that subprocess.call() doesn't return any process handle. It's a synchronous call (it returns only when the called program has terminated).
Instead use subprocess.Popen():
def setup(self):
    self.proc = subprocess.Popen("test_process")

def cleanup(self):
    self.proc.kill()
Not only do you get a handle, you also avoid the threading module altogether.
More details on Popen (e.g. how to communicate with the process):
https://docs.python.org/2/library/subprocess.html#popen-constructor
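For completeness, a fuller sketch in unittest terms. Note that unittest only runs these hooks automatically if they are spelled setUp and tearDown; the wait() after kill() reaps the child so it doesn't linger as a zombie on POSIX ("test_process" stands in for your real binary, as above):
import subprocess
import unittest

class MyClass(unittest.TestCase):
    def setUp(self):
        # Popen returns immediately with a process handle; no thread needed.
        self.proc = subprocess.Popen("test_process")

    def tearDown(self):
        self.proc.kill()
        self.proc.wait()  # reap the child so it doesn't become a zombie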
I would like to know if there is a way to recover the return code when the asynchronous process has been terminated by a timeout.
The constraints are that I want to recover this code in another class that is in another Python file. In addition, I do not want to block my GUI...
In my MainView.py, my code is this:
if self.comboBox.currentText() == "HASHCAT":
    self.process = Hashcat(MainWindow.hashcatPath, 100, 3, MainWindow.hashFilePath,
                           MainWindow.dictPath, MainWindow.pathOutFile)
    self.process.run(2)
And my Hashcat.py file looks like this:
def run(self, timeout):
    def target():
        FNULL = open(os.devnull, 'w')
        if self.typeAttack == 0:
            self.subprocess = subprocess.Popen(
                [self.pathHashcat, "-m", str(self.algoHash), "-a", str(self.typeAttack),
                 self.pathHashFile, self.pathDict, "-o", self.pathOutFile],
                stdout=FNULL, stderr=subprocess.STDOUT)
        if self.typeAttack == 3:
            self.subprocess = subprocess.Popen(
                [self.pathHashcat, "-m", str(self.algoHash), "-a", str(self.typeAttack),
                 self.pathHashFile, "-o", self.pathOutFile])
        self.timer.start()
        self.subprocess.wait()
        self.timer.cancel()

    def timer_callback():
        print('Terminating process (timed out)')
        self.subprocess.terminate()

    self.thread = threading.Thread(target=target)
    self.timer = threading.Timer(timeout, timer_callback)
    self.thread.start()
    print(self.timer.isAlive)
Calling terminate just sends the signal to kill the process; you may still have to wait on it before you can get the returncode; otherwise, it will still be None.
However, the returncode is unlikely to be all that meaningful. You just killed the process with a SIGTERM, so the returncode is going to be -SIGTERM.
If the problem is just that terminate takes too long, or isn't deterministic—well, SIGTERM is meant to be something the child process can use for clean shutdown, which can take time—and can even fail to do anything, if the child has a serious bug. If you really want it to go away immediately, you need to send a SIGKILL instead. (This is the difference between kill 12345 and kill -9 12345 from the terminal.) The way to do that from subprocess is to call the kill method instead of terminate.
The ideal solution is usually to have a double-timeout—e.g., terminate after X seconds, then kill if another Y seconds have passed without termination. This gives the process a chance to do graceful shutdown whenever possible, but still guarantees deterministic killing after X+Y seconds. But it depends—for some uses of some programs, giving the child an extra Y seconds to hopefully finish is more important than giving it Y seconds to clean up. Or it doesn't make much difference either way, and the single-step kill is just simpler to code.
(This is all a bit different if you're on Windows, but since you're on OS X, that's irrelevant.)
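A minimal sketch of the double-timeout idea, assuming Python 3.3+ for the timeout parameter of wait() (the function and parameter names are illustrative):
import subprocess

def run_with_grace(cmd, soft_timeout, grace_period):
    # Terminate after soft_timeout seconds; kill after grace_period more.
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=soft_timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()                        # polite SIGTERM first
        try:
            return proc.wait(timeout=grace_period)
        except subprocess.TimeoutExpired:
            proc.kill()                         # SIGKILL if SIGTERM was ignored
            return proc.wait()                  # reap; returncode will be -9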
I'm trying to use Popen to create a subprocess A along with a thread that communicates with it using Popen.communicate. The main process will wait on the thread using Thread.join with a specified timeout, and kills A after that timeout expires, which should cause the thread to die as well.
However, this doesn't seem to work when A itself spawns more subprocesses B, C, and D in different process groups than A, which refuse to die. Even after A is dead and labelled defunct, and even after the main process reaps A using os.waitpid() so that it no longer exists, the thread refuses to join with the main thread.
Only after all the children, B, C, D are killed, does Popen.communicate finally return.
Is this behavior actually expected from the module? A recursive wait might be useful in some cases, but it's certainly not appropriate as the default behavior for Popen.communicate. And if this is the intended behavior, is there any way to override it?
Here's a very simple example:
from subprocess import PIPE, Popen
from threading import Thread
import os
import time
import signal

DEVNULL = open(os.devnull, 'w')
proc = Popen(["/bin/bash"], stdin=PIPE, stdout=PIPE,
             stderr=DEVNULL, start_new_session=True)

def thread_function():
    print("Entering thread")
    return proc.communicate(input=b"nohup sleep 100 &\nexit\n")

thread = Thread(target=thread_function)
thread.start()
time.sleep(1)
proc.kill()

while True:
    thread.join(timeout=5)
    if not thread.is_alive():
        break
    print("Thread still alive")
This is on Linux.
I think this comes from a fairly natural way to write the Popen.communicate method on Linux. proc.communicate() reads from the pipe connected to the child's stdout, which returns EOF only once every process holding the write end has closed it. Then it does the wait to get the exit code of the process.
In your example, the sleep process inherits the stdout file descriptor from the bash process. So when the bash process dies, proc.communicate() doesn't get an EOF on the stdout pipe, as the sleep still has it open. The simplest way to fix this is to change the communicate line to:
return proc.communicate(input=b"nohup sleep 100 >/dev/null&\nexit\n")
This causes your thread to end as soon as the bash process dies... due to the exit, not your proc.kill(), in this case. However, the sleep is still running after bash dies, whether you use the exit statement or the proc.kill() call. If you want to kill the sleep as well, I would use
os.killpg(proc.pid, 15)
instead of the proc.kill(). The more general problem of killing B, C and D if they change the group is a more complex problem.
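Spelled out with symbolic names, a minimal sketch of that call (it works here because start_new_session=True made bash the leader of a new process group, so its PID doubles as the process-group ID):
import os
import signal

# Signal the whole group: bash plus any children still in its group.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)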
Additional data:
I couldn't find any official documentation for this behaviour of proc.communicate, but I forgot the most obvious place :-) I found it with the help of this answer. The docs for communicate say:
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.
You are getting stuck at step 2: Read until end-of-file, because the sleep is keeping the pipe open.
I am trying to run the Robocopy command (but I am curious about any subprocess) from Python on Windows. The code is pretty simple and works well. It is:
def copy(media_path, destination_path, log_path):
    with Popen(['Robocopy', media_path, destination_path, '/E', '/mir', '/TEE',
                '/log+:' + log_path],
               stdout=PIPE, bufsize=1, universal_newlines=True) as Robocopy:
        Robocopy.wait()
        returncode = Robocopy.returncode
Additionally I am running it in a separate thread with the following:
threading.Thread(target=copy, args=(media_path, destination_path, log_path,), daemon=True)
However, there are certain instances where I want to stop the robocopy (akin to closing the CMD window if it was run from the command line)
Is there a good way to do this in Python?
We fought with reliably killing subprocesses on Windows for a while and eventually came across this:
https://github.com/andreisavu/python-process/blob/master/killableprocess.py
It implements a kill() method for killing your subprocess. We've had really good results with it.
You will need to somehow pass the process object out of the thread and call kill() from another thread, or poll in your thread with wait() using a timeout while monitoring some kind of global-ish flag.
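A sketch of the polling variant, assuming Python 3.3+ for the timeout parameter of wait() (the flag and names are illustrative):
import subprocess
import threading

stop_requested = threading.Event()  # the "global-ish flag", set from another thread

def copy_worker(cmd):
    proc = subprocess.Popen(cmd)
    while True:
        try:
            proc.wait(timeout=1)         # poll with a short timeout
            break                        # process finished on its own
        except subprocess.TimeoutExpired:
            if stop_requested.is_set():
                proc.kill()              # or killableprocess's kill()
                proc.wait()
                break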
If the process doesn't start other processes then process.kill() should work:
import subprocess

class InterruptableProcess:
    def __init__(self, *args):
        self._process = subprocess.Popen(args)

    def interrupt(self):
        self._process.kill()
I don't see why you would need it on Windows, but you could run Thread(target=self._process.wait, daemon=True).start() if you'd like.
If there is a possibility that the process may start other processes in turn, then you might need a Job object to kill all the descendant processes. It seems killableprocess.py, which is suggested by @rrauenza, uses this approach (I haven't tested it). See Python: how to kill child process(es) when parent dies?.
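If setting up a Job object is more than you need, a cruder sketch that kills a whole process tree on Windows using the taskkill utility (/T kills the tree, /F forces termination):
import subprocess

def kill_tree(pid):
    # taskkill walks the child processes for us.
    subprocess.call(['taskkill', '/PID', str(pid), '/T', '/F'])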
I'm using python to benchmark something. This can take a large amount of time, and I want to set a (global) timeout. I use the following script (summarized):
import signal
import subprocess

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

signal.signal(signal.SIGALRM, timeout_handler)

# Halt problem after half an hour
signal.alarm(1800)
try:
    while solution is None:
        guess = guess()
        try:
            with open(solutionfname, 'wb') as solutionf:
                solverprocess = subprocess.Popen(["solver", problemfname], stdout=solutionf)
                solverprocess.wait()
        finally:
            # `solverprocess.poll() == None` instead of try didn't work either
            try:
                solverprocess.kill()
            except:
                # Solver process was already dead
                pass
except TimeoutException:
    pass

# Cancel alarm if it's still active
signal.alarm(0)
However, it keeps spawning orphan processes sometimes, but I can't reliably recreate the circumstances. Does anyone know the correct way to prevent this?
You simply have to wait after killing the process.
The documentation for the kill() method states:
Kills the child. On Posix OSs the function sends SIGKILL to the child.
On Windows kill() is an alias for terminate().
In other words, if you aren't on Windows, you are only sending a signal to the subprocess.
This will create a zombie process because the parent process didn't read the return value of the subprocess.
The kill() and terminate() methods are just shortcuts to send_signal(SIGKILL) and send_signal(SIGTERM).
Try adding a call to wait() after the kill(). This is even shown in the example under the documentation for communicate():
proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()
Note the call to communicate() after the kill(). (It is equivalent to calling wait() and also reading the outputs of the subprocess.)
I want to clarify one thing: it seems like you don't understand exactly what a zombie process is. A zombie process is a terminated process. The kernel keeps the process in the process table until the parent process reads its exit status. I believe all memory used by the subprocess is actually freed; the kernel only has to keep track of the exit status of such a process.
So, the zombie processes you see aren't running. They are already completely dead, and that's why they are called zombie. They are "alive" in the process table, but aren't really running at all.
Calling wait() does exactly this: wait till the subprocess ends and read the exit status. This allows the kernel to remove the subprocess from the process table.
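Put in the question's own terms, a minimal sketch (POSIX assumed):
import signal

solverprocess.kill()             # delivers SIGKILL; the child becomes a zombie
solverprocess.wait()             # reaps it, removing the process-table entry
# After a kill(), returncode is the negated signal number:
assert solverprocess.returncode == -signal.SIGKILL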
On Linux, you can use python-prctl.
Define a preexec function such as:
def pre_exec():
    import signal
    import prctl
    prctl.set_pdeathsig(signal.SIGTERM)
And pass it to your Popen call:
subprocess.Popen(..., preexec_fn=pre_exec)
That's as simple as that. Now the child process will die rather than become an orphan if the parent dies.
If you don't like the external dependency of python-prctl you can also use the older prctl. Instead of
prctl.set_pdeathsig(signal.SIGTERM)
you would have
prctl.prctl(prctl.PDEATHSIG, signal.SIGTERM)
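Putting the pieces together, a sketch assuming python-prctl is installed (Linux only; sleep stands in for the real child):
import signal
import subprocess
import prctl

def pre_exec():
    # Ask the kernel to SIGTERM this child when its parent dies.
    prctl.set_pdeathsig(signal.SIGTERM)

proc = subprocess.Popen(["sleep", "100"], preexec_fn=pre_exec)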
Learning about Python Multiprocessing (from a PMOTW article) and would love some clarification on what exactly the join() method is doing.
In an old tutorial from 2008 it states that without the p.join() call in the code below, "the child process will sit idle and not terminate, becoming a zombie you must manually kill".
from multiprocessing import Process

def say_hello(name='world'):
    print "Hello, %s" % name

p = Process(target=say_hello)
p.start()
p.join()
I added a printout of the PID as well as a time.sleep to test and as far as I can tell, the process terminates on its own:
from multiprocessing import Process
import sys
import time

def say_hello(name='world'):
    print "Hello, %s" % name
    print 'Starting:', p.name, p.pid
    sys.stdout.flush()
    print 'Exiting :', p.name, p.pid
    sys.stdout.flush()
    time.sleep(20)

p = Process(target=say_hello)
p.start()
# no p.join()
within 20 seconds:
936 ttys000 0:00.05 /Library/Frameworks/Python.framework/Versions/2.7/Reso
938 ttys000 0:00.00 /Library/Frameworks/Python.framework/Versions/2.7/Reso
947 ttys001 0:00.13 -bash
after 20 seconds:
947 ttys001 0:00.13 -bash
Behavior is the same with p.join() added back at the end of the file. Python Module of the Week offers a very readable explanation of the module: "To wait until a process has completed its work and exited, use the join() method." But it seems like at least OS X was doing that anyway.
Am also wondering about the name of the method. Is the .join() method concatenating anything here? Is it concatenating a process with its end? Or does it just share a name with Python's native str.join() method?
The join() method, when used with threading or multiprocessing, is not related to str.join() - it's not actually concatenating anything together. Rather, it just means "wait for this [thread/process] to complete". The name join is used because the multiprocessing module's API is meant to look similar to the threading module's API, and the threading module uses join for its Thread object. Using the term join to mean "wait for a thread to complete" is common across many programming languages, so Python just adopted it as well.
Now, the reason you see the 20 second delay both with and without the call to join() is because by default, when the main process is ready to exit, it will implicitly call join() on all running multiprocessing.Process instances. This isn't as clearly stated in the multiprocessing docs as it should be, but it is mentioned in the Programming Guidelines section:
Remember also that non-daemonic processes will be joined automatically.
You can override this behavior by setting the daemon flag on the Process to True prior to starting the process:
p = Process(target=say_hello)
p.daemon = True
p.start()
# Both parent and child will exit here, since the main process has completed.
If you do that, the child process will be terminated as soon as the main process completes:
daemon
The process’s daemon flag, a Boolean value. This must be set before
start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic
child processes.
Without the join(), the main process can complete before the child process does. I'm not sure under what circumstances that leads to zombieism.
The main purpose of join() is to ensure that a child process has completed before the main process does anything that depends on the work of the child process.
The etymology of join() is that it's the opposite of fork, which is the common term in Unix-family operating systems for creating child processes. A single process "forks" into several, then "joins" back into one.
I'm not going to explain in detail what join does, but here's the etymology and the intuition behind it, which should help you remember its meaning more easily.
The idea is that execution "forks" into multiple processes of which one is the main/primary process, the rest workers (or minor/secondary). When the workers are done, they "join" the main process so that serial execution may be resumed.
The join() causes the main process to wait for a worker to join it. The method might better have been called "wait", since that's the actual behavior it causes in the master (and that's what it's called in POSIX, although POSIX threads call it "join" as well). The joining only occurs as an effect of the threads cooperating properly, it's not something the main process does.
The names "fork" and "join" have been used with this meaning in multiprocessing since 1963.
The join() call ensures that subsequent lines of your code are not called before all the multiprocessing processes are completed.
For example, without the join(), the following code will call restart_program() even before the processes finish, which is effectively asynchronous and not what we want (you can try):
import multiprocessing

processes = []
num_processes = 5
for i in range(num_processes):
    p = multiprocessing.Process(target=calculate_stuff, args=(i,))
    p.start()
    processes.append(p)

for p in processes:
    p.join()  # call to ensure the subsequent line (restart_program)
              # is not called until all processes finish

restart_program()
join() is used to wait for the worker processes to exit. (For a multiprocessing.Pool, one must call close() or terminate() before using join().)
Like @Russell mentioned, join is like the opposite of fork (which spawns sub-processes).
For join to run, you have to first run close(), which will prevent any more tasks from being submitted to the pool and exit once all tasks complete. Alternatively, running terminate() will just exit by stopping all worker processes immediately.
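A minimal sketch of that close()/join() ordering with a multiprocessing.Pool (square is just a stand-in task):
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = pool.map(square, range(10))
    pool.close()   # no more tasks may be submitted to the pool
    pool.join()    # wait for the worker processes to exit
    print(results)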
"the child process will sit idle and not terminate, becoming a zombie you must manually kill" this is possible when the main (parent) process exits but the child process is still running and once completed it has no parent process to return its exit status to.
To wait until a process has completed its work and exited, use the join() method.
and
Note It is important to join() the process after terminating it in order to give the background machinery time to update the status of the object to reflect the termination.
This is a good example that helped me understand it: here
One thing I noticed personally was that my main process paused until the child had finished when using the join() method, which defeated the point of me using multiprocessing.Process() in the first place.