Python multiprocessing: why is process defunct after terminate?

I have some Python multiprocessing code in which the parent process starts a bunch of child worker processes and then terminates them after a while:
from multiprocessing import Process

nWorkers = 10
curWorkers = []
for iw in range(nWorkers):
    pq = Process(target=worker, args=(workers_args_here,))  # placeholder args
    pq.start()
    curWorkers.append(pq)

# Do work here...

for pw in curWorkers:
    pw.terminate()
However, the child processes are all showing as defunct long after termination. Are they zombie processes? More importantly, how should I terminate them so that they really go away?

Try adding:
for pw in curWorkers:
    pw.join()
at the end. .terminate() just kills the process. The parent process still needs to reap it (at least on Linux-y systems) before the child process goes away entirely.

Related

In Python does the parent process continue to exist as long as any non-daemonic child processes are running

I am using the multiprocessing module of Python. I am testing the following code:
from multiprocessing import Process
from time import sleep

def f():
    print('in child#1 proc')
    sleep(2)
    print('ch#1 ends')

def f1():
    print('in child#2 proc')
    sleep(10)
    print('ch#2 ends')

if __name__ == '__main__':
    p = Process(target=f)
    p1 = Process(target=f1, daemon=True)
    p.start()
    p1.start()
    sleep(1)
    print('child procs started')
I have the following observations:
The first child process p runs for 2 secs.
After 1 sec, the second child process p1 becomes a zombie.
The parent (main) process stays active as long as child#1 (the non-daemon process) is running, i.e. for 2 secs.
Now I have the following queries:
Why should the parent (main) process be active after it finishes execution? Note that the parent does not perform a join on p.
Why should the daemon child p1 become a zombie after 1 sec? Note that the parent (main) process actually stays alive for as long as p is running.
I executed the above program on Ubuntu. My observations are based on the output of the ps command.
To sum up and persist the discussion in the comments of the other answer:
Why should the parent (main) process be active after it finishes
execution? Note that the parent does not perform a join on p.
multiprocessing tries to make sure that programs using it behave well; that is, it attempts to clean up after itself. To do so, it utilizes the atexit module, which lets you register exit handlers that are executed when the interpreter process prepares to terminate normally.
multiprocessing defines and registers the function _exit_function, which first calls terminate() on all still-running daemonic children and then calls join() on all remaining non-daemonic children. Since join() blocks, the parent waits until the non-daemonic children have terminated. terminate(), on the other hand, does not block; it simply sends a SIGTERM signal (on Unix) to the child and returns.
That brings us to:
Why should the daemon child p1 become a zombie after 1 sec? Note that
the parent (main) process actually stays alive till the time p is
running.
That is because the parent has reached the end of its instructions and the interpreter prepares to terminate, i.e. it executes the registered exit handlers. The daemonic child p1 receives a SIGTERM signal. Since SIGTERM is allowed to be caught and handled inside processes, the child is not forced to shut down immediately, but instead is given the chance to do some cleanup of its own. That's what makes p1 show up as <defunct>. The kernel knows that the process has been instructed to terminate, but the process has not done so yet.
In the given case, p1 has not yet had the chance to honor the SIGTERM signal, presumably because it still executes sleep(). At least as of Python 3.5:
The function now sleeps at least secs even if the sleep is interrupted
by a signal, except if the signal handler raises an exception (see PEP
475 for the rationale).
The parent stays alive because it is the root of the app. It stays in memory while the children are processing. Note that join() waits for the child to exit and then gives control back to the parent. If you don't join, the parent will exit but remain in memory.
p1 becomes a zombie because the parent exits after the sleep(1). It stays alive with p because p is not daemonized. If you don't daemonize a process and call start() on it, control passes to the child, and when the child completes, control passes back to the parent. If you do daemonize it, the parent keeps control and the child runs in the background.

What exactly is Python multiprocessing Module's .join() Method Doing?

Learning about Python Multiprocessing (from a PMOTW article) and would love some clarification on what exactly the join() method is doing.
In an old tutorial from 2008 it states that without the p.join() call in the code below, "the child process will sit idle and not terminate, becoming a zombie you must manually kill".
from multiprocessing import Process

def say_hello(name='world'):
    print "Hello, %s" % name

p = Process(target=say_hello)
p.start()
p.join()
I added a printout of the PID as well as a time.sleep to test and as far as I can tell, the process terminates on its own:
from multiprocessing import Process
import sys
import time

def say_hello(name='world'):
    print "Hello, %s" % name
    print 'Starting:', p.name, p.pid
    sys.stdout.flush()
    print 'Exiting :', p.name, p.pid
    sys.stdout.flush()
    time.sleep(20)

p = Process(target=say_hello)
p.start()
# no p.join()
within 20 seconds:
936 ttys000 0:00.05 /Library/Frameworks/Python.framework/Versions/2.7/Reso
938 ttys000 0:00.00 /Library/Frameworks/Python.framework/Versions/2.7/Reso
947 ttys001 0:00.13 -bash
after 20 seconds:
947 ttys001 0:00.13 -bash
Behavior is the same with p.join() added back at the end of the file. Python Module of the Week offers a very readable explanation of the module: "To wait until a process has completed its work and exited, use the join() method." But it seems like at least OS X was doing that anyway.
Am also wondering about the name of the method. Is the .join() method concatenating anything here? Is it concatenating a process with its end? Or does it just share a name with Python's native .join() method?
The join() method, when used with threading or multiprocessing, is not related to str.join() - it's not actually concatenating anything together. Rather, it just means "wait for this [thread/process] to complete". The name join is used because the multiprocessing module's API is meant to look as similar to the threading module's API, and the threading module uses join for its Thread object. Using the term join to mean "wait for a thread to complete" is common across many programming languages, so Python just adopted it as well.
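For comparison, the same join() semantics on a threading.Thread, whose API multiprocessing mirrors (the task function and results list are made up):

```python
import threading
import time

results = []

def task():
    time.sleep(0.2)
    results.append('done')

t = threading.Thread(target=task)
t.start()
t.join()        # "wait for this thread to complete", same meaning as Process.join()
print(results)  # ['done'] -- the append is guaranteed to have happened by now
```

Without the join(), the print could race ahead of the worker and see an empty list.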
Now, the reason you see the 20 second delay both with and without the call to join() is because by default, when the main process is ready to exit, it will implicitly call join() on all running multiprocessing.Process instances. This isn't as clearly stated in the multiprocessing docs as it should be, but it is mentioned in the Programming Guidelines section:
Remember also that non-daemonic processes will be joined automatically.
You can override this behavior by setting the daemon flag on the Process to True prior to starting the process:
p = Process(target=say_hello)
p.daemon = True
p.start()
# Both parent and child will exit here, since the main process has completed.
If you do that, the child process will be terminated as soon as the main process completes:
daemon
The process’s daemon flag, a Boolean value. This must be set before
start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic
child processes.
Without the join(), the main process can complete before the child process does. I'm not sure under what circumstances that leads to zombieism.
The main purpose of join() is to ensure that a child process has completed before the main process does anything that depends on the work of the child process.
The etymology of join() is that it's the opposite of fork, which is the common term in Unix-family operating systems for creating child processes. A single process "forks" into several, then "joins" back into one.
I'm not going to explain in detail what join does, but here's the etymology and the intuition behind it, which should help you remember its meaning more easily.
The idea is that execution "forks" into multiple processes of which one is the main/primary process, the rest workers (or minor/secondary). When the workers are done, they "join" the main process so that serial execution may be resumed.
The join() causes the main process to wait for a worker to join it. The method might better have been called "wait", since that's the actual behavior it causes in the master (and that's what it's called in POSIX, although POSIX threads call it "join" as well). The joining only occurs as an effect of the threads cooperating properly, it's not something the main process does.
The names "fork" and "join" have been used with this meaning in multiprocessing since 1963.
The join() call ensures that subsequent lines of your code are not called before all the multiprocessing processes are completed.
For example, without the join(), the following code would call restart_program() before the processes finish, which is effectively asynchronous and not what we want (you can try it):
processes = []
num_processes = 5
for i in range(num_processes):
    p = multiprocessing.Process(target=calculate_stuff, args=(i,))
    p.start()
    processes.append(p)
for p in processes:
    p.join()  # ensures the subsequent line (restart_program) is not
              # reached until all processes finish
restart_program()
join() is used to wait for the worker processes to exit. Note that this answer is about multiprocessing.Pool: there, one must call close() or terminate() before using join().
Like @Russell mentioned, join is like the opposite of fork (which spawns sub-processes).
For join to run, you have to run close(), which prevents any more tasks from being submitted to the pool and exits once all tasks complete. Alternatively, running terminate() will just exit by stopping all worker processes immediately.
"the child process will sit idle and not terminate, becoming a zombie you must manually kill" this is possible when the main (parent) process exits but the child process is still running and once completed it has no parent process to return its exit status to.
To wait until a process has completed its work and exited, use the join() method.
and
Note It is important to join() the process after terminating it in order to give the background machinery time to update the status of the object to reflect the termination.
This is a good example that helped me understand it: here
One thing I noticed personally was that the join() method paused my main process until the child had finished, which defeated the point of my using multiprocessing.Process() in the first place.

Kill Child Process if Parent is killed in Python

I'm spawning 5 different processes from a python script, like this:
p = multiprocessing.Process(target=some_method,args=(arg,))
p.start()
My problem is that when the parent process (the main script) somehow gets killed, the child processes keep on running.
Is there a way to kill child processes spawned like this when the parent gets killed?
EDIT:
I'm trying this:
p = multiprocessing.Process(target=client.start,args=(self.query_interval,))
p.start()
atexit.register(p.terminate)
But this doesn't seem to be working.
I've encountered the same problem myself and have the following solution:
before calling p.start(), you may set p.daemon = True. Then, as mentioned in the python.org multiprocessing docs:
When a process exits, it attempts to terminate all of its daemonic child processes.
The child is not notified of the death of its parent, it only works the other way.
However, when a process dies, all its file descriptors are closed. And the other end of a pipe is notified about this, if it selects the pipe for reading.
So your parent can create a pipe before spawning the process (or in fact, you can just set up stdin to be a pipe), and the child can select that for reading. It will report ready for reading when the parent end is closed. This requires your child to run a main loop, or at least make regular calls to select. If you don't want that, you'll need some manager process to do it, but then when that one is killed, things break again.
If you have access to the parent pid, you can use something like this:
import os
import signal
import sys

import psutil

def kill_child_proc(ppid):
    for process in psutil.process_iter():
        if process.ppid() == ppid:
            if sys.platform == 'win32':
                process.terminate()
            else:
                os.kill(process.pid, signal.SIGKILL)  # equivalent to `kill -9`

kill_child_proc(<parent_pid>)
My case was using a Queue object to communicate with the child processes. For whatever reason, the daemon flag suggested in the accepted answer did not work for me. Here's a minimal example illustrating how to get the children to die gracefully in this case.
The main idea is to pause child work execution every second or so and check whether the parent process is still alive. If it is not, we close the Queue and exit.
Note this also works if the main process is killed using SIGKILL.
import ctypes
import queue
import sys
import multiprocessing as mp

worker_queue = mp.Queue(maxsize=10)

# flag to communicate the parent's death to all children
alive = mp.Value(ctypes.c_bool, lock=False)
alive.value = True

def worker():
    while True:
        # fake work
        data = 99.99
        # submit finished work to the parent, while checking if the parent has died
        queued = False
        while not queued:
            # do not block indefinitely, so we can check if the parent died
            try:
                worker_queue.put(data, block=True, timeout=1.0)
                queued = True
            except queue.Full:
                pass
            # check if the parent process is still alive (parent_process() needs Python 3.8+)
            par_alive = mp.parent_process().is_alive()
            if not (par_alive and alive.value):
                # for some reason par_alive is only False for one of the children;
                # notify the others that the parent has died
                alive.value = False
                # it appears we need to close the queue before sys.exit will work
                worker_queue.close()
                # for a more dramatic shutdown, you could try killing the child process;
                # mp.current_process().kill() does not work, though you could try
                # calling os.kill directly with the child PID
                sys.exit(1)

# launch worker processes
for i in range(4):
    child = mp.Process(target=worker)
    child.start()

How to let the child process live when parent process exited?

I want to use multiprocessing module to complete this.
When I run it, like:
$ python my_process.py
I start a parent process and then let the parent process spawn a child process;
then I want the parent process to exit by itself, while the child process continues to work.
Allow me to write some WRONG code to explain myself:
from multiprocessing import Process

def f(x):
    with open('out.dat', 'w') as f:
        f.write(x)

if __name__ == '__main__':
    p = Process(target=f, args=('bbb',))
    p.daemon = True  # This is key: set the daemon flag, then the parent exits by itself
    p.start()
    #p.join()  # This is WRONG code, just to explain what I mean:
    # the child process will be killed when the parent exits
So, how do i start a process that will not be killed when the parent process finishes?
Update (2014-07-14):
My friend just told me a solution... Anyway, just look:
import os
os.system('python your_app.py &')  # SEE!? the & !!
This does work!
A trick: call os._exit to make parent process exit, in this way daemonic child processes will not be killed.
But there are some other side effects, described in the doc:
Exit the process with status n, without calling cleanup handlers,
flushing stdio buffers, etc.
If you do not care about this, you can use it.
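A small, self-contained demonstration of the trick (the parent script here is hypothetical and is written to a temp file so it works with any start method):

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical parent script: it starts a daemonic child, then exits via
# os._exit(), skipping the atexit cleanup that would normally terminate
# daemonic children -- so the child lives on and still prints.
script = textwrap.dedent("""\
    import os, time
    from multiprocessing import Process

    def f():
        time.sleep(1)
        print('child survived the parent', flush=True)

    if __name__ == '__main__':
        p = Process(target=f)
        p.daemon = True
        p.start()
        os._exit(0)   # parent vanishes without killing the daemonic child
""")

with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as fh:
    fh.write(script)
    path = fh.name

result = subprocess.run([sys.executable, path],
                        capture_output=True, text=True, timeout=30)
os.unlink(path)
print('child survived the parent' in result.stdout)  # True
```

With a plain sys.exit(0) in place of os._exit(0), the daemonic child would be terminated at interpreter shutdown and the message would never appear.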
Here's one way to achieve an independent child process that does not exit when __main__ exits. It uses the os._exit() tip mentioned above by @WKPlus.
Is there a way to detach matplotlib plots so that the computation can continue?

Is there no need to reap zombie process in python?

It seems to me that in Python there is no need to reap zombie processes.
For example, in the following code
import multiprocessing
import time

def func(msg):
    time.sleep(2)
    print "done " + str(msg)

if __name__ == "__main__":
    for i in range(10):
        p = multiprocessing.Process(target=func, args=('3',))
        p.start()
        print "child" + str(i)
    print "parent"
    time.sleep(100)
When all the child processes exit, the parent process is still running. At that point I checked the processes using ps -ef and noticed there were no defunct processes.
Does this mean that in Python there is no need to reap zombie processes?
After having a look at the library (especially multiprocessing/process.py), I see that:
in Process.start(), there is a _current_process._children.add(self) which adds the started process to a set of children;
a few lines above, there is a call to _cleanup() which polls and discards terminated processes, removing zombies.
But that doesn't quite explain why your code doesn't produce zombies, as the children wait a while before terminating, so the parent's start() calls don't notice that yet.
Those processes are not actually zombies, since they terminate successfully.
You could set the child processes to be daemonic so they'll terminate when the main process terminates.
