I'm trying to fork a process, do something in the child and then exit from it (see code below). To exit, I first tried sys.exit, which turned out to be a problem because an intermediate function caught the SystemExit exception (as in the code below), so the child didn't actually terminate. I figured out that I should use os._exit instead. Now the child terminates, but I still see defunct processes lying around (when I do ps -ef). Is there a way to avoid these?
import os, sys

def fctn():
    if os.fork() != 0:
        return 0
    # sys.exit(0)
    os._exit(0)

while True:
    str = raw_input()
    try:
        print(fctn())
    except SystemExit:
        print('Caught SystemExit.')
Edit: this was actually not really a Python question but more of a Unix question (so I guess results may vary depending on the system). Ivan's answer suggests that I should do something like
def handleSIGCHLD(sig, frame):
    os.wait()

signal.signal(signal.SIGCHLD, handleSIGCHLD)
while for me a simple
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
also works.
And then it's probably true that I should use some library...
You should wait() for a child to remove its zombie process entry from the table.
Finally, to offload tasks to children, you may be better off with multiprocessing.
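For example, here is a minimal sketch of the wait() approach applied to the function from the question (my restructuring, not the original code):

import os

def fctn():
    pid = os.fork()
    if pid != 0:
        # Parent: reap the child so no zombie entry is left in the process table
        os.waitpid(pid, 0)
        return 0
    # Child: do the work here, then exit without raising SystemExit
    os._exit(0)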
You are better off using the subprocess module for your needs. It's the preferred way of forking off a process.
https://docs.python.org/2/library/subprocess.html#subprocess.check_call
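For example, a minimal sketch (the command here is just a placeholder):

import subprocess

# Runs the command, waits for it to finish (so no zombie is left behind),
# and raises CalledProcessError on a non-zero exit status.
subprocess.check_call(["echo", "hello"])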
I thought Python Processes call their atexit functions when they terminate. Note that I'm using Python 2.7. Here is a simple example:
from __future__ import print_function
import atexit
from multiprocessing import Process

def test():
    atexit.register(lambda: print("atexit function ran"))

process = Process(target=test)
process.start()
process.join()
I'd expect this to print "atexit function ran" but it does not.
Note that this question:
Python process won't call atexit
is similar, but it involves Processes that are terminated with a signal, and the answer involves intercepting that signal. The Processes in this question are exiting gracefully, so (as far as I can tell anyway) that question & answer do not apply (unless these Processes are exiting due to a signal somehow?).
I did some research by looking at how this is implemented in CPython. This assumes you are running on Unix. If you are running on Windows, the following might not be valid, as the implementation of processes in multiprocessing differs.
It turns out that os._exit() is always called at the end of the process. That, together with the following note from the documentation for atexit, should explain why your lambda isn't running.
Note: The functions registered via this module are not called when the
program is killed by a signal not handled by Python, when a Python
fatal internal error is detected, or when os._exit() is called.
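A minimal demonstration of that note (my example, not code from the question):

from __future__ import print_function
import atexit
import os

atexit.register(lambda: print("never printed"))
# os._exit() bypasses the normal interpreter shutdown,
# so the handler registered above never runs.
os._exit(0)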
Here's an excerpt from the Popen class for CPython 2.7, used for forking processes. Note that the last statement of the forked process is a call to os._exit().
# Lib/multiprocessing/forking.py

class Popen(object):

    def __init__(self, process_obj):
        sys.stdout.flush()
        sys.stderr.flush()
        self.returncode = None

        self.pid = os.fork()
        if self.pid == 0:
            if 'random' in sys.modules:
                import random
                random.seed()
            code = process_obj._bootstrap()
            sys.stdout.flush()
            sys.stderr.flush()
            os._exit(code)
In Python 3.4, the os._exit() call is still there if you are using the fork start method, which is the default. But it seems like you can change it; see Contexts and start methods for more information. I haven't tried it, but perhaps using a start method of spawn would work? It is not available for Python 2.7 though.
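If you want to experiment with that, here is a sketch of what selecting the spawn start method looks like on Python 3.4+ (untested with respect to atexit, as noted above):

import atexit
import multiprocessing

def test():
    atexit.register(lambda: print("atexit function ran"))

if __name__ == '__main__':
    # spawn starts a fresh interpreter instead of forking
    multiprocessing.set_start_method('spawn')
    process = multiprocessing.Process(target=test)
    process.start()
    process.join()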
I'm using python to benchmark something. This can take a large amount of time, and I want to set a (global) timeout. I use the following script (summarized):
import signal
import subprocess

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

# Install the handler and halt the problem after half an hour
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1800)

try:
    while solution is None:
        guess = guess()
        try:
            with open(solutionfname, 'wb') as solutionf:
                solverprocess = subprocess.Popen(["solver", problemfname],
                                                 stdout=solutionf)
                solverprocess.wait()
        finally:
            # `solverprocess.poll() == None` instead of try didn't work either
            try:
                solverprocess.kill()
            except:
                # Solver process was already dead
                pass
except TimeoutException:
    pass

# Cancel the alarm if it's still active
signal.alarm(0)
However, it sometimes leaves orphan processes behind, and I can't reliably recreate the circumstances. Does anyone know the correct way to prevent this?
You simply have to wait after killing the process.
The documentation for the kill() method states:
Kills the child. On Posix OSs the function sends SIGKILL to the child.
On Windows kill() is an alias for terminate().
In other words, if you aren't on Windows, you are only sending a signal to the subprocess.
This will create a zombie process because the parent process didn't read the return value of the subprocess.
The kill() and terminate() methods are just shortcuts to send_signal(SIGKILL) and send_signal(SIGTERM).
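In code, the equivalence looks like this (a sketch; the sleep command is just a placeholder):

import signal
import subprocess

proc = subprocess.Popen(["sleep", "60"])
# On POSIX, proc.terminate() is the same as:
proc.send_signal(signal.SIGTERM)
# ...and proc.kill() is the same as:
proc.send_signal(signal.SIGKILL)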
Try adding a call to wait() after the kill(). This is even shown in the example under the documentation for communicate():
proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()
Note the call to communicate() after the kill(). (It is equivalent to calling wait() and also reading the outputs of the subprocess.)
I want to clarify one thing: it seems like you don't understand exactly what a zombie process is. A zombie process is a terminated process. The kernel keeps the process in the process table until the parent process reads its exit status. I believe all memory used by the subprocess is actually reused; the kernel only has to keep track of the exit status of such a process.
So, the zombie processes you see aren't running. They are already completely dead, and that's why they are called zombie. They are "alive" in the process table, but aren't really running at all.
Calling wait() does exactly this: wait till the subprocess ends and read the exit status. This allows the kernel to remove the subprocess from the process table.
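Applied to the loop in the question, the fix is one extra line after the kill() (a sketch):

try:
    solverprocess.kill()
    # Read the exit status so the kernel can drop the process table entry
    solverprocess.wait()
except OSError:
    # Solver process was already dead
    pass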
On Linux, you can use python-prctl.
Define a preexec function such as:
import prctl

def pre_exec():
    import signal
    prctl.set_pdeathsig(signal.SIGTERM)
Then have your Popen call use it:
subprocess.Popen(..., preexec_fn=pre_exec)
That's as simple as that. Now the child process will die rather than become an orphan if the parent dies.
If you don't like the external dependency of python-prctl you can also use the older prctl. Instead of
prctl.set_pdeathsig(signal.SIGTERM)
you would have
prctl.prctl(prctl.PDEATHSIG, signal.SIGTERM)
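Putting it together with the older module, the whole thing might look like this sketch (assuming the prctl.prctl(option, arg) call described above; sleep is just a placeholder command):

import signal
import subprocess
import prctl

def pre_exec():
    # Ask the kernel to send SIGTERM to this child when the parent dies
    prctl.prctl(prctl.PDEATHSIG, signal.SIGTERM)

proc = subprocess.Popen(["sleep", "60"], preexec_fn=pre_exec)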
I have a subprocess which I open, which calls other processes.
I use os.killpg(os.getpgid(subOut.pid), signal.SIGTERM) to kill the entire group, but this kills the python script as well. Even when I call a python script with os.killpg from a second python script, this kills the second script as well. Is there a way to make os.killpg not stop the script?
Another solution would be to individually kill every child process. However, even using
p = psutil.Process(subOut.pid)
child_pid = p.children(recursive=True)
for pid in child_pid:
    os.kill(pid.pid, signal.SIGTERM)
does not correctly give me all the pids of the children.
And you know what they say... don't kill the script that calls you...
A bit late to answer, but since Google took me here while looking for a related problem: the reason your script gets killed is that its children will, by default, inherit its process group id. You can tell subprocess.Popen to create a new process group for your subprocess, though it's a bit tricky: you have to pass os.setpgrp for the preexec_fn parameter. This will call setpgrp (without any arguments) in the newly created (forked) process, before it execs the subprocess, which sets the process group id of the new process to the pid of the new process (thus creating a new group).

The documentation mentions that preexec_fn can deadlock in multi-threaded code. As an alternative, you can use start_new_session=True, but that would create not only a new process group but a new session. (And that would mean that if you close your terminal session while your script is running, the children would not be terminated. It may or may not be a problem.)
As a side note, if you are on Windows, you can simply pass subprocess.CREATE_NEW_PROCESS_GROUP in the creationflags parameter.
Here is what it looks like in detail:
subOut = subprocess.Popen(['your', 'subprocess', ...], preexec_fn=os.setpgrp)
# when it's time to kill
os.killpg(os.getpgid(subOut.pid), signal.SIGTERM)
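If you prefer to avoid preexec_fn, here is an alternative sketch using start_new_session (Python 3.2+; as noted above, this creates a whole new session, not just a new process group):

import os
import signal
import subprocess

subOut = subprocess.Popen(['your', 'subprocess'], start_new_session=True)
# when it's time to kill
os.killpg(os.getpgid(subOut.pid), signal.SIGTERM)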
Create a process group having all the immediate children of the called process as follows:
p1 = subprocess.Popen(cmd1)
# Create a process group with id equal to p1.pid
# (note: os.setpgid, not os.setpgrp, takes a pid and a pgid)
os.setpgid(p1.pid, 0)

p2 = subprocess.Popen(cmd2)
os.setpgid(p2.pid, os.getpgid(p1.pid))

pn = subprocess.Popen(cmdn)
os.setpgid(pn.pid, os.getpgid(p1.pid))

# Kill all the children and their process trees with a single call
os.killpg(os.getpgid(p1.pid), signal.SIGKILL)
It will kill the whole process tree without killing its own process.
atleta's answer above worked for me but the preexec_fn argument in the call to Popen should be setpgrp, rather than setgrp:
subOut = subprocess.Popen(['your', 'subprocess', ...], preexec_fn=os.setpgrp)
I'm posting this as an answer instead of a comment on atleta's answer because I don't have comment privileges yet.
An easy way is to set the parent process to ignore the signal before sending it.

# Tell this (parent) process to ignore the signal
old_handler = signal.signal(sig, signal.SIG_IGN)

# Send the signal to our process group and wait for them
# all to exit (os.wait() raises OSError once no children remain)
os.killpg(os.getpgid(0), sig)
try:
    while True:
        os.wait()
except OSError:
    pass

# Restore the handler
signal.signal(sig, old_handler)
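Wrapped up as a reusable helper, the same idea might look like this (a sketch; the function name and the SIGTERM default are mine):

import os
import signal

def kill_process_group(sig=signal.SIGTERM):
    # Ignore the signal in this (parent) process only
    old_handler = signal.signal(sig, signal.SIG_IGN)
    # Signal the whole process group, then reap every child
    os.killpg(os.getpgid(0), sig)
    try:
        while True:
            os.wait()
    except OSError:
        pass  # no children left
    # Restore the previous handler
    signal.signal(sig, old_handler)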
I am trying to write a python program to test a server written in C. The python program launches the compiled server using the subprocess module:
pid = subprocess.Popen(args.server_file_path).pid
This works fine, however if the python program terminates unexpectedly due to an error, the spawned process is left running. I need a way to ensure that if the python program exits unexpectedly, the server process is killed as well.
Some more details:
Linux or OSX operating systems only
Server code can not be modified in any way
I would atexit.register a function to terminate the process:
import atexit
process = subprocess.Popen(args.server_file_path)
atexit.register(process.terminate)
pid = process.pid
Or maybe:
import atexit
process = subprocess.Popen(args.server_file_path)

@atexit.register
def kill_process():
    try:
        process.terminate()
    except OSError:
        # Ignore the error. The OSError doesn't seem to be documented(?);
        # as such, it *might* be better to process.poll() and check for
        # `None` (meaning the process is still running), but that
        # introduces a race condition. I'm not sure which is better,
        # hopefully someone that knows more about this than I do can
        # comment.
        pass

pid = process.pid
Note that this doesn't help you if you do something nasty to cause Python to die in a non-graceful way (e.g. via os._exit, or if you cause a segmentation fault or bus error).
I am trying to detect when an installation program finishes executing from within a Python script. Specifically, the application is the Oracle 10gR2 Database. Currently I am using the subprocess module with Popen. Ideally, I would simply use the wait() method to wait for the installation to finish executing; however, the documented command actually spawns child processes to handle the actual installation. Here is a sample of the failing code:
import os
import subprocess

OUI_DATABASE_10GR2_SUBPROCESS = ['sudo',
                                 '-u',
                                 'oracle',
                                 os.path.join(DATABASE_10GR2_TMP_PATH,
                                              'database',
                                              'runInstaller'),
                                 '-ignoreSysPrereqs',
                                 '-silent',
                                 '-noconfig',
                                 '-responseFile ' + ORACLE_DATABASE_10GR2_SILENT_RESPONSE]

oracle_subprocess = subprocess.Popen(OUI_DATABASE_10GR2_SUBPROCESS)
oracle_subprocess.wait()
There is a similar question here: Killing a subprocess including its children from python, but the selected answer does not address the children issue; instead it instructs the user to call the application to wait for directly. I am looking for a specific solution that will wait for all children of the subprocess. What if there is an unknown number of subprocesses? I will select the answer that addresses the issue of waiting for all children subprocesses to finish.
More clarity on failure: The child processes continue executing after the wait() command since that command only waits for the top level process (in this case it is 'sudo'). Here is a simple diagram of the known child processes in this problem:
Python subprocess module -> Sudo -> runInstaller -> java -> (unknown)
Ok, here is a trick that will work only under Unix. It is similar to one of the answers to this question: Ensuring subprocesses are dead on exiting Python program. The idea is to create a new process group. You can then wait for all processes in the group to terminate.
pid = os.fork()
if pid == 0:
    os.setpgrp()
    oracle_subprocess = subprocess.Popen(OUI_DATABASE_10GR2_SUBPROCESS)
    oracle_subprocess.wait()
    os._exit(0)
else:
    os.waitpid(-pid, 0)
I have not tested this. It creates an extra subprocess to be the leader of the process group, but avoiding that is (I think) quite a bit more complicated.
I found this web page to be helpful as well. http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/
You can just use os.waitpid with the pid set to -1; this waits for any child process of the current process to finish:
import os
import sys
import subprocess

proc = subprocess.Popen([sys.executable,
                         '-c',
                         'import subprocess;'
                         'subprocess.Popen("sleep 5", shell=True).wait()'])

pid, status = os.waitpid(-1, 0)
print pid, status
Here is the result of pstree <pid>, showing the forked subprocesses:
python───python───sh───sleep
Hope this can help :)
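To wait for all direct children rather than just the first one to finish, you can loop until waitpid reports that no children remain (a sketch; the helper name is mine):

import errno
import os

def wait_for_all_children():
    while True:
        try:
            os.waitpid(-1, 0)
        except OSError as e:
            if e.errno == errno.ECHILD:
                break  # no more children to wait for
            raise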
Check out the following link http://www.oracle-wiki.net/startdocsruninstaller which details a flag you can use for the runInstaller command.
This flag is definitely available for 11gR2, but I have not got a 10g database to try out this flag for the runInstaller packaged with that version.
Regards
Everywhere I look seems to say it's not possible to solve this in the general case. I've whipped up a library called 'pidmon' that combines some answers for Windows and Linux and might do what you need.
I'm planning to clean this up and put it on github, possibly called 'pidmon' or something like that. I'll post a link if/when I get it up.
EDIT: It's available at http://github.com/dbarnett/python-pidmon.
I made a special waitpid function that accepts a graft_func argument so that you can loosely define what sort of processes you want to wait for when they're not direct children:
import pidmon
pidmon.waitpid(oracle_subprocess.pid, recursive=True,
               graft_func=(lambda p: p.name == '???' and p.parent.pid == ???))
or, as a shotgun approach, to just wait for any processes started since the call to waitpid to stop again, do:
import pidmon
pidmon.waitpid(oracle_subprocess.pid, graft_func=(lambda p: True))
Note that this is still barely tested on Windows and seems very slow on Windows (but did I mention it's on github where it's easy to fork?). This should at least get you started, and if it works at all for you, I have plenty of ideas on how to optimize it.