Automatically restarting a child python process in kubernetes - python

I have a Python application that runs as the main process in a Kubernetes pod. It starts child processes that long-poll a list of SQS queues (one process per queue). Occasionally, one of the processes becomes a zombie and stops processing, which hangs all the other processes too, including the parent.
if __name__ == '__main__':
    PROCESSES = []
    for queue, module in qfmapper.items():
        PROCESSES.append(Process(target=poll_for_messages, args=(queue, module)))
    for process in PROCESSES:
        process.start()
    for process in PROCESSES:
        process.join()
I've tried handling the SIGCHLD signal in the parent before it kicks off the children, so that the parent exits when a child dies, but the parent doesn't seem to react if one of the children is killed. I know this would leave the other child processes behind, but since Kubernetes kills the pod if PID 1 dies, that shouldn't matter. I'm assuming the handler has no effect because process.join() blocks the parent.
So I've tried replacing individual Process calls with a Pool:
with contextlib.closing(mp.Pool(len(qfmapper))) as pool:
    for queue, module in qfmapper.items():
        pool.apply_async(poll_for_messages, args=(queue, module))
    pool.close()
    pool.join()
This again kicks off the polling processes as expected, but when I kill one, the call it was running is not issued again. The Pool spins up another worker to maintain its size, but it doesn't start that worker with the arguments from the original apply_async call.
I also tried using map, and that does restart the process if killed, but doesn't loop through all of the queues in my list; it just does the first one in the list multiple times. I've also tried starmap, and just used the for loop to build a list of iterables, but again that doesn't recover if one of the workers is killed.
So, ultimately, this comes back to the title of this question. How do you automatically restart a process that has died / been killed? I've searched high and low and I can't seem to find any answers for what seems to me like a "normal" thing to want to do. This is all running on Python 3.7.3, but I can upgrade to 3.8 if it has any features worth using to resolve this issue.

Related

Python thread or process model where child thread or process can survive parent?

This is a design question about using threads versus multiple processes in Python scripts. As I understand it, a thread spawned using the threading module cannot survive termination of the parent thread, i.e. the process. The parent thread must either do a join (wait timeout notwithstanding) or exit; if there is no join, the child threads are terminated when the parent exits. This is due to the shared resources model of threads, right?
Whereas with the multiprocessing module, a spawned process can survive, i.e. continue to completion, regardless of whether the parent process that created it exits or terminates. This assumes, of course, that the parent process never called join to wait for the child process to complete.
Both threading and multiprocessing are designed to achieve parallelism within a program. Their goal is not to launch independent processes. Hence both packages implicitly end their parallel execution paths during preparation for interpreter shutdown.
Threads are subsets of processes; they cannot outlive the process that created them.
Active non-daemonic threads are implicitly joined upon interpreter shutdown using the function _shutdown() in the threading module. This function is called during the finalization routine in the Python interpreter lifecycle. Daemonic threads simply end with the interpreter process.
If processes created via multiprocessing are still alive when the interpreter prepares to shut down, they are handled by _exit_function(), which has been registered as an exit handler via atexit. Similar to threading, multiprocessing joins non-daemonic child processes; on daemonic children, terminate() is called.
If you want to launch processes from a Python program and have that program exit afterwards, use subprocess.Popen. If you are on a POSIX platform, you might also want to take a look at python-daemon.
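As a rough illustration of the subprocess.Popen suggestion, here is a minimal sketch. The helper name launch_detached is hypothetical, and start_new_session=True is an assumption that only has an effect on POSIX systems; treat it as a starting point rather than a complete daemonization recipe.

```python
import subprocess
import sys


def launch_detached(argv):
    """Start argv as an independent process and return immediately.

    Hypothetical helper: the child is never joined, so it can keep
    running after the launching Python program has exited.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # do not share our terminal streams
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # POSIX only: run the child in its own session
    )


if __name__ == "__main__":
    child = launch_detached(
        [sys.executable, "-c", "import time; time.sleep(1)"]
    )
    print("launched child with pid", child.pid)
```

Because the child lives in its own session, signals sent to the parent's process group (for example CTRL+C in a terminal) are not delivered to it.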

Do processes need to be stopped manually

I'm new to multiprocessing in Python so I'm in doubt. My first idea was to use threads, but then I read about GIL and moved to multiprocessing.
My question is, when I start a process like this:
t1 = Process(target=run, args=lot)
t1.start()
do I need to stop it somehow from the main process, or does it shut down when the run() method is finished?
I know that things like join() exist, but I'm scheduling a job every n minutes and start a couple of processes in parallel, and this procedure goes until stopped, so I don't really need to wait for processes to finish.
Yes. When t1.start() is called, it executes the function specified in target (i.e. run). Once that function completes, the process exits automatically.
You can check this by listing the running processes, e.g. on Linux with either of the commands below:
ps -aux | grep python
ps -aux | grep program_name.py
While your target is running, the process count will be higher.
To wait until a process has completed its work and exited, use the join() method. But in your case it's not required.
More examples are here: https://pymotw.com/2/multiprocessing/basics.html
Well, the GIL is not a big problem when you are not doing much computation but mostly things like networking or reading files. While an input/output operation is in progress, the thread is suspended and control is handed to the kernel, so another thread can run in Python in the meantime.
If, however, you are dealing with more CPU-consuming work, you should go for multiprocessing.
The join() method is used for thread synchronization, so it matters when the main thread relies on data processed by another thread. Otherwise it does not. Your operating system will handle things like closing child processes in a safe manner.
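To make the GIL point concrete, here is a small sketch in which time.sleep stands in for a blocking I/O call (an assumption for illustration; a real program would read from a socket or a file). The function names fake_io and run_concurrently are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # time.sleep releases the GIL while waiting, much like a blocking read
    time.sleep(delay)
    return delay


def run_concurrently(delays):
    """Run one fake I/O call per thread; return the results and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently([0.2] * 4)
    print(f"{len(results)} waits finished in {elapsed:.2f}s")
```

The four waits overlap, so the total time stays close to a single wait rather than the sum of all four. CPU-bound work would not overlap this way under the GIL.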
EDIT: check this discussion for more details.

Does a process always need to be terminated?

I am using a Python process to run one of my functions like so:
Process1 = Process(target = someFunction)
Process1.start()
That function has no looping or anything; it just does its thing and then ends. Does the Process die with it, or do I always need to add a:
Process1.terminate()
Afterwards?
The child process will exit by itself; Process1.terminate() is unnecessary in that regard. Avoiding terminate() matters especially when the child and parent processes share any resources. From the Python documentation:
Avoid terminating processes
Using the Process.terminate method to stop a process is liable to cause any shared resources (such as locks, semaphores, pipes and queues) currently being used by the process to become broken or unavailable to other processes.
Therefore it is probably best to only consider using Process.terminate on processes which never use any shared resources.
However, if you want the parent process to wait for the child process to finish (perhaps the child process is modifying something that the parent will access afterwards), then you'll want to use Process1.join() to block the parent process from continuing until the child process completes. This is generally good practice when using child processes, to avoid zombie processes or orphaned children.
No, as per the documentation it only sends a SIGTERM or TerminateProcess() to the process in question. If it has already exited then there is nothing to terminate.
However, it is always good practice to use exit codes in your subprocesses:
import sys
sys.exit(1)
And then check the exit code once you know the process has terminated:
if Process1.exitcode:  # exitcode is an attribute, not a method
    errorHandle()
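A self-contained sketch of the exit-code check might look like the following. The function names are hypothetical, and the fork start method is selected explicitly as an assumption (POSIX only) so the example does not depend on the platform default.

```python
import multiprocessing as mp
import sys

# Assumption: POSIX host; "fork" keeps the sketch independent of the default
ctx = mp.get_context("fork")


def failing_task():
    sys.exit(3)  # becomes the child's exit code


def succeeding_task():
    pass  # returning normally gives exit code 0


def run_and_report(target):
    """Run target in a child process and return its exit code."""
    proc = ctx.Process(target=target)
    proc.start()
    proc.join()  # exitcode stays None until the child has finished
    return proc.exitcode


if __name__ == "__main__":
    for task in (succeeding_task, failing_task):
        print(task.__name__, "exited with", run_and_report(task))
```

A negative exit code -N means the child was terminated by signal N, which is one way a parent can tell a killed worker from one that failed on its own.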

Python multiprocessing - watchdog process?

I have a set of long-running processes in a typical "pub/sub" setup with queues for communication.
I would like to do two things, and I can't figure out how to accomplish both simultaneously:
(1) Addition/removal of workers. For example, I want to be able to add extra consumers if I see that my pending queue size has grown too large.
(2) Watchdog for my processes. I want to be notified if any of my producers or consumers crashes.
I can do (2) in isolation:
try:
    while True:
        for process in workers + consumers:
            if not process.is_alive():
                logger.critical("%-8s%s died!", process.pid, process.name)
        sleep(3)
except KeyboardInterrupt:
    # Python propagates CTRL+C to all workers, no need to terminate them
    logger.warning('Received CTRL+C, shutting down')
The above blocks, which prevents me from doing (1).
So I decided to move the code into its own process.
This doesn't work, because process.is_alive() only works for a parent checking the status of its children. In this case, the processes I want to check would be siblings instead of children.
I'm a bit stumped on how to proceed. How can my main process support changes to subprocesses while also monitoring subprocesses?
multiprocessing.Pool actually has a watchdog built-in already. It runs a thread that checks every 0.1 seconds to see if a worker has died. If it has, it starts a new one to take its place:
def _handle_workers(pool):
    thread = threading.current_thread()
    # Keep maintaining workers until the cache gets drained, unless the pool
    # is terminated.
    while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
        pool._maintain_pool()
        time.sleep(0.1)
    # send sentinel to stop workers
    pool._taskqueue.put(None)
    debug('worker handler exiting')

def _maintain_pool(self):
    """Clean up any exited workers and start replacements for them.
    """
    if self._join_exited_workers():
        self._repopulate_pool()
This is primarily used to implement the maxtasksperchild keyword argument, and is actually problematic in some cases. If a process dies while a map or apply command is running, and that process is in the middle of handling a task associated with that call, it will never finish. See this question for more information about that behavior.
That said, if you just want to know that a process has died, you can just create a thread (not a process) that monitors the pids of all the processes in the pool, and if the pids in the list ever change, you know a process has crashed:
def monitor_pids(pool):
    pids = [p.pid for p in pool._pool]
    while True:
        new_pids = [p.pid for p in pool._pool]
        if new_pids != pids:
            print("A worker died")
            pids = new_pids
        time.sleep(3)
Edit:
If you're rolling your own Pool implementation, you can take a cue from multiprocessing.Pool and run your monitoring code in a background thread in the parent process. The checks to see whether the processes are still running are quick, so the time lost to the background thread taking the GIL should be negligible. Consider that the multiprocessing.Pool watchdog runs every 0.1 seconds! Running yours every 3 seconds shouldn't cause any problems.
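One way to sketch such a background-thread watchdog, extended so that it also restarts dead workers with their original arguments, is shown below. All names here (start_worker, restart_dead, watchdog) are hypothetical, poll_for_messages is only a stand-in for a real long-polling worker, and the fork start method is an assumption that requires a POSIX host.

```python
import multiprocessing as mp
import threading
import time

# Assumption: POSIX host; "fork" keeps the sketch independent of the default
ctx = mp.get_context("fork")


def poll_for_messages(queue, module):
    # Stand-in for a real long-polling worker
    while True:
        time.sleep(0.1)


def start_worker(target, args):
    proc = ctx.Process(target=target, args=args, daemon=True)
    proc.start()
    return proc


def restart_dead(workers, target):
    """One watchdog pass over workers (a dict: args tuple -> Process).

    Every dead worker is reaped and replaced by a new process started
    with the same arguments. Returns the list of restarted keys.
    """
    restarted = []
    for args, proc in list(workers.items()):
        if not proc.is_alive():
            proc.join()  # reap the dead child so it does not linger
            workers[args] = start_worker(target, args)
            restarted.append(args)
    return restarted


def watchdog(workers, target, stop_event, interval=3.0):
    # Meant to run in a background thread of the parent process
    while not stop_event.wait(interval):
        for args in restart_dead(workers, target):
            print("restarted worker for", args)
```

To use it, the parent would build the workers dict with start_worker, then run watchdog in a threading.Thread with a threading.Event it can set on shutdown. Because the main thread stays free, it can still add or remove entries in the dict, although a lock around those changes would be needed in real code.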

Process() called from Pylons creates a fork

I'm trying to create a background process for some heavy calculations from the main Pylons process. Here's the code:
p = Process(target=instance_process,
            args=(instance_tuple.instance, parent_pipe, child_pipe))
p.start()
The process is created and started, but it seems to be a fork of the main process: it is listening on the same port and the whole application hangs. What am I doing wrong?
Thanks in advance.
Process IS a fork. If you look through its implementation, you'll find that Process.start() calls fork (on POSIX systems using the fork start method). It does NOT, however, call any of the exec variations to change the execution context.
Still, this may have nothing to do with listening on the same port (unless the parent process is multi-threaded). At which point is the program hanging?
I know that when you try shutting down a python program without terminating the child process created through multiprocessing it will hang until the child process terminates.
This might be caused if, for instance, you do not close the pipe between the processes.
