Pylons and zombie processes


I'm trying to write an application that lets the user start long-running calculation processes (a few hours, for example). To do so, I use Python's Popen() function. As long as the main Pylons process keeps running, everything is fine, but when I restart the Pylons process, it doesn't respond to any requests if any zombie processes are left over from the previous paster launch.
What could be the cause of this problem, or a workaround for it?
Thanks in advance, Ivan.

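A zombie is a child that has exited but whose parent has not yet collected its exit status with wait(). To avoid zombie processes, the child must do a double fork to detach itself from the controlling process: the intermediate process exits right away and is reaped, and the grandchild is adopted by init. See http://en.wikipedia.org/wiki/Zombie_process
So all you need to do is make your child process fork again, while being careful to keep the relevant file handles open so that you can still communicate.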
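A minimal sketch of the double fork described above, assuming a POSIX system. The helper name spawn_detached is hypothetical, and the pipe is used here only to report the grandchild's pid back to the caller:

```python
import os
import sys


def spawn_detached(argv):
    """Run argv as a grandchild that the calling process never has to reap."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # First child: start a new session, fork again, then exit immediately.
        os.close(read_fd)
        os.setsid()
        grandchild = os.fork()
        if grandchild == 0:
            os.close(write_fd)
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if exec failed
        os.write(write_fd, str(grandchild).encode())
        os._exit(0)
    # Parent: read the grandchild's pid, then reap the short-lived first child
    # so that it cannot linger as a zombie.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        grandchild_pid = int(pipe.read())
    os.waitpid(pid, 0)
    return grandchild_pid
```

Calling spawn_detached([sys.executable, "-c", "pass"]) returns the pid of a process whose parent is no longer the caller, so a restart of the web application leaves nothing behind for it to wait on.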

You need some kind of message passing. This may be done by installing a signal handler. Python has the signal module for this, and Popen has a send_signal method.
http://www.doughellmann.com/PyMOTW/subprocess/#signaling-between-processes may help you too.

Related

How can one maintain communication with a child process if the parent process restarts?

I'm writing a simple job scheduler and I need to maintain some basic level of communication from the scheduler (parent) to spawned processes (children).
Basically I want to be able to send signals from scheduler -> child and also inform the scheduler of a child's exit status.
Generally this is all fine and dandy using regular subprocess Popen with signaling / waitpid. My issue is that I want to be able to restart the scheduler without impacting running jobs, and re-establish communication once the scheduler restarts.
I've gone through a few ideas, but nothing feels "right".
Right now I have a simple wrapper script that maintains communication with the scheduler using named pipes, and the wrapper actually runs the scheduled command. This seems to work well in most cases, but I have seen it subject to some races, mainly because named pipes need the connection to be alive in order to read/write (no buffering).
I'm considering trying Linux message queues instead, but I'm not sure that's the right path. Another option is to set up a server socket on the scheduler that accepts connections, have sub-processes connect to the scheduler when they start, and re-establish the connection if it breaks. This might be overkill, but it may be the most robust.
It's a shame I have to go through all this trouble since I can't just "re-attach" the child process on restart (I realize there are reasons for this).
Any other thoughts? This seems like a common problem any scheduler would hit. Am I missing some obvious solution here?

What happens to running threads after forking?

I'm using OpenERP, a Python-based ERP, which uses different threads (one thread per client, etc.). I would like to use multiprocessing.Process() to fork() and call a long-running method.
My question is: what will happen to the parent's threads? Will they be copied and continue to run? Will the child process call accept() on the server socket?
Thanks for your answers,

Forking does not copy threads; only the thread that called fork() exists in the child. So be very careful when forking a multithreaded application, as it can cause unpredictable side effects (e.g. if the fork happens while another thread is inside a mutex-protected critical section, that lock stays held in the child with no thread left to release it). Something really can break in your forked process unless you know the code you're forking very well.
Although everything I said above is true, there's a workaround (at least on Linux) called pthread_atfork(), which registers callbacks that run when the process forks (you can recreate all the needed threads there). It applies to C applications, though, not to Python ones.
For further information you can refer to:
Python issue tracker on this problem - http://bugs.python.org/issue6923
Search the web for implementations of similar ideas, for example: http://code.google.com/p/python-atfork/
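The effect on threads can be checked directly. The sketch below (POSIX only; the function name is invented for illustration) starts one background thread, forks, and has the child report how many threads it sees through a pipe:

```python
import os
import threading


def thread_counts_after_fork():
    """Return (threads alive in the parent, threads alive in the forked child)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_count = int(pipe.read())
    os.waitpid(pid, 0)
    parent_count = threading.active_count()
    stop.set()
    worker.join()
    return parent_count, child_count
```

The child reports a single thread while the parent still has its worker running. Note that Python 3.7 and later also provide os.register_at_fork(), which plays a role similar to pthread_atfork() for Python-level callbacks, and that recent Python versions emit a DeprecationWarning when fork() is called from a multithreaded process.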

Python Multiprocessing respawn crashed processes

I want to create some worker processes and if they crash due to an exception, I would like them to respawn. Aside from the is_alive method in the multiprocessing module, I can't seem to find a way to do this.
This would require me to iterate over all the running processes (after a sleep) and check if they are alive. This is essentially a busy loop, so I was wondering if there is a better solution that will wake up my program when any one of my worker processes has crashed. Once it wakes up, I would like to log the exception that crashed the worker and launch another process.

Polling to see if the child processes are alive should work fine, since it's a low-overhead check and you don't need to check that often.
The first answer to this (similar) question has a Python code example: Multi-server monitor/auto restarter in python
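A polling supervisor along these lines might look like the following sketch. It assumes a POSIX system (it asks for the "fork" start method explicitly), and supervise, crashing_worker, and the max_restarts cap are all hypothetical names chosen for the example:

```python
import multiprocessing
import time

# Assumption: POSIX; "fork" avoids having to pickle the worker function.
_ctx = multiprocessing.get_context("fork")


def crashing_worker():
    raise RuntimeError("simulated crash")  # stand-in for a real worker body


def supervise(target, count, max_restarts, poll_interval=0.2):
    """Poll workers and respawn any that exited with a non-zero exit code."""
    workers = [_ctx.Process(target=target) for _ in range(count)]
    for w in workers:
        w.start()
    restarts = 0
    while any(w is not None for w in workers):
        time.sleep(poll_interval)
        for i, w in enumerate(workers):
            if w is None or w.is_alive():
                continue
            w.join()  # collect the exit status of the dead worker
            if w.exitcode != 0 and restarts < max_restarts:
                restarts += 1
                workers[i] = _ctx.Process(target=target)
                workers[i].start()
            else:
                workers[i] = None  # finished cleanly, or restart budget used up
    return restarts
```

Because the loop sleeps between checks it is not a busy loop in the CPU sense; the cost is only a delay of up to poll_interval before a crash is noticed.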

You can wrap your worker processes in try/except blocks where the except pushes a message onto a pipe before raising. Of course, polling isn't really worse than this and it's simpler.
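One way to sketch that wrapper, again assuming POSIX and the "fork" start method; run_guarded and _guarded are illustrative names, not part of multiprocessing:

```python
import multiprocessing
import traceback

# Assumption: POSIX; with "fork" the callable does not need to be picklable.
_ctx = multiprocessing.get_context("fork")


def _guarded(conn, func, args):
    """Run func in the child and send the traceback to the parent on failure."""
    try:
        func(*args)
    except Exception:
        conn.send(traceback.format_exc())
        raise  # still exit with a non-zero code
    finally:
        conn.close()


def run_guarded(func, *args):
    """Run func in a child; return (exit code, traceback text or None)."""
    parent_conn, child_conn = _ctx.Pipe(duplex=False)
    proc = _ctx.Process(target=_guarded, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        report = parent_conn.recv()
    except EOFError:
        report = None  # the child closed the pipe without reporting an error
    proc.join()
    return proc.exitcode, report
```

The parent gets the full traceback text, which covers the "log the exception" part of the question; a worker killed by a signal never reaches the except clause, so the exit code still has to be checked as well.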

If you're on a unix-like system, your main program can be notified of dead children by installing a signal handler. Look up your operating system's documentation on signal(), especially SIGCHLD. I'm afraid I don't remember whether Windows covers SIGCHLD with its very limited POSIX signal support.
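A small sketch of the SIGCHLD approach on a Unix-like system. It assumes Python 3.9 or later for os.waitstatus_to_exitcode() and must run in the main thread, since that is the only place Python allows signal handlers to be installed; exited, reap_children, and demo are names invented here.

```python
import os
import signal
import time

exited = []  # (pid, exit code) pairs collected by the handler


def reap_children(signum, frame):
    # Several children may exit before the handler runs, so loop until
    # no more exit statuses are waiting.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        exited.append((pid, os.waitstatus_to_exitcode(status)))


def demo():
    """Fork one child that exits with code 3 and wait for the notification."""
    exited.clear()
    previous = signal.signal(signal.SIGCHLD, reap_children)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(3)  # stand-in for a crashed worker
        deadline = time.monotonic() + 5
        while not exited and time.monotonic() < deadline:
            time.sleep(0.05)
        return pid, list(exited)
    finally:
        signal.signal(signal.SIGCHLD, previous)
```

A handler like this reaps every child of the process, so it can interfere with subprocess or multiprocessing code that expects to wait on its own children.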

Twisted network client with multiprocessing workers?

So, I've got an application that uses Twisted + Stomper as a STOMP client which farms out work to a multiprocessing.Pool of workers.
This appears to work ok when I just use a python script to fire this up, which (simplified) looks something like this:
# stompclient.py
logging.config.fileConfig(config_path)
logger = logging.getLogger(__name__)
# Add observer to make Twisted log via python
twisted.python.log.PythonLoggingObserver().start()
# initialize the process pool. (child processes get forked off immediately)
pool = multiprocessing.Pool(processes=processes)
StompClientFactory.username = username
StompClientFactory.password = password
StompClientFactory.destination = destination
reactor.connectTCP(host, port, StompClientFactory())
reactor.run()
As this gets packaged for deployment, I thought I would take advantage of the twistd script and run this from a tac file.
Here's my very-similar-looking tac file:
# stompclient.tac
logging.config.fileConfig(config_path)
logger = logging.getLogger(__name__)
# Add observer to make Twisted log via python
twisted.python.log.PythonLoggingObserver().start()
# initialize the process pool. (child processes get forked off immediately)
pool = multiprocessing.Pool(processes=processes)
StompClientFactory.username = username
StompClientFactory.password = password
StompClientFactory.destination = destination
application = service.Application('myapp')
service = internet.TCPClient(host, port, StompClientFactory())
service.setServiceParent(application)
For the sake of illustration, I have collapsed or changed a few details; hopefully they were not the essence of the problem. For example, my app has a plugin system, the pool is initialized by a separate method, and then work is delegated to the pool using pool.apply_async() passing one of my plugin's process() methods.
So, if I run the script (stompclient.py), everything works as expected.
It also appears to work OK if I run twistd in non-daemon mode (-n):
twistd -noy stompclient.tac
however, it does not work when I run in daemon mode:
twistd -oy stompclient.tac
The application appears to start up OK, but when it attempts to fork off work, it just hangs. By "hangs", I mean that it appears that the child process is never asked to do anything and the parent (that called pool.apply_async()) just sits there waiting for the response to return.
I'm sure that I'm doing something stupid with Twisted + multiprocessing, but I'm really hoping that someone can explain to me the flaw in my approach.
Thanks in advance!

Since the difference between your working invocation and your non-working invocation is only the "-n" option, it seems most likely that the problem is caused by the daemonization process (which "-n" prevents from happening).
On POSIX, one of the steps involved in daemonization is forking and having the parent exit. Among other things, this has the consequence of having your code run in a different process than the one in which the .tac file was evaluated. This also re-arranges the child/parent relationship of processes which were started in the .tac file, as your pool of multiprocessing processes was.
The multiprocessing pool's processes start off with a parent of the twistd process you start. However, when that process exits as part of daemonization, their parent becomes the system init process. This may cause some problems, although probably not the hanging problem you described. There are probably other similarly low-level implementation details which normally allow the multiprocessing module to work but which are disrupted by the daemonization process.
Fortunately, avoiding this strange interaction should be straightforward. Twisted's service APIs allow you to run code after daemonization has completed. If you use these APIs, then you can delay the initialization of the multiprocessing module's process pool until after daemonization and hopefully avoid the problem. Here's an example of what that might look like:
import multiprocessing
from twisted.application.service import Service

class MultiprocessingService(Service):
    def startService(self):
        # Called after daemonization, so the pool is created in the final process.
        self.pool = multiprocessing.Pool(processes=processes)

MultiprocessingService().setServiceParent(application)
Now, separately, you may also run into problems relating to clean up of the multiprocessing module's child processes, or possibly issues with processes created with Twisted's process creation API, reactor.spawnProcess. This is because part of dealing with child processes correctly generally involves handling the SIGCHLD signal. Twisted and multiprocessing aren't going to be cooperating in this regard, though, so one of them is going to get notified of all children exiting and the other will never be notified. If you don't use Twisted's API for creating child processes at all, then this may be okay for you - but you might want to check to make sure any signal handler the multiprocessing module tries to install actually "wins" and doesn't get replaced by Twisted's own handler.

A possible idea for you...
When running in daemon mode twistd will close stdin, stdout and stderr. Does anything your clients do read from or write to these?

Interactive Python GUI

Python has been really bumpy for me, because the last time I created a GUI client, it seemed to hang whenever it spawned a process, called a shell script, or called an outside application.
This has been my major problem with Python since then, and now that I'm on a new project, can someone give me pointers and a word of advice on keeping my Python GUI application interactive while it spawns another process?

Simplest (not necessarily "best" in an abstract sense): spawn the subprocess in a separate thread, communicating results back to the main thread via a Queue.Queue instance -- the main thread must periodically check that queue to see if the results have arrived yet, but periodic polling isn't hard to arrange in any event loop.
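A rough sketch of that pattern without any particular GUI toolkit. In Python 3 the class is queue.Queue (Queue.Queue is the Python 2 spelling); run_in_background and poll_results are hypothetical helpers, and poll_results is what a GUI timer callback would call:

```python
import queue
import subprocess
import sys
import threading

results = queue.Queue()


def run_in_background(argv):
    """Start argv in a worker thread; the result lands on `results` when done."""
    def work():
        done = subprocess.run(argv, capture_output=True, text=True)
        results.put((done.returncode, done.stdout))
    threading.Thread(target=work, daemon=True).start()


def poll_results():
    """Return a finished (exit code, stdout) pair, or None, without blocking."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None
```

Only the worker thread ever blocks on the subprocess; the main thread touches nothing but the queue, which is safe to share between threads.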

Your main GUI thread will freeze if you spawn off a process and wait for it to complete. Often, you can simply use subprocess and poll it now and then for completion rather than waiting for it to finish. This will keep your GUI from freezing.
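The polling variant can be sketched like this. The tick callback and the sleep stand in for whatever the GUI's event loop does between checks, and wait_without_blocking is a name made up for the example:

```python
import subprocess
import sys
import time


def wait_without_blocking(argv, tick, interval=0.05):
    """Run argv, calling tick() between polls; return the exit code."""
    proc = subprocess.Popen(argv)
    while proc.poll() is None:  # poll() returns None while the child is running
        tick()                  # e.g. process pending GUI events
        time.sleep(interval)
    return proc.returncode
```

In a real application the loop body would usually be a timer callback registered with the toolkit rather than an explicit while loop.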
