So, I've got an application that uses Twisted + Stomper as a STOMP client which farms out work to a multiprocessing.Pool of workers.
This appears to work ok when I just use a python script to fire this up, which (simplified) looks something like this:
# stompclient.py
import logging
import logging.config
import multiprocessing

import twisted.python.log
from twisted.internet import reactor

logging.config.fileConfig(config_path)
logger = logging.getLogger(__name__)

# Add observer to make Twisted log via Python's logging
twisted.python.log.PythonLoggingObserver().start()

# Initialize the process pool (child processes get forked off immediately).
pool = multiprocessing.Pool(processes=processes)

StompClientFactory.username = username
StompClientFactory.password = password
StompClientFactory.destination = destination

reactor.connectTCP(host, port, StompClientFactory())
reactor.run()
As this gets packaged for deployment, I thought I would take advantage of the twistd script and run this from a tac file.
Here's my very-similar-looking tac file:
# stompclient.tac
import logging
import logging.config
import multiprocessing

import twisted.python.log
from twisted.application import internet, service

logging.config.fileConfig(config_path)
logger = logging.getLogger(__name__)

# Add observer to make Twisted log via Python's logging
twisted.python.log.PythonLoggingObserver().start()

# Initialize the process pool (child processes get forked off immediately).
pool = multiprocessing.Pool(processes=processes)

StompClientFactory.username = username
StompClientFactory.password = password
StompClientFactory.destination = destination

application = service.Application('myapp')
client_service = internet.TCPClient(host, port, StompClientFactory())  # renamed to avoid shadowing the service module
client_service.setServiceParent(application)
For the sake of illustration, I have collapsed or changed a few details; hopefully they were not the essence of the problem. For example, my app has a plugin system, the pool is initialized by a separate method, and then work is delegated to the pool using pool.apply_async() passing one of my plugin's process() methods.
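Roughly, that delegation looks like this (a simplified sketch; handle_frame and its arguments are placeholders, not my real plugin API):

def handle_frame(pool, plugin, body):
    # apply_async() hands plugin.process(body) to one of the pre-forked
    # workers and returns an AsyncResult immediately; get() then blocks
    # until a worker sends the result back.
    result = pool.apply_async(plugin.process, (body,))
    return result.get()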
So, if I run the script (stompclient.py), everything works as expected.
It also appears to work OK if I run twistd in non-daemon mode (-n):
twistd -noy stompclient.tac
however, it does not work when I run in daemon mode:
twistd -oy stompclient.tac
The application appears to start up OK, but when it attempts to farm out work to the pool, it just hangs. By "hangs", I mean that the child process is apparently never asked to do anything and the parent (which called pool.apply_async()) just sits there waiting for the response to return.
I'm sure that I'm doing something stupid with Twisted + multiprocessing, but I'm really hoping that someone can explain to me the flaw in my approach.
Thanks in advance!
Since the difference between your working invocation and your non-working invocation is only the "-n" option, it seems most likely that the problem is caused by the daemonization process (which "-n" prevents from happening).
On POSIX, one of the steps involved in daemonization is forking and having the parent exit. Among other things, this has the consequence of your code running in a different process than the one in which the .tac file was evaluated. It also rearranges the parent/child relationships of processes which were started from the .tac file, as your pool of multiprocessing processes was.
The multiprocessing pool's processes start off as children of the twistd process you launch. However, when that process exits as part of daemonization, their parent becomes the system init process. This may cause some problems, although probably not the hanging problem you described. There are probably other similarly low-level implementation details which normally allow the multiprocessing module to work but which are disrupted by the daemonization process.
Fortunately, avoiding this strange interaction should be straightforward. Twisted's service APIs allow you to run code after daemonization has completed. If you use these APIs, then you can delay the initialization of the multiprocessing module's process pool until after daemonization and hopefully avoid the problem. Here's an example of what that might look like:
import multiprocessing
from twisted.application.service import Service

class MultiprocessingService(Service):
    def startService(self):
        # Delay pool creation until after twistd has daemonized.
        self.pool = multiprocessing.Pool(processes=processes)

MultiprocessingService().setServiceParent(application)
Now, separately, you may also run into problems relating to clean up of the multiprocessing module's child processes, or possibly issues with processes created with Twisted's process creation API, reactor.spawnProcess. This is because part of dealing with child processes correctly generally involves handling the SIGCHLD signal. Twisted and multiprocessing aren't going to be cooperating in this regard, though, so one of them is going to get notified of all children exiting and the other will never be notified. If you don't use Twisted's API for creating child processes at all, then this may be okay for you - but you might want to check to make sure any signal handler the multiprocessing module tries to install actually "wins" and doesn't get replaced by Twisted's own handler.
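If cleanup of the pool's own children becomes an issue, one possible refinement of the service above is to shut the pool down in stopService. This is only a rough sketch (it reuses the names from above and assumes that blocking briefly during shutdown is acceptable):

import multiprocessing
from twisted.application.service import Service

class MultiprocessingService(Service):
    def startService(self):
        self.pool = multiprocessing.Pool(processes=processes)

    def stopService(self):
        # Runs when twistd shuts the application down: stop accepting
        # new work and wait for the worker processes to exit.
        self.pool.close()
        self.pool.join()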
A possible idea for you...
When running in daemon mode, twistd will close stdin, stdout, and stderr. Does anything your clients do read from or write to these?
Related
I'm writing a simple job scheduler and I need to maintain some basic level of communication from the scheduler (parent) to spawned processes (children).
Basically I want to be able to send signals from scheduler -> child and also inform the scheduler of a child's exit status.
Generally this is all fine and dandy using regular subprocess Popen with signaling / waitpid. My issue is that I want to be able to restart the scheduler without impacting running jobs, and re-establish communication once the scheduler restarts.
I've gone through a few ideas; nothing feels "right".
Right now I have a simple wrapper script that maintains communication with the scheduler using named pipes, and the wrapper actually runs the scheduled command. This seems to work well in most cases, but I have seen it subject to some races, mainly due to how named pipes need both ends open in order to read/write (there is no buffering otherwise).
I'm considering trying Linux message queues instead, but I'm not sure that's the right path. Another option is to set up a listening socket on the scheduler and have sub-processes connect to it when they start, re-establishing the connection if it breaks; a sketch of that approach follows below. This might be overkill, but it's maybe the most robust.
It's a shame I have to go through all this trouble, since I can't just "re-attach" to the child process on restart (I realize there are reasons for this).
Any other thoughts? This seems like a common problem any scheduler would hit; am I missing some obvious solution here?
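For reference, the child side of the socket option I'm considering would look roughly like this (the socket path and message format are just placeholders):

import socket
import time

SOCKET_PATH = "/tmp/scheduler.sock"   # made up for illustration

def report(job_id, status):
    # A child keeps trying to deliver its status to the scheduler over a
    # Unix socket, reconnecting whenever the scheduler is down/restarting.
    while True:
        try:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            sock.connect(SOCKET_PATH)
            sock.sendall(("%s %s\n" % (job_id, status)).encode())
            sock.close()
            return
        except socket.error:
            time.sleep(1)   # scheduler not listening yet; retry shortly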
I have a python script which attempts to communicate with a python daemon. When the original script is invoked, it checks to see if the daemon exists. If the daemon exists, the original script writes to a named pipe to communicate with the daemon. If the daemon does not exist, the original script attempts to create a daemon using DaemonContext and then writes to the named pipe.
Pseudo-code of the original script:
from daemon import DaemonContext

if daemon_exists():
    pass
else:
    with DaemonContext():
        create_daemon()

communicate_with_daemon()
The problem is that when the daemon is created, the parent process is killed (i.e. communicate_with_daemon will never be executed). This prevents the original script from creating a daemon and communicating with it.
According to this answer, this problem is a limitation of the python-daemon library. How would I get around this?
Thanks.
You're describing not a limitation, but the definition of how a daemon process works.
[…] the parent process is killed (i.e. communicate_with_daemon will never be executed).
Yes, that's right; the daemon process detaches from what started it. That's what makes the process a daemon.
However, this statement is not true:
This prevents the original script from creating a daemon and communicating with it.
There are numerous other ways to communicate between processes. The general name for this is Inter-Process Communication. The solutions are many, and which you choose depends on the constraints of your application.
For example, you could open a socket at a known path and keep that file open; you could open a network port and communicate through the loopback interface; you could use "drop-box" communication through a file on the local filesystem, a database, or similar; and so on.
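As one illustration of the named-pipe route the question already mentions (the FIFO path here is made up):

import os

FIFO_PATH = "/tmp/mydaemon.fifo"   # assumed path, for illustration only

def communicate_with_daemon(message):
    # The client writes a request into a FIFO that the daemon reads.
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    # Opening for writing blocks until the daemon opens the other end.
    with open(FIFO_PATH, "w") as fifo:
        fifo.write(message + "\n")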
I have a program (say, "prog") written in C that performs many numerical operations. I want to write a "driver" utility in Python that runs prog with different configurations in parallel, reads its outputs, and logs them. There are several issues to take into account:
All sorts of things can go wrong at any time, so logging has to be done as soon as possible after any prog instance finishes.
Several progs can finish simultaneously, so logging should be centralized.
Workers may be killed somehow, and the driver has to handle that situation properly.
All workers and the logger must be terminated cleanly, without tons of tracebacks, when KeyboardInterrupt is handled.
The first two points make me think that all workers have to send their results to some centralized logger process through, for example, a multiprocessing.Queue. But it seems that the third point makes this solution a bad one, because if a worker is killed the queue may become corrupted. So the Queue is not suitable. Instead I can use one pipe per worker (i.e. every worker is connected to the logger through its own pipe). But then other problems arise:
Reading from a pipe is a blocking operation, so one logger can't read asynchronously from several workers (use threads?).
If a worker is killed and its pipe is broken, how can the logger diagnose this?
P.S. Point #4 seems to be solvable -- I have to:
disable default SIGINT handling in all workers and the logger;
add a try/except block to the main process that calls pool.terminate(); pool.join() when the SIGINT exception (KeyboardInterrupt) is caught -- a sketch of this is shown below.
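In code, that would look roughly like this (run_prog is just a stand-in for launching one prog instance):

import multiprocessing
import signal
import time

def init_worker():
    # Each pool worker ignores SIGINT; only the parent reacts to Ctrl-C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def run_prog(cfg):
    # Placeholder for launching and monitoring one prog instance.
    time.sleep(1)
    return cfg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4, initializer=init_worker)
    try:
        async_results = [pool.apply_async(run_prog, (c,)) for c in ["a", "b", "c"]]
        # A timeout keeps get() interruptible by Ctrl-C on Python 2.
        print([r.get(timeout=10**6) for r in async_results])
        pool.close()
        pool.join()
    except KeyboardInterrupt:
        pool.terminate()
        pool.join()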
Could you please suggest a better design approach if possible, and if not, how to tackle the problems described above?
P.S. Python 2.7
You can start from the answer given here: https://stackoverflow.com/a/23369802/4323
The idea is not to use subprocess.call(), which is blocking, but subprocess.Popen, which is non-blocking. Capture each instance's output, for example by passing stdout=subprocess.PIPE (or a separate file per prog child). Spawn all the progs, wait for them, then write their output. It shouldn't be far from what you described above.
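A rough sketch of that pattern ("./prog" and the config names are placeholders):

import subprocess

configs = ["run1.cfg", "run2.cfg", "run3.cfg"]   # placeholder configurations

# Launch every prog instance without blocking.
procs = [(cfg, subprocess.Popen(["./prog", cfg],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE))
         for cfg in configs]

# Collect and log each instance's output as it finishes.
for cfg, p in procs:
    out, err = p.communicate()   # waits for this instance to finish
    print("%s exited with %s" % (cfg, p.returncode))  # stand-in for real logging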
My setting is the following: I have a Tornado-based HTTP server which is pretty much the "hello world" example:
server = tornado.httpserver.HTTPServer(app)
server.bind(8888)
server.start(2)
tornado.ioloop.IOLoop.instance().start()
Now in this setting, I also have another process (let's call it the control process) spawned by the root process. Thus this control process is a sibling of the two Tornado handler processes. Naturally, I can communicate between the handler processes and the control process through a pipe created by the root process. I am, however, more interested in calling a method of the control process and getting its output.
What is the best approach to do such a thing? If I use the pipe to send the request from a handler to the control process and return the result, should I use a lock to ensure process-safety?
You don't need a lock with pipes. The pipe is its own synchronization. Or, put a different way, the two sides of the pipe are separate objects.
(Of course the control process may need a lock internally, e.g., if it's handling the pipe from a different thread than the main event loop and needs to share any data with code that runs in the main loop, but that's not related to inter-process safety.)
Anyway, if you step back and think about this from a higher level, what you're implementing is the exact definition of an RPC mechanism. If what you're doing is simple enough, implementing it from scratch this way is fine, but if not, you may want to add another protocol to the control process and let Tornado manage it along with your existing protocol(s).
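If you do roll it yourself, the request/response exchange over a multiprocessing pipe can stay quite small. A rough sketch with made-up method names:

import multiprocessing

def control_loop(conn):
    # Runs in the control process: answer one request at a time.
    while True:
        method, args = conn.recv()
        if method == "status":                    # hypothetical method name
            conn.send({"ok": True, "args": args})
        else:
            conn.send({"ok": False, "error": "unknown method"})

def call_control(conn, method, *args):
    # Runs in a handler process: a blocking request/response call.
    conn.send((method, args))
    return conn.recv()

# The root process would create the two ends before forking, e.g.:
# handler_conn, control_conn = multiprocessing.Pipe()

A handler would then call call_control(handler_conn, "status", 42) and block until the control process replies.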
I'm trying to write an application that will allow the user to start long-running calculation processes (a few hours, for example). To do so, I use Python's subprocess.Popen(). As long as the main Pylons process works fine, everything is good, but when I restart the Pylons process, it doesn't respond to any requests if there are any zombie processes left from the previous paster launch.
What could be the origin of this problem, or a workaround for it?
Thanks in advance, Ivan.
To avoid zombie processes, the child must do a double fork so that it detaches from the parent process. See http://en.wikipedia.org/wiki/Zombie_process
So all you need to do is make your child process fork again, while being careful to keep the relevant file handles open so that you can still communicate.
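A rough sketch of the double fork (the command is a placeholder, and the communication handles mentioned above are omitted for brevity):

import os
import subprocess

def spawn_detached(cmd):
    # Fork an intermediate child which forks the real worker and exits
    # immediately, so the worker is re-parented to init and never turns
    # into a zombie of the web (Pylons) process.
    pid = os.fork()
    if pid == 0:
        os.setsid()
        if os.fork() == 0:
            # Grandchild: run the long calculation here.
            subprocess.Popen(cmd).wait()
        os._exit(0)
    else:
        os.waitpid(pid, 0)   # reap the intermediate child right away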
You need some kind of message passing. This may be done by installing a signal handler. Python has the signal module for this, and Popen has a send_signal() method.
Maybe http://www.doughellmann.com/PyMOTW/subprocess/#signaling-between-processes helps you too.
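A rough sketch of that signal-based approach ("worker.py" is a made-up file name, and the two sections below belong in separate files):

# parent.py
import signal
import subprocess
import time

child = subprocess.Popen(["python", "worker.py"])
time.sleep(1)                      # crude: give the child time to install its handler
child.send_signal(signal.SIGUSR1)  # the "message"
child.wait()

# worker.py
import signal
import time

def handle_usr1(signum, frame):
    print("worker: got SIGUSR1 from parent")

signal.signal(signal.SIGUSR1, handle_usr1)
time.sleep(10)   # interrupted early when the signal arrives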