I'm writing a simple job scheduler and I need to maintain some basic level of communication from the scheduler (parent) to spawned processes (children).
Basically I want to be able to send signals from scheduler -> child and also inform the scheduler of a child's exit status.
Generally this is all fine and dandy using regular subprocess Popen with signaling / waitpid. My issue is that I want to be able to restart the scheduler without impacting running jobs, and re-establish communication once the scheduler restarts.
I've gone through a few ideas, but nothing feels "right".
Right now I have a simple wrapper script that maintains communication with the scheduler using named pipes, and the wrapper actually runs the scheduled command. This seems to work well in most cases, but I have seen it subject to some races, mainly because named pipes need both ends open in order to read or write (nothing is buffered while the other side is away).
I'm considering trying Linux message queues instead, but I'm not sure that's the right path. Another option is to set up a listening socket on the scheduler and have sub-processes connect to it when they start, re-establishing the connection if it breaks. This might be overkill, but it's maybe the most robust.
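As a rough sketch, the socket option I have in mind would look something like this (the socket path is just an example, and error handling is omitted):

import os
import socket
import time

SOCK_PATH = "/tmp/scheduler.sock"  # example well-known path

def scheduler_listen():
    # Scheduler side: remove any stale socket file from a previous run, then listen.
    try:
        os.unlink(SOCK_PATH)
    except FileNotFoundError:
        pass
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK_PATH)
    srv.listen()
    return srv  # accept() connections from child wrappers on this

def child_connect(retry_delay=1.0):
    # Child wrapper side: keep retrying, so a scheduler restart is only a brief gap.
    while True:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(SOCK_PATH)
            return s
        except (FileNotFoundError, ConnectionRefusedError):
            s.close()
            time.sleep(retry_delay)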
It's a shame I have to go through all this trouble since I can't just "re-attach" the child process on restart (I realize there are reasons for this).
Any other thoughts? This seems like a common problem any scheduler would hit; am I missing some obvious solution here?
I have a python script which attempts to communicate with a python daemon. When the original script is invoked, it checks to see if the daemon exists. If the daemon exists, the original script writes to a named pipe to communicate with the daemon. If the daemon does not exist, the original script attempts to create a daemon using DaemonContext and then writes to the named pipe.
Pseudo-code of the original script:
from daemon import DaemonContext

if daemon_exists():
    pass
else:
    with DaemonContext():
        create_daemon()

communicate_with_daemon()
The problem is that when the daemon is created, the parent process is killed (i.e. communicate_with_daemon will never be executed). This prevents the original script from creating a daemon and communicating with it.
According to this answer, this problem is a limitation of the python-daemon library. How would I get around this?
Thanks.
What you're describing is not a limitation, but the definition of how a daemon process works.
[…] the parent process is killed (i.e. communicate_with_daemon will never be executed).
Yes, that's right; the daemon process detaches from what started it. That's what makes the process a daemon.
However, this statement is not true:
This prevents the original script from creating a daemon and communicating with it.
There are numerous other ways to communicate between processes. The general name for this is Inter-Process Communication. The solutions are many, and which you choose depends on the constraints of your application.
For example, you could listen on a socket at a known filesystem path and keep that socket open; you could open a network port and communicate through the loopback interface; you could use a "drop-box" style exchange through a file on the local filesystem, a database, or the like; etc.
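As a minimal sketch of the first option (a socket at a known path): the daemonised process serves on an AF_UNIX socket, and any later invocation of the script simply connects to it. The path, the message format, and the function names (which just mirror the pseudo-code above) are placeholders, not anything python-daemon provides.

import socket

DAEMON_SOCK = "/tmp/mydaemon.sock"  # hypothetical well-known path

def daemon_serve():
    # Runs inside the daemon (i.e. inside the DaemonContext).
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(DAEMON_SOCK)
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(4096)
            conn.sendall(b"ack: " + request)

def communicate_with_daemon(message):
    # Runs in the short-lived foreground script.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
        c.connect(DAEMON_SOCK)
        c.sendall(message)
        return c.recv(4096)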
I see a lot of examples of how to use multiprocessing, but they all talk about spawning workers and controlling them while the main process is alive. My question is how to control background workers in the following way:
Start 5 workers from the command line:
manager.py --start 5
After that, I should be able to list and stop workers on demand from the command line:
manager.py --start 1 #will add 1 more worker
manager.py --list
manager.py --stop 2
manager.py --sendmessagetoall "hello"
manager.py --stopall
The important point is that manager.py should exit after every run. What I don't understand is how to get a list of already-running workers from a newly created manager.py process and communicate with them.
Edit: Bilkokuya suggested that I have (1) a manager process that manages a list of workers... and also listens for incoming commands, and (2) a small command-line tool that sends messages to that manager process... It actually sounds like a good solution. But still, the question remains the same - how do I communicate with another process from a newly started command-line program (process 2)? All the examples I see (of Queue, for example) work only when both processes are running the whole time.
The most portable solution I can suggest (although this will still involve further research for you) is to have a long-running process that manages the "background worker" processes. This shouldn't ever be killed off, as it handles the logic for piping messages to each sub-process.
Manager.py can then implement logic to create communication to that long-running process (whether that's via pipes, sockets, HTTP or any other method you like). So manager.py effectively just passes a message on to the 'server' process: "hey, please stop all the child processes" or "please send a message to process 10", etc.
There is a lot of work involved in this, and a lot to research. But the main thing you'll want to look up is how to handle IPC (Inter-Process Communication). This will allow your Manager.py script to interact with an existing/long-running process that can better manage each background worker.
The alternative is to rely fully on your operating system's process-management APIs. But I'd suggest from experience that this is a much more error-prone and troublesome solution.
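As a very rough sketch of that split, using multiprocessing.connection from the standard library purely as one possible transport (the address, authkey and command format are placeholders):

from multiprocessing.connection import Listener, Client

ADDRESS = ("localhost", 6000)   # hypothetical local port
AUTHKEY = b"manager-secret"     # shared secret, placeholder value

def server_loop(handle_command):
    # Long-running process: owns the worker list and dispatches commands to it.
    with Listener(ADDRESS, authkey=AUTHKEY) as listener:
        while True:
            with listener.accept() as conn:
                command = conn.recv()          # e.g. ("start", 5) or ("stopall",)
                conn.send(handle_command(command))

def send_command(command):
    # What manager.py does on every run: connect, send, return the reply, exit.
    with Client(ADDRESS, authkey=AUTHKEY) as conn:
        conn.send(command)
        return conn.recv()

manager.py then becomes a thin client: parse its command-line flags, call send_command(...), print the reply and exit.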
I'm designing a long-running process, triggered by a Django management command, that needs to run on a fairly frequent basis. The process is supposed to run every 5 minutes via a cron job, but I want to prevent a second instance from starting in the rare case that the first takes longer than 5 minutes.
I've thought about creating a touch file that gets created when the management process starts and is removed when the process ends. A second management command process would then check to make sure the touch file didn't exist before running. But that seems like a problem if a process dies abruptly without properly removing the touch file. It seems like there's got to be a better way to do that check.
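One variation on that would be to hold an advisory lock on the file instead of relying on its existence, since the OS releases the lock automatically when the process exits, even after a crash; roughly (Unix-only, with an arbitrary lock path):

import fcntl
import sys

LOCK_PATH = "/tmp/my_management_command.lock"  # arbitrary path

def acquire_single_instance_lock():
    lock_file = open(LOCK_PATH, "w")
    try:
        # Non-blocking exclusive lock; fails immediately if another run holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("previous run still in progress; exiting")
    return lock_file  # keep this object alive for the lifetime of the process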
Does anyone know any good tools or patterns to help solve this type of issue?
For this reason I prefer to have a long-running process that gets its work off of a shared queue. By long-running I mean that its lifetime is longer than a single unit of work. The process is then controlled by some daemon service such as supervisord which can take over control of restarting the process when it crashes. This delegates the work appropriately to something that knows how to manage process lifecycles and frees you from having to worry about the nitty gritty of posix processes in the scope of your script.
If you have a queue, you also have the luxury of being able to spin up multiple processes that can each take jobs off of the queue and process them, but that sounds like it's out of scope of your problem.
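A minimal sketch of what such a worker could look like, assuming purely for illustration that the shared queue is a Redis list accessed via the redis-py package; supervisord's only job is to keep this process alive:

import json
import redis   # assumes redis-py and a local Redis instance as the shared queue

def process_job(job):
    pass   # the actual unit of work goes here

def worker_loop():
    r = redis.Redis()
    while True:
        # blpop blocks until a job arrives, so the worker idles cheaply between
        # units of work and its lifetime spans many of them.
        _key, raw = r.blpop("jobs")
        process_job(json.loads(raw))

if __name__ == "__main__":
    worker_loop()

Whatever schedules the work (cron included) then just pushes a payload onto the "jobs" list instead of doing the work inline.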
I'm trying to create a program that starts a process pool of, say, 5 processes, performs some operation, and then quits, but leaves the 5 processes open. Later the user can run the program again, and instead of it starting new processes it uses the existing 5. Basically it's a producer-consumer model where:
The number of producers varies.
The number of consumers is constant.
The producers can be started at different times by different programs or even different users.
I'm using the built-in multiprocessing module, currently on Python 2.6.4, but with the intent to move to 3.1.1 eventually.
Here's a basic usage scenario:
Beginning state - no processes running.
User starts program.py operation - one producer, five consumers running.
Operation completes - five consumers running.
User starts program.py operation - one producer, five consumers running.
User starts program.py operation - two producers, five consumers running.
Operation completes - one producer, five consumers running.
Operation completes - five consumers running.
User starts program.py stop and it completes - no processes running.
User starts program.py start and it completes - five consumers running.
User starts program.py operation - one producer, five consumers running.
Operation completes - five consumers running.
User starts program.py stop and it completes - no processes running.
The problem I have is that I don't know where to start on:
Detecting that the consumer processes are running.
Gaining access to them from a previously unrelated program.
Doing 1 and 2 in a cross-platform way.
Once I can do that, I know how to manage the processes. There has to be some reliable way to detect existing processes since I've seen Firefox do this to prevent multiple instances of Firefox from running, but I have no idea how to do that in Python.
There are a couple of common ways to do your item #1 (detecting running processes), but to use them would first require that you slightly tweak your mental picture of how these background processes are started by the first invocation of the program.
Think of the first program not as starting the five processes and then exiting, but rather as detecting that it is the first instance started and not exiting. It can create a file lock (one of the common approaches for preventing multiple occurrences of an application from running), or merely bind to some socket (another common approach). Either approach will raise an exception in a second instance, which then knows that it is not the first and can refocus its attention on contacting the first instance.
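For example, the socket variant of that check can be as small as this (the port number is arbitrary):

import socket

def try_become_first_instance(port=47200):  # arbitrary local port
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
    except OSError:        # port already taken: another instance got there first
        s.close()
        return None
    s.listen()
    return s               # keep this socket open for the life of the process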
If you're using multiprocessing, you should be able simply to use the Manager support, which involves binding to a socket to act as a server.
The first program starts the processes, creates Queues, proxies, or whatever. It creates a Manager to allow access to them, possibly allowing remote access.
Subsequent invocations first attempt to contact said server/Manager on the predefined socket (or use other techniques to discover which socket it's on). Instead of calling serve_forever(), they connect() and communicate using the usual multiprocessing mechanisms.
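A bare-bones sketch of that arrangement (the port, authkey and registered name are placeholders):

from multiprocessing.managers import BaseManager
import queue   # "Queue" on Python 2

class QueueManager(BaseManager):
    pass

def run_first_instance():
    # First invocation: own the job queue and serve it to later invocations.
    jobs = queue.Queue()
    QueueManager.register("get_jobs", callable=lambda: jobs)
    manager = QueueManager(address=("127.0.0.1", 50000), authkey=b"secret")
    # ... start the five consumer processes here and hand them `jobs` ...
    manager.get_server().serve_forever()

def run_later_instance():
    # Subsequent invocations: connect() instead of serving.
    QueueManager.register("get_jobs")
    manager = QueueManager(address=("127.0.0.1", 50000), authkey=b"secret")
    manager.connect()
    manager.get_jobs().put("some unit of work")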
Take a look at these different Service Discovery mechanisms: http://en.wikipedia.org/wiki/Service_discovery
The basic idea is that the consumers would each register a service when they start. The producer would go through the discovery process when starting. If it finds the consumers, it binds to them. If it doesn't find them, it starts up new consumers. In most of these systems, services can typically also publish properties, so you can have each consumer uniquely identify itself and give other information to the discovering producer.
Bonjour/zeroconf is pretty well supported cross-platform. You can even configure Safari to show you the zeroconf services on your local network, so you can use that to debug the service advertisement for the consumers. One side advantage of this kind of approach is that you could easily run the producers on different machines than the consumers.
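A rough sketch of what registration and discovery could look like, assuming, as one concrete choice only, the third-party python-zeroconf package (the service type and property names are made up):

import socket
from zeroconf import ServiceBrowser, ServiceInfo, Zeroconf

SERVICE_TYPE = "_myconsumer._tcp.local."   # made-up service type

def register_consumer(zc, consumer_id, port):
    # Each consumer advertises itself, with its id as a property.
    info = ServiceInfo(
        SERVICE_TYPE,
        "consumer-%d.%s" % (consumer_id, SERVICE_TYPE),
        addresses=[socket.inet_aton("127.0.0.1")],
        port=port,
        properties={"id": str(consumer_id)},
    )
    zc.register_service(info)
    return info

class ConsumerListener(object):
    # The producer browses for consumers and binds to whatever it finds.
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        print("found consumer", name, "on port", info.port if info else "?")
    def remove_service(self, zc, type_, name):
        print("consumer went away:", name)
    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, ConsumerListener())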
You need a client-server model on a local system. You could do this using TCP/IP sockets to communicate between your clients and servers, but it's faster to use local named pipes if you don't have the need to communicate over a network.
The basic requirements, if I understood you correctly, are these:
1. A producer should be able to spawn consumers if none exist already.
2. A producer should be able to communicate with consumers.
3. A producer should be able to find pre-existing consumers and communicate with them.
4. Even if a producer completes, consumers should continue running.
5. More than one producer should be able to communicate with the consumers.
Let's tackle each one of these one by one:
(1) is a simple process-creation problem, except that consumer (child) processes should continue running, even if the producer (parent) exits. See (4) below.
(2) A producer can communicate with consumers using named pipes. See os.mkfifo() and the Unix man page for mkfifo() on creating named pipes.
(3) You need to create named pipes from the consumer processes at a well-known path when they start running. The producer can find out whether any consumers are running by looking for these well-known pipes in the same location. If the pipes do not exist, no consumers are running, and the producer can spawn them.
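For example, the well-known location can simply be a fixed naming pattern under /tmp, here matching the /tmp/control.<pid> naming used for (5) below; the paths are only illustrative:

import glob
import os

def consumer_create_pipe():
    # Each consumer announces itself by creating a control FIFO named after its pid.
    path = "/tmp/control.%d" % os.getpid()
    if not os.path.exists(path):
        os.mkfifo(path)
    return path

def find_running_consumers():
    # A producer looks for existing control pipes; an empty list means
    # no consumers are running and they should be spawned.
    return glob.glob("/tmp/control.*")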
(4) You'll need to use os.setsid() for this, to make the consumer processes act like daemons. See the Unix man page for setsid().
(5) This one is tricky. Multiple producers can communicate with the consumers over the same named pipe, but writes larger than PIPE_BUF are not atomic, so you cannot reliably identify which producer sent the data, or prevent interleaving of data from different producers, once a message exceeds that size.
A better way to do (5) is to have the consumers open a "control" named pipe (/tmp/control.3456, 3456 being the consumer pid) on execution. Producers first set up a communication channel using the "control" pipe. When a producer connects, it sends its pid, say "1234", to the consumer on the "control" pipe, which tells the consumer to create a named pipe for data exchange with the producer, say "/tmp/data.1234". Then the producer closes the "control" pipe and opens "/tmp/data.1234" to communicate with the consumer. Each consumer has its own "control" pipe (use the consumer pids to distinguish between pipes of different consumers), and each producer gets its own "data" pipe. When a producer finishes, it should clean up its data pipe or tell the consumer to do so. Similarly, when the consumer finishes, it should clean up its control pipe.
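To make that handshake concrete, a stripped-down sketch using the same /tmp/control.<pid> and /tmp/data.<pid> naming as above (locking of the control pipe and cleanup are left out):

import os
import time

def producer_handshake(consumer_pid):
    control = "/tmp/control.%d" % consumer_pid
    data = "/tmp/data.%d" % os.getpid()
    with open(control, "w") as ctl:      # open blocks until the consumer is reading
        ctl.write("%d\n" % os.getpid())
        ctl.flush()
    while not os.path.exists(data):      # wait for the consumer to create our data pipe
        time.sleep(0.05)
    return open(data, "w")               # write job data to the consumer here

def consumer_accept(control_path):
    with open(control_path) as ctl:      # open blocks until a producer connects
        producer_pid = int(ctl.readline().strip())
    data = "/tmp/data.%d" % producer_pid
    if not os.path.exists(data):
        os.mkfifo(data)                  # per-producer data pipe
    return open(data)                    # read job data from that producer here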
A difficulty here is preventing multiple producers from connecting to the control pipe of a single consumer at the same time. The "control" pipe is a shared resource, and you need to synchronize access to it between the different producers. Use semaphores or file locking for this. See the posix_ipc Python module.
Note: I have described most of the above in terms of general UNIX semantics, but all you really need is the ability to create daemon processes, ability to create "named" pipes/queues/whatever so that they can be found by an unrelated process, and ability to synchronize between unrelated processes. You can use any python module which provides such semantics.