Opening and communicating with a subprocess - python

I have a subprocess that I use. I need to be able to asynchronously read from and write to this process via its stdout and stdin, respectively.
How can I do this? I have looked at subprocess, but the communicate method waits for process termination (which is not what I want) and the subprocess.stdout.read method can block.
The subprocess is not a Python script but can be edited if absolutely necessary. In total I will have around 15 of these subprocesses.

Have a look at how communicate is implemented.
There are essentially 2 ways to do it:
either use select() and be notified whether you can read/write,
or delegate the reads and writes, each of which can block, to dedicated threads.
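A minimal sketch of the select() approach (POSIX only, since select() does not work on pipe handles on Windows); "my_command" and the input bytes are placeholders:
import os
import select
import subprocess

proc = subprocess.Popen(["my_command"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
pending = b"some input\n"                 # bytes we still want to send to the child
while proc.poll() is None:
    wlist = [proc.stdin] if pending else []
    readable, writable, _ = select.select([proc.stdout], wlist, [], 1.0)
    if proc.stdout in readable:
        chunk = os.read(proc.stdout.fileno(), 4096)   # won't block: select said it's ready
        if chunk:
            print("child said:", chunk)
    if proc.stdin in writable and pending:
        n = os.write(proc.stdin.fileno(), pending)    # write only as much as fits in the pipe
        pending = pending[n:]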

Have you considered using some queue or NOSQL DB for inter process communication?
I would suggest using Redis, and having your processes read and write to different keys.
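For example (a rough sketch assuming the redis-py package and a Redis server on localhost; the key name is made up):
import redis

r = redis.Redis(host="localhost", port=6379)
# Producer side: a subprocess pushes its output onto its own list key.
r.rpush("proc:1:out", "some result line")
# Consumer side: the master blocks (up to 1 second) waiting for the next item.
item = r.blpop("proc:1:out", timeout=1)
if item is not None:
    key, value = item
    print(value.decode())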

Have a look at sarge: http://sarge.readthedocs.org/en/latest/index.html
From the sarge docs:
If you want to interact with external programs from your Python applications, Sarge is a library which is intended to make your life easier than using the subprocess module in Python’s standard library.

Related

Why multiprocessing.Process doesn't have a send_signal method?

Unlike subprocess.Popen, multiprocessing.Process doesn't have a send_signal method. Why? Is there a recommended way to send signals like SIGINT to multiprocessing.Process? Should I use os.kill() for that purpose? Thanks in advance.
Your first question makes total sense.
I think that's because the multiprocessing and subprocess libraries have different design goals (as explained by this answer): the former is for making multiple Python scripts cooperate across CPUs to achieve a common task, while the latter is for integrating external programs into your Python program. IPC (inter-process communication) is far easier between cooperating Python processes (there are queues and pipes, and you can pass Python objects as arguments) than with an external program, which we can only assume adheres to the OS interface (textual stdin/stdout, correct signal handling, ...).
The default way to communicate with another multiprocessing process is thus not an OS signal, so exposing send_signal was presumably not considered useful.
Also remember that (C)Python is open source, so you could contribute this integration yourself.
As for your second question, there is already an answer (cf. How can I send a signal from a python program?): yes,
use os.kill()
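A minimal POSIX sketch of that approach (the worker body is made up; on Windows, os.kill only supports a couple of CTRL events and outright termination):
import multiprocessing
import os
import signal
import time

def worker():
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:              # SIGINT shows up as KeyboardInterrupt
        print("worker interrupted, cleaning up")

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    time.sleep(2)
    os.kill(p.pid, signal.SIGINT)          # send the signal via the child's pid
    p.join()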

Python subprocess signal

I would like to establish a very simple communication between two python scripts. I have decided that the best way to communicate is to have both scripts read from and write to a text file. I would like the main program to wait while the child programs execute.
Normally I would make the main program wait x amount of time and continuously check the text file for an okay flag. However, I have seen people talk about using a signal instead.
Could someone please give an example of this?
There is the Popen.send_signal() method, which allows you to send a signal to a child process.
Here's a code example that sends SIGINT to a ping subprocess to get the summary output on exit.
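A sketch of what that can look like on a Unix-like system ("example.com" is a placeholder; Windows handles SIGINT for console processes differently):
import signal
import subprocess
import time

p = subprocess.Popen(["ping", "example.com"], stdout=subprocess.PIPE)
time.sleep(3)
p.send_signal(signal.SIGINT)   # same effect as pressing Ctrl-C in a terminal
out, _ = p.communicate()       # ping prints its statistics summary and exits
print(out.decode())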
You need one process to write and one to read; both processes reading leads to no communication. Signals are used only for special purposes, not for normal inter-process communication. Use something like pipes or sockets. They're not more complicated than files, but much more powerful.

What happens if two python scripts want to write in the same file?

I have a pipeline which at some point splits work into various sub-processes that do the same thing in parallel. Thus their output should go into the same file.
Is it too risky to have all of those processes write to the same file? Or does Python retry if it sees that the resource is occupied?
This is system dependent. On Windows, the resource is locked and you get an exception. On Linux, two processes can write to the file (the written data may be interleaved).
Ideally in such cases you should use semaphores to synchronize access to shared resources.
If using semaphores is too heavy for your needs, then the only alternative is to write to separate files...
Edit: As pointed out by eye in a later post, a resource manager is another alternative for handling concurrent writers.
In general, this is not a good idea and will take a lot of care to get right. Since the writes will have to be serialized, it might also adversely affect scalability.
I'd recommend writing to separate files and merging (or just leaving them as separate files).
A better solution is to implement a resource manager (writer) to avoid opening the same file twice. This manager could use threading synchronization mechanisms (threading.Lock) to avoid simultaneous access on some platforms.
How about having all of the different processes write their output into a queue, and have a single process that reads that queue, and writes to the file?
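A rough sketch of that pattern with multiprocessing (the file name and worker payload are made up):
import multiprocessing

def worker(task_id, q):
    # Each worker puts its result on the queue instead of touching the file.
    q.put("result from task %d\n" % task_id)

def writer(q, path):
    # The single writer drains the queue; None is the shutdown sentinel.
    with open(path, "w") as f:
        while True:
            line = q.get()
            if line is None:
                break
            f.write(line)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    w = multiprocessing.Process(target=writer, args=(q, "output.txt"))
    w.start()
    workers = [multiprocessing.Process(target=worker, args=(i, q)) for i in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    q.put(None)                 # tell the writer to stop
    w.join()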
Use multiprocessing.Lock() instead of threading.Lock(). Just a word of caution: it might slow down your concurrent processing, because each process has to wait for the lock to be released.
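If you do go the lock route, a minimal sketch (the file name and payload are made up) could look like this:
import multiprocessing

def append_result(lock, path, text):
    # Only one process at a time holds the lock while appending.
    with lock:
        with open(path, "a") as f:
            f.write(text)

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=append_result,
                                     args=(lock, "shared.txt", "line from %d\n" % i))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()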

Need a workaround: Python's select.select() doesn't work with subprocess' stdout?

From within my master python program, I am spawning a child program with this code:
child = subprocess.Popen(..., stdout=subprocess.PIPE, stdin=subprocess.PIPE)
FWIW, the child is a PHP script which needs to communicate back and forth with the python program.
The master python program actually needs to listen for communication from several other channels - other PHP scripts spawned using the same code, or socket objects coming from socket.accept(), and I would like to use select.select() as that is the most efficient way to wait for input from a variety of sources.
The problem I have is that select.select() under Windows does not work with the subprocess' stdout file descriptor (this is documented), and it looks like I will be forced to either:
A) Poll the PHP scripts to see if they have written anything to stdout. (This system needs to be very responsive, I would need to poll at least 1,000 times per second!)
B) Have the PHP scripts connect to the master process and communicate via sockets instead of stdout/stdin.
I will probably go with solution (B), because I can't bring myself to make the system poll at such a high frequency, but it seems a sad waste of resources to reconnect with sockets when stdout/stdin would have done just fine.
Is there some alternative solution which would allow me to use stdout and select.select()?
Unfortunately, many uses of pipes on Windows don't work as nicely as they do on Unix, and this is one of them. On Windows, the better solution is probably to have your master program spawn threads to listen to each of its subprocesses. If you know the granularity of data that you expect back from your subprocess, you can do blocking reads in each of your threads, and then the thread will come alive when the IO unblocks.
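A rough sketch of that thread-per-child pattern (the command line and names are placeholders, not your actual PHP invocation): each reader thread blocks on its child's stdout and forwards lines to one queue that the master can wait on, much as it would wait on select():
import queue
import subprocess
import threading

events = queue.Queue()

def reader(name, pipe):
    for line in iter(pipe.readline, b""):   # blocks until the child writes a line
        events.put((name, line))
    events.put((name, None))                # child closed its stdout

children = {}
for name in ("php-1", "php-2"):
    child = subprocess.Popen(["php", "worker.php"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    children[name] = child
    threading.Thread(target=reader, args=(name, child.stdout), daemon=True).start()

while children:
    name, line = events.get()               # wakes up whenever any child says something
    if line is None:
        children.pop(name).wait()
    else:
        print(name, "->", line)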
Alternatively, (I have no idea if this is viable for your project), you could look into using a Unix-like system, or a Unix-like layer on top of Windows (e.g. Cygwin), where select.select() will work on subprocess pipes.

python subprocess module: looping over stdout of child process

I have some commands which I am running using the subprocess module. I then want to loop over the lines of the output. The documentation says not to use data_stream.stdout.read, which I am not doing directly, but I may be doing something which calls it. I am looping over the output like this:
for line in data_stream.stdout:
    # do stuff here
    ...
Can this cause deadlocks the way reading from data_stream.stdout can, or is Popen set up for this kind of looping so that it uses the communicate code under the hood and handles all the calls to it for you?
You have to worry about deadlocks if you're communicating with your subprocess, i.e. if you're writing to stdin as well as reading from stdout. Because these pipes are buffered, doing this kind of two-way communication is very much a no-no:
data_stream = Popen(mycmd, stdin=PIPE, stdout=PIPE)
data_stream.stdin.write("do something\n")
for line in data_stream.stdout:
    ...  # BAD!
However, if you've not set up stdin (or stderr) when constructing data_stream, you should be fine.
data_stream = Popen(mycmd, stdout=PIPE)
for line in data_stream.stdout:
    ...  # Fine
If you need two-way communication, use communicate.
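For instance (a sketch; "mycmd" is a placeholder): communicate() writes the input, closes stdin, reads everything back and waits for the process to exit, which avoids the deadlock:
from subprocess import Popen, PIPE

p = Popen(["mycmd"], stdin=PIPE, stdout=PIPE)
out, _ = p.communicate(input=b"do something\n")
for line in out.splitlines():
    pass  # process each output line here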
The two answers have caught the gist of the issue pretty well: don't mix writing something to the subprocess, reading something from it, writing again, etc. -- the pipes' buffering means you're at risk of a deadlock. If you can, write everything you need to write to the subprocess FIRST, close that pipe, and only THEN read everything the subprocess has to say; communicate is nice for the purpose, IF the amount of data is not too large to fit in memory (if it is, you can still achieve the same effect "manually").
If you need finer-grain interaction, look instead at pexpect or, if you're on Windows, wexpect.
SilentGhost's/chrispy's answers are OK if you have a small to moderate amount of output from your subprocess. Sometimes, though, there may be a lot of output - too much to comfortably buffer in memory. In such a case, the thing to do is start the process and spawn a couple of threads - one to read child.stdout and one to read child.stderr, where child is the subprocess. You then need to wait() for the subprocess to terminate.
This is actually how communicate() works; the advantage of using your own threads is that you can process the output from the subprocess as it is generated. For example, in my project python-gnupg I use this technique to read status output from the GnuPG executable as it is generated, rather than waiting for all of it by calling communicate(). You are welcome to inspect the source of this project - the relevant stuff is in the module gnupg.py.
data_stream.stdout is a standard output handle. You shouldn't be looping over it. communicate returns a tuple of (stdoutdata, stderrdata). It's this stdoutdata that you should be using to do your stuff.
