Named pipe is not flushing in Python - python

I have a named pipe created via the os.mkfifo() command. I have two different Python processes accessing this named pipe, process A is reading, and process B is writing. Process A uses the select function to determine when there is data available in the fifo/pipe. Despite the fact that process B flushes after each write call, process A's select function does not always return (it keeps blocking as if there is no new data). After looking into this issue extensively, I finally just programmed process B to add 5KB of garbage writes before and after my real call, and likewise process A is programmed to ignore those 5KB. Now everything works fine, and select is always returning appropriately. I came to this hack-ish solution by noticing that process A's select would return if process B were to be killed (after it was writing and flushing, it would sleep on a read pipe). Is there a problem with flush in Python for named pipes?

What APIs are you using? os.read() and os.write() don't buffer anything.
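To illustrate the point about os.read() and os.write(), here is a minimal single-process sketch (Unix only). It contrasts a buffered file object opened with open() against a raw os.write() on the same FIFO. The FIFO name and the is_readable helper are made up for the demonstration and are not from the question.

```python
import os
import select
import tempfile

# Create a FIFO in a throwaway directory (the name "demo_fifo" is arbitrary).
tmpdir = tempfile.mkdtemp()
fifo_path = os.path.join(tmpdir, "demo_fifo")
os.mkfifo(fifo_path)

# Open the read end non-blocking first, so opening a write end cannot block.
rfd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)


def is_readable(fd, timeout=0.2):
    """Return True if select() reports fd as ready for reading."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return bool(ready)


# Case 1: a buffered file object. The text stays in Python's own buffer
# until flush() (or close()) pushes it into the pipe.
writer = open(fifo_path, "w")
writer.write("buffered\n")
seen_before_flush = is_readable(rfd)
writer.flush()
seen_after_flush = is_readable(rfd)
first = os.read(rfd, 1024)

# Case 2: os.write() on the raw descriptor hands the bytes to the kernel
# immediately, so no flush is involved.
os.write(writer.fileno(), b"raw\n")
seen_after_os_write = is_readable(rfd)
second = os.read(rfd, 1024)

# Clean up the descriptors and the temporary FIFO.
writer.close()
os.close(rfd)
os.remove(fifo_path)
os.rmdir(tmpdir)
```

If select() on the reading side stays blocked, a write that is still sitting in a buffered file object on the writing side is one thing worth ruling out.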

To find out whether Python's internal buffering is causing your problems, run your scripts with "python -u" instead of "python". This forces Python into "unbuffered mode", in which standard output and standard error are written immediately.

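Flushing at the kernel level is irrelevant for named pipes: once data reaches the pipe it is held strictly in memory, and it stays there until it is read or the FIFO is closed. What can still delay delivery is buffering inside the writing process, before the data ever reaches the pipe.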

Related

Python multiple process logging to a shared file

I used the multiprocessing framework to create several parallel sub-processes (via JoinableQueue), but I only set up logging (using the normal Python logging module) in my main thread. When I test the code, all the sub-processes seem able to write their logs to the single logfile that I specified at the start of my main process, with no issues.
However, the Python logging cookbook says that the logging module is only thread-safe, not process-safe. It suggests one of the following:
1. use multiprocessing.logging (which does not have the full functionality of logging);
2. use multiprocessing.Lock to serialize the writing to the logfile from the sub-processes;
3. use logging.QueueHandler to send logs into a multiprocessing.Queue, and then have a dedicated logging thread in the main process handle writing the log records into the logfile.
All the suggested solutions make sense to me, and I actually was able to implement solution #3 - it worked, no issues.
But I do have a question about what the issue would be if we do not handle this well. What bad consequences might occur if I did not do any of #1, #2 or #3 (as I described in the first paragraph)? And how can I make those bad consequences happen (I'm curious to see them)?
Generally you want log writes to be atomic in some fashion. That is, in this context, when something writes a chunk of text to a log, that chunk appears together rather than being split up and intermixed with the content of other log entries. If multiple processes try to write to a file without some kind of mediation, it can result in such intermixing or even clobbering of the content.
To purposely cause such a thing, have several processes write to the log repeatedly and simultaneously without mediation (no locks or handling processes) just as the documentation suggests you shouldn't. The more processes and the longer (partially dependent on buffer sizes) the writes are, the more likely you'll get intermixing.
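As a rough way to see this, the sketch below bypasses the logging module entirely and has several processes append long lines to one file in small chunks with no lock. All names and constants (WIDTH, CHUNK, run_demo and so on) are invented for the illustration, and it assumes a Unix system where the "fork" start method is available. Whether any lines actually come out mangled depends on scheduling, so the count varies from run to run and can be zero.

```python
import multiprocessing
import os
import tempfile

WIDTH = 200    # characters per "log line" (arbitrary)
CHUNK = 16     # bytes per write call; small chunks invite interleaving
LINES = 200    # lines written by each process
TAGS = "ABCD"  # one writer process per tag character


def write_lines(path, tag):
    """Append LINES lines made of one repeated character, in small chunks."""
    line = (tag * WIDTH + "\n").encode()
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        for _ in range(LINES):
            for start in range(0, len(line), CHUNK):
                os.write(fd, line[start:start + CHUNK])
    finally:
        os.close(fd)


def count_mangled(path):
    """Count lines that are not exactly WIDTH copies of a single character."""
    with open(path) as handle:
        bodies = [raw.rstrip("\n") for raw in handle]
    return sum(1 for body in bodies
               if len(body) != WIDTH or len(set(body)) != 1)


def run_demo():
    """Run the writers concurrently; return (mangled line count, file size)."""
    # Assumption: the "fork" start method, which exists only on Unix.
    ctx = multiprocessing.get_context("fork")
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        workers = [ctx.Process(target=write_lines, args=(path, tag))
                   for tag in TAGS]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        return count_mangled(path), os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    mangled, size = run_demo()
    print("mangled lines:", mangled, "of", len(TAGS) * LINES)
```

Raising the number of writers or the line length, or shrinking CHUNK, should make intermixed lines more likely, in line with the answer above.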

Python subprocess & stdout - program deadlocks

I have a simulation program which is piloted through stdin and provides output to stdout.
Driving it from a C++/Qt program via QProcess works well.
Driving it from a Python program under Linux also works well, using:
p = subprocess.Popen(cmd,stdin=subprocess.PIPE,stdout=subprocess.PIPE)
And using p.stdin.write, p.stdout.readline, and p.wait
However, under Windows, the program runs and gets the commands through stdin as it should (this has been verified by debugging the subprocess), but the Python program deadlocks at any p.stdout.readline and at p.wait. If the stdout=subprocess.PIPE parameter is removed, the program works: the output is displayed on the console and no deadlock occurs.
This sounds similar to a warning in the Python documentation:
Warning : This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe
such that it blocks waiting for the OS pipe buffer to accept more
data. Use communicate() to avoid that.
However, I can't use communicate(), as the program protocol is not a single command and a single output, rather several commands and replies are required.
Is there any solution?
I'm not sure, but it looks like a buffering problem. On Linux (as on most Unix or Unix-like systems), a program's output to a file or a pipe is normally block-buffered by its standard I/O library. That means that after a write call the data sits in the program's own buffer, and nothing is available at the other end of the pipe until the buffer is full, the data is flushed, or the stream is closed. That's one of the reasons why ptys were invented and are not implemented with a pipe pair.
Put differently, you cannot use pipes to drive a program whose next input depends on its previous output unless the program has been specially written to flush its output consistently before reading anything. It works on a true terminal (tty or pty) because output to a terminal is line-buffered and is flushed before the program reads its next input.
But this is not the same deadlock as the one described in the documentation you cited in your question.
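A minimal sketch of the command-and-reply pattern, assuming the child cooperates by flushing after every reply. The child here is a stand-in written for this example (a one-off Python script that upper-cases each line), not the simulation program from the question.

```python
import subprocess
import sys

# Stand-in child: replies to each input line and flushes after every reply.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffer the parent's side of the pipes
)

replies = []
for command in ["start", "step", "stop"]:
    proc.stdin.write(command + "\n")
    proc.stdin.flush()                   # push the command into the pipe now
    replies.append(proc.stdout.readline().strip())

proc.stdin.close()                       # EOF on stdin lets the child's loop end
proc.wait()
proc.stdout.close()
```

If the flush=True in the child is removed, the first readline() in the parent would be expected to block, because the reply would stay in the child's buffer while the child waits for more input.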

Replace current process with invocation of subprocess?

In Python, is there a way to invoke a new process, hand it the same context (such as the standard IO streams), close the current process, and give control to the invoked process? This would effectively 'replace' the process.
I have a program whose behavior I want to repeat. However, it uses a third-party library, and it seems that the only way that I can truly kill threads invoked by that library is to exit() my python process.
Plus, it seems like it could help manage memory.
You may be interested in os.execv() and friends:
These functions all execute a new program, replacing the current
process; they do not return. On Unix, the new executable is loaded
into the current process, and will have the same process id as the
caller. Errors will be reported as OSError exceptions.
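A small sketch of the behaviour described in that quote, assuming a Unix system. The exec is performed inside a throwaway child interpreter so that it does not replace the process running the example; the child script is invented for this demonstration.

```python
import subprocess
import sys

# Script for a throwaway interpreter: print the pid, then exec a fresh
# interpreter that prints its own pid.
child_script = """
import os, sys
print(os.getpid(), flush=True)  # flush first: unwritten buffers are lost on exec
os.execv(sys.executable, [sys.executable, "-c", "import os; print(os.getpid())"])
print("never reached")          # execv does not return on success
"""

result = subprocess.run(
    [sys.executable, "-c", child_script],
    capture_output=True,
    text=True,
    check=True,
)
pid_before, pid_after = result.stdout.split()
```

On Unix both pids should be equal, and the standard streams are inherited by the new program. To restart the current script, a call along the lines of os.execv(sys.executable, [sys.executable] + sys.argv) is a common approach, though open files and buffers should be flushed or closed first.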

Avoid hang when writing to named pipe which disappears and comes back

I have a program which dumps information into a named pipe like this:
cmd=open(destination,'w')
cmd.write(data)
cmd.close()
This works pretty well until the pipe (destination) disappears while my program is writing to it. The problem is that it keeps hanging on the write part(?)
I was expecting some exception to happen, but that's not the case.
How can I avoid this situation?
Thanks,
Jay
If the process reading from the pipe is not reading as fast as you're writing, your script will block when it tries to write to the pipe. From the Wikipedia article:
"If the queue buffer fills up, the
sending program is suspended (blocked)
until the receiving program has had a
chance to read some data and make room
in the buffer. In Linux, the size of
the buffer is 65536 bytes."
Luckily you have a few options:
The signal module will allow you to set an alarm to break out of the write call. After the prescribed amount of time, a SIGALRM signal will be sent to your process; if your handler for the signal raises an exception, it will break you out of the write.
With threading, you can spawn a new thread to handle the writing and abandon it if it blocks for too long.
You can also use the fcntl module to make the pipe non-blocking, meaning the call will not wait and will fail immediately if the pipe is full (see the question "Non-blocking read on a subprocess.PIPE in python").
Finally, you can use the select module to check whether the pipe is ready for writing before attempting your write. Just be careful: the check-then-write sequence is not atomic (e.g. the pipe could fill up between the check and the write).
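A sketch of the non-blocking option, assuming a Unix system. Instead of fcntl it passes os.O_NONBLOCK directly to os.open(), which also prevents the open itself from hanging when nothing is reading the FIFO. The try_write helper and the FIFO name are hypothetical, introduced only for this example.

```python
import errno
import os
import tempfile


def try_write(path, data):
    """Write data to a FIFO without blocking; True only if all of it was written."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # no process has the FIFO open for reading
            return False
        raise
    try:
        return os.write(fd, data) == len(data)
    except BlockingIOError:           # the pipe buffer is full
        return False
    except BrokenPipeError:           # the reader went away after we opened
        return False
    finally:
        os.close(fd)


# Demonstration with a temporary FIFO (the name is arbitrary).
tmpdir = tempfile.mkdtemp()
fifo_path = os.path.join(tmpdir, "demo_fifo")
os.mkfifo(fifo_path)

without_reader = try_write(fifo_path, b"status\n")   # nobody is reading yet

rfd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
with_reader = try_write(fifo_path, b"status\n")      # a reader now exists
received = os.read(rfd, 1024)

os.close(rfd)
os.remove(fifo_path)
os.rmdir(tmpdir)
```

The caller decides what to do when try_write returns False, for example retry later or drop the message.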
I think that the signal module can help you. Check this example:
http://docs.python.org/library/signal.html#example
(The example handles a possibly never-finishing open() call, but it can be trivially modified to do the same thing for your cmd.write() call.)

Jython: subprocess.Popen runs out of file descriptors

I'm using the Jython 2.51 implementation of Python to write a script that repeatedly invokes another process via subprocess.Popen and uses PIPE to pipe stdout and stderr to the parent process and stdin to the child process. After several hundred loop iterations, I seem to run out of file descriptors.
The Python subprocess documentation mentions very little about freeing file descriptors, other than the close_fds option, which isn't described very clearly (Why should there be any file descriptors besides 0, 1 and 2 open in the first place?). I'm assuming that in CPython, reference counting takes care of the resource freeing issue. What's the proper way to make sure all descriptors get freed when one is done with a Popen object in Jython?
Edit: Just in case it makes a difference, this is a multithreaded program, so there are several Popen processes running simultaneously.
This only answers part of your question, but my understanding is that, when you spawn a new process, it normally inherits all the handles of the parent process. That includes such things as open files and sockets that you're listening on.
On UNIX, that's a side-effect of using 'fork', which duplicates the current process and all of its handles before loading the new executable. On Windows it's more explicit, but Python does it anyway, to try to match the behavior across platforms as much as possible.
The close_fds option, when True, closes all these inherited handles after spawning the subprocess, so the new executable starts with a clean slate. But if your subprocesses are run one at a time, and terminating when they're done, then this shouldn't be the problem.
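On the question of releasing descriptors, one approach that does not rely on reference counting is to close the pipe objects and wait for the child explicitly. The sketch below is written for CPython 3 and has not been tried on Jython 2.5; the run_once helper and the child command are made up for the illustration.

```python
import subprocess
import sys


def run_once(command, data):
    """Run command, feed it data on stdin, and return (stdout, stderr)."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        return proc.communicate(data)
    finally:
        # Do not rely on garbage collection: stop the child if it is still
        # running, close every pipe, and reap the process.
        if proc.poll() is None:
            proc.kill()
        for stream in (proc.stdin, proc.stdout, proc.stderr):
            if stream is not None and not stream.closed:
                stream.close()
        proc.wait()


# Stand-in child command that upper-cases its input.
command = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read().upper())"]

outputs = [run_once(command, b"ping") for _ in range(10)]
```

Because every descriptor is closed before run_once returns, the number of open descriptors should not grow with the number of loop iterations.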
