I have a program which dumps information into a named pipe like this:
cmd = open(destination, 'w')
cmd.write(data)
cmd.close()
This works pretty well until the pipe (destination) disappears while my program is writing to it. The problem is that the program then hangs on the write call.
I was expecting some exception to happen, but that's not the case.
How can I avoid this situation?
Thanks,
Jay
If the process reading from the pipe is not reading as fast as you are writing, your script will block when it tries to write to the pipe. From the Wikipedia article:
"If the queue buffer fills up, the
sending program is suspended (blocked)
until the receiving program has had a
chance to read some data and make room
in the buffer. In Linux, the size of
the buffer is 65536 bytes."
Luckily, you have a few options:
The signal module will allow you to set an alarm to break out of the write call. After the prescribed amount of time, a SIGALRM signal will be sent to your process; if your handler for that signal raises an exception, it will break you out of the write.
With threading, you can spawn a new thread to handle the writing, abandoning it (e.g., via a timeout on join) if it blocks for too long.
You can also use the fcntl module to make the pipe non-blocking (meaning the call will not wait; it will fail immediately if the pipe is full): Non-blocking read on a subprocess.PIPE in python. See the sketch after this list.
Finally, you can use the select module to check whether the pipe is ready for writing before attempting your write. Just be careful: the check-then-write sequence is not atomic (the pipe could fill up between the check and the write).
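For instance, here is a minimal sketch of the fcntl option, reusing destination and data from the question. Note that if the reader vanishes entirely you would get a BrokenPipeError/SIGPIPE rather than EAGAIN, so handle that case too if it applies:

import errno
import fcntl
import os

cmd = open(destination, 'w')
flags = fcntl.fcntl(cmd.fileno(), fcntl.F_GETFL)
fcntl.fcntl(cmd.fileno(), fcntl.F_SETFL, flags | os.O_NONBLOCK)
try:
    cmd.write(data)
    cmd.flush()
except OSError as e:
    # EAGAIN means the pipe buffer is full; anything else is a real error
    if e.errno != errno.EAGAIN:
        raise
    # decide what to do here instead of hanging forever
finally:
    cmd.close()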
I think that the signal module can help you. Check this example:
http://docs.python.org/library/signal.html#example
(The example solves a possibly never-returning open() call, but can be trivially adapted to do the same for your cmd.write() call.)
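Adapted to the question's code, a sketch might look like this (Unix-only, since it relies on SIGALRM):

import signal

class WriteTimeout(Exception):
    pass

def handler(signum, frame):
    # Raising here interrupts whatever blocking call is in progress.
    raise WriteTimeout("write to pipe timed out")

signal.signal(signal.SIGALRM, handler)
signal.alarm(5)            # allow the open/write at most 5 seconds
try:
    cmd = open(destination, 'w')
    cmd.write(data)
    cmd.close()
except WriteTimeout:
    pass                   # decide how to recover here
finally:
    signal.alarm(0)        # cancel any pending alarm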
After looking at the subprocess async documentation, I'm left wondering how anyone would run something equivalent to await process.stdout.read(NUMBER_OF_BYTES_TO_READ). The documentation advises against using that snippet and suggests the communicate method instead, but from what I can tell there is no way to indicate the number of bytes to read with communicate().
What am I missing?
How would I tell communicate to return after reading a certain number of bytes?
Edit: I'm creating my subprocess with async pipes; I am trying to use the pipe asynchronously.
Short answer: stdout.read is blocking.
When there are enough bytes to read, it will return. This is the happy and unlikely case. More likely, there will be few or no bytes to return, so it will wait, blocking the process.
The pipe can be created to be non-blocking, but that behavior is system-specific and fickle in my experience.
The "right" way to use stdout.read is to either be ready in the reading process to be blocked on this operation, possibly indefinitely; or use an external thread to read and push data to a shared buffer. Main thread can then decide to either pull or await on the buffer, retaining control.
In practical terms (and I have written code like this several times), there will be a listening thread attached to the pipe, reading it until it closes or until a signal from the main thread tells it to die. The reader and main thread communicate via queue.Queue (Queue.Queue on Python 2), which is trivial to use in this scenario because it's thread-safe.
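A minimal sketch of that pattern, assuming process is a plain subprocess.Popen with stdout=subprocess.PIPE rather than the asyncio variant:

import queue
import threading

buf = queue.Queue()

def reader(pipe, buf):
    # Drain the pipe until EOF, pushing each chunk into the shared queue.
    for chunk in iter(lambda: pipe.read(4096), b''):
        buf.put(chunk)
    buf.put(None)          # sentinel: the pipe closed

threading.Thread(target=reader, args=(process.stdout, buf), daemon=True).start()

# The main thread retains control: it can poll with a timeout instead of blocking.
try:
    chunk = buf.get(timeout=1.0)
except queue.Empty:
    chunk = None           # nothing yet; go do something else and come back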
So stdout.read comes with so many caveats that nobody in their right mind would advise anyone to use it.
I am looking to interface with an interactive command line application using Python 3.5. The idea is that I start the process at the beginning of the Python script and leave it open. In a loop, I print a file path, followed by a line return, to stdin, wait for a quarter second or so as it processes, and read from stdout until it reaches a newline.
This is quite similar to the communicate feature of subprocess, but I am waiting for a line return instead of waiting for the process to terminate. Anyone aware of a relatively simple way to do this?
Edit: it would be preferable to use the standard library to do this, rather than third-party libraries such as pexpect, if possible.
You can use subprocess.Popen for this.
Something like this:
proc = subprocess.Popen(['my-command'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
Now proc.stdin and proc.stdout are your ends of the pipes that send data to the subprocess's stdin and read from its stdout.
Since you're only interested in reading newline-terminated lines, you can probably work around the problems caused by buffering. Buffering is one of the big gotchas when using subprocess to communicate with interactive processes. Usually I/O is line-buffered, meaning that if the subprocess doesn't terminate a line with a newline, you might never see any data on proc.stdout; and vice versa when you write to proc.stdin - the child might not see your input if you don't end it with a newline. You can turn buffering off, but that's not simple, and not platform-independent.
Another problem you might have to solve is that you can't tell whether the subprocess is waiting for input or has sent you output except by writing to and reading from the pipes. So you might need to start a second thread so that you can wait for output on proc.stdout and write to proc.stdin at the same time without running into a deadlock where both processes block on pipe I/O (or, on a Unix that supports select with file handles, use select to determine which pipes are ready to be written to or read from).
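A sketch of the resulting loop, with 'my-command' and the file paths as placeholders; bufsize=1 with universal_newlines requests line buffering on our side:

import subprocess

proc = subprocess.Popen(['my-command'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True,
                        bufsize=1)

for path in ['/data/one.txt', '/data/two.txt']:    # hypothetical inputs
    proc.stdin.write(path + '\n')
    proc.stdin.flush()                             # make sure the child sees the line
    reply = proc.stdout.readline().rstrip('\n')    # blocks until the child answers
    print(reply)

proc.stdin.close()
proc.wait()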
This sounds like a job for an event loop. The subprocess module starts to show its strain under complex tasks.
I've done this with Twisted, by subclassing the following:
twisted.internet.endpoints.ProcessEndpoint
twisted.protocols.basic.LineOnlyReceiver
Most documentation for Twisted uses sockets as endpoints, but it's not hard to adjust the code for processes.
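From memory, the shape is roughly as follows; treat it as a sketch to check against the Twisted documentation rather than working code, with 'my-command' and the paths as placeholders:

from twisted.internet import reactor
from twisted.internet.endpoints import ProcessEndpoint
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineOnlyReceiver

class ChildProtocol(LineOnlyReceiver):
    delimiter = b'\n'   # the child ends each response with a plain newline

    def connectionMade(self):
        self.sendLine(b'/path/to/first/file')   # hypothetical input

    def lineReceived(self, line):
        print('child replied:', line)

endpoint = ProcessEndpoint(reactor, b'my-command')
endpoint.connect(Factory.forProtocol(ChildProtocol))
reactor.run()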
I want to make a Python wrapper for another command-line program.
I want to read Python's stdin as quickly as possible, filter and translate it, and then write it promptly to the child program's stdin.
At the same time, I want to be reading as quickly as possible from the child program's stdout and, after a bit of massaging, writing it promptly to Python's stdout.
The Python subprocess module is full of warnings to use communicate() to avoid deadlocks. However, communicate() doesn't give me access to the child program's stdout until the child has terminated.
I think you'll be fine (carefully) ignoring the warnings and using Popen.stdin etc. yourself. Just be sure to process the streams line by line and iterate through them on a fair schedule so as not to fill up any buffers. A relatively simple (if inefficient) way of doing this in Python is to use a separate thread for each of the three streams; that's how Popen.communicate does it internally. Check out its source code to see how.
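A sketch of that layout, where translate() and massage() stand in for your filtering logic and 'child-program' for the wrapped command:

import subprocess
import sys
import threading

proc = subprocess.Popen(['child-program'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)

def pump_in():
    # our stdin -> translate() -> child's stdin, line by line
    for line in sys.stdin:
        proc.stdin.write(translate(line))   # translate() is your filter (hypothetical)
        proc.stdin.flush()                  # deliver promptly; don't sit in a buffer
    proc.stdin.close()                      # let the child see EOF

def pump_out():
    # child's stdout -> massage() -> our stdout, line by line
    for line in proc.stdout:
        sys.stdout.write(massage(line))     # massage() is yours (hypothetical)
        sys.stdout.flush()

threads = [threading.Thread(target=pump_in),
           threading.Thread(target=pump_out)]
for t in threads:
    t.start()
for t in threads:
    t.join()
proc.wait()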
Disclaimer: This solution likely requires that you have access to the source code of the process you are trying to call, but may be worth trying anyway. It depends on the called process periodically flushing its stdout buffer, which is not standard.
Say you have a process proc created by subprocess.Popen. proc has attributes stdin and stdout, which are simply file-like objects. To send information through stdin, you call proc.stdin.write(); to retrieve information from proc.stdout, you call proc.stdout.readline() to read an individual line.
A couple of caveats:
When writing to proc.stdin via write(), you need to end the input with a newline character; without one, your subprocess will hang until a newline is passed.
To read information from proc.stdout, you need to make sure that the command called by subprocess flushes its stdout buffer after each print and that each line ends with a newline. If the stdout buffer is not flushed at appropriate times, your call to proc.stdout.readline() will hang.
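If you do control the called program's source, the fix on its side is small; a sketch, where handle() is a placeholder for the real work:

import sys

for request in sys.stdin:
    reply = handle(request)        # handle() is hypothetical
    sys.stdout.write(reply + '\n')
    sys.stdout.flush()             # without this, the parent may hang on readline()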
The following code does not work correctly on Windows (but does on Linux):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setblocking(True)
sock.connect(address)
gobject.io_add_watch(
    sock.fileno(),
    gobject.IO_OUT | gobject.IO_ERR | gobject.IO_HUP,
    callback)
Snippets of comments in various places in the glib source, and elsewhere, mention that on Windows, sockets are put in non-blocking mode during polling. As a result my callback is constantly called, and writing to the socket fails with this error message:
[Errno 10035] A non-blocking socket operation could not be completed immediately
Calling sock.setblocking(True) prior to writing does not seem to circumvent this. By lowering the priority of the polling and ignoring the error message, it works as expected, but it fires far too many events and consumes a lot of CPU. Is there a way around this limitation on Windows?
Update
I might point out that the whole point of polling for POLLOUT is that when you make the write call you won't get EAGAIN/EWOULDBLOCK. The strange error I'm getting is, I believe, the Windows equivalent of those two error codes. In other words, I'm getting gobject.IO_OUT events when the socket will not let me write successfully, and putting it into blocking mode still gives me this inappropriate error.
Another update
On Linux, where this works correctly, the socket is not switched to non-blocking mode, and I receive IO_OUT when the socket will let me write without blocking or throwing an error. It's this behaviour I want to emulate/restore under Windows.
Further notes
From man poll:

poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.

POLLOUT
Writing now will not block.

From man select:

A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
Is there a problem with doing non-blocking I/O? It seems kind of strange to use polling loops if you're using blocking I/O.
When I write programs like this I tend to do the following:
Buffer the bytes I want to send to the file descriptor.
Only ask for IO_OUT (or the poll() equivalent, POLLOUT) events when said buffer is non-empty.
When poll() (or equivalent) signals that you're ready to write, issue the write and remove the bytes you successfully wrote from the buffer. If you get EAGAIN/EWOULDBLOCK instead, just wait for the next time you're signaled. Once you've written the entire buffer, stop asking for POLLOUT so you don't spuriously wake up. (A sketch of this bookkeeping follows.)
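The sketch below uses select.poll for concreteness (Unix; on Windows, glib's io_add_watch or select.select plays the same role), with sock assumed to be a connected non-blocking socket:

import select

poller = select.poll()
out_buf = bytearray()               # bytes waiting to go out

def queue_bytes(data):
    # Only ask for POLLOUT while there is actually something to send.
    if not out_buf:
        poller.register(sock, select.POLLOUT)
    out_buf.extend(data)

def on_writable():
    # Called when poll() reports the socket as writable.
    try:
        sent = sock.send(out_buf)
    except BlockingIOError:
        return                      # spurious wakeup; wait for the next event
    del out_buf[:sent]
    if not out_buf:
        poller.unregister(sock)     # buffer drained: stop asking for POLLOUT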
(My guess is that the Win32 bindings are using WSAEventSelect and WaitForMultipleObjects() to simulate poll(), but the result is the same...)
I'm not sure how your desired approach with blocking sockets would work. You are "waking up" constantly because you asked to be woken up whenever you can write. You only want to ask for that when you have data to write. And even then, when it wakes you up, the system won't tell you how much data you can write without blocking, which is a good reason to use non-blocking I/O.
GIO contains GSocket, a "lowlevel network socket object", since 2.22. However, this has yet to be ported to pygobject on Windows.
I'm not sure if this helps (I'm not proficient with the poll function or MFC sockets, and I don't know whether polling is a requirement of your program structure), so take this with a grain of salt:
But to avoid blocking or EAGAIN on write, we use select: add the socket to the write set that is passed to select, and if select() reports the socket as writable, it will accept a write right away...
The write loop we use in our app looks like this (the original pseudocode, rendered as Python):

import select

sock.setblocking(False)
count = 0
total = len(buf)
howmany = 1
while howmany > 0 and count < total:
    # wait for the socket to become writable, with a reasonable timeout
    _, writable, _ = select.select([], [sock], [], 30.0)
    if not writable:
        raise RuntimeError('select timed out')   # die with some error
    howmany = sock.send(buf[count:])
    if howmany > 0:
        count += howmany
You could use Twisted, which includes support for GTK (even on Windows) and will handle all the various error conditions that non-blocking sockets on Windows like to raise.
I have some commands which I am running using the subprocess module. I then want to loop over the lines of the output. The documentation says not to use data_stream.stdout.read, which I am not doing directly, but I may be doing something that calls it. I am looping over the output like this:
for line in data_stream.stdout:
    # do stuff here
    ...
Can this cause deadlocks like reading from data_stream.stdout directly can, or are Popen objects set up for this kind of looping so that the buffering is handled for you?
You have to worry about deadlocks if you're communicating with your subprocess, i.e. if you're writing to stdin as well as reading from stdout. Because these pipes are buffered, this kind of two-way communication is very much a no-no:
data_stream = Popen(mycmd, stdin=PIPE, stdout=PIPE)
data_stream.stdin.write("do something\n")
for line in data_stream.stdout:
    ...  # BAD!
However, if you've not set up stdin (or stderr) when constructing data_stream, you should be fine.
data_stream = Popen(mycmd, stdout=PIPE)
for line in data_stream.stdout:
... # Fine
If you need two-way communication, use communicate.
The two answers have caught the gist of the issue pretty well: don't mix writing something to the subprocess, reading something from it, writing again, etc. -- the pipes' buffering means you risk a deadlock. If you can, write everything you need to write to the subprocess FIRST, close that pipe, and only THEN read everything the subprocess has to say; communicate is nice for the purpose, IF the amount of data is not too large to fit in memory (if it is, you can still achieve the same effect "manually").
If you need finer-grain interaction, look instead at pexpect or, if you're on Windows, wexpect.
SilentGhost's/chrispy's answers are OK if you have a small to moderate amount of output from your subprocess. Sometimes, though, there may be a lot of output - too much to comfortably buffer in memory. In that case, the thing to do is start the process and spawn a couple of threads - one to read child.stdout and one to read child.stderr, where child is the subprocess. You then need to wait() for the subprocess to terminate.
This is actually how communicate() works; the advantage of using your own threads is that you can process the output from the subprocess as it is generated. For example, in my project python-gnupg I use this technique to read status output from the GnuPG executable as it is generated, rather than waiting for all of it by calling communicate(). You are welcome to inspect the source of this project - the relevant stuff is in the module gnupg.py.
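A sketch of that pattern, with 'some-command' and the handle_* callbacks as placeholders:

import subprocess
import threading

child = subprocess.Popen(['some-command'],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)

def drain(pipe, handle_line):
    # Consume the pipe as output is generated so its buffer never fills up.
    for line in iter(pipe.readline, b''):
        handle_line(line)
    pipe.close()

t_out = threading.Thread(target=drain, args=(child.stdout, handle_stdout_line))
t_err = threading.Thread(target=drain, args=(child.stderr, handle_stderr_line))
t_out.start(); t_err.start()
t_out.join(); t_err.join()
child.wait()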
data_stream.stdout is a standard output handle; you shouldn't be looping over it. communicate() returns a tuple of (stdoutdata, stderrdata); stdoutdata is what you should be using to do your stuff.