Issue when performing non-blocking writes - Python

I wrote the following code to understand how non-blocking writes work:
import os, time

def takeAnap():
    print('I am sleeping a bit while it is writing!')
    time.sleep(50)

fd = os.open('t.txt', os.O_CREAT | os.O_NONBLOCK)
for i in range(100):
    # Non-blocking write
    fd = os.open('t.txt', os.O_APPEND | os.O_WRONLY | os.O_NONBLOCK)
    os.write(fd, str(i))
    os.close(fd)
    time.sleep(2)
takeAnap()
As you can see, I wrote takeAnap() expecting it to run while the loop is being processed, so that I could convince myself that the writing is performed without blocking! However, the loop still blocks, and the function does not run until the loop finishes. I am not sure if my understanding is wrong, but as far as I know, a non-blocking operation allows you to do other tasks while the writing is being processed. Is that correct? If so, could you kindly point out the problem in my code?
Thank you.

I think you misunderstand what the O_NONBLOCK flag is used for. Here's what the flag actually does:
This prevents open from blocking for a “long time” to open the file. This is only meaningful for some kinds of files, usually devices such as serial ports; when it is not meaningful, it is harmless and ignored.
Excerpt from https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html.
So, the flag does not specify a non-blocking write, but a non-blocking open. The writing is still serial, blocking, and slow.
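To see where the flag does matter, consider a FIFO: opening the read end of a FIFO normally blocks until a writer shows up, and O_NONBLOCK makes that open return immediately. A minimal sketch (the FIFO name demo_fifo is illustrative; create it first with mkfifo demo_fifo):

import os

# Without O_NONBLOCK this open() would block until some process
# opens demo_fifo for writing; with the flag it returns at once.
fd = os.open('demo_fifo', os.O_RDONLY | os.O_NONBLOCK)
print('open returned immediately, fd =', fd)
os.close(fd)

# For a regular file like t.txt the flag is harmless and ignored:
# this write behaves exactly as it would without O_NONBLOCK.
fd = os.open('t.txt', os.O_CREAT | os.O_WRONLY | os.O_NONBLOCK)
os.write(fd, b'still an ordinary, possibly blocking, write')
os.close(fd)

If you want takeAnap() to run while the writes happen, the usual approach in Python is a separate thread (or asyncio), not O_NONBLOCK on a regular file.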

Related

Poor performance on large writes on O_NONBLOCK fifo in mac os

I have a reader and writer on a FIFO, where the reader must not block indefinitely. To do this, I open the read end with O_NONBLOCK.
The write end can block, so I open it as a regular file. Large writes perform unacceptably badly: reading/writing a 4 MB block takes minutes instead of the expected fraction of a second (expected, because on Linux the same code takes a fraction of a second).
Example code in Python replicating the issue. First, create a fifo using mkfifo, e.g. mkfifo some_fifo, then run the reading end, then the writing end.
Reading End:
import os, time

# mkfifo some_fifo before starting python
fd = os.open('some_fifo', os.O_RDONLY | os.O_NONBLOCK)
while True:
    try:
        read = len(os.read(fd, 8192))  # read up to 8KB (FIFO buffer size in mac os)
        print(read)
        should_block = read < 8192  # linux
    except BlockingIOError:
        should_block = True  # mac os
    if should_block:
        print('blocking')
        time.sleep(0.5)
Write End:
import os

fd = os.open('some_fifo', os.O_WRONLY)
os.write(fd, b'aaaa' * 1024 * 1024)  # 4MB write
Note: the original code where I hit this issue is cross-platform Java code that also runs on Linux. Unfortunately, this means I can't use kqueue with a kevent's data field to figure out how much I can read without blocking - this information is lost in the abstraction over epoll/kqueue that I use. This means a solution using a blocking fd à la this answer is unacceptable.
Edit: the original code used kqueue to block on the file descriptor in the read end, which performed worse.
Edit 2: on Linux, os.read() doesn't throw a BlockingIOError before the other side of the pipe is connected, despite the docs stating that it should (the call succeeds, returning 0, but sets errno to EAGAIN). Updated the code to be friendly to the Linux behavior too.
Edit 3: The code for macOS was originally:
import select, os

# mkfifo some_fifo before starting python
fd = os.open('some_fifo', os.O_RDONLY | os.O_NONBLOCK)
kq = select.kqueue()
ke = select.kevent(fd)
while True:
    try:
        read = len(os.read(fd, 8192))  # read up to 8KB (FIFO buffer size in mac os)
    except BlockingIOError:
        evts = kq.control([ke], 1, 10)  # wait for 1 event, 10-second timeout
        print(evts)
This performs as poorly as the version with sleeps, but sleeping makes sure the issue isn't with the blocking mechanism, and is cross-platform.

O_NONBLOCK does not raise exception in Python

I am trying to write a "cleaner" program to release a potential writer which is blocked on a named pipe (because no reader is reading from the pipe). However, the cleaner itself should not block when no writer is blocked writing to the pipe. In other words, the "cleaner" must return/terminate immediately, whether there is a blocked writer or not.
Therefore I searched for "Python non-blocking read from named pipe", and got these:
How to read named FIFO non-blockingly?
fifo - reading in a loop
What conditions result in an opened, nonblocking named pipe (fifo) being "unavailable" for reads?
Why does a read-only open of a named pipe block?
It seems they suggest that simply using os.open(file_name, os.O_RDONLY | os.O_NONBLOCK) should be fine, but that didn't really work on my machine. I may have messed up somewhere or misunderstood some of their suggestions/situations, but I really couldn't figure out what's wrong myself.
I found the Linux man page (http://man7.org/linux/man-pages/man2/open.2.html), and the explanation of O_NONBLOCK seems consistent with their suggestions, but not with what I observe on my machine...
Just in case it is related, my OS is Ubuntu 14.04 LTS 64-bit.
Here is my code:
import os
import errno

BUFFER_SIZE = 65536

ph = None
try:
    ph = os.open("pipe.fifo", os.O_RDONLY | os.O_NONBLOCK)
    os.read(ph, BUFFER_SIZE)
except OSError as err:
    if err.errno == errno.EAGAIN or err.errno == errno.EWOULDBLOCK:
        raise err
    else:
        raise err
finally:
    if ph:
        os.close(ph)
Originally there was only the second raise, but I found that os.open and os.read, though not blocking, don't raise any exception either... I don't really know how much the writer will write to the pipe! If the non-blocking read does not raise an exception, how should I know when to stop reading?
Updated on 8/8/2016:
This seems to be a workaround/solution that satisfies my need:
import os
import errno

BUFFER_SIZE = 65536

ph = None
try:
    ph = os.open("pipe.fifo", os.O_RDONLY | os.O_NONBLOCK)
    while True:
        buffer = os.read(ph, BUFFER_SIZE)
        if len(buffer) < BUFFER_SIZE:
            break
except OSError as err:
    if err.errno == errno.EAGAIN or err.errno == errno.EWOULDBLOCK:
        pass  # It is supposed to raise one of these exceptions
    else:
        raise err
finally:
    if ph:
        os.close(ph)
It will loop on read. Every time it reads something, it compares the size of the content read with the specified BUFFER_SIZE, until it reaches EOF (writer will then unblock and continue/exit).
I still want to know why no exception is raised in that read.
Updated on 8/10/2016:
To make it clear, my overall goal is as follows.
My main program (Python) has a thread serving as the reader. It normally blocks on the named pipe, waiting for "commands". There is a writer program (a shell script) which writes a one-liner "command" to the same pipe on each run.
In some cases, a writer starts before my main program starts, or after my main program terminates. In that case, the writer blocks on the pipe waiting for a reader. If my main program then starts, it will immediately read that "command" from the blocked writer - this is NOT what I want. I want my program to disregard writers that started before it.
Therefore, my solution is, during initialization of my reader thread, to do a non-blocking read to release such writers, without actually executing the "command" they were trying to write to the pipe.
This solution is incorrect.
while True:
    buffer = os.read(ph, BUFFER_SIZE)
    if len(buffer) < BUFFER_SIZE:
        break
This will not actually read everything; it will only read until it gets a partial read. Remember: you are only guaranteed to fill the buffer with regular files; in all other cases it is possible to get a partial buffer before EOF. The correct way to do this is to loop until the actual end of file is reached, which shows up as a read of length 0. End of file indicates that there are no writers (they have all exited or closed the fifo).
while True:
    buffer = os.read(ph, BUFFER_SIZE)
    if not buffer:
        break
However, this will not work correctly in the face of non-blocking IO. It turns out non-blocking IO is completely unnecessary here.
import os
import fcntl

h = os.open("pipe.fifo", os.O_RDONLY | os.O_NONBLOCK)
# Now that we have successfully opened it without blocking,
# we no longer want the handle to be non-blocking
flags = fcntl.fcntl(h, fcntl.F_GETFL)
flags &= ~os.O_NONBLOCK
fcntl.fcntl(h, fcntl.F_SETFL, flags)
try:
    while True:
        # Only blocks if there is a writer
        buf = os.read(h, 65536)
        if not buf:
            # This happens when there are no writers
            break
finally:
    os.close(h)
The only scenario which will cause this code to block is if there is an active writer which has opened the fifo but is not writing to it. From what you've described, it doesn't sound like this is the case.
Non-blocking IO doesn't do that
Your program wants to do two things, depending on circumstance:
If there are no writers, return immediately.
If there are writers, read data from the FIFO until the writers are done.
Non-blocking read() has no effect whatsoever in situation #1. Whether you use O_NONBLOCK or not, read() will return immediately in situation #1. So the only difference is in situation #2.
In situation #2, your program's goal is to read the entire block of data from the writers. That is exactly how blocking IO works: it waits for the writers to finish, and then read() returns. The whole point of non-blocking IO is to return early if the operation can't complete immediately, which is the opposite of your program's goal of waiting until the operation is complete.
If you use non-blocking read(), in situation #2, your program will sometimes return early, before the writers have finished their jobs. Or maybe your program will return after reading half of a command from the FIFO, leaving the other (now corrupted) half there. This concern is expressed in your question:
If the non blocking read does not raise exception, how should I know when to stop reading?
You know when to stop reading because read() returns zero bytes when all writers have closed the pipe. (Conveniently, this is also what happens if there were no writers in the first place.) This is unfortunately not what happens if the writers do not close their end of the pipe when they are done. It is far simpler and more straightforward if the writers close the pipe when done, so this is the recommended solution, even if you need to modify the writers a little bit. If the writers cannot close the pipe for whatever reason, the solution is more complicated.
The main use case for non-blocking read() is if your program has some other task to complete while IO goes on in the background.
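For example, combining the non-blocking fd with select lets the program poll for data while doing other work in between. A minimal sketch under that assumption (the "other work" placeholder is illustrative):

import os, select

fd = os.open("pipe.fifo", os.O_RDONLY | os.O_NONBLOCK)
try:
    while True:
        # Wait up to 0.1s for the fd to become readable.
        ready, _, _ = select.select([fd], [], [], 0.1)
        if ready:
            chunk = os.read(fd, 65536)
            if not chunk:
                break  # all writers closed their end: EOF
            print("read %d bytes" % len(chunk))
        else:
            pass  # no data yet; do other useful work here
finally:
    os.close(fd)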
In POSIX C programs, if read() attempts to read from an empty pipe or a FIFO special file, it has one of the following results:
If no process has the pipe open for writing, read() returns 0 to indicate the end of the file.
If some process has the pipe open for writing and O_NONBLOCK is set to 1, read() returns -1 and sets errno to EAGAIN.
If some process has the pipe open for writing and O_NONBLOCK is set to 0, read() blocks (that is, does not return) until some data is written, or the pipe is closed by all other processes that have the pipe open for writing.
So, first check whether any writer still has the fifo open for writing. If none does, the read will get an empty string and no exception; otherwise, an exception will be raised.
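A small sketch of how the first two cases surface in Python 3, where BlockingIOError is the OSError subclass that carries EAGAIN (the third case cannot occur here because the fd is non-blocking):

import os

fd = os.open("pipe.fifo", os.O_RDONLY | os.O_NONBLOCK)
try:
    data = os.read(fd, 65536)
    if data == b"":
        # Case 1: no process has the pipe open for writing -> EOF.
        print("no writers: empty read, no exception")
    else:
        print("a writer had already written %d bytes" % len(data))
except BlockingIOError:
    # Case 2: some process has the pipe open for writing,
    # but nothing has been written yet.
    print("writer present but pipe empty: EAGAIN")
finally:
    os.close(fd)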

Keeping a pipe to a process open

I have an app that reads in stuff from stdin and returns, after a newline, results to stdout.
A simple (stupid) example:
$ app
Expand[(x+1)^2]<CR>
x^2 + 2*x + 1
100 - 4<CR>
96
Opening and closing the app requires a lot of initialization and clean-up (it's an interface to a Computer Algebra System), so I want to keep this to a minimum.
I want to open a pipe in Python to this process, write strings to its stdin and read out the results from stdout. Popen.communicate() doesn't work for this, as it closes the file handles, requiring the pipe to be reopened.
I've tried something along the lines of this related question:
Communicate multiple times with a process without breaking the pipe?, but I'm not sure how to wait for the output. It is also difficult to know a priori how long the app will take to process the input at hand, so I don't want to make any assumptions. I guess most of my confusion comes from this question: Non-blocking read on a subprocess.PIPE in python, where it is stated that mixing high- and low-level functions is not a good idea.
EDIT:
Sorry that I didn't give any code before; I got interrupted. This is what I've tried so far and it seems to work, I'm just worried that something could go wrong unnoticed:
from subprocess import Popen, PIPE

pipe = Popen(["MathPipe"], stdin=PIPE, stdout=PIPE)
expressions = ["Expand[(x+1)^2]", "Integrate[Sin[x], {x,0,2*Pi}]"]  # ...
for expr in expressions:
    pipe.stdin.write(expr)
    while True:
        line = pipe.stdout.readline()
        if line != '':
            print line
        # output of MathPipe is always terminated by ';'
        if ";" in line:
            break
Potential problems with this?
Using subprocess, you can't do this reliably. You might want to look at using the pexpect library instead. That won't work on Windows; if you're on Windows, try winpexpect.
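A minimal sketch of the pexpect approach, reusing the MathPipe name and the ';' terminator from the question (real prompts and patterns will vary):

import pexpect

child = pexpect.spawn("MathPipe")
for expr in ["Expand[(x+1)^2]", "Integrate[Sin[x], {x,0,2*Pi}]"]:
    child.sendline(expr)
    child.expect(";")       # block until the terminator appears
    print(child.before)     # everything the app printed before the ';'
child.close()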
Also, if you're trying to do mathematical stuff in Python, check out SAGE. They do a lot of work on interfacing with other open-source maths software, so there's a chance they've already done what you're trying to do.
Perhaps you could pass stdin=subprocess.PIPE as an argument to subprocess.Popen. This will make the process's stdin available as a general file-like object:
import sys, subprocess

proc = subprocess.Popen(["mathematica <args>"], stdin=subprocess.PIPE,
                        stdout=sys.stdout, shell=True)
proc.stdin.write("Expand[ (x-1)^2 ]")  # Write whatever to the process
proc.stdin.flush()                     # Ensure nothing is left in the buffer
proc.terminate()                       # Kill the process
This directs the subprocess's output directly to your Python process's stdout. If you need to read the output and do some editing first, that is possible as well; check out http://docs.python.org/library/subprocess.html#popen-objects.
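If you do want to capture and post-process the output instead of passing it straight through, a sketch along the same lines (still assuming the app terminates each answer with ';' as in the question):

import subprocess

proc = subprocess.Popen(["MathPipe"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
proc.stdin.write("Expand[ (x-1)^2 ]\n")
proc.stdin.flush()
result = []
while True:
    line = proc.stdout.readline()
    result.append(line)
    if ";" in line:  # answer is complete
        break
print("".join(result))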

gobject io monitoring + nonblocking reads

I've got a problem with using the io_add_watch monitor in python (via gobject). I want to do a nonblocking read of the whole buffer after every notification. Here's the code (shortened a bit):
import os, fcntl, gobject

class SomeApp(object):
    def __init__(self):
        # some other init that does a lot of stderr debug writes
        fl = fcntl.fcntl(0, fcntl.F_GETFL, 0)
        fcntl.fcntl(0, fcntl.F_SETFL, fl | os.O_NONBLOCK)
        print "hooked", gobject.io_add_watch(0, gobject.IO_IN | gobject.IO_PRI, self.got_message, [""])
        self.app = gobject.MainLoop()

    def run(self):
        print "ready"
        self.app.run()

    def got_message(self, fd, condition, data):
        print "reading now"
        data[0] += os.read(0, 1024)
        print "got something", fd, condition, data
        return True

gobject.threads_init()
SomeApp().run()
Here's the trick - when I run the program without debug output activated, I don't get the got_message calls. When I write a lot of stuff to stderr first, the problem disappears. If I don't write anything apart from the prints visible in this code, I don't get the stdin message signals. Another interesting thing is that when I try to run the same app with stderr debug enabled, but via strace (to check if there are any fcntl/ioctl calls I missed), the problem appears again.
So in short: if I write a lot to stderr first without strace, io_watch works. If I write a lot with strace, or don't write at all, io_watch doesn't work.
The "some other init" part takes some time, so if I type some text before I see the "hooked 2" output and then press ctrl+c after "ready", the got_message callback is called, but the read call throws EAGAIN, so the buffer seems to be empty.
Strace log related to the stdin:
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
fcntl(0, F_GETFL) = 0xa002 (flags O_RDWR|O_ASYNC|O_LARGEFILE)
fcntl(0, F_SETFL, O_RDWR|O_NONBLOCK|O_ASYNC|O_LARGEFILE) = 0
fcntl(0, F_GETFL) = 0xa802 (flags O_RDWR|O_NONBLOCK|O_ASYNC|O_LARGEFILE)
Does anyone have some ideas on what's going on here?
EDIT: Another clue. I tried to refactor the app to do the reading in a different thread and pass it back via a pipe. It "kind of" works:
...
rpipe, wpipe = os.pipe()
stopped = threading.Event()
self.stdreader = threading.Thread(name="reader", target=self.std_read_loop, args=(wpipe, stopped))
self.stdreader.start()
new_data = ""
print "hooked", gobject.io_add_watch(rpipe, gobject.IO_IN | gobject.IO_PRI, self.got_message, [new_data])

def std_read_loop(self, wpipe, stop_event):
    while True:
        try:
            new_data = os.read(0, 1024)
            while len(new_data) > 0:
                l = os.write(wpipe, new_data)
                new_data = new_data[l:]
        except OSError, e:
            if stop_event.isSet():
                break
            time.sleep(0.1)
...
It's surprising that if I just push the same text through a new pipe, everything starts to work. The problems are:
the first line is not "noticed" at all - I get only the second and following lines
it's fugly
Maybe that will give someone else a clue as to why this is happening?
This sounds like a race condition in which there is some delay in setting up your callback, or else there is a change in the environment which affects whether or not you can set the callback.
I would look carefully at what happens before you call io_add_watch(). For instance the Python fcntl docs say:
All functions in this module take a file descriptor fd as their first argument. This can be an integer file descriptor, such as returned by sys.stdin.fileno(), or a file object, such as sys.stdin itself, which provides a fileno() which returns a genuine file descriptor.
Clearly that is not what you are doing when you assume that STDIN will have FD == 0. I would change that first and try again.
The other thing is that if the FD is already blocked, then your process could be waiting while other non-blocked processes are running, therefore there is a timing difference depending on what you do first. What happens if you refactor the fcntl stuff so that it is done soon after the program starts, even before importing the GTK modules?
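A sketch combining both suggestions: look the descriptor up via sys.stdin.fileno() instead of hard-coding 0, and set the flags as early as possible, before the GTK/gobject modules are even imported:

import sys, os, fcntl

# Configure stdin first, before anything else runs that might
# touch the descriptor's flags.
fd = sys.stdin.fileno()
fl = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

import gobject  # imported only after stdin is configured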
I'm not sure that I understand why a program using the GTK GUI would want to read from standard input in the first place. If you are actually trying to capture the output of another process, you should use the subprocess module to set up a pipe, then call io_add_watch() on the pipe, like so:
proc = subprocess.Popen(command, stdout=subprocess.PIPE)
gobject.io_add_watch(proc.stdout, glib.IO_IN, self.write_to_buffer)
Again, in this example we make sure that we have a valid opened FD before calling io_add_watch().
Normally, when gobject.io_add_watch() is used, it is called just before gobject.MainLoop(). For example, here is some working code using io_add_watch to catch IO_IN.
The documentation says you should return TRUE from the callback or it will be removed from the list of event sources.
What happens if you hook the callback first, prior to any stderr output? Does it still get called when you have debug output enabled?
Also, I suppose you should probably be repeatedly calling os.read() in your handler until it gives no data, in case more than 1024 bytes become ready between calls.
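That drain loop could look like this, reading until the non-blocking descriptor reports it would block (written with Python 2 except syntax to match the code above; errno.EAGAIN is the "no data right now" case):

import errno, os

def drain(fd):
    chunks = []
    while True:
        try:
            chunk = os.read(fd, 1024)
        except OSError, e:
            if e.errno == errno.EAGAIN:
                break  # nothing more to read right now
            raise
        if not chunk:
            break  # EOF
        chunks.append(chunk)
    return "".join(chunks)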
Have you tried using the select module in a background thread to emulate gio functionality? Does that work? What platform is this and what kind of FD are you dealing with? (file? socket? pipe?)

How do I run a sub-process, display its output in a GUI and allow it to be terminated?

I have been trying to write an application that runs subprocesses and (among other things) displays their output in a GUI and allows the user to click a button to cancel them. I start the processes like this:
import Queue, subprocess, threading

queue = Queue.Queue(500)
process = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(process.stdout, queue))
iothread.daemon = True
iothread.start()
where simple_io_thread is defined as follows:
def simple_io_thread(pipe, queue):
    while True:
        line = pipe.readline()
        queue.put(line, block=True)
        if line == "":
            break
This works well enough. In my UI I periodically do non-blocking "get"s from the queue. However, my problems come when I want to terminate the subprocess. (The subprocess is an arbitrary process, not something I wrote myself.) I can use the terminate method to terminate the process, but I do not know how to guarantee that my I/O thread will terminate. It will normally be doing blocking I/O on the pipe. This may or may not end some time after I terminate the process. (If the subprocess has spawned another subprocess, I can kill the first subprocess, but the second one will still keep the pipe open. I'm not even sure how to get such grand-children to terminate cleanly.) After that the I/O thread will try to enqueue the output, but I don't want to commit to reading from the queue indefinitely.
Ideally I would like some way to request termination of the subprocess, block for a short (<0.5s) amount of time and after that be guaranteed that the I/O thread has exited (or will exit in a timely fashion without interfering with anything else) and that I can stop reading from the queue.
It's not critical to me that a solution uses an I/O thread. If there's another way to do this that works on Windows and Linux with Python 2.6 and a Tkinter GUI that would be fine.
EDIT - Will's answer, and other things I've seen on the web about doing this in other languages, suggest that the operating system expects you to just close the file handle on the main thread, after which the I/O thread should come out of its blocking read. However, as I described in the comment, that doesn't seem to work for me. If I do this on the main thread:
process.stdout.close()
I get:
IOError: close() called during concurrent operation on the same file object.
...on the main thread. If I do this on the main thread:
os.close(process.stdout.fileno())
I get:
close failed in file object destructor: IOError: [Errno 9] Bad file descriptor
...later on in the main thread when it tries to close the file handle itself.
I know this is an old post, but in case it still helps anyone, I think your problem could be solved by passing the subprocess.Popen instance to io_thread, rather than its output stream.
If you do that, then you can replace your while True: line with while process.poll() == None:.
process.poll() checks the subprocess's return code; if the process hasn't finished, there isn't one (i.e. process.poll() == None). You can then do away with if line == "": break.
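A minimal sketch of that change (the final drain loop is my addition: output can still be buffered in the pipe when poll() first reports an exit code):

def simple_io_thread(process, queue):
    # Read while the process is still running ...
    while process.poll() is None:
        line = process.stdout.readline()
        queue.put(line, block=True)
    # ... then drain whatever is still buffered in the pipe.
    for line in process.stdout:
        queue.put(line, block=True)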
The reason I'm here is that I wrote a very similar script today, and I got those
IOError: close() called during concurrent operation on the same file object.
errors. Again, in case it helps: I think my problems stemmed from (my) io_thread doing some overly efficient garbage collection and closing a file handle I gave it (I'm probably wrong, but it works now..). Mine's different, though, in that it's not daemonic, and it iterates through subprocess.stdout rather than using a while loop, i.e.:
def io_thread(subprocess, logfile, lock):
    for line in subprocess.stdout:
        lock.acquire()
        print line,
        lock.release()
        logfile.write(line)
I should also probably mention that I pass the bufsize argument to subprocess.Popen, so that it's line-buffered.
This is probably old enough, but still useful to someone coming from a search engine...
The reason it shows that message is that the subprocess closes its file descriptors after it has completed, so the daemon thread (which is running concurrently) tries to use those closed descriptors and raises the error.
Joining the thread before calling the subprocess's wait() or communicate() methods should be more than enough to suppress the error.
my_thread.join()
print my_thread.is_alive()
my_popen.communicate()
In the code that terminates the process, you could also explicitly os.close() the pipe that your thread is reading from?
You should close the write end of the pipe instead... but as you wrote the code, you cannot access it. To do that you should:
create a pipe
pass the write pipe's file descriptor to Popen's stdout
use the read end of the pipe in simple_io_thread to read lines.
Now you can close the write end and the read thread will close gracefully.
queue = Queue.Queue(500)
r, w = os.pipe()
process = subprocess.Popen(
    command,
    stdout=w,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(os.fdopen(r), queue))
iothread.daemon = True
iothread.start()
Now, by calling
os.close(w)
you can close the pipe, and iothread will shut down without any exception.
