When do I call flush() when dup'ing file descriptors? - python

I have a wrapper to redirect output when I call Python-wrapped C++.
The basic idea is to use dup and dup2, which are the only ways I've been able to catch the printf output from the C++ code. The wrapper works fine with no calls to flush() as long as I'm running the job interactively, but when I submit the job to a TORQUE batch queue I get the unwelcome output again.
My understanding, in part from this question, is that some well-placed flush() calls should fix this, but where exactly do they need to go? Should I flush the buffer before dup'ing to the tempfile? Before dup'ing back? Both?
The wrapper I'm using is as follows:
import os
import sys
import tempfile
class Filter(object):
    """
    Workaround filter for annoying and worthless errors.
    """
    def __init__(self, veto_words={'ClassTable'}):
        self.veto_words = set(veto_words)
        self.temp = tempfile.NamedTemporaryFile()
    def __enter__(self):
        sys.stdout.flush() # <--- NEEDED?
        sys.stderr.flush() # <--- NEEDED?
        self.old_out, self.old_err = os.dup(1), os.dup(2)
        os.dup2(self.temp.fileno(), 1)
        os.dup2(self.temp.fileno(), 2)
    def __exit__(self, exe_type, exe_val, tb):
        sys.stdout.flush() # <--- NEEDED?
        sys.stderr.flush() # <--- NEEDED?
        os.dup2(self.old_out, 1)
        os.dup2(self.old_err, 2)
        self.temp.seek(0)
        for line in self.temp:
            veto = set(line.split()) & self.veto_words
            if not veto:
                sys.stderr.write(line)

Python applies line buffering when connected to a TTY; otherwise it uses a larger block buffer.
Redirecting your Python program to a pipe means there is no TTY connected to the stream, so output is block-buffered and you'll have to call .flush() explicitly, even after writing newlines.
You can run Python with -u to turn off buffering of stdout.
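For the wrapper in the question, one plausible arrangement is to flush on both sides of each descriptor swap: before redirecting, so that text already sitting in Python's buffers reaches the original destination, and before restoring, so that text written inside the block lands in the temp file. The sketch below is a simplified stand-in for the Filter class, written for Python 3; the name capture_fd is made up for illustration. It only flushes Python's own buffers. The C library keeps a separate stdio buffer for printf, which would have to be flushed on the C side.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd(fd=1):
    """Temporarily point file descriptor `fd` at a temp file.

    Yields a list; after the block exits, the list holds one bytes
    object with everything written to `fd` inside the block.
    """
    captured = []
    # Flush BEFORE redirecting: pending Python-level output belongs
    # to the original destination, not to the temp file.
    sys.stdout.flush()
    sys.stderr.flush()
    saved = os.dup(fd)
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), fd)
        try:
            yield captured
        finally:
            # Flush BEFORE restoring: output buffered during the block
            # should end up in the temp file, not leak out afterwards.
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(saved, fd)
            os.close(saved)
            tmp.seek(0)
            captured.append(tmp.read())
```

Used as "with capture_fd(1) as captured: ...", the captured bytes can then be filtered line by line, as the original __exit__ does.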

Related

print() with end='' in Python 3.10 not working as in Python 3.9 [duplicate]

Is output buffering enabled by default in Python's interpreter for sys.stdout?
If the answer is positive, what are all the ways to disable it?
Suggestions so far:
Use the -u command line switch
Wrap sys.stdout in an object that flushes after every write
Set PYTHONUNBUFFERED env var
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Is there any other way to set some global flag in sys/sys.stdout programmatically during execution?
If you just want to flush after a specific write using print, see How can I flush the output of the print function?.
From Magnus Lycka's answer on a mailing list:
You can skip buffering for a whole python process using python -u or by setting the environment variable PYTHONUNBUFFERED.
You could also replace sys.stdout with some other stream like wrapper which does a flush after every call.
import sys
class Unbuffered(object):
    def __init__(self, stream):
        self.stream = stream
    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
    def writelines(self, datas):
        self.stream.writelines(datas)
        self.stream.flush()
    def __getattr__(self, attr):
        # Delegate everything else to the wrapped stream.
        return getattr(self.stream, attr)
sys.stdout = Unbuffered(sys.stdout)
print('Hello')
I would rather put my answer in How to flush output of print function? or in Python's print function that flushes the buffer when it's called?, but since they were marked as duplicates of this one (which I do not agree with), I'll answer it here.
Since Python 3.3, print() supports the keyword argument "flush" (see documentation):
print('Hello World!', flush=True)
# reopen stdout file descriptor with write mode
# and 0 as the buffer size (unbuffered)
import io, os, sys
try:
    # Python 3, open as binary, then wrap in a TextIOWrapper with write-through.
    sys.stdout = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0), write_through=True)
    # If flushing on newlines is sufficient, as of 3.7 you can instead just call:
    # sys.stdout.reconfigure(line_buffering=True)
except TypeError:
    # Python 2
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Credits: "Sebastian", somewhere on the Python mailing list.
Yes, it is.
You can disable it on the command line with the "-u" switch.
Alternatively, you could call .flush() on sys.stdout after every write (or wrap it with an object that does this automatically).
This relates to Cristóvão D. Sousa's answer, but I couldn't comment yet.
A straightforward way of using the Python 3 flush keyword argument to always get unbuffered output is:
import functools
print = functools.partial(print, flush=True)
Afterwards, print will always flush the output directly (unless flush=False is given).
Note (a) that this answers the question only partially, as it doesn't cover all output. But I guess print is the most common way of creating output to stdout/stderr in Python, so these two lines probably cover most use cases.
Note (b) that it only works in the module/script where you define it. This can be good when writing a module, as it doesn't mess with sys.stdout.
Python 2 doesn't provide the flush argument, but you could emulate a Python 3-type print function as described here https://stackoverflow.com/a/27991478/3734258 .
# Python 2 only: Python 3 does not allow unbuffered text-mode streams.
import gc
import os
import subprocess
import sys
def disable_stdout_buffering():
    # Appending to gc.garbage is a way to stop an object from being
    # destroyed. If the old sys.stdout is ever collected, it will
    # close() stdout, which is not good.
    gc.garbage.append(sys.stdout)
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
# Then this will give output in the correct order:
disable_stdout_buffering()
print "hello"
subprocess.call(["echo", "bye"])
Without saving the old sys.stdout, disable_stdout_buffering() isn't idempotent, and multiple calls will result in an error like this:
Traceback (most recent call last):
  File "test/buffering.py", line 17, in <module>
    print "hello"
IOError: [Errno 9] Bad file descriptor
close failed: [Errno 9] Bad file descriptor
Another possibility is:
def disable_stdout_buffering():
    fileno = sys.stdout.fileno()
    temp_fd = os.dup(fileno)
    sys.stdout.close()
    os.dup2(temp_fd, fileno)
    os.close(temp_fd)
    sys.stdout = os.fdopen(fileno, "w", 0)
(Appending to gc.garbage is not such a good idea because it's where unfreeable cycles get put, and you might want to check for those.)
The following works in Python 2.6, 2.7, and 3.2:
import os
import sys
buf_arg = 0
if sys.version_info[0] == 3:
    os.environ['PYTHONUNBUFFERED'] = '1'
    buf_arg = 1
sys.stdout = os.fdopen(sys.stdout.fileno(), 'a+', buf_arg)
sys.stderr = os.fdopen(sys.stderr.fileno(), 'a+', buf_arg)
Yes, it is enabled by default. You can disable it by using the -u option on the command line when calling python.
In Python 3, you can monkey-patch the print function to always send flush=True:
_orig_print = print
def print(*args, **kwargs):
    _orig_print(*args, flush=True, **kwargs)
As pointed out in a comment, you can simplify this by binding the flush parameter to a value, via functools.partial:
print = functools.partial(print, flush=True)
You can also run Python with the stdbuf utility:
stdbuf -oL python <script>
You can create an unbuffered file and assign this file to sys.stdout.
import sys
myFile = open("a.log", "w", 0)
sys.stdout = myFile
You can't magically change the system-supplied stdout, since it's supplied to your Python program by the OS.
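You can also use fcntl to change the file status flags on the fly (here fd is an open file object):
import fcntl, os
fl = fcntl.fcntl(fd.fileno(), fcntl.F_GETFL)
fl |= os.O_SYNC # or os.O_DSYNC (if you don't care about file timestamp updates)
fcntl.fcntl(fd.fileno(), fcntl.F_SETFL, fl)
Note that O_SYNC controls when the kernel commits data to disk, not Python's own user-space buffer, and that on Linux F_SETFL silently ignores attempts to change O_SYNC or O_DSYNC.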
One way to get unbuffered output would be to use sys.stderr instead of sys.stdout or to simply call sys.stdout.flush() to explicitly force a write to occur.
You could easily redirect everything printed by doing:
import sys; sys.stdout = sys.stderr
print "Hello World!"
Or to redirect just for a particular print statement:
print >>sys.stderr, "Hello World!"
To reset stdout you can just do:
sys.stdout = sys.__stdout__
It is possible to override only the write method of sys.stdout with one that calls flush. A suggested implementation is below.
def write_flush(args, w=stdout.write):
    w(args)
    stdout.flush()
The default value of the w argument keeps a reference to the original write method. After write_flush is defined, the original write can be overridden.
stdout.write = write_flush
The code assumes that stdout is imported with from sys import stdout.
A variant that works without crashing (at least on win32; Python 2.7, IPython 0.12) when called repeatedly:
def DisOutBuffering():
    if sys.stdout.name == '<stdout>':
        sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
    if sys.stderr.name == '<stderr>':
        sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 0)
(I've posted a comment, but it got lost somehow. So, again:)
As I noticed, CPython (at least on Linux) behaves differently depending on where the output goes. If it goes to a tty, the output is flushed after each '\n'.
If it goes to a pipe/process, it is buffered, and you can use the flush()-based solutions or the -u option recommended above.
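The tty check described above can also be made from inside the program. The sketch below assumes Python 3.7 or later, where text streams have a reconfigure() method; the helper name ensure_line_buffered is made up for illustration. It switches a stream to line buffering only when it is not attached to a terminal.

```python
import sys


def ensure_line_buffered(stream):
    """Turn on line buffering for a non-tty text stream.

    Returns True if the stream was reconfigured, False otherwise
    (it is a tty already, or it has no reconfigure() method).
    """
    if stream.isatty():
        return False  # ttys are line-buffered by default
    if not hasattr(stream, "reconfigure"):
        return False  # older Python or a custom stream object
    stream.reconfigure(line_buffering=True)
    return True


# Typical use near the top of a script:
# ensure_line_buffered(sys.stdout)
```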
Slightly related to output buffering:
If you iterate over the lines in the input with
for line in sys.stdin:
    ...
then the for implementation in CPython will collect the input for a while and then execute the loop body for a bunch of input lines. If your script writes output for each input line, this might look like output buffering, but it's actually input batching, so none of the flush() techniques will help.
Interestingly, you don't have this behaviour in pypy.
To avoid this, you can use
while True:
    line = sys.stdin.readline()
    ...
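A compact way to write the readline() loop above is the two-argument form of iter(), which keeps calling readline() until it returns the empty string at end of input. This is a sketch with a hypothetical helper name (process_lines); uppercasing stands in for whatever per-line work the script does.

```python
import sys


def process_lines(instream, outstream):
    """Handle one input line at a time, flushing output after each line."""
    count = 0
    # iter(callable, sentinel) stops when readline() returns "" (EOF).
    for line in iter(instream.readline, ""):
        outstream.write(line.upper())
        outstream.flush()
        count += 1
    return count


# Typical use in a filter script:
# process_lines(sys.stdin, sys.stdout)
```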

streaming python commands without using flag -u [duplicate]

print() without newline does not work when reading from stdin [duplicate]

SGE script: print to file during execution (not just at the end)?

I have an SGE script to execute some Python code, submitted to the queue using qsub. In the Python script, I have a few print statements (updating me on the progress of the program). When I run the Python script from the command line, the print statements are sent to stdout. For the SGE script, I use the -o option to redirect the output to a file. However, it seems that the script only sends these to the file after the Python script has finished running. This is annoying because (a) I can no longer see real-time updates on the program and (b) if my job does not terminate correctly (for example, if my job gets kicked off the queue) none of the updates are printed. How can I make sure that the script writes to the file each time I want to print something, as opposed to lumping it all together at the end?
I think you are running into an issue with buffered output. Python uses a library to handle its output, and the library knows that it's more efficient to write a block at a time when it's not talking to a tty.
There are a couple of ways to work around this. You can run python with the "-u" option (see the python man page for details), for example, with something like this as the first line of your script:
#! /usr/bin/python -u
but this doesn't work if you are using the "/usr/bin/env" trick because you don't know where python is installed.
Another way is to reopen the stdout with something like this:
import sys
import os
# reopen stdout file descriptor with write mode
# and 0 as the buffer size (unbuffered)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Note the bufsize parameter of os.fdopen being set to 0 to force it to be unbuffered. You can do something similar with sys.stderr.
As others mentioned, stdout is not written out immediately when it is not connected to a tty, for performance reasons.
If you have a specific point at which you want the stdout to be written, you can force that by using
import sys
sys.stdout.flush()
at that point.
I just encountered a similar issue with SGE, and no suggested method to "unbuffer" the file IO seemed to work for me. I had to wait until the end of program execution to see any output.
The workaround I found was to wrap sys.stdout in a custom object that re-implements the "write" method. Instead of actually writing to stdout, this new method opens the file where IO is redirected, appends the desired data, and then closes the file. It's a bit ugly, but I found it solved the problem, since actually opening and closing the file forces the IO to be interactive.
Here's a minimal example:
import os, sys, time
class RedirIOStream:
    def __init__(self, stream, REDIRPATH):
        self.stream = stream
        self.path = REDIRPATH
    def write(self, data):
        # instead of actually writing, just append to file directly!
        myfile = open(self.path, 'a')
        myfile.write(data)
        myfile.close()
    def __getattr__(self, attr):
        return getattr(self.stream, attr)
if not sys.stdout.isatty():
    # Detect redirected stdout and stderr file locations!
    # Warning: this will only work on Linux machines
    STDOUTPATH = os.readlink('/proc/%d/fd/1' % os.getpid())
    STDERRPATH = os.readlink('/proc/%d/fd/2' % os.getpid())
    sys.stdout = RedirIOStream(sys.stdout, STDOUTPATH)
    sys.stderr = RedirIOStream(sys.stderr, STDERRPATH)
# Simple program to print msg every 3 seconds
def main():
    n_msg = 10  # number of messages (the original referenced an undefined args.nMsg)
    tstart = time.time()
    for x in range(n_msg):
        time.sleep(3)
        MSG = ' %d/%d after %.0f sec' % (x, n_msg, time.time() - tstart)
        print(MSG)
if __name__ == '__main__':
    main()
This is SGE buffering the output of your process; it happens whether it's a Python process or any other.
In general, you can decrease or disable the buffering in SGE by changing it and recompiling. But it's not a great thing to do: all that data will be written to disk slowly, affecting your overall performance.
Why not print to a file instead of stdout?
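outFileID = open('output.log', 'w')
# Python 3.3+: file= sends the text to the log, flush=True writes it out immediately
print('INFO: still working!', file=outFileID, flush=True)
print('WARNING: blah blah!', file=outFileID, flush=True)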
and use
tail -f output.log
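Another way to get progress messages into a file as they happen is the standard logging module, whose FileHandler flushes after every record. The sketch below is one possible setup, not part of the original answer; the helper name make_progress_logger and the logger name "progress" are made up for illustration. Flushing only hands the data to the operating system, so a network filesystem may still delay when other machines see it.

```python
import logging


def make_progress_logger(path):
    """Return a logger that appends each message to `path` immediately."""
    logger = logging.getLogger("progress")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)  # flushes after every record
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages out of the root logger
    return logger


# Typical use (call once per process, otherwise handlers accumulate):
# log = make_progress_logger("output.log")
# log.info("still working!")
```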
This works for me:
import os
import sys
class ForceIOStream:
    def __init__(self, stream):
        self.stream = stream
    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
        if not self.stream.isatty():
            # Ask the OS to push the data to the underlying storage as well.
            os.fsync(self.stream.fileno())
    def __getattr__(self, attr):
        return getattr(self.stream, attr)
sys.stdout = ForceIOStream(sys.stdout)
sys.stderr = ForceIOStream(sys.stderr)
The issue has to do with NFS not syncing data back to the master until a file is closed or fsync is called.
I hit this same problem today and solved it by just writing to disk instead of printing:
with open('log-file.txt','w') as out:
    out.write(status_report)
print() supports the argument flush since Python 3.3 (documentation). So, to force flush the stream:
print('Hello World!', flush=True)

gobject io monitoring + nonblocking reads

I've got a problem with using the io_add_watch monitor in python (via gobject). I want to do a nonblocking read of the whole buffer after every notification. Here's the code (shortened a bit):
import fcntl
import os
import gobject
class SomeApp(object):
    def __init__(self):
        # some other init that does a lot of stderr debug writes
        fl = fcntl.fcntl(0, fcntl.F_GETFL, 0)
        fcntl.fcntl(0, fcntl.F_SETFL, fl | os.O_NONBLOCK)
        print "hooked", gobject.io_add_watch(0, gobject.IO_IN | gobject.IO_PRI, self.got_message, [""])
        self.app = gobject.MainLoop()
    def run(self):
        print "ready"
        self.app.run()
    def got_message(self, fd, condition, data):
        print "reading now"
        data[0] += os.read(0, 1024)
        print "got something", fd, condition, data
        return True
gobject.threads_init()
SomeApp().run()
Here's the trick: when I run the program without debug output activated, I don't get the got_message calls. When I write a lot of stuff to stderr first, the problem disappears. If I don't write anything apart from the prints visible in this code, I don't get the stdin message signals. Another interesting thing is that when I try to run the same app with stderr debug enabled but via strace (to check if there are any fcntl / ioctl calls I missed), the problem appears again.
So in short: if I write a lot to stderr first without strace, io_watch works. If I write a lot with strace, or don't write at all, io_watch doesn't work.
The "some other init" part takes some time, so if I type some text before I see the "hooked 2" output and then press "ctrl+c" after "ready", the got_message callback is called, but the read call throws EAGAIN, so the buffer seems to be empty.
Strace log related to the stdin:
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
fcntl(0, F_GETFL) = 0xa002 (flags O_RDWR|O_ASYNC|O_LARGEFILE)
fcntl(0, F_SETFL, O_RDWR|O_NONBLOCK|O_ASYNC|O_LARGEFILE) = 0
fcntl(0, F_GETFL) = 0xa802 (flags O_RDWR|O_NONBLOCK|O_ASYNC|O_LARGEFILE)
Does anyone have some ideas on what's going on here?
EDIT: Another clue. I tried to refactor the app to do the reading in a different thread and pass it back via a pipe. It "kind of" works:
...
        rpipe, wpipe = os.pipe()
        stopped = threading.Event()
        self.stdreader = threading.Thread(name = "reader", target = self.std_read_loop, args = (wpipe, stopped))
        self.stdreader.start()
        new_data = ""
        print "hooked", gobject.io_add_watch(rpipe, gobject.IO_IN | gobject.IO_PRI, self.got_message, [new_data])
    def std_read_loop(self, wpipe, stop_event):
        while True:
            try:
                new_data = os.read(0, 1024)
                while len(new_data) > 0:
                    l = os.write(wpipe, new_data)
                    new_data = new_data[l:]
            except OSError, e:
                if stop_event.isSet():
                    break
                time.sleep(0.1)
...
It's surprising that if I just put the same text in a new pipe, everything starts to work. The problem is that:
the first line is not "noticed" at all - I get only the second and following lines
it's fugly
Maybe that will give someone else a clue on why that's happening?
This sounds like a race condition in which there is some delay to setting your callback, or else there is a change in the environment which affects whether or not you can set the callback.
I would look carefully at what happens before you call io_add_watch(). For instance the Python fcntl docs say:
All functions in this module take a file descriptor fd as their first argument. This can be an integer file descriptor, such as returned by sys.stdin.fileno(), or a file object, such as sys.stdin itself, which provides a fileno() which returns a genuine file descriptor.
Clearly that is not what you are doing when you assume that STDIN will have FD == 0. I would change that first and try again.
The other thing is that if the FD is already blocked, then your process could be waiting while other non-blocked processes are running, therefore there is a timing difference depending on what you do first. What happens if you refactor the fcntl stuff so that it is done soon after the program starts, even before importing the GTK modules?
I'm not sure that I understand why a program using the GTK GUI would want to read from the standard input in the first place. If you are actually trying to capture the output of another process, you should use the subprocess module to set up a pipe, then io_add_watch() on the pipe like so:
proc = subprocess.Popen(command, stdout = subprocess.PIPE)
gobject.io_add_watch(proc.stdout, glib.IO_IN, self.write_to_buffer )
Again, in this example we make sure that we have a valid opened FD before calling io_add_watch().
Normally, when gobject.io_add_watch() is used, it is called just before gobject.MainLoop(). For example, here is some working code using io_add_watch to catch IO_IN.
The documentation says you should return TRUE from the callback or it will be removed from the list of event sources.
What happens if you hook the callback first, prior to any stderr output? Does it still get called when you have debug output enabled?
Also, I suppose you should probably be repeatedly calling os.read() in your handler until it gives no data, in case >1024 bytes become ready between calls.
Have you tried using the select module in a background thread to emulate gio functionality? Does that work? What platform is this and what kind of FD are you dealing with? (file? socket? pipe?)
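The suggestion above to keep calling os.read() until no data is left can be sketched as a small helper. This is written for Python 3 (the question's code is Python 2), and the name drain is made up for illustration. It assumes the descriptor has already been put into non-blocking mode, as the question does with fcntl.

```python
import os


def drain(fd, chunk_size=1024):
    """Read everything currently available from a non-blocking descriptor."""
    chunks = []
    while True:
        try:
            data = os.read(fd, chunk_size)
        except BlockingIOError:
            # EAGAIN: nothing more to read right now.
            break
        if not data:
            # Empty read: the other end closed the descriptor (EOF).
            break
        chunks.append(data)
    return b"".join(chunks)
```

Inside a watch callback such as got_message, the single os.read(0, 1024) call could then be replaced by drain(fd).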
