Python cannot print unbuffered

I am working on a cluster and using Python to do some calculations.
However, the Python scripts do not print anything until they finish or get killed.
I have tried several ways (from Disable output buffering):
Use the -u command line switch
Wrap sys.stdout in an object that flushes after every write
Set PYTHONUNBUFFERED env var
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
But none of these works.
The Python versions are 2.7.8, 2.7.3 and 2.4.2 (there are three versions on the system, and the one to use must be put into PATH before running).
Also, because the cluster uses NFS, I have exactly the same Python on multiple machines. But on one machine it prints unbuffered (at least it prints something before ending), without any setting at all (no PYTHONUNBUFFERED). On the other machines it does not, which confuses me even more.
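One hedged way to see what each machine's Python actually thinks it is writing to (run it the same way the real jobs are launched) is to report whether stdout is a terminal, since block buffering normally only kicks in when it is not:
import sys
# Quick diagnostic: compare this output across the machines that behave differently.
sys.stderr.write('stdout isatty: %r\n' % sys.stdout.isatty())
sys.stderr.write('stdout object: %r\n' % sys.stdout)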
My testing script is
import os,sys,time
for i in range(100):
    print(i)
    time.sleep(1)
And
import os,sys,time
sys.stdout = os.fdopen(sys.stdout.fileno(),'w',0)
for i in range(100):
    print(i)
    time.sleep(1)
And
import os,sys,time
for i in range(100):
    print(i)
    sys.stdout.flush()
    time.sleep(1)
Is there any other way to make Python print unbuffered (or buffer more reasonably, e.g. line by line) without modifying the Python code (because there is a lot of it)? Thanks.
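For what it's worth, one hedged option that avoids editing the scripts themselves: Python imports a module named sitecustomize at startup if it can find one on sys.path, so a small file placed on the cluster's PYTHONPATH could re-open stdout line-buffered for every job. A minimal sketch, assuming Python 2 (where os.fdopen accepts a bufsize of 1 for line buffering):
# sitecustomize.py -- hypothetical file on PYTHONPATH, imported automatically at startup
import os, sys
# Re-open stdout line-buffered so each print reaches the redirected file promptly.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1)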

Related

Parent and spawned child don't seem to share mapped memory (Windows)

I have a weird problem.
The code below works if I open two cmd windows and simply run each script from the command line, alt-tabbing between them.
# share1.py
import numpy
ram = numpy.memmap("bla",shape=4096,dtype=numpy.uint8,mode="w+")
for y in xrange(0, 4096):
    for x in xrange(0, 4096):
        ram[x] = 65 + y
    print ram
-
# share2.py
import numpy
ram = numpy.memmap("bla",shape=4096,dtype=numpy.uint8,mode="readwrite")
for y in xrange(0, 4096):
    for x in xrange(0, 4096):
        m = ram[x]
        print m,
    print ram
This works! share2 continuously prints out the changes share1 makes.
But when I spawn the same code from python directly, instead of cmd, suddenly this doesn't work anymore and I don't understand why.
I use psutil to spawn an empty python process and feed it code via stdin. I ignore stdout so any print in the spawn eventually outputs on the main cmd I'm using to run the script.
share1 outputs the proper numbers it has set, while share2 only ever outputs 0, as if it didn't get to see the mapping at all. Please note that this also happens using regular mmap (not numpy's).
What am I doing wrong?
(edit: I accidentally put share2 at the beginning of the last paragraph instead of share1.)
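For reference, a rough sketch of the kind of spawn described above (a reconstruction, not the asker's exact code; psutil's Popen wraps subprocess.Popen). Since both scripts open the relative path "bla", it is also worth making sure the child runs with the same working directory:
import os, subprocess, sys
# Feed share2.py to a bare interpreter via stdin ("python -" reads the program
# from stdin); force the same cwd so "bla" refers to the same backing file.
code = open('share2.py').read()
child = subprocess.Popen([sys.executable, '-'],
                         stdin=subprocess.PIPE,
                         cwd=os.getcwd())
child.communicate(code)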

Preventing write interrupts in python script

I'm writing a parser in Python that outputs a bunch of database rows to standard out. In order for the DB to process them properly, each row needs to be fully printed to the console. I'm trying to prevent interrupts from making the print command stop halfway through printing a line.
I tried the solution that recommended using a signal handler override, but this still doesn't prevent the row from being partially printed when the program is interrupted. (I think the WRITE system call is cancelled to handle the interrupt).
I thought that the problem was solved by issue 10956 but I upgraded to Python 2.7.5 and the problem still happens.
You can see for yourself by running this example:
# Writer
import signal
interrupted = False
def signal_handler(signal, frame):
    global interrupted
    interrupted = True
signal.signal(signal.SIGINT, signal_handler)
while True:
    if interrupted:
        break
    print '0123456789'
In a terminal:
$ mkfifo --mode=0666 pipe
$ python writer.py > pipe
In another terminal:
$ cat pipe
Then Ctrl+C the first terminal. Some of the time the second terminal will end with an incomplete sequence of characters.
Is there any way of ensuring that full lines are written?
This seems less like an interrupt problem per se than a buffering issue. If I make a small change to your code, I don't get the partial lines.
# Writer
import sys
while True:
    print '0123456789'
    sys.stdout.flush()
It sounds like you don't really want to catch a signal but rather block it temporarily. This is supported by some *nix flavours. However, Python does not expose this directly.
You can write a C wrapper for sigmasks or look for a library. However if you are looking for a portable solution...
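A hedged, Linux-oriented sketch of another way to keep rows whole without touching signal masks: build each row as one string and emit it with a single os.write() call. Writes of at most PIPE_BUF bytes (4096 on Linux) to a pipe are not split, so a short row either appears in full or not at all; the row content below is just a placeholder:
import os, sys

def emit_row(row):
    # One write() syscall per row; rows <= PIPE_BUF bytes land atomically on a pipe.
    os.write(sys.stdout.fileno(), row + '\n')

try:
    while True:
        emit_row('0123456789')
except KeyboardInterrupt:
    pass  # Ctrl+C can no longer leave a torn row behind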

input() blocks other python processes in Windows 8 (python 3.3)

While working on a multi-threaded, cross-platform Python 3.3 application, I came across some weird behavior I was not expecting and am not sure is intended. The issue is that on Windows 8, calling input() in one thread blocks other threads until it completes. I have tested the example script below on three Linux, two Windows 7 and one Windows 8 computers, and this behavior is only observed on the Windows 8 computer. Is this expected behavior for Windows 8?
test.py:
import subprocess, threading, time
def ui():
    i = input("-->")
    print(i)
def loop():
    i = 0
    f = 'sky.{}'.format(i)
    p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    t = time.time()
    while time.time() < t + 15:
        if p.poll() != None:
            print(i)
            time.sleep(3)
            i += 1
            f = 'sky.{}'.format(i)
            p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    p.terminate()
    p.wait()
def start():
    t1 = threading.Thread(target=ui)
    t2 = threading.Thread(target=loop)
    t1.start()
    t2.start()
    return t2
t2 = start()
t2.join()
print('done')
copy.py:
import shutil
import sys
src = sys.argv[1]
dst = sys.argv[2]
print('Copying \'{0}\' to \'{1}\''.format(src, dst))
shutil.copy(src, dst)
Update:
While trying out one of the suggestions I realized that I rushed to a conclusion missing something obvious. I apologize for getting off to a false start.
As Schollii suggested, just using threads (no subprocess or Python files) results in all threads making forward progress, so the problem actually is that using input() in one Python process causes other Python processes to block or not run (I do not know exactly what is going on). Furthermore, it appears that only Python processes are affected. If I use the same code shown above (with some modifications) to execute non-Python executables with subprocess.Popen, they run as expected.
To summarize:
Using subprocess to execute a non-Python executable: works as expected with and without any calls to input().
Using subprocess to execute a Python executable: created processes appear not to run if a call to input() is made in the original process.
Using subprocess to create Python processes with a call to input() in a new process and not the original process: a call to input() blocks all Python processes spawned by the 'main' process.
Side Note: I do not have Windows 8 platform so debugging/tests can be a little slow.
Because there were several problems with input() in Python 3.0-3.2, this method has gone through a few changes.
It's possible that there is a new bug again.
Can you try the following variant, which is a "back port" of the input() behaviour that was available in Python 2.x (where input() meant eval(raw_input())):
...
i = eval(input("-->"))
...
It's an interesting problem to work with. Since you depend on the input() method, which needs the console, and you have several threads, all the threads end up trying to communicate with the console at once. So I advise you to either use the producer-consumer concept (one thread owns the console, as sketched below) or define all your inputs in a text file and pass the text file to the program.
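A minimal sketch of that producer-consumer idea (Python 3; the names are placeholders): one thread owns the console and calls input(), pushing each line onto a queue that the worker threads read instead of touching stdin themselves:
import queue, threading

commands = queue.Queue()

def ui():
    # The only thread that ever touches the console.
    while True:
        line = input('-->')
        commands.put(line)
        if line == 'quit':
            break

def worker(n):
    while True:
        cmd = commands.get()      # blocks on the queue, not on the console
        if cmd == 'quit':
            commands.put(cmd)     # pass the sentinel on to the other workers
            break
        print('worker', n, 'got:', cmd)

threading.Thread(target=ui).start()
for n in range(2):
    threading.Thread(target=worker, args=(n,)).start()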

SGE script: print to file during execution (not just at the end)?

I have an SGE script to execute some Python code, submitted to the queue using qsub. In the Python script, I have a few print statements (updating me on the progress of the program). When I run the Python script from the command line, the print statements are sent to stdout. For the SGE script, I use the -o option to redirect the output to a file. However, it seems that the script only sends these to the file after the Python script has completed running. This is annoying because (a) I can no longer see real-time updates on the program and (b) if my job does not terminate correctly (for example if it gets kicked off the queue) none of the updates are printed. How can I make sure that the script writes to the file each time I want to print something, as opposed to lumping it all together at the end?
I think you are running into an issue with buffered output. Python uses a library to handle its output, and the library knows that it's more efficient to write a block at a time when it's not talking to a tty.
There are a couple of ways to work around this. You can run python with the "-u" option (see the python man page for details), for example, with something like this as the first line of your script:
#! /usr/bin/python -u
but this doesn't work if you are using the "/usr/bin/env" trick because you don't know where python is installed.
Another way is to reopen the stdout with something like this:
import sys
import os
# reopen stdout file descriptor with write mode
# and 0 as the buffer size (unbuffered)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Note the bufsize parameter of os.fdopen being set to 0 to force it to be unbuffered. You can do something similar with sys.stderr.
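One caveat worth noting: on Python 3 the fdopen call above raises ValueError, because text-mode streams cannot be fully unbuffered. A hedged Python 3 equivalent (3.3+ for write_through) re-wraps the raw file descriptor in a write-through text layer instead:
import io, sys
# Python 3: unbuffered binary stream underneath, write-through text layer on top.
sys.stdout = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0),
                              write_through=True)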
As others mentioned, it is for performance reasons that stdout is not always written out immediately when it is not connected to a tty.
If you have a specific point at which you want the stdout to be written, you can force that by using
import sys
sys.stdout.flush()
at that point.
I just encountered a similar issue with SGE, and no suggested method to "unbuffer" the file IO seemed to work for me. I had to wait until the end of program execution to see any output.
The workaround I found was to wrap sys.stdout into a custom object that re-implements the "write" method. Instead of actually writing to stdout, this new method instead opens the file where IO is redirected, appends with the desired data, and then closes the file. It's a bit ugly, but I found it solved the problem, since the actual opening/closing of the file forces IO to be interactive.
Here's a minimal example:
import os, sys, time

class RedirIOStream:
    def __init__(self, stream, REDIRPATH):
        self.stream = stream
        self.path = REDIRPATH
    def write(self, data):
        # instead of actually writing, just append to file directly!
        myfile = open(self.path, 'a')
        myfile.write(data)
        myfile.close()
    def __getattr__(self, attr):
        return getattr(self.stream, attr)

if not sys.stdout.isatty():
    # Detect redirected stdout and stderr file locations!
    # Warning: this will only work on LINUX machines
    STDOUTPATH = os.readlink('/proc/%d/fd/1' % os.getpid())
    STDERRPATH = os.readlink('/proc/%d/fd/2' % os.getpid())
    sys.stdout = RedirIOStream(sys.stdout, STDOUTPATH)
    sys.stderr = RedirIOStream(sys.stderr, STDERRPATH)

# Simple program to print a msg every 3 seconds
def main():
    nMsg = 10
    tstart = time.time()
    for x in xrange(nMsg):
        time.sleep(3)
        MSG = ' %d/%d after %.0f sec' % (x, nMsg, time.time() - tstart)
        print MSG

if __name__ == '__main__':
    main()
This is SGE buffering the output of your process; it happens whether it's a Python process or any other.
In general you can decrease or disable the buffering in SGE by changing it and recompiling, but that's not a great thing to do: all that data is going to be written to disk slowly, affecting your overall performance.
Why not print to a file instead of stdout?
outFileID = open('output.log', 'w', 1)  # bufsize=1 keeps the file line-buffered
print >> outFileID, 'INFO: still working!'
print >> outFileID, 'WARNING: blah blah!'
and use
tail -f output.log
This works for me:
import os, sys

class ForceIOStream:
    def __init__(self, stream):
        self.stream = stream
    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
        if not self.stream.isatty():
            os.fsync(self.stream.fileno())
    def __getattr__(self, attr):
        return getattr(self.stream, attr)

sys.stdout = ForceIOStream(sys.stdout)
sys.stderr = ForceIOStream(sys.stderr)
and the issue has to do with NFS not syncing data back to the master until a file is closed or fsync is called.
I hit this same problem today and solved it by just writing to disk instead of printing:
with open('log-file.txt', 'w') as out:
    out.write(status_report)
print() supports the argument flush since Python 3.3 (documentation). So, to force flush the stream:
print('Hello World!', flush=True)

python print function in real time

I recently switched OS and am using a newer Python (2.7). On my old system, I used to be able to print instantaneously. For instance, suppose I had a computationally intense for loop:
for i in range(10):
    # ... huge calculation ...
    print i
then as the code completed each iteration, it would print i
However, on my current system, Python seems to buffer stdout, so the terminal is blank for several minutes, after which it prints:
1
2
3
in short succession. Then, after a few more minutes, it prints:
4
5
6
and so on. How can I make python print as soon as it reaches the print statement?
Try calling flush on stdout after the print:
import sys
...
sys.stdout.flush()
Or use the -u command line switch, which will:
Force stdin, stdout and stderr to be totally unbuffered.
Since Python 3.3, you can simply pass flush=True to the print function.
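Applied to the loop from the question, that would look like this (Python 3.3+ only; the flush keyword does not exist in the Python 2 print function):
for i in range(10):
    # ... huge calculation ...
    print(i, flush=True)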
Import the new print-as-function as in Python 3.x:
from __future__ import print_function
(put the statement at the top of your script/module)
This allows you to replace the new print function with your own:
import sys
def print(s, end='\n', file=sys.stdout):
    file.write(s + end)
    file.flush()
The advantage is that this way your script will work just the same when you upgrade one day to Python 3.x.
PS1: I did not try it out, but note that the built-in print function does not flush by default, which is why the replacement above calls flush() explicitly.
PS2: you might also be interested in my progressbar example.
