Out of order output from Popen - python

I'm writing an alternative terminal window (using PySide), and I'm running the shell (bash) using:
subprocess.Popen(['/bin/bash','-i'],....
while setting the various stdio to subprocess.PIPE
I'm also setting the output pipes (out, err) to non-blocking mode using
fcntl(s.fileno(),F_SETFL,os.O_NONBLOCK)
Then I'm using a timer to poll the output io for available data and pull it.
It works fairly well, but I'm getting some strange behavior some of the time. If at a prompt I issue a command (e.g. pwd), I get two distinct possible outputs:
/etc:$ pwd
/etc
/etc:$
And the other is
/etc:$ pwd/etc
/etc:$
It is as if the newline from the command and the rest of the output get swapped. This happens for basically any command; with ls, for example, the first file appears right after the ls, and an empty line appears after the last file.
What bugs me is that it is not consistent.
EDIT: Added full code sample
#!/usr/bin/python
from PySide import QtCore
from PySide import QtGui

import fcntl
import os
import subprocess
import sys

class MyTerminal(QtGui.QDialog):
    def __init__(self, parent=None):
        super(MyTerminal, self).__init__(parent)
        startPath = os.path.expanduser('~')
        self.process = subprocess.Popen(['/bin/bash', '-i'], cwd=startPath, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
        fcntl.fcntl(self.process.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
        fcntl.fcntl(self.process.stderr.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
        self.timer = QtCore.QTimer(self)
        self.connect(self.timer, QtCore.SIGNAL("timeout()"), self.onTimer)
        self.started = False

    def keyPressEvent(self, event):
        text = event.text()
        if len(text) > 0:
            if not self.started:
                self.timer.start(10)
                self.started = True
            self.sendKeys(text)
            event.accept()

    def sendKeys(self, text):
        self.process.stdin.write(text)

    def output(self, text):
        sys.stdout.write(text)
        sys.stdout.flush()

    def readOutput(self, io):
        try:
            text = io.read()
            if len(text) > 0:
                self.output(text)
        except IOError:
            pass

    def onTimer(self):
        self.readOutput(self.process.stdout)
        self.readOutput(self.process.stderr)

def main():
    app = QtGui.QApplication(sys.argv)
    t = MyTerminal()
    t.show()
    app.exec_()

if __name__ == '__main__':
    main()

After trying to create a small code example to paste (added above), I noticed that the problem arises because stdout and stderr are not synchronized with each other.
A little bit of searching led me to the following question:
Merging a Python script's subprocess' stdout and stderr while keeping them distinguishable
I tried the first answer there and used the polling method, but this didn't solve things, as I was getting events mixing in the same manner as before.
What solved the problem was the answer by mossman which basically redirected the stderr to the stdout, which in my case is good enough.
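For reference, a minimal standalone sketch of that fix (the only assumption is that merging the two streams, as in my case, is acceptable):
import fcntl
import os
import subprocess

startPath = os.path.expanduser('~')
# Merge stderr into stdout so the two streams can no longer arrive out of order
process = subprocess.Popen(['/bin/bash', '-i'], cwd=startPath,
                           stdout=subprocess.PIPE,
                           stdin=subprocess.PIPE,
                           stderr=subprocess.STDOUT)
fcntl.fcntl(process.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
# The polling timer then only needs to read process.stdout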

Related

Ensure that error messages are printed last in Python

This must have an answer somewhere but I couldn't find it.
I would like my error/exception messages to be the last thing printed to the terminal, but it seems random whether they come out before all the text I have printed, after all the text I have printed, or somewhere in the middle of it.
I thought a solution would be to use sys.stdout.flush(), so I tried the following:
if __name__ == '__main__':
    import sys
    try:
        main()
    except:
        sys.stdout.flush()
        raise
But this doesn't work for some reason; it is still seemingly random in which order the error message and the text I have printed come out.
Why? And how do I fix this?
EDIT: Here is a minimal reproducible example, which behaves as described above at least on my system:
import sys
import numpy as np

def print_garbage():
    print(''.join(map(chr, np.random.randint(0, 256, 100))))
    raise Exception

try:
    print_garbage()
except:
    sys.stdout.flush()
    raise
EDIT: I am running Python version 3.10.0 on a Windows machine, and the terminal I am using is cmd through the PyCharm terminal. My PyCharm version is Community 2022.2.
You can print the traceback to stdout, so that there is no out-of-sync problem:
import traceback
import sys

try:
    print_garbage()
except:
    traceback.print_exc(file=sys.stdout)
stdout and stderr are separate channels. stdout is a buffered channel and stderr is unbuffered. This is why you are seeing the stderr message before the stdout one. Python is outputting them in your desired order, but the stdout data is being buffered before it is printed.
See here for how to disable buffering.
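A minimal sketch of the usual options (assuming Python 3.3+ for the flush argument):
import sys

# Option 1: flush explicitly on the prints whose ordering matters
print('some output', flush=True)

# Option 2: run the interpreter unbuffered from the command line:
#   python -u your_script.py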

Utilizing multiprocessing.Pipe() with subprocess.Popen/run as stdin/stdout

I'm currently working on a POC with the following desired results:
a Python script working as a parent, meaning it will start a child process while it runs
the child process is oblivious to the fact that another script is running it; the very same child script can also be executed as the main script by the user
a comfortable way to read the subprocess's output (sent to sys.stdout via print), while the parent's input is forwarded to the child's sys.stdin (read via input)
I've already done some research on the topic and I am aware that I can pass subprocess.PIPE to Popen/run and call it a day.
However, I saw that multiprocessing.Pipe() produces a connected socket pair which allows sending whole objects through it, so I don't need to work out when to stop reading a stream and continue afterwards.
# parent.py
import multiprocessing
import subprocess
import os

pipe1, pipe2 = multiprocessing.Pipe()

if os.fork():
    while True:
        print(pipe1.recv())
    exit()  # avoid fork collision

if os.fork():
    # subprocess.run is a busy wait
    subprocess.run(['python3', 'child.py'], stdin=pipe2.fileno(), stdout=pipe2.fileno())
    exit()  # avoid fork collision

while True:
    user_input = input('> ')
    pipe1.send(user_input)
# child.py
import os
import time

if os.fork():
    while True:
        print('child sends howdy')
        time.sleep(1)

with open('child.txt', 'w') as file:
    while True:
        user_input = input('> ')
        # We supposedly can't write to sys.stdout because parent.py took control of it
        file.write(f'{user_input}\n')
So, to finally reach the essence of the problem: child.py is installed as a package,
meaning parent.py doesn't call the actual file to run the script;
the subprocess is run by calling the package.
And for some bizarre reason, when child.py is run as a package rather than as a script, the code written above doesn't seem to work.
child.py's sys.stdin and sys.stdout fail to work entirely: parent.py is unable to receive ANY of child.py's prints (even sys.stdout.write(<some_data>) followed by sys.stdout.flush()),
and the same applies to sys.stdin.
If anyone can shed any light on how to solve this, I would be delighted!
Side Note
When calling a package, you don't call its __main__.py directly;
you call a Python entry-point file which actually starts up the package.
I assume something fishy might be happening there when that happens and that this causes the interference, but that's just a theory.

Realtime output from python script using subprocess.Popen()

After searching around, I defined a function to execute a command like in a terminal:
import shlex
import subprocess
import sys

def execute_cmd(cmd):
    p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for line in iter(p.stdout.readline, b''):  # b'' here for python3
        sys.stdout.write(line.decode(sys.stdout.encoding))
    error = p.stderr.read().decode()
    if error:
        raise Exception(error)
It works fine (the output is realtime) when I run
execute_cmd('ping -c 5 www.google.com')
However, when I use execute_cmd to run a python script, the output doesn't print out until the process is done.
execute_cmd('python test.py')
script: test.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
print('hello')
time.sleep(2)
print('hello, again')
How can I fix it? Thanks!
Sorry for not explaining why I 'catch the stdout and then write it to stdout again'. What I really want to do here is catch the script's output in a logger, which outputs it to the screen (StreamHandler) and to a log file (FileHandler). I built and tested the logger part; now the 'execute' part. And simply dropping the stdout= parameter doesn't seem to work.
pipeline:
set up the logger;
redirect STDOUT, STDERR to the logger;
execute the scripts;
Because of step 2, if I omit the stdout= parameter, the output of the scripts will still go to STDOUT and will not be logged to the file.
Maybe I can set stdout= to the logger?
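For reference, a minimal sketch of that idea (run_and_log is just a placeholder name of mine; stdout= still has to be a pipe rather than the logger itself, and this on its own does not fix the child-side buffering discussed in the answer below):
import logging
import shlex
import subprocess

logging.basicConfig(level=logging.INFO)  # real handlers (StreamHandler/FileHandler) would go here
logger = logging.getLogger('runner')

def run_and_log(cmd):
    # Read the child's merged stdout/stderr line by line and hand each line to the logger
    p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    for line in iter(p.stdout.readline, b''):
        logger.info(line.decode(errors='replace').rstrip())
    return p.wait()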
This is a common problem of the underlying output system, notably on Linux or other Unix-like systems. The io library is smart enough to flush output on each \n when it detects that output is directed to a terminal. But this automatic flush does not occur when output is redirected to a file or a pipe, probably for performance reasons. It is not really a problem when only the data matters, but it leads to weird behaviour when timing matters too.
Unfortunately I know no simple way to fix it from the caller program(*). The only possibility is to have the callee force flushing on each line/block or to use unbuffered output:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
import sys
print('hello')
sys.stdout.flush()
time.sleep(2)
print('hello, again')
(*) The bullet-proof way would be to use a pseudo-terminal. The caller controls the master side and passes the client side to the callee. The library will detect a terminal and will automatically flush on each line. But it is no longer portable outside the Unix world, and it is not really a simple way.
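For completeness, a minimal sketch of that pseudo-terminal approach (Unix only; execute_cmd_pty is just an illustrative name, not an established API):
import os
import pty
import subprocess
import sys

def execute_cmd_pty(cmd_args):
    # Give the child a pseudo-terminal as stdout/stderr so its io layer line-buffers
    master, slave = pty.openpty()
    p = subprocess.Popen(cmd_args, stdout=slave, stderr=slave, close_fds=True)
    os.close(slave)  # only the child keeps the slave end open
    try:
        while True:
            data = os.read(master, 1024)  # returns as soon as the child emits output
            if not data:
                break
            sys.stdout.write(data.decode(errors='replace'))
            sys.stdout.flush()
    except OSError:
        pass  # on Linux, reading the master raises EIO once the child exits
    finally:
        os.close(master)
    return p.wait()

# execute_cmd_pty(['python', 'test.py'])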

Using subprocess to call R from Python, want to keep STDOUT and ignore STDERR

This Python code that I currently have works and returns the command's STDOUT in the variable "run":
run = subprocess.check_output(['Rscript','runData.R',meth,expr,norm])
But it still prints to the screen all this ugly text from having to install a package in R, etc. I would like that output to be treated as STDERR and ignored. Is there any way to do this? This is what I'm currently working on, but it doesn't seem to work. Again, I just want it to ignore everything it prints to the screen except the results, i.e. ignore STDERR and keep STDOUT. Thank you!
run = subprocess.Popen(['Rscript','runData.R',meth,expr,norm],shell=False, stdout=subprocess.PIPE,stderr=devnull)
To avoid piping stderr entirely you may redirect it to os.devnull:
os.devnull
The file path of the null device. For example: '/dev/null' for POSIX, 'nul' for Windows. Also available via os.path.
import os
import subprocess

with open(os.devnull, 'w') as devnull:
    subprocess.Popen([cmd, arg], stdout=subprocess.PIPE, stderr=devnull)
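As a side note (not part of the original answer): on Python 3.3+ there is subprocess.DEVNULL, which avoids opening os.devnull manually; applied to the command from the question it would look roughly like this:
import subprocess

# stderr is discarded entirely; stdout is captured as before (Python 3.3+)
run = subprocess.check_output(['Rscript', 'runData.R', meth, expr, norm],
                              stderr=subprocess.DEVNULL)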
I actually solved my problem as soon as I posted this! My apologies! This is how it worked:
output = subprocess.Popen(['Rscript','runData.R',meth,expr,norm],shell=False, stdout=subprocess.PIPE,stderr=subprocess.PIPE)
final = output.stdout.read()
This ignored the messy stuff from the command line and saved my results into final.
Thank you for everyone's quick replies!

SGE script: print to file during execution (not just at the end)?

I have an SGE script to execute some python code, submitted to the queue using qsub. In the python script, I have a few print statements (updating me on the progress of the program). When I run the python script from the command line, the print statements are sent to stdout. For the SGE script, I use the -o option to redirect the output to a file. However, it seems that the script will only send these to the file after the python script has completed running. This is annoying because (a) I can no longer see real time updates on the program and (b) if my job does not terminate correctly (for example if it gets kicked off the queue) none of the updates are printed. How can I make sure that the script is writing to the file each time I want to print something, as opposed to lumping it all together at the end?
I think you are running into an issue with buffered output. Python uses a library to handle its output, and the library knows that it's more efficient to write a block at a time when it's not talking to a tty.
There are a couple of ways to work around this. You can run python with the "-u" option (see the python man page for details), for example, with something like this as the first line of your script:
#! /usr/bin/python -u
but this doesn't work if you are using the "/usr/bin/env" trick because you don't know where python is installed.
Another way is to reopen the stdout with something like this:
import sys
import os
# reopen stdout file descriptor with write mode
# and 0 as the buffer size (unbuffered)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Note the bufsize parameter of os.fdopen being set to 0 to force it to be unbuffered. You can do something similar with sys.stderr.
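Note that this snippet is from the Python 2 era; on Python 3, unbuffered text mode raises a ValueError. A sketch of an alternative there (Python 3.7+, where reconfigure is available) is to switch stdout to line buffering:
import sys

# Flush stdout at every newline instead of waiting for a full block (Python 3.7+)
sys.stdout.reconfigure(line_buffering=True)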
As others have mentioned, for performance reasons stdout is not always flushed immediately when it is not connected to a tty.
If you have a specific point at which you want the stdout to be written, you can force that by using
import sys
sys.stdout.flush()
at that point.
I just encountered a similar issue with SGE, and no suggested method to "unbuffer" the file IO seemed to work for me. I had to wait until the end of program execution to see any output.
The workaround I found was to wrap sys.stdout into a custom object that re-implements the "write" method. Instead of actually writing to stdout, this new method instead opens the file where IO is redirected, appends with the desired data, and then closes the file. It's a bit ugly, but I found it solved the problem, since the actual opening/closing of the file forces IO to be interactive.
Here's a minimal example:
import os, sys, time

class RedirIOStream:
    def __init__(self, stream, REDIRPATH):
        self.stream = stream
        self.path = REDIRPATH
    def write(self, data):
        # instead of actually writing, just append to file directly!
        myfile = open(self.path, 'a')
        myfile.write(data)
        myfile.close()
    def __getattr__(self, attr):
        return getattr(self.stream, attr)

if not sys.stdout.isatty():
    # Detect redirected stdout and stderr file locations!
    # Warning: this will only work on LINUX machines
    STDOUTPATH = os.readlink('/proc/%d/fd/1' % os.getpid())
    STDERRPATH = os.readlink('/proc/%d/fd/2' % os.getpid())
    sys.stdout = RedirIOStream(sys.stdout, STDOUTPATH)
    sys.stderr = RedirIOStream(sys.stderr, STDERRPATH)

# Simple program to print a message every 3 seconds
def main():
    tstart = time.time()
    nMsg = 10
    for x in xrange(nMsg):
        time.sleep(3)
        MSG = ' %d/%d after %.0f sec' % (x, nMsg, time.time() - tstart)
        print MSG

if __name__ == '__main__':
    main()
This is SGE buffering the output of your process; it happens whether it's a Python process or any other.
In general you can decrease or disable the buffering in SGE by changing it and recompiling, but it's not a great thing to do: all that data is going to be slowly written to disk, affecting your overall performance.
Why not print to a file instead of stdout?
outFileID = open('output.log', 'w', buffering=1)  # line-buffered so tail -f sees updates
print('INFO: still working!', file=outFileID)
print('WARNING: blah blah!', file=outFileID)
and use
tail -f output.log
This works for me:
import os
import sys

class ForceIOStream:
    def __init__(self, stream):
        self.stream = stream
    def write(self, data):
        self.stream.write(data)
        self.stream.flush()
        if not self.stream.isatty():
            os.fsync(self.stream.fileno())
    def __getattr__(self, attr):
        return getattr(self.stream, attr)

sys.stdout = ForceIOStream(sys.stdout)
sys.stderr = ForceIOStream(sys.stderr)
and the issue has to do with NFS not syncing data back to the master until a file is closed or fsync is called.
I hit this same problem today and solved it by just writing to disk instead of printing:
with open('log-file.txt', 'w') as out:
    out.write(status_report)
print() supports the argument flush since Python 3.3 (documentation). So, to force flush the stream:
print('Hello World!', flush=True)
