I am trying to create a stream object that triggers a callback function any time data is written to it.
class MonitoredStream():
    def __init__(self, outstream, callback):
        self.outstream = outstream
        self.callback = callback

    def write(self, s):
        self.callback(s)
        self.outstream.write(s)

    def __getattr__(self, attr):
        return getattr(self.outstream, attr)
This works fine when I call the write method directly, but I would love to have it work also when I have a subprocess' output hooked to the stream. For example:
import subprocess as sub
import sys

def f(s):
    print("Write")

p = sub.Popen(["sh", "test.sh"], stdout=MonitoredStream(sys.stdout, f))
p.communicate()
This just sends output directly to sys.stdout, bypassing the write function completely. Is there a way that I can monitor this output also?
I believe the issue here is that subprocess.Popen doesn't go through the Python interface of the object you pass as stdout. It calls fileno() on it to get the underlying file descriptor, and the child process then writes to that descriptor directly. Because your __getattr__ forwards fileno() to sys.stdout, the child gets the real stdout descriptor and your write method is never called.
My best guess at solving this is to make a new in-between pipe that sits in the middle to let you deal with the stream yourself. I would implement this as a context manager:
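import sys
import os
from subprocess import Popen
from contextlib import contextmanager

@contextmanager
def monitor(stream, callback):
    read, write = os.pipe()
    yield write
    # Runs when the with block exits: close our copy of the write end so the
    # reader sees EOF, then replay what the child wrote. Nothing is read before
    # this point, so the child's output has to fit in the OS pipe buffer.
    os.close(write)
    with os.fdopen(read) as f:
        for line in f:
            callback(line)
            stream.write(line)

def f(s):
    print("Write")

with monitor(sys.stdout, f) as stream:
    p = Popen(["ls"], stdout=stream)
    p.communicate()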
Although you could, of course, still use a class:
import sys
import os
from subprocess import Popen

class MonitoredStream():
    def __init__(self, stream, callback):
        self.stream = stream
        self.callback = callback
        self._read, self._write = os.pipe()

    def fileno(self):
        # Popen calls this to get the descriptor the child writes to.
        return self._write

    def process(self):
        os.close(self._write)
        with os.fdopen(self._read) as f:
            for line in f:
                self.callback(line)
                self.stream.write(line)

def f(s):
    print("Write")

stream = MonitoredStream(sys.stdout, f)
p = Popen(["ls"], stdout=stream)
p.communicate()
stream.process()  # returns None, so there is nothing to print
Although I feel this is less elegant.
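A different route from the two os.pipe versions above, offered only as a sketch and not as the answerer's method: let Popen create the pipe itself with stdout=subprocess.PIPE and read it line by line in the parent while the child is still running. The helper name run_monitored is hypothetical. Because lines are consumed as they arrive, the callback fires during the run and the child cannot stall on a full pipe buffer.

```python
import subprocess
import sys


def run_monitored(args, outstream, callback):
    """Run args; pass each line of its stdout to callback, then to outstream."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            universal_newlines=True)
    with proc.stdout:
        # Iterating the pipe yields lines as the child produces them.
        for line in proc.stdout:
            callback(line)
            outstream.write(line)
    return proc.wait()


if __name__ == "__main__":
    seen = []
    child = [sys.executable, "-c", "print('one'); print('two')"]
    code = run_monitored(child, sys.stdout, seen.append)
    print("exit code %d, %d lines seen" % (code, len(seen)))
```

The trade-off is that the parent has to stay in the read loop until the child closes its stdout, so this fits best when the caller has nothing else to do in the meantime (or runs the loop in a thread).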
Related
I am using the multiprocessing package to spawn a second process whose stdout and stderr I would like to redirect into the first process. I am using a multiprocessing.Pipe object:
dup2(output_pipe.fileno(), 1)
Where output_pipe is an instance of multiprocessing.Pipe. However, when I try to read on the other end, it just hangs. I tried reading using Pipe.recv_bytes with a limit, but that raises an OSError. Is this possible at all or should I just switch to some lower level pipe functions?
After experimenting in Python 2.7, I got this working example. os.dup2 copies the pipe's file descriptor onto the standard output descriptor (1), so every print statement ends up writing to the pipe.
import os
import multiprocessing

def tester_method(w):
    os.dup2(w.fileno(), 1)
    for i in range(3):
        print 'This is a message!'

if __name__ == '__main__':
    r, w = multiprocessing.Pipe()
    reader = os.fdopen(r.fileno(), 'r')
    process = multiprocessing.Process(None, tester_method, 'TESTER', (w,))
    process.start()
    for i in range(3):
        print 'From pipe: %s' % reader.readline()
    reader.close()
    process.join()
Output:
From pipe: This is a message!
From pipe: This is a message!
From pipe: This is a message!
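The same os.dup2 idea can be sketched in Python 3 without a second process, assuming a POSIX-style system and output small enough to fit in the OS pipe buffer (nothing reads the pipe until the function returns). The helper name capture_fd1 is hypothetical. It points file descriptor 1 at a pipe, runs a callable, restores the descriptor, and returns what was written.

```python
import os
import sys


def capture_fd1(func):
    """Call func() with file descriptor 1 pointed at a pipe; return the text."""
    read_fd, write_fd = os.pipe()
    sys.stdout.flush()        # keep earlier buffered text out of the pipe
    saved_fd = os.dup(1)      # remember where fd 1 pointed
    os.dup2(write_fd, 1)      # fd 1 now refers to the pipe's write end
    os.close(write_fd)
    try:
        func()
        sys.stdout.flush()    # push any Python-level buffered output through
    finally:
        os.dup2(saved_fd, 1)  # restore fd 1; this closes the last write end
        os.close(saved_fd)
    with os.fdopen(read_fd) as reader:
        return reader.read()


if __name__ == "__main__":
    text = capture_fd1(lambda: os.write(1, b"This is a message!\n"))
    print("From pipe: %s" % text, end="")
```

The usage writes with os.write(1, ...) to show that the redirection happens at the descriptor level. A plain print() is captured as well whenever sys.stdout is still attached to descriptor 1.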
The existing answer works for the raw file descriptors, but this may be useful for using Pipe.send() and recv:
import sys

class PipeTee(object):
    def __init__(self, pipe):
        self.pipe = pipe
        self.stdout = sys.stdout
        sys.stdout = self

    def write(self, data):
        self.stdout.write(data)
        self.pipe.send(data)

    def flush(self):
        self.stdout.flush()

    def __del__(self):
        sys.stdout = self.stdout
To use this, create the object in your multiprocess function, pass it the write side of multiprocessing.Pipe, and then use the read side on the parent process with recv, using poll to check if data exists.
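A sketch of that usage, assuming a one-way pipe (duplex=False) and a small amount of output. The names worker and drain are hypothetical, and PipeTee is repeated with an explicit close() in place of __del__ so that the block is self-contained.

```python
import multiprocessing
import sys


class PipeTee(object):
    """Write to the original stdout and send the same data down a pipe."""

    def __init__(self, pipe):
        self.pipe = pipe
        self.stdout = sys.stdout
        sys.stdout = self

    def write(self, data):
        self.stdout.write(data)
        self.pipe.send(data)

    def flush(self):
        self.stdout.flush()

    def close(self):
        sys.stdout = self.stdout


def worker(conn):
    """Runs in the child: everything printed here is also sent to conn."""
    tee = PipeTee(conn)
    try:
        print("hello from the child")
    finally:
        tee.close()
        conn.close()


def drain(conn, timeout=1.0):
    """Collect whatever the read side has, using poll() before each recv()."""
    chunks = []
    try:
        while conn.poll(timeout):
            chunks.append(conn.recv())
    except EOFError:
        pass  # every write end is closed and the pipe is empty
    return "".join(chunks)


if __name__ == "__main__":
    reader, writer = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=worker, args=(writer,))
    proc.start()
    writer.close()  # the parent keeps only the read side
    proc.join()     # fine for small output; drain first if there is a lot
    print(repr(drain(reader)))
```

Closing the parent's copy of the write side matters: without it the reader never sees end-of-file and drain() only stops when poll() times out.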
The function glib.spawn_async lets you hook three callbacks, which are called on events on stdout, on stderr, and on process completion.
How can I mimic the same functionality with subprocess with either threads or asyncio?
I am more interested in the functionality than in threading/asyncio, but an answer that contains both will earn a bounty.
Here is a toy program that shows what I want to do:
import glib
import logging
import os
import gtk

class MySpawn(object):
    def __init__(self):
        self._logger = logging.getLogger(self.__class__.__name__)

    def execute(self, cmd, on_done, on_stdout, on_stderr):
        self.pid, self.idin, self.idout, self.iderr = \
            glib.spawn_async(cmd,
                             flags=glib.SPAWN_DO_NOT_REAP_CHILD,
                             standard_output=True,
                             standard_error=True)
        fout = os.fdopen(self.idout, "r")
        ferr = os.fdopen(self.iderr, "r")
        glib.child_watch_add(self.pid, on_done)
        glib.io_add_watch(fout, glib.IO_IN, on_stdout)
        glib.io_add_watch(ferr, glib.IO_IN, on_stderr)
        return self.pid

if __name__ == '__main__':
    logging.basicConfig(format='%(thread)d %(levelname)s: %(message)s',
                        level=logging.DEBUG)
    cmd = '/usr/bin/git ls-remote https://github.com/DiffSK/configobj'.split()

    def on_done(pid, retval, *args):
        logging.info("That's all folks!…")

    def on_stdout(fobj, cond):
        """This blocks which is fine for this toy example…"""
        for line in fobj.readlines():
            logging.info(line.strip())
        return True

    def on_stderr(fobj, cond):
        """This blocks which is fine for this toy example…"""
        for line in fobj.readlines():
            logging.error(line.strip())
        return True

    runner = MySpawn()
    runner.execute(cmd, on_done, on_stdout, on_stderr)
    try:
        gtk.main()
    except KeyboardInterrupt:
        print('')
I should add that since readlines() is blocking, the above will buffer all the output and send it at once. If this is not what one wants, then you have to use readline() and make sure that on end of command you finish reading all the lines you did not read before.
asyncio has subprocess_exec, so there is no need to use the subprocess module at all:
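import asyncio

class Handler(asyncio.SubprocessProtocol):
    def __init__(self, done):
        self.done = done
        self.pending = {1, 2, "exit"}  # both output pipes plus the exit

    def _mark(self, event):
        self.pending.discard(event)
        if not self.pending:
            self.done.set_result(True)

    def pipe_data_received(self, fd, data):
        # fd == 1 for stdout, and 2 for stderr
        print("Data from /bin/ls on fd %d: %s" % (fd, data.decode()))

    def pipe_connection_lost(self, fd, exc):
        print("Connection lost to /bin/ls")
        self._mark(fd)

    def process_exited(self):
        print("/bin/ls is finished.")
        self._mark("exit")

loop = asyncio.new_event_loop()
done = loop.create_future()
# subprocess_exec() returns once the process has started, so also wait for done.
coro = loop.subprocess_exec(lambda: Handler(done), "/bin/ls", "/", stdin=None)
transport, _ = loop.run_until_complete(coro)
loop.run_until_complete(done)
transport.close()
loop.close()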
With subprocess and threading, it's simple as well. You can just spawn a thread per pipe, and one to wait() for the process:
import subprocess
import threading

class PopenWrapper(object):
    def __init__(self, args):
        self.process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.DEVNULL)
        self.stdout_reader_thread = threading.Thread(target=self._reader, args=(self.process.stdout,))
        self.stderr_reader_thread = threading.Thread(target=self._reader, args=(self.process.stderr,))
        self.exit_watcher = threading.Thread(target=self._exit_watcher)
        self.stdout_reader_thread.start()
        self.stderr_reader_thread.start()
        self.exit_watcher.start()

    def _reader(self, fileobj):
        for line in fileobj:
            self.on_data(fileobj, line)

    def _exit_watcher(self):
        self.process.wait()
        self.stdout_reader_thread.join()
        self.stderr_reader_thread.join()
        self.on_exit()

    def on_data(self, fd, data):
        raise NotImplementedError

    def on_exit(self):
        raise NotImplementedError

    def join(self):
        # Wait for the watcher thread so on_exit() has run before returning.
        self.exit_watcher.join()

class LsWrapper(PopenWrapper):
    def on_data(self, fd, data):
        print("Received on fd %r: %s" % (fd, data))

    def on_exit(self):
        print("Process exited.")

LsWrapper(["/bin/ls", "/"]).join()
However, mind that glib does not use threads to execute your callbacks asynchronously. It uses an event loop, just as asyncio does. The idea is that at the core of your program is a loop that waits until something happens and then synchronously executes the associated callback. In your case, the events are "data becomes available for reading on one of the pipes" and "the subprocess has exited". In general, it is also things like "the X11 server reported mouse movement", "there's incoming network traffic", etc.
You can emulate glib's behaviour by writing your own event loop. Use the select module on the two pipes. If select reports that a pipe is readable but read returns no data, the process has likely exited. In that case, call the poll() method on the subprocess object to check whether it has completed, and call your exit callback if it has, or an error callback otherwise.
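A minimal sketch of such a hand-written loop, assuming a Unix-like system (select.select() only accepts pipes there). The function name run_with_callbacks and its callback signatures are hypothetical. It simplifies the scheme described above slightly: an empty read is treated as end-of-file for that pipe, and once both pipes are closed it waits for the process and reports the exit code instead of calling poll() and a separate error callback.

```python
import os
import select
import subprocess
import sys


def run_with_callbacks(args, on_stdout, on_stderr, on_exit):
    """Run args; feed raw output chunks to the callbacks from a single loop."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    handlers = {proc.stdout.fileno(): on_stdout,
                proc.stderr.fileno(): on_stderr}
    while handlers:
        # Block until at least one pipe has data or has reached end-of-file.
        readable, _, _ = select.select(list(handlers), [], [])
        for fd in readable:
            chunk = os.read(fd, 4096)
            if chunk:
                handlers[fd](chunk)
            else:
                del handlers[fd]  # empty read: the child closed this pipe
    proc.stdout.close()
    proc.stderr.close()
    returncode = proc.wait()
    on_exit(returncode)
    return returncode


if __name__ == "__main__":
    run_with_callbacks(
        ["/bin/ls", "/"],
        on_stdout=lambda data: sys.stdout.write(data.decode()),
        on_stderr=lambda data: sys.stderr.write(data.decode()),
        on_exit=lambda code: print("Process exited with %d." % code),
    )
```

The callbacks receive bytes in arbitrary chunk sizes, not whole lines, so a caller that needs lines has to buffer and split them itself.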
There was a redirect_output function in IPython.utils, and there was a %%capture magic function, but these are now gone, and this thread on the topic is now outdated.
I'd like to do something like the following:
from __future__ import print_function
import sys
from IPython.utils import io

with io.redirect_output(stdout=False, stderr="stderr_test.txt"):
    while True:
        print('hello!', file=sys.stderr)
Thoughts? For more context, I am trying to capture the output of some ML functions that run for hours or days, and output a line every 5-10 seconds to stderr. I then want to take the output, munge it, and plot the data.
You could probably try replacing sys.stderr with some other file object, the same way as suggested here.
import sys
oldstderr = sys.stderr
sys.stderr = open('log.txt', 'w')
# do something
sys.stderr = oldstderr
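Update: starting from Python 3.4, you should consider using contextlib.redirect_stdout() instead (contextlib.redirect_stderr(), added in Python 3.5, does the same for stderr), like this:
import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    print('a')
s = f.getvalue()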
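Since the question is about stderr going to a file, here is a sketch of the same idea with contextlib.redirect_stderr (Python 3.5+). The helper run_logging_stderr and the stand-in job noisy_job are hypothetical names. The file is opened line-buffered so each line reaches the file as soon as it is written, which matters for jobs that run for hours. Note that this only redirects writes made through Python's sys.stderr, not output that C extensions write directly to file descriptor 2.

```python
import contextlib
import os
import sys
import tempfile


def run_logging_stderr(path, func, *args, **kwargs):
    """Call func while everything written to sys.stderr goes to path."""
    # buffering=1 selects line buffering for text files.
    with open(path, "w", buffering=1) as log:
        with contextlib.redirect_stderr(log):
            return func(*args, **kwargs)


def noisy_job():
    """Stand-in for a long-running function that reports on stderr."""
    for step in range(3):
        print("step %d" % step, file=sys.stderr)
    return "done"


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "stderr_test.txt")
    print(run_logging_stderr(log_path, noisy_job))
    with open(log_path) as log:
        print(log.read(), end="")
```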
@Ben, just replacing sys.stderr did not work; the full flush logic suggested in the post was necessary. But thank you for the pointer, as it finally gave me a working version:
from __future__ import print_function
import sys

oldstderr = sys.stderr
sys.stderr = open('log.txt', 'w')

class flushfile():
    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        return getattr(self.f, name)

    def write(self, x):
        self.f.write(x)
        self.f.flush()

    def flush(self):
        self.f.flush()

sys.stderr = flushfile(sys.stderr)

# some long running function here, e.g.
for i in range(1000000):
    print('hello!', file=sys.stderr)

sys.stderr = oldstderr
It would have been nice if Jupyter kept the redirect_output() function and/or the %%capture magic.
I have this somewhat complicated command-line function in Python (let's call it myFunction()), and I am working to integrate it into a graphical interface (using PySide/Qt).
The GUI is used to help select inputs and display outputs. However, myFunction is designed to work as a stand-alone command-line function, and it occasionally prints out its progress.
My question is: how can I intercept these print calls and display them in the GUI?
I know it would be possible to modify myFunction() to send processEvents() to the GUI, but I would then lose the ability to execute myFunction() in a terminal.
Ideally, I would like something similar to Ubuntu's graphical software updater, which has a small embedded terminal-looking widget displaying what apt-get would display were it executed in a terminal.
You could redirect stdout and restore it afterwards. For example:
import StringIO
import sys
# somewhere to store output
out = StringIO.StringIO()
# set stdout to our StringIO instance
sys.stdout = out
# print something (nothing will print)
print 'herp derp'
# restore stdout so we can really print (__stdout__ stores the original stdout)
sys.stdout = sys.__stdout__
# print the stored value from previous print
print out.getvalue()
Wrap it with a function that hijacks stdout:
import sys

def stdin2file(func, file):
    def innerfunc(*args, **kwargs):
        old = sys.stdout
        sys.stdout = file
        try:
            return func(*args, **kwargs)
        finally:
            sys.stdout = old
    return innerfunc
Then simply provide a file like object that supports write():
class GUIWriter:
    def write(self, stuff):
        # send stuff to GUI
        pass

MyFunction = stdin2file(MyFunction, GUIWriter())
The wrapper can be turned into a decorator too:
def redirect_stdin(file):
    # The inner decorator takes only the function; file comes from the closure.
    def stdin2file(func):
        def innerfunc(*args, **kwargs):
            old = sys.stdout
            sys.stdout = file
            try:
                return func(*args, **kwargs)
            finally:
                sys.stdout = old
        return innerfunc
    return stdin2file
Then use it when declaring MyFunction():
@redirect_stdin(GUIWriter())
def MyFunction(a, b, c, d):
    # any calls to print will call the 'write' method of the GUIWriter
    # do stuff
    pass
Here is a Python 3 pattern using contextmanager that both encapsulates the monkey-patch technique and also ensures that sys.stdout is restored in case of an exception.
from io import StringIO
import sys
from contextlib import contextmanager

@contextmanager
def capture_stdout():
    """
    context manager encapsulating a pattern for capturing stdout writes
    and restoring sys.stdout even upon exceptions

    Examples:
    >>> with capture_stdout() as get_value:
    >>>     print("here is a print")
    >>>     captured = get_value()
    >>> print('Gotcha: ' + captured)

    >>> with capture_stdout() as get_value:
    >>>     print("here is a print")
    >>>     raise Exception('oh no!')
    >>> print('Does printing still work?')
    """
    # Redirect sys.stdout, remembering whatever was installed before
    old = sys.stdout
    out = StringIO()
    sys.stdout = out
    # Yield a method clients can use to obtain the value
    try:
        yield out.getvalue
    finally:
        # Restore the previous stdout
        sys.stdout = old
All printing is done via sys.stdout, which is an ordinary file-like object: IIRC, it only requires a write(str) method. As long as your replacement has that method, it's quite easy to drop in your hook:
import sys

class CaptureOutput:
    def write(self, message):
        log_message_to_textbox(message)

sys.stdout = CaptureOutput()
The actual contents of log_message_to_textbox are up to you.
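One possible shape for that hook, as a sketch only: when the function runs in a worker thread so the GUI stays responsive, the writer can hand each message to a thread-safe queue, and the GUI thread can empty the queue periodically (for example from a timer) and append the text to its widget. QueueWriter, drain and my_function are hypothetical names, and no GUI toolkit is involved in the example. Keep in mind that contextlib.redirect_stdout swaps sys.stdout for the whole process, so prints from other threads are captured too while the job runs.

```python
import contextlib
import queue
import threading


class QueueWriter:
    """File-like object that hands every write to a thread-safe queue."""

    def __init__(self, messages):
        self.messages = messages

    def write(self, text):
        if text:
            self.messages.put(text)
        return len(text)

    def flush(self):
        pass  # nothing is buffered here


def drain(messages):
    """Return everything queued so far without blocking."""
    chunks = []
    while True:
        try:
            chunks.append(messages.get_nowait())
        except queue.Empty:
            return "".join(chunks)


def my_function():
    """Stand-in for the command-line function that prints its progress."""
    print("working...")
    print("done")


if __name__ == "__main__":
    messages = queue.Queue()

    def job():
        with contextlib.redirect_stdout(QueueWriter(messages)):
            my_function()

    worker = threading.Thread(target=job)
    worker.start()
    worker.join()
    # A GUI would call drain() from its own thread and append to a widget.
    print(drain(messages), end="")
```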
As part of trying to test a legacy function's 'print to stdout' side-effect, I want to capture stdout for later replay. I use mock.
goals (fulfill as many as possible!)
stdout still prints where it normally would, but there is an additional recorder
ideally, this should be 'patched' or only occur in a context
My implementation (below) has patching that seems a bit heavy / gross. Is there a saner way to do it? cStringIO? Any better parts of mock I can use, rather than my __getattr__ hack?
class StreamCapturing(object):
    def __init__(self, stream):
        self.captured = []
        self.stream = stream

    def __getattr__(self, attr):
        return getattr(self.stream, attr)

    def write(self, data):
        self.captured.append(data)
        self.stream.write(data)

import sys
import mock

with mock.patch('sys.stdout', StreamCapturing(sys.stdout)) as ctx:
    sys.stdout.write('a\n')
    print 'stdout'
    sys.__stdout__.write("the real one\n")
    print sys.stdout.captured

sys.stdout.flush()
# the patch is undone here, so the real stdout has no 'captured' attribute
assert getattr(sys.stdout, 'captured', None) is None
You don't even need to save the previous stdout; Python does it for you (as sys.__stdout__). And yes, use cStringIO:
import sys
from cStringIO import StringIO
sys.stdout = captured = StringIO()
print "test string"
# test stuff
captured = captured.getvalue()
sys.stdout = sys.__stdout__
print "captured",captured
You do not need mock in this situation:
saved_stdout = sys.stdout
sys.stdout = StreamCapturing(saved_stdout)
print "stdout"
captured = "".join(sys.stdout.captured)
sys.stdout=saved_stdout
print "captured: ", captured
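On Python 3 both goals from the question (output still prints, and the recording only happens inside a context) can be sketched without mock by combining the StreamCapturing class with contextlib.redirect_stdout. The helper name tee_stdout is hypothetical, and the class is repeated so the block is self-contained.

```python
import contextlib
import sys


class StreamCapturing:
    """Record every write while still passing it on to the wrapped stream."""

    def __init__(self, stream):
        self.captured = []
        self.stream = stream

    def __getattr__(self, attr):
        # Anything not defined here (flush, encoding, ...) comes from the stream.
        return getattr(self.stream, attr)

    def write(self, data):
        self.captured.append(data)
        return self.stream.write(data)


@contextlib.contextmanager
def tee_stdout():
    """Tee sys.stdout into a recorder for the duration of the with block."""
    recorder = StreamCapturing(sys.stdout)
    with contextlib.redirect_stdout(recorder):
        yield recorder


if __name__ == "__main__":
    with tee_stdout() as recorder:
        print("stdout")
    print("captured:", "".join(recorder.captured), end="")
```

redirect_stdout restores the previous sys.stdout on exit, including when the block raises, so no manual save and restore is needed.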