How to capture python subprocess stdout in unittest - python

I am trying to write a unit test that executes a function that writes to stdout, capture that output, and check the result. The function in question is a black box: we can't change how it writes its output. For the purposes of this example I've simplified it quite a bit, but essentially the function generates its output using subprocess.call().
No matter what I try I can't capture the output: it is always written to the screen, and the test fails because it captures nothing. I experimented with both print() and os.system(); with print() I can capture stdout, but not with os.system().
It's also not specific to unittest: I've written the same example without it and got the same results.
Questions similar to this have been asked a lot, and the answers all seem to boil down to use subprocess.Popen() and communicate(), but that would require changing the black box. I'm sure there's an answer I just haven't come across, but I'm stumped.
We are using Python-2.7.
Anyway my example code is this:
#!/usr/bin/env python
from __future__ import print_function

import sys
sys.dont_write_bytecode = True

import os
import unittest
import subprocess
from contextlib import contextmanager
from cStringIO import StringIO

# from somewhere import my_function
def my_function(arg):
    #print('my_function:', arg)
    subprocess.call(['/bin/echo', 'my_function: ', arg], shell=False)
    #os.system('echo my_function: ' + arg)

@contextmanager
def redirect_cm(new_stdout):
    old_stdout = sys.stdout
    sys.stdout = new_stdout
    try:
        yield
    finally:
        sys.stdout = old_stdout

class Test_something(unittest.TestCase):
    def test(self):
        fptr = StringIO()
        with redirect_cm(fptr):
            my_function("some_value")
        self.assertEqual("my_function: some_value\n", fptr.getvalue())

if __name__ == '__main__':
    unittest.main()

There are two issues in the above code:
The StringIO fptr is not shared between the current process and the spawned process, so even if the spawned process wrote its result into a StringIO object, the current process could not read it.
Changing sys.stdout doesn't affect the standard I/O streams of processes executed by os.popen(), os.system() or the exec*() family of functions in the os module (the same applies to subprocess.call(): the child writes directly to file descriptor 1, not to the Python-level sys.stdout object).
A simple solution is to:
use os.pipe to share the result between the two processes, and
use os.dup2 instead of reassigning sys.stdout.
A demo example is shown below:
import sys
import os
import subprocess
from contextlib import contextmanager

@contextmanager
def redirect_stdout(new_out):
    old_stdout = os.dup(1)
    try:
        os.dup2(new_out, sys.stdout.fileno())
        yield
    finally:
        os.dup2(old_stdout, 1)

def test():
    reader, writer = os.pipe()
    with redirect_stdout(writer):
        subprocess.call(['/bin/echo', 'something happened what'], shell=False)
    print os.read(reader, 1024)

test()
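For completeness, here is a sketch (Python 3, my own reconstruction rather than the answerer's code) that wires the same pipe/dup2 capture back into the unittest from the question; my_function here is just the simplified stand-in for the black box:
import os
import subprocess
import sys
import unittest
from contextlib import contextmanager

def my_function(arg):
    # Simplified stand-in for the black-box function: a child process writes to stdout.
    subprocess.call(['/bin/echo', 'my_function:', arg], shell=False)

@contextmanager
def redirect_fd_to_pipe():
    # Redirect the real stdout file descriptor into a pipe and yield the read end.
    reader, writer = os.pipe()
    sys.stdout.flush()
    saved_fd = os.dup(1)
    os.dup2(writer, 1)
    try:
        yield reader
    finally:
        sys.stdout.flush()
        os.dup2(saved_fd, 1)
        os.close(saved_fd)
        os.close(writer)

class TestSomething(unittest.TestCase):
    def test(self):
        with redirect_fd_to_pipe() as reader:
            my_function('some_value')
        captured = os.read(reader, 1024).decode()
        os.close(reader)
        self.assertEqual('my_function: some_value\n', captured)

if __name__ == '__main__':
    unittest.main()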

Related

Why does this test crash when pdb.set_trace() is called?

Simple unittest below.
If I run it (e.g., python -m unittest module_name) without 'test' as an argument, it passes. If I run it with 'test' as an argument, I get "TypeError: bad argument type for built-in operation". Why?
from io import StringIO
import sys
from unittest import TestCase

class TestSimple(TestCase):
    def test_simple(self):
        old_stdout = sys.stdout
        buf = StringIO()
        try:
            sys.stdout = buf
            print('hi')
        finally:
            import pdb
            if 'test' in sys.argv:
                pdb.set_trace()
            sys.stdout = old_stdout
contextlib.redirect_stdout version:
from contextlib import redirect_stdout
from io import StringIO
import pdb
import sys
from unittest import TestCase

class TestSimple(TestCase):
    def test_simple(self):
        buf = StringIO()
        with redirect_stdout(buf):
            print('hi')
            pdb.set_trace()
            print('finis')
Thanks in advance.
Edit:
The original program was tested in Python 3.4 in both Debian and Windows 7.
Something similar (using environment flags instead of a command line argument) appears to hang in Python 2, but pressing c allows it to finish, so I'm guessing it might just be that pdb's UI has been redirected. But the Python 3 version has the behavior initially described (crashes), although a colleague tested on 3.4 on Mac OS and saw the "hang" behavior.
You need to give pdb the original stdout:
pdb.Pdb(stdout=sys.__stdout__).set_trace()
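Applied to the redirect_stdout version above, the only change needed is the Pdb constructor call, roughly:
from contextlib import redirect_stdout
from io import StringIO
import pdb
import sys
from unittest import TestCase

class TestSimple(TestCase):
    def test_simple(self):
        buf = StringIO()
        with redirect_stdout(buf):
            print('hi')
            # Give pdb the real stdout so its prompt is not swallowed by buf.
            pdb.Pdb(stdout=sys.__stdout__).set_trace()
            print('finis')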

How can I suppress stdout logging output for a module I'm importing?

I'm importing a module foo which uses Python's logging module. However, foo produces a huge amount of logging output, and I need to use stdout to communicate important information to the user, which is largely being drowned out by the ridiculous output of the module I'm importing.
How can I disable the module's ability to log to stdout without modifying foo's code? I still want it to log to the files it logs to, but I don't want it logging to stdout.
I have tried the following:
logging.getLogger("foo").propagate = False
and
import contextlib
import sys

@contextlib.contextmanager
def nostdout():
    class DummyFile(object):
        def write(self, x): pass
    save_stdout = sys.stdout
    sys.stdout = DummyFile()
    yield
    sys.stdout = save_stdout

with nostdout(): import foo
Try the following:
logging.getLogger(<logger_name_used_in_foo>).propagate = False
I'm referencing this article.
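If the logger still writes to the console (for example because foo attaches its own StreamHandler), one alternative, sketched roughly here, is to strip console handlers from foo's logger while leaving its file handlers in place; "foo" is assumed to be the logger name used inside the module:
import logging

foo_logger = logging.getLogger('foo')  # assumed logger name used inside foo
for handler in list(foo_logger.handlers):
    # FileHandler subclasses StreamHandler, so exclude it explicitly
    # to keep logging to files intact.
    if isinstance(handler, logging.StreamHandler) and not isinstance(handler, logging.FileHandler):
        foo_logger.removeHandler(handler)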
In general, if you want to capture anything written to stdout you can use contextlib's redirect_stdout in Python 3:
import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    print('foobar')
    call_annoying_module()
print('Stdout: "{0}"'.format(f.getvalue()))
On Python versions before 3.4 (where contextlib.redirect_stdout is not available), it can be implemented like this:
import sys
from contextlib import contextmanager

@contextmanager
def stdout_redirector(stream):
    old_stdout = sys.stdout
    sys.stdout = stream
    try:
        yield
    finally:
        sys.stdout = old_stdout
If the library has any C bindings that print using puts then it gets more complicated. See the article.
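A rough sketch of that file-descriptor-level approach (my own sketch, not the article's exact code): redirect the process-level fd 1 instead of sys.stdout, because C code writes to the file descriptor directly.
import os
import sys
import tempfile

def capture_fd_stdout(func):
    # Run func() while file descriptor 1 points at a temp file, so output from
    # C extensions (puts/printf) is captured too. Sketch only: not thread-safe,
    # and C-level stdio buffers are not explicitly flushed here.
    sys.stdout.flush()
    saved_fd = os.dup(1)
    with tempfile.TemporaryFile(mode='w+b') as tmp:
        os.dup2(tmp.fileno(), 1)
        try:
            func()
        finally:
            sys.stdout.flush()
            os.dup2(saved_fd, 1)
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read().decode()
Used as, for example, captured = capture_fd_stdout(call_annoying_module).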
The easiest case is when you're running another program yourself using subprocess; then all of its stdout output can be captured directly:
import subprocess

proc = subprocess.Popen("echo output".split(), stdout=subprocess.PIPE)
std_output, err_output = proc.communicate()

Is there a way to redirect stderr to file in Jupyter?

There was a redirect_output function in IPython.utils, and there was a %%capture magic function, but these are now gone, and this thread on the topic is now outdated.
I'd like to do something like the following:
from __future__ import print_function
import sys
from IPython.utils import io

with io.redirect_output(stdout=False, stderr="stderr_test.txt"):
    while True:
        print('hello!', file=sys.stderr)
Thoughts? For more context, I am trying to capture the output of some ML functions that run for hours or days, and output a line every 5-10 seconds to stderr. I then want to take the output, munge it, and plot the data.
You could probably try replacing sys.stderr with another file-like object, the same way as suggested here.
import sys
oldstderr = sys.stderr
sys.stderr = open('log.txt', 'w')
# do something
sys.stderr = oldstderr
Update: starting from Python 3.4, you should consider using contextlib.redirect_stdout() instead, like this:
import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    print('a')
s = f.getvalue()
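Since the question is specifically about stderr: Python 3.5 also added contextlib.redirect_stderr, which works the same way and can point straight at a file, for example:
import sys
from contextlib import redirect_stderr

# Send everything written to sys.stderr inside the block to a file (Python 3.5+).
with open('stderr_test.txt', 'w') as f, redirect_stderr(f):
    print('hello!', file=sys.stderr)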
@Ben, just replacing sys.stderr did not work, and the full flush logic suggested in the post was necessary. But thank you for the pointer, as it finally gave me a working version:
from __future__ import print_function
import sys

oldstderr = sys.stderr
sys.stderr = open('log.txt', 'w')

class flushfile():
    def __init__(self, f):
        self.f = f
    def __getattr__(self, name):
        return object.__getattribute__(self.f, name)
    def write(self, x):
        self.f.write(x)
        self.f.flush()
    def flush(self):
        self.f.flush()

sys.stderr = flushfile(sys.stderr)

# some long running function here, e.g.
for i in range(1000000):
    print('hello!', file=sys.stderr)

sys.stderr = oldstderr
It would have been nice if Jupyter kept the redirect_output() function and/or the %%capture magic.

How to capture print output of another module?

I was wondering if this is possible in python:
# module1
def test():
    print('hey')

# module2
import module1
module1.test()  # prints to stdout
Without modifying module1 is there any way to wrap this in module2 so that I can capture the
print('hey') inside a variable? Apart from running module1 as a script?
I don't want to be responsible for modifying sys.stdout and then restoring it to its previous value. The above answers don't have any finally: clause, which can be dangerous when integrating this into other important code.
https://docs.python.org/3/library/contextlib.html
import contextlib, io

f = io.StringIO()
with contextlib.redirect_stdout(f):
    module1.test()
output = f.getvalue()
You probably want the variable output which is <class 'str'> with the redirected stdout.
Note: this code is lifted from the official docs with trivial modifications (but tested). Another version of this answer was already given to a mostly duplicated question here: https://stackoverflow.com/a/22434594/1092940
I leave the answer here because it is a much better solution than the others here IMO.
Yes, all you need is to redirect stdout to a memory buffer that complies with the interface of stdout; you can do it with StringIO. This works for me in 2.7:
import sys
import cStringIO
stdout_ = sys.stdout #Keep track of the previous value.
stream = cStringIO.StringIO()
sys.stdout = stream
print "hello" # Here you can do whatever you want, import module1, call test
sys.stdout = stdout_ # restore the previous stdout.
variable = stream.getvalue() # This will get the "hello" string inside the variable
Yes, you can. You need to take control of sys.stdout. Something like this:
import sys
stdout_ = sys.stdout #Keep track of the previous value.
sys.stdout = open('myoutputfile.txt', 'w') # Something here that provides a write method.
# calls to print, ie import module1
sys.stdout = stdout_ # restore the previous stdout.
For Python 3:
# redirect sys.stdout to a buffer
import sys, io
stdout = sys.stdout
sys.stdout = io.StringIO()
# call module that calls print()
import module1
module1.test()
# get output and restore sys.stdout
output = sys.stdout.getvalue()
sys.stdout = stdout
print(output)
There's no need to use another module, just a class object with a write attribute that takes one input, which you can save in another variable. For example:
CLASS:
class ExClass:
    def __init__(self):
        self.st = ''
    def write(self, o):  # here o is the output that goes to stdout
        self.st += str(o)
MAIN Program:
import sys

stdout_ = sys.stdout
var = ExClass()
sys.stdout = var
print("Hello")   # these will not be printed
print("Hello2")  # instead they will be written to var.st
sys.stdout = stdout_
print(var.st)
output will be
Hello
Hello2
Sending ftplib debug output to the logging module
Based on the approach taken in jimmy kumar ahalpara's answer, I was able to capture ftplib's debug output into logging. ftplib was around before the logging module and uses print to emit debug messages.
I'd tried reassigning the print function to a logging method but I couldn't get that to work. The code below works for me.
I should think this will work with other modules as well, but there would not be any granularity between different modules' output, as it captures everything sent to stdout into the same logger.
# convenience class to redirect stdout to logging
class SendToLog:
    def __init__(self, logging_method):
        self.logger = logging_method
    def write(self, o):
        if str(o).strip():  # ignore empty lines
            self.logger(str(o))

import logging
import sys

# code to initialise logging output and handlers ...
# ...

# get logger for ftplib and redirect its print output to our log
ftp_logger = logging.getLogger('ftplib')
# note: logging's debug method is passed to the class, the instance then calls this method
sys.stdout = SendToLog(ftp_logger.debug)

# code to do stuff with ftplib ...
# remember to set ftplib's debug level > 0 or there will be no output
# FTP.set_debuglevel(1)
# ...

# important to finalise logging and restore stdout
logging.shutdown()
sys.stdout = sys.__stdout__

How to attach debugger to a python subprocess?

I need to debug a child process spawned by multiprocessing.Process(). The pdb debugger seems to be unaware of forking and unable to attach to already running processes.
Are there any smarter python debuggers which can be attached to a subprocess?
I've been searching for a simple solution to this problem and came up with this:
import sys
import pdb

class ForkedPdb(pdb.Pdb):
    """A Pdb subclass that may be used
    from a forked multiprocessing child
    """
    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            sys.stdin = open('/dev/stdin')
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin
Use it the same way you might use the classic Pdb:
ForkedPdb().set_trace()
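For illustration, a sketch of how that might be called from a multiprocessing child (this assumes the ForkedPdb class above is in scope and a POSIX /dev/stdin):
import multiprocessing

def worker():
    x = 21 * 2
    # Drops into the debugger on the parent's terminal by reading /dev/stdin.
    ForkedPdb().set_trace()
    print(x)

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()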
Winpdb is pretty much the definition of a smarter Python debugger. It explicitly supports going down a fork, not sure it works nicely with multiprocessing.Process() but it's worth a try.
For a list of candidates to check for support of your use case, see the list of Python Debuggers in the wiki.
This is an elaboration of Romuald's answer which restores the original stdin using its file descriptor. This keeps readline working inside the debugger. In addition, pdb's special handling of KeyboardInterrupt is disabled so that it does not interfere with multiprocessing's SIGINT handler.
import os
import pdb
import sys

class ForkablePdb(pdb.Pdb):
    _original_stdin_fd = sys.stdin.fileno()
    _original_stdin = None

    def __init__(self):
        pdb.Pdb.__init__(self, nosigint=True)

    def _cmdloop(self):
        current_stdin = sys.stdin
        try:
            if not self._original_stdin:
                self._original_stdin = os.fdopen(self._original_stdin_fd)
            sys.stdin = self._original_stdin
            self.cmdloop()
        finally:
            sys.stdin = current_stdin
Building upon @memplex's idea, I had to modify it to get it to work with joblib by setting sys.stdin in the constructor as well as passing it along directly via joblib.
import os
import pdb
import signal
import sys
import joblib

_original_stdin_fd = None

class ForkablePdb(pdb.Pdb):
    _original_stdin = None
    _original_pid = os.getpid()

    def __init__(self):
        pdb.Pdb.__init__(self)
        self.current_stdin = sys.stdin  # remember whatever stdin is right now
        if self._original_pid != os.getpid():
            if _original_stdin_fd is None:
                raise Exception("Must set ForkablePdb._original_stdin_fd to stdin fileno")
            if not self._original_stdin:
                self._original_stdin = os.fdopen(_original_stdin_fd)
            sys.stdin = self._original_stdin

    def _cmdloop(self):
        try:
            self.cmdloop()
        finally:
            sys.stdin = self.current_stdin

def handle_pdb(sig, frame):
    ForkablePdb().set_trace(frame)

def test(i, fileno):
    global _original_stdin_fd
    _original_stdin_fd = fileno
    while True:
        pass

if __name__ == '__main__':
    print "PID: %d" % os.getpid()
    signal.signal(signal.SIGUSR2, handle_pdb)
    ForkablePdb().set_trace()
    fileno = sys.stdin.fileno()
    joblib.Parallel(n_jobs=2)(joblib.delayed(test)(i, fileno) for i in range(10))
remote-pdb can be used to debug sub-processes. After installation, put the following lines in the code you need to debug:
import remote_pdb
remote_pdb.set_trace()
remote-pdb will print a port number which will accept a telnet connection for debugging that specific process. There are some caveats around worker launch order, where stdout goes when using various frontends, etc. To ensure a specific port is used (must be free and accessible to the current user), use the following instead:
from remote_pdb import RemotePdb
RemotePdb('127.0.0.1', 4444).set_trace()
remote-pdb may also be launched via the breakpoint() command in Python 3.7.
Just use PuDB, which gives you an awesome TUI (a GUI on the terminal) and supports multiprocessing as follows:
from pudb import forked; forked.set_trace()
An idea I had was to create "dummy" classes to fake the implementation of the methods you are using from multiprocessing:
from multiprocessing import Pool

class DummyPool():
    @staticmethod
    def apply_async(func, args, kwds):
        return DummyApplyResult(func(*args, **kwds))
    def close(self): pass
    def join(self): pass

class DummyApplyResult():
    def __init__(self, result):
        self.result = result
    def get(self):
        return self.result

def foo(a, b, switch):
    # set trace when DummyPool is used
    # import ipdb; ipdb.set_trace()
    if switch:
        return b - a
    else:
        return a - b

if __name__ == '__main__':
    pool = DummyPool()  # switch between Pool() and DummyPool() here
    results = []
    results.append(pool.apply_async(foo, args=(1, 100), kwds={'switch': True}))
    pool.close()
    pool.join()
    results[0].get()
Here is a version of ForkedPdb (Romuald's solution) which will work for both Windows and *nix based systems.
import sys
import pdb
import win32console

class MyHandle():
    def __init__(self):
        self.screenBuffer = win32console.GetStdHandle(win32console.STD_INPUT_HANDLE)
    def readline(self):
        return self.screenBuffer.ReadConsole(1000)

class ForkedPdb(pdb.Pdb):
    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            if sys.platform == "win32":
                sys.stdin = MyHandle()
            else:
                sys.stdin = open('/dev/stdin')
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin
The problem here is that Python always connects sys.stdin in the child process to os.devnull to avoid contention for the stream. But this means that when the debugger (or a simple input()) tries to connect to stdin to get input from the user, it immediately reaches end-of-file and reports an error.
One solution, at least if you don't expect multiple debuggers to run at the same time, is to reopen stdin in the child process. That can be done by setting sys.stdin to open(0), which always opens the active terminal. This in fact is what the ForkedPdb solution does, but it can be done more simply and in an os-independent manner like this:
import multiprocessing, sys

def main():
    process = multiprocessing.Process(target=worker)
    process.start()
    process.join()

def worker():
    # Python automatically closes sys.stdin for the subprocess, so we reopen
    # stdin. This enables pdb to connect to the terminal and accept commands.
    # See https://stackoverflow.com/a/30149635/3830997.
    sys.stdin = open(0)  # or os.fdopen(0)
    print("Hello from the subprocess.")
    breakpoint()  # or import pdb; pdb.set_trace()
    print("Exited from breakpoint in the subprocess.")

if __name__ == '__main__':
    main()
If you are on a supported platform, try DTrace. Most of the BSD / Solaris / OS X family support DTrace.
Here is an intro by the author. You can use Dtrace to debug just about anything.
Here is a SO post on learning DTrace.
