Storing stdout and stderr in Redis from Python Popen - python

I'd like to run a command and store the results in Redis as it is run.
Although the command for the demo is ls /etc, in real life I'd like to use this for long running processes.
I've written out some demo code to show the idea.
Unfortunately, when run, this code insists on a fileno and doesn't work, even if I simulate one. How can I accomplish this?
import subprocess
import redis

class RedisFile:
    def __init__(self, key):
        self.key = key
        self.redis = redis.StrictRedis()
        print("inited RedisFile with key:", key)

    def write(self, value):
        self.redis.append(self.key, value)

def main():
    out = RedisFile("out")
    error = RedisFile("error")
    proc = subprocess.Popen(["ls", "/etc"],
                            stdout=out,
                            stderr=error,
                            bufsize=0)

main()

You should replace sys.stdout with a custom stdout. One example looks like this:
from unittest.mock import patch

with patch('sys.stdout', your_stdout), patch('sys.stderr', your_stderr):
    do_something
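Note that patching sys.stdout only redirects writes made from Python code in the same process; a child started with Popen writes to real file descriptors, which is why Popen insists on a fileno. For the original goal, one common alternative (not taken from the answer above) is to let Popen write to a pipe and copy the output into Redis as it arrives. A minimal sketch, assuming a local Redis server:

import subprocess
import redis

class RedisFile:
    """Same idea as the question's class: append chunks under a Redis key."""
    def __init__(self, key):
        self.key = key
        self.redis = redis.StrictRedis()

    def write(self, value):
        self.redis.append(self.key, value)

out = RedisFile("out")
err = RedisFile("error")

# Let the child write to pipes, then copy its output into Redis line by line.
proc = subprocess.Popen(["ls", "/etc"],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        bufsize=0)

# For heavy stderr output a separate reader thread would be needed to avoid
# blocking; draining it after stdout is enough for this demo.
for line in iter(proc.stdout.readline, b""):
    out.write(line)
err.write(proc.stderr.read())
proc.wait()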


Redirect all stdout/stderr globally to logger

Background
I have a very large python application that launches command-line utilities to get pieces of data it needs. I currently just redirect the python launcher script to a log file, which gives me all of the print() output, plus the output of the command-line utilities, i.e.:
python -m launcher.py &> /root/out.log
Problem
I've since implemented a proper logger via logging, which lets me format the logging statements more precisely, lets me limit log file size, etc. I've swapped out most of my print() statements with calls to my logger. However, I have a problem: none of the output from the command-line applications appears in my log; it instead gets dumped to the console. Also, the programs aren't all launched the same way: some are launched via popen(), some by exec(), some by os.system(), etc.
Question
Is there a way to globally redirect all stdout/stderr text to my logging function, without having to rewrite or modify the code that launches these command-line tools? I tried setting the following, which I found in another question:
sys.stderr.write = lambda s: logger.error(s)
However, it fails with "sys.stderr.write is read-only".
While this is not a full answer, it may show you a redirection approach to adapt to your particular case. This is how I did it a while back, although I cannot remember why I did it this way or what limitation I was trying to circumvent. The following redirects stdout and stderr to a class for print() statements; the class subsequently writes to screen and to file:
import os
import sys
import datetime

class DebugLogger():
    def __init__(self, filename):
        timestamp = datetime.datetime.strftime(datetime.datetime.utcnow(),
                                               '%Y-%m-%d-%H-%M-%S-%f')
        # build up full path to filename
        logfile = os.path.join(os.path.dirname(sys.executable),
                               filename + timestamp)
        self.terminal = sys.stdout
        self.log = open(logfile, 'a')

    def write(self, message):
        timestamp = datetime.datetime.strftime(datetime.datetime.utcnow(),
                                               ' %Y-%m-%d-%H:%M:%S.%f')
        # write to screen
        self.terminal.write(message)
        # write to file
        self.log.write(timestamp + ' - ' + message)
        self.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()
        os.fsync(self.log.fileno())

    def close(self):
        self.log.close()

def main(debug=False):
    if debug:
        filename = 'blabla'
        sys.stdout = DebugLogger(filename)
        sys.stderr = sys.stdout

    print('test')

if __name__ == '__main__':
    main(debug=True)
import io
import logging
import sys

logger = logging.getLogger(__name__)

class MyStream(io.IOBase):
    def write(self, s):
        logger.error(s)

sys.stderr = MyStream()
print('This is an error', file=sys.stderr)
This makes every write to sys.stderr go to the logger.
The original stream is always available as sys.__stderr__.
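For the subprocess part of the question, replacing the sys.stdout/sys.stderr objects is not enough, because child processes write to the underlying file descriptors. A rough sketch of an alternative (my own illustration, not taken from the answers above) is to redirect file descriptors 1 and 2 into a pipe with os.dup2 and feed whatever comes out of it to the logger:

import logging
import os
import threading
import time

def redirect_fds_to_logger(logger):
    """Point fds 1 and 2 at a pipe and forward every line read from it to the logger."""
    # Keep a handle on the real terminal first, so the logger's own output is
    # not fed back into the pipe (which would loop forever).
    original_stderr = os.fdopen(os.dup(2), "w")
    logger.addHandler(logging.StreamHandler(original_stderr))

    read_fd, write_fd = os.pipe()
    os.dup2(write_fd, 1)  # process-wide stdout
    os.dup2(write_fd, 2)  # process-wide stderr
    os.close(write_fd)

    def pump():
        with os.fdopen(read_fd, errors="replace") as reader:
            for line in reader:
                logger.info(line.rstrip("\n"))

    threading.Thread(target=pump, daemon=True).start()

if __name__ == "__main__":
    log = logging.getLogger("capture")
    log.setLevel(logging.INFO)
    redirect_fds_to_logger(log)

    print("captured from print()", flush=True)
    os.system("echo captured from a child process")
    time.sleep(0.5)  # demo only: give the pump thread a moment to drain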

How to capture python subprocess stdout in unittest

I am trying to write a unit test that executes a function that writes to stdout, capture that output, and check the result. The function in question is a black box: we can't change how it writes its output. For the purposes of this example I've simplified it quite a bit, but essentially the function generates its output using subprocess.call().
No matter what I try I can't capture the output. It is always written to the screen, and the test fails because it captures nothing. I experimented with both print() and os.system(): with print() I can capture stdout, but with os.system() I cannot, either.
It's also not specific to unit testing; I've written my example without that and gotten the same results.
Questions similar to this have been asked a lot, and the answers all seem to boil down to using subprocess.Popen() and communicate(), but that would require changing the black box. I'm sure there's an answer I just haven't come across, but I'm stumped.
We are using Python-2.7.
Anyway my example code is this:
#!/usr/bin/env python
from __future__ import print_function

import sys
sys.dont_write_bytecode = True

import os
import unittest
import subprocess
from contextlib import contextmanager
from cStringIO import StringIO

# from somewhere import my_function
def my_function(arg):
    #print('my_function:', arg)
    subprocess.call(['/bin/echo', 'my_function: ', arg], shell=False)
    #os.system('echo my_function: ' + arg)

@contextmanager
def redirect_cm(new_stdout):
    old_stdout = sys.stdout
    sys.stdout = new_stdout
    try:
        yield
    finally:
        sys.stdout = old_stdout

class Test_something(unittest.TestCase):
    def test(self):
        fptr = StringIO()
        with redirect_cm(fptr):
            my_function("some_value")
        self.assertEqual("my_function: some_value\n", fptr.getvalue())

if __name__ == '__main__':
    unittest.main()
There are two issues in the above code:
The StringIO object fptr is not shared between the current process and the spawned process, so even if the spawned process wrote its result to a StringIO object, the current process could not see it.
Changing sys.stdout doesn't affect the standard I/O streams of processes executed by os.popen(), os.system() or the exec*() family of functions in the os module.
A simple solution is to:
use os.pipe to share the result between the two processes
use os.dup2 instead of changing sys.stdout
A demo example is shown below:
import sys
import os
import subprocess
from contextlib import contextmanager

@contextmanager
def redirect_stdout(new_out):
    old_stdout = os.dup(1)
    try:
        os.dup2(new_out, sys.stdout.fileno())
        yield
    finally:
        os.dup2(old_stdout, 1)

def test():
    reader, writer = os.pipe()
    with redirect_stdout(writer):
        subprocess.call(['/bin/echo', 'something happened what'], shell=False)
    print(os.read(reader, 1024))

test()

Use python's pty to create a live console

I'm trying to create an execution environment/shell that will execute remotely on a server and stream stdout/stderr/stdin over the socket to be rendered in a browser. I have currently tried the approach of using subprocess.run with a PIPE. The problem is that I only get the stdout after the process has completed. What I want to achieve is a line-by-line, pseudo-terminal sort of implementation.
My current implementation
test.py
def greeter():
    for _ in range(10):
        print('hello world')

greeter()
and in the shell
>>> import subprocess
>>> result = subprocess.run(['python3', 'test.py'], stdout=subprocess.PIPE)
>>> print(result.stdout.decode('utf-8'))
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
How does one attempt even this simple implementation with pty?
If your application is going to work asynchronously with multiple tasks, like reading data from stdout and then writing it to a websocket, I suggest using asyncio.
Here is an example that runs a process and redirects its output into a websocket:
import asyncio.subprocess
import sys

from aiohttp.web import (Application, Response, WebSocketResponse, WSMsgType,
                         run_app)

async def on_websocket(request):
    # Prepare aiohttp's websocket...
    ws = WebSocketResponse()
    await ws.prepare(request)

    # ... and store it in a global list so it can be closed on shutdown
    request.app['sockets'].append(ws)

    p = await asyncio.create_subprocess_exec(sys.executable,
                                             '/tmp/test.py',
                                             stdout=asyncio.subprocess.PIPE,
                                             stderr=asyncio.subprocess.PIPE,
                                             bufsize=0)

    # Schedule reading from stdout and stderr as asynchronous tasks.
    stdout_f = asyncio.ensure_future(p.stdout.readline())
    stderr_f = asyncio.ensure_future(p.stderr.readline())

    # returncode will be set upon the process's termination.
    while p.returncode is None:
        # Wait for a line in either stdout or stderr.
        await asyncio.wait((stdout_f, stderr_f), return_when=asyncio.FIRST_COMPLETED)

        # If a task is done, then a line is available.
        if stdout_f.done():
            line = stdout_f.result().decode()
            stdout_f = asyncio.ensure_future(p.stdout.readline())
            await ws.send_str(f'stdout: {line}')
        if stderr_f.done():
            line = stderr_f.result().decode()
            stderr_f = asyncio.ensure_future(p.stderr.readline())
            await ws.send_str(f'stderr: {line}')

    return ws

async def on_shutdown(app):
    for ws in app['sockets']:
        await ws.close()

async def init():
    app = Application()
    app['sockets'] = []
    app.router.add_get('/', on_websocket)
    app.on_shutdown.append(on_shutdown)
    return app

loop = asyncio.get_event_loop()
app = loop.run_until_complete(init())
run_app(app)
It uses aiohttp and is based on the web_ws and subprocess streams examples.
I'm sure there's a dupe around somewhere but I couldn't find it quickly:
import subprocess

cmd = ['python3', 'test.py']  # the command from the question
process = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0)
for out in iter(process.stdout.readline, b""):
    print(out)
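If the child block-buffers its output when its stdout is not a terminal (common for programs that are not pure Python), the pipe above may still deliver everything in one burst at the end. In that case the pty module from the question's title can give the child a pseudo-terminal; a minimal sketch of that idea (mine, not part of the original answer), assuming Linux or macOS and the test.py from the question:

import os
import pty
import subprocess

# Give the child a pseudo-terminal so it line-buffers its output even though
# it is not attached to a real console.
master_fd, slave_fd = pty.openpty()
proc = subprocess.Popen(['python3', 'test.py'],
                        stdout=slave_fd,
                        stderr=slave_fd,
                        close_fds=True)
os.close(slave_fd)  # the parent no longer needs the slave end

try:
    while True:
        data = os.read(master_fd, 1024)  # raises OSError on Linux once the child exits
        if not data:
            break
        print(data.decode(), end='', flush=True)
except OSError:
    pass
finally:
    os.close(master_fd)
    proc.wait()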
If you are on Windows then you will be fighting an uphill battle for a very long time, and I am sorry for the pain you will endure (been there). If you are on Linux, however, you can use the pexpect module. Pexpect allows you to spawn a background child process which you can perform bidirectional communication with. This is useful for all types of system automation, but a very common use case is ssh.
import pexpect

child = pexpect.spawn('python3 test.py')
message = 'hello world'
while True:
    try:
        child.expect(message)
    except pexpect.exceptions.EOF:
        break
    input('child sent: "%s"\nHit enter to continue: ' %
          (message + child.before.decode()))
print('reached end of file!')
I have found it very useful to create a class to handle something complicated like an ssh connection, but if your use case is simple enough that might not be appropriate or necessary. The way pexpect.before is of type bytes and omits the pattern you are searching for can be awkward, so it may make sense to create a function that handles this for you at the very least.
def get_output(child, message):
    return message + child.before.decode()
If you want to send messages to the child process, you can use child.sendline(line). For more details, check out the documentation I linked.
I hope I was able to help!
I don't know if you can render this in a browser, but you can run the program as a module so you get stdout immediately, like this:
import importlib.util
from importlib.machinery import SourceFileLoader

class Program:
    def __init__(self, path, name=''):
        self.path = path
        self.name = name
        if self.path:
            if not self.name:
                self.get_name()
            self.loader = SourceFileLoader(self.name, self.path)
            self.spec = importlib.util.spec_from_loader(self.loader.name, self.loader)
            self.mod = importlib.util.module_from_spec(self.spec)

    def get_name(self):
        # change this if self.path is not a python program with the extension .py
        self.name = self.path.split('\\')[-1].rsplit('.py', 1)[0]

    def load(self):
        self.check()
        self.loader.exec_module(self.mod)

    def check(self):
        if not self.path:
            raise ValueError('self.path is not defined.')

file_path = 'C:\\Users\\RICHGang\\Documents\\projects\\stackoverflow\\ptyconsole\\test.py'
file_name = 'test'
prog = Program(file_path, file_name)
prog.load()
You can add sleep in test.py to see the difference:
from time import sleep

def greeter():
    for i in range(10):
        sleep(0.3)
        print('hello world')

greeter()
Take a look at terminado. It works on Windows and Linux; Jupyter Lab uses it.

Python SSH: Using fabric.api as alternative to paramiko

I'm trying to come up with a way to replace Paramiko as a SSH client for a number of scripts I currently have, so far I came up with this draft for testing:
ssh_handler.py
import sys

from fabric.api import run
from fabric.api import task
from fabric.api import execute
from fabric.network import disconnect_all

def worker(command):
    run(command)

@task
def cmd(host, command):
    host_list = list()

    if isinstance(host, (str, unicode)):
        host_list.append(host)
    elif isinstance(host, list):
        host_list += host
    else:
        sys.exit(1)

    # run the command
    execute(worker(command), hosts=host_list)

    # disconnect
    disconnect_all()
    return
When executing this from another script, let's say:
testing.py
import ssh_handler
node_list = ["192.168.0.1", "192.168.0.2"]
for node in node_list:
    ssh_handler.cmd(node, "uptime")
results in:
./testing.py
No hosts found. Please specify (single) host string for connection: ^C
Could anybody point me in the right direction? Why can't fabric.api.execute recognize the hosts parameter at execution time?
Answer:
I was able to fix it by doing this:
import sys

from fabric.api import run
from fabric.tasks import execute
from fabric.network import disconnect_all

class FabricHelper:
    def __init__(self):
        self.host_list = list()

    def set_hosts(self, host):
        if isinstance(host, (str, unicode)):
            self.host_list.append(host)
        elif isinstance(host, list):
            self.host_list += host
        else:
            sys.exit(1)
        return

    def cmd(self, command):
        execute(run, command=command, hosts=self.host_list)
        disconnect_all()
        return

if __name__ == "__main__":
    example = FabricHelper()
    example.set_hosts(["10.200.10.51", "10.200.10.52"])
    example.cmd("uptime")
I am a bit confused by your code: execute is generally used to execute a task, but worker isn't a task. worker is just a one-liner that calls run, so it seems to me that worker can be deleted.
At first I was wondering what test framework you were using, then I realized it's just another Python file named testing!
Lastly, instead of all these elaborate mechanisms, why not just set the hosts in the env dictionary? See the sketch below.
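A rough sketch of what that could look like with the fabric 1.x API, reusing the addresses from the question (this is my illustration, not the original answerer's code):

from fabric.api import env, execute, run

# Set the hosts once in the env dictionary instead of passing hosts= around.
env.hosts = ["192.168.0.1", "192.168.0.2"]

def uptime_task():
    run("uptime")

if __name__ == "__main__":
    # execute() falls back to env.hosts when the task has no explicit hosts.
    execute(uptime_task)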

How to attach a debugger to a Python subprocess?

I need to debug a child process spawned by multiprocessing.Process(). The pdb debugger seems to be unaware of forking and unable to attach to already-running processes.
Are there any smarter python debuggers which can be attached to a subprocess?
I've been searching for a simple solution to this problem and came up with this:
import sys
import pdb

class ForkedPdb(pdb.Pdb):
    """A Pdb subclass that may be used
    from a forked multiprocessing child
    """
    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            sys.stdin = open('/dev/stdin')
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin
Use it the same way you might use the classic Pdb:
ForkedPdb().set_trace()
Winpdb is pretty much the definition of a smarter Python debugger. It explicitly supports going down a fork, not sure it works nicely with multiprocessing.Process() but it's worth a try.
For a list of candidates to check for support of your use case, see the list of Python Debuggers in the wiki.
This is an elaboration of Romuald's answer which restores the original stdin using its file descriptor. This keeps readline working inside the debugger. In addition, pdb's special handling of KeyboardInterrupt is disabled so that it does not interfere with the multiprocessing SIGINT handler.
import os
import pdb
import sys

class ForkablePdb(pdb.Pdb):
    _original_stdin_fd = sys.stdin.fileno()
    _original_stdin = None

    def __init__(self):
        pdb.Pdb.__init__(self, nosigint=True)

    def _cmdloop(self):
        current_stdin = sys.stdin
        try:
            if not self._original_stdin:
                self._original_stdin = os.fdopen(self._original_stdin_fd)
            sys.stdin = self._original_stdin
            self.cmdloop()
        finally:
            sys.stdin = current_stdin
Building upon @memplex's idea, I had to modify it to get it to work with joblib by setting sys.stdin in the constructor as well as passing it along directly via joblib.
import os
import pdb
import signal
import sys
import joblib

_original_stdin_fd = None

class ForkablePdb(pdb.Pdb):
    _original_stdin = None
    _original_pid = os.getpid()

    def __init__(self):
        pdb.Pdb.__init__(self)
        if self._original_pid != os.getpid():
            if _original_stdin_fd is None:
                raise Exception("Must set ForkablePdb._original_stdin_fd to stdin fileno")
            self.current_stdin = sys.stdin
            if not self._original_stdin:
                self._original_stdin = os.fdopen(_original_stdin_fd)
            sys.stdin = self._original_stdin

    def _cmdloop(self):
        try:
            self.cmdloop()
        finally:
            sys.stdin = self.current_stdin

def handle_pdb(sig, frame):
    ForkablePdb().set_trace(frame)

def test(i, fileno):
    global _original_stdin_fd
    _original_stdin_fd = fileno
    while True:
        pass

if __name__ == '__main__':
    print("PID: %d" % os.getpid())
    signal.signal(signal.SIGUSR2, handle_pdb)
    ForkablePdb().set_trace()
    fileno = sys.stdin.fileno()
    joblib.Parallel(n_jobs=2)(joblib.delayed(test)(i, fileno) for i in range(10))
remote-pdb can be used to debug sub-processes. After installation, put the following lines in the code you need to debug:
import remote_pdb
remote_pdb.set_trace()
remote-pdb will print a port number which will accept a telnet connection for debugging that specific process. There are some caveats around worker launch order, where stdout goes when using various frontends, etc. To ensure a specific port is used (must be free and accessible to the current user), use the following instead:
from remote_pdb import RemotePdb
RemotePdb('127.0.0.1', 4444).set_trace()
remote-pdb may also be launched via the breakpoint() command in Python 3.7.
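For example, since Python 3.7's breakpoint() dispatches through the PYTHONBREAKPOINT environment variable (PEP 553), pointing it at remote_pdb.set_trace, the function imported above, is one way to wire this up. A small sketch of the idea, assuming the default REMOTE_PDB settings:

# Sketch only: PYTHONBREAKPOINT is consulted on each breakpoint() call, so
# setting it before the call routes breakpoint() to remote-pdb, e.g.
#   $ PYTHONBREAKPOINT=remote_pdb.set_trace python worker.py
import os

os.environ.setdefault("PYTHONBREAKPOINT", "remote_pdb.set_trace")

def worker():
    # With the variable set, this calls remote_pdb.set_trace() and prints the
    # telnet port to connect to, instead of starting the regular pdb.
    breakpoint()

if __name__ == "__main__":
    worker()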
Just use PuDB, which gives you an awesome TUI (a GUI in the terminal) and supports multiprocessing as follows:
from pudb import forked; forked.set_trace()
An idea I had was to create "dummy" classes to fake the implementation of the methods you are using from multiprocessing:
from multiprocessing import Pool

class DummyPool():
    @staticmethod
    def apply_async(func, args, kwds):
        return DummyApplyResult(func(*args, **kwds))

    def close(self): pass
    def join(self): pass

class DummyApplyResult():
    def __init__(self, result):
        self.result = result

    def get(self):
        return self.result

def foo(a, b, switch):
    # set trace when DummyPool is used
    # import ipdb; ipdb.set_trace()
    if switch:
        return b - a
    else:
        return a - b

if __name__ == '__main__':
    pool = DummyPool()  # switch between Pool() and DummyPool() here
    results = []
    results.append(pool.apply_async(foo, args=(1, 100), kwds={'switch': True}))
    pool.close()
    pool.join()
    results[0].get()
Here is a version of ForkedPdb (Romuald's solution) which will work on both Windows and *nix-based systems.
import sys
import pdb

if sys.platform == "win32":
    import win32console

class MyHandle():
    def __init__(self):
        self.screenBuffer = win32console.GetStdHandle(win32console.STD_INPUT_HANDLE)

    def readline(self):
        return self.screenBuffer.ReadConsole(1000)

class ForkedPdb(pdb.Pdb):
    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            if sys.platform == "win32":
                sys.stdin = MyHandle()
            else:
                sys.stdin = open('/dev/stdin')
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin
The problem here is that Python always connects sys.stdin in the child process to os.devnull to avoid contention for the stream. But this means that when the debugger (or a simple input()) tries to connect to stdin to get input from the user, it immediately reaches end-of-file and reports an error.
One solution, at least if you don't expect multiple debuggers to run at the same time, is to reopen stdin in the child process. That can be done by setting sys.stdin to open(0), which always opens the active terminal. This in fact is what the ForkedPdb solution does, but it can be done more simply and in an os-independent manner like this:
import multiprocessing, sys

def main():
    process = multiprocessing.Process(target=worker)
    process.start()
    process.join()

def worker():
    # Python automatically closes sys.stdin for the subprocess, so we reopen
    # stdin. This enables pdb to connect to the terminal and accept commands.
    # See https://stackoverflow.com/a/30149635/3830997.
    sys.stdin = open(0)  # or os.fdopen(0)
    print("Hello from the subprocess.")
    breakpoint()  # or import pdb; pdb.set_trace()
    print("Exited from breakpoint in the subprocess.")

if __name__ == '__main__':
    main()
If you are on a supported platform, try DTrace. Most of the BSD / Solaris / OS X family support DTrace.
Here is an intro by the author. You can use DTrace to debug just about anything.
Here is a SO post on learning DTrace.
