Parsing pexpect output - python

I'm trying to parse in real time the output of a program block-buffered, which means that output is not available until the process ends. What I need is just to parse line by line, filter and manage data from the output, as it could run for hours.
I've tried to capture the output with subprocess.Popen(), but yes, as you may guess, Popen can't manage this kind of behavior, it keeps buffering until end of process.
from subprocess import Popen, PIPE
p = Popen("my noisy stuff ", shell=True, stdout=PIPE, stderr=PIPE)
for line in p.stdout.readlines():
#parsing text and getting data
So I found pexpect, which prints the output in real time, as it treats the stdout as a file, or I could even do a dirty trick printing out a file and parsing it outside the function. But ok, it is too dirty, even for me ;)
import pexpect
import sys
pexpect.run("my noisy stuff", logfile=sys.stdout)
But I guess it should a better pythonic way to do this, just manage the stdout like subprocess. Popen does. How can I do this?
EDIT:
Running J.F. proposal:
This is a deliberately wrong audit, it takes about 25 secs. to stop.
from subprocess import Popen, PIPE
command = "bully mon0 -e ESSID -c 8 -b aa:bb:cc:dd:ee:00 -v 2"
p = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)
for line in iter(p.stdout.readline, b''):
print "inside loop"
print line
print "outside loop"
p.stdout.close()
p.wait()
#$ sudo python SCRIPT.py
### <= 25 secs later......
# inside loop
#[!] Bully v1.0-21 - WPS vulnerability assessment utility
#inside loop
#[!] Using 'ee:cc:bb:aa:bb:ee' for the source MAC address
#inside loop
#[X] Unable to get a beacon from the AP, possible causes are
#inside loop
#[.] an invalid --bssid or -essid was provided,
#inside loop
#[.] the access point isn't on channel '8',
#inside loop
#[.] you aren't close enough to the access point.
#outside loop
Using this method instead:
EDIT: Due to large delays and timeouts in the output, I had to fix the child, and added some hacks, so final code looks like this
import pexpect
child = pexpect.spawn(command)
child.maxsize = 1 #Turns off buffering
child.timeout = 50 # default is 30, insufficient for me. Crashes were due to this param.
for line in child:
print line,
child.close()
Gives back the same output, but it prints lines in real time. So... SOLVED Thanks #J.F. Sebastian

.readlines() reads all lines. No wonder you don't see any output until the subprocess ends. You could use .readline() instead to read line by line as soon as the subprocess flushes its stdout buffer:
from subprocess import Popen, PIPE
p = Popen("my noisy stuff", stdout=PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
# process line
..
p.stdout.close()
p.wait()
If you are already have pexpect then you could use it to workaround the block-buffering issue:
import pexpect
child = pexpect.spawn("my noisy stuff", timeout=None)
for line in child:
# process line
..
child.close()
See also stdbuf, pty -based solutions from the question I've linked in the comments.

Related

Check on the stdout of a running subprocess in python

If need to periodically check the stdout of a running process. For example, the process is tail -f /tmp/file, which is spawned in the python script. Then every x seconds, the stdout of that subprocess is written to a string and further processed. The subprocess is eventually stopped by the script.
To parse the stdout of a subprocess, if used check_output until now, which doesn't seem to work, as the process is still running and doesn't produce a definite output.
>>> from subprocess import check_output
>>> out = check_output(["tail", "-f", "/tmp/file"])
#(waiting for tail to finish)
It should be possible to use threads for the subprocesses, so that the output of multiple subprocesses may be processed (e.g. tail -f /tmp/file1, tail -f /tmp/file2).
How can I start a subprocess, periodically check and process its stdout and eventually stop the subprocess in a multithreading friendly way? The python script runs on a Linux system.
The goal is not to continuously read a file, the tail command is an example, as it behaves exactly like the actual command used.
edit: I didn't think this through, the file did not exist. check_output now simply waits for the process to finish.
edit2: An alternative method, with Popen and PIPE appears to result in the same issue. It waits for tail to finish.
>>> from subprocess import Popen, PIPE, STDOUT
>>> cmd = 'tail -f /tmp/file'
>>> p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
>>> output = p.stdout.read()
#(waiting for tail to finish)
Your second attempt is 90% correct. The only issue is that you are attempting to read all of tail's stdout at the same time once it's finished. However, tail is intended to run (indefinitely?) in the background, so you really want to read stdout from it line-by-line:
from subprocess import Popen, PIPE, STDOUT
p = Popen(["tail", "-f", "/tmp/file"], stdin=PIPE, stdout=PIPE, stderr=STDOUT)
for line in p.stdout:
print(line)
I have removed the shell=True and close_fds=True arguments. The first is unnecessary and potentially dangerous, while the second is just the default.
Remember that file objects are iterable over their lines in Python. The for loop will run until tail dies, but it will process each line as it appears, as opposed to read, which will block until tail dies.
If I create an empty file in /tmp/file, start this program and begin echoing lines into the file using another shell, the program will echo those lines. You should probably replace print with something a bit more useful.
Here is an example of commands I typed after starting the code above:
Command line
$ echo a > /tmp/file
$ echo b > /tmp/file
$ echo c >> /tmp/file
Program Output (From Python in a different shell)
b'a\n'
b'tail: /tmp/file: file truncated\n'
b'b\n'
b'c\n'
In the case that you want your main program be responsive while you respond to the output of tail, start the loop in a separate thread. You should make this thread a daemon so that it does not prevent your program from exiting even if tail is not finished. You can have the thread open the sub-process or you can just pass in the standard output to it. I prefer the latter approach since it gives you more control in the main thread:
def deal_with_stdout():
for line in p.stdout:
print(line)
from subprocess import Popen, PIPE, STDOUT
from threading import Thread
p = Popen(["tail", "-f", "/tmp/file"], stdin=PIPE, stdout=PIPE, stderr=STDOUT)
t = Thread(target=deal_with_stdout, daemon=True)
t.start()
t.join()
The code here is nearly identical, with the addition of a new thread. I added a join() at the end so the program would behave well as an example (join waits for the thread to die before returning). You probably want to replace that with whatever processing code you would normally be running.
If your thread is complex enough, you may also want to inherit from Thread and override the run method instead of passing in a simple target.

Read stdout from subprocess until there is nothing left

I would like to run several commands in the same shell. After some research I found that I could keep a shell open using the return process from Popen. I can then write and read to stdin and stdout. I tried implementing it as such:
process = Popen(['/bin/sh'], stdin=PIPE, stdout=PIPE)
process.stdin.write('ls -al\n')
out = ' '
while not out == '':
out = process.stdout.readline().rstrip('\n')
print out
Not only is my solution ugly, it doesn't work. out is never empty because it hands on the readline(). How can I successfully end the while loop when there is nothing left to read?
Use iter to read data in real time:
for line in iter(process.stdout.readline,""):
print line
If you just want to write to stdin and get the output you can use communicate to make the process end:
process = Popen(['/bin/sh'], stdin=PIPE, stdout=PIPE)
out,err =process.communicate('ls -al\n')
Or simply get the output use check_output:
from subprocess import check_output
out = check_output(["ls", "-al"])
The command you're running in a subprocess is sh, so the output you're reading is sh's output. Since you didn't indicate to the shell it should quit, it is still alive, thus its stdout is still open.
You can perhaps write exit to its stdin to make it quit, but be aware that in any case, you get to read things you don't need from its stdout, e.g. the prompt.
Bottom line, this approach is flawed to start with...

Trouble printing text live with Python subprocess.call

Off the bat, here is what I am importing:
import os, shutil
from subprocess import call, PIPE, STDOUT
I have a line of code that calls bjam to compile a library:
call(['./bjam',
'-j8',
'--prefix="' + tools_dir + '"'],
stdout=PIPE)
I want it to print out text as the compilation occurs. Instead, it prints everything out at the end.
It does not print anything when I run it like this. I have tried running the command outside of Python and determined that all of the output is to stdout (when I did ./bjam -j8 > /dev/null I got no output, and when I ran ./bjam -j8 2> /dev/null I got output).
What am I doing wrong here? I want to print the output from call live.
As a sidenote, I also noticed something when I was outputting the results of a git clone operation:
call(['git',
'clone', 'https://github.com/moses-smt/mosesdecoder.git'],
stdout=PIPE)
prints the stdout text live as the call process is run.
call(['git',
'clone', 'https://github.com/moses-smt/mosesdecoder.git'],
stdout=PIPE, stderr=STDOUT)
does not print out any text. What is going on here?
stdout=PIPE redirects subprocess' stdout to a pipe. Don't do it unless you want to read from the subprocesses stdout in your code using proc.communicate() method or using proc.stdout attribute directly.
If you remove it then subprocess should print to stdout like it does in the shell:
from subprocess import check_call
check_call(['./bjam', '-j8', '--prefix', tools_dir])
I've used check_call() to raise an exception if the child process fails.
See Python: read streaming input from subprocess.communicate() if you want to read subprocess' output line by line (making the line available as a variable in Python) as soon as it is avaiable.
Try:
def run(command):
proc = subprocess.Popen(command, stdout=subprocess.PIPE)
for lineno, line in enumerate(proc.stdout):
try:
print(line.decode('utf-8').replace('\n', ''))
except UnicodeDecodeError:
print('error(%d): cannot decode %s' % (lineno, line))
The try...except logic is for python 3 (maybe 3.2/3.3, I'm not sure), as there line is a byte array not a string. For earlier versions of python, you should be able to do:
def run(command):
proc = subprocess.Popen(command, stdout=subprocess.PIPE)
for line in proc.stdout:
print(line.replace('\n', ''))
Now, you can do:
run(['./bjam', '-j8', '--prefix="' + tools_dir + '"'])
call will not print anything it captures. As documentation says "Do not use stdout=PIPE or stderr=PIPE with this function. As the pipes are not being read in the current process, the child process may block if it generates enough output to a pipe to fill up the OS pipe buffer."
Consider using check_output and print its return value.
In the first case with git call you are not capturing stderr and therefor it normally flows onto your terminal.

Handling tcpdump output in python

Im trying to handle tcpdump output in python.
What I need is to run tcpdump (which captures the packets and gives me information) and read the output and process it.
The problem is that tcpdump keeps running forever and I need to read the packet info as soon as it outputs and continue doing it.
I tried looking into subprocess of python and tried calling tcpdump using popen and piping the stdout but it doesnt seem to work.
Any directions on how to proceed with this.
import subprocess
def redirect():
tcpdump = subprocess.Popen("sudo tcpdump...", stdin=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
while True:
s = tcpdump.stdout.readline()
# do domething with s
redirect()
You can make tcpdump line-buffered with "-l". Then you can use subprocess to capture the output as it comes out.
import subprocess as sub
p = sub.Popen(('sudo', 'tcpdump', '-l'), stdout=sub.PIPE)
for row in iter(p.stdout.readline, b''):
print row.rstrip() # process here
By default, pipes are block buffered and interactive output is line buffered. It sounds like you need a line buffered pipe - coming from tcpdump in a subprocess.
In the old days, we'd recommend Dan Bernstein's "pty" program for this kind of thing. Today, it appears that pty hasn't been updated in a long time, but there's a new program called "emtpy" which is more or less the same idea:
http://empty.sourceforge.net/
You might try running tcpdump under empty in your subprocess to make tcpdump line buffered even though it's writing to a pipe.

Popen does not give output immediately when available

I am trying to read from both stdout and stderr from a Popen and print them out. The command I am running with Popen is the following
#!/bin/bash
i=10
while (( i > 0 )); do
sleep 1s
echo heyo-$i
i="$((i-1))"
done
echo 'to error' >&2
When I run this in the shell, I get one line of output and then a second break and then one line again, etc. However, I am unable to recreate this using python. I am starting two threads, one each to read from stdout and stderr, put the lines read into a Queue and another thread that takes items from this queue and prints them out. But with this, I see that all the output gets printed out at once, after the subprocess ends. I want the lines to be printed as and when they are echo'ed.
Here's my python code:
# The `randoms` is in the $PATH
proc = sp.Popen(['randoms'], stdout=sp.PIPE, stderr=sp.PIPE, bufsize=0)
q = Queue()
def stream_watcher(stream, name=None):
"""Take lines from the stream and put them in the q"""
for line in stream:
q.put((name, line))
if not stream.closed:
stream.close()
Thread(target=stream_watcher, args=(proc.stdout, 'out')).start()
Thread(target=stream_watcher, args=(proc.stderr, 'err')).start()
def displayer():
"""Take lines from the q and add them to the display"""
while True:
try:
name, line = q.get(True, 1)
except Empty:
if proc.poll() is not None:
break
else:
# Print line with the trailing newline character
print(name.upper(), '->', line[:-1])
q.task_done()
print('-*- FINISHED -*-')
Thread(target=displayer).start()
Any ideas? What am I missing here?
Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is to a terminal. When the output is to a pipe, it is pipe-buffered, not line-buffered, and so the data is not transferred to the kernel and thence to the other end of the pipe until the pipe buffer fills.
Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.
However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because now the whole line is flushed when the newly launched /bin/echo process terminates each time. This is hardly an efficient use of system resources, but may be an efficient use of your own.
IIRC, setting shell=True on Popen should do it.

Categories

Resources