Using Python Popen to read the last line

I have a simple python program:
test.py:
import time

for i in range(100000):
    print i
    time.sleep(0.5)
I want another program to execute the one above and read the last line of its output while the above program is still counting:
import subprocess
import time

process = subprocess.Popen(["python", "test.py"], stdout=subprocess.PIPE)
time.sleep(20)  # sleeps an arbitrary time
print process.stdout.readlines()[-1]
The problem is that process.stdout.readlines() waits until test.py finishes execution.
Is there any way to read the last line that has been written to the output while the program is still executing?

You could use collections.deque to save only the last specified number of lines:
#!/usr/bin/env python
import collections
import subprocess
import time
import threading

def read_output(process, append):
    for line in iter(process.stdout.readline, ""):
        append(line)

def main():
    process = subprocess.Popen(["program"], stdout=subprocess.PIPE)
    # save last `number_of_lines` lines of the process output
    number_of_lines = 1
    q = collections.deque(maxlen=number_of_lines)
    t = threading.Thread(target=read_output, args=(process, q.append))
    t.daemon = True
    t.start()
    #
    time.sleep(20)
    # print saved lines
    print ''.join(q),
    # process is still running
    # uncomment if you don't want to wait for the process to complete
    ##process.terminate()  # if it doesn't terminate; use process.kill()
    process.wait()

if __name__ == "__main__":
    main()
See other tail-like solutions that print only the last portion of the output.
See here if your child program uses block buffering (instead of line buffering) for its stdout while running non-interactively.
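If the child program is a Python script, the block-buffering issue can also be avoided from the parent side. A minimal sketch, assuming the child can be launched with the interpreter directly:
import subprocess
import sys

# -u makes the child interpreter's stdout unbuffered, so lines arrive
# in the pipe as soon as they are printed
process = subprocess.Popen([sys.executable, "-u", "test.py"],
                           stdout=subprocess.PIPE)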

Fairly trivial with sh.py:
import sh

def process_line(line):
    print line

process = sh.python("test.py", _out=process_line)
process.wait()

Related

Is there a way to get a callback in Python, whenever something gets printed to stdout in a thread?

I want to monitor the stdout of a program, and whenever it prints something to stdout I want to get a callback in Python to process the gathered data.
The program I want to monitor is not written in Python, but behaves similarly to this dummy_script.py:
import datetime
import random
import time

i = 0
while True:
    line = f"{datetime.datetime.now()} {i}"
    print(line)
    i += 1
    time.sleep(random.uniform(0, 1))
For the main python script I tried something like this:
from threading import Thread
import os

def do_stuff():
    command = f"python3 dummy_script.py"
    os.system(command)

thread = Thread(target=do_stuff)
thread.daemon = True
thread.start()
So is there a way to create a callback when a new line is printed to stdout?
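One common pattern (a sketch, not a verified answer from this thread; it assumes the monitored program can be started directly with subprocess instead of os.system, and watch_stdout is a hypothetical helper name) is to read the pipe line by line in a background thread and invoke the callback for each line:
import subprocess
import threading

def watch_stdout(cmd, callback):
    # text=True makes the pipe yield str lines instead of bytes
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

    def reader():
        for line in process.stdout:  # blocks in this thread only
            callback(line.rstrip("\n"))

    t = threading.Thread(target=reader)
    t.daemon = True
    t.start()
    return process

process = watch_stdout(["python3", "-u", "dummy_script.py"], print)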

Tried to make a non-blocking command execution function, what's causing this unexpected behavior?

What I wanted to happen:
So my goal was to write a function that leverages subprocess to run a command and read the stdout, whether it be immediate or delayed, line by line as it comes. And to do that in a non-blocking, asynchronous way.
I also wanted to be able to pass a function to be called each time a new stdout line is read.
What happened instead:
Until the process being run is completely finished / killed, the output isn't handled / printed as expected. All the correct output happens, but I expected it to print in real-time as the output is polled. Rather, it waits until the entire process finishes running, then prints all the expected output.
What I tried:
So I wrote a simple test script lab_temp.py to provide some output:
from time import sleep

for i in range(10):
    print('i:', i)
    sleep(1)
And a function set_interval.py which I mostly copied from some SO answer (although I'm sorry I don't recall which answer to give credit to):
import threading

def set_interval(func, sec):
    def func_wrapper():
        t = set_interval(func, sec)
        result = func()
        if result == False:
            t.cancel()
    t = threading.Timer(sec, func_wrapper)
    t.start()
    return t
And then a function call_command.py to run the command and asynchronously poll the process at some interval for output, until it's done. I'm only barely experienced with asynchronous code, and that's probably related to my mistake, but I think the async part is being handled behind the scenes by threading.Timer (in set_interval.py).
call_command.py:
import subprocess
from set_interval import set_interval

def call_command(cmd, update_func=None):
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf-8')
    def polling():  # Replaces "while True:" to convert to non-blocking
        for line in iter(p.stdout.readline, ''):
            if update_func:
                update_func(line.rstrip())
        if p.poll() == 0:
            print('False')
            return False  # cancel interval
        else:
            print('True')
            return True  # continue interval
    set_interval(polling, 1)
And each of these functions have basic tests:
set_interval.test.py (seems to run as expected):
from set_interval import set_interval

i = 0

def func():
    global i
    i += 1
    print(f"Interval: {i}...")
    if i > 5:
        return False
    else:
        return True

set_interval(func, 2)
print('non blocking')
call_command.test.py (results in the wrong behavior, as described initially):
from call_command import call_command

def func(out):
    print(out)  # <- This will print in one big batch once
                #    the entire process is complete.

call_command('python3 lab_temp.py', update_func=func)
print('non-blocking')  # <- This will print right away, so I
                       #    know it's not blocked / frozen.
What have I gotten wrong here causing the deviation from expectation?
Edit: Continued efforts...
import subprocess
from set_interval import set_interval

def call_command(cmd):
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf-8')
    def polling():
        line = p.stdout.readline().strip()
        if not line and p.poll() is not None:
            return False
        else:
            print(line)
            return True
    set_interval(polling, 1)
Doesn't work. Nearly identical issues.
The problem is in the command you run. The lab_temp.py script uses the print function, which writes to sys.stdout by default, and sys.stdout is block-buffered when it is not connected to a terminal (e.g. when it is redirected to a pipe). Since the buffer is large enough to hold the whole script's output, it gets flushed no sooner than at the end.
To fix this, you can either use sys.stdout's flush method:
from time import sleep
import sys

for i in range(10):
    print('i:', i)
    sys.stdout.flush()
    sleep(1)
or use print's flush parameter:
from time import sleep

for i in range(10):
    print('i:', i, flush=True)
    sleep(1)
or run Python interpreter with the -u option:
call_command(['python3', '-u', 'lab_temp.py'])
or set the PYTHONUNBUFFERED environment variable to a non-empty string:
import subprocess
from set_interval import set_interval

def call_command(cmd):
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE,
                         encoding='utf-8', env={'PYTHONUNBUFFERED': '1'})
    def polling():
        line = p.stdout.readline().strip()
        if not line and p.poll() is not None:
            return False
        else:
            print(line)
            return True
    set_interval(polling, 1)
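Note that env={'PYTHONUNBUFFERED': '1'} replaces the child's entire environment, which can break programs that need variables like PATH. A safer variant of the same idea (with cmd as in call_command above) merges the flag into a copy of the current environment:
import os
import subprocess

# keep the parent's environment and only add the unbuffering flag
env = dict(os.environ, PYTHONUNBUFFERED='1')
p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE,
                     encoding='utf-8', env=env)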
BTW, in order to avoid threads, you might want to use asyncio:
import asyncio

async def call_command(cmd):
    p = await asyncio.create_subprocess_exec(cmd[0], *cmd[1:],
                                             stderr=asyncio.subprocess.PIPE,
                                             stdout=asyncio.subprocess.PIPE)
    async for line in p.stdout:
        line = line.strip().decode('utf-8')
        print(line)
    await p.wait()  # reap the child once its output is exhausted

asyncio.run(call_command(['python3', '-u', 'lab_temp.py']))  # Python 3.7+

Read from Popen object's stdout while it's running

I'm trying to capture the stdout of a Popen object while it's running, display this data on a GUI, and log it. However, whenever I try to read from the stdout attribute, my program freezes. Minimal working code below. 'here' prints, then the process's string representation, but then it hangs when it tries to read the first byte of stdout. Why is this the case?
Main script:
import subprocess
import os
import sys
from threading import Thread

def print_to_terminal(process):
    print(process)
    print(process.stdout.read(1), flush=True)
    sys.stdout.flush()

runner = subprocess.Popen(['python', 'print_and_wait.py'], env=os.environ, stdout=subprocess.PIPE)
print('here')
t = Thread(target=print_to_terminal, args=[runner]).run()
print('there')
runner.wait()
The script Popen is calling:
from time import sleep

for _ in range(10):
    print('hello')
    sleep(1)
After comments: this did work once I added a flush to the print in print_and_wait.py. See below:
from time import sleep

for _ in range(10):
    print('hello', flush=True)
    sleep(1)
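The hang occurs because the child's stdout is block-buffered when attached to a pipe, so read(1) blocks until the first buffer is flushed. If the child script cannot be modified, the same effect can be had from the parent side; a sketch using the interpreter's -u flag so the child's stdout is never buffered:
import os
import subprocess

runner = subprocess.Popen(['python', '-u', 'print_and_wait.py'],
                          env=os.environ, stdout=subprocess.PIPE)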

Capture output from very long running python subprocess event driven

The problem I am facing is that I have a subprocess in Python (an ffmpeg process) that runs for hours or even days, and I'd like to capture its output. The subprocess's stdout/stderr pipe needs to be drained continuously, otherwise it fills up (pipes have a limited size) and the program eventually blocks or grows out of memory. I don't want to constantly parse the output, but rather read it on demand (event driven) and, very importantly, get the latest output from the subprocess.
My current approach is to send the output to a LifoQueue with maxsize 0 (unbounded) so that the last result is returned first when I need to process it.
In another attempt I tried to assign stdout to /dev/null and reassign it when I wanted to read the output, but that failed miserably thus far.
As a proof of concept I have made these programs, but I'd like to know if there is a more efficient/Pythonic way of handling this issue; I'd also like to avoid using threads.
This is the best solution I could come up with for now.
import subprocess
import time
import threading
import Queue

class FlushPipe(object):
    def __init__(self):
        self.command = ['python', './print_date.py']
        self.process = None
        self.process_output = Queue.LifoQueue(0)
        self.capture_output = threading.Thread(target=self.output_reader)

    def output_reader(self):
        for line in iter(self.process.stdout.readline, b''):
            self.process_output.put_nowait(line)

    def start_process(self):
        self.process = subprocess.Popen(self.command,
                                        stdout=subprocess.PIPE)
        self.capture_output.start()

    def get_output_for_processing(self):
        return self.process_output.get()

if __name__ == "__main__":
    flush_pipe = FlushPipe()
    flush_pipe.start_process()
    now = time.time()
    while time.time() - now < 10:
        print ">>>" + flush_pipe.get_output_for_processing()
        time.sleep(2.5)
    flush_pipe.capture_output.join(timeout=0.001)
    flush_pipe.process.kill()
print_date.py:
#!/usr/bin/env python
import time

if __name__ == "__main__":
    while True:
        print str(time.time())
        time.sleep(0.01)
output:
>>>1520535158.51
>>>1520535161.01
>>>1520535163.51
>>>1520535166.01
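Note that LifoQueue(0) is unbounded, so over days the queue itself keeps every line ever produced and grows without limit. A bounded collections.deque, as in the deque answer above (a sketch adapted to this reader, not from the original post), keeps only the latest line and discards the rest automatically:
import collections
import subprocess
import threading

last_lines = collections.deque(maxlen=1)  # older lines are discarded automatically

def output_reader(process):
    for line in iter(process.stdout.readline, b''):
        last_lines.append(line)

process = subprocess.Popen(['python', './print_date.py'], stdout=subprocess.PIPE)
t = threading.Thread(target=output_reader, args=(process,))
t.daemon = True
t.start()
# on demand: last_lines[-1] is the most recent line, if any has arrived yet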

How to collect output from a Python subprocess

I am trying to make a Python process that reads some input, processes it, and prints out the result. The processing is done by a subprocess (Stanford's NER); for illustration I will use 'cat'. I don't know exactly how much output NER will give, so I run a separate thread to collect it all and print it out. The following example illustrates this.
import sys
import threading
import subprocess

# start my subprocess
cat = subprocess.Popen(
    ['cat'],
    shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
    stderr=None)

def subproc_cat():
    """ Reads the subprocess output and prints out """
    while True:
        line = cat.stdout.readline()
        if not line:
            break
        print("CAT PROC: %s" % line.decode('UTF-8'))

# a daemon that runs the above function
th = threading.Thread(target=subproc_cat)
th.setDaemon(True)
th.start()

# the main thread reads from stdin and feeds the subprocess
while True:
    line = sys.stdin.readline()
    print("MAIN PROC: %s" % line)
    if not line:
        break
    cat.stdin.write(bytes(line.strip() + "\n", 'UTF-8'))
    cat.stdin.flush()
This seems to work well when I enter text with the keyboard. However, if I try to pipe input into my script (cat file.txt | python3 my_script.py), a race condition seems to occur: sometimes I get proper output, sometimes not, and sometimes it locks up. Any help would be appreciated!
I am running Ubuntu 14.04, Python 3.4.0. The solution should be platform-independent.
Add th.join() at the end, otherwise you may kill the thread prematurely, before it has processed all the output, when the main thread exits: daemon threads do not survive the main thread (alternatively, remove th.setDaemon(True) instead of adding th.join()).
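Applied to the script above, the shutdown sequence might look like this (a sketch; closing stdin lets cat see EOF and exit cleanly):
cat.stdin.close()  # signal EOF to the child
cat.wait()         # wait for the child process to exit
th.join()          # let the reader thread drain any remaining output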
