I have a subprocess in Python (an ffmpeg process) that runs for hours or even days, and I'd like to capture its output. The subprocess's stdout/stderr PIPE can grow quickly and needs to be drained continuously; otherwise the limited pipe buffer fills up and the subprocess stalls. I don't want to parse the output constantly but on demand (event driven) and, very importantly, I want to get the latest output from the subprocess.
My current approach is to push the output into a LifoQueue created with maxsize 0 (which in Python means unbounded), so that get() hands me the most recent line when I need to process it.
In another attempt I tried to assign stdout to /dev/null and re-assign it whenever I'd like to read the output, but that has failed miserably so far.
As a proof of concept I have made these programs, but I'd like to know if there is a more efficient/pythonic way of handling this issue. I'd also like to avoid using threads.
This is the best solution I could come up with for now.
import subprocess
import time
import threading
import Queue
class FlushPipe(object):
    def __init__(self):
        self.command = ['python', './print_date.py']
        self.process = None
        self.process_output = Queue.LifoQueue(0)
        self.capture_output = threading.Thread(target=self.output_reader)
    def output_reader(self):
        for line in iter(self.process.stdout.readline, b''):
            self.process_output.put_nowait(line)
    def start_process(self):
        self.process = subprocess.Popen(self.command,
                                        stdout=subprocess.PIPE)
        self.capture_output.start()
    def get_output_for_processing(self):
        return self.process_output.get()
if __name__ == "__main__":
    flush_pipe = FlushPipe()
    flush_pipe.start_process()
    now = time.time()
    while time.time() - now < 10:
        print ">>>" + flush_pipe.get_output_for_processing()
        time.sleep(2.5)
    flush_pipe.capture_output.join(timeout=0.001)
    flush_pipe.process.kill()
print_date.py
#!/usr/bin/env python
import time
if __name__ == "__main__":
    while True:
        print str(time.time())
        time.sleep(0.01)
output:
>>>1520535158.51
>>>1520535161.01
>>>1520535163.51
>>>1520535166.01
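One way to avoid threads altogether is to let asyncio drain the pipe. The following is only a minimal sketch, assuming Python 3.7+ (the code above is Python 2); the helper name keep_latest and the small inline child program standing in for ffmpeg are made up for illustration. The pipe is read continuously so it never fills up, but only the newest line is kept, so memory use stays constant.

```python
import asyncio
import sys


async def keep_latest(cmd, latest):
    """Run cmd and keep only its most recent stdout line in latest['line']."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    # Reading every line drains the pipe; overwriting the previous value
    # means nothing but the newest line is retained.
    async for raw in proc.stdout:
        latest["line"] = raw.decode().rstrip()
    return await proc.wait()


async def main():
    # Hypothetical stand-in for the long-running ffmpeg process.
    child = [sys.executable, "-u", "-c",
             "import time\n"
             "for i in range(50):\n"
             "    print(i)\n"
             "    time.sleep(0.01)\n"]
    latest = {"line": None}
    reader = asyncio.create_task(keep_latest(child, latest))
    while not reader.done():
        await asyncio.sleep(0.2)          # other work would go here
        print(">>>", latest["line"])      # sample the newest line on demand
    return latest["line"], reader.result()


if __name__ == "__main__":
    asyncio.run(main())
```

The trade-off is that the rest of the program has to run inside the event loop as well; if it cannot, a reader thread with a bounded container is probably the simpler route.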
Related
I want to monitor the stdout of a program and whenever it prints something into stdout, I want to get a callback in python to process the gathered data.
The program I want to monitor is not written in python, but behaves similar to this dummy_script.py:
import datetime
import random
import time
i = 0
while True:
    line = f"{datetime.datetime.now()} {i}"
    print(line)
    i += 1
    time.sleep(random.uniform(0, 1))
For the main python script I tried something like this:
from threading import Thread
import os
def do_stuff():
    command = f"python3 dummy_script.py"
    os.system(command)
thread = Thread(target=do_stuff)
thread.daemon = True
thread.start()
So is there a way to create a callback when a new line is printed to stdout?
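A possible approach is to start the program with subprocess.Popen instead of os.system and read its stdout line by line in a background thread, calling a function for each line. This is a rough sketch, not a definitive implementation; the helper name watch is made up, and it assumes the monitored program flushes its output after every line (for a Python child, -u takes care of that).

```python
import subprocess
import sys
import threading


def watch(cmd, callback):
    """Start cmd and call callback(line) for every line it prints to stdout."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1)

    def pump():
        # Iterating over the pipe blocks until a complete line is available
        # and stops once the child closes its stdout.
        for line in proc.stdout:
            callback(line.rstrip("\n"))
        proc.stdout.close()

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return proc, thread


if __name__ == "__main__":
    # dummy_script.py is the script from the question; print is the callback.
    proc, thread = watch([sys.executable, "-u", "dummy_script.py"], print)
    thread.join()
```

Note that the callback runs in the reader thread, so anything it shares with the main thread needs the usual synchronisation.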
Right now, I'm using subprocess to run a long-running job in the background. For multiple reasons (PyInstaller + AWS CLI) I can't use subprocess anymore.
Is there an easy way to achieve the same thing as below ? Running a long running python function in a multiprocess pool (or something else) and do real time processing of stdout/stderr ?
import subprocess
process = subprocess.Popen(
    ["python", "long-job.py"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=True,
)
while True:
    out = process.stdout.read(2000).decode()
    if not out:
        err = process.stderr.read().decode()
    else:
        err = ""
    if (out == "" or err == "") and process.poll() is not None:
        break
    live_stdout_process(out)
Thanks
Getting this to work cross-platform is messy. First of all, the Windows implementation of non-blocking pipes is neither user-friendly nor portable.
One option is to have your application read its command-line arguments and conditionally execute a file; you can then keep using subprocess, since you will be launching yourself with different arguments.
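The "launch yourself" option could look roughly like the sketch below. The flag name --run-file and the helpers dispatch and child_command are made up for illustration. It relies on sys.frozen being set in a PyInstaller bundle, where sys.executable is the bundled application itself rather than a Python interpreter.

```python
import argparse
import runpy
import subprocess
import sys

RUN_FILE_FLAG = "--run-file"   # hypothetical flag name


def dispatch(argv):
    """Run the requested file as __main__ and return True; return False for a normal start."""
    parser = argparse.ArgumentParser()
    parser.add_argument(RUN_FILE_FLAG, dest="run_file")
    args, _ = parser.parse_known_args(argv)
    if args.run_file is None:
        return False
    runpy.run_path(args.run_file, run_name="__main__")
    return True


def child_command(path):
    """Build the command line that relaunches this application to run path."""
    if getattr(sys, "frozen", False):
        # Bundled app: the executable is the application itself.
        return [sys.executable, RUN_FILE_FLAG, path]
    return [sys.executable, sys.argv[0], RUN_FILE_FLAG, path]


if __name__ == "__main__":
    if not dispatch(sys.argv[1:]):
        # Normal start: relaunch ourselves as the worker and stream its output.
        proc = subprocess.Popen(child_command("long-job.py"),
                                stdout=subprocess.PIPE)
        for line in proc.stdout:
            print("child:", line.decode(), end="")
```

With this layout the parent keeps the plain subprocess pipe handling from the question, and a crash in the job cannot take the parent down.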
But to keep it to multiprocessing:
The output must be logged to queues instead of pipes.
You need the child to execute a Python file; this can be done by using runpy to execute the file as __main__.
This runpy function should run under a multiprocessing child, and this child must first redirect its stdout and stderr in the initializer.
When an error happens, your main application must catch it. But if it is too busy reading the output, it won't be able to wait for the error, so a child thread has to start the multiprocess and wait for the error.
The main process has to create the queues, launch the child thread, and read the output.
Putting it all together:
import multiprocessing
from multiprocessing import Queue
import sys
import concurrent.futures
import threading
import traceback
import runpy
import time
class StdoutQueueWrapper:
    def __init__(self,queue:Queue):
        self._queue = queue
    def write(self,text):
        self._queue.put(text)
    def flush(self):
        pass
def function_to_run():
    # runpy.run_path("long-job.py",run_name="__main__") # run long-job.py
    print("hello") # print something
    raise ValueError # error out
def initializer(stdout_queue: Queue,stderr_queue: Queue):
    sys.stdout = StdoutQueueWrapper(stdout_queue)
    sys.stderr = StdoutQueueWrapper(stderr_queue)
def thread_function(child_stdout_queue,child_stderr_queue):
    with concurrent.futures.ProcessPoolExecutor(1, initializer=initializer,
                                                initargs=(child_stdout_queue, child_stderr_queue)) as pool:
        result = pool.submit(function_to_run)
        try:
            result.result()
        except Exception as e:
            child_stderr_queue.put(traceback.format_exc())
if __name__ == "__main__":
    child_stdout_queue = multiprocessing.Queue()
    child_stderr_queue = multiprocessing.Queue()
    child_thread = threading.Thread(target=thread_function,args=(child_stdout_queue,child_stderr_queue),daemon=True)
    child_thread.start()
    while True:
        while not child_stdout_queue.empty():
            var = child_stdout_queue.get()
            print(var,end='')
        while not child_stderr_queue.empty():
            var = child_stderr_queue.get()
            print(var,end='')
        if not child_thread.is_alive():
            break
        time.sleep(0.01) # check output every 0.01 seconds
Note that a direct consequence of running as a multiprocessing child is that if the child runs into a segmentation fault or some other unrecoverable error, the parent will also die; hence, running yourself under subprocess might be the better option if segfaults are expected.
In this script I was looking to launch a given program and monitor it as long as the program exists. Thus, I reached the point where I got to use the threading's module Timer method for controlling a loop that writes to a file and prints out to the console a specific stat of the launched process (for this case, mspaint).
The problem arises when I hit CTRL + C in the console or when I close mspaint: the script notices either event only after the time defined for the interval has completely run out. Both events are supposed to make the script stop.
For example, if a 20 seconds time is set for the interval, once the script has started, if at second 5 I either hit CTRL + C or close mspaint, the script will stop only after the remaining 15 seconds will have passed.
I would like for the script to stop right away when I either hit CTRL + C or close mspaint (or any other process launched through this script).
The script can be used with the following command, according to the example:
python.exe mon_tool.py -p "C:\Windows\System32\mspaint.exe" -i 20
I'd really appreciate if you could come up with a working example.
I used Python 3.10.4 and psutil 5.9.0.
This is the code:
# mon_tool.py
import psutil, sys, os, argparse
from subprocess import Popen
from threading import Timer
debug = False
def parse_args(args):
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--path", type=str, required=True)
    parser.add_argument("-i", "--interval", type=float, required=True)
    return parser.parse_args(args)
def exceptionHandler(exception_type, exception, traceback, debug_hook=sys.excepthook):
    '''Print user friendly error messages normally, full traceback if DEBUG on.
    Adapted from http://stackoverflow.com/questions/27674602/hide-traceback-unless-a-debug-flag-is-set
    '''
    if debug:
        print('\n*** Error:')
        debug_hook(exception_type, exception, traceback)
    else:
        print("%s: %s" % (exception_type.__name__, exception))
sys.excepthook = exceptionHandler
def validate(data):
    try:
        if data.interval < 0:
            raise ValueError
    except ValueError:
        raise ValueError(f"Time has a negative value: {data.interval}. Please use a positive value")
def main():
    args = parse_args(sys.argv[1:])
    validate(args)
    # creates the "Process monitor data" folder in the "Documents" folder
    # of the current Windows profile
    # (os.path.join avoids the invalid "\P" and "\d" escapes of the original strings)
    default_path: str = os.path.join(os.path.expanduser('~'), 'Documents', 'Process monitor data')
    if not os.path.exists(default_path):
        os.makedirs(default_path)
    abs_path: str = os.path.join(default_path, 'data_test.txt')
    print("data_test.txt can be found in: " + default_path)
    # launches the provided process for the path argument, and
    # it checks if the process was indeed launched
    p: Popen[bytes] = Popen(args.path)
    PID = p.pid
    isProcess: bool = True
    while isProcess:
        for proc in psutil.process_iter():
            if(proc.pid == PID):
                isProcess = False
    process_stats = psutil.Process(PID)
    # creates the data_test.txt and it erases its content
    with open(abs_path, 'w', newline='', encoding='utf-8') as testfile:
        testfile.write("")
    # loop for writing the handles count to data_test.txt, and
    # for printing out the handles count to the console
    def process_monitor_loop():
        with open(abs_path, 'a', newline='', encoding='utf-8') as testfile:
            testfile.write(f"{process_stats.num_handles()}\n")
            print(process_stats.num_handles())
        Timer(args.interval, process_monitor_loop).start()
    process_monitor_loop()
if __name__ == '__main__':
    main()
Thank you!
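For reference, the delay comes from the Timer: once it is started, nothing wakes it up before the interval has elapsed. One standard-library way around that is to wait on a threading.Event, which returns as soon as the event is set. The following is only a sketch under assumptions: the helper name monitor is made up, a short-lived Python child stands in for mspaint, and how promptly CTRL + C interrupts the wait may differ between platforms (not verified here for Windows).

```python
import subprocess
import sys
import threading
import time


def monitor(proc, sample, interval):
    """Call sample() every interval seconds until proc exits; return the sample count."""
    stop = threading.Event()

    def watch_exit():
        proc.wait()      # blocks until the child exits
        stop.set()       # wakes the loop below immediately

    threading.Thread(target=watch_exit, daemon=True).start()

    count = 0
    try:
        while not stop.is_set():
            sample()
            count += 1
            # Unlike Timer or time.sleep, this returns as soon as
            # stop is set instead of sleeping out the whole interval.
            stop.wait(interval)
    except KeyboardInterrupt:
        proc.terminate()
    return count


if __name__ == "__main__":
    # Hypothetical stand-in for the monitored program: exits after 1 second.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
    start = time.time()
    n = monitor(child, lambda: print("sampling"), interval=20)
    print(n, "sample(s) taken in", round(time.time() - start, 1), "s")
```

In mon_tool.py, sample would be the function that writes num_handles() to the file, and proc the Popen object that is already created in main().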
I think you could use python-worker as an alternative.
import time
from datetime import datetime
from worker import worker, enableKeyboardInterrupt
# make sure to execute this before running the worker to enable keyboard interrupt
enableKeyboardInterrupt()
# your codes
...
# block lines with periodic check
def block_next_lines(duration):
    t0 = time.time()
    while time.time() - t0 <= duration:
        time.sleep(0.05) # to reduce resource consumption
def main():
    # your codes
    ...
    @worker(keyboard_interrupt=True)
    def process_monitor_loop():
        while True:
            print("hii", datetime.now().isoformat())
            block_next_lines(3)
    return process_monitor_loop()
if __name__ == '__main__':
    main_worker = main()
    main_worker.wait()
Here your process_monitor_loop will be able to stop even before the full 20-second interval has elapsed.
You can try registering a signal handler for SIGINT, that way whenever the user presses Ctrl+C you can have a custom handler to clean all of your dependencies, like the interval, and exit gracefully.
See this for a simple implementation.
This is the solution for the second part of the problem, which checks whether the launched process still exists. If it doesn't, the script stops.
It builds on the solution for the first part of the problem, provided above by @danangjoyoo, which deals with stopping the script when CTRL + C is used.
Thank you very much once again, @danangjoyoo! :)
This is the code for the second part of the problem:
import time, psutil, sys, os
from datetime import datetime
from worker import worker, enableKeyboardInterrupt, abort_all_thread, ThreadWorkerManager
from threading import Timer
# make sure to execute this before running the worker to enable keyboard interrupt
enableKeyboardInterrupt()
# block lines with periodic check
def block_next_lines(duration):
    t0 = time.time()
    while time.time() - t0 <= duration:
        time.sleep(0.05) # to reduce resource consumption
def main():
    # launches mspaint, gets its PID and checks if it was indeed launched
    path = r"C:\Windows\System32\mspaint.exe"
    p = psutil.Popen(path)
    PID = p.pid
    isProcess: bool = True
    while isProcess:
        for proc in psutil.process_iter():
            if(proc.pid == PID):
                isProcess = False
    interval = 5
    global counter
    counter = 0
    #allows for sub_process to run only once
    global run_sub_process_once
    run_sub_process_once = 1
    @worker(keyboard_interrupt=True)
    def process_monitor_loop():
        while True:
            print("hii", datetime.now().isoformat())
            def sub_proccess():
                '''
                Checks every second if the launched process still exists.
                If the process doesn't exist anymore, the script will be stopped.
                '''
                print("Process online:", psutil.pid_exists(PID))
                t = Timer(1, sub_proccess)
                t.start()
                global counter
                counter += 1
                print(counter)
                # Checks if the worker thread is alive.
                # If it is not alive, it will kill the thread spawned by sub_process
                # hence, stopping the script.
                for _, key in enumerate(ThreadWorkerManager.allWorkers):
                    w = ThreadWorkerManager.allWorkers[key]
                    if not w.is_alive:
                        t.cancel()
                if not psutil.pid_exists(PID):
                    abort_all_thread()
                    t.cancel()
            global run_sub_process_once
            if run_sub_process_once:
                run_sub_process_once = 0
                sub_proccess()
            block_next_lines(interval)
    return process_monitor_loop()
if __name__ == '__main__':
    main_worker = main()
    main_worker.wait()
Also, I have to note that @danangjoyoo's solution is an alternative to signal.pause() on Windows, and it only deals with the CTRL + C part of the problem. signal.pause() works only on Unix systems. This is how it would have been used in my case on a Unix system:
import signal, sys
from threading import Timer
def main():
    def signal_handler(sig, frame):
        print('\nYou pressed Ctrl+C!')
        sys.exit(0)
    signal.signal(signal.SIGINT, signal_handler)
    print('Press Ctrl+C')
    def process_monitor_loop():
        try:
            print("hi")
        except KeyboardInterrupt:
            signal.pause()
        Timer(10, process_monitor_loop).start()
    process_monitor_loop()
if __name__ == '__main__':
    main()
The code above is based on this.
My goal : To read the latest "chunk" (N lines) of streaming stdout every M seconds from a subprocess.
Current code:
start the subprocess
reads stdout
once I have a chunk of N lines, print it out (or save as current chunk)
wait M seconds
repeat
I have also put code for the moment to terminate the subprocess (which is an endless stream until you hit Ctrl-C)
What I want to achieve is that, after waiting M seconds, it always reads the latest N lines rather than the next N lines in stdout (the lines in between can be discarded, as I'm only interested in the latest).
My end goal would be to spawn a thread to run the process and keep saving the latest lines and then call from the main process whenever I need the latest results of the stream.
Any help would be greatly appreciated!
#!/usr/bin/env python3
import signal
import time
from subprocess import Popen, PIPE
sig = signal.SIGTERM
N=9
M=5
countlines=0
p = Popen(["myprogram"], stdout=PIPE, bufsize=1, universal_newlines=True)
chunk=[]
for line in p.stdout:
    countlines+=1
    chunk.append(line)
    if len(chunk)==N:
        print(chunk)
        chunk=[]
        time.sleep(M)
    if countlines>100:
        p.send_signal(sig)
        break
print("done")
After much searching, I stumbled upon a solution here:
https://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/
Eli's "Launch, interact, get output in real time, terminate" code section worked for me.
So far it's the most elegant solution I've found.
Adapted to my problem above, and written within a class (not shown here):
def output_reader(self,proc):
    chunk=[]
    countlines=0
    for line in iter(proc.stdout.readline, b''):
        countlines+=1
        chunk.append(line.decode("utf-8"))
        if countlines==N:
            self.current_chunk = chunk
            chunk=[]
            countlines=0
def main():
    proc = subprocess.Popen(['myprocess'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()
    try:
        time.sleep(0.2)
        for i in range(10):
            time.sleep(1) # waits a while before getting latest lines
            print(self.current_chunk)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()
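Since the surrounding class is not shown, here is one guess at how a self-contained version might look. Treat it as a sketch rather than the code actually used: the class name LatestLines and its methods are made up, and it keeps a sliding window of the last N lines in a collections.deque instead of swapping in complete chunks of N, which is a slightly different behaviour from the snippet above.

```python
import collections
import subprocess
import sys
import threading


class LatestLines:
    """Keep the most recent n lines of a child's output (hypothetical helper)."""

    def __init__(self, cmd, n):
        self._lines = collections.deque(maxlen=n)   # old lines fall off the left
        self._lock = threading.Lock()
        self.proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                     stderr=subprocess.STDOUT, text=True)
        self._thread = threading.Thread(target=self._read, daemon=True)
        self._thread.start()

    def _read(self):
        # Runs in the background thread until the child closes its output.
        for line in self.proc.stdout:
            with self._lock:
                self._lines.append(line.rstrip("\n"))

    def snapshot(self):
        """Return a copy of the latest lines, oldest first."""
        with self._lock:
            return list(self._lines)

    def wait(self):
        """Wait for the reader thread to see end of output, then for the child."""
        self._thread.join()
        return self.proc.wait()

    def stop(self):
        """Terminate the child and wait for the reader to finish."""
        self.proc.terminate()
        return self.wait()


if __name__ == "__main__":
    latest = LatestLines([sys.executable, "-c",
                          "for i in range(20): print(i)"], n=3)
    latest.wait()
    print(latest.snapshot())
```

The main thread can call snapshot() whenever it needs the current state; the lock only guards the copy, so the reader is never blocked for long.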
Here is another possible solution. It is a program that you would run as a separate process in the pipeline. It presents a REST API that, when queried, returns the last N lines it has read on stdin (where N and the port number are supplied as command-line arguments). It uses Flask's built-in server (app.run), so it should not be used in situations where the outside world can reach the local server port to make requests, though this could be adapted.
import sys
import time
import threading
import argparse
from flask import Flask, request
from flask_restful import Resource, Api
class Server:
    def __init__(self):
        self.data = {'at_eof': False,
                     'lines_read': 0,
                     'latest_lines': []}
        self.thread = None
        self.args = None
        self.stop = False
    def parse_args(self):
        parser = argparse.ArgumentParser()
        parser.add_argument("num_lines", type=int,
                            help="number of lines to cache")
        parser.add_argument("port", type=int,
                            help="port to serve on")
        self.args = parser.parse_args()
    def start_updater(self):
        def updater():
            lines = self.data['latest_lines']
            while True:
                if self.stop:
                    return
                line = sys.stdin.readline()
                if not line:
                    break
                self.data['lines_read'] += 1
                lines.append(line)
                while len(lines) > self.args.num_lines:
                    lines.pop(0)
            self.data['at_eof'] = True
        self.thread = threading.Thread(target=updater)
        self.thread.start()
    def get_data(self):
        return self.data
    def shutdown(self):
        self.stop = True
        func = request.environ.get('werkzeug.server.shutdown')
        if func:
            func()
            return 'Shutting down'
        else:
            return 'shutdown failed'
    def add_apis(self, app):
        class GetData(Resource):
            get = self.get_data
        class Shutdown(Resource):
            get = self.shutdown
        api = Api(app)
        api.add_resource(GetData, "/getdata")
        api.add_resource(Shutdown, "/shutdown")
    def run(self):
        self.parse_args()
        self.start_updater()
        app = Flask(__name__)
        self.add_apis(app)
        app.run(port=self.args.port)
server = Server()
server.run()
Example usage: here is a test program whose output we want to serve:
import sys
import time
for i in range(100):
    print("this is line {}".format(i))
    sys.stdout.flush()
    time.sleep(.1)
And a simple pipeline to launch it (here from the linux shell prompt but could be done via subprocess.Popen), serving the last 5 lines, on port 8001:
python ./writer.py | python ./server.py 5 8001
An example query, here using curl as the client but it could be done via Python requests:
$ curl -s http://localhost:8001/getdata
{"at_eof": false, "lines_read": 30, "latest_lines": ["this is line 25\n", "this is line 26\n", "this is line 27\n", "this is line 28\n", "this is line 29\n"]}
The server also provides an http://localhost:<port>/shutdown URL to terminate it, though if you call it before you first see "at_eof": true, then expect the writer to die with a broken pipe.
I have a simple python program:
test.py:
import time
for i in range(100000):
    print i
    time.sleep(0.5)
I want to use another program that executes the above one in order to read the last line output while the above program is counting.
import subprocess
import time
process = subprocess.Popen(["python", "test.py"], stdout=subprocess.PIPE)
time.sleep(20) # sleeps an arbitrary time
print process.stdout.readlines()[-1]
The problem is that process.stdout.readlines() waits until test.py finishes execution.
Is there any way to read the last line that has been written to the output while the program is executing?
You could use collections.deque to save only the last specified number of lines:
#!/usr/bin/env python
import collections
import subprocess
import time
import threading
def read_output(process, append):
    for line in iter(process.stdout.readline, ""):
        append(line)
def main():
    process = subprocess.Popen(["program"], stdout=subprocess.PIPE)
    # save last `number_of_lines` lines of the process output
    number_of_lines = 1
    q = collections.deque(maxlen=number_of_lines)
    t = threading.Thread(target=read_output, args=(process, q.append))
    t.daemon = True
    t.start()
    #
    time.sleep(20)
    # print saved lines
    print ''.join(q),
    # process is still running
    # uncomment if you don't want to wait for the process to complete
    ##process.terminate() # if it doesn't terminate; use process.kill()
    process.wait()
if __name__=="__main__":
    main()
See other tail-like solutions that print only a portion of the output.
See here if your child program uses block buffering (instead of line buffering) for its stdout while running non-interactively.
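To illustrate the buffering point for the case where the child is itself a Python program: starting it with -u (or with PYTHONUNBUFFERED set in its environment) makes its output reach the pipe right away instead of when its buffer fills or it exits. The snippet below is a small sketch written for Python 3; the helper name time_to_first_line and the inline child program are made up for the demonstration. For non-Python children the remedy depends on the program (tools such as stdbuf or a pseudo-terminal are commonly suggested).

```python
import os
import subprocess
import sys
import time

# Hypothetical child: prints one line, then keeps running for 3 seconds.
CHILD = "import time\nprint('first line')\ntime.sleep(3)\n"


def time_to_first_line(extra_args=(), extra_env=None):
    """Return the child's first line and how long it took to reach the parent."""
    env = dict(os.environ, **(extra_env or {}))
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, *extra_args, "-c", CHILD],
                            stdout=subprocess.PIPE, env=env)
    line = proc.stdout.readline()       # blocks until a line (or EOF) arrives
    elapsed = time.monotonic() - start
    proc.kill()
    proc.wait()
    proc.stdout.close()
    return line, elapsed


if __name__ == "__main__":
    print("default buffering:", time_to_first_line())
    print("with -u:          ", time_to_first_line(["-u"]))
    print("PYTHONUNBUFFERED: ", time_to_first_line(extra_env={"PYTHONUNBUFFERED": "1"}))
```

With default buffering the first line typically arrives only when the child exits, while the two unbuffered variants should deliver it almost immediately.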
Fairly trivial with sh.py:
import sh
def process_line(line):
    print line
process = sh.python("test.py", _out=process_line)
process.wait()