Non-blocking read from subprocess.Popen fails if the process exits fast - python

I followed the accepted answer for this question, A non-blocking read on a subprocess.PIPE in Python, to read non-blocking from a subprocess. This generally works fine, except when the process I call terminates quickly.
This is on Windows.
To illustrate, I have a bat file that simply writes one line to stdout:
test.bat:
@ECHO OFF
ECHO Fast termination
And here is the Python code, adapted from the above-mentioned answer:
from subprocess import PIPE, Popen
from threading import Thread
from queue import Queue, Empty

def enqueue_output(out, queue):
    for line in iter(out.readline, ''):  # text mode: EOF sentinel is '', not b''
        queue.put(line)
    out.close()

p = Popen(['test.bat'], stdout=PIPE, bufsize=-1,
          text=True)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True # thread dies with the program
t.start()
output = ""
while True:
    try:
        line = q.get_nowait()
    except Empty:
        line = ""
    output += line
    if p.poll() is not None:
        break
print(output)
Sometimes, the line from the bat file is correctly captured and printed; sometimes nothing is captured and printed. I suspect that the subprocess might finish before the thread connects the queue to the pipe, and then it doesn't read anything. If I add a little wait of 2 seconds in the bat file before echoing the line, it seems to always work. Likewise, the behavior can be forced by adding a little sleep after the Popen in the Python code. Is there a way to reliably capture the output of the subprocess even if it finishes immediately, while still doing a non-blocking read?
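One way to close that race, sketched below under the question's own setup (p, q, t, Empty as defined above), is not to stop the moment poll() reports exit: join the reader thread (it finishes once it sees EOF) and drain anything enqueued late. This is a hedged sketch, not code from the linked answer:

output = ""
# Keep pulling while the process runs; a short timeout avoids busy-spinning
while p.poll() is None:
    try:
        output += q.get(timeout=0.1)
    except Empty:
        pass

t.join()                  # reader thread exits once readline() hits EOF
while not q.empty():      # drain whatever arrived after the last poll
    output += q.get_nowait()

print(output)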


Python Subprocess readline() hangs; can't use normal options

To start, I'm aware this looks like a duplicate. I've been reading:
Python subprocess readlines() hangs
Python Subprocess readline hangs() after reading all input
subprocess readline hangs waiting for EOF
But these options either straight-up don't work or I can't use them.
The Problem
# Obviously, swap HOSTNAME1 and HOSTNAME2 with something real
cmd = "ssh -N -f -L 1111:<HOSTNAME1>:80 <HOSTNAME2>"
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=os.environ)
while True:
    out = p.stdout.readline()
    # Hangs here ^^^^^^^ forever
    out = out.decode('utf-8')
    if out:
        print(out)
    if p.poll() is not None:
        break
My dilemma is that the function calling the subprocess.Popen() is a library function for running bash commands, so it needs to be very generic and has the following restrictions:
Must display output as it comes in; not block and then spam the screen all at once
Can't use multiprocessing in case the parent caller is multiprocessing the library function (Python doesn't allow daemonic processes to have child processes of their own)
Can't use signal.SIGALRM for the same reason as multiprocessing; the parent caller may be trying to set their own timeout
Can't use third party non-built-in modules
Threading straight up doesn't work. When the readline() call is in a thread, thread.join(timeout=1) lets the program continue, but ctrl+c doesn't work on it at all, and calling sys.exit() doesn't exit the program, since the thread is still open. And as you know, you can't kill a thread in Python by design.
No manner of bufsize or other subprocess args seems to make a difference; neither does putting readline() in an iterator.
I would have a workable solution if I could kill a thread, but that's super taboo, even though this is definitely a legitimate use case.
I'm open to any ideas.
One option is to use a thread to publish to a queue. Then you can block on the queue with a timeout. You can make the reader thread a daemon so it won't prevent system exit. Here's a sketch:
import subprocess
from threading import Thread
from queue import Queue

def reader(stream, queue):
    while True:
        line = stream.readline()
        queue.put(line)
        if not line:
            break

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, ...)
queue = Queue()
thread = Thread(target=reader, args=(p.stdout, queue))
thread.daemon = True
thread.start()

while True:
    out = queue.get(timeout=1)  # timeout is optional; raises queue.Empty if it expires
    if not out:  # Reached end of stream
        break
    ...  # Do whatever with output

# Output stream was closed but process may still be running
p.wait()
Note that you should adapt this answer to your particular use case. For example, you may want to add a way to signal to the reader thread to stop running before reaching the end of stream.
Another option would be to poll the input stream, like in this question: timeout on subprocess readline in python
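For completeness, here is a rough sketch of that polling approach with select (POSIX only; on Windows, select() works only on sockets, so it won't help with a pipe there). cmd is the ssh command from the question, and os.read is used instead of readline() so a partial line can't block:

import os
import select
import subprocess

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
while True:
    # Wait up to 1 second for data on the pipe
    readable, _, _ = select.select([p.stdout], [], [], 1.0)
    if readable:
        data = os.read(p.stdout.fileno(), 4096)  # never blocks once readable
        if not data:                              # EOF: pipe closed
            break
        print(data.decode('utf-8'), end='')
    elif p.poll() is not None:
        break                                     # idle and process has exited
p.wait()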
I finally got a working solution; the key piece of information I was missing was thread.daemon = True, which @augurar pointed out in their answer.
Setting thread.daemon = True allows the thread to be terminated when the main process terminates, thereby unblocking my use of a sub-thread to monitor readline().
Here is a sample implementation of my solution; I used a Queue() object to pass strings to the main thread, and I implemented a 3-second timer for cases like the original problem I was trying to solve, where the subprocess has finished and terminated but the readline() is hung for some reason.
This also helps avoid a race condition over which thing finishes first.
This works for both Python 2 and 3.
import sys
import threading
import subprocess
from datetime import datetime

try:
    import queue
except ImportError:
    import Queue as queue  # Python 2 compatibility

def _monitor_readline(process, q):
    while True:
        bail = True
        if process.poll() is None:
            bail = False
        out = ""
        if sys.version_info[0] >= 3:
            out = process.stdout.readline().decode('utf-8')
        else:
            out = process.stdout.readline()
        q.put(out)
        if q.empty() and bail:
            break

def bash(cmd):
    # Kick off the command
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
    # Create the queue instance
    q = queue.Queue()
    # Kick off the monitoring thread
    thread = threading.Thread(target=_monitor_readline, args=(process, q))
    thread.daemon = True
    thread.start()
    start = datetime.now()
    while True:
        bail = True
        if process.poll() is None:
            bail = False
            # Re-set the thread timer
            start = datetime.now()
        out = ""
        while not q.empty():
            out += q.get()
        if out:
            print(out)
        # In the case where the thread is still alive and reading, and
        # the process has exited and finished, give it up to 3 seconds
        # to finish reading
        if bail and thread.is_alive() and (datetime.now() - start).total_seconds() < 3:
            bail = False
        if bail:
            break
# To demonstrate output in realtime, sleep is called in between these echos
bash("echo lol;sleep 2;echo bbq")

capture stdout and stderr of process that runs an infinite loop

I want to run a process that runs an infinite loop (for example, starting a database server) from a python script and capture stdout and stderr. I tried this, but p.communicate() never returns, apparently because the process needs to finish first.
from subprocess import Popen, PIPE, STDOUT
cmd = "python infinite_loop.py"
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT)
print("the process is running")
stdout, stderr = p.communicate()
print(stdout)
I'd like to get the output in some kind of streaming form. For example, I might want to save every 100 characters to a new log file. How can I do it?
Edit: Something closer to what you already had, as asyncio seems like overkill for a single coroutine:
import sys
from subprocess import Popen, PIPE, STDOUT

args = (sys.executable, '-u', 'test4.py')
cmd = ' '.join(args)
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, universal_newlines=True)
print("the process is running")
for line in iter(p.stdout.readline, ''):
    line = line.rstrip()
    print(line)
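If you want fixed-size chunks rather than lines (the question mentions saving every 100 characters to a log file), a read(n)-based variant might look like the sketch below; the chunk size and the 'out.log' path are just illustrative:

# Sketch: stream fixed-size chunks to a log file. Assumes the same
# Popen object `p` (universal_newlines/text mode) as in the snippet above.
with open('out.log', 'w') as log:
    while True:
        chunk = p.stdout.read(100)  # blocks until 100 chars or EOF
        if not chunk:               # '' means the stream closed
            break
        log.write(chunk)
        log.flush()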
Original:
I threw something together. The following uses asyncio.subprocess to read lines from a subprocess' output and then does something with them (in this case, just print() them).
The subprocess is specified by args, and in my case is just running another python instance in unbuffered mode with the following script (test4.py):
import time
for _ in range(10):
    print(time.time(), flush=True)
    time.sleep(1)
I'm sleeping in the for loop so it's clear whether the lines are coming in individually or all at once when the program has finished. (If you don't believe me, you can change the for loop to while True:, which will never finish).
The "supervisor" script is:
import asyncio.subprocess
import sys

async def get_lines(args):
    proc = await asyncio.create_subprocess_exec(*args, stdout=asyncio.subprocess.PIPE)
    while proc.returncode is None:
        data = await proc.stdout.readline()
        if not data:
            break
        line = data.decode('ascii').rstrip()
        # Handle line (somehow)
        print(line)

if sys.platform == "win32":
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()

args = (sys.executable, '-u', 'test4.py')
loop.run_until_complete(get_lines(args))
loop.close()
Note that async def is Python 3.5+, but you could use @asyncio.coroutine in 3.4.
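On Python 3.7+ the explicit event-loop setup is unnecessary: asyncio.run() manages the loop for you, and from Python 3.8 on Windows uses a proactor-based loop by default, so the platform check can be dropped too:

# Python 3.7+ equivalent of the loop boilerplate above
import asyncio
import sys

args = (sys.executable, '-u', 'test4.py')
asyncio.run(get_lines(args))  # get_lines as defined in the supervisor script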

How to collect output from a Python subprocess

I am trying to make a python process that reads some input, processes it and prints out the result. The processing is done by a subprocess (Stanford's NER); for illustration I will use 'cat'. I don't know exactly how much output NER will give, so I run a separate thread to collect it all and print it out. The following example illustrates.
import sys
import threading
import subprocess
# start my subprocess
cat = subprocess.Popen(
    ['cat'],
    shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
    stderr=None)

def subproc_cat():
    """ Reads the subprocess output and prints out """
    while True:
        line = cat.stdout.readline()
        if not line:
            break
        print("CAT PROC: %s" % line.decode('UTF-8'))

# a daemon that runs the above function
th = threading.Thread(target=subproc_cat)
th.setDaemon(True)
th.start()

# the main thread reads from stdin and feeds the subprocess
while True:
    line = sys.stdin.readline()
    print("MAIN PROC: %s" % line)
    if not line:
        break
    cat.stdin.write(bytes(line.strip() + "\n", 'UTF-8'))
    cat.stdin.flush()
This seems to work well when I enter text with the keyboard. However, if I try to pipe input into my script (cat file.txt | python3 my_script.py), a race condition seems to occur. Sometimes I get proper output, sometimes not, sometimes it locks down. Any help would be appreciated!
I am running Ubuntu 14.04, python 3.4.0. The solution should be platform-independent.
Add th.join() at the end, otherwise you may kill the thread prematurely, before it has processed all the output, when the main thread exits: daemon threads do not survive the main thread (or remove th.setDaemon(True) instead of adding th.join()).
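Concretely, the tail of the script might become something like this (a sketch of the suggested change, not verbatim from the answer; closing stdin matters, since cat only exits once it sees EOF):

cat.stdin.close()  # send EOF so `cat` exits and readline() returns b''
th.join()          # let the reader thread drain any remaining output
cat.wait()         # reap the subprocess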

Precarious Popen Piping

I want to use subprocess.Popen to run a process, with the following requirements.
1. I want to pipe the stdout and stderr back to the caller of Popen as the process runs.
2. I want to kill the process after timeout seconds if it is still running.
I have come to the conclusion that a flaw in the subprocess API means it cannot fulfill these two requirements at the same time. Consider the following toy programs:
chatty.py
while True:
    print 'Hi'
silence.py
while True:
    pass
caller.py
import subprocess
import time

def go(command, timeout=60):
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    start = time.time()
    while proc.poll() is None:
        print proc.stdout.read(1024)  # <----- Line of interest
        if time.time() - start >= timeout:
            proc.kill()
            break
        else:
            time.sleep(1)
Consider the marked line above.
If it is included, go('python silence.py') will hang forever - not for just 60 seconds - because read is a blocking call until either 1024 bytes or end of stream, and neither ever comes.
If it is commented out, go('python chatty.py') lets chatty.py print 'Hi' over and over, but how can that output be streamed back to the caller as it is generated? proc.communicate() blocks until end of stream.
I would be happy with a solution that replaces requirement (1) above with "In the case where a timeout did not occur, I want to get stdout and stderr once the algorithm finishes." Even this has been problematic. My implementation attempt is below.
speech.py
for i in xrange(0, 10000):
    print 'Hi'
caller2.py
import subprocess
import time

def go2(command, timeout=60):
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    start = time.time()
    while True:
        if proc.poll() is not None:
            print proc.communicate()
            break
        elif time.time() - start >= timeout:
            proc.kill()
            break
        else:
            time.sleep(1)
But even this still has problems. Even though python speech.py runs in just a couple of seconds, go2('python speech.py') takes the full 60 seconds. This is because print 'Hi' in speech.py blocks once the OS pipe buffer fills up, and nothing drains the pipe until proc.communicate() is finally called after the process is killed. Since proc.stdout.read had the problem demonstrated before with silence.py, I'm really at a loss for how to get this working.
How can I get both the stdout and stderr and the timeout behavior?
The trick is to set up a side-band timer to kill the process. I wrote up a program halfway between chatty and silent:
import time
import sys

for i in range(10, 0, -1):
    print i
    time.sleep(1)
And then a program to kill it early:
import subprocess as subp
import threading
import signal

proc = subp.Popen(['python', 'longtime.py'], stdout=subp.PIPE,
                  stderr=subp.PIPE)
timer = threading.Timer(3, lambda proc: proc.send_signal(signal.SIGINT),
                        args=(proc,))
timer.start()
out, err = proc.communicate()
timer.cancel()
print proc.returncode
print out
print err
and it output:
$ python killer.py
1
10
9
8
Traceback (most recent call last):
File "longtime.py", line 6, in <module>
time.sleep(1)
KeyboardInterrupt
Your timer could be made fancier, like trying increasingly severe signals until the process completes, but you get the idea.
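A sketch of that escalation idea, assuming POSIX signals (the signal list and delays are illustrative, not from the answer):

import signal
import time

def escalate(proc, delay=3):
    # Try progressively harsher signals until the process exits
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        proc.send_signal(sig)
        deadline = time.time() + delay
        while time.time() < deadline:
            if proc.poll() is not None:
                return
            time.sleep(0.1)

It could be wired in with threading.Timer(3, escalate, args=(proc,)) in place of the single-signal lambda above.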

Python: Using popen poll on background process

I am running a long process (actually another python script) in the background. I need to know when it has finished. I have found that Popen.poll() always returns 0 for a background process. Is there another way to do this?
p = subprocess.Popen("sleep 30 &", shell=True,
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
a = p.poll()
print(a)
Above code never prints None.
You don't need to use the shell backgrounding & syntax, as subprocess runs the process in the background by itself. Just run the command normally, then wait until Popen.poll returns something other than None.
import time
import subprocess

p = subprocess.Popen("sleep 30", shell=True)
# Better: p = subprocess.Popen(["sleep", "30"])

# Wait until process terminates
while p.poll() is None:
    time.sleep(0.5)

# It's done
print("Process ended, ret code:", p.returncode)
I think you want either the popen.wait() or popen.communicate() methods. communicate() will grab the stdout and stderr data which you've put into PIPE. If the other item is a Python script I would avoid running a shell=True call by doing something like:
p = subprocess.Popen([python.call, "my", params, (go, here)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(stdout, stderr) = p.communicate()
print(stdout)
print(stderr)
Of course these hold the main thread and wait for the other process to complete, which might be bad. If you want to busy wait then you could simply wrap your original code in a loop. (Your original code did print "None" for me, btw)
Example of the wrapping in a loop solution:
p = subprocess.Popen([python.call, "my", params, (go, here)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
while p.poll() is None:
    # We can do other things here while we wait
    time.sleep(.5)
(results, errors) = p.communicate()
if errors == '':
    return results
else:
    raise My_Exception(errors)
You shouldn't run your command with an ampersand at the end: the shell backgrounds your process, then exits immediately with code 0, and that shell is the process Popen is tracking.
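A quick illustration of the difference (exact timing aside):

import subprocess
import time

p = subprocess.Popen("sleep 30 &", shell=True)
time.sleep(0.1)
print(p.poll())  # 0 -- Popen tracks the shell, which exited after forking

p = subprocess.Popen(["sleep", "30"])
print(p.poll())  # None -- Popen tracks sleep itself, which is still running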
