Summary: I want to start an external process from Python (version 3.6), poll the result nonblocking, and kill after a timeout.
Details: there is an external process with 2 "bad habits":
It prints out the relevant result after an undefined time.
It does not stop after it printed out the result.
Example: the following simple application approximates the actual program to be called (mytest.py; the real program's source code is not available):
import random
import time
print('begin')
time.sleep(10*random.random())
print('result=5')
while True: pass
This is how I am trying to call it:
import subprocess, time
myprocess = subprocess.Popen(['python', 'mytest.py'], stdout=subprocess.PIPE)
for i in range(15):
    time.sleep(1)
    # check if something is printed, but do not wait to be printed anything
    # check if the result is there
    # if the result is there, then break
myprocess.kill()
I want to implement the logic described in the comments.
Analysis
The following are not appropriate:
Use myprocess.communicate(), as it waits for termination, and the subprocess does not terminate.
Kill the process and then call myprocess.communicate(), because we don't know when exactly the result is printed out
Use process.stdout.readline(), because that is a blocking call: it waits until something is printed, but after the result the process prints nothing more.
The type of the myprocess.stdout is io.BufferedReader. So the question practically is: is there a way to check if something is printed to the io.BufferedReader, and if so, read it, but otherwise do not wait?
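One common way to get this effect is to move the blocking readline() calls onto a helper thread and let the main code poll a queue with a timeout. The following is a minimal sketch of that idea, not a definitive implementation; the names read_lines and wait_for_result are made up for illustration. It assumes the child flushes its output promptly: a Python child writing to a pipe buffers its output by default, so mytest.py would need to be started with python -u (or flush explicitly) for the result line to arrive before the process is killed.

```python
import queue
import subprocess
import sys
import threading
import time


def read_lines(stream, q):
    # The blocking readline() calls happen on this helper thread only.
    for line in iter(stream.readline, b''):
        q.put(line)
    stream.close()


def wait_for_result(cmd, prefix=b'result=', timeout=15.0):
    """Return the first output line starting with prefix, or None on timeout.

    The child process is always killed before returning.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    q = queue.Queue()
    reader = threading.Thread(target=read_lines, args=(proc.stdout, q), daemon=True)
    reader.start()
    deadline = time.monotonic() + timeout
    found = None
    try:
        while found is None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timeout elapsed without seeing the result
            try:
                # Wait briefly for a line; never longer than the time left.
                line = q.get(timeout=min(remaining, 0.5))
            except queue.Empty:
                continue
            if line.startswith(prefix):
                found = line.strip().decode()
    finally:
        proc.kill()
        proc.wait()
    return found
```

For the example above, the call would look like wait_for_result([sys.executable, '-u', 'mytest.py'], timeout=15), which returns 'result=5' if that line shows up in time and None otherwise.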
I think I got the exact package you need.
Meet command_runner, which is a subprocess wrapper and allows:
Live stdout / stderr output
timeouts regardless of execution
process tree including child processes killing in case of timeout
stdout / stderr redirection to queues, files or callback functions
Install with pip install command_runner
Usage:
from command_runner import command_runner
def callback(stdout_output):
    # Do whatever you want here with the output
    print(stdout_output)
exit_code, output = command_runner("python mytest.py", timeout=300, stdout=callback, method='poller')
if exit_code == -254:
    print("Oh no, we got a timeout")
    print(output)
# Check for good exit_code and full stdout output here
If timeout is reached, you'll get exit_code -254 but still get to have output filled with whatever your subprocess wrote to stdout/stderr.
Disclaimer: I'm the author of command_runner
Additional non blocking examples using queues can be seen on the github page.
Related
I'm having a problem with subprocess poll not returning the return code when the process has finished.
I found out how to set a timeout on subprocess.Popen and used that as the basis for my code. However, I have a call that uses Java that doesn't correctly report the return code so each call "times out" even though it is actually finished. I know the process has finished because when removing the poll timeout check, the call runs without issue returning a good exit code and within the time limit.
Here is the code I am testing with.
import subprocess
import time
def execute(command):
    print('start command: {}'.format(command))
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print('wait')
    wait = 10
    while process.poll() is None and wait > 0:
        time.sleep(1)
        wait -= 1
    print('done')
    if wait == 0:
        print('terminate')
        process.terminate()
    print('communicate')
    stdout, stderr = process.communicate()
    print('rc')
    exit_code = process.returncode
    if exit_code != 0:
        print('got bad rc')
if __name__ == '__main__':
    execute(['ping','-n','15','127.0.0.1']) # correctly times out
    execute(['ping','-n','5','127.0.0.1']) # correctly runs within the time limit
    # incorrectly times out
    execute(['C:\\dev\\jdk8\\bin\\java.exe', '-jar', 'JMXQuery-0.1.8.jar', '-url', 'service:jmx:rmi:///jndi/rmi://localhost:18080/jmxrmi', '-json', '-q', 'java.lang:type=Runtime;java.lang:type=OperatingSystem'])
You can see that the ping examples, one designed to time out and one designed not to, both work correctly. However, the final one (using jmxquery to get tomcat metrics) doesn't return the exit code and therefore "times out" and has to be terminated, which then causes it to return an error code of 1.
Is there something I am missing in the way subprocess poll is interacting with this Java process that is causing it to not return an exit code? Is there a way to get a timeout option to work with this?
This has the same cause as a number of existing questions, but the desire to impose a timeout requires a different answer.
The OS deliberately gives only a small amount of buffer space to each pipe. When a process writes to one that is full (because the reader has not yet consumed the previous output), it blocks. (The reason is that a producer that is faster than its consumer would otherwise be able to quickly use a great deal of memory for no gain.) Therefore, if you want to do more than one of the following with a subprocess, you have to interleave them rather than doing each in turn:
Read from standard output
Read from standard error (unless it’s merged via subprocess.STDOUT)
Wait for the process to exit, or for a timeout to elapse
Of course, the subprocess might close its streams before it exits, write useful output after you notice the timeout and before you kill it, and/or start additional processes that keep the pipe open indefinitely, so you might want to have multiple timeouts. Probably what’s most informative is the EOF on the pipe, so repeatedly use something like select to wait for (however much is left of) the timeout, issue single reads on the streams that are ready, and wait (with another timeout if you’re concerned about hangs after an early stream closure) on EOF. If the timeout occurs instead, (try to) kill the subprocess, and consider issuing non-blocking reads (or another timeout loop) to get any last available output before closing the pipes.
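When a single overall timeout is enough, the standard library already does this interleaving: Popen.communicate(timeout=...) (Python 3.3+) drains stdout and stderr while it waits, so the child cannot block on a full pipe. The sketch below swaps that in for the hand-written select loop described above; the function name run_with_timeout and the returned tuple are illustrative choices, not something from the question.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd, collecting stdout/stderr, and kill it if it exceeds timeout seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() reads both pipes while waiting, so large output
        # cannot fill the pipe buffer and stall the child.
        out, err = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        proc.kill()
        # A second communicate() collects whatever was written before the kill.
        out, err = proc.communicate()
        timed_out = True
    return proc.returncode, out, err, timed_out
```

Note that this only kills the direct child. If the child starts further processes that keep the pipes open, the second communicate() can still wait on them, which is the situation the multiple-timeout advice above is meant for.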
Using the other answer by @DavisHerring as the basis for more research, I came across a concept that worked for my original case. Here is the code that came out of that.
import subprocess
import threading
import time
def execute(command):
    print('start command: {}'.format(command))
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    timer = threading.Timer(10, terminate_process, [process])
    timer.start()
    print('communicate')
    stdout, stderr = process.communicate()
    print('rc')
    exit_code = process.returncode
    timer.cancel()
    if exit_code != 0:
        print('got bad rc')
def terminate_process(p):
    try:
        p.terminate()
    except OSError:
        pass # ignore error
It uses the threading.Timer to make sure that the process doesn't go over the time limit and terminates the process if it does. It otherwise waits for a response back and cancels the timer once it finishes.
Let's say that I have this simple line in python:
os.system("sudo apt-get update")
Of course, apt-get will take some time until it's finished. How can I check in Python whether the command has finished yet?
Edit: this is the code with Popen:
os.environ['packagename'] = entry.get_text()
process = Popen(['dpkg-repack', '$packagename'])
if process.poll() is None:
    print "It still working.."
else:
    print "It finished"
Now the problem is that it never prints "It finished", even when the process really has finished.
As the documentation states it:
This is implemented by calling the Standard C function system(), and
has the same limitations
The C call to system simply runs the program until it exits. Calling os.system blocks your Python code until the shell command has finished, so you know it is finished when os.system returns. If you'd like to do other things while waiting for the call to finish, there are several possibilities. The preferred way is to use the subprocess module.
from subprocess import Popen
...
# Runs the command in another process. Doesn't block
process = Popen(['ls', '-l'])
# Later
# Returns the return code of the command. None if it hasn't finished
if process.poll() is None:
    print('Still running')
else:
    print('Has finished')
Check the link above for more things you can do with Popen
For a more general approach at running code concurrently, you can run that in another thread or process. Here's example code:
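import os
from threading import Thread
...
thread = Thread(target=lambda: os.system("ls -l"))
# start() runs the target in a new thread; run() would execute it in the calling thread and block
thread.start()
# Later
if thread.is_alive():
    print('Still running')
else:
    print('Has finished')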
Another option would be to use the concurrent.futures module.
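As a rough sketch of the concurrent.futures route: submit a blocking subprocess.call to a ThreadPoolExecutor and ask the returned future whether it is done. The helper name start_in_background is made up for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def start_in_background(executor, cmd):
    # subprocess.call blocks until cmd exits, but it does so on a worker
    # thread, so the caller gets a Future back immediately.
    return executor.submit(subprocess.call, cmd)
```

Typical use would be future = start_in_background(executor, ['ls', '-l']), then future.done() to check without blocking, and future.result() to get the exit code once the command has finished.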
os.system will actually wait for the command to finish and return the exit status (in a platform-dependent format).
os.system is blocking; it calls the command, waits for its completion, and returns its return code.
So, it'll be finished once os.system returns.
If your code isn't working, I think that could be caused by one of sudo's quirks: it refuses to grant rights in certain environments (I don't know the details, though).
From python I am calling a java function:
os.system("java -jar example.jar run myFunction 'inFile.txt' 'outFile.txt' " )
This function processes a file and writes the output to 'outFile.txt'. The output depends on the information in 'inFile.txt'. While processing the input file, 'outFile.txt' sometimes grows too large (tens of GBs); when that happens, I want to quit the current processing and move on to another inFile.txt.
Is there a way to know that the outFile.txt being written has grown beyond, say, 10 GB?
Edit:
As suggested by Maksym, I am using the following code and seems to be working. Thanks
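import os
import subprocess
from time import sleep
outFileName = 'outFile.txt'
# With an argument list there is no shell, so the file names must not carry their own quotes
p = subprocess.Popen(["java", "-jar", "example.jar", "run", "myFunction", "inFile.txt", outFileName])
rc = p.poll()  # return code; None while the process is still running
while rc is None:
    sleep(1)
    if os.path.getsize(outFileName) < 1000000000:  # 10**9 bytes, i.e. 1 GB
        rc = p.poll()
        continue
    else:
        p.kill()
        break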
Have a look at subprocess module. Using Popen you can fork a process and kill it when you need this:
import subprocess
from time import sleep
p = subprocess.Popen(["java", "-jar", "example.jar", "run", "myFunction", "inFile.txt", "outFile.txt"])
# check_my_conditions() and my_timeout are placeholders for your own stop condition and polling interval
while not check_my_conditions():
    sleep(my_timeout)
p.kill()
Then, you can rotate your files and restart the process.
Instead of directly calling os.system, you should strongly consider using the multiprocessing.Process built-in class. It handles spawned processes much more gracefully.
You need to check the output file periodically, using something like os.stat to get the file size. You can then kill the original process (or do whatever you want) when the threshold is exceeded.
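The periodic size check can be sketched as below. This version launches the command with subprocess.Popen rather than multiprocessing.Process, because the thing being watched is an external program; the function name run_with_size_limit and its parameters are illustrative assumptions.

```python
import os
import subprocess
import sys
import time


def run_with_size_limit(cmd, path, max_bytes, interval=1.0):
    """Run cmd and kill it once the file at path grows beyond max_bytes.

    Returns True if the process exited on its own, False if it was killed.
    """
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = 0  # the output file may not exist yet
        if size > max_bytes:
            proc.kill()
            proc.wait()
            return False
        time.sleep(interval)
    return True
```

For the Java job in the question, the call would be something like run_with_size_limit(["java", "-jar", "example.jar", "run", "myFunction", "inFile.txt", "outFile.txt"], "outFile.txt", 10 * 1024**3). The file can overshoot the limit by whatever is written during one polling interval.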
Does the java application provide any output (for example, a count of records processed) to stdout or stderr while it runs? If so, you could invoke it using Python's Popen class (in the subprocess module) and estimate when it has processed 'too much'.
I am writing a script in which an external system command may sometimes require user input. I am not able to handle that properly. I have tried os.popen4 and the subprocess module but could not achieve the desired behavior.
The example below shows the problem using the "cp" command. ("cp" is used only for illustration; I am actually calling a different exe that may similarly prompt for a user response in some scenarios.) In this example two files are present on disk, and when the user tries to copy file1 to file2, a confirmation message comes up.
proc = subprocess.Popen("cp -i a.txt b.txt", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,)
stdout_val, stderr_val = proc.communicate()
print stdout_val
b.txt?
proc.communicate("y")
In this example, if I first read only stdout/stderr and print it, and later try to write "y" or "n" based on the user's input, I get an error that the channel is closed.
Can someone please help me achieve this behavior in Python, so that I can print stdout first, then take the user's input and write it to stdin afterwards?
I found another solution (threading) in "Non-blocking read on a subprocess.PIPE in python", but I am not sure whether it helps. It does appear to print the question from the cp command. I have modified the code but am not sure how to write to stdin in the threaded version.
import sys
from subprocess import PIPE, Popen
from threading import Thread
try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty
ON_POSIX = 'posix' in sys.builtin_module_names
def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()
p = Popen(['cp', '-i', 'a.txt', 'b.txt'],stdin=PIPE, stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.start()
try:
    line = q.get_nowait()
except Empty:
    print('no output yet')
else:
    pass
Popen.communicate will run the subprocess to completion, so you can't call it more than once. You could use the stdin and stdout attributes directly, although that's risky as you could deadlock if the process uses block buffering or the buffers fill up:
stdout_val = proc.stdout.readline()
print stdout_val
proc.stdin.write('y\n')
As there is a risk of deadlock and because this may not work if the process uses block buffering, you would do well to consider using the pexpect package instead.
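If pexpect is not an option, the read-then-write sequence can be sketched with the standard library alone, with the caveats already mentioned. The sketch below assumes the child flushes its prompt before waiting for input and that the prompt arrives in a single read; a prompt usually has no trailing newline, so it uses os.read on the pipe's file descriptor instead of readline(). The function name answer_prompt is hypothetical.

```python
import os
import subprocess
import sys


def answer_prompt(cmd, reply):
    """Start cmd, read its first chunk of output (the prompt), then send reply.

    Returns (prompt, remaining_output, returncode). reply must be bytes.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # os.read returns as soon as some bytes are available, so a prompt
    # without a newline is not stuck behind readline().
    prompt = os.read(proc.stdout.fileno(), 1024)
    # communicate() is called exactly once: it sends the reply, closes
    # stdin, and collects the rest of the output.
    rest, _ = proc.communicate(reply)
    return prompt, rest, proc.returncode
```

In a real script, the prompt would be shown to the user between the os.read and the communicate call, and the user's answer passed as reply. If the child never prints anything and keeps running, the os.read call blocks, which is the case pexpect's timeouts are designed to handle.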
I don't have a technical answer to this question. More of just a solution. It has something to do with the way the process waits for the input, and once you communicate with the process, a None input is enough to close the process.
For your cp example, what you can do is check the return code immediately with proc.poll(). If the return value is None, you might assume it is trying to wait for input and can ask your user a question. You can then pass the response to the process via proc.communicate(response). It will then pass the value and proceed with the process.
Maybe someone else can chime in with a more technical reason why an initial communicate with a None value closes the process.
I have a question regarding subprocess.Popen():
If the command being executed runs in a while loop, is there any way for subprocess.Popen() to detect that and exit after printing the first output?
Or is there any other way to do this and print the result?
As you see the following program executed on a linux machine just keeps on executing:
>>> import os
>>> import subprocess as sp
>>> p = sp.Popen("yes", stdout=sp.PIPE)
>>> result = p.communicate()[0]
The communicate method is only useful if the program being called is expected to terminate soon with relatively little output. In the case of your example, the "yes" program never terminates, so communicate never completes. In order to deal with subprocesses which execute indefinitely, and may produce a lot of output, you will need to include a loop which repeatedly calls p.poll() until p.poll() returns a value other than None, which would indicate that the process has terminated. While in this loop you should read from p.stdout and p.stderr to consume any output from the program. If you don't consume the output, the buffers may fill up and cause the program to block waiting to be able to write more output.
import subprocess
import time
p = subprocess.Popen("yes", stdout=subprocess.PIPE)
result = b""
start_time = time.time()
while p.poll() is None:
    # read() blocks until 8192 bytes or EOF arrive; "yes" produces output continuously
    result += p.stdout.read(8192)
    time.sleep(1)
    if (time.time() - start_time) > 5:
        print("Timeout")
        break
print(result)
Note that without the time check, the loop above would run indefinitely, because the "yes" subprocess it reads from never exits on its own. The time check jumps out of the loop once enough time has passed, but the subprocess itself keeps running until you kill it, for example with p.kill().
If you are certain that your subprocess will terminate of its own accord, you can simply call communicate() and get back the output, but this does not work in your example for the reasons I explain above.