I am running multiple subprocesses in parallel, but I need to lock each process until its subprocess produces an output (via the print function). The subprocesses run a Python script that has been packaged into an executable.
The code looks like this:
import multiprocessing as mp
import subprocess
import os
def main(args):
    l, inpath = args
    l.acquire()
    print "Running KNN.exe for files in %s" % os.path.normpath(inpath).split('\\')[-1]
    # Run KNN executable as a subprocess
    subprocess.call(os.path.join(os.getcwd(), "KNN.exe"))
    # This is where I want to wait for any output from the subprocess before releasing the lock
    l.release()
    # Here I would like to wait until the subprocess is done, then print that it is done
    l.acquire()
    print "Done %s" % os.path.normpath(inpath).split('\\')[-1]
    l.release()
if __name__ == "__main__":
    # Set working directory path containing input text file
    os.chdir(r"C:\Users\Patrick\Google Drive\KNN")
    # Get folder names in directory containing GCM input
    manager = mp.Manager()
    l = manager.Lock()
    gcm_dir = r"F:\FIDS_GCM_Data_CMIP5\UTRB\UTRB KNN-CAD\Input"
    paths = [(l, os.path.join(gcm_dir, folder)) for folder in os.listdir(gcm_dir)]
    # Set up multiprocessing pool
    p = mp.Pool(mp.cpu_count())
    # Map function through input paths
    p.map(main, paths)
So the goal is to hold the lock while the subprocess starts, until it produces its first output; at that point the lock can be released and the subprocess can continue until it is complete, after which I'd like to print that it is done.
My question is how can I wait for the single (and only) output from the subprocess before releasing the lock on the process (out of multiple)?
Additionally how can I wait for the process to terminate then print that it is complete?
Your code uses the call method, which already waits for the subprocess to finish (meaning all output has already been generated). I'm inferring from your question that you'd like to differentiate between when output is first written and when the subprocess is finished. Below is your code with my recommended modifications inline:
def main(args):
    l, inpath = args
    l.acquire()
    print "Running KNN.exe for files in %s" % os.path.normpath(inpath).split('\\')[-1]
    # Run KNN executable as a subprocess
    # Use the Popen constructor so we can interact with the process while it runs
    proc = subprocess.Popen(os.path.join(os.getcwd(), "KNN.exe"), stdout=subprocess.PIPE)
    # This is where I want to wait for any output from the subprocess before releasing the lock
    # Wait until the subprocess has written at least 1 byte to STDOUT (modify if you need different logic)
    proc.stdout.read(1)
    l.release()
    # Here I would like to wait until the subprocess is done, then print that it is done
    # communicate() waits for the process to exit and drains the remaining output,
    # which avoids the deadlock risk of a bare proc.wait() with a full pipe
    (proc_output, proc_error) = proc.communicate()
    l.acquire()
    print "Done %s" % os.path.normpath(inpath).split('\\')[-1]
    l.release()
Note that the above doesn't assume you want to do anything with the subprocess's output other than check that it has been generated. If you want to do anything less trivial than consuming one byte and discarding it, proc.stdout (which is a file object) gives you everything the subprocess writes while running.
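For instance, here is a minimal sketch (assuming the same KNN.exe in the working directory) that blocks on the first byte of output and then logs every remaining line until the process exits:
import os
import subprocess

proc = subprocess.Popen(os.path.join(os.getcwd(), "KNN.exe"), stdout=subprocess.PIPE)
# Block until the first byte of output appears (the lock would be released here)
first_byte = proc.stdout.read(1)
# The first byte belongs to the first line, so stitch it back on
print "first output: %s" % (first_byte + proc.stdout.readline()).rstrip()
# Consume the rest of the stream line by line until EOF
for line in iter(proc.stdout.readline, ''):
    print "output: %s" % line.rstrip()
proc.wait()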
Related
In your Documents folder, create a folder named temp:
/My Documents/temp
Save these few lines as a Python script named worker.py:
import time
from datetime import datetime

for i in range(10):
    print '%s...working on iteration %s' % (datetime.now(), i)
    time.sleep(0.2)

print '\nCompleted!\n'
Save the code below as caller.py:
import subprocess
cmd = ['python', 'worker.py']
stdout = subprocess.check_output(cmd)
print stdout
(Please note that both Python scripts were saved into the same folder.)
Now using the OS X Terminal or Windows CMD window change the current directory to the folder you created:
cd /My Documents/temp
Now run:
python caller.py
The process takes 2 seconds to complete. When completed it prints out the entire progress log all at once:
2018-01-20 07:52:14.399679...working on iteration 0
...
2018-01-20 07:52:16.216237...working on iteration 9
Completed!
Instead of getting the log printed all at once after the process has already completed, I would like a real-time progress update: every printed line from the process, at the moment it occurs.
So, when I run the python caller.py command, it should give me a line-by-line update happening in real time. How can I achieve that?
To get a real-time feed from the subprocess you can use this code in the caller.py
import subprocess

# Start worker process; -u makes the worker's stdout unbuffered
p = subprocess.Popen(['python', '-u', 'worker.py'], stdout=subprocess.PIPE)

# Loop until the worker has terminated and its output is drained
while True:
    # Get new line value (blocks until a line or EOF is available)
    l = p.stdout.readline()
    # Stop looping once the child has terminated and no output is left
    if not l and p.poll() is not None:
        break
    # Print the line, stripping the newline that print would otherwise duplicate
    print l.rstrip()
Note the -u passed to python in subprocess.Popen: the worker's stdout must be unbuffered so each line arrives as soon as it is printed.
https://docs.python.org/3/using/cmdline.html#cmdoption-u
With readline() you read a single line at a time from the subprocess output. Be aware that when the subprocess prints '\nCompleted!\n' you will read it across three loop iterations.
https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
In the example, the loop runs until the subprocess terminates.
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.poll
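As a variant (a sketch, not part of the original answer), you can let iter() supply the termination condition, which avoids polling manually and cannot drop buffered lines:
import subprocess

p = subprocess.Popen(['python', '-u', 'worker.py'], stdout=subprocess.PIPE)
# readline() returns '' only at EOF, i.e. once the worker has closed its stdout,
# so this loop ends naturally when the worker terminates
for line in iter(p.stdout.readline, ''):
    print line.rstrip()
p.wait()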
I read the question/answer/comments on A non-blocking read on a subprocess.PIPE in Python, but found it a bit lacking.
When I implemented the solution provided, I noticed that this approach works best when the sub-process ends on its own. But if the subprocess provides a stream of information and we are looking for a single match in the output, that approach doesn't fit my needs (specifically on Windows, if that matters).
Here is my sample:
File ping.py
import time

def main():
    for x in range(100):
        print x
        time.sleep(1)

if __name__ == '__main__':
    print("Starting")
    time.sleep(2)
    main()
File runner.py
import subprocess
import time
import sys
from Queue import Queue, Empty
from threading import Thread

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

# Start process we want to listen to
pPing = subprocess.Popen('ping.py',
                         shell=True,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         )

q = Queue()
t = Thread(target=enqueue_output, args=(pPing.stdout, q))
t.daemon = True
t.start()

# Make sure it's started
print("get the first line")
try:
    line = q.get()
except Empty:
    pass
else:
    print line.strip()

# Look for the 'magic' output
print("empty the queue")
while not q.empty():
    line = q.get_nowait().strip()
    if line == "3":
        print("got it!")
        sys.exit()
    else:
        print("not yet")
My expectation is that the runner will make sure the process has started, then wait for the magic output and stop, which it does. However, the longer the sub-process runs, the longer the runner runs; even though the 'magic' output comes relatively quickly, I have to wait until the subprocess ends before anything gets processed.
What am I missing?
OK, if I understand correctly what you are trying to do, the problem is that ping is still a child process of runner. While you can make the read calls non-blocking, the parent process will not actually exit while the child is still running. If you want runner to read the first line and the 'magic' output and then exit without waiting for the child to finish, you need ping to disassociate itself from the parent process.
Look at this code sample to see how that is done: A simple Unix/Linux daemon in Python. Of course you might skip the part where they close and re-open all the I/O streams.
On the same note, I am not sure leaving an open I/O stream connected to the parent will allow the parent to exit, so if that happens to be a problem you might have to figure out another way to exchange data.
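On Windows (which the question mentions), one hedged alternative to a full Unix-style daemon is to start the child with the DETACHED_PROCESS creation flag, so its lifetime is no longer tied to the parent's. This is a sketch under that assumption; the log file name is made up for illustration, and the child's output has to go somewhere other than a pipe back to the exiting parent:
import subprocess
import sys

# Windows-only: DETACHED_PROCESS (0x00000008) starts the child without a
# console and without tying its lifetime to the parent's
DETACHED_PROCESS = 0x00000008

# Send the child's output to a file (hypothetical name), since a pipe back
# to a parent that exits is no longer useful
log = open('ping_output.txt', 'w')
p = subprocess.Popen([sys.executable, 'ping.py'],
                     stdout=log,
                     creationflags=DETACHED_PROCESS)
print "started detached child with pid %s" % p.pid
# runner.py can now exit; ping.py keeps running and logging on its own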
I'm executing a function as a thread in Python. At the moment, the program waits for the function to finish and only then terminates.
My goal is to start the thread in the background and close the program that called it.
How can we do that? In the code below, the thread takes 30 minutes to execute. I want the main program to exit right after starting the thread and let the thread keep running in the background.
import threading

thread = threading.Thread(target=function_that_runs_for_30_min)
thread.start()
print "Thread Started"
quit()
You cannot do that directly. A thread is just a part of a process. Once the process exits, all the threads are gone. You need to create a background process to achieve that.
You cannot use the multiprocessing module either, because it is a package that supports spawning processes using an API similar to the threading module (emphasis mine). As such it has no provision to allow a process to run after the end of the calling one.
The only way I can imagine is to use the subprocess module to restart the script with a specific parameter. For a simple use case, adding a parameter is enough, for more complex command line parameters, the module argparse should be used. Example of code:
import subprocess
import sys
# only to wait some time...
import time

def f(name):
    "Function that could run in background for a long time (30')"
    time.sleep(5)
    print 'hello', name

if __name__ == '__main__':
    if (len(sys.argv) > 1) and (sys.argv[1] == 'SUB'):
        # Should be an internal execution: start the lengthy function
        f('bar')
    else:
        # normal execution: start a subprocess with the same script to launch the function
        p = subprocess.Popen("%s %s SUB" % (sys.executable, sys.argv[0]))
        # other processing...
        print 'END of normal process'
Execution:
C:\>python foo.py
END of normal process
C:\>
and five seconds later:
hello bar
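For more involved command lines, a sketch of the argparse variant mentioned above could look like the following (the --background flag name is an invention for illustration):
import argparse
import subprocess
import sys
import time

def f(name):
    "Function that could run in background for a long time (30')"
    time.sleep(5)
    print 'hello', name

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Hypothetical flag marking the re-launched background run
    parser.add_argument('--background', action='store_true')
    args = parser.parse_args()
    if args.background:
        f('bar')
    else:
        subprocess.Popen([sys.executable, sys.argv[0], '--background'])
        print 'END of normal process'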
I have a Python question:
Let's say I am using the subprocess library to run a program from the terminal, and I need to call the subprocess in a loop.
How can I:
1) stop subprocess from launching the program if it is already running, or...
2) find out whether the program is still running, so that I don't start a new subprocess?
You can check whether the process is running using Popen.poll. It will return None if the process is currently running; otherwise, it returns the exit code of the sub-process. For example:
import subprocess
import time

p = None
while True:
    if p and p.poll() is None:
        print("Process is currently running. Do nothing.")
    else:  # No process running
        if p:  # We've run the process at least once already, so get the result.
            ret = p.returncode
            print("Last run returned {}".format(ret))
        p = subprocess.Popen(["some_command", "some arg"])
    time.sleep(3)  # Wait a few seconds before iterating again.
So I'm trying to effectively create a "branch" in a pipe from subprocess. The idea is to load a file with Popen into a pipe's stdout. Then, I can send that stdout to two (or more) stdins. This works, more or less. The problem comes when a process needs to see an EOF. As far as I can tell, that happens when you use communicate(None) on a subprocess. However, it also seems to depend on the order in which I spawned the processes I'm trying to send data to.
#!/usr/bin/env python
from subprocess import *
import shutil
import os
import shlex

inSub = Popen(shlex.split('cat in.txt'), stdout=PIPE)
print inSub.poll()

queue = []
for i in range(0, 3):
    temp = Popen(['cat'], stdin=PIPE)
    queue = queue + [temp]

while True:
    # print 'hi'
    buf = os.read(inSub.stdout.fileno(), 10000)
    if buf == '':
        break
    for proc in queue:
        proc.stdin.write(buf)

queue[1].communicate()
print queue[1].poll()
As long as I use queue[1], things hang at the communicate() line. But if I use queue[2], things don't hang. What's going on? It shouldn't depend on the order the subprocesses were created, should it?
(The in.txt file can really be anything, it doesn't matter.)
I can't see any reason why it would be different for any one of the processes. In any case, closing the stdin pipes will cause Python to send the EOF, ending the processes:
...
while True:
    # print 'hi'
    buf = os.read(inSub.stdout.fileno(), 10000)
    if buf == '':
        break
    for proc in queue:
        proc.stdin.write(buf)

# Close every child's stdin so each one sees EOF and can finish
for proc in queue:
    proc.stdin.close()

queue[1].communicate()
...
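As a follow-up sketch (not part of the original answer), once every stdin is closed you can reap all three children instead of communicating with just one, assuming you don't need their output:
# After closing all the stdin pipes, wait for every child to exit
for proc in queue:
    proc.wait()
    print proc.poll()  # each child should now report exit code 0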