Wait for all subprocesses to complete in Python

I have a method that executes a command using subprocess.
I want to call that method n times and wait for all n subprocesses to complete.
For example:
import subprocess
class mysubprocess():
    def child_process(self,directory):
        self.process=subprocess.Popen('ls',cwd=directory)
    def execute(self):
        directory=['/home/suresh/Documents','/home/suresh/Downloads']
        for i in directory:
            print(i)
            self.child_process(directory)
        self.process.wait()
def main():
    myobject=mysubprocess()
    myobject.execute()
if __name__=='main':
    main()

You need to store references to the Popen objects so that you can call their wait() methods later.
(The code in the question overwrites self.process on every call, so only the last Popen object survives and only the last subprocess is waited on.)
import subprocess
class mysubprocess():
    def execute(self, directory_list):
        procs = []
        for d in directory_list:
            print(d)
            procs.append(subprocess.Popen('ls', cwd=d)) # <---
        for proc in procs:
            proc.wait()
def main():
    myobject = mysubprocess()
    myobject.execute(['/home/suresh/Documents','/home/suresh/Downloads'])
if __name__ == '__main__':
    main()
Other issues
The code passes the entire list (directory) to child_process instead of the current item (i).
The last if statement should compare __name__ with '__main__'.
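The same start-everything-then-wait pattern can be sketched as a small helper that also collects exit codes. This is only an illustration: the name run_all and the one-line child commands are assumptions made so the example runs anywhere, not part of the original answer.

```python
import subprocess
import sys

def run_all(commands):
    """Start every command, then wait for all of them; returns exit codes in order."""
    # Start all processes first so they run concurrently.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that process exits and returns its exit code.
    return [proc.wait() for proc in procs]

# Hypothetical commands: each child is a Python one-liner that exits with a chosen code.
commands = [[sys.executable, "-c", f"import sys; sys.exit({code})"] for code in (0, 1, 2)]
exit_codes = run_all(commands)
print(exit_codes)
```

Waiting in list order is fine here because the goal is only to return once every child has exited; the total time is bounded by the slowest child, not the sum.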

Related

How does the print() function impact a subprocess's life cycle in Python?

If I use the print() function in a subprocess, the subprocess terminates as soon as the main process terminates.
The following programs terminate at the same time.
# main.py
import time
from subprocess import Popen
if __name__ == '__main__':
    proc = Popen(['python', 'sub.py'])
# sub.py
import time
for i in range(10):
    time.sleep(1)
    print(i)
However, if I comment out the print() in sub.py, the subprocess continues after main terminates.
Also, if I redirect its stdout in main.py (see the following), the subprocess continues as well.
# main.py
import time
from subprocess import Popen
if __name__ == '__main__':
    with open('a.txt', 'w') as out:
        proc = Popen(['python', 'sub.py'],stdout=out)

Deleting NamedTemporaryFile in Python after a subprocess call

I don't want to delete the temp file until the subprocess finishes executing, so I invoke the subprocess script as follows:
import os
import tempfile
import subprocess
def main():
    with tempfile.NamedTemporaryFile("w", delete=False) as temp:
        temp.write("Hello World")
        temp.flush()
        print(f"Temp file is: {temp.name}")
        args = ["python3",
                os.path.dirname(__file__) + "/hello_world.py",
                "--temp-file", temp.name]
        subprocess.Popen(args)
    return
main()
hello_world.py
import argparse
import sys
def print_hello():
    print("Hello World")
    return
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="""Test case""")
    parser.add_argument('--temp-file',
                        required=True,
                        help='For test')
    args = parser.parse_args()
    print(args)
    print_hello()
    sys.exit(0)
I was hoping the temp file would be deleted once the subprocess finishes executing.
Do I need to manually delete the temp file in this case?
Calling subprocess.Popen() starts the process but does not wait for it to finish.
If you want to wait for the process to finish before exiting the with block, you can use subprocess.run() instead.
Edit: Per your comment, you don't want to wait for the process to finish. Since you are creating the file with delete=False, it won't be deleted when the file pointer is closed at the end of the with block, so you will need to manually delete the path, either in the parent or child process.
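If waiting for the child is acceptable, one way to do the manual cleanup in the parent is a try/finally around subprocess.run(). The sketch below is an assumption-laden illustration: run_with_temp_file is a made-up name, and the child is a stand-in one-liner that prints the file's contents rather than the hello_world.py script from the question.

```python
import os
import subprocess
import sys
import tempfile

def run_with_temp_file(text):
    """Write text to a temp file, let a child process read it, then delete the file."""
    with tempfile.NamedTemporaryFile("w", delete=False) as temp:
        temp.write(text)
    # The file is closed here but still exists on disk because delete=False.
    try:
        # Stand-in child (an assumption): prints the contents of the file it is given.
        child_code = "import sys; print(open(sys.argv[1]).read(), end='')"
        result = subprocess.run(
            [sys.executable, "-c", child_code, temp.name],
            capture_output=True, text=True, check=True,
        )
    finally:
        # Runs whether or not the child succeeded, so the file is always removed.
        os.remove(temp.name)
    return result.stdout, temp.name
```

Closing the file before starting the child also avoids the situation on Windows where a file still held open by the parent cannot be reopened by another process.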

How to run 10 python programs simultaneously?

I have ten scripts, a_1.py through a_10.py.
I want to run these 10 Python programs in parallel.
I tried:
from multiprocessing import Process
import os
def info(title):
    # I want to execute python program
    pass
def f(name):
    for i in range(1, 11):
        subprocess.Popen(['python3', f'a_{i}.py'])
if __name__ == '__main__':
    info('main line')
    p = Process(target=f)
    p.start()
    p.join()
But it doesn't work.
How do I solve this?
I would suggest using the subprocess module instead of multiprocessing:
import os
import subprocess
import sys
MAX_SUB_PROCESSES = 10
def info(title):
    print(title, flush=True)
if __name__ == '__main__':
    info('main line')
    # Create a list of subprocesses.
    processes = []
    for i in range(1, MAX_SUB_PROCESSES+1):
        pgm_path = f'a_{i}.py' # Path to Python program.
        # Pass the command as a list so it also works on POSIX without shell=True.
        command = [sys.executable, pgm_path, os.path.basename(pgm_path)]
        process = subprocess.Popen(command, bufsize=0)
        processes.append(process)
    # Wait for all of them to finish.
    for process in processes:
        process.wait()
    print('Done')
If you just need to run 10 external scripts (a_1.py ~ a_10.py) as separate processes, use the subprocess.Popen class:
import subprocess, sys
for i in range(1, 11):
    subprocess.Popen(['python3', f'a_{i}.py'])
# sys.exit() # optional
It's worth looking at the rich subprocess.Popen signature (you may find some useful params/options).
You can use a multiprocessing pool to run them concurrently.
import multiprocessing as mp
def worker(module_name):
    """ Executes a module externally with python """
    __import__(module_name)
    return
if __name__ == "__main__":
    max_processes = 5
    module_names = [f"a_{i}" for i in range(1, 11)]
    print(module_names)
    with mp.Pool(max_processes) as pool:
        pool.map(worker, module_names)
The max_processes variable is the maximum number of workers running at any given time. In other words, it's the number of processes spawned by your program. The call pool.map(worker, module_names) distributes the items in module_names across the available processes and calls worker on each one. We don't include the .py extension because we're running each module by importing it.
Note: This might not work if the code you want to run in your modules is contained inside if __name__ == "__main__" blocks. If that is the case, then my recommendation would be to move all the code in the if __name__ == "__main__" blocks of the a_{} modules into a main function. Additionally, you would have to change the worker to something like:
def worker(module_name):
    module = __import__(module_name) # Kind of like 'import module_name as module'
    module.main()
    return
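If refactoring the scripts is not an option, a possible alternative (not part of the answer above) is the standard library's runpy.run_path() with run_name="__main__", which executes a script file so that its if __name__ == "__main__" block does run. In this sketch the names worker and run_scripts and the default pool size are illustrative assumptions.

```python
import multiprocessing as mp
import runpy

def worker(script_path):
    """Run one script file as if it had been started with `python script_path`."""
    try:
        # run_name="__main__" makes `if __name__ == "__main__":` blocks execute.
        runpy.run_path(script_path, run_name="__main__")
    except SystemExit as exc:
        # A script that calls sys.exit() should not take the pool worker down.
        return exc.code
    return 0

def run_scripts(script_paths, max_processes=5):
    """Run the given scripts in a pool; returns one exit status per script."""
    with mp.Pool(max_processes) as pool:
        return pool.map(worker, script_paths)
```

It would be called from under an if __name__ == "__main__" guard, for example run_scripts([f"a_{i}.py" for i in range(1, 11)]). Note that here the file names do include the .py extension, because the scripts are executed by path rather than imported.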

terminate all processes in a Pool

I have a Python script that looks as follows:
import os
import sys
import tempfile
from multiprocessing import Pool
def runReport(a, b, c):
    # do task.
    temp_dir = tempfile.gettempdir()
    if (os.path.isfile(temp_dir + "/stop_check")):
        # How to terminate all processes in the pool here?
        pass
def runReports(args):
    return runReport(*args)
def main(argv):
    pool = Pool(4)
    args = []
    # Code to generate args. args is an array of tuples of form (a, b, c)
    pool.map(runReports, args)
if (__name__ == '__main__'):
    main(sys.argv[1:])
There is another python script that creates this file /tmp/stop_check.
When this file gets created, I need to terminate the Pool. How can I achieve this?
Only the parent process can terminate the pool. You're better off having the parent run a loop that checks for the existence of that file, rather than trying to have each child do it and then signal the parent somehow:
import os
import sys
import time
import tempfile
from multiprocessing import Pool
def runReport(*args):
    # do task
    pass
def runReports(args):
    return runReport(*args)
def main(argv):
    pool = Pool(4)
    args = []
    # Code to generate args. args is an array of tuples of form (a, b, c)
    result = pool.map_async(runReports, args)
    temp_dir = tempfile.gettempdir()
    while not result.ready():
        if os.path.isfile(temp_dir + "/stop_check"):
            pool.terminate()
            break
        result.wait(.5) # Wait a bit to avoid pegging the CPU. You can tune this value as you see fit.
if (__name__ == '__main__'):
    main(sys.argv[1:])
By using map_async instead of map, you're free to have the parent use a loop to check for the existence of the file and terminate the pool when necessary. Do note that using terminate to kill the children means they won't get to do any cleanup at all, so you need to make sure none of them access resources that could be left in an inconsistent state if the process dies while using them.

Python: How to make program wait till function's or method's completion

Often there is a need for the program to wait for a function to complete its work. Sometimes it is the opposite: there is no need for the main program to wait.
I've put together a simple example. There are four buttons. Clicking each one calls the same calculate() function. The only difference is the way the function is called.
The "Call Directly" button calls the calculate() function directly. Since ': Function End' is printed only after the calculation output, it is evident that the program waits for calculate() to complete its job.
The "Call via Threading" button calls the same function, this time using the threading mechanism. Since the program prints the ': Function End' message immediately after the button is pressed, I can conclude that the program doesn't wait for calculate() to complete. How can I override this behavior? How can I make the program wait until calculate() is finished?
The "Call via Multiprocessing" button uses multiprocessing to call calculate().
Just like threading, multiprocessing doesn't wait for the function to complete. What statement do we have to add to make it wait?
The "Call via Subprocess" button doesn't do anything, since I couldn't figure out how to hook subprocess up to run an internal script function or method. It would be interesting to see how to do it...
Example:
import sys
from PyQt4 import QtCore, QtGui
app = QtGui.QApplication(sys.argv)
def calculate(listArg=None):
    print '\n\t Starting calculation...'
    m=0
    for i in range(50000000):
        m+=i
    print '\t ...calculation completed\n'
class Dialog_01(QtGui.QMainWindow):
    def __init__(self):
        super(Dialog_01, self).__init__()
        myQWidget = QtGui.QWidget()
        myBoxLayout = QtGui.QVBoxLayout()
        directCall_button = QtGui.QPushButton("Call Directly")
        directCall_button.clicked.connect(self.callDirectly)
        myBoxLayout.addWidget(directCall_button)
        Button_01 = QtGui.QPushButton("Call via Threading")
        Button_01.clicked.connect(self.callUsingThreads)
        myBoxLayout.addWidget(Button_01)
        Button_02 = QtGui.QPushButton("Call via Multiprocessing")
        Button_02.clicked.connect(self.callUsingMultiprocessing)
        myBoxLayout.addWidget(Button_02)
        Button_03 = QtGui.QPushButton("Call via Subprocess")
        Button_03.clicked.connect(self.callUsingSubprocess)
        myBoxLayout.addWidget(Button_03)
        myQWidget.setLayout(myBoxLayout)
        self.setCentralWidget(myQWidget)
        self.setWindowTitle('Dialog 01')
    def callUsingThreads(self):
        print '------------------------------- callUsingThreads() ----------------------------------'
        import threading
        self.myEvent=threading.Event()
        self.c_thread=threading.Thread(target=calculate)
        self.c_thread.start()
        print "\n\t\t : Function End"
    def callUsingMultiprocessing(self):
        print '------------------------------- callUsingMultiprocessing() ----------------------------------'
        from multiprocessing import Pool
        pool = Pool(processes=3)
        try: pool.map_async( calculate, ['some'])
        except Exception, e: print e
        print "\n\t\t : Function End"
    def callDirectly(self):
        print '------------------------------- callDirectly() ----------------------------------'
        calculate()
        print "\n\t\t : Function End"
    def callUsingSubprocess(self):
        print '------------------------------- callUsingSubprocess() ----------------------------------'
        import subprocess
        print '-missing code solution'
        print "\n\t\t : Function End"
if __name__ == '__main__':
    dialog_1 = Dialog_01()
    dialog_1.show()
    dialog_1.resize(480,320)
    sys.exit(app.exec_())
Use a queue: each thread when completed puts the result on the queue and then you just need to read the appropriate number of results and ignore the remainder:
#!python3.3
import queue # For Python 2.x use 'import Queue as queue'
import threading, time, random
def func(id, result_queue):
    print("Thread", id)
    time.sleep(random.random() * 5)
    result_queue.put((id, 'done'))
def main():
    q = queue.Queue()
    threads = [ threading.Thread(target=func, args=(i, q)) for i in range(5) ]
    for th in threads:
        th.daemon = True
        th.start()
    result1 = q.get()
    result2 = q.get()
    print("Second result: {}".format(result2))
if __name__=='__main__':
    main()
Documentation for Queue.get() (with no arguments it is equivalent to Queue.get(True, None)):
Queue.get([block[, timeout]])
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
How to wait until only the first thread is finished in Python
You can use the .join() method too.
what is the use of join() in python threading
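For the threading case in the question, the join() suggestion can be sketched as follows. This is a minimal Python 3 illustration, not the asker's PyQt code: calculate here is a shortened stand-in, and the results list is an assumption added so the value can be returned.

```python
import threading

def calculate(results):
    """Stand-in for the long-running calculation in the question."""
    results.append(sum(range(1_000_000)))

def call_using_threads():
    results = []
    worker = threading.Thread(target=calculate, args=(results,))
    worker.start()
    # join() blocks the calling thread until `worker` has finished.
    worker.join()
    return results[0]
```

For the multiprocessing button, the object returned by pool.map_async() offers wait() and get(), which block in a similar way. Keep in mind that blocking inside a GUI callback freezes the window until the work is done.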
I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool
def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array # "for example"
result_array = np.zeros("some shape") # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...) # Arguments to be passed into desired function.
multiple_results = []
for i in range(processes): # Executes each pool w/ option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,)+args)) # Submits each call without blocking.
results = np.array([res.get() for res in multiple_results]) # Retrieves results after
                                                            # every pool is finished!
for i in range(processes):
    result_array = result_array + results[i] # Combines all datasets!
The code basically runs the desired function in a set number of processes. You will have to make sure your function can distinguish between the processes (which is why I added the variable "option"). Additionally, it doesn't have to be an array that is populated at the end, but that's how I used it in my example. Hope this simplifies things or helps you better understand the power of multiprocessing in Python!
