Using apply_async with callback function for a pool of processes - python

I am trying to understand how multiprocessing pools work. In the following program I create a pool of 4 processes and call apply_async with a callback function that should update a list called result_list:
import Queue
from multiprocessing import Process
from multiprocessing import Pool

result_list = []

def foo_pool(q):  # Function for each process
    print "foo_pool"
    if q.qsize() > 0:
        number = q.get()
        return number * 2

def log_result(result):
    # This is called whenever foo_pool(i) returns a result.
    # result_list is modified only by the main process, not the pool workers.
    result_list.append(result)

if __name__ == "__main__":
    q = Queue.Queue()
    for i in range(4):
        q.put(i + 1)  # Put 1..4 in the queue
    p = Pool(4)
    p.apply_async(foo_pool, args=(q,), callback=log_result)
I realize I don't need to use a queue here. But I am testing this for another program which requires me to use a queue.
When I run the program, the function foo_pool is not being called. The print statement print "foo_pool" does not execute. Why is this?

Roughly speaking, apply_async only schedules the task asynchronously; it does not wait for it to run. You need to call p.close() and p.join() so the main process waits for the scheduled work to finish, or keep the result object (r = p.apply_async(...)) and call r.get().
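A minimal sketch of that pattern, shown in Python 3 with the queue left out so the example stands on its own (the doubled values and the 1..4 inputs are just illustrative):
import multiprocessing

result_list = []

def foo_pool(number):
    return number * 2

def log_result(result):
    # Runs in the main process whenever a worker finishes a task.
    result_list.append(result)

if __name__ == "__main__":
    p = multiprocessing.Pool(4)
    for i in range(1, 5):
        p.apply_async(foo_pool, args=(i,), callback=log_result)
    p.close()  # no more tasks will be submitted
    p.join()   # block until every scheduled task has finished
    print(result_list)  # e.g. [2, 4, 6, 8], order may vary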

Related

Automatically restarting Python sub-processes using identical arguments

I have a Python script which starts a series of sub-processes. They need to run "forever", but they occasionally die or get killed. When this happens I need to restart the process with the same arguments as the one which died.
This is a very simplified version:
[edit: this is the less simplified version, which includes "restart" code]
import multiprocessing
import time
import random

def printNumber(number):
    print("starting :", number)
    while random.randint(0, 5) > 0:
        print(number)
        time.sleep(2)

if __name__ == '__main__':
    children = []  # list of running child processes
    args = {}      # dictionary mapping pid -> argument the process was started with

    for processNumber in range(10, 15):
        p = multiprocessing.Process(
            target=printNumber,
            args=(processNumber,)
        )
        children.append(p)
        p.start()
        args[p.pid] = processNumber

    while True:
        time.sleep(1)
        for n, p in enumerate(children):
            if not p.is_alive():
                # get the parameters the dead child was started with
                pidArgs = args[p.pid]
                del args[p.pid]
                print("n,args,p: ", n, pidArgs, p)
                children.pop(n)
                # start a new process with the same args
                p = multiprocessing.Process(
                    target=printNumber,
                    args=(pidArgs,)
                )
                children.append(p)
                p.start()
                args[p.pid] = pidArgs
I have updated the example to illustrate how I want the processes to be restarted if one crashes/is killed/etc., keeping track of which pid was started with which args.
Is this the "best" way to do this, or is there a more "python" way of doing this?
I think I would create a separate thread for each Process and use a ProcessPoolExecutor. Executors have a useful method, submit, which returns a Future. You can wait on each Future and re-launch the Executor when the Future is done. Arguments to the function are kept as instance attributes, so restarting is just a simple loop.
import threading
from concurrent.futures import ProcessPoolExecutor
import time
import random
import traceback

def printNumber(number):
    print("starting :", number)
    while random.randint(0, 5) > 0:
        print(number)
        time.sleep(2)

class KeepRunning(threading.Thread):
    def __init__(self, func, *args, **kwds):
        self.func = func
        self.args = args
        self.kwds = kwds
        super().__init__()

    def run(self):
        while True:
            with ProcessPoolExecutor(max_workers=1) as pool:
                future = pool.submit(self.func, *self.args, **self.kwds)
                try:
                    future.result()
                except Exception:
                    traceback.print_exc()

if __name__ == '__main__':
    for process_number in range(10, 15):
        keep = KeepRunning(printNumber, process_number)
        keep.start()

    while True:
        time.sleep(1)
At the end of the program is a loop to keep the main thread running. Without that, the program will attempt to exit while your Processes are still running.
For the example you provided I would just remove the exit condition from the while loop and change it to while True.
As you said, though, the actual code is more complicated (why didn't you post that?). If the process gets terminated by, let's say, an exception, just put the code inside a try/except block, and then put that block in an infinite loop (see the sketch below).
I hope this is what you are looking for; it seems to be the right way to do it given the goal and the information you provided.
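A minimal sketch of that idea, assuming the real work lives in a hypothetical do_work() function that may raise (note this only handles exceptions raised inside the process itself, not a child being killed externally):
import time
import traceback

def do_work(number):
    # placeholder for the real, more complicated work
    raise RuntimeError("something went wrong with {}".format(number))

def keep_running(number):
    while True:                    # restart forever
        try:
            do_work(number)
        except Exception:
            traceback.print_exc()  # log the failure...
            time.sleep(1)          # ...then loop around and retry

if __name__ == '__main__':
    keep_running(42)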
Instead of just starting the process immediately, you can save the list of processes and their arguments, and create another process that checks they are alive.
For example:
# assumes printNumber and the imports (multiprocessing, time) from the question above
if __name__ == '__main__':
    process_list = []
    for processNumber in range(5):
        process = multiprocessing.Process(
            target=printNumber,
            args=(processNumber,)
        )
        process_list.append((process, (processNumber,)))  # keep the args next to the process
        process.start()

    while True:
        time.sleep(1)
        # iterate over a copy so process_list can be modified inside the loop
        for running_process, process_args in list(process_list):
            if not running_process.is_alive():
                new_process = multiprocessing.Process(target=printNumber, args=process_args)
                process_list.remove((running_process, process_args))  # remove terminated process
                process_list.append((new_process, process_args))
                new_process.start()
I must say that I'm not sure Python is the best place to do this; you may want to look at scheduler services like Jenkins or something similar.

How do I share data between processes in python?

Here is a simple example:
from collections import deque
from multiprocessing import Process

global_dequeue = deque([])

def push():
    global_dequeue.append('message')

p = Process(target=push)
p.start()

def pull():
    print(global_dequeue)

pull()
The output is deque([]).
If I were to call the push function directly, not in a separate process, the output would be deque(['message']).
How can I get the message into the deque, but still run the push function in a separate process?
You can share data by using a multiprocessing.Queue object, which is designed for passing data between processes:
from multiprocessing import Process, Queue
import time

def push(q):  # the Queue is passed to the function as an argument
    for i in range(10):
        q.put(str(i))  # put an element in the Queue
        time.sleep(0.2)
    q.put("STOP")  # put a poison pill to stop taking elements from the Queue in the master

if __name__ == "__main__":
    q = Queue()  # create a Queue instance
    p = Process(target=push, args=(q,))  # create the Process
    p.start()  # start it
    while True:
        x = q.get()
        if x == "STOP":
            break
        print(x)
    p.join()  # join the process so the master continues after it exits
    print("Finish")
Let me know if it helped; feel free to ask questions.
You can also use Managers to achieve this (a short sketch follows the links below):
Python 2: https://docs.python.org/2/library/multiprocessing.html#managers
Python 3: https://docs.python.org/3.8/library/multiprocessing.html#managers
Example of usage:
https://pymotw.com/2/multiprocessing/communication.html#managing-shared-state
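For completeness, a minimal sketch of the Manager approach, with a manager-backed list standing in for the deque (names here are illustrative):
from multiprocessing import Process, Manager

def push(shared_list):
    # appends go through the manager process, so the parent sees them
    shared_list.append('message')

if __name__ == '__main__':
    with Manager() as manager:
        shared_list = manager.list()  # proxy to a list living in the manager process
        p = Process(target=push, args=(shared_list,))
        p.start()
        p.join()
        print(list(shared_list))  # ['message']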

How to tell if an apply_async function has started or if it's still in the queue with multiprocessing.Pool

I'm using python's multiprocessing.Pool and apply_async to call a bunch of functions.
How can I tell whether a function has started processing by a member of the pool or whether it is sitting in a queue?
For example:
import multiprocessing
import time

def func(t):
    # take some time processing
    print 'func({}) started'.format(t)
    time.sleep(t)

pool = multiprocessing.Pool()
results = [pool.apply_async(func, [t]) for t in [100]*50]  # adds 50 func calls to the queue
For each AsyncResult in results you can call ready() or get(0) to see if the func finished running. But how do you find out whether the func started but hasn't finished yet?
i.e. for a given AsyncResult object (i.e. a given element of results) is there a way to see whether the function has been called or if it's sitting in the pool's queue?
First, remove completed jobs from the results list:
results = [r for r in results if not r.ready()]
The number of jobs still pending is the length of the results list:
pending = len(results)
And the number pending but not yet started is the total pending minus the pool size:
not_started = pending - pool_size
pool_size will be multiprocessing.cpu_count() if the Pool is created with the default argument, as you did.
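Putting those pieces together into a small sketch (it assumes every worker in the pool is currently busy, which holds while the backlog is longer than the pool):
import multiprocessing

pool_size = multiprocessing.cpu_count()  # size of a Pool() created with default arguments

def count_not_started(results):
    # drop finished jobs, then estimate how many are queued but not yet running
    still_pending = [r for r in results if not r.ready()]
    return max(0, len(still_pending) - pool_size)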
UPDATE:
After initially misunderstanding the question, here's a way to do what OP was asking about.
I suspect this functionality could be added to the Pool class without too much trouble, because AsyncResult is implemented by Pool with a Queue. That queue could also be used internally to indicate whether a task has started or not.
But here's a way to implement it using Pool and Pipe. NOTE: this doesn't work in Python 2.x (not sure why); tested in Python 3.8.
import multiprocessing
import time
import os

def worker_function(pipe):
    pipe.send('started')
    print('[{}] started pipe={}'.format(os.getpid(), pipe))
    time.sleep(3)
    pipe.close()

def test():
    pool = multiprocessing.Pool(processes=2)
    print('[{}] pool={}'.format(os.getpid(), pool))
    workers = []
    for x in range(1, 4):
        parent, child = multiprocessing.Pipe()
        pool.apply_async(worker_function, (child,))
        worker = {'name': 'worker{}'.format(x), 'pipe': parent, 'started': False}
        workers.append(worker)
    pool.close()

    while True:
        for worker in workers:
            if worker.get('started'):
                continue
            pipe = worker.get('pipe')
            if pipe.poll(0.1):
                message = pipe.recv()
                print('[{}] {} says {}'.format(os.getpid(), worker.get('name'), message))
                worker['started'] = True
                pipe.close()

        count_in_queue = len(workers)
        for worker in workers:
            if worker.get('started'):
                count_in_queue -= 1
        print('[{}] count_in_queue = {}'.format(os.getpid(), count_in_queue))
        if not count_in_queue:
            break
        time.sleep(0.5)

    pool.join()

if __name__ == '__main__':
    test()

process pool: adding a process automatically once one of the process is completed

I have a function which creates a "pool" of processes and executes each process in the pool.
def sendAndExecutePybotTests(poolProcessNum):
    fullList = _generatePybotList()
    print fullList
    pool = [fullList[i:i+int(poolProcessNum)] for i in range(0, len(fullList), int(poolProcessNum))]
    for chunk in pool:
        procs = []
        for executeLine in chunk:
            proc = Process(target=_executePybotTest, args=(executeLine,))
            procs.append(proc)
            # time interval = 1 second for each suite
            time.sleep(1)
            proc.start()
        for proc in procs:
            proc.join()
And _executePybotTest, which just calls a subprocess to execute a command:
def _executePybotTest(executeLine):
    subprocess.call(executeLine, shell=True)
I am using this to run automation tests in parallel. But because each chunk of processes is joined, it waits for all the processes in that chunk to finish before moving on to the other items waiting to be executed.
I was looking into implementing a queue and automatically executing the next item in the queue as soon as one of the processes in the pool finishes, but I'm not sure how to go about doing that.
I would use multiprocessing to create/manage the Pool of processes, and use Pool.map to distribute the list of tests to the processes in the pool. Then each process executes a test as it receives one. The map method handles distributing the tasks to each process in the pool for you, so you don't have to implement it yourself.
from multiprocessing import Pool

def execute_test(input):
    print("executing test " + input)

if __name__ == "__main__":
    full_list = _generatePybotList()
    p = Pool(poolProcessNum)
    p.map(execute_test, full_list)
    p.close()
    p.join()

Issue with Pool and Queue of multiprocessing module in Python

I am new to multiprocessing in Python, and I wrote the tiny script below:
import multiprocessing
import os

def task(queue):
    print(100)

def run(pool):
    queue = multiprocessing.Queue()
    for i in range(os.cpu_count()):
        pool.apply_async(task, args=(queue,))

if __name__ == '__main__':
    multiprocessing.freeze_support()
    pool = multiprocessing.Pool()
    run(pool)
    pool.close()
    pool.join()
I am wondering why the task() method is not executed and there is no output after running this script. Could anyone help me?
It is running, but it's dying with an error outside the main thread, and so you don't see the error. For that reason, it's always good to .get() the result of an async call, even if you don't care about the result: the .get() will raise the error that's otherwise invisible.
For example, change your loop like so:
tasks = []
for i in range(os.cpu_count()):
    tasks.append(pool.apply_async(task, args=(queue,)))
for t in tasks:
    t.get()
Then the new t.get() will blow up, ending with:
RuntimeError: Queue objects should only be shared between processes through inheritance
In short, passing Queue objects to Pool methods isn't supported.
But you can pass them to multiprocessing.Process(), or to a Pool initialization function. For example, here's a way to do the latter:
import multiprocessing
import os

def pool_init(q):
    global queue  # make queue global in workers
    queue = q

def task():
    # can use `queue` here if you like
    print(100)

def run(pool):
    tasks = []
    for i in range(os.cpu_count()):
        tasks.append(pool.apply_async(task))
    for t in tasks:
        t.get()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=pool_init, initargs=(queue,))
    run(pool)
    pool.close()
    pool.join()
On Linux-y systems, you can - as the original error message suggested - use process inheritance instead (but that's not possible on Windows).
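And for the first alternative mentioned above, a minimal sketch of passing the Queue straight to multiprocessing.Process(), which works on Windows as well:
import multiprocessing

def task(queue):
    queue.put(100)  # the Queue argument is fine here, unlike with Pool methods

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=task, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    for _ in procs:
        print(queue.get())  # one result per process
    for p in procs:
        p.join()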
