I am new to multiprocessing in Python, and I wrote the tiny script below:
import multiprocessing
import os

def task(queue):
    print(100)

def run(pool):
    queue = multiprocessing.Queue()
    for i in range(os.cpu_count()):
        pool.apply_async(task, args=(queue, ))

if __name__ == '__main__':
    multiprocessing.freeze_support()
    pool = multiprocessing.Pool()
    run(pool)
    pool.close()
    pool.join()
I am wondering why the task() function is never executed and there is no output after running this script. Could anyone help me?
It is running, but it's dying with an error outside the main thread, and so you don't see the error. For that reason, it's always good to .get() the result of an async call, even if you don't care about the result: the .get() will raise the error that's otherwise invisible.
For example, change your loop like so:
tasks = []
for i in range(os.cpu_count()):
    tasks.append(pool.apply_async(task, args=(queue,)))
for t in tasks:
    t.get()
Then the new t.get() will blow up, ending with:
RuntimeError: Queue objects should only be shared between processes through inheritance
In short, passing Queue objects to Pool methods isn't supported.
But you can pass them to multiprocessing.Process(), or to a Pool initialization function. For example, here's a way to do the latter:
import multiprocessing
import os

def pool_init(q):
    global queue  # make queue global in workers
    queue = q

def task():
    # can use `queue` here if you like
    print(100)

def run(pool):
    tasks = []
    for i in range(os.cpu_count()):
        tasks.append(pool.apply_async(task))
    for t in tasks:
        t.get()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=pool_init, initargs=(queue,))
    run(pool)
    pool.close()
    pool.join()
On Linux-y systems you can, as the original error message suggested, use process inheritance instead (but that's not possible on Windows).
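Not from the original answer, but here is a rough sketch of the other option mentioned above, passing the queue straight to multiprocessing.Process() (the worker body is made up for illustration):

import multiprocessing
import os

def task(queue):
    # a Queue passed to Process() directly is fine; only Pool methods reject it
    queue.put(os.getpid())

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=task, args=(queue,))
             for _ in range(os.cpu_count())]
    for p in procs:
        p.start()
    for _ in procs:
        print(queue.get())  # one item per worker
    for p in procs:
        p.join()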
I'm using python's multiprocessing.Pool and apply_async to call a bunch of functions.
How can I tell whether a function has started processing by a member of the pool or whether it is sitting in a queue?
For example:
import multiprocessing
import time

def func(t):
    # take some time processing
    print 'func({}) started'.format(t)
    time.sleep(t)

pool = multiprocessing.Pool()
results = [pool.apply_async(func, [t]) for t in [100]*50]  # adds 50 func calls to the queue
For each AsyncResult in results you can call ready() or get(0) to see if the func finished running. But how do you find out whether the func started but hasn't finished yet?
That is, for a given AsyncResult object (i.e. a given element of results), is there a way to see whether the function has been called or whether it is still sitting in the pool's queue?
First, remove completed jobs from the results list:
results = [r for r in results if not r.ready()]
The number of jobs still pending is the length of the results list:
pending = len(results)
And the number pending but not yet started is the total pending minus the pool size:
not_started = pending - pool_size
pool_size will be multiprocessing.cpu_count() if the Pool was created with the default arguments, as you did.
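Putting those steps together, a rough sketch (assuming pool_size was recorded when the pool was created with its defaults):

pool_size = multiprocessing.cpu_count()           # default Pool size

results = [r for r in results if not r.ready()]   # drop finished jobs
pending = len(results)                            # still running or queued
not_started = max(0, pending - pool_size)         # still waiting in the queue
print('pending={}, not started={}'.format(pending, not_started))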
UPDATE:
After initially misunderstanding the question, here's a way to do what OP was asking about.
I suspect this functionality could be added to the Pool class without too much trouble because AsyncResult is implemented by Pool with a Queue. That queue could also be used internally to indicate whether started or not.
But here's a way to implement it using Pool and Pipe. NOTE: this doesn't work in Python 2.x -- not sure why. Tested in Python 3.8.
import multiprocessing
import time
import os

def worker_function(pipe):
    pipe.send('started')
    print('[{}] started pipe={}'.format(os.getpid(), pipe))
    time.sleep(3)
    pipe.close()

def test():
    pool = multiprocessing.Pool(processes=2)
    print('[{}] pool={}'.format(os.getpid(), pool))
    workers = []
    for x in range(1, 4):
        parent, child = multiprocessing.Pipe()
        pool.apply_async(worker_function, (child,))
        worker = {'name': 'worker{}'.format(x), 'pipe': parent, 'started': False}
        workers.append(worker)
    pool.close()
    while True:
        for worker in workers:
            if worker.get('started'):
                continue
            pipe = worker.get('pipe')
            if pipe.poll(0.1):
                message = pipe.recv()
                print('[{}] {} says {}'.format(os.getpid(), worker.get('name'), message))
                worker['started'] = True
                pipe.close()
        count_in_queue = len(workers)
        for worker in workers:
            if worker.get('started'):
                count_in_queue -= 1
        print('[{}] count_in_queue = {}'.format(os.getpid(), count_in_queue))
        if not count_in_queue:
            break
        time.sleep(0.5)
    pool.join()

if __name__ == '__main__':
    test()
I wrote a little multi-host online scanner. My question is: is the code correct? The program does what it should, but when I look at the top command in the terminal, it shows me a lot of zombie threads from time to time.
The code for the scan and threader functions:
def scan(queue):
    conf.verb = 0
    while True:
        try:
            host = queue.get()
            if host is None:
                sys.exit(1)
            icmp = sr1(IP(dst=host)/ICMP(), timeout=3)
            if icmp:
                with Print_lock:
                    print("Host: {} online".format(host))
                    saveTofile(RngFile, host)
        except Empty:
            print("Done")
        queue.task_done()

def threader(queue, hostlist):
    threads = []
    max_threads = 223
    for i in range(max_threads):
        thread = Thread(target=scan, args=(queue,))
        thread.start()
        threads.append(thread)
    for ip in hostlist:
        queue.put(ip)
    queue.join()
    for i in range(max_threads):
        queue.put(None)
    for thread in threads:
        thread.join()
P.S. Sorry for my terrible english
If you have a lot more threads than you have cores, you aren't really getting any benefit from spawning them, and even worse, Python has the global interpreter lock, so you don't get real multithreading unless you use multiple processes. Use multiprocessing and set max_threads to multiprocessing.cpu_count().
Even better, you could use a pool.
from multiprocessing import Pool, cpu_count

with Pool(cpu_count()) as p:
    results = p.map(scan, hostlist)
    # change scan to take a host directly instead of reading from a queue
And that's it: no messing with queues and filling them with None to make sure you kill all your processes.
I should add, make sure you create your process pool inside the main module! Your code in its entirety should look like this:
from multiprocessing import Pool, cpu_count

def scan(host):
    ...  # whatever

if __name__ == "__main__":
    with Pool(cpu_count()) as p:
        results = p.map(scan, hostlist)
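Not part of the original answer, but here is a rough sketch of what scan could look like once it takes a host directly, reusing the scapy calls from the question (the address range is made up for illustration):

from multiprocessing import Pool, cpu_count
from scapy.all import IP, ICMP, sr1, conf

def scan(host):
    conf.verb = 0
    reply = sr1(IP(dst=host)/ICMP(), timeout=3)
    return host if reply else None  # the pool collects the return values

if __name__ == "__main__":
    hostlist = ["192.168.1.{}".format(i) for i in range(1, 255)]  # hypothetical range
    with Pool(cpu_count()) as p:
        online = [h for h in p.map(scan, hostlist) if h]
    print("Online hosts:", online)

Because each result comes back to the parent, there is no need for Print_lock or for calling saveTofile inside the workers; the parent can write everything out after the map finishes.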
I am trying to understand how multiprocessing pools work. In the following program I create a pool of 4 processes,
and I call apply_async with a callback function that should update a list called result_list:
import Queue
from multiprocessing import Process
from multiprocessing import Pool

result_list = []

def foo_pool(q):  # function run by each pool worker
    print "foo_pool"
    if q.qsize() > 0:
        number = q.get()
        return number * 2

def log_result(result):
    # This is called whenever foo_pool(i) returns a result.
    # result_list is modified only by the main process, not the pool workers.
    result_list.append(result)

if __name__ == "__main__":
    q = Queue.Queue()
    for i in range(4):
        q.put(i + 1)  # put 1..4 in the queue
    p = Pool(4)
    p.apply_async(foo_pool, args=(q, ), callback=log_result)
I realize I don't need to use a queue here. But I am testing this for another program which requires me to use a queue.
When I run the program, the function foo_pool is not being called. The print statement print "foo_pool" does not execute. Why is this?
Roughly speaking, apply_async only schedules the async task; it does not wait for it to run. You need to call p.close() and p.join() so the main process waits for the work to finish, or call r = p.apply_async() and then r.get().
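For example, a minimal sketch of that pattern (without the original queue argument; sharing a queue with Pool workers needs a multiprocessing.Queue passed through an initializer, as discussed in the first answer above):

from multiprocessing import Pool

result_list = []

def foo_pool(number):
    return number * 2

def log_result(result):
    result_list.append(result)

if __name__ == "__main__":
    p = Pool(4)
    r = p.apply_async(foo_pool, args=(1,), callback=log_result)
    r.get()   # blocks until the result is ready and re-raises any worker error
    p.close()
    p.join()  # close()+join() make the script wait for the pool to finish
    print(result_list)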
I am running the following (example) code:
from multiprocessing import Pool

def f(x):
    return x*x

pool = Pool(processes=4)
print pool.map(f, range(10))
However, the code never finishes. What am I doing wrong?
The line
pool = Pool(processes=4)
completes successfully; it appears to stop at the last line. Not even pressing Ctrl+C interrupts the execution. I am running the code inside an IPython console in Spyder.
from multiprocessing import Pool

def f(x):
    return x * x

def main():
    pool = Pool(processes=3)  # set the maximum number of worker processes to 3
    result = pool.map(f, range(10))
    pool.close()
    pool.join()
    print(result)
    print('end')

if __name__ == "__main__":
    main()
The key step is to call pool.close() and pool.join() after the processes have finished; otherwise the pool is not released.
Also, you should create the pool in the main process by putting the code inside the if __name__ == "__main__": guard.
Your Pool constructor is throwing the interpreter off into a process-producing factory for some reason.
You first need to stop all the processes that are now running, and there will be tons of them. If you bring up Task Manager you will see tons of rogue python.exe tasks. To kill them in bulk, try:
taskkill /F /IM python.exe
You may need to do the above a couple of times and make sure Task Manager does not show any more python.exe tasks. This will also kill your Spyder instance, so make sure you save first.
Now change your code to the following:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(4)
    print pool.map(f, range(10))
Note that I have removed the processes named argument.
This problem seems to have been eluding me: all the solutions are more like workarounds and add quite a bit of complexity to the code.
Since it's been a good while since any posts regarding this have been made, are there any simple solutions to the following: upon detecting a keyboard interrupt, cleanly exit all the child processes and terminate the program?
The code below is a snippet of my multiprocess structure. I'd like to preserve as much as possible while adding the needed functionality:
from multiprocessing import Pool, Lock
import time

def multiprocess_init(l):
    global lock
    lock = l

def synchronous_print(i):
    with lock:
        print i
        time.sleep(1)

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(processes=5, initializer=multiprocess_init, initargs=(lock, ))
    for i in range(1, 20):
        pool.map_async(synchronous_print, [i])
    pool.close()  # necessary to prevent zombies
    pool.join()   # wait for all processes to finish
The short answer is to move to Python 3. Python 2 has multiple problems with thread/process synchronization that have been fixed in Python 3.
In your case, multiprocessing will doggedly recreate your child processes every time you send a keyboard interrupt, and pool.close will get stuck and never exit. You can reduce the problem by explicitly exiting the child process with os._exit and by waiting for the individual results of each async call, so that you don't get stuck in pool.close prison.
from multiprocessing import Pool, Lock
import time
import os

def multiprocess_init(l):
    global lock
    lock = l
    print("initialized child")

def synchronous_print(i):
    try:
        with lock:
            print(i)
            time.sleep(1)
    except KeyboardInterrupt:
        print("exit child")
        os._exit(2)

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(processes=5, initializer=multiprocess_init, initargs=(lock, ))
    results = []
    for i in range(1, 20):
        results.append(pool.map_async(synchronous_print, [i]))
    for result in results:
        print('wait result')
        result.wait()
    pool.close()  # necessary to prevent zombies
    pool.join()   # wait for all processes to finish
    print("Join completes")