I am running the following (example) code:
from multiprocessing import Pool

def f(x):
    return x*x

pool = Pool(processes=4)
print(pool.map(f, range(10)))
However, the code never finishes. What am I doing wrong?
The line
pool = Pool(processes=4)
completes successfully; it appears to hang on the last line. Not even pressing Ctrl+C interrupts the execution. I am running the code inside an IPython console in Spyder.
from multiprocessing import Pool

def f(x):
    return x * x

def main():
    pool = Pool(processes=3)  # set the maximum number of processes to 3
    result = pool.map(f, range(10))
    pool.close()
    pool.join()
    print(result)
    print('end')

if __name__ == "__main__":
    main()
The key step is to call pool.close() and pool.join() after the processes have finished; otherwise the pool is not released.
Also, you should create the pool in the main process by putting the code inside an if __name__ == "__main__": guard.
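On Python 3.3+ you can also let a with statement manage the pool for you. A minimal sketch, reusing the same f as above; note that Pool's context manager calls terminate() on exit, so collect the results inside the block:

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        result = pool.map(f, range(10))  # map() blocks, so results are ready here
    print(result)
    print('end')

Since pool.map() blocks until all results are ready, nothing is lost when the block exits and the pool is torn down.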
Your Pool constructor is throwing the interpreter off into a process-spawning loop for some reason.
You first need to stop all the processes that are now running, and there will be tons of them. If you bring up Task Manager you will see tons of rogue python.exe tasks. To kill them in bulk, try:
taskkill /F /IM python.exe
You may need to run the above a couple of times; make sure Task Manager no longer shows any python.exe tasks. This will also kill your Spyder instance, so make sure you save your work first.
Now change your code to the following:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(4)
    print(pool.map(f, range(10)))
Note that I have removed the processes named argument.
I am learning about Python multiprocessing and trying to understand how I can make my code wait for all processes to finish and then continue with the rest of the code. I thought the join() method should do the job, but the output of my code is not what I expected from using it.
Here is the code:
from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

def fun2():
    print('starting fun2')
    time.sleep(5)
    print('finishing fun2')

def fun3():
    print('starting fun3')
    print('finishing fun3')

if __name__ == '__main__':
    processes = []
    print('starting main')
    for i in [fun, fun2, fun3]:
        p = Process(target=i)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('finishing main')

g = 0
print("g", g)
I expected all processes under if __name__ == '__main__': to finish before the lines g = 0 and print("g", g) are called, so I expected output like this:
starting main
starting fun2
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
But the actual output indicates that there's something I don't understand about join() (or multiprocessing in general):
starting main
g 0
g 0
starting fun2
g 0
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
The question is: how do I write the code so that all processes finish first and only then does the rest of the code (without multiprocessing) continue, giving the former output? I run the code from the command prompt on Windows, in case it matters.
On waiting for the processes to finish:
You can just join each Process in your list, something like this:
import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    processes = [
        multiprocessing.Process(target=func1),
        multiprocessing.Process(target=func2),
        multiprocessing.Process(target=func3),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()
But if you're thinking about adding more complexity to your processing, try using a Pool:
import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    result = []
    with multiprocessing.Pool() as pool:
        result.append(pool.apply_async(func1))
        result.append(pool.apply_async(func2))
        result.append(pool.apply_async(func3))
        for r in result:
            r.wait()

if __name__ == '__main__':
    main()
More info on Pool can be found in the multiprocessing documentation.
On why g 0 prints multiple times:
This is happening because you're using spawn or forkserver as the start method for your Processes, and the g = 0 and print("g", g) lines are outside any function and outside the __main__ if block.
From the docs:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
(...)
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
Similar restrictions apply if a pool or manager is created in the main module.
The child interpreter is basically running your top-level code again because it imports your .py file as a module.
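As a concrete illustration of that advice, here is a trimmed-down sketch of the question's script: keep the module-level statements inside the __main__ guard and the spawned children will not re-run them.

from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

if __name__ == '__main__':
    print('starting main')
    p = Process(target=fun)
    p.start()
    p.join()
    print('finishing main')
    g = 0
    print("g", g)  # now runs exactly once, after the child has finished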
I have problems with Python multiprocessing.
Python version 3.6.6
Using the Spyder IDE on Windows 7
1.
The queue is not being populated: every time I try to read it, it's empty. Somewhere I read that I have to get() from it before the processes join(), but that did not solve it.
from multiprocessing import Process, Queue

# define an example function
def fnc(i, output):
    output.put(i)

if __name__ == '__main__':
    # Define an output queue
    output = Queue()

    # Set up a list of processes that we want to run
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    print('created')

    # Run processes
    for p in processes:
        p.start()
    print('started')

    # Exit the completed processes
    for p in processes:
        p.join()

    print(output.empty())
    print('finished')
>>>created
>>>started
>>>True
>>>finished
I would expect output to not be empty.
If I change the .join() loop to

for p in processes:
    print(output.get())
    #p.join()

it freezes.
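As an aside, independent of the Spyder issue discussed below: the multiprocessing docs recommend draining a queue before joining the processes that put data on it, otherwise a child that still has buffered items can deadlock. A minimal sketch of that pattern, reusing fnc from above:

from multiprocessing import Process, Queue

def fnc(i, output):
    output.put(i)

if __name__ == '__main__':
    output = Queue()
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    for p in processes:
        p.start()
    results = [output.get() for _ in processes]  # drain the queue first
    for p in processes:
        p.join()                                 # then join the producers
    print(results)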
2.
The next problem I have is with pool.map(): it freezes, even though there is no chance of it exceeding the memory limit. I don't even know how to debug such a simple piece of code.
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print('Pool created')
    # print "[0, 1, 4,..., 81]"
    print(pool.map(f, range(10)))  # it freezes here
I hope it's not a big deal to have two questions in one topic.
Apparently the problem is Spyder's IPython console. When I run both from cmd, they execute properly.
Solution
For debugging in Spyder, add .dummy to the multiprocessing import:
from multiprocessing.dummy import Process, Queue
It will not be executed by multiple processes, but you will get results and can actually see the output. When debugging is done, simply delete .dummy, place the code in another file, import it, and call it, for example, as a function:
multiprocessing_my.py
from multiprocessing import Process, Queue

# define an example function
def fnc(i, output):
    output.put(i)
    print(i)

def test():
    # Define an output queue
    output = Queue()

    # Set up a list of processes that we want to run
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    print('created')

    # Run processes
    for p in processes:
        p.start()
    print('started')

    # Exit the completed processes
    for p in processes:
        p.join()

    print(output.empty())
    print('finished')

    # Get process results from the output queue
    results = [output.get() for p in processes]
    print('get results')
    print(results)
test_mp.py
executed by selecting the code and pressing Ctrl+Enter
import multiprocessing_my

multiprocessing_my.test()
...
In[9]: test()
created
0
1
2
3
started
False
finished
get results
[0, 1, 2, 3]
I am new to multiprocessing in Python, and I wrote the tiny script below:
import multiprocessing
import os

def task(queue):
    print(100)

def run(pool):
    queue = multiprocessing.Queue()
    for i in range(os.cpu_count()):
        pool.apply_async(task, args=(queue, ))

if __name__ == '__main__':
    multiprocessing.freeze_support()
    pool = multiprocessing.Pool()
    run(pool)
    pool.close()
    pool.join()
I am wondering why the task() method is not executed and there is no output after running this script. Could anyone help me?
It is running, but it's dying with an error outside the main thread, and so you don't see the error. For that reason, it's always good to .get() the result of an async call, even if you don't care about the result: the .get() will raise the error that's otherwise invisible.
For example, change your loop like so:
tasks = []
for i in range(os.cpu_count()):
    tasks.append(pool.apply_async(task, args=(queue,)))
for t in tasks:
    t.get()
Then the new t.get() will blow up, ending with:
RuntimeError: Queue objects should only be shared between processes through inheritance
In short, passing Queue objects to Pool methods isn't supported.
But you can pass them to multiprocessing.Process(), or to a Pool initialization function. For example, here's a way to do the latter:
import multiprocessing
import os

def pool_init(q):
    global queue  # make queue global in workers
    queue = q

def task():
    # can use `queue` here if you like
    print(100)

def run(pool):
    tasks = []
    for i in range(os.cpu_count()):
        tasks.append(pool.apply_async(task))
    for t in tasks:
        t.get()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=pool_init, initargs=(queue,))
    run(pool)
    pool.close()
    pool.join()
On Linux-y systems, you can - as the original error message suggested - use process inheritance instead (but that's not possible on Windows).
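Another option, not shown in the answer above but worth knowing: a multiprocessing.Manager().Queue() is a proxy object, so it can be passed to Pool methods directly. A minimal sketch of that approach:

import multiprocessing
import os

def task(queue):
    queue.put(100)

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        queue = manager.Queue()  # proxy queue, safe to pass to Pool methods
        with multiprocessing.Pool() as pool:
            results = [pool.apply_async(task, args=(queue,))
                       for _ in range(os.cpu_count())]
            for r in results:
                r.get()          # re-raise any error from the worker
        while not queue.empty():
            print(queue.get())

The trade-off is that every put/get goes through the manager process, so it is slower than a plain Queue passed by inheritance.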
This problem seems to have been eluding me; all the solutions are more like workarounds and add quite a bit of complexity to the code.
Since it's been a good while since any posts regarding this have been made: are there any simple solutions to the following? Upon detecting a keyboard interrupt, cleanly exit all the child processes and terminate the program.
The code below is a snippet of my multiprocessing structure; I'd like to preserve as much as possible while adding the needed functionality:
from multiprocessing import Pool, Lock
import time

def multiprocess_init(l):
    global lock
    lock = l

def synchronous_print(i):
    with lock:
        print(i)
        time.sleep(1)

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(processes=5, initializer=multiprocess_init, initargs=(lock, ))
    for i in range(1, 20):
        pool.map_async(synchronous_print, [i])
    pool.close()  # necessary to prevent zombies
    pool.join()   # wait for all processes to finish
The short answer is to move to Python 3. Python 2 has multiple problems with thread/process synchronization that have been fixed in Python 3.
In your case, multiprocessing will doggedly recreate your child processes every time you send a keyboard interrupt, and pool.close will get stuck and never exit. You can reduce the problem by explicitly exiting the child process with os._exit and by waiting for the individual results from map_async so that you don't get stuck in pool.close prison.
from multiprocessing import Pool, Lock
import time
import os

def multiprocess_init(l):
    global lock
    lock = l
    print("initialized child")

def synchronous_print(i):
    try:
        with lock:
            print(i)
            time.sleep(1)
    except KeyboardInterrupt:
        print("exit child")
        os._exit(2)

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(processes=5, initializer=multiprocess_init, initargs=(lock, ))
    results = []
    for i in range(1, 20):
        results.append(pool.map_async(synchronous_print, [i]))
    for result in results:
        print('wait result')
        result.wait()
    pool.close()  # necessary to prevent zombies
    pool.join()   # wait for all processes to finish
    print("Join completes")
I've gone through this SO thread (Synchronization issue using Python's multiprocessing module), but it doesn't provide the answer.
The following code:
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('hello world', i)
    l.release()
    # do something else

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
How do I get the processes to execute in order? I want each process to hold the lock for a few seconds and then release it, so that P1 runs its locked section first, then P2 moves forward into the lock, then P3, and so on. How would I get the processes to execute in that order?
It sounds like you just want to delay the start of each successive process. If that's the case, you can use a multiprocessing.Event to delay starting the next child in the main process. Just pass the event to the child, and have the child set the Event when it's done doing whatever should run prior to starting the next child. The main process can wait on that Event, and once it's signalled, clear it and start the next child.
from multiprocessing import Process, Event

def f(e, i):
    print('hello world', i)
    e.set()
    # do something else

if __name__ == '__main__':
    event = Event()
    for num in range(10):
        p = Process(target=f, args=(event, num))
        p.start()
        event.wait()
        event.clear()
This is not the purpose of locks; your code architecture is a poor fit for your use case. I think you should refactor your code to this:
from multiprocessing import Process

def f(i):
    # do something here
    pass

if __name__ == '__main__':
    for num in range(10):
        print('hello world', num)
        Process(target=f, args=(num,)).start()
In this case it will print in order and then do the remaining work asynchronously.
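If the work inside f() itself must run strictly in order, not just the printing, a simple alternative (sketched here, at the cost of losing all parallelism) is to join each process before starting the next:

from multiprocessing import Process

def f(i):
    print('hello world', i)
    # do something else

if __name__ == '__main__':
    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
        p.join()  # wait for this child before launching the next one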