I am trying to call a function in parallel using multiprocessing. I have used both starmap and map many times in the past to do so, but suddenly multiprocessing has stopped working. A pool is created, but the function is never called and the cell never finishes running. To test the issue, I am running this simple example:
from multiprocessing import Pool
def f(x):
    print(x)
    return x*x
if __name__ == '__main__':
    p = Pool(1)
    results = p.map(f, [1, 2, 3])
    p.close()
    p.join()
When I run this, nothing is printed, and the process never completes.
I have also tried running old code from previous notebooks that contain multiprocessing, and these have failed, too. I tried updating all packages as well. Has anyone else experienced this problem before?
Related
I am using multiprocessing python module to run parallel and unrelated jobs with a function similar to the following example:
import numpy as np
from multiprocessing import Pool
def myFunction(arg1):
    name = "file_%s.npy" % arg1
    A = np.load(name)
    A[A<0] = np.nan
    np.save(name, A)
if __name__ == "__main__":
    N = list(range(50))
    with Pool(4) as p:
        p.map_async(myFunction, N)
        p.close() # I tried with and without that statement
        p.join() # I tried with and without that statement
    DoOtherStuff()
My problem is that the function DoOtherStuff is never executed: the processes switch into sleep mode (according to top), and I need to kill the script with Ctrl+C to stop it.
Any suggestions?
You have at least a couple of problems. First, you are using map_async(), which does not block until the results of the tasks are available. So what you're doing is starting the tasks with map_async() and then immediately closing and terminating the pool (the with statement calls Pool.terminate() upon exiting).
When you add tasks to a process pool with methods like map_async(), they are put on a task queue. A worker thread takes tasks off that queue and farms them out to the worker processes, and the pool may spawn new processes as needed (actually, a separate thread handles that).
The point is that you have a race condition in which you are likely terminating the Pool before any tasks have even started. If you want your script to block until all the tasks are done, just use map() instead of map_async(). For example, I rewrote your script like this:
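import numpy as np
from multiprocessing import Pool
def myFunction(N):
    # Load file_00.npy ... file_49.npy, replace negative values with NaN,
    # and save the result under a new name.
    A = np.load(f'file_{N:02}.npy')
    A[A<0] = np.nan
    np.save(f'file2_{N:02}.npy', A)
def DoOtherStuff():
    print('done')
if __name__ == "__main__":
    N = range(50)
    with Pool(4) as p:
        # map() blocks until every task has finished, so the pool is not
        # terminated while work is still pending.
        p.map(myFunction, N)
    DoOtherStuff()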
I don't know what your use case is exactly, but if you do want to use map_async(), so that this task can run in the background while you do other stuff, you have to leave the Pool open, and manage the AsyncResult object returned by map_async():
if __name__ == "__main__":
    N = range(50)
    pool = Pool(4)
    result = pool.map_async(myFunction, N)
    DoOtherStuff()
    # Is my map done yet? If not, we should still block until
    # it finishes before ending the process
    result.wait()
    pool.close()
    pool.join()
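If you would rather keep the with block, a minimal sketch of the same idea (a variant, not part of the original answer) is to call get() on the AsyncResult before the block exits, so the implicit terminate() only runs once all results are in. The names square and do_other_stuff are hypothetical stand-ins for myFunction and DoOtherStuff, and the 60-second timeout is an arbitrary assumption.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical stand-in for myFunction: a cheap, picklable top-level function.
    return x * x


def do_other_stuff():
    # Hypothetical stand-in for DoOtherStuff; runs while the workers are busy.
    print("doing other stuff")


def run():
    with Pool(4) as pool:
        # map_async() returns an AsyncResult immediately instead of blocking.
        async_result = pool.map_async(square, range(10))
        do_other_stuff()
        # get() blocks until every task has finished (or raises
        # multiprocessing.TimeoutError after 60 s), so the terminate() at the
        # end of the with block cannot cut the tasks short. It also re-raises
        # any exception that was raised inside a worker.
        return async_result.get(timeout=60)


if __name__ == "__main__":
    print(run())
```

Unlike wait(), get() also surfaces worker exceptions, which makes silent failures easier to spot.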
You can see more examples in the multiprocessing documentation.
I don't know why your attempt deadlocked; I was not able to reproduce that. It's possible there was a bug at some point that has since been fixed, though you were also possibly invoking undefined behavior with your race condition, as well as by calling terminate() on a pool after it had already been join()ed. As for why your answer did anything at all, it's possible that with the multiple calls to apply_async() you managed to skirt around the race condition somewhat, but this is not at all guaranteed to work.
I am trying to set up the very basic multiprocessing example below. However, the execution only prints "here" and "<_MainProcess(MainProcess, started)>", and pool.apply() never even calls the function cube(). Instead, the execution just keeps running indefinitely without terminating.
import multiprocessing as mp
def cube(x):
    print('in function')
    return x**3
if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    print('here')
    print(mp.current_process())
    results = [pool.apply(cube, args=(x,)) for x in range(1,7)]
    print('now here')
    pool.close()
    pool.join()
    print(results)
I have tried various other basic examples including pool.map() but keep running into the same problem. I am using Python 3.7 on Windows 10. Since I am out of ideas, does anybody know what is wrong here or how I can debug this further?
Thanks!
Thank you, upgrading to Python 3.7.3 solved the issue.
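For the "how can I debug this further?" part, one general approach (not taken from the original thread) is to turn on multiprocessing's own logging and to collect results with a timeout, so that a silent hang becomes a visible error. A minimal sketch, assuming the cube function from the question; the 30-second timeout is an arbitrary choice.

```python
import logging
import multiprocessing as mp


def cube(x):
    return x ** 3


def run():
    # Send multiprocessing's internal log messages to stderr. At INFO level
    # this includes child processes starting and exiting; logging.DEBUG is
    # more verbose.
    mp.log_to_stderr(logging.INFO)
    with mp.Pool(processes=4) as pool:
        # apply_async() + get(timeout=...) instead of apply(): if a task never
        # completes, get() raises multiprocessing.TimeoutError instead of
        # blocking forever.
        pending = [pool.apply_async(cube, (x,)) for x in range(1, 7)]
        return [p.get(timeout=30) for p in pending]


if __name__ == "__main__":
    try:
        print(run())
    except mp.TimeoutError:
        print("a task did not finish within 30 s; check the log output above")
```

If workers crash while starting up (for example because they cannot find the target function), the log usually shows them exiting and being replaced, which points at the cause faster than an endlessly running cell.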
I am currently trying to run parallelized code from a Spyder console in Anaconda. I believe the issue may be that my computer is not allowing Anaconda to control CPU cores, but I am not sure how to correct this.
Another interesting point is that when I run an async example, the call itself appears to go through, but when I try to retrieve the results I run into the same issue.
I have tried multiple simple examples that should work. There are no package loading errors.
from multiprocessing.pool import ThreadPool, Pool
def square_it(x):
    return x*x
# On Windows, make sure that multiprocessing doesn't start
# until after "if __name__ == '__main__'"
with Pool(processes=5) as pool:
    results = pool.map(square_it, [5, 4, 3, 2, 1])
print(results)
I expect the code to run to completion.
This code is meant to run square_it in parallel, in 5 different processes:
from multiprocessing import Pool
def square_it(x):
    return x*x
with Pool(processes=5) as pool:
    results = pool.map(square_it, [5, 4, 3, 2, 1])
print(results)
The way it does that is that 5 new processes are created; then, in each of them, the same Python module is loaded and the function square_it is called.
What happens when the module is imported in one of the 5 subprocesses is the same thing that happens when it is initially loaded in the main process: it creates another Pool of 5 subprocesses, and those do the same, indefinitely.
To avoid that, you have to make sure that the subprocesses do not recursively create more and more subprocesses. You do that by creating the subprocesses only in the "main" module, aka "__main__":
from multiprocessing import Pool
def square_it(x):
    return x*x
if __name__ == "__main__":
    with Pool(processes=5) as pool:
        results = pool.map(square_it, [5, 4, 3, 2, 1])
    print(results)
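The re-import described above is what the "spawn" start method does. Spawn is the only start method available on Windows and the default on macOS since Python 3.8, while Linux has historically defaulted to "fork", which copies the parent process instead of re-importing the module. A minimal sketch, assuming you want to reproduce the Windows behavior on another OS, is to request a spawn context explicitly with multiprocessing.get_context():

```python
import multiprocessing as mp


def square_it(x):
    return x * x


def run():
    # get_context("spawn") gives a context whose Pool starts fresh
    # interpreters that re-import this module, as Windows does by default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=5) as pool:
        return pool.map(square_it, [5, 4, 3, 2, 1])


if __name__ == "__main__":
    # Without this guard, each spawned child would reach Pool(...) again
    # while importing the module.
    print(run())
```

Testing with an explicit spawn context on Linux is a cheap way to catch a missing main guard before the code ever runs on Windows.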
I've been a long-term observer of Stack Overflow but this time I just can't find a solution to my problem, so here I am asking you directly!
Consider this code:
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
print("External")
It's the basic example code for multiprocessing with pools, as found in the first example of the multiprocessing documentation, plus a print statement at the end.
When executing this in PyCharm Community on Windows 7 with Python 2.7, the Pool part works fine, but "External" is printed multiple times too. As a result, when I try to use multiprocessing on a specific function in another program, all processes end up running the entire program. How do I prevent that, so that only the given function runs in multiple processes?
I tried using Process instead; closing, joining and/or terminating the process or pool; embedding the entire thing in a function; and calling that function from a different file (it then starts executing that file). I can't find anything related to my problem and feel like I'm missing something very simple.
Since the print statement is not indented, it is executed each time the Python file is imported, which on Windows happens every time a new process is created.
By contrast, all the code placed beneath if __name__ == '__main__' will not be executed each time a process is created, but only in the main process, which is the only place where the condition evaluates to true.
Try the following code and you should not see the issue again: "External" should be printed to the console only once.
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
    print("External")
Related: python multiprocessing on windows, if __name__ == "__main__"
I'm trying to use multiprocessing, and in order to keep it simple at first, I'm running the following code:
import multiprocessing as mp
pool = mp.Pool(4)
def square(x):
    return x**2
results = pool.map(square, range(1, 20))
As I understand it, results should be a list containing the squares of the numbers from 1 to 19.
However, the code does not seem to terminate (doing the same without a pool finishes in a blink; this ran for several minutes before I stopped it manually).
Additional information: Task Manager tells me that the additional Python processes have been launched and are running, but are using 0% of my CPU; meanwhile, other unrelated processes like Firefox skyrocket in their CPU usage while the program is running.
I'm using Windows 8 and an i5-4300U CPU (pooling with 2 processes instead of 4 doesn't help either).
What am I doing wrong?
Are there any good resources on the Pool class that could help me understand what is wrong with my code?
Code that initializes the pool should be inside an if __name__ == "__main__": block, because multiprocessing imports the module each time it spawns a new process.
import multiprocessing as mp
def square(x):
    return x**2
if __name__ == '__main__':
    pool = mp.Pool(4)
    results = pool.map(square, range(1, 20))
In your code, each worker process hits an AttributeError: it cannot find the square attribute, because square had not been defined yet when you instantiated the pool. The worker processes then hang. Defining square before creating the pool would fix the problem.
See also:
yet another confusion with multiprocessing error, 'module' object has no attribute 'f'