Using multiprocessing.pool.Pool to initialize parallel processing freezes indefinitely - python

I am currently trying to run parallelized code from a spyder console in anaconda. I believe the issue may be with my computer not allowing anaconda to control CPU cores, but I am not sure how to correct this issue.
Another interesting point: when I run an async example it starts, but when I try to retrieve the results I hit the same issue.
I have tried multiple simple examples that should work. There are no package loading errors.
from multiprocessing.pool import ThreadPool, Pool
def square_it(x):
    return x*x
# On Windows, make sure that multiprocessing doesn't start
# until after "if __name__ == '__main__'"
with Pool(processes=5) as pool:
    results = pool.map(square_it, [5, 4, 3, 2, 1])
    print(results)
I expect the code to run to completion and print the results.

This code is meant to run square_it in parallel, in 5 different processes:
def square_it(x):
    return x*x
with Pool(processes=5) as pool:
    results = pool.map(square_it, [5, 4, 3, 2, 1])
    print(results)
The way it does that is that 5 new processes are created, and in each of them the same Python module is loaded and the function square_it is called.
When the module is imported in one of the 5 subprocesses, the same thing happens as when it was initially loaded in the main process: it creates another Pool of 5 subprocesses, each of which does the same again, indefinitely.
To avoid that, you have to make sure that the subprocesses do not recursively create more and more subprocesses. You do that by creating the subprocesses only in the "main" module, aka "__main__":
from multiprocessing import Pool
def square_it(x):
    return x*x
if __name__ == "__main__":
    with Pool(processes=5) as pool:
        results = pool.map(square_it, [5, 4, 3, 2, 1])
    print(results)
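The question also mentions an async example that hangs when the results are retrieved. Below is a minimal sketch of the same guard applied to map_async, assuming the script is run as a file rather than pasted into an interactive console. The main() wrapper and the 30-second timeout are illustrative choices, not part of the original post:

```python
from multiprocessing import Pool


def square_it(x):
    return x * x


def main():
    # The pool is created only when main() is called from the guard below,
    # never as a side effect of importing this module in a worker.
    with Pool(processes=5) as pool:
        async_result = pool.map_async(square_it, [5, 4, 3, 2, 1])
        # get() blocks until the workers finish; with a timeout, a hang
        # surfaces as multiprocessing.TimeoutError instead of a silent freeze.
        return async_result.get(timeout=30)


if __name__ == "__main__":
    print(main())
```

Retrieving the result inside the with block matters here, because leaving the block terminates the pool.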

Related

Multiprocessing pool doesn't seem to even call function?

I am trying to call a function in parallel using multiprocessing. I have used both starmap and map many times in the past to do so, but suddenly multiprocessing has stopped working. A pool is created, but the function is never called and the cell never finishes running. To test the issue, I am running a simple example code:
from multiprocessing import Pool
def f(x):
    print(x)
    return x*x
if __name__ == '__main__':
    p = Pool(1)
    results = p.map(f, [1, 2, 3])
    p.close()
    p.join()
When I run this, nothing is printed, and the process never completes.
I have also tried running old code from previous notebooks that contain multiprocessing, and these have failed, too. I tried updating all packages as well. Has anyone else experienced this problem before?

Multiprocessing in python won't release memory

I am running a multiprocessing code. The framework of the code is something like below:
import multiprocessing
import numpy as np
def func_a(x):
    # main function here
    result = x  # placeholder for the real computation
    return result
def func_b(y):
    cores = multiprocessing.cpu_count() - 1
    pool = multiprocessing.Pool(processes=cores)
    results = pool.map(func_a, np.arange(1000))
    return results
if __name__ == '__main__':
    final_resu = []
    for i in range(0, 200):
        final_resu.append(func_b(i))
I found two problems with this code. Firstly, the memory keeps going up during the loop. Secondly, in the task manager (Windows 10), the number of python processes increases step-wise, i.e. 14 to 25, to 36, to 47... with every iteration of the main loop.
I believe something is wrong with the multiprocessing, but I'm not sure how to deal with it. It looks like the pool created in func_b is not deleted when the main loop finishes an iteration?
As the examples in the docs show, when you're done with a Pool you should shut it down explicitly, via pool.close() followed by pool.join(). That said, it would be better still if, in addition, you created your Pool only once: e.g., pass a Pool as an argument to func_b(), and create it (and close it down) only once, in the __name__ == '__main__' block.
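A minimal sketch of that suggestion, assuming func_a is a stand-in for the real per-item work and using a with block in place of the explicit close()/join() pair (the with block calls terminate() on exit, which is safe here because pool.map has already returned all results). The main() wrapper is an illustrative addition:

```python
import multiprocessing


def func_a(x):
    # Placeholder for the real per-item work.
    return x * x


def func_b(pool, y):
    # Reuse the pool owned by the caller instead of creating a new one.
    return pool.map(func_a, range(1000))


def main():
    cores = max(1, multiprocessing.cpu_count() - 1)
    final_resu = []
    # One pool for the whole loop: the worker count stays constant and the
    # workers are shut down when the with block exits.
    with multiprocessing.Pool(processes=cores) as pool:
        for i in range(200):
            final_resu.append(func_b(pool, i))
    return final_resu


if __name__ == "__main__":
    results = main()
    print(len(results))
```

With this structure the number of python processes in the task manager should stay flat across iterations instead of growing by one pool per call.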

process flow in parallel processing in python

I am learning about parallel processing in python and I have some very specific doubts regarding the execution flow of the following program. In this program, I am splitting my list into two parts depending on the process. My aim is to run the add function twice in parallel, where one process takes one part of the list and the other process takes the other part.
import multiprocessing as mp
x = [1,2,3,4]
print('hello')
def add(flag, q_f):
    global x
    if flag == 1:
        dl = x[0:2]
    elif flag == 2:
        dl = x[2:4]
    else:
        dl = x
    x = [i+2 for i in dl]
    print('flag = %d'%flag)
    print('1')
    print('2')
    print(x)
    q_f.put(x)
print('Above main')
if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    print('inside main')
    q = ctx.Queue()
    jobs = []
    for i in range(2):
        p = mp.Process(target = add, args = (i+1, q))
        jobs.append(p)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print('completed')
    print(q.get())
    print(q.get())
print('outside main')
The output which I got is
hello
Above main
outside main
flag = 1
1
2
[3, 4]
hello
Above main
outside main
flag = 2
1
2
[5, 6]
hello
Above main
inside main
completed
[3, 4]
[5, 6]
outside main
My questions are
1) From the output, we can see that one process is executed first, then the other. Is the program actually utilizing multiple processors for parallel processing? If not, how can I make it run in parallel? If it were running in parallel, the print statements print('1') and print('2') should be interleaved at random, right?
2) Can I check programmatically on which processor is the program running?
3) Why are the print statements outside main(hello, above main, outside main) getting executed thrice?
4) What is the flow of the program execution?
1) The execution of add() is probably so fast that the first execution has already ended by the time the second process starts.
2) A process is usually not assigned to a particular CPU but jumps between them.
3) If you are using Windows, the module must be executed again for each started process. In these executions __name__ isn't '__main__', but all unconditional commands (those outside of if blocks and the like), such as these prints, are executed.
4) When start() of a Process is called on Windows, a new Python interpreter is started, which means that the necessary modules are imported (and therefore executed) and the resources needed to run the subprocess are handed to the new interpreter (the "spawn" method described at https://docs.python.org/3.6/library/multiprocessing.html#contexts-and-start-methods). All processes then run independently (if no synchronization is done by the program).
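To make points 1) and 2) observable, here is a rough sketch that gives each process some artificial work and reports which OS process handled it. It shows the process ID and process name, not the CPU core, since the core can change while the process runs. The describe() and main() names, the half-second sleep, and the tuple layout are assumptions made for illustration:

```python
import multiprocessing as mp
import os
import time


def describe(flag, q):
    # Record which OS process handled this call and when it started.
    started = time.time()
    time.sleep(0.5)  # Artificial work so the two runs have time to overlap.
    q.put((flag, os.getpid(), mp.current_process().name, started))


def main():
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    jobs = [ctx.Process(target=describe, args=(i + 1, q)) for i in range(2)]
    for j in jobs:
        j.start()
    # Read one result per process before joining.
    rows = [q.get() for _ in jobs]
    for j in jobs:
        j.join()
    return rows


if __name__ == "__main__":
    for row in main():
        print(row)
```

If the two start times are closer together than the sleep duration, the processes were running concurrently.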

Python Multiprocessing: How do I keep it within the specified function?

I've been a long-term observer of Stack Overflow but this time I just can't find a solution to my problem, so here I am asking you directly!
Consider this code:
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
print "External"
It's the basic example code for multiprocessing with pools as found here in the first box, plus a print statement at the end.
When executing this in PyCharm Community on Windows 7, Python 2.7, the Pool part works fine, but "External" is printed multiple times, too. As a result, when I try to use Multithreading on a specific function in another program, all processes end up running the entire program. How do I prevent that, so only the given function is multiprocessed?
I tried using Process instead, closing, joining and/or terminating the process or pool, embedding the entire thing into a function, calling said function from a different file (it then starts executing that file). I can't find anything related to my problem and feel like I'm missing something very simple.
Since the print instruction is not indented, it is executed each time the Python file is imported, which is every time a new process is created.
By contrast, the code placed beneath if __name__ == '__main__': is not executed each time a process is created, but only in the main process, which is the only place where the condition evaluates to true.
Try the following code; you should see External printed to the console only once.
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
    print "External"
Related: python multiprocessing on windows, if __name__ == "__main__"

Identify current process is child process in Python

In Python 2.7, is there a way to identify whether the current forked/spawned process is a child process instance (as opposed to having been started as a regular process)? My goal is to set a global variable differently if it's a child process (e.g. create a pool with size 0 for the child, else a pool with some number greater than 0).
I can't pass a parameter into the function (the one called to execute in the child process), because the process, and hence the global variable, has already been initialized before the function is invoked (especially for a spawned process).
Also, I am not in a position to use freeze_support (unless of course I have misunderstood how to use it), as my application is running in a web service container (flask). Hence there's no main method.
Any help will be much appreciated.
Sample code that goes into infinite loop if you run it on windows:
from multiprocessing import Pool, freeze_support
p = Pool(5) # This should be created only in parent process and not the child process
def f(x):
    return x*x
if __name__ == '__main__':
    freeze_support()
    print(p.map(f, [1, 2, 3]))
I would suggest restructuring your program to something more like my example code below. You mentioned that you don't have a main function, but you can create a wrapper that handles your pool:
from multiprocessing import Pool, freeze_support
def f(x):
    return x*x
def handle_request():
    p = Pool(5) # pool will only be in the parent process
    print(p.map(f, [1, 2, 3]))
    p.close() # remember to clean up the resources you use
    p.join()
    return
if __name__ == '__main__':
    freeze_support() # do you really need this?
    # start your web service here and make it use `handle_request` as the callback
    # when a request needs to be serviced
It sounds like you are having a bit of an XY problem. You shouldn't be making a pool of processes global. It's just bad. You're giving your subprocesses access to their own process objects, which allows you to accidentally do bad things, like make a child process join itself. If you create your pool within a wrapper that is called for each request, then you don't need to worry about a global variable.
In the comments, you mentioned that you want a persistent pool. There is indeed some overhead to creating a pool on each request, but it's far safer than having a global pool. Also, you now have the capability to handle multiple requests simultaneously, assuming your web service handles each request in its own thread/process, without multiple requests trampling on each other by trying to use the same pool. I would strongly suggest you try this approach, and if it doesn't meet your performance specifications, you can look at optimizing it in other ways (i.e., without a global pool) to meet your spec.
One other note: multiprocessing.freeze_support() only needs to be called if you intend to bundle your scripts into a Windows executable. Don't use it if you are not doing that.
Move the pool creation into the main section to create a multiprocessing pool only once, and only in the main process:
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
This works because the only process that is executing in the __main__ name is the original process. Spawned processes run with the __mp_main__ module name.
Regarding "create a pool with size 0 for child": the child processes should never start a new multiprocessing pool. Only handle your processes from a single entry point.
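For readers on a newer interpreter, here is a hedged sketch of how a process can check whether it was started by multiprocessing. It relies on multiprocessing.parent_process(), which exists only in Python 3.8 and later, so it does not apply to the Python 2.7 setup in the question. The is_child_process() helper and main() wrapper are hypothetical names introduced for illustration:

```python
import multiprocessing as mp


def is_child_process():
    # parent_process() returns None in a process that was not started by
    # multiprocessing, and a handle on the parent inside a child.
    return mp.parent_process() is not None


def f(x):
    # Return the square together with where it was computed.
    return x * x, is_child_process()


def main():
    with mp.Pool(2) as pool:
        return pool.map(f, [1, 2, 3])


if __name__ == "__main__":
    print("in parent:", is_child_process())
    print(main())
```

Even with such a check available, keeping pool creation behind a single entry point, as the answers above recommend, avoids the need for it.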
