Multiprocessing in Python hanging the system

I am working on multiprocessing and trying to replicate the code given at the link below:
Python Multiprocessing imap
My system hangs in both Spyder and Jupyter, as shown below. What could be the reason?
Following is the code, copied exactly as given, but it just hangs:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(3) as p:
        print(p.map(f, [1, 2, 3]))

If you read the docs on multiprocessing, in particular the following section:
... you will see that this will not work. The solution is to put the function f in another .py file and import it in order to get it to work. For example:
File worker.py:
def f(x):
    return x*x
Your revised code:
from multiprocessing import Pool
from worker import f

if __name__ == '__main__':
    with Pool(3) as p:
        print(p.map(f, [1, 2, 3]))
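On Unix-like systems, where child processes are forked and inherit f, the original code works as-is even in a notebook. If you are on Linux (or macOS with the fork start method), you can also keep f in the notebook and request fork explicitly instead of moving it to a module. A minimal sketch, assuming a Unix platform (fork is unavailable on Windows):

from multiprocessing import get_context

def f(x):
    return x*x

if __name__ == '__main__':
    # Forked children inherit the parent's memory, including f,
    # so the function does not need to be importable by name.
    with get_context("fork").Pool(3) as p:
        print(p.map(f, [1, 2, 3]))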

Related

Python code using multiprocessing running infinitely

I am trying to execute the following code in a Jupyter notebook using multiprocessing, but the loop runs forever. I need help resolving this issue.
import multiprocessing as mp
import numpy as np

def square(x):
    return np.square(x)

x = np.arange(64)
pool = mp.Pool(4)
squared = pool.map(square, [x[16*i:16*i+16] for i in range(4)])
The output for mp.cpu_count() was 4.
You need to rewrite your code to be something like:
def main():
    x = np.arange(64)
    pool = mp.Pool(4)
    squared = pool.map(square, [x[16*i:16*i+16] for i in range(4)])

if __name__ == '__main__':
    main()
This code is currently being run in every process. You need it to only run in the one process that is doing the setup.
You forgot:
pool.close()
pool.join()
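Putting both fixes together, a minimal sketch of a corrected script (assuming square is moved into a hypothetical module worker.py so that it is importable in Jupyter on Windows, as in the answer above):

import multiprocessing as mp
import numpy as np
from worker import square  # assumption: square lives in worker.py

def main():
    x = np.arange(64)
    pool = mp.Pool(4)
    squared = pool.map(square, [x[16*i:16*i+16] for i in range(4)])
    # Release the workers and wait for them to exit.
    pool.close()
    pool.join()
    print(squared)

if __name__ == '__main__':
    main()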

Example from documentation doesn't work in Jupyter Notebook

I looked at the documentation and found this example:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
The problem is that it does not work. I run this code in a Jupyter Notebook cell. The cell itself doesn't raise any exception, but Jupyter's terminal does. It says: AttributeError: Can't get attribute 'f' on <module '__main__' (built-in)>
As written here, the problem may be that I don't use the __name__ == '__main__' condition. But I do.
I literally copied and pasted the example from the documentation and it's not working. What should I do?
I suspect you are running on Windows. If so, this is a known issue. See this article. You need to add your function f to a file, such as worker.py:
worker.py
def f(x):
    return x*x
Then your Jupyter notebook code becomes:
from multiprocessing import Pool
import worker

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(worker.f, [1, 2, 3]))
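As a convenience, you can create worker.py without leaving the notebook by using IPython's %%writefile cell magic in its own cell (a minimal sketch; the filename worker.py is just the example used above):

%%writefile worker.py
def f(x):
    return x*x

Then run the Pool code from the answer in a separate cell.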

Python multiprocessing pool does not work when importing custom classes in the script

I have a simple Python script which runs a parallel pool:
import multiprocessing as mp

def square(x):
    return x**2

if __name__ == '__main__':
    pool = mp.Pool(4)
    results = pool.map(square, range(1, 20))
It works fine, as expected. However, if I import any simple custom class, as in the code below, it no longer works: I start the script, but it never terminates. This is odd, as I do not even use the imported class.
import multiprocessing as mp
from Person import Person

def square(x):
    return x**2

if __name__ == '__main__':
    pool = mp.Pool(4)
    results = pool.map(square, range(1, 20))
The class Person is very simple:
class Person:
    def __init__(self, id):
        self.id = id
What is the reason for this behavior and how can I fix it?
EDIT: I am using Windows 10
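No answer is recorded here, but one plausible explanation, consistent with the other answers on this page, is that on Windows the pool spawns fresh child processes that re-import the main script, so from Person import Person must also succeed inside every child; if a child cannot resolve that import, the workers can fail without the parent ever finishing. A minimal diagnostic sketch using multiprocessing's built-in logger to surface what the children are doing:

import logging
import multiprocessing as mp
from Person import Person

def square(x):
    return x**2

if __name__ == '__main__':
    # Route multiprocessing's internal log messages to stderr so
    # that worker start-up errors become visible.
    mp.log_to_stderr(logging.DEBUG)
    pool = mp.Pool(4)
    results = pool.map(square, range(1, 20))
    pool.close()
    pool.join()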

Logging nested functions using joblib Parallel and delayed calls

In one of my scripts I have something like:
import logging
from joblib import Parallel, delayed

def f_A(x):
    logging.info("f_A " + str(x))

def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
I would expect that when I run python script.py, something like:
INFO:root:f_B
INFO:root:f_A
would be shown in the console. Instead I see:
INFO:root:f_B
but no information from f_A is shown.
How can I get f_A (and eventually functions called from there) to show up in the logs?
I think the issue is that the worker processes fall back to the default logging configuration, under which INFO messages are not shown, because the main process does not propagate its logging setup to the children. If you modify the script slightly to:
import logging
from joblib import Parallel, delayed

def f_A(x):
    logging.basicConfig(level=logging.INFO)
    logging.info("f_A " + str(x))

def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
then everything works as intended.
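A note on why this works: each worker process has its own root logger, so calling logging.basicConfig inside f_A configures logging in whichever process happens to run it. It is also safe to call repeatedly, because basicConfig is a no-op once the root logger already has handlers.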

How to run subset of script in multiprocessing pool?

I'm trying to run a simple function with arguments from a list in a multiprocessing pool in Python 2.7.5 (Windows 7).
from multiprocessing import Pool

index_lst = []
for idx, item in enumerate(range(10)):
    index_lst.append(idx)

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(4)
    print(p.map(f, index_lst))
Unfortunately, the entire script gets executed multiple times. How can I prevent the list (index_lst) from being created over and over again?
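No answer is recorded here, but the fix follows the same pattern as the earlier answers on this page: on Windows, every worker process re-imports the script, so any work at module level runs once per process. Move the list construction under the __name__ == '__main__' guard. A minimal sketch:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    # Built only in the parent process; worker processes re-import
    # this module but skip this block.
    index_lst = [idx for idx, item in enumerate(range(10))]
    p = Pool(4)
    print(p.map(f, index_lst))
    p.close()
    p.join()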
