Importing scipy breaks multiprocessing support in Python

I am running into a bizarre problem that I can't explain. I'm hoping someone out there can help please!
I'm running Python 2.7.3 and Scipy v0.14.0 and am trying to implement some very simple multiprocessing algorithms to speed up my code using the multiprocessing module. I've managed to make a basic example work:
import multiprocessing
import numpy as np
import time
# import scipy.special

def compute_something(t):
    a = 0.
    for i in range(100000):
        a = np.sqrt(t)
    return a

if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()
    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)
    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic
This runs fine, returning
Pool size: 8
Built-in: 1.56904006004
Pool : 0.447728157043
But if I uncomment the line import scipy.special, I get:
Pool size: 8
Built-in: 1.58968091011
Pool : 1.59387993813
and I can see that only one core is doing the work on my system. In fact, importing any module from the scipy package seems to have this effect (I've tried several).
Any ideas? I've never seen a case like this before, where an apparently innocuous import can have such a strange and unexpected effect.
Thanks!
Update (1)
Moving the scipy import line into the function compute_something partially mitigates the problem:
Pool size: 8
Built-in: 1.66807389259
Pool : 0.596321105957
Update (2)
Thanks to @larsmans for testing on a different system. The problem was not reproduced with Scipy v0.12.0. I'm moving this query to the scipy mailing list and will post any answers here.

After much digging around and posting an issue on the Scipy GitHub site, I've found a solution.
Before I start, this is documented very well here - I'll just give an overview.
This problem is not related to the version of Scipy or Numpy that I was using. It originates in the system BLAS libraries that Numpy and Scipy use for various linear algebra routines. You can tell which libraries Numpy is linked to by running
python -c 'import numpy; numpy.show_config()'
If you are using OpenBLAS on Linux, you may find that the CPU affinity is set to 1, meaning that once these libraries are loaded in Python (via a Numpy/Scipy import), you can access at most one core of the CPU. To test this, run the following within a Python terminal:
import os
os.system('taskset -p %s' % os.getpid())
If the CPU affinity is returned as f or ff, you can access multiple cores. In my case it would start out like that, but upon importing numpy or scipy.any_module, it would switch to 1, hence my problem.
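As a side note, if you'd rather inspect and reset the affinity from inside Python instead of shelling out to taskset, here is a minimal sketch (assuming Linux and Python 3.3+, where os.sched_getaffinity and os.sched_setaffinity exist; the original code here is Python 2.7):
import os
import multiprocessing

# Which CPUs may this process run on? (e.g. {0} after an
# affinity-setting BLAS library has been loaded)
print(os.sched_getaffinity(0))

# Restore access to all cores before creating the pool.
os.sched_setaffinity(0, range(multiprocessing.cpu_count()))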
I've found two solutions:
Change CPU affinity
You can manually set the CPU affinity of the master process at the top of the main function so that the code looks like this:
import multiprocessing
import numpy as np
import math
import time
import os

def compute_something(t):
    a = 0.
    for i in range(10000000):
        a = math.sqrt(t)
    return a

if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()
    os.system('taskset -cp 0-%d %s' % (pool_size, os.getpid()))
    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)
    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic
Note that selecting a value higher than the number of cores for taskset doesn't seem to matter - it just uses the maximum possible number.
Switch BLAS libraries
The solution is documented at the site linked above. In short: install libatlas and run update-alternatives to point numpy at ATLAS instead of OpenBLAS.
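As an alternative (an assumption on my part, based on OpenBLAS's documented environment variables, not on the linked page): OpenBLAS can be told not to touch the CPU affinity at all by setting OPENBLAS_MAIN_FREE=1 before Numpy/Scipy are first imported, which avoids swapping BLAS libraries entirely:
import os
# Must be set before the first numpy/scipy import in the process.
os.environ['OPENBLAS_MAIN_FREE'] = '1'  # tell OpenBLAS to leave affinity alone

import numpy as np
import scipy.special  # no longer pins the process to one core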

Related

Python multiprocessing pool function not defined

I need to implement a multiprocessing pool that utilizes arbitrary packages for calculations. For this, I'm using Python and joblib 0.9.0. This code is basically the structure I want.
import numpy as np
from joblib import pool

def someComputation(x):
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1, -0.5, 0, 0.5, 1]
    the_pool = pool.Pool(processes=2)
    solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
    print(solutions[0].get())
On both Windows 10 and Red Hat Enterprise Linux running Anaconda 4.3.1 with Python 3.6.0 (as well as 3.5 and 3.4 in virtual envs), the name 'np' never makes it into the someComputation() function, raising the error
File "C:\Anaconda3\lib\site-packages\multiprocessing_on_dill\pool.py", line 608, in get
raise self._value
NameError: name 'np' is not defined
however, on my Mac OS X 10.11.6 machine running Python 3.5 and the same joblib, I get the expected output of '-1' with the exact same code. This question is essentially the same, but it dealt with pathos rather than joblib. The general answer was to include the numpy import statement inside the function:
from joblib import pool

def someComputation(x):
    import numpy as np
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1, -0.5, 0, 0.5, 1]
    the_pool = pool.Pool(processes=2)
    solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
    print(solutions[0].get())
This solves the issue on the Windows and Linux machines, which now output '-1' as expected, but the solution seems clunky. Is there any reason why the first bit of code works on a Mac but not on Windows or Linux? I ultimately need to run this code on the Linux machine, so is there any fix that doesn't involve putting the import statement inside the function?
Edit:
After investigating a bit further, I found an old workaround I put in years ago that looks like it is causing the issue. In joblib/pool.py, I changed line 44 from
from multiprocessing.pool import Pool
to
from multiprocessing_on_dill.pool import Pool
to support pickling of arbitrary functions. For some reason, this change is what really causes the issue on Windows and Linux, while the Mac machine runs just fine. Using multiprocessing instead of multiprocessing_on_dill solves the above issue, but the code doesn't work for the majority of my cases, since they can't be pickled.
I am not sure what the exact issue is, but it appears that there is some problem with transferring the global scope over to the subprocesses that run the task. You can potentially avoid name errors by binding the name np as a function parameter:
def someComputation(x, np=np):
    return np.interp(x, [-1, 1], [-1, 1])
This has the advantage of not requiring a call to the import machinery every time the function is run. The name np will be bound to the function when it is first evaluated during module loading.

Pyfftw slower than matlab fft

I'm trying to compare pyfftw (in Python 3.6) with MATLAB R2017a's fft.
import time
import numpy
import pyfftw
import multiprocessing

nthread = multiprocessing.cpu_count()
print(nthread)
n = 2**20
a = pyfftw.empty_aligned(n, dtype='complex128')
print("fft_object = pyfftw.builders.fft(a)")
fft_object = pyfftw.builders.fft(a)  # this instruction spends much time
print("generate numbers")
a[:] = 5*numpy.random.rand(n)
print(a)
print("start fft")
start = time.clock()
y = fft_object()
end4 = time.clock() - start
print("end, time:")
print(end4)
print("result")
print(y)
print(len(y))
whereas if I use MATLAB:
x=5*rand(2^20,1);tic;fft(x);toc
this measures just the time for the FFT computation itself, which is approximately the same as the time of Python's call to fft_object().
Thanks in advance for your kind support.
You might take a look at GPU-based codes (if you have the proper hardware):
http://pypi.python.org/pypi/pyfft
http://pypi.python.org/pypi/scikits.cuda
They are based on PyCUDA and PyOpenCL. I don't have much experience with these, so you'll have to do a little digging to find what suits you best.
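As a side note on the pyfftw timing itself: the slow pyfftw.builders.fft(a) call is FFTW's planning step, which you pay once and can amortize over repeated transforms. The builders interface also accepts a threads argument, so a sketch like the following (using pyfftw's documented API; sizes taken from the question) should make the per-transform time more comparable to MATLAB's multithreaded fft:
import multiprocessing
import pyfftw

nthread = multiprocessing.cpu_count()
n = 2**20
a = pyfftw.empty_aligned(n, dtype='complex128')

# Plan once (slow), then reuse the object for many transforms (fast),
# letting FFTW use all available cores.
fft_object = pyfftw.builders.fft(a, threads=nthread)
y = fft_object()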

Python3 not handling matplotlib plot when using a multiprocess pool

I have a small script creating different plots. Since no data are shared, I can do some multiprocessing. Using Python 2.7, no problem. With Python 3.6, I can't seem to make it work.
I am using a pool (https://docs.python.org/3/library/multiprocessing.html and https://docs.python.org/2/library/multiprocessing.html) since I do not share objects or anything.
For Python 3, I get a crash without a traceback at the line fig = plt.figure(number).
I am running on macOS Sierra. I believe the problem is the same as in this topic (Saving multiple matplotlib figures with multiprocessing). Unfortunately, the problem wasn't really addressed there, as it was not the main issue.
One quick answer would be to use Python 2.7, but other pieces of my work rely on Python 3+ features.
Any idea how to get at least a traceback (verbose mode didn't show anything related to the crash), and then how to solve this issue?
Many thanks
Here is the smallest code producing the error, taken from the thread mentioned above (this code will create 4 files in the folder of the script).
import matplotlib.pyplot as plt
import numpy.random as random
from multiprocessing import Pool

def do_plot(number):
    fig = plt.figure(number)
    a = random.sample(100)
    b = random.sample(100)
    plt.scatter(a, b)
    plt.savefig("%03d.jpg" % (number,))
    plt.close()
    print("Done ", number)

if __name__ == '__main__':
    pool = Pool()
    pool.map(do_plot, range(4))
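One workaround often suggested for this class of crash (an assumption here, not something the original poster tested): select the non-interactive Agg backend before pyplot is imported, so the forked workers never touch the macOS GUI frameworks. Since the script only saves figures to files, no GUI backend is needed:
import matplotlib
matplotlib.use('Agg')  # must run before pyplot is imported
import matplotlib.pyplot as plt
# ... the rest of the script is unchanged ...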

CUDA-Python: How to launch CUDA kernel in Python (Numba 0.25)?

Could you please help me understand how to write CUDA kernels in Python? AFAIK, numba.vectorize can run on cuda, cpu, or parallel (multi-CPU) targets, based on the target keyword, but target='cuda' requires setting up CUDA kernels.
The main issue is that many examples and answers on the Internet relate to the deprecated NumbaPro library, so it's hard to follow such outdated wikis, especially if you're a newbie.
I have:
latest Anaconda (v2)
latest Numba (v0.25)
latest CUDA toolkit (v7)
Here is the error I'm getting:
numba.cuda.cudadrv.driver.CudaAPIError: 1 Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE
import numpy as np
import time
from numba import vectorize, cuda

@vectorize(['float32(float32, float32)'], target='cuda')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)

    start = time.time()
    C = VectorAdd(A, B)
    vector_add_time = time.time() - start

    print "C[:5] = " + str(C[:5])
    print "C[-5:] = " + str(C[-5:])
    print "VectorAdd took %f seconds" % vector_add_time

if __name__ == '__main__':
    main()
The code, as posted, is correct and will run on a Python 2 NumbaPro/Accelerate system without error.
It is likely that the particular system used to run the code didn't have much GPU capacity and was hitting a display driver watchdog or out-of-memory error with 32-million-element vectors. Reducing the size of the input data allowed the code to run correctly.
[This answer assembled from comments and added as a community wiki entry to get this question off the unanswered list]
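For the "how do I write a CUDA kernel" part of the question, here is a minimal sketch using Numba's documented @cuda.jit API (the kernel name, block size, and the smaller N are illustrative choices, not from the original post):
from numba import cuda
import numpy as np

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)      # absolute index of this thread in the grid
    if i < out.size:      # guard: the grid may be larger than the data
        out[i] = a[i] + b[i]

N = 1000000  # well below 32M, to stay clear of memory/watchdog limits
a = np.ones(N, dtype=np.float32)
b = np.ones(N, dtype=np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks_per_grid = (N + threads_per_block - 1) // threads_per_block
vector_add[blocks_per_grid, threads_per_block](a, b, out)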

Python rpy2 and matplotlib conflict when using multiprocessing

I am trying to calculate and generate plots using multiprocessing. The code below runs correctly on Linux, but on a Mac (ML) it doesn't, giving the error below:
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects as robjects

def main():
    pool = multiprocessing.Pool()
    num_figs = 2
    # generate some random numbers
    input = zip(np.random.randint(10, 1000, num_figs),
                range(num_figs))
    pool.map(plot, input)

def plot(args):
    num, i = args
    fig = plt.figure()
    data = np.random.randn(num).cumsum()
    plt.plot(data)

main()
The rpy2 version is 2.3.1 and R is 2.13.2 (I could not install R 3.0 and the latest rpy2 on any Mac without getting a segmentation fault).
The error is:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
I have tried everything to understand what the problem is, with no luck. My configuration is:
Danials-MacBook-Pro:~ danialt$ brew --config
HOMEBREW_VERSION: 0.9.4
ORIGIN: https://github.com/mxcl/homebrew
HEAD: 705b5e133d8334cae66710fac1c14ed8f8713d6b
HOMEBREW_PREFIX: /usr/local
HOMEBREW_CELLAR: /usr/local/Cellar
CPU: dual-core 64-bit penryn
OS X: 10.8.3-x86_64
Xcode: 4.6.2
CLT: 4.6.0.0.1.1365549073
GCC-4.2: build 5666
LLVM-GCC: build 2336
Clang: 4.2 build 425
X11: 2.7.4 => /opt/X11
System Ruby: 1.8.7-358
Perl: /usr/bin/perl
Python: /usr/local/bin/python => /usr/local/Cellar/python/2.7.4/Frameworks/Python.framework/Versions/2.7/bin/python2.7
Ruby: /usr/bin/ruby => /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
Any ideas?
This error occurs on Mac OS X when you perform a GUI operation outside the main thread, which is exactly what you are doing by shifting your plot function to the multiprocessing.Pool. (I imagine it will not work on Windows either, since Windows has the same requirement.) The only way I can imagine it working is to use the pool to generate the data, then have your main thread wait in a loop for the data to be returned (a queue is the way I usually handle it).
Here is an example, with the caveat that it may not do what you want (plot all the figures "simultaneously"?). plt.show() blocks, so only one figure is drawn at a time; I note that you do not have it in your sample code, but without it I don't see anything on my screen. If I take it out, there is no blocking and no error, because all GUI functions happen in the main thread:
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects as robjects

data_queue = multiprocessing.Queue()

def main():
    pool = multiprocessing.Pool()
    num_figs = 10
    # generate some random numbers
    input = zip(np.random.randint(10, 10000, num_figs), range(num_figs))
    pool.map(worker, input)

    figs_complete = 0
    while figs_complete < num_figs:
        data = data_queue.get()
        plt.figure()
        plt.plot(data)
        plt.show()
        figs_complete += 1

def worker(args):
    num, i = args
    data = np.random.randn(num).cumsum()
    data_queue.put(data)
    print('done ', i)

main()
Hope this helps.
I had a similar issue with my worker, which was loading some data, generating a plot, and saving it to a file. Note that this is slightly different from the OP's case, which seems to be oriented around interactive plotting. Still, I think it's relevant.
A simplified version of my code:
def worker(id):
    data = load_data(id)
    plot_data_to_file(data)  # Generates a plot and saves it to a file.

def plot_something_parallel(ids):
    pool = multiprocessing.Pool()
    pool.map(worker, ids)

plot_something_parallel(ids=[1, 2, 3])
This caused the same error others mention:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
Following @bbbruce's train of thought, I solved my problem by switching the matplotlib backend from TkAgg to the default. Specifically, I commented out the following line in my matplotlibrc file:
#backend : TkAgg
This might be rpy2-specific.
There are reports of a similar problem with OS X and multiprocessing here and there.
I think that using an initializer that imports the packages needed to run the code in plot could solve the problem (see the multiprocessing docs).
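A minimal sketch of that initializer idea (which packages the worker actually needs is an assumption on my part):
import multiprocessing

def init_worker():
    # Runs once in each worker process, before any tasks are executed.
    global np, plt
    import numpy as np
    import matplotlib
    matplotlib.use('Agg')  # keep forked workers away from GUI backends
    import matplotlib.pyplot as plt

if __name__ == '__main__':
    pool = multiprocessing.Pool(initializer=init_worker)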
I had a similar issue and found that setting the start method in multiprocessing to use forkserver works, as long as it comes after your if __name__ == '__main__': statement.
if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    first_process = multiprocessing.Process(target=targetOne)
    second_process = multiprocessing.Process(target=targetTwo)
    first_process.start()
    second_process.start()
Try to upgrade matplotlib to 3.0.3:
pip3 install matplotlib --upgrade
Then everything goes fine.
=======================================================================
No need to read below anymore.
Yesterday my multiprocess plotting worked on my MacBook Air, but the next morning it was not working on my MacBook Pro with the same code, displaying many of these:
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Both machines use 4th-gen Intel CPUs (i5-4xxx in the Air, i7-4xxx in the Pro), so with no difference in hardware, it must be software.
So I just updated matplotlib to 3.0.3 on the MacBook Pro (it was 3.0.1), and everything works fine.
Also, there is no need to use pool.apply_async anymore.
