An issue with parallelising function broadcasting over a mesh using dask - python

I am looking to parallelise a function which takes multiple 1-dimensional ranges (of the form np.linspace(x, y, t)) of numerical input values (the number of ranges is variable, but let's say it takes five), creates a mesh out of these ranges, and then evaluates some (5-dimensional) cost function over this mesh. In its current form it looks something like this:
import itertools
import numpy as np

def func_5d(a, b, c, d, e):
    return a + b + c + d + e

def range_search(a_range, b_range, c_range, d_range, e_range):
    mesh = itertools.product(a_range, b_range, c_range, d_range, e_range)
    # unpack each 5-tuple of the mesh into the cost function
    func_eval = map(lambda x: (func_5d(*x), x), mesh)
    return func_eval
So, here I would be looking to parallelise the function range_search using dask. Ideally, this would be done by creating a dask mesh, which could then be chunked and then mapped through to our cost function using either multi-threading or multi-process scheduling. Looking through the dask documentation, dask.array does not appear to contain any suitable mechanism to achieve this. There is a dask.array.meshgrid function, extended from the numpy library, but it does not support chunking. Additionally, dask.array does not seem to contain a parallelised map function. There is one in dask.bag, but the documentation seems to suggest that dask.bag is used only as a module to carry out preliminary processing of raw data (in formats such as CSV, JSON, etc.). dask.bag objects do also have a method called product() which seems to imitate itertools.product; however, it only takes one other dask.bag object as an argument. So meshing 5 arrays requires this method call to be stacked 4 times, which, aside from being hideously ugly, is also inefficient when the number of inputs is variable.
From here, I don't really know where to go. I have worked through the Jupyter notebooks that Dask has put together, but they do not seem to hold an answer to my question. Any suggestions on the best approach to parallelising functions of the above form would be much appreciated.

I would use NumPy slicing (broadcasting) for this:
a[:, None, None] + b[None, :, None] + c[None, None, :]
You will want to make sure that your input vectors are chunked finely enough that the products of them will still fit comfortably in memory.
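For example, a minimal sketch with dask.array (the three-input version, chunk sizes, and the final reduction here are illustrative assumptions):

import dask.array as da
import numpy as np

# three illustrative 1-D inputs; chunk each finely enough that any
# single chunk of the broadcasted product fits in memory
a = da.from_array(np.linspace(0, 1, 100), chunks=25)
b = da.from_array(np.linspace(0, 1, 100), chunks=25)
c = da.from_array(np.linspace(0, 1, 100), chunks=25)

# None-slicing broadcasts the 1-D inputs into a lazy 3-D mesh
mesh_sum = a[:, None, None] + b[None, :, None] + c[None, None, :]

result = mesh_sum.min().compute()  # evaluated chunk-by-chunk in parallel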

Related

"Direct" numpy functions on an array vs numpy array functions

I have a question about the design of Python. I have realised that some functions are implemented directly on container classes (e.g. numpy arrays), while other functions that act on these containers must be called from numpy itself. An example would be:
import numpy as np
y = np.array([4,7,9,1])
m1 = np.mean(y) # Ok
m2 = y.mean() # Ok
print(m1 == m2) # True
x = [2,3]
r1 = np.concatenate([x, y]) # Ok
r2 = y.concatenate(x) # AttributeError: 'numpy.ndarray' object has no attribute 'concatenate'
print(r1 == r2)
Why can the mean be calculated directly from the array, while the array has no method to concatenate another one to it? Is there a general rule for which functions can be called directly on the array and which cannot? And if both are possible, what is the pythonic way to do it?
The overview of NumPy history gives an indication of why not everything is consistent: it has two predecessors that were developed independently. Backward compatibility requires the project to keep array methods like max. Ongoing development favors the function syntax np.fun(array). I suppose one reason for the latter is that it allows array_like input (the term used throughout NumPy documentation): anything that NumPy can turn into an ndarray.
The question of why there are both methods and functions of the same name has been discussed before, with links provided.
But to focus on your two examples:
mean uses just one array. Logically it can be an ndarray method.
concatenate takes a list of arrays, and doesn't give priority to any one of them.
There is an np.append function that looks superficially like the list .append method. But it just passes the task on to concatenate with a few modifications, and it causes all kinds of newbie errors: it isn't in-place, it ravels, and it is slow compared to the list method.
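A quick illustration of those pitfalls:

import numpy as np

a = np.array([[1, 2], [3, 4]])
np.append(a, [5, 6])             # returns a NEW array, and ravels: [1 2 3 4 5 6]
np.append(a, [[5, 6]], axis=0)   # with axis given, shapes must match: [[1 2] [3 4] [5 6]]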
Or consider the large family of ufuncs. Those are functions; some take one array, others two. They share common ufunc machinery.
np.add(a,b) <=> a+b <=> a.__add__(b)
np.sin(a) # no a.sin()
I suspect the choice to make sin a ufunc rather than a method has been influenced by common mathematical notation.
To me a big plus to the function approach is that it can be applied to a list or scalar. np.sin(1) works just as well as np.sin([0,.5,1]) or np.sin(np.arange(0,1,.5)).
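For instance:

import numpy as np

np.sin(1)                    # scalar: works, returns a numpy scalar
np.sin([0, .5, 1])           # list: coerced to an array first
np.sin(np.arange(0, 1, .5))  # ndarray: works directly

np.array([0, .5, 1]).mean()  # a method needs an ndarray to exist already
# [0, .5, 1].mean()          # AttributeError: lists have no .mean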
Yes, history goes a long way toward excusing the mix of functions and methods, but many of the choices are logical.

Are numpy-matrix-functions buffered?

Are numpy matrix-specific functions, such as x.max(), buffered (i.e. cached) when applied several times?
So should one write:
bincount = np.apply_along_axis(lambda x: np.bincount(x, minlength=data.max() + 1), axis=0, arr=data)
or better
data_max = data.max() + 1
bincount = np.apply_along_axis(lambda x: np.bincount(x, minlength=data_max), axis=0, arr=data)
where data is e.g.
data = np.array([[1, 2, 5, 4, 8, 7, 8, 9, 14, 8, 14, 5, 2, 1],
                 [5, 8, 7, 13, 7, 8, 9, 21, 5, 7, 9, 24, 3, 2]])
or of course even much larger
After updating the question, it seems that you are asking whether numpy implements some form of caching of its results. While there is no general answer to this question, for a method like ndarray.max it is clear that no caching is done.
How can we know that without looking at the implementation? Consider that a caching scheme must resolve two problems:
find a place to store the cached result(s);
have a strategy to invalidate the cache once it no longer applies.
Although the first issue is non-trivial, the second one is the real killer. Not only can a numpy array be changed at any time, but the contents of the array can be shared by many objects. Additionally, C code can obtain the address of the internal buffers, and implement its own modifications to the underlying memory. Caching results would effectively disable many interesting uses of numpy.
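A small demonstration of why invalidation is the killer (illustrative only):

import numpy as np

data = np.array([1, 5, 3])
view = data[:2]      # shares the same underlying buffer
m1 = data.max()      # 5
view[0] = 99         # mutates the shared memory, bypassing `data` itself
m2 = data.max()      # 99 -- any cached result would now be stale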
You can consider numpy as a low-level library that doesn't concern itself with optimizations of that nature. If caching is needed, it should be implemented at a higher level, such as shown in your second example.
As Slater Tyranus pointed out, only a benchmark will show any results:
import numpy as np
import timeit

def func_a(data):
    return np.apply_along_axis(lambda x: np.bincount(x, minlength=data.max() + 1), axis=0, arr=data)

def func_b(data):
    data_max = data.max() + 1
    return np.apply_along_axis(lambda x: np.bincount(x, minlength=data_max), axis=0, arr=data)

setup = '''import numpy as np
data = np.array([[1, 2, 5, 4, 8, 7, 8, 9, 14, 8, 14, 5, 2, 1],
                 [5, 8, 7, 13, 7, 8, 9, 21, 5, 7, 9, 24, 3, 2]])
from __main__ import func_a, func_b'''

min(timeit.Timer('func_a(data)', setup=setup).repeat(100, 100))
# 0.02922797203063965
min(timeit.Timer('func_b(data)', setup=setup).repeat(100, 100))
# 0.018524169921875
I also tested with much larger data. Overall, one can say it pays off to calculate data_max = data.max() + 1 beforehand. With much bigger arrays the discrepancy gets even larger.

How to cache the function that is returned by scipy interpolation

I am trying to speed up a potential-flow aerodynamic solver. Instead of calculating velocity at an arbitrary point using a relatively expensive formula, I tried to precalculate a velocity field so that I could interpolate the values and (hopefully) speed up the code. The result was a slow-down, due (I think) to the scipy.interpolate.RegularGridInterpolator method running on every call. How can I cache the function that is the result of this call? Everything I tried gets me hashing errors.
I have a method that implements the interpolator and a second 'factory' method to reduce the argument list so that it can be used in an ODE solver.
x_panels and y_panels are 1D arrays/tuples, vels is a 2D array/tuple, x and y are floats.
def _vol_vel_factory(x_panels, y_panels, vels):
    # Function factory method
    def _vol_vel(x, y, t=0):
        return _volume_velocity(x, y, x_panels, y_panels, vels)
    return _vol_vel

def _volume_velocity(x, y, x_panels, y_panels, vels):
    velfunc = sp_int.RegularGridInterpolator(
        (x_panels, y_panels), vels
    )
    return velfunc(np.array([x, y])).reshape(2)
By passing tuples instead of arrays as inputs I was able to get a bit further, but converting the method output to a tuple did not make a difference; I still got the hashing error.
In any case, caching the result of the _volume_velocity method is not really what I want to do; I really want to somehow cache the result of _vol_vel_factory, whose result is a function. I am not sure if this is even a valid concept.
The hashing errors come from the numpy arrays themselves: standard caching tools require hashable arguments, and numpy arrays don't implement __hash__ (they are mutable).
You can store other, hashable representations of a numpy array (e.g. tuples or bytes), cache on those, and convert back to a numpy array when needed. For details on how to do that, look at the following:
How to hash a large object (dataset) in Python?
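As a sketch of the factory-caching idea (assuming the panel grids and velocities can be passed as nested tuples, as the question already experimented with), functools.lru_cache can then cache _vol_vel_factory directly. Note that building the interpolator once inside the factory, rather than on every _vol_vel call, is what actually removes the per-call cost:

from functools import lru_cache
import numpy as np
import scipy.interpolate as sp_int

@lru_cache(maxsize=None)
def _vol_vel_factory(x_panels, y_panels, vels):
    # x_panels, y_panels: tuples of floats; vels: nested tuples
    # (hashable, so lru_cache can key on them)
    velfunc = sp_int.RegularGridInterpolator(
        (np.asarray(x_panels), np.asarray(y_panels)), np.asarray(vels)
    )
    def _vol_vel(x, y, t=0):
        return velfunc(np.array([x, y])).reshape(2)
    return _vol_vel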

input/output validation/casting of a numpy calculation

This is a situation that happens quite often in my code. Say I have a function do_sth(a, b) that, only for the sake of this example, simply calculates a + b, with a, b either 1D numpy arrays or scalars. On many occasions, I need the function to broadcast the operation, so that if both a and b are 1D arrays, the result will be a 2D array. An example of what I mean follows:
do_sth(1,2) -> 3
do_sth([1,2],0) -> array([1, 2])
do_sth(0,[3,4]) -> array([3, 4])
do_sth([1,2],[3,4]) -> array([[4, 5], [5, 6]])
This is a bit similar to how a numpy ufunc behaves. A possible implementation follows:
from numpy import newaxis, atleast_1d

def do_sth(a, b):
    "a, b should be either 1d numpy arrays or scalars"
    a, b = map(atleast_1d, [a, b])
    # the line below mocks a more complicated calculation
    res = a[:, newaxis] + b[newaxis]
    conds = [a.size == 1, b.size == 1]
    if all(conds):
        return res[0, 0]
    elif any(conds):
        return res.ravel()
    else:
        return res
As you can see, there's quite a lot of boilerplate. The first question is: is this the right way to do this input/output casting? Is there any reason not to use a decorator to deal with a situation like this? Is there any guideline on the matter?
Moreover, the more complicated calculation, here mocked by the addition, often fails badly if a or b are numpy arrays with 2D or 3D shapes, for example. I say badly in the sense that the point where the calculation fails is not obvious, may change between revisions of the code, and the connection between the error and the wrong input shape is hard to see. I think it is then NOT advisable to put the complicated calculation in a try/except block (following Python EAFP). In this case, is it correct to check the shapes of the 2 arrays at the beginning of the function? Is there any alternative? Is there a numpy function that at the same time converts the input to a numpy array and checks that it is compatible with a certain number of dimensions, something like asarray_withdim(arr, ndim=5)?
Regarding the use of decorators: I haven't seen much use of decorators in numpy code, but I think that's because most of the functionality was developed before decorators became common in Python. If you can make it work, there shouldn't be any downside (but I'm not an expert with either decorators or ufuncs); a sketch of the idea follows.
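A minimal sketch of such a decorator, built from the question's own boilerplate and assuming two 1-D-or-scalar inputs (broadcast_1d is a made-up name):

import functools
import numpy as np

def broadcast_1d(func):
    """Hoist the atleast_1d / squeeze-back boilerplate out of `func`."""
    @functools.wraps(func)
    def wrapper(a, b):
        a, b = np.atleast_1d(a), np.atleast_1d(b)
        res = func(a, b)
        if a.size == 1 and b.size == 1:
            return res[0, 0]
        if a.size == 1 or b.size == 1:
            return res.ravel()
        return res
    return wrapper

@broadcast_1d
def do_sth(a, b):
    # the core calculation, free of casting boilerplate
    return a[:, np.newaxis] + b[np.newaxis]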
Non-compiled numpy functions often have a lot of code that massages the inputs into convenient dimensions. Then they do the core action, followed by final reshaping and type wrapping. They might use functions like np.atleast_2d to ensure there are enough dimensions, and .reshape(-1, 1, 1) to compress excess dimensions.
np.tensordot is an example of one that performs axes transpose plus reshape on the inputs so it can apply the compiled np.dot. np.insert starts with a number of ndim and isinstance tests. Special cases are handled early, while the general one is left to the end. np.einsum is compiled, but there's a lot of preprocessing being done in C code, before it finally creates an nditer object and does the calculation.

Python and Numba for vectorized functions

Good day. I'm writing a Python module for some numeric work. Since there's a lot of stuff going on, I've been spending the last few days optimizing the code to improve calculation times.
However, I have a question concerning Numba.
Basically, I have a class with some fields which are numpy arrays, which I initialize in the following way:
def __init__(self):
    a = numpy.arange(0, self.max_i, 1)
    self.vibr_energy = self.calculate_vibr_energy(a)

def calculate_vibr_energy(self, i):
    return numpy.exp(-self.harmonic * i - self.anharmonic * (i ** 2))
So, the code is vectorized, and using Numba's JIT results in some improvement. However, sometimes I need to access the calculate_vibr_energy function from outside the class, and pass a single integer instead of an array in place of i.
As far as I understand, if I use Numba's JIT on the calculate_vibr_energy, it will have to always take an array as an argument.
So, which of the following options is better:
1) Create a new function calculate_vibr_energy_single(i), which will only take a single integer number, and use Numba on it too
2) Replace all usages of the function that are similar to this one:
myclass.calculate_vibr_energy(1)
with this:
tmp = np.array([1])
myclass.calculate_vibr_energy(tmp)[0]
Or are there other, more efficient (or at least, more Pythonic) ways of doing that?
I have only played a little with numba so far, so I may be mistaken, but as far as I've understood it, using the autojit decorator gives functions that can take arguments of any type.
See e.g. http://numba.pydata.org/numba-doc/dev/pythonstuff.html
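As a sketch (assuming a recent numba, where @jit without an explicit signature compiles lazily and specializes per argument type, which replaced autojit; pulling the calculation out of the class into a free function also sidesteps jitting a method that takes self):

import numpy as np
from numba import jit

@jit(nopython=True)
def vibr_energy(harmonic, anharmonic, i):
    # works for both scalars and arrays; numba specializes per type
    return np.exp(-harmonic * i - anharmonic * i ** 2)

vibr_energy(0.1, 0.01, 3)                  # scalar call
vibr_energy(0.1, 0.01, np.arange(5.0))     # array call, second specialization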
