Batch call to array of functions in Python - python

I am using scipy.interpolate.Rbf, which returns a callable, to fit a large number of RBFs to different sets of points, and I store the results in a list of functions, as follows:
import scipy.interpolate as interp

rbf_fit = []
for i in range(n):
    # Get the data points for this particular iteration
    data = get_data(i)
    # Fit an RBF to the data points
    zfun_smooth_rbf = interp.Rbf(data[:, 0], data[:, 1], data[:, 2], function='linear', smooth=0)
    # Append the fitted RBF
    rbf_fit.append(zfun_smooth_rbf)
I then want to evaluate all of these functions to regress a value from each fitted RBF. Currently I use a for loop, similar to what was answered in this question, but this is far from ideal because it runs the calls sequentially:
c = [float(f(x, y)) for f in self.rbf_fit]
Is there any way to do this with a single call? In other words, I need to call all the functions stored in an array at the same time, with something like c = self.rbf_fit[:](x, y).

I'm going to try to combine the __call__ of 2 rbfi into one call.
In [80]: from scipy.interpolate import Rbf
Make a sample as illustrated in the docs:
In [81]: x, y, z, d = np.random.rand(4, 50)
In [82]: rbfi0 = Rbf(x, y, z, d)
In [83]: xi = yi = zi = np.linspace(0, 1, 20)
In [84]: di0 = rbfi0(xi, yi, zi)
In [85]: di0
Out[85]:
array([ 0.26614249, 0.07429816, -0.01512205, 0.05134466, 0.24213774,
0.41653342, 0.45280185, 0.34763177, 0.17681661, 0.07186139,
0.16299749, 0.40416788, 0.641642 , 0.78828711, 0.79709639,
0.6530432 , 0.42473033, 0.24155719, 0.17008326, 0.179932 ])
Make a second sample:
In [86]: x, y, z, d = np.random.rand(4, 50)
In [87]: rbfi1 = Rbf(x, y, z, d)
In [88]: di1 = rbfi1(xi, yi, zi)
In [89]: di1
Out[89]:
array([ 0.38975158, 0.39887118, 0.42430634, 0.48554998, 0.59403568,
0.71745345, 0.77483525, 0.70657269, 0.53545478, 0.34931526,
0.28960157, 0.45825098, 0.7538652 , 0.99950089, 1.14749381,
1.19019632, 1.12736371, 1.00558691, 0.87811695, 0.77231634])
Look at the key attributes of the rbfi:
In [90]: rbfi0.nodes
Out[90]:
array([ -13.02451018, -3.05675802, 8.54073071, -81.47163716,
-5.74247623, 118.70153224, -1.39117053, -3.37170396,
....
-10.08326243, 8.9995743 , 3.83357612, -4.59815344,
-25.09981508, -2.8753806 , -0.63932038, 76.59402274,
0.26222997, -30.35280108])
In [91]: rbfi0.nodes.shape
Out[91]: (50,)
In [92]: rbfi1.nodes.shape
Out[92]: (50,)
In [93]: rbfi0.xi.shape
Out[93]: (3, 50)
In [94]: rbfi1.xi.shape
Out[94]: (3, 50)
Construct the variables in the __call__:
In [95]: xa = np.asarray([a.flatten() for a in [xi,yi,zi]], dtype=np.float_)
In [96]: xa.shape
Out[96]: (3, 20)
In [97]: r0 = rbfi0._call_norm(xa, rbfi0.xi)
In [98]: r1 = rbfi1._call_norm(xa, rbfi1.xi)
In [99]: r0.shape
Out[99]: (20, 50)
In [100]: r1.shape
Out[100]: (20, 50)
Compute the norm for both rbfi with one call - by concatenating the xi arrays:
In [102]: r01 = rbfi0._call_norm(xa, np.concatenate((rbfi0.xi, rbfi1.xi),axis=1))
In [103]: r01.shape
Out[103]: (20, 100)
In [104]: np.allclose(r0, r01[:,:50])
Out[104]: True
In [105]: np.allclose(r1, r01[:,50:])
Now do the same for the nodes and the dot:
In [110]: res01 = np.dot(rbfi0._function(r01), np.concatenate((rbfi0.nodes, rbfi1.nodes)))
In [111]: res01.shape
Out[111]: (20,)
Oops. We want two sets of 20; this fits it against all 100 nodes at once. I need to do some reshaping.
In [133]: r01.shape
Out[133]: (20, 100)
In [134]: r01 = r01.reshape(20,2,50)
In [135]: nodes01 = np.concatenate((rbfi0.nodes, rbfi1.nodes))
In [136]: nodes01.shape
Out[136]: (100,)
In [137]: nodes01 = nodes01.reshape(2,50) # should have just stacked them
The _function callables for the 2 rbfi differ, so I have to use them separately:
In [138]: fr01 = [rbfi0._function(r01[:,0,:]), rbfi1._function(r01[:,1,:])]
In [139]: fr01[0].shape
Out[139]: (20, 50)
With more samples this list would be constructed with a list comprehension.
In [140]: fr01 = np.stack(fr01, axis=1)
In [141]: fr01.shape
Out[141]: (20, 2, 50)
Now I can do the np.dot for the combined rbfi:
In [142]: res01 = np.einsum('ijk,jk->ij', fr01, nodes01)
In [143]: res01.shape
Out[143]: (20, 2)
In [144]: np.allclose(res0, res01[:,0])
Out[144]: True
In [145]: np.allclose(res1, res01[:,1])
Out[145]: True
In [149]: di01 = np.stack([rbfi0(xi, yi, zi), rbfi1(xi, yi, zi)],axis=1)
In [150]: di01.shape
Out[150]: (20, 2)
In [151]: np.allclose(di01, res01)
So I've managed to replace the In [149] iteration with the In [138] one. I don't know if that's a time savings or not. It may depend on how costly the _function call is compared to the rest of rbfi.__call__.
In my example
In [131]: rbfi0._function
Out[131]: <bound method Rbf._h_multiquadric of <scipy.interpolate.rbf.Rbf object at 0xab002fac>>
I don't know if your parameters, function='linear', smooth=0 make a difference. If the respective _function attributes are the same, then I could replace iteration with a
rbfi0._function(r01).reshape(20,2,50)
That gives an idea of how you might speed up the iteration of the rbfi, and maybe even replace it with a 'vector' operation.
It looks like, for the default function, the difference is only in the epsilon value:
In [156]: rbfi0._function??
Signature: rbfi0._function(r)
Source:
def _h_multiquadric(self, r):
    return np.sqrt((1.0/self.epsilon*r)**2 + 1)
File: /usr/local/lib/python3.5/dist-packages/scipy/interpolate/rbf.py
Type: method
In [157]: rbfi0.epsilon
Out[157]: 0.25663331561494024
In [158]: rbfi1.epsilon
Out[158]: 0.26163317529091562
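The batching idea above can also be sketched without touching Rbf's private attributes. The following is a minimal NumPy-only sketch, assuming the linear kernel phi(r) = r with no smoothing (as in the question's function='linear', smooth=0) and assuming every fit uses the same number of data points so the centers can be stacked. The helper names fit_linear_rbf and eval_batch are hypothetical, not part of SciPy.

```python
import numpy as np

def fit_linear_rbf(points, values):
    """Solve for the weights of a linear-kernel RBF interpolant (phi(r) = r)."""
    # Pairwise distances between the data points, shape (n, n)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.linalg.solve(r, values)

def eval_batch(centers, weights, query):
    """Evaluate m interpolants at the same query points in one shot.

    centers: (m, n, dim), weights: (m, n), query: (q, dim) -> result (q, m)
    """
    # Distances from every query point to every center of every interpolant: (m, q, n)
    r = np.linalg.norm(query[None, :, None, :] - centers[:, None, :, :], axis=-1)
    # One contraction over the n centers replaces the per-interpolant np.dot
    return np.einsum('mqn,mn->qm', r, weights)

rng = np.random.default_rng(0)
m, n = 4, 30                       # 4 interpolants, 30 data points each (made-up sizes)
centers = rng.random((m, n, 2))
values = rng.random((m, n))
weights = np.stack([fit_linear_rbf(c, v) for c, v in zip(centers, values)])

query = rng.random((5, 2))
batched = eval_batch(centers, weights, query)

# Reference: evaluate the interpolants one at a time
looped = np.stack(
    [np.linalg.norm(query[:, None, :] - c[None, :, :], axis=-1) @ w
     for c, w in zip(centers, weights)],
    axis=1,
)
print(np.allclose(batched, looped))
```

The fitting step is still a loop here; only the evaluation is batched. Whether this is faster than calling each Rbf in turn depends on the sizes involved, since the (m, q, n) distance array has to fit in memory.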

Related

Why is the sum of the absolute values of np.sin(x) and np.cos(x) from 0 to 2*pi not the same?

I am trying to compute the sum of the absolute values of these two expressions, and I am somehow confused, since the sum should be the same, right?
Consider the integrals:
Integral of abs(sin(x))dx
Integral of abs(cos(x))dx
It's easy to see, that the area underneath them is the same, and indeed both integrals return 4.
I wrote a simple script to evenly sample those functions and add all the sampled values together. Scaling aside, both expressions should yield the same result, but they don't. Here is the code:
import numpy as np

angles = np.linspace(0, 2*np.pi, 1000)
line1, line2 = [], []
for n in angles:
    line1.append(np.abs(np.cos(n)))
    line2.append(np.abs(np.sin(n)))
print(sum(line1), sum(line2))
The result is the following:
636.983414656738 635.9826284722284
The sums are off by almost exactly 1. I know that they are not 4, because there is some constant factor missing, but the point is that the values should be the same. Am I completely missing something here or is this a bug?
Consider the extreme case of reducing the samples to 3:
In [93]: angles = np.linspace(0, 2*np.pi, 3)
In [94]: angles
Out[94]: array([0. , 3.14159265, 6.28318531])
In [95]: np.cos(angles)
Out[95]: array([ 1., -1., 1.])
In [96]: np.sin(angles)
Out[96]: array([ 0.0000000e+00, 1.2246468e-16, -2.4492936e-16])
The extra 1 for cos persists for larger samples.
In [97]: angles = np.linspace(0, 2*np.pi, 1001)
In [98]: np.sum(np.abs(np.cos(angles)))
Out[98]: 637.6176779711009
In [99]: np.sum(np.abs(np.sin(angles)))
Out[99]: 636.6176779711009
But if we tell it to skip the 2*np.pi end point, the values match:
In [100]: angles = np.linspace(0, 2*np.pi, 1001, endpoint=False)
In [101]: np.sum(np.abs(np.cos(angles)))
Out[101]: 637.256653677874
In [102]: np.sum(np.abs(np.sin(angles)))
Out[102]: 637.2558690641631
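A minimal sketch of the same comparison in one place: with the default endpoint=True, both 0 and 2*pi are sampled, so the periodic point is counted twice. That adds |cos| = 1 to one sum and |sin| = 0 to the other. The helper name abs_sums is only for illustration.

```python
import numpy as np

def abs_sums(n, endpoint):
    """Return (sum |cos|, sum |sin|) over n samples of [0, 2*pi]."""
    angles = np.linspace(0, 2 * np.pi, n, endpoint=endpoint)
    return np.abs(np.cos(angles)).sum(), np.abs(np.sin(angles)).sum()

# Both ends included: the point 0 == 2*pi (mod 2*pi) is counted twice
c_closed, s_closed = abs_sums(1000, endpoint=True)
# End point excluded: each point of the period is counted once
c_open, s_open = abs_sums(1000, endpoint=False)

print(c_closed - s_closed)  # close to 1
print(c_open - s_open)      # close to 0
```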
Because your integral method, a simple "sum", has systematic errors that cannot be ignored for this case.
Now try this:
import numpy as np

angles = np.linspace(0, 2*np.pi, 1000)
line1, line2 = [], []
for n in angles:
    line1.append(np.abs(np.cos(n)))
    line2.append(np.abs(np.sin(n)))
i1 = np.trapz(line1, angles)
i2 = np.trapz(line2, angles)
print(i1, i2, abs(2*(i1-i2)/(i1+i2)))
The result is:
4.000001648229352 3.9999967035417043 1.2361721666315376e-06
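To see why the trapezoidal rule removes the discrepancy, it can be written out by hand: every interval contributes the average of its two end values, so the first and last samples each get half weight and the duplicated end point no longer adds a full extra term. This is a sketch of the rule itself, not of NumPy's implementation; the function name trapezoid is chosen here for illustration.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule: average adjacent samples, weight by spacing."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

angles = np.linspace(0, 2 * np.pi, 1000)
i_cos = trapezoid(np.abs(np.cos(angles)), angles)
i_sin = trapezoid(np.abs(np.sin(angles)), angles)
print(i_cos, i_sin)  # both close to the exact value 4
```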

How to properly stack numpy arrays?

I am having trouble understanding how data is being stacked in a numpy array and why I cannot match the last data that I added to an array with the last generated data. Here is a MWE:
import numpy as np
np.random.seed(1)
# build storage
container = []
# gen data
x = np.random.random((13, 1, 64, 768))
# add to container
container.append(x)
# gen data
x2 = np.random.random((13, 1, 64, 768))
# add to container
container.append(x2)
# convert to np array
container = np.asarray(container)
# reshape to [13, 2, 64, 768]
container = container.reshape(13, 2, 64, 768)
# check that the last generated data matches the last appended data
assert np.all(x2.flatten() == container[:, -1, :, :].flatten()), 'not a match'
Instead of stacking manually by appending to a list and then reshaping, you could use numpy's concatenate function and join on the existing axis 1 (vstack would join on axis 0 and give shape (26, 1, 64, 768)).
# gen data
x1 = np.random.random((13, 1, 64, 768))
x2 = np.random.random((13, 1, 64, 768))
container = np.concatenate((x1, x2), axis=1)
assert np.all(x2[:, 0] == container[:, -1]), 'not a match'
To answer your question: the reshape in your code does not put x2 at container[:, -1]. np.asarray(container) has shape (2, 13, 1, 64, 768), and reshape keeps the flat element order rather than moving axes. Wrapping each side of the comparison in np.all() would only compare two booleans, so that assertion would pass regardless. It's always a good idea to make your input much smaller (say (2,1,2,2)) so you can see what actually happens.
In [152]: alist = []
In [154]: alist.append(np.random.random((2,1,3)))
In [155]: alist.append(np.random.random((2,1,3)))
In [156]: alist
Out[156]:
[array([[[0.85221826, 0.56088315, 0.06232853]],
[[0.0966469 , 0.89513922, 0.44814579]]]),
array([[[0.86207845, 0.88895573, 0.62069196]],
[[0.11475614, 0.29473531, 0.11179268]]])]
Using np.array to join the list elements produces a 4d array - it has joined them on a new leading dimension:
In [157]: arr = np.array(alist)
In [158]: arr.shape
Out[158]: (2, 2, 1, 3)
In [159]: arr[-1,] # same as alist[-1]
Out[159]:
array([[[0.86207845, 0.88895573, 0.62069196]],
[[0.11475614, 0.29473531, 0.11179268]]])
If we concatenate on one of the dimensions:
In [160]: arr = np.concatenate(alist, axis=1)
In [161]: arr
Out[161]:
array([[[0.85221826, 0.56088315, 0.06232853],
[0.86207845, 0.88895573, 0.62069196]],
[[0.0966469 , 0.89513922, 0.44814579],
[0.11475614, 0.29473531, 0.11179268]]])
In [162]: arr.shape
Out[162]: (2, 2, 3) # note the shape - that 2nd 2 is the join axis
In [163]: arr[:,-1]
Out[163]:
array([[0.86207845, 0.88895573, 0.62069196],
[0.11475614, 0.29473531, 0.11179268]])
[163] has the same numbers as [159], but a (2,3) shape.
reshape keeps the values, but may 'shuffle' them:
In [164]: np.array(alist).reshape(2,2,3)
Out[164]:
array([[[0.85221826, 0.56088315, 0.06232853],
[0.0966469 , 0.89513922, 0.44814579]],
[[0.86207845, 0.88895573, 0.62069196],
[0.11475614, 0.29473531, 0.11179268]]])
We have to transpose the leading 2 axes before the reshape to match [161]:
In [165]: np.array(alist).transpose(1,0,2,3)
Out[165]:
array([[[[0.85221826, 0.56088315, 0.06232853]],
[[0.86207845, 0.88895573, 0.62069196]]],
[[[0.0966469 , 0.89513922, 0.44814579]],
[[0.11475614, 0.29473531, 0.11179268]]]])
In [166]: np.array(alist).transpose(1,0,2,3).reshape(2,2,3)
Out[166]:
array([[[0.85221826, 0.56088315, 0.06232853],
[0.86207845, 0.88895573, 0.62069196]],
[[0.0966469 , 0.89513922, 0.44814579],
[0.11475614, 0.29473531, 0.11179268]]])
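The same check can be run on arrays shaped like those in the question, only smaller. This is a small sketch; the sizes (13, 1, 4, 5) are made up to keep it quick.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.random((13, 1, 4, 5))
x2 = rng.random((13, 1, 4, 5))

stacked = np.asarray([x1, x2])            # new leading axis: (2, 13, 1, 4, 5)

# Plain reshape keeps the flat order, so x1 and x2 get interleaved incorrectly
wrong = stacked.reshape(13, 2, 4, 5)

# Swap the two leading axes first, then merge the (2, 1) pair into one axis
right = stacked.transpose(1, 0, 2, 3, 4).reshape(13, 2, 4, 5)

# Joining on the existing axis 1 gives the same result directly
joined = np.concatenate([x1, x2], axis=1)

print(np.array_equal(right, joined))
print(np.array_equal(joined[:, -1], x2[:, 0]))
print(np.array_equal(wrong[:, -1], x2[:, 0]))
```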

How to use print() command and make the shape of a numpy array consistent, during integration using scipy?

I tried examining how scipy.integrate.ode works. The code below is a simple test for this.
import numpy as np
from scipy.integrate import ode

def func(t, z, p):
    x = z[0]
    y = z[1]
    print('x :', x)
    print('x.shape :', x.shape)
    print('y :', y)
    print('y.shape :', y.shape)
    return [x*0, y*0]

t_ini = 0
t_fin = 1
x_ini = np.array([[2, 2]])
y_ini = np.array([[2, 2]])
solver = ode(func)
solver.set_integrator('dopri5')
solver.set_initial_value([x_ini, y_ini], t_ini)
solver.set_f_params([0])
solver.integrate(t_fin)
x_fin, y_fin = solver.y
print('x_fin :', x_fin)
print('y_fin :', y_fin)
However,
print('x :', x)
print('x.shape :', x.shape)
print('y :', y)
print('y.shape :', y.shape)
return [x*0, y*0]
printed nothing. The only output of the code was
x_fin : [[2. 2.]]
y_fin : [[2. 2.]]
Interestingly, when I changed x_ini and y_ini to
x_ini = np.array([[2]])
y_ini = np.array([[2]])
the print() command worked and the output was a repetition of
x : 2.0
x.shape : ()
y : 2.0
y.shape : ()
followed by the two lines
x_fin : [[2.]]
y_fin : [[2.]]
It was strange that even though x_ini and y_ini have shape (1, 1), both print(x.shape) and print(y.shape) showed ().
So the questions are:
Why didn't print() work for x_ini = y_ini = np.array([[2, 2]]), and what should I do to make it work?
Why did the shape of x and y become () instead of (1, 1)?
How can I make x and y keep the shape (1, 1) during the integration? And what should I do if both x_ini and y_ini have shape (2, 2) and I want that shape to stay consistent during the integration?
Does anyone know about these?
I get a warning when using your initial value array:
In [9]: x_ini = np.array([[2, 2]])
...: y_ini = np.array([[2, 2]])
In [10]: solver.set_initial_value([x_ini, y_ini], 0)
Out[10]: <scipy.integrate._ode.ode at 0x7f1ab8953d60>
In [11]: solver.integrate(.1)
/usr/local/lib/python3.8/dist-packages/scipy/integrate/_ode.py:1181: UserWarning: dopri5: input is not consistent
warnings.warn('{:s}: {:s}'.format(self.__class__.__name__,
Out[11]:
array([[[2., 2.]],
[[2., 2.]]])
The output is the same as the input
In [12]: np.array([x_ini, y_ini])
Out[12]:
array([[[2, 2]],
[[2, 2]]])
With
x_ini = np.array([[2]])
y_ini = np.array([[2]])
The initial value is a (2,1,1) array
In [18]: np.array([x_ini, y_ini])
Out[18]:
array([[[2]],
[[2]]])
That does run, but the values passed to your function are 0d arrays
x : 2.0
x.shape : ()
y : 2.0
y.shape : ()
===
Let's simplify the func:
In [20]: def func(t, z, p):
...: print(type(z), z.shape, z)
...: return z*0
...:
In [21]: solver = ode(func)
...: solver.set_integrator('dopri5')
Out[21]: <scipy.integrate._ode.ode at 0x7f1ab6debe50>
In [22]: solver.set_f_params([0])
Out[22]: <scipy.integrate._ode.ode at 0x7f1ab6debe50>
In [23]: solver.set_initial_value([1,2], 0)
Out[23]: <scipy.integrate._ode.ode at 0x7f1ab6debe50>
In [24]: solver.integrate(.1)
<class 'numpy.ndarray'> (2,) [1. 2.]
...
If I change the initial value to a (2,1,1), func gets the same inputs:
In [27]: solver.set_initial_value([[[1]],[[2]]], 0)
Out[27]: <scipy.integrate._ode.ode at 0x7f1ab6debe50>
In [28]: solver.integrate(.1)
<class 'numpy.ndarray'> (2,) [1. 2.]
Change the input to a 3 element array:
In [31]: solver.set_initial_value([1,2,3], 0)
Out[31]: <scipy.integrate._ode.ode at 0x7f1ab6debe50>
In [32]: solver.integrate(.1)
<class 'numpy.ndarray'> (3,) [1. 2. 3.]
From the docs:
f : callable ``f(t, y, *f_args)``
Right-hand side of the differential equation. t is a scalar,
``y.shape == (n,)``.
``f_args`` is set by calling ``set_f_params(*args)``.
`f` should return a scalar, array or list (not a tuple).
f returns dy/dt. The y will be a 1d array, and it's supposed to return a like size array. Note the y.shape requirement.
The y that ode passes to the function is derived from the initial value array. A (2,1,1) input is flattened to (2,). A (2,1,2) produces the warning.
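Given the y.shape == (n,) requirement, one way to work with (2, 2) shaped x and y is to flatten them into a single 1d state vector and reshape inside the function. The following is a sketch under that assumption; the right-hand side (x decays at rate p, y stays constant) is made up purely so the result can be checked.

```python
import numpy as np
from scipy.integrate import ode

shape = (2, 2)          # desired shape of x and y inside func
seen_shapes = []        # record what func actually works with

def func(t, z, p):
    # z arrives as a flat (8,) array; split it back into two (2, 2) blocks
    x, y = z.reshape(2, *shape)
    seen_shapes.append(x.shape)
    dx = -p * x               # illustrative dynamics: exponential decay
    dy = np.zeros_like(y)     # y is held constant
    # Return the derivative as a flat array of the same size as z
    return np.concatenate([dx.ravel(), dy.ravel()])

x_ini = np.full(shape, 2.0)
y_ini = np.full(shape, 2.0)

solver = ode(func)
solver.set_integrator('dopri5')
solver.set_initial_value(np.concatenate([x_ini.ravel(), y_ini.ravel()]), 0.0)
solver.set_f_params(1.0)
solver.integrate(1.0)

# Restore the (2, 2) shapes from the flat solution vector
x_fin, y_fin = solver.y.reshape(2, *shape)
print(x_fin)
print(y_fin)
```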

Efficient way to cast scalars to numpy arrays

When I write a function that accepts ndarray or scalar inputs
def foo(a):
    # does something to `a`
    #
    # a: `x` dimensional array or scalar
    # . . .
    cast(a, x)
    # deal with `a` as if it is an `x`-d array after this
Is there an efficient way to write that cast function? Basically what I'd want is a function that would cast:
a, a scalar to ndarray with shape ((1,)*x)
b, an ndarray with y<x dims explicitly to shape ((1,) * (x-y) + b.shape) (same as broadcasting)
c, an ndarray with x dims is unaffected
d, an ndarray with y>x dims throws an error
do it all in-place (at least when starting with an array), to avoid doubling the memory
It seems like this functionality is repeated so often in built-in functions that there should be some shortcut for it, but I'm not finding it.
I can do a_ = np.array(a, ndmin=x, copy=False) and then assert len(a_.shape) == x, but that still makes a copy of arrays (i.e. a_.base is a is False). Is there any way around this?
asarray returns the array itself (if starting with an array):
In [271]: x=np.arange(10)
In [272]: y = np.asarray(x)
In [273]: id(x)
Out[273]: 2812424128
In [274]: id(y)
Out[274]: 2812424128 # same id
ndmin produces a view:
In [276]: y = np.array(x, ndmin=2, copy=False)
In [277]: y
Out[277]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [278]: id(x)
Out[278]: 2812424128
In [279]: id(y)
Out[279]: 2811135704 # different id
In [281]: x.__array_interface__['data']
Out[281]: (188551320, False)
In [282]: y.__array_interface__['data'] # same databuffer
Out[282]: (188551320, False)
ndmin on an array of the right dim already:
In [286]: x = np.arange(9).reshape(3,3)
In [287]: y = np.array(x, ndmin=2, copy=False)
In [288]: id(x)
Out[288]: 2810813120
In [289]: id(y)
Out[289]: 2810813120 # same id
Similar discussion with astype,
confused about the `copy` attribution of `numpy.astype`
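Putting the pieces together, the cast function from the question could be sketched as follows. This is one possible approach, not a built-in: it uses np.asarray to avoid copying arrays and np.newaxis indexing to prepend the missing length-1 axes as a view.

```python
import numpy as np

def cast(a, x):
    """Return `a` as an x-dimensional array, prepending length-1 axes as needed."""
    arr = np.asarray(a)                  # no copy if `a` is already an ndarray
    if arr.ndim > x:
        raise ValueError(f"expected at most {x} dims, got {arr.ndim}")
    # Indexing with np.newaxis only adds axes, so the result is a view of arr
    return arr[(np.newaxis,) * (x - arr.ndim) + (Ellipsis,)]

b = np.arange(3)
print(cast(3.0, 2).shape)                # (1, 1)
print(cast(b, 3).shape)                  # (1, 1, 3)
print(np.shares_memory(cast(b, 3), b))   # True
```

np.shares_memory checks the data buffer, which is a more direct test for a copy than comparing id() values or the base attribute.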

numpy apply along n-spaces

I have a 4d array, and I would like to apply a function to each 2d slice taken by iterating over the last two dimensions. Viz, apply f(2d_array) to (x,y,0,0), and f(2d_array) to (x,y,0,1), etc etc. My function operates on the array in place, so the dimensions would be the same, but a general solution would return an array of shape (x',y',w,z), where w and z are the last two dimensions of the original array.
This could obviously be generalized to mD slices over an nD array.
Is there any built-in functionality that does this thing?
The 'basic' apply-along-axis model is to iterate on one axis, and pass the other to your function:
In [197]: def foo(x): # return same size
...: return x*2
...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
...:
Out[197]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
In [198]: def foo(x):
...: return x.sum() # return one less dim
...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
...:
Out[198]: array([ 6, 22, 38])
In [199]: def foo(x):
...: return x.sum(keepdims=True) # condense the dim
...: np.array([foo(x) for x in np.arange(12).reshape(3,4)])
...:
Out[199]:
array([[ 6],
[22],
[38]])
Your 4d problem can be massaged to fit this.
In [200]: arr_4d = np.arange(24).reshape(2,3,2,2)
In [201]: arr_2d = arr_4d.reshape(6,4).T
In [202]: res = np.array([foo(x) for x in arr_2d])
In [203]: res
Out[203]:
array([[60],
[66],
[72],
[78]])
In [204]: res.reshape(2,2)
Out[204]:
array([[60, 66],
[72, 78]])
which is the equivalent of doing:
In [205]: arr_4d[:,:,0,0].sum()
Out[205]: 60
In [206]: foo(arr_4d[:,:,0,0].ravel())
Out[206]: array([60])
apply_along_axis requires a function that takes a 1d array, but can be applied thus:
In [209]: np.apply_along_axis(foo,0,arr_4d.reshape(6,2,2))
Out[209]:
array([[[60, 66],
[72, 78]]])
foo could reshape its input to 2d, and pass it to a function that takes 2d. apply_along_axis uses np.ndindex to generate the indices for the iteration axes.
In [212]: list(np.ndindex(2,2))
Out[212]: [(0, 0), (0, 1), (1, 0), (1, 1)]
np.vectorize normally works with a function that takes a scalar. But recent versions have a signature parameter, which I believe could be used to work with your case. It may require transposing the input so it iterates on the first two axes, passing the last two to function. See my answer at https://stackoverflow.com/a/46004266/901925.
None of these approaches offers a speed advantage.
Without reshaping or swapping, I can iterate with the help of ndindex.
Define a function that expects a 2d input:
def foo2(x):
    return x.sum(axis=1, keepdims=True) # 2d
Index iterator for the last 2 dim of arr_4d:
In [260]: idx = np.ndindex(arr_4d.shape[-2:])
Do test calc to determine the shape of the return. vectorize and apply... do this sort of test.
In [261]: r1 = foo2(arr_4d[:,:,0,0]).shape
In [262]: r1
Out[262]: (2, 1)
The result array:
In [263]: res = np.zeros(r1+arr_4d.shape[-2:])
In [264]: res.shape
Out[264]: (2, 1, 2, 2)
Now iterate:
In [265]: for i,j in idx:
...: res[...,i,j] = foo2(arr_4d[...,i,j])
...:
In [266]: res
Out[266]:
array([[[[ 12., 15.],
[ 18., 21.]]],
[[[ 48., 51.],
[ 54., 57.]]]])
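For comparison, the np.vectorize route with a signature might look like the sketch below. It moves the two iteration axes to the front, lets vectorize loop over them while passing each 2d slice to foo2, and then moves them back. As noted above, this is a convenience and should not be expected to run faster than the explicit loop.

```python
import numpy as np

def foo2(x):
    # Expects a 2d input, returns a 2d result with a length-1 second axis
    return x.sum(axis=1, keepdims=True)

arr_4d = np.arange(24).reshape(2, 3, 2, 2)

# Core dims (m, n) are passed to foo2; all leading dims are looped over
vfoo2 = np.vectorize(foo2, signature='(m,n)->(m,k)')

# Put the last two axes first so they become the loop dims: shape (2, 2, 2, 3)
moved = np.moveaxis(arr_4d, (-2, -1), (0, 1))
# Result has shape (2, 2, 2, 1); move the loop dims back to the end: (2, 1, 2, 2)
res = np.moveaxis(vfoo2(moved), (0, 1), (-2, -1))

# Reference: the explicit ndindex loop
expected = np.zeros((2, 1, 2, 2))
for i, j in np.ndindex(arr_4d.shape[-2:]):
    expected[..., i, j] = foo2(arr_4d[..., i, j])

print(res.shape)
print(np.array_equal(res, expected))
```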
I guess you're looking for something like numpy.apply_over_axes coupled with a for loop to iterate over the varying axes.
I rolled my own. I'd be interested to know if there are any performance differences between this and @hpaulj's method, and if there is reason to believe that writing a custom C module would offer significant improvement. Of course @hpaulj's method is more general, since this is specific to my needing to just perform an operation on the array in place.
import itertools

def apply_along_space(f, np_array, axes):
    # Apply the in-place function f to each subspace obtained by iterating over the axes listed in axes, e.g. axes=(0, 2)
    index_ranges = [range(np_array.shape[ax]) if ax in axes else [slice(None)] for ax in range(np_array.ndim)]
    for slic in itertools.product(*index_ranges):
        f(np_array[slic])
    return np_array
