I am using a custom metric function with scipy's cdist function.
The custom function is something like
def cust_metric(u,v):
    dist = np.cumsum(np.gcd(u,v) * k)
    return dist
where k is an arbitrary coefficient.
Ideally, I was hoping to pass k as an argument when calling cdist like so:
d_ar = scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=7))
However, this throws an error.
I was wondering if there is a simple solution that I may be missing?
A quick but non-elegant fix is to declare k as a global variable and adjust it when needed.
According to its documentation, the value for metric should be a callable (or a string naming one of a fixed collection of built-in metrics), so cust_metric(k=7) has to return a function rather than a distance. In your case you could obtain that through
def cust_metric(k):
    return lambda u, v: np.cumsum(np.gcd(u, v) * k)
I do imagine your actual callable would look somewhat different: cdist calls it with one row of each input at a time, and as soon as those rows u and v have more than one element, np.cumsum returns an array, while the callable is supposed to produce a scalar. For example:
In [25]: arr1 = np.array([[5, 7], [6, 1]])
In [26]: arr2 = np.array([[6, 7], [6, 1]])
In [28]: def cust_metric(k):
    ...:     return lambda u, v: np.sqrt(np.sum((k*u - v)**2))
...:
In [29]: scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=7))
Out[29]:
array([[51.03920062, 56.08029957],
[36. , 36.49657518]])
In [30]: scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=1))
Out[30]:
array([[1. , 6.08276253],
[6. , 0. ]])
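If you would rather keep the metric as an ordinary function that takes k as a third parameter, functools.partial from the standard library can bind k before the callable is handed to cdist. A minimal sketch, reusing the weighted distance from the session above (the name weighted_metric is just for illustration):

```python
from functools import partial

import numpy as np
from scipy.spatial.distance import cdist


def weighted_metric(u, v, k):
    # u and v are single rows (1-D arrays); return one scalar per pair
    return np.sqrt(np.sum((k * u - v) ** 2))


arr1 = np.array([[5, 7], [6, 1]])
arr2 = np.array([[6, 7], [6, 1]])

# partial fixes k, leaving a callable of (u, v) as cdist expects
d_ar = cdist(arr1, arr2, metric=partial(weighted_metric, k=7))
d_ar1 = cdist(arr1, arr2, metric=partial(weighted_metric, k=1))
print(d_ar)
```

The closure returned by cust_metric(k) and the partial object behave the same way here; which one reads better is mostly a matter of taste.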
I know about numpy.interp and scipy.interpolate.interp1d, but I can't seem to figure out how to just do a very simple linear interpolation between two lists based on some kind of [0, 1] range. For example if I have lists
x = [2., 3., 4.]
y = [3., 4., 8.5]
I want a function that will accept 0.5 as an argument and give me
[2.5, 3.5, 6.25]
or will accept 0.1 as an argument and give me
[2.1, 3.1, 4.45]
etc.
Why am I blanking on this? The answer must be quite easy...
You can use zip to iterate over multiple lists simultaneously.
Put this in an explicit for loop or in a list comprehension (or pass it to map, as suggested in another answer). I think the code is quite self-explanatory:
def with_explicit_loop(x_list, y_list, alpha=0.5):
    z_list = []
    for a, b in zip(x_list, y_list):
        z_list.append(a * (1 - alpha) + b * alpha)
    return z_list
def with_list_comprehension(x_list, y_list, alpha=0.5):
    return [a * (1 - alpha) + b * alpha for a, b in zip(x_list, y_list)]
Both functions are equivalent, but I think the first is slightly easier to read and the second is slightly faster.
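If the lists can be converted to NumPy arrays, the same weighted average can be computed without any explicit Python loop, because the arithmetic is applied element-wise to whole arrays. A minimal sketch (the name lerp is illustrative, not from the original answers):

```python
import numpy as np


def lerp(x_list, y_list, alpha=0.5):
    # (1 - alpha) * x + alpha * y, evaluated element-wise on whole arrays
    x_arr = np.asarray(x_list, dtype=float)
    y_arr = np.asarray(y_list, dtype=float)
    return (1 - alpha) * x_arr + alpha * y_arr


x = [2., 3., 4.]
y = [3., 4., 8.5]
half = lerp(x, y, 0.5)
tenth = lerp(x, y, 0.1)
print(half, tenth)
```

This returns a NumPy array rather than a list; call .tolist() on the result if a plain list is needed.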
Here's a one-liner:
In [10]: lin = lambda x, y, mult: list(map(lambda a, b: a*(1-mult) + b*mult, x, y))
In [11]: lin(x, y, .1)
Out[11]: [2.1, 3.1, 4.45]
In [12]: lin(x, y, .5)
Out[12]: [2.5, 3.5, 6.25]
I have a lot of data to integrate over and would like to find a way of doing it all with just matrices, and would be willing to compromise on accuracy for a performance boost. What I have in mind is something like this:
import numpy as np
import scipy.integrate
from scipy.integrate import quad
a = np.array([1,2,3])
def func(x):
    return x**2 + x
def func2(x):
    global a
    return a*x
def integrand(x):
    return func(x)*func2(x)
integrated = quad(integrand, 0, 1)
So I am trying to integrate each element in the array that comes out of integrand.
I'm aware that there is a possibility of using numpy.vectorize() like this:
integrated = np.vectorize(scipy.integrate.quad)(integrand, 0, 1)
but I can't get that working. Is there a way to do this in python?
Solution
Well, now that I have learnt a bit more Python, I can answer this question in case anyone stumbles upon it with the same question. The way to do it is to write the functions as though they are going to take scalar values, not vectors, as inputs. Following on from my code above, what we would have is something like
import numpy as np
import scipy.integrate
a = np.array([1, 2, 3]) # arbitrary array, can be any size
def func(x):
    return x**2 + x
def func2(x, a):
    return a*x
def integrand(x, a):
    return func(x)*func2(x, a)
def integrated(a):
    # quad returns (value, abserr); keep only the value
    integrated, tmp = scipy.integrate.quad(integrand, 0, 1, args=(a,))
    return integrated
def vectorizeInt():
    integrateArray = []
    for i in range(len(a)):
        integrate = integrated(a[i])
        integrateArray.append(integrate)
    return integrateArray
Note that the variable you are integrating over must be the first argument of the function; scipy.integrate.quad requires this. If you are integrating a method, it is the second argument, after the usual self (i.e. x is integrated in def integrand(self, x, a):). Also, args=(a,) is needed to tell quad the value of a in the function integrand (quad will wrap a bare non-tuple value in a tuple for you, but the explicit one-element tuple is clearer). If integrand has several extra arguments, say def integrand(x, a, b, c, d):, you simply put them in order in args, so that would be args=(a, b, c, d).
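To illustrate the ordering rule for args, here is a small sketch: the integration variable comes first, and args supplies the remaining parameters in order. The integrands below are hypothetical examples chosen because their exact integrals over [0, 1] are easy to check (a/2 + b for the first, a/2 for the second):

```python
import scipy.integrate


def integrand(x, a, b):
    # x is the integration variable; a and b arrive via args, in this order
    return a * x + b


# args=(4.0, 1.0) means quad evaluates integrand(x, 4.0, 1.0)
value, abserr = scipy.integrate.quad(integrand, 0, 1, args=(4.0, 1.0))
print(value)  # exact answer is 4.0/2 + 1.0 = 3.0


def integrand1(x, a):
    return a * x


# a single extra parameter still goes in a tuple: note the trailing comma
results = [scipy.integrate.quad(integrand1, 0, 1, args=(ai,))[0] for ai in [1, 2, 3]]
print(results)
```

The list comprehension does the same job as vectorizeInt above: one quad call per element of a.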
vectorize won't help improve the performance of code that uses quad. To use quad, you'll have to call it separately for each component of the value returned by integrand.
For a vectorized but less accurate approximation, you can use numpy.trapz or scipy.integrate.simps (named numpy.trapezoid and scipy.integrate.simpson in newer releases).
Your function definition (at least the one shown in the question) is implemented using numpy functions that all support broadcasting, so given a grid of x values on [0, 1], you can do this:
In [270]: x = np.linspace(0.0, 1.0, 9).reshape(-1,1)
In [271]: x
Out[271]:
array([[ 0. ],
[ 0.125],
[ 0.25 ],
[ 0.375],
[ 0.5 ],
[ 0.625],
[ 0.75 ],
[ 0.875],
[ 1. ]])
In [272]: integrand(x)
Out[272]:
array([[ 0. , 0. , 0. ],
[ 0.01757812, 0.03515625, 0.05273438],
[ 0.078125 , 0.15625 , 0.234375 ],
[ 0.19335938, 0.38671875, 0.58007812],
[ 0.375 , 0.75 , 1.125 ],
[ 0.63476562, 1.26953125, 1.90429688],
[ 0.984375 , 1.96875 , 2.953125 ],
[ 1.43554688, 2.87109375, 4.30664062],
[ 2. , 4. , 6. ]])
That is, by making x an array with shape (n, 1), the value returned by integrand(x) has shape (n, 3). There is one column for each value in a.
You can pass that value to numpy.trapz() or scipy.integrate.simps(), using axis=0, to get the three approximations of the integrals. You'll probably want a finer grid:
In [292]: x = np.linspace(0.0, 1.0, 101).reshape(-1,1)
In [293]: np.trapz(integrand(x), x, axis=0)
Out[293]: array([ 0.583375, 1.16675 , 1.750125])
In [294]: simps(integrand(x), x, axis=0)
Out[294]: array([ 0.58333333, 1.16666667, 1.75 ])
Compare that to repeated calls to quad:
In [296]: np.array([quad(lambda t: integrand(t)[k], 0, 1)[0] for k in range(len(a))])
Out[296]: array([ 0.58333333, 1.16666667, 1.75 ])
Your function integrand (which I assume is just an example) is a cubic polynomial, for which Simpson's rule gives the exact result. In general, don't expect simps to give such an accurate answer.
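Recent SciPy releases also provide scipy.integrate.quad_vec, an adaptive quadrature routine that accepts a vector-valued integrand, so the integrand from the question can be used unchanged and all components are integrated in one call. This is a different routine from quad; the sketch below assumes a SciPy version that includes quad_vec:

```python
import numpy as np
from scipy.integrate import quad_vec

a = np.array([1, 2, 3])


def integrand(x):
    # scalar x in, array of len(a) out: one integrand per element of a
    return (x**2 + x) * a * x


# quad_vec refines the interval adaptively for all components at once
res, err = quad_vec(integrand, 0, 1)
print(res)
```

The exact values here are a * 7/12, which matches the simps and quad results shown above.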
quadpy (a project of mine) is fully vectorized. Install with
pip install quadpy
and then do
import numpy
import quadpy
def integrand(x):
    return [numpy.sin(x), numpy.exp(x)] # ,...
res, err = quadpy.quad(integrand, 0, 1)
print(res)
print(err)
[0.45969769 1.71828183]
[1.30995437e-20 1.14828375e-19]
I have two numpy arrays
import numpy as np
x = np.linspace(1e10, 1e12, num=50) # 50 values
y = np.linspace(1e5, 1e7, num=50) # 50 values
x.shape # output is (50,)
y.shape # output is (50,)
I would like to create a function which returns an array shaped (50,50) such that the first x value x0 is evaluated for all y values, etc.
The current function I am using is fairly complicated, so let's use an easier example. Let's say the function is
def func(x,y):
    return x**2 + y**2
How do I shape this to be a (50,50) array? At the moment, it will output 50 values. Would you use a for loop inside an array?
Something like:
np.array([[func(x,y) for i in x] for j in y])
but without using two for loops. This takes forever to run.
EDIT: It has been requested I share my "complicated" function. Here it goes:
There is a data vector which is a 1D numpy array of 4000 measurements. There is also a "normalized_matrix", which is shaped (4000,4000); it is nothing special, just a matrix with floating-point entries between 0 and 1, e.g. 0.5567878. These are the two "given" inputs.
My function returns the matrix multiplication product of transpose(datavector) * matrix * datavector, which is a single value.
Now, as you can see in the code, I have initialized two arrays, x and y, which pass through a series of "x parameters" and "y parameters". That is, what does func(x,y) return for value x1 and value y1, i.e. func(x1,y1)?
The shape of matrix1 is (50, 4000, 4000). The shape of matrix2 is (50, 4000, 4000). Ditto for total_matrix.
normalized_matrix is shape (4000,4000) and id_mat is shaped (4000,4000).
normalized_matrix
print(normalized_matrix.shape)  # output (4000, 4000)
data_vector = datarr
print(datarr.shape)  # output (4000,)
def func(x, y):
    matrix1 = x[:, None, None] * normalized_matrix[None, :, :]
    matrix2 = y[:, None, None] * id_mat[None, :, :]
    total_matrix = matrix1 + matrix2
    # transpose(datavector) * matrix * datavector
    # by matrix multiplication, equals single value
    return np.array([np.dot(datarr.T, np.dot(total_matrix, datarr))])
If I try to use np.meshgrid(), that is, if I try
x = np.linspace(1e10, 1e12, num=50) # 50 values
y = np.linspace(1e5, 1e7, num=50) # 50 values
X, Y = np.meshgrid(x,y)
z = func(X, Y)
I get the following value error: ValueError: operands could not be broadcast together with shapes (50,1,1,50) (1,4000,4000).
reshape in numpy has a different meaning. When you start with a (100,) array and change it to a (5,20) or (10,10) 2d array, that is 'reshape'. There is a numpy function to do that.
You want to take 2 1d array, and use those to generate a 2d array from a function. This is like taking an outer product of the 2, passing all combinations of their values through your function.
Some sort of double loop is one way of doing this, whether it is with an explicit loop, or list comprehension. But speeding this up depends on that function.
For your x**2+y**2 example, it can be 'vectorized' quite easily:
In [40]: x=np.linspace(1e10,1e12,num=10)
In [45]: y=np.linspace(1e5,1e7,num=5)
In [46]: z = x[:,None]**2 + y[None,:]**2
In [47]: z.shape
Out[47]: (10, 5)
This takes advantage of numpy broadcasting. With the None, x is reshaped to (10,1) and y to (1,5), and the + takes an outer sum.
X,Y=np.meshgrid(x,y,indexing='ij') produces two (10,5) arrays that can be used the same way. Look at its docs for other parameters.
So if your more complex function can be written in a way that takes 2d arrays like this, it is easy to 'vectorize'.
But if that function must take 2 scalars, and return another scalar, then you are stuck with some sort of double loop.
A list comprehension form of the double loop is:
np.array([[x1**2+y1**2 for y1 in y] for x1 in x])
Another is:
z=np.empty((10,5))
for i in range(10):
    for j in range(5):
        z[i,j] = x[i]**2 + y[j]**2
This double loop can be sped up somewhat by using np.vectorize. This takes a user defined function, and returns one that can take broadcastable arrays:
In [65]: vprod=np.vectorize(lambda x,y: x**2+y**2)
In [66]: vprod(x[:,None],y[None,:]).shape
Out[66]: (10, 5)
Tests that I've done in the past show that vectorize can improve on the list comprehension route by something like 20%, but the improvement is nothing like writing your function to work with 2d arrays in the first place.
By the way, this sort of 'vectorization' question has been asked many times on SO under the numpy tag. Beyond these broad examples, we can't help you without knowing more about that more complicated function. As long as it is a black box that takes scalars, the best we can help you with is np.vectorize. And you still need to understand broadcasting (with or without meshgrid help).
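As a quick sanity check on the shapes, the broadcasting, meshgrid, np.vectorize and double-loop versions of the x**2 + y**2 example should all produce the same (len(x), len(y)) array. A small sketch using the sizes from above:

```python
import numpy as np

x = np.linspace(1e10, 1e12, num=10)
y = np.linspace(1e5, 1e7, num=5)

# 1. broadcasting: (10, 1) + (1, 5) -> (10, 5)
z_broadcast = x[:, None]**2 + y[None, :]**2

# 2. meshgrid with 'ij' indexing gives two (10, 5) arrays
X, Y = np.meshgrid(x, y, indexing='ij')
z_mesh = X**2 + Y**2

# 3. np.vectorize wraps a scalar function (convenient, but still a Python loop)
vfunc = np.vectorize(lambda a, b: a**2 + b**2)
z_vec = vfunc(x[:, None], y[None, :])

# 4. explicit double loop as the reference
z_loop = np.empty((len(x), len(y)))
for i in range(len(x)):
    for j in range(len(y)):
        z_loop[i, j] = x[i]**2 + y[j]**2

for z in (z_broadcast, z_mesh, z_vec):
    assert z.shape == (10, 5)
    assert np.allclose(z, z_loop)
```

Note the 'ij' indexing: with the default 'xy' indexing, meshgrid would return (5, 10) arrays, i.e. the transpose.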
I think there is a better way, it is right on the tip of my tongue, but as an interim measure:
You are operating on 1x2 windows of a meshgrid. You can use as_strided from numpy.lib.stride_tricks to rearrange the meshgrid into two-element windows, then apply your function to the resultant array. I like to use a generic nd solution, sliding_windows (http://www.johnvinyard.com/blog/?p=268) (Not mine) to transform the array.
import numpy as np
a = np.array([1,2,3])
b = np.array([.1, .2, .3])
z= np.array(np.meshgrid(a,b))
def foo(xy):
    x, y = xy
    return x+y
>>> z.shape
(2, 3, 3)
>>> t = sliding_window(z, (2,1,1))
>>> t
array([[ 1. , 0.1],
[ 2. , 0.1],
[ 3. , 0.1],
[ 1. , 0.2],
[ 2. , 0.2],
[ 3. , 0.2],
[ 1. , 0.3],
[ 2. , 0.3],
[ 3. , 0.3]])
>>> v = np.apply_along_axis(foo, 1, t)
>>> v
array([ 1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3])
>>> v.reshape((len(a), len(b)))
array([[ 1.1, 2.1, 3.1],
[ 1.2, 2.2, 3.2],
[ 1.3, 2.3, 3.3]])
>>>
This should be somewhat faster.
You may need to modify your function's argument signature.
If the link to the johnvinyard.com blog breaks, I've posted the sliding_window implementation in other SO answers - https://stackoverflow.com/a/22749434/2823755
Search around and you'll find many other tricky as_strided solutions.
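For this particular case (pairs taken from a meshgrid), a plain reshape produces the same (n, 2) array of pairs without needing the external sliding_window helper. This swaps in reshape for as_strided; it is a sketch for this layout only, with foo written for Python 3 (tuple parameters such as def foo((x, y)) are no longer allowed):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([.1, .2, .3])

z = np.array(np.meshgrid(a, b))   # shape (2, len(b), len(a))


def foo(xy):
    # xy is one row [x, y] of the pairs array
    x, y = xy
    return x + y


# flatten each grid and pair them up: row k is (z[0].flat[k], z[1].flat[k])
t = z.reshape(2, -1).T            # shape (len(a) * len(b), 2)
v = np.apply_along_axis(foo, 1, t)
result = v.reshape(len(b), len(a))
print(result)
```

np.apply_along_axis still calls foo once per row in Python, so for something as simple as foo, z[0] + z[1] gives the same grid directly.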
In response to your edited question:
normalized_matrix
print(normalized_matrix.shape)  # output (4000, 4000)
data_vector = datarr
print(datarr.shape)  # output (4000,)
def func(x, y):
    matrix1 = x[:, None, None] * normalized_matrix[None, :, :]
    matrix2 = y[:, None, None] * id_mat[None, :, :]
    total_matrix = matrix1 + matrix2
    # transpose(datavector) * matrix * datavector
    # by matrix multiplication, equals single value
    # return np.array([ np.dot(datarr.T, np.dot(total_matrix, datarr))])
    return np.einsum('j,ijk,k->i', datarr, total_matrix, datarr)
Since datarr is shape (4000,), transpose does nothing. I believe you want the result of the 2 dots to be shape (50,). I'm suggesting using einsum. But it can be done with tensordot, or I think even np.dot(np.dot(total_matrix, datarr),datarr). Test the expression with smaller arrays, focusing on getting the shapes right.
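Following the advice to test with smaller arrays, here is a small sketch comparing the einsum expression with the nested np.dot version and an explicit loop. The sizes (n=6, m=4) and the random data are arbitrary stand-ins for datarr, normalized_matrix and id_mat:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                                  # stand-ins for 4000 and 50
datarr = rng.random(n)
normalized_matrix = rng.random((n, n))
id_mat = np.eye(n)

x = np.linspace(1e10, 1e12, num=m)
y = np.linspace(1e5, 1e7, num=m)

total_matrix = (x[:, None, None] * normalized_matrix[None, :, :]
                + y[:, None, None] * id_mat[None, :, :])   # shape (m, n, n)

# one quadratic form per i: datarr @ total_matrix[i] @ datarr
z_einsum = np.einsum('j,ijk,k->i', datarr, total_matrix, datarr)
z_dot = np.dot(np.dot(total_matrix, datarr), datarr)
z_loop = np.array([datarr @ total_matrix[i] @ datarr for i in range(m)])

print(z_einsum.shape)
```

All three give a result of shape (m,), pairing x[i] with y[i].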
x = np.linspace(1e10, 1e12, num=50) # 50 values
y = np.linspace(1e5, 1e7, num=50) # 50 values
z = func(x,y)
# X, Y = np.meshgrid(x,y)
# z = func(X, Y)
X,Y is wrong. func takes x and y that are 1d. Notice how you expand the dimensions with [:, None, None]. Also you aren't creating a 2d array from an outer combination of x and y. None of your arrays in func is (50,50) or (50,50,...). The higher dimensions are provided by normalized_matrix and id_mat.
When showing us the ValueError you should also indicate where in your code that occurred. Otherwise we have to guess, or recreate the code ourselves.
In fact when I run my edited func(X,Y), I get this error:
----> 2 matrix1 = x [:, None, None] * normalized_matrix[None, :, :]
3 matrix2 = y[:, None, None] * id_mat[None, :, :]
4 total_matrix = matrix1 + matrix2
5 # transpose(datavector) * matrix * datavector
ValueError: operands could not be broadcast together with shapes (50,1,1,50) (1,400,400)
See, the error occurs right at the start. normalized_matrix is expanded to (1,400,400) [I'm using smaller examples]. The (50,50) X is expanded to (50,1,1,50). x expands to (50,1,1), which broadcasts just fine.
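If the goal really is the full (50, 50) grid of every x paired with every y, one observation may help: the quadratic form is linear in the matrix, so d·(x N + y I)·d = x (d·N·d) + y (d·d). Assuming the function is exactly the one in the edit, the two scalars can be computed once and broadcast, without building any (50, 4000, 4000) arrays. A sketch with small stand-in sizes and random data, checked against the brute-force double loop:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                                    # stand-in for 4000
datarr = rng.random(n)
normalized_matrix = rng.random((n, n))
id_mat = np.eye(n)

x = np.linspace(1e10, 1e12, num=5)
y = np.linspace(1e5, 1e7, num=3)

# two scalars, each computed once
q_norm = datarr @ normalized_matrix @ datarr
q_id = datarr @ id_mat @ datarr          # equals datarr @ datarr

# grid[i, j] = x[i] * q_norm + y[j] * q_id, shape (len(x), len(y))
grid = x[:, None] * q_norm + y[None, :] * q_id

# brute force for comparison: build the full matrix for every pair
brute = np.array([[datarr @ (xi * normalized_matrix + yj * id_mat) @ datarr
                   for yj in y] for xi in x])
```

This only works because the matrix depends linearly on x and y; a genuinely nonlinear function would still need the broadcasting or looping approaches discussed above.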
To address the edit and the broadcasting error in the edit:
Inside your function you are adding dimensions to arrays to try to get them to broadcast.
matrix1 = x [:, None, None] * normalized_matrix[None, :, :]
This expression looks like you want to broadcast a 1d array with a 2d array.
The results of your meshgrid are two 2d arrays:
X,Y = np.meshgrid(x,y)
>>> X.shape, Y.shape
((50, 50), (50, 50))
>>>
When you try to use X in your broadcasting expression the dimensions don't line up; that is what causes the ValueError. Refer to the General Broadcasting Rules:
>>> x1 = X[:, np.newaxis, np.newaxis]
>>> nm = normalized_matrix[np.newaxis, :, :]
>>> x1.shape
(50, 1, 1, 50)
>>> nm.shape
(1, 4000, 4000)
>>>
You're on the right track with your list comprehension, you just need to add in an extra level of iteration:
np.array([[func(i,j) for i in x] for j in y])
I have 2 numpy arrays
X = [[2 3 6], [7 2 9], [7 1 4]]
a = [0 0.0005413307 0.0010949014 0.0015468832 0.0027740823 0.0033288284]
b = [0 0.0050251256 0.0100502513 0.0150753769 0.0201005025 0.0251256281]
new = []
for z in range(3):
new.append(interp1d(a, z[0], b, 'linear'))
I am getting error as :
if xi is not None and shape[axis] != len(xi):
TypeError: tuple indices must be integers, not str
I need to find the linear interpolation of the same. How can I find that?
I have values X with respect to time a but I want to find interpolation for time b.
Linear interpolation will give me 3 points as in X for every a[i] and b[i] ?
You put the arguments in the wrong order. Following is the help message of interp1d, check it out:
interp1d(x, y, kind='linear', axis=-1, copy=True, bounds_error=True,fill_value=np.nan)
Interpolate a 1-D function.
x and y are arrays of values used to approximate some function f:
y = f(x) .
This class returns a function whose call method uses interpolation
to find the value of new points.
interp1d is a function whose return value is a new function. This new function can then be called with values in the given interpolation range:
from scipy.interpolate import interp1d
x1 = [ 0., 0.04007922, 0.04723573, 0.05440107, 0.06178645, 0.06837938]
x2 = [ 0., 0.00502513, 0.01005025, 0.01507538, 0.0201005, 0.02512563]
f = interp1d(x1, x2)
f([0.0, 0.01, 0.02, 0.03, 0.068])
#array([ 0. , 0.0012538 , 0.0025076 , 0.0037614 , 0.02483647])
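For a single series, np.interp(x_new, xp, fp) performs the same piecewise-linear interpolation, and interp1d also accepts a 2-D y together with an axis argument, so several value series sampled at the same times can be interpolated in one call. A sketch reusing x1 and x2 from above; the values array is a hypothetical stand-in for the question's X, with one row per series and one column per time in x1:

```python
import numpy as np
from scipy.interpolate import interp1d

x1 = [0., 0.04007922, 0.04723573, 0.05440107, 0.06178645, 0.06837938]
x2 = [0., 0.00502513, 0.01005025, 0.01507538, 0.0201005, 0.02512563]
x_new = [0.0, 0.01, 0.02, 0.03, 0.068]

# one series: interp1d and np.interp agree
f = interp1d(x1, x2)
single = f(x_new)
assert np.allclose(single, np.interp(x_new, x1, x2))

# several series at once (hypothetical data): shape (3, 6), columns match x1
x2_arr = np.asarray(x2)
values = np.vstack([x2_arr, 2 * x2_arr, 3 * x2_arr])
g = interp1d(x1, values, kind='linear', axis=1)
new_values = g(x_new)            # shape (3, len(x_new))
print(new_values.shape)
```

The new points must lie inside the range of x1 unless bounds_error or fill_value is changed.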