I checked the available interpolation method in scipy, but could not get the proper solution for my case.
assume i have 100 points whose coordinates are random,
e.g., their x and y positions are:
x=np.random.rand(100)*100
y=np.random.rand(100)*100
z = f(x,y) #the point value calculated by certain function
now i want to get the point value z of a new evenly sampled coordinates (xnew and y new)
xnew = range(100)
ynew = range(100)
how should i do this using bilinear sampling?
i know it is possible to do it point by point, e.g., find the 4 nearest random points, and do the interpolation, but there got to be some easier existing functions to do this
thanks alot!
Use scipy.interpolate.griddata. It does the exact thing you need
# griddata expects an ndarray for the interpolant coordinates
interpolants = numpy.array([xnew, ynew])
# defaults to linear interpolation
znew = scipy.interpolate.griddata((x, y), z, interpolants)
http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata
Related
Given is a geometrical object, for simplification a semisphere with a certain radius. This is displayed as a 2D matrix with the Z data being the height. Assuming that I cut the object along any line, I want to calculate the area of the cut. My solution is to interpolate the semisphere using scipys RectBivariateSpline to accurately display it.
import numpy as np
import scipy.interpolate as intp
radius = 15.
gridsize = 0.5
spectrum = np.arange(-radius,radius+gridsize,gridsize)
X,Y = np.meshgrid(spectrum,spectrum)
Z = np.where(np.sqrt(X**2+Y**2)<=radius, np.sqrt(radius**2-np.sqrt(X**2+Y**2)**2), 0)
spline = intp.RectBivariateSpline(x = X[0,:], y = Y[:,0], z = Z)
#Example coordinates of the cut
x0 = -4.78
x = -6.73
y0 = -15.
y = 15.
However, the RectBivariateSpline only offers an area integral (which can be quickly checked by setting x0 = x or y0 = y). On the other hand the UnivariateSpline only takes in 1D array, which would only work if my cut happened to be along one specific vector of the matrix Z.
Since I want to perform this operation a few thousand times, I would need a comparably quick way to solve the integral (numerically or analytically doesn't matter as long as the error is somewhat negligible). Does anyone have an idea on how to do this?
It turned out, that, for my case, it was sufficient to sample the spline along my cut (using numpy's arange to gather equally spaced points) and then by integrating via the Simpson rule, which only requires a number of points with a sufficiently low distance (which can be controlled via arange's step parameter).
I've been having invalid input errors when working with scipy interp2d function. It turns out the problem comes from the bisplrep function, as showed here:
import numpy as np
from scipy import interpolate
# Case 1
x = np.linspace(0,1)
y = np.zeros_like(x)
z = np.ones_like(x)
tck = interpolate.bisplrep(x,y,z) # or interp2d
Returns: ValueError: Invalid inputs
It turned out the test data I was giving interp2d contained only one distinct value for the 2nd axis, as in the test sample above. The bisplrep function inside interp2d considers it as an invalid output:
This may be considered as an acceptable behaviour: interp2d & bisplrep expect a 2D grid, and I'm only giving them values along one line.
On a side note, I find the error message quite unclear. One could include a test in interp2d to deal with such cases: something along the lines of
if len(np.unique(x))==1 or len(np.unique(y))==1:
ValueError ("Can't build 2D splines if x or y values are all the same")
may be enough to detect this kind of invalid input, and raise a more explicit error message, or even directly call the more appropriate interp1d function (which works perfectly here)
I thought I had correctly understood the problem. However, consider the following code sample:
# Case 2
x = np.linspace(0,1)
y = x
z = np.ones_like(x)
tck = interpolate.bisplrep(x,y,z)
In that case, y being proportional to x, I'm also feeding bisplrep with data along one line. But, surprisingly, bisplrep is able to compute a 2D spline interpolation in that case. I plotted it:
# Plot
def plot_0to1(tck):
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
X = np.linspace(0,1,10)
Y = np.linspace(0,1,10)
Z = interpolate.bisplev(X,Y,tck)
X,Y = np.meshgrid(X,Y)
fig = plt.figure()
ax = Axes3D(fig)
ax.plot_surface(X, Y, Z,rstride=1, cstride=1, cmap=cm.coolwarm,
linewidth=0, antialiased=False)
plt.show()
plot_0to1(tck)
The result is the following:
where bisplrep seems to fill the gaps with 0's, as better showed when I extend the plot below:
Regarding of whether adding 0 is expected, my real question is: why does bisplrep work in Case 2 but not in Case 1?
Or, in other words: do we want it to return an error when 2D interpolation is fed with input along one direction only (Case 1 & 2 fail), or not? (Case 1 & 2 should return something, even if unpredicted).
I was originally going to show you how much of a difference it makes for 2d interpolation if your input data are oriented along the coordinate axes rather than in some general direction, but it turns out that the result would be even messier than I had anticipated. I tried using a random dataset over an interpolated rectangular mesh, and comparing that to a case where the same x and y coordinates were rotated by 45 degrees for interpolation. The result was abysmal.
I then tried doing a comparison with a smoother dataset: turns out scipy.interpolate.interp2d has quite a few issues. So my bottom line will be "use scipy.interpolate.griddata".
For instructive purposes, here's my (quite messy) code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
n = 10 # rough number of points
dom = np.linspace(-2,2,n+1) # 1d input grid
x1,y1 = np.meshgrid(dom,dom) # 2d input grid
z = np.random.rand(*x1.shape) # ill-conditioned sample
#z = np.cos(x1)*np.sin(y1) # smooth sample
# first interpolator with interp2d:
fun1 = interp.interp2d(x1,y1,z,kind='linear')
# construct twice finer plotting and interpolating mesh
plotdom = np.linspace(-1,1,2*n+1) # for interpolation and plotting
plotx1,ploty1 = np.meshgrid(plotdom,plotdom)
plotz1 = fun1(plotdom,plotdom) # interpolated points
# construct 45-degree rotated input and interpolating meshes
rotmat = np.array([[1,-1],[1,1]])/np.sqrt(2) # 45-degree rotation
x2,y2 = rotmat.dot(np.vstack([x1.ravel(),y1.ravel()])) # rotate input mesh
plotx2,ploty2 = rotmat.dot(np.vstack([plotx1.ravel(),ploty1.ravel()])) # rotate plotting/interp mesh
# interpolate on rotated mesh with interp2d
# (reverse rotate by using plotx1, ploty1 later!)
fun2 = interp.interp2d(x2,y2,z.ravel(),kind='linear')
# I had to generate the rotated points element-by-element
# since fun2() accepts only rectangular meshes as input
plotz2 = np.array([fun2(xx,yy) for (xx,yy) in zip(plotx2.ravel(),ploty2.ravel())])
# try interpolating with griddata
plotz3 = interp.griddata(np.array([x1.ravel(),y1.ravel()]).T,z.ravel(),np.array([plotx1.ravel(),ploty1.ravel()]).T,method='linear')
plotz4 = interp.griddata(np.array([x2,y2]).T,z.ravel(),np.array([plotx2,ploty2]).T,method='linear')
# function to plot a surface
def myplot(X,Y,Z):
fig = plt.figure()
ax = Axes3D(fig)
ax.plot_surface(X, Y, Z,rstride=1, cstride=1,
linewidth=0, antialiased=False,cmap=cm.coolwarm)
plt.show()
# plot interp2d versions
myplot(plotx1,ploty1,plotz1) # Cartesian meshes
myplot(plotx1,ploty1,plotz2.reshape(2*n+1,-1)) # rotated meshes
# plot griddata versions
myplot(plotx1,ploty1,plotz3.reshape(2*n+1,-1)) # Cartesian meshes
myplot(plotx1,ploty1,plotz4.reshape(2*n+1,-1)) # rotated meshes
So here's a gallery of the results. Using random input z data, and interp2d, Cartesian (left) vs rotated interpolation (right):
Note the horrible scale on the right side, noting that the input points are between 0 and 1. Even its mother wouldn't recognize the data set. Note that there are runtime warnings during the evaluation of the rotated data set, so we're being warned that it's all crap.
Now let's do the same with griddata:
We should note that these figures are much closer to each other, and they seem to make way more sense than the output of interp2d. For instance, note the overshoot in the scale of the very first figure.
These artifacts always arise between input data points. Since it's still interpolation, the input points have to be reproduced by the interpolating function, but it's pretty weird that a linear interpolating function overshoots between data points. It's clear that griddata doesn't suffer from this issue.
Consider an even more clear case: the other set of z values, which are smooth and deterministic. The surfaces with interp2d:
HELP! Call the interpolation police! Already the Cartesian input case has inexplicable (well, at least by me) spurious features in it, and the rotated input case poses the threat of s͔̖̰͕̞͖͇ͣ́̈̒ͦ̀̀ü͇̹̞̳ͭ̊̓̎̈m̥̠͈̣̆̐ͦ̚m̻͑͒̔̓ͦ̇oͣ̐ͣṉ̟͖͙̆͋i͉̓̓ͭ̒͛n̹̙̥̩̥̯̭ͤͤͤ̄g͈͇̼͖͖̭̙ ̐z̻̉ͬͪ̑ͭͨ͊ä̼̣̬̗̖́̄ͥl̫̣͔͓̟͛͊̏ͨ͗̎g̻͇͈͚̟̻͛ͫ͛̅͋͒o͈͓̱̥̙̫͚̾͂.
So let's do the same with griddata:
The day is saved, thanks to The Powerpuff Girls scipy.interpolate.griddata. Homework: check the same with cubic interpolation.
By the way, a very short answer to your original question is in help(interp.interp2d):
| Notes
| -----
| The minimum number of data points required along the interpolation
| axis is ``(k+1)**2``, with k=1 for linear, k=3 for cubic and k=5 for
| quintic interpolation.
For linear interpolation you need at least 4 points along the interpolation axis, i.e. at least 4 unique x and y values have to be present to get a meaningful result. Check these:
nvals = 3 # -> RuntimeWarning
x = np.linspace(0,1,10)
y = np.random.randint(low=0,high=nvals,size=x.shape)
z = x
interp.interp2d(x,y,z)
nvals = 4 # -> no problem here
x = np.linspace(0,1,10)
y = np.random.randint(low=0,high=nvals,size=x.shape)
z = x
interp.interp2d(x,y,z)
And of course this all ties in to you question like this: it makes a huge difference if your geometrically 1d data set is along one of the Cartesian axes, or if it's in a general way such that the coordinate values assume various different values. It's probably meaningless (or at least very ill-defined) to try 2d interpolation from a geometrically 1d data set, but at least the algorithm shouldn't break if your data are along a general direction of the x,y plane.
Given a set of points describing some trajectory in the 2D plane, I would like to provide a smooth representation of this trajectory with local high order interpolation.
For instance, say we define a circle in 2D with 11 points in the figure below. I would like to add points in between each consecutive pair of points in order or produce a smooth trace. Adding points on every segment is easy enough, but it produces slope discontinuities typical for a "local linear interpolation". Of course it is not an interpolation in the classical sense, because
the function can have multiple y values for a given x
simply adding more points on the trajectory would be fine (no continuous representation is needed).
so I'm not sure what would be the proper vocabulary for this.
The code to produce this figure can be found below. The linear interpolation is performed with the lin_refine_implicit function. I'm looking for a higher order solution to produce a smooth trace and I was wondering if there is a way of achieving it with classical functions in Scipy? I have tried to use various 1D interpolations from scipy.interpolate without much success (again because of multiple y values for a given x).
The end goals is to use this method to provide a smooth GPS trajectory from discrete measurements, so I would think this should have a classical solution somewhere.
import numpy as np
import matplotlib.pyplot as plt
def lin_refine_implicit(x, n):
"""
Given a 2D ndarray (npt, m) of npt coordinates in m dimension, insert 2**(n-1) additional points on each trajectory segment
Returns an (npt*2**(n-1), m) ndarray
"""
if n > 1:
m = 0.5*(x[:-1] + x[1:])
if x.ndim == 2:
msize = (x.shape[0] + m.shape[0], x.shape[1])
else:
raise NotImplementedError
x_new = np.empty(msize, dtype=x.dtype)
x_new[0::2] = x
x_new[1::2] = m
return lin_refine_implicit(x_new, n-1)
elif n == 1:
return x
else:
raise ValueError
n = 11
r = np.arange(0, 2*np.pi, 2*np.pi/n)
x = 0.9*np.cos(r)
y = 0.9*np.sin(r)
xy = np.vstack((x, y)).T
xy_highres_lin = lin_refine_implicit(xy, n=3)
plt.plot(xy[:,0], xy[:,1], 'ob', ms=15.0, label='original data')
plt.plot(xy_highres_lin[:,0], xy_highres_lin[:,1], 'dr', ms=10.0, label='linear local interpolation')
plt.legend(loc='best')
plt.plot(x, y, '--k')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('GPS trajectory')
plt.show()
This is called parametric interpolation.
scipy.interpolate.splprep provides spline approximations for such curves. This assumes you know the order in which the points are on the curve.
If you don't know which point comes after which on the curve, the problem becomes more difficult. I think in this case, the problem is called manifold learning, and some of the algorithms in scikit-learn may be helpful in that.
I would suggest you try to transform your cartesian coordinates into polar coordinates, that should allow you to use the standard scipy.interpolation without issues as you won't have the ambiguity of the x->y mapping anymore.
I can't quite wrap my head around on how to extrapolate from a dataset where the points are not ordered, i.e. be decreasing for 'x'. like so:
http://www.pic-host.org/images/2014/07/21/0b5ad6a11266f549.png
I got that I need to create a plot for the x and y values seperately. So the code that gets me this: (The points are ordered)
x = bananax
y = bananay
t = np.arange(x.shape[0], dtype=float)
t /= t[-1]
nt = np.linspace(0, 1, 100)
x1 = scipy.interpolate.spline(t, x, nt)
y1 = scipy.interpolate.spline(t, y, nt)
plt.plot(nt, x1, label='data x')
plt.plot(nt, y1, label='data y')
Now I got the interpolated splines. I guess I have to do the extrapolation for f(nt)=x1 and f(nt)=y1 respectivly. I get how to interpolate from the data with a simple linear regression but I'm missing how to get a more complex spline(?) extrapolated from it.
The aim is to let the extrapolated function follow the curvature of the datapoints. (At one end at least)
Cheers, and thanks!
I believe that you're on the right track in that you're creating a parametric curve (creating x(t) and y(t)) because the points are ordered. Part of issue seems to be that the spline function is giving you back discrete values rather than the form and parameters of the spline. scipy.optimize has some nice tools that will help you find functions rather than calculating points
If you've got any insight into the underlying process generating the data I suggest that you use that to help select a functional form for fitting. These more free-form methods will give you a degree of flexibility to do so.
Fit x(t) and y(t) and hold onto the resulting fitting functions. They'll be generated with data from t=0 to t=1 but nothing* will stop you from evaluating them outside that range.
I can recommend the following links for guidance on curve fitting procedure:
short: http://glowingpython.blogspot.com/2011/05/curve-fitting-using-fmin.html
long: http://nbviewer.ipython.org/gist/keflavich/4042018
*almost nothing
Thanks this got me on the right track. What worked for me was:
x = bananax
y = bananay
#------ fit a spline to the coordinates, x and y axis are interpolated towards t
t = np.arange(x.shape[0], dtype=float) #t is # of values
t /= t[-1] #t is now devided from 0 to 1
nt = np.linspace(0, 1, 100) #nt is array with values from 0 to 1 with 100 intermediate values
x1 = scipy.interpolate.spline(t, x, nt) #The x values where spline should estimate the y values
y1 = scipy.interpolate.spline(t, y, nt)
#------ create a new linear space for nnt in which an extrapolation from the interpolated spline will be made
nnt = np.linspace(-1, 1, 100) #values <0 are extrapolated (interpolation started at the tip(=0)
x1fit = np.polyfit(nt,x1,3) #fits a polynomial function of the nth order with the spline as input, output are the function parameters
y1fit = np.polyfit(nt,y1,3)
xpoly = np.poly1d(x1fit) #genereates the function based on the parameters obtained by polyfit
ypoly = np.poly1d(y1fit)
I would like to know how does numpy.gradient work.
I used gradient to try to calculate group velocity (group velocity of a wave packet is the derivative of frequencies respect to wavenumbers, not a group of velocities). I fed a 3 column array to it, the first 2 colums are x and y coords, the third column is the frequency of that point (x,y). I need to calculate gradient and I did expect a 2d vector, being gradient definition
df/dx*i+df/dy*j+df/dz*k
and my function only a function of x and y i did expect something like
df/dx*i+df/dy*j
But i got 2 arrays with 3 colums each, i.e. 2 3d vectors; at first i thought that the sum of the two would give me the vector i were searchin for but the z component doesn't vanish. I hope i've been sufficiently clear in my explanation. I would like to know how numpy.gradient works and if it's the right choice for my problem. Otherwise i would like to know if there's any other python function i can use.
What i mean is: I want to calculate gradient of an array of values:
data=[[x1,x2,x3]...[x1,x2,x3]]
where x1,x2 are point coordinates on an uniform grid (my points on the brillouin zone) and x3 is the value of frequency for that point. I give in input also steps for derivation for the 2 directions:
stepx=abs(max(unique(data[:,0])-min(unique(data[:,0]))/(len(unique(data[:,0]))-1)
the same for y direction.
I didn't build my data on a grid, i already have a grid and this is why kind examples given here in answers do not help me.
A more fitting example should have a grid of points and values like the one i have:
data=[]
for i in range(10):
for j in range(10):
data.append([i,j,i**2+j**2])
data=array(data,dtype=float)
gx,gy=gradient(data)
another thing i can add is that my grid is not a square one but has the shape of a polygon being the brillouin zone of a 2d crystal.
I've understood that numpy.gradient works properly only on a square grid of values, not what i'm searchin for. Even if i make my data as a grid that would have lots of zeroes outside of the polygon of my original data, that would add really high vectors to my gradient affecting (negatively) the precision of calculation. This module seems to me more a toy than a tool, it has severe limitations imho.
Problem solved using dictionaries.
You need to give gradient a matrix that describes your angular frequency values for your (x,y) points. e.g.
def f(x,y):
return np.sin((x + y))
x = y = np.arange(-5, 5, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
gx,gy = np.gradient(Z,0.05,0.05)
You can see that plotting Z as a surface gives:
Here is how to interpret your gradient:
gx is a matrix that gives the change dz/dx at all points. e.g. gx[0][0] is dz/dx at (x0,y0). Visualizing gx helps in understanding:
Since my data was generated from f(x,y) = sin(x+y) gy looks the same.
Here is a more obvious example using f(x,y) = sin(x)...
f(x,y)
and the gradients
update Let's take a look at the xy pairs.
This is the code I used:
def f(x,y):
return np.sin(x)
x = y = np.arange(-3,3,.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
xy_pairs = np.array([str(x)+','+str(y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
xy_pairs = xy_pairs.reshape(X.shape)
gy,gx = np.gradient(Z,.05,.05)
Now we can look and see exactly what is happening. Say we wanted to know what point was associated with the value atZ[20][30]? Then...
>>> Z[20][30]
-0.99749498660405478
And the point is
>>> xy_pairs[20][30]
'-1.5,-2.0'
Is that right? Let's check.
>>> np.sin(-1.5)
-0.99749498660405445
Yes.
And what are our gradient components at that point?
>>> gy[20][30]
0.0
>>> gx[20][30]
0.070707731517679617
Do those check out?
dz/dy always 0 check.
dz/dx = cos(x) and...
>>> np.cos(-1.5)
0.070737201667702906
Looks good.
You'll notice they aren't exactly correct, that is because my Z data isn't continuous, there is a step size of 0.05 and gradient can only approximate the rate of change.