scipy interp1d extrapolation method - python

I am trying to extrapolate values from some endpoints as shown in the image below
extrapolated value illustration
I have tried using the scipy interp1d method as shown below
from scipy import interpolate
x = [1,2,3,4]
y = [0,1,2,0]
f = interpolate.interp1d(x,y,fill_value='extrapolate')
print(f(4.3))
output : -0.5999999999999996
Though this is correct, I also need a second extrapolated value: the one obtained by extending segment i=1 to the same x. The value I am expecting is ~3.3, as seen from the graph in the image above. But I need to get this programmatically. I am hoping there is a way of returning multiple values from interp1d(...) or something. Any help will be much appreciated. Thanks in advance.

If you want to extrapolate based on all but the last pair of values, you can just build a second interpolator using x[:-1], y[:-1].
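A minimal sketch of that idea, reusing the data from the question (the names f_last and f_prev are just for illustration):
from scipy import interpolate
x = [1, 2, 3, 4]
y = [0, 1, 2, 0]
f_last = interpolate.interp1d(x, y, fill_value='extrapolate')            # extrapolates the last segment
f_prev = interpolate.interp1d(x[:-1], y[:-1], fill_value='extrapolate')  # extrapolates segment i=1
print(f_last(4.3))   # about -0.6, as in the question
print(f_prev(4.3))   # about 3.3, the second value you are after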


Edge of a curve based out of numpy array

I am looking for some mathematical guidance to help me find the index locations (red circles) of a curve as shown in the image below. The curve is just a 1-D numpy array. I tried scipy's gaussian_filter1d. I also tried np.gradient, and I am not anywhere close to what I want to do. The gradient changes abruptly at those locations, so a first-order gradient should give what I am looking for. Then I realized the data is not smooth, and I tried smoothing with gaussian_filter1d. Even then, I am unable to pick up where it changes. I have various other numpy arrays of this type (same size, values ranging from 0 to 1), so the solution has to work in general and not depend on the given data set; I cannot hardcode anything. Any thoughts would be much appreciated.
CSV file
First you get a smooth function from your data using scipy's UnivariateSpline. Then you plot the points where the absolute slope is, say, at least 1/4 of its maximum.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = np.arange(5500)                      # index positions; y is the 1-D data from the CSV
f = UnivariateSpline(x, y, k=3, s=0.3)   # smoothing cubic spline fit
df = f.derivative()                      # first derivative of the spline
plt.plot(x, f(x))
cond = np.abs(df(x)) > 0.25*np.max(np.abs(df(x)))   # where the slope is steep
plt.scatter(x[cond], f(x[cond]), c='r')
Looks like what you are looking for is the first and last point of the marked ones. So you do
(x[cond].min(),f(x[cond].min()).item()), (x[cond].max(), f(x[cond].max()).item())
And your points are:
((1455, 0.20595740349084446), (4230, 0.1722999962943679))

vectorized interpolation on array with nans

I am trying to interpolate an image cube with shape (dim_frequ, dim_spaxel1, dim_spaxel2) along the frequency axis. The aim is to oversample the frequency space. The array may contain NaNs. It would, of course, be possible to run two for loops over the array, but that's definitely too slow.
What I want in pseudo code:
import numpy as np
from scipy.interpolate import interp1d
dim_frequ, dim_spaxel1, dim_spaxel2 = 2559, 70, 70
cube = np.random.rand(dim_frequ, dim_spaxel1, dim_spaxel2)
cube.ravel()[np.random.choice(cube.size, 1000, replace=False)] = np.nan
wavelength = np.arange(1.31, 2.5894999999, 5e-4) # step chosen so that len(wavelength) == dim_frequ
wavelength_over = np.arange(1.31, 2.5894999999, 5e-5)
cube_over = interp1d(wavelength, cube, axis=0, kind='quadratic', fill_value="extrapolate")(wavelength_over)
cube_over[np.isnan(cube_over)] # array([], dtype=float64)
I've tried np.interp, which can only handle 1-D data (?).
I've tried scipy.interpolate.interp1d, which can in principle handle arrays along a given axis, but returns NaNs (I assume because of the NaNs in the array). This actually works when kind='linear'; I'd like it a bit fancier though, and as soon as I set kind to 'quadratic' it returns NaNs.
I've tried scipy.interpolate.CubicSpline, which again raises a ValueError because of the NaNs.
Any ideas what else to try? I am quite free in terms of the type of interpolation, but it shouldn't be too fancy, i.e. nothing crazier than a spline or a low-order polynomial.
So a couple of things.
First
This returns an empty array because cube_over has no NaNs in it after the interpolation above:
cube_over[np.isnan(cube_over)]
since np.isnan(cube_over) is all False. Otherwise it appears to be interpolating everything in the wavelength_over array.
Second
scipy doesn't like NaNs (see the docs). Typical practice is to drop the NaNs from your set of points before interpolating, since they typically will not add any value to the interpolation function.
It does appear to be working with your interp1d example above, though; I am guessing it drops them along the axis when it builds the interpolation function, but I am not sure.
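For a single spectrum, that practice could look like the sketch below (reusing cube, wavelength and wavelength_over from the question; doing this per spaxel is of course the loop you want to avoid):
import numpy as np
from scipy.interpolate import interp1d
spectrum = cube[:, 0, 0]                    # one spectrum along the frequency axis
valid = ~np.isnan(spectrum)                 # drop the NaN samples before fitting
f = interp1d(wavelength[valid], spectrum[valid], kind='quadratic', fill_value="extrapolate")
spectrum_over = f(wavelength_over)          # oversampled spectrum, no NaNs involved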
Third
What value do you actually want to interpolate? I am not sure what your desired output / endpoint is. It appears that your code is working more or less as expected when you interpolate at the wavelength_over array, seeing as those points are so similar to (if not the same as) the wavelength array. I think you might benefit from a 2-D interpolation method, but again I do not have a good understanding of your goal.
See the 2-D interpolation options in the scipy docs.
Hope this helps.

Why is the output of linspace and interp1d always the same?

So I was doing my assignment, and we are required to use linear interpolation. We have been asked to use the interp1d function from scipy.interpolate and use it to generate new y values given new x values and old coordinates (x1,y1) and (x2,y2).
To get new x coordinates (let's call them x_new) I used np.linspace between (x1,x2), and the new y coordinates (let's call them y_new) I found using the interp1d function on x_new.
However, I also noticed that applying np.linspace to (y1,y2) generates exactly the same values as the y_new I got from interp1d on x_new.
Can anyone please explain to me why this is so? And if this is true, is it always true?
And if this is always true, why do we need to use the interp1d function at all when we can use np.linspace in its place?
Here is the code I wrote:
import scipy.interpolate as ip
import numpy as np
x = [-1.5, 2.23]
y = [0.1, -11]
x_new = np.linspace(start=x[0], stop=x[-1], num=10)
print(x_new)
y_new = np.linspace(start=y[0], stop=y[-1], num=10)
print(y_new)
f = ip.interp1d(x, y)
y_new2 = f(x_new)
print(y_new2) # y_new2 values always the same as y_new
The reason why you stumbled upon this is that you only use two points for an interpolation of a linear function. You have as input two different x values with corresponding y values. You then ask interp1d to find a linear function f(x) = m*x + b that best fits your input data. As you only have two points as input data, there is an exact solution, because a linear function is exactly defined by two points. To see this: take a piece of paper, draw two dots and then think about how many straight lines you can draw to connect these dots.
The linear function that you get from two input points is defined by the parameters m = (y1-y2)/(x1-x2) and b = y1 - m*x1, where (x1,y1), (x2,y2) are your two input points (the elements of your x and y arrays in your code snippet).
So, now what does np.linspace(start, stop, num, ...) do? It gives you num evenly spaced points between start and stop. These points are start, start + delta, ..., stop. The step width delta is given by delta = (stop - start)/(num - 1). The -1 comes from the fact that you want to include your endpoint. So the nth point in your interval will lie at xn = x1 + n*(x2-x1)/(num-1). At what y values will these points end up after we apply our linear function from interp1d? Let's plug it in:
f(xn) = m*xn + b = (y1-y2)/(x1-x2)*(x1 + n/(num-1)*(x2-x1)) + y1 - (y1-y2)/(x1-x2)*x1. Simplifying this gives f(xn) = (y2-y1)*n/(num-1) + y1. And this is exactly what you get from np.linspace(y1, y2, num), i.e. f(xn) = yn!
Now, does this always work? No! We made use of the fact that our linear function is defined by the two endpoints of the intervals we use in np.linspace. So this will not work in general. Try to add one more x value and one more y value in your input list and then compare the results.
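For example, here is a minimal sketch with one extra, made-up middle point added to the data from the question:
import numpy as np
import scipy.interpolate as ip
x = [-1.5, 0.0, 2.23]                                 # a third point added in the middle
y = [0.1, 5.0, -11]                                   # middle y value is made up, off the straight line
x_new = np.linspace(start=x[0], stop=x[-1], num=10)
y_lin = np.linspace(start=y[0], stop=y[-1], num=10)   # ignores the middle point entirely
y_int = ip.interp1d(x, y)(x_new)                      # follows the piecewise-linear curve
print(np.allclose(y_lin, y_int))                      # False: the results now differ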

Matlab's spline equivalent in Python, three inputs.

I'm converting a Matlab script to Python and I have hit a roadblock.
In order to use cubic spline interpolation on a signal, the script uses the command spline with three inputs: f_o, c_signal and freq. So it looks like the following:
cav_sig_freq = spline(f_o, c_signal, freq)
f_o is 1x264, c_signal is 1x264 and freq is 1x264.
From the documentation in Matlab: "s = spline(x,y,xq) returns a vector of interpolated values s corresponding to the query points in xq. The values of s are determined by cubic spline interpolation of x and y."
In Python I'm struggling to find the correct equivalent. None of the interpolation functions I have found in the NumPy and SciPy documentation let you pass the third input like in Matlab.
Thanks for taking the time to read this. If there are any suggestions for how I can make this clearer, I'll be happy to do so.
Basically you will first need to generate something like an interpolant function, then give it your points. Using your variable names like this:
from scipy import interpolate
tck = interpolate.splrep(f_o, c_signal, s=0)
and then apply this tck to your points:
c_interp = interpolate.splev(freq, tck, der=0)
For more on this you can read this post.
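As an alternative sketch, scipy's CubicSpline is arguably even closer to the single Matlab call (its default not-a-knot boundary condition matches what Matlab's spline uses), assuming f_o is sorted and strictly increasing:
from scipy.interpolate import CubicSpline
cav_sig_freq = CubicSpline(f_o, c_signal)(freq)   # intended as the equivalent of spline(f_o, c_signal, freq)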
Have you tried the InterpolatedUnivariateSpline within scipy.interpolate? If I understand the MatLab part correctly, then I think this will work.
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline as ius
a = [1, 2, 3, 4, 5, 6]
b = [r * 2 for r in a]
c = ius(a, b, k=1)   # k=1 is a linear spline; use k=3 for a cubic spline like Matlab's
# what values do you want to query?
targets = [3.4, 2.789]
interpolated_values = c(targets)
It seems this may add one more step to your code than what MatLab provides, but I think it is what you want.

Get median value in each bin in a 2D grid

I have a 2-D array of coordinates, and each coordinate corresponds to a value z (like z = f(x,y)). Now I want to divide this whole 2-D coordinate set into, for example, 100 even bins, calculate the median value of z in each bin, and then use the scipy.interpolate.griddata function to create an interpolated z surface. How can I achieve this in Python? I was thinking of using np.histogram2d, but I think there is no median function in it. And I have a hard time understanding how scipy.stats.binned_statistic works. Can someone help me please? Thanks.
With numpy.histogram2d you can both count the number of data points per bin and sum them, which gives you what you need to compute the average.
I would try something like this:
import numpy as np
coo = np.array([np.arange(1000), np.arange(1000)]).T   # your array of coordinates
def func(x, y): return x*(1-x)*np.sin(np.pi*x) / (1.5+np.sin(2*np.pi*y**2)**2)
z = func(coo[:,0], coo[:,1])
(n, ex, ey) = np.histogram2d(coo[:,0], coo[:,1], bins=100)               # counts per bin
(tot, ex, ey) = np.histogram2d(coo[:,0], coo[:,1], bins=100, weights=z)  # total of z per bin
average = tot/n
average = np.nan_to_num(average)   # cure 0/0 in empty bins
print(average)
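If you do need the median rather than the average, scipy.stats.binned_statistic_2d can compute it per bin directly; here is a minimal sketch reusing coo and z from above:
from scipy import stats
med, ex, ey, binnumber = stats.binned_statistic_2d(coo[:,0], coo[:,1], z, statistic='median', bins=100)
print(med)   # 100x100 array of per-bin medians (NaN for empty bins)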
You'll need a few functions, or just one, depending on how you want to structure things:
A function to create the bins should take in your data, determine how big each bin is, and return an array or an array of arrays (also called lists in Python). Happy to help with this, but I would need more information about the data.
To get the median of each bin: NumPy has a median function,
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.median.html
Essentially the median of an array called bin would be:
numpy.median(bin)
Note: numpy.median accepts an axis argument, so if you stack equally sized bins into a 2-D array you can get the median of every bin at once with numpy.median(bins, axis=1), which returns an array with the median of each bin.
Updated
Not 100% on your example code, so here goes:
import numpy as np
# added some parentheses as I wasn't sure of the math; also removed the semicolons
def bincalc(x, y):
    return x*(1-x)*(np.sin(np.pi*x))/(1.5+np.sin(2*(np.pi*y)**2)**2)

coo = np.random.rand(1000, 2)       # 1000 random (x, y) coordinate pairs
a = []
for x_i, y_i in coo:                # evaluate bincalc for every pair
    a.append(bincalc(x_i, y_i))
z_med = np.median(a)                # median over all values
print(z_med)
