I have created a bspline using splprep as below from a set of points:
tck,uout = splprep([x,y],s=0.,k=2,per=False)
Now, I am trying to evaluate the derivative of a spline using:
dx,dy = splev(uout,tck,der=1)
I find that splev returns two lists for the derivative.
Given that the Spline is parametrized (say in u), does it return dx/du and dy/du ?
If not how to evaluate the derivative (dy/dx) properly ?
Yes. With der=1, the two lists are the values of dx/du and dy/du at each parameter value in uout. The gradient is then dy/dx = (dy/du) / (dx/du), which is defined wherever dx/du is nonzero.
I'm slightly concerned about the splprep call. s is optional: s=0 forces the spline through every point, and if you want smoothing, a value of about the same size as the number of points is a reasonable starting point (larger means smoother). per is documented as an integer flag, not a boolean. And cubic splines (k=3) are generally better behaved than quadratic ones. http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splprep.html
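A minimal sketch of the chain-rule computation described above, assuming a made-up test curve (the parabola y = x**2, which is not from the question) so the result can be compared with the known slope 2*x:

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical test curve: the parabola y = x**2, whose slope is 2*x.
x = np.linspace(0.5, 2.0, 50)
y = x**2

# Interpolating cubic parametric spline (s=0 passes through every point).
tck, uout = splprep([x, y], s=0, k=3)

# der=1 returns the derivatives with respect to the parameter u.
dx_du, dy_du = splev(uout, tck, der=1)

# Chain rule: dy/dx = (dy/du) / (dx/du), valid where dx/du != 0.
dy_dx = dy_du / dx_du

# Largest deviation from the exact slope (a small interpolation error).
print(np.max(np.abs(dy_dx - 2 * x)))
```

For a curve that turns back on itself, dx/du passes through zero and dy/dx is undefined there, so it is worth checking dx_du before dividing.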
I am trying to interpolate a set of ordered pairs using SciPy's Lagrange interpolation (scipy.interpolate.lagrange); I have done this before without incident.
This time, however, I keep getting a "Division by zero" error and the interpolating polynomial comes out with infinite coefficients.
I am aware data points must not be repeated due to the internal workings of Lagrange's Method, and they are not repeated.
Here is my code and the offending ordered pair, in numpy vector format.
Code:
x = out["x"].round(decimals=3)
x = np.array(x)
y = out["y"].round(decimals=3)
y = np.array(y)
print(x)
print(y)
pol = lagrange(x,y)
print(pol)
Ordered pair:
[273.324 285.579 309.292 279.573 297.427 290.681 276.621 293.586 283.463
284.674 273.904 288.064 280.125 294.269 288.51 285.898 273.419 273.023
281.754 281.546 283.21 303.399 297.392 293.359 306.404 356.285 302.487
280.586 299.487 302.487]
[ 0. 5.414 6.202 0. 9.331 11.52 0. 10.495 5.439 4.709
0. 4.916 0. 10.508 6.736 5.25 0. 0. 6.53 4.305
5.124 6.753 10.175 10.545 5.98 9.147 11.137 0. 8.764 9.57 ]
Lots of thanks in advance.
Why Lagrange Interpolation did not work for you.
You have the value 302.487 twice in your array x. I.e. you did repeat it.
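A quick way to check for repeated abscissas before calling lagrange is to count the unique values. The sketch below uses the x array printed in the question:

```python
import numpy as np

# The x values exactly as printed in the question.
x = np.array([
    273.324, 285.579, 309.292, 279.573, 297.427, 290.681, 276.621, 293.586,
    283.463, 284.674, 273.904, 288.064, 280.125, 294.269, 288.51, 285.898,
    273.419, 273.023, 281.754, 281.546, 283.21, 303.399, 297.392, 293.359,
    306.404, 356.285, 302.487, 280.586, 299.487, 302.487,
])

# np.unique with return_counts=True reports how often each value occurs.
values, counts = np.unique(x, return_counts=True)
duplicates = values[counts > 1]

print(duplicates)  # [302.487]
```

Note that the rounding to three decimals in the question can itself create duplicates out of values that were distinct before rounding.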
Why Lagrange Interpolation is not what you want.
As Tim Roberts pointed out, Lagrange interpolation is really not made for 20 or more points. The problem is that polynomials of high degree tend to overfit. Check out the following example from the Wikipedia article on overfitting.
Figure 2. Noisy (roughly linear) data is fitted to a linear function and a polynomial function. Although the polynomial function is a perfect fit, the linear function can be expected to generalize better: if the two functions were used to extrapolate beyond the fitted data, the linear function should make better predictions.
Alternative Regression
There are at least two valid alternatives. One of them is what the Wikipedia article recommends: if you know roughly what type of function your data comes from, use regression to fit a function of that type to the data. In the example above, that's a linear function. If you want to do that, check out SciPy's curve_fit (scipy.optimize.curve_fit).
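As a rough sketch of the regression route, the following fits a straight line with curve_fit. The model function linear_model and the synthetic data are assumptions made for illustration; they are not the data from the question:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(x, a, b):
    """Hypothetical model: a straight line a*x + b."""
    return a * x + b

# Synthetic, roughly linear data (true slope 2, true intercept 1).
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 50)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.5, size=x_demo.size)

# curve_fit returns the best-fit parameters and their covariance matrix.
params, cov = curve_fit(linear_model, x_demo, y_demo)
a_fit, b_fit = params
print(a_fit, b_fit)
```

Swapping linear_model for another function with the same call signature (x first, then the parameters) fits a different model type.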
Alternative Spline Interpolation
Another alternative is spline interpolation. Again, from the Wikipedia article on spline interpolation:
Instead of fitting a single, high-degree polynomial to all of the values at once, spline interpolation fits low-degree polynomials to small subsets of the values, for example, fitting nine cubic polynomials between each of the pairs of ten points, instead of fitting a single degree-ten polynomial to all of them. Spline interpolation is often preferred over polynomial interpolation because the interpolation error can be made small even when using low-degree polynomials for the spline. Spline interpolation also avoids the problem of Runge's phenomenon, in which oscillation can occur between points when interpolating using high-degree polynomials.
There are just two little technical details that I want to point out. First, your points need to be ordered by x, so I did that for you. Second, SciPy's UnivariateSpline has a smoothing parameter s that you need to choose. If you pick it small, the spline sticks to the data like you're used to with Lagrange interpolation; if you make it bigger, the spline becomes smoother and hopefully generalizes better. Below I picked two different values for you to look at, but you should probably play around with it yourself. I included a very small one so you can see that it can do what you're used to from Lagrange interpolation, but I wouldn't recommend it. You should probably also use more data, preprocess it, and so on, but that's not what the question was about.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
idx = np.argsort(x)
x = x[idx]
y = y[idx]
for s in [10, 60]:
    t = np.linspace(np.min(x), np.max(x), 10**4)
    f = UnivariateSpline(x, y, s=s)
    plt.scatter(x, y)
    plt.plot(t, f(t))
    plt.title(f'{s=}')
    plt.show()
I have a cloud of data points (x,y) that I would like to interpolate and smooth.
Currently, I am using scipy :
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
spl = interp1d(Cloud[:,1], Cloud[:,0]) # interpolation
x = np.linspace(Cloud[:,1].min(), Cloud[:,1].max(), 1000)
smoothed = savgol_filter(spl(x), 21, 1) #smoothing
This is working pretty well, except that I would like to give weights to the data points passed to interp1d. Any suggestion for another function that handles this?
Basically, I thought that I could just repeat each point of the cloud according to its weight, but that is not very efficient, as it greatly increases the number of points to interpolate and slows down the algorithm.
The default interp1d uses linear interpolation, i.e., it simply computes a line between two points. A weighted interpolation does not make much sense mathematically in such a scenario: there is only one way in Euclidean space to draw a straight line between two points.
Depending on your goal, you can look into other methods of interpolation, e.g., B-splines. Then you can use scipy's scipy.interpolate.splrep and set the w argument:
w - Strictly positive rank-1 array of weights the same length as x and y. The weights are used in computing the weighted least-squares spline fit. If the errors in the y values have standard-deviation given by the vector d, then w should be 1/d. Default is ones(len(x)).
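To illustrate the effect of w, here is a small sketch that fits the same data twice, once with equal weights and once with one suspicious point weighted down. The data (a sine curve with one artificial outlier) and the smoothing factor s are assumptions for the example only:

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Synthetic data: a sine curve with one artificial outlier at index 25.
x = np.linspace(0, 10, 50)
y = np.sin(x)
y[25] += 5.0

# Equal weights versus a tiny (but strictly positive) weight on the outlier.
w_equal = np.ones_like(x)
w_down = np.ones_like(x)
w_down[25] = 1e-3

s = 0.5  # assumed smoothing factor; tune it for real data
tck_equal = splrep(x, y, w=w_equal, s=s)
tck_down = splrep(x, y, w=w_down, s=s)

# Distance of each fit from the underlying sine at the outlier location.
err_equal = abs(splev(x[25], tck_equal) - np.sin(x[25]))
err_down = abs(splev(x[25], tck_down) - np.sin(x[25]))
print(err_equal, err_down)
```

The down-weighted fit stays much closer to the underlying curve at the outlier, which is the behavior that repeating points by weight was meant to emulate.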
Suppose I have a curve, and then I estimate its gradient via finite differences by using np.gradient. Given an initial point x[0] and the gradient vector, how can I reconstruct the original curve? Mathematically I see it's possible given this system of equations, but I'm not certain how to do it programmatically.
Here is a simple example of my problem, where I have sin(x) and I compute the numerical difference, which matches cos(x).
test = np.vectorize(np.sin)(x)
numerical_grad = np.gradient(test, 30./100)
analytical_grad = np.vectorize(np.cos)(x)
## Plot data.
ax.plot(test, label='data', marker='o')
ax.plot(numerical_grad, label='gradient')
ax.plot(analytical_grad, label='proof', alpha=0.5)
ax.legend();
I found how to do it, by using numpy's trapz function (trapezoidal rule integration; it is named np.trapezoid from NumPy 2.0 onwards).
Following up on the code I presented in the question, to reproduce the input array test, we do:
x = np.linspace(1, 30, 100)
integral = list()
for t in range(len(x)):
    integral.append(test[0] + np.trapz(numerical_grad[:t+1], x[:t+1]))
The integral array then contains the results of the numerical integration.
You can restore initial curve using integration.
As a real-life example: if you have a function for the position of a 1D motion, you can get the velocity as its derivative (gradient):
v(t) = s(t)' = ds / dt
And given the velocity, you can recover the position by integration, up to an unknown additive constant (shift). Not all functions can be integrated analytically; in that case numerical integration is used. With the initial position you can restore the exact value:
s(T) = Integral[from 0 to T](v(t)dt) + s(0)
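The integral above can be sketched numerically in one call with scipy.integrate.cumulative_trapezoid, which returns the running trapezoidal integral and so avoids the explicit loop. This is an alternative to the trapz loop rather than the method used in the answer; the array names follow the question, and x is assumed to be np.linspace(1, 30, 100):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Same setup as in the question (x is assumed to be this grid).
x = np.linspace(1, 30, 100)
test = np.sin(x)

# Passing x lets np.gradient use the actual sample spacing.
numerical_grad = np.gradient(test, x)

# Running integral of the gradient, shifted by the known initial value.
# initial=0 makes the output the same length as the input.
reconstructed = test[0] + cumulative_trapezoid(numerical_grad, x, initial=0)

# The reconstruction is approximate: both steps carry discretization error.
print(np.max(np.abs(reconstructed - test)))
```

The residual shrinks as the grid is refined, since both the finite differences and the trapezoidal rule are second-order accurate in the interior.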
Is there a way to get scipy's interp1d (in linear mode) to return the derivative at each interpolated point? I could certainly write my own 1D interpolation routine that does, but presumably scipy's is internally in C and therefore faster, and speed is already a major issue.
I am ultimately feeding a munging of the interpolated function into a multi-dimensional minimization routine, so being able to pass analytic derivatives would speed things up a lot rather than having the minimization routine try to calculate them itself. And interp1d must be calculating them internally --- so can I access them?
Use UnivariateSpline instead of interp1d, and use the derivative method to generate the first derivative. The example at the manual page here is pretty self-explanatory.
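A minimal sketch of that approach, assuming sample data from sin(x) so the derivative can be compared with cos(x). Note that this uses a cubic interpolating spline (k=3, s=0), so between the nodes the values differ slightly from linear interpolation:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Assumed sample data: sin(x) on a regular grid.
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# s=0 makes the spline interpolate the data instead of smoothing it.
spline = UnivariateSpline(x, y, k=3, s=0)

# derivative() returns a new spline object representing the first derivative.
dspline = spline.derivative()

# Compare with the exact derivative cos(x) at the nodes.
print(np.max(np.abs(dspline(x) - np.cos(x))))
```

Both spline and dspline are callable on arrays, so they can be passed directly to a minimization routine as the function and its analytic derivative.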
You can combine scipy.interpolate.interp1d and scipy.misc.derivative (note that scipy.misc.derivative was deprecated in SciPy 1.10 and is no longer available in recent releases), but there is something that must be taken into account:
When calling derivative with some dx chosen as spacing, the derivative at x0 is computed as the central difference between x0-dx and x0+dx:
derivative(f, x0, dx) = (f(x0+dx) - f(x0-dx)) / (2 * dx)
As a result, you can't use derivative closer than dx to your interpolated function range limits, because f will raise a ValueError telling you that your interpolated function is not defined there.
So, what can you do closer than dx to those range limits?
If f is defined inside [xmin, xmax] (range):
At the range limits you can move x0 a bit in:
x0 = xmin + dx or x0 = xmax - dx
For other points you can refine dx (make it smaller).
Uniform function outside interpolation range:
If your interpolated function happens to be uniform outside the interpolation range:
f(x0 < xmin) = f(x0 > xmax) = f_out
You may define your interpolated function like this:
f = interp1d(x, y, bounds_error=False, fill_value=f_out)
Linear interpolation case:
For the linear case it might be cheaper to calculate just once the differences between points:
import numpy as np
df = np.diff(y) / np.diff(x)
This way you can access them as the components of an array.
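To turn those per-interval slopes into a derivative lookup at arbitrary points, one option is np.searchsorted. The helper name linear_derivative and the sample data below are hypothetical, and x is assumed to be sorted in increasing order:

```python
import numpy as np

# Hypothetical sample data; x must be sorted in increasing order.
x = np.array([0.0, 1.0, 3.0, 6.0])
y = np.array([0.0, 2.0, 2.0, 8.0])

# Slope of each linear segment, computed once.
slopes = np.diff(y) / np.diff(x)

def linear_derivative(x0):
    """Slope of the segment containing x0 (clamped to the end segments)."""
    idx = np.searchsorted(x, x0, side='right') - 1
    idx = np.clip(idx, 0, len(slopes) - 1)
    return slopes[idx]

print(linear_derivative(np.array([0.5, 2.0, 4.0])))  # [2. 0. 2.]
```

At an interior node the derivative of a piecewise-linear function is not defined; this sketch returns the slope of the segment to the right of the node.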
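As far as I know, interp1d uses a BSpline internally, at least for its spline kinds (for example kind='slinear', 'quadratic' or 'cubic'); the default kind='linear' may not create one.
The BSpline has a derivative method which gives the nu-th derivative.
So for a spline-based interpolation such as f = interp1d(x, y, kind='cubic') you can use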
fd1 = f._spline.derivative(nu=1)
However, be careful, as always, when using attributes with a leading underscore: they are private and may change between versions.
I don't think bounds are checked if you choose values outside the interpolation region. It also seems that BSpline appends a trailing dimension, so you have to write
val = fd1(0).item()
val_arr = fd1(np.array([0, 1]))[..., 0]
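A way to get the same BSpline machinery without touching private attributes is to build the spline directly with scipy.interpolate.make_interp_spline. This is a swapped-in alternative, not what interp1d is documented to do internally; with k=1 it reproduces linear interpolation and its derivative is piecewise constant. The sample data is made up for illustration:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical sample data, sorted in x.
x = np.array([0.0, 1.0, 3.0, 6.0])
y = np.array([0.0, 2.0, 2.0, 8.0])

# k=1 gives a piecewise-linear interpolating BSpline (public API).
spl = make_interp_spline(x, y, k=1)

# derivative(nu=1) returns a new BSpline: the segment slopes.
dspl = spl.derivative(nu=1)

query = np.array([0.5, 2.0, 4.0])
print(spl(query))   # interpolated values
print(dspl(query))  # slopes of the containing segments
```

Because the input y is one-dimensional here, the results have no extra trailing dimension to strip.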
I want to numerically compute the FFT of a numpy array Y. For testing, I'm using the Gaussian function Y = exp(-x^2). The (symbolic) Fourier Transform is Y' = constant * exp(-k^2/4).
import numpy
X = numpy.arange(-100,100)
Y = numpy.exp(-(X/5.0)**2)
The naive approach fails:
from numpy.fft import *
from matplotlib import pyplot
def plotReIm(x, y):
    f = pyplot.figure()
    ax = f.add_subplot(111)
    ax.plot(x, numpy.real(y), 'b', label='R()')
    ax.plot(x, numpy.imag(y), 'r:', label='I()')
    ax.plot(x, numpy.abs(y), 'k--', label='abs()')
    ax.legend()
Y_k = fftshift(fft(Y))
k = fftshift(fftfreq(len(Y)))
plotReIm(k,Y_k)
real(Y_k) jumps between positive and negative values, which corresponds to a jumping phase that is not present in the symbolic result. This is certainly not desirable. (The result is technically correct in the sense that abs(Y_k) gives the amplitudes as expected and ifft(Y_k) is Y.)
Here, the function fftshift() renders the array k monotonically increasing and changes Y_k accordingly. The pairs zip(k, Y_k) are not changed by applying this operation to both vectors.
This change appears to fix the issue:
Y_k = fftshift(fft(ifftshift(Y)))
k = fftshift(fftfreq(len(Y)))
plotReIm(k,Y_k)
Is this the correct way to employ the fft() function if monotonic Y and Y_k are required?
The reverse operation of the above is:
Yx = fftshift(ifft(ifftshift(Y_k)))
x = fftshift(fftfreq(len(Y_k), k[1] - k[0]))
plotReIm(x,Yx)
For this case, the documentation clearly states that Y_k must be sorted compatible with the output of fft() and fftfreq(), which we can achieve by applying ifftshift().
These questions have been bothering me for a long time: Are the output and input arrays of both fft() and ifft() always such that a[0] should contain the zero frequency term, a[1:n/2+1] should contain the positive-frequency terms, and a[n/2+1:] should contain the negative-frequency terms, in order of decreasingly negative frequency [numpy reference], where 'frequency' is the independent variable?
The answer on Fourier Transform of a Gaussian is not a Gaussian does not answer my question.
The FFT can be thought of as producing a set of vectors, each with an amplitude and phase. The fft_shift operation changes the reference point for a phase angle of zero from the edge of the FFT aperture to the center of the original input data vector.
The phase (and thus the real component of the complex vector) of the result is sometimes less "jumpy" when this is done, especially if some input function is windowed such that it is discontinuous around the edges of the FFT aperture. Or if the input is symmetric around the center of the FFT aperture, the phase of the FFT result will always be zero after an fft_shift.
An fft_shift can be done by a vector rotate of N/2, or by simply flipping alternating sign bits in the FFT result, which may be more CPU dcache friendly.
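The equivalence between rotating the input by N/2 and flipping alternating signs in the output can be checked numerically. The sketch below reuses the Gaussian from the question and assumes an even-length array:

```python
import numpy as np

# The Gaussian test signal from the question (even length, n = 200).
X = np.arange(-100, 100)
Y = np.exp(-(X / 5.0) ** 2)

n = len(Y)
signs = (-1.0) ** np.arange(n)  # +1, -1, +1, ...

# Option 1: rotate the input by n/2 before the transform.
rotated_input = np.fft.fft(np.fft.ifftshift(Y))

# Option 2: transform the unrotated input, then flip every other sign.
flipped_output = np.fft.fft(Y) * signs

print(np.allclose(rotated_input, flipped_output))  # True
```

This follows from the shift theorem: a circular shift by n/2 samples multiplies bin m by exp(i*pi*m) = (-1)**m, which is exactly the sign jumping seen in real(Y_k) in the naive approach.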
The definition for the output of fft (and ifft) is here: http://docs.scipy.org/doc/numpy/reference/routines.fft.html#background-information
This is what the routines compute, no more and no less. Observe that the discrete Fourier transform is rather different from the continuous Fourier transform. For a densely sampled function there is a relation between the two, but the relation also involves phase factors and scaling in addition to fftshift. This is the cause of the oscillations you see in your plot. The necessary phase factor you can work out yourself from the above mathematical formula for the DFT.
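As a sketch of the relation between the DFT and the continuous transform, the following compares the shifted and scaled FFT of the Gaussian from the question with its analytic transform. It assumes the convention F(k) = ∫ f(x) exp(-2πikx) dx with k in cycles per unit x (matching fftfreq), for which exp(-(x/a)^2) transforms to a*sqrt(pi)*exp(-(pi*a*k)^2):

```python
import numpy as np

dx = 1.0                      # sample spacing of the grid in the question
a = 5.0                       # width parameter of the Gaussian
X = np.arange(-100, 100) * dx
Y = np.exp(-(X / a) ** 2)

# ifftshift puts x = 0 at index 0, which removes the phase factor;
# multiplying by dx turns the DFT sum into a Riemann sum for the integral.
Y_k = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(Y))) * dx
k = np.fft.fftshift(np.fft.fftfreq(len(Y), d=dx))

# Analytic continuous Fourier transform under the assumed convention.
analytic = a * np.sqrt(np.pi) * np.exp(-(np.pi * a * k) ** 2)

print(np.max(np.abs(Y_k.real - analytic)))  # close to machine precision
print(np.max(np.abs(Y_k.imag)))             # essentially zero
```

The agreement is this good only because the Gaussian is negligible at the edges of the window and is sampled densely enough; for other functions, truncation and aliasing limit how closely the scaled DFT tracks the continuous transform.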