I'm trying to interpolate a function at arbitrary points, given its values at the Chebyshev extreme points. I take the real part of the Fast Fourier Transform of those values to compute the Chebyshev coefficients, scale them by 2/N, and then use numpy.polynomial.chebyshev to evaluate the Chebyshev series at a set of points. This produces the wrong function approximation. Where am I going wrong?
import numpy as np
import matplotlib.pyplot as plt

# Number of Chebyshev extreme points
N = 10

# Function to be approximated
def f(x):
    return x**2

# Evaluate the function at the Chebyshev extreme points
x = np.cos(np.arange(N) * np.pi / N)
y = f(x)

# Compute the discrete Fourier transform (DFT) of the function
# values using the FFT algorithm, keeping the real part
DFT = np.fft.fft(y).real

# Scale the DFT coefficients by 2/N to get the Chebyshev coefficients
scaling_factor = 2 / N
chebyshev_coefficients = scaling_factor * DFT

# Use chebval to evaluate the approximating polynomial at a set of points
x_eval = np.linspace(-1, 1, 100)
y_approx = np.polynomial.chebyshev.chebval(x_eval, chebyshev_coefficients[::-1])

# Plot the original function and the approximated function
plt.plot(x, y, 'o', label='Original function')
plt.plot(x_eval, y_approx, '-', label='Approximated function')
plt.legend()
plt.show()
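For what it's worth, here is a minimal sketch of one standard route (an addition, not taken from the question itself): compute the coefficients with a type-I DCT on the N+1 extreme points and halve the first and last coefficients, instead of taking the raw FFT output and reversing it.

import numpy as np
from scipy.fft import dct

N = 10

def f(x):
    return x**2

# N+1 Chebyshev extreme points x_j = cos(pi*j/N), j = 0..N
x = np.cos(np.arange(N + 1) * np.pi / N)
y = f(x)

# The type-I DCT gives c_k = (2/N) * sum'' y_j * cos(pi*j*k/N)
# (double prime: first and last terms of the sum halved);
# the first and last coefficients of the interpolant are then halved as well.
c = dct(y, type=1) / N
c[0] /= 2
c[-1] /= 2

x_eval = np.linspace(-1, 1, 100)
y_approx = np.polynomial.chebyshev.chebval(x_eval, c)  # coefficients in increasing degree
print(np.max(np.abs(y_approx - f(x_eval))))  # should be at machine precision for f(x) = x**2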
The analytical Fourier transform of a sinusoidal signal is purely imaginary. However, when numerically computing its discrete Fourier transform, the result is not.
TL;DR: Find all answers to this question here.
Consider, therefore, the following code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft, fftfreq
f_s = 200 # Sampling rate = number of measurements per second in [Hz]
t = np.arange(0,10000, 1 / f_s)
N = len(t)
A = 4 # Amplitude of sinus signal
x = A * np.sin(t)
X = fft(x)[1:N//2]
freqs = (fftfreq(len(x)) * f_s)[1:N//2]
fig, (ax1,ax2) = plt.subplots(2,1, sharex = True)
ax1.plot(freqs, X.real, label = r"$\Re[X(\omega)]$")
ax1.plot(freqs, X.imag, label = r"$\Im[X(\omega)]$")
ax1.set_title(r"Discrete Fourier Transform of $x(t) = A \cdot \sin(t)$")
ax1.legend()
ax1.grid(True)
ax2.plot(freqs, np.abs(X), label = r"$|X(\omega)|$")
ax2.legend()
ax2.set_xlabel(r"Frequency $\omega$")
ax2.set_yscale("log")
ax2.grid(True, which = "both")
ax2.set_xlim(0.15,0.175)
plt.show()
Clearly, the absolute value |X(w)| can be used as a good approximation to the analytical result. However, the real and imaginary parts of X(w) are different from it. Another question on SO already mentioned this fact, but did not explain why. So can I only use the absolute value and the phase?
Another question would be how the amplitude is related to the numerical result. Mathematically speaking, it should be the integral under the curve of |X(w)| divided by a normalization (which, as far as I understood, should be given by N), i.e. approximately
A_approx = np.sum(np.abs(X)) / N
print(f"Numerical value: {A_approx:.1f}, Correct value: {A:.1f}")
Numerical value: 13.5, Correct value: 4.0
This does not seem to be the case. Any insights? Ideas?
Related questions which did not help are here and here.
An FFT does not produce the result you expect because it is finite in length, and is thus more similar to the Fourier transform of a rectangular window applied to your sinusoid. The length and placement of this rectangular window affect the phase and amplitude of the FFT result.
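For illustration (an addition to the answer above, assuming a single sine whose frequency falls exactly on an FFT bin), the peak magnitude is related to the amplitude by A ≈ 2*|X[k_peak]|/N; with a frequency between bins, as in the question's code, leakage spreads that energy over neighbouring bins.

import numpy as np

f_s = 200                  # sampling rate [Hz]
N = 2000                   # number of samples
n = np.arange(N)
A = 4
f0 = 5.0                   # 5 Hz lies exactly on a bin (bin spacing f_s/N = 0.1 Hz)
x = A * np.sin(2 * np.pi * f0 * n / f_s)

X = np.fft.fft(x)
k_peak = np.argmax(np.abs(X[:N // 2]))
print(2 * np.abs(X[k_peak]) / N)   # ~4.0: recovers A for a bin-aligned sine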
I am comparing fitting with optimize.curve_fit and optimize.least_squares. With curve_fit I get the covariance matrix pcov as an output, and from it I can calculate the standard deviation errors of my fitted variables:
perr = np.sqrt(np.diag(pcov))
If I do the fitting with least_squares, I do not get any covariance matrix output and I am not able to calculate the standard deviation errors for my variables.
Here's my example:
#import modules
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import least_squares
noise = 0.5
N = 100
t = np.linspace(0, 4*np.pi, N)
# generate data
def generate_data(t, freq, amplitude, phase, offset, noise=0, n_outliers=0, random_state=0):
    # formula for data generation with noise and outliers
    y = np.sin(t * freq + phase) * amplitude + offset
    rnd = np.random.RandomState(random_state)
    error = noise * rnd.randn(t.size)
    outliers = rnd.randint(0, t.size, n_outliers)
    error[outliers] *= 10
    return y + error
#generate data
data = generate_data(t, 1, 3, 0.001, 0.5, noise, n_outliers=10)
#initial guesses
p0=np.ones(4)
x0=np.ones(4)
# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset
# create the function we want to fit for least-square
def my_sin_lsq(x, t, y):
    # x[0] = freq, x[1] = amplitude, x[2] = phase, x[3] = offset
    return (np.sin(t * x[0] + x[2]) * x[1] + x[3]) - y
# now do the fit for curve_fit
fit = curve_fit(my_sin, t, data, p0=p0)
print('Curve fit output: ' + str(fit[0]))
#now do the fit for least_square
res_lsq = least_squares(my_sin_lsq, x0, args=(t, data))
print('Least_squares output: ' + str(res_lsq.x))
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)
#data_first_guess_lsq = x0[2]*np.sin(t*x0[0]+x0[1])+x0[3]
data_first_guess_lsq = my_sin(t, *x0)
# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
data_fit_lsq = my_sin(t, *res_lsq.x)
#calculation of residuals
residuals = data - data_fit
residuals_lsq = data - data_fit_lsq
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data-np.mean(data))**2)
ss_res_lsq = np.sum(residuals_lsq**2)
ss_tot_lsq = np.sum((data-np.mean(data))**2)
#R squared
r_squared = 1 - (ss_res/ss_tot)
r_squared_lsq = 1 - (ss_res_lsq/ss_tot_lsq)
print('R squared curve_fit is: ' + str(r_squared))
print('R squared least_squares is: ' + str(r_squared_lsq))
plt.figure()
plt.plot(t, data)
plt.title('curve_fit')
plt.plot(t, data_first_guess)
plt.plot(t, data_fit)
plt.plot(t, residuals)
plt.figure()
plt.plot(t, data)
plt.title('lsq')
plt.plot(t, data_first_guess_lsq)
plt.plot(t, data_fit_lsq)
plt.plot(t, residuals_lsq)
#error
perr = np.sqrt(np.diag(fit[1]))
print('The standard deviation errors for curve_fit are: ' + str(perr))
I would be very thankful for any help. Best wishes.
PS: I got a lot of input from this source and used part of the code from Robust regression.
The result of optimize.least_squares has a parameter inside of it called jac. From the documentation:
jac : ndarray, sparse matrix or LinearOperator, shape (m, n)
Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.
This can be used to estimate the Covariance Matrix of the parameters using the following formula: Sigma = (J'J)^-1.
J = res_lsq.jac
cov = np.linalg.inv(J.T.dot(J))
To find the 1σ standard deviations of the parameters (not their variances) one can then take the square root of the diagonal:
perr = np.sqrt(np.diagonal(cov))
The SciPy routine optimize.least_squares requires the user to provide as input a function fun(...) which returns a vector of residuals. This is typically defined as
residuals = (data - model)/sigma
where data and model are vectors with the data to fit and the corresponding model predictions for each data point, and sigma is the 1σ uncertainty of each data value.
In this situation, and assuming one can trust the input sigma uncertainties, one can use the output Jacobian matrix jac returned by least_squares to estimate the covariance matrix. Moreover, assuming the covariance matrix is diagonal, or simply ignoring the off-diagonal terms, one can also obtain the 1σ uncertainty perr on the model parameters (often called "formal errors") as follows (see Section 15.4.2 of Numerical Recipes, 3rd ed.):
import numpy as np
from scipy import linalg, optimize
res = optimize.least_squares(...)
U, s, Vh = linalg.svd(res.jac, full_matrices=False)
tol = np.finfo(float).eps*s[0]*max(res.jac.shape)
w = s > tol
cov = (Vh[w].T/s[w]**2) @ Vh[w]  # robust covariance matrix
perr = np.sqrt(np.diag(cov)) # 1sigma uncertainty on fitted parameters
The above code to obtain the covariance matrix is formally the same as the following simpler one (as suggested by Alex), but the above has the major advantage that it works even when the Jacobian is close to degenerate, which is a common occurrence in real-world least-squares fits:
cov = linalg.inv(res.jac.T @ res.jac)  # covariance matrix when jac is not degenerate
If one does not trust the input uncertainties sigma, one can still assume that the fit is good and estimate the data uncertainties from the fit itself. This corresponds to assuming chi**2/DOF = 1, where DOF is the number of degrees of freedom. In this case, one can use the following lines to rescale the covariance matrix before computing the uncertainties:
chi2dof = np.sum(res.fun**2)/(res.fun.size - res.x.size)
cov *= chi2dof
perr = np.sqrt(np.diag(cov)) # 1sigma uncertainty on fitted parameters
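To connect this with the example in the question: there, res_lsq was built from raw residuals (no division by sigma), so the chi2/DOF-rescaled covariance is the variant whose numbers should be comparable to curve_fit's default pcov. A minimal sketch (an addition), reusing res_lsq from the code above:

import numpy as np

# Residuals were not divided by any sigma, so rescale (J^T J)^-1 by chi^2/DOF;
# curve_fit applies the same rescaling by default when no absolute sigma is given.
J = res_lsq.jac
cov = np.linalg.inv(J.T @ J)
chi2dof = np.sum(res_lsq.fun**2) / (res_lsq.fun.size - res_lsq.x.size)
cov *= chi2dof
perr_lsq = np.sqrt(np.diag(cov))
print('The standard deviation errors for least_squares are: ' + str(perr_lsq))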
My Python programming problem is the following:
I want to create an array of measurement results. Each result can be described as a normal distribution for which the mean value is the measurement result itself and the standard deviation is its uncertainty.
Pseudo code could be:
x1 = N(result1, unc1)
x2 = N(result2, unc2)
...
x = array(x1, x2, ..., xN)
Then I would like to calculate the FFT of x:
f = numpy.fft.fft(x)
What I want is for the uncertainty of the measurements contained in x to be propagated through the FFT calculation, so that f is an array of amplitudes along with their uncertainties, like this:
f = (a +/- unc(a), b +/- unc(b), ...)
Can you suggest a way to do this?
Each Fourier coefficient computed by the discrete Fourier transform
of the array x is a linear combination of the elements of x; see
the formula for X_k on the wikipedia page on the discrete Fourier transform,
which I'll write as
X_k = sum_(n=0)^(n=N-1) [ x_n * exp(-i*2*pi*k*n/N) ]
(That is, X is the discrete Fourier transform of x.)
If x_n is normally distributed with mean mu_n and variance sigma_n**2,
then a little bit of algebra shows that the variance of X_k is the sum
of the variances of the x_n (each complex exponential has magnitude 1,
so each term contributes exactly sigma_n**2):
Var(X_k) = sum_(n=0)^(n=N-1) sigma_n**2
In other words, the variance is the same for each Fourier coefficient;
it is the sum of the variances of the measurements in x.
Using your notation, where unc(z) is the standard deviation of z,
unc(X_0) = unc(X_1) = ... = unc(X_(N-1)) = sqrt(unc(x1)**2 + unc(x2)**2 + ...)
(Note that the distribution of the magnitude of X_k is the Rice distribution.)
Here's a script that demonstrates this result. In this example, the standard
deviation of the x values increases linearly from 0.01 to 0.5.
import numpy as np
from numpy.fft import fft
import matplotlib.pyplot as plt
np.random.seed(12345)
n = 16
# Create 'x', the vector of measured values.
t = np.linspace(0, 1, n)
x = 0.25*t - 0.2*t**2 + 1.25*np.cos(3*np.pi*t) + 0.8*np.cos(7*np.pi*t)
x[:n//3] += 3.0
x[::4] -= 0.25
x[::3] += 0.2
# Compute the Fourier transform of x.
f = fft(x)
num_samples = 5000000
# Suppose the std. dev. of the 'x' measurements increases linearly
# from 0.01 to 0.5:
sigma = np.linspace(0.01, 0.5, n)
# Generate 'num_samples' arrays of the form 'x + noise', where the standard
# deviation of the noise for each coefficient in 'x' is given by 'sigma'.
xn = x + sigma*np.random.randn(num_samples, n)
fn = fft(xn, axis=-1)
print("Sum of input variances: %8.5f" % (sigma**2).sum())
print()
print("Variances of Fourier coefficients:")
np.set_printoptions(precision=5)
print(fn.var(axis=0))
# Plot the Fourier coefficients of the first 800 arrays.
num_plot = min(num_samples, 800)
fnf = fn[:num_plot].ravel()
clr = "#4080FF"
plt.plot(fnf.real, fnf.imag, 'o', color=clr, mec=clr, ms=1, alpha=0.3)
plt.plot(f.real, f.imag, 'kD', ms=4)
plt.grid(True)
plt.axis('equal')
plt.title("Fourier Coefficients")
plt.xlabel(r"$\Re(X_k)$")
plt.ylabel(r"$\Im(X_k)$")
plt.show()
The printed output is
Sum of input variances: 1.40322
Variances of Fourier coefficients:
[ 1.40357 1.40288 1.40331 1.40206 1.40231 1.40302 1.40282 1.40358
1.40376 1.40358 1.40282 1.40302 1.40231 1.40206 1.40331 1.40288]
As expected, the sample variances of the Fourier coefficients are
all (approximately) the same as the sum of the measurement variances.
Here's the plot generated by the script. The black diamonds are the
Fourier coefficients of a single x vector. The blue dots are the
Fourier coefficients of 800 realizations of x + noise. You can see that
the point clouds around each Fourier coefficient are roughly symmetric
and all the same "size" (except, of course, for the real coefficients,
which show up in this plot as horizontal lines on the real axis).
I have a set of points (x,y) as two vectors
x,y for example:
from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')
Now I would like to smooth this data with a Gaussian and evaluate it only at certain (regularly spaced) points on the x-axis, let's say for:
x_eval = linspace(0,1,11)
I got the tip that this method is called a "Gaussian sum filter", but so far I have not found an implementation of it in numpy/scipy, although it seems like a standard problem at first glance.
As the x values are not equally spaced I can't use scipy.ndimage.gaussian_filter1d.
Usually this kind of smoothing is done by going through Fourier space and multiplying with the kernel, but I don't really know whether that is possible with irregularly spaced data.
Thanks for any ideas.
This will blow up for very large datasets, but the proper calculation you are asking for would be done as follows:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0) # for repeatability
x = np.random.rand(30)
x.sort()
y = np.random.rand(30)
x_eval = np.linspace(0, 1, 11)
sigma = 0.1
delta_x = x_eval[:, None] - x
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)
weights /= np.sum(weights, axis=1, keepdims=True)
y_eval = np.dot(weights, y)
plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
plt.show()
I'll preface this answer by saying that this is more of a DSP question than a programming question...
...that being said, there is a simple two-step solution to your problem.
Step 1: Resample the data
So to illustrate this we can create a random data set with unequal sampling:
import numpy as np
x = np.cumsum(np.random.randint(0,100,100))
y = np.random.normal(0,1,size=100)
This gives something like:
We can resample this data using simple linear interpolation:
nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x
This converts our data to:
Step 2: Apply filter
At this stage you can use some of the tools available through scipy to apply a Gaussian filter to the data with a given sigma value:
from scipy.ndimage import gaussian_filter1d
fx = gaussian_filter1d(ny, sigma=100)
Plotting this up against the original data we get:
The choice of the sigma value determines the width of the filter.
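One caveat worth adding (not part of the original answer): gaussian_filter1d expresses sigma in samples of the resampled grid, so to smooth with a Gaussian of a given width in x-units you would convert using the grid spacing, for example:

from scipy.ndimage import gaussian_filter1d

dx = nx[1] - nx[0]       # spacing of the resampled axis (1.0 for np.arange above)
sigma_x = 100.0          # desired Gaussian width in x-units (an assumed value)
fx = gaussian_filter1d(ny, sigma=sigma_x / dx)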
Based on @Jaime's answer I wrote a function that implements this, with some additional documentation and the ability to discard estimates far from the data points.
I think confidence intervals could be obtained for this estimate by bootstrapping, but I haven't done that yet.
def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply a Gaussian sum filter to data.

    xdata, ydata : array
        Arrays of x- and y-coordinates of data.
        Must be 1d and have the same length.
    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result.
    sigma : float
        Standard deviation of the Gaussian to apply to each data point.
        Larger values yield a smoother curve.
    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval:
    # each row corresponds to a value in xeval,
    # each column corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata

    # Calculate the weight of every value in delta_x using a Gaussian;
    # the maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))

    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)

    # Nullify the result when the total weight is below threshold;
    # this happens at evaluation points far from any data
    # (1 sigma away from a data point has a weight of ~0.6)
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan

    # Normalize by dividing by the total weight at each evaluation point;
    # the nullification above avoids a divide-by-zero warning here
    smoothed = smoothed / weights.sum(1)

    return smoothed
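A quick usage sketch (an addition, mirroring the data from the question with NumPy arrays, the gaussian_sum_smooth definition above, and an assumed sigma of 0.1):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.sort(rng.random(30))
y = rng.random(30)
x_eval = np.linspace(0, 1, 11)

y_smooth = gaussian_sum_smooth(x, y, x_eval, sigma=0.1)

plt.plot(x, y, 'o-', label='data')
plt.plot(x_eval, y_smooth, 'ro-', label='Gaussian sum smoothed')
plt.legend()
plt.show()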
I am trying to validate my understanding of NumPy's FFT with an example: the Fourier transform of exp(-pi*t^2) should be exp(-pi*f^2) when no scaling is applied to the direct transform.
However, I find that to obtain this result I need to multiply the result of the FFT by a factor dt, which is the time interval between two sample points of my function. I don't understand why. Can anybody help?
Here is a sample code:
# create data
N = 4097
T = 100.0
t = linspace(-T/2,T/2,N)
f = exp(-pi*t**2)
# perform FT and multiply by dt
dt = t[1]-t[0]
ft = fft(f) * dt
freq = fftfreq( N, dt )
freq = freq[:N/2+1]
# plot results
plot(freq,abs(ft[:N/2+1]),'o')
plot(freq,exp(-pi*freq**2),'r')
legend(('numpy fft * dt', 'exact solution'),loc='upper right')
xlabel('f')
ylabel('amplitude')
xlim(0,1.4)
Be careful: you are not computing the continuous-time Fourier transform. Computers work with discrete data, and so does NumPy. If you take a look at the numpy.fft.fft documentation, it says:
numpy.fft.fft(a, n=None, axis=-1)
Compute the one-dimensional discrete Fourier Transform.
This function computes the one-dimensional n-point discrete Fourier Transform (DFT) with the efficient Fast Fourier Transform (FFT) algorithm
That means that you are computing the DFT, which is defined by
X_k = sum_(n=0)^(n=N-1) [ x_n * exp(-i*2*pi*k*n/N) ]
while the continuous-time Fourier transform is defined by
X(f) = integral over t of [ x(t) * exp(-i*2*pi*f*t) ] dt
If you do the maths to look for the relationship between them, i.e. approximate the integral by a Riemann sum over the samples t_n = n*dt, you get
X(f_k) ≈ dt * sum_(n=0)^(n=N-1) [ x(t_n) * exp(-i*2*pi*k*n/N) ] = dt * X_k
As you can see, the constant factor relating the two is exactly the sample spacing dt = t[1] - t[0], which is why you need to multiply the FFT result by dt.
Just a comment on your code: it is not good practice to import everything with from numpy import *; instead use:
import numpy as np
import matplotlib.pyplot as plt
# create data
N = 4097
T = 100.0
t = np.linspace(-T/2,T/2,N)
f = np.exp(-np.pi*t**2)
# perform FT and multiply by dt
dt = t[1]-t[0]
ft = np.fft.fft(f) * dt
freq = np.fft.fftfreq(N, dt)
freq = freq[:N//2+1]
# plot results
plt.plot(freq, np.abs(ft[:N//2+1]), 'o')
plt.plot(freq, np.exp(-np.pi * freq**2),'r')
plt.legend(('numpy fft * dt', 'exact solution'), loc='upper right')
plt.xlabel('f')
plt.ylabel('amplitude')
plt.xlim([0, 1.4])
plt.show()