The Gaussian basis function is given by phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), where mu_j is the center of the j-th basis function and s sets its width.
Essentially, I am creating a data set made up of N = 25 observations x_n in the range [0, 1] together with the target values function_s_noise. Since the 24 Gaussian basis functions will be used for a regression model, I created a design matrix phi, and when I plot its columns I expect this result. However, this is what I get when plotting phi.
I am mostly not sure what values mu and s should take for the corresponding basis functions.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x_n=np.arange(0,1,0.04) #[0,1]
function = np.sin(x_n*np.pi*2)
n = 25 # Number of data points
noise = np.random.normal(0.0, 0.01, n) # Random Gaussian noise
# print(noise)
function_s_noise = function + noise
def gaussian_basis(x, mu, s=7):
    # ** is exponentiation; ^ would be bitwise XOR in Python
    return np.exp(-(x-mu)**2/(2*(s**2)))
M = 24
# Calculate design matrix Phi
phi = np.ones((x_n.shape[0], M))
for m in range(M-1):
    mu = m/M
    phi[:, m+1] = np.vectorize(gaussian_basis)(x_n, mu)
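On the question of which mu and s to use: one common choice (an assumption here, not something the post states) is to spread the centers evenly over the data range and to set s to roughly the spacing between neighbouring centers, so that adjacent basis functions overlap. The helper name gaussian_design_matrix and the spacing heuristic below are illustrative only. A minimal sketch:

```python
import numpy as np

def gaussian_design_matrix(x, n_basis, s=None):
    """Bias column of ones followed by n_basis Gaussian columns.

    Centers are spread evenly over [x.min(), x.max()]. If s is not given,
    it defaults to the spacing between centers (an assumed heuristic).
    """
    centers = np.linspace(x.min(), x.max(), n_basis)
    if s is None:
        s = centers[1] - centers[0]
    # Broadcasting: rows index observations, columns index basis functions
    gauss = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((x.size, 1)), gauss])

x_n = np.linspace(0.0, 1.0, 25)                  # 25 observations in [0, 1]
phi = gaussian_design_matrix(x_n, n_basis=23)    # 1 bias + 23 Gaussians = 24 columns
```

With a width of this order each column of phi is a narrow bump around its own center, whereas a width much larger than the data range makes every column nearly constant.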
I have an experimental time signal and need to compute an integral from it. Specifically, I need to compute the PSD and then the power in certain frequency bands. SciPy seems to be the natural tool for computing integrals, but the functions simpson and trapezoid integrate over the whole array; they take no integration limits.
I can write a function that searches the arrays for the indices of the integration limits and then applies simpson to a slice of the original array, but I was wondering whether there is any other way.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import welch
from scipy.integrate import simpson
t_0 = 0
t_N = 5
N = 10000
freq = N/(t_N-t_0)
w_1 = 1
w_2 = 5
x = np.linspace(t_0,t_N,N)
y = np.cos(2*np.pi*w_1*x) + np.sin(2*np.pi*w_2*x) + \
This is the time signal
f,p = welch(y,fs=freq,nperseg=2**13)
plt.plot(f,p, '-*')
plt.xlabel('f [Hz]')
My integration function
def psd_integrate(f, p, f0, fN):
    k0, = np.where(f >= f0)  # tuple unpack
    k0 = k0[0]               # first index with f >= f0
    kN, = np.where(f <= fN)  # tuple unpack
    kN = kN[-1]              # last index with f <= fN
    # kN + 1 so that the slice includes the upper limit
    return simpson(p[k0:kN + 1], x=f[k0:kN + 1])
EDIT: Forgot to add: I'm using seaborn, which is why the plots look different from default matplotlib.
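A boolean mask is one alternative to searching for the limit indices by hand. The sketch below assumes the frequencies f are sorted in ascending order, and the function name band_power is made up for illustration. The limits snap to the nearest grid points inside the band, so nothing is interpolated at the band edges.

```python
import numpy as np
from scipy.integrate import simpson

def band_power(f, p, f_lo, f_hi):
    """Integrate the PSD p(f) over the grid points with f_lo <= f <= f_hi."""
    band = (f >= f_lo) & (f <= f_hi)      # boolean mask selecting the band
    return simpson(p[band], x=f[band])

# Toy check: a flat PSD of 1 integrated from 2 Hz to 5 Hz
f = np.arange(0.0, 11.0)                  # 0, 1, ..., 10 Hz
p = np.ones_like(f)
power = band_power(f, p, 2.0, 5.0)
```

On real data the same call would take the f and p arrays returned by welch.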
I would like to smooth time series data. For this I would like to use Python.
Now I have already found the function scipy.ndimage.gaussian_filter1d.
For this, the array and a sigma value must be passed.
Now to my question:
Is the sigma value equal to the filter length?
I would like to run a filter of length 365 over the data.
Would it then be the correct procedure to set this sigma value to 365 or am I confusing things?
sigma defines how widely the Gaussian filter is spread around its mean; it is not the filter length. You can create a Gaussian filter with a specific size as shown below.
import numpy as np
import matplotlib.pyplot as plt
sigma1 = 3
sigma2 = 50
def gaussian_filter1d(size,sigma):
    filter_range = np.linspace(-int(size/2),int(size/2),size)
    gaussian_filter = [1 / (sigma * np.sqrt(2*np.pi)) * np.exp(-x**2/(2*sigma**2)) for x in filter_range]
    return gaussian_filter
fig,ax = plt.subplots(1,2)
ax[0].set_title(f'sigma= {sigma1}')
ax[1].set_title(f'sigma= {sigma2}')
Here is the effect of sigma on the Gaussian filter.
Later, you might convolve your signal with your Gaussian filter.
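For scipy.ndimage.gaussian_filter1d itself, the kernel is cut off at truncate standard deviations on each side (truncate defaults to 4.0), so the kernel length follows from sigma rather than being equal to it. The two helper functions below are hypothetical names used only to illustrate that relationship. A rough sketch of how a desired length of 365 samples maps to a sigma:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def kernel_length(sigma, truncate=4.0):
    """Number of taps SciPy uses for a given sigma: 2 * radius + 1."""
    radius = int(truncate * sigma + 0.5)
    return 2 * radius + 1

def sigma_for_length(length, truncate=4.0):
    """Sigma whose truncated kernel spans about `length` samples."""
    return (length - 1) / (2.0 * truncate)

sigma = sigma_for_length(365)             # 45.5 with the default truncate

# Check against SciPy by filtering a unit impulse and counting nonzero taps
impulse = np.zeros(1001)
impulse[500] = 1.0
response = gaussian_filter1d(impulse, sigma=sigma, mode="constant")
support = np.count_nonzero(response)
```

Setting sigma=365 would instead give a kernel almost 3000 samples wide, which is a much stronger smoothing than a 365-sample window suggests.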
I would like to compute the RMS amplitude of a Gaussian white noise signal.
import matplotlib.pyplot as plt
import numpy as np
mean = 0
std = 1.0
t = 100
def zv(t):
    return np.random.normal(mean, std, size = t)
def rms(x):
    return np.sqrt(np.mean(zv(x)**2))
The plot of zv(t) works - but I don't know why the plot of rms(t) is just empty.
Do you have some comments?
Best Regards
zv(t) returns a one-dimensional array of size t, so taking the mean of it yields a single value. You can verify this by printing out the value of rms(t). If you want to create a plot of rms along t, you will need to generate multiple Monte Carlo samples. For example:
def zv(t):
    n = 1000
    return np.random.normal(mean, std, size = (n, t))
def rms(x):
    return np.sqrt(np.mean(zv(x)**2, axis = 0))
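If a single realization is all you have, the overall RMS is one number (close to std for zero-mean noise), and a curve over time can be obtained from a sliding window instead of an ensemble. The sketch below is one possible approach, with running_rms and the window length chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
std = 1.0
signal = rng.normal(0.0, std, size=20_000)   # one realization of white noise

def rms(x):
    """Root mean square of an array: a single scalar."""
    return np.sqrt(np.mean(np.square(x)))

def running_rms(x, window):
    """RMS over a sliding window of `window` samples (fully overlapping part only)."""
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(np.square(x), kernel, mode="valid"))

total_rms = rms(signal)                      # scalar, close to std
local_rms = running_rms(signal, window=500)  # array that can be plotted
```

Plotting local_rms gives a curve that fluctuates around std, while plotting total_rms alone shows nothing useful because it is a single point.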
I compare fitting with optimize.curve_fit and optimize.least_squares. With curve_fit I get the covariance matrix pcov as an output and I can calculate the standard deviation errors for my fitted variables by that:
perr = np.sqrt(np.diag(pcov))
If I do the fitting with least_squares, I do not get any covariance matrix output and I am not able to calculate the standard deviation errors for my variables.
Here's my example:
#import modules
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import least_squares
noise = 0.5
N = 100
t = np.linspace(0, 4*np.pi, N)
# generate data
def generate_data(t, freq, amplitude, phase, offset, noise=0, n_outliers=0, random_state=0):
    #formula for data generation with noise and outliers
    y = np.sin(t * freq + phase) * amplitude + offset
    rnd = np.random.RandomState(random_state)
    error = noise * rnd.randn(t.size)
    outliers = rnd.randint(0, t.size, n_outliers)
    error[outliers] *= 10
    return y + error
#generate data
data = generate_data(t, 1, 3, 0.001, 0.5, noise, n_outliers=10)
#initial guesses
# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset
# create the function we want to fit for least-square
def my_sin_lsq(x, t, y):
    # freq=x[0]
    # amplitude=x[1]
    # phase=x[2]
    # offset=x[3]
    return (np.sin(t*x[0]+x[2])*x[1]+ x[3]) - y
# now do the fit for curve_fit
fit = curve_fit(my_sin, t, data, p0=p0)
print('Curve fit output:'+str(fit[0]))
#now do the fit for least_square
res_lsq = least_squares(my_sin_lsq, x0, args=(t, data))
print('Least_squares output:'+str(res_lsq.x))
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)
#data_first_guess_lsq = x0[2]*np.sin(t*x0[0]+x0[1])+x0[3]
data_first_guess_lsq = my_sin(t, *x0)
# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
data_fit_lsq = my_sin(t, *res_lsq.x)
#calculation of residuals
residuals = data - data_fit
residuals_lsq = data - data_fit_lsq
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data-np.mean(data))**2)
ss_res_lsq = np.sum(residuals_lsq**2)
ss_tot_lsq = np.sum((data-np.mean(data))**2)
#R squared
r_squared = 1 - (ss_res/ss_tot)
r_squared_lsq = 1 - (ss_res_lsq/ss_tot_lsq)
print('R squared curve_fit is:'+str(r_squared))
print('R squared least_squares is:'+str(r_squared_lsq))
plt.plot(t, data)
plt.plot(t, data_first_guess)
plt.plot(t, data_fit)
plt.plot(t, residuals)
plt.plot(t, data)
plt.plot(t, data_first_guess_lsq)
plt.plot(t, data_fit_lsq)
plt.plot(t, residuals_lsq)
perr = np.sqrt(np.diag(fit[1]))
print('The standard deviation errors for curve_fit are:' +str(perr))
I would be very thankful for any help, best wishes
ps: I got a lot of input from this source and used part of the code Robust regression
The result object returned by optimize.least_squares has an attribute called jac. From the documentation:
jac : ndarray, sparse matrix or LinearOperator, shape (m, n)
Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.
This can be used to estimate the Covariance Matrix of the parameters using the following formula: Sigma = (J'J)^-1.
J = res_lsq.jac
cov = np.linalg.inv(J.T.dot(J))
To find the standard deviations of the parameters (the square roots of the variances on the diagonal), one can then use:
perr = np.sqrt(np.diagonal(cov))
The SciPy function optimize.least_squares requires the user to provide as input a function fun(...) which returns a vector of residuals. This is typically defined as
residuals = (data - model)/sigma
where data and model are vectors with the data to fit and the corresponding model predictions for each data point, while sigma is the 1σ uncertainty in each data value.
In this situation, and assuming one can trust the input sigma uncertainties, one can use the output Jacobian matrix jac returned by least_squares to estimate the covariance matrix. Moreover, assuming the covariance matrix is diagonal, or simply ignoring non-diagonal terms, one can also obtain the 1σ uncertainty perr in the model parameters (often called "formal errors") as follows (see Section 15.4.2 of Numerical Recipes 3rd ed.)
import numpy as np
from scipy import linalg, optimize
res = optimize.least_squares(...)
U, s, Vh = linalg.svd(res.jac, full_matrices=False)
tol = np.finfo(float).eps*s[0]*max(res.jac.shape)
w = s > tol
cov = (Vh[w].T/s[w]**2) @ Vh[w] # robust covariance matrix
perr = np.sqrt(np.diag(cov)) # 1sigma uncertainty on fitted parameters
The above code to obtain the covariance matrix is formally the same as the following simpler one (as suggested by Alex), but the above has the major advantage that it works even when the Jacobian is close to degenerate, which is a common occurrence in real-world least-squares fits
cov = linalg.inv(res.jac.T @ res.jac) # covariance matrix when jac not degenerate
If one does not trust the input uncertainties sigma, one can instead assume that the fit is good and estimate the data uncertainties from the fit itself. This corresponds to assuming chi**2/DOF=1, where DOF is the number of degrees of freedom. In this case, one can use the following lines to rescale the covariance matrix before computing the uncertainties:
chi2dof = np.sum(res.fun**2)/(res.fun.size - res.x.size)
cov *= chi2dof
perr = np.sqrt(np.diag(cov)) # 1sigma uncertainty on fitted parameters
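To see the pieces working together, here is a small end-to-end sketch on a toy straight-line fit. The model, the data, and all variable names are made up for illustration. It uses unweighted residuals, so the chi**2/DOF rescaling described above is applied; this mirrors what curve_fit does by default (absolute_sigma=False).

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data: y = 2 x + 1 plus Gaussian noise (toy example)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)

def residuals(params, x, y):
    a, b = params
    return a * x + b - y          # unweighted residuals (sigma = 1 assumed)

res = least_squares(residuals, x0=[1.0, 0.0], args=(x, y))

J = res.jac                                   # Jacobian at the solution
cov_unscaled = np.linalg.inv(J.T @ J)         # (J^T J)^-1
dof = res.fun.size - res.x.size               # degrees of freedom
chi2dof = np.sum(res.fun ** 2) / dof          # residual variance estimate
cov = cov_unscaled * chi2dof                  # rescaled covariance matrix
perr = np.sqrt(np.diag(cov))                  # 1-sigma parameter errors
```

For this linear model the Jacobian is just the matrix with columns x and 1, so cov_unscaled can be checked against the textbook linear-regression result.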
I have a set of points (x,y) as two vectors
x,y for example:
from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')
Now I would like to smooth this data with a Gaussian and evaluate it only at certain (regularly spaced) points on the x-axis, let's say at:
x_eval = linspace(0,1,11)
I got the tip that this method is called a "Gaussian sum filter", but so far I have not found any implementation in numpy/scipy for that, although it seems like a standard problem at first glance.
As the x values are not equally spaced I can't use the scipy.ndimage.gaussian_filter1d.
Usually this kind of smoothing is done by going through Fourier space and multiplying with the kernel, but I don't really know if this is possible with irregularly spaced data.
Thanks for any ideas
This will blow up for very large datasets, but the proper calculation you are asking for would be done as follows:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0) # for repeatability
x = np.random.rand(30)
y = np.random.rand(30)
x_eval = np.linspace(0, 1, 11)
sigma = 0.1
delta_x = x_eval[:, None] - x
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)
weights /= np.sum(weights, axis=1, keepdims=True)
y_eval = np.dot(weights, y)
plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
I'll preface this answer by saying that this is more of a DSP question than a programming question...
...that being said, there is a simple two-step solution to your problem.
Step 1: Resample the data
So to illustrate this we can create a random data set with unequal sampling:
import numpy as np
x = np.cumsum(np.random.randint(0,100,100))
y = np.random.normal(0,1,size=100)
This gives something like:
We can resample this data using simple linear interpolation:
nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x
This converts our data to:
Step 2: Apply filter
At this stage you can use some of the tools available through scipy to apply a Gaussian filter to the data with a given sigma value:
from scipy.ndimage import gaussian_filter1d  # scipy.ndimage.filters is a deprecated namespace
fx = gaussian_filter1d(ny,sigma=100)
Plotting this up against the original data we get:
The choice of the sigma value determines the width of the filter.
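Note that after resampling, sigma is measured in samples of the new grid, not in units of x. If the desired width is known in x units, it can be converted by dividing by the grid spacing. The data, the spacing dx, and the width sigma_x below are arbitrary values picked for this sketch:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Irregularly sampled toy data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 100.0, 200))
y = np.sin(x / 10.0) + rng.normal(0.0, 0.3, x.size)

# Step 1: resample onto a regular grid with spacing dx
dx = 0.5
grid = np.arange(x.min(), x.max(), dx)
y_grid = np.interp(grid, x, y)

# Step 2: convert the desired width from x units to samples, then filter
sigma_x = 5.0                       # smoothing width in units of x
sigma_samples = sigma_x / dx        # the same width in grid samples
smoothed = gaussian_filter1d(y_grid, sigma=sigma_samples)
```

Changing dx then only changes the resolution of the output, not the physical width of the smoothing.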
Based on @Jaime's answer I wrote a function that implements this with some additional documentation and the ability to discard estimates far from the datapoints.
I think confidence intervals could be obtained on this estimate by bootstrapping, but I haven't done this yet.
import numpy as np

def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply gaussian sum filter to data.

    xdata, ydata : array
        Arrays of x- and y-coordinates of data.
        Must be 1d and have the same length.
    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result
    sigma : float
        Standard deviation of the Gaussian to apply to each data point
        Larger values yield a smoother curve.
    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval
    # each row corresponds to a value in xeval
    # each col corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata
    # Calculate weight of every value in delta_x using Gaussian
    # Maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))
    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)
    # Nullify the result when the total weight is below threshold
    # This happens at evaluation points far from any data
    # 1-sigma away from a data point has a weight of ~0.6
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan
    # Normalize by dividing by the total weight at each evaluation point
    # Nullification above avoids divide by zero warnings here
    smoothed = smoothed / weights.sum(1)
    return smoothed
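The bootstrapping idea mentioned above could be sketched roughly as follows. This is an untested-in-practice illustration rather than a vetted statistical procedure: it resamples the (x, y) pairs with replacement, re-runs a plain normalized Gaussian smoother (without the null_thresh logic), and takes pointwise percentiles. The names kernel_smooth and bootstrap_band are invented for this example.

```python
import numpy as np

def kernel_smooth(xdata, ydata, xeval, sigma):
    """Normalized Gaussian-weighted average of ydata at each xeval."""
    delta_x = xeval[:, None] - xdata
    weights = np.exp(-0.5 * (delta_x / sigma) ** 2)
    return weights @ ydata / weights.sum(axis=1)

def bootstrap_band(xdata, ydata, xeval, sigma, n_boot=500, level=0.95, seed=0):
    """Pointwise percentile band from resampling (x, y) pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = xdata.size
    curves = np.empty((n_boot, xeval.size))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # bootstrap sample of indices
        curves[i] = kernel_smooth(xdata[idx], ydata[idx], xeval, sigma)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(curves, [alpha, 1.0 - alpha], axis=0)
    return lo, hi

rng = np.random.default_rng(1)
x = rng.random(30)
y = rng.random(30)
x_eval = np.linspace(0, 1, 11)
estimate = kernel_smooth(x, y, x_eval, sigma=0.1)
lo, hi = bootstrap_band(x, y, x_eval, sigma=0.1)
```

The band from lo to hi can then be drawn around the estimate, for example with matplotlib's fill_between.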