I'm trying to fit an asymmetric Gaussian to my data. My data is just a numpy array called wave (x) and a numpy array called spec (y) that looks like an asymmetric Gaussian.
This is the image of the data with an asymmetric Gaussian fitted with curve_fit (the data has a continuum too, but that is not important right now).
This is the function:
def agauss(amp, cen, b_sigma, r_sigma, x):
    y = np.zeros(len(x))
    for i in range(len(x)):
        if x[i] < cen:
            y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*b_sigma**2))
        else:
            y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*r_sigma**2))
    return y
I'm using this code to fit the parameters:
with pm.Model() as asym:
    cen = pm.Uniform('cen', lower=5173, upper=5179)
    bsigma = pm.HalfCauchy('bsigma', beta=3)
    rsigma = pm.HalfCauchy('rsigma', beta=3)
    amp = pm.Uniform('amp', lower=1e-19, upper=1e-16)
    err = pm.HalfCauchy('err', beta=0.0000001)
    ag_pred = pm.Normal('ag_pred', mu=agauss(amp, cen, bsigma, rsigma, wave), sigma=err, observed=spec)
    agdata = pm.sample(3000, cores=2)
But I get the error "Variables do not support boolean operations." in the theano.tensor module. How should I define the function in order to fit the parameters? Is there a better way to do this? Thanks!! Here is the full traceback:
144 err = pm.HalfCauchy('err', beta=0.0000001)
145
--> 146 ag_pred = pm.Normal('ag_pred', mu=agauss(amp, cen, bsigma, rsigma, wave), sigma=err, observed=spec)
147
148 agdata = pm.sample(3000, cores=2)
~/Documents/OIII_emitters/m2fs_reduction/test/assets/scripts/analysis.py in agauss(amp, cen, b_sigma, r_sigma, x)
118 y = np.zeros(len(x))
119 for i in range(len(x)):
--> 120 if x[i] < cen:
121 y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*b_sigma**2))
122 else:
~/anaconda3/envs/data_science/lib/python3.7/site-packages/theano/tensor/var.py in __bool__(self)
92 else:
93 raise TypeError(
---> 94 "Variables do not support boolean operations."
95 )
96
TypeError: Variables do not support boolean operations.
Here is an approach that follows the switchpoint trick of pymc3. It's basically a non-linear extension of this SO question. Below I show the code and how it works with some mock data that I created. I don't expect the mock data to be too similar to your actual data, but it should give you a starting point. This approach does have some modelling limitations (i.e. fitting two Gaussians), but again they could go away with more realistic input data.
Firstly I create some mock input data. Note that two Gaussians attached to each other will have different normalisations and will also approach the continuum of your spectrum somewhat differently. I didn't attempt to include the latter subtlety in the mock data, but I kept the normalisation consistent. Another complication is the fact that the amplitude of the two Gaussians is different. This probably won't be a problem with real data, but here I simply add a constant to match them. From these mock data we would like to recover the parameters cen_mock, sigma_mock_1 and sigma_mock_2. Finally, note that I kept the same wavelength limits as in your question.
import numpy as np
import pymc3 as pm
import theano.tensor as tt
import matplotlib.pyplot as plt

def gaussian_pdf(x, sigma, cen):
    g1 = (1/(sigma*np.sqrt(2*np.pi)))*np.exp(
        -0.5*np.square(x-cen)/np.square(sigma))
    return g1

# create mock data
wavelength_min = 5173
wavelength_max = 5179
cen_mock = 5175
x = np.arange(wavelength_min, wavelength_max, 0.01)
sigma_mock_1 = 1.4
g1 = gaussian_pdf(x[x <= cen_mock], sigma_mock_1, cen_mock)
sigma_mock_2 = 1.9
g2 = gaussian_pdf(x[x > cen_mock], sigma_mock_2, cen_mock)
# join the two halves, shifting the right one so the peaks match; this step
# was not shown in the original post but follows the matching described above
y_mock = np.concatenate([g1, g2 + (g1.max() - g2.max())])
Once we have some mock data we build the model in pymc3 and sample:
# construct model
with pm.Model() as asym:
    switchpoint = pm.DiscreteUniform("switchpoint", lower=x.min(), upper=x.max())
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    sigma_1 = pm.HalfNormal('sigma_1', sd=10)
    sigma_2 = pm.HalfNormal('sigma_2', sd=10)
    sd = pm.math.switch(switchpoint > x, sigma_1, sigma_2)
    likelihood = pm.Normal(
        'y', mu=(1/(sd*np.sqrt(2*np.pi)))*tt.exp(
            -0.5*tt.square(x-switchpoint)/tt.square(sd)),
        sd=sigma, observed=y_mock)

with asym:
    step1 = pm.NUTS([sigma_1, sigma_2])
    step2 = pm.Metropolis([switchpoint])
    trace = pm.sample(4000, step=[step1, step2], cores=4, tune=4000, chains=4)
The parameters of the model are the switchpoint, which represents where the Gaussians change over, and the standard deviations. So the sampler will be estimating the "right" or "left" standard deviation based on the value of x.
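As an aside, your original agauss can also be written without the Python loop, using tt.switch elementwise so the comparison stays symbolic and the boolean-operation error goes away. A minimal sketch (mine, not part of the original answer), keeping the question's parameter names:

def agauss_tt(amp, cen, b_sigma, r_sigma, x):
    # pick the left or right width elementwise; works with symbolic cen/sigmas
    sigma = tt.switch(tt.lt(x, cen), b_sigma, r_sigma)
    return amp * tt.exp(-tt.square(x - cen) / (2 * tt.square(sigma)))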
We now check the results:
df = pm.summary(trace)
x1_m = x[x<=df.loc['switchpoint']['mean']]
x2_m = x[x>df.loc['switchpoint']['mean']]
g1_m = gaussian_pdf(x1_m,df.loc['sigma_1']['mean'],df.loc['switchpoint']['mean'])
g2_m = gaussian_pdf(x2_m,df.loc['sigma_2']['mean'],df.loc['switchpoint']['mean'])
amp_m = g1_m.max() - g2_m.max()
plt.plot(x[x<=cen_mock],g1,label='left gaussian mock')
plt.plot(x[x>cen_mock],g2,label='right gaussian mock')
plt.plot(x1_m,g1_m,label='left gaussian model')
plt.plot(x2_m,g2_m+amp_m,label='right gaussian model')
plt.text(5177,0.23,'sd_mock1='+str(sigma_mock_1))
plt.text(5177,0.2,'sd_mock2='+str(sigma_mock_2))
plt.legend(frameon=False)
There are of course much better ways of checking the sampling results (which are actually Bayesian), but this is a quick and dirty way to get a feel for what is going on. Note that after the modelling I still need to add a constant to one of the Gaussians to join them naturally. For three different pairs of standard deviations I get the following plots:
Now for some discussion. To evaluate how good this approach is, we need some better idea of the real data. As you can see, if the standard deviations differ significantly then the model starts failing. However, for somewhat similar standard deviations, and in the edge case of equal standard deviations, the model is acceptable. Furthermore, another important input is whether the number of wavelength datapoints is equal for the two Gaussians. This will determine how each Gaussian approaches the continuum. It will also determine whether there is any need to arbitrarily add a constant to the second Gaussian in order to join them in a more natural way when fitting on real data.
To sum up, this is an approach that follows your desired parametrisation closely, but it requires some extra evaluation work using mock data. From your question it seems to me that the switchpoint approach is required, but only by applying it to real data (or more realistic mock data) can one know with some certainty whether it is sufficient.
I have an experimental dataset which plots intensity as a function of energy. These are arrays of 1800 datapoints.
I have been trying to fit a model to this data, given by the equation below:
Imodel = I0 * ((math.cos(phi) + beta*f1)**2 + (math.sin(phi) + beta*f2)**2) + Ioff
I have 2 other datasets of f1 vs. energy and f2 vs. energy. These are arrays of 700 datapoints, albeit over the same energy range as the first dataset.
I want to use this model function together with the f1 and f2 data to find optimal values of the other 4 parameters (I0, phi, beta, Ioff) for which this model function fits the experimental dataset as closely as possible.
I have been looking into curve_fit and least_squares from the scipy.optimize package, as well as fitting packages such as lmfit and scikit-learn, but to no avail.
Can anyone help? Thanks!
Presently I have no representative data from Ayrtonb1 with which to test the method proposed below. The method seems convenient on theoretical grounds, but one cannot be sure that it will be satisfying with the OP's data.
Nevertheless a preliminary test was carried out with some "toy" data (shown below).
I suppose that the screencopy below is sufficient to understand the method and to reproduce the calculation with real data.
The result of this preliminary test is rather good:
LRMSE < 2 for a range up to 600 (Least Root Mean Square Error).
LRMSRE < 2% (Least Root Mean Square Relative Error).
Your data and formula look suspiciously like resonant (or anomalous) X-ray diffraction data, with measurements of scattered intensity going across the Zn K-edge. Although you do not say this, the discussion here will assume that. You say you have 1800 measurements, presumably as a function of X-ray energy in eV. The resonant scattering factors (f1, f2) you show seem to be idealized and possibly "typical", and perhaps not specifically for the Zn K-edge -- at the very least the energy scale shown is not the same as your data.
You will want to treat the data and model the intensity as a function of X-ray energy. And you will want realistic values for f1 and f2 for the element of interest, and at the actual energy points for your data. I recommend using xraydb (full disclosure: I am the lead author) [pip install xraydb], so that you can do
import numpy as np
import xraydb
#edata, idata = function_to_extract_your_data()
# or perhaps testing with
edata = np.linspace(9500, 10500, 501)
f1 = xraydb.f1_chantler('Zn', edata)
f2 = xraydb.f2_chantler('Zn', edata)
As written, your intensity function does not directly depend on energy, though it might at a later date, say to make that offset be linear in energy, not just a constant. You might write a function like:
def intensity(en, phi, beta, scale=1, slope=0, offset=0, f1=-1, f2=1):
    costerm = np.cos(phi) + beta*f1
    sinterm = np.sin(phi) + beta*f2
    return scale * (costerm**2 + sinterm**2) + slope*en + offset
With that you can start just trying out some values to get a feel for the function and how it compares to your data:
import matplotlib.pyplot as plt

beta = 0.025  # wild guess!
for phi in np.pi*np.arange(20)/10:
    plt.plot(edata, intensity(edata, phi, beta, f1=f1, f2=f2), label='%.1f' % phi)
plt.legend()
plt.show()
It kind of looks like your value for phi would be around 5.5 to 6 (or -0.8 to -0.3). You could also try different values of beta and plot that with your actual data.
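In the same spirit, you might scan a few beta values with phi held fixed; a quick sketch (the numbers here are just illustrative guesses, not fitted values):

phi = 5.5  # rough value from the scan above
for beta in [0.01, 0.025, 0.05, 0.1]:
    plt.plot(edata, intensity(edata, phi, beta, f1=f1, f2=f2), label='beta=%.3f' % beta)
plt.legend()
plt.show()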
With that model for intensity and a feel for what the range of parameters is, you could then try to fit your data. To do that, I would recommend using lmfit (full disclosure: I am the lead author) [pip install lmfit], where you can create a model from your intensity model function - this will use the names of the function arguments to name the fitting parameters.
from lmfit import Model
imodel = Model(intensity, independent_vars=['en', 'f1', 'f2'])
params = imodel.make_params(scale=1, offset=0, slope=0, beta=0.1, phi=5.5)
That independent_vars will tell Model to not make fitting Parameters for the function arguments f1 and f2 and to expect them to be passed into any evaluation or fit. The other function arguments (scale, phi, etc) will be expected to become fitting variables. You do have to create a "Parameters" object and must give initial values for all parameters. You can put bounds on these or fix them so they are not altered in the fit, as with
params['scale'].min = 0 # force scale to be positive
params['slope'].vary = False # slope will be fixed at 0.
You can then evaluate the model with
init_value = imodel.eval(params, en=edata, f1=f1, f2=f2)
And then eventually do a fit with
result = imodel.fit(idata, params, en=edata, f1=f1, f2=f2)
print(result.fit_report())
plt.plot(edata, idata, label='data')
plt.plot(edata, init_value, label='initial fit')
plt.plot(edata, result.best_fit, label='best fit')
plt.legend()
plt.show()
Finally, for analysis of X-ray resonant scattering be sure to consider including absorption corrections in that intensity calculation. As you go across the Zn K edge, the absorption depth of the sample may change dramatically if the Zn concentration is high.
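As a rough sketch of what such a correction might look like in a simple transmission geometry (the density and thickness here are assumed, made-up values; xraydb.mu_elam gives the mass attenuation coefficient):

mu = xraydb.mu_elam('Zn', edata)   # mass attenuation coefficient, cm^2/gr
density = 7.14                     # gr/cm^3 for metallic Zn (assumed sample)
thickness = 5.0e-4                 # sample thickness in cm (made-up value)
transmission = np.exp(-mu * density * thickness)
idata_corrected = idata / transmission   # crude absorption correction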
I have created two mini encoding networks, for a standard autoencoder and a VAE, and plotted each. I would just like to know if my understanding is correct for this mini case. Note it's only one epoch and it ends with encoding.
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(0)
fig, (ax,ax2) = plt.subplots(2,1)
def relu(x):
    c = np.where(x > 0, x, 0)
    return c
#Standard autoencoder
x = np.random.randint(0,2,[100,5])
w_autoencoder = np.random.normal(0,1,[5,2])
bottle_neck = relu(x.dot(w_autoencoder))
ax.scatter(bottle_neck[:,0],bottle_neck[:,1])
#VAE autoencoder
w_vae1 = np.random.normal(0,1,[5,2])
w_vae2 = np.random.normal(0,1,[5,2])
mu = relu(x.dot(w_vae1))
sigma = relu(x.dot(w_vae2))
epsilon_sample = np.random.normal(0,1,[100,2])
latent_space = mu+np.log2(sigma)*epsilon_sample
ax2.scatter(latent_space[:,0], latent_space[:,1],c='red')
Since your motive is "understanding", I should say you are on the right track, and working on this sort of implementation definitely helps you understand. But I strongly believe "understanding" has to be achieved first in books/papers and only then via the implementation/code.
At a quick glance, your standard autoencoder looks fine. You are making an assumption via your implementation that your latent code would be in the range of (0, infinity), using relu(x).
However, while doing the implementation of a VAE, you can't achieve the latent code with the relu(x) function. This is where your "theoretical" understanding is missing. In a standard VAE, we make the assumption that the latent code is a sample from a Gaussian distribution, and as such we approximate the parameters of that Gaussian distribution, i.e. the mean and covariance. Further, we also make another assumption that this Gaussian distribution is factorial, which means the covariance matrix is diagonal. In your implementation, you are approximating the mean and diagonal covariance as:
mu = relu(x.dot(w_vae1))
sigma = relu(x.dot(w_vae2))
which seems fine, but while getting a sample (the reparameterization trick), I'm not sure why you introduced np.log2(). Since you are using ReLU() activation, you may end up with 0 in your sigma variable, and when you do np.log2(0) you will get -inf. I believe you were motivated by some available code where they do:
mu = relu(x.dot(w_vae1)) #same as yours
logOfSigma = x.dot(w_vae2) #you are forcing your network to learn log(sigma)
Now, since you are approximating the log of sigma, you can allow your output to be negative, because to get sigma you would do something like np.exp(logOfSigma), and this ensures you always get positive values in your diagonal covariance matrix. Now to do the sampling, you can simply do:
latent_code = mu + np.exp(logOfSigma)*epsilon_sample
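Putting the pieces together, here is a minimal sketch of the corrected VAE branch, reusing the relu, x and ax2 from your snippet (this assembly is mine, not from the original post):

w_vae1 = np.random.normal(0, 1, [5, 2])
w_vae2 = np.random.normal(0, 1, [5, 2])
mu = relu(x.dot(w_vae1))                 # mean, as in your code
logOfSigma = x.dot(w_vae2)               # may be negative; no ReLU here
epsilon_sample = np.random.normal(0, 1, [100, 2])
latent_code = mu + np.exp(logOfSigma) * epsilon_sample
ax2.scatter(latent_code[:, 0], latent_code[:, 1], c='red')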
Hope this helps!
The following code fits an oversimplified generalized linear model using statsmodels:
model = smf.glm('Y ~ 1', family=sm.families.NegativeBinomial(), data=df)
results = model.fit()
This gives the coefficient and a stderr:
           coef    stderr
Intercept  2.9471  0.120
Now I want to graphically compare the real distribution of the variable Y (histogram) with the distribution that comes from the model.
But I need two parameters r and p to evaluate the stats.nbinom(r,p) and plot it.
Is there a way to retrieve the parameters from the results of the fitting?
How can I plot the PMF?
Generalized linear models, GLM, in statsmodels currently does not estimate the extra parameter of the Negative Binomial distribution. Negative Binomial belongs to the exponential family of distributions only for fixed shape parameter.
However, statsmodels also has Negative Binomial as a Maximum Likelihood Model in discrete_model which estimates all parameters.
The parameterization of the Negative Binomial for count regression is in terms of the mean or expected value, which is different from the parameterization in scipy.stats.nbinom. Actually, there are two different commonly used parameterizations for Negative Binomial count regression, usually called nb1 and nb2.
Here is a quickly written script that recovers the scipy.stats.nbinom parameters, n=size and p=prob, from the estimated parameters. Once you have the parameters for the scipy.stats distribution, you can use all the available methods: rvs, pmf, and so on.
Something like this should be made available in statsmodels.
In a few example runs, I got results like this
data generating parameters 50 0.25
estimated params 51.7167511571 0.256814610633
estimated params 50.0985814878 0.249989725917
Aside: because of the underlying exponential reparameterization, the scipy optimizers sometimes have problems converging. In those cases, either providing better starting values or using Nelder-Mead as the optimization method usually helps.
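For example, switching the optimizer is a one-line change on the fit call used in the script below (in statsmodels discrete models, method='nm' selects Nelder-Mead; this line is a sketch, not from the original answer):

res = sm.NegativeBinomial(y, x, loglike_method=loglike_method).fit(
    start_params=[0.1, 0.1], method='nm')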
import numpy as np
from scipy import stats
import statsmodels.api as sm

# generate some data to check
nobs = 1000
n, p = 50, 0.25
dist0 = stats.nbinom(n, p)
y = dist0.rvs(size=nobs)
x = np.ones(nobs)

loglike_method = 'nb1'  # or use 'nb2'
res = sm.NegativeBinomial(y, x, loglike_method=loglike_method).fit(start_params=[0.1, 0.1])

print(dist0.mean())
print(res.params)

mu = res.predict()           # use this for the mean if not constant
mu = np.exp(res.params[0])   # shortcut, we just regress on a constant
alpha = res.params[1]

if loglike_method == 'nb1':
    Q = 1
elif loglike_method == 'nb2':
    Q = 0

size = 1. / alpha * mu**Q
prob = size / (size + mu)
print('data generating parameters', n, p)
print('estimated params          ', size, prob)

# estimated distribution
dist_est = stats.nbinom(size, prob)
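To answer the plotting part of the question, a minimal sketch comparing a histogram of y with the PMF of dist_est (matplotlib is not imported in the script above):

import matplotlib.pyplot as plt

k = np.arange(y.min(), y.max() + 1)
plt.hist(y, bins=50, density=True, alpha=0.5, label='data')
plt.plot(k, dist_est.pmf(k), 'r-', label='estimated NB pmf')
plt.legend()
plt.show()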
BTW: I ran into this before but didn't have time to look at it
https://github.com/statsmodels/statsmodels/issues/106
I've been trying to fit some histogram data with scipy.optimize.curve_fit, but so far I haven't once been able to produce fit parameters that differ significantly from my guess parameters.
I wouldn't be terribly surprised to find that the more arcane parameters in my fit get stuck in local minima, but even linear coefficients won't move from my initial guesses!
If you've seen anything like this before, I'd love some advice. Do least-squared minimization routines just not work for certain classes of functions?
I tried this:
import numpy as np
from matplotlib.pyplot import *
from scipy.optimize import curve_fit

def grating_hist(x, frac, xmax, x0):
    # model data to be turned into a histogram
    dx = x[1] - x[0]
    z = np.linspace(0, 1, 20000, endpoint=True)
    grating = np.cos(frac*np.pi*z)
    norm_grating = xmax*(grating - grating[-1])/(1 - grating[-1]) + x0
    # produce the histogram
    bin_edges = np.append(x, x[-1] + x[1] - x[0])
    hist, bin_edges = np.histogram(norm_grating, bins=bin_edges)
    return hist
x = np.linspace(0,5,512)
p_data = [0.7,1.1,0.8]
pct = grating_hist(x,*p_data)
p_guess = [1,1,1]
p_fit,pcov = curve_fit(grating_hist,x,pct,p0=p_guess)
plot(x,pct,label='Data')
plot(x,grating_hist(x,*p_fit),label='Fit')
legend()
show()
print('Data Parameters:', p_data)
print('Guess Parameters:', p_guess)
print('Fit Parameters:', p_fit)
print('Covariance:', pcov)
and I see this: http://i.stack.imgur.com/GwXzJ.png (I'm new here, so I can't post images)
Data Parameters: [0.7, 1.1, 0.8]
Guess Parameters: [1, 1, 1]
Fit Parameters: [ 0.97600854 0.99458336 1.00366634]
Covariance: [[ 3.50047574e-06 -5.34574971e-07 2.99306123e-07]
[ -5.34574971e-07 9.78688795e-07 -6.94780671e-07]
[ 2.99306123e-07 -6.94780671e-07 7.17068753e-07]]
Whaaa? I'm pretty sure this isn't a local minimum for variations in xmax and x0, and it's a long way from the global minimum best fit. The fit parameters still don't change, even with better guesses. Different choices for curve functions (e.g. the sum of two normal distributions) do produce new parameters for the same data, so I know it's not the data itself. I also tried the same thing with scipy.optimize.leastsq itself just in case, but no dice; the parameters still don't move. If you have any thoughts on this, I'd love to hear them!
The problem you're facing is actually not due to curve_fit (or leastsq). It is due to the landscape of the objective of your optimisation problem. In your case the objective is the sum of squared residuals, which you are trying to minimise. Now, if you look closely at your objective in the close vicinity of your initial conditions, for example using the code below, which only focuses on the first parameter:
import matplotlib.pyplot as py  # not shown in the original snippet

p_ind = 0
eps = 1e-6
n_points = 100
frac_surroundings = np.linspace(p_guess[p_ind] - eps, p_guess[p_ind] + eps, n_points)
obj = []
temp_guess = p_guess.copy()
for p in frac_surroundings:
    temp_guess[p_ind] = p
    obj.append(((grating_hist(x, *p_data) - grating_hist(x, *temp_guess))**2.0).sum())
py.plot(frac_surroundings, obj)
py.show()
you will notice that the landscape is piecewise constant (you can easily check that the situation is the same for the other parameters). The problem is that these pieces are of the order of 10^-6, whereas the initial step of the fitting procedure is somewhere around 10^-8, hence the procedure quickly concludes that you cannot improve on the given initial condition. You could try to fix it by changing the epsfcn parameter of curve_fit, but you would quickly notice that the landscape, on top of being piecewise constant, is also very "rugged". In other words, curve_fit is simply not well suited for such a problem, which is difficult for gradient-based methods, as it is highly non-convex. Probably some stochastic optimisation methods could do a better job. That is, however, a different question/problem.
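As an illustration of the stochastic route, a quick sketch (not tuned; the bounds are guesses, and x, pct and grating_hist are reused from the question):

from scipy.optimize import differential_evolution

def objective(p):
    # sum of squared residuals, the same quantity curve_fit minimises
    return ((pct - grating_hist(x, *p))**2).sum()

bounds = [(0.1, 2.0), (0.5, 2.0), (0.1, 2.0)]  # guessed ranges for frac, xmax, x0
result = differential_evolution(objective, bounds, seed=0)
print(result.x)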
I think it is a local minimum, or the algorithm fails for a non-trivial reason. It is far easier to fit the data to the input, instead of fitting the statistical description of the data to the statistical description of the input.
Here's a modified version of the code doing so:
z = np.linspace(0, 1, 20000, endpoint=True)

def grating_hist_indicator(x, frac, xmax, x0):
    # model data, returned directly instead of being binned into a histogram
    grating = np.cos(frac*np.pi*z)
    norm_grating = xmax*(grating - grating[-1])/(1 - grating[-1]) + x0
    return norm_grating

x = np.linspace(0, 5, 512)
p_data = [0.7, 1.1, 0.8]
pct = grating_hist(x, *p_data)
pct_indicator = grating_hist_indicator(x, *p_data)
p_guess = [1, 1, 1]
p_fit, pcov = curve_fit(grating_hist_indicator, x, pct_indicator, p0=p_guess)
plot(x, pct, label='Data')
plot(x, grating_hist(x, *p_fit), label='Fit')
legend()
show()
I want to do least-squares polynomial fits on data sets (X,Y,Yerr) and obtain the covariance matrices of the fit parameters. Also, since I have many data sets, CPU-time is an issue, so I'm seeking an analytical (=fast) solution. I found the following (non-ideal) options:
numpy.polyfit does the fit, but doesn't take into account the errors Yerr, nor does it return the covariance;
numpy.polynomial.polynomial.polyfit does accept Yerr as an input (in the form of weights), but doesn't return covariance either;
scipy.optimize.curve_fit and scipy.optimize.leastsq can be tailored to fit polynomials and return the covariance matrix, but - being iterative methods - these are much slower than the polyfit routines (which yield an analytical solution);
Does Python provide an analytical polynomial fit routine that returns the covariance of the fit parameters (or do I have to write one myself :-) ?
Update:
It appears that in Numpy 1.7.0, numpy.polyfit now not only does accept weights, but also returns the covariance matrix of the coefficients ... So, issue resolved! :-)
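For reference, that call looks something like this (a sketch with the placeholder names X, Y, Yerr from the question; weights of 1/Yerr are a common choice for Gaussian errors):

coefs, cov = np.polyfit(X, Y, 2, w=1.0/Yerr, cov=True)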
You want a fast weighted least squares model that returns the covariance matrix without additional overhead? In general, the right covariance matrix will depend on the data generating process (DGP), because different DGPs (say, heteroscedasticity of errors) imply different distributions of the parameter estimates (think White vs. OLS standard errors). But if you can assume WLS is the right way to do it, then I believe you would use the asymptotic variance estimate for beta in WLS, (1/n X'V^-1 X)^-1, where V is the weighting matrix created from the Yerrs. That's a pretty simple formula if numpy.polynomial.polynomial.polyfit is working for you.
I looked for an online reference but couldn't find one. But see Fumio Hayashi's Econometrics, 2000, Princeton University Press, pp. 133-137 for a derivation and discussion.
Update 12/4/12:
There is another stack overflow question that comes close:
numpy.polyfit has no keyword 'cov' that has a nice explanation (with code) of how to use scikits.statsmodels to do what you want. I believe you'll want to replace the line:
result = sm.OLS(Y,reg_x_data).fit()
with
result = sm.WLS(Y,reg_x_data, weights).fit()
Where you define weights as a function of Yerr as before with numpy.polynomial.polynomial.polyfit. More details on using statsmodels with WLS over at
the statsmodels website.
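A minimal sketch of that statsmodels route (X, Y, Yerr are placeholder arrays as in the question; cov_params() returns the parameter covariance matrix):

import numpy as np
import statsmodels.api as sm

reg_x_data = np.vander(X, 3)  # design matrix for a quadratic fit
result = sm.WLS(Y, reg_x_data, weights=1.0/Yerr**2).fit()
print(result.params)
print(result.cov_params())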
Here it is using scipy.linalg.lstsq
import numpy as np
import scipy.linalg

# generate the test data
N = 100
xs = np.random.uniform(size=N)
errs = np.random.uniform(0, 0.1, size=N)  # errors
ys = 1 + 2 * xs + 3 * xs ** 2 + errs * np.random.normal(size=N)

# do the fit
polydeg = 2
A = np.vstack([1 / errs] + [xs ** _ / errs for _ in range(1, polydeg + 1)]).T
result = scipy.linalg.lstsq(A, ys / errs)[0]
covar = np.linalg.inv(np.dot(A.T, A))
print(result)
print(covar)
>> [ 0.99991811 2.00009834 3.00195187]
[[ 4.82718910e-07 -2.82097554e-06 3.80331414e-06]
[ -2.82097554e-06 1.77361434e-05 -2.60150367e-05]
[ 3.80331414e-06 -2.60150367e-05 4.22541049e-05]]