Including measurement uncertainties in pyMC3

I'm trying to fit some observations, which have measurement errors, to some other data with no measurement error. How do I take into account the measurement error in pyMC3? I have the following approach, which seems to give me reasonable results, but is it the right way to go about it?
import pymc3

n_samples = 20000

with pymc3.Model() as predictive_model:
    intercept = pymc3.Normal('Intercept', mu=1.0, sd=0.2)
    exponent = pymc3.Normal('A', mu=4.2, sd=0.15)
    likelihood = pymc3.Normal('Observed',
                              mu=intercept * x_values**exponent,
                              observed=observed_values,
                              sd=observed_errors)

    start = pymc3.find_MAP()
    step = pymc3.NUTS(scaling=start)
    trace_predictive = pymc3.sample(n_samples, step, start=start, njobs=4)
where x_values, observed_values and observed_errors are 1D numpy arrays of the same length.

It looks like you have a model C x^A, but you believe the data you collected looked like C x^A + eps. It also looks like you know the measurement error exactly, somehow (this surprises me!)
If your goal is to infer something about the intercept C, exponent A, and measurement noise eps, I would write the model like this:
with pymc3.Model() as predictive_model:
    intercept = pymc3.Normal('Intercept', mu=1.0, sd=0.2)
    exponent = pymc3.Normal('A', mu=4.2, sd=0.15)
    eps = pymc3.HalfNormal('eps', 10.)

    likelihood = pymc3.Normal('Observed',
                              mu=intercept * x_values**exponent,
                              sd=eps,
                              observed=observed_values - observed_errors)

    trace_predictive = pymc3.sample(n_samples, njobs=4)
(note that there are better ways to initialize than the MAP now, and they get chosen automatically!)
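For instance (a minimal sketch; the exact defaults depend on your PyMC3 version), you can spell out the automatic initialization explicitly instead of calling find_MAP:

with predictive_model:
    # PyMC3 assigns NUTS and a 'jitter+adapt_diag'-style initialization by
    # default, so no find_MAP() or manual NUTS step is needed.
    trace_predictive = pymc3.sample(n_samples, tune=1000,
                                    init='jitter+adapt_diag', njobs=4)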

Related

scipy.stats.chisquare isn't giving the results expected from the input data

I have some data to which I want to apply a fit and then perform a chi-squared test to get the goodness of the fit. It is obvious that the fit I'm applying doesn't fit the data very well (that in and of itself isn't a problem; I'm not necessarily expecting it to), but the values scipy.stats.chisquare is returning suggest an almost perfect fit, which is clearly wrong.
What I've done so far is define a function describing the fit I'm applying (a sinusoidal fit), then use scipy.optimize.curve_fit to fit this function to my data, take the fit parameters from popt, and use them in the previously defined function to generate the fitted curve.
I'm then passing the measured data and the fitted data to scipy.stats.chisquare to assess the goodness of the fit, but that returns a p-value of 1.0, which cannot be right. My assumption is that there is some problem with using the values generated by scipy.optimize.curve_fit in scipy.stats.chisquare, but if that is the case I don't understand why it's a problem or how to work around it.
I have my measured data in two lists, which I'm calling time and rate below:
import numpy as np
import math
%matplotlib inline
import matplotlib.pyplot as plt
from statistics import stdev
import scipy.stats
from scipy.optimize import curve_fit

time = [309.6666666666667, 326.3333333333333, 334.6666666666667, 399.9166666666667, 416.5833333333333, 433.25, 449.91666666666663, 466.58333333333337, 483.25, 499.91666666666663]
rate = [0.298168, 0.29317, 0.306496, 0.249861, 0.241532, 0.241532, 0.206552, 0.249861, 0.253193, 0.239867]

def oscillation(t, A, C):
    # x0 (phase) and t0 (period) are fixed module-level constants defined below
    return A*np.cos((2*np.pi*(t - x0))/t0) + C

t0 = 365.25
A = 0.35/2
x0 = 152.5
C = 0.475

popt, pcov = curve_fit(oscillation, time, rate, p0=[A, C])

rate_fit = []
for t in time:
    r = oscillation(t, popt[0], popt[1])
    rate_fit.append(r)

print(scipy.stats.chisquare(rate, f_exp=rate_fit))

plt.plot(time, rate, '.')
plt.plot(time, rate_fit, '--')
The output of the above is a fit which, when plotted, does look like the best fit to the data, but it is clearly not a perfect fit, which makes the other output, a p-value of 0.99999999999458533, clearly wrong.
You are only fitting for two parameters, A and C, thus forcing the phase and period.
If you also fit for the phase and period, you get a much better fit:
Also in this case, my p-value is 1.0.
The reason why your p-value is 1.0 when x0 and t0 are fixed is that your result is the best fit that can be made with those values for x0 and t0. Forcing those values will very likely produce an overall worse fit. For comparison, with x0 and t0 free, I get
A = -3.45840427e-02
C = 2.65142203e-01
x0 = 1.88838771e+02
t0 = 2.61112538e+02
Compare that to t0 = 365.25 and x0 = 152.5.
Of course, there are (physical) reasons that you want to fix e.g. t0 to a year, but in such a case, you should worry less that the plot looks bad; your p-value still takes this into account.
The more likely reason, however, is that you are also forgetting the ddof parameter in scipy.stats.chisquare. Its default is ddof=0, which is not what you have: in your case the degrees of freedom are len(rate) - 2; in my case above, they would be len(rate) - 4.
For your fit (t0 and x0 fixed), that results in p = 0.902. With all parameters free, it results in 0.999887 (i.e., 1 again).
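For concreteness, a sketch of that correction applied to the question's variables (scipy.stats.chisquare computes its p-value with k - 1 - ddof degrees of freedom, so ddof is chosen here to land on len(rate) - 2):

import scipy.stats

# Two fitted parameters (A and C): we want len(rate) - 2 degrees of freedom.
# chisquare uses k - 1 - ddof, so pass ddof = n_fitted - 1.
n_fitted = 2
chisq, p = scipy.stats.chisquare(rate, f_exp=rate_fit, ddof=n_fitted - 1)
print(chisq, p)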
Bonus: output when I fix the period t0 to 365.25:
A = -4.05218922e-02
C = 2.74772524e-01
x0 = 8.69008279e+01
p = 0.997
and the plotted fit:

Is the code proper way of understanding Vae vs. Standard Autoencoder?

I have created two mini encoding networks, one for a standard autoencoder and one for a VAE, and plotted each. I would just like to know if my understanding is correct for this mini case. Note it's only one epoch and it ends with the encoding.
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(0)
fig, (ax,ax2) = plt.subplots(2,1)
def relu(x):
    c = np.where(x > 0, x, 0)
    return c
#Standard autoencoder
x = np.random.randint(0,2,[100,5])
w_autoencoder = np.random.normal(0,1,[5,2])
bottle_neck = relu(x.dot(w_autoencoder))
ax.scatter(bottle_neck[:,0],bottle_neck[:,1])
#VAE autoencoder
w_vae1 = np.random.normal(0,1,[5,2])
w_vae2 = np.random.normal(0,1,[5,2])
mu = relu(x.dot(w_vae1))
sigma = relu(x.dot(w_vae2))
epsilon_sample = np.random.normal(0,1,[100,2])
latent_space = mu+np.log2(sigma)*epsilon_sample
ax2.scatter(latent_space[:,0], latent_space[:,1],c='red')
Since your motive is "understanding", I'd say you are headed in the right direction, and working on this sort of implementation definitely helps your understanding. But I strongly believe "understanding" has to come first from books/papers and only then via the implementation/code.
At a quick glance, your standard autoencoder looks fine. Via your implementation you are assuming that your latent code lies in the range (0, infinity), because of relu(x).
However, in the VAE implementation you can't obtain the latent code with a relu(x) function. This is where your "theoretical" understanding is missing. In a standard VAE, we assume that the latent code is a sample from a Gaussian distribution, and so we approximate the parameters of that Gaussian, i.e. its mean and covariance. We also assume that this Gaussian is factorized, which means the covariance matrix is diagonal. In your implementation, you are approximating the mean and diagonal covariance as:
mu = relu(x.dot(w_vae1))
sigma = relu(x.dot(w_vae2))
which seems fine, but when drawing the sample (the reparameterization trick), I'm not sure why you introduced np.log2(). Since you are using a ReLU() activation, you may end up with 0 in your sigma variable, and np.log2(0) gives -inf. I believe you were motivated by some available code where they do:
mu = relu(x.dot(w_vae1)) #same as yours
logOfSigma = x.dot(w_vae2) #you are forcing your network to learn log(sigma)
Now, since you are approximating the log of sigma, your output is allowed to be negative, because to get sigma you do something like np.exp(logOfSigma), which ensures the values in your diagonal covariance matrix are always positive. To do the sampling, you can then simply do:
latent_code = mu + np.exp(logOfSigma)*epsilon_sample
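For completeness, here is a self-contained NumPy sketch of that sampling step, mirroring the question's toy setup (the names log_sigma and latent_code are illustrative):

import numpy as np

def relu(x):
    return np.where(x > 0, x, 0)

np.random.seed(0)
x = np.random.randint(0, 2, [100, 5])      # same toy binary inputs as the question

w_vae1 = np.random.normal(0, 1, [5, 2])
w_vae2 = np.random.normal(0, 1, [5, 2])

mu = relu(x.dot(w_vae1))                   # mean, as in the question
log_sigma = x.dot(w_vae2)                  # no ReLU: the network learns log(sigma)
epsilon_sample = np.random.normal(0, 1, [100, 2])

# reparameterization trick: exp(log_sigma) is always positive
latent_code = mu + np.exp(log_sigma) * epsilon_sample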
Hope this helps!

Getting more refined results from Python SciPy curve_fit

I've got the following bit of Python (v2.7.14) code, which uses curve_fit from SciPy (v1.0.1) to find parameters for an exponential decay function. Most of the time, I get reasonable results. Occasionally though, I'll get some results which are completely out of my expected range, even though the found parameters will look fine when plotted against the original graph.
First, my understanding of the exponential decay formula comes from https://en.wikipedia.org/wiki/Exponential_decay which I've translated to Python as:
y = a * numpy.exp(-b * x) + c
Where by:
a is the initial value of the data
b is the decay rate, i.e. the inverse of the time at which the signal has decayed to 1/e of its initial value
c is an offset, as I am dealing with non-negative values in my data which never reach zero
x is the current time
The script takes into account that non-negative data is being fitted and offsets the initial guess appropriately. But even without guessing, not offsetting, using max/min (instead of first/last values) and other random things I've tried, I cannot seem to get curve_fit to produce sensible values on the troublesome datasets.
My hypothesis is that the troublesome datasets don't have enough of a curve that can be fit without going way outside the realm of the data. I've looked at the bounds argument for curve_fit, and thought that might be a reasonable option. I'm unsure as to what would make good lower and upper bounds for the calculation, or if it is actually the option I am looking for.
Here is the code. Commented out code are things I've tried.
#!/usr/local/bin/python

import numpy as numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as pyplot

def exponential_decay(x, a, b, c):
    return a * numpy.exp(-b * x) + c

def fit_exponential(decay_data, time_data, decay_time):
    # The start of the curve is offset by the last point, so subtract
    guess_a = decay_data[0] - decay_data[-1]
    #guess_a = max(decay_data) - min(decay_data)

    # The time that it takes for the signal to reach 1/e becomes guess_b
    guess_b = 1/decay_time

    # Since this is non-negative data, above 0, we use the last data point as the baseline (c)
    guess_c = decay_data[-1]
    #guess_c = min(decay_data)

    guess = [guess_a, guess_b, guess_c]
    print "guess: {0}".format(guess)

    #popt, pcov = curve_fit(exponential_decay, time_data, decay_data, maxfev=20000)
    popt, pcov = curve_fit(exponential_decay, time_data, decay_data, p0=guess, maxfev=20000)

    #bound_lower = [0.05, 0.05, 0.05]
    #bound_upper = [decay_data[0]*2, guess_b * 10, decay_data[-1]]
    #print "bound_lower: {0}".format(bound_lower)
    #print "bound_upper: {0}".format(bound_upper)
    #popt, pcov = curve_fit(exponential_decay, time_data, decay_data, p0=guess, bounds=[bound_lower, bound_upper], maxfev=20000)

    a, b, c = popt
    print "a: {0}".format(a)
    print "b: {0}".format(b)
    print "c: {0}".format(c)

    plot_fit = exponential_decay(time_data, a, b, c)
    pyplot.plot(time_data, decay_data, 'g', label='Data')
    pyplot.plot(time_data, plot_fit, 'r', label='Fit')
    pyplot.legend()
    pyplot.show()
print "Gives reasonable results"
time_data = numpy.array([0.0,0.040000000000000036,0.08100000000000018,0.12200000000000011,0.16200000000000014,0.20300000000000007,0.2430000000000001,0.28400000000000003,0.32400000000000007,0.365,0.405,0.44599999999999995,0.486,0.5269999999999999,0.567,0.6079999999999999,0.6490000000000002,0.6889999999999998,0.7300000000000002,0.7700000000000002,0.8110000000000002,0.8510000000000002,0.8920000000000001,0.9320000000000002,0.9730000000000001])
decay_data = numpy.array([1.342146870531986,1.405586070225509,1.3439802492549762,1.3567811728250267,1.2666276377825874,1.1686375326985337,1.216119360088685,1.2022841507836042,1.1926979408026064,1.1544395213303447,1.1904416926531907,1.1054720201415882,1.112100683833435,1.0811434035632939,1.1221671794680403,1.0673295063196415,1.0036146509494743,0.9984005680821595,1.0134498134883763,0.9996920772051201,0.929782730581616,0.9646581154122312,0.9290690593684447,0.8907360533169936,0.9121560047238627])
fit_exponential(decay_data, time_data, 0.567)
print
print "Gives results that are way outside my expectations"
time_data = numpy.array([0.0,0.040000000000000036,0.08099999999999996,0.121,0.16199999999999992,0.20199999999999996,0.24300000000000033,0.28300000000000036,0.32399999999999984,0.3650000000000002,0.40500000000000025,0.44599999999999973,0.48599999999999977,0.5270000000000001,0.5670000000000002,0.6079999999999997,0.6479999999999997,0.6890000000000001,0.7290000000000001,0.7700000000000005,0.8100000000000005,0.851,0.8920000000000003,0.9320000000000004,0.9729999999999999,1.013,1.0540000000000003])
decay_data = numpy.array([1.4401611921948776,1.3720688158534153,1.3793465463227048,1.2939909686762128,1.3376345321949346,1.3352710161631154,1.3413634841956348,1.248705138603995,1.2914294791901497,1.2581763134585313,1.246975264018646,1.2006447776495062,1.188232179689515,1.1032789127515186,1.163294324147017,1.1686263160765304,1.1434009568472243,1.0511578409946472,1.0814520440570896,1.1035953824496334,1.0626893599266163,1.0645580326776076,0.994855722989818,0.9959891485338087,0.9394584009825916,0.949504060086646,0.9278639431146273])
fit_exponential(decay_data, time_data, 0.6890000000000001)
And here is the text output:
Gives reasonable results
guess: [0.4299908658081232, 1.7636684303350971, 0.9121560047238627]
a: 1.10498934435
b: 0.583046565885
c: 0.274503681044
Gives results that are way outside my expectations
guess: [0.5122972490802503, 1.4513788098693758, 0.9278639431146273]
a: 742.824622191
b: 0.000606308344957
c: -741.41398516
Most notably, with the second set of results, the value for a is very large, the value for c is a correspondingly large negative number, and b is a very small decimal number.
Here is the graph of the first dataset, which gives reasonable results.
Here is the graph of the second dataset, which does not give good results.
Note that the graph itself plots correctly, though the line does not really have a good curve to it.
My questions:
Is my implementation of the exponential decay algorithm with curve_fit correct?
Are my initial guess parameters good enough?
Is the bounds parameter the correct solution for this problem? If so, what is a good way to determine lower and upper bounds?
Have I missed something here?
Again, thank you!
When you say that the second fit gives results that are "way outside" your expectations, and that although the second graph "plots correctly" the line does not really have a good curve to it, you are on the right track to understanding what is going on. I think you are just missing a piece of the puzzle.
The second graph is fit pretty well by a curve that does look linear. That probably means that you don't really have enough change in your data (well, perhaps below the noise level) to detect that it is an exponential decay.
I would bet that if you printed out not only the best-fit values but also the uncertainties and correlations for the variables that you would see that the uncertainties are huge and some of the correlations are very close to 1. That may mean that taking into account the uncertainties (and measurements always have uncertainties) the results might actually fit with your expectation. And that may also tell you that the data you have does not support an exponential decay very well.
You might also try other models for this data ("linear" comes to mind ;)) and compare goodness-of-fit statistics such as chi-square and Akaike information criterion.
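As an illustration of that kind of comparison (a rough sketch only; exponential_decay, time_data, and decay_data are from the question, while the linear model and the Gaussian-likelihood form of AIC used here are my own assumptions):

import numpy as np
from scipy.optimize import curve_fit

def linear(x, m, b):
    return m * x + b

def aic(residuals, n_params):
    # Gaussian-likelihood AIC, up to an additive constant
    n = len(residuals)
    rss = np.sum(residuals**2)
    return n * np.log(rss / n) + 2 * n_params

# in practice pass the same p0=guess as in the question's fit_exponential
p_exp, _ = curve_fit(exponential_decay, time_data, decay_data, maxfev=20000)
p_lin, _ = curve_fit(linear, time_data, decay_data)

aic_exp = aic(decay_data - exponential_decay(time_data, *p_exp), 3)
aic_lin = aic(decay_data - linear(time_data, *p_lin), 2)
# the model with the lower AIC is preferred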
scipy.curve_fit does return the covariance matrix -- the pcov that you did not use in your example. Unfortunately, scipy.curve_fit does not convert these values into uncertainties and correlation values, and it does not attempt to return any goodness-of-fit statistics at all.
To fully explain any fit to data, you need not only the best-fit values but also an estimate of the uncertainties for the variable parameters. And you need the goodness-of-fit statistics in order to determine if a fit is good, or at least whether one fit is better than another.
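For example, a minimal sketch of what that looks like with the pcov already returned by the question's curve_fit call:

import numpy as np

# popt, pcov = curve_fit(...) as in fit_exponential above
perr = np.sqrt(np.diag(pcov))          # 1-sigma uncertainties for a, b, c
pcorr = pcov / np.outer(perr, perr)    # correlation matrix

print(perr)
print(pcorr)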

Translating rJAGS censored linear regression model to PyMC3

I'm currently attempting to translate a program my boss originally wrote in R/rJAGS to Python/PyMC3, partially because he wanted to see if it was something Python could do, and partially because I want to learn how to do this sort of thing; it seems like a good thing to know. I've gotten a linear fit model working in PyMC3, but I'm having difficulty trying to replicate the censoring bit.
The R program reads in a table, each line having three y-values for three specific x-values which are constant across the data set. Each y-value also has some error associated with it. If that were it then I have a PyMC3 model that can do that; here's the toy model I had set up for it:
import numpy as np
import pymc3 as pmc

# set random seed for reproducibility
np.random.seed(12345)

x = np.linspace(0, 10, 3)

# Make some model data
# Parameters for linear fit
slope_true = -0.2
inter_true = 0.1

# Linear function
linear = lambda x, slope, inter: slope*x + inter
f_true = linear(x=x, slope=slope_true, inter=inter_true)

# add noise to the data points
f = f_true + np.random.normal(size=len(x)) * 0.05
f_error = np.ones_like(f_true)*f.max()*np.random.uniform(0, 1, size=len(x))

with pmc.Model() as model3:
    slope = pmc.Normal('slope', mu=0, tau=0.4, testval=0.15)
    inter = pmc.Normal('inter', mu=0, tau=40, testval=0.15)
    linear = pmc.Deterministic('linear', slope*x + inter)
    y = pmc.Normal('y', mu=linear, tau=1.0/f_error**2, observed=f)

    start = pmc.find_MAP()
    step = pmc.NUTS()
    trace = pmc.sample(1000, start=start)

# extract results
slope_fit = np.median(trace.slope)
slope_up = slope_fit - np.percentile(trace.slope, 15.9)
slope_dn = np.percentile(trace.slope, 84.1) - slope_fit
The above model was somewhat hacked together from examples I found online; it generates points on a line, adds a bit of noise and some "error", then performs a fit on the noisy points with error. After that it grabs the median value for the slope and some errors associated with it.
But now I need to be able to account for these censored points that sometimes pop up. In this instance certain y-values may have been non-detections, so the value for that point is considered a censor limit and the point is then set to NaN, with an error still associated with the point. The R code model (saved as lin_regress_model.bug) which handles this looks like this:
model {
    for (i in 1:N) {
        isCensored[i] ~ dinterval(rv[i], censorLimitVec[i])
        rv[i] ~ dnorm(y[i], rve[i])
        y[i] <- a*x[i] + b
    }
    a ~ dnorm(0, 1e-6)
    b ~ dnorm(0, 1e-6)
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1/sqrt(tau)
}
Here's an example of data it might get fed:
N = 3  # always 3, because 3 points
isCensored = c(FALSE, FALSE, TRUE)
censorLimitVec = c(-6.65, -6.65, -6.65)  # was value of 3rd point before NA
rv = c(-3.4, -4.7, NA)  # y-values
rve = c(7e3, 7e2, 6.66)  # these are tau I think, like 1/sigma^2
x = c(0.15, 0.68, 0.94)  # x-values
So all of those get passed into the jags model, and it's able to fit this censored data, but I can't for the life of me figure out how to translate that bit into PyMC3-speak. It sounds like the dinterval function in this may be similar to Uniform in PyMC3, but I don't really know what to do with that because I can't directly translate the formula lines (the concept of the tilde itself in R is still a bit weird to me).
If anyone out there can help me it would be greatly appreciated. For all I know it might not even be possible with PyMC3, or maybe it's easy and I've just missed something. Regardless, I've been banging my head against the wall for a few days now so I figure it'd be best just to ask for help at this point.
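For what it's worth, one pattern often used for this in PyMC3 is "imputed censoring": keep the uncensored points as ordinary observations and model each censored point as an extra latent variable bounded by the censor limit via pm.Bound. The sketch below is only an assumed translation of the question's setup, not a verified equivalent of the JAGS model; in particular, whether the bound should be an upper or a lower limit depends on what isCensored/dinterval encode in the original data.

import numpy as np
import pymc3 as pmc

# data from the question, translated from R
x = np.array([0.15, 0.68, 0.94])
rv_obs = np.array([-3.4, -4.7])        # the two uncensored y-values
rve = np.array([7e3, 7e2, 6.66])       # precisions (tau = 1/sigma^2)
censor_limit = -6.65

with pmc.Model() as censored_model:
    a = pmc.Normal('a', mu=0, tau=1e-6)
    b = pmc.Normal('b', mu=0, tau=1e-6)
    y = a * x + b

    # uncensored points: ordinary observed likelihood
    obs = pmc.Normal('obs', mu=y[:2], tau=rve[:2], observed=rv_obs)

    # censored point: a latent value only known to lie beyond the censor
    # limit (assumed here to be an upper limit; swap to lower= if the
    # censoring goes the other way in your data)
    BoundedNormal = pmc.Bound(pmc.Normal, upper=censor_limit)
    rv_cens = BoundedNormal('rv_cens', mu=y[2], tau=rve[2])

    trace = pmc.sample(2000)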

Why does scipy.optimize.curve_fit produce parameters which are barely different from the guess?

I've been trying to fit some histogram data with scipy.optimize.curve_fit, but so far I haven't once been able to produce fit parameters that differ significantly from my guess parameters.
I wouldn't be terribly surprised to find that the more arcane parameters in my fit get stuck in local minima, but even linear coefficients won't move from my initial guesses!
If you've seen anything like this before, I'd love some advice. Do least-squared minimization routines just not work for certain classes of functions?
I try this,
import numpy as np
from matplotlib.pyplot import *
from scipy.optimize import curve_fit

def grating_hist(x, frac, xmax, x0):
    # model data to be turned into a histogram
    dx = x[1] - x[0]
    z = np.linspace(0, 1, 20000, endpoint=True)
    grating = np.cos(frac*np.pi*z)
    norm_grating = xmax*(grating - grating[-1])/(1 - grating[-1]) + x0
    # produce the histogram
    bin_edges = np.append(x, x[-1] + x[1] - x[0])
    hist, bin_edges = np.histogram(norm_grating, bins=bin_edges)
    return hist

x = np.linspace(0, 5, 512)
p_data = [0.7, 1.1, 0.8]
pct = grating_hist(x, *p_data)
p_guess = [1, 1, 1]
p_fit, pcov = curve_fit(grating_hist, x, pct, p0=p_guess)

plot(x, pct, label='Data')
plot(x, grating_hist(x, *p_fit), label='Fit')
legend()
show()

print 'Data Parameters:', p_data
print 'Guess Parameters:', p_guess
print 'Fit Parameters:', p_fit
print 'Covariance:', pcov
and I see this: http://i.stack.imgur.com/GwXzJ.png (I'm new here, so I can't post images)
Data Parameters: [0.7, 1.1, 0.8]
Guess Parameters: [1, 1, 1]
Fit Parameters: [ 0.97600854 0.99458336 1.00366634]
Covariance: [[ 3.50047574e-06 -5.34574971e-07 2.99306123e-07]
[ -5.34574971e-07 9.78688795e-07 -6.94780671e-07]
[ 2.99306123e-07 -6.94780671e-07 7.17068753e-07]]
Whaaa? I'm pretty sure this isn't a local minimum for variations in xmax and x0, and it's a long way from the global minimum best fit. The fit parameters still don't change, even with better guesses. Different choices for curve functions (e.g. the sum of two normal distributions) do produce new parameters for the same data, so I know it's not the data itself. I also tried the same thing with scipy.optimize.leastsq itself just in case, but no dice; the parameters still don't move. If you have any thoughts on this, I'd love to hear them!
The problem you're facing is actually not due to curve_fit (or leastsq). It is due to the landscape of the objective of your optimisation problem. In your case the objective is the sum of residuals' squares, which you are trying to minimise. Now, if you look closely at your objective in a close surrounding of your initial conditions, for example using the code below, which only focuses on the first parameter:
p_ind = 0  # index of the parameter to scan (frac)
eps = 1e-6
n_points = 100

frac_surroundings = np.linspace(p_guess[p_ind] - eps, p_guess[p_ind] + eps, n_points)

obj = []
temp_guess = list(p_guess)
for p in frac_surroundings:
    temp_guess[p_ind] = p
    # sum of squared residuals between the "data" and the perturbed guess
    obj.append(((grating_hist(x, *p_data) - grating_hist(x, *temp_guess))**2.0).sum())

plot(frac_surroundings, obj)
show()
you will notice that the landscape is piecewise constant (you can easily check that the situation is the same for the other parameters). The problem is that these pieces are of the order of 10^-6, whereas the initial step of the fitting procedure is somewhere around 10^-8, hence the procedure quickly concludes that it cannot improve on the given initial condition. You could try to fix it by changing the epsfcn parameter in curve_fit, but you would quickly notice that the landscape, on top of being piecewise constant, is also very "rugged". In other words, curve_fit is simply not well suited for such a problem, which is genuinely difficult for gradient-based methods, as it is highly non-convex. Probably some stochastic optimisation methods could do a better job. That is, however, a different question/problem.
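For reference, the epsfcn tweak mentioned above would look roughly like this (epsfcn is forwarded to the underlying leastsq routine, so it only applies to the default, unbounded 'lm' method; the value is illustrative):

# a larger finite-difference step so the optimiser can "see" across the
# piecewise-constant plateaus of the objective
p_fit, pcov = curve_fit(grating_hist, x, pct, p0=p_guess, epsfcn=1e-3)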
I think it is a local minimum, or the algorithm fails for a non-trivial reason. It is far easier to fit the data to the input, instead of fitting the statistical description of the data to the statistical description of the input.
Here's a modified version of the code doing so:
z = np.linspace(0, 1, 20000, endpoint=True)

def grating_hist_indicator(x, frac, xmax, x0):
    # model data, before it is turned into a histogram
    dx = x[1] - x[0]
    grating = np.cos(frac*np.pi*z)
    norm_grating = xmax*(grating - grating[-1])/(1 - grating[-1]) + x0
    return norm_grating

x = np.linspace(0, 5, 512)
p_data = [0.7, 1.1, 0.8]
pct = grating_hist(x, *p_data)
pct_indicator = grating_hist_indicator(x, *p_data)
p_guess = [1, 1, 1]
p_fit, pcov = curve_fit(grating_hist_indicator, x, pct_indicator, p0=p_guess)

plot(x, pct, label='Data')
plot(x, grating_hist(x, *p_fit), label='Fit')
legend()
show()
