Translating rJAGS censored linear regression model to PyMC3

I'm currently attempting to translate a program my boss originally wrote in R/rJAGS to Python/PyMC3, partly because he wanted to see whether Python could do it and partly because I want to learn how to do this sort of thing. I've gotten a linear fit model working in PyMC3, but I'm having difficulty replicating the censoring part.
The R program reads in a table, each line having three y-values for three specific x-values which are constant across the data set. Each y-value also has some error associated with it. If that were all, I already have a PyMC3 model that handles it; here's the toy model I set up:
import numpy as np
import pymc3 as pmc
# set random seed for reproducibility
np.random.seed(12345)
x = np.linspace(0,10,3)
# Make some model data
# Parameters for linear fit
slope_true = -0.2
inter_true = 0.1
#Linear function
linear = lambda x,slope,inter: slope*x+inter
f_true = linear(x=x,slope=slope_true, inter=inter_true )
# add noise to the data points
f = f_true + np.random.normal(size=len(x)) * 0.05
f_error = np.ones_like(f_true)*f.max()*np.random.uniform(0,1,size=len(x))
with pmc.Model() as model3:
    slope = pmc.Normal('slope', mu=0, tau=0.4, testval= 0.15)
    inter = pmc.Normal('inter', mu=0, tau=40, testval=0.15)
    linear = pmc.Deterministic('linear', slope*x+inter)
    y = pmc.Normal('y', mu=linear, tau=1.0/f_error**2, observed=f)
    start = pmc.find_MAP()
    step = pmc.NUTS()
    trace = pmc.sample(1000,start=start)
# extract results
slope_fit = np.median(trace.slope)
slope_up = slope_fit - np.percentile(trace.slope, 15.9)
slope_dn = np.percentile(trace.slope, 84.1) - slope_fit
The above model was somewhat hacked together from examples I found online. It generates points on a line, adds a bit of noise and some "error", then fits the noisy points with their errors. After that it grabs a median value for the slope and the errors associated with it.
But now I need to be able to account for these censored points that sometimes pop up. In this instance certain y-values may have been non-detections, so the value for that point is considered a censor limit and the point is then set to NaN, with an error still associated with the point. The R code model (saved as lin_regress_model.bug) which handles this looks like this:
model {
    for (i in 1:N) {
        isCensored[i] ~ dinterval(rv[i], censorLimitVec[i])
        rv[i] ~ dnorm(y[i],rve[i])
        y[i] <- a*x[i] + b
    }
    a ~ dnorm(0, 1e-6)
    b ~ dnorm(0, 1e-6)
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1/sqrt(tau)
}
Here's an example of data it might get fed:
N = 3 # always 3, because 3 points
isCensored = c(FALSE, FALSE, TRUE)
censorLimitVec = c(-6.65, -6.65, -6.65) # was value of 3rd point before NA
rv = c(-3.4, -4.7, NA) # y-values
rve = c(7e3, 7e2, 6.66) # these are Tau I think, like 1/sigma^2
x = c(0.15, 0.68, 0.94) # x-values
So all of those get passed into the jags model, and it's able to fit this censored data, but I can't for the life of me figure out how to translate that bit into PyMC3-speak. It sounds like the dinterval function in this may be similar to Uniform in PyMC3, but I don't really know what to do with that because I can't directly translate the formula lines (the concept of the tilde itself in R is still a bit weird to me).
If anyone out there can help me it would be greatly appreciated. For all I know it might not even be possible with PyMC3, or maybe it's easy and I've just missed something. Regardless, I've been banging my head against the wall for a few days now so I figure it'd be best just to ask for help at this point.
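For anyone reading along, here is a minimal sketch of what a censored-normal likelihood computes, written in plain Python rather than PyMC3. It assumes a non-detection means the true value lies below the censor limit (left-censoring), and that rve holds precisions (1/sigma^2) as guessed in the question; if the censoring runs the other way, the censored term would use the upper tail instead. The function names are made up for illustration and are not part of rJAGS or PyMC3.

```python
import math

def normal_logpdf(value, mu, tau):
    """Log density of a normal with mean mu and precision tau, evaluated at value."""
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (value - mu) ** 2

def normal_logcdf(limit, mu, tau):
    """Log of P(Y < limit) for a normal with mean mu and precision tau."""
    z = (limit - mu) * math.sqrt(tau)
    prob = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(prob, 1e-300))  # floor avoids log(0) if prob underflows

def censored_loglik(slope, inter, x, y, tau, censored, limit):
    """Detected points contribute a density; censored points contribute P(Y < limit)."""
    total = 0.0
    for xi, yi, ti, ci, li in zip(x, y, tau, censored, limit):
        mu = slope * xi + inter
        if ci:
            total += normal_logcdf(li, mu, ti)  # assumption: left-censored
        else:
            total += normal_logpdf(yi, mu, ti)
    return total

# Example data from the question (None stands in for NA)
x = [0.15, 0.68, 0.94]
rv = [-3.4, -4.7, None]
rve = [7e3, 7e2, 6.66]
is_censored = [False, False, True]
censor_limit = [-6.65, -6.65, -6.65]

print(censored_loglik(-2.5, -3.0, x, rv, rve, is_censored, censor_limit))
```

The censored point never needs a y-value: it only rewards lines whose prediction at that x sits below the limit. A PyMC3 translation would have to add the same term to the model's log-probability in some form.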

Related

Understanding hyperopt's TPE algorithm

I am illustrating hyperopt's TPE algorithm for my master's project and can't seem to get the algorithm to converge. From what I understand from the original paper and a YouTube lecture, the TPE algorithm works in the following steps:
(in the following, x=hyperparameters and y=loss)
1. Start by creating a search history of [x,y], say 10 points.
2. Sort the hyperparameters according to their loss and divide them into two sets using some quantile γ (γ = 0.5 means the sets will be equally sized).
3. Make a kernel density estimation for both the poor hyperparameter group (g(x)) and good hyperparameter group (l(x)).
4. Good estimations will have low probability in g(x) and high probability in l(x), so we propose to evaluate the function at argmin(g(x)/l(x)).
5. Evaluate (x,y) pair at the proposed point and repeat steps 2-5.
I have implemented this in python on the objective function f(x) = x^2, but the algorithm fails to converge to the minimum.
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde
def objective_func(x):
    return x**2
def measure(x):
    noise = np.random.randn(len(x))*0
    return x**2+noise
def split_meassures(x_obs,y_obs,gamma=1/2):
    #split x and y observations into two sets and return a separation threshold (y_star)
    size = int(len(x_obs)//(1/gamma))
    l = {'x':x_obs[:size],'y':y_obs[:size]}
    g = {'x':x_obs[size:],'y':y_obs[size:]}
    y_star = (l['y'][-1]+g['y'][0])/2
    return l,g,y_star
#sample objective function values for illustration
x_obj = np.linspace(-5,5,10000)
y_obj = objective_func(x_obj)
#start by sampling a parameter search history
x_obs = np.linspace(-5,5,10)
y_obs = measure(x_obs)
nr_iterations = 100
for i in range(nr_iterations):
    #sort observations according to loss
    sort_idx = y_obs.argsort()
    x_obs,y_obs = x_obs[sort_idx],y_obs[sort_idx]
    #split sorted observations in two groups (l and g)
    l,g,y_star = split_meassures(x_obs,y_obs)
    #approximate distributions for both groups using kernel density estimation
    kde_l = gaussian_kde(l['x']).evaluate(x_obj)
    kde_g = gaussian_kde(g['x']).evaluate(x_obj)
    #define our evaluation measure for sampling a new point
    eval_measure = kde_g/kde_l
    if i%10==0:
        plt.figure()
        plt.subplot(2,2,1)
        plt.plot(x_obj,y_obj,label='Objective')
        plt.plot(x_obs,y_obs,'*',label='Observations')
        plt.plot([-5,5],[y_star,y_star],'k')
        plt.subplot(2,2,2)
        plt.plot(x_obj,kde_l)
        plt.subplot(2,2,3)
        plt.plot(x_obj,kde_g)
        plt.subplot(2,2,4)
        plt.semilogy(x_obj,eval_measure)
        plt.draw()
    #find point to evaluate and add the new observation
    best_search = x_obj[np.argmin(eval_measure)]
    x_obs = np.append(x_obs,[best_search])
    y_obs = np.append(y_obs,[measure(np.asarray([best_search]))])
plt.show()
I suspect this happens because we keep sampling where we are most certain, thus making l(x) more and more narrow around this point, which doesn't change where we sample at all. So where is my understanding lacking?

I am still learning about TPE as well, but here are the two problems I see in this code:
1. This code will only evaluate a few unique points. The next location is always the single best point recommended by the kernel density estimates, so the code has no way to explore the search space the way an acquisition function does.
2. Because the code simply appends new observations to x_obs and y_obs, it adds a lot of duplicates. The duplicates skew the set of observations, which leads to a very odd split that you can easily see in the later plots: eval_measure starts out resembling the objective function but diverges later on.
If you remove the duplicates in x_obs and y_obs, problem 2 goes away. The first problem, however, can only be fixed by adding some way of exploring the search space.
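One way to get that exploration, as far as I understand the original TPE paper, is to draw a handful of random candidates from l(x) and keep the one with the largest l(x)/g(x), instead of taking an argmin over a fixed grid. The sketch below does that with a hand-rolled fixed-bandwidth Gaussian KDE. The function names, the bandwidth, gamma and the candidate count are all illustrative assumptions, not hyperopt's actual implementation or defaults.

```python
import math
import random

def kde_pdf(x, points, bandwidth):
    """Fixed-bandwidth Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

def tpe_minimise(objective, n_init=10, n_iter=50, gamma=0.25,
                 n_candidates=24, bandwidth=0.5, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        # split the history into a good group (lowest losses) and a bad group
        order = sorted(range(len(xs)), key=lambda i: ys[i])
        n_good = max(1, int(gamma * len(xs)))
        good = [xs[i] for i in order[:n_good]]
        bad = [xs[i] for i in order[n_good:]]
        # exploration step: sample candidates from l(x) (a random good point plus noise)
        candidates = [rng.gauss(rng.choice(good), bandwidth)
                      for _ in range(n_candidates)]
        # keep the candidate with the largest l(x)/g(x); 1e-12 guards against g(x) == 0
        best = max(candidates,
                   key=lambda c: kde_pdf(c, good, bandwidth)
                   / (kde_pdf(c, bad, bandwidth) + 1e-12))
        xs.append(best)
        ys.append(objective(best))
    return xs, ys

xs, ys = tpe_minimise(lambda x: x ** 2)
print(min(ys))
```

Because every proposal is a fresh random draw, the history does not fill up with copies of one grid point, which also sidesteps the duplicate problem.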

Including measurement uncertainties in pyMC3

I'm trying to fit some observations, which have measurement errors, to some other data with no measurement error. How do I take into account the measurement error in pyMC3? I have the following approach, which seems to give me reasonable results, but is it the right way to go about it?
n_samples = 20000
with pymc3.Model() as predictive_model:
    intercept = pymc3.Normal('Intercept',mu=1.0,sd=0.2)
    exponent = pymc3.Normal('A',mu=4.2,sd=0.15)
    likelihood = pymc3.Normal('Observed',
                              mu=intercept*x_values**exponent,
                              observed=observed_values,
                              sd=observed_errors)
    start = pymc3.find_MAP()
    step = pymc3.NUTS(scaling=start)
    trace_predictive = pymc3.sample(n_samples, step, start=start,njobs=4)
where x_values, observed_values and observed_errors are 1D numpy arrays of the same length.

It looks like you have a model C x^A, but you believe the data you collected looked like C x^A + eps. It also looks like you know the measurement error exactly, somehow (this surprises me!)
If your goal is to infer something about the intercept C, exponent A, and measurement noise eps, I would write the model like this:
with pymc3.Model() as predictive_model:
    intercept = pymc3.Normal('Intercept',mu=1.0,sd=0.2)
    exponent = pymc3.Normal('A',mu=4.2,sd=0.15)
    eps = pymc3.HalfNormal('eps', 10.)
    likelihood = pymc3.Normal('Observed',
                              mu=intercept*x_values**exponent,
                              sd=eps,
                              observed=observed_values - observed_errors)
    trace_predictive = pymc3.sample(n_samples, njobs=4)
(note that there are better ways to initialize than the MAP now, and they get chosen automatically!)
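To make concrete what passing a per-point sd=observed_errors does in the question's model, here is a plain-Python sketch of the Gaussian log-likelihood that such a Normal node evaluates. The function name gaussian_loglik and the toy numbers are illustrative assumptions; the point is only that each residual is scaled by its own error, so noisier points pull on the fit less.

```python
import math

def gaussian_loglik(intercept, exponent, x_values, observed_values, observed_errors):
    """Sum of per-point normal log densities with a known sd for each point."""
    total = 0.0
    for x, y, err in zip(x_values, observed_values, observed_errors):
        mu = intercept * x ** exponent
        total += -0.5 * ((y - mu) / err) ** 2 - math.log(err * math.sqrt(2.0 * math.pi))
    return total

# toy data generated exactly from intercept=1.0, exponent=4.2
x_values = [1.0, 1.5, 2.0, 2.5]
observed_errors = [0.5, 0.5, 2.0, 2.0]
observed_values = [1.0 * x ** 4.2 for x in x_values]

ll_true = gaussian_loglik(1.0, 4.2, x_values, observed_values, observed_errors)
ll_off = gaussian_loglik(1.1, 4.2, x_values, observed_values, observed_errors)
print(ll_true, ll_off)
```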

Continuous Piecewise-Linear Fit in Python

I have a number of short time-series ( maybe 30 - 100 time points ), and they have a general shape : they start high, come down quickly, may or may not plateau near zero, and then go back up. If they don't plateau, they look something like a simple quadratic, and if they do plateau, you may have a long series of zeros.
I'm trying to use the lmfit module to fit a piecewise linear curve that is continuous. I'd like to infer where the line changes gradient, that is, where the curve "qualitatively" changes gradient: when the gradient stops going down and when it starts increasing again, in general terms. I'm having a few issues with it:
lmfit seems to require at least two parameters, so I'm having to pass _.
I'm unsure how to constrain one parameter to be greater than another.
I'm getting "could not broadcast input array from shape (something) into shape (something)" errors.
Here's some code. First, my objective function, to be minimised.
def piecewiselinear(params, data, _) :
    t1 = params["t1"].value
    t2 = params["t2"].value
    m1 = params["m1"].value
    m2 = params["m2"].value
    m3 = params["m3"].value
    c = params["c"].value
    # Construct continuous, piecewise-linear fit
    model = np.zeros_like(data)
    model[:t1] = c + m1 * np.arange(t1)
    model[t1:t2] = model[t1-1] + m2 * np.arange(t2 - t1)
    model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)
    return model - data
I then call,
p = lmfit.Parameters()
p.add("t1", value = len(data)/4, min = 1, max = len(data))
p.add("t2", value = len(data)/4*3, min = 2, max = len(data))
p.add("m1", value = -100., max=0)
p.add("m2", value = 0.)
p.add("m3", value = 20., min = 1.)
p.add("c", min=0, value = 800.)
result = lmfit.minimize(piecewiselinear, p, args = (data, _) )
The model is that, at some time t1, the gradient of the line changes, and the same happens at t2. Both of these parameters, as well as the gradients of the line segments ( and one intercept ), need to be inferred.
I could do this using MCMC methods, but I have too many of these series, and it would take too long.
Part of the traceback :
15 model = np.zeros_like(data)
16 model[:t1] = c + m1 * np.arange(t1)
---> 17 model[t1:t2] = model[t1-1] + m2 * np.arange(t2-t1)
18 model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)
19
ValueError: could not broadcast input array from shape (151) into shape (28)
A couple of examples of the time-series :
Any and all suggestions welcome. Thank you very much.

Here's a plot from a rather brute-force 3-pwlin fitter;
will trade rough code for test cases.
Also, a couple of links:
Fit-piecewise-linear-data
on dsp.stack might give you some ideas; added a bit on
Dynamic programming.
github.com/NickFoubert/simple-segment
has python for segmenting e.g. ECGs with max_error (not number of pieces),
from a nice paper by Keogh et al.,
An online algorithm for segmenting time series, 2001, 8p.
And a possible alternative: could you just fit the power p in y ~ x^p, log y ~ p log x^2
(after shifting x to [-1 .. 1] and y > 1e-6 or so) ?
This would be robust, fast, and easy to plot and understand.
One should probably weight the ends
so that the errors are roughly flat and normal.
Also one could fit separate p p' to the left and right halves.

Going down the brute force route seems to do the trick. I'm just testing all combinations of switchpoints and picking the best fit. It's very quick and can be reasonably robust. Here's the result of one particular fit.
I'm forcing the gradient of the second line to be zero. This ensures that we don't get an OK fit for two lines and a perfect fit for one, which may grab a higher score ( I'm using the sum of R^2 values here ). In green are marked the switchpoints, and these should work very well for my application.
I'd love to learn a more elegant way to do this, but in the meantime, this is an option...
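For reference, a rough sketch of the brute-force idea in plain Python: try every pair of switchpoints, fit a free line to the left and right pieces, force the middle piece to be flat, and keep the pair with the smallest total squared error. Note the assumptions: this scores by summed squared error rather than the sum of R^2 values mentioned above, it does not force the pieces to join continuously, and the names line_sse, flat_sse and best_switchpoints are made up for the example.

```python
def line_sse(xs, ys):
    """Sum of squared errors of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    inter = y_mean - slope * x_mean
    return sum((y - (slope * x + inter)) ** 2 for x, y in zip(xs, ys))

def flat_sse(ys):
    """Sum of squared errors of a zero-gradient segment (best constant is the mean)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_switchpoints(data, min_len=2):
    """Test all (t1, t2) pairs and return the pair with the lowest total error."""
    n = len(data)
    t = list(range(n))
    best = None
    for t1 in range(min_len, n - 2 * min_len + 1):
        for t2 in range(t1 + min_len, n - min_len + 1):
            sse = (line_sse(t[:t1], data[:t1])
                   + flat_sse(data[t1:t2])
                   + line_sse(t[t2:], data[t2:]))
            if best is None or sse < best[0]:
                best = (sse, t1, t2)
    return best[1], best[2]

# synthetic series: steep descent, plateau at zero, then a rise
data = ([120.0 - 10.0 * i for i in range(10)]
        + [0.0] * 10
        + [10.0 + 5.0 * i for i in range(10)])
print(best_switchpoints(data))
```

With 30 to 100 time points there are at most a few thousand pairs to test, so the double loop stays cheap.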

Numerical Accuracy with scipy.optimize.curve_fit in Python

I am having issues with the numerical accuracy of the scipy.optimize.curve_fit function in Python. It seems to me that I can only get ~ 8 digits of accuracy when I desire ~ 15 digits. I have some data (at this point artificially created) generated as follows (data1 in the code below):
a_data(r) = G*M_Sun/r^2 + G*M_Jupiter*r/(r^2 + r_Jup^2)^(3/2) + A
where term 1 ~ 10^-3, term 2 ~ 10^-6, and term 3 is ~ 10^-11. In the data, I vary A randomly (it is a Gaussian error). I then try to fit this to a model (model1 in the code below):
a_model(r) = (G*M_Sun/r^2)*(1 + alpha*exp(-r/lambda)) + G*M_Jupiter*r/(r^2 + r_Jup^2)^(3/2)
where lambda is a constant, and I only fit alpha (it is a parameter in the function). Now what I would expect is to see a linear relationship between alpha and A because terms 1 and 2 in the data creation are also in the model, so they should cancel perfectly;
So;
However, what happens is for small A (~10^-11 and below), alpha does not scale with A, that is to say, as A gets smaller and smaller, alpha levels out and remains constant.
For reference, I call the following:
op, pcov = scipy.optimize.curve_fit(model, xdata, ydata, p0=None, sigma=sig)
My first thought was that I was not using double precision, but I am pretty sure that python automatically creates numbers in double precision. Then I thought it was an issue with the documentation perhaps that cuts off the digits? Anyways, I could put my code in here but it is sort of complicated. Is there a way to ensure that the curve fitting function saves my digits?
Thank you so much for your help!
EDIT: The below is my code:
# Import proper packages
import numpy as np
import numpy.random as npr
import scipy as sp
import scipy.constants as spc
import scipy.optimize as spo
from matplotlib import pyplot as plt
from numpy import ndarray as nda
from decimal import *
# Declare global variables
AU = 149597871000.0
test_lambda = 20*AU
M_Sun = (1.98855*(sp.power(10.0,30.0)))
M_Jupiter = (M_Sun/1047.3486)
test_jupiter_mass = M_Jupiter
test_sun_mass = M_Sun
rad_jup = 5.2*AU
ran = np.linspace(AU, 100*AU, num=100)
delta_a = np.power(10.0, -11.0)
chi_limit = 118.498
# Model acceleration of the spacecraft from the sun (with Yukawa term)
def model1(distance, A):
    return (spc.G)*(M_Sun/(distance**2.0))*(1 +A*(np.exp(-distance/test_lambda))) + (spc.G)*(M_Jupiter*distance)/((distance**2.0 + rad_jup**2.0)**(3.0/2.0))
# Function that creates a data point for test 1
def data1(distance, dela):
    return (spc.G)*(M_Sun/(distance**2.0) + (M_Jupiter*distance)/((distance**2.0 + rad_jup**2.0)**(3.0/2.0))) + dela
# Generates a list of 100 data sets varying by ~delta_a for test 1
def generate_data1():
    data_list = []
    for i in range(100):
        acc_lst = []
        for dist in ran:
            x = data1(dist, npr.normal(0, delta_a))
            acc_lst.append(x)
        data_list.append(acc_lst)
    return data_list
# Generates a list of standard deviations at each distance from the sun. Since delta_a is constant, the standard deviation of each point is constant
def generate_sig():
    sig = []
    for i in range(100):
        sig.append(delta_a)
    return sig
# Finds alpha for test 1, since we vary delta_a in test 1, we need to generate new data for each time we find alpha
def find_alpha1(data_list, sig):
    alphas = []
    for data in data_list:
        op, pcov = spo.curve_fit(model1, ran, data, p0=None, sigma=sig)
        alphas.append(op[0])
    return alphas
# Tests the dependence of alpha on delta_a and plots the dependence
def test1():
    global delta_a
    global test_lambda
    test_lambda = 20*AU
    delta_a = 10.0**-20.0
    alphas = []
    delta_as = []
    for i in range(20):
        print(i)
        data_list = generate_data1()
        print(np.array(data_list[0]))
        sig = generate_sig()
        alpha = find_alpha1(data_list, sig)
        delas = []
        for alp in alpha:
            if alp < 0:
                x = 0
                plt.loglog(delta_a, abs(alp), '.' 'r')
            else:
                x = 0
                plt.loglog(delta_a, alp, '.' 'b')
        delta_a *= 10
    plt.xlabel('Delta A')
    plt.ylabel('Alpha (at Lambda = 5 AU)')
    plt.show()
def main():
    test1()
if __name__ == '__main__':
    main()

I believe this has to do with the minimisation algorithm used here and the maximum precision it can attain.
I remember reading about it in Numerical Recipes a few years ago; I'll see if I can dig up a reference for you.
edit:
link to Numerical Recipes here - skip down to page 394 and then read that chapter. Note the third paragraph on page 404:
"Indulge us a final reminder that tol should generally be no smaller than the square root of your machine's floating-point precision."
And Mathematica mentions that if you want accuracy you need to go for a different method, and that they don't in fact use LMA unless the problem is recognised as being a sum-of-squares problem.
Given that you're just doing a one-dimensional fit, it might be a good exercise to try implementing one of the fitting algorithms they mention in that chapter.
What are you actually trying to achieve though? From what I understand, you're essentially trying to work out the amount of random noise you've added to the curve. But then that's not really what you're doing, unless I've misunderstood...
Edit2:
So after reading how you generate the data, there's an issue with the data and the model you're applying.
You're essentially fitting the two sides of this:
You're essentially trying to fit the height of a gaussian to random numbers. You're not fitting the gaussian to the frequency of those numbers.
Looking at your code, and judging from what you've said, this isn't your end goal, and you just want to get used to the optimise method?
It would make more sense if you randomly adjusted the distance from the sun, then fit to the data and see if you can minimise to find the distance which generated the data set?
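The numbers behind that quote are easy to check. The snippet below prints the double-precision machine epsilon and its square root, which comes out near 1.5e-8 and lines up with the "~ 8 digits" reported in the question. It also shows that adding something around 1e-17 to 1.0 changes nothing, which may be part of why the `(1 + A*exp(...))` factor in model1 stops responding once that term gets small enough. Treat this as an illustration of floating-point limits, not a full diagnosis of the fit.

```python
import math
import sys

# machine epsilon for Python floats (IEEE double precision)
eps = sys.float_info.epsilon
sqrt_eps = math.sqrt(eps)
print(eps, sqrt_eps)

# a term far below eps is absorbed completely when added to 1.0
print(1.0 + 1e-17 == 1.0)

# a term a little above eps still registers
print(1.0 + 1e-15 == 1.0)
```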

Why does scipy.optimize.curve_fit produce parameters which are barely different from the guess?

I've been trying to fit some histogram data with scipy.optimize.curve_fit, but so far I haven't once been able to produce fit parameters that differ significantly from my guess parameters.
I wouldn't be terribly surprised to find that the more arcane parameters in my fit get stuck in local minima, but even linear coefficients won't move from my initial guesses!
If you've seen anything like this before, I'd love some advice. Do least-squared minimization routines just not work for certain classes of functions?
I try this,
import numpy as np
from matplotlib.pyplot import *
from scipy.optimize import curve_fit
def grating_hist(x,frac,xmax,x0):
    # model data to be turned into a histogram
    dx = x[1]-x[0]
    z = np.linspace(0,1,20000,endpoint=True)
    grating = np.cos(frac*np.pi*z)
    norm_grating = xmax*(grating-grating[-1])/(1-grating[-1])+x0
    # produce the histogram
    bin_edges = np.append(x,x[-1]+x[1]-x[0])
    hist,bin_edges = np.histogram(norm_grating,bins=bin_edges)
    return hist
x = np.linspace(0,5,512)
p_data = [0.7,1.1,0.8]
pct = grating_hist(x,*p_data)
p_guess = [1,1,1]
p_fit,pcov = curve_fit(grating_hist,x,pct,p0=p_guess)
plot(x,pct,label='Data')
plot(x,grating_hist(x,*p_fit),label='Fit')
legend()
show()
print('Data Parameters:', p_data)
print('Guess Parameters:', p_guess)
print('Fit Parameters:', p_fit)
print('Covariance:', pcov)
and I see this: http://i.stack.imgur.com/GwXzJ.png (I'm new here, so I can't post images)
Data Parameters: [0.7, 1.1, 0.8]
Guess Parameters: [1, 1, 1]
Fit Parameters: [ 0.97600854 0.99458336 1.00366634]
Covariance: [[ 3.50047574e-06 -5.34574971e-07 2.99306123e-07]
[ -5.34574971e-07 9.78688795e-07 -6.94780671e-07]
[ 2.99306123e-07 -6.94780671e-07 7.17068753e-07]]
Whaaa? I'm pretty sure this isn't a local minimum for variations in xmax and x0, and it's a long way from the global minimum best fit. The fit parameters still don't change, even with better guesses. Different choices for curve functions (e.g. the sum of two normal distributions) do produce new parameters for the same data, so I know it's not the data itself. I also tried the same thing with scipy.optimize.leastsq itself just in case, but no dice; the parameters still don't move. If you have any thoughts on this, I'd love to hear them!

The problem you're facing is actually not due to curve_fit (or leastsq). It is due to the landscape of the objective of your optimisation problem. In your case the objective is the sum of residuals' squares, which you are trying to minimise. Now, if you look closely at your objective in a close surrounding of your initial conditions, for example using the code below, which only focuses on the first parameter:
import matplotlib.pyplot as py  # assumed alias; the plotting calls below use py.plot and py.show
p_ind = 0
eps = 1e-6
n_points = 100
frac_surroundings = np.linspace(p_guess[p_ind] - eps, p_guess[p_ind] + eps, n_points)
obj = []
temp_guess = p_guess.copy()
for p in frac_surroundings:
    temp_guess[0] = p
    obj.append(((grating_hist(x, *p_data) - grating_hist(x, *temp_guess))**2.0).sum())
py.plot(frac_surroundings, obj)
py.show()
you will notice that the landscape is piecewise constant (you can easily check that the situation is the same for the other parameters). The problem is that these pieces are of the order of 10^-6 wide, whereas the initial step of the fitting procedure is somewhere around 10^-8, so the procedure ends quickly, concluding that it cannot improve on the given initial condition. You could try to fix it by changing the epsfcn parameter in curve_fit, but you would quickly notice that the landscape, on top of being piecewise constant, is also very "rugged". In other words, curve_fit is simply not well suited to such a problem, which is difficult for gradient-based methods because it is highly non-convex. Some stochastic optimisation method could probably do a better job. That is, however, a different question/problem.
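Here is a toy stand-in for that effect, deliberately much simpler than grating_hist: the objective counts how many fixed samples fall below a threshold, so it only changes when the threshold crosses a sample. A finite-difference step of about 1.5e-8 sees a gradient of exactly zero, while a coarser step does see the objective move. The sample grid, the guess and both step sizes are illustrative assumptions.

```python
def count_below(threshold, samples):
    """Histogram-like statistic: number of samples under the threshold."""
    return sum(1 for s in samples if s < threshold)

def objective(threshold, samples, target_count):
    """Squared mismatch between the count at this threshold and a target count."""
    return (count_below(threshold, samples) - target_count) ** 2

samples = [i / 100.0 for i in range(100)]  # 0.00, 0.01, ..., 0.99
target = count_below(0.70, samples)        # the "true" threshold is 0.70
guess = 0.505

# tiny step, similar in size to a default finite-difference step
tiny_step = 1.5e-8
tiny_grad = (objective(guess + tiny_step, samples, target)
             - objective(guess, samples, target)) / tiny_step

# coarse step that is wide enough to cross a sample
coarse_step = 0.01
coarse_change = (objective(guess + coarse_step, samples, target)
                 - objective(guess, samples, target))

print(tiny_grad, coarse_change)
```

An optimiser that trusts the tiny-step gradient has no reason to move away from the guess, which matches the behaviour described in the question.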

I think it is a local minimum, or the algorithm fails for a non-trivial reason. It is far easier to fit the data to the input, instead of fitting the statistical description of the data to the statistical description of the input.
Here's a modified version of the code doing so:
z = np.linspace(0,1,20000,endpoint=True)
def grating_hist_indicator(x,frac,xmax,x0):
    # model data to be turned into a histogram
    dx = x[1]-x[0]
    grating = np.cos(frac*np.pi*z)
    norm_grating = xmax*(grating-grating[-1])/(1-grating[-1])+x0
    return norm_grating
x = np.linspace(0,5,512)
p_data = [0.7,1.1,0.8]
pct = grating_hist(x,*p_data)
pct_indicator = grating_hist_indicator(x,*p_data)
p_guess = [1,1,1]
p_fit,pcov = curve_fit(grating_hist_indicator,x,pct_indicator,p0=p_guess)
plot(x,pct,label='Data')
plot(x,grating_hist(x,*p_fit),label='Fit')
legend()
show()
