I have a number of short time-series (maybe 30-100 time points), and they have a general shape: they start high, come down quickly, may or may not plateau near zero, and then go back up. If they don't plateau, they look something like a simple quadratic, and if they do plateau, there may be a long series of zeros.
I'm trying to use the lmfit module to fit a piecewise linear curve that is continuous. I'd like to infer where the line changes gradient, that is, where the curve "qualitatively" changes: when the gradient stops decreasing and when it starts increasing again, in general terms. I'm having a few issues with it:
lmfit seems to require at least two arguments to the objective function, so I'm having to pass a dummy _.
I'm unsure how to constrain one parameter to be greater than another.
I'm getting could not broadcast input array from shape (something) into shape (something) errors.
Here's some code. First, my objective function, to be minimised.
def piecewiselinear(params, data, _):
    t1 = params["t1"].value
    t2 = params["t2"].value
    m1 = params["m1"].value
    m2 = params["m2"].value
    m3 = params["m3"].value
    c = params["c"].value
    # Construct continuous, piecewise-linear fit
    model = np.zeros_like(data)
    model[:t1] = c + m1 * np.arange(t1)
    model[t1:t2] = model[t1-1] + m2 * np.arange(t2 - t1)
    model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)
    return model - data
I then call,
p = lmfit.Parameters()
p.add("t1", value=len(data)/4, min=1, max=len(data))
p.add("t2", value=len(data)/4*3, min=2, max=len(data))
p.add("m1", value=-100., max=0)
p.add("m2", value=0.)
p.add("m3", value=20., min=1.)
p.add("c", min=0, value=800.)
result = lmfit.minimize(piecewiselinear, p, args=(data, _))
The model is that, at some time t1, the gradient of the line changes, and the same happens again at t2. Both of these switchpoints, as well as the gradients of the line segments (and one intercept), need to be inferred.
I could do this using MCMC methods, but I have too many of these series, and it would take too long.
Part of the traceback:
15 model = np.zeros_like(data)
16 model[:t1] = c + m1 * np.arange(t1)
---> 17 model[t1:t2] = model[t1-1] + m2 * np.arange(t2-t1)
18 model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)
19
ValueError: could not broadcast input array from shape (151) into shape (28)
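For what it's worth, both the ordering constraint and the broadcast error can be addressed with lmfit's expression-based constraints plus integer casting. Below is a minimal sketch of that idea (untested against the original data, and assuming data is the series being fitted): t2 is defined as t1 plus a strictly positive delta, so t2 > t1 is guaranteed; the switchpoints are rounded to ints before slicing; and a one-element tuple args=(data,) removes the need for a dummy argument.

import numpy as np
import lmfit

def piecewiselinear(params, data):
    # Round the switchpoints to integers before using them as indices
    t1 = int(round(params["t1"].value))
    t2 = int(round(params["t2"].value))
    m1, m2, m3 = (params[name].value for name in ("m1", "m2", "m3"))
    c = params["c"].value
    model = np.zeros_like(data, dtype=float)
    model[:t1] = c + m1 * np.arange(t1)
    model[t1:t2] = model[t1 - 1] + m2 * np.arange(1, t2 - t1 + 1)
    model[t2:] = model[t2 - 1] + m3 * np.arange(1, len(data) - t2 + 1)
    return model - data

p = lmfit.Parameters()
p.add("t1", value=len(data) / 4, min=1, max=len(data) - 2)
p.add("delta", value=len(data) / 2, min=1)  # gap between the switchpoints
p.add("t2", expr="t1 + delta")              # enforces t2 > t1
p.add("m1", value=-100., max=0)
p.add("m2", value=0.)
p.add("m3", value=20., min=1.)
p.add("c", value=800., min=0)
result = lmfit.minimize(piecewiselinear, p, args=(data,))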
A couple of examples of the time-series:
Any and all suggestions welcome. Thank you very much.
Here's a plot from a rather brute-force 3-pwlin fitter;
will trade rough code for test cases.
Also, a couple of links: Fit-piecewise-linear-data on dsp.stack might give you some ideas (I added a bit there on dynamic programming). github.com/NickFoubert/simple-segment has Python code for segmenting e.g. ECGs with a max_error criterion (rather than a fixed number of pieces), from a nice paper by Keogh et al., "An online algorithm for segmenting time series", 2001, 8 pages.
And a possible alternative: could you just fit the power p in y ~ x^p, i.e. log y ~ p log x (after shifting x to [-1, 1] and clipping y below at 1e-6 or so)?
This would be robust, fast, and easy to plot and understand.
One should probably weight the ends so that the errors are roughly flat and normal.
Also, one could fit separate powers p and p' to the left and right halves.
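As a rough sketch of that power-law idea (the helper name and cutoffs below are mine, not from the answer): shift x to [-1, 1], clip y away from zero, and fit p with a single polyfit on log-log axes.

import numpy as np

def fit_power(x, y, eps=1e-6):
    # Fit p in y ~ |x|^p by linear regression on log-log axes
    x = np.asarray(x, dtype=float)
    y = np.clip(np.asarray(y, dtype=float), eps, None)  # keep log(y) finite
    xs = 2 * (x - x.min()) / (x.max() - x.min()) - 1    # shift x to [-1, 1]
    mask = np.abs(xs) > 1e-3                            # avoid log(0) at the centre
    p, logc = np.polyfit(np.log(np.abs(xs[mask])), np.log(y[mask]), 1)
    return p, np.exp(logc)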
Going down the brute force route seems to do the trick. I'm just testing all combinations of switchpoints and picking the best fit. It's very quick and can be reasonably robust. Here's the result of one particular fit.
I'm forcing the gradient of the second line to be zero. This ensures that we don't get an OK fit for two lines and a perfect fit for one, which may grab a higher score (I'm using the sum of R^2 values here). In green are marked the switchpoints, and these should work very well for my application.
I'd love to learn a more elegant way to do this, but in the meantime, this is an option...
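For concreteness, here is a minimal sketch of that brute-force search (my names, not the original code; it pins the middle gradient to zero as described, scores by total squared error rather than summed R^2, and does not enforce continuity at the joins):

import numpy as np

def fit_three_segments(y):
    n = len(y)
    t = np.arange(n)
    best_sse, best_pair = np.inf, None
    for t1 in range(2, n - 4):
        for t2 in range(t1 + 2, n - 2):
            m1, c1 = np.polyfit(t[:t1], y[:t1], 1)   # falling segment
            level = y[t1:t2].mean()                  # flat segment (gradient 0)
            m3, c3 = np.polyfit(t[t2:], y[t2:], 1)   # rising segment
            pred = np.concatenate([m1 * t[:t1] + c1,
                                   np.full(t2 - t1, level),
                                   m3 * t[t2:] + c3])
            sse = np.sum((y - pred) ** 2)
            if sse < best_sse:
                best_sse, best_pair = sse, (t1, t2)
    return best_pair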
I'm facing a problem while trying to implement the coupled differential equation below (also known as single-mode coupling equation) in Python 3.8.3. As for the solver, I am using Scipy's function scipy.integrate.solve_bvp, whose documentation can be read here. I want to solve the equations in the complex domain, for different values of the propagation axis (z) and different values of beta (beta_analysis).
The problem is that it is extremely slow (not manageable) compared with an equivalent implementation in Matlab using the functions bvp4c, bvpinit and bvpset. Evaluating the first few iterations of both executions, they return the same result, except for the resulting mesh, which is much larger in the case of Scipy. The mesh sometimes even saturates to the maximum value.
The equation to be solved is the system da/dz = M(z) a with M(z) = [[-j*beta(z), ka(z)], [ka(z), j*beta(z)]] (as reconstructed from the code below), along with the boundary conditions function.
import h5py
import numpy as np
from scipy import integrate

def coupling_equation(z_mesh, a):
    ka_z = k  # Global
    z_a = z  # Global
    a_p = np.empty_like(a).astype(complex)
    for idx, z_i in enumerate(z_mesh):
        beta_zf_i = np.interp(z_i, z_a, beta_zf)  # Get beta at the desired point of the mesh
        ka_z_i = np.interp(z_i, z_a, ka_z)  # Get ka at the desired point of the mesh
        coupling_matrix = np.empty((2, 2), complex)
        coupling_matrix[0] = [-1j * beta_zf_i, ka_z_i]
        coupling_matrix[1] = [ka_z_i, 1j * beta_zf_i]
        a_p[:, idx] = np.matmul(coupling_matrix, a[:, idx])  # Multiply by the coupling matrix
    return a_p

def boundary_conditions(a_a, a_b):
    return np.hstack(((a_a[0] - 1), a_b[1]))
Moreover, I couldn't find a way to pass k, z and beta_zf as arguments of the function coupling_equation, given that the fun argument of the solve_bvp function must be a callable with the signature (x, y). My approach is to define some global variables, but I would appreciate any help on this too if there is a better solution.
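One sketch of an alternative to the globals (untested, and coupling_equation_args is my name): give the function explicit extra arguments and bind them with functools.partial, since solve_bvp only needs a callable of the form fun(x, y). The bound function must be rebuilt inside the loop whenever beta_zf changes.

from functools import partial
import numpy as np
from scipy import integrate

def coupling_equation_args(z_mesh, a, z_a, ka_z, beta_zf):
    # Same computation as coupling_equation, with the former globals
    # passed in explicitly
    a_p = np.empty_like(a).astype(complex)
    for idx, z_i in enumerate(z_mesh):
        beta_zf_i = np.interp(z_i, z_a, beta_zf)
        ka_z_i = np.interp(z_i, z_a, ka_z)
        coupling_matrix = np.array([[-1j * beta_zf_i, ka_z_i],
                                    [ka_z_i, 1j * beta_zf_i]])
        a_p[:, idx] = coupling_matrix @ a[:, idx]
    return a_p

# Bind the extra arrays; rebuild `fun` whenever beta_zf changes
fun = partial(coupling_equation_args, z_a=z, ka_z=k, beta_zf=beta_zf)
a = integrate.solve_bvp(fun=fun, bc=boundary_conditions, x=mesh, y=a_init)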
The analysis function which I am trying to code is:
def analysis(k, z, beta_analysis, max_mesh):
    s11_analysis = np.empty_like(beta_analysis, dtype=complex)
    s21_analysis = np.empty_like(beta_analysis, dtype=complex)
    initial_mesh = np.linspace(z[0], z[-1], 10)  # Initial mesh of 10 samples along L
    mesh = initial_mesh
    # a_init must be complex in order to solve the problem in a complex domain
    a_init = np.vstack((np.ones(np.size(initial_mesh)).astype(complex),
                        np.zeros(np.size(initial_mesh)).astype(complex)))
    for idx, beta in enumerate(beta_analysis):
        print(f"Iteration {idx}: beta_analysis = {beta}")
        global beta_zf
        beta_zf = beta * np.ones(len(z))  # Global variable so as to use it in coupling_equation(x, y)
        a = integrate.solve_bvp(fun=coupling_equation,
                                bc=boundary_conditions,
                                x=mesh,
                                y=a_init,
                                max_nodes=max_mesh,
                                verbose=1)
        # mesh = a.x  # Mesh for the next iteration
        # a_init = a.y  # Initial guess for the next iteration, corresponding to the current solution
        s11_analysis[idx] = a.y[1][0]
        s21_analysis[idx] = a.y[0][-1]
    return s11_analysis, s21_analysis
I suspect that the problem has something to do with the initial guess that is being passed to the different iterations (see the commented lines inside the loop in the analysis function). I tried setting the solution of one iteration as the initial guess for the following one (which should reduce the time needed by the solver), but it is even slower, which I don't understand. Maybe I missed something, because this is my first time trying to solve differential equations.
The parameters used for the execution are the following:
f2 = h5py.File(r'path/to/file', 'r')
k = np.array(f2['k']).squeeze()
z = np.array(f2['z']).squeeze()
f2.close()

analysis_points = 501
max_mesh = 1e6

beta_0 = 3e2
beta_low = 0  # Lower value of the frequency for the analysis
beta_up = beta_0  # Upper value of the frequency for the analysis
beta_analysis = np.linspace(beta_low, beta_up, analysis_points)
s11_analysis, s21_analysis = analysis(k, z, beta_analysis, max_mesh)
Any ideas on how to improve the performance of these functions? Thank you all in advance, and sorry if the question is not well-formulated, I accept any suggestions about this.
Edit: Added some information about performance and sizing of the problem.
In practice, I can't find a relation that determines the number of times coupling_equation is called; it must be a matter of the internal operation of the solver. I checked the number of calls in one iteration by printing a line, and it happened on 133 occasions (this was one of the fastest). This must be multiplied by the number of iterations over beta. For the analyzed one, the solver returned this:
Solved in 11 iterations, number of nodes 529.
Maximum relative residual: 9.99e-04
Maximum boundary residual: 0.00e+00
The shapes of a and z_mesh are correlated, since z_mesh is a vector whose length corresponds with the size of the mesh, recalculated by the solver each time it calls coupling_equation. Given that a contains the amplitudes of the progressive and regressive waves at each point of z_mesh, the shape of a is (2, len(z_mesh)).
In terms of computation times, I only managed to achieve 19 iterations in about 2 hours with Python. In this case, the initial iterations were faster, but they start to take more time as their mesh grows, until the point that the mesh saturates to the maximum allowed value. I think this is because of the value of the input coupling coefficients at that point, because it also happens when no loop over beta_analysis is executed (just the solve_bvp function for the intermediate value of beta). Instead, Matlab managed to return a solution for the entire problem in just 6 minutes, approximately. If I pass the result of the last iteration as initial_guess (commented lines in the analysis function), the mesh overflows even faster and it is impossible to get more than a couple of iterations.
Based on semi-random inputs, we can see that max_mesh is sometimes reached. This means that coupling_equation can be called with quite large z_mesh and a arrays. The problem is that coupling_equation contains a slow pure-Python loop iterating over each column of the arrays. You can speed the computation up a lot using NumPy vectorization. Here is an implementation:
def coupling_equation_fast(z_mesh, a):
    ka_z = k  # Global
    z_a = z  # Global
    a_p = np.empty(a.shape, dtype=np.complex128)
    beta_zf_i = np.interp(z_mesh, z_a, beta_zf)  # Get beta at the desired points of the mesh
    ka_z_i = np.interp(z_mesh, z_a, ka_z)  # Get ka at the desired points of the mesh
    # Fast manual matrix multiplication
    a_p[0] = (-1j * beta_zf_i) * a[0] + ka_z_i * a[1]
    a_p[1] = ka_z_i * a[0] + (1j * beta_zf_i) * a[1]
    return a_p
This code provides a similar output with semi-random inputs compared to the original implementation but is roughly 20 times faster on my machine.
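A quick way to reproduce that check on random inputs (the sizes below are arbitrary, and the globals k, z and beta_zf are set up just for the comparison):

rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 50)
k = rng.normal(size=50)
beta_zf = rng.normal(size=50)
z_mesh = np.linspace(0.0, 1.0, 200)
a = rng.normal(size=(2, 200)) + 1j * rng.normal(size=(2, 200))
assert np.allclose(coupling_equation(z_mesh, a), coupling_equation_fast(z_mesh, a))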
Furthermore, I do not know if max_mesh happens to be big with your inputs too and even if this is normal/intended. It may make sense to decrease the value of max_mesh in order to reduce the execution time even more.
After a full week of print statements, dimensional analysis, refactoring, and talking through the code out loud, I can say I'm completely stuck.
The gradients my cost function produces are too far from those produced by finite differences.
I have confirmed my cost function produces correct costs both with and without regularization. Here's the cost function:
import numpy as np
import pandas as pd
# sigmoid and sigmoidGradient are defined elsewhere

def nnCost(nn_params, X, y, lambda_, input_layer_size, hidden_layer_size, num_labels):
    # reshape parameter/weight vectors to suit network size
    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, (input_layer_size + 1)))
    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
                        (num_labels, (hidden_layer_size + 1)))
    if lambda_ is None:
        lambda_ = 0
    # grab number of observations
    m = X.shape[0]
    # init variables we must return
    cost = 0
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    # one-hot encode the vector y
    y_mtx = pd.get_dummies(y.ravel()).to_numpy()
    ones = np.ones((m, 1))
    X = np.hstack((ones, X))
    # layer 1
    a1 = X
    z2 = Theta1 @ a1.T
    # layer 2
    ones_l2 = np.ones((y.shape[0], 1))
    a2 = np.hstack((ones_l2, sigmoid(z2.T)))
    z3 = Theta2 @ a2.T
    # layer 3
    a3 = sigmoid(z3)
    reg_term = (lambda_ / (2 * m)) * (np.sum(np.sum(np.multiply(Theta1, Theta1)))
                                      + np.sum(np.sum(np.multiply(Theta2, Theta2)))
                                      - np.subtract((Theta1[:, 0].T @ Theta1[:, 0]),
                                                    (Theta2[:, 0].T @ Theta2[:, 0])))
    cost = (1 / m) * np.sum((-np.log(a3).T * (y_mtx) - np.log(1 - a3).T * (1 - y_mtx))) + reg_term
    # BACKPROPAGATION
    # δ3 equals the difference between a3 and the y_matrix
    d3 = a3 - y_mtx.T
    # δ2 equals the product of δ3 and Θ2 (ignoring the Θ2 bias units)
    # multiplied element-wise by the g′() of z2 (computed back in Step 2)
    d2 = Theta2[:, 1:].T @ d3 * sigmoidGradient(z2)
    # Δ1 equals the product of δ2 and a1
    Delta1 = d2 @ a1
    Delta1 /= m
    # Δ2 equals the product of δ3 and a2
    Delta2 = d3 @ a2
    Delta2 /= m
    reg_term1 = (lambda_ / m) * np.append(np.zeros((Theta1.shape[0], 1)), Theta1[:, 1:], axis=1)
    reg_term2 = (lambda_ / m) * np.append(np.zeros((Theta2.shape[0], 1)), Theta2[:, 1:], axis=1)
    Theta1_grad = Delta1 + reg_term1
    Theta2_grad = Delta2 + reg_term2
    grad = np.append(Theta1_grad.ravel(), Theta2_grad.ravel())
    return cost, grad
Here's the code to check the gradients. I have been over every line and there is nothing whatsoever that I can think of to change here. It seems to be in working order.
def checkNNGradients(lambda_):
    """
    Creates a small neural network to check the backpropagation gradients.
    Credit: Based on the MATLAB code provided by Dr. Andrew Ng, Stanford Univ.

    Input: Regularization parameter, lambda, as int or float.
    Output: Analytical gradients produced by backprop code and the numerical gradients
            (computed using computeNumericalGradient). These two gradient computations
            should result in very similar values.
    """
    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5
    # generate 'random' test data
    Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size)
    Theta2 = debugInitializeWeights(num_labels, hidden_layer_size)
    # reusing debugInitializeWeights to generate X
    X = debugInitializeWeights(m, input_layer_size - 1)
    y = np.ones(m) + np.remainder(np.arange(m), num_labels)
    # unroll parameters
    nn_params = np.append(Theta1.ravel(), Theta2.ravel())
    costFunc = lambda p: nnCost(p, X, y, lambda_, input_layer_size, hidden_layer_size, num_labels)
    cost, grad = costFunc(nn_params)
    numgrad = computeNumericalGradient(costFunc, nn_params)
    # examine the two gradient computations; the two columns should be very similar
    print('The columns below should be very similar.\n')
    # Credit: http://stackoverflow.com/a/27663954/583834
    print('{:<25}{}'.format('Numerical Gradient', 'Analytical Gradient'))
    for numerical, analytical in zip(numgrad, grad):
        print('{:<25}{}'.format(numerical, analytical))
    # If you have a correct implementation, and assuming you used EPSILON = 0.0001
    # in computeNumericalGradient, then diff below should be less than 1e-9
    diff = np.linalg.norm(numgrad - grad) / np.linalg.norm(numgrad + grad)
    print(diff)
    print("\n")
    print('If your backpropagation implementation is correct, then \n'
          'the relative difference will be small (less than 1e-9). \n'
          '\nRelative Difference: {:.10f}'.format(diff))
The check function generates its own data using a debugInitializeWeights function (so there's the reproducible example; just run that and it will call the other functions), and then calls the function that calculates the gradient using finite differences. Both are below.
def debugInitializeWeights(fan_out, fan_in):
    """
    Initializes the weights of a layer with fan_in incoming connections and
    fan_out outgoing connections using a fixed strategy.

    Input: fan_out, number of outgoing connections for a layer as int;
           fan_in, number of incoming connections for the same layer as int.
    Output: Weight matrix, W, of shape (fan_out, 1 + fan_in); the first
            column of W handles the "bias" terms.
    """
    W = np.zeros((fan_out, 1 + fan_in))
    # Initialize W using "sin"; this ensures that the values in W are of
    # similar scale, which is useful for debugging
    W = np.sin(np.arange(1, np.size(W) + 1)) / 10
    return W.reshape(fan_out, fan_in + 1)
def computeNumericalGradient(J, nn_params):
    """
    Computes the gradient using "finite differences" and provides a numerical
    estimate of the gradient (i.e., gradient of the function J around theta).
    Credit: Based on the MATLAB code provided by Dr. Andrew Ng, Stanford Univ.

    Inputs: Cost function, J, as computed by the nnCost function; parameter vector, nn_params.
    Output: Gradient vector using finite differences. Per Dr. Ng,
            'Sets numgrad(i) to (a numerical approximation of) the partial derivative of
            J with respect to the i-th input argument, evaluated at theta. (i.e., numgrad(i)
            should be the (approximately) the partial derivative of J with respect
            to theta(i).)'
    """
    numgrad = np.zeros(nn_params.shape)
    perturb = np.zeros(nn_params.shape)
    e = .0001
    for i in range(np.size(nn_params)):
        # Set perturbation (i.e., noise) vector
        perturb[i] = e
        # run cost fxn w/ noise added to and subtracted from parameters theta in nn_params
        cost1, grad1 = J(nn_params - perturb)
        cost2, grad2 = J(nn_params + perturb)
        # record the difference in cost function outputs; this is the numerical gradient
        numgrad[i] = (cost2 - cost1) / (2 * e)
        perturb[i] = 0
    return numgrad
The code is not for class. That MOOC was in MATLAB and it's over. This is for me. Other solutions exist on the web; looking at them has proved fruitless. Everyone has a different (inscrutable) approach. So, I'm in serious need of assistance or a miracle.
Edit/Update: Fortran ordering when raveling vectors influences the outcome, but I have not been able to get the gradients to move together changing that option.
One thought: I think your perturbation is a little large, being 1e-4. For double precision floating point numbers, it should be more like 1e-8, i.e., the square root of the machine precision (or are you working with single precision?!).
That being said, finite differences can be very bad approximations to true derivatives. Specifically, floating point computations in numpy are not deterministic, as you seem to have found out. The noise in evaluations can cancel out many significant digits under some circumstances. What values are you seeing and what are you expecting?
All of the following figured into the solution to my problem. For those attempting to translate MATLAB code to Python, whether from Andrew Ng's Coursera Machine Learning course or not, these are things everyone should know.
MATLAB does everything in FORTRAN order; Python does everything in C order. This affects how vectors are populated and, thus, your results. You should always be in FORTRAN order, if you want your answers to match what you did in MATLAB. See docs
Getting your vectors in FORTRAN order can be as easy as passing order='F' as an argument to .reshape(), .ravel(), or .flatten(). You may achieve the same thing with .ravel() by transposing the vector and then applying .ravel(), like so: X.T.ravel(). A demonstration follows below.
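A two-line demonstration of the difference (nothing here is from the original code):

import numpy as np

A = np.arange(6).reshape(2, 3)
print(A.ravel())            # C order: [0 1 2 3 4 5]
print(A.ravel(order='F'))   # Fortran order: [0 3 1 4 2 5]
print(A.T.ravel())          # same as Fortran order: [0 3 1 4 2 5]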
Speaking of .ravel(): the .ravel() and .flatten() functions do not do the same thing and may have different use cases. For example, .flatten() is preferred by SciPy optimization methods. So, if your equivalent of fminunc isn't working, it's likely because you forgot to .flatten() your response vector y. See this StackOverflow Q&A and the docs on .ravel(), which link to .flatten(). More docs.
If you're translating your code from a MATLAB live script into a Jupyter notebook or Google Colab, you must police your namespace. On one occasion, I found that the variable I thought was being passed was not actually the variable that was being passed. Why? Jupyter and Colab notebooks have a lot of global variables that one would never write ordinarily.
There is a better function to evaluate the differences between numerical and analytical gradients: the relative error comparison np.abs(numerical - analytical) / (numerical + analytical). Read about it in CS231n. Also, consider the accepted answer above.
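As a concrete helper (a sketch of mine, with absolute values and a small epsilon added to guard against division by zero, which the one-line formula above omits):

import numpy as np

def relative_error(numerical, analytical, eps=1e-12):
    # Elementwise relative error between two gradient estimates
    denom = np.maximum(eps, np.abs(numerical) + np.abs(analytical))
    return np.abs(numerical - analytical) / denom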
I am having trouble understanding the output of my function to implement ridge regression with multiple predictors. I am doing this from scratch in Python for the closed form of the method, w = (X^T X + lambda*I)^(-1) X^T y.
I have a training set X that is 100 rows x 10 columns and a vector y that is 100x1.
My attempt is as follows:
def ridgeRegression(xMatrix, yVector, lambdaRange):
    wList = []
    for i in range(1, lambdaRange + 1):
        lambVal = i
        # compute the inner values (X.T X + lambda I)
        xTranspose = np.transpose(xMatrix)
        xTx = xTranspose @ xMatrix
        lamb_I = lambVal * np.eye(xTx.shape[0])
        # invert inner, e.g. (inner)**(-1)
        inner_matInv = np.linalg.inv(xTx + lamb_I)
        # compute outer (X.T y)
        outer_xTy = np.dot(xTranspose, yVector)
        # multiply together
        w = inner_matInv @ outer_xTy
        wList.append(w)
    print(wList)
For testing, I am running it with the first 5 lambda values.
wList becomes 5 numpy.arrays each of length 10 (I'm assuming for the 10 coefficients).
Here is the first of those 5 arrays:
array([ 0.29686755, 1.48420319, 0.36388528, 0.70324668, -0.51604451,
2.39045735, 1.45295857, 2.21437745, 0.98222546, 0.86124358])
My question, and clarification:
Shouldn't there be 11 coefficients (1 for the y-intercept + 10 slopes)?
How do I get the mean squared error (MSE) from this computation?
What comes next if I wanted to plot this line?
I think I am just really confused as to what I'm looking at, since I'm still working on my linear algebra.
Thanks!
First, I would modify your ridge regression to look like the following:
import numpy as np

def ridgeRegression(X, y, lambdaRange):
    wList = []
    # Get normal form of `X`
    A = X.T @ X
    # Get identity matrix
    I = np.eye(A.shape[0])
    # Get right-hand side
    c = X.T @ y
    for lambVal in range(1, lambdaRange + 1):
        # Set up equations Bw = c
        lamb_I = lambVal * I
        B = A + lamb_I
        # Solve for w
        w = np.linalg.solve(B, c)
        wList.append(w)
    return wList
Notice that I replaced your inv call, which computes the matrix inverse, with an implicit solve. This is much more numerically stable, which is an important consideration especially for these types of problems.
I've also taken the A = X.T @ X computation, the identity matrix I generation, and the right-hand-side vector c = X.T @ y computation out of the loop, since these don't change within the loop and are relatively expensive to compute.
As was pointed out by @qwr, the number of columns of X will determine the number of coefficients you have. You have not described your model, so it's not clear how the underlying domain, x, is structured into X.
Traditionally, one might use polynomial regression, in which case X is the Vandermonde Matrix. In that case, the first coefficient would be associated with the y-intercept. However, based on the context of your question, you seem to be interested in multivariate linear regression. In any case, the model needs to be clearly defined. Once it is, then the returned weights may be used to further analyze your data.
Typically to make notation more compact, the matrix X contains a column of ones for an intercept, so if you have p predictors, the matrix is dimensions n by p+1. See Wikipedia article on linear regression for an example.
To compute in-sample MSE, use the definition for MSE: the average of squared residuals. To compute generalization error, you need cross-validation.
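Putting those two remarks into code (a sketch; X and y are assumed to be the 100x10 design matrix and response from the question, and note that this simple version penalizes the intercept along with the slopes):

import numpy as np

n = X.shape[0]
X1 = np.hstack([np.ones((n, 1)), X])   # n x (p+1); w[0] is the intercept
w = ridgeRegression(X1, y, 5)[0]       # weights for lambda = 1
mse = np.mean((y - X1 @ w) ** 2)       # in-sample MSE: average squared residual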
Also, you shouldn't take lambVal as an integer. It can be small (close to 0) if the aim is just to avoid numerical error when xTx is ill-conditioned.
I would advise you to use a logarithmic range instead of a linear one, starting from 0.001 and going up to 100 or more if you want to. For instance you can change your code to that:
powerMin = -3
powerMax = 3
for i in range(powerMin, powerMax):
    lambVal = 10**i
    print(lambVal)
And then you can try a smaller range or a linear range once you figure out, via cross-validation, the correct order of magnitude of lambVal for your data.
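The same logarithmic grid can also be written directly with numpy, which is handy if you later want more points per decade:

import numpy as np

lambdas = np.logspace(-3, 2, num=6)  # 0.001, 0.01, ..., 100
for lambVal in lambdas:
    print(lambVal)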
I'm currently attempting to translate a program my boss originally wrote in R/rJAGS to Python/PyMC3, partially because he wanted to see if it was something python could do, partially because I want to learn how to do this sort of thing, it seems like a good thing to know. I've gotten a linear fit model working in PyMC3, but I'm having difficulty trying to replicate the censoring bit.
The R program reads in a table, each line having three y-values for three specific x-values which are constant across the data set. Each y-value also has some error associated with it. If that were it then I have a PyMC3 model that can do that; here's the toy model I had set up for it:
import numpy as np
import pymc3 as pmc

# set random seed for reproducibility
np.random.seed(12345)

x = np.linspace(0, 10, 3)

# Make some model data
# Parameters for linear fit
slope_true = -0.2
inter_true = 0.1

# Linear function
linear = lambda x, slope, inter: slope * x + inter
f_true = linear(x=x, slope=slope_true, inter=inter_true)

# add noise to the data points
f = f_true + np.random.normal(size=len(x)) * 0.05
f_error = np.ones_like(f_true) * f.max() * np.random.uniform(0, 1, size=len(x))

with pmc.Model() as model3:
    slope = pmc.Normal('slope', mu=0, tau=0.4, testval=0.15)
    inter = pmc.Normal('inter', mu=0, tau=40, testval=0.15)
    linear = pmc.Deterministic('linear', slope * x + inter)
    y = pmc.Normal('y', mu=linear, tau=1.0 / f_error**2, observed=f)
    start = pmc.find_MAP()
    step = pmc.NUTS()
    trace = pmc.sample(1000, start=start)

# extract results
slope_fit = np.median(trace.slope)
slope_up = slope_fit - np.percentile(trace.slope, 15.9)
slope_dn = np.percentile(trace.slope, 84.1) - slope_fit
The above model was somewhat hacked together from examples I found online. It generates points on a line, adds a bit of noise and some "error", then performs a fit on the noisy points with error. After that, it grabs the median value for the slope and some errors associated with it.
But now I need to be able to account for these censored points that sometimes pop up. In this instance certain y-values may have been non-detections, so the value for that point is considered a censor limit and the point is then set to NaN, with an error still associated with the point. The R code model (saved as lin_regress_model.bug) which handles this looks like this:
model {
    for (i in 1:N) {
        isCensored[i] ~ dinterval(rv[i], censorLimitVec[i])
        rv[i] ~ dnorm(y[i], rve[i])
        y[i] <- a * x[i] + b
    }
    a ~ dnorm(0, 1e-6)
    b ~ dnorm(0, 1e-6)
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1/sqrt(tau)
}
Here's an example of data it might get fed:
N = 3  # always 3, because 3 points
isCensored = c(FALSE, FALSE, TRUE)
censorLimitVec = c(-6.65, -6.65, -6.65)  # was value of 3rd point before NA
rv = c(-3.4, -4.7, NA)  # y-values
rve = c(7e3, 7e2, 6.66)  # these are tau I think, like 1/sigma^2
x = c(0.15, 0.68, 0.94)  # x-values
So all of those get passed into the jags model, and it's able to fit this censored data, but I can't for the life of me figure out how to translate that bit into PyMC3-speak. It sounds like the dinterval function in this may be similar to Uniform in PyMC3, but I don't really know what to do with that because I can't directly translate the formula lines (the concept of the tilde itself in R is still a bit weird to me).
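I cannot vouch for a direct dinterval equivalent, but one common pattern for censoring (sketched below, untested) is to keep an ordinary Normal likelihood for the detected points and add the censored points' contribution, P(rv < censor limit), through pm.Potential and the Normal log-CDF. The variable names follow the example data above.

import numpy as np
import pymc3 as pmc

x = np.array([0.15, 0.68, 0.94])
rv = np.array([-3.4, -4.7, 0.0])                 # censored value replaced by a dummy
rve = np.array([7e3, 7e2, 6.66])                 # precisions (tau)
censorLimitVec = np.array([-6.65, -6.65, -6.65])
cens = np.array([False, False, True])

with pmc.Model() as censored_model:
    a = pmc.Normal('a', mu=0, tau=1e-6)
    b = pmc.Normal('b', mu=0, tau=1e-6)
    # Detected points: ordinary Normal likelihood
    pmc.Normal('rv', mu=a * x[~cens] + b, tau=rve[~cens], observed=rv[~cens])
    # Censored points: add log P(rv < censor limit) to the model's log-probability
    dist = pmc.Normal.dist(mu=a * x[cens] + b, tau=rve[cens])
    pmc.Potential('censored', dist.logcdf(censorLimitVec[cens]).sum())
    trace = pmc.sample(1000)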
If anyone out there can help me it would be greatly appreciated. For all I know it might not even be possible with PyMC3, or maybe it's easy and I've just missed something. Regardless, I've been banging my head against the wall for a few days now so I figure it'd be best just to ask for help at this point.
I spent some time these days on a problem. I have a set of data:
y = f(t), where y is a very small concentration (10^-7) and t is in seconds. t varies from 0 to around 12000.
The measurements follow an established model:
y = Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)
And I need to find Vs, Vi, and k. So I used curve_fit, which returns the best fitting parameters, and I plotted the curve.
And then I used a similar model:
y = (Vs * t/3600 - ((Vs - Vi) * (1 - np.exp(-k * t/3600)) / k)) * 10**7
By doing that, t is a number of hours, and y is a number between 0 and about 10. The parameters returned are of course different. But when I plot each curve, here is what I get:
http://i.imgur.com/XLa4LtL.png
The green fit is the first model, the blue one with the "normalized" model. And the red dots are the experimental values.
The fitting curves are different. I don't think that's expected, and I don't understand why. Are the calculations more accurate if the numbers are "reasonable"?
The docstring for optimize.curve_fit says,
p0 : None, scalar, or M-length sequence
    Initial guess for the parameters. If None, then the initial
    values will all be 1 (if the number of parameters for the function
    can be determined using introspection, otherwise a ValueError
    is raised).
Thus, to begin with, the initial guess for the parameters is by default 1.
Moreover, curve fitting algorithms have to sample the function for various values of the parameters. The "various values" are initially chosen with an initial step size on the order of 1. The algorithm will work better if your data varies somewhat smoothly with changes in the parameter values that are on the order of 1.
If the function varies wildly with parameter changes on the order of 1, then the algorithm may tend to miss the optimum parameter values.
Note that even if the algorithm uses an adaptive step size when it tweaks the parameter values, if the initial tweak is so far off the mark as to produce a big residual, and if tweaking in some other direction happens to produce a smaller residual, then the algorithm may wander off in the wrong direction and miss the local minimum. It may find some other (undesired) local minimum, or simply fail to converge. So using an algorithm with an adaptive step size won't necessarily save you.
The moral of the story is that scaling your data can improve the algorithm's chances of finding the desired minimum.
Numerical algorithms in general all tend to work better when applied to data whose magnitude is on the order of 1. This bias enters into the algorithm in numerous ways. For instance, optimize.curve_fit relies on optimize.leastsq, and the call signature for optimize.leastsq is:
def leastsq(func, x0, args=(), Dfun=None, full_output=0,
            col_deriv=0, ftol=1.49012e-8, xtol=1.49012e-8,
            gtol=0.0, maxfev=0, epsfcn=None, factor=100, diag=None):
Thus, by default, the tolerances ftol and xtol are on the order of 1e-8. If finding the optimum parameter values requires much smaller tolerances, then these hard-coded default numbers will cause optimize.curve_fit to miss the optimal parameter values.
To make this more concrete, suppose you were trying to minimize f(x) = 1e-100*x**2. The factor of 1e-100 squashes the y-values so much that a wide range of x-values (the parameter values mentioned above) will fit within the tolerance of 1e-8. So, with un-ideal scaling, leastsq will not do a good job of finding the minimum.
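A two-line illustration (using scipy.optimize.minimize rather than leastsq, purely for demonstration): the gradient of 1e-100 * x**2 is so tiny that the default tolerances are met immediately, and the optimizer barely moves from its starting point.

from scipy import optimize

print(optimize.minimize(lambda x: 1e-100 * x**2, x0=3.0).x)  # stays near 3
print(optimize.minimize(lambda x: x**2, x0=3.0).x)           # converges to ~0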
Another reason to use floats on the order of 1 is because there are many more (IEEE754) floats in the interval [-1,1] than there are far away from 1. For example,
import struct

def floats_between(x, y):
    """
    http://stackoverflow.com/a/3587987/190597 (jsbueno)
    """
    a = struct.pack("<dd", x, y)
    b = struct.unpack("<qq", a)
    return b[1] - b[0]
In [26]: floats_between(0,1) / float(floats_between(1e6,1e7))
Out[26]: 311.4397707054894
This shows there are over 300 times as many floats representing numbers between 0 and 1 as there are in the interval [1e6, 1e7].
Thus, all else being equal, you'll typically get a more accurate answer if working with small numbers than very large numbers.
I would imagine it has more to do with the initial parameter estimates you are passing to curve_fit. If you are not passing any, I believe they all default to 1. Normalizing your data makes those initial estimates closer to the truth. If you don't want to use normalized data, just pass the initial estimates yourself and give them reasonable values.
Others have already mentioned that you probably need to have a good starting guess for your fit. In cases like this, I usually try to find some quick and dirty tricks to get at least a ballpark estimate of the parameters. In your case, the exponential decays pretty quickly to zero for large t, so you have
y == Vs * t - (Vs - Vi) / k
Doing a first-order linear fit like
[slope1, offset1] = polyfit(t[t > 2000], y[t > 2000], 1)
you will get slope1 == Vs and offset1 == (Vi - Vs) / k.
Subtracting this straight line from all the points you have, you get the exponential
residual == y - slope1 * t - offset1 == (Vs - Vi) / k * exp(-t * k)
Taking the log of both sides, you get
log(residual) == log((Vs - Vi) / k) - t * k
So doing a second fit
[slope2, offset2] = polyfit(t, log(y - slope1 * t - offset1), 1)
will give you slope2 == -k and offset2 == log((Vs - Vi) / k), which should be solvable for Vi since you already know Vs and k. You might have to limit the second fit to small values of t, otherwise you might be taking the log of negative numbers. Collect all the parameters you obtained with these fits and use them as the starting points for your curve_fit, as in the sketch below.
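Here is the whole recipe as a sketch on synthetic data (the parameter values, cut-offs and noise level are invented for illustration):

import numpy as np
from scipy.optimize import curve_fit

def model(t, Vs, Vi, k):
    return Vs * t - (Vs - Vi) * (1 - np.exp(-k * t)) / k

np.random.seed(0)
t = np.linspace(0, 12000, 200)
y = model(t, 1e-11, 2e-12, 1e-3) + np.random.normal(0, 2e-10, t.size)

# First-order fit on the linear tail: slope ~ Vs, offset ~ (Vi - Vs) / k
Vs0, offset1 = np.polyfit(t[t > 2000], y[t > 2000], 1)

# Fit the exponential residual on log axes, restricted to small t
residual = y - (Vs0 * t + offset1)           # ~ (Vs - Vi) / k * exp(-k * t)
ok = (residual > 0) & (t < 2000)
slope2, offset2 = np.polyfit(t[ok], np.log(residual[ok]), 1)
k0 = -slope2
Vi0 = Vs0 - k0 * np.exp(offset2)             # offset2 ~ log((Vs - Vi) / k)

popt, pcov = curve_fit(model, t, y, p0=(Vs0, Vi0, k0))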
Finally, you might want to look into doing some sort of weighted fit. The information about the exponential part of your curve is contained in just the first few points, so maybe you should give those a higher weight. Doing this in a statistically correct way is not trivial.