I am trying to implement a basic version of stochastic gradient descent for multiple linear regression, with the L2 norm as the loss function.
The result can be seen in this picture:
It's pretty far off the ideal regression line, but I don't really understand why that's the case. I double-checked all array dimensions and they all seem to fit.
Below is my source code. If anyone can see my error or give me a hint, I would appreciate it.
def SGD(x, y, learning_rate):
    theta = np.array([[0], [0]])
    for i in range(N):
        xi = x[i].reshape(1, -1)
        y_pre = xi @ theta
        # SGD update: theta <- theta + lr * (y_i - y_pred_i) * x_i^T
        theta = theta + learning_rate * (y[i] - y_pre[0][0]) * xi.T
    print(theta)
    return theta
N = 100
x = np.array(np.linspace(-2,2,N))
y = 4*x + 5 + np.random.uniform(-1,1,N)
X = np.array([x**0,x**1]).T
plt.scatter(x,y,s=6)
th = SGD(X,y,0.1)
y_reg = np.matmul(X,th)
print(y_reg)
print(x)
plt.plot(x,y_reg)
plt.show()
Edit: Another solution was to shuffle the measurements with x = np.random.permutation(x)
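(Note that x and y have to be shuffled together so the pairs stay aligned; a minimal sketch of what I mean, using a shared permutation of the indices over the arrays defined above:)
perm = np.random.permutation(N)   # one shared shuffle for both arrays
th = SGD(X[perm], y[perm], 0.1)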
To illustrate my comment:
def SGD(x, y, n, learning_rate):
    theta = np.array([[0], [0]])
    # the original code did exactly one pass over the data; do n passes
    for _ in range(n):
        for i in range(len(x)):
            xi = x[i].reshape(1, -1)
            y_pre = xi @ theta
            theta = theta + learning_rate * (y[i] - y_pre[0][0]) * xi.T
    print(theta)
    return theta
SGD(X,y,10,0.01) yields the correct result
When I use LinearRegression in sklearn, I would do
m = 100
X = 6*np.random.rand(m,1)-3
y = 0.5*X**2 + X+2 + np.random.randn(m,1)
lin_reg = LinearRegression()
lin_reg.fit(X,y)
y_pred_1 = lin_reg.predict(X)
y_pred_1 = [_[0] for _ in y_pred_1]
and when I plot (X,y) and (X, y_pred_1) it seems to be correct.
I wanted to create a formula for the line of best fit:
y = (lin_reg.coef_)x + lin_reg.intercept_
I've manually inserted values into the formula built from coef_ and intercept_ and compared the results to the predicted value from lin_reg.predict(value); they are the same, so lin_reg.predict does in fact use the formula above based on coef_ and intercept_.
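For reference, this is roughly the check I did (the test value 1.5 is just an arbitrary example):
x_val = np.array([[1.5]])
manual = lin_reg.coef_[0][0] * x_val + lin_reg.intercept_[0]
print(manual, lin_reg.predict(x_val))  # both give the same number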
My problem is: how do I create a formula for simple polynomial regression?
I would do
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly_2 = poly_features.fit_transform(X)
poly_reg_2 = LinearRegression()
poly_reg_2.fit(X_poly_2, y)
then poly_reg_2.coef_ gives me array([[0.93189329, 0.43283304]]) and poly_reg_2.intercept_ = array([2.20637695]).
Since it is "simple" polynomial regression, it should look something like
y = x^2 + x + b, where x is the same variable.
From poly_reg_2.coef_, which coefficient belongs to x^2 and which one does not?
Thanks to https://www.youtube.com/watch?v=Hwj_9wMXDVo I've gained insight and found out how to interpret the formula for polynomial regression.
So poly_reg_2.coef_ = array([[0.93189329, 0.43283304]])
you know simple linear regression looks like
y = b + m1x
Then 2-degree polynomial regression looks like
y = b + m1x + m2(x^2)
and 3-degree:
y = b + m1x + m2(x^2) + m3(x^3)
and so on... so in my case the two coefficients are just m1 and m2, in that order.
so finally my formula becomes:
y = b + 0.93189329x + 0.43283304(x^2).
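A quick way to confirm this interpretation is to compare the formula against the fitted model itself (a short sketch, assuming numpy is imported as np):
b = poly_reg_2.intercept_[0]
m1, m2 = poly_reg_2.coef_[0]
print(np.allclose(b + m1*X + m2*X**2, poly_reg_2.predict(X_poly_2)))  # prints True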
I am using a CAR model as in https://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html. I am using the same problem and model described in the link: Oi ∼ Poisson(exp(β0 + β1∗aff + ϕi + log(Ei))), in which I want to obtain the values of β0 and β1.
Here, model2 defines the parameters:
with pm.Model() as model2:
    # Vague prior on intercept
    beta0 = pm.Normal('beta0', mu=0.0, tau=1.0e-5)
    # Vague prior on covariate effect
    beta1 = pm.Normal('beta1', mu=0.0, tau=1.0e-5)
    # Random effects (hierarchical) prior
    tau_h = pm.Gamma('tau_h', alpha=3.2761, beta=1.81)
    # Spatial clustering prior
    tau_c = pm.Gamma('tau_c', alpha=1.0, beta=1.0)
    # Regional random effects
    theta = pm.Normal('theta', mu=0.0, tau=tau_h, shape=N)
    mu_phi = CAR2('mu_phi', w=wmat2, a=amat2, tau=tau_c, shape=N)
    # Zero-centre phi
    phi = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))
    # Mean model
    mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*quil + theta + phi))
    # Likelihood
    Yi = pm.Poisson('Yi', mu=mu, observed=O)
    # Marginal SD of heterogeneity effects
    sd_h = pm.Deterministic('sd_h', tt.std(theta))
    # Marginal SD of clustering (spatial) effects
    sd_c = pm.Deterministic('sd_c', tt.std(phi))
    # Proportion of spatial variance
    alpha = pm.Deterministic('alpha', sd_c/(sd_h + sd_c))
    trace2 = pm.sample(1000, tune=500, cores=4,
                       init='advi',
                       nuts_kwargs={"target_accept": 0.9,
                                    "max_treedepth": 15})
My question is: when using the sample function, with 4000 iterations (1000 per core), I obtain a vector of 4000 positions for each parameter in the model. What is the best value for the parameter: the last value (result of the final iteration) or the average value, since they are calculated via a Markov chain?
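(For reference, the draws for each parameter can be summarized with pm.summary, which reports each parameter's posterior mean, standard deviation, and interval; a minimal sketch:)
summary = pm.summary(trace2)             # pandas DataFrame of per-parameter statistics
print(summary.loc[['beta0', 'beta1']])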
My objective is to perform an Inverse Laplace Transform on some decay data (NMR T2 decay via CPMG). For that, we were provided with the CONTIN algorithm. This algorithm was adapted to Matlab by Iari-Gabriel Marino, and it works very well. I want to adapt this code to Python. The core of the problem is with scipy.optimize.fmin, which does not minimize the mean square deviation (MSD) in any way similar to Matlab's fminsearch: the latter results in a good minimization, while the former doesn't.
I have gone through my adapted Python code and the original Matlab line by line. I checked every matrix and every output, and used this to identify that the critical point is fmin. I also tried scipy.optimize.minimize and other minimization algorithms, but none gave even remotely satisfactory results.
I have made two MWEs, for Python and Matlab, to make it reproducible for everyone. The example data were obtained from the documentation of the Matlab function. Apologies for the long code, but I don't really know how to shorten it without sacrificing readability and clarity. I tried to make the lines match as closely as possible. I am using Python 3.7.3, scipy v1.3.0, numpy 1.16.2, and Matlab R2018b, on Windows 8.1. It's a relatively recent Anaconda install (<2 months old).
My code:
import numpy as np
from scipy.optimize import fmin
import matplotlib.pyplot as plt
def msd(g, y, A, alpha, R, w, constraints):
    """msd: mean square deviation. This is the function to be minimized by fmin."""
    if 'zero_at_extremes' in constraints:
        g[0] = 0
        g[-1] = 0
    if 'g>0' in constraints:
        g = np.abs(g)
    r = np.diff(g, axis=0, n=2)  # second derivative of g
    yfit = A @ g
    # Sum of weighted square residuals
    VAR = np.sum(w * (y - yfit) ** 2)
    # Regularizor
    REG = alpha ** 2 * np.sum((r - R @ g) ** 2)
    # output to be minimized
    return VAR + REG
# Objective: match this distribution
g0 = np.array([0, 0, 10.1625, 25.1974, 21.8711, 1.6377, 7.3895, 8.736, 1.4256, 0, 0]).reshape((-1, 1))
s0 = np.logspace(-3, 6, len(g0)).reshape((-1, 1))
t = np.linspace(0.01, 500, 100).reshape((-1, 1))
sM, tM = np.meshgrid(s0, t)
A = np.exp(-tM / sM)
np.random.seed(1)
# Creates data from the initial distribution with some random noise.
data = (A @ g0) + 0.07 * np.random.rand(t.size).reshape((-1, 1))
# Parameters and function start
alpha = 1E-2 # regularization parameter
s = np.logspace(-3, 6, 20).reshape((-1, 1)) # x of the ILT
g0 = np.ones(s.size).reshape((-1, 1)) # guess of y of ILT
y = data # noisy data
options = {'maxiter':1e8, 'maxfun':1e8} # for the fmin function
constraints=['g>0', 'zero_at_extremes'] # constraints for the MSD function
R=np.zeros((len(g0) - 2, len(g0)), order='F') # Regularizor
w=np.ones(y.reshape(-1, 1).size).reshape((-1, 1)) # Weights
sM, tM = np.meshgrid(s, t, indexing='xy')
A = np.exp(-tM/sM)
g0 = g0 * y.sum() / (A @ g0).sum() # Makes a "better guess" for the distribution, according to algorithm
print('msd of input data:\n', msd(g0, y, A, alpha, R, w, constraints))
for i in range(5): # Just for testing. If this is extremely high, ~1000, it's still bad.
    g = fmin(func=msd,
             x0=g0,
             args=(y, A, alpha, R, w, constraints),
             **options,
             disp=True)[:, np.newaxis]
    msdfit = msd(g, y, A, alpha, R, w, constraints)
    if 'zero_at_extremes' in constraints:
        g[0] = 0
        g[-1] = 0
    if 'g>0' in constraints:
        g = np.abs(g)
    g0 = g
print('New guess', g)
print('Final msd of g', msdfit)
# Visualize the fit
plt.plot(s, g, label='Initial approximation')
plt.plot(np.logspace(-3, 6, 11), np.array([0, 0, 10.1625, 25.1974, 21.8711, 1.6377, 7.3895, 8.736, 1.4256, 0, 0]), label='Distribution to match')
plt.xscale('log')
plt.legend()
plt.show()
Matlab:
% Objective: match this distribution
g0 = [0 0 10.1625 25.1974 21.8711 1.6377 7.3895 8.736 1.4256 0 0]';
s0 = logspace(-3,6,length(g0))';
t = linspace(0.01,500,100)';
[sM,tM] = meshgrid(s0,t);
A = exp(-tM./sM);
rng(1);
% Creates data from the initial distribution with some random noise.
data = A*g0 + 0.07*rand(size(t));
% Parameters and function start
alpha = 1e-2; % regularization parameter
s = logspace(-3,6,20)'; % x of the ILT
g0 = ones(size(s)); % initial guess of y of ILT
y = data; % noisy data
options = optimset('MaxFunEvals',1e8,'MaxIter',1e8); % constraints for fminsearch
constraints = {'g>0','zero_at_the_extremes'}; % constraints for MSD
R = zeros(length(g0)-2,length(g0));
w = ones(size(y(:)));
[sM,tM] = meshgrid(s,t);
A = exp(-tM./sM);
g0 = g0*sum(y)/sum(A*g0); % Makes a "better guess" for the distribution
disp('msd of input data:')
disp(msd(g0, y, A, alpha, R, w, constraints))
for k = 1:5
    [g,msdfit] = fminsearch(@msd,g0,options,y,A,alpha,R,w,constraints);
    if ismember('zero_at_the_extremes',constraints)
        g(1) = 0;
        g(end) = 0;
    end
    if ismember('g>0',constraints)
        g = abs(g);
    end
    g0 = g;
end
disp('New guess')
disp(g)
disp('Final msd of g')
disp(msdfit)
% Visualize the fit
semilogx(s, g)
hold on
semilogx(logspace(-3,6,11), [0 0 10.1625 25.1974 21.8711 1.6377 7.3895 8.736 1.4256 0 0])
legend('First approximation', 'Distribution to match')
hold off
function out = msd(g,y,A,alpha,R,w,constraints)
    % msd: The mean square deviation; this is the function
    % that has to be minimized by fminsearch
    % Constraints and any 'a priori' knowledge
    if ismember('zero_at_the_extremes',constraints)
        g(1) = 0;
        g(end) = 0;
    end
    if ismember('g>0',constraints)
        g = abs(g); % must be g(i)>=0 for each i
    end
    r = diff(diff(g(1:end))); % second derivative of g
    yfit = A*g;
    % Sum of weighted square residuals
    VAR = sum(w.*(y-yfit).^2);
    % Regularizor
    REG = alpha^2 * sum((r-R*g).^2);
    % Output to be minimized
    out = VAR+REG;
end
(Plots in the original post show the optimization result in Python and in Matlab.)
I have checked the MSD of g0 before starting, and both give the value 2651. After minimization, Python goes up to 4547, while Matlab goes down to 0.1381.
I think the problem is one of the following: it's in my implementation, i.e. I am using fmin wrong, or there's some other step I got wrong, but I can't figure out what. The fact that the MSD increases when it should decrease under a minimization function is damning. Reading the documentation, Matlab's implementation differs from scipy's: Matlab uses the Nelder-Mead method described in Lagarias et al. (per their documentation), while scipy uses the original Nelder-Mead. Maybe that matters significantly? Or perhaps my initial guess is just too bad for scipy's algorithm?
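For completeness, the variant I would probe next (a sketch only: the adaptive option and the explicit xatol/fatol tolerances are just settings to try, not a confirmed fix) is calling Nelder-Mead through scipy.optimize.minimize with the same msd and arguments as above:
from scipy.optimize import minimize
res = minimize(msd, g0.flatten(), args=(y, A, alpha, R, w, constraints),
               method='Nelder-Mead',
               options={'maxiter': int(1e6), 'maxfev': int(1e6),
                        'xatol': 1e-8, 'fatol': 1e-8,
                        'adaptive': True})  # adaptive simplex parameters, scipy >= 1.1
g = res.x[:, np.newaxis]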
So, it has been quite a long time since I posted this, but I wanted to share what I ended up learning and doing.
The "Inverse Laplace Transform" for CPMG data is a bit of a misnomer; it is more properly called just an inversion. The general problem is solving a Fredholm integral equation of the first kind. One way of doing this is the Tikhonov regularization method. It turns out you can describe this problem quite easily using numpy and solve it with scipy, so I didn't have to "reinvent the wheel".
I used the solution shown in this post, and the names here reflect that solution.
from scipy.optimize import nnls  # non-negative least-squares solver
import numpy as np

def tikhonov_regularized_inversion(
    kernel: np.ndarray, alpha: float, data: np.ndarray
) -> np.ndarray:
    data = data.reshape(-1, 1)
    # stack the kernel on top of alpha*I, and pad the data with zeros
    I = alpha * np.eye(*kernel.shape)
    C = np.concatenate([kernel, I], axis=0)
    d = np.concatenate([data, np.zeros_like(data)])
    x, _ = nnls(C, d.flatten())
    return x
Here, kernel is a matrix containing all the possible exponential decay curves, and the solution judges the contribution of each decay curve to the data I measured. First, I stack my data as a column and pad it with zeros, creating the vector d. I then stack my kernel on top of a diagonal matrix containing the regularization parameter alpha along the diagonal, of the same size as the kernel. Last, I call the convenient nnls, a non-negative least-squares solver in scipy.optimize. This is because there's no reason to have a negative contribution, only no contribution.
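For context, here is roughly how it wires up to the names from the MWE above (the plotting is just illustrative):
sM, tM = np.meshgrid(s, t)   # same kernel construction as in the MWE
A = np.exp(-tM / sM)
g = tikhonov_regularized_inversion(A, alpha, data)
plt.plot(s.flatten(), g)
plt.xscale('log')
plt.show()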
This solved my problem; it's quick and convenient.
I have two independent, normally distributed random variables a and b. In pymc, it's something like:
from pymc import Normal
def model():
    a = Normal('a', tau=0.01)
    b = Normal('b', tau=0.1)
I'd like to know what a+b is, if we can see it as a normal distribution, that is:
from pymc import Normal, Uniform

def model():
    a = Normal('a', tau=0.01)
    b = Normal('b', tau=0.1)
    tau_c = Uniform("tau_c", lower=0.0, upper=1.0)
    c = Normal("a+b", tau=tau_c, observed=True, value=a+b)
Then I'd like to estimate tau_c, but it doesn't work with pymc because a and b are stochastic (if they were arrays it would be possible, but I don't have observations of a or b, I just know their distributions).
One way I think I could do it is to generate random values from the distributions of a and b and then do this:
def model(a, b):
    tau_c = Uniform("tau_c", lower=0.0, upper=1.0)
    c = Normal("a+b", tau=tau_c, observed=True, value=a+b)
But I think there's a better way of doing this with pymc.
Thanks!
If I understood your question and code correctly, you should be doing something simpler. If you want to estimate the parameters of the distribution given by the sum of a and b, use only the first block in the following example. If you also want to estimate the parameters for the variable a independently of the parameters of variable b, use the other two blocks as well.
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', 10)
    alpha = pm.Normal('alpha', mu=0, sd=10)
    ab = pm.SkewNormal('ab', mu=mu, sd=sd, alpha=alpha, observed=a+b)

    mu_a = pm.Normal('mu_a', mu=0, sd=10)
    sd_a = pm.HalfNormal('sd_a', 10)
    alpha_a = pm.Normal('alpha_a', mu=0, sd=10)
    a = pm.SkewNormal('a', mu=mu_a, sd=sd_a, alpha=alpha_a, observed=a)

    mu_b = pm.Normal('mu_b', mu=0, sd=10)
    sd_b = pm.HalfNormal('sd_b', 10)
    alpha_b = pm.Normal('alpha_b', mu=0, sd=10)
    b = pm.SkewNormal('b', mu=mu_b, sd=sd_b, alpha=alpha_b, observed=b)

    trace = pm.sample(1000)
Be sure to use the latest version of PyMC3, since previous versions did not include the SkewNormal distribution.
Update:
Given that you changed your question:
If a and b are independent random variables and both are normally distributed then their sum is going to be normally distributed.
a ~ N(mu_a, sd_a²)
b ~ N(mu_b, sd_b²)
a+b ~ N(mu_a+mu_b, sd_a²+sd_b²)
That is, you sum their means and you sum their variances (not their standard deviations). You don't need to use PyMC3.
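For the priors in the question (tau is a precision, i.e. 1/variance), that works out to, for example:
var_a = 1 / 0.01           # tau=0.01 -> variance 100 (sd 10)
var_b = 1 / 0.1            # tau=0.1  -> variance 10  (sd ~3.16)
var_ab = var_a + var_b     # a+b has variance 110, i.e. tau ~ 1/110 ~ 0.0091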
If you still want to use PyMC3 (maybe your distributions are not Gaussian and you do not know how to compute their sum analytically), you can generate synthetic data from your a and b distributions and then use PyMC3 to estimate the parameters, something along the lines of:
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', 10)
    ab = pm.Normal('ab', mu=mu, sd=sd, observed=a+b)
    trace = pm.sample(1000)
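Here a and b would be arrays of synthetic draws matching the priors in the question; a minimal sketch (assuming zero means and sd = 1/sqrt(tau)):
import numpy as np
np.random.seed(0)                                  # just for reproducibility
a = np.random.normal(0, 1/np.sqrt(0.01), 1000)     # tau=0.01 -> sd=10
b = np.random.normal(0, 1/np.sqrt(0.1), 1000)      # tau=0.1  -> sd~3.16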