Implementing stochastic gradient descent - python

I am trying to implement a basic version of stochastic gradient descent for multiple linear regression, with the L2 norm as the loss function.
The result can be seen in this picture:
It's pretty far off the ideal regression line, but I don't really understand why that's the case. I double-checked all array dimensions and they all seem to fit.
Below is my source code. If anyone can spot my error or give me a hint, I would appreciate it.
import numpy as np
import matplotlib.pyplot as plt

def SGD(x, y, learning_rate):
    theta = np.array([[0.0], [0.0]])
    for i in range(N):
        xi = x[i].reshape(1, -1)
        y_pre = xi @ theta
        theta = theta + learning_rate * (y[i] - y_pre[0][0]) * xi.T
    print(theta)
    return theta
N = 100
x = np.linspace(-2, 2, N)
y = 4 * x + 5 + np.random.uniform(-1, 1, N)
X = np.array([x**0, x**1]).T
plt.scatter(x, y, s=6)
th = SGD(X, y, 0.1)
y_reg = np.matmul(X, th)
print(y_reg)
print(x)
plt.plot(x, y_reg)
plt.show()
Edit: Another solution was to shuffle the measurements with x = np.random.permutation(x).
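Note that permuting x alone would break the pairing with y; a minimal sketch of shuffling both together by permuting a shared index array (the variable names below are illustrative, not from the original post):

idx = np.random.permutation(N)   # one shared random order per pass
X_shuffled = X[idx]              # reorder design-matrix rows
y_shuffled = y[idx]              # reorder targets the same way
th = SGD(X_shuffled, y_shuffled, 0.1)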

To illustrate my comment:
def SGD(x, y, n, learning_rate):
    theta = np.array([[0.0], [0.0]])
    # currently it does exactly one iteration. do more
    for _ in range(n):
        for i in range(len(x)):
            xi = x[i].reshape(1, -1)
            y_pre = xi @ theta
            theta = theta + learning_rate * (y[i] - y_pre[0][0]) * xi.T
    print(theta)
    return theta
SGD(X, y, 10, 0.01) yields the correct result.


Fitting N datapoints in 3D on a straight line

I have N datapoints in 3D that lie on a line. The y-direction is fixed, so I want to fit x and z against y.
Let's say we have 6 datapoints that align with the y-axis:
x=[0,0,0,0,0,0]
y=[1,2,3,4,5,6]
z=[0,0,0,0,0,0]
What I want to do:
I want to get the best set of fitting parameters, the goodness of fit (gof), and the fitting error.
So far, with a least-squares fit, I get a reduced chi2 of < 1, which means I might be overfitting (or misunderstanding something).
Questions:
1.) For the above example I receive a reduced chi2 of 0; this seems wrong to me?
2.) Also, I am wondering if a least-squares fit is adequate for this; maybe someone can shed some insight on this? Would SVD be a better choice? (A sketch follows the code below.)
import scipy.optimize
import numpy as np

# define a model (line)
def linear(params, y):
    a, b = params
    data = [a * y[i] + b for i in range(0, len(y))]
    return data

# define the residuals that need to be minimized
def fitting_cost(params, x, y, z):
    a_x, b_x, a_z, b_z = params
    x_pred = linear((a_x, b_x), y)
    z_pred = linear((a_z, b_z), y)
    res_x = [x_pred[i] - x[i] for i in range(0, 6)]
    res_z = [z_pred[i] - z[i] for i in range(0, 6)]
    return res_x + res_z

# do the fit and return parameters plus goodness of fit
def least_squares_fit(x, y, z):
    sp = [0, 0, 0, 0]
    result = scipy.optimize.leastsq(fitting_cost, sp,
                                    args=(x, y, z),
                                    full_output=True)
    s_sq = (result[2]['fvec'] ** 2).sum() / (
        len(result[2]['fvec']) - len(result[0]))
    return result[0], s_sq
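On question 1: a reduced chi2 of 0 is actually expected here rather than false, since the example points lie exactly on a line, so every residual is exactly zero and chi2_red = sum(r_i^2) / (N - p) = 0. On question 2: one common SVD-based alternative is to take the line direction as the first principal direction of the centered points. A minimal sketch of that idea (my own illustration, not the original code):

import numpy as np

# Fit the 3D line as centroid + t * direction, where direction is the
# first right-singular vector of the centered point cloud.
pts = np.column_stack([x, y, z]).astype(float)  # shape (N, 3)
centroid = pts.mean(axis=0)                     # a point on the line
_, _, vt = np.linalg.svd(pts - centroid)        # SVD of centered points
direction = vt[0]                               # unit vector along the line

# Fitting error: perpendicular distance of each point to the line.
diff = pts - centroid
dists = np.linalg.norm(diff - np.outer(diff @ direction, direction), axis=1)
print(centroid, direction, dists)

Unlike the per-coordinate least-squares fit, this minimizes the perpendicular distances to the line, which treats x, y, and z symmetrically.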

Incremental Bayesian updates with multi-dimensional parameters

I am trying to use PyMC3 for a Bayesian model where I would like to repeatedly train my model on new unseen data. I am thinking I would need to update the priors with the posterior of the previously trained model every time I see new data, similar to what is done here: https://docs.pymc.io/notebooks/updating_priors.html. They use the following function, which builds a KDE from the samples and replaces each of the original parameter definitions in the model with a call to from_posterior.
def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)
And here is my original model.
def create_model(batsmen, bowlers, id1, id2, X):
    testval = [[-5, 0, 1, 2, 3.5, 5] for i in range(0, 9)]
    l = [i for i in range(9)]
    model = pm.Model()
    with model:
        delta_1 = pm.Uniform("delta_1", lower=0, upper=1)
        delta_2 = pm.Uniform("delta_2", lower=0, upper=1)
        inv_sigma_sqr = pm.Gamma("sigma^-2", alpha=1.0, beta=1.0)
        inv_tau_sqr = pm.Gamma("tau^-2", alpha=1.0, beta=1.0)
        mu_1 = pm.Normal("mu_1", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(batsmen))
        mu_2 = pm.Normal("mu_2", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(bowlers))
        delta = pm.math.ge(l, 3) * delta_1 + pm.math.ge(l, 6) * delta_2
        eta = [pm.Deterministic("eta_" + str(i), delta[i] + mu_1[id1[i]] - mu_2[id2[i]]) for i in range(9)]
        cutpoints = pm.Normal("cutpoints", mu=0, sigma=1/pm.math.sqrt(inv_sigma_sqr), transform=pm.distributions.transforms.ordered, shape=(9, 6), testval=testval)
        X_ = [pm.OrderedLogistic("X_" + str(i), cutpoints=cutpoints[i], eta=eta[i], observed=X[i]-1) for i in range(9)]
    return model
Here, the problem is that some of my parameters, such as mu_1, are multi-dimensional. This is why I get the following error:
ValueError: points have dimension 1, dataset has dimension 1500
because of the line y = stats.gaussian_kde(samples)(x).
Can someone please help me make this work for multi-dimensional parameters? I don't properly understand what KDE is or how the code computes it.
Thank you in advance!!
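One workaround, sketched under the assumption that the components of a vector-valued parameter can be treated independently (a simplification, not an official PyMC3 recipe): scipy's gaussian_kde interprets a 2-D sample array as one high-dimensional dataset, which is what triggers the error, so apply the 1-D from_posterior logic to each column of the samples separately. The helper name below is made up for illustration:

import numpy as np
from scipy import stats
from pymc3 import Interpolated

def from_posterior_vector(param, samples):
    # samples: shape (n_draws, dim), e.g. trace["mu_1"]
    # builds one Interpolated prior per component, treating them independently
    priors = []
    for j in range(samples.shape[1]):
        s = samples[:, j]                    # 1-D samples for component j
        smin, smax = np.min(s), np.max(s)
        width = smax - smin
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(s)(x)         # 1-D KDE works per component
        # extend the domain so unseen values keep a small non-zero density
        x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
        y = np.concatenate([[0], y, [0]])
        priors.append(Interpolated(param + "_" + str(j), x, y))
    return priors

Each Interpolated has to be created inside a with pm.Model() context, and the rest of the model then indexes the returned list instead of a single vector variable. This ignores posterior correlations between components, which a full multivariate KDE would capture.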

Scipy `fmin_cg` args do not match my function's args

I am trying to build a linear regression model and find the optimal values using the fmin_cg optimizer.
I have two functions for this job: first, linear_reg_cost, which is the cost function, and second, linear_reg_grad, which is the gradient of the cost function. Both functions take the same arguments.
def hypothesis(x, theta):
    return np.dot(x, theta)
Cost function:
def linear_reg_cost(x_flatten, y, theta_flatten, lambda_, num_of_features, num_of_samples):
    x = x_flatten.reshape(num_of_samples, num_of_features)
    theta = theta_flatten.reshape(n, 1)
    loss = hypothesis(x, theta) - y
    regularizer = lambda_ * np.sum(theta[1:, :]**2) / (2*m)
    j = np.sum(loss ** 2) / (2*m)
    return j
Gradient function:
def linear_reg_grad(x_flatten, y, theta_flatten, lambda_, num_of_features, num_of_samples):
    x = x_flatten.reshape(num_of_samples, num_of_features)
    m, n = x.shape
    theta = theta_flatten.reshape(n, 1)
    new_theta = np.zeros(shape=(theta.shape))
    loss = hypothesis(x, theta) - y
    gradient = np.dot(x.T, loss)
    new_theta[0:, :] = gradient / m
    new_theta[1:, :] = gradient[1:, :] / m + lambda_ * (theta[1:, ] / m)
    return new_theta
and fmin_cg:
theta = np.ones(n)
from scipy.optimize import fmin_cg
new_theta = fmin_cg(f=linear_reg_cost, x0=theta, fprime=linear_reg_grad,
                    args=(x.flatten(), y, lambda_, m, n))
Note: I flatten x as input and reshape it back into a matrix inside the cost and gradient functions.
The output error:
<ipython-input-98-b29c1b8f6e58> in linear_reg_grad(x_flatten, y, theta_flatten, lambda_, num_of_features, num_of_samples)
1 def linear_reg_grad(x_flatten, y, theta_flatten, lambda_,num_of_features, num_of_samples):
----> 2 x = x_flatten.reshape(num_of_samples, num_of_features)
3 m,n = x.shape
4 theta = theta_flatten.reshape(n,1)
5 new_theta = np.zeros(shape=(theta.shape))
ValueError: cannot reshape array of size 2 into shape (2,12)
Note: x.shape = (12, 2), y.shape = (12, 1), theta.shape = (2,). So num_of_features = 2 and num_of_samples = 12. But the error shows that my input x is being passed instead of theta. Why is this happening even though I explicitly assigned args in fmin_cg? And how should I solve this problem?
Thanks for any advice.
All of your implementations are correct, but you have a small mistake.
Be careful to pass the arguments to both of your functions in the right order.
One problem is the order of num_of_features and num_of_samples: you can swap their positions in linear_reg_grad and linear_reg_cost, but then you must also change that order in the args argument of scipy.optimize.fmin_cg.
The second, more important thing: the first argument that fmin_cg passes to f (and fprime) is the variable it updates on each step to find the optimum. So in your solution, that first parameter must be theta, not your input x.
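For illustration, a minimal corrected sketch along those lines (theta first in both signatures, gradient returned flat; the toy data is made up and not from the question):

import numpy as np
from scipy.optimize import fmin_cg

def linear_reg_cost(theta_flatten, x_flatten, y, lambda_, num_of_features, num_of_samples):
    # theta comes first: it is the variable fmin_cg optimizes
    x = x_flatten.reshape(num_of_samples, num_of_features)
    theta = theta_flatten.reshape(num_of_features, 1)
    loss = x @ theta - y
    reg = lambda_ * np.sum(theta[1:, :] ** 2) / (2 * num_of_samples)
    return np.sum(loss ** 2) / (2 * num_of_samples) + reg

def linear_reg_grad(theta_flatten, x_flatten, y, lambda_, num_of_features, num_of_samples):
    x = x_flatten.reshape(num_of_samples, num_of_features)
    theta = theta_flatten.reshape(num_of_features, 1)
    grad = x.T @ (x @ theta - y) / num_of_samples
    grad[1:, :] += lambda_ * theta[1:, :] / num_of_samples
    return grad.ravel()  # fmin_cg expects a flat gradient array

# toy data with the question's shapes: x (12, 2), y (12, 1), theta (2,)
x = np.column_stack([np.ones(12), np.arange(12.0)])
y = (3 + 2 * x[:, 1]).reshape(-1, 1)
theta0 = np.ones(2)
new_theta = fmin_cg(f=linear_reg_cost, x0=theta0, fprime=linear_reg_grad,
                    args=(x.flatten(), y, 0.0, 2, 12))
print(new_theta)  # should end up close to [3, 2]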

Implementing Linear Regression using Gradient Descent

I have just started in machine learning and am currently taking Andrew Ng's Machine Learning course. I have implemented the linear regression algorithm in Python, but the result is not desirable. My Python code is as follows:
import numpy as np
x = [[1,1,1,1,1,1,1,1,1,1],[10,20,30,40,50,60,70,80,90,100]]
y = [10,16,20,23,29,30,35,40,45,50]
x = np.array(x)
y = np.array(y)
theta = np.zeros((2,1))

def Cost(x, y, theta):
    m = len(y)
    pred_ions = np.transpose(theta).dot(x)
    J = 1/(2*m) * np.sum((pred_ions - y)*(pred_ions - y))
    return J

def GradientDescent(x, y, theta, iteration, alpha):
    m = len(y)
    pred_ions = np.transpose(theta).dot(x)
    i = 1
    while i <= iteration:
        theta[0] = theta[0] - alpha/m * np.sum(pred_ions - y)
        theta[1] = theta[1] - alpha/m * np.sum((pred_ions - y)*x[1,:])
        Cost_History = Cost(x, y, theta)
        i = i + 1
    return theta[0], theta[1]

itera = 1000
alpha = 0.01
a, b = GradientDescent(x, y, theta, itera, alpha)
print(a)
print(b)
I am not able to figure out what exactly the problem is, but my results are very strange: the parameter values from the above code are 298 and 19890. Any help would be appreciated. Thanks.
Ah. I did this assignment too a while ago.
See this, mentioned on page 7 of the assignment PDF:
Octave/MATLAB array indices start from one, not zero. If you’re
storing θ0 and θ1 in a vector called theta, the values will be
theta(1) and theta(2).
That indexing note applies to the Octave/MATLAB version of the assignment; in your Python code, the zero-based theta[0] and theta[1] are fine.
Also, if you are storing the cost in Cost_History, shouldn't it be indexed by the iteration variable, like
Cost_History[i] = Cost(x,y,theta)
Just check that too! Hope this helped.
Edit 1: Okay, I have understood the issue now. In his video, Andrew Ng says that you need to update both thetas simultaneously. To do that, store the theta matrix in a temp variable, and update theta[0] and theta[1] based on the temp values.
Currently in your code, by the time theta[1] = ... runs, theta[0] has already been changed to its newer value, so the two are not being updated simultaneously.
So instead, do this:
while i <= iteration:
    temp = theta.copy()  # .copy() matters: a plain "temp = theta" would alias theta
    theta[0] = theta[0] - alpha/m * np.sum(np.transpose(temp).dot(x) - y)
    theta[1] = theta[1] - alpha/m * np.sum((np.transpose(temp).dot(x) - y)*x[1,:])
    Cost_History[i] = Cost(x, y, theta)
    i = i + 1
It should work now; if not, let me know and I will debug on my side.
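For completeness, a minimal vectorized sketch of the corrected loop on the same data (one way to write it, not the course's reference solution; the feature scaling is my addition so that alpha = 0.01 converges):

import numpy as np

x = np.array([[1]*10, [10,20,30,40,50,60,70,80,90,100]], dtype=float)  # (2, 10)
y = np.array([10,16,20,23,29,30,35,40,45,50], dtype=float)
theta = np.zeros(2)
alpha, iterations = 0.01, 1000
m = len(y)

# scale the second feature; with raw values up to 100, alpha = 0.01 diverges
x[1, :] = (x[1, :] - x[1, :].mean()) / x[1, :].std()

for _ in range(iterations):
    pred = theta.dot(x)              # predictions recomputed every iteration
    grad = (pred - y).dot(x.T) / m   # gradient for both parameters at once
    theta = theta - alpha * grad     # simultaneous update, no temp needed

print(theta)  # parameters for the scaled feature

Note that the original code also computes pred_ions once before the loop and never refreshes it, which is exactly where the strange values come from: with pred_ions frozen at zero, the same step alpha/m * sum(y) = 0.298 and alpha/m * sum(y*x[1,:]) = 19.89 is repeated 1000 times, giving 298 and 19890.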

Code Not Converging Vanilla Gradient Descent

I have a specific analytical gradient I am using to calculate my cost f(x, y) and the gradients dx and dy. The code runs, but I can't tell if my gradient descent is broken. Should I plot my partial derivatives x and y?
import math
import numpy as np
import matplotlib.pyplot as plt

gamma = 0.00001  # learning rate
iterations = 10000  # steps
theta = np.array([0, 5])  # starting value
thetas = []
costs = []

# calculate cost of any point
def cost(theta):
    x = theta[0]
    y = theta[1]
    return 100*x*math.exp(-0.5*x*x + 0.5*x - 0.5*y*y - y + math.pi)

def gradient(theta):
    x = theta[0]
    y = theta[1]
    dx = 100*math.exp(-0.5*x*x + 0.5*x - 0.0035*y*y - y + math.pi)*(1 + x*(-x + 0.5))
    dy = 100*x*math.exp(-0.5*x*x + 0.5*x - 0.05*y*y - y + math.pi)*(-y - 1)
    gradients = np.array([dx, dy])
    return gradients

# for 2 features
for step in range(iterations):
    theta = theta - gamma*gradient(theta)
    value = cost(theta)
    thetas.append(theta)
    costs.append(value)

thetas = np.array(thetas)
X = thetas[:, 0]
Y = thetas[:, 1]
Z = np.array(costs)
iterations = [num for num in range(iterations)]
plt.plot(Z)
plt.xlabel("num. iteration")
plt.ylabel("cost")
plt.show()
I strongly recommend you check whether or not your analytic gradient is working correctly by first evaluating it against a numerical gradient,
i.e. make sure that f'(x) ≈ (f(x+h) - f(x)) / h for some small h.
After that, make sure your updates actually go in the right direction by picking a point where you know x or y should decrease, and then checking the sign of your gradient function's output.
And of course, make sure your goal is actually minimization rather than maximization.
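A minimal sketch of such a check against the functions above (central differences are usually more accurate than the one-sided formula; h and the test point are just illustrative choices):

import numpy as np

def numerical_gradient(f, theta, h=1e-6):
    # central-difference estimate of each partial derivative
    grad = np.zeros_like(theta, dtype=float)
    for i in range(len(theta)):
        step = np.zeros_like(theta, dtype=float)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

point = np.array([0.1, 5.0])
print("analytic :", gradient(point))
print("numerical:", numerical_gradient(cost, point))

A large mismatch here would point at the analytic formulas, for instance the y*y coefficients that differ between cost (0.5) and gradient (0.0035 and 0.05).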
