I've run into a problem while trying to implement Stochastic Gradient Descent: my cost grows explosively and I don't have a clue why.
MSE implementation:
def mse(x, y, w, b):
    predictions = x @ w
    summed = (np.square(y - predictions - b)).mean(0)
    cost = summed / 2
    return cost
Gradients:
def grad_w(y, x, w, b, n_samples):
    return -y @ x / n_samples + x.T @ x @ w / n_samples + b * x.mean(0)

def grad_b(y, x, w, b, n_samples):
    return -y.mean(0) + x.mean(0) @ w + b
SGD Implementation:
def stochastic_gradient_descent(X, y, w, b, learning_rate=0.01, iterations=500, batch_size=100):
    length = len(y)
    cost_history = np.zeros(iterations)
    n_batches = int(length / batch_size)
    for it in range(iterations):
        cost = 0
        indices = np.random.permutation(length)
        X = X[indices]
        y = y[indices]
        for i in range(0, length, batch_size):
            X_i = X[i:i+batch_size]
            y_i = y[i:i+batch_size]
            w -= learning_rate * grad_w(y_i, X_i, w, b, length)
            b -= learning_rate * grad_b(y_i, X_i, w, b, length)
            cost = mse(X_i, y_i, w, b)
        cost_history[it] = cost
        if cost_history[it] <= 0.0052: break
    return w, cost_history[:it]
Random Variables:
w_true = np.array([0.2, 0.5, -0.2])
b_true = -1
first_feature = np.random.normal(0, 1, 1000)
second_feature = np.random.uniform(size=1000)
third_feature = np.random.normal(1, 2, 1000)
arrays = [first_feature, second_feature, third_feature]
x = np.stack(arrays, axis=1)
y = x @ w_true + b_true + np.random.normal(0, 0.1, 1000)
w = np.asarray([0.0, 0.0, 0.0], dtype='float64')
b = 1.0
After running this:
theta, cost_history = stochastic_gradient_descent(x, y, w, b)
print('Final cost/MSE: {:0.3f}'.format(cost_history[-1]))
I get this:
Final cost/MSE: 3005958172614261248.000
And here is the plot
Here are a few suggestions:
your learning rate is too big for the training: changing it to something like 1e-3 should be fine.
your update part could be slightly modified as follows:
def stochastic_gradient_descent(X, y, w, b, learning_rate=0.01, iterations=500, batch_size=100):
    length = len(y)
    cost_history = np.zeros(iterations)
    n_batches = int(length / batch_size)
    for it in range(iterations):
        cost = 0
        indices = np.random.permutation(length)
        X = X[indices]
        y = y[indices]
        for i in range(0, length, batch_size):
            X_i = X[i:i+batch_size]
            y_i = y[i:i+batch_size]
            w -= learning_rate * grad_w(y_i, X_i, w, b, len(X_i))  # the denominator should be the actual batch size
            b -= learning_rate * grad_b(y_i, X_i, w, b, len(X_i))
            cost += mse(X_i, y_i, w, b) * len(X_i)  # accumulate the batch loss
        cost_history[it] = cost / length  # this is a running average of your batch losses, which is statistically more stable
        if cost_history[it] <= 0.0052: break
    return w, b, cost_history[:it]
The final results:
w_true = np.array([0.2, 0.5, -0.2])
b_true = -1
first_feature = np.random.normal(0, 1, 1000)
second_feature = np.random.uniform(size=1000)
third_feature = np.random.normal(1, 2, 1000)
arrays = [first_feature, second_feature, third_feature]
x = np.stack(arrays, axis=1)
y = x @ w_true + b_true + np.random.normal(0, 0.1, 1000)
w = np.asarray([0.0, 0.0, 0.0], dtype='float64')
b = 0.0
theta, bias, cost_history = stochastic_gradient_descent(x, y, w, b, learning_rate=1e-3, iterations=3000)
print("Final epoch cost/MSE: {:0.3f}".format(cost_history[-1]))
print("True final cost/MSE: {:0.3f}".format(mse(x, y, theta, bias)))
print(f"Final coefficients:\n{theta, bias}")
Hey @TQCH, and thanks for that. I've come up with a different approach to implementing SGD without an inner loop, and the results were also pretty sweet.
def stochastic_gradient_descent(X, y, w, b, learning_rate=0.35, iterations=3000, batch_size=100):
    length = len(y)
    cost_history = np.zeros(iterations)
    n_batches = int(length / batch_size)
    marker = 0
    cost = mse(X, y, w, b)
    print(cost)
    for it in range(iterations):
        cost = 0
        indices = np.random.choice(length, batch_size)  # sample one batch per iteration (with replacement)
        X_i = X[indices]
        y_i = y[indices]
        w -= learning_rate * grad_w(y_i, X_i, w, b, len(X_i))  # pass the batch size (the grad functions above take n_samples)
        b -= learning_rate * grad_b(y_i, X_i, w, b, len(X_i))
        cost = mse(X_i, y_i, w, b)
        cost_history[it] = cost
        if cost_history[it] <= 0.0075 and cost_history[it] > 0.0071: marker = it
        if cost <= 0.0052: break
    print(f'{w}, {b}')
    return w, cost_history, marker, cost
w = np.asarray([0.0,0.0,0.0], dtype='float64')
b = 1.0
theta,cost_history, marker, cost = stochastic_gradient_descent(x,y,w,b)
print(f'Number of iterations: {marker}')
print('Final cost/MSE: {:0.3f}'.format(cost))
which gave me these results:
1.9443112664859845,
[ 0.19592532 0.31735225 -0.20044424], -0.9059800816290591
Number of iterations: 68
Final cost/MSE: 0.005
But you're right: I missed that I was dividing by the total length of the vector y rather than by the batch size, and I forgot to accumulate the batch loss!
Thanks for that!
I was trying to build a three-layer neural network, with one neuron per layer, using numpy. One was the input layer, one was the hidden layer, and one was the output layer. The hidden layer had relu activation and the output layer had sigmoid activation. I implemented the gradient descent and mean squared error entirely in numpy, where gradient descent takes an array holding all the data. The data is random numbers where, if the input is more than 18, the output is 1. But the problem is that the weights and biases are initialized to a constant (2 in the code below), and after any number of epochs the weights and biases won't change, but the loss will decrease. The structure, if it helps: (x)---((y1 = w1 * x + b1)|(z1 = relu(y1)))----((y2 = w2 * z1 + b2)|(z2 = sigmoid(y2))). The code:
import numpy as np
import math
import random as r

# dataset generation
x = []
y = []
for i in range(100):
    age = r.randint(0, 80)
    x.append(age)
    if age > 18:
        y.append(1)
    else:
        y.append(0)

# forward pass functions
def sig(x):
    y = []
    for i in x:
        y.append(1/(1+math.exp(-i)))
    return y

def relu(x):
    y = []
    for i in x:
        if i > 0:
            y.append(i)
        else:
            y.append(0)
    return y

def mse(ytrue, ypred):
    total = 0
    for yt, yp in zip(ytrue, ypred):
        total += (yt - yp)**2
    return total/len(ytrue)

# backward pass functions
def dmse(ytrue, ypred):
    total = []
    for yt, yp in zip(ytrue, ypred):
        total.append((yt - yp)*2)
    return total

def d_of_relu(x):
    y = []
    for i in x:
        if i > 0:
            y.append(1)
        else:
            y.append(0)
    return y

def d_of_sig(x):
    y = []
    for i in y:
        y.append(y*(1-y))
    return y

# additional functions
def y_calculation(x, w, b):
    y = []
    for i in x:
        y.append(i*w+b)
    return y

def chain_addition_second_layer_with_weight(dcdz2, dz2dy2, dy2dw2):
    total = 0
    for a, b in zip(dcdz2, dz2dy2):
        total += a*b*dy2dw2
    return total/len(dcdz2)

def chain_addition_second_layer_with_bias(dcdz2, dz2dy2):
    total = 0
    for a, b in zip(dcdz2, dz2dy2):
        total += a*b*1
    return total/len(dcdz2)

def chain_addition_first_layer_with_bias(dcdz2, dz2dy2, dy2dz1, dz1dy1):
    total = 0
    for a, b, d in zip(dcdz2, dz2dy2, dz1dy1):
        total += a*b*dy2dz1*d*1
    return total/len(dcdz2)

def chain_addition_first_layer_with_weight(dcdz2, dz2dy2, dy2dz1, dz1dy1, dy1dw1):
    total = 0
    for a, b, d in zip(dcdz2, dz2dy2, dz1dy1):
        total += a*b*dy2dz1*d*dy1dw1
    return total/len(dcdz2)

# actual gradient descent
def gradient_descend(x, y, lr, epoch):
    w1 = w2 = b1 = b2 = 2
    for i in range(epoch):
        y1 = y_calculation(x, w1, b1)
        z1 = relu(y1)
        y2 = y_calculation(z1, w2, b2)
        z2 = sig(y2)
        cost = mse(y, z2)
        dcdz2 = dmse(y, z2)
        dz2dy2 = d_of_sig(z2)
        dy2dw2 = z1
        dy2db2 = 1
        dy2dz1 = w2
        dz1dy1 = d_of_relu(z1)
        dy1dw1 = x
        w2 = w2 - lr * chain_addition_second_layer_with_weight(dcdz2, dz2dy2, dy2dw2)
        b2 = b2 - lr * chain_addition_second_layer_with_bias(dcdz2, dz2dy2)
        w1 = w1 - lr * chain_addition_first_layer_with_weight(dcdz2, dz2dy2, dy2dz1, dz1dy1, dy1dw1)
        b1 = b1 - lr * chain_addition_first_layer_with_bias(dcdz2, dz2dy2, dy2dz1, dz1dy1)
    return w2, b2, w1, b1, cost

print(gradient_descend(x, y, 0.1, 300))
If anyone can reply with the necessary changes, it will be very helpful. Thanks in advance.
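One likely culprit, for anyone comparing against the code above: d_of_sig loops over its own freshly created empty list y instead of its input x, so it always returns [], and every chained gradient sum then stays 0, which is exactly the "weights never change" symptom. A minimal sketch of the intended version (assuming the caller keeps passing the already-sigmoided z2, so the z*(1-z) form applies directly):

def d_of_sig(x):
    # derivative of the sigmoid, evaluated on values that are
    # already sigmoid outputs: if z = sig(y), then dz/dy = z * (1 - z)
    y = []
    for i in x:
        y.append(i * (1 - i))
    return y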
This is how I generated the training data for my Linear Regression.
!pip install grapher numpy
from grapher import Grapher
import matplotlib.pyplot as plt
import numpy as np
# Secret: y = 3x + 4
# x, y = [float(row[0]) for row in rows], [float(row[5]) for row in rows]
x, y = [a for a in range(-20, 20)], [3*a + 4 for a in range(-20, 20)]
g = Grapher(['3*x + 4'], title="y = 3x+4")
plt.scatter(x, y)
g.plot()
Then, I tried gradient descent on a simple quadratic function (x - 7)^2
def n(x):
    return (x-7)**2

cur_x = 0
lr = 0.001
ittr = 10000
n = 0  # note: this rebinds n, shadowing the function above (which is never called)
prev_x = -1
max_precision = 0.0000001
precision = 1
while n < ittr and precision > max_precision:
    prev_x = cur_x
    cur_x = cur_x - lr * (2*(cur_x - 7))  # derivative of (x-7)**2 is 2*(x-7)
    precision = abs(prev_x - cur_x)
    n += 1
    if n % 100 == 0:
        print(n, ':')
        print(cur_x)
        print()
print(cur_x)
And this works perfectly.
Then I made a Linear Regression class to make the same thing happen.
class LinearRegression:
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y
        self.m = 1
        self.c = 0
        self.learning_rate = 0.01
        self.max_precision = 0.000001
        self.itter = 10000

    def h(self, x, m, c):
        return m * x + c

    def J(self, m, c):
        loss = 0
        for x in self.X:
            loss += (self.h(x, m, c) - self.Y[self.X.index(x)])**2
        return loss/2

    def calc_loss(self):
        return self.J(self.m, self.c)

    def guess_answer(self, step=1):
        losses = []
        mcvalues = []
        for m in np.arange(-10, 10, step):
            for c in np.arange(-10, 10, step):
                mcvalues.append((m, c))
                losses.append(self.J(m, c))
        minloss = sorted(losses)[0]
        return mcvalues[losses.index(minloss)]

    def gradient_decent(self):
        print('Orignal: ', self.m, self.c)
        nm = 0
        nc = 0
        prev_m = 0
        perv_c = -1
        mprecision = 1
        cprecision = 1
        while nm < self.itter and mprecision > self.max_precision:
            prev_m = self.m
            nm += 1
            self.m = self.m - self.learning_rate * sum([(self.h(x, self.m, self.c) - self.Y[self.X.index(x)])*x for x in self.X])
            mprecision = abs(self.m - prev_m)
        return self.m, self.c

    def graph_loss(self):
        plt.scatter(0, self.J(0))
        print(self.J(0))
        plt.plot(self.X, [self.J(x) for x in self.X])

    def check_loss(self):
        plt.plot([m for m in range(-20, 20)], [self.J(m, 0) for m in range(-20, 20)])
        x1 = 10
        y1 = self.J(x1, 0)
        l = sum([(self.h(x, x1, self.c) - self.Y[self.X.index(x)])*x for x in self.X])
        print(l)
        plt.plot([m for m in range(-20, 20)], [(l*(m - x1)) + y1 for m in range(-20, 20)])
        plt.scatter([x1], [y1])
LinearRegression(x, y).gradient_decent()
Output is
Orignal: 1 0
(nan, 0)
Then I tried graphing my loss function J(m, c) and used its derivative to check whether it actually gives the slope. I suspected that I had messed up my d(J(m, c))/dm.
After running LinearRegression(x, y).check_loss()
I get this graph
The derivative gives the correct slope at whatever point I pick. Why isn't it working in my code?
Now I see that the main problem is the learning rate. A learning rate of 0.01 is too high; keeping it below about 0.00035 works well, and around 0.0002 converges well and quickly. I tried graphing things and saw that it made a lot of difference.
With a learning rate of 0.00035 and 1000 iterations, this was the graph:
With a learning rate of 0.0002 and 1000 iterations, this was the graph:
With a learning rate of 0.0004 and just 10 iterations, this was the graph:
Instead of converging to the minimum, it's diverging. That is why the learning rate is important: anything bigger than about 0.0004 diverges the same way.
It took me quite some time to figure out.
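For what it's worth, that empirical cutoff matches theory. On a quadratic loss, the update m = m - lr * sum(x*(m*x + c - y)) diverges once lr exceeds 2 divided by the curvature sum(x**2), and for x in range(-20, 20) that threshold is about 0.00037, right where the graphs above flip from converging to diverging:

xs = range(-20, 20)
curvature = sum(x * x for x in xs)  # second derivative of J(m, c) with respect to m
print(curvature, 2 / curvature)     # 5340 0.000374...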
I am a newbie to neural networks, and I am trying to implement one with Python/NumPy from the code I found at:
"Create a Simple Neural Network in Python from Scratch"
My input array is:
array([[5.71, 5.77, 5.94],
[5.77, 5.94, 5.51],
[5.94, 5.51, 5.88],
[5.51, 5.88, 5.73]])
Output array is:
array([[5.51],
[5.88],
[5.73],
[6.41]])
After running the code, I see the following results, which are not correct:
synaptic_weights after training
[[1.90625275]
[2.54867698]
[1.07698312]]
outputs after training
[[1.]
[1.]
[1.]
[1.]]
Here is the core of the code:
for iteration in range(1000):
    input_layer = tr_input
    outputs = sigmoid(np.dot(input_layer, synapic_weights))
    error = tr_output - outputs
    adjustmnets = error * sigmoid_derivative(outputs)
    synapic_weights += np.dot(input_layer.T, adjustmnets)
print('synaptic_weights after training')
print(synapic_weights)
print('outputs after training')
print(outputs)
What should I change in this code so that it works for my data? Or should I take a different approach? Any help is highly appreciated.
That's because you are using the wrong activation function (i.e. sigmoid). The main reason we use the sigmoid function is that it outputs values between 0 and 1, which makes it especially suited to models that have to predict a probability. Since a probability only exists in the range 0 to 1, sigmoid is the right choice there, but your targets are not probabilities.
If you want to train a model to predict the values in your array, you should use a regression model. Otherwise, you can convert your outputs into labels (for example, map 5.x to 0 and 6.x to 1) and retrain your model.
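For instance, the label-conversion route could look like this (a sketch, assuming the 4x1 output array from the question):

import numpy as np

tr_output = np.array([[5.51],
                      [5.88],
                      [5.73],
                      [6.41]])
# map 5.x values to 0 and 6.x values to 1
labels = (tr_output >= 6.0).astype(float)
print(labels)  # [[0.] [0.] [0.] [1.]]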
These are the steps involved in my neural network implementation.
Randomly initialize weights (θ, theta)
Implement forward propagation
Compute cost function
Implement back propagation to compute partial derivative
Use gradient descent
def forward_prop(X, theta_list):
    m = X.shape[0]
    a_list = []
    z_list = []
    a_list.append(np.insert(X, 0, values=np.ones(m), axis=1))
    idx = 0
    for idx, theta in enumerate(theta_list):
        z_list.append(a_list[idx] * (theta_list[idx].T))
        if idx != (len(theta_list)-1):
            a_list.append(np.insert(sigmoid(z_list[idx]), 0, values=np.ones(m), axis=1))
        else:
            a_list.append(sigmoid(z_list[idx]))
    return a_list, z_list
def back_prop(params, input_size, hidden_layers, num_labels, X, y, regularization, regularize):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    theta_list = []
    startCount = 0
    idx = 0
    for idx, val in enumerate(hidden_layers):
        if idx == 0:
            startCount = val * (input_size + 1)
            theta_list.append(np.matrix(np.reshape(params[:startCount], (val, (input_size + 1)))))
        if idx != 0:
            tempCount = startCount
            startCount += (val * (hidden_layers[idx-1] + 1))
            theta_list.append(np.matrix(np.reshape(params[tempCount:startCount], (val, (hidden_layers[idx-1] + 1)))))
        if idx == (len(hidden_layers)-1):
            theta_list.append(np.matrix(np.reshape(params[startCount:], (num_labels, (val + 1)))))
    a_list, z_list = forward_prop(X, theta_list)
    J = cost(X, y, a_list[len(a_list)-1], theta_list, regularization, regularize)
    d_list = []
    d_list.append(a_list[len(a_list)-1] - y)
    idx = 0
    while idx < (len(theta_list)-1):
        d_temp = np.multiply(d_list[idx] * theta_list[len(a_list) - 2 - idx], sigmoid_gradient(a_list[len(a_list) - 2 - idx]))
        d_list.append(d_temp[:,1:])
        idx += 1
    delta_list = []
    for theta in theta_list:
        delta_list.append(np.zeros(theta.shape))
    for idx, delta in enumerate(delta_list):
        delta_list[idx] = delta_list[idx] + ((d_list[len(d_list) - 1 - idx].T) * a_list[idx])
        delta_list[idx] = delta_list[idx] / m
    if regularize:
        for idx, delta in enumerate(delta_list):
            delta_list[idx][:, 1:] = delta_list[idx][:, 1:] + (theta_list[idx][:, 1:] * regularization)
    grad_list = np.ravel(delta_list[0])
    idx = 1
    while idx < (len(delta_list)):
        grad_list = np.concatenate((grad_list, np.ravel(delta_list[idx])), axis=None)
        idx += 1
    return J, grad_list
def cost(X, y, h, theta_list, regularization, regularize):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    J = (np.multiply(-y, np.log(h)) - np.multiply((1 - y), np.log(1 - h))).sum() / m
    if regularize:
        regularization_value = 0.0
        for theta in theta_list:
            regularization_value += np.sum(np.power(theta[:, 1:], 2))
        J += (float(regularization) / (2 * m)) * regularization_value
    return J
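For reference, the cost computed above is the standard regularized cross-entropy, where h is the final-layer activation, λ is the regularization argument, and the bias column of each θ matrix is excluded from the penalty:

$$J = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h^{(i)} - \left(1 - y^{(i)}\right)\log\left(1 - h^{(i)}\right)\right] + \frac{\lambda}{2m}\sum_{l}\sum_{i,\,j\geq 1}\left(\theta^{(l)}_{i,j}\right)^{2}$$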
Implementation
What I am trying to do is simulate an American (stock) option with Monte Carlo and price it with TensorFlow.
I use two helper functions: get_continuation_function, which creates the TF operators, and pricing_function, which creates the computational graph for the pricing.
The npv operator is the sum of the optimal exercise decisions. At each exercise date I check whether the exercise value is greater than the predicted continuation value (that is, whether it is optimal to exercise, provided the option is in the money).
The actual pricing function is american_tf. I execute the function to create the paths and the exercise values for the training paths. Then I iterate backward through the training_functions and learn the value and decision on each exercise date.
def get_continuation_function():
    X = tf.placeholder(tf.float32, (None, 1), name="X")
    y = tf.placeholder(tf.float32, (None, 1), name="y")
    w = tf.Variable(tf.random_uniform((1, 1)) * 0.1, name="w")
    b = tf.Variable(initial_value=tf.ones(1) * 1, name="b")
    y_hat = tf.add(tf.matmul(X, w), b)
    pre_error = tf.pow(y - y_hat, 2)
    error = tf.reduce_mean(pre_error)
    train = tf.train.AdamOptimizer(0.1).minimize(error)
    return (X, y, train, w, b, y_hat)

def pricing_function(number_call_dates):
    S = tf.placeholder(tf.float32, name="S")
    # first exercise date
    dts = tf.placeholder(tf.float32, name="dts")
    # second exercise date
    K = tf.placeholder(tf.float32, name="K")
    r = tf.placeholder(tf.float32, name="r")
    sigma = tf.placeholder(tf.float32, name="sigma")
    dW = tf.placeholder(tf.float32, name="dW")
    S_t = S * tf.cumprod(tf.exp((r - sigma**2/2) * dts + sigma * tf.sqrt(dts) * dW), axis=1)
    E_t = tf.exp(-r * tf.cumsum(dts)) * tf.maximum(K - S_t, 0)
    continuationValues = []
    training_functions = []
    previous_exersies = 0
    npv = 0
    for i in range(number_call_dates - 1):
        (input_x, input_y, train, w, b, y_hat) = get_continuation_function()
        training_functions.append((input_x, input_y, train, w, b, y_hat))
        X = tf.keras.activations.relu(S_t[:, i])
        contValue = tf.add(tf.matmul(X, w), b)
        continuationValues.append(contValue)
        inMoney = tf.cast(tf.greater(E_t[:, i], 0.), tf.float32)
        exercise = tf.cast(tf.greater(E_t[:, i], contValue[:, 0]), tf.float32) * inMoney * (1 - previous_exersies)
        previous_exersies += exercise
        npv += exercise * E_t[:, i]
    # last exercise date
    inMoney = tf.cast(tf.greater(E_t[:, -1], 0.), tf.float32)
    exercise = inMoney * (1 - previous_exersies)
    npv += exercise * E_t[:, -1]
    npv = tf.reduce_mean(npv)
    return [S, dts, K, r, sigma, dW, S_t, E_t, npv, training_functions]

def american_tf(S_0, strike, M, impliedvol, riskfree_r, random_train, random_pricing):
    n_exercise = len(M)
    with tf.Session() as sess:
        S, dts, K, r, sigma, dW, S_t, E_t, npv, training_functions = pricing_function(n_exercise)
        sess.run(tf.global_variables_initializer())
        paths, exercise_values = sess.run([S_t, E_t], {
            S: S_0,
            dts: M,
            K: strike,
            r: riskfree_r,
            sigma: impliedvol,
            dW: random_train
        })
        for i in range(n_exercise - 1)[::-1]:
            (input_x, input_y, train, w, b, y_hat) = training_functions[i]
            y = exercise_values[:, i+1:i+2]
            X = paths[:, i]
            print(input_x.shape)
            print((exercise_values[:, i] > 0).shape)
            for epochs in range(100):
                _ = sess.run(train, {input_x: X[exercise_values[:, i] > 0],
                                     input_y: y[exercise_values[:, i] > 0]})
            cont_value = sess.run(y_hat, {input_x: X, input_y: y})
            exercise_values[:, i+1:i+2] = np.maximum(exercise_values[:, i+1:i+2], cont_value)
        npv = sess.run(npv, {S: S_0, K: strike, r: riskfree_r, sigma: impliedvol, dW: N_pricing})
    return npv

N_samples_learn = 1000
N_samples_pricing = 1000
calldates = 12
N = np.random.randn(N_samples_learn, calldates)
N_pricing = np.random.randn(N_samples_pricing, calldates)
american_tf(100., 90., [1.]*calldates, 0.25, 0.05, N, N_pricing)
calldates is the number of exercise steps
training sample size = 1000
pricing sample size = 1000
But the error I get is very weird:
---> 23 nput_y:y[exercise_values[:,i]>0]})
ValueError: Cannot feed value of shape (358,) for Tensor 'Placeholder_441:0', which has shape '(?, 1)'
There are a bunch of things discussed in the comments with @hallo12. I just want to upload a working version incorporating all the changes. The code is tested and runs without error, but to make sure the final training output is correct, you may want to compare against some benchmark.
General comment: it's good to separate the variable and time dimensions in this type of application, especially when you only have one variable. For example, your input array should be 3D, with
[time, training sample, input variable]
rather than 2D with [training sample, time]. This way, when you iterate over the time dimension, the rest of the dimensions are kept unchanged.
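A sketch of what that reshape means in numpy terms (sizes made up to match the thread):

import numpy as np

n_steps, n_paths, n_vars = 12, 1000, 1
paths_2d = np.random.randn(n_paths, n_steps)             # [training sample, time]
paths_3d = paths_2d.T.reshape(n_steps, n_paths, n_vars)  # [time, training sample, input variable]

# iterating over time now leaves the other dimensions untouched
for cross_section in paths_3d:
    assert cross_section.shape == (n_paths, n_vars)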
import tensorflow as tf
import numpy as np

def get_continuation_function():
    X = tf.placeholder(tf.float32, (None, 1), name="X")
    y = tf.placeholder(tf.float32, (None, 1), name="y")
    w = tf.Variable(tf.random_uniform((1, 1)) * 0.1, name="w")
    b = tf.Variable(initial_value=tf.ones(1) * 1, name="b")
    y_hat = tf.add(tf.matmul(X, w), b)
    pre_error = tf.pow(y - y_hat, 2)
    error = tf.reduce_mean(pre_error)
    train = tf.train.AdamOptimizer(0.1).minimize(error)
    return (X, y, train, w, b, y_hat)

def pricing_function(number_call_dates):
    S = tf.placeholder(tf.float32, name="S")
    # first exercise date
    dts = tf.placeholder(tf.float32, name="dts")
    # second exercise date
    K = tf.placeholder(tf.float32, name="K")
    r = tf.placeholder(tf.float32, name="r")
    sigma = tf.placeholder(tf.float32, name="sigma")
    dW = tf.placeholder(tf.float32, name="dW")
    S_t = S * tf.cumprod(tf.exp((r - sigma**2/2) * dts + sigma * tf.sqrt(dts) * dW), axis=1)
    E_t = tf.exp(-r * tf.cumsum(dts)) * tf.maximum(K - S_t, 0)
    continuationValues = []
    training_functions = []
    previous_exersies = 0
    npv = 0
    for i in range(number_call_dates - 1):
        (input_x, input_y, train, w, b, y_hat) = get_continuation_function()
        training_functions.append((input_x, input_y, train, w, b, y_hat))
        X = tf.keras.activations.relu(S_t[:, i:i+1])
        contValue = tf.add(tf.matmul(X, w), b)
        continuationValues.append(contValue)
        inMoney = tf.cast(tf.greater(E_t[:, i], 0.), tf.float32)
        exercise = tf.cast(tf.greater(E_t[:, i], contValue[:, 0]), tf.float32) * inMoney * (1 - previous_exersies)
        previous_exersies += exercise
        npv += exercise * E_t[:, i]
    # last exercise date
    inMoney = tf.cast(tf.greater(E_t[:, -1], 0.), tf.float32)
    exercise = inMoney * (1 - previous_exersies)
    npv += exercise * E_t[:, -1]
    npv = tf.reduce_mean(npv)
    return [S, dts, K, r, sigma, dW, S_t, E_t, npv, training_functions]

def american_tf(S_0, strike, M, impliedvol, riskfree_r, random_train, random_pricing):
    n_exercise = len(M)
    with tf.Session() as sess:
        S, dts, K, r, sigma, dW, S_t, E_t, npv, training_functions = pricing_function(n_exercise)
        sess.run(tf.global_variables_initializer())
        paths, exercise_values = sess.run([S_t, E_t], {
            S: S_0,
            dts: M,
            K: strike,
            r: riskfree_r,
            sigma: impliedvol,
            dW: random_train
        })
        for i in range(n_exercise - 1)[::-1]:
            (input_x, input_y, train, w, b, y_hat) = training_functions[i]
            y = exercise_values[:, i+1:i+2]
            X = paths[:, i]
            print(input_x.shape)
            print((exercise_values[:, i] > 0).shape)
            for epochs in range(100):
                _ = sess.run(train, {input_x: (X[exercise_values[:, i] > 0]).reshape(len(X[exercise_values[:, i] > 0]), 1),
                                     input_y: (y[exercise_values[:, i] > 0]).reshape(len(y[exercise_values[:, i] > 0]), 1)})
            cont_value = sess.run(y_hat, {input_x: X.reshape(len(X), 1), input_y: y.reshape(len(y), 1)})
            exercise_values[:, i+1:i+2] = np.maximum(exercise_values[:, i+1:i+2], cont_value)
        npv = sess.run(npv, {S: S_0, K: strike, dts: M, r: riskfree_r, sigma: impliedvol, dW: N_pricing})
    return npv

N_samples_learn = 1000
N_samples_pricing = 1000
calldates = 12
N = np.random.randn(N_samples_learn, calldates)
N_pricing = np.random.randn(N_samples_pricing, calldates)
print(american_tf(100., 90., [1.]*calldates, 0.25, 0.05, N, N_pricing))
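As one possible benchmark (my suggestion, not from the thread): the Black-Scholes price of the corresponding European put is a lower bound on the American put price, so the NPV printed above should come out at or above it. A sketch, assuming scipy is available:

import numpy as np
from scipy.stats import norm

def bs_european_put(S, K, T, r, sigma):
    # standard Black-Scholes European put
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return K * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

# the thread's parameters: S=100, K=90, twelve steps of dt=1, sigma=0.25, r=0.05
print(bs_european_put(100., 90., 12., 0.05, 0.25))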
I am trying to code logistic regression from scratch. In the code I have, I thought my cost derivative was my regularization, but I've been tasked with adding L1 norm regularization. How do you add this in Python? Should it be added where I have defined the cost derivative? Any help in the right direction is appreciated.
def Sigmoid(z):
    return 1/(1 + np.exp(-z))

def Hypothesis(theta, X):
    return Sigmoid(X @ theta)

def Cost_Function(X, Y, theta, m):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
    return J

def Cost_Function_Derivative(X, Y, theta, m, alpha):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X, Y, theta, m, alpha):
    new_theta = theta - Cost_Function_Derivative(X, Y, theta, m, alpha)
    return new_theta

def Accuracy(theta):
    correct = 0
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length)*100
    print('LR Accuracy: ', my_accuracy, "%")

def Logistic_Regression(X, Y, alpha, theta, num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X, Y, theta, m, alpha)
        theta = new_theta
        if x % 100 == 0:
            # print('theta: ', theta)
            # print('cost: ', Cost_Function(X, Y, theta, m))
            pass
    Accuracy(theta)

ep = .012
initial_theta = np.random.rand(X_train.shape[1], 1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train, Y_train, alpha, initial_theta, iterations)
Regularization adds a term to the cost function so that there is a compromise between minimizing the cost and keeping the model parameters small, which reduces overfitting. You can control how much of a compromise you want with a scalar e on the regularization term.
So just add the L1 norm of theta to the original cost function:
J = J + e * np.sum(abs(theta))
Since this term is added to the cost function, it should also be considered when computing the gradient of the cost function.
This is simple, since the derivative of a sum is the sum of the derivatives. So we just need to figure out the derivative of the term sum(abs(theta)). Since it is piecewise linear, the derivative is constant: +1 if theta >= 0 and -1 if theta < 0 (note there is a mathematical indeterminacy at 0, but we don't care about it).
So in the function Cost_Function_Derivative we add:
J = J + alpha * e * np.where(theta >= 0, 1.0, -1.0)  # +1 for theta >= 0, -1 for theta < 0
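Putting it together, a sketch of the modified function (assuming e is passed in as an extra argument; the rest matches the code from the question):

def Cost_Function_Derivative(X, Y, theta, m, alpha, e):
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    # L1 subgradient: +1 where theta >= 0, -1 where theta < 0
    J = J + alpha * e * np.where(theta >= 0, 1.0, -1.0)
    return J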