Three-layer neural network for MNIST with Python

I'm currently writing my own code to implement a single-hidden-layer neural network and test the model on the MNIST dataset. But I got a weird result (the NLL is unacceptably high), even though I have checked my code for over 2 days without finding what went wrong.
Here are the global parameters:
layers = np.array([784, 300, 10])
learningRate = 0.01
momentum = 0.01
batch_size = 10000
num_of_batch = len(train_label)//batch_size
nepoch = 30
Softmax function definition:
def softmax(x):
    x = np.exp(x)
    x_sum = np.sum(x,axis=1) #shape = (nsamples,)
    for row_idx in range(len(x)):
        x[row_idx,:] /= x_sum[row_idx]
    return x
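The per-row loop can be replaced with broadcasting, and subtracting each row's maximum before exponentiating keeps `np.exp` from overflowing on large inputs without changing the result. A minimal sketch of that variant (a drop-in alternative, not by itself a fix for the high NLL):

```python
import numpy as np

def softmax(x):
    """Row-wise softmax for a (nsamples, nclasses) array."""
    # Subtracting the row max leaves the result unchanged mathematically
    # but keeps np.exp from overflowing for large logits.
    shifted = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(shifted)
    # keepdims=True lets the (nsamples, 1) row sums broadcast across the columns,
    # which replaces the explicit loop over rows.
    return e / np.sum(e, axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # the second row overflows without the shift
probs = softmax(logits)
print(probs.sum(axis=1))
```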
Sigmoid function definition:
def f(x):
    return 1.0/(1+np.exp(-x))
Initialize w and b:
k = np.vectorize(math.sqrt)(layers[0:-2]*layers[1:])
w1 = np.random.uniform(-0.5, 0.5, layers[0:2][::-1])
b1 = np.random.uniform(-0.5, 0.5, (1,layers[1]))
w2 = np.random.uniform(-0.5, 0.5, layers[1:3][::-1])
b2 = np.random.uniform(-0.5, 0.5, (1,layers[2]))
And the following is the core part for each mini-batch:
for idx in range(num_of_batch):
    # forward_vectorized
    x = train_set[idx*batch_size:(idx+1)*batch_size,:]
    y = Y[idx*batch_size:(idx+1)*batch_size,:]
    a1 = x
    a2 = f(np.dot(np.insert(a1,0,1,axis=1),np.insert(w1,0,b1,axis=1).T))
    a3 = softmax(np.dot(np.insert(a2,0,1,axis=1),np.insert(w2,0,b2,axis=1).T))
    # compute delta
    d3 = a3-y
    d2 = np.dot(d3,w2)*a2*(1.0-a2)
    # compute grad
    D2 = np.dot(d3.T,a2)
    D1 = np.dot(d2.T,a1)
    # update_parameters
    w1 = w1 - learningRate*(D1/batch_size + momentum*w1)
    b1 = b1 - learningRate*(np.sum(d2,axis=0)/batch_size)
    w2 = w2 - learningRate*(D2/batch_size+ momentum*w2)
    b2 = b2 - learningRate*(np.sum(d3,axis=0)/batch_size)
    e = -np.sum(y*np.log(a3))/batch_size
    err.append(e)
After one epoch (50,000 samples), I got the following sequence of e, which seems too large:
Out[1]:
10000/50000 4.033538
20000/50000 3.924567
30000/50000 3.761105
40000/50000 3.632708
50000/50000 3.549212
I think the back_prop code is correct, but I couldn't find what's going wrong. It has tortured me for over 2 days.
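For reference, predicting a uniform distribution over the 10 classes gives an NLL of ln 10 ≈ 2.303, so values of 3.5 to 4.0 are worse than uniform guessing. One way to test whether the backprop itself is correct is a finite-difference gradient check on a tiny network with the same structure (sigmoid hidden layer, softmax output, cross-entropy averaged over the batch). The sketch below uses small hypothetical sizes and random data, stores biases as separate vectors instead of inserting them with `np.insert`, and leaves out the `momentum*w` term, which in the question's update acts like an L2 weight-decay penalty (despite its name) rather than part of the NLL gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(w1, b1, w2, b2, x, y):
    """Mean cross-entropy; w1 is (hidden, inputs) and w2 is (classes, hidden), as in the question."""
    a2 = sigmoid(x @ w1.T + b1)
    a3 = softmax(a2 @ w2.T + b2)
    return -np.sum(y * np.log(a3)) / len(x)

def backprop(w1, b1, w2, b2, x, y):
    """Analytic gradients built from the same deltas as the question, divided by the batch size."""
    n = len(x)
    a2 = sigmoid(x @ w1.T + b1)
    a3 = softmax(a2 @ w2.T + b2)
    d3 = a3 - y                          # (n, classes)
    d2 = (d3 @ w2) * a2 * (1.0 - a2)     # (n, hidden)
    return (d2.T @ x / n, d2.sum(axis=0) / n,
            d3.T @ a2 / n, d3.sum(axis=0) / n)

def numeric_grad(f, p, eps=1e-6):
    """Central differences; perturbs p in place one entry at a time and restores it."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        plus = f()
        p[idx] = old - eps
        minus = f()
        p[idx] = old
        g[idx] = (plus - minus) / (2 * eps)
    return g

# Tiny hypothetical problem: 5 samples, 4 inputs, 3 hidden units, 2 classes.
x = rng.normal(size=(5, 4))
y = np.eye(2)[rng.integers(0, 2, size=5)]
w1 = rng.uniform(-0.5, 0.5, (3, 4)); b1 = rng.uniform(-0.5, 0.5, 3)
w2 = rng.uniform(-0.5, 0.5, (2, 3)); b2 = rng.uniform(-0.5, 0.5, 2)

analytic = backprop(w1, b1, w2, b2, x, y)
loss = lambda: nll(w1, b1, w2, b2, x, y)   # reads the arrays that numeric_grad perturbs
numeric = [numeric_grad(loss, p) for p in (w1, b1, w2, b2)]
for name, a, n in zip(("w1", "b1", "w2", "b2"), analytic, numeric):
    print(name, np.max(np.abs(a - n)))
```

If the analytic and numeric gradients agree, the backprop formulas are probably not the cause, and other settings may be worth examining, for example that `batch_size = 10000` gives only 50000 / 10000 = 5 parameter updates per epoch at `learningRate = 0.01`.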

Related

Why is there no change in the weights of my NumPy-only neural network?

I was trying to build a three-layer neural network, each layer having one neuron, using NumPy. One was the input layer, one was the hidden layer, and one was the output layer. The hidden layer had ReLU activation and the output layer had sigmoid activation. I tried to write the gradient descent and mean squared error all in NumPy, where the gradient descent takes an array containing all the data as input. The data is random numbers, where the output is 1 if the input is more than 18. The problem is that the weights and biases are initialized to 1, and after any number of epochs the weights and biases don't change, but the loss decreases. The structure, if it helps: (x)---((y1 = w1 * x + b1)|(z1 = relu(y1)))----((y2 = w2 * z1 + b2)|(z2 = sigmoid(y2))). The code:
import numpy as np
import math
import random as r
# dataset generation
x = []
y = []
for i in range (100):
    age = r.randint(0, 80)
    x.append(age)
    if age > 18:
        y.append(1)
    else:
        y.append(0)
# forward pass functions
def sig(x):
    y = []
    for i in x:
        y.append(1/(1+math.exp(-i)))
    return y
def relu(x):
    y = []
    for i in x:
        if i > 0:
            y.append(i)
        else:
            y.append(0)
    return y
def mse(ytrue,ypred):
    total = 0
    for yt,yp in zip(ytrue,ypred):
        total += (yt - yp)**2
    return total/len(ytrue)
# backward pass functions
def dmse(ytrue,ypred):
    total = []
    for yt,yp in zip(ytrue,ypred):
        total.append((yt - yp)*2)
    return total
def d_of_relu(x):
    y = []
    for i in x:
        if i > 0:
            y.append(1)
        else:
            y.append(0)
    return y
def d_of_sig(x):
    y = []
    for i in y:
        y.append(y*(1-y))
    return y
# additional functions
def y_calculation(x,w,b):
    y = []
    for i in x:
        y.append(i*w+b)
    return y
def chain_addition_second_layer_with_weight(dcdz2,dz2dy2,dy2dw2):
    total = 0
    for a,b in zip(dcdz2,dz2dy2):
        total += a*b*dy2dw2
    return total/len(dcdz2)
def chain_addition_second_layer_with_bias(dcdz2,dz2dy2):
    total = 0
    for a,b in zip(dcdz2,dz2dy2):
        total += a*b*1
    return total/len(dcdz2)
def chain_addition_first_layer_with_bias(dcdz2,dz2dy2,dy2dz1,dz1dy1):
    total = 0
    for a,b,d in zip(dcdz2,dz2dy2,dz1dy1):
        total += a*b*dy2dz1*d*1
    return total/len(dcdz2)
def chain_addition_first_layer_with_weight(dcdz2,dz2dy2,dy2dz1,dz1dy1,dy1dw1):
    total = 0
    for a,b,d in zip(dcdz2,dz2dy2,dz1dy1):
        total += a*b*dy2dz1*d*dy1dw1
    return total/len(dcdz2)
# actual gradient descend
def gradient_descend(x,y,lr,epoch):
    w1 = w2 = b1 = b2 = 2
    for i in range(epoch):
        y1 = y_calculation(x,w1,b1)
        z1 = relu(y1)
        y2 = y_calculation(z1,w2,b2)
        z2 = sig(y2)
        cost = mse(y,z2)
        dcdz2 = dmse(y,z2)
        dz2dy2 = d_of_sig(z2)
        dy2dw2 = z1
        dy2db2 = 1
        dy2dz1 = w2
        dz1dy1 = d_of_relu(z1)
        dy1dw1 = x
        w2 = w2 - lr * chain_addition_second_layer_with_weight(dcdz2,dz2dy2,dy2dw2)
        b2 = b2 - lr * chain_addition_second_layer_with_bias(dcdz2,dz2dy2)
        w1 = w1 - lr * chain_addition_first_layer_with_weight(dcdz2,dz2dy2,dy2dz1,dz1dy1,dy1dw1)
        b1 = b1 - lr * chain_addition_first_layer_with_bias(dcdz2,dz2dy2,dy2dz1,dz1dy1)
    return w2,b2,w1,b1,cost
print(gradient_descend(x,y,0.1,300))
If anyone can point out the necessary changes, it would be very helpful. Thanks in advance.
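In the code above, `d_of_sig` loops over `y` (the empty list it has just created) instead of `x`, so it always returns `[]`. Every `zip(...)` over `dz2dy2` is therefore empty, every gradient sum is 0, and the weights never move. Two further problems would show up once that is fixed: `dmse` has the sign flipped (the derivative of (yt - yp)**2 with respect to yp is -2*(yt - yp)), and the weight helpers multiply by the whole list `dy2dw2` or `dy1dw1` instead of the matching per-sample value. Below is a vectorized NumPy sketch of the same 1-1-1 network with those derivatives corrected. It assumes a seeded NumPy generator instead of `random`, and scales ages to [0, 1] (an assumption not in the question) because raw ages up to 80 push the sigmoid into saturation, where its derivative is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy data as the question: the label is 1 when age > 18.
age = rng.integers(0, 81, size=100).astype(float)
y = (age > 18).astype(float)
x = age / 80.0   # assumption: scale inputs to [0, 1] so the sigmoid does not saturate

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gradient_descend(x, y, lr, epochs):
    w1 = w2 = b1 = b2 = 2.0
    for _ in range(epochs):
        # forward pass: (x) -> relu(w1*x + b1) -> sigmoid(w2*z1 + b2)
        y1 = w1 * x + b1
        z1 = np.maximum(y1, 0.0)
        y2 = w2 * z1 + b2
        z2 = sigmoid(y2)
        cost = np.mean((y - z2) ** 2)

        # backward pass, one entry per sample
        dc_dz2 = -2.0 * (y - z2)          # derivative of (y - z2)**2 w.r.t. z2 (note the minus)
        dz2_dy2 = z2 * (1.0 - z2)         # sigmoid derivative, computed from the sigmoid output
        delta2 = dc_dz2 * dz2_dy2         # dC/dy2
        delta1 = delta2 * w2 * (y1 > 0)   # dC/dy1, relu derivative evaluated at y1

        # average over samples; each weight uses its own per-sample input
        w2 -= lr * np.mean(delta2 * z1)
        b2 -= lr * np.mean(delta2)
        w1 -= lr * np.mean(delta1 * x)
        b1 -= lr * np.mean(delta1)
    return w2, b2, w1, b1, cost

print(gradient_descend(x, y, 0.1, 300))
```

Starting every parameter at 2 still leaves the output close to 1 for all inputs, so progress at `lr = 0.1` is slow; a smaller initialization or a larger learning rate would be the next things to try.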

Python library for dot product classification

I have the following Python pseudo-code:
A1 = "101000001111"
A2 = "110000010101"
B1 = "000111010000"
B2 = "000110100000"
# TODO get X = [x1, x2, ..., x12]
assert(A1 * X > .5)
assert(A2 * X > .5)
assert(B1 * X < .5)
assert(B2 * X < .5)
So this is basically a regression-based classification.
0.5 is my threshold, but how do I get X?

You need to find 12 coefficients. You can try to use LogisticRegression or LinearRegression.
Once you have the linear coefficients, you can use np.dot or the @ operator to get a dot product.
Example:
import numpy as np
from sklearn.linear_model import LogisticRegression
A1 = "101000001111"
A2 = "110000010101"
B1 = "000111010000"
B2 = "000110100000"
A1 = np.array(list(A1), np.float32)
A2 = np.array(list(A2), np.float32)
B1 = np.array(list(B1), np.float32)
B2 = np.array(list(B2), np.float32)
X = np.array((A1, A2, B1, B2))
y = np.array([1, 1, 0, 0])
model = LogisticRegression(fit_intercept=False).fit(X, y)
w = model.coef_.flatten()
print(A1.dot(w))
print(A2.dot(w))
print(B1.dot(w))
print(B2.dot(w))
assert A1 @ w > 0.5
assert A2 @ w > 0.5
assert B1 @ w < 0.5
assert B2 @ w < 0.5
Results:
1.7993630995882384
1.5032155788245702
-1.0190643734998346
-1.0385501901808816
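The LinearRegression route can be sketched with plain NumPy least squares. Assuming targets of 1 for the A strings and 0 for the B strings and no intercept, there are 4 equations and 12 unknowns; because the four rows are linearly independent (each has a position where only it has a 1), `np.linalg.lstsq` returns a minimum-norm `w` that hits the targets exactly, so the 0.5 threshold separates them. This only fits the four given strings and says nothing about how new strings would score.

```python
import numpy as np

rows = ["101000001111", "110000010101", "000111010000", "000110100000"]  # A1, A2, B1, B2
M = np.array([[float(c) for c in r] for r in rows])   # 4 x 12 matrix of bits
targets = np.array([1.0, 1.0, 0.0, 0.0])               # assumption: 1 for A rows, 0 for B rows

# Underdetermined system: lstsq returns the minimum-norm least-squares solution.
w, residuals, rank, _ = np.linalg.lstsq(M, targets, rcond=None)
scores = M @ w
print(scores)
```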

Pricing American Stock Option with TensorFlow Neural Network, Simulated by Monte Carlo

What I am trying to do is simulate an American (stock) option with Monte Carlo and use TensorFlow to price it.
I use two helper functions: get_continuation_function to create the TF operators, and pricing_function to create the computational graph for the pricing.
The npv operator is the sum of the discounted payoffs from the optimal exercise decisions, averaged over paths. At each date I check whether the exercise value is greater than the predicted continuation value, and whether the option is in the money.
The actual pricing function is american_tf. I run the graph to create the training paths and their exercise values. Then I iterate backward through training_functions and learn the value and the decision on each exercise date.
def get_continuation_function():
    X = tf.placeholder(tf.float32, (None,1),name="X")
    y = tf.placeholder(tf.float32, (None,1),name="y")
    w = tf.Variable(tf.random_uniform((1,1))*0.1,name="w")
    b = tf.Variable(initial_value = tf.ones(1)*1,name="b")
    y_hat = tf.add(tf.matmul(X, w), b)
    pre_error = tf.pow(y-y_hat,2)
    error = tf.reduce_mean(pre_error)
    train = tf.train.AdamOptimizer(0.1).minimize(error)
    return(X, y, train, w, b, y_hat)
def pricing_function(number_call_dates):
    S = tf.placeholder(tf.float32,name="S")
    # First exercise date
    dts = tf.placeholder(tf.float32,name="dts")
    # 2nd exercise date
    K = tf.placeholder(tf.float32,name="K")
    r = tf.placeholder(tf.float32,name="r")
    sigma = tf.placeholder(tf.float32,name="sigma")
    dW = tf.placeholder(tf.float32,name="dW")
    S_t = S * tf.cumprod(tf.exp((r-sigma**2/2) * dts + sigma * tf.sqrt(dts) * dW), axis=1)
    E_t = tf.exp(-r * tf.cumsum(dts)) * tf.maximum(K-S_t, 0)
    continuationValues = []
    training_functions = []
    previous_exersies = 0
    npv = 0
    for i in range(number_call_dates-1):
        (input_x, input_y, train, w, b, y_hat) = get_continuation_function()
        training_functions.append((input_x, input_y, train, w, b, y_hat))
        X = tf.keras.activations.relu(S_t[:, i])
        contValue = tf.add(tf.matmul(X, w),b)
        continuationValues.append(contValue)
        inMoney = tf.cast(tf.greater(E_t[:,i], 0.), tf.float32)
        exercise = tf.cast(tf.greater(E_t[:,i], contValue[:,0]), tf.float32) * inMoney * (1-previous_exersies)
        previous_exersies += exercise
        npv += exercise*E_t[:,i]
    # Last exercise date
    inMoney = tf.cast(tf.greater(E_t[:,-1], 0.), tf.float32)
    exercise = inMoney * (1-previous_exersies)
    npv += exercise*E_t[:,-1]
    npv = tf.reduce_mean(npv)
    return([S, dts, K, r, sigma,dW, S_t, E_t, npv, training_functions])
def american_tf(S_0, strike, M, impliedvol, riskfree_r, random_train, random_pricing):
    n_exercise = len(M)
    with tf.Session() as sess:
        S,dts,K,r,sigma,dW,S_t,E_t,npv,training_functions = pricing_function(n_exercise)
        sess.run(tf.global_variables_initializer())
        paths, exercise_values = sess.run([S_t,E_t], {
            S: S_0,
            dts: M,
            K: strike,
            r: riskfree_r,
            sigma: impliedvol,
            dW: random_train
        })
        for i in range(n_exercise-1)[::-1]:
            (input_x,input_y,train,w,b,y_hat) = training_functions[i]
            y= exercise_values[:,i+1:i+2]
            X = paths[:,i]
            print(input_x.shape)
            print((exercise_values[:,i]>0).shape)
            for epochs in range(100):
                _ = sess.run(train, {input_x:X[exercise_values[:,i]>0],
                                     input_y:y[exercise_values[:,i]>0]})
            cont_value = sess.run(y_hat, {input_x:X, input_y:y})
            exercise_values[:,i+1:i+2] = np.maximum(exercise_values[:,i+1:i+2], cont_value)
        npv = sess.run(npv, {S: S_0, K: strike, r: riskfree_r, sigma: impliedvol, dW: N_pricing})
        return npv
N_samples_learn = 1000
N_samples_pricing = 1000
calldates = 12
N = np.random.randn(N_samples_learn,calldates)
N_pricing = np.random.randn(N_samples_pricing,calldates)
american_tf(100., 90., [1.]*calldates, 0.25, 0.05, N, N_pricing)
calldates is the number of steps
training sample size = 1000
test sample size = 1000
But I get a very weird error:
---> 23 input_y:y[exercise_values[:,i]>0]})
ValueError: Cannot feed value of shape (358,) for Tensor 'Placeholder_441:0', which has shape '(?, 1)'
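The error comes from array shapes: `X = paths[:,i]` uses an integer index, which drops the time axis, and boolean-masking a 1-D array keeps it 1-D, so the fed value has shape `(358,)` while the placeholder was declared `(None, 1)`. (`y = exercise_values[:,i+1:i+2]` is a slice, so its masked version is already `(k, 1)`.) A NumPy-only sketch of the shapes involved; the random paths and the `< 90.0` mask are hypothetical stand-ins for `paths` and `exercise_values[:,i] > 0`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dates = 1000, 12

# Hypothetical price paths laid out like `paths` in the question: [sample, time].
paths = 100.0 * np.exp(np.cumsum(0.1 * rng.standard_normal((n_samples, n_dates)), axis=1))
in_money = paths[:, 3] < 90.0            # stand-in for exercise_values[:, i] > 0

# Integer indexing drops the time axis; the masked result is 1-D with shape (k,).
x_1d = paths[:, 3][in_money]

# Slicing 3:4 keeps a length-1 column axis, so the masked result has shape (k, 1).
x_2d = paths[:, 3:4][in_money]

# A [time, sample, variable] layout: every time slice is already (n_samples, 1).
paths_3d = paths.T[:, :, np.newaxis]     # shape (12, 1000, 1)
x_t_masked = paths_3d[3][in_money]       # masking rows keeps the trailing axis: (k, 1)
print(x_1d.shape, x_2d.shape, x_t_masked.shape)
```

Either `reshape(-1, 1)` on the masked array or slicing with `i:i+1` before masking produces the `(k, 1)` shape the placeholder expects.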

A number of things were discussed in the comments with @hallo12. I just want to post a working version incorporating all the changes. The code is tested and runs without error, but to make sure the final training output is correct, you may want to compare it against a benchmark.
General comment: It's good to separate the variable and time dimension in this type of application, especially when you only have 1 variable. For example, your input array should be 3D with
[time, training sample, input variable]
rather than 2D with [training sample, time]. This way when you iterate over the time dimension, the rest of the dimensions are kept unchanged.
import tensorflow as tf
import numpy as np
def get_continuation_function():
    X = tf.placeholder(tf.float32, (None,1),name="X")
    y = tf.placeholder(tf.float32, (None,1),name="y")
    w = tf.Variable(tf.random_uniform((1,1))*0.1,name="w")
    b = tf.Variable(initial_value = tf.ones(1)*1,name="b")
    y_hat = tf.add(tf.matmul(X, w), b)
    pre_error = tf.pow(y-y_hat,2)
    error = tf.reduce_mean(pre_error)
    train = tf.train.AdamOptimizer(0.1).minimize(error)
    return(X, y, train, w, b, y_hat)
def pricing_function(number_call_dates):
    S = tf.placeholder(tf.float32,name="S")
    # First exercise date
    dts = tf.placeholder(tf.float32,name="dts")
    # 2nd exercise date
    K = tf.placeholder(tf.float32,name="K")
    r = tf.placeholder(tf.float32,name="r")
    sigma = tf.placeholder(tf.float32,name="sigma")
    dW = tf.placeholder(tf.float32,name="dW")
    S_t = S * tf.cumprod(tf.exp((r-sigma**2/2) * dts + sigma * tf.sqrt(dts) * dW), axis=1)
    E_t = tf.exp(-r * tf.cumsum(dts)) * tf.maximum(K-S_t, 0)
    continuationValues = []
    training_functions = []
    previous_exersies = 0
    npv = 0
    for i in range(number_call_dates-1):
        (input_x, input_y, train, w, b, y_hat) = get_continuation_function()
        training_functions.append((input_x, input_y, train, w, b, y_hat))
        # slice i:i+1 keeps X two-dimensional, (n_samples, 1), for tf.matmul
        X = tf.keras.activations.relu(S_t[:, i:i+1])
        contValue = tf.add(tf.matmul(X, w),b)
        continuationValues.append(contValue)
        inMoney = tf.cast(tf.greater(E_t[:,i], 0.), tf.float32)
        exercise = tf.cast(tf.greater(E_t[:,i], contValue[:,0]), tf.float32) * inMoney * (1-previous_exersies)
        previous_exersies += exercise
        npv += exercise*E_t[:,i]
    # Last exercise date
    inMoney = tf.cast(tf.greater(E_t[:,-1], 0.), tf.float32)
    exercise = inMoney * (1-previous_exersies)
    npv += exercise*E_t[:,-1]
    npv = tf.reduce_mean(npv)
    return([S, dts, K, r, sigma,dW, S_t, E_t, npv, training_functions])
def american_tf(S_0, strike, M, impliedvol, riskfree_r, random_train, random_pricing):
    n_exercise = len(M)
    with tf.Session() as sess:
        S,dts,K,r,sigma,dW,S_t,E_t,npv,training_functions = pricing_function(n_exercise)
        sess.run(tf.global_variables_initializer())
        paths, exercise_values = sess.run([S_t,E_t], {
            S: S_0,
            dts: M,
            K: strike,
            r: riskfree_r,
            sigma: impliedvol,
            dW: random_train
        })
        for i in range(n_exercise-1)[::-1]:
            (input_x,input_y,train,w,b,y_hat) = training_functions[i]
            y= exercise_values[:,i+1:i+2]
            X = paths[:,i]
            print(input_x.shape)
            print((exercise_values[:,i]>0).shape)
            for epochs in range(100):
                # reshape the masked arrays to (k, 1) to match the (None, 1) placeholders
                _ = sess.run(train, {input_x:(X[exercise_values[:,i]>0]).reshape(len(X[exercise_values[:,i]>0]),1),
                                     input_y:(y[exercise_values[:,i]>0]).reshape(len(y[exercise_values[:,i]>0]),1)})
            cont_value = sess.run(y_hat, {input_x:X.reshape(len(X),1), input_y:y.reshape(len(y),1)})
            exercise_values[:,i+1:i+2] = np.maximum(exercise_values[:,i+1:i+2], cont_value)
        # price on the separate pricing paths passed in as random_pricing
        npv = sess.run(npv, {S: S_0, K: strike, dts:M, r: riskfree_r, sigma: impliedvol, dW: random_pricing})
        return npv
N_samples_learn = 1000
N_samples_pricing = 1000
calldates = 12
N = np.random.randn(N_samples_learn,calldates)
N_pricing = np.random.randn(N_samples_pricing,calldates)
print(american_tf(100., 90., [1.]*calldates, 0.25, 0.05, N, N_pricing))
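One possible benchmark for the Monte Carlo value is a Cox-Ross-Rubinstein binomial tree that only allows exercise on the same dates. This sketch assumes the twelve exercise dates are t = 1, ..., 12 (all `dts` equal to 1, the last date being maturity) and the put payoff `max(K - S, 0)` used in `E_t`; `put_crr` is a hypothetical helper name. An exercise policy learned by regression can only be suboptimal on fresh pricing paths, so the Monte Carlo npv should not exceed the Bermudan tree value by more than Monte Carlo noise plus the tree's discretization error. The European and fully American tree values are printed for comparison.

```python
import numpy as np

def put_crr(S0, K, r, sigma, T, n_steps, exercise_steps=None):
    """Put value on a Cox-Ross-Rubinstein binomial tree.

    exercise_steps=None: exercise allowed at every step (American).
    exercise_steps=set of step indices: exercise only at those steps (Bermudan).
    exercise_steps=set(): no early exercise (European).
    """
    dt = T / n_steps
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)        # risk-neutral probability of an up move
    disc = np.exp(-r * dt)
    j = np.arange(n_steps + 1)                # number of down moves at each terminal node
    V = np.maximum(K - S0 * u ** (n_steps - j) * d ** j, 0.0)   # payoff at maturity
    for step in range(n_steps - 1, -1, -1):
        V = disc * (p * V[:-1] + (1.0 - p) * V[1:])             # discounted continuation value
        if exercise_steps is None or step in exercise_steps:
            j = np.arange(step + 1)
            S = S0 * u ** (step - j) * d ** j
            V = np.maximum(V, K - S)                             # exercise when it pays more
    return float(V[0])

# Parameters from the call american_tf(100., 90., [1.]*12, 0.25, 0.05, ...)
S0, K, sigma, r = 100.0, 90.0, 0.25, 0.05
n_dates, steps_per_date = 12, 100
T = float(n_dates)                             # all dts are 1, so the last date is t = 12
n_steps = n_dates * steps_per_date
dates = {k * steps_per_date for k in range(1, n_dates + 1)}

european = put_crr(S0, K, r, sigma, T, n_steps, exercise_steps=set())
bermudan = put_crr(S0, K, r, sigma, T, n_steps, exercise_steps=dates)
american = put_crr(S0, K, r, sigma, T, n_steps)
print(f"European {european:.4f}  Bermudan {bermudan:.4f}  American {american:.4f}")
```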

Neural Network XOR with numpy not converging

I have trained a Neural Net to solve the XOR problem. The problem with my network is that it is not converging. I am using Andrew Ng's methods and notations as taught in the DeepLearning.ai course.
Here's the code:
from __future__ import print_function
import numpy as np
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0, 1, 1, 0]])
np.random.seed(1)
W1 = np.random.randn(3, 2) * 0.0001
b1 = np.ones((3, 1))
W2 = np.random.randn(1, 3) * 0.0001
b2 = np.ones((1, 1))
The next part is the backpropagation:
learning_rate = 0.01
m = 4
for iteration in range(100000):
    # forward propagation
    # layer1
    Z1 = np.dot(W1, X.T) + b1
    A1 = sigmoid(Z1)
    # layer2
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # backpropagation
    dZ2 = Y - A2
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(dW2.T, dZ2) * sigmoid_gradient(Z1)
    dW1 = (1 / m) * np.dot(dZ1, X)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    # checking if shapes are correctly preserved
    assert (dZ2.shape == Z2.shape)
    assert (dW2.shape == W2.shape)
    assert (db2.shape == b2.shape)
    assert (dZ1.shape == Z1.shape)
    assert (dW1.shape == W1.shape)
    assert (db1.shape == b1.shape)
    # update parameters
    W1 = W1 + learning_rate * dW1
    W2 = W2 + learning_rate * dW2
    b1 = b1 + learning_rate * db1
    b2 = b2 + learning_rate * db2
    # print every 10k
    if (iteration % 10000 == 0):
        print(A2)

You have made a couple of mistakes in your code, for example in computing W2.
...
dZ2 = Y - A2
dW2 = (1 / m) * np.dot(dZ2, A1.T)
...
W2 = W2 + learning_rate * dW2
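We want to calculate the derivative of the cost with respect to W2 using the chain rule.
We can write the derivatives as follows:
$$\frac{\partial C}{\partial W_2} = \frac{\partial C}{\partial A_2}\,\frac{\partial A_2}{\partial Z_2}\,\frac{\partial Z_2}{\partial W_2}$$
You haven't implemented the middle part, which computes the derivative with respect to Z2.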
You can check out this video, it explains the math part of backpropagation. Moreover, you can check out this simple implementation of the neural network.
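A minimal sketch of the same 2-3-1 network, assuming a mean-squared-error cost C = 1/(2m) * sum((A2 - Y)**2), so the middle factor dA2/dZ2 = sigmoid'(Z2) appears explicitly. `sigmoid` and `sigmoid_gradient` are used but not shown in the question, so they are defined here. The sketch also propagates the error to the hidden layer through the weights `W2.T` (the question's loop uses `dW2.T`) and steps against the gradient of the cost. The learning rate, initialization, and iteration count are untuned assumptions, and a network this small can still stall with an unlucky initialization, so check the printed predictions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    s = sigmoid(z)
    return s * (1.0 - s)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 1, 1, 0]], dtype=float)
m = X.shape[0]

def forward(W1, b1, W2, b2):
    Z1 = np.dot(W1, X.T) + b1        # (3, 4)
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2         # (1, 4)
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

def cost(W1, b1, W2, b2):
    # assumed cost: C = 1/(2m) * sum((A2 - Y)**2)
    A2 = forward(W1, b1, W2, b2)[3]
    return np.sum((A2 - Y) ** 2) / (2 * m)

def gradients(W1, b1, W2, b2):
    Z1, A1, Z2, A2 = forward(W1, b1, W2, b2)
    # dC/dZ2 = dC/dA2 * dA2/dZ2; the second factor is the "middle part"
    dZ2 = (A2 - Y) * sigmoid_gradient(Z2)
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    # the error flows back through the weights W2, not through the gradient dW2
    dZ1 = np.dot(W2.T, dZ2) * sigmoid_gradient(Z1)
    dW1 = (1 / m) * np.dot(dZ1, X)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((3, 2)), np.zeros((3, 1))
W2, b2 = rng.standard_normal((1, 3)), np.zeros((1, 1))
learning_rate = 1.0                  # assumption; with 0.01 progress can be very slow

initial_cost = cost(W1, b1, W2, b2)
for iteration in range(20000):
    dW1, db1, dW2, db2 = gradients(W1, b1, W2, b2)
    # gradient descent: step against the gradient of the cost
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
print("cost:", initial_cost, "->", cost(W1, b1, W2, b2))
print("predictions:", forward(W1, b1, W2, b2)[3].round(3))
```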

Wrong values for partial derivatives in neural network python

I am implementing a simple neural network classifier for the iris dataset. The NN has 3 input nodes, 1 hidden layer with two nodes, and 3 output nodes. I have implemented everything, but the values of the partial derivatives are not calculated correctly. I have exhausted myself looking for the solution but couldn't find it.
Here is my code for calculating the partial derivatives.
def derivative_cost_function(self,X,Y,thetas):
    '''
    Computes the derivatives of the cost function w.r.t. the input parameters (thetas)
    for the given input and labels.
    Input:
    ------
    X: can be either a single d X n-dimensional vector or d X n dimensional matrix of inputs
    thetas: must be a dk X 1-dimensional vector representing the vectors of k classes
    Y: Must be k X n-dimensional label vector
    Returns:
    ------
    partial_thetas: a dk X 1-dimensional vector of partial derivatives of cost function w.r.t parameters..
    '''
    #forward pass
    a2, a3=self.forward_pass(X,thetas)
    #now back-propagate
    # unroll thetas
    l1theta, l2theta = self.unroll_thetas(thetas)
    nexamples=float(X.shape[1])
    # compute delta3, l2theta
    a3 = np.array(a3)
    a2 = np.array(a2)
    Y = np.array(Y)
    a3 = a3.T
    delta3 = (a3 * (1 - a3)) * (((a3 - Y)/((a3)*(1-a3))))
    l2Derivatives = np.dot(delta3, a2)
    #print "Layer 2 derivatives shape = ", l2Derivatives.shape
    #print "Layer 2 derivatives = ", l2Derivatives
    # compute delta2, l1 theta
    a2 = a2.T
    dotProduct = np.dot(l2theta.T,delta3)
    delta2 = dotProduct * (a2) * (1- a2)
    l1Derivatives = np.dot(delta2[1:], X.T)
    #print "Layer 1 derivatives shape = ", l1Derivatives.shape
    #print "Layer 1 derivatives = ", l1Derivatives
    #remember to exclude last element of delta2, representing the deltas of bias terms...
    # i.e. delta2=delta2[:-1]
    # roll thetas into a big vector
    thetas=(self.roll_thetas(l1Derivatives,l2Derivatives)).reshape(thetas.shape) # return the same shape as you received
    return thetas

Why not have a look at my implementation at https://github.com/zizhaozhang/simple_neutral_network/blob/master/nn.py
The derivatives are computed here:
def dCostFunction(self, theta, in_dim, hidden_dim, num_labels, X, y):
    #compute gradient
    t1, t2 = self.uncat(theta, in_dim, hidden_dim)
    a1, z2, a2, z3, a3 = self._forward(X, t1, t2) # p x s matrix
    # t1 = t1[1:, :] # remove bias term
    # t2 = t2[1:, :]
    sigma3 = -(y - a3) * self.dactivation(z3) # do not apply dsigmode here? should I
    sigma2 = np.dot(t2, sigma3)
    term = np.ones((1,num_labels))
    sigma2 = sigma2 * np.concatenate((term, self.dactivation(z2)),axis=0)
    theta2_grad = np.dot(sigma3, a2.T)
    theta1_grad = np.dot(sigma2[1:,:], a1.T)
    theta1_grad = theta1_grad / num_labels
    theta2_grad = theta2_grad / num_labels
    return self.cat(theta1_grad.T, theta2_grad.T)
Hope it helps
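Whichever implementation is used, a finite-difference check on the unrolled parameter vector is a quick way to find which derivatives are wrong. Two things in the question's code may be worth checking this way: `nexamples` is computed but never used, so if the cost is averaged over the examples, the derivatives are not; and `(a3 * (1 - a3)) * ((a3 - Y) / (a3 * (1 - a3)))` reduces algebraically to `a3 - Y` (the output delta for sigmoid outputs with a cross-entropy cost), but divides by zero when an output saturates at exactly 0 or 1. The sketch assumes a cost function that takes the same flat `thetas` vector that `derivative_cost_function` receives; `numerical_gradient` and `relative_error` are hypothetical helper names.

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-5):
    """Central-difference estimate of d cost / d theta for a flat parameter vector."""
    theta = np.asarray(theta, dtype=float).ravel()
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    return grad

def relative_error(analytic, numeric):
    """Scale-independent difference; values many orders of magnitude below 1 suggest agreement."""
    analytic = np.ravel(analytic)
    numeric = np.ravel(numeric)
    denom = max(np.linalg.norm(analytic) + np.linalg.norm(numeric), 1e-12)
    return np.linalg.norm(analytic - numeric) / denom

# Self-check on a function with a known gradient: f(t) = sum(t**2) / 2 has gradient t.
theta0 = np.array([0.5, -1.0, 2.0])
approx = numerical_gradient(lambda t: 0.5 * np.sum(t ** 2), theta0)
print(relative_error(theta0, approx))
```

With the question's class, the comparison would be `relative_error(model.derivative_cost_function(X, Y, thetas), numerical_gradient(lambda t: model.cost_function(X, Y, t), thetas))`, where `cost_function` stands for whatever the class's cost method is called. Checking the layer-1 and layer-2 slices of the vector separately shows which layer is off.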
