Here I am trying to understand neural networks by coding one from scratch (in NumPy only). The forward pass (using dot products) works. But I have no idea how to do the backward pass: compute the partial derivative of the loss with respect to each trainable parameter, then update the parameters with the SGD rule. The loss can be the mean squared error, for example.
Here is my code so far; the comments at the end of the code describe what is left.
'''
I want to design a NN that has :
input layer I of 4 neurons
hidden layer H1 of 3 neurons
hidden layer H2 of 3 neurons
output layer O of 1 neuron
'''
import numpy as np
inputs = [1, 2, 3, 2.5]
# -------------- Hidden layers ---------------------------
wh1 = [[0.2, 0.8, -0.5, 1],
[0.5, -0.91, 0.26, -0.5],
[-0.26, -0.27, 0.17, 0.87]]
bh1 = [2, 3, 0.5]
wh2 = [[0.1, -0.14, 0.5],
[-0.5, 0.12, -0.33],
[-0.44, 0.73, -0.13]]
bh2 = [-1, 2, -0.5]
layer1_outputs = np.dot(wh1, np.array(inputs)) + bh1
layer2_outputs = np.dot(wh2, layer1_outputs,) + bh2
# ------------ output layer ------------------------------
who = [0.1, -0.14, 0.5]
bho = [4]
layer_out = np.dot(who, layer2_outputs,) + bho
# --------------------------------------------------------
print(layer_out)
true_outputs = np.sin(inputs)
# compute RMSE
# compute partial derivatives
# update weights
Architecture of the NN: (figure not included)
Backpropagation in a neural network relies on the chain rule of derivatives, so implementing it means applying the chain rule layer by layer.
Here is my suggestion.
Create a class for your neural network, so you can create a separate function for each task.
Use a loop to pass through your network from back to front (output layer to input layer), and use the chain rule to calculate the partial derivatives at each layer.
Adding sample code from my old work; refer to the GitHub repo for the full code.
https://github.com/akash-agni/DeepLearning/blob/main/Neural_Network_From_Scratch_using_Numpy.ipynb
def backpropogate(self, X, y):
    delta = list()  # empty list to store derivatives
    delta_w = [0 for _ in range(len(self.layers))]  # stores weight updates
    delta_b = [0 for _ in range(len(self.layers))]  # stores bias updates
    error_o = (self.layers[-1].z - y.T)  # calculate the error at the output layer
    for i in reversed(range(len(self.layers) - 1)):
        # multiply the error by the transposed weights to propagate it back one layer
        error_i = np.multiply(self.layers[i+1].weights.T.dot(error_o), self.layers[i].activation_grad())
        delta_w[i+1] = error_o.dot(self.layers[i].a.T)/len(y)  # store gradient for weights
        delta_b[i+1] = np.sum(error_o, axis=1, keepdims=True)/len(y)  # store gradient for biases
        error_o = error_i  # the previous layer's error becomes the current error; repeat
    delta_w[0] = error_o.dot(X)  # gradients for the first layer
    delta_b[0] = np.sum(error_o, axis=1, keepdims=True)/len(y)
    return (delta_w, delta_b)  # return gradients
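For the concrete network in the question (4 inputs, two hidden layers of 3 units, 1 output, all purely linear because the question uses no activation function), the chain rule can also be written out by hand. The following is only a minimal sketch, not the code from the repository linked above: the scalar `target`, the learning rate `lr` and the helper names `forward` and `backward` are assumptions made for illustration.

```python
import numpy as np

# Weights and biases copied from the question; every layer is linear.
x = np.array([1.0, 2.0, 3.0, 2.5])
wh1 = np.array([[0.2, 0.8, -0.5, 1.0],
                [0.5, -0.91, 0.26, -0.5],
                [-0.26, -0.27, 0.17, 0.87]])
bh1 = np.array([2.0, 3.0, 0.5])
wh2 = np.array([[0.1, -0.14, 0.5],
                [-0.5, 0.12, -0.33],
                [-0.44, 0.73, -0.13]])
bh2 = np.array([-1.0, 2.0, -0.5])
who = np.array([[0.1, -0.14, 0.5]])
bho = np.array([4.0])

target = np.array([0.5])  # assumed target value, not given in the question
lr = 0.01                 # assumed learning rate


def forward(params, x):
    """Return the outputs of both hidden layers and of the output layer."""
    wh1, bh1, wh2, bh2, who, bho = params
    h1 = wh1 @ x + bh1
    h2 = wh2 @ h1 + bh2
    out = who @ h2 + bho
    return h1, h2, out


def backward(params, x, target):
    """Gradients of the mean squared error w.r.t. every parameter."""
    wh1, bh1, wh2, bh2, who, bho = params
    h1, h2, out = forward(params, x)
    d_out = 2.0 * (out - target) / out.size  # dLoss/d(out)
    d_who = np.outer(d_out, h2)              # out = who @ h2 + bho
    d_bho = d_out
    d_h2 = who.T @ d_out                     # push the error back through who
    d_wh2 = np.outer(d_h2, h1)
    d_bh2 = d_h2
    d_h1 = wh2.T @ d_h2                      # push the error back through wh2
    d_wh1 = np.outer(d_h1, x)
    d_bh1 = d_h1
    return [d_wh1, d_bh1, d_wh2, d_bh2, d_who, d_bho]


params = [wh1, bh1, wh2, bh2, who, bho]
loss_before = float(np.mean((forward(params, x)[2] - target) ** 2))
grads = backward(params, x, target)
params = [p - lr * g for p, g in zip(params, grads)]  # one SGD step
loss_after = float(np.mean((forward(params, x)[2] - target) ** 2))
print(loss_before, loss_after)
```

With an activation function between layers, each `d_h` term would additionally be multiplied element-wise by the derivative of that activation, which is the role `activation_grad()` plays in the class-based code above.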
A model that I am working on should be predicting quite a lot of variables simultaneously (>1000). Therefore I would like to have a small neural network at the end of the network for each output.
In order to do this compactly, I would like to find a way to create a sparse trainable connection between two layers in the neural network within the Tensorflow framework.
Only a small portion of the connection matrix should be trainable: It is only the parameters that are part of the block-diagonal.
For example:
The connection matrix is the following:
The trainable parameters should be in the place of the 1's.
I have written exactly such a layer:
https://github.com/ArnovanHilten/GenNet/blob/master/GenNet_utils/LocallyDirectedConnected_tf2.py
It takes a sparse matrix as an input and lets you decide how to connect between layers. The layer uses sparse tensors and matrix multiplications.
Edit
The comment asked: "Is this a trainable object though?"
The answer: No. You cannot currently make a sparse matrix trainable. Instead you can use a mask matrix (see the end of this answer).
But if you need to use a sparse matrix, you just have to use tf.sparse.sparse_dense_matmul() or tf.sparse_tensor_to_dense() wherever your sparse tensor interacts with a dense matrix. I have taken a simple XOR example from here and replaced the dense matrix with a sparse one:
#Declaring necessary modules
import tensorflow as tf
import numpy as np
"""
A simple numpy implementation of a XOR gate to understand the backpropagation
algorithm
"""
x = tf.placeholder(tf.float32,shape = [4,2],name = "x")
#declaring a place holder for input x
y = tf.placeholder(tf.float32,shape = [4,1],name = "y")
#declaring a place holder for desired output y
m = np.shape(x)[0]#number of training examples
n = np.shape(x)[1]#number of features
hidden_s = 2 #number of nodes in the hidden layer
l_r = 1#learning rate initialization
theta1 = tf.SparseTensor(indices=[[0, 0],[0, 1], [1, 1]], values=[0.1, 0.2, 0.1], dense_shape=[3, 2])
#theta1 = tf.cast(tf.Variable(tf.random_normal([3,hidden_s]),name = "theta1"),tf.float64)
theta2 = tf.cast(tf.Variable(tf.random_normal([hidden_s+1,1]),name = "theta2"),tf.float32)
#conducting forward propagation
a1 = tf.concat([np.c_[np.ones(x.shape[0])],x],1)
#the weights of the first layer are multiplied by the input of the first layer
#z1 = tf.sparse_tensor_dense_matmul(theta1, a1)
z1 = tf.matmul(a1,tf.sparse_tensor_to_dense(theta1))
#the input of the second layer is the output of the first layer, passed through the activation function
a2 = tf.concat([np.c_[np.ones(x.shape[0])],tf.sigmoid(z1)],1)
#the input of the second layer is multiplied by the weights
z3 = tf.matmul(a2,theta2)
#the output is passed through the activation function to obtain the final probability
h3 = tf.sigmoid(z3)
cost_func = -tf.reduce_sum(y*tf.log(h3)+(1-y)*tf.log(1-h3),axis = 1)
#built in tensorflow optimizer that conducts gradient descent using specified
optimiser = tf.train.GradientDescentOptimizer(learning_rate = l_r).minimize(cost_func)
#setting required X and Y values to perform XOR operation
X = [[0,0],[0,1],[1,0],[1,1]]
Y = [[0],[1],[1],[0]]
#initializing all variables, creating a session and running a tensorflow session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
#running gradient descent for each iteration
for i in range(200):
    sess.run(optimiser, feed_dict = {x:X,y:Y})#setting place holder values using feed_dict
    if i%100==0:
        print("Epoch:",i)
        print(sess.run(theta1))
and the output is:
Epoch: 0
SparseTensorValue(indices=array([[0, 0],
[0, 1],
[1, 1]]), values=array([0.1, 0.2, 0.1], dtype=float32), dense_shape=array([3, 2]))
Epoch: 100
SparseTensorValue(indices=array([[0, 0],
[0, 1],
[1, 1]]), values=array([0.1, 0.2, 0.1], dtype=float32), dense_shape=array([3, 2]))
So the only way is to use a mask matrix. You can apply it by element-wise multiplication or with tf.where.
1) Multiplication: create a mask matrix of the desired shape and multiply it element-wise with your weight matrix:
mask = tf.Variable([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]],name ='mask', trainable=False)
weight = tf.cast(tf.Variable(tf.random_normal([3,3])),tf.float32)
desired_tensor = tf.multiply(weight, mask)  # element-wise; tf.matmul would be a matrix product
2) tf.where: keep the weight where the mask is set and use zero elsewhere:
mask = tf.Variable([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]],name ='mask', trainable=False)
weight = tf.cast(tf.Variable(tf.random_normal([3,3])),tf.float32)
desired_tensor = tf.where(mask > 0, weight, tf.zeros_like(weight))
Hope it helps
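The effect of the mask on training can be checked without TensorFlow. The following is a rough NumPy sketch under assumed sizes (two blocks, each connecting 2 inputs to 1 output); the data, the learning rate and all variable names are made up for illustration. By the chain rule, the gradient reaching `weight` is the gradient of the effective matrix multiplied element-wise by the mask, so entries outside the blocks never move.

```python
import numpy as np

rng = np.random.default_rng(0)

# Block-diagonal mask (assumed layout): rows are inputs, columns are outputs.
mask = np.kron(np.eye(2), np.ones((2, 1)))  # shape (4, 2)
weight = rng.normal(size=mask.shape)
initial = weight.copy()

x = rng.normal(size=(8, 4))  # made-up inputs
y = rng.normal(size=(8, 2))  # made-up targets

for _ in range(100):
    effective = weight * mask                   # element-wise masking
    pred = x @ effective
    grad_eff = 2.0 * x.T @ (pred - y) / len(x)  # gradient of a squared-error loss
    weight -= 0.05 * grad_eff * mask            # d(effective)/d(weight) = mask

print(weight * mask)
```

Only the entries where the mask is 1 are updated; the others keep their initial values and contribute nothing to the output because they are multiplied by zero.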
You can do that by using sparse tensors like so:
tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
which represents the dense tensor:
[[1, 0, 0, 0]
[0, 0, 2, 0]
[0, 0, 0, 0]]
you can look up more on the documentation of sparse tensor here:
https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor
Hope it helps!
I am using pyswarms PSO for neural network optimisation. I am trying to create a network with only an input layer and an output layer.
import math
import numpy as np
import pyswarms as ps
# Store the features as X and the labels as y
X = np.random.randn(25000,20)
y = np.random.random_integers(0,2,25000)
# In[29]:
def sigmoid(x):
    return 1 / (1 + math.exp(-x))
# In[58]:
print(X_train.shape)
print(y_train.shape)
# In[63]:
# Forward propagation
def forward_prop(params):
    """Forward propagation as objective function
    This computes for the forward propagation of the neural network, as
    well as the loss. It receives a set of parameters that must be
    rolled-back into the corresponding weights and biases.
    Inputs
    ------
    params: np.ndarray
        The dimensions should include an unrolled version of the
        weights and biases.
    Returns
    -------
    float
        The computed negative log-likelihood loss given the parameters
    """
    # Neural network architecture
    n_inputs = 20
    n_classes = 2
    # Roll-back the weights and biases
    W1 = params[0:40]
    # Perform forward propagation
    z1 = X.dot(W1) # Pre-activation in Layer 1
    #a1 = np.tanh(z1) # Activation in Layer 1
    #z2 = a1.dot(W2) + b2 # Pre-activation in Layer 2
    logits = z1 # Logits for Layer 2
    # Compute for the softmax of the logits
    exp_scores = np.exp(logits)
    probs = exp_scores / np.sum(exp_scores,axis=1)
    # Compute for the negative log likelihood
    N = 25000 # Number of samples
    corect_logprobs = -np.log(probs[range(N), y])
    loss = np.sum(corect_logprobs) / N
    return loss
# In[64]:
def f(x):
    # Compute for the negative log likelihood
    """Higher-level method to do forward_prop in the
    whole swarm.
    Inputs
    ------
    x: numpy.ndarray of shape (n_particles, dimensions)
        The swarm that will perform the search
    Returns
    -------
    numpy.ndarray of shape (n_particles, )
        The computed loss for each particle
    """
    n_particles = x.shape[0]
    j = [forward_prop(x[i]) for i in range(n_particles)]
    return np.array(j)
# In[65]:
# Initialize swarm
options = {'c1': 0.5, 'c2': 0.3, 'w':0.9}
# Call instance of PSO
dimensions = 20
optimizer = ps.single.GlobalBestPSO(n_particles=100, dimensions=dimensions, options=options)
# Perform optimization
cost, pos = optimizer.optimize(f, print_step=100, iters=1000, verbose=3)
I modified the code from the examples, but I am getting this error:
AxisError: axis 1 is out of bounds for array of dimension 1.
Moreover, this example implements softmax function in last layer. How should I use it with different loss functions?
Original code can be found here.
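The AxisError is consistent with the shapes in the code above: with dimensions = 20 each particle is a flat vector of length 20, so params[0:40] stays one-dimensional, X.dot(W1) is one-dimensional too, and np.sum(..., axis=1) has no second axis to sum over. A possible fix, sketched here in NumPy only, is to reshape the flat vector into an (n_inputs, n_classes) matrix and keep the summed axis with keepdims=True. The sample count is reduced for the sketch, and n_classes = 3 is an assumption based on the labels being drawn from 0 to 2 inclusive.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_inputs, n_classes = 200, 20, 3  # assumed sizes for this sketch
X = rng.normal(size=(n_samples, n_inputs))
y = rng.integers(0, n_classes, size=n_samples)


def forward_prop(params):
    """Softmax cross-entropy loss for one flat parameter vector (one particle)."""
    W1 = params.reshape(n_inputs, n_classes)             # flat vector -> weight matrix
    logits = X.dot(W1)                                   # shape (n_samples, n_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(logits)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(n_samples), y]).mean()


dimensions = n_inputs * n_classes  # value the optimizer would need as `dimensions`
print(forward_prop(np.zeros(dimensions)))
```

Any other loss can be swapped in the same way, as long as `forward_prop` still returns a single float per particle; for example, a squared error between `probs` and one-hot labels.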
import tensorflow as tf
x = tf.placeholder(tf.float32, [None,4]) # input vector
w1 = tf.Variable(tf.random_normal([4,2])) # weights between first and second layers
b1 = tf.Variable(tf.zeros([2])) # biases added to hidden layer
w2 = tf.Variable(tf.random_normal([2,1])) # weights between second and third layer
b2 = tf.Variable(tf.zeros([1])) # biases added to third (output) layer
def feedForward(x,w,b): # function for forward propagation
    Input = tf.add(tf.matmul(x,w), b)
    Output = tf.sigmoid(Input)
    return Output
>>> Out1 = feedForward(x,w1,b1) # output of first layer
>>> Out2 = feedForward(Out1,w2,b2) # output of second layer
>>> MHat = 50*Out2 # final prediction is in the range (0,50)
>>> M = tf.placeholder(tf.float32, [None,1]) # placeholder for actual (target value of marks)
>>> J = tf.reduce_mean(tf.square(MHat - M)) # cost function -- mean square errors
>>> train_step = tf.train.GradientDescentOptimizer(0.05).minimize(J) # minimize J using Gradient Descent
>>> sess = tf.InteractiveSession() # create interactive session
>>> tf.global_variables_initializer().run() # initialize all weight and bias variables with specified values
>>> xs = [[1,3,9,7],
...       [7,9,8,2], # x training data
...       [2,4,6,5]]
>>> Ms = [[47],
...       [43], # M training data
...       [39]]
>>> for _ in range(1000): # performing learning process on training data 1000 times
...     sess.run(train_step, feed_dict = {x:xs, M:Ms})
>>> print(sess.run(MHat, feed_dict = {x:[[1,3,9,7]]}))
[[ 50.]]
>>> print(sess.run(MHat, feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
>>> print(sess.run(tf.transpose(MHat), feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
In this code, I am trying to predict the marks M of a student out of 50, given how many hours he/she slept, studied, used electronics and played. These 4 features make up the input feature vector x.
To solve this regression problem, I am using a deep neural network with an input layer with 4 perceptrons (the input features), a hidden layer with two perceptrons and an output layer with one perceptron. I have used sigmoid as the activation function. But I am getting the exact same prediction ([[50.0]]) for M for every input vector I feed in. Can someone please tell me what is wrong with the code above? I highly appreciate the help in advance!
You would need to modify your feedForward() function. You don't need to apply sigmoid() at the last layer (simply return the linear output), and there is also no need to multiply the output of this function by 50.
def feedForward(X,W1,b1,W2,b2):
    Z=tf.sigmoid(tf.matmul(X,W1)+b1)
    return tf.matmul(Z,W2)+b2
MHat = feedForward(x,w1,b1,w2,b2)
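One plausible reading of the constant 50 is that the output sigmoid saturates: once its input is large, the output is pinned near 1 and the gradient is close to zero, so further training changes nothing. A small NumPy sketch of that effect follows; the pre-activation values in `z` are hypothetical and only chosen to show the trend.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical pre-activations of the output unit, growing during training.
z = np.array([2.0, 10.0, 40.0])
out = sigmoid(z)
grad = out * (1.0 - out)  # derivative of the sigmoid with respect to z

print(50 * out)  # approaches 50 as z grows
print(grad)      # approaches 0, so gradient descent stops making progress
```

A linear output layer has no such ceiling, which is why it is the usual choice for regression targets.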
Hope this helps!
Don't forget to let us know if it solved your problem :)
I am learning TensorFlow.
I have a question about the code in Introduction:
import tensorflow as tf
import numpy as np
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.global_variables_initializer()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# Fit the line.
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))
# Learns best fit is W: [0.1], b: [0.3]
This program learns the best fit of W and b.
If I don't know the formula (y = W * x_data + b), how can I train a model?
For example, this is a training set:
{input = {{1,1}, {1,2}, {2,3}, ... }, target = {2, 3, 5, ...}}
How can I train a function f(a, b) ~= (a+b)?
In most cases, we do not know the exact form of the objective formula. Thus, we have to design a function and try to approximate the objective formula by this function.
In a neural network, the formula is defined by the network architecture (for example, a multilayer perceptron or a recurrent neural network) and its hyper-parameters (for example, the number of hidden layers and the number of neurons in each hidden layer).
In this particular case for example, you can assume the approximate function has the form of (y = Wa+Ub+C -- a linear perceptron) and train the parameters of this function (W,U,C) to approximate the parameters of the objective formula (y=a+b) using the data given.
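As a rough illustration of that idea, here is a plain NumPy sketch that fits y = Wa + Ub + C to samples of a + b with gradient descent on the mean squared error. The data range, learning rate and iteration count are assumptions picked so that the loop converges; they are not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.uniform(0, 5, size=(200, 2))  # columns are a and b (assumed range)
target = inputs.sum(axis=1)                # the "unknown" rule: a + b

a, b = inputs[:, 0], inputs[:, 1]
W, U, C = 0.0, 0.0, 0.0
lr = 0.02  # assumed learning rate

for _ in range(5000):
    err = W * a + U * b + C - target  # prediction error per sample
    W -= lr * 2 * np.mean(err * a)    # dLoss/dW
    U -= lr * 2 * np.mean(err * b)    # dLoss/dU
    C -= lr * 2 * np.mean(err)        # dLoss/dC

print(W, U, C)  # should end up close to 1, 1, 0
```

The model never sees the formula a + b, only input and target pairs; the assumed linear form is what lets it recover W and U near 1 and C near 0.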
A neural network is a universal function approximator: for any continuous function on a bounded input range (linear, polynomial, etc.), a network with enough nodes in its hidden layers and a non-linear activation function can approximate it as closely as desired. The non-linear activation function (e.g. sigmoid, tanh, ReLU) is what "bends" the linear boundary produced by Wx+b; without it, stacked layers remain a single linear map.
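The role of the activation function can be checked numerically. In this small NumPy sketch (layer sizes and random weights are arbitrary assumptions), two linear layers without an activation collapse into one equivalent linear layer, while inserting tanh between them produces something a single linear layer does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)  # layer 1: 2 -> 3
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)  # layer 2: 3 -> 1
x = rng.normal(size=(5, 2))

# Two stacked linear layers...
stacked = (x @ W1.T + b1) @ W2.T + b2
# ...equal one linear layer with combined weights and bias.
collapsed = x @ (W2 @ W1).T + (W2 @ b1 + b2)
print(np.allclose(stacked, collapsed))

# With a non-linearity in between, the collapse no longer holds.
with_tanh = np.tanh(x @ W1.T + b1) @ W2.T + b2
print(np.allclose(with_tanh, collapsed))
```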
So I'm trying to create a VERY simple neural network with no hidden layers, just input (3 elements) and linear output (2 elements).
I then define some variables to store configurations and weights
# some configs
input_size = 3
action_size = 2
min_delta, max_delta = -1, 1
learning_rate_op = 0.5
w = {} # weights
I then create the training network
# training network
with tf.variable_scope('prediction'):
    state_tensor = tf.placeholder('float32', [None, input_size], name='state_tensor')
    w['q_w'] = tf.get_variable('Matrix', [state_tensor.get_shape().as_list()[1], action_size], tf.float32, tf.random_normal_initializer(stddev=0.02))
    w['q_b'] = tf.get_variable('bias', [action_size], initializer=tf.constant_initializer(0))
    q = tf.nn.bias_add(tf.matmul(state_tensor, w['q_w']), w['q_b'])
I define the optimizer to minimize the squared difference between the target value and the output of the training network
# weight optimizer
with tf.variable_scope('optimizer'):
    # tensor to hold target value
    # eg, target_q_tensor=[10;11]
    target_q_tensor = tf.placeholder('float32', [None], name='target_q_tensor')
    # tensors for action_tensor, for action_tensor matrix and for value deltas
    # eg, action_tensor=[0;1], action_one_hot=[[1,0];[0,1]], q_acted=[Q_0,Q_1]
    action_tensor = tf.placeholder('int64', [None], name='action_tensor')
    action_one_hot = tf.one_hot(action_tensor, action_size, 1.0, 0.0, name='action_one_hot')
    q_acted = tf.reduce_sum(q * action_one_hot, reduction_indices=1, name='q_acted')
    # delta
    delta = target_q_tensor - q_acted
    clipped_delta = tf.clip_by_value(delta, min_delta, max_delta, name='clipped_delta')
    # error function
    loss = tf.reduce_mean(tf.square(clipped_delta), name='loss')
    # optimizer
    # optim = tf.train.AdamOptimizer(learning_rate_op).minimize(loss)
    optim = tf.train.GradientDescentOptimizer(learning_rate_op).minimize(loss)
And finally, I run some values in an infinite loop. However, the weights are never updated; they keep the random values with which they were initialized
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    s_t = np.array([[1,0,0],[1,0,1],[1,1,0],[1,0,0]])
    action = np.array([0, 1, 0, 1])
    target_q = np.array([10, -11, -12, 13])
    counter = 0  # iteration counter
    while True:
        if counter % 10000 == 0:
            q_values = q.eval({state_tensor: s_t})
            for i in range(len(s_t)):
                print("q", q_values[i])
            print("w", sess.run(w['q_w']), '\nb', sess.run(w['q_b']))
        sess.run(optim, {target_q_tensor: target_q, action_tensor: action, state_tensor: s_t})
        counter += 1
I took the code from a working DQN implementation, so I figure I'm doing something blatantly wrong. The network should converge to:
# 0 | 1
####################
1,0,0 # 10 13
1,0,1 # x -11
1,1,0 # -12 x
But they do not change at all. Any pointers?
Turns out that clipping the loss is causing the issue. However, I don't understand why...
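If your loss is always 1, then every delta is being clipped to +1 or -1. Outside the clipping range the output of tf.clip_by_value is a constant, so no gradient flows back through it: the gradient of the loss with respect to the weights is zero and the optimizer has nothing to update. It strikes me as an odd choice to clip the loss anyway. Perhaps you meant to clip the gradient of the loss? See this also.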
Removing the clipping entirely will (probably) work as well in simple cases.
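The zero gradient can be reproduced outside TensorFlow with a finite-difference check. The sketch below is a simplified NumPy stand-in for the setup in the question: it assumes a single output column, zero initial weights and three of the question's states and targets, and the helper names `loss_fn` and `numeric_grad` are made up for illustration.

```python
import numpy as np


def loss_fn(w, x, target, clip=True):
    """Mean squared delta, optionally clipping the delta to [-1, 1] first."""
    delta = target - x @ w
    if clip:
        delta = np.clip(delta, -1.0, 1.0)
    return np.mean(delta ** 2)


def numeric_grad(w, x, target, clip, eps=1e-4):
    """Central finite-difference gradient of loss_fn with respect to w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss_fn(w + step, x, target, clip)
                   - loss_fn(w - step, x, target, clip)) / (2 * eps)
    return grad


x = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
target = np.array([10.0, -11.0, -12.0])
w = np.zeros(3)  # assumed starting weights, close to the small random init

print(numeric_grad(w, x, target, clip=True))   # all zeros: every delta is clipped
print(numeric_grad(w, x, target, clip=False))  # non-zero: training can proceed
```

As long as every target is more than 1 away from the prediction, the clipped loss is flat in the weights, which matches the behaviour described in the question.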