How do you implement neurons in ANN? - python

From what I understand about neural networks, you have a number of hidden layers, each consisting of X neurons. A neuron takes in a number of inputs and their respective weights, then produces an output using an activation function (sigmoid in my case).
My task is to implement a network from scratch (using only numpy), with 2 hidden layers, the sigmoid activation function, and 500 neurons in each hidden layer. What I don't understand is: how can I implement the concept of neurons? According to this article, one neuron is when all inputs are weighted and passed into the activation function. So do I feed in the same inputs, 500 times, with different weights each time (in the first layer, then again in the second)? I've also read this topic, where the following is said:
The neuron is nothing more than a set of inputs, a set of weights, and an activation function. The neuron translates these inputs into a single output, which can then be picked up as input for another layer of neurons later on.
So according to this, I indeed should weigh the inputs differently, 500 times, and then pass these forward to the next layer which will do the same. Am I understanding this correctly?
Here is the code I have written so far (it is very elementary, but I did not want to proceed further before clearing this up), and I have no idea how I would implement this:
import numpy as np

class NeuralNetwork:
    def __init__(self, data, y, neurons, hidden):
        self.input = data
        self.y = y
        self.output = np.zeros(y.shape)
        self.layers = hidden
        self.neurons = neurons
        self.weights = self.generateWeightArray()
        print(self.weights)

    def generateWeightArray(self):
        weightarr = []
        #Last weight array is for inbetween hidden and output layer
        for i in range(self.layers + 1):
            weightarr.append(self.generateWeightMatrix())
        return np.asarray(weightarr)

    def generateWeightMatrix(self):
        return np.random.rand(self.input.shape[0], self.input.shape[1]-1)

    def sigmoid(self, x):
        return 1/(1+np.exp(-x))

    def dsigmoid(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))

    def train(self):
        pass

    def run(self):
        #Since between each layer we have a matrix of weights, we can just keep going for the number of hidden
        #layers we have
        for i in range(self.layers):
            out = np.dot(self.input.transpose(), self.weights[i]).transpose() #step1
            self.input = self.sigmoid(out) #step2
            print(self.input)

net = NeuralNetwork(np.array([[1,2,3,4],[3,5,1,2],[5,6,7,8]]), np.array([1,0,1]), 500, 2)
net.run()
EDIT
I have changed my code as follows
class NeuralNetwork:
    def __init__(self, data, y, neurons, hidden):
        self.input = data
        self.y = y
        self.output = np.zeros(y.shape)
        self.layers = hidden
        self.neurons = neurons
        self.weights_to_hidden = np.random.rand(self.neurons, self.input.shape[1])
        self.weights = self.generateWeightArray()
        self.weights_to_output = np.random.rand(self.neurons,1)
        print(self.weights_to_output)

    #Generate a matrix with h+1 weight matrices, where h is the number of hidden layers (+1 for output)
    def generateWeightArray(self):
        weightarr = []
        #Last weight array is for inbetween hidden and output layer
        for i in range(self.layers):
            weightarr.append(self.generateWeightMatrix())
        return np.asarray(weightarr)

    #Generate a matrix with n columns and m rows, where n is the number of features and m is the number of neurons
    #in the layer
    def generateWeightMatrix(self):
        return np.random.rand(self.neurons, self.neurons)

    def sigmoid(self, x):
        return 1/(1+np.exp(-x))

    def dsigmoid(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))

    def train(self):
        #2 hidden layers, then hidden -> output layer
        hidden_in = self.sigmoid(np.dot(self.input, self.weights_to_hidden.transpose()).transpose())
        print("Going into hidden layer:")
        print(hidden_in)
        for i in range(self.layers):
            in_hidden = self.sigmoid(np.dot(hidden_in.transpose(), self.weights[i]).transpose())
            print("After ", str(i+1), " hidden layer:")
            print(in_hidden)
        print("Output")
        out = self.sigmoid(np.dot(hidden_in.transpose(), self.weights_to_output).transpose())
        print(out)

net = NeuralNetwork(np.array([[1,2,3,4],[3,5,1,2],[5,6,7,8]]), np.array([1,0,1]), 5, 2)
net.train()
And the output after running is
[[0.89405222 0.89501672 0.89717842]]
I'm not sure if self.weights_to_output has the correct shape, though, because it's an (n, 1) matrix, so all features (in each record) will share the same weight, rather than having 3 weights for each row (?)

The 'neurons' (usually called 'units' these days) are the activations in each layer. A layer's activations are represented as a 1D array, with one element per unit. So the short answer is: the neurons are elements in 1D arrays.
Let's look at the activation of one unit in layer 3 of a deep neural network:
a_3 = sigma(w_3 @ a_2 + b_3)
So:
a_3 will be a scalar: the activation for this unit;
sigma is the activation function (e.g. the logistic function, tanh, or ReLU);
w_3 is the vector of weights for this unit (one element for each unit of layer 2);
a_2 is the vector of activations for the previous layer (one element for each unit of layer 2);
b_3 is the bias for this unit.
Note that w_3 and a_2 are both 1D arrays (vectors). The @ operator performs matrix multiplication in Python 3.5+, and in this case it computes the dot product.
Now, thanks to the magic of linear algebra, it turns out we don't need to loop over each unit to compute all the activations. If we let the weight vector w_3 be a matrix, W_3 (note the capital letter), then it can represent the weights for all units in layer 2, connecting to all units in layer 3. Then:
a_3 = sigma(W_3 @ a_2 + b_3)
Now a_3 will be a vector.
The tricky part is keeping track of all the shapes. For W @ a to work, the shapes must be compatible. For example, imagine we have a network with 2 units in layer 2 (so a_2 is a 1D array with 2 elements) and 3 units in layer 3 (so a_3 needs to have 3 elements). Now W_3 needs to be 3 × 2 and a_2 needs to be 2 × 1. Then the matrix multiply works. You can just use np.reshape() and np.transpose() to achieve the shapes you need.
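For instance, here is a minimal numpy sketch of one layer's computation with exactly these shapes (the names and values are illustrative only):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
a_2 = rng.random(2)              # activations of layer 2: shape (2,)
W_3 = rng.random((3, 2))         # weights from layer 2 into layer 3: shape (3, 2)
b_3 = rng.random(3)              # biases of layer 3: shape (3,)

a_3 = sigmoid(W_3 @ a_2 + b_3)   # shape (3,): one activation per unit
print(a_3.shape)                 # (3,)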
I hope this helps... it's a lot of words.
Maybe this diagram (from this article) helps explain:
The diagram doesn't say how many records there are. There are 3 features per data instance (i.e. we have an M × 3 input matrix). The input layer is 'just' another layer: you can think of the inputs as just another set of activations. You could think of x as a_0 (the 0-th layer).
This 3 Blue 1 Brown video is well worth watching too: https://www.youtube.com/watch?v=aircAruvnKk

Related

Pytorch 2D Feature Tensor 1D output Same weights across dimension

I am handling a time-series dataset with n timesteps, m features, and k objects.
As a result, my feature tensor has a shape of (n, k, m), while my targets' shape is (n, m).
I want to predict the targets for every timestep and object, but with the same weights for every object. Also, my loss function looks like this:
average_loss = loss_func(prediction, labels)
sum_loss = loss_func(sum(prediction), sum(labels))
loss = loss_weight * average_loss + (1-loss_weight) * sum_loss
My plan is to not only make sure that I predict every item as well as possible, but also that the sum of all items gets predicted correctly. loss_weight is a constant.
Currently I am doing this kind of ugly solution:
features = local_batch.squeeze(dim = 0)
labels = torch.unsqueeze(local_labels.squeeze(dim = 0), 1)
prediction = net(features)
I set my batch size to 1 and squeeze it so that the k objects become my batch.
My network looks like this:
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.predict(x)          # linear output
        return x
How do I make sure I do a reasonable convolution over the object dimension, keeping the same weights for all objects, without committing to batch size = 1? Also, how do I achieve the same loss function, where I compute the loss of the prediction sum vs. the target sum for every timestep?
It's not exactly ugly -- I would do the same but generalize it a bit for batch size >1 using view.
# Using your notations
n, k, m = features.shape
features = local_batch.view(n*k, m)
prediction = net(features).view(n, k, m)
With the prediction in the correct shape (n, k, m), implementing your loss function should not be difficult.
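As a hedged sketch of how the combined loss could then be written (this assumes labels has been expanded to the same (n, k, m) shape as prediction, and takes loss_func and loss_weight from your question):

import torch

def combined_loss(prediction, labels, loss_func, loss_weight):
    # prediction, labels: both (n, k, m) here (labels expanded over the k objects)
    average_loss = loss_func(prediction, labels)
    # Sum over the object dimension k, then compare the totals per timestep
    sum_loss = loss_func(prediction.sum(dim=1), labels.sum(dim=1))
    return loss_weight * average_loss + (1 - loss_weight) * sum_loss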

Why does this TensorFlow example not have a summation before the activation function?

I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass them to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.
In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.
Here is an example of one of those snippets:
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer
In each layer, we first multiply the inputs by the weights. Afterwards, we add the bias term. Then we pass those to tf.nn.relu. Where does the summation happen? It looks like we've skipped it!
Any help would be really great!
The last layer of your model, out_layer, outputs probabilities of each class, Prob(y=yi|X), and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities y whose size is the number of classes. You then pick the one with the highest probability by applying argmax to the output vector, class = argmax(P(y|x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension], and you multiply it by some matrix W of shape [x_dimension, model_output]. The summation that you're talking about is the dot product between a row of matrix X and a column of matrix W. The output will then have shape [n_samples, model_output]. To this output you apply an activation function (if it is the final layer, you probably want softmax). Perhaps the picture that you've shown is a bit misleading.
Mathematically, the layer without bias can be described as Z = XW. Suppose the first row of matrix X (a single input data point) is x = (x_1, x_2, ..., x_d), and the first column of W is w = (w_1, w_2, ..., w_d)^T. The result of their dot product is x_1*w_1 + x_2*w_2 + ... + x_d*w_d, which is your summation. You repeat this for each column in matrix W, and the result is a vector of size model_output (which corresponds to the number of columns in W). To this vector you add the bias (if needed) and then apply the activation.
The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).
Take a simple example with a row vector and a column vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above):

x = [2, 3, 1]
y = [3,
     1,
     2]

Then the result would be:

tf.matmul(x, y) = 2*3 + 3*1 + 1*2 = 11
There you can see the weighted sum.
P.S.: tf.multiply performs element-wise multiplication, which is not what we want here.

Initializing a bias term in my nonlinear regression model using TensorFlow

I am trying to make a basic nonlinear regression model that will predict the return index of companies in the FTSE350.
I am unsure as to what my bias term should look like in terms of dimensions and whether I am using it properly in the calculations method:
w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[4,10], dtype=tf.float64))
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype=tf.float64))

def calculations(x, y):
    w1d = tf.matmul(x, w1)
    h1 = tf.nn.sigmoid(tf.add(w1d, b1))
    h1w2 = tf.matmul(h1, w2)
    activation = tf.add(tf.nn.sigmoid(tf.matmul(h1, w2)), b2)
    error = tf.reduce_sum(tf.pow(activation - y, 2)) / len(x)
    return [activation, error]
My initial thoughts were that it should be the same size as my weights but I get this error:
ValueError: Dimensions must be equal, but are 251 and 4 for 'Add' (op: 'Add') with input shapes: [251,10], [4,10]
I've played around with different ideas but don't seem to be getting anywhere.
(My input data has 4 features)
The network structure I have attempted is 4 neurons in the input layer, 10 in the hidden layer, and 1 in the output layer, but I feel like I may have mixed up the dimensions in my weight matrices too.
When you are constructing the layers for a feed-forward fully-connected neural network (like in your example), the shape of the biases should be equal to the number of nodes in the corresponding layer. So in your case, since your weight matrix has a shape of (4, 10), you have 10 nodes in that layer and you should be using:
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype=tf.float64))
The reason for this is when you do w1d = tf.matmul(x, w1), you are actually getting a matrix of shape (batch_size, 10) (if batch_size is the number of rows in your input matrix). This is because you are matrix multiplying a (batch_size, 4) matrix by a (4, 10) weight matrix. Then, you are adding a bias across each column of w1d, which can be represented as a 10-dimensional vector, which you would get if you made the shape of b1 [10].
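You can see this broadcasting rule in a few lines of numpy (shapes only; the values are arbitrary, with 251 taken from your error message):

import numpy as np

x = np.zeros((251, 4))    # 251 samples, 4 features
w1 = np.zeros((4, 10))    # 4 inputs -> 10 hidden nodes
b1 = np.zeros(10)         # one bias per hidden node

w1d = x @ w1              # shape (251, 10)
h = w1d + b1              # the (10,) bias broadcasts across all 251 rows
print(h.shape)            # (251, 10)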
Without the non-linearity (sigmoid) afterward, this is called an affine transformation, which you can read more about here: https://en.wikipedia.org/wiki/Affine_transformation.
Another fantastic resource is the Stanford Deep Learning Tutorial, which has a good explanation of how these feed-forward models work here:
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/.
Hope that helped!
I think your b1 should just be of dimension 10, and then your code should run.
Since 4 is the number of features and 10 is the number of neurons in your first layer (thinking in neural-net terms),
you must add a bias of dimension 10.
Also, you might see the biases as adding an extra feature of constant value 1 (see the sketch below).
See this PDF if you have time; it explains it very well: https://cs.stanford.edu/~quocle/tutorial1.pdf
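For example, a small numpy sketch of that constant-1 view (sizes are illustrative only):

import numpy as np

X = np.random.rand(6, 4)                  # 6 samples, 4 features
W = np.random.rand(4, 10)
b = np.random.rand(10)

X_aug = np.hstack([X, np.ones((6, 1))])   # add a constant-1 feature
W_aug = np.vstack([W, b])                 # fold the bias in as an extra weight row

assert np.allclose(X_aug @ W_aug, X @ W + b)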

How to get bias and neuron weights in optimizer?

In a TensorFlow optimizer (Python), the method apply_dense gets called separately for the neuron weights (layer connections) and for the bias weights, but I would like to use both in this method.
def _apply_dense(self, grad, weight):
    ...
For example: a fully connected neural network with two hidden layers of two neurons each, and a bias for each.
If we take a look at layer 2, we get in apply_dense a call for the neuron weights:
and a call for the bias weights:
But I would either need both matrices in one call of apply_dense, or a weight matrix like this:
X_2X_4, B_1X_4, ... is just a notation for the weight of the connection between two neurons. Therefore B_1X_4 is only a placeholder for the weight between B_1 and X_4.
How to do this?
MWE
For a minimal working example, here is a stochastic gradient descent optimizer implementation with momentum. For every layer, the momentum of all incoming connections from other neurons is reduced to the mean (see ndims == 2). What I need instead is the mean of not only the momentum values of the incoming neuron connections, but also of the incoming bias connections (as described above).
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.python.training import optimizer


class SGDmomentum(optimizer.Optimizer):
    def __init__(self, learning_rate=0.001, mu=0.9, use_locking=False, name="SGDmomentum"):
        super(SGDmomentum, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._mu = mu
        self._lr_t = None
        self._mu_t = None

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, "a", self._name)

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2:    # neuron weights
            momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)
        elif momentum.get_shape().ndims == 1:  # bias weights
            momentum_mean = momentum
        else:
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)

        weight_update = learning_rate_t * momentum_t
        weight_t = tf.assign_sub(weight, weight_update, use_locking=self._use_locking)

        return tf.group(*[weight_t, momentum_t])

    def _prepare(self):
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")
        self._mu_t = tf.convert_to_tensor(self._mu, name="momentum_term")
For a simple neural network: https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/multilayer_perceptron.py (only change the optimizer to the custom SGDmomentum optimizer)
Update: I'll try to give a better answer (or at least some ideas) now that I have some understanding of your goal, but, as you suggest in the comments, there is probably no infallible way of doing this in TensorFlow.
Since TF is a general computation framework, there is no good way of determining which pairs of weights and biases exist in a model (or whether it is a neural network at all). Here are some possible approaches to the problem that I can think of:
Annotating the tensors. This is probably not practical since you already said you have no control over the model, but an easy option would be to add extra attributes to the tensors to signify the weight/bias relationships. For example, you could do something like W.bias = B and B.weight = W, and then in _apply_dense check hasattr(weight, "bias") and hasattr(weight, "weight") (there may be some better designs in this sense).
You can look into some framework built on top of TensorFlow where you may have better information about the model structure. For example, Keras is a layer-based framework that implements its own optimizer classes (based on TensorFlow or Theano). I'm not too familiar with the code or its extensibility, but probably you have more tools there to use.
Detect the structure of the network yourself from the optimizer. This is quite complicated, but theoretically possible. From the loss tensor passed to the optimizer, it should be possible to "climb up" the model graph to reach all of its nodes (taking the .op of the tensors and the .inputs of the ops). You could detect tensor multiplications and additions with variables and skip everything else (activations, loss computation, etc.) to determine the structure of the network; if the model does not match your expectations (e.g. there are no multiplications, or there is a multiplication without a later addition), you can raise an exception indicating that your optimizer cannot be used for that model.
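As a rough, untested sketch of that last idea (using the TF 1.x graph API; this only collects upstream ops, it is not a complete detector):

def upstream_ops(tensor):
    # Climb from a tensor up through the graph via .op and .inputs
    seen, stack = set(), [tensor.op]
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        for t in op.inputs:
            stack.append(t.op)
    return seen

# E.g. find candidate weight multiplications upstream of the loss:
# matmuls = [op for op in upstream_ops(loss) if op.type == 'MatMul']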
Old answer, kept for the sake of completeness.
I'm not 100% clear on what you are trying to do, so I'm not sure if this really answers your question.
Let's say you have a dense layer transforming an input of size M to an output of size N. According to the convention you show, you'd have an N × M weights matrix W and an N-sized bias vector B. Then, an input vector X of size M (or a batch of inputs of size M × K) would be processed by the layer as W · X + B, followed by the activation function (in the case of a batch, the addition would be a "broadcasted" operation). In TensorFlow:
X = ... # Input batch of size M x K
W = ... # Weights of size N x M
B = ... # Biases of size N
Y = tf.matmul(W, X) + B[:, tf.newaxis] # Output of size N x K
# Activation...
If you want, you can always put W and B together in a single extended weights matrix W*, basically adding B as a new column of W, so W* would be N × (M + 1). Then you just need to add a new element to the input vector X containing a constant 1 (or a new row of ones if it's a batch), so you would get X* with size M + 1 (or (M + 1) × K for a batch). The product W* · X* would then give you the same result as before. In TensorFlow:
X = ...       # Input batch of size M x K
W_star = ...  # Extended weights of size N x (M + 1)
# You can still have a "view" of the original W and B if you need it
W = W_star[:, :M]
B = W_star[:, M]
X_star = tf.concat([X, tf.ones_like(X[:1])], axis=0)
Y = tf.matmul(W_star, X_star)  # Output of size N x K
# Activation...
Now you can compute gradients and updates for weights and biases together. A drawback of this approach is that if you apply regularization, you should be careful to apply it only to the weights part of the matrix, not to the biases.
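For example, with the layout above, a sketch of an L2 penalty restricted to the weights part (base_loss and reg_strength are placeholders, not from the question):

# Penalize only the original weights, not the bias column
l2_penalty = tf.reduce_sum(tf.square(W_star[:, :M]))
loss = base_loss + reg_strength * l2_penalty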

numpy : calculate the derivative of the softmax function

I am trying to understand backpropagation in a simple 3-layer neural network with MNIST.
There is the input layer with weights and a bias. The labels are MNIST, so it's a 10-class vector.
The second layer is a linear transform. The third layer is the softmax activation, to get the output as probabilities.
Backpropagation calculates the derivative at each step and calls this the gradient.
Previous layers combine the global (upstream) gradient with their local gradient. I am having trouble calculating the local gradient of the softmax.
Several resources online go through the explanation of the softmax and its derivative, and some even give code samples of the softmax itself:
def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
The derivative is explained with respect to when i = j and when i != j. This is a simple code snippet I've come up with and was hoping to verify my understanding:
def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1-self.input[i])
            else:
                self.gradient[i] = -self.value[i]*self.input[j]
Then self.gradient is the local gradient which is a vector. Is this correct? Is there a better way to write this?
I am assuming you have a 3-layer NN, with W1, b1 associated with the linear transformation from the input layer to the hidden layer, and W2, b2 associated with the linear transformation from the hidden layer to the output layer. Z1 and Z2 are the input vectors to the hidden layer and the output layer. a1 and a2 represent the outputs of the hidden layer and the output layer; a2 is your predicted output. delta3 and delta2 are the (backpropagated) errors, from which you get the gradients of the loss function with respect to the model parameters.
This is the general scenario for a 3-layer NN (input layer, one hidden layer, and one output layer). You can follow the procedure described above to compute the gradients, which should be easy to compute! Since another answer to this post already pointed out the problem in your code, I am not repeating it here.
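As a concrete sketch of that procedure (assuming a sigmoid hidden layer, a softmax output, and cross-entropy loss with one-hot labels Y; names follow the notation above):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    exps = np.exp(z - z.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def gradients(X, Y, W1, b1, W2, b2):
    # Forward pass
    Z1 = X @ W1 + b1            # input to the hidden layer
    a1 = sigmoid(Z1)            # hidden layer output
    Z2 = a1 @ W2 + b2           # input to the output layer
    a2 = softmax(Z2)            # predicted probabilities

    # Backward pass: softmax + cross-entropy gives a simple output error
    delta3 = a2 - Y                            # error at the output layer
    dW2 = a1.T @ delta3
    db2 = delta3.sum(axis=0)
    delta2 = (delta3 @ W2.T) * a1 * (1 - a1)   # error at the hidden layer
    dW1 = X.T @ delta2
    db1 = delta2.sum(axis=0)
    return dW1, db1, dW2, db2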
As I said, you have n^2 partial derivatives.
If you do the math, you find that dSM[i]/dx[k] is SM[i] * (δ[i,k] - SM[k]), where δ[i,k] is 1 if i == k and 0 otherwise, so you should have:
if i == j:
    self.gradient[i,j] = self.value[i] * (1-self.value[i])
else:
    self.gradient[i,j] = -self.value[i] * self.value[j]

instead of

if i == j:
    self.gradient[i] = self.value[i] * (1-self.input[i])
else:
    self.gradient[i] = -self.value[i]*self.input[j]
By the way, this may be computed more concisely like so (vectorized):
SM = self.value.reshape((-1,1))
jac = np.diagflat(self.value) - np.dot(SM, SM.T)
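A quick sanity check that the corrected loop and the vectorized version agree (illustrative values):

import numpy as np

value = np.array([0.2, 0.3, 0.5])   # some softmax output

jac_loop = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        if i == j:
            jac_loop[i, j] = value[i] * (1 - value[i])
        else:
            jac_loop[i, j] = -value[i] * value[j]

SM = value.reshape((-1, 1))
jac_vec = np.diagflat(value) - np.dot(SM, SM.T)

assert np.allclose(jac_loop, jac_vec)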
np.exp is not numerically stable, because for large inputs it overflows to Inf.
So you should subtract the maximum of x first:
def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x - x.max())
    return exps / np.sum(exps)
If x is a matrix, please check the softmax function in this notebook.
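To see why the shift matters: with large inputs the naive version overflows, while the shifted version is fine:

import numpy as np

x = np.array([1000.0, 1001.0, 1002.0])
# np.exp(x) would overflow to inf here, producing nan after the division
exps = np.exp(x - x.max())
print(exps / np.sum(exps))  # [0.09003057 0.24472847 0.66524096]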
