LSTM cell implementation in Pytorch design choices

LSTM cell implementation in Pytorch design choices - python

I was looking for an implementation of an LSTM cell in Pytorch that I could extend, and I found an implementation of it in the accepted answer here. I will post it here because I'd like to refer to it. There are quite a few implementation details that I do not understand, and I was wondering if someone could clarify.
import math
import torch as th
import torch.nn as nn
class LSTM(nn.Module):
def __init__(self, input_size, hidden_size, bias=True):
super(LSTM, self).__init__()
self.input_size = input_size
self.hidden_size = hidden_size
self.bias = bias
self.i2h = nn.Linear(input_size, 4 * hidden_size, bias=bias)
self.h2h = nn.Linear(hidden_size, 4 * hidden_size, bias=bias)
self.reset_parameters()
def reset_parameters(self):
std = 1.0 / math.sqrt(self.hidden_size)
for w in self.parameters():
w.data.uniform_(-std, std)
def forward(self, x, hidden):
h, c = hidden
h = h.view(h.size(1), -1)
c = c.view(c.size(1), -1)
x = x.view(x.size(1), -1)
# Linear mappings
preact = self.i2h(x) + self.h2h(h)
# activations
gates = preact[:, :3 * self.hidden_size].sigmoid()
g_t = preact[:, 3 * self.hidden_size:].tanh()
i_t = gates[:, :self.hidden_size]
f_t = gates[:, self.hidden_size:2 * self.hidden_size]
o_t = gates[:, -self.hidden_size:]
c_t = th.mul(c, f_t) + th.mul(i_t, g_t)
h_t = th.mul(o_t, c_t.tanh())
h_t = h_t.view(1, h_t.size(0), -1)
c_t = c_t.view(1, c_t.size(0), -1)
return h_t, (h_t, c_t)
1- Why multiply the hidden size by 4 for both self.i2h and self.h2h (in the init method)
2- I don't understand the reset method for the parameters. In particular, why do we reset parameters in this way?
3- Why do we use view for h, c, and x in the forward method?
4- I'm also confused about the column bounds in the activations part of the forward method. As an example, why do we upper bound with 3 * self.hidden_size for gates?
5- Where are all the parameters of the LSTM? I'm talking about the Us and Ws here:

1- Why multiply the hidden size by 4 for both self.i2h and self.h2h (in the init method)
In the equations you have included, the input x and the hidden state h are used for four calculations, where each of them is a matrix multiplication with a weight. Whether you do four matrix multiplications or concatenate the weights and do one bigger matrix multiplication and separate the results afterwards, has the same result.
input_size = 5
hidden_size = 10
input = torch.randn((2, input_size))
# Two different weights
w_c = torch.randn((hidden_size, input_size))
w_i = torch.randn((hidden_size, input_size))
# Concatenated weights into one tensor
# with size:[2 * hidden_size, input_size]
w_combined = torch.cat((w_c, w_i), dim=0)
# Output calculated by using separate matrix multiplications
out_c = torch.matmul(w_c, input.transpose(0, 1))
out_i = torch.matmul(w_i, input.transpose(0, 1))
# One bigger matrix multiplication with the combined weights
out_combined = torch.matmul(w_combined, input.transpose(0, 1))
# The first hidden_size number of rows belong to w_c
out_combined_c = out_combined[:hidden_size]
# The second hidden_size number of rows belong to w_i
out_combined_i = out_combined[hidden_size:]
# Using torch.allclose because they are equal besides floating point errors.
torch.allclose(out_c, out_combined_c) # => True
torch.allclose(out_i, out_combined_i) # => True
By setting the output size of the linear layer to 4 * hidden_size there are four weights with size hidden_size, so only one layer is needed instead of four. There is not really an advantage of doing this, except maybe a minor performance improvement, mostly for smaller inputs that don't fully exhaust the parallelisations capabilities if done individually.
4- I'm also confused about the column bounds in the activations part of the forward method. As an example, why do we upper bound with 3 * self.hidden_size for gates?
That's where the outputs are separated to correspond to the output of the four individual calculations. The output is the concatenation of [i_t; f_t; o_t; g_t] (not including tanh and sigmoid respectively).
You can get the same separation by splitting the output into four chunks with torch.chunk:
i_t, f_t, o_t, g_t = torch.chunk(preact, 4, dim=1)
But after the separation you would have to apply torch.sigmoid to i_t, f_t and o_t, and torch.tanh to g_t.
5- Where are all the parameters of the LSTM? I'm talking about the Us and Ws here:
The parameters W are the weights in the linear layer self.i2h and U in the linear layer self.h2h, but concatenated.
W_i, W_f, W_o, W_c = torch.chunk(self.i2h.weight, 4, dim=0)
U_i, U_f, U_o, U_c = torch.chunk(self.h2h.weight, 4, dim=0)
3- Why do we use view for h, c, and x in the forward method?
Based on h_t = h_t.view(1, h_t.size(0), -1) towards the end, the hidden states have the size [1, batch_size, hidden_size]. With h = h.view(h.size(1), -1) that gets rid of the first singular dimension to get size [batch_size, hidden_size]. The same could be achieved with h.squeeze(0).
2- I don't understand the reset method for the parameters. In particular, why do we reset parameters in this way?
Parameter initialisation can have a big impact on the model's learning capability. The general rule for the initialisation is to have values close to zero without being too small. A common initialisation is to draw from a normal distribution with mean 0 and variance of 1 / n, where n is the number of neurons, which in turn means a standard deviation of 1 / sqrt(n).
In this case it uses a uniform distribution instead of a normal distribution, but the general idea is similar. Determining the minimum/maximum value based on the number of neurons but avoiding to make them too small. If the minimum/maximum value would be 1 / n the values would get very small, so using 1 / sqrt(n) is more appropriate, e.g. 256 neurons: 1 / 256 = 0.0039 whereas 1 / sqrt(256) = 0.0625.
Initializing neural networks provides some explanations of different initialisations with interactive visualisations.

Related

How can I get the in and out edges weights for each neuron in a neural network?

Say I have the following network
import torch
import torch.nn as nn
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.fc1 = nn.Linear(1, 2)
self.fc2 = nn.Linear(2, 3)
self.fc3 = nn.Linear(3, 1)
def forward(self, x):
x = torch.relu(self.fc1(x))
x = torch.relu(self.fc2(x))
x = self.fc3(x)
return x
net = Model()
I know that I can access the weights (i.e edges) at each layer:
net.fc1.weight
However, I'm trying to create a function that randomly selects a neuron from the entire network and outputs it's in-connections (i.e the edges/weights that are attached to it from the previous layer) and its out-connections (i.e the edges/weights that are going out of it to the next layer).
pseudocode:
def get_neuron_in_out_edges(list_of_neurons):
shuffled_list_of_neurons = shuffle(list_of_neurons)
in_connections_list = []
out_connections_list = []
for neuron in shuffled_list_of_neurons:
in_connections = get_in_connections(neuron) # a list of connections
out_connections = get_out_connections(neuron) # a list of connections
in_connections_list.append([neuron,in_connections])
out_connections_list.append([neuron,out_connections])
return in_connections_list, out_connections_list
The idea is that I can then access these values and say if they're smaller than 10, change them to 10 in the network. This is for a networks class where we're working on plotting different networks so this doesn't have to make much sense from a machine learning perspective

Let's ignore biases for this discussion.
A linear layer computes the output y given weights w and inputs x as:
y_i = sum_j w_ij x_j
So, for neuron i all the incoming edges are the weights w_ij - that is the i-th row of the weight matrix W.
Similarly, for input neuron j it affects all y_i according to the j-th column of the weight matrix W.

How can I properly initialize the needed vectors in a Neural Network Model?

I'm trying to implement a Neural Network Model from scratch in Python (using Numpy). For reference, I'm using the Chapter e-7 of this book (Learning from data, by Professor Abu-Mostafa) as a theoretical support.
One of the first problems that I'm facing is how to correctly initialize the matrix of weights and the vectors of inputs and outputs (W, x and s, respectively).
Here is the my approach:
Let L be the number of layers (you do not count the 'first' layer; i.e., the layer of the vector x plus 'bias').
Let d be the dimension of the hidden layers (I'm assuming that all hidden layers have the same number of nodes).
Let out be the number of nodes at the last layer (it is typically 1).
Now, here is how I defined the matrix and vectors of interest:
Let w_ be the vector of weights. Actually, it is a vector in which each component is a matrix the of the form W_{L}. Here, the (i, j)-th value is the w_{i, j}^{(L)} term.
Let x_ be the vector of inputs.
Let s_ be the vector of outputs; you may see s_ as numpy.dot(W^{L}.T, x^{L-1}).
The following image summarizes what I've just described:
The problem arises from the fact that the dimensions of each layer (input, hidden layers and output) are NOT the same. What I was trying to do is to split each vector into different variables; however, work with it in the following steps of the algorithm is extremely difficult (because of how the indexes become a mess). Here is the piece of code that replicates my attempt:
class NeuralNetwork:
"""
Neural Network Model
"""
def __init__(self, L, d, out):
self.L = L # number of layers
self.d = d # dimension of hidden layers
self.out = out # dimension of the output layer
def initialize_(self, X):
# Initialize the vector of inputs
self.x_ = np.zeros((self.L - 1) * (self.d + 1)).reshape(self.L - 1, self.d + 1)
self.xOUT_ = np.zeros(1 * self.out).reshape(1, self.out)
# Initialize the vector of outputs
self.s_ = np.zeros((self.L - 1) * (self.d)).reshape(self.L - 1, self.d)
self.sOUT_ = np.zeros(1 * self.out).reshape(1, self.out)
# Initialize the vector of weights
self.wIN_ = np.random.normal(0, 0.1, 1 * (X.shape[1] + 1) * self.d).reshape(1, X.shape[1] + 1, self.d)
self.w_ = np.random.normal(0, 0.1, (self.L - 2) * (self.d + 1) * self.d).reshape(self.L - 2, self.d + 1, self.d)
self.wOUT_ = np.random.normal(0, 0.1, 1 * (self.d + 1) * self.out).reshape(1, self.d + 1, self.out)
def fit(self, X, y):
self.initialize_(X)
Whenever IN or OUT appear in the code, that is my way to deal with the differences of dimension between the input and output layers, respectively.
Clearly, this is NOT a good way to do it. So my question is: How can I work with these different dimensional vectors (with respect to each layer) in a clever way?
For example, after initialize them, I want to reproduce the following algorithm (forward-propagation) - you will see that, with my way to index things, it becomes almost impossible:
Where \theta(s) = \tanh(s).
P.S.: I also tried to create an array of arrays (or an array of list), but if I do that, my indexes become useless - they do not represent anymore what I wanted them to represent.

You could encapsulate the neuron logic and let the neurons perform the calculations individually:
class Neuron:
def __init__(self, I, O, b):
self.I = I # input neurons from previous layer
self.O = O # output neurons in next layer
self.b = b # bias
def activate(self, X):
output = np.dot(self.I, X) + self.b
...
return theta(output)

Efficient way to avoid modifying parameter by inplace operation

I have a model which has noisy linear layers (for which you can sample values from a mu and sigma parameter) and need to create two decorrelated outputs of it.
This means I have something like:
model.sample_noise()
output_1 = model(input)
with torch.no_grad():
model.sample_noise()
output_2 = model(input)
sample_noise actually modifies weights attached to the model according to a normal distribution.
But in the end this leads to
RuntimeError: one of the variables needed for gradient computation has been
modified by an inplace operation
The question actually is, what's the best way to avoid modifying these parameters. I could actually deepcopy the model every iteration and then use it for the second forward pass, but this does not sound very efficient to me.

If I understand your problem correctly, you want to have a linear layer with matrix M and then create two outputs
y_1 = (M + μ_1) * x + b
y_2 = (M + μ_2) * x + b
where μ_1, μ_2 ~ P. The simplest way would be, in my opinion, to create a custom class
import torch
import torch.nn.functional as F
from torch import nn
class NoisyLinear(nn.Module):
def __init__(self, n_in, n_out):
super(NoisyLinear, self).__init__()
# or any other initialization you want
self.weight = nn.Parameter(torch.randn(n_out, n_in))
self.bias = nn.Parameter(torch.randn(n_out))
def sample_noise(self):
# implement your noise generation here
return torch.randn(*self.weight.shape) * 0.01
def forward(self, x):
noise = self.sample_noise()
return F.linear(x, self.weight + noise, self.bias)
nl = NoisyLinear(4, 3)
x = torch.randn(2, 4)
y1 = nl(x)
y2 = nl(x)
print(y1, y2)

Max pooling layers have weights. interesting work

In the pytorch tutorial step of " Deep Learning with PyTorch: A 60 Minute Blitz > Neural Networks "
I have a question that what dose mean params[1] in the networks?
The reason why i have this think is because of max polling dose not have any weight values.
for example.
If you write some codes like that
'
def init(self) :
self.conv1 = nn.Conv2d(1 , 6 , 5)
'
this means input has 1 channel, 6 output channel, conv(5,5)
So i understood that params[0] has 6 channel, 5 by 5 matrix random mapping values when init.
for the same reason
params[2] has like same form, but 16 channel. i understood this too.
but params[1], what dose mean?
Maybe it is just presentation method of existence for max polling.
but at the end of this tutorial, in step of the " update the weights "
It's likely updated by this code below.
learning_rate = 0.01
for f in net.parameters():
f.data.sub_(f.grad.data * learning_rate)
this is code for construct a network
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
# 1 input image channel, 6 output channels, 5x5 square convolution
# kernel
self.conv1 = nn.Conv2d(1, 6, 5)
self.conv2 = nn.Conv2d(6, 16, 5)
# an affine operation: y = Wx + b
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
# Max pooling over a (2, 2) window
x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
# If the size is a square you can only specify a single number
x = F.max_pool2d(F.relu(self.conv2(x)), 2)
x = x.view(-1, self.num_flat_features(x))
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
def num_flat_features(self, x):
size = x.size()[1:] # all dimensions except the batch dimension
num_features = 1
for s in size:
num_features *= s
return num_features
params = list(net.parameters())
print(params[1])
Parameter containing:
tensor([-0.0614, -0.0778, 0.0968, -0.0420, 0.1779, -0.0843],
requires_grad=True)
please visit this pytorch tutorial site.
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py
Summary
I have a one question.
that is why max pooling layers has four weights which can be updated ?
I think they shouldn't have any weights right?
Am I wrong?
Please help me. I'm a korean.

You are wrong about that. It has nothing to do with max_pooling.
As you can read in your "linked" Tutorial, is the "nn.paramter"-Tensor automatically registered as parameter when it gets assigned to a Module.
Which, in your case basically, means that everything listed within the __init__ is a module and parameter can be assigned to.
What the values mean inside parameter, well its the parameter your model needs to calculate its steps. to picture it
params[0] -> self.conf1 -> Layer-Input
params[1] -> self.conf1 -> Layer-Output
params[2] -> self.conf2 -> Layer-Input
params[3] -> self.conf2 -> Layer-Output
params[4] -> self.fc1 -> Layer-Input
params[5] -> self.fc1 -> Layer-Output
and so on until you reach params[9], which is the end of your whole parameter list.
EDIT: forgot about the weights
These values are indicator of what your Net has learned.
Therefore you have the ability to alter these values in order to fine-tune your Net to fit your needs.
And if you ask why then 2 rows for each layer?
Well, when you do backpropagation you need these values to locate issues within your Layers.
That's why it stored before passed into a Layer and then after returning from that layer.
hope things are a little clearer now.

Neural Network seems to be getting stuck on a single output with each execution

I've created a neural network to estimate the sin(x) function for an input x. The network has 21 output neurons (representing numbers -1.0, -0.9, ..., 0.9, 1.0) with numpy that does not learn, as I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.
When I execute the code, the amount of test data it estimates correctly sits around 48/1000. This happens to be the average data point count per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just start picking a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!
import random
import numpy as np
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
# print(self.layer1)
# print()
# print(self.layer2)
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Propogate forward through network as if doing this by hand.
#first layer's output activations:
for neuron in range(self.hiddenLayerSize):
self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
def train(self, training_pairs, epochs, minibatchsize, learn_rate):
#apply gradient descent
test_data = build_sinx_data(1000)
for epoch in range(epochs):
random.shuffle(training_pairs)
minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
for minibatch in minibatches:
loss = 0 #calculate loss for each minibatch
#Begin training
for x, y in minibatch:
network_output = self.feedforward(x)
loss += (network_output - y) ** 2
#adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
loss /= (2*len(minibatch))
adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
self.layer1 += adjustWeights
#print(adjustWeights)
self.layer2 += adjustWeights
#when line 63 placed here, results did not improve during minibatch.
print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
print("Training Complete")
def evaluate(self, test_data):
"""
Returns number of test inputs which network evaluates correctly.
The ouput assumed to be neuron in output layer with highest activation
:param test_data: test data set identical in form to train data set.
:return: integer sum
"""
correct = 0
for x, y in test_data:
output = self.feedforward(x)
if output == y:
correct+=1
return(correct)
def build_sinx_data(data_points):
"""
Creates a list of tuples (x value, expected y value) for Sin(x) function.
:param data_points: number of desired data points
:return: list of tuples (x value, expected y value
"""
x_vals = []
y_vals = []
for i in range(data_points):
#parameter of randint signifies range of x values to be used*10
x_vals.append(random.randint(-2000,2000)/10)
y_vals.append(round(math.sin(x_vals[i]),1))
return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))

I didn't examine thoroughly all of your code, but some issues are clearly visible:
* operator doesn't perform matrix multiplication in numpy, you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc.
Seems like you are solving your problem via classification (selecting 1 out of 21 classes), but using L2 loss. This is somewhat mixed up. You have two options: either stick to classification and use a cross entropy loss function, or perform regression (i.e. predict the numeric value) with L2 loss.
You should definitely extract sigmoid function to avoid writing the same expression all over again:
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def sigmoid_derivative(x):
return sigmoid(x) * (1 - sigmoid(x))
You perform the same update of self.layer1 and self.layer2, which clearly wrong. Take some time analyzing how exactly backpropagation works.

I edited how my loss function was integrated into my function and also correctly implemented gradient descent. I also removed the use of mini-batches and simplified what my network was trying to do. I now have a network which attempts to classify something as even or odd.
Some extremely helpful guides I used to fix things up:
Chapter 1 and 2 of Neural Networks and Deep Learning, by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html . This book gives thorough explanations for how Neural Nets work, including breakdowns of the math behind their execution.
Backpropagation from the Beginning, by Erik Hallström, linked by Maxim. https://medium.com/#erikhallstrm/backpropagation-from-the-beginning-77356edf427d
. Not as thorough as the above guide, but I kept both open concurrently, as this guide is more to the point about what is important and how to apply the mathematical formulas that are thoroughly explained in Nielsen's book.
How to build a simple neural network in 9 lines of Python code https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1
. A useful and fast introduction to some neural networking basics.
Here is my (now functioning) code:
import random
import numpy as np
import scipy
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Layers represented both by their weights array and activation and inputsums vectors.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
self.layer2_inputsums = np.zeros((outputLayerSize, 1))
self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
self.layer2_errorsignals = np.zeros((outputLayerSize, 1))
self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
print()
print(self.layer1)
print()
print(self.layer2)
print()
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Calculate inputsum and and activations for each neuron in the first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])
# Calculate inputsum and and activations for each neuron in the second layer. Notice that each neuron in the second layer represented by
# weights vector, consisting of all weights leading out of the kth neuron in (l-1) layer to the jth neuron in layer l.
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])
return self.layer2_activations
def interpreted_output(self, network_input):
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
self.feedforward(network_input)
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
# def build_expected_output(self, training_data):
# #Views expected output number y for each x to generate an expected output vector from the network
# index=0
# for pair in training_data:
# expected_output_vector = np.zeros((self.outputLayerSize,1))
# x = training_data[0]
# y = training_data[1]
# for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
# if y == i / 10:
# expected_output_vector[i] = 1
# #expect the target category to be a 1.
# break
# training_data[index][1] = expected_output_vector
# index+=1
# return training_data
def train(self, training_data, learn_rate):
self.backpropagate(training_data, learn_rate)
def backpropagate(self, train_data, learn_rate):
#Perform for each x,y pair.
for datapair in range(len(train_data)):
x = train_data[datapair][0]
y = train_data[datapair][1]
self.feedforward(x)
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l2 " + str(self.layer2))
# print("l1 " + str(self.layer1))
for neuron in range(self.outputLayerSize):
#Calculate first error equation for error signals of output layer neurons
self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])
#Use recursive formula to calculate error signals of hidden layer neurons
self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
#print(self.layer1_errorsignals)
# for neuron in range(self.hiddenLayerSize):
# #Use recursive formula to calculate error signals of hidden layer neurons
# self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])
#Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
#(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
#Update all weights for network at each iteration of a training pair.
#Update weights in second layer
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)
self.layer2 += self.layer2_deltaw
#Update weights in first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)
self.layer1 += self.layer1_deltaw
#Comment/Uncomment to enable error evaluation.
#print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l1 " + str(self.layer1))
# print("l2 " + str(self.layer2))
def evaluate(self, test_data):
error = 0
for x, y in test_data:
#x is integer, y is single element np.array
output = self.feedforward(x)
error += y - output
return error
#eval function for sin(x)
# def evaluate(self, test_data):
# """
# Returns number of test inputs which network evaluates correctly.
# The ouput assumed to be neuron in output layer with highest activation
# :param test_data: test data set identical in form to train data set.
# :return: integer sum
# """
# correct = 0
# for x, y in test_data:
# outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
# 1)] # range(-10, 11, 1)
# newy = outputs[np.argmax(y)]
# output = self.interpreted_output(x)
# #print("output: " + str(output))
# if output == newy:
# correct+=1
# return(correct)
def sigmoid(self, z):
return 1 / (1 + np.exp(-z))
def sigmoid_prime(self, z):
return (1 - self.sigmoid(z)) * self.sigmoid(z)
def build_simple_data(data_points):
x_vals = []
y_vals = []
for each in range(data_points):
x = random.randint(-3,3)
expected_output_vector = np.zeros((1, 1))
if x > 0:
expected_output_vector[[0]] = 1
else:
expected_output_vector[[0]] = 0
x_vals.append(x)
y_vals.append(expected_output_vector)
print(list(zip(x_vals,y_vals)))
print()
return (list(zip(x_vals,y_vals)))
simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
def test_network(iterations,net,training_points):
"""
Casually evaluates pre and post test
:param iterations: number of trials to be run
:param net: name of network to evaluate.
;param training_points: size of training data to be used
:return: four 1x1 arrays.
"""
pretest_negative = 0
pretest_positive = 0
posttest_negative = 0
posttest_positive = 0
for each in range(iterations):
pretest_negative += net.feedforward(-10)
pretest_positive += net.feedforward(10)
net.train(build_simple_data(training_points),.1)
for each in range(iterations):
posttest_negative += net.feedforward(-10)
posttest_positive += net.feedforward(10)
return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)
print(test_network(10000, simpleNet, 10000))
While much differs between this code and the code posted in the OP, there is a particular difference that is interesting. In the original feedforward method notice
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
The line
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
Resembles
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
In the updated code. This line performs the dot product between each weight vector and each input vector (the activations from layer 1) to arrive at the input_sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights. By multiplying sigmoid_prime(z) * network error between actual and expected output. If z is very big (and positive), the neuron will have an activation value very close to 1. That means that the network is confident that that neuron should be activating. The same is true if z is very negative. The network, then, doesn't want to radically adjust weights that it is happy with, so the scale of the change in each weight for a neuron is given by the gradient of sigmoid(z), sigmoid_prime(z). Very large z means very small gradient and very small change applied to weights (the gradient of sigmoid is maximized at z = 0, when the network is unconfident about how a neuron should be categorized and when the activation for that neuron is 0.5).
Since I was continually adding on to each neuron's input_sum (z) and never resetting the value for new inputs of dot(weights, activations), the value for z kept growing, continually slowing the rate of change for the weights until weight modification grew to a standstill. I added the following line to cope with this:
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
The new posted network can be copy and pasted into an editor and executed so long as you have the numpy module installed. The final line of output to print will be a list of 4 arrays representing final network output. The first two are the pretest values for a negative and positive input, respectively. These should be random. The second two are post-test values to determine how well the network classifies as positive and negative number. A number near 0 denotes negative, near 1 denotes positive.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

LSTM cell implementation in Pytorch design choices - python

Related

How can I get the in and out edges weights for each neuron in a neural network?

How can I properly initialize the needed vectors in a Neural Network Model?

Efficient way to avoid modifying parameter by inplace operation

Max pooling layers have weights. interesting work

Neural Network seems to be getting stuck on a single output with each execution

Categories

Resources