I have a GRU network that I built manually (i.e. no nn.GRU), with 2 stacked layers, a 128-dimensional hidden state, and a sequence length of 64 characters.
I'm trying to overfit it on a small corpus taken from Shakespeare:
I ran training for 500+ epochs, and after every 25 epochs I generate a sample (with a very low temperature) by feeding only the first letter "B". At first it's gibberish, but by the end it does get close to the actual text. The first version was:
This was without passing the hidden state between batches. I thought maybe this was my problem, so I passed the hidden state, detached, between batches. But I still don't get a perfect overfit, and it actually seems to make things worse:
Here's how I implemented the GRU:
for i in range(self.n_layers):
    layer_input = layer_middle
    layer_middle = torch.zeros((batch_size, seq_len, self.h_dim))
    params = self.layer_params[i]
    for j in range(seq_len):
        x = layer_input[:, j, :].to(torch.device('cuda'))
        z = F.sigmoid(params['W1'](x) + params['W2'](layer_states[i]))
        r = F.sigmoid(params['W3'](x) + params['W4'](layer_states[i]))
        g = F.tanh(params['W5'](x) + params['W6'](r*layer_states[i]))
        layer_states[i] = z*layer_states[i] + (1-z)*g
        layer_middle[:, j, :] = layer_states[i]
    layer_middle = nn.Dropout(self.dropout)(layer_middle)

layer_output = torch.zeros((batch_size, seq_len, self.out_dim))
for j in range(seq_len):
    x = layer_middle[:, j, :].to(torch.device('cuda'))
    layer_output[:, j, :] = self.layer_params[-1]['W7'](x)
The params are nn.Linear with the correct shapes.
Any idea what might be preventing me from overfitting to this tiny corpus?
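For reference, here is roughly how I carry the hidden state across batches (a simplified sketch; model, criterion, optimizer, and init_hidden are stand-ins for my actual objects):

hidden = init_hidden(batch_size)        # e.g. zeros of shape (n_layers, batch_size, h_dim)
for xb, yb in batches:
    hidden = hidden.detach()            # keep the values, but cut the autograd graph here
    out, hidden = model(xb, hidden)
    loss = criterion(out.view(-1, out.size(-1)), yb.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()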
I am working with a trained LSTM model. I want to extract the output of each gate at prediction time, where the input is a single sequence. In recurrent.py (from the keras package), that would be given by:
i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c, self.recurrent_kernel_c))
I'm unable to extract the i, c, f, o matrices the way it can be done for the weights and biases:
for layer in model.layers:
    if "lstm" in str(layer).lower():
        kernel_i = K.get_value(layer.kernel_i)
        kernel_f = K.get_value(layer.kernel_f)
        ....
EDITED
I tried to compute the activations using for example:
x_i = K.dot(inputs_i, kernel_i)
but then it outputs the following error: 'numpy.ndarray' object has no attribute 'get_shape'
That approach is wrong because both inputs_i and kernel_i are numpy arrays, not backend tensors.
The correct way is:
x_i = np.dot(inputs_i, kernel_i)
and then add bias: x_i = x_i + bias_i
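Putting this together for all four gates (a sketch that mirrors the snippet above; it assumes kernel_* and bias_* were extracted with K.get_value as earlier, and inputs_* are the corresponding input slices, which at prediction time are all just the input):

x_i = np.dot(inputs_i, kernel_i) + bias_i
x_f = np.dot(inputs_f, kernel_f) + bias_f
x_c = np.dot(inputs_c, kernel_c) + bias_c
x_o = np.dot(inputs_o, kernel_o) + bias_o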
Still following recurrent.py, the next step is to get the output of the input gate i, for example:
i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))
We first need to convert x_i into a tensor and then get the recurrent_activation from the layer object:
x_i_tensor = tf.convert_to_tensor(x_i, np.float32)
i = layer_lstm.recurrent_activation(x_i_tensor + K.dot(h_tm1_i, layer_lstm.recurrent_kernel_i))
So, now the question is how to access the hidden state h_tm1_i?
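One direction I'm considering (an unverified sketch, not working code): since Keras initializes the hidden and cell states to zero by default, h_tm1 could be reconstructed by unrolling the recurrence manually over the sequence, reusing the extracted kernels and biases. Here recurrent_activation stands for a numpy version of the layer's recurrent activation (hard_sigmoid by default), and single_sequence, units, timesteps are placeholders for my actual data:

# At prediction time there is no dropout, so h_tm1_i == h_tm1_f == h_tm1_c == h_tm1_o == h_tm1.
h_tm1 = np.zeros((1, units))   # `units` = number of LSTM units
c_tm1 = np.zeros((1, units))
for t in range(timesteps):
    x_t = single_sequence[:, t, :]                     # shape (1, input_dim)
    x_i = np.dot(x_t, kernel_i) + bias_i
    x_f = np.dot(x_t, kernel_f) + bias_f
    x_c = np.dot(x_t, kernel_c) + bias_c
    x_o = np.dot(x_t, kernel_o) + bias_o
    i = recurrent_activation(x_i + np.dot(h_tm1, recurrent_kernel_i))
    f = recurrent_activation(x_f + np.dot(h_tm1, recurrent_kernel_f))
    c = f * c_tm1 + i * np.tanh(x_c + np.dot(h_tm1, recurrent_kernel_c))
    o = recurrent_activation(x_o + np.dot(h_tm1, recurrent_kernel_o))
    h = o * np.tanh(c)
    h_tm1, c_tm1 = h, c                                # states fed into the next step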
I'd appreciate some suggestions. Thanks.
I've created a neural network in numpy to estimate the sin(x) function for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0), but it does not learn; I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.
When I execute the code, the amount of test data it estimates correctly sits around 48/1000. This happens to be the average data point count per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just start picking a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!
import random
import numpy as np
import math

class Network(object):

    def __init__(self, inputLayerSize, hiddenLayerSize, outputLayerSize):
        #Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))
        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        # print(self.layer1)
        # print()
        # print(self.layer2)
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]
    def feedforward(self, network_input):
        #Propogate forward through network as if doing this by hand.
        #first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
        #second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        #apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0 #calculate loss for each minibatch
                #Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                #adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                #print(adjustWeights)
                self.layer2 += adjustWeights
                #when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")
    def evaluate(self, test_data):
        """
        Returns number of test inputs which network evaluates correctly.
        The output is assumed to be the neuron in the output layer with the highest activation.
        :param test_data: test data set identical in form to train data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct += 1
        return(correct)

def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for Sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        #parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000, 2000)/10)
        y_vals.append(round(math.sin(x_vals[i]), 1))
    return (list(zip(x_vals, y_vals)))

# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1, 21, 21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600), 20, 10, 2)
print(sinx_test.feedforward(10))
I didn't thoroughly examine all of your code, but some issues are clearly visible:
The * operator doesn't perform matrix multiplication in numpy; you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc.
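A quick standalone illustration of the difference:

import numpy as np

a = np.random.randn(3, 2)
v = np.random.randn(2)

print(a * v)         # element-wise multiplication with broadcasting, shape (3, 2)
print(np.dot(a, v))  # matrix-vector product, shape (3,)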
It seems like you are solving your problem via classification (selecting 1 out of 21 classes), but using an L2 loss. This is somewhat mixed up. You have two options: either stick to classification and use a cross-entropy loss function, or perform regression (i.e. predict the numeric value) with an L2 loss.
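For instance, the classification route would pair a one-hot target over the 21 bins with a cross-entropy loss. A rough sketch (assuming your -1.0 ... 1.0 bin layout and softmax-like output probabilities, which your current sigmoid outputs are not):

import numpy as np

def one_hot_target(y, n_classes=21):
    # map y in {-1.0, -0.9, ..., 1.0} to a bin index 0..20 and one-hot encode it
    target = np.zeros((n_classes, 1))
    target[int(round(y * 10)) + n_classes // 2] = 1.0
    return target

def cross_entropy(predicted, target, eps=1e-12):
    # predicted: per-class probabilities, same shape as target
    predicted = np.clip(predicted, eps, 1 - eps)
    return -np.sum(target * np.log(predicted))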
You should definitely extract the sigmoid function to avoid writing the same expression all over again:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
You perform the same update on self.layer1 and self.layer2, which is clearly wrong. Take some time analyzing how exactly backpropagation works.
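In particular, the two layers need different update terms. A generic sketch for a two-layer sigmoid network (my own notation, not your variable names; x is a column input, W1/W2 the two weight matrices):

# forward pass: z1 = np.dot(W1, x), a1 = sigmoid(z1); z2 = np.dot(W2, a1), a2 = sigmoid(z2)
delta2 = (a2 - y) * sigmoid_derivative(z2)              # error signal at the output layer
delta1 = np.dot(W2.T, delta2) * sigmoid_derivative(z1)  # error propagated back to the hidden layer

W2 -= learn_rate * np.dot(delta2, a1.T)                 # each layer gets its own gradient
W1 -= learn_rate * np.dot(delta1, x.T)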
I changed how my loss function was integrated into my training and correctly implemented gradient descent. I also removed the use of mini-batches and simplified what my network is trying to do. I now have a network that attempts to classify something as positive or negative.
Some extremely helpful guides I used to fix things up:
Chapters 1 and 2 of Neural Networks and Deep Learning, by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html. This book gives thorough explanations of how neural nets work, including breakdowns of the math behind their execution.
Backpropagation from the Beginning, by Erik Hallström, linked by Maxim: https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d. Not as thorough as the above guide, but I kept both open concurrently, as this guide is more to the point about what is important and how to apply the mathematical formulas that are thoroughly explained in Nielsen's book.
How to build a simple neural network in 9 lines of Python code: https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1. A useful and fast introduction to some neural networking basics.
Here is my (now functioning) code:
import random
import numpy as np
import scipy
import math

class Network(object):

    def __init__(self, inputLayerSize, hiddenLayerSize, outputLayerSize):
        #Layers represented both by their weights array and activation and inputsums vectors.
        self.layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)
        self.layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2_activations = np.zeros((outputLayerSize, 1))
        self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
        self.layer2_inputsums = np.zeros((outputLayerSize, 1))
        self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
        self.layer2_errorsignals = np.zeros((outputLayerSize, 1))
        self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
        self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))
        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        print()
        print(self.layer1)
        print()
        print(self.layer2)
        print()
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]
    def feedforward(self, network_input):
        #Calculate inputsum and activations for each neuron in the first layer
        for neuron in range(self.hiddenLayerSize):
            self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
            self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])
        # Calculate inputsum and activations for each neuron in the second layer. Notice that each neuron in the second layer is represented by a
        # weights vector, consisting of all weights leading out of the kth neuron in (l-1) layer to the jth neuron in layer l.
        self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])
        return self.layer2_activations

    def interpreted_output(self, network_input):
        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        self.feedforward(network_input)
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    # def build_expected_output(self, training_data):
    #     #Views expected output number y for each x to generate an expected output vector from the network
    #     index = 0
    #     for pair in training_data:
    #         expected_output_vector = np.zeros((self.outputLayerSize, 1))
    #         x = training_data[0]
    #         y = training_data[1]
    #         for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
    #             if y == i / 10:
    #                 expected_output_vector[i] = 1
    #                 #expect the target category to be a 1.
    #                 break
    #         training_data[index][1] = expected_output_vector
    #         index += 1
    #     return training_data
    def train(self, training_data, learn_rate):
        self.backpropagate(training_data, learn_rate)

    def backpropagate(self, train_data, learn_rate):
        #Perform for each x,y pair.
        for datapair in range(len(train_data)):
            x = train_data[datapair][0]
            y = train_data[datapair][1]
            self.feedforward(x)
            # print("l2a " + str(self.layer2_activations))
            # print("l1a " + str(self.layer1_activations))
            # print("l2 " + str(self.layer2))
            # print("l1 " + str(self.layer1))
            for neuron in range(self.outputLayerSize):
                #Calculate first error equation for error signals of output layer neurons
                self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])
            #Use recursive formula to calculate error signals of hidden layer neurons
            self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)), self.sigmoid_prime(self.layer1_inputsums))
            #print(self.layer1_errorsignals)
            # for neuron in range(self.hiddenLayerSize):
            #     #Use recursive formula to calculate error signals of hidden layer neurons
            #     self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])

            #Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
            #(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
            #Update all weights for network at each iteration of a training pair.

            #Update weights in second layer
            for neuron in range(self.outputLayerSize):
                for weight in range(self.hiddenLayerSize):
                    self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)
            self.layer2 += self.layer2_deltaw

            #Update weights in first layer
            for neuron in range(self.hiddenLayerSize):
                self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)
            self.layer1 += self.layer1_deltaw

            #Comment/Uncomment to enable error evaluation.
            #print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
            # print("l2a " + str(self.layer2_activations))
            # print("l1a " + str(self.layer1_activations))
            # print("l1 " + str(self.layer1))
            # print("l2 " + str(self.layer2))
    def evaluate(self, test_data):
        error = 0
        for x, y in test_data:
            #x is integer, y is single element np.array
            output = self.feedforward(x)
            error += y - output
        return error

    #eval function for sin(x)
    # def evaluate(self, test_data):
    #     """
    #     Returns number of test inputs which network evaluates correctly.
    #     The output is assumed to be the neuron in the output layer with the highest activation.
    #     :param test_data: test data set identical in form to train data set.
    #     :return: integer sum
    #     """
    #     correct = 0
    #     for x, y in test_data:
    #         outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1)]  # range(-10, 11, 1)
    #         newy = outputs[np.argmax(y)]
    #         output = self.interpreted_output(x)
    #         #print("output: " + str(output))
    #         if output == newy:
    #             correct += 1
    #     return(correct)

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        return (1 - self.sigmoid(z)) * self.sigmoid(z)

def build_simple_data(data_points):
    x_vals = []
    y_vals = []
    for each in range(data_points):
        x = random.randint(-3, 3)
        expected_output_vector = np.zeros((1, 1))
        if x > 0:
            expected_output_vector[[0]] = 1
        else:
            expected_output_vector[[0]] = 0
        x_vals.append(x)
        y_vals.append(expected_output_vector)
    print(list(zip(x_vals, y_vals)))
    print()
    return (list(zip(x_vals, y_vals)))
simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000), .1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))

def test_network(iterations, net, training_points):
    """
    Casually evaluates pre and post test
    :param iterations: number of trials to be run
    :param net: name of network to evaluate.
    :param training_points: size of training data to be used
    :return: four 1x1 arrays.
    """
    pretest_negative = 0
    pretest_positive = 0
    posttest_negative = 0
    posttest_positive = 0
    for each in range(iterations):
        pretest_negative += net.feedforward(-10)
        pretest_positive += net.feedforward(10)
    net.train(build_simple_data(training_points), .1)
    for each in range(iterations):
        posttest_negative += net.feedforward(-10)
        posttest_positive += net.feedforward(10)
    return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)

print(test_network(10000, simpleNet, 10000))
While much differs between this code and the code posted in the OP, there is one particular difference that is interesting. In the original feedforward method, notice:
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
    for weight in range(self.hiddenLayerSize):
        self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
    self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
The line
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
resembles
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
in the updated code. This line performs the dot product between each weight vector and the input vector (the activations from layer 1) to arrive at the input_sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights, by multiplying sigmoid_prime(z) by the network error between the actual and expected output. If z is very big (and positive), the neuron will have an activation value very close to 1. That means the network is confident that that neuron should be activating. The same is true if z is very negative. The network, then, doesn't want to radically adjust weights that it is happy with, so the scale of the change in each weight for a neuron is given by the gradient of sigmoid(z), sigmoid_prime(z). A very large z means a very small gradient and a very small change applied to the weights (the gradient of sigmoid is maximized at z = 0, when the network is unsure how a neuron should be categorized and the activation for that neuron is 0.5).
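To make the saturation point concrete:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

print(sigmoid_prime(0.0))    # 0.25      -> maximal gradient, the neuron is "unsure"
print(sigmoid_prime(10.0))   # ~4.5e-05  -> saturated neuron, tiny weight updates
print(sigmoid_prime(-10.0))  # ~4.5e-05  -> saturation in the negative direction too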
Since I was continually adding to each neuron's input_sum (z) and never resetting the value of dot(weights, activations) for new inputs, the value of z kept growing, continually slowing the rate of change of the weights until weight modification ground to a standstill. I added the following line to cope with this:
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
The newly posted network can be copied and pasted into an editor and executed as long as you have the numpy module installed. The final line of output will be a list of 4 arrays representing the final network output. The first two are the pretest values for a negative and a positive input, respectively; these should be random. The second two are post-test values that show how well the network classifies positive and negative numbers. A number near 0 denotes negative, near 1 denotes positive.
I am new to CNTK and using its awesome Python API. I'm having trouble figuring out how to define a recurrent convolutional network layer, since Recurrence() seems to assume a regular network layer only.
To be more specific, I would like to have recurrence among convolutional layers.
Any pointer or even a simple example would be highly appreciated. Thank you.
There are two ways to do this meaningfully (i.e. without destroying the structure of natural images that convolutions rely on). The simplest is to just have an LSTM at the final layer, i.e.
convnet = C.layers.Sequential([Convolution(...), MaxPooling(...), Convolution(...), ...])
z = C.layers.Sequential([convnet, C.layers.Recurrence(LSTM(100)), C.layers.Dense(10)])
for a 10-class problem.
The more complex way would be to define your own recurrent cell that only uses convolutions and thus respects the structure of natural images. To define a recurrent cell you need to write a function that takes the previous state and an input (i.e. a single frame if you are processing video) and outputs the next state and output. For example you can look into the implementation of the GRU in the CNTK layers module, and adapt it to use convolution instead of times everywhere. If this is what you want I can try to provide such an example. However, I encourage you to try the simple way first.
Update: I wrote a barebones convolutional GRU. You need to pay special attention to how the initial state is defined, but otherwise it seems to work fine. Here's the layer definition:
def ConvolutionalGRU(kernel_shape, outputs, activation=C.tanh, init=C.glorot_uniform(), init_bias=0, name=''):
    conv_filter_shape = (outputs, C.InferredDimension) + kernel_shape
    bias_shape = (outputs, 1, 1)
    # parameters
    bz = C.Parameter(bias_shape, init=init_bias, name='bz')    # bias
    br = C.Parameter(bias_shape, init=init_bias, name='br')    # bias
    bh = C.Parameter(bias_shape, init=init_bias, name='bc')    # bias
    Wz = C.Parameter(conv_filter_shape, init=init, name='Wz')  # input
    Wr = C.Parameter(conv_filter_shape, init=init, name='Wr')  # input
    Uz = C.Parameter(conv_filter_shape, init=init, name='Uz')  # hidden-to-hidden
    Ur = C.Parameter(conv_filter_shape, init=init, name='Hz')  # hidden-to-hidden
    Wh = C.Parameter(conv_filter_shape, init=init, name='Wc')  # input
    Uh = C.Parameter(conv_filter_shape, init=init, name='Hc')  # hidden-to-hidden

    # Convolutional GRU model function
    def conv_gru(dh, x):
        zt = C.sigmoid(bz + C.convolution(Wz, x) + C.convolution(Uz, dh))  # update gate z(t)
        rt = C.sigmoid(br + C.convolution(Wr, x) + C.convolution(Ur, dh))  # reset gate r(t)
        rs = dh * rt                                                       # hidden state after reset
        ht = zt * dh + (1-zt) * activation(bh + C.convolution(Wh, x) + C.convolution(Uh, rs))
        return ht

    return conv_gru
and here is how to use it:
x = C.sequence.input_variable((3,224,224))
z = C.layers.Recurrence(ConvolutionalGRU((3,3), 32), initial_state=C.constant(0, (32,224,224)))
y = z(x)
x0 = np.random.randn(16,3,224,224).astype('f') # a single seq. with 16 random "frames"
output = y.eval({x:x0})
output[0].shape
(16, 32, 224, 224)
I'm having trouble with my first neural network. I simply cannot find the source of the error.
Problem
Reading the book "Make Your Own Neural Network" by Tariq Rashid, I tried to implement handwriting recognition using a NN that classifies images and determines which digit from 0 to 9 is written down.
After training the NN, the tests show a ~99% match for every digit, which is obviously wrong.
Suspicions
In the book the author approaches the NN matrices a bit differently than I have. For example, he multiplies the input-hidden weights by the input, which I do the other way around, multiplying the input by the input-hidden weights.
Here is the way I do matrix multiplication while querying the NN (feedforward): the input is a 1×784 row vector, which I multiply by the 784×100 input-hidden weight matrix, and then the resulting 1×100 hidden activations by the 100×10 hidden-output weight matrix.
I'm aware that matrix multiplication is not commutative, but I don't see that I have made an error there.
Should I take a different approach, i.e. transpose all the matrices and multiply them in a different order?
Is there a de facto standard for the dimensions of the input and output matrices, i.e. should they be shaped 1×n or n×1?
If this is the wrong approach, then it has certainly manifested itself in the backpropagation with gradient descent used for training.
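For what it's worth, a quick numpy check (a standalone sketch using my layer sizes) suggests the two orderings give the same numbers as long as the shapes and transposes stay consistent:

import numpy as np

np.random.seed(0)
W = np.random.randn(100, 784)      # book convention: weight matrix times a column-vector input
x_col = np.random.randn(784, 1)

book_way = np.dot(W, x_col)        # shape (100, 1)
my_way = np.dot(x_col.T, W.T)      # shape (1, 100), row-vector convention

print(np.allclose(book_way.T, my_way))  # True: both orderings give the same values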
Source code
import numpy as np
import matplotlib.pyplot
from matplotlib.pyplot import imshow
import scipy.special as scipy
from PIL import Image

class NeuralNetwork(object):

    def __init__(self):
        self.input_neuron_count = 28*28  # One for each pixel, 28*28 = 784 in total.
        self.hidden_neuron_count = 100   # Arbitrary.
        self.output_neuron_count = 10    # One for each digit from 0 to 9.
        self.learning_rate = 0.1         # Arbitrary.

        # Sampling the weights from a normal probability distribution
        # centered around zero and with standard deviation
        # that is related to the number of incoming links into a node,
        # 1/√(number of incoming links).
        generate_random_weight_matrix = lambda input_neuron_count, output_neuron_count: (
            np.random.normal(0.0, pow(input_neuron_count, -0.5), (input_neuron_count, output_neuron_count))
        )
        self.input_x_hidden_weights = generate_random_weight_matrix(self.input_neuron_count, self.hidden_neuron_count)
        self.hidden_x_output_weights = generate_random_weight_matrix(self.hidden_neuron_count, self.output_neuron_count)

        self.activation_function = lambda value: scipy.expit(value)  # Sigmoid function

    def train(self, input_array, target_array):
        inputs = np.array(input_array, ndmin=2)
        targets = np.array(target_array, ndmin=2)

        hidden_layer_input = np.dot(inputs, self.input_x_hidden_weights)
        hidden_layer_output = self.activation_function(hidden_layer_input)

        output_layer_input = np.dot(hidden_layer_output, self.hidden_x_output_weights)
        output_layer_output = self.activation_function(output_layer_input)

        output_errors = targets - output_layer_output
        self.hidden_x_output_weights += self.learning_rate * np.dot(hidden_layer_output.T, (output_errors * output_layer_output * (1 - output_layer_output)))

        hidden_errors = np.dot(output_errors, self.hidden_x_output_weights.T)
        self.input_x_hidden_weights += self.learning_rate * np.dot(inputs.T, (hidden_errors * hidden_layer_output * (1 - hidden_layer_output)))

    def query(self, input_array):
        inputs = np.array(input_array, ndmin=2)

        hidden_layer_input = np.dot(inputs, self.input_x_hidden_weights)
        hidden_layer_output = self.activation_function(hidden_layer_input)

        output_layer_input = np.dot(hidden_layer_output, self.hidden_x_output_weights)
        output_layer_output = self.activation_function(output_layer_input)

        return output_layer_output
Replication (Training and testing)
The original source of training and testing data is The MNIST Database. I have used the CSV version, which I downloaded from the book author's web page, The MNIST Dataset of Handwritten Digits.
Here is the code I have used for training and testing so far:
def prepare_data(handwritten_digit_array):
    return ((handwritten_digit_array / 255.0 * 0.99) + 0.0001).flatten()

def create_target(digit_target):
    target = np.zeros(10) + 0.01
    target[digit_target] = target[digit_target] + 0.98
    return target
# Training
neural_network = NeuralNetwork()

training_data_file = open('mnist_train.csv', 'r')
training_data = training_data_file.readlines()
training_data_file.close()

for data in training_data:
    handwritten_digit_raw = data.split(',')
    handwritten_digit_array = np.asfarray(handwritten_digit_raw[1:]).reshape((28, 28))
    handwritten_digit_target = int(handwritten_digit_raw[0])
    neural_network.train(prepare_data(handwritten_digit_array), create_target(handwritten_digit_target))

# Testing
test_data_file = open('mnist_test_10.csv', 'r')
test_data = test_data_file.readlines()
test_data_file.close()

for data in test_data:
    handwritten_digit_raw = data.split(',')
    handwritten_digit_array = np.asfarray(handwritten_digit_raw[1:]).reshape((28, 28))
    handwritten_digit_target = int(handwritten_digit_raw[0])
    output = neural_network.query(handwritten_digit_array.flatten())
    print('target', handwritten_digit_target)
    print('output', output)
This is one of those facepalm moments. The neural network has been working as expected all along. The truth is that I overlooked the test results and misread numbers written in scientific notation.
Measured on the 10000 test images from The MNIST Database, this NN has an accuracy of 94.01%.
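For reference, here is a sketch of how that accuracy can be computed (my own snippet, assuming the full mnist_test.csv is loaded into test_data the same way as above, the inputs are scaled with prepare_data as during training, and the prediction is the argmax of the output vector):

correct = 0
for data in test_data:
    handwritten_digit_raw = data.split(',')
    handwritten_digit_array = np.asfarray(handwritten_digit_raw[1:]).reshape((28, 28))
    handwritten_digit_target = int(handwritten_digit_raw[0])
    output = neural_network.query(prepare_data(handwritten_digit_array))
    if np.argmax(output) == handwritten_digit_target:
        correct += 1
print('accuracy', correct / len(test_data))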