Augmenting training data - python

I'm working through an exercise on augmenting training data and then testing it with an artificial neural network. The idea is to test how the accuracy changes as more data is added to the training set.
I'm using the MNIST dataset.
This is how I am rotating the image:
def rotate_image(inputs, degree):
## create rotated variations
# rotated anticlockwise by x degrees
inputs_plusx_img = scipy.ndimage.interpolation.rotate(inputs.reshape(28,28), degree, cval=0.01, order=1, reshape=False)
new_inputs1 = inputs_plusx_img.reshape(784)
# rotated clockwise by x degrees
inputs_minusx_img = scipy.ndimage.interpolation.rotate(inputs.reshape(28,28), -degree, cval=0.01, order=1, reshape=False)
new_inputs2 = inputs_minusx_img.reshape(784)
return (new_inputs1, new_inputs2)
degree = 10
df = pd.read_csv(train_file)
#print(df.head())
idx = 100
instance = df.iloc[idx:(idx+1), 1:].values
#print(instance.reshape(28,28))
new_image1, new_image2 = rotate_image(instance, degree)
# show rotated image
image_array = np.asfarray(new_image1).flatten().reshape((28,28))
print(new_image1)
# print the grid in grey scale
plt.imshow(image_array, cmap='Greys', interpolation='None')
Now what I'm not sure about is how to add the new images to the training data set and then feed that set into the ANN class.
This is my neural network:
class neuralNetwork:
"""Artificial Neural Network classifier.
Parameters
------------
lr : float
Learning rate (between 0.0 and 1.0)
ep : int
Number of epochs
bs : int
Size of the training batch to be used when calculating the gradient descent.
batch_size = 1 standard gradient descent
batch_size > 1 stochastic gradient descent
inodes : int
Number of input nodes which is normally the number of features in an instance.
hnodes : int
Number of hidden nodes in the net.
onodes : int
Number of output nodes in the net.
Attributes
-----------
wih : 2d-array
Input2Hidden node weights after fitting
who : 2d-array
Hidden2Output node weights after fitting
E : list
Sum-of-squares error value in each epoch.
Results : list
Target and predicted class labels for the test data.
Functions
---------
activation_function : float (between 0 and 1)
implements the sigmoid function, which squashes the node input
"""
def __init__(self, inputnodes=784, hiddennodes=200, outputnodes=10, learningrate=0.1, batch_size=1, epochs=10):
self.inodes = inputnodes
self.hnodes = hiddennodes
self.onodes = outputnodes
#two weight matrices, wih (input to hidden layer) and who (hidden layer to output)
#a weight on link from node i to node j is w_ij
#Draw random samples from a normal (Gaussian) distribution centered around 0.
#numpy.random.normal(loc to centre gaussian=0.0, scale=1, size=dimensions of the array we want)
#scale is usually set to the standard deviation which is related to the number of incoming links i.e.
#1/sqrt(num of incoming inputs). we use pow to raise it to the power of -0.5.
#We have set 0 as the centre of the Gaussian dist.
# size is set to the dimensions of the number of hnodes, inodes and onodes for each weight matrix
self.wih = np.random.normal(0.0, pow(self.inodes, -0.5), (self.hnodes, self.inodes))
self.who = np.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
#set the learning rate
self.lr = learningrate
#set the batch size
self.bs = batch_size
#set the number of epochs
self.ep = epochs
#store errors at each epoch
self.E= []
#store results from testing the model
#keep track of the network performance on each test instance
self.results= []
#define the activation function here
#specify the sigmoid squashing function. Here expit() provides the sigmoid function.
#lambda is a short cut function which is executed there and then with no def (i.e. like an anonymous function)
self.activation_function = lambda x: scipy.special.expit(x)
pass
# function to help management of batching for gradient descent
# size of the batch is controlled by self.bs
def batch_input(self, X, y): # (self, train_inputs, targets):
"""Yield consecutive batches of the specified size from the input list."""
for i in range(0, len(X), self.bs):
# yield a tuple of the current batched data and labels
yield (X[i:i + self.bs], y[i:i + self.bs])
#train the neural net
#note the first part is very similar to the query function because they both require the forward pass
def train(self, train_inputs, targets_list):
#def train(self, train_inputs):
"""Training the neural net.
This includes the forward pass; error computation;
backprop of the error; calculation of gradients; and updating the weights.
Parameters
----------
train_inputs : {array-like}, shape = [n_instances, n_features]
Training vectors, where n_instances is the number of training instances and
n_features is the number of features.
Note this contains all features including the class feature which is in first position
Returns
-------
self : object
"""
for e in range(self.ep):
print("Training epoch#: ", e)
sum_error = 0.0
for (batchX, batchY) in self.batch_input(train_inputs, targets_list):
#creating variables to store the gradients
delta_who = 0
delta_wih = 0
# iterate through the inputs sent in
for inputs, targets in zip(batchX, batchY):
#convert inputs list to 2d array
inputs = np.array(inputs, ndmin=2).T
targets = np.array(targets, ndmin=2).T
#calculate signals into hidden layer
hidden_inputs = np.dot(self.wih, inputs)
#calculate the signals emerging from the hidden layer
hidden_outputs = self.activation_function(hidden_inputs)
#calculate signals into final output layer
final_inputs=np.dot(self.who, hidden_outputs)
#calculate the signals emerging from final output layer
final_outputs = self.activation_function(final_inputs)
#to calculate the error we need to compute the element wise diff between target and actual
output_errors = targets - final_outputs
#Next distribute the error to the hidden layer such that hidden layer error
#is the output_errors, split by weights, recombined at hidden nodes
hidden_errors = np.dot(self.who.T, output_errors)
## accumulate the gradients from each instance
## delta_who are the gradients between hidden and output weights
## delta_wih are the gradients between input and hidden weights
delta_who += np.dot((output_errors * final_outputs * (1.0 - final_outputs)), np.transpose(hidden_outputs))
delta_wih += np.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), np.transpose(inputs))
sum_error += np.dot(output_errors.T, output_errors) # this is the sum of squared error accumulated over each batched instance
pass #instance
# update the weights by multiplying the gradient with the learning rate
# note that the deltas are divided by batch size to obtain the average gradient according to the given batch
# obviously if batch size = 1 then we simply end up dividing by 1 since each instance forms a singleton batch
self.who += self.lr * (delta_who / self.bs)
self.wih += self.lr * (delta_wih / self.bs)
pass # batch
self.E.append(np.asfarray(sum_error).flatten())
print("errors (SSE): ", self.E[-1])
pass # epoch
#query the neural net
def query(self, inputs_list):
#convert inputs_list to a 2d array
inputs = np.array(inputs_list, ndmin=2).T
#propagate input into hidden layer. This is the start of the forward pass
hidden_inputs = np.dot(self.wih, inputs)
#squash the content in the hidden node using the sigmoid function (value between 0 and 1)
hidden_outputs = self.activation_function(hidden_inputs)
#propagate into output layer and then apply the squashing sigmoid function
final_inputs = np.dot(self.who, hidden_outputs)
final_outputs = self.activation_function(final_inputs)
return final_outputs
#iterate through all the test data to calculate model accuracy
def test(self, test_inputs, test_targets):
self.results = []
#go through each test instance
for inputs, target in zip(test_inputs, test_targets):
#query the network with test inputs
#note this returns 10 output values; the index of the highest value
# is the network's predicted class label
outputs = self.query(inputs)
#get the target which has 0.99 as highest value corresponding to the actual class
target_label = np.argmax(target)
#get the index of the highest output node as this corresponds to the predicted class
predict_label = np.argmax(outputs) #this is the class predicted by the ANN
self.results.append([predict_label, target_label])
pass
pass
self.results = np.asfarray(self.results) # flatten results to avoid nested arrays
Functions to preprocess the data and then train the network:
def preprocess_data(Xy):
X=[]
y=[]
for instance in Xy:
# split the record by the ',' commas
all_values = instance.split(',')
# scale and shift the inputs
inputs = (np.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
# create the target output values (all 0.01, except the desired label which is 0.99)
targets = np.zeros(output_nodes) + 0.01
# all_values[0] is the target label for this record
targets[int(all_values[0])] = 0.99
X.insert(len(X), inputs)
y.insert(len(y), targets)
pass
return(X,y)
pass
mini_training_data = np.random.choice(train_data_list, 60000, replace = False)
print("Percentage of training data used:", (len(mini_training_data)/len(train_data_list)) * 100)
X_train, y_train = preprocess_data(mini_training_data)
X_test, y_test = preprocess_data(test_data_list)
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate, batch_size, epochs)
n.train(X_train, y_train)
n.test(X_test, y_test)
#print network performance as an accuracy metric
correct = 0 # number of predictions that were correct
#iterate through each tested instance and accumulate the number of correct predictions
for result in n.results:
if (result[0] == result[1]):
correct += 1
pass
pass
# print the accuracy on test set
print ("Test set accuracy% = ", (100 * correct / len(n.results)))

Related

Train a neural network when the training has only the derivative of output wrt all inputs

There is a scalar function F with 1000 inputs. I want to train a model to predict F given the inputs. However, in the training dataset, we only know the derivative of F with respect to each input, not the value of F itself. How can I construct a neural network with this limitation in tensorflow or pytorch?
I think you can use torch.autograd to compute the gradients, and then use them for the loss. You need:
(a) A trainable nn.Module to represent the (unknown) function F:
import torch
from torch import nn, autograd

class UnknownF(nn.Module):
    def __init__(self, ...):
        # whatever combinations of linear layers and activations and whatever...
    def forward(self, x):
        # x is 1000 dim vector
        y = self.layers(x)
        # y is a _scalar_ output
        return y
model = UnknownF(...) # instantiate the model of the unknown function
(b) Training data:
x = torch.randn(n, 1000, requires_grad=True) # n examples of 1000-dim vectors
dy = torch.randn(n, 1000) # the corresponding 1000-dim gradients for the n examples
(c) An optimizer:
opt = torch.optim.SGD(model.parameters(), lr=0.1)
(d) Put it together:
criterion = nn.MSELoss()
for e in range(num_epochs):
    for i in range(n):
        # batch size = 1, pick one example
        x_ = x[i, :]
        dy_ = dy[i, :]
        opt.zero_grad()
        # predict the unknown output
        y_ = model(x_)
        # compute the gradients of the model using autograd:
        pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
        # compute the loss between the model's gradients and the GT ones:
        loss = criterion(pred_dy_, dy_)
        loss.backward()
        opt.step()  # update model's parameters accordingly.
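As a hedged follow-up (not part of the answer above): because each scalar output depends only on its own input row, the same gradients can be obtained for the whole batch in one pass, which is usually much faster. A sketch reusing the names from (a)-(c), assuming the model accepts a batch dimension:
# Hypothetical batched variant: one forward pass over all n examples at once.
opt.zero_grad()
y = model(x)                                               # shape (n,) or (n, 1)
pred_dy = autograd.grad(y.sum(), x, create_graph=True)[0]  # shape (n, 1000)
loss = criterion(pred_dy, dy)
loss.backward()
opt.step()
Summing y before calling autograd.grad is safe here because the gradient of y[i] only flows back into x[i, :].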

ValueError: shapes (240000,28,28) and (2,512) not aligned: 28 (dim 2) != 2 (dim 0)

I'm making a CNN and I've got this error that the matrices don't align. I understand the error, but I don't know how to fix it. Here is the code:
import numpy as np
import nnfs
import emnist
import os
import cv2
import pickle
import copy
nnfs.init()
# Dense layer
class Layer_Dense:
# Layer initialization
def __init__(self, n_inputs, n_neurons,
weight_regularizer_l1=0, weight_regularizer_l2=0,
bias_regularizer_l1=0, bias_regularizer_l2=0):
# Initialize weights and biases
self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
self.biases = np.zeros((1, n_neurons))
# Set regularization strength
self.weight_regularizer_l1 = weight_regularizer_l1
self.weight_regularizer_l2 = weight_regularizer_l2
self.bias_regularizer_l1 = bias_regularizer_l1
self.bias_regularizer_l2 = bias_regularizer_l2
# Forward pass
def forward(self, inputs, training):
# Remember input values
self.inputs = inputs
# Calculate output values from inputs, weights and biases
self.output = np.dot(inputs, self.weights) + self.biases
# Backward pass
def backward(self, dvalues):
# Gradients on parameters
self.dweights = np.dot(self.inputs.T, dvalues)
self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
# Gradients on regularization
# L1 on weights
if self.weight_regularizer_l1 > 0:
dL1 = np.ones_like(self.weights)
dL1[self.weights < 0] = -1
self.dweights += self.weight_regularizer_l1 * dL1
# L2 on weights
if self.weight_regularizer_l2 > 0:
self.dweights += 2 * self.weight_regularizer_l2 * \
self.weights
# L1 on biases
if self.bias_regularizer_l1 > 0:
dL1 = np.ones_like(self.biases)
dL1[self.biases < 0] = -1
self.dbiases += self.bias_regularizer_l1 * dL1
# L2 on biases
if self.bias_regularizer_l2 > 0:
self.dbiases += 2 * self.bias_regularizer_l2 * \
self.biases
# Gradient on values
self.dinputs = np.dot(dvalues, self.weights.T)
# Retrieve layer parameters
def get_parameters(self):
return self.weights, self.biases
# Set weights and biases in a layer instance
def set_parameters(self, weights, biases):
self.weights = weights
self.biases = biases
# Dropout
class Layer_Dropout:
# Init
def __init__(self, rate):
# Store rate, we invert it as for example for dropout
# of 0.1 we need success rate of 0.9
self.rate = 1 - rate
# Forward pass
def forward(self, inputs, training):
# Save input values
self.inputs = inputs
# If not in the training mode - return values
if not training:
self.output = inputs.copy()
return
# Generate and save scaled mask
self.binary_mask = np.random.binomial(1, self.rate,size=inputs.shape) / self.rate
# Apply mask to output values
self.output = inputs * self.binary_mask
# Backward pass
def backward(self, dvalues):
# Gradient on values
self.dinputs = dvalues * self.binary_mask
#Input "layer"
class Layer_Input:
# Forward pass
def forward(self, inputs, training):
self.output = inputs
# ReLU activation
class Activation_ReLU:
# Forward pass
def forward(self, inputs, training):
# Remember input values
self.inputs = inputs
# Calculate output values from inputs
self.output = np.maximum(0, inputs)
# Backward pass
def backward(self, dvalues):
# Since we need to modify original variable,
# let's make a copy of values first
self.dinputs = dvalues.copy()
# Zero gradient where input values were negative
self.dinputs[self.inputs <= 0] = 0
# Calculate predictions for outputs
def predictions(self, outputs):
return outputs
# Softmax activation
class Activation_Softmax:
# Forward pass
def forward(self, inputs, training):
# Remember input values
self.inputs = inputs
# Get unnormalized probabilities
exp_values = np.exp(inputs - np.max(inputs, axis=1,keepdims=True))
# Normalize them for each sample
probabilities = exp_values / np.sum(exp_values, axis=1,keepdims=True)
self.output = probabilities
# Backward pass
def backward(self, dvalues):
# Create uninitialized array
self.dinputs = np.empty_like(dvalues)
# Enumerate outputs and gradients
for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
# Flatten output array
single_output = single_output.reshape(-1, 1)
# Calculate Jacobian matrix of the output
jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
# Calculate sample-wise gradient
# and add it to the array of sample gradients
self.dinputs[index] = np.dot(jacobian_matrix,single_dvalues)
# Calculate predictions for outputs
def predictions(self, outputs):
return np.argmax(outputs, axis=1)
# Adam optimizer
class Optimizer_Adam:
# Initialize optimizer - set settings
def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7,
beta_1=0.9, beta_2=0.999):
self.learning_rate = learning_rate
self.current_learning_rate = learning_rate
self.decay = decay
self.iterations = 0
self.epsilon = epsilon
self.beta_1 = beta_1
self.beta_2 = beta_2
# Call once before any parameter updates
def pre_update_params(self):
if self.decay:
self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
# Update parameters
def update_params(self, layer):
# If layer does not contain cache arrays,
# create them filled with zeros
if not hasattr(layer, 'weight_cache'):
layer.weight_momentums = np.zeros_like(layer.weights)
layer.weight_cache = np.zeros_like(layer.weights)
layer.bias_momentums = np.zeros_like(layer.biases)
layer.bias_cache = np.zeros_like(layer.biases)
# Update momentum with current gradients
layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases
# Get corrected momentum
# self.iteration is 0 at first pass
# and we need to start with 1 here
weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))
# Update cache with squared current gradients
layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2
# Get corrected cache
weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))
# Vanilla SGD parameter update + normalization
# with square rooted cache
layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)
# Call once after any parameter updates
def post_update_params(self):
self.iterations += 1
# Common loss class
class Loss:
# Regularization loss calculation
def regularization_loss(self):
# 0 by default
regularization_loss = 0
# Calculate regularization loss
# iterate all trainable layers
for layer in self.trainable_layers:
# L1 regularization - weights
# calculate only when factor greater than 0
if layer.weight_regularizer_l1 > 0:
regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
# L2 regularization - weights
if layer.weight_regularizer_l2 > 0:
regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
# L1 regularization - biases
# calculate only when factor greater than 0
if layer.bias_regularizer_l1 > 0:
regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
# L2 regularization - biases
if layer.bias_regularizer_l2 > 0:
regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
return regularization_loss
# Set/remember trainable layers
def remember_trainable_layers(self, trainable_layers):
self.trainable_layers = trainable_layers
# Calculates the data and regularization losses
# given model output and ground truth values
def calculate(self, output, y, *, include_regularization=False):
# Calculate sample losses
sample_losses = self.forward(output, y)
# Calculate mean loss
data_loss = np.mean(sample_losses)
# Add accumulated sum of losses and sample count
self.accumulated_sum += np.sum(sample_losses)
self.accumulated_count += len(sample_losses)
# If just data loss - return it
if not include_regularization:
return data_loss
# Return the data and regularization losses
return data_loss, self.regularization_loss()
# Calculates accumulated loss
def calculate_accumulated(self, *, include_regularization=False):
# Calculate mean loss
data_loss = self.accumulated_sum / self.accumulated_count
# If just data loss - return it
if not include_regularization:
return data_loss
# Return the data and regularization losses
return data_loss, self.regularization_loss()
# Reset variables for accumulated loss
def new_pass(self):
self.accumulated_sum = 0
self.accumulated_count = 0
# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
# Forward pass
def forward(self, y_pred, y_true):
# Number of samples in a batch
samples = len(y_pred)
# Clip data to prevent division by 0
# Clip both sides to not drag mean towards any value
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
# Probabilities for target values -
# only if categorical labels
if len(y_true.shape) == 1:
correct_confidences = y_pred_clipped[range(samples),y_true]
# Mask values - only for one-hot encoded labels
elif len(y_true.shape) == 2:
correct_confidences = np.sum(y_pred_clipped * y_true,axis=1)
# Losses
negative_log_likelihoods = -np.log(correct_confidences)
return negative_log_likelihoods
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# Number of labels in every sample
# We'll use the first sample to count them
labels = len(dvalues[0])
# If labels are sparse, turn them into one-hot vector
if len(y_true.shape) == 1:
y_true = np.eye(labels)[y_true]
# Calculate gradient
self.dinputs = -y_true / dvalues
# Normalize gradient
self.dinputs = self.dinputs / samples
# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# If labels are one-hot encoded,
# turn them into discrete values
if len(y_true.shape) == 2:
y_true = np.argmax(y_true, axis=1)
# Copy so we can safely modify
self.dinputs = dvalues.copy()
# Calculate gradient
self.dinputs[range(samples), y_true] -= 1
# Normalize gradient
self.dinputs = self.dinputs / samples
# Common accuracy class
class Accuracy:
# Calculates an accuracy
# given predictions and ground truth values
def calculate(self, predictions, y):
# Get comparison results
comparisons = self.compare(predictions, y)
# Calculate an accuracy
accuracy = np.mean(comparisons)
# Add accumulated sum of matching values and sample count
self.accumulated_sum += np.sum(comparisons)
self.accumulated_count += len(comparisons)
# Return accuracy
return accuracy
# Calculates accumulated accuracy
def calculate_accumulated(self):
# Calculate an accuracy
accuracy = self.accumulated_sum / self.accumulated_count
# Return the data and regularization losses
return accuracy
# Reset variables for accumulated accuracy
def new_pass(self):
self.accumulated_sum = 0
self.accumulated_count = 0
# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):
def __init__(self, *, binary=False):
# Binary mode?
self.binary = binary
# No initialization is needed
def init(self, y):
pass
# Compares predictions to the ground truth values
def compare(self, predictions, y):
if not self.binary and len(y.shape) == 2:
y = np.argmax(y, axis=1)
return predictions == y
# Model class
class Model:
def __init__(self):
# Create a list of network objects
self.layers = []
# Softmax classifier's output object
self.softmax_classifier_output = None
# Add objects to the model
def add(self, layer):
self.layers.append(layer)
#
# Set loss, optimizer and accuracy
def set(self, *, loss=None, optimizer=None, accuracy=None):
if loss is not None:
self.loss = loss
if optimizer is not None:
self.optimizer = optimizer
if accuracy is not None:
self.accuracy = accuracy
# Finalize the model
def finalize(self):
# Create and set the input layer
self.input_layer = Layer_Input()
# Count all the objects
layer_count = len(self.layers)
# Initialize a list containing trainable layers:
self.trainable_layers = []
# Iterate the objects
for i in range(layer_count):
# If it's the first layer,
# the previous layer object is the input layer
if i == 0:
self.layers[i].prev = self.input_layer
self.layers[i].next = self.layers[i+1]
# All layers except for the first and the last
elif i < layer_count - 1:
self.layers[i].prev = self.layers[i-1]
self.layers[i].next = self.layers[i+1]
# The last layer - the next object is the loss
# Also let's save aside the reference to the last object
# whose output is the model's output
else:
self.layers[i].prev = self.layers[i-1]
self.layers[i].next = self.loss
self.output_layer_activation = self.layers[i]
# If layer contains an attribute called "weights",
# it's a trainable layer -
# add it to the list of trainable layers
# We don't need to check for biases -
# checking for weights is enough
if hasattr(self.layers[i], 'weights'):
self.trainable_layers.append(self.layers[i])
# Update loss object with trainable layers
if self.loss is not None:
self.loss.remember_trainable_layers(self.trainable_layers)
# If output activation is Softmax and
# loss function is Categorical Cross-Entropy
# create an object of combined activation
# and loss function containing
# faster gradient calculation
if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
# Create an object of combined activation
# and loss functions
self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()
# Train the model
def train(self, X, y, *, epochs=1, batch_size=None,print_every=1, validation_data=None):
# Initialize accuracy object
self.accuracy.init(y)
# Default value if batch size is not being set
train_steps = 1
# Calculate number of steps
if batch_size is not None:
train_steps = len(X) // batch_size
# Dividing rounds down. If there are some remaining
# data but not a full batch, this won't include it
# Add `1` to include this not full batch
if train_steps * batch_size < len(X):
train_steps += 1
# Main training loop
for epoch in range(1, epochs+1):
# Print epoch number
print(f'epoch: {epoch}')
# Reset accumulated values in loss and accuracy objects
self.loss.new_pass()
self.accuracy.new_pass()
# Iterate over steps
for step in range(train_steps):
# If batch size is not set -
# train using one step and full dataset
if batch_size is None:
batch_X = X
batch_y = y
# Otherwise slice a batch
else:
batch_X = X[step*batch_size:(step+1)*batch_size]
batch_y = y[step*batch_size:(step+1)*batch_size]
# Perform the forward pass
output = self.forward(batch_X, training=True)
# Calculate loss
data_loss, regularization_loss = self.loss.calculate(output, batch_y,include_regularization=True)
loss = data_loss + regularization_loss
# Get predictions and calculate an accuracy
predictions = self.output_layer_activation.predictions(output)
accuracy = self.accuracy.calculate(predictions,batch_y)
# Perform backward pass
self.backward(output, batch_y)
# Optimize (update parameters)
self.optimizer.pre_update_params()
for layer in self.trainable_layers:
self.optimizer.update_params(layer)
self.optimizer.post_update_params()
# Print a summary
if not step % print_every or step == train_steps - 1:
print(f'step: {step}, ' +
f'acc: {accuracy:.3f}, ' +
f'loss: {loss:.3f} (' +
f'data_loss: {data_loss:.3f}, ' +
f'reg_loss: {regularization_loss:.3f}), ' +
f'lr: {self.optimizer.current_learning_rate}')
# Get and print epoch loss and accuracy
epoch_data_loss, epoch_regularization_loss = self.loss.calculate_accumulated(include_regularization=True)
epoch_loss = epoch_data_loss + epoch_regularization_loss
epoch_accuracy = self.accuracy.calculate_accumulated()
print(f'training, ' +
f'acc: {epoch_accuracy:.3f}, ' +
f'loss: {epoch_loss:.3f} (' +
f'data_loss: {epoch_data_loss:.3f}, ' +
f'reg_loss: {epoch_regularization_loss:.3f}), ' +
f'lr: {self.optimizer.current_learning_rate}')
# If there is the validation data
if validation_data is not None:
# Evaluate the model:
self.evaluate(*validation_data,batch_size=batch_size)
# Evaluates the model using passed-in dataset
def evaluate(self, X_val, y_val, *, batch_size=None):
# Default value if batch size is not being set
validation_steps = 1
# Calculate number of steps
if batch_size is not None:
validation_steps = len(X_val) // batch_size
# Dividing rounds down. If there are some remaining
# data but not a full batch, this won't include it
# Add `1` to include this not full batch
if validation_steps * batch_size < len(X_val):
validation_steps += 1
# Reset accumulated values in loss
# and accuracy objects
self.loss.new_pass()
self.accuracy.new_pass()
# Iterate over steps
for step in range(validation_steps):
# If batch size is not set -
# train using one step and full dataset
if batch_size is None:
batch_X = X_val
batch_y = y_val
# Otherwise slice a batch
else:
batch_X = X_val[step*batch_size:(step+1)*batch_size]
batch_y = y_val[step*batch_size:(step+1)*batch_size]
# Perform the forward pass
output = self.forward(batch_X, training=False)
# Calculate the loss
self.loss.calculate(output, batch_y)
# Get predictions and calculate an accuracy
predictions = self.output_layer_activation.predictions(output)
self.accuracy.calculate(predictions, batch_y)
# Get and print validation loss and accuracy
validation_loss = self.loss.calculate_accumulated()
validation_accuracy = self.accuracy.calculate_accumulated()
# Print a summary
print(f'validation, ' +
f'acc: {validation_accuracy:.3f}, ' +
f'loss: {validation_loss:.3f}')
# Predicts on the samples
def predict(self, X, *, batch_size=None):
# Default value if batch size is not being set
prediction_steps = 1
# Calculate number of steps
if batch_size is not None:
prediction_steps = len(X) // batch_size
# Dividing rounds down. If there are some remaining
# data but not a full batch, this won't include it
# Add `1` to include this not full batch
if prediction_steps * batch_size < len(X):
prediction_steps += 1
# Model outputs
output = []
# Iterate over steps
for step in range(prediction_steps):
# If batch size is not set -
# train using one step and full dataset
if batch_size is None:
batch_X = X
# Otherwise slice a batch
else:
batch_X = X[step*batch_size:(step+1)*batch_size]
# Perform the forward pass
batch_output = self.forward(batch_X, training=False)
# Append batch prediction to the list of predictions
output.append(batch_output)
# Stack and return results
return np.vstack(output)
# Performs forward pass
def forward(self, X, training):
# Call forward method on the input layer
# this will set the output property that
# the first layer in "prev" object is expecting
self.input_layer.forward(X, training)
# Call forward method of every object in a chain
# Pass output of the previous object as a parameter
for layer in self.layers:
layer.forward(layer.prev.output, training)
# "layer" is now the last object from the list,
# return its output
# Performs backward pass
def backward(self, output, y):
# If softmax classifier
if self.softmax_classifier_output is not None:
# First call backward method
# on the combined activation/loss
# this will set dinputs property
self.softmax_classifier_output.backward(output, y)
# Since we'll not call backward method of the last layer
# which is Softmax activation
# as we used combined activation/loss
# object, let's set dinputs in this object
self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
# Call backward method going through
# all the objects but last
# in reversed order passing dinputs as a parameter
for layer in reversed(self.layers[:-1]):
layer.backward(layer.next.dinputs)
return
# First call backward method on the loss
# this will set dinputs property that the last
# layer will try to access shortly
self.loss.backward(output, y)
# Call backward method going through all the objects
# in reversed order passing dinputs as a parameter
for layer in reversed(self.layers):
layer.backward(layer.next.dinputs)
# Retrieves and returns parameters of trainable layers
def get_parameters(self):
# Create a list for parameters
parameters = []
# Iterable trainable layers and get their parameters
for layer in self.trainable_layers:
parameters.append(layer.get_parameters())
# Return a list
return parameters
#Updates the model with new parameters
def set_parameters(self, parameters):
# Iterate over the parameters and layers
# and update each layers with each set of the parameters
for parameter_set, layer in zip(parameters,self.trainable_layers):
layer.set_parameters(*parameter_set)
# Saves the parameters to a file
def save_parameters(self, path):
# Open a file in the binary-write mode
# and save parameters into it
with open(path, 'wb') as f:
pickle.dump(self.get_parameters(), f)
# Loads the weights and updates a model instance with them
def load_parameters(self, path):
# Open file in the binary-read mode,
# load weights and update trainable layers
with open(path, 'rb') as f:
self.set_parameters(pickle.load(f))
# Saves the model
def save(self, path):
# Make a deep copy of current model instance
model = copy.deepcopy(self)
# Reset accumulated values in loss and accuracy objects
model.loss.new_pass()
model.accuracy.new_pass()
# Remove data from the input layer
# and gradients from the loss object
model.input_layer.__dict__.pop('output', None)
model.loss.__dict__.pop('dinputs', None)
# For each layer remove inputs, output and dinputs properties
for layer in model.layers:
for property in ['inputs', 'output', 'dinputs','dweights', 'dbiases']:
layer.__dict__.pop(property, None)
# Open a file in the binary-write mode and save the model
with open(path, 'wb') as f:
pickle.dump(model, f)
# Loads and returns a model
@staticmethod
def load(path):
# Open file in the binary-read mode, load a model
with open(path, 'rb') as f:
model = pickle.load(f)
# Return a model
return model
# Create dataset
X, y = emnist.extract_training_samples('digits')
X_test, y_test = emnist.extract_test_samples('digits')
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4,bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 3))
model.add(Activation_Softmax())
# Set loss, optimizer and accuracy objects
model.set(
loss=Loss_CategoricalCrossentropy(),
optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
accuracy=Accuracy_Categorical()
)
# Finalize the model
model.finalize()
# Train the model
model.train(X, y, validation_data=(X_test, y_test),epochs=10000, print_every=100)
And this is the error i get in sublime text:
epoch: 1
Traceback (most recent call last):
File "/media/luke/New Volume/final project/untitled.py", line 654, in <module>
model.train(X, y, validation_data=(X_test, y_test),epochs=10000, print_every=100)
File "/media/luke/New Volume/final project/untitled.py", line 430, in train
output = self.forward(batch_X, training=True)
File "/media/luke/New Volume/final project/untitled.py", line 545, in forward
layer.forward(layer.prev.output, training)
File "/media/luke/New Volume/final project/untitled.py", line 29, in forward
self.output = np.dot(inputs, self.weights) + self.biases
File "/home/luke/.local/lib/python3.8/site-packages/nnfs/core.py", line 22, in dot
return orig_dot(*[a.astype('float64') for a in args], **kwargs).astype('float32')
File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (240000,28,28) and (2,512) not aligned: 28 (dim 2) != 2 (dim 0)
As you can see, it gets to epoch 1 and then fails when it tries to compute the numpy dot product.
I'd appreciate any help.
Thanks :)
Firstly, you should flatten your input so its shape is (240000, 28*28) = (240000, 784). After that, the problem is in this line:
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4,bias_regularizer_l2=5e-4))
You set your input size to 2, when it should be 784 which is the number of pixels in each image (assuming you're using MNIST).
model.add(Layer_Dense(784, 512, weight_regularizer_l2=5e-4,bias_regularizer_l2=5e-4))
Should work correctly if your inputs are flattened.
Edit: To flatten your inputs I would use np.reshape as demonstrated here https://stackoverflow.com/a/18758049/11777402.
X.reshape(240000, 784)
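Putting both fixes together, a hedged sketch of the relevant lines (the 10 output nodes in the final Dense layer are my assumption, since EMNIST digits has ten classes, whereas the posted code used 3):
# Flatten the image tensors: (n_samples, 28, 28) -> (n_samples, 784)
X = X.reshape(X.shape[0], 28 * 28)
X_test = X_test.reshape(X_test.shape[0], 28 * 28)
# First Dense layer now matches the 784 flattened pixels per image
model.add(Layer_Dense(784, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 10))  # assumption: one output node per digit class
model.add(Activation_Softmax())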

Linear Function in Neural Network is producing large value as output

I am performing regression on the iris data set to predict its type. I have successfully performed classification using the same data and same neural network. For classification, I have used tanh as the activation function in all layers. But for regression, I am using tanh function in the hidden layer and identity function in the output layer.
import numpy as np
class BackPropagation:
weight =[]
output =[]
layers =0
eta = 0.1
def __init__(self, x):
self.layers = len(x)
for i in range(self.layers-2):
w = np.random.randn(x[i]+1,x[i+1]+1)
self.weight.append(w)
w = np.random.randn(x[-2]+1,x[-1])
self.weight.append(w)
def tanh(self,x):
return np.tanh(x)
def deriv_tanh(self,x):
return 1.0-(x**2)
def linear(self,x):
return x
def deriv_linear(self,x):
return 1
def training(self,in_data,target,epoch=100):
bias = np.atleast_2d(np.ones(in_data.shape[0])*(-1)).T
in_data = np.hstack((in_data,bias))
print("Training Starts ......")
while epoch!=0:
epoch-=1
self.output=[]
self.output.append(in_data)
# FORWARD PHASE
for j in range(self.layers-2):
y_in = np.dot(self.output[j],self.weight[j])
y_out = self.tanh(y_in)
self.output.append(y_out)
y_in = np.dot(self.output[-1],self.weight[-1])
y_out = self.linear(y_in)
self.output.append(y_out)
print("Weight Is")
for i in self.weight:
print(i)
# BACKWARD PHASE
error = self.output[-1]-target
print("ERROR IS")
print(np.mean(0.5*error*error))
delta=[]
delta_o = error * self.deriv_linear(self.output[-1])
delta.append(delta_o)
for k in reversed(range(self.layers-2)):
delta_h = np.dot(delta[-1],self.weight[k+1].T) * self.deriv_tanh(self.output[k+1])
delta.append(delta_h)
delta.reverse()
# WEIGHT UPDATE
for i in range(self.layers-1):
self.weight[i] -= (self.eta * np.dot(self.output[i].T, delta[i]))
print("Training complete !")
print("ACCURACY IS")
acc = (1.0-(0.5*error*error))*100
print(np.mean(acc))
def recall(self,in_data):
in_data = np.atleast_2d(in_data)
bias = np.atleast_2d(np.ones(in_data.shape[0])*(-1)).T
in_data = np.hstack((in_data,bias))
y_out = in_data.copy()
for i in range(self.layers-2):
y_in = np.dot(y_out,self.weight[i])
y_out = self.tanh(y_in).copy()
y_in = np.dot(y_out,self.weight[-1])
y_out = self.linear(y_in).copy()
return y_out
# MAIN
data = np.loadtxt("iris.txt",delimiter=",")
obj = BackPropagation([4,2,1])
in_data = data[:rows,:cols].copy()
target = data[:rows,cols:].copy()
obj.training(in_data,target)
print("ANSWER IS")
print(obj.recall(in_data))
The data set is something like this. Here, the first 4 columns are features and the last column contains the target value. There are 150 records like this in the data set.
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
5.0,3.6,1.4,0.2,0
5.4,3.9,1.7,0.4,0
4.6,3.4,1.4,0.3,0
7.0,3.2,4.7,1.4,1
6.4,3.2,4.5,1.5,1
6.9,3.1,4.9,1.5,1
5.5,2.3,4.0,1.3,1
6.3,3.3,6.0,2.5,2
5.8,2.7,5.1,1.9,2
7.1,3.0,5.9,2.1,2
6.3,2.9,5.6,1.8,2
After every epoch, the predicted value increases exponentially, and within 50 epochs the code gives INF or -INF as output. Instead of the identity function, I also tried leaky ReLU, but the output was still INF. I have also tried varying the learning rate, the number of neurons in the hidden layers, the number of hidden layers, the initial weight values, the number of iterations, etc.
So, how can I perform regression using a neural network with backpropagation of error?
Use the mean squared error function for regression tasks. For classification tasks, one usually uses a softmax layer as output and optimizes the cross-entropy cost function.
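As an illustration of that answer (my own sketch, not code from the question), the squared-error loss and its gradient with respect to the network output look like this in numpy:
import numpy as np

def mse_loss(y_pred, y_true):
    # mean of the squared differences over all samples
    return np.mean(0.5 * (y_pred - y_true) ** 2)

def mse_grad(y_pred, y_true):
    # gradient of the mean squared error with respect to y_pred
    return (y_pred - y_true) / y_pred.shape[0]
In practice it also usually helps to scale the input features and to use a learning rate well below 0.1, because an identity output layer has no squashing to keep the activations from blowing up.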

How to use different loss functions with the PSO optimised Neural Network code?

I am using pyswarms PSO for neural network optimisation. I am trying to create a network with an input layer and an output layer.
# Store the features as X and the labels as y
X = np.random.randn(25000,20)
y = np.random.random_integers(0,2,25000)
# In[29]:
def sigmoid(x):
return 1 / (1 + math.exp(-x))
# In[58]:
print(X_train.shape)
print(y_train.shape)
# In[63]:
# Forward propagation
def forward_prop(params):
"""Forward propagation as objective function
This computes for the forward propagation of the neural network, as
well as the loss. It receives a set of parameters that must be
rolled-back into the corresponding weights and biases.
Inputs
------
params: np.ndarray
The dimensions should include an unrolled version of the
weights and biases.
Returns
-------
float
The computed negative log-likelihood loss given the parameters
"""
# Neural network architecture
n_inputs = 20
n_classes = 2
# Roll-back the weights and biases
W1 = params[0:40]
# Perform forward propagation
z1 = X.dot(W1) # Pre-activation in Layer 1
#a1 = np.tanh(z1) # Activation in Layer 1
#z2 = a1.dot(W2) + b2 # Pre-activation in Layer 2
logits = z1 # Logits for Layer 2
# Compute for the softmax of the logits
exp_scores = np.exp(logits)
probs = exp_scores / np.sum(exp_scores,axis=1)
# Compute for the negative log likelihood
N = 25000 # Number of samples
corect_logprobs = -np.log(probs[range(N), y])
loss = np.sum(corect_logprobs) / N
return loss
# In[64]:
def f(x):
# Compute for the negative log likelihood
"""Higher-level method to do forward_prop in the
whole swarm.
Inputs
------
x: numpy.ndarray of shape (n_particles, dimensions)
The swarm that will perform the search
Returns
-------
numpy.ndarray of shape (n_particles, )
The computed loss for each particle
"""
n_particles = x.shape[0]
j = [forward_prop(x[i]) for i in range(n_particles)]
return np.array(j)
# In[65]:
# Initialize swarm
options = {'c1': 0.5, 'c2': 0.3, 'w':0.9}
# Call instance of PSO
dimensions = 20
optimizer = ps.single.GlobalBestPSO(n_particles=100, dimensions=dimensions, options=options)
# Perform optimization
cost, pos = optimizer.optimize(f, print_step=100, iters=1000, verbose=3)
I modified the code from the examples but I am getting this error:
AxisError: axis 1 is out of bounds for array of dimension 1.
Moreover, this example implements a softmax function in the last layer. How should I use it with different loss functions?
Original code can be found here.
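One hedged sketch of how a different loss could be plugged in (illustrative names only, not taken from the linked example): keep the forward pass as it is and change only the final lines of forward_prop, for instance to a mean squared error on one-hot targets instead of the negative log-likelihood:
# Hypothetical replacement for the loss computation at the end of forward_prop().
# `scores` is assumed to be the (N, n_classes) output of the last layer and
# `y` the vector of integer class labels.
def mse_loss_from_scores(scores, y, n_classes):
    N = scores.shape[0]
    y_onehot = np.zeros((N, n_classes))
    y_onehot[np.arange(N), y] = 1.0
    return np.mean((scores - y_onehot) ** 2)
Because PSO only needs one scalar fitness value per particle, any loss that maps the rolled-back weights to a single number can be returned from forward_prop without touching f or the optimizer call.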

Neural Network seems to be getting stuck on a single output with each execution

I've created a neural network with numpy to estimate the sin(x) function for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0) and it does not learn; I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.
When I execute the code, the amount of test data it estimates correctly sits around 48/1000. This happens to be the average data point count per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just start picking a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!
import random
import numpy as np
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
# print(self.layer1)
# print()
# print(self.layer2)
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Propogate forward through network as if doing this by hand.
#first layer's output activations:
for neuron in range(self.hiddenLayerSize):
self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
def train(self, training_pairs, epochs, minibatchsize, learn_rate):
#apply gradient descent
test_data = build_sinx_data(1000)
for epoch in range(epochs):
random.shuffle(training_pairs)
minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
for minibatch in minibatches:
loss = 0 #calculate loss for each minibatch
#Begin training
for x, y in minibatch:
network_output = self.feedforward(x)
loss += (network_output - y) ** 2
#adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
loss /= (2*len(minibatch))
adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
self.layer1 += adjustWeights
#print(adjustWeights)
self.layer2 += adjustWeights
#when line 63 placed here, results did not improve during minibatch.
print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
print("Training Complete")
def evaluate(self, test_data):
"""
Returns number of test inputs which network evaluates correctly.
The ouput assumed to be neuron in output layer with highest activation
:param test_data: test data set identical in form to train data set.
:return: integer sum
"""
correct = 0
for x, y in test_data:
output = self.feedforward(x)
if output == y:
correct+=1
return(correct)
def build_sinx_data(data_points):
"""
Creates a list of tuples (x value, expected y value) for Sin(x) function.
:param data_points: number of desired data points
:return: list of tuples (x value, expected y value
"""
x_vals = []
y_vals = []
for i in range(data_points):
#parameter of randint signifies range of x values to be used*10
x_vals.append(random.randint(-2000,2000)/10)
y_vals.append(round(math.sin(x_vals[i]),1))
return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))
I didn't examine thoroughly all of your code, but some issues are clearly visible:
The * operator doesn't perform matrix multiplication in numpy; you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc.
It seems like you are solving your problem via classification (selecting 1 out of 21 classes) but using an L2 loss, which is somewhat mixed up. You have two options: either stick to classification and use a cross-entropy loss function, or perform regression (i.e. predict the numeric value) with an L2 loss.
You should definitely extract sigmoid function to avoid writing the same expression all over again:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
You perform the same update on self.layer1 and self.layer2, which is clearly wrong. Take some time to analyze how exactly backpropagation works.
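As a small illustration of the first point (my own sketch, with illustrative names): the per-neuron loops and the * products can be replaced by numpy.dot together with the extracted sigmoid helper:
# Illustrative vectorized forward pass, assuming network_input is a column
# vector of shape (inputLayerSize, 1), layer1 is (hidden, input) and
# layer2 is (output, hidden).
def forward(network_input, layer1, layer2):
    hidden_activations = sigmoid(np.dot(layer1, network_input))
    output_activations = sigmoid(np.dot(layer2, hidden_activations))
    return hidden_activations, output_activations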
I edited how my loss function was integrated into my network and correctly implemented gradient descent. I also removed the use of mini-batches and simplified what my network was trying to do. I now have a network which attempts to classify a number as positive or negative.
Some extremely helpful guides I used to fix things up:
Chapter 1 and 2 of Neural Networks and Deep Learning, by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html . This book gives thorough explanations for how Neural Nets work, including breakdowns of the math behind their execution.
Backpropagation from the Beginning, by Erik Hallström, linked by Maxim. https://medium.com/#erikhallstrm/backpropagation-from-the-beginning-77356edf427d
Not as thorough as the above guide, but I kept both open concurrently, as this guide is more to the point about what is important and how to apply the mathematical formulas that are thoroughly explained in Nielsen's book.
How to build a simple neural network in 9 lines of Python code https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1
A useful and fast introduction to some neural networking basics.
Here is my (now functioning) code:
import random
import numpy as np
import scipy
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Layers represented both by their weights array and activation and inputsums vectors.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
self.layer2_inputsums = np.zeros((outputLayerSize, 1))
self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
self.layer2_errorsignals = np.zeros((outputLayerSize, 1))
self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
print()
print(self.layer1)
print()
print(self.layer2)
print()
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Calculate inputsum and and activations for each neuron in the first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])
# Calculate inputsum and and activations for each neuron in the second layer. Notice that each neuron in the second layer represented by
# weights vector, consisting of all weights leading out of the kth neuron in (l-1) layer to the jth neuron in layer l.
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])
return self.layer2_activations
def interpreted_output(self, network_input):
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
self.feedforward(network_input)
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
# def build_expected_output(self, training_data):
# #Views expected output number y for each x to generate an expected output vector from the network
# index=0
# for pair in training_data:
# expected_output_vector = np.zeros((self.outputLayerSize,1))
# x = training_data[0]
# y = training_data[1]
# for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
# if y == i / 10:
# expected_output_vector[i] = 1
# #expect the target category to be a 1.
# break
# training_data[index][1] = expected_output_vector
# index+=1
# return training_data
def train(self, training_data, learn_rate):
self.backpropagate(training_data, learn_rate)
def backpropagate(self, train_data, learn_rate):
#Perform for each x,y pair.
for datapair in range(len(train_data)):
x = train_data[datapair][0]
y = train_data[datapair][1]
self.feedforward(x)
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l2 " + str(self.layer2))
# print("l1 " + str(self.layer1))
for neuron in range(self.outputLayerSize):
#Calculate first error equation for error signals of output layer neurons
self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])
#Use recursive formula to calculate error signals of hidden layer neurons
self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
#print(self.layer1_errorsignals)
# for neuron in range(self.hiddenLayerSize):
# #Use recursive formula to calculate error signals of hidden layer neurons
# self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])
#Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
#(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
#Update all weights for network at each iteration of a training pair.
#Update weights in second layer
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)
self.layer2 += self.layer2_deltaw
#Update weights in first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)
self.layer1 += self.layer1_deltaw
#Comment/Uncomment to enable error evaluation.
#print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l1 " + str(self.layer1))
# print("l2 " + str(self.layer2))
def evaluate(self, test_data):
error = 0
for x, y in test_data:
#x is integer, y is single element np.array
output = self.feedforward(x)
error += y - output
return error
#eval function for sin(x)
# def evaluate(self, test_data):
# """
# Returns number of test inputs which network evaluates correctly.
# The ouput assumed to be neuron in output layer with highest activation
# :param test_data: test data set identical in form to train data set.
# :return: integer sum
# """
# correct = 0
# for x, y in test_data:
# outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
# 1)] # range(-10, 11, 1)
# newy = outputs[np.argmax(y)]
# output = self.interpreted_output(x)
# #print("output: " + str(output))
# if output == newy:
# correct+=1
# return(correct)
def sigmoid(self, z):
return 1 / (1 + np.exp(-z))
def sigmoid_prime(self, z):
return (1 - self.sigmoid(z)) * self.sigmoid(z)
def build_simple_data(data_points):
x_vals = []
y_vals = []
for each in range(data_points):
x = random.randint(-3,3)
expected_output_vector = np.zeros((1, 1))
if x > 0:
expected_output_vector[[0]] = 1
else:
expected_output_vector[[0]] = 0
x_vals.append(x)
y_vals.append(expected_output_vector)
print(list(zip(x_vals,y_vals)))
print()
return (list(zip(x_vals,y_vals)))
simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
def test_network(iterations,net,training_points):
"""
Casually evaluates pre and post test
:param iterations: number of trials to be run
:param net: name of network to evaluate.
;param training_points: size of training data to be used
:return: four 1x1 arrays.
"""
pretest_negative = 0
pretest_positive = 0
posttest_negative = 0
posttest_positive = 0
for each in range(iterations):
pretest_negative += net.feedforward(-10)
pretest_positive += net.feedforward(10)
net.train(build_simple_data(training_points),.1)
for each in range(iterations):
posttest_negative += net.feedforward(-10)
posttest_positive += net.feedforward(10)
return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)
print(test_network(10000, simpleNet, 10000))
While much differs between this code and the code posted in the OP, there is a particular difference that is interesting. In the original feedforward method notice
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
The line
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
Resembles
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
In the updated code. This line performs the dot product between each weight vector and each input vector (the activations from layer 1) to arrive at the input_sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights, by multiplying sigmoid_prime(z) with the network error between the actual and expected output. If z is very big (and positive), the neuron will have an activation value very close to 1. That means the network is confident that that neuron should be activating. The same is true if z is very negative. The network, then, doesn't want to radically adjust weights that it is happy with, so the scale of the change in each weight for a neuron is given by the gradient of sigmoid(z), which is sigmoid_prime(z). A very large z means a very small gradient and a very small change applied to the weights (the gradient of the sigmoid is maximized at z = 0, when the network is unconfident about how a neuron should be categorized and the activation for that neuron is 0.5).
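To put rough numbers on that (my own illustration, not from the original post):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

print(sigmoid_prime(0.0))   # 0.25      -> the largest possible gradient
print(sigmoid_prime(10.0))  # ~4.5e-05  -> weight updates effectively stall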
Since I was continually adding on to each neuron's input_sum (z) and never resetting the value for new inputs of dot(weights, activations), the value for z kept growing, continually slowing the rate of change for the weights until weight modification grew to a standstill. I added the following line to cope with this:
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
The newly posted network can be copied and pasted into an editor and executed as long as you have the numpy module installed. The final line of output will be a list of 4 arrays representing the final network output. The first two are the pretest values for a negative and a positive input, respectively; these should be random. The second two are post-test values that show how well the network classifies positive and negative numbers: a number near 0 denotes negative, near 1 denotes positive.
