Convergence of backpropagation algorithm : how many iterations? - python

I'm new to the world of neural network, which is very interesting. I wrote the basic algorithm of backpropagation on multi-layer NN to solve small problems.
I use the activation function sigmoid (like most of you I think) (x->1/(1+exp(-x))).
I tried my program on several problems :
The first one is the XOR problem. I took a 3 layers network of size [2,2,1] with one bias neuron in the two first layers (so actually the size is more [3,3,1]).
It tried it with 1000 sets of data (i.e. a couple (0/1, 0/1) and its XOR as output), and the algorithm seemed to converge at an error of 0.5 :( I found it weird so i raised the number to 10000, and as it didn't change anything, to 100000 (in despair :p) and it WORKS ! The error fell down to less that 0.02 in average. Does anybody have an idea why it needs so much data to work?
The second one is the sum problem between two numbers (like 4+8 = ?). I took randomly a [2, 5, 5, 1] network with one bias neuron in the three first layers (so actually the size is more [3, 6, 6, 1]). I put a training data set of 100000 couples of numbers below 100 and their sum. This time, the error does not converge at all, while the output of the network always return the number 1. Have you already seen such situations ? Is it a bug code ? (code that i checked many times but perhaps).
import random
import math
class Network:
def initdata(self):
#weights initialization
self.weights.append([])
self.threshold.append([])
for l in range(1,len(self.layers)):
n = self.layers[l]
thresholdl = []
weightsl = []
for i in range(n):
thresholdl.append(-random.random())
weightsli = []
for j in range(self.layers[l-1]):
weightsli.append(random.random()*2-1)
#adding bias neurons
weightsli.append(thresholdl[-1])
weightsl.append(weightsli)
self.weights.append(weightsl)
self.threshold.append(thresholdl)
def __init__(self, layers):
self.layers = layers
self.weights = []
self.threshold = []
self.initdata()
def activation_function(self, x):
return 1/(1+math.exp(-x))
def outputlayer(self, input, l):
if l==0:
return [input]
output = []
prevoutput = self.outputlayer(input, l-1)
for i in range(self.layers[l]):
f = 0
for k in range(len(prevoutput[-1])):
f += self.weights[l][i][k]*prevoutput[-1][k]
f += self.weights[l][i][-1] #bias weight !
output.append(self.activation_function(f))
return prevoutput+[output]
def layersoutput(self, input):
return self.outputlayer(input, len(self.layers)-1)
def finaloutput(self, input):
return self.layersoutput(input)[-1]
def train(self, data, nu):
for (input, finaloutput) in data:
output = self.layersoutput(input)
err = self.errorvector(finaloutput, output[-1])
self.changeweights(err, output, nu)
def changeweights(self, err, output, nu):
deltas = []
for i in range(len(self.layers)):
deltas.append([])
tempweights = self.weights.copy()
def changeweightslayer(layer):
if layer != len(self.layers)-1:
changeweightslayer(layer+1)
for i in range(self.layers[layer]):
delta = 0
if layer != len(self.layers)-1:
delta = output[layer][i]*(1-output[layer][i])*sum([deltas[layer+1][l]*self.weights[layer+1][l][i] for l in range(self.layers[layer+1])])
else:
delta = output[layer][i]*(1-output[layer][i])*err[i]
deltas[layer].append(delta)
for k in range(len(self.weights[layer][i])-1):
tempweights[layer][i][k] += nu*output[layer-1][k]*delta
tempweights[layer][i][-1] += nu*delta
changeweightslayer(1)
self.weights = tempweights
def quadraticerror(self, a, b):
return sum([(a[i]-b[i])**2 for i in range(len(a))])
def errorvector(self, a, b):
return [a[i]-b[i] for i in range(len(a))]
network = Network([2, 5, 5, 1])
print(network.weights)
data = []
for i in range(1000000):
bit1 = random.randrange(100)
bit2 = random.randrange(100)
data.append(([float(bit1), float(bit2)], [float(bit1+bit2)]))
network.train(data, 0.1)
print(network.weights)

Related

why I get RecursionError: maximum recursion depth exceeded

I was trying to make a neural network with 40 middle neurons.I want to initial Neuron class 40 times by loop and append each iteration to a list and then pass it to final node.but then I've got that when I pass a list like [neuron1,neuron2, ...] it works without any problem but when I pass a list that I've appended it in loop it throws RecursionError: maximum recursion depth exceeded. here is my network initial code:
W1 = Weight('w1', random_weight())
W2 = Weight('w2', random_weight())
neurons = [None] * 2
A = Neuron('A', [i0, i1], [ W1, W2])
neurons[0] = A
B = Neuron('B', [i0, i1], [W1, W2])
neurons[1] = B
out = Neuron('out', [A,B], [W1, W2])
this one works good. but below code has problem !
W1 = Weight('w1', random_weight())
W2 = Weight('w2', random_weight())
neurons = [None] * 2
A = Neuron('A', [i0, i1], [ W1, W2])
neurons[0] = A
B = Neuron('B', [i0, i1], [W1, W2])
neurons[1] = B
out = Neuron('out', neurons, [W1, W2])
here is my Neuron class implementation.
class Neuron(DifferentiableElement):
def __init__(self, name, inputs, input_weights, use_cache=True):
assert len(inputs)==len(input_weights)
for i in range(len(inputs)):
assert isinstance(inputs[i],(Neuron,Input))
assert isinstance(input_weights[i],Weight)
DifferentiableElement.__init__(self)
self.my_name = name
self.my_inputs = inputs # list of Neuron or Input instances
self.my_weights = input_weights # list of Weight instances
self.use_cache = use_cache
self.clear_cache()
self.my_descendant_weights = None
self.my_direct_weights = None
def output(self):
if self.use_cache:
if self.my_output is None:
self.my_output = self.compute_output()
return self.my_output
return self.compute_output()
def compute_output(self):
output = 0
inputs = self.get_inputs()
weights = self.get_weights()
for i in range(len(inputs)):
output += inputs[i].output() * weights[i].get_value()
output = 1 / (1 + math.exp(-1 * output))
return output
You call the output function every time you execute the compute_output function and so on. That's leading to a RecursionError: maximum recursion depth exceeded, becuase there is only a limited time a function can call another function at one direct call.
The exact spot in compute_output is:
def compute_output(self):
…
for i in range(len(inputs)):
output += inputs[i].output() * weights[i].get_value()
you call inputs[i].output() at this point
PS:
Difference between +/ += and append() is mentioned Here

My neural network only returns 0, 1 or other constant value

I'm trying to create a network, that would help predict stock prices the following day. My input data are: open, high, low and close stock values, volume, index values, a few technical indicators and exchange rate; the output is closing price from the next day. I'm using data uploaded from Excel file.
I wrote a program, that I will paste below, but it doesn't seem to be working correctly. Network always returns 1, 0 or other constant value (between 0 - 1).
I took the following steps so far:
tried to normalise the data like so: X_norm = X/(10 ** d) where d is the smallest number for which this conditon is met: abs(X_norm) < 1. I did that for the whole set in Excel before dividing it into training and test.
shuffled the data before dividing it into training/test, so that learning examples are not from consecutive days
running the network on a smaller data set and on example data set (I generated random numbers and did a simple math using them for an output and tried running network with that)
changing amount of hidden neurons
chaninging number of iterations (up to a 1000, which was a lot for my computer considering the data set, so I didn't try any more because it would take too much time)
changing learning rate.
No matter what steps I took the outcome was always the same. I think my problem could be that I don't have a bias, but perhaps I also have other mistakes in my code that are contributing to this error.
My program:
import numpy as np
import pandas as pd
df = pd.read_excel(r"path", sheet_name="DATA", index_col=0, header=0)
df = df.to_numpy()
np.random.shuffle(df)
X_data = df[:, 0:15]
X_data = X_data.reshape(1000, 1, 15)
print(f"X_data: {X_data}")
Y_data = df[:, 15]
Y_data = Y_data.reshape(1000, 1, 1)
print(f"Y_data: {Y_data}")
X = X_data[0:801]
x_test = X_data[801:]
y = Y_data[0:801]
y_test = Y_data[801:]
print(f"X_train: {X}")
print(f"x_test: {x_test}")
print(f"Y_train: {y}")
print(f"y_test: {y_test}")
rate = 0.2
class NeuralNetwork:
def __init__(self):
self.input_neurons = 15
self.hidden1_neurons = 10
self.hidden2_neurons = 5
self.output_neuron = 1
self.input_to_hidden1_w = (np.random.random((self.input_neurons, self.hidden1_neurons))) # 14x30
self.hidden1_to_hidden2_w = (np.random.random((self.hidden1_neurons, self.hidden2_neurons))) # 30x20
self.hidden2_to_output_w = (np.random.random((self.hidden2_neurons, self.output_neuron))) # 20x1
def activation(self, x):
sigmoid = 1/(1+np.exp(-x))
return sigmoid
def activation_d(self, x):
derivative = x * (1 - x)
return derivative
def feed_forward(self, X):
self.z1 = np.dot(X, self.input_to_hidden1_w)
self.z1_a = self.activation(self.z1)
self.z2 = np.dot(self.z1_a, self.hidden1_to_hidden2_w)
self.z2_a = self.activation(self.z2)
self.z3 = np.dot(self.z2_a, self.hidden2_to_output_w)
output = self.activation(self.z3)
return output
def backward(self, X, y, rate, output):
error = y - output
z3_error_delta = error * self.activation_d(output)
z2_error = np.dot(z3_error_delta, np.transpose(self.hidden2_to_output_w))
z2_error_delta = z2_error * self.activation_d(self.z2)
z1_error = np.dot(z2_error_delta, np.transpose(self.hidden1_to_hidden2_w))
z1_error_delta = z1_error * self.activation_d(self.z1)
self.input_to_hidden1_w += rate * np.dot(np.transpose(X), z1_error_delta)
self.hidden1_to_hidden2_w += rate * np.dot(np.transpose(self.z1), z2_error_delta)
self.hidden2_to_output_w += rate * np.dot(np.transpose(self.z2), z3_error_delta)
def train(self, X, y):
output = self.feed_forward(X)
self.backward(X, y, rate, output)
def save_weights(self):
np.savetxt("w1.txt", self.input_to_hidden1_w, fmt="%s")
np.savetxt("w2.txt", self.hidden1_to_hidden2_w, fmt="%s")
np.savetxt("w3.txt", self.hidden2_to_output_w, fmt="%s")
def check(self, x_test, y_test):
self.feed_forward(x_test)
np.mean(np.square((y_test - self.feed_forward(x_test))))
Net = NeuralNetwork()
for l in range(100):
for i, pattern in enumerate(X):
for j, outcome in enumerate(y):
print(f"#: {l}")
print(f'''
# {str(l)}
# {str(X[i])}
# {str(y[j])}''')
print(f"Predicted output: {Net.feed_forward(X[i])}")
Net.train(X[i], y[j])
print(f"Error training: {(np.mean(np.square(y - Net.feed_forward(X))))}")
Net.save_weights()
for i, pattern in enumerate(x_test):
for j, outcome in enumerate(y_test):
Net.check(x_test[i], y_test[j])
print(f"Error test: {(np.mean(np.square(y_test - Net.feed_forward(x_test))))}")

Perceptron single layered

I'm struggling to implement a single layered perceptron: http://en.wikipedia.org/wiki/Perceptron. My program, depending on the weights, either is lost in the learning loop or find wrong weights. As a test case I use logical AND. Could you please give a hind why my perceptron does not converge? This is for my own learning. Thanks.
# learning rate
rate = 0.1
# Test data
# logical AND
# vector = (bias, coordinate1, coordinate2, targetedresult)
testdata = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
# initial weigths
import random
w = [random.random(), random.random(), random.random()]
print 'initial weigths = ', w
def test(w, vector):
if diff(w, vector) <= 0.1:
return True
else:
return False
def diff(w, vector):
from copy import deepcopy
we = deepcopy(w)
return dirac(sum(we[i]*vector[i] for i in range(3))) - vector[3]
def improve(w, vector):
for i in range(3):
w[i] += rate*diff(w, vector)*vector[i]
return w
def dirac(z):
if z > 0:
return 1
else:
return 0
error = True
while error == True:
discrepancy = 0
for x in testdata:
if not test(w, x):
w = improve(w, x)
discrepancy += 1
if discrepancy == 0:
print 'improved weigths = ', w
error = False
It looks like you need an extra loop surrounding your for loop to iterate the improvement until your solutions converge (step 3 in the Wikipedia page you linked).
As it stands now, you give each training case exactly one chance to update the weights, so it has no chance to converge.
The only glitch I can see is in the activation function. Increase the cut off value, (z > 0.5).
Also, since there are only 4 input cases in each epoch, it very difficult to work with 0 and 1 as the only output. Try removing the dirac function and increasing the threshold to 0.2. It might take longer to learn but will be much more precise. Of course in case of NAND you dont really need to be. But it helps in understanding.

Multiple objects somehow interfering with each other [original version]

I have a neural network (NN) which works perfectly when applied to a single data set. However if I want to run the NN on, for example, one set of data and then create a new instance of the NN to run on different set of data (or even the same set again) then the new instance will produce completely incorrect predictions.
For example, training on an XOR pattern:
test=[[0,0],[0,1],[1,0],[1,1]]
data = [[[0,0], [0]],[[0,1], [0]],[[1,0], [0]],[[1,1], [1]]]
n = NN(2, 3, 1) # Create a neural network with 2 input, 3 hidden and 1 output nodes
n.train(data,500,0.5,0) # Train it for 500 iterations with learning rate 0.5 and momentum 0
prediction = np.zeros((len(test)))
for row in range(len(test)):
prediction[row] = n.runNetwork(test[row])[0]
print prediction
#
# Now do the same thing again but with a new instance and new version of the data.
#
test2=[[0,0],[0,1],[1,0],[1,1]]
data2 = [[[0,0], [0]],[[0,1], [0]],[[1,0], [0]],[[1,1], [1]]]
p = NN(2, 3, 1)
p.train(data2,500,0.5,0)
prediction2 = np.zeros((len(test2)))
for row in range(len(test2)):
prediction2[row] = p.runNetwork(test2[row])[0]
print prediction2
Will output:
[-0.01 -0. -0.06 0.97]
[ 0. 0. 1. 1.]
Notice that the first prediction is quite good where as the second is completely wrong, and I can't see anything wrong with the class:
import math
import random
import itertools
import numpy as np
random.seed(0)
def rand(a, b):
return (b-a)*random.random() + a
def sigmoid(x):
return math.tanh(x)
def dsigmoid(y):
return 1.0 - y**2
class NN:
def __init__(self, ni, nh, no):
# number of input, hidden, and output nodes
self.ni = ni + 1 # +1 for bias node
self.nh = nh + 1
self.no = no
# activations for nodes
self.ai = [1.0]*self.ni
self.ah = [1.0]*self.nh
self.ao = [1.0]*self.no
# create weights (rows=number of features, columns=number of processing nodes)
self.wi = np.zeros((self.ni, self.nh))
self.wo = np.zeros((self.nh, self.no))
# set them to random vaules
for i in range(self.ni):
for j in range(self.nh):
self.wi[i][j] = rand(-5, 5)
for j in range(self.nh):
for k in range(self.no):
self.wo[j][k] = rand(-5, 5)
# last change in weights for momentum
self.ci = np.zeros((self.ni, self.nh))
self.co = np.zeros((self.nh, self.no))
def runNetwork(self, inputs):
if len(inputs) != self.ni-1:
raise ValueError('wrong number of inputs')
# input activations
for i in range(self.ni-1):
#self.ai[i] = sigmoid(inputs[i])
self.ai[i] = inputs[i]
# hidden activations
for j in range(self.nh-1):
sum = 0.0
for i in range(self.ni):
sum = sum + self.ai[i] * self.wi[i][j]
self.ah[j] = sigmoid(sum)
# output activations
for k in range(self.no):
sum = 0.0
for j in range(self.nh):
sum = sum + self.ah[j] * self.wo[j][k]
self.ao[k] = sigmoid(sum)
ao_simplified = [round(a,2) for a in self.ao[:]]
return ao_simplified
def backPropagate(self, targets, N, M):
if len(targets) != self.no:
raise ValueError('wrong number of target values')
# calculate error terms for output
output_deltas = [0.0] * self.no
for k in range(self.no):
error = targets[k]-self.ao[k]
output_deltas[k] = dsigmoid(self.ao[k]) * error
# calculate error terms for hidden
hidden_deltas = [0.0] * self.nh
for j in range(self.nh):
error = 0.0
for k in range(self.no):
error = error + output_deltas[k]*self.wo[j][k]
hidden_deltas[j] = dsigmoid(self.ah[j]) * error
# update output weights
for j in range(self.nh):
for k in range(self.no):
change = output_deltas[k]*self.ah[j]
self.wo[j][k] = self.wo[j][k] + N*change + M*self.co[j][k]
self.co[j][k] = change
#print N*change, M*self.co[j][k]
# update input weights
for i in range(self.ni):
for j in range(self.nh):
change = hidden_deltas[j]*self.ai[i]
self.wi[i][j] = self.wi[i][j] + N*change + M*self.ci[i][j]
self.ci[i][j] = change
# calculate error
error = 0.0
for k in range(len(targets)):
error = error + 0.5*(targets[k]-self.ao[k])**2
return error
def train(self, patterns, iterations=1000, N=0.5, M=0.1):
# N: learning rate
# M: momentum factor
for i in range(iterations):
error = 0.0
for p in patterns:
inputs = p[0]
targets = p[1]
self.runNetwork(inputs)
error = error + self.backPropagate(targets, N, M)
if i % 100 == 0: # Prints error every 100 iterations
print('error %-.5f' % error)
Any help would be greatly appreciated!
Your error -- if there is one -- doesn't have anything to do with the class. As #Daniel Roseman suggested, the natural guess would be that it was a class/instance variable issue, or maybe a mutable default argument, or multiplication of a list, or something, the most common causes of mysterious behaviour.
Here, though, you're getting different results only because you're using different random numbers each time. If you random.seed(0) before you call NN(2,3,1), you get exactly the same results:
error 2.68110
error 0.44049
error 0.39256
error 0.26315
error 0.00584
[ 0.01 0.01 0.07 0.97]
error 2.68110
error 0.44049
error 0.39256
error 0.26315
error 0.00584
[ 0.01 0.01 0.07 0.97]
I can't judge whether your algorithm is right. Incidentally, I think your rand function is reinventing random.uniform.

Backprop implementation issue

What I am supposed to do. I have an black and white image (100x100px):
I am supposed to train a backpropagation neural network with this image. The inputs are x, y coordinates of the image (from 0 to 99) and output is either 1 (white color) or 0 (black color).
Once the network has learned, I would like it to reproduce the image based on its weights and get the closest possible image to the original.
Here is my backprop implementation:
import os
import math
import Image
import random
from random import sample
#------------------------------ class definitions
class Weight:
def __init__(self, fromNeuron, toNeuron):
self.value = random.uniform(-0.5, 0.5)
self.fromNeuron = fromNeuron
self.toNeuron = toNeuron
fromNeuron.outputWeights.append(self)
toNeuron.inputWeights.append(self)
self.delta = 0.0 # delta value, this will accumulate and after each training cycle used to adjust the weight value
def calculateDelta(self, network):
self.delta += self.fromNeuron.value * self.toNeuron.error
class Neuron:
def __init__(self):
self.value = 0.0 # the output
self.idealValue = 0.0 # the ideal output
self.error = 0.0 # error between output and ideal output
self.inputWeights = []
self.outputWeights = []
def activate(self, network):
x = 0.0;
for weight in self.inputWeights:
x += weight.value * weight.fromNeuron.value
# sigmoid function
if x < -320:
self.value = 0
elif x > 320:
self.value = 1
else:
self.value = 1 / (1 + math.exp(-x))
class Layer:
def __init__(self, neurons):
self.neurons = neurons
def activate(self, network):
for neuron in self.neurons:
neuron.activate(network)
class Network:
def __init__(self, layers, learningRate):
self.layers = layers
self.learningRate = learningRate # the rate at which the network learns
self.weights = []
for hiddenNeuron in self.layers[1].neurons:
for inputNeuron in self.layers[0].neurons:
self.weights.append(Weight(inputNeuron, hiddenNeuron))
for outputNeuron in self.layers[2].neurons:
self.weights.append(Weight(hiddenNeuron, outputNeuron))
def setInputs(self, inputs):
self.layers[0].neurons[0].value = float(inputs[0])
self.layers[0].neurons[1].value = float(inputs[1])
def setExpectedOutputs(self, expectedOutputs):
self.layers[2].neurons[0].idealValue = expectedOutputs[0]
def calculateOutputs(self, expectedOutputs):
self.setExpectedOutputs(expectedOutputs)
self.layers[1].activate(self) # activation function for hidden layer
self.layers[2].activate(self) # activation function for output layer
def calculateOutputErrors(self):
for neuron in self.layers[2].neurons:
neuron.error = (neuron.idealValue - neuron.value) * neuron.value * (1 - neuron.value)
def calculateHiddenErrors(self):
for neuron in self.layers[1].neurons:
error = 0.0
for weight in neuron.outputWeights:
error += weight.toNeuron.error * weight.value
neuron.error = error * neuron.value * (1 - neuron.value)
def calculateDeltas(self):
for weight in self.weights:
weight.calculateDelta(self)
def train(self, inputs, expectedOutputs):
self.setInputs(inputs)
self.calculateOutputs(expectedOutputs)
self.calculateOutputErrors()
self.calculateHiddenErrors()
self.calculateDeltas()
def learn(self):
for weight in self.weights:
weight.value += self.learningRate * weight.delta
def calculateSingleOutput(self, inputs):
self.setInputs(inputs)
self.layers[1].activate(self)
self.layers[2].activate(self)
#return round(self.layers[2].neurons[0].value, 0)
return self.layers[2].neurons[0].value
#------------------------------ initialize objects etc
inputLayer = Layer([Neuron() for n in range(2)])
hiddenLayer = Layer([Neuron() for n in range(10)])
outputLayer = Layer([Neuron() for n in range(1)])
learningRate = 0.4
network = Network([inputLayer, hiddenLayer, outputLayer], learningRate)
# let's get the training set
os.chdir("D:/stuff")
image = Image.open("backprop-input.gif")
pixels = image.load()
bbox = image.getbbox()
width = 5#bbox[2] # image width
height = 5#bbox[3] # image height
trainingInputs = []
trainingOutputs = []
b = w = 0
for x in range(0, width):
for y in range(0, height):
if (0, 0, 0, 255) == pixels[x, y]:
color = 0
b += 1
elif (255, 255, 255, 255) == pixels[x, y]:
color = 1
w += 1
trainingInputs.append([float(x), float(y)])
trainingOutputs.append([float(color)])
print "\nOriginal image ... Black:"+str(b)+" White:"+str(w)+"\n"
#------------------------------ let's train
for i in range(500):
for j in range(len(trainingOutputs)):
network.train(trainingInputs[j], trainingOutputs[j])
network.learn()
for w in network.weights:
w.delta = 0.0
#------------------------------ let's check
b = w = 0
for x in range(0, width):
for y in range(0, height):
out = network.calculateSingleOutput([float(x), float(y)])
if 0.0 == round(out):
color = (0, 0, 0, 255)
b += 1
elif 1.0 == round(out):
color = (255, 255, 255, 255)
w += 1
pixels[x, y] = color
#print out
print "\nAfter learning the network thinks ... Black:"+str(b)+" White:"+str(w)+"\n"
Obviously, there is some issue with my implementation. The above code returns:
Original image ... Black:21 White:4
After learning the network thinks ...
Black:25 White:0
It does the same thing if I try to use larger training set (I'm testing just 25 pixels from the image above for testing purposes). It returns that all pixels should be black after learning.
Now, if I use a manual training set like this instead:
trainingInputs = [
[0.0,0.0],
[1.0,0.0],
[2.0,0.0],
[0.0,1.0],
[1.0,1.0],
[2.0,1.0],
[0.0,2.0],
[1.0,2.0],
[2.0,2.0]
]
trainingOutputs = [
[0.0],
[1.0],
[1.0],
[0.0],
[1.0],
[0.0],
[0.0],
[0.0],
[1.0]
]
#------------------------------ let's train
for i in range(500):
for j in range(len(trainingOutputs)):
network.train(trainingInputs[j], trainingOutputs[j])
network.learn()
for w in network.weights:
w.delta = 0.0
#------------------------------ let's check
for inputs in trainingInputs:
print network.calculateSingleOutput(inputs)
The output is for example:
0.0330125791296 # this should be 0, OK
0.953539182136 # this should be 1, OK
0.971854575477 # this should be 1, OK
0.00046146137467 # this should be 0, OK
0.896699762781 # this should be 1, OK
0.112909223162 # this should be 0, OK
0.00034058462280 # this should be 0, OK
0.0929886299643 # this should be 0, OK
0.940489647869 # this should be 1, OK
In other words the network guessed all pixels right (both black and white). Why does it say all pixels should be black if I use actual pixels from the image instead of hard coded training set like the above?
I tried changing the amount of neurons in the hidden layers (up to 100 neurons) with no success.
This is a homework.
This is also a continuation of my previous question about backprop.
It's been a while, but I did get my degree in this stuff, so I think hopefully some of it has stuck.
From what I can tell, you're too deeply overloading your middle layer neurons with the input set. That is, your input set consists of 10,000 discrete input values (100 pix x 100 pix); you're attempting to encode those 10,000 values into 10 neurons. This level of encoding is hard (I suspect it's possible, but certainly hard); at the least, you'd need a LOT of training (more than 500 runs) to get it to reproduce reasonably. Even with 100 neurons for the middle layer, you're looking at a relatively dense compression level going on (100 pixels to 1 neuron).
As to what to do about these problems; well, that's tricky. You can increase your number of middle neurons dramatically, and you'll get a reasonable effect, but of course it'll take a long time to train. However, I think there might be a different solution; if possible, you might consider using polar coordinates instead of cartesian coordinates for the input; quick eyeballing of the input pattern indicates a high level of symmetry, and effectively you'd be looking at a linear pattern with a repeated predictable deformation along the angular coordinate, which it seems would encode nicely in a small number of middle layer neurons.
This stuff is tricky; going for a general solution for pattern encoding (as your original solution does) is very complex, and can usually (even with large numbers of middle layer neurons) require a lot of training passes; on the other hand, some advance heuristic task breakdown and a little bit of problem redefinition (i.e. advance converting from cartesian to polar coordinates) can give good solutions for well defined problem sets. Therein, of course, is the perpetual rub; general solutions are hard to come by, but slightly more specified solutions can be quite nice indeed.
Interesting stuff, in any event!

Categories

Resources