I am performing regression on the iris data set to predict its type. I have successfully performed classification using the same data and same neural network. For classification, I have used tanh as the activation function in all layers. But for regression, I am using tanh function in the hidden layer and identity function in the output layer.
import numpy as np
class BackPropagation:
weight =[]
output =[]
layers =0
eta = 0.1
def __init__(self, x):
self.layers = len(x)
for i in range(self.layers-2):
w = np.random.randn(x[i]+1,x[i+1]+1)
self.weight.append(w)
w = np.random.randn(x[-2]+1,x[-1])
self.weight.append(w)
def tanh(self,x):
return np.tanh(x)
def deriv_tanh(self,x):
return 1.0-(x**2)
def linear(self,x):
return x
def deriv_linear(self,x):
return 1
def training(self,in_data,target,epoch=100):
bias = np.atleast_2d(np.ones(in_data.shape[0])*(-1)).T
in_data = np.hstack((in_data,bias))
print("Training Starts ......")
while epoch!=0:
epoch-=1
self.output=[]
self.output.append(in_data)
# FORWARD PHASE
for j in range(self.layers-2):
y_in = np.dot(self.output[j],self.weight[j])
y_out = self.tanh(y_in)
self.output.append(y_out)
y_in = np.dot(self.output[-1],self.weight[-1])
y_out = self.linear(y_in)
self.output.append(y_out)
print("Weight Is")
for i in self.weight:
print(i)
# BACKWARD PHASE
error = self.output[-1]-target
print("ERROR IS")
print(np.mean(0.5*error*error))
delta=[]
delta_o = error * self.deriv_linear(self.output[-1])
delta.append(delta_o)
for k in reversed(range(self.layers-2)):
delta_h = np.dot(delta[-1],self.weight[k+1].T) * self.deriv_tanh(self.output[k+1])
delta.append(delta_h)
delta.reverse()
# WEIGHT UPDATE
for i in range(self.layers-1):
self.weight[i] -= (self.eta * np.dot(self.output[i].T, delta[i]))
print("Training complete !")
print("ACCURACY IS")
acc = (1.0-(0.5*error*error))*100
print(np.mean(acc))
def recall(self,in_data):
in_data = np.atleast_2d(in_data)
bias = np.atleast_2d(np.ones(in_data.shape[0])*(-1)).T
in_data = np.hstack((in_data,bias))
y_out = in_data.copy()
for i in range(self.layers-2):
y_in = np.dot(y_out,self.weight[i])
y_out = self.tanh(y_in).copy()
y_in = np.dot(y_out,self.weight[-1])
y_out = self.linear(y_in).copy()
return y_out
# MAIN
data = np.loadtxt("iris.txt",delimiter=",")
obj = BackPropagation([4,2,1])
rows, cols = data.shape[0], data.shape[1]-1 # all records; the last column is the target
in_data = data[:rows,:cols].copy()
target = data[:rows,cols:].copy()
obj.training(in_data,target)
print("ANSWER IS")
print(obj.recall(in_data))
The data set is something like this. Here, first 4 columns are features and last column contains the target value. There are 150 records like this in the data set.
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
5.0,3.6,1.4,0.2,0
5.4,3.9,1.7,0.4,0
4.6,3.4,1.4,0.3,0
7.0,3.2,4.7,1.4,1
6.4,3.2,4.5,1.5,1
6.9,3.1,4.9,1.5,1
5.5,2.3,4.0,1.3,1
6.3,3.3,6.0,2.5,2
5.8,2.7,5.1,1.9,2
7.1,3.0,5.9,2.1,2
6.3,2.9,5.6,1.8,2
After every epoch, the predicted value increases exponentially, and within 50 epochs the code gives INF or -INF as output. Instead of the identity function, I also tried leaky ReLU, but the output was still INF. I have also tried varying the learning rate, the number of neurons in the hidden layers, the number of hidden layers, the initial weight values, the number of iterations, etc.
So, how can I perform regression using a neural network with backpropagation of error?
Use the mean squared error function for regression tasks. For classification tasks, one usually uses a softmax layer as output and optimizes the cross-entropy cost function.
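To make the regression setup concrete, here is a minimal sketch with a tanh hidden layer, an identity output and a mean squared error loss. It runs on synthetic data rather than iris.txt, and the names (`train_regression`, `n_hidden`) and settings are assumptions for illustration. Unlike the update in the question, it averages the gradient over the batch and standardizes the inputs. Summing the gradient over all 150 rows scales the effective step size by the batch size, which is one plausible cause of the overflow, although that is not verified against the original data.

```python
import numpy as np

def train_regression(X, y, n_hidden=4, eta=0.1, epochs=500, seed=0):
    """One tanh hidden layer, identity output, MSE loss, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # forward pass
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2                      # identity (linear) output
        err = pred - y
        losses.append(float(0.5 * np.mean(err ** 2)))
        # backward pass: gradient of 0.5 * mean(err^2), averaged over the batch
        d_out = err / n
        d_hid = (d_out @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        W2 -= eta * (h.T @ d_out)
        b2 -= eta * d_out.sum(axis=0)
        W1 -= eta * (X.T @ d_hid)
        b1 -= eta * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

# Synthetic stand-in for the iris table: 150 rows, 4 features, target in {0, 1, 2}
rng = np.random.default_rng(1)
X_raw = rng.uniform(0.0, 8.0, size=(150, 4))
y = (X_raw[:, 2:3] > 2.5).astype(float) + (X_raw[:, 2:3] > 5.0)
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)   # standardize each feature

params, losses = train_regression(X, y)
print(losses[0], losses[-1])
```

If the loss still grows with this setup, lowering `eta` is the first thing to try.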
I'm relatively new to machine learning, and as a starter project, I decided to implement my own neural network from scratch in Python using NumPy. As such, I have manually implemented methods for forward propagation, backpropagation, and calculating function derivatives.
For my testing data, I wrote a function that generates values of sin(x). When I finally create and train my network, my outputs fluctuate quite a lot with each trial and are significantly off the true values (although they are a decent improvement over the initial predictions).
I have tried adjusting quite a few settings, including the learning rate, number of neurons, number of layers, training iterations, and activation function, but I still end up with a squared cost of around 0.1 over my input data.
I think my derivative functions and chain rule expressions are correct since when I use just one input sample I get a near-perfect answer.
Adding more input data, however, significantly reduces the accuracy of the network.
Do you guys have any suggestions for how to improve this network, or is there anything I'm doing wrong currently?
My code:
import numpy as np
#Generate input data for the network
def inputgen():
inputs=[]
outputs=[]
i=0.01
for x in range(10000):
inputs.append([round(i,7)])
outputs.append([np.sin(i)]) #output is sin(x)
i+=0.0001
return [inputs,outputs]
#set training input and output
inputs = np.array(inputgen()[0])
outputs = np.array(inputgen()[1])
#sigmoid activation function and derivative
def sigmoid(x):
return 1/(1+np.exp(-x))
def sigmoid_derivative(x):
return sigmoid(x)*(1-sigmoid(x))
#tanh activation function and derivative
def tanh(x):
return np.tanh(x)
def tanh_derivative(x):
return 1-((tanh(x))**2)
#Layer class
class Layer:
def __init__(self,num_neurons,num_inputs,inputs):
self.num_neurons = num_neurons #number of neurons in hidden layers
self.num_inputs = num_inputs #number of input neurons(1 in the case of testing data)
self.inputs = inputs
self.weights = np.random.rand(num_inputs,num_neurons)*np.sqrt(1/num_inputs) #weights initialized by Xavier function
self.biases = np.zeros((1,num_neurons)) #biases initialized as 0
self.z = np.dot(self.inputs,self.weights)+self.biases #Calculate z
self.a = tanh(self.z) #Calculate activation
self.dcost_a = [] #derivative of cost with respect to activation
self.da_z = [] #derivative of activation with respect to z
self.dz_w = [] #derivative of z with respect to weight
self.dcost_w = [] #derivative of cost with respect to weight
self.dcost_b = [] #derivative of cost with respect to bias
#functions used in forwardpropagation
def compute_z(self):
self.z = np.dot(self.inputs,self.weights)+self.biases
return self.z
def activation(self):
self.a = tanh(self.compute_z())
def forward(self):
self.activation()
#Network class
class Network:
def __init__(self,num_layers,num_neurons,num_inputs,inputs,num_outputs,outputs):
self.learningrate = 0.01 #learning rate
self.num_layers=num_layers #number of hidden layers
self.num_neurons=num_neurons #number of neurons in hidden layers
self.num_inputs = num_inputs #number of input neurons
self.inputs=inputs
self.expected_outputs=outputs
self.layers=[]
for x in range(num_layers):
if x==0:
self.layers.append(Layer(num_neurons,num_inputs,inputs)) #Initial layer with given inputs
else:
#Other layers have an input which is the activation of previous layer
self.layers.append(Layer(num_neurons,len(self.layers[x-1].a[0]),self.layers[x-1].a))
self.prediction = Layer(num_outputs,num_neurons,self.layers[-1].a) #prediction
self.layers.append(self.prediction)
self.cost = (self.prediction.a-self.expected_outputs)**2 #cost
#forwardpropagation
def forwardprop(self):
for x in range(self.num_layers+1):
if(x!=0):
self.layers[x].inputs=self.layers[x-1].a
self.layers[x].forward()
self.prediction=self.layers[-1] #update prediction value
def backprop(self):
self.cost = (self.prediction.a-self.expected_outputs)**2
for x in range(len(self.layers)-1,-1,-1):
if(x==len(self.layers)-1):
dcost_a = 2*(self.prediction.a-self.expected_outputs) #derivative of cost with respect to activation for output layer
else:
#derivative of cost with respect to activation for hidden layers(chain rule)
dcost_a=np.zeros((len(self.layers[x].inputs),self.num_neurons)).T
dcost_a1=self.layers[x+1].dcost_a.T
da_z1=self.layers[x+1].da_z.T
dz_a=(self.layers[x+1].weights).T
for z in range(len(dcost_a1)):
dcost_a+=((dcost_a1[z])*da_z1)
for j in range(len(dcost_a)):
dcost_a[j]*=dz_a[z][j]
dcost_a=dcost_a.T
self.layers[x].dcost_a=dcost_a
#derivative of activation with respect to z
da_z = tanh_derivative(self.layers[x].z)
self.layers[x].da_z=da_z
#derivative of z with respect to weights
dz_w = []
if x!=0:
dz_w=self.layers[x-1].a
else:
dz_w=self.inputs
self.layers[x].dz_w=dz_w
#change weights and biases
for x in range(len(self.layers)-1,-1,-1):
#Average each of the derivatives over all training samples
self.layers[x].dcost_a=np.average(self.layers[x].dcost_a,axis=0)
self.layers[x].da_z=np.average(self.layers[x].da_z,axis=0)
self.layers[x].dz_w=(np.average(self.layers[x].dz_w,axis=0)).T
self.layers[x].dcost_w = np.zeros((self.layers[x].weights.shape))
self.layers[x].dcost_b = self.layers[x].dcost_a*self.layers[x].da_z
for v in range(len(self.layers[x].dz_w)):
self.layers[x].dcost_w[v] = (self.layers[x].dcost_a*self.layers[x].da_z)*self.layers[x].dz_w[v]
#update weights and biases
self.layers[x].weights-=(self.layers[x].dcost_w)*self.learningrate
self.layers[x].biases-=(self.layers[x].dcost_b)*self.learningrate
#train the network
def train(self):
for x in range(1000):
self.backprop()
self.forwardprop()
Network1 = Network(3,3,1,inputs,1,outputs)
Network1.train()
print(Network1.prediction.a)
Sample input:
[[0.01 ]
[0.0101]
[0.0102]
...
[1.0097]
[1.0098]
[1.0099]]
Sample output:
[[0.37656753]
[0.37658777]
[0.37660802]
...
[0.53088048]
[0.53089046]
[0.53090043]]
Expected output:
[[0.00999983]
[0.01009983]
[0.01019982]
...
[0.84667225]
[0.84672546]
[0.84677865]]
A few things I would recommend trying:
- ReLU activation for the hidden layers. Tanh may not work so well for a multi-layered network.
- If you are doing regression, try a linear activation for the output layer.
- Experiment with different target functions. sin(x) may be very difficult for a small neural network to learn. Try something simpler, like polynomials, and increase the complexity gradually.
I would keep track of the cost_history and update your learning rate accordingly.
If you have been
- getting closer to the actual value, increase the learning rate by 5%
- getting further away, decrease the learning rate by 50%
def update_learning_rate(self):
    # cost_history[0] is taken to be the newest cost and cost_history[1] the one before it
    if(len(self.cost_history) < 2):
        return
    if(self.cost_history[0] > self.cost_history[1]):
        self.learning_rate /= 2
    else:
        self.learning_rate *= 1.05
This should actually yield noticeably better results.
What usually happens is that you get stuck in one of the local minima (d) rather than the global minimum (b). Ignore the labels; this is just a random picture I found online.
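As a rough illustration of this heuristic, the sketch below applies it to plain gradient descent on a toy quadratic. The function name `adapt_learning_rate` and the toy problem are assumptions for illustration, and the history here is appended to, so the newest cost is the last element.

```python
def adapt_learning_rate(lr, cost_history, up=1.05, down=0.5):
    """Return a new learning rate based on the two most recent costs."""
    if len(cost_history) < 2:
        return lr
    if cost_history[-1] > cost_history[-2]:   # cost went up: back off by 50%
        return lr * down
    return lr * up                            # cost did not go up: speed up by 5%

# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
lr = 1.2            # deliberately too large: with a fixed rate this would diverge
cost_history = []
for step in range(200):
    cost_history.append((w - 3.0) ** 2)
    lr = adapt_learning_rate(lr, cost_history)
    w -= lr * 2.0 * (w - 3.0)

print(w, lr)
```

The rate keeps creeping up until a step overshoots and is then halved, so the cost is not guaranteed to fall on every step. This heuristic does not by itself get a network out of a local minimum.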
I'm working through an exercise for augmenting training data and then testing it through the artificial neural network. The idea is to test the accuracy as more data is added to the training set.
I'm using the mnist dataset.
This is how I am rotating the image:
def rotate_image(inputs, degree):
    ## create rotated variations
    # rotated anticlockwise by x degrees
    inputs_plusx_img = scipy.ndimage.rotate(inputs.reshape(28,28), degree, cval=0.01, order=1, reshape=False)
    new_inputs1 = inputs_plusx_img.reshape(784)
    # rotated clockwise by x degrees
    inputs_minusx_img = scipy.ndimage.rotate(inputs.reshape(28,28), -degree, cval=0.01, order=1, reshape=False)
    new_inputs2 = inputs_minusx_img.reshape(784)
    return (new_inputs1, new_inputs2)
degree = 10
df = pd.read_csv(train_file)
#print(df.head())
idx = 100
instance = df.iloc[idx:(idx+1), 1:].values
#print(instance.reshape(28,28))
new_image1, new_image2 = rotate_image(instance, degree)
# show rotated image
image_array = np.asfarray(new_image1).flatten().reshape((28,28))
print(new_image1)
# print the grid in grey scale
plt.imshow(image_array, cmap='Greys', interpolation='None')
Now what I'm not sure about is how to add the new image to the training data set and then add it to the ANN class.
This is my neural network:
class neuralNetwork:
"""Artificial Neural Network classifier.
Parameters
------------
lr : float
Learning rate (between 0.0 and 1.0)
ep : int
Number of epochs
bs : int
Size of the training batch to be used when calculating the gradient descent.
batch_size = 1 stochastic (per-instance) gradient descent
batch_size > 1 mini-batch gradient descent
inodes : int
Number of input nodes which is normally the number of features in an instance.
hnodes : int
Number of hidden nodes in the net.
onodes : int
Number of output nodes in the net.
Attributes
-----------
wih : 2d-array
Input2Hidden node weights after fitting
who : 2d-array
Hidden2Output node weights after fitting
E : list
Sum-of-squares error value in each epoch.
Results : list
Target and predicted class labels for the test data.
Functions
---------
activation_function : float (between 0 and 1)
implements the sigmoid function which squashes the node input
"""
def __init__(self, inputnodes=784, hiddennodes=200, outputnodes=10, learningrate=0.1, batch_size=1, epochs=10):
self.inodes = inputnodes
self.hnodes = hiddennodes
self.onodes = outputnodes
#two weight matrices, wih (input to hidden layer) and who (hidden layer to output)
#a weight on link from node i to node j is w_ij
#Draw random samples from a normal (Gaussian) distribution centered around 0.
#numpy.random.normal(loc to centre gaussian=0.0, scale=1, size=dimensions of the array we want)
#scale is usually set to the standard deviation which is related to the number of incoming links i.e.
#1/sqrt(num of incoming inputs). we use pow to raise it to the power of -0.5.
#We have set 0 as the centre of the Gaussian dist.
# size is set to the dimensions of the number of hnodes, inodes and onodes for each weight matrix
self.wih = np.random.normal(0.0, pow(self.inodes, -0.5), (self.hnodes, self.inodes))
self.who = np.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
#set the learning rate
self.lr = learningrate
#set the batch size
self.bs = batch_size
#set the number of epochs
self.ep = epochs
#store errors at each epoch
self.E= []
#store results from testing the model
#keep track of the network performance on each test instance
self.results= []
#define the activation function here
#specify the sigmoid squashing function. Here expit() provides the sigmoid function.
#lambda is a short cut function which is executed there and then with no def (i.e. like an anonymous function)
self.activation_function = lambda x: scipy.special.expit(x)
pass
# function to help management of batching for gradient descent
# size of the batch is controlled by self.bs
def batch_input(self, X, y): # (self, train_inputs, targets):
"""Yield consecutive batches of the specified size from the input list."""
for i in range(0, len(X), self.bs):
# yield a tuple of the current batched data and labels
yield (X[i:i + self.bs], y[i:i + self.bs])
#train the neural net
#note the first part is very similar to the query function because they both require the forward pass
def train(self, train_inputs, targets_list):
#def train(self, train_inputs):
"""Training the neural net.
This includes the forward pass ; error computation;
backprop of the error ; calculation of gradients and updating the weights.
Parameters
----------
train_inputs : {array-like}, shape = [n_instances, n_features]
Training vectors, where n_instances is the number of training instances and
n_features is the number of features.
Note this contains all features including the class feature which is in first position
Returns
-------
self : object
"""
for e in range(self.ep):
print("Training epoch#: ", e)
sum_error = 0.0
for (batchX, batchY) in self.batch_input(train_inputs, targets_list):
#creating variables to store the gradients
delta_who = 0
delta_wih = 0
# iterate through the inputs sent in
for inputs, targets in zip(batchX, batchY):
#convert inputs list to 2d array
inputs = np.array(inputs, ndmin=2).T
targets = np.array(targets, ndmin=2).T
#calculate signals into hidden layer
hidden_inputs = np.dot(self.wih, inputs)
#calculate the signals emerging from the hidden layer
hidden_outputs = self.activation_function(hidden_inputs)
#calculate signals into final output layer
final_inputs=np.dot(self.who, hidden_outputs)
#calculate the signals emerging from final output layer
final_outputs = self.activation_function(final_inputs)
#to calculate the error we need to compute the element wise diff between target and actual
output_errors = targets - final_outputs
#Next distribute the error to the hidden layer such that hidden layer error
#is the output_errors, split by weights, recombined at hidden nodes
hidden_errors = np.dot(self.who.T, output_errors)
## accumulate the gradients from each instance
## delta_who are the gradients between hidden and output weights
## delta_wih are the gradients between input and hidden weights
delta_who += np.dot((output_errors * final_outputs * (1.0 - final_outputs)), np.transpose(hidden_outputs))
delta_wih += np.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), np.transpose(inputs))
sum_error += np.dot(output_errors.T, output_errors)#this is the sum of squared errors accumulated over each batched instance
pass #instance
# update the weights by multiplying the gradient with the learning rate
# note that the deltas are divided by batch size to obtain the average gradient according to the given batch
# obviously if batch size = 1 then we simply end up dividing by 1 since each instance forms a singleton batch
self.who += self.lr * (delta_who / self.bs)
self.wih += self.lr * (delta_wih / self.bs)
pass # batch
self.E.append(np.asfarray(sum_error).flatten())
print("errors (SSE): ", self.E[-1])
pass # epoch
#query the neural net
def query(self, inputs_list):
#convert inputs_list to a 2d array
inputs = np.array(inputs_list, ndmin=2).T
#propagate input into hidden layer. This is the start of the forward pass
hidden_inputs = np.dot(self.wih, inputs)
#squash the content in the hidden node using the sigmoid function (value between 0 and 1)
hidden_outputs = self.activation_function(hidden_inputs)
#propagate into output layer and the apply the squashing sigmoid function
final_inputs = np.dot(self.who, hidden_outputs)
final_outputs = self.activation_function(final_inputs)
return final_outputs
#iterate through all the test data to calculate model accuracy
def test(self, test_inputs, test_targets):
self.results = []
#go through each test instances
for inputs, target in zip(test_inputs, test_targets):
#query the network with test inputs
#note this returns 10 output values ; of which the index of the highest value
# is the networks predicted class label
outputs = self.query(inputs)
#get the target which has 0.99 as highest value corresponding to the actual class
target_label = np.argmax(target)
#get the index of the highest output node as this corresponds to the predicted class
predict_label = np.argmax(outputs) #this is the class predicted by the ANN
self.results.append([predict_label, target_label])
pass
pass
self.results = np.asfarray(self.results) # flatten results to avoid nested arrays
Functions to preprocess the data and then train the network:
def preprocess_data(Xy):
X=[]
y=[]
for instance in Xy:
# split the record by the ',' commas
all_values = instance.split(',')
# scale and shift the inputs
inputs = (np.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
# create the target output values (all 0.01, except the desired label which is 0.99)
targets = np.zeros(output_nodes) + 0.01
# all_values[0] is the target label for this record
targets[int(all_values[0])] = 0.99
X.insert(len(X), inputs)
y.insert(len(y), targets)
pass
return(X,y)
pass
mini_training_data = np.random.choice(train_data_list, 60000, replace = False)
print("Percentage of training data used:", (len(mini_training_data)/len(train_data_list)) * 100)
X_train, y_train = preprocess_data(mini_training_data)
X_test, y_test = preprocess_data(test_data_list)
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate, batch_size, epochs)
n.train(X_train, y_train)
n.test(X_test, y_test)
#print network performance as an accuracy metric
correct = 0 # number of predictions that were correct
#iterate through each tested instance and accumulate the number of correct predictions
for result in n.results:
if (result[0] == result[1]):
correct += 1
pass
pass
# print the accuracy on test set
print ("Test set accuracy% = ", (100 * correct / len(n.results)))
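For the augmentation question above, one possible approach is to extend the lists returned by `preprocess_data` with rotated copies before calling `train`, leaving the network class unchanged. This is a minimal sketch under that assumption: `augment_with_rotations` is a hypothetical helper, the random arrays only stand in for real MNIST rows, and it uses `scipy.ndimage.rotate` with the same arguments as the rotation code in the question.

```python
import numpy as np
from scipy import ndimage

def augment_with_rotations(X, y, degree=10.0):
    """Return copies of X and y extended with +degree and -degree rotations of every image."""
    X_aug, y_aug = list(X), list(y)
    for pixels, target in zip(X, y):
        image = np.asarray(pixels, dtype=float).reshape(28, 28)
        for angle in (degree, -degree):
            rotated = ndimage.rotate(image, angle, cval=0.01, order=1, reshape=False)
            X_aug.append(rotated.reshape(784))
            y_aug.append(target)      # a small rotation keeps the same label
    return X_aug, y_aug

# Stand-in data with the same shapes as the output of preprocess_data
rng = np.random.default_rng(0)
X_train = [rng.uniform(0.01, 1.0, 784) for _ in range(5)]
y_train = [np.full(10, 0.01) for _ in range(5)]

X_big, y_big = augment_with_rotations(X_train, y_train)
print(len(X_train), len(X_big))
```

The extended lists can then be passed to `n.train(X_big, y_big)`. Since `batch_input` takes consecutive slices, it is probably worth shuffling the two lists together first. Only the training set should be augmented, so the test accuracy stays comparable as more data is added.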
At the moment I am trying to build an autoencoder for time series data in TensorFlow. I have nearly 500 days of data, where each day has 24 data points. Since this is my first try, my architecture is very simple. After my input of size 24, the hidden layers are of size 10, 3 and 10, with an output of size 24 again. I normalized the data (data points are in the range [-0.5; 0.5]), and I use the sigmoid activation function and the RMSPropOptimizer.
After training (loss function in picture), the output is the same for every time series I feed into the network. Does anyone know the reason for that? Is it possible that my dataset class is the issue (code below)?
class TimeDataset:
def __init__(self,data):
self._index_in_epoch = 0
self._epochs_completed = 0
self._data = data
self._num_examples = data.shape[0]
pass
@property
def data(self):
return self._data
def next_batch(self, batch_size, shuffle=True):
start = self._index_in_epoch
# first call
if start == 0 and self._epochs_completed == 0:
idx = np.arange(0, self._num_examples) # get all possible indexes
np.random.shuffle(idx) # shuffle indexes
self._data = self.data[idx] # get list of `num` random samples
if start + batch_size > self._num_examples:
# not enough samples left -> start the next epoch
self._epochs_completed += 1
rest_num_examples = self._num_examples - start
data_rest_part = self.data[start:self._num_examples]
idx0 = np.arange(0, self._num_examples) # get all possible indexes
np.random.shuffle(idx0) # shuffle indexes
self._data = self.data[idx0] # get list of `num` random samples
start = 0
self._index_in_epoch = batch_size - rest_num_examples # handles the case where the number of samples is not an integer multiple of batch_size
end = self._index_in_epoch
data_new_part = self._data[start:end]
return np.concatenate((data_rest_part, data_new_part), axis=0)
else:
# get next batch
self._index_in_epoch += batch_size
end = self._index_in_epoch
return self._data[start:end]
*edit: here are some examples of the output (red original, blue reconstructed):
**edit: I just saw an autoencoder example with a more complicated loss function than mine. Does anyone know whether the loss function self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2)) is sufficient?
***edit: some more code to describe my training
This is my Autoencoder Class:
class AutoEncoder():
def __init__(self):
# Training Parameters
self.learning_rate = 0.005
self.alpha = 0.5
# Network Parameters
self.num_input = 24 # one day as input
self.num_hidden_1 = 10 # 1st layer num features
self.num_hidden_2 = 3 # 2nd layer num features (the latent dim)
self.X = tf.placeholder("float", [None, self.num_input])
self.weights = {
'encoder_h1': tf.Variable(tf.random_normal([self.num_input, self.num_hidden_1])),
'encoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_hidden_2])),
'decoder_h1': tf.Variable(tf.random_normal([self.num_hidden_2, self.num_hidden_1])),
'decoder_h2': tf.Variable(tf.random_normal([self.num_hidden_1, self.num_input])),
}
self.biases = {
'encoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
'encoder_b2': tf.Variable(tf.random_normal([self.num_hidden_2])),
'decoder_b1': tf.Variable(tf.random_normal([self.num_hidden_1])),
'decoder_b2': tf.Variable(tf.random_normal([self.num_input])),
}
self.encoded = self.encoder(self.X)
self.decoded = self.decoder(self.encoded)
# Define loss and optimizer, minimize the squared error
self.loss = tf.reduce_mean(tf.pow(self.X - self.decoded, 2))
self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)
def encoder(self, x):
# sigmoid, tanh, relu
en_layer_1 = tf.nn.sigmoid (tf.add(tf.matmul(x, self.weights['encoder_h1']),
self.biases['encoder_b1']))
en_layer_2 = tf.nn.sigmoid (tf.add(tf.matmul(en_layer_1, self.weights['encoder_h2']),
self.biases['encoder_b2']))
return en_layer_2
def decoder(self, x):
de_layer_1 = tf.nn.sigmoid (tf.add(tf.matmul(x, self.weights['decoder_h1']),
self.biases['decoder_b1']))
de_layer_2 = tf.nn.sigmoid (tf.add(tf.matmul(de_layer_1, self.weights['decoder_h2']),
self.biases['decoder_b2']))
return de_layer_2
and this is how I train my network (input data have shape (number_days, 24)):
model = autoencoder.AutoEncoder()
num_epochs = 3
batch_size = 50
num_batches = 300
display_batch = 50
examples_to_show = 16
loss_values = []
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
#training
for e in range(1, num_epochs+1):
print('starting epoch {}'.format(e))
for b in range(num_batches):
# get next batch of data
batch_x = dataset.next_batch(batch_size)
# Run optimization op (backprop) and cost op (to get loss value)
l = sess.run([model.loss], feed_dict={model.X: batch_x})
sess.run(model.optimizer, feed_dict={model.X: batch_x})
# Display logs
if b % display_batch == 0:
print('Epoch {}: Batch ({}) Loss: {}'.format(e, b, l))
loss_values.append(l)
# testing
test_data = dataset.next_batch(batch_size)
decoded_test_data = sess.run(model.decoded, feed_dict={model.X: test_data})
Just a suggestion, I have had some issues with autoencoders using the sigmoid function.
I switched to tanh or relu and those improved the results.
With the autoencoder it is basically learning to recreate the output from the input, by encoding and decoding. If you mean it's the same as the input, then you are getting what you want. It has learned the data set.
Ultimately you can compare by reviewing the Mean Squared Error between the input and output and see if it is exactly the same. If you mean that the output is exactly the same regardless of the input, that isn't something I've run into. I guess if your input doesn't vary much from day to day, then I could imagine that would have some impact. Are you looking for anomalies?
Also, if you have a time series for training, I wouldn't shuffle the data in this particular case. If the temporal order is significant, you introduce data leakage (basically introducing future data into the training set) depending on what you are trying to achieve.
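A chronological split is one simple way to keep future days out of the training set. The sketch below assumes the data is an array of shape (number_days, 24) ordered by date; `chronological_split` and `train_fraction` are illustrative names, not part of the original code.

```python
import numpy as np

def chronological_split(data, train_fraction=0.8):
    """Split rows in time order: the first part trains, the remainder tests."""
    split = int(len(data) * train_fraction)
    return data[:split], data[split:]

# Stand-in for 500 days with 24 data points each, already sorted by date
days = np.arange(500 * 24, dtype=float).reshape(500, 24)
train_days, test_days = chronological_split(days)
print(train_days.shape, test_days.shape)
```

Shuffling within the training part is then a separate decision; the point is only that no test day precedes a training day.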
Ah, I didn't initially see your post with the graph results.. thanks for adding.
The sigmoid output is floored at 0, so it cannot reproduce your data that is below 0.
If you want to use a sigmoid output, then rescale your data between ]0;1[ (0 and 1 excluded).
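A minimal sketch of that rescaling, assuming the data lies in [-0.5, 0.5] as stated in the question. The helper names and the margin `eps` are assumptions for illustration. The margin keeps the targets strictly inside (0, 1), where a sigmoid output can reach them.

```python
import numpy as np

def to_unit_interval(x, lo=-0.5, hi=0.5, eps=0.01):
    """Map values from [lo, hi] linearly into [eps, 1 - eps]."""
    return eps + (x - lo) / (hi - lo) * (1.0 - 2.0 * eps)

def from_unit_interval(s, lo=-0.5, hi=0.5, eps=0.01):
    """Inverse mapping, for converting reconstructions back to the original scale."""
    return lo + (s - eps) / (1.0 - 2.0 * eps) * (hi - lo)

day = np.linspace(-0.5, 0.5, 24)          # one day of 24 data points
scaled = to_unit_interval(day)            # feed this to the autoencoder
restored = from_unit_interval(scaled)     # apply this to the decoded output
print(scaled.min(), scaled.max())
```

The alternative is to keep the data as it is and use a linear or tanh activation on the last decoder layer.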
I know this is a very old post, so this is just an attempt to help whoever wanders here with the same problem. If the autoencoder is converging to the same encoding for all the different instances, there may be a problem in the loss function. Check the size and shape of what the loss function returns, as it may be evaluating the wrong tensors (i.e. you may need to transpose something somewhere). Basically, assuming you are using the autoencoder to encode M features of N training instances, your loss function should return N values: the size of your loss tensor should be the number of instances in your training set. I found that out the hard way.
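The shape check described here can be sketched in NumPy, with random arrays standing in for the input and the reconstruction. It shows a per-instance loss of shape (N,) and a common broadcasting mistake that silently produces an (N, N) array. A scalar mean over all elements equals the mean of the per-instance values, so a scalar loss is not wrong in itself; the thing to check is that the tensors being subtracted have matching shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 24                                  # N instances, M features
X = rng.uniform(-0.5, 0.5, (N, M))
decoded = X + rng.normal(0.0, 0.01, (N, M))    # stand-in for the reconstruction

per_instance = np.mean((X - decoded) ** 2, axis=1)   # one loss value per instance
overall = per_instance.mean()                        # scalar used for optimization

# Broadcasting pitfall: a column vector minus a flat vector gives an N x N matrix
targets = rng.uniform(size=(N, 1))
preds = rng.uniform(size=N)
wrong = (targets - preds) ** 2         # shape (N, N): every target against every prediction
right = (targets[:, 0] - preds) ** 2   # shape (N,): matching pairs only

print(per_instance.shape, wrong.shape, right.shape)
```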
I've created a neural network with numpy to estimate the sin(x) function for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0). It does not learn, and I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.
When I execute the code, the number of test points it estimates correctly sits around 48/1000. This happens to be the average data point count per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just pick a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!
import random
import numpy as np
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
# print(self.layer1)
# print()
# print(self.layer2)
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Propagate forward through network as if doing this by hand.
#first layer's output activations:
for neuron in range(self.hiddenLayerSize):
self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
def train(self, training_pairs, epochs, minibatchsize, learn_rate):
#apply gradient descent
test_data = build_sinx_data(1000)
for epoch in range(epochs):
random.shuffle(training_pairs)
minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
for minibatch in minibatches:
loss = 0 #calculate loss for each minibatch
#Begin training
for x, y in minibatch:
network_output = self.feedforward(x)
loss += (network_output - y) ** 2
#adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
loss /= (2*len(minibatch))
adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
self.layer1 += adjustWeights
#print(adjustWeights)
self.layer2 += adjustWeights
#when line 63 placed here, results did not improve during minibatch.
print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
print("Training Complete")
def evaluate(self, test_data):
"""
Returns number of test inputs which network evaluates correctly.
The output is assumed to be the neuron in the output layer with the highest activation
:param test_data: test data set identical in form to train data set.
:return: integer sum
"""
correct = 0
for x, y in test_data:
output = self.feedforward(x)
if output == y:
correct+=1
return(correct)
def build_sinx_data(data_points):
"""
Creates a list of tuples (x value, expected y value) for Sin(x) function.
:param data_points: number of desired data points
:return: list of tuples (x value, expected y value)
"""
x_vals = []
y_vals = []
for i in range(data_points):
#parameter of randint signifies range of x values to be used*10
x_vals.append(random.randint(-2000,2000)/10)
y_vals.append(round(math.sin(x_vals[i]),1))
return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))
I didn't examine thoroughly all of your code, but some issues are clearly visible:
* operator doesn't perform matrix multiplication in numpy, you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc.
Seems like you are solving your problem via classification (selecting 1 out of 21 classes), but using L2 loss. This is somewhat mixed up. You have two options: either stick to classification and use a cross entropy loss function, or perform regression (i.e. predict the numeric value) with L2 loss.
You should definitely extract sigmoid function to avoid writing the same expression all over again:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
You perform the same update on self.layer1 and self.layer2, which is clearly wrong. Take some time to analyze how exactly backpropagation works.
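To illustrate the first and last points, here is a minimal NumPy sketch (not your code; the function name backprop_step and the column-vector shapes are assumptions for illustration). It shows that * multiplies element-wise while np.dot performs the matrix product, and that each layer gets its own gradient during backpropagation with an L2 loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Element-wise product vs. matrix product.
W = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 1.0])
elementwise = W * v     # broadcasts v across the rows of W, shape (2, 2)
matvec = np.dot(W, v)   # matrix-vector product, shape (2,)

def backprop_step(W1, W2, x, y, learn_rate):
    """One gradient-descent step for a 2-layer sigmoid network with L2 loss.

    x has shape (n_in, 1), y has shape (n_out, 1),
    W1 has shape (n_hidden, n_in), W2 has shape (n_out, n_hidden).
    """
    # Forward pass
    a1 = sigmoid(np.dot(W1, x))
    a2 = sigmoid(np.dot(W2, a1))
    # Backward pass: a separate error signal for each layer
    delta2 = (a2 - y) * a2 * (1.0 - a2)
    delta1 = np.dot(W2.T, delta2) * a1 * (1.0 - a1)
    # Each layer is updated with its own gradient
    new_W1 = W1 - learn_rate * np.dot(delta1, x.T)
    new_W2 = W2 - learn_rate * np.dot(delta2, a1.T)
    return new_W1, new_W2
```

Note that the two returned weight matrices come from different gradients (delta1 and delta2), which is the part missing when the same adjustWeights value is added to both layers.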
I edited how my loss function was integrated into my function and also correctly implemented gradient descent. I also removed the use of mini-batches and simplified what my network was trying to do. I now have a network which attempts to classify something as even or odd.
Some extremely helpful guides I used to fix things up:
Chapters 1 and 2 of Neural Networks and Deep Learning, by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html . This book gives thorough explanations of how neural nets work, including breakdowns of the math behind their execution.
Backpropagation from the Beginning, by Erik Hallström, linked by Maxim. https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d . Not as thorough as the above guide, but I kept both open concurrently, as this guide is more to the point about what is important and how to apply the mathematical formulas that are thoroughly explained in Nielsen's book.
How to build a simple neural network in 9 lines of Python code https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1 . A useful and fast introduction to some neural networking basics.
Here is my (now functioning) code:
import random
import numpy as np
import scipy
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Layers represented both by their weights array and activation and inputsums vectors.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
self.layer2_inputsums = np.zeros((outputLayerSize, 1))
self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
self.layer2_errorsignals = np.zeros((outputLayerSize, 1))
self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
print()
print(self.layer1)
print()
print(self.layer2)
print()
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Calculate inputsum and activations for each neuron in the first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])
# Calculate inputsum and activations for each neuron in the second layer. Notice that each neuron in the second layer is represented by a
# weights vector, consisting of all weights leading out of the kth neuron in (l-1) layer to the jth neuron in layer l.
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])
return self.layer2_activations
def interpreted_output(self, network_input):
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
self.feedforward(network_input)
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
# def build_expected_output(self, training_data):
# #Views expected output number y for each x to generate an expected output vector from the network
# index=0
# for pair in training_data:
# expected_output_vector = np.zeros((self.outputLayerSize,1))
# x = training_data[0]
# y = training_data[1]
# for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
# if y == i / 10:
# expected_output_vector[i] = 1
# #expect the target category to be a 1.
# break
# training_data[index][1] = expected_output_vector
# index+=1
# return training_data
def train(self, training_data, learn_rate):
self.backpropagate(training_data, learn_rate)
def backpropagate(self, train_data, learn_rate):
#Perform for each x,y pair.
for datapair in range(len(train_data)):
x = train_data[datapair][0]
y = train_data[datapair][1]
self.feedforward(x)
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l2 " + str(self.layer2))
# print("l1 " + str(self.layer1))
for neuron in range(self.outputLayerSize):
#Calculate first error equation for error signals of output layer neurons
self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])
#Use recursive formula to calculate error signals of hidden layer neurons
self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
#print(self.layer1_errorsignals)
# for neuron in range(self.hiddenLayerSize):
# #Use recursive formula to calculate error signals of hidden layer neurons
# self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])
#Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
#(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
#Update all weights for network at each iteration of a training pair.
#Update weights in second layer
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)
self.layer2 += self.layer2_deltaw
#Update weights in first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)
self.layer1 += self.layer1_deltaw
#Comment/Uncomment to enable error evaluation.
#print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l1 " + str(self.layer1))
# print("l2 " + str(self.layer2))
def evaluate(self, test_data):
error = 0
for x, y in test_data:
#x is integer, y is single element np.array
output = self.feedforward(x)
error += y - output
return error
#eval function for sin(x)
# def evaluate(self, test_data):
# """
# Returns number of test inputs which network evaluates correctly.
# The ouput assumed to be neuron in output layer with highest activation
# :param test_data: test data set identical in form to train data set.
# :return: integer sum
# """
# correct = 0
# for x, y in test_data:
# outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
# 1)] # range(-10, 11, 1)
# newy = outputs[np.argmax(y)]
# output = self.interpreted_output(x)
# #print("output: " + str(output))
# if output == newy:
# correct+=1
# return(correct)
def sigmoid(self, z):
return 1 / (1 + np.exp(-z))
def sigmoid_prime(self, z):
return (1 - self.sigmoid(z)) * self.sigmoid(z)
def build_simple_data(data_points):
x_vals = []
y_vals = []
for each in range(data_points):
x = random.randint(-3,3)
expected_output_vector = np.zeros((1, 1))
if x > 0:
expected_output_vector[[0]] = 1
else:
expected_output_vector[[0]] = 0
x_vals.append(x)
y_vals.append(expected_output_vector)
print(list(zip(x_vals,y_vals)))
print()
return (list(zip(x_vals,y_vals)))
simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
def test_network(iterations,net,training_points):
"""
Casually evaluates pre and post test
:param iterations: number of trials to be run
:param net: name of network to evaluate.
:param training_points: size of training data to be used
:return: four 1x1 arrays.
"""
pretest_negative = 0
pretest_positive = 0
posttest_negative = 0
posttest_positive = 0
for each in range(iterations):
pretest_negative += net.feedforward(-10)
pretest_positive += net.feedforward(10)
net.train(build_simple_data(training_points),.1)
for each in range(iterations):
posttest_negative += net.feedforward(-10)
posttest_positive += net.feedforward(10)
return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)
print(test_network(10000, simpleNet, 10000))
While much differs between this code and the code posted in the OP, one particular difference is interesting. In the original feedforward method, notice:
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
The line
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
resembles
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
in the updated code. This line computes the dot product between a neuron's weight vector and its input vector (the activations from layer 1) to arrive at the input sum for that neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights: sigmoid_prime(z) is multiplied by the network error, the difference between actual and expected output. If z is very large and positive, the neuron's activation is very close to 1, meaning the network is confident that the neuron should be activating. If z is very negative, the activation is very close to 0 and the network is equally confident that the neuron should not be activating. The network doesn't want to radically adjust weights that it is happy with, so the scale of the change in each weight for a neuron is given by the gradient of sigmoid(z), sigmoid_prime(z). A very large |z| means a very small gradient and a very small change applied to the weights (the gradient of sigmoid is maximized at z = 0, when the network is unconfident about how a neuron should be categorized and the activation for that neuron is 0.5).
Since I was continually adding on to each neuron's input_sum (z) and never resetting it before computing dot(weights, activations) for a new input, the value of z kept growing, continually slowing the rate of change of the weights until weight modification came to a standstill. I added the following line to cope with this:
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
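The effect can be reproduced with a small sketch. The weights and activations below are made-up values and input_sums is a hypothetical helper, not part of the network above; it only shows how z grows when the accumulator is not reset and how sigmoid_prime(z) shrinks as a result:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The gradient is largest at z = 0 and shrinks quickly as |z| grows.
for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_prime(z))

# Made-up weights and layer-1 activations for a single output neuron.
weights = np.array([0.4, 0.3, 0.5])
activations = np.array([0.9, 0.8, 0.7])

def input_sums(n_calls, reset):
    """Return z after each of n_calls feedforward passes.

    With reset=False the accumulator is never cleared, mimicking the bug.
    """
    z = 0.0
    history = []
    for _ in range(n_calls):
        if reset:
            z = 0.0  # the fix: start from zero on every pass
        z += np.dot(weights, activations)
        history.append(z)
    return history

print("with reset:   ", input_sums(5, reset=True))
print("without reset:", input_sums(5, reset=False))
```

Without the reset, z increases on every pass even though the input never changes, so sigmoid_prime(z) and therefore the weight updates keep getting smaller.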
The newly posted network can be copied and pasted into an editor and executed as long as you have the numpy module installed. The final line of output will be a tuple of 4 arrays representing the final network output. The first two are the pretest values for a negative and a positive input, respectively. These should be random. The second two are post-test values that show how well the network classifies an input as a positive or negative number. A value near 0 denotes negative; a value near 1 denotes positive.
I am trying to detect micro-events in a long time series. For this purpose, I will train a LSTM network.
Data. Input for each time sample is 11 different features somewhat normalized to fit 0-1. Output will be either one of two classes.
Batching. Due to the huge class imbalance, I have extracted the data in batches of 60 time samples each, of which at least 5 will always be class 1 and the rest the other class. In this way the class imbalance is reduced from 150:1 to around 12:1. I have then randomized the order of all my batches.
Model. I am attempting to train an LSTM, with initial configuration of 3 different cells with 5 delay steps. I expect the micro events to arrive in sequences of at least 3 time steps.
Problem: When I try to train the network, it quickly converges towards saying that EVERYTHING belongs to the majority class. When I implement a weighted loss function, at a certain threshold it changes to saying that EVERYTHING belongs to the minority class. I suspect (without being an expert) that there is no learning in my LSTM cells, or that my configuration is off?
Below is the code for my implementation. I am hoping that someone can tell me
Is my implementation correct?
What other reasons could there be for such behaviour?
ar_model.py
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn
import ar_config
config = ar_config.get_config()
class ARModel(object):
def __init__(self, is_training=False, config=None):
# Config
if config is None:
config = ar_config.get_config()
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_classes], name='ModelOutput')
# Hidden layer
with tf.variable_scope('lstm') as scope:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_delays)
self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
outputs, state = rnn.rnn(cell, [self._features], dtype=tf.float32)
# Output layer
output = outputs[-1]
softmax_w = tf.get_variable('softmax_w', [config.num_hidden, config.num_classes], tf.float32)
softmax_b = tf.get_variable('softmax_b', [config.num_classes], tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b
# Evaluate
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
self._cost = cost = tf.reduce_mean(loss)
self._predict = tf.argmax(tf.nn.softmax(logits), 1)
self._correct = tf.equal(tf.argmax(logits, 1), tf.argmax(self._targets, 1))
self._accuracy = tf.reduce_mean(tf.cast(self._correct, tf.float32))
self._final_state = state
if not is_training:
return
# Optimize
optimizer = tf.train.AdamOptimizer()
self._train_op = optimizer.minimize(cost)
@property
def features(self):
return self._features
@property
def targets(self):
return self._targets
@property
def cost(self):
return self._cost
@property
def accuracy(self):
return self._accuracy
@property
def train_op(self):
return self._train_op
@property
def predict(self):
return self._predict
@property
def initial_state(self):
return self._initial_state
@property
def final_state(self):
return self._final_state
ar_train.py
import os
from datetime import datetime
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile
import ar_network
import ar_config
import ar_reader
config = ar_config.get_config()
def main(argv=None):
if gfile.Exists(config.train_dir):
gfile.DeleteRecursively(config.train_dir)
gfile.MakeDirs(config.train_dir)
train()
def train():
train_data = ar_reader.ArousalData(config.train_data, num_steps=config.max_steps)
test_data = ar_reader.ArousalData(config.test_data, num_steps=config.max_steps)
with tf.Graph().as_default(), tf.Session() as session, tf.device('/cpu:0'):
initializer = tf.random_uniform_initializer(minval=-0.1, maxval=0.1)
with tf.variable_scope('model', reuse=False, initializer=initializer):
m = ar_network.ARModel(is_training=True)
s = tf.train.Saver(tf.all_variables())
tf.initialize_all_variables().run()
for batch_input, batch_target in train_data:
step = train_data.iter_steps
dict = {
m.features: batch_input,
m.targets: batch_target
}
session.run(m.train_op, feed_dict=dict)
state, cost, accuracy = session.run([m.final_state, m.cost, m.accuracy], feed_dict=dict)
if not step % 10:
test_input, test_target = test_data.next()
test_accuracy = session.run(m.accuracy, feed_dict={
m.features: test_input,
m.targets: test_target
})
now = datetime.now().time()
print ('%s | Iter %4d | Loss= %.5f | Train= %.5f | Test= %.3f' % (now, step, cost, accuracy, test_accuracy))
if not step % 1000:
destination = os.path.join(config.train_dir, 'ar_model.ckpt')
s.save(session, destination)
if __name__ == '__main__':
tf.app.run()
ar_config.py
class Config(object):
# Directories
train_dir = '...'
ckpt_dir = '...'
train_data = '...'
test_data = '...'
# Data
num_features = 13
num_classes = 2
batch_size = 60
# Model
num_hidden = 3
num_delays = 5
# Training
max_steps = 100000
def get_config():
return Config()
UPDATED ARCHITECTURE:
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features, config.num_delays], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_output], name='ModelOutput')
# Weights
weights = {
'hidden': tf.get_variable('w_hidden', [config.num_features, config.num_hidden], tf.float32),
'out': tf.get_variable('w_out', [config.num_hidden, config.num_classes], tf.float32)
}
biases = {
'hidden': tf.get_variable('b_hidden', [config.num_hidden], tf.float32),
'out': tf.get_variable('b_out', [config.num_classes], tf.float32)
}
#Layer in
with tf.variable_scope('input_hidden') as scope:
inputs = self._features
inputs = tf.transpose(inputs, perm=[2, 0, 1]) # (BatchSize,NumFeatures,TimeSteps) -> (TimeSteps,BatchSize,NumFeatures)
inputs = tf.reshape(inputs, shape=[-1, config.num_features]) # (TimeSteps,BatchSize,NumFeatures) -> (TimeSteps*BatchSize,NumFeatures)
inputs = tf.add(tf.matmul(inputs, weights['hidden']), biases['hidden'])
#Layer hidden
with tf.variable_scope('hidden_hidden') as scope:
inputs = tf.split(0, config.num_delays, inputs) # -> n_steps * (batchsize, features)
cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
outputs, state = rnn.rnn(cell, inputs, dtype=tf.float32)
#Layer out
with tf.variable_scope('hidden_output') as scope:
output = outputs[-1]
logits = tf.add(tf.matmul(output, weights['out']), biases['out'])
Odd elements
Weighted loss
I am not sure your "weighted loss" does what you want it to do:
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
This is applied before the loss function is calculated, so it forces your predictions to behave in a certain way before the softmax is applied. (I also think you wanted an element-wise multiplication here? And your ratio is above 1, which makes the second weight, 1 - ratio, negative.)
If you want a weighted loss, you should apply the weights after
loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
with some element-wise multiplication by your weights:
loss = loss * weights
where weights is built from one weight per class (a vector of shape [2,]), so that each example's loss is scaled by the weight of its target class.
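A minimal NumPy sketch of that idea, assuming one-hot targets. The function name and the weight values are illustrative only, and this is plain NumPy rather than the TensorFlow API. Since the cross entropy yields one loss value per example, the per-class weights are first mapped to per-example weights through the targets:

```python
import numpy as np

def weighted_softmax_cross_entropy(logits, targets, class_weights):
    """Per-example softmax cross entropy, scaled by the weight of the true class.

    logits:        array of shape (batch, classes), unweighted network outputs
    targets:       one-hot array of shape (batch, classes)
    class_weights: array of shape (classes,), all positive
    """
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Unweighted loss, one value per example: shape (batch,)
    loss = -(targets * log_probs).sum(axis=1)
    # Pick the weight of each example's target class: shape (batch,)
    example_weights = (targets * class_weights).sum(axis=1)
    return loss * example_weights

# Made-up example: two classes, the second one weighted 12x.
logits = np.zeros((2, 2))
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
class_weights = np.array([1.0, 12.0])
print(weighted_softmax_cross_entropy(logits, targets, class_weights))
```

The logits themselves are left untouched, so the predictions are not distorted; only the contribution of each example to the cost changes.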
However, I would not recommend you to use weighted losses. Perhaps try increasing the ratio even further than 1:6.
Architecture
As far as I can read, you are using 5 stacked LSTMs with 3 hidden units per layer?
Try removing the multi rnn and just use a single LSTM/GRU (maybe even just a vanilla RNN) and jack the hidden units up to ~100-1000.
Debugging
Often when you are facing problems with an odd behaving network, it can be a good idea to:
Print everything
Literally print the shapes and values of every tensor in your model, use sess to fetch it and then print it. Your input data, the first hidden representation, your predictions, your losses etc.
You can also use TensorFlow's tf.Print(): x_tensor = tf.Print(x_tensor, [tf.shape(x_tensor)])
Use tensorboard
Using TensorBoard summaries on your gradients, accuracy metrics and histograms will reveal patterns in your data that might explain certain behavior, such as what led to exploding weights. Maybe your forget bias goes to infinity, or you're not tracking the gradient through a certain layer, etc.
Other questions
How large is your dataset?
How long are your sequences?
Are the 13 features categorical or continuous? You should not normalize categorical variables or represent them as integers, instead you should use one-hot encoding.
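In case it helps, one-hot encoding of an integer-coded categorical feature can be sketched in NumPy like this (one_hot is a hypothetical helper, and it assumes the categories are coded as integers 0..num_categories-1):

```python
import numpy as np

def one_hot(labels, num_categories):
    """Turn integer category codes into one-hot rows.

    labels:         sequence of ints in the range 0..num_categories-1
    num_categories: number of distinct categories
    """
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_categories), dtype=np.float32)
    # Set a single 1 per row, in the column given by the label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

print(one_hot([0, 2, 1], 3))
```

Each categorical feature then contributes num_categories input columns instead of a single integer column.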
Gunnar has already made lots of good suggestions. A few more small things worth paying attention to in general for this sort of architecture:
Try tweaking the Adam learning rate. You should determine the proper learning rate by cross-validation; as a rough start, you could just check whether a smaller learning rate saves your model from crashing on the training data.
You should definitely use more hidden units. It's cheap to try larger networks when you first start out on a dataset. Go as large as necessary to avoid the underfitting you've observed. Later you can regularize / pare down the network after you get it to learn something useful.
Concretely, how long are the sequences you are passing into the network? You say you have a 30k-long time sequence. I assume you are passing in subsections / samples of this sequence?