I have highly unbalanced data in a two-class problem that I am trying to solve with a neural network in TensorFlow. I was able to find a posting that exactly describes the difficulty I'm having and gives a solution that appears to address my problem. However, I'm working with an assistant, and neither of us really knows Python, so TensorFlow is being used like a black box for us. I have extensive (decades of) experience working in a variety of programming languages and paradigms. That experience gives me a pretty good intuitive grasp of what is happening in the code my assistant cobbled together to get a working model, but neither of us can follow it well enough to tell exactly where in TensorFlow we need to make edits to get what we want.
I'm hoping someone with a good knowledge of Python and TensorFlow can look at this and just tell us something like, "Hey, just edit the file called xxx at lines yyy," so we can get on with it.
Below, I have a link to the solution we want to implement, and I've also included the code my assistant wrote that initially got us up and running. Our code produces good results when our data is balanced, but when it is highly imbalanced, the model tends to skew its classifications toward the larger class to score a better overall result.
Here is a link to the solution we found that looks promising:
Loss function for class imbalanced binary classifier in Tensor flow
I've included the relevant code from this link below. Since I know that where we make these edits will depend on how we are using TensorFlow, I've also included our implementation immediately under it in the same code block with appropriate comments to make it clear what we want to add and what we are currently doing:
# Here is the stuff we need to add some place in the TensorFlow source code:
ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([[ratio, 1.0 - ratio]])
logits = ... # shape [batch_size, 2]
weight_per_label = tf.transpose( tf.matmul(labels, tf.transpose(class_weight)) ) #shape [1, batch_size]
# this is the weight for each datapoint, depending on its label
xent = tf.mul(weight_per_label,
              tf.nn.softmax_cross_entropy_with_logits(logits, labels, name="xent_raw")) #shape [1, batch_size]
loss = tf.reduce_mean(xent) #shape 1
# NOW HERE IS OUR OWN CODE TO SHOW HOW WE ARE USING TensorFlow:
# (Obviously this is not in the same file in real life ...)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
import numpy as np
from math import exp
from PreProcessData import (load_and_process_training_Data,
                            load_and_process_test_data)
from PrintUtilities import printf, printResultCompare
tf.set_random_seed(0)
#==============================================================
# predefine file path
''' Unbalanced Training Data, hence there are 1:11 target and nontarget '''
targetFilePath = '/Volumes/Extend/BCI_TestData/60FeaturesVersion/Train1-35/tar.txt'
nontargetFilePath = '/Volumes/Extend/BCI_TestData/60FeaturesVersion/Train1-35/nontar.txt'
testFilePath = '/Volumes/Extend/BCI_TestData/60FeaturesVersion/Test41/feats41.txt'
labelFilePath = '/Volumes/Extend/BCI_TestData/60FeaturesVersion/Test41/labs41.txt'
# train_x,train_y = load_and_process_training_Data(targetFilePath,nontargetFilePath)
train_x, train_y = load_and_process_training_Data(targetFilePath, nontargetFilePath)
# test_x,test_y = load_and_process_test_data(testFilePath,labelFilePath)
test_x, test_y = load_and_process_test_data(testFilePath,labelFilePath)
# trained neural network path
save_path = "nn_saved_model/model.ckpt"
# number of classes
n_classes = 2 # in this case, target or non_target
# number of hidden layers
num_hidden_layers = 1
# number of nodes in each hidden layer
nodes_in_layer1 = 40
nodes_in_layer2 = 100
nodes_in_layer3 = 30 # We think: 3 layers is dangerous!! try to avoid it!!!!
# number of training samples in each block
block_size = 3000 # the computer may not have enough memory, so we divide the training data into blocks
# number of times we iterate through training data
total_iterations = 1000
# terminate training if computed loss < supposed loss
expected_loss = 0.1
# max learning rate and min learning rate
max_learning_rate = 0.002
min_learning_rate = 0.0002
# These are placeholders for some values in graph
# tf.placeholder(dtype, shape=None(optional), name=None(optional))
# It's a tensor to hold our data features
x = tf.placeholder(tf.float32, [None,len(train_x[0])])
# Every row has either [1,0] for target or [0,1] for non_target; a placeholder to hold the one-hot value
Y_C = tf.placeholder(tf.int8, [None, n_classes])
# variable learning rate
lr = tf.placeholder(tf.float32)
# neural network model
def neural_network_model(data):
if (num_hidden_layers == 1):
# each layer holds weights and a bias; the bias guarantees some output even if every neuron feeds a 0 into the layer
# When using ReLUs, make sure biases are initialised with small *positive* values, for example 0.1 = tf.ones([K])/10
hidden_1_layer = {'weights': tf.Variable(tf.random_normal([len(train_x[0]), nodes_in_layer1])),
'bias': tf.Variable(tf.ones([nodes_in_layer1]) / 10)}
# the output layer just gets a plain zero bias
output_layer = {'weights': tf.Variable(tf.random_normal([nodes_in_layer1, n_classes])),
'bias': tf.Variable(tf.zeros([n_classes]))}
# multiplication of the raw input data by their unique weights (starting as random, but optimized during training)
l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['bias'])
l1 = tf.nn.relu(l1)
# We repeat this process for each of the hidden layers, all the way down to our output, where we have the final values still being the multiplication of the input and the weights, plus the output layer's bias values.
Ylogits = tf.matmul(l1, output_layer['weights']) + output_layer['bias']
if (num_hidden_layers == 2):
# each layer holds weights and a bias; the bias guarantees some output even if every neuron feeds a 0 into the layer
# When using ReLUs, make sure biases are initialised with small *positive* values, for example 0.1 = tf.ones([K])/10
hidden_1_layer = {'weights': tf.Variable(tf.random_normal([len(train_x[0]), nodes_in_layer1])),
'bias': tf.Variable(tf.ones([nodes_in_layer1]) / 10)}
hidden_2_layer = {'weights': tf.Variable(tf.random_normal([nodes_in_layer1, nodes_in_layer2])),
'bias': tf.Variable(tf.ones([nodes_in_layer2]) / 10)}
# the output layer just gets a plain zero bias
output_layer = {'weights': tf.Variable(tf.random_normal([nodes_in_layer2, n_classes])),
'bias': tf.Variable(tf.zeros([n_classes]))}
# multiplication of the raw input data by their unique weights (starting as random, but optimized during training)
l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['bias'])
l1 = tf.nn.relu(l1)
l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['bias'])
l2 = tf.nn.relu(l2)
# We repeat this process for each of the hidden layers, all the way down to our output, where we have the final values still being the multiplication of the input and the weights, plus the output layer's bias values.
Ylogits = tf.matmul(l2, output_layer['weights']) + output_layer['bias']
if (num_hidden_layers == 3):
# each layer holds weights and a bias; the bias guarantees some output even if every neuron feeds a 0 into the layer
# When using ReLUs, make sure biases are initialised with small *positive* values, for example 0.1 = tf.ones([K])/10
hidden_1_layer = {'weights':tf.Variable(tf.random_normal([len(train_x[0]), nodes_in_layer1])), 'bias':tf.Variable(tf.ones([nodes_in_layer1]) / 10)}
hidden_2_layer = {'weights':tf.Variable(tf.random_normal([nodes_in_layer1, nodes_in_layer2])), 'bias':tf.Variable(tf.ones([nodes_in_layer2]) / 10)}
hidden_3_layer = {'weights':tf.Variable(tf.random_normal([nodes_in_layer2, nodes_in_layer3])), 'bias':tf.Variable(tf.ones([nodes_in_layer3]) / 10)}
# the output layer just gets a plain zero bias
output_layer = {'weights':tf.Variable(tf.random_normal([nodes_in_layer3, n_classes])), 'bias':tf.Variable(tf.zeros([n_classes]))}
# multiplication of the raw input data by their unique weights (starting as random, but optimized during training)
l1 = tf.add(tf.matmul(data,hidden_1_layer['weights']), hidden_1_layer['bias'])
l1 = tf.nn.relu(l1)
l2 = tf.add(tf.matmul(l1,hidden_2_layer['weights']), hidden_2_layer['bias'])
l2 = tf.nn.relu(l2)
l3 = tf.add(tf.matmul(l2,hidden_3_layer['weights']), hidden_3_layer['bias'])
l3 = tf.nn.relu(l3)
# We repeat this process for each of the hidden layers, all the way down to our output, where we have the final values still being the multiplication of the input and the weights, plus the output layer's bias values.
Ylogits = tf.matmul(l3,output_layer['weights']) + output_layer['bias']
return Ylogits # return the neural network model
# set up the training process
def train_neural_network(x):
# produce the prediction based on the output of the nn model
Ylogits = neural_network_model(x)
# measure the error with the built-in cross-entropy function; this is the value we want to minimize
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_C))
# To optimize our cost (cross_entropy) and reduce the error; the default learning_rate is 0.001, but here we feed in a variable rate
# optimizer = tf.train.GradientDescentOptimizer(0.003)
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)
# start the session
with tf.Session() as sess:
# We initialize all of our variables before we start
sess.run(tf.global_variables_initializer())
# iterate total_iterations times (cycles of feed-forward and back-prop); each epoch means the network sees all of train_data once
for epoch in range(total_iterations):
# accumulate the total cost per epoch; a declining value means better results
epoch_loss=0
i=0
decay_speed = 150
# current learning rate
learning_rate = min_learning_rate + (max_learning_rate - min_learning_rate) * exp(-epoch/decay_speed)
# divide the dataset into blocks of block_size in case we run out of memory
while i < len(train_x):
# load train data
start = i
end = i + block_size
batch_x = np.array(train_x[start:end])
batch_y = np.array(train_y[start:end])
train_data = {x: batch_x, Y_C: batch_y, lr: learning_rate}
# train
# sess.run(train_step,feed_dict=train_data)
# run optimizer and cost against batch of data.
_, c = sess.run([train_step, cross_entropy], feed_dict=train_data)
epoch_loss += c
i+=block_size
# print iteration status
printf("epoch: %5d/%d , loss: %f", epoch, total_iterations, epoch_loss)
# terminate training when loss < expected_loss
if epoch_loss < expected_loss:
break
# how many predictions we made that were perfect matches to their labels
# test model
# test data
test_data = {x:test_x, Y_C:test_y}
# calculate accuracy
correct_prediction = tf.equal(tf.argmax(Ylogits, 1), tf.argmax(Y_C, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
print('Accuracy:',accuracy.eval(test_data))
# result matrix, return the position of 1 in array
result = (sess.run(tf.argmax(Ylogits.eval(feed_dict=test_data),1)))
answer = []
for i in range(len(test_y)):
if test_y[i] == [0,1]:
answer.append(1)
elif test_y[i]==[1,0]:
answer.append(0)
answer = np.array(answer)
printResultCompare(result,answer)
# save the prediction of correctness
np.savetxt('nn_prediction.txt', Ylogits.eval(feed_dict={x: test_x}), delimiter=',',newline="\r\n")
# save the nn model for later use again
# 'Saver' op to save and restore all the variables
saver = tf.train.Saver()
saver.save(sess, save_path)
#print("Model saved in file: %s" % save_path)
# load the trained neural network model
def test_loaded_neural_network(trained_NN_path):
Ylogits = neural_network_model(x)
saver = tf.train.Saver()
with tf.Session() as sess:
# load saved model
saver.restore(sess, trained_NN_path)
print("Loading variables from '%s'." % trained_NN_path)
np.savetxt('nn_prediction.txt', Ylogits.eval(feed_dict={x: test_x}), delimiter=',',newline="\r\n")
# test model
# result matrix
result = (sess.run(tf.argmax(Ylogits.eval(feed_dict={x:test_x}),1)))
# answer matrix
answer = []
for i in range(len(test_y)):
if test_y[i] == [0,1]:
answer.append(1)
elif test_y[i]==[1,0]:
answer.append(0)
answer = np.array(answer)
printResultCompare(result,answer)
# calculate accuracy
correct_prediction = tf.equal(tf.argmax(Ylogits, 1), tf.argmax(Y_C, 1))
print(Ylogits.eval(feed_dict={x: test_x}).shape)
train_neural_network(x)
#test_loaded_neural_network(save_path)
So, can anyone help point us to the right place to make the edits that we need to make to resolve our problem? (i.e. what is the name of the file we need to edit, and where is it located.) Thanks in advance!
-gt-
The answer you want:
You should add this code to your train_neural_network(x) function.
ratio = (number of class-1 samples) / ((number of class-0 samples) + (number of class-1 samples))
class_weight = tf.constant([[ratio, 1.0 - ratio]])
Ylogits = neural_network_model(x)
weight_per_label = tf.transpose( tf.matmul(Y_C , tf.transpose(class_weight)) )
cross_entropy = tf.reduce_mean( tf.mul(weight_per_label, tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_C) ) )
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)
instead of these lines:
Ylogits = neural_network_model(x)
# measure the error with the built-in cross-entropy function; this is the value we want to minimize
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_C))
# To optimize our cost (cross_entropy) and reduce the error; the default learning_rate is 0.001, but here we feed in a variable rate
# optimizer = tf.train.GradientDescentOptimizer(0.003)
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)
More Details:
In a neural network we calculate the error of the prediction with respect to the targets (the true labels); in your case you use the cross-entropy error, which is the sum of the targets multiplied by the log of the predicted probabilities.
The optimizer then backpropagates to minimize this error and achieve better accuracy.
Without a weighted loss, the weight for each class is equal, so the optimizer reduces the error for the class that has more samples and overlooks the other class.
To prevent this, we force the optimizer to backpropagate a larger error for the minority class; to do that we multiply the errors by a scalar weight.
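For completeness, the same weighting can also be written without the matmul, by deriving one weight per example directly from the one-hot labels. This is only a sketch against the variable names above; note that tf.mul was renamed tf.multiply in TensorFlow 1.0, and that the int8 placeholder Y_C needs a cast to float before it can be used in the arithmetic:
ratio = 11.0 / 12.0  # following the formula above; with a 1:11 target:non-target split the class-1 fraction is about 11/12
class_weights = tf.constant([ratio, 1.0 - ratio])  # one weight per class, in the same order as the one-hot columns
labels_float = tf.cast(Y_C, tf.float32)
weight_per_example = tf.reduce_sum(class_weights * labels_float, 1)  # shape [batch_size]
xent = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=labels_float)  # shape [batch_size]
cross_entropy = tf.reduce_mean(weight_per_example * xent)
Only the cross_entropy line changes; the AdamOptimizer and train_step lines stay exactly as they are.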
I hope it was useful :)
Related
I am trying to create a simple 3D U-net for image segmentation, just to learn how to use the layers. Therefore I do a 3D convolution with stride 2 and then a transpose deconvolution to get back the same image size. I am also overfitting to a small set (test set) just to see if my network is learning.
I created the same net in Keras and it works just fine. Now I want to create it in TensorFlow, but I've been having trouble with it.
The cost changes slightly, but no matter what I do (reduce the learning rate, add more epochs, add more layers, change the batch size...) the output is always the same. I believe the net is not updating the weights. I am sure I am doing something wrong but I can't find what it is. Any help would be greatly appreciated.
Here is my code:
def forward_propagation(X):
if ( mode == 'train'): print(" --------- Net --------- ")
# Convolutional Layer 1
with tf.variable_scope('CONV1'):
Z1 = tf.layers.conv3d(X, filters = 16, kernel =[3,3,3], strides = [ 2, 2, 2], padding='SAME', name = 'S2/conv3d')
A1 = tf.nn.relu(Z1, name = 'S2/ReLU')
if ( mode == 'train'): print("Convolutional Layer 1 S2 " + str(A1.get_shape()))
# DEConvolutional Layer 1
with tf.variable_scope('DeCONV1'):
output_deconv1 = tf.stack([X.get_shape()[0] , X.get_shape()[1], X.get_shape()[2], X.get_shape()[3], 1])
dZ1 = tf.nn.conv3d_transpose(A1, filters = 1, kernel =[3,3,3], strides = [2, 2, 2], padding='SAME', name = 'S2/conv3d_transpose')
dA1 = tf.nn.relu(dZ1, name = 'S2/ReLU')
if ( mode == 'train'): print("Deconvolutional Layer 1 S1 " + str(dA1.get_shape()))
return dA1
def compute_cost(output, target, method = 'dice_hard_coe'):
with tf.variable_scope('COST'):
if (method == 'sigmoid_cross_entropy') :
# Make them vectors
output = tf.reshape( output, [-1, output.get_shape().as_list()[0]] )
target = tf.reshape( target, [-1, target.get_shape().as_list()[0]] )
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits = output, labels = target)
cost = tf.reduce_mean(loss)
return cost
and the main function for the model:
def model(X_h5, Y_h5, learning_rate = 0.009,
num_epochs = 100, minibatch_size = 64, print_cost = True):
ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables
#tf.set_random_seed(1) # to keep results consistent (tensorflow seed)
#seed = 3 # to keep results consistent (numpy seed)
(m, n_D, n_H, n_W, num_channels) = X_h5["test_data"].shape #TTT
num_labels = Y_h5["test_mask"].shape[4] #TTT
img_size = Y_h5["test_mask"].shape[1] #TTT
costs = [] # To keep track of the cost
accuracies = [] # To keep track of the accuracy
# Create Placeholders of the correct shape
X, Y = create_placeholders(n_H, n_W, n_D, minibatch_size)
# Forward propagation: Build the forward propagation in the tensorflow graph
nn_output = forward_propagation(X)
prediction = tf.nn.sigmoid(nn_output)
# Cost function: Add cost function to tensorflow graph
cost_method = 'sigmoid_cross_entropy'
cost = compute_cost(nn_output, Y, cost_method)
# Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
# Initialize all the variables globally
init = tf.global_variables_initializer()
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
print('------ Training ------')
# Run the initialization
tf.local_variables_initializer().run(session=sess)
sess.run(init)
# Do the training loop
for i in range(num_epochs*m):
# ----- TRAIN -------
current_epoch = i//m
patient_start = i-(current_epoch * m)
patient_end = patient_start + minibatch_size
current_X_train = np.zeros((minibatch_size, n_D, n_H, n_W,num_channels))
current_X_train[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:]) #TTT
current_X_train = np.nan_to_num(current_X_train) # make nan zero
current_Y_train = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
current_Y_train[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:]) #TTT
current_Y_train = np.nan_to_num(current_Y_train) # make nan zero
feed_dict = {X: current_X_train, Y: current_Y_train}
_ , temp_cost = sess.run([optimizer, cost], feed_dict=feed_dict)
# ----- TEST -------
# Print the cost every 1/5 epoch
if ((i % (num_epochs*m/5) )== 0):
# Calculate the predictions
test_predictions = np.zeros(Y_h5["test_mask"].shape)
for j in range(0, X_h5["test_data"].shape[0], minibatch_size):
patient_start = j
patient_end = patient_start + minibatch_size
current_X_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_channels))
current_X_test[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:])
current_X_test = np.nan_to_num(current_X_test) # make nan zero
current_Y_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
current_Y_test[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:])
current_Y_test = np.nan_to_num(current_Y_test) # make nan zero
feed_dict = {X: current_X_test, Y: current_Y_test}
_, current_prediction = sess.run([cost, prediction], feed_dict=feed_dict)
test_predictions[j:j + minibatch_size,:,:,:,:] = current_prediction
costs.append(temp_cost)
print ("[" + str(current_epoch) + "|" + str(num_epochs) + "] " + "Cost : " + str(costs[-1]))
display_progress(X_h5["test_data"], Y_h5["test_mask"], test_predictions, 5, n_H, n_W)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('epochs')
plt.show()
return
I call the model with:
model(hdf5_data_file, hdf5_mask_file, num_epochs = 500, minibatch_size = 1, learning_rate = 1e-3)
These are the results that I am currently getting:
Edit:
I have tried reducing the learning rate and it doesn't help. I also tried using tensorboard debug and the weights are not being updated:
I am not sure why this is happening.
I Created the same simple model in keras and it works fine. I am not sure what I am doing wrong in tensorflow.
Not sure if you are still looking for help, as I am answering this question half a year after your posted date. :) I've listed my observations and also some suggestions for you to try below. If my primary observation is right... then you probably just need a coffee break / a night of good sleep.
primary observation:
tf.reshape( output, [-1, output.get_shape().as_list()[0]] ) seems wrong. If you prefer to flatten the vector, it should be something like tf.reshape(output,[-1,np.prod(image_shape_list)]).
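For example, a flattening that keeps the batch dimension and collapses everything else could look roughly like the sketch below; it assumes the non-batch dimensions of your tensors are statically known:
import numpy as np  # if not already imported
shape_list = output.get_shape().as_list()   # e.g. [batch, D, H, W, 1]
flat_dim = int(np.prod(shape_list[1:]))     # product of all non-batch dimensions
output = tf.reshape(output, [-1, flat_dim])
target = tf.reshape(target, [-1, flat_dim])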
other observations:
With such a shallow network, I doubt the network has enough spatial resolution to differentiate tumor voxels from non-tumor voxels. Can you show the Keras implementation and its performance compared to the pure TF implementation? I would probably go with 2+ layers; let's say with 3 layers, each with a stride of 2, and an input image width of 256, you will end up with a width of 32 at your deepest encoder layer. (If you have limited GPU memory, downsample the input image.)
If changing the loss computation does not work, then, as @bremen_matt mentioned, reduce the LR to maybe 1e-5.
After the basic architecture tweaks, once you "feel" that the network is sort of learning and not stuck, try augmenting the training data, adding dropout and batch norm during training, and then maybe fancy up your loss by adding a discriminator.
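To make the "3 layers with stride 2" suggestion concrete, the encoder half could look roughly like this sketch; the filter counts and names are made up, and the matching decoder / upsampling path and skip connections are omitted:
def encoder(X):
    # three stride-2 convolutions: an input width of 256 ends up at 32 in the deepest layer
    c1 = tf.nn.relu(tf.layers.conv3d(X,  16, [3, 3, 3], strides=[2, 2, 2], padding='SAME', name='enc1'))
    c2 = tf.nn.relu(tf.layers.conv3d(c1, 32, [3, 3, 3], strides=[2, 2, 2], padding='SAME', name='enc2'))
    c3 = tf.nn.relu(tf.layers.conv3d(c2, 64, [3, 3, 3], strides=[2, 2, 2], padding='SAME', name='enc3'))
    return c3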
I'm struggling to understand TensorFlow, and I can't find good basic examples that don't rely on the MNIST dataset. I've tried to create a classification NN for some public datasets where they provide a number of (unknown) features and a label for each sample. There's one where they provide around 90 features from audio analysis, and the year of publication as the label. (https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd)
Needless to say, I didn't manage to train the network, and I could do little to understand the provided features.
I'm now trying to generate artificial data and train a network on it. The data are pairs of numbers (a position), and the label is 1 if that position is inside a circle of radius r around the arbitrary point (5,5).
numrows=10000
circlex=5
circley=5
circler=3
data = np.random.rand(numrows,2)*10
labels = [ math.sqrt( math.pow(x-circlex, 2) + math.pow(y-circley, 2) ) for x,y in data ]
labels = list( map(lambda x: x<circler, labels) )
I've tried many combinations of network shape, parameters, optimizers, learning rates, etc. (I admit the math is not strong on this one), but either there's no convergence, or the result is poor (70% accuracy on the last test).
Current version (labels converted to one-hot encoding: [1,0] and [0,1] for outside and inside):
# model creation
graph=tf.Graph()
with graph.as_default():
X = tf.placeholder(tf.float32, [None, 2] )
layer1 = tf.layers.dense(X, 2)
layer2 = tf.layers.dense(layer1, 2)
Y = tf.nn.softmax(layer2)
y_true = tf.placeholder(tf.float32, [None, 2] )
loss=tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits_v2(logits=Y, labels=y_true) )
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
def accuracy(predictions, labels):
return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
/ predictions.shape[0])
# training
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
for step in range(1000):
_, l, predictions = session.run([optimizer,loss,Y], feed_dict={X:data, y_true:labels})
if step % 100 == 0:
print("Loss at step %d: %f" % (step, l)
print("Accuracy %f" % accuracy(predictions, labels))
The accuracy in this example is around 70% (loss around 0.6).
The question is... what am I doing wrong?
UPDATE
I'm leaving the question as originally asked. Main lessons I learned:
Normalize your input data. The mean should be around 0, and the range ~ between -1 and 1.
Blue: normalized data, Red: raw input data as created above
Batch your input data. If the subsets used are random enough, it decreases the number of iterations needed without hurting accuracy too much.
Don't forget activation functions between layers :)
The input:
Plotting the synthetic data with two classes.
Output from the code above:
All outputs are classified as a single class, and because of the class imbalance the accuracy is a high 70%.
Issues with the code
Even though two layers are defined, there is no activation function between them, so tf.softmax( ((x*w1)+b1) * w2 + b2) collapses down to a single layer. There is just a single hyperplane trying to separate this input, and the hyperplane lies outside the input space; that's why you get all inputs classified as a single class.
Bug: Softmax is applied twice: on the logits as well as during entropy_loss.
The entire input is given as a single batch, instead of mini-batches.
Inputs need to be normalized.
Fixing the above issues, the output becomes:
The above output makes sense, as the model has two hidden units and so we have two hyperplanes trying to separate the data. The final layer then combines these two hyperplanes in such a way as to minimize error.
Increasing the number of hidden units from 2 to 3:
With 3 hidden units, we get 3 hyperplanes, and we can see the final layer adjusts these hyperplanes to separate the data well.
Code:
# Normalize data
data = (data - np.mean(data)) /np.sqrt(np.var(data))
n_hidden = 3
batch_size = 128
# Feed batch data
def get_batch(inputX, inputY, batch_size):
duration = len(inputX)
for i in range(0,duration//batch_size):
idx = i*batch_size
yield inputX[idx:idx+batch_size], inputY[idx:idx+batch_size]
# Create the graph
tf.reset_default_graph()
graph=tf.Graph()
with graph.as_default():
X = tf.placeholder(tf.float32, [None, 2] )
layer1 = tf.layers.dense(X, n_hidden, activation=tf.nn.sigmoid)
layer2 = tf.layers.dense(layer1, 2)
Y = tf.nn.softmax(layer2)
y_true = tf.placeholder(tf.int32, [None] )
loss = tf.losses.sparse_softmax_cross_entropy(logits=layer2, labels=y_true)
optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(Y, 1),tf.argmax(tf.one_hot(y_true,2), 1)), tf.float32))
# training
with tf.Session(graph=graph) as session:
session.run(tf.global_variables_initializer())
for epoch in range(10):
acc_avg = 0.
loss_avg = 0.
for step in range(10000//batch_size):
for inputX, inputY in get_batch(data, labels, batch_size):
_, l, acc = session.run([optimizer,loss,accuracy], feed_dict={X:inputX, y_true:inputY})
acc_avg += acc
loss_avg += l
print("Loss at step %d: %f" % (step, loss_avg*batch_size/10000))
print("Accuracy %f" % (acc_avg*batch_size/10000))
#Get prediction
pred = session.run(Y, feed_dict={X:data})
# Plotting function
import matplotlib.pylab as plt
plt.scatter(data[:,0], data[:,1], s=20, c=np.argmax(pred,1), cmap='jet', vmin=0, vmax=1)
plt.show()
Getting to know Tensorflow, I built a toy network for classification. It consists of 15 input nodes for features identical to the one-hot encoding of the corresponding class label (with indexing beginning at 1) - so the data to be loaded from an input CSV may look like this:
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2
...
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,15
The network has only one hidden layer and an output layer, the latter containing probabilities for a given class. Here's my problem: during training, the network assigns a growing probability to whatever was fed in as the very first input.
Here are the relevant lines of code (some lines are omitted):
# number_of_p : number of samples
# number_of_a : number of attributes (features) -> 15
# number_of_s : number of styles (labels) -> 15
# function for generating hidden layers
# nodes is a list of nodes in each layer (len(nodes) = number of hidden layers)
def hidden_generation(nodes):
hidden_nodes = [number_of_a] + nodes + [number_of_s]
number_of_layers = len(hidden_nodes) - 1
print(hidden_nodes)
hidden_layer = list()
for i in range (0,number_of_layers):
hidden_layer.append(tf.zeros([hidden_nodes[i],batch_size]))
hidden_weights = list()
for i in range (0,number_of_layers):
hidden_weights.append(tf.Variable(tf.random_normal([hidden_nodes[i+1], hidden_nodes[i]])))
hidden_biases = list()
for i in range (0,number_of_layers):
hidden_biases.append(tf.Variable(tf.zeros([hidden_nodes[i+1],batch_size])))
return hidden_layer, hidden_weights, hidden_biases
#loss function
def loss(labels, logits):
cross_entropy = tf.losses.softmax_cross_entropy(
onehot_labels = labels, logits = logits)
return tf.reduce_mean(cross_entropy, name = 'xentropy_mean')
hidden_layer, hidden_weights, hidden_biases = hidden_generation(hidden_layers)
with tf.Session() as training_sess:
training_sess.run(tf.global_variables_initializer())
training_sess.run(a_iterator.initializer, feed_dict = {a_placeholder_feed: training_set.data})
current_a = training_sess.run(next_a)
training_sess.run(s_iterator.initializer, feed_dict = {s_placeholder_feed: training_set.target})
current_s = training_sess.run(next_s)
s_one_hot = training_sess.run(tf.one_hot((current_s - 1), number_of_s))
for i in range (1,len(hidden_layers)+1):
hidden_layer[i] = tf.tanh(tf.matmul(hidden_weights[i-1], (hidden_layer[i-1])) + hidden_biases[i-1])
output = tf.nn.softmax(tf.transpose(tf.matmul(hidden_weights[-1],hidden_layer[-1]) + hidden_biases[-1]))
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.1)
# using the AdamOptimizer does not help, nor does choosing a much bigger and smaller learning rate
train = optimizer.minimize(loss(s_one_hot, output))
training_sess.run(train)
for i in range (0, (number_of_p)):
current_a = training_sess.run(next_a)
current_s = training_sess.run(next_s)
s_one_hot = training_sess.run(tf.transpose(tf.one_hot((current_s - 1), number_of_s)))
# (no idea why I have to declare those twice for the datastream to move)
training_sess.run(train)
I assume the loss function is being declared in the wrong place and always references the same vectors. However, replacing the loss function has not helped me so far.
I will gladly provide the rest of the code if anyone is kind enough to help me.
EDIT: I've already discovered and fixed one major (and dumb) mistake: weights go before node values in tf.matmul.
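Concretely, the fix just means the shapes line up as in this sketch (using the dimension names from hidden_generation above):
# hidden_weights[0] has shape [hidden_nodes[1], hidden_nodes[0]]
# hidden_layer[0]   has shape [hidden_nodes[0], batch_size]
# so the weights go on the left, and the result has shape [hidden_nodes[1], batch_size]
h1 = tf.tanh(tf.matmul(hidden_weights[0], hidden_layer[0]) + hidden_biases[0])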
You do not want to declare the training op over and over again. That is unnecessary and, as you pointed out, slower. You are not feeding your current_a into the neural net, so you are not going to get new outputs. Also, the way you are using iterators isn't correct, which could also be the cause of the problem.
with tf.Session() as training_sess:
training_sess.run(tf.global_variables_initializer())
training_sess.run(a_iterator.initializer, feed_dict = {a_placeholder_feed: training_set.data})
current_a = training_sess.run(next_a)
training_sess.run(s_iterator.initializer, feed_dict = {s_placeholder_feed: training_set.target})
current_s = training_sess.run(next_s)
s_one_hot = training_sess.run(tf.one_hot((current_s - 1), number_of_s))
for i in range (1,len(hidden_layers)+1):
hidden_layer[i] = tf.tanh(tf.matmul(hidden_weights[i-1], (hidden_layer[i-1])) + hidden_biases[i-1])
output = tf.nn.softmax(tf.transpose(tf.matmul(hidden_weights[-1],hidden_layer[-1]) + hidden_biases[-1]))
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.1)
# using the AdamOptimizer does not help, nor does choosing a much bigger and smaller learning rate
train = optimizer.minimize(loss(s_one_hot, output))
training_sess.run(train)
for i in range (0, (number_of_p)):
current_a = training_sess.run(next_a)
current_s = training_sess.run(next_s)
s_one_hot = training_sess.run(tf.transpose(tf.one_hot((current_s - 1), number_of_s)))
# (no idea why I have to declare those twice for the datastream to move)
training_sess.run(train)
Here is some pseudocode to help you get the correct data flow. I would do the one-hot encoding prior to this, just to make loading the data during training easier (a sketch of that follows the pseudocode).
train_dataset = tf.data.Dataset.from_tensor_slices((inputs, targets))
train_dataset = train_dataset.batch(batch_size)
train_dataset = train_dataset.repeat(num_epochs)
iterator = train_dataset.make_one_shot_iterator()
next_inputs, next_targets = iterator.get_next()
# Define Training procedure
global_step = tf.Variable(0, name="global_step", trainable=False)
loss = Neural_net_function(next_inputs, next_targets)
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
with tf.Session() as training_sess:
for i in range(number_of_training_samples * num_epochs):
training_sess.run(train_op)
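As for doing the one-hot encoding up front, a minimal sketch (assuming training_set.target holds integer labels that start at 1, as in your CSV) would be:
import numpy as np
targets_one_hot = np.eye(number_of_s)[np.asarray(training_set.target) - 1]   # shape [number_of_p, number_of_s]
train_dataset = tf.data.Dataset.from_tensor_slices((training_set.data, targets_one_hot))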
Solved it! Backpropagation works properly when the training procedure is redeclared for every new dataset.
for i in range (0, (number_of_p)):
current_a = training_sess.run(next_a)
current_s = training_sess.run(next_s)
s_one_hot = training_sess.run(tf.transpose(tf.one_hot((current_s - 1), number_of_s)))
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.1)
train = optimizer.minimize(loss(s_one_hot, output))
training_sess.run(train)
...makes training considerably slower, but it works.
I've written my first TensorFlow program (using my own data). It works, well, at least it doesn't crash! But I'm getting weird accuracy values, either 0 or 1.
.................................
The previous part of the code is only about handling the CSV file and getting the data into the correct format/shapes.
......................................................
# Tensoflow
x = tf.placeholder(tf.float32,[None,len(Training_Data[0])],name='Train_data')# each input has length 457
y_ = tf.placeholder(tf.float32,[None, numberOFClasses],name='Labels')#
#w = tf.Variable(tf.zeros([len(Training_Data[0]),numberOFClasses]),name='Weights')
w = tf.Variable(tf.truncated_normal([len(Training_Data[0]),numberOFClasses],stddev=1./10),name='Weights')
b = tf.Variable(tf.zeros([numberOFClasses]),name='Biases')
model = tf.add(tf.matmul(x,w),b)
y = tf.nn.softmax(model)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for j in range(len(train_data)):
if(np.shape(train_data) == (batchSize,numberOFClasses)):
sess.run(train_step,feed_dict={x:train_data[j],y_:np.reshape(train_labels[j],(batchSize,numberOFClasses)) })
correct_prediction = tf.equal(tf.arg_max(y,1),tf.arg_max(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,"float"))
accuracy_vector= []
current_class =[]
for i in range(len(Testing_Data)):
if( np.shape(Testing_Labels[i]) == (numberOFClasses,)):
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i],(1,457)),y_:np.reshape(Testing_Labels[i],(1,19))}))#,i)#,Test_Labels[i])
current_class.append(int(Test_Raw[i][-1]))
Plotting the accuracy_vector delivers the following:
[plot of the accuracy_vector]
Any idea what I'm missing here?
Thanks a lot for any hint!
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
tf.nn.softmax_cross_entropy_with_logits wants unscaled logits.
From the doc:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
this means that the line y = tf.nn.softmax(model) is wrong.
Instead, you want to pass unscaled logits to that function, thus:
y = model
Moreover, once you fix this problem, if the network still doesn't work, try lowering the learning rate from 0.01 to something around 1e-3 or 1e-4 (I mention this because 1e-2 is usually a "high" learning rate).
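Putting both fixes together, the relevant lines would look roughly like this (1e-3 is only a suggested starting point for the learning rate):
model = tf.add(tf.matmul(x, w), b)   # unscaled logits
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=model))
train_step = tf.train.GradientDescentOptimizer(1e-3).minimize(cross_entropy)
y = tf.nn.softmax(model)             # keep the softmax only for reading off probabilities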
You're testing on batches of size 1, so each prediction is either right or wrong, which means you can only get 0 or 1 accuracy:
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i],(1,457)),y_:np.reshape(Testing_Labels[i],(1,19))}))#,i)#,Test_Labels[i])
Just use a bigger batch size :
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i:i+batch_size],(batch_size,457)),y_:np.reshape(Testing_Labels[i:i+batch_size],(batch_size,19))}))
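Or, assuming the whole test set fits in memory and Testing_Data / Testing_Labels can be reshaped into dense arrays, you could evaluate everything in a single run instead of looping:
test_x = np.reshape(Testing_Data, (-1, 457))
test_y = np.reshape(Testing_Labels, (-1, 19))
print(sess.run(accuracy, feed_dict={x: test_x, y_: test_y}))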
I am trying to detect micro-events in a long time series. For this purpose, I will train a LSTM network.
Data. Input for each time sample is 11 different features somewhat normalized to fit 0-1. Output will be either one of two classes.
Batching. Due to the huge class imbalance I have extracted the data in batches of 60 time samples each, of which at least 5 will always be class 1 and the rest class 0. In this way the class imbalance is reduced from 150:1 to around 12:1. I have then randomized the order of all my batches.
Model. I am attempting to train an LSTM, with initial configuration of 3 different cells with 5 delay steps. I expect the micro events to arrive in sequences of at least 3 time steps.
Problem: When I try to train the network it will quickly converge towards saying that EVERYTHING belongs to the majority class. When I implement a weighted loss function, at some certain threshold it will change to saying that EVERYTHING belongs to the minority class. I suspect (without being an expert) that there is no learning in my LSTM cells, or that my configuration is off.
Below is the code for my implementation. I am hoping that someone can tell me
Is my implementation correct?
What other reasons could there be for such behaviour?
ar_model.py
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn
import ar_config
config = ar_config.get_config()
class ARModel(object):
def __init__(self, is_training=False, config=None):
# Config
if config is None:
config = ar_config.get_config()
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_classes], name='ModelOutput')
# Hidden layer
with tf.variable_scope('lstm') as scope:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_delays)
self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
outputs, state = rnn.rnn(cell, [self._features], dtype=tf.float32)
# Output layer
output = outputs[-1]
softmax_w = tf.get_variable('softmax_w', [config.num_hidden, config.num_classes], tf.float32)
softmax_b = tf.get_variable('softmax_b', [config.num_classes], tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b
# Evaluate
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
self._cost = cost = tf.reduce_mean(loss)
self._predict = tf.argmax(tf.nn.softmax(logits), 1)
self._correct = tf.equal(tf.argmax(logits, 1), tf.argmax(self._targets, 1))
self._accuracy = tf.reduce_mean(tf.cast(self._correct, tf.float32))
self._final_state = state
if not is_training:
return
# Optimize
optimizer = tf.train.AdamOptimizer()
self._train_op = optimizer.minimize(cost)
@property
def features(self):
return self._features
@property
def targets(self):
return self._targets
@property
def cost(self):
return self._cost
@property
def accuracy(self):
return self._accuracy
@property
def train_op(self):
return self._train_op
@property
def predict(self):
return self._predict
@property
def initial_state(self):
return self._initial_state
@property
def final_state(self):
return self._final_state
ar_train.py
import os
from datetime import datetime
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile
import ar_network
import ar_config
import ar_reader
config = ar_config.get_config()
def main(argv=None):
if gfile.Exists(config.train_dir):
gfile.DeleteRecursively(config.train_dir)
gfile.MakeDirs(config.train_dir)
train()
def train():
train_data = ar_reader.ArousalData(config.train_data, num_steps=config.max_steps)
test_data = ar_reader.ArousalData(config.test_data, num_steps=config.max_steps)
with tf.Graph().as_default(), tf.Session() as session, tf.device('/cpu:0'):
initializer = tf.random_uniform_initializer(minval=-0.1, maxval=0.1)
with tf.variable_scope('model', reuse=False, initializer=initializer):
m = ar_network.ARModel(is_training=True)
s = tf.train.Saver(tf.all_variables())
tf.initialize_all_variables().run()
for batch_input, batch_target in train_data:
step = train_data.iter_steps
dict = {
m.features: batch_input,
m.targets: batch_target
}
session.run(m.train_op, feed_dict=dict)
state, cost, accuracy = session.run([m.final_state, m.cost, m.accuracy], feed_dict=dict)
if not step % 10:
test_input, test_target = test_data.next()
test_accuracy = session.run(m.accuracy, feed_dict={
m.features: test_input,
m.targets: test_target
})
now = datetime.now().time()
print ('%s | Iter %4d | Loss= %.5f | Train= %.5f | Test= %.3f' % (now, step, cost, accuracy, test_accuracy))
if not step % 1000:
destination = os.path.join(config.train_dir, 'ar_model.ckpt')
s.save(session, destination)
if __name__ == '__main__':
tf.app.run()
ar_config.py
class Config(object):
# Directories
train_dir = '...'
ckpt_dir = '...'
train_data = '...'
test_data = '...'
# Data
num_features = 13
num_classes = 2
batch_size = 60
# Model
num_hidden = 3
num_delays = 5
# Training
max_steps = 100000
def get_config():
return Config()
UPDATED ARCHITECTURE:
# Placeholders
self._features = tf.placeholder(tf.float32, [None, config.num_features, config.num_delays], name='ModelInput')
self._targets = tf.placeholder(tf.float32, [None, config.num_output], name='ModelOutput')
# Weights
weights = {
'hidden': tf.get_variable('w_hidden', [config.num_features, config.num_hidden], tf.float32),
'out': tf.get_variable('w_out', [config.num_hidden, config.num_classes], tf.float32)
}
biases = {
'hidden': tf.get_variable('b_hidden', [config.num_hidden], tf.float32),
'out': tf.get_variable('b_out', [config.num_classes], tf.float32)
}
#Layer in
with tf.variable_scope('input_hidden') as scope:
inputs = self._features
inputs = tf.transpose(inputs, perm=[2, 0, 1]) # (BatchSize,NumFeatures,TimeSteps) -> (TimeSteps,BatchSize,NumFeatures)
inputs = tf.reshape(inputs, shape=[-1, config.num_features]) # (TimeSteps,BatchSize,NumFeatures -> (TimeSteps*BatchSize,NumFeatures)
inputs = tf.add(tf.matmul(inputs, weights['hidden']), biases['hidden'])
#Layer hidden
with tf.variable_scope('hidden_hidden') as scope:
inputs = tf.split(0, config.num_delays, inputs) # -> n_steps * (batchsize, features)
cell = tf.nn.rnn_cell.BasicLSTMCell(config.num_hidden, forget_bias=0.0)
self._initial_state = cell.zero_state(config.batch_size, dtype=tf.float32)
outputs, state = rnn.rnn(cell, inputs, dtype=tf.float32)
#Layer out
with tf.variable_scope('hidden_output') as scope:
output = outputs[-1]
logits = tf.add(tf.matmul(output, weights['out']), biases['out'])
Odd elements
Weighted loss
I am not sure your "weighted loss" does what you want it to do:
ratio = (60.00 / 5.00)
class_weights = tf.constant([ratio, 1 - ratio])
weighted_logits = tf.mul(logits, class_weights)
This weighting is applied before the loss function is calculated (also, I think you wanted an element-wise multiplication there, and your ratio is above 1, which makes the second weight negative), so it forces your predictions to behave in a certain way before the softmax is applied.
If you want weighted loss you should apply this after
loss = tf.nn.softmax_cross_entropy_with_logits(weighted_logits, self._targets)
with some element-wise multiplication of your weights.
loss = loss * weights
where your weights have a shape like [2,], i.e. one weight per class.
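Concretely, a sketch of that element-wise weighting, expanding the two class weights into one weight per example via the one-hot targets, could look like this (the ordering and the value 12.0 are only assumptions; match them to your own one-hot columns and imbalance):
class_weights = tf.constant([12.0, 1.0])                               # one positive weight per class
loss = tf.nn.softmax_cross_entropy_with_logits(logits, self._targets)  # shape [batch_size]
weight_per_example = tf.reduce_sum(class_weights * self._targets, 1)   # shape [batch_size]
self._cost = cost = tf.reduce_mean(loss * weight_per_example)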
However, I would not recommend using weighted losses. Perhaps try increasing the ratio even further than 1:6.
Architecture
As far as I can read, you are using 5 stacked LSTMs with 3 hidden units per layer?
Try removing the multi rnn and just use a single LSTM/GRU (maybe even just a vanilla RNN) and jack the hidden units up to ~100-1000.
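A minimal version of that, reusing the names from your model and the same old-style rnn call (128 units is just a starting guess), would be something like:
num_hidden = 128
cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, forget_bias=1.0)
outputs, state = rnn.rnn(cell, [self._features], dtype=tf.float32)
softmax_w = tf.get_variable('softmax_w', [num_hidden, config.num_classes], tf.float32)
softmax_b = tf.get_variable('softmax_b', [config.num_classes], tf.float32)
logits = tf.matmul(outputs[-1], softmax_w) + softmax_b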
Debugging
Often when you are facing problems with an odd behaving network, it can be a good idea to:
Print everything
Literally print the shapes and values of every tensor in your model, use sess to fetch it and then print it. Your input data, the first hidden representation, your predictions, your losses etc.
You can also use TensorFlow's tf.Print(), e.g. x_tensor = tf.Print(x_tensor, [tf.shape(x_tensor)])
Use tensorboard
Using TensorBoard summaries of your gradients, accuracy metrics, and histograms will reveal patterns that might explain certain behavior, such as what led to exploding weights, a forget bias going to infinity, or gradients not being tracked through a certain layer, etc.
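For example, a few summaries attached to the model above (a sketch in TF 1.x syntax; on pre-1.0 releases these were tf.scalar_summary / tf.histogram_summary / tf.train.SummaryWriter, and the log directory here is made up) already give you loss curves and weight histograms in TensorBoard:
tf.summary.scalar('cost', self._cost)
tf.summary.scalar('accuracy', self._accuracy)
tf.summary.histogram('softmax_w', softmax_w)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('/tmp/ar_logs', session.graph)
summary = session.run(merged, feed_dict=dict)
writer.add_summary(summary, step)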
Other questions
How large is your dataset?
How long are your sequences?
Are the 13 features categorical or continuous? You should not normalize categorical variables or represent them as integers; instead you should use one-hot encoding (sketched below).
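A quick sketch of what that means for a single integer-coded categorical feature (the column index, category count, and the features array are made up for illustration):
import numpy as np
cat_column = features[:, 3].astype(int)   # say feature 3 is categorical with values 0..4
one_hot = np.eye(5)[cat_column]           # shape [num_samples, 5]
features = np.concatenate([np.delete(features, 3, axis=1), one_hot], axis=1)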
Gunnar has already made lots of good suggestions. A few more small things worth paying attention to in general for this sort of architecture:
Try tweaking the Adam learning rate. You should determine the proper learning rate by cross-validation; as a rough start, you could just check whether a smaller learning rate saves your model from crashing on the training data.
You should definitely use more hidden units. It's cheap to try larger networks when you first start out on a dataset. Go as large as necessary to avoid the underfitting you've observed. Later you can regularize / pare down the network after you get it to learn something useful.
Concretely, how long are the sequences you are passing into the network? You say you have a 30k-long time sequence. I assume you are passing in subsections / samples of this sequence?