Tensorflow model not training when changing the input data - python

I am trying to use TensorFlow to train a neural network (LeNet) on traffic sign images. I want to check the effect of a preprocessing technique on the performance of the network. So, I preprocessed the images and stored the results (training images, validation images, test images, final test images) as a tuple in a dict.
I then tried to iterate over this dict and use the training and validation operations of TensorFlow as follows:
import tensorflow as tf
from sklearn.utils import shuffle
output_data = []
EPOCHS = 5
BATCH_SIZE = 128
rate = 0.0005
for key in finalInputdata.keys():
    for procTypes in range(0, len(finalInputdata[key])):
        if np.shape(finalInputdata[key][procTypes][0]) != ():
            X_train = finalInputdata[key][procTypes][0]
            X_valid = finalInputdata[key][procTypes][1]
            X_test = finalInputdata[key][procTypes][2]
            X_finaltest = finalInputdata[key][procTypes][3]

            x = tf.placeholder(tf.float32, (None, 32, 32, np.shape(X_train)[-1]))
            y = tf.placeholder(tf.int32, (None))
            one_hot_y = tf.one_hot(y, 43)

            # Tensor Operations
            logits = LeNet(x, np.shape(X_train)[-1])
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, one_hot_y)
            softmax_probability = tf.nn.softmax(logits)
            loss_operation = tf.reduce_mean(cross_entropy)
            optimizer = tf.train.AdamOptimizer(learning_rate=rate)
            training_operation = optimizer.minimize(loss_operation)
            correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
            accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

            # Pipeline for training and evaluation
            sess = tf.InteractiveSession()
            sess.run(tf.global_variables_initializer())
            num_examples = len(X_train)

            print("Training on %s images processed as %s" % (key, dict_fornames['proctypes'][procTypes]))
            print()

            for i in range(EPOCHS):
                X_train, y_train = shuffle(X_train, y_train)
                for offset in range(0, num_examples, BATCH_SIZE):
                    end = offset + BATCH_SIZE
                    batch_x, batch_y = X_train[offset:end], y_train[offset:end]
                    sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})

                training_accuracy = evaluate(X_train, y_train)
                validation_accuracy = evaluate(X_valid, y_valid)
                testing_accuracy = evaluate(X_test, y_test)
                final_accuracy = evaluate(X_finaltest, y_finalTest)

                print("EPOCH {} ...".format(i + 1))
                print("Training Accuracy = {:.3f}".format(training_accuracy))
                print("Validation Accuracy = {:.3f}".format(validation_accuracy))
                print()

            output_data.append({'EPOCHS': EPOCHS, 'LearningRate': rate, 'ImageType': 'RGB',
                                'PreprocType': dict_fornames['proctypes'][0],
                                'TrainingAccuracy': training_accuracy, 'ValidationAccuracy': validation_accuracy,
                                'TestingAccuracy': testing_accuracy})

            sess.close()
The evaluate function is as follows
def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
Once I execute the program, it works well for the first iteration over the dataset, but from the second iteration onward the network does not train, and this continues for all remaining iterations.
Training on RGB images processed as Original
EPOCH 1 ...
Training Accuracy = 0.525
Validation Accuracy = 0.474
EPOCH 2 ...
Training Accuracy = 0.763
Validation Accuracy = 0.682
EPOCH 3 ...
Training Accuracy = 0.844
Validation Accuracy = 0.723
EPOCH 4 ...
Training Accuracy = 0.888
Validation Accuracy = 0.779
EPOCH 5 ...
Training Accuracy = 0.913
Validation Accuracy = 0.795
Training on RGB images processed as Mean Subtracted Data
EPOCH 1 ...
Training Accuracy = 0.056
Validation Accuracy = 0.057
EPOCH 2 ...
Training Accuracy = 0.057
Validation Accuracy = 0.057
EPOCH 3 ...
Training Accuracy = 0.057
Validation Accuracy = 0.056
EPOCH 4 ...
Training Accuracy = 0.058
Validation Accuracy = 0.056
EPOCH 5 ...
Training Accuracy = 0.058
Validation Accuracy = 0.058
Training on RGB images processed as Normalized Data
EPOCH 1 ...
Training Accuracy = 0.058
Validation Accuracy = 0.054
EPOCH 2 ...
Training Accuracy = 0.058
Validation Accuracy = 0.054
EPOCH 3 ...
Training Accuracy = 0.058
Validation Accuracy = 0.054
EPOCH 4 ...
Training Accuracy = 0.058
Validation Accuracy = 0.054
EPOCH 5 ...
Training Accuracy = 0.058
Validation Accuracy = 0.054
However, if I restart the kernel and use any data type (any iteration), it works. I figured out that I must clear the graph or run multiple sessions for the different data types, but I am not yet clear on how to do that. I tried using tf.reset_default_graph(), but it seems to have no effect. Can somebody point me in the right direction?
Thanks

You might want to try data that is normalized to zero mean and unit variance before feeding it to the network, e.g. by scaling images to the -1..1 range; that said, the 0..1 range mostly sounds sane as well. Depending on the activations used in the network, the value range can make all the difference: ReLUs, for example, die out for inputs below zero; sigmoids start to saturate when values are below -4 or above +4; and tanh activations miss out on half of their value range if no value is ever below 0. If the value range is too big, gradients may explode as well, preventing training altogether. From this paper, the authors seem to subtract the (batch) image mean instead of the value range mean.
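For example, a minimal sketch of such a scaling step, assuming 8-bit images in the 0..255 range (the function name is just for illustration):

import numpy as np

def scale_to_minus_one_one(images):
    # Map uint8 pixel values from [0, 255] to [-1, 1].
    return (images.astype(np.float32) - 127.5) / 127.5

# e.g. X_train = scale_to_minus_one_one(X_train)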
You can try to use a smaller learning rate as well (although personally, I usually start experimenting around 0.0001 for Adam).
As for the multiple-sessions part of your question: the way it is currently implemented, your code is basically cluttering the default graph. By calling
for key in finalInputdata.keys():
    for procTypes in range(0, len(finalInputdata[key])):
        if np.shape(finalInputdata[key][procTypes][0]) != ():
            # ...
            x = tf.placeholder(tf.float32, (None, 32, 32, np.shape(X_train)[-1]))
            y = tf.placeholder(tf.int32, (None))
            one_hot_y = tf.one_hot(y, 43)

            # Tensor Operations
            logits = LeNet(x, np.shape(X_train)[-1])
            # ... etc ...
you are creating len(finalInputdata) * N different instances of LeNet, all within the default graph. That might be an issue when variables are internally reused in the network.
If you do want to reset your default graph in order to try different hyperparameters, try
for key in finalInputdata.keys():
    for procTypes in range(0, len(finalInputdata[key])):
        tf.reset_default_graph()
        # define the graph
        sess = tf.InteractiveSession()
        # train
but it is probably better to explicitly create Graphs and Sessions like so:
for key in finalInputdata.keys():
    for procTypes in range(0, len(finalInputdata[key])):
        with tf.Graph().as_default() as graph:
            # define the graph
            with tf.Session(graph=graph) as sess:
                # train
Instead of calling sess = tf.get_default_session() you would then directly use the sess reference.
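For instance, a possible adaptation of the evaluate function from the question (a sketch, not the original code) that takes the session and graph tensors explicitly instead of relying on the default session:

def evaluate(sess, accuracy_operation, x, y, X_data, y_data, batch_size=128):
    # Accumulate per-batch accuracy, weighted by batch size.
    num_examples = len(X_data)
    total_accuracy = 0
    for offset in range(0, num_examples, batch_size):
        batch_x = X_data[offset:offset + batch_size]
        batch_y = y_data[offset:offset + batch_size]
        accuracy = sess.run(accuracy_operation,
                            feed_dict={x: batch_x, y: batch_y})
        total_accuracy += accuracy * len(batch_x)
    return total_accuracy / num_examples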
I also found that Jupyter kernels and GPU-enabled TensorFlow don't play together that well when iterating on networks, sometimes running into out-of-memory errors or downright crashing the browser tab.

Related

How to do inference on a test dataset too large for RAM?

I'm training a network to classify audio. First I extract log-mel spectrograms from my audio data, save these in arrays, and train my network using them. At each epoch I run inference on my test data to get an accuracy estimate.
My training dataset is 24 GB and my test dataset is 6 GB. Both are too large for RAM. I found that I could extract the log-mel spectrograms from my training data before running the network, save each minibatch in a pickle file, and then load these one by one during training.
However, I use .eval() to get the accuracy for my whole test data at once. This worked when I used smaller datasets, as there was no need to split my data into chunks across different pickle files. However, I'm now trying to figure out how to run the .eval() line (or an equivalent) so that it provides accuracy for the whole test dataset, rather than the smaller chunks I've split it into. Is there a way I can get overall accuracy for my test data using pickle files or another method?
Here is the key component of code at the end where I think this can be done:
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
test_accuracy = tf.reduce_mean(tf.cast(correct, 'float')) #changes correct to type: float
test_accuracy1 = test_accuracy.eval({features_input:X_test, labels_input:y_test})
test_accuracy_scores.append(test_accuracy1)
print('Test accuracy:', test_accuracy1)
Here is my entire codeblock for the network:
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.

We input log-mel spectrograms (X_train) calculated above with associated labels
(y_train), and feed the batches into the model. Once the model is trained, it
is then executed on the test log-mel spectrograms (X_test) and the accuracy is output.
Alongside, a .csv file with the predictions for each 0.96s chunk and their true
class is also output for the test data. Column1 = the logit for the first class,
Column2 = the logit for the second class etc. The final column is the true class.
"""
num_min_batches = len(os.listdir(pickle_files_dir))/2
os.chdir(scripts_directory)

def main(X):
    with tf.Graph().as_default(), tf.Session() as sess:
        # Define VGGish.
        embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)

        # Define a shallow classification model and associated training ops on top
        # of VGGish.
        with tf.variable_scope('mymodel'):
            # Add a fully connected layer with 100 units. Add an activation function
            # to the embeddings since they are pre-activation.
            num_units = 100
            fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)

            # Add a classifier layer at the end, consisting of parallel logistic
            # classifiers, one per class. This allows for multi-class tasks.
            logits = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='logits')
            tf.sigmoid(logits, name='prediction')
            linear_out = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='linear_out')
            logits = tf.sigmoid(linear_out, name='logits')

        # Add training ops.
        with tf.variable_scope('train'):
            global_step = tf.train.create_global_step()

            # Labels are assumed to be fed as a batch of multi-hot vectors, with
            # a 1 in the position of each positive class label, and 0 elsewhere.
            labels_input = tf.placeholder(
                tf.float32, shape=(None, _NUM_CLASSES), name='labels')

            # Cross-entropy label loss.
            xent = tf.nn.sigmoid_cross_entropy_with_logits(
                logits=logits, labels=labels_input, name='xent')
            loss = tf.reduce_mean(xent, name='loss_op')
            tf.summary.scalar('loss', loss)

            # We use the same optimizer and hyperparameters as used to train VGGish.
            optimizer = tf.train.AdamOptimizer(
                learning_rate=vggish_params.LEARNING_RATE,
                epsilon=vggish_params.ADAM_EPSILON)
            train_op = optimizer.minimize(loss, global_step=global_step)

        # Initialize all variables in the model, and then load the pre-trained
        # VGGish checkpoint.
        sess.run(tf.global_variables_initializer())
        vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)

        # The training loop.
        features_input = sess.graph.get_tensor_by_name(
            vggish_params.INPUT_TENSOR_NAME)

        validation_accuracy_scores = []
        test_accuracy_scores = []
        for epoch in range(num_epochs):
            epoch_loss = 0
            i = 0
            while i < num_min_batches:
                # print('mini batch' + str(i))
                X_pickle_file = pickle_files_dir + 'X_train_mini_batch_' + str(i)
                with open(X_pickle_file, "rb") as fp:  # Unpickling
                    batch_x = pickle.load(fp)
                y_pickle_file = pickle_files_dir + 'y_train_mini_batch_' + str(i)
                with open(y_pickle_file, "rb") as fp:  # Unpickling
                    batch_y = pickle.load(fp)

                _, c = sess.run([train_op, loss], feed_dict={features_input: batch_x, labels_input: batch_y})
                epoch_loss += c
                i += 1

            # Print no. of epochs and loss.
            print('Epoch', epoch + 1, 'completed out of', num_epochs, ', loss:', epoch_loss)

            # Note: this adds a small computational cost.
            correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
            test_accuracy = tf.reduce_mean(tf.cast(correct, 'float'))  # casts correct to type float
            test_accuracy1 = test_accuracy.eval({features_input: X_test, labels_input: y_test})
            test_accuracy_scores.append(test_accuracy1)
            print('Test accuracy:', test_accuracy1)

if __name__ == '__main__':
    tf.app.run()
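One possible approach (a sketch only, following the same weighted-average pattern as the evaluate function in the first question above): pickle the test data into chunks the same way as the training data, evaluate each chunk separately inside the session, and average the per-chunk accuracies weighted by chunk size. The file names X_test_mini_batch_i / y_test_mini_batch_i and the count num_test_batches are assumptions, not from the original code:

# Run inside the same tf.Session block as the training loop.
total_correct = 0.0
total_examples = 0
for i in range(num_test_batches):  # num_test_batches: assumed number of test chunks
    with open(pickle_files_dir + 'X_test_mini_batch_' + str(i), 'rb') as fp:
        batch_x = pickle.load(fp)
    with open(pickle_files_dir + 'y_test_mini_batch_' + str(i), 'rb') as fp:
        batch_y = pickle.load(fp)
    batch_acc = test_accuracy.eval({features_input: batch_x, labels_input: batch_y})
    total_correct += batch_acc * len(batch_x)
    total_examples += len(batch_x)
print('Overall test accuracy:', total_correct / total_examples)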

How to properly graph train/val on same graph in pytorch

I have the following code that trains a model and stores logs in a results variable
import tqdm.notebook as tq
import sys

num_epochs = 10
results = {"train_loss": [], "val_loss": [], "train_acc": [], "val_acc": []}

for epoch in range(1, num_epochs+1):
    sys.stdout.write(f"---Epoch {epoch}/{num_epochs}: ")
    epoch_loss = {"train": [], "val": []}
    epoch_acc = {"train": [], "val": []}

    for phase in ['train', 'val']:
        if phase == "train":
            model.train(True)
        else:
            model.train(False)

        # most important thing I learned from this project was how to fix tqdm nastiness in colab
        for batch_idx, (x, y) in tq.tqdm(enumerate(dataloaders[phase]),
                                         total=len(dataloaders[phase]),
                                         leave=False):
            # put data to device and get output
            x, y = x.to(device), y.to(device)
            preds = model(x)

            # calc and log model loss
            batch_loss = criterion(preds, y)
            epoch_loss[phase].append(batch_loss.item())

            # calculate acc and extend to epoch_acc
            preds = torch.argmax(preds, dim=1)
            batch_acc = torch.sum(preds == y)/len(y)
            epoch_acc[phase].append(batch_acc)

            # zero the grad
            optimizer.zero_grad()

            # take a step if training mode is on
            if phase == "train":
                batch_loss.backward()
                optimizer.step()
                scheduler.step()

    # at the end of each epoch, calculate avg epoch train/val loss/accuracy
    train_loss = sum(epoch_loss["train"])/len(epoch_loss["train"])
    val_loss = sum(epoch_loss["val"])/len(epoch_loss["val"])
    train_acc = 100*sum(epoch_acc["train"])/len(epoch_acc["train"])
    val_acc = 100*sum(epoch_acc["val"])/len(epoch_acc["val"])

    # log losses and accs every epoch
    results['train_loss'].extend(epoch_loss['train'])
    results['train_acc'].extend(epoch_acc['train'])
    results['val_loss'].extend(epoch_loss['val'])
    results['val_acc'].extend(epoch_acc['val'])

    # and print it nicely
    sys.stdout.write("train_loss: {:.4f} train_acc: {:.2f}% ".format(train_loss, train_acc))
    sys.stdout.write("val_loss: {:.4f} val_acc: {:.2f}%\n".format(val_loss, val_acc))
I'm logging the avg accuracy and avg loss of every batch into separate training/validation loss/acc arrays. The problem is that I have more training batches so when I try to graph my training logs I get something like this:
Is there a workaround for this?
You are making a few conceptual errors:
You are calculating the validation loss/accuracy in multiple batches, as opposed to over the entire validation set
You are calculating the validation accuracy for a static model after it has already trained on all the data, as opposed to periodically assessing the validation accuracy as it is training
You should average your batch training performance over each epoch, and once per epoch calculate the complete loss/acc statistics across the entire validation set. Then you will have n_epochs values for both training and validation and can plot them on the same axes.
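A minimal sketch of that epoch-level logging and plotting, reusing the names from the question (train_loss, val_loss, etc. are the per-epoch averages already computed in the loop):

import matplotlib.pyplot as plt

# Inside the epoch loop: store one value per epoch instead of extending
# the results lists with per-batch values.
results['train_loss'].append(train_loss)
results['val_loss'].append(val_loss)
results['train_acc'].append(train_acc)
results['val_acc'].append(val_acc)

# After training: both curves now have num_epochs points and share the same x-axis.
epochs = range(1, num_epochs + 1)
plt.plot(epochs, results['train_loss'], label='train loss')
plt.plot(epochs, results['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()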

Get a prediction from Tensor Flow Model

I want to get predictions from my trained TensorFlow model. The following is the code I have for training my model.
def train_model(self, train, test, learning_rate=0.0001, num_epochs=16, minibatch_size=32, print_cost=True, graph_filename='costs'):
    # Ensure that model can be rerun without overwriting tf variables
    ops.reset_default_graph()

    # For reproducibility
    tf.set_random_seed(42)
    seed = 42

    # Get input and output shapes
    (n_x, m) = train.images.T.shape
    n_y = train.labels.T.shape[0]

    costs = []

    # Create placeholders of shape (n_x, n_y)
    X, Y = self.create_placeholders(n_x, n_y)
    # Initialize parameters
    parameters = self.initialize_parameters()
    # Forward propagation
    Z3 = self.forward_propagation(X, parameters)
    # Cost function
    cost = self.compute_cost(Z3, Y)
    # Backpropagation (using Adam optimizer)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    # Initialize variables
    init = tf.global_variables_initializer()

    # Start session to compute Tensorflow graph
    with tf.Session() as sess:
        # Run initialization
        sess.run(init)

        # Training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1

            for i in range(num_minibatches):
                # Get next batch of training data and labels
                minibatch_X, minibatch_Y = train.next_batch(minibatch_size)
                # Execute optimizer and cost function
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X.T, Y: minibatch_Y.T})
                # Update epoch cost
                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every epoch
            if print_cost == True:
                print("Cost after epoch {epoch_num}: {cost}".format(epoch_num=epoch, cost=epoch_cost))
                costs.append(epoch_cost)

        # Plot costs
        plt.figure(figsize=(16,5))
        plt.plot(np.squeeze(costs), color='#2A688B')
        plt.xlim(0, num_epochs-1)
        plt.ylabel("cost")
        plt.xlabel("iterations")
        plt.title("learning rate = {rate}".format(rate=learning_rate))
        plt.savefig(graph_filename, dpi=300)
        plt.show()

        # Save parameters
        parameters = sess.run(parameters)
        print("Parameters have been trained!")

        # Calculate correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        # Calculate accuracy on test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: train.images.T, Y: train.labels.T}))
        print("Test Accuracy:", accuracy.eval({X: test.images.T, Y: test.labels.T}))

        return parameters
After training the model, I want to extract the predictions from it.
So I add
print(sess.run(accuracy, feed_dict={X: test.images.T}))
But I see the following error after running the above code:
InvalidArgumentError: You must feed a value for placeholder tensor 'Y'
with dtype float and shape [10,?]
[[{{node Y}} = Placeholder[dtype=DT_FLOAT, shape=[10,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
Any help is welcome..
The tensor accuracy is a function of the tensor correct_prediction, which in turn is a function of (among others) Y.
So you're correctly being told that you should feed values for that placeholder too.
I'm assuming Y holds your labels, so it should also make intuitive sense that your feed_dict would contain the correct Y values.
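For example (a sketch using the tensors from the training code above, run inside the same session): accuracy needs both placeholders fed, while the raw predictions only depend on X:

# Accuracy compares predictions against labels, so feed both placeholders:
print(sess.run(accuracy, feed_dict={X: test.images.T, Y: test.labels.T}))

# The predicted classes themselves only depend on X:
predicted_classes = sess.run(tf.argmax(Z3), feed_dict={X: test.images.T})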
Hope that helps.
Good luck!

How to test my trained Tensor Flow model

I currently have a regression model that tries to predict a value based on 25 other ones.
Here is the code I currently have:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
rng = np.random
learning_rate = 0.11
training_epochs = 1000
display_step = 50
X = np.random.randint(5,size=(100,25)).astype('float32')
y_data = np.random.randint(5,size=(100,1)).astype('float32')
m = 100
epochs = 100
W = tf.Variable(tf.zeros([25,1]))
b = tf.Variable(tf.zeros([1]))
y = tf.add( tf.matmul(X,W), b)
loss = tf.reduce_sum(tf.square(y - y_data)) / (2 * m)
loss = tf.Print(loss, [loss], "loss: ")
optimizer = tf.train.GradientDescentOptimizer(.01)
train = optimizer.minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(epochs):
    sess.run(train)

sess.close()
I understand that right now these variables are all random, so the accuracy would not be very good anyway, but I just want to know how to make a test set and find the accuracy of the predictions.
Typically, you split your training set into two pieces: roughly 2/3 for training and 1/3 for testing (opinions vary on the proportions). Train your model with the first set. Check the training accuracy (run the training set back through the model to see how many it gets right).
Now, run the remainder (test set) through the model and check how well the predictions match the results. "Find the accuracy" depends on what sort of predictions you're making: classification vs scoring, binary vs disjoint vs contiguous, etc.
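As a rough sketch for the regression code in the question (assuming the session is still open when evaluating; for a regression model, "accuracy" is usually reported as an error metric such as mean squared error rather than a hit rate):

import numpy as np

# Hold out roughly one third of the rows for testing.
indices = np.random.permutation(len(X))
split = (2 * len(X)) // 3
X_train, y_train = X[indices[:split]], y_data[indices[:split]]
X_test, y_test = X[indices[split:]], y_data[indices[split:]]

# Build and train the graph on X_train/y_train instead of the full X/y_data,
# then fetch the learned parameters and score the held-out set in numpy.
W_val, b_val = sess.run([W, b])
test_preds = X_test.dot(W_val) + b_val
print("Test MSE:", np.mean((test_preds - y_test) ** 2))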

Lasagne mlp target out of bound

Hi, I am trying to modify the MNIST example to match my dataset. I am only trying to use the MLP example, and it gives a strange error.
The dataset is a matrix with 2100 rows and 17 columns, and the output should be one of the 16 possible classes. The error seems to happen in the second phase of the training. The model is built correctly (confirmed by the log info).
Here is the error log:
ValueError: y_i value out of bounds
Apply node that caused the error:
CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, targets)
Toposort index: 33
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(100, 17), (17,), (100,)]
Inputs strides: [(136, 8), (8,), (4,)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=float64}(CrossentropySoftmaxArgmax1HotWithBias.0)], [CrossentropySoftmax1HotWithBiasDx(Assert{msg='sm and dy do not have the same shape.'}.0, CrossentropySoftmaxArgmax1HotWithBias.1, targets)], []]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Here is the code:
def build_mlp(input_var=None):
    l_in = lasagne.layers.InputLayer(shape=(None, 16),
                                     input_var=input_var)

    # Apply 20% dropout to the input data:
    l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)

    # Add a fully-connected layer of 800 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in_drop, num_units=10,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # We'll now add dropout of 50%:
    l_hid1_drop = lasagne.layers.DropoutLayer(l_hid1, p=0.5)

    # Another 800-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1_drop, num_units=10,
            nonlinearity=lasagne.nonlinearities.rectify)

    # 50% dropout again:
    l_hid2_drop = lasagne.layers.DropoutLayer(l_hid2, p=0.5)

    # Finally, we'll add the fully-connected output layer, of 10 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2_drop, num_units=17,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out
def main(model='mlp', num_epochs=300):
    # Load the dataset
    print("Loading data...")
    X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()

    # Prepare Theano variables for inputs and targets
    input_var = T.matrix('inputs')
    target_var = T.ivector('targets')

    # Create neural network model (depending on first command line parameter)
    print("Building model and compiling functions...")
    if model == 'cnn':
        network = build_cnn(input_var)
    elif model == 'mlp':
        network = build_mlp(input_var)
    elif model == 'lstm':
        network = build_lstm(input_var)
    else:
        print("Unrecognized model type %r." % model)

    # Create a loss expression for training, i.e., a scalar objective we want
    # to minimize (for our multi-class problem, it is the cross-entropy loss):
    prediction = lasagne.layers.get_output(network)
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
    loss = loss.mean()
    # We could add some weight decay as well here, see lasagne.regularization.

    # Create update expressions for training, i.e., how to modify the
    # parameters at each training step. Here, we'll use Stochastic Gradient
    # Descent (SGD) with Nesterov momentum, but Lasagne offers plenty more.
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(
            loss, params, learning_rate=0.01, momentum=0.9)

    # Create a loss expression for validation/testing. The crucial difference
    # here is that we do a deterministic forward pass through the network,
    # disabling dropout layers.
    test_prediction = lasagne.layers.get_output(network, deterministic=True)
    test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                            target_var)
    test_loss = test_loss.mean()
    # As a bonus, also create an expression for the classification accuracy:
    test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                      dtype=theano.config.floatX)

    # Compile a function performing a training step on a mini-batch (by giving
    # the updates dictionary) and returning the corresponding training loss:
    train_fn = theano.function([input_var, target_var], loss, updates=updates)

    # Compile a second function computing the validation loss and accuracy:
    val_fn = theano.function([input_var, target_var], [test_loss, test_acc])

    # Finally, launch the training loop.
    print("Starting training...")
    # We iterate over epochs:
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        train_err = 0
        train_batches = 0
        start_time = time.time()
        for batch in iterate_minibatches(X_train, y_train, 100, shuffle=True):
            inputs, targets = batch
            train_err += train_fn(inputs, targets)
            train_batches += 1

        # And a full pass over the validation data:
        val_err = 0
        val_acc = 0
        val_batches = 0
        for batch in iterate_minibatches(X_val, y_val, 100, shuffle=False):
            inputs, targets = batch
            err, acc = val_fn(inputs, targets)
            val_err += err
            val_acc += acc
            val_batches += 1

        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(
            epoch + 1, num_epochs, time.time() - start_time))
        print("  training loss:\t\t{:.6f}".format(train_err / train_batches))
        print("  validation loss:\t\t{:.6f}".format(val_err / val_batches))
        print("  validation accuracy:\t\t{:.2f} %".format(
            val_acc / val_batches * 100))

    # After training, we compute and print the test error:
    test_err = 0
    test_acc = 0
    test_batches = 0
    for batch in iterate_minibatches(X_test, y_test, 100, shuffle=False):
        inputs, targets = batch
        err, acc = val_fn(inputs, targets)
        test_err += err
        test_acc += acc
        test_batches += 1

    print("Final results:")
    print("  test loss:\t\t\t{:.6f}".format(test_err / test_batches))
    print("  test accuracy:\t\t{:.2f} %".format(
        test_acc / test_batches * 100))
I figured out the problem:
my dataset does not have an example for every target class, because it is too small! There are 17 target outputs, but my dataset has only 16 different outputs and is missing examples of the 17th output.
In order to resolve this problem, just replace the softmax with rectify,
from this:
l_out = lasagne.layers.DenseLayer(
        l_hid2_drop, num_units=17,
        nonlinearity=lasagne.nonlinearities.softmax)

to this:

l_out = lasagne.layers.DenseLayer(
        l_hid2_drop, num_units=17,
        nonlinearity=lasagne.nonlinearities.rectify)
