TensorFlow: Training doesn't improve accuracy - python

I have just begun to learn TensorFlow and wrote a model to practice on MNIST. I am following a book, but there is still a problem; could you please help me with it?
Below is my code, with the problem described in it. Thank you very much!
x = tf.placeholder(tf.float32,[None,INPUT_NODE],name='input')
y_ = tf.placeholder(tf.float32,[None,OUTPUT_NODE],name='output')
weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE,LAYER1_NODE],stddev=0.1))
biases1 = tf.Variable(tf.constant(0.1,shape=[LAYER1_NODE]))
weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE,OUTPUT_NODE],stddev=0.1))
biases2 = tf.Variable(tf.constant(0.1,shape=[OUTPUT_NODE]))
The next line defines y, i.e. forward propagation without using the moving-average model:
y = inference(x,None,weights1,biases1,weights2,biases2)
global_step = tf.Variable(0,trainable=False)
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
The next line defines average_y, i.e. forward propagation using the moving-average model:
average_y = inference(x,variable_averages,weights1,biases1,weights2,biases2)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.arg_max(y_,1))
cross_entropy_mean = tf.reduce_mean(cross_entropy)
regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
regularization = regularizer(variable_averages.average(weights1)) +\
regularizer(variable_averages.average(weights2))
loss = cross_entropy_mean + regularization
learning_rate = tf.train.exponential_decay(
LEARNING_RATE_BASE,
global_step,
mnist.train.num_examples / BATCH_SIZE,
LEARNING_RATE_DECAY
)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=global_step)
train_op = tf.group(train_step,variables_averages_op)
The problem is that when I use average_y to calculate the accuracy, it seems like training doesn't improve it at all:
After 0 training steps, acc in validation is 0.0742
After 1000 training steps, acc in validation is 0.0924
After 2000 training steps, acc in validation is 0.0924
When I use y instead of average_y, everything is fine. This really confuses me:
After 0 training steps, acc in validation is 0.0686
After 1000 training steps, acc in validation is 0.9716
After 2000 training steps, acc in validation is 0.9768
#correct_prediction = tf.equal(tf.arg_max(y,1),tf.arg_max(y_,1))
correct_prediction = tf.equal(tf.arg_max(average_y,1),tf.arg_max(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    validate_feed = {
        x: mnist.validation.images,
        y_: mnist.validation.labels
    }
    test_feed = {
        x: mnist.test.images,
        y_: mnist.test.labels
    }
    for i in range(TRAINING_STEPS):
        if i % 1000 == 0:
            validate_acc = sess.run(accuracy, feed_dict=validate_feed)
            print("After %d training steps, acc in validation is %g" % (i, validate_acc))
        xs, ys = mnist.train.next_batch(BATCH_SIZE)
        sess.run([train_op, global_step], feed_dict={x: xs, y_: ys})
    test_acc = sess.run(accuracy, feed_dict=test_feed)
    print("After %d training steps, acc in test is %g" % (TRAINING_STEPS, test_acc))

From your code snippet, you are training the classification loss with respect to the y logits instead of average_y, so the inference graph with the exponential moving average is never actually trained:
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.arg_max(y_,1))
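For illustration only, a minimal sketch of the change this answer suggests, reusing the names from the question (only the loss-related lines are shown, untested):
# Sketch: build the training loss from the moving-average branch instead of y.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=average_y, labels=tf.arg_max(y_, 1))
cross_entropy_mean = tf.reduce_mean(cross_entropy)
loss = cross_entropy_mean + regularization
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)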

Related

Validation loss is lower than the training loss and does not decrease in PyTorch

I was trying to train an image-to-image translation model using TransUNet. I split my data 70% / 15% / 15% for training, validation and testing. But when I monitor the loss curve, I find that the validation loss is much lower than the training loss.
Loss curve: (figure not shown)
The code is here:
criterion = nn.L1Loss()
net = net.cuda()
net = torch.nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))
optimizer = torch.optim.Adam(net.parameters(), lr=lr, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=3)
for epoch in range(1, total_epoch + 1):
    print('---------- Epoch:' + str(epoch) + ' ----------')
    # data_loader_iter = iter(data_loader)
    data_loader_iter = data_loader
    train_epoch_loss = 0.
    print('Train:')
    for img, mask in tqdm(data_loader_iter, ncols=20, total=len(data_loader_iter)):
        net.train()
        img, mask = img.to(device), mask.to(device)
        optimizer.zero_grad()
        pred = net(img)
        train_loss = criterion(pred, mask)
        train_epoch_loss += train_loss
        train_loss.backward()
        optimizer.step()
    train_epoch_loss /= len(data_loader_iter)
    val_data_loader_num = val_data_loader
    test_data_loader_num = test_data_loader
    val_epoch_loss = 0
    test_epoch_loss = 0
    # Validation
    print('Validation:')
    with torch.no_grad():
        for val_img, val_mask in tqdm(val_data_loader_num, ncols=20, total=len(val_data_loader_num)):
            val_img, val_mask = val_img.to(device), val_mask.to(device)
            net.eval()
            predict = net(val_img)
            val_loss = criterion(predict, val_mask)
            val_epoch_loss += val_loss
        val_epoch_loss = val_epoch_loss / len(val_data_loader_num)
Another problem is that when I test the model, one class cannot be predicted properly; that class is always clustered at the edge of the image (the green class in the prediction figure, not shown here),
while the ground truth looks like this: (ground-truth figure also not shown)
I know there seem to be too many problems, but has anyone faced similar ones? Thanks in advance!

Has anyone implemented an Optuna hyperparameter optimization for a PyTorch LSTM?

I am trying to implement an Optuna hyperparameter optimization for a PyTorch LSTM, but I do not know how to define my model correctly.
When I just use nn.Linear everything works fine, but when I use nn.LSTMCell I get the following error:
AttributeError: 'tuple' object has no attribute 'dim'
The error is raised because the LSTM returns a tuple, not a tensor. But I do not know how to fix it and cannot find an example of a PyTorch LSTM with Optuna optimization online.
Here is the model definition:
def build_model_custom(trail):
    # Suggest the number of layers of neural network model
    n_layers = trail.suggest_int("n_layers", 1, 3)
    layers = []
    in_features = 20
    for i in range(n_layers):
        # Suggest the number of units in each layer
        out_features = trail.suggest_int("n_units_l{}".format(i), 4, 18)
        layers.append(nn.LSTMCell(in_features, out_features))
        in_features = out_features
    layers.append(nn.Linear(in_features, 2))
    return nn.Sequential(*layers)
I have implemented an example of Optuna optimizing an LSTM before; I hope it will help you:
def get_best_parameters(args, Dtr, Val):
    def objective(trial):
        model = TransformerModel(args).to(args.device)
        loss_function = nn.MSELoss().to(args.device)
        optimizer = trial.suggest_categorical('optimizer',
                                              [torch.optim.SGD,
                                               torch.optim.RMSprop,
                                               torch.optim.Adam])(
            model.parameters(), lr=trial.suggest_loguniform('lr', 5e-4, 1e-2))
        print('training...')
        epochs = 10
        val_loss = 0
        for epoch in range(epochs):
            train_loss = []
            for batch_idx, (seq, target) in enumerate(Dtr, 0):
                seq, target = seq.to(args.device), target.to(args.device)
                optimizer.zero_grad()
                y_pred = model(seq)
                loss = loss_function(y_pred, target)
                train_loss.append(loss.item())
                loss.backward()
                optimizer.step()
            # validation
            val_loss = get_val_loss(args, model, Val)
            print('epoch {:03d} train_loss {:.8f} val_loss {:.8f}'.format(epoch, np.mean(train_loss), val_loss))
            model.train()
        return val_loss

    sampler = optuna.samplers.TPESampler()
    study = optuna.create_study(sampler=sampler, direction='minimize')
    study.optimize(func=objective, n_trials=5)
    pruned_trials = study.get_trials(deepcopy=False,
                                     states=tuple([TrialState.PRUNED]))
    complete_trials = study.get_trials(deepcopy=False,
                                       states=tuple([TrialState.COMPLETE]))
    best_trial = study.best_trial
    print('val_loss = ', best_trial.value)
    for key, value in best_trial.params.items():
        print("{}: {}".format(key, value))
I implemented a solution myself. I am not sure if it's the most Pythonic, but it works.
Suggestions for improvement are welcome.
def train_and_evaluate(param, model, trail):
    # Load Data
    train_dataloader = torch.utils.data.DataLoader(Train_Dataset, batch_size=batch_size)
    Test_dataloader = torch.utils.data.DataLoader(Test_Dataset, batch_size=batch_size)
    criterion = nn.MSELoss()
    optimizer = getattr(optim, param['optimizer'])(model.parameters(), lr=param['learning_rate'])
    acc = nn.L1Loss()
    # Training Loop
    for epoch_num in range(EPOCHS):
        # Training
        total_loss_train = 0
        for train_input, train_target in train_dataloader:
            output = model.forward(train_input.float())
            batch_loss = criterion(output, train_target.float())
            total_loss_train += batch_loss.item()
            model.zero_grad()
            batch_loss.backward()
            optimizer.step()
        # Evaluation
        total_loss_val = 0
        total_mae = 0
        with torch.no_grad():
            for test_input, test_target in Test_dataloader:
                output = model(test_input.float())
                batch_loss = criterion(output, test_target)
                total_loss_val += batch_loss.item()
                batch_mae = acc(output, test_target)
                total_mae += batch_mae.item()
        accuracy = total_mae / len(Test_Dataset)
        # Add prune mechanism
        trail.report(accuracy, epoch_num)
        if trail.should_prune():
            raise optuna.exceptions.TrialPruned()
    return accuracy
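As for the original AttributeError: nn.LSTMCell returns an (h, c) tuple, so it cannot be dropped straight into nn.Sequential. A minimal sketch of one way around that, using a small hypothetical wrapper module (not part of either answer above):
import torch.nn as nn

class LSTMCellOutput(nn.Module):
    # Hypothetical wrapper: run the cell and return only the hidden state h,
    # so the next layer in nn.Sequential receives a plain tensor.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def forward(self, x):
        h, c = self.cell(x)  # nn.LSTMCell returns the tuple (h, c)
        return h

# In build_model_custom, the LSTM layer could then be appended as:
# layers.append(LSTMCellOutput(in_features, out_features))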

TensorFlow's loss function returns NaN after changing RNN to LSTM cell

I am training a model to predict time series using an RNN. The model trains without any issue. Here's the original code:
tf.reset_default_graph()
num_inputs = 1
num_neurons = 100
num_outputs = 1
learning_rate = 0.0001
num_train_iterations = 2000
batch_size = 1
X = tf.placeholder(tf.float32, [None, time_steps-1, num_inputs])
y = tf.placeholder(tf.float32, [None, time_steps-1, num_outputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
tf.contrib.rnn.BasicRNNCell(num_units=num_neurons, activation=tf.nn.relu),
output_size=num_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.75)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    for iteration in range(num_train_iterations):
        elx, ely = next_batch(training_data, time_steps)
        sess.run(train, feed_dict={X: elx, y: ely})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: elx, y: ely})
            print(iteration, "\tMSE:", mse)
The problem comes when I change tf.contrib.rnn.BasicRNNCell to tf.contrib.rnn.BasicLSTMCell: there is a huge slowdown, and the loss (the MSE) becomes NaN. My best bet is that MSE is the wrong loss function and that I should try cross entropy. I searched for similar code and found that tf.nn.softmax_cross_entropy_with_logits() could be the solution, but I still don't understand how to apply it to my problem.
Usually the NaN occurs when your gradients blow up.
Here is some code for the softmax cross-entropy approach. Have a try.
# Output layer
logit = tf.add(tf.matmul(H1, w2), b2)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=Y)
# Cost
cost = tf.reduce_mean(cross_entropy)
# Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Prediction
y_pred = tf.nn.softmax(logit)
pred = tf.argmax(y_pred, axis=1)
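Not shown in the answer above, but since the NaN is attributed to exploding gradients, here is a minimal sketch of gradient clipping with the TF 1.x optimizer API (the clip_norm of 5.0 is an arbitrary illustrative value):
# Sketch: clip gradients by global norm before applying them,
# instead of calling optimizer.minimize(loss) directly.
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train = optimizer.apply_gradients(list(zip(clipped_grads, variables)))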

Multilevel neural network

I am attempting to complete the following TensorFlow tutorial (specifically, problem 4): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/3_regularization.ipynb
However, I think I might be setting up the array of weights below incorrectly. As soon as I change hidden_layer to [image_size * image_size,1024,num_labels] (i.e. just one hidden layer), this works fine. Currently I am getting NaNs for the loss.
One possible cause is the block
for i in range(1, len(weights) - 1):
    relus = tf.nn.dropout(tf.nn.relu(tf.matmul(relus, weights[i]) + biases[i]), p_hide)
since I am overwriting the previous value of relus, and neural nets need those values to do backpropagation. In fact, when there is one hidden layer this block does not get executed.
batch_size = 128
hidden_layer = [image_size * image_size,1024,300,num_labels]
l2_regulariser = 0.005
p_hide = 0.5
graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    # Variables.
    weights = [None] * (len(hidden_layer) - 1)
    biases = [None] * (len(hidden_layer) - 1)
    for i in range(len(weights)):
        weights[i] = tf.Variable(tf.truncated_normal([hidden_layer[i], hidden_layer[i+1]]))
        biases[i] = tf.Variable(tf.zeros([hidden_layer[i+1]]))
    # Training computation.
    relus = tf.nn.dropout(tf.nn.relu(tf.matmul(tf_train_dataset, weights[0]) + biases[0]), p_hide)
    for i in range(1, len(weights) - 1):
        relus = tf.nn.dropout(tf.nn.relu(tf.matmul(relus, weights[i]) + biases[i]), p_hide)
    logits = tf.matmul(relus, weights[len(weights) - 1]) + biases[len(weights) - 1]
    loss = 0
    for weight in weights:
        loss += tf.nn.l2_loss(weight)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) + l2_regulariser * loss
    # Optimizer.
    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, decay_steps=20, decay_rate=0.9)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    relus = tf.nn.relu(tf.matmul(tf_valid_dataset, weights[0]) + biases[0])
    for i in range(1, len(weights) - 1):
        relus = tf.nn.relu(tf.matmul(relus, weights[i]) + biases[i])
    valid_prediction = tf.nn.softmax(tf.matmul(relus, weights[len(weights) - 1]) + biases[len(weights) - 1])
    relus = tf.nn.relu(tf.matmul(tf_test_dataset, weights[0]) + biases[0])
    for i in range(1, len(weights) - 1):
        relus = tf.nn.relu(tf.matmul(relus, weights[i]) + biases[i])
    test_prediction = tf.nn.softmax(tf.matmul(relus, weights[len(weights) - 1]) + biases[len(weights) - 1])

######################
# The NN training part
######################
num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels, global_step: int(step)}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(
                valid_prediction.eval(), valid_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
You should initialize your weights better:
tf.truncated_normal([hidden_layer[i], hidden_layer[i+1]], stddev=0.1)
And most of all, you should lower your learning rate to something around 0.01 or 0.001.
I think you get a loss of NaN because the learning rate is too high and it breaks the network (you get exploding weights).
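A minimal sketch of those two changes applied to the question's setup (the 0.1 and 0.01 values are illustrative, not tuned):
# Smaller-variance initialization for each weight matrix...
for i in range(len(weights)):
    weights[i] = tf.Variable(
        tf.truncated_normal([hidden_layer[i], hidden_layer[i + 1]], stddev=0.1))
    biases[i] = tf.Variable(tf.zeros([hidden_layer[i + 1]]))
# ...and a much smaller starting learning rate.
learning_rate = tf.train.exponential_decay(0.01, global_step, decay_steps=20, decay_rate=0.9)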

Adding multiple layers to TensorFlow causes loss function to become NaN

I'm writing a neural-network classifier in TensorFlow/Python for the notMNIST dataset. I've implemented l2 regularization and dropout on the hidden layers. It works fine as long as there is only one hidden layer, but when I added more layers (to improve accuracy), the loss function increases rapidly at each step, becoming NaN by step 5. I tried temporarily disabling Dropout and L2 regularization, but I get the same behavior as long as there are 2+ layers. I even rewrote my code from scratch (doing some refactoring to make it more flexible), but with the same results. The number and size of layers is controlled by hidden_layer_spec. What am I missing?
#works for np.array([1024]) with about 96.1% accuracy
hidden_layer_spec = np.array([1024, 300])
num_hidden_layers = hidden_layer_spec.shape[0]
batch_size = 256
beta = 0.0005
epochs = 100
stepsPerEpoch = float(train_dataset.shape[0]) / batch_size
num_steps = int(math.ceil(float(epochs) * stepsPerEpoch))
l2Graph = tf.Graph()
with l2Graph.as_default():
    #with tf.device('/cpu:0'):
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    weights = []
    biases = []
    for hi in range(0, num_hidden_layers + 1):
        width = image_size * image_size if hi == 0 else hidden_layer_spec[hi - 1]
        height = num_labels if hi == num_hidden_layers else hidden_layer_spec[hi]
        weights.append(tf.Variable(tf.truncated_normal([width, height]), name = "w" + `hi + 1`))
        biases.append(tf.Variable(tf.zeros([height]), name = "b" + `hi + 1`))
        print(`width` + 'x' + `height`)
    def logits(input, addDropoutLayer = False):
        previous_layer = input
        for hi in range(0, hidden_layer_spec.shape[0]):
            previous_layer = tf.nn.relu(tf.matmul(previous_layer, weights[hi]) + biases[hi])
            if addDropoutLayer:
                previous_layer = tf.nn.dropout(previous_layer, 0.5)
        return tf.matmul(previous_layer, weights[num_hidden_layers]) + biases[num_hidden_layers]
    # Training computation.
    train_logits = logits(tf_train_dataset, True)
    l2 = tf.nn.l2_loss(weights[0])
    for hi in range(1, len(weights)):
        l2 = l2 + tf.nn.l2_loss(weights[0])
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(train_logits, tf_train_labels)) + beta * l2
    # Optimizer.
    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, int(stepsPerEpoch) * 2, 0.96, staircase = True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(train_logits)
    valid_prediction = tf.nn.softmax(logits(tf_valid_dataset))
    test_prediction = tf.nn.softmax(logits(tf_test_dataset))
    saver = tf.train.Saver()

with tf.Session(graph=l2Graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Learning rate: %f" % learning_rate.eval())
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(
                valid_prediction.eval(), valid_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
    save_path = saver.save(session, "l2_degrade.ckpt")
    print("Model saved to " + save_path)
Turns out this was not so much a coding issue as a deep learning issue. The extra layer made the gradients too unstable, and that led to the loss function quickly devolving to NaN. The best way to fix this is to use Xavier initialization. Otherwise, the variance of the initial values tends to be too high, causing instability. Also, decreasing the learning rate may help.
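A minimal sketch of Xavier initialization for the question's weight list, assuming the TF 1.x tf.contrib.layers.xavier_initializer is available (only the variable-creation loop changes):
# Sketch: Xavier/Glorot initialization for each layer's weights.
xavier = tf.contrib.layers.xavier_initializer()
for hi in range(0, num_hidden_layers + 1):
    width = image_size * image_size if hi == 0 else hidden_layer_spec[hi - 1]
    height = num_labels if hi == num_hidden_layers else hidden_layer_spec[hi]
    weights.append(tf.Variable(xavier([width, height]), name="w" + str(hi + 1)))
    biases.append(tf.Variable(tf.zeros([height]), name="b" + str(hi + 1)))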
I had the same problem; reducing the batch size and the learning rate worked for me.
