I have implemented a custom metric in tf.keras for a multi-label classification problem.
def multilabel_TP(y_true, y_pred, thres=0.4):
    return (
        tf.math.count_nonzero(
            tf.math.logical_and(tf.cast(y_true, tf.bool),
                                tf.cast(y_pred >= thres, tf.bool))
        )
    )
The count_nonzero function produces integer results, but while the model is running I get float values in the progress bar. The custom function gives me correct results when I try it outside the Keras model.
8/33 [======>.......................] - ETA: 27s - loss: 0.4294 - multilabel_TP: **121.6250**
model.compile(loss = 'binary_crossentropy', metrics = multilabel_TP, optimizer= 'adam')
model.fit(train_sentences, y_train, batch_size= 128, epochs = 20, validation_data= (test_sentences, y_test))
Why is this happening?
What is presented in the Keras progress bar is a running mean of your loss/metrics over batches, since the model is trained batch by batch and the weights change after each batch. This is why you see a floating-point value.
Your metric should also return a floating-point value, for example by dividing by the number of elements in the batch. The metric values will then make more sense.
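For example, here is a minimal sketch (my own, not from the original post) of a variant that returns a rate in [0, 1] instead of a raw count, assuming the same thresholding scheme as the question's metric:

import tensorflow as tf

def multilabel_TP_rate(y_true, y_pred, thres=0.4):
    # count true positives exactly as in the question's metric
    tp = tf.math.count_nonzero(
        tf.math.logical_and(tf.cast(y_true, tf.bool),
                            tf.cast(y_pred >= thres, tf.bool)))
    # divide by the number of label entries in the batch to get a float in [0, 1]
    total = tf.size(y_true, out_type=tf.int64)
    return tf.cast(tp, tf.float32) / tf.cast(total, tf.float32)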
Related
After following the transfer learning tutorial on Tensorflow's site, I have a question about how model.evaluate() works in comparison to calculating accuracy by hand.
At the very end, after fine-tuning, in the Evaluation and prediction section, we use model.evaluate() to calculate the accuracy on the test set as follows:
loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)
6/6 [==============================] - 2s 217ms/step - loss: 0.0516 - accuracy: 0.9740
Test accuracy : 0.9739583134651184
Next, we generate predictions manually from one batch of images from the test set as part of a visualization exercise:
# Apply a sigmoid since our model returns logits
predictions = tf.nn.sigmoid(predictions)
predictions = tf.where(predictions < 0.5, 0, 1)
However, it's also possible to extend this functionality to calculate predictions across the entire test set and compare them to the actual values to yield an average accuracy:
all_acc = tf.zeros([], tf.int32)  # initialize array to hold all accuracy indicators (single element)
for image_batch, label_batch in test_dataset.as_numpy_iterator():
    predictions = model.predict_on_batch(image_batch).flatten()  # run batch through model and return logits
    predictions = tf.nn.sigmoid(predictions)  # apply sigmoid activation function to transform logits to [0, 1]
    predictions = tf.where(predictions < 0.5, 0, 1)  # round down or up accordingly since it's a binary classifier
    accuracy = tf.where(tf.equal(predictions, label_batch), 1, 0)  # correct is 1 and incorrect is 0
    all_acc = tf.experimental.numpy.append(all_acc, accuracy)
all_acc = all_acc[1:]  # drop first placeholder element
avg_acc = tf.reduce_mean(tf.dtypes.cast(all_acc, tf.float16))
print('My Accuracy:', avg_acc.numpy())
My Accuracy: 0.974
Now, if model.evaluate() generates predictions by applying a sigmoid to the model's logit outputs and using a threshold of 0.5, as the tutorial suggests, my manually calculated accuracy should equal the accuracy returned by TensorFlow's model.evaluate() function. This is indeed the case for the tutorial: My Accuracy of 0.974 equals the accuracy from model.evaluate(). However, when I try the same code with a model trained using the same convolutional base as the tutorial, but on different Gabor images (not the cats & dogs from the tutorial), my accuracy no longer equals the model.evaluate() accuracy:
current_set = set17  # define set to process
all_acc = tf.zeros([], tf.float64)  # initialize array to hold all accuracy indicators (single element)
loss, acc = model.evaluate(current_set)  # now test the model's performance on the test set
for image_batch, label_batch in current_set.as_numpy_iterator():
    predictions = model.predict_on_batch(image_batch).flatten()  # run batch through model and return logits
    predictions = tf.nn.sigmoid(predictions)  # apply sigmoid activation function to transform logits to [0, 1]
    predictions = tf.where(predictions < 0.5, 0, 1)  # round down or up accordingly since it's a binary classifier
    accuracy = tf.where(tf.equal(predictions, label_batch), 1, 0)  # correct is 1 and incorrect is 0
    all_acc = tf.experimental.numpy.append(all_acc, accuracy)
all_acc = all_acc[1:]  # drop first placeholder element
avg_acc = tf.reduce_mean(all_acc)
print('My Accuracy:', avg_acc.numpy())
print('Tf Accuracy:', acc)
My Accuracy: 0.832
Tf Accuracy: 0.675000011920929
Does anyone know why there would be a discrepancy? Does model.evaluate() not use a sigmoid? Or does it use a threshold other than 0.5? Or is it something else I'm not considering? Please note, my new model was trained on Gabor images, which are different from the cats and dogs in the tutorial, but the code was the same.
Thank you in advance for any help!
I'm working on an LSTM model in Keras with the goal of next-word prediction, utilizing BERT word vectors as part of my inputs to the model.
This is a multi-class categorical problem, and I've done some weird steps to simplify English into clusters of words using BERT, stop-words, and k-means; for my initial practice model I'm using 144 target categories. I plan to bump that up to about 1000 after working out some kinks.
Here's the architecture of my Keras model:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(32, input_shape=(SENTENCE_LENGTH, COM_WORDS), dropout=0.2))
model.add(Dropout(0.2))
model.add(Dense(COM_WORDS))
model.add(Activation('softmax'))

optimizer = Adam(lr=lr)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.fit(X, y, validation_split=0.05, batch_size=128, epochs=epochs)
My loss starts around 6 and goes down, which isn't unusual as far as I know. I then tried to incorporate class weights, since the model was over-predicting common words like 'the', which is expected. So I used this code to make the weights:
max_count = 0
for word in range(COM_WORDS):
    if Ys.count(word) > max_count:
        max_count = Ys.count(word)

class_weights = {}
for word in range(COM_WORDS):
    class_weights[word] = (max_count - Ys.count(word) + 1)
So my most common y-input would have a value of 1 in the dictionary, and a y-input that is represented only once would be weighted at the count of the most common y-input: around 1 million in this case. Then I added it to my fit() call and restarted the model.
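To illustrate the scale this produces, here is a tiny toy example (my own numbers, not from the original post) of the scheme weight = max_count - count + 1:

Ys = [0, 0, 0, 0, 1, 2]   # class 0 appears 4 times, classes 1 and 2 once each
COM_WORDS = 3

counts = {w: Ys.count(w) for w in range(COM_WORDS)}
max_count = max(counts.values())
class_weights = {w: max_count - counts[w] + 1 for w in range(COM_WORDS)}
print(class_weights)      # {0: 1, 1: 4, 2: 4}
# With ~1 million occurrences of the most common word, the rarest words get
# weights near 1 million, which scales the reported loss by roughly that factor.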
When I run my model with the weights, I get an insanely high loss (this is just a batch of 100,000 of all my inputs being run):
Epoch 1/3
950000/950000 [==============================] - 160s 168us/step - loss: 3014409.5359 - acc: 0.1261 - val_loss: 2808283.0898 - val_acc: 0.1604
The accuracy is fine though! Not too different than when I didn't use weights.
MY QUESTION(s):
Does this high loss matter? Is it just a reflection of my huge weight numbers, or is it indicating something sinister? Are loss numbers relative?
Side question: Should I use a better method to weight my inputs?
Thank you!
I would like to perform transfer learning with a pretrained Keras model.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

base_model = keras.applications.MobileNetV2(input_shape=(96, 96, 3), include_top=False, pooling='avg')
x = base_model.outputs[0]
outputs = layers.Dense(10, activation=tf.nn.softmax)(x)
model = keras.Model(inputs=base_model.inputs, outputs=outputs)
Training with the Keras compile/fit functions converges:
model.compile(optimizer=keras.optimizers.Adam(), loss=keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
history = model.fit(train_data, epochs=1)
The results are: loss: 0.4402 - accuracy: 0.8548
I want to train with tf.GradientTape instead, but it doesn't converge:
optimizer = keras.optimizers.Adam()
train_loss = keras.metrics.Mean()
train_acc = keras.metrics.SparseCategoricalAccuracy()

def train_step(data, labels):
    with tf.GradientTape() as gt:
        pred = model(data)
        loss = keras.losses.SparseCategoricalCrossentropy()(labels, pred)
    grads = gt.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    train_loss(loss)
    train_acc(labels, pred)

for xs, ys in train_data:
    train_step(xs, ys)
print('train_loss = {:.3f}, train_acc = {:.3f}'.format(train_loss.result(), train_acc.result()))
But the results are: train_loss = 7.576, train_acc = 0.101
If I only train the last layer by setting
base_model.trainable = False
It converges and the results are: train_loss = 0.525, train_acc = 0.823
What's the problem with the code? How should I modify it? Thanks.
Try ReLU as the activation function. It may be a vanishing gradient issue, which can occur if you use an activation function other than ReLU.
Following my comment, the reason it didn't converge is that you picked a learning rate that was too big. This causes the weights to change too much and the loss to explode. When base_model.trainable is set to False, most of the weights in the network are fixed, and the learning rate is a good fit for your last layers.
As a general rule, the learning rate should be tuned for each experiment.
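For example (the value below is an assumption on my part, not from the original question), full fine-tuning typically uses a much smaller learning rate than training only the head:

from tensorflow import keras

# hypothetical value: when the whole base model is trainable, use a much smaller step size
optimizer = keras.optimizers.Adam(learning_rate=1e-5)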
Edit: Following Wilson's comment, I'm not sure this is the reason you get different results, but it could be:
When you specify your loss, it is computed on each element of the batch; to get the loss of the batch, you then take either the sum or the mean of those per-element losses, and depending on which one is chosen you get a different magnitude. For example, if your batch size is 64, summing the losses yields a loss 64 times bigger, which yields a gradient 64 times bigger, so choosing sum over mean with a batch size of 64 is like picking a 64 times bigger learning rate.
So maybe the reason you get different results is that by default a keras.losses object wrapped in model.compile uses a different reduction method. In the same vein, if the loss is reduced by summing, its magnitude depends on the batch size: with twice the batch size you get (on average) twice the loss and twice the gradient, so it's like doubling the learning rate.
My advice is to check the reduction method used by the loss to make sure it's the same in both cases, and if it's sum, to check that the batch size is the same. I would advise using mean reduction in general, since it's not influenced by the batch size.
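As a minimal sketch (mine, based on the public tf.keras API, not code from the question), you can pin the reduction explicitly and reuse the same loss object in both the compiled model and the custom loop:

import tensorflow as tf
from tensorflow import keras

# mean over the batch, i.e. the same magnitude regardless of batch size
loss_fn = keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)

# 'model' is the model built in the question
model.compile(optimizer=keras.optimizers.Adam(), loss=loss_fn, metrics=['accuracy'])
# ...and call the same loss_fn(labels, pred) inside the GradientTape train step.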
I am new to this field and trying to re-run an example LSTM code copied from the internet. The accuracy of the LSTM model is always 0.2, but the predicted output is totally correct, which means the accuracy should be 1. Could anyone tell me why?
from numpy import array
from keras.models import Sequential
from keras.layers import Dense, LSTM
length = 5
seq = array([i/float(length) for i in range(length)])
print(seq)
X = seq.reshape(length, 1, 1)
y = seq.reshape(length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1000
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch)#, verbose=2)
train_loss, train_acc = model.evaluate(X, y)
print('Training set accuracy:', train_acc)
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
    print('%.1f' % value)
You are measuring accuracy, but you are training a regressor. This means your output is a floating-point number rather than a fixed categorical value.
If you change the last print to use 3 decimals of precision (print('%.3f' % value)), you will see that the predicted values are really close to the ground truth but not exactly the same, so the accuracy is low:
0.039
0.198
0.392
0.597
0.788
For some reason, the accuracy being used (sparse_categorical_accuracy) is considering the 0.0 and 0.039 (or similar) as a hit instead of a miss, so that's why you are getting 20% instead of 0%.
If you change the sequence to not contain zero, you will have 0% accuracy, which is less confusing:
seq = array([i/float(length) for i in range(1, length+1)])
Finally, to correct this, you can use, for example, mae instead of accuracy as the metric, where you will see the error going down:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
Another option would be to switch to a categorical framework (changing your floats to categorical values), as sketched below.
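Here is a minimal sketch (my own, only an illustration of the idea above, not code from the question) of that categorical reformulation, treating each of the length positions as a class index:

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.utils import to_categorical

length = 5
seq = array([i / float(length) for i in range(length)])
X = seq.reshape(length, 1, 1)
y = to_categorical(range(length), num_classes=length)  # one class per position

model = Sequential()
model.add(LSTM(length, input_shape=(1, 1)))
model.add(Dense(length, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, verbose=0)  # accuracy is now a meaningful metric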
Hope this helps! I will edit the answer if I can dig into why the sparse_categorical_accuracy detects the 0 as a hit and not a miss.
When training an ANN for regression, Keras stores the train/validation loss in a History object. In the case of multiple outputs in the final layer with a standard loss function, i.e. the mean squared error (MSE):
what does the loss represent in the multi-output scenario? Is it the average/mean of the individual losses of all outputs, or is it something else?
Can I somehow access the loss of each output individually without implementing a custom loss function?
Any hints would be much appreciated.
EDIT------------
model = Sequential()
model.add(LSTM(10, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(2))
model.compile(loss='mse', optimizer='adam')
Re-phrasing my question after adding the snippet:
How is the loss calculated in the case of two neurons in the output layer and what does the resulting loss represent? Is it the average loss for both outputs?
The standard MSE loss is implemented in Keras as follows:
def mse_loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
If you now have multiple neurons in the output layer, the computed loss is simply the mean of the squared errors of all individual neurons.
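A quick numerical check (my own toy numbers) of that statement, for a batch of 2 samples and 2 output neurons:

import numpy as np

y_true = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y_pred = np.array([[1.5, 2.0],
                   [2.0, 5.0]])

# per-sample loss: mean over the 2 outputs
per_sample = np.mean(np.square(y_pred - y_true), axis=-1)
print(per_sample)          # [0.125 1.   ]
print(per_sample.mean())   # 0.5625 -> what Keras reports as 'loss' for this batch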
If you want the loss of each individual output to be tracked, you have to write your own metric for that. If you want to keep it as simple as possible, you can use the following metric (it has to be nested since Keras only allows a metric to have the inputs y_true and y_pred):
def inner_part_custom_metric(y_true, y_pred, i):
    d = y_pred - y_true
    square_d = K.square(d)
    return square_d[:, i]  # y has shape [batch_size, output_dim]

def custom_metric_output_i(i):
    def custom_metric_i(y_true, y_pred):
        return inner_part_custom_metric(y_true, y_pred, i)
    return custom_metric_i
Now, say you have 2 output neurons. Create 2 instances of this metric:
metrics = [custom_metric_output_i(0), custom_metric_output_i(1)]
Then compile your model as follows:
model = ...
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01), metrics=metrics)
history = model.fit(...)
Now you can access the loss of each individual neuron in the history object. Use the following command to see what's in the history object:
print(history.history.keys())
and then:
print(history.history['custom_metric_i'])
This will actually print the history for only one dimension, since both generated metric functions share the same name (custom_metric_i)!
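A possible workaround (my assumption, not part of the original answer): give each generated metric function a unique __name__, so that Keras logs a separate history key per output dimension:

# reuses inner_part_custom_metric from the answer above
def custom_metric_output_i(i):
    def custom_metric_i(y_true, y_pred):
        return inner_part_custom_metric(y_true, y_pred, i)
    custom_metric_i.__name__ = 'custom_metric_output_%d' % i  # unique name per output
    return custom_metric_i

metrics = [custom_metric_output_i(0), custom_metric_output_i(1)]
# history.history should then contain 'custom_metric_output_0' and 'custom_metric_output_1'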