Transfer Learning model with Keras: using other metrics than accuracy - python

I'm working on a binary classification model for leaves from the Swedish leaves data and thought Transfer Learning could be practical. I found this tutorial, but in the compile function I want to use metrics other than accuracy. When I try to get AUC or FP/FN/TP/TN, a ValueError is raised, claiming that the shape of y_true, (None, 1), and the shape of y_pred, (None, 2), are incompatible.
I fail to understand:
why would y_pred have this shape?
how can the accuracy be calculated, but not the parts of the confusion matrix?!
A solution without a reasoned explanation is also very welcome :)
feature_extractor_model = ""
pretrained_model_without_top_layer = hub.KerasLayer(
    feature_extractor_model, input_shape=(224, 224, 3), trainable=False)
classes_num = 2
model = tf.keras.Sequential([
    ...
])
model.compile(
    ...,
    metrics=[['acc'], [tf.keras.metrics.TruePositives(), tf.keras.metrics.FalsePositives(), tf.keras.metrics.TrueNegatives(), tf.keras.metrics.FalseNegatives()]])
model.fit(..., y_train, steps_per_epoch=9, epochs=5)

If you have two classes (e.g. cats and dogs), you can encode the labels either sparsely as 0 or 1, or one-hot as [0,1] and [1,0].
Your training labels are sparsely encoded, so your loss is SparseCategoricalCrossentropy. Metrics are functionally just like losses, so any metric you use would need to accept sparse labels. In your case, just write a "custom" metric function that accepts a sparse y_true, one-hots it, and passes it to the recall/precision/etc. metric function.
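To illustrate the shape mismatch, here is a minimal NumPy sketch (made-up arrays, not the Keras metric implementation). With sparse labels of shape (N, 1) and a two-unit output of shape (N, 2), accuracy still works because Keras resolves 'acc' to its sparse categorical variant, which takes the argmax over the class axis, whereas the confusion-matrix metrics compare y_true and y_pred element by element. One option, shown here as an alternative to one-hotting y_true, is to reduce the prediction to the positive-class column before counting:

```python
import numpy as np

# Made-up example: 4 samples, sparse labels of shape (N, 1),
# two-column class probabilities of shape (N, 2).
y_true = np.array([[0], [1], [1], [0]])
y_pred = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4],
                   [0.3, 0.7]])

def confusion_counts(y_true, y_pred, threshold=0.5):
    """Count TP/FP/TN/FN for class 1 from sparse labels and (N, 2) probabilities."""
    labels = y_true.reshape(-1).astype(bool)   # (N, 1) -> (N,)
    predicted = y_pred[:, 1] >= threshold      # positive-class column -> (N,)
    return {
        "tp": int(np.sum(labels & predicted)),
        "fp": int(np.sum(~labels & predicted)),
        "tn": int(np.sum(~labels & ~predicted)),
        "fn": int(np.sum(labels & ~predicted)),
    }

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
```

In Keras, the same reduction could live in a small custom metric, or the model could end in a single sigmoid unit trained with binary cross-entropy so that y_pred already has shape (None, 1).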


Keras always predicts all 0's or all 1's

I'm having some issues with the predict function predicting all 0's or all 1's from my model. Here is my model
model = keras.Sequential([
    layers.Dense(200, activation="relu"),
    layers.Dense(500, activation="relu"),
    layers.Dense(1300, activation="relu"),
    layers.Dense(2000, activation="relu"),
    layers.Dense(1320, activation="relu"),
    layers.Dense(710, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(30, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(loss='binary_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.001), metrics=[metrics])
history = model.fit(..., target, batch_size=2048, epochs=100, shuffle=True, validation_split=0.2)
I'm very new to deep learning and trying to create models to classify and get predictions. My target is just a 0 or 1, which says whether a customer is going to leave or stay in the long run. I've checked the data for null and NaN values.
I've looked at a lot of posts about what this could be, and for the most part it seems that people were using the wrong activation function, one meant for regression rather than classification. The answer was that if you're using binary crossentropy, you should be using sigmoid (Why does a binary Keras CNN always predict 1?). I thought the output of my network would be correct, seeing that I am using ReLU and sigmoid with binary crossentropy, but whenever I predict, it persistently returns all 0's or all 1's. The layers might not make too much sense; I'm still very new at this and playing around to see how layers affect the results when I train and evaluate.
Here is roughly how I am using predict with the data
data = pd.read_csv("judge.csv", skiprows=range(0,0))
samples_to_predict = data.drop(['Surname', 'CreditScore', 'Geography', 'Gender', 'Tenure', 'NumOfProducts', 'HasCrCard', 'EstimatedSalary'], axis=1)
prediction = loaded_model.predict(samples_to_predict.values)
I've been trying to debug this for a while, and any help as to where the error could be coming from would be welcome. I've tried increasing the number of epochs to 1000, lowering my learning_rate, and lowering my batch_size. I believe BatchNormalization might make up for not scaling my data (I might be misunderstanding that). I also tried using just 3 Dense layers, two ReLU and one sigmoid, and checked that the data I'm predicting on is a numpy array. So far all of these have produced the same result: predict outputs all 0's or all 1's.
Turns out I was calling predict with columns that the model had not been trained on. For example, I have a column titled CustomerID that I dropped to train the model, but when predicting I had forgotten to drop that column, which made my model predict all 0's or all 1's. After fixing that and making sure that I predict using only the columns I trained with, I got predictions that were not all 0's or all 1's.
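A small guard can catch this kind of mismatch before calling predict. The sketch below is hypothetical (the helper name and the column lists are made up for illustration): it compares the columns used for training with those about to be passed to predict.

```python
def check_feature_columns(train_columns, predict_columns):
    """Raise ValueError if the prediction columns differ from the training columns."""
    train_columns = list(train_columns)
    predict_columns = list(predict_columns)
    extra = [c for c in predict_columns if c not in train_columns]
    missing = [c for c in train_columns if c not in predict_columns]
    if extra or missing:
        raise ValueError(f"unexpected columns: {extra}; missing columns: {missing}")
    if train_columns != predict_columns:
        raise ValueError("columns match but are in a different order")

# Hypothetical column names, for illustration only.
train_columns = ["Age", "Balance", "IsActiveMember"]
check_feature_columns(train_columns, ["Age", "Balance", "IsActiveMember"])  # passes silently

try:
    check_feature_columns(train_columns, ["CustomerID", "Age", "Balance", "IsActiveMember"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With pandas, the two lists would typically come from the columns attribute of the training and prediction DataFrames.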

Keras loss function value error: ValueError: An operation has `None` for gradient. on LSTM network

So I'm trying to train my LSTM network language model and use a perplexity function as my loss function, but I get the following error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My loss function looks as follows:
from keras import backend as K
def perplexity_raw(y_true, y_pred):
    """
    The perplexity metric. Why isn't this part of Keras yet?!
    """
    # cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
    cross_entropy = K.cast(K.equal(K.max(y_true, axis=-1),
                                   K.cast(K.argmax(y_pred, axis=-1), K.floatx())),
                           K.floatx())
    perplexity = K.exp(cross_entropy)
    return perplexity
and I create my model as follows:
# define model
model = Sequential()
model.add(Embedding(vocab_size, 500, input_length=max_length-1))
model.add(Dense(vocab_size, activation='softmax'))
# compile network
model.compile(loss=perplexity_raw, optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(..., y, epochs=150, verbose=2)
The error occurs when I try to fit my model. Does anyone know what causes the error and how to fix it?
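The culprit is K.argmax, together with the K.equal comparison around it: these ops have no gradient. (K.max is differentiable, but here it is applied to y_true, which does not depend on the model's weights.) I also think you just straight up don't need them in your loss! That's because taking the max or argmax of something removes the information on how wrong the prediction is.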
I don't know what kind of loss you want to measure, but I think you are looking for something like tf.exp(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred)) or tf.exp(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)). You might need to convert your labels to one-hot encodings using tf.one_hot.
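For reference, perplexity is the exponential of the mean cross-entropy, so it can be built from differentiable pieces only. The NumPy sketch below shows the arithmetic, not Keras code; the function name and inputs are made up, and y_pred is assumed to hold softmax probabilities with integer class labels in y_true.

```python
import numpy as np

def perplexity(y_true, y_pred, eps=1e-12):
    """exp(mean cross-entropy) for integer labels and per-class probabilities."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    # Probability assigned to the correct class of each sample.
    p_correct = y_pred[np.arange(len(y_true)), y_true]
    cross_entropy = -np.log(np.clip(p_correct, eps, 1.0))
    return float(np.exp(cross_entropy.mean()))

# A uniform guess over 4 classes has perplexity 4.
uniform = np.full((3, 4), 0.25)
ppl_uniform = perplexity([0, 2, 3], uniform)

# A confident, correct prediction approaches the minimum perplexity of 1.
confident = np.array([[0.97, 0.01, 0.01, 0.01]])
ppl_confident = perplexity([0], confident)
```

No argmax or comparison appears anywhere, which is why a Keras loss written this way (for example the exponential of the backend's sparse categorical cross-entropy, as in the commented-out line of the question) has a gradient.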

Trying to add auc-roc score into CNN training

My current CNN has relatively high accuracy but a low AUC score, so I want to train my model considering both accuracy and AUC. However, when I tried to add 'auc' as a second metric for training, the epochs never start.
This is the error message I am getting:
FailedPreconditionError: Error while reading resource variable conv2d_4/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/conv2d_4/kernel/N10tensorflow3VarE does not exist.
[[{{node conv2d_4/Conv2D/ReadVariableOp}}]]
I have tried the function auc provided in previous discussions. Sorry I can't find the post now.
from keras import backend as K
def auc(y_true, y_pred):
    auc = tf.metrics.auc(y_true, y_pred)[1]
    return auc
auc_model = models.Sequential()
auc_model.add(layers.Conv1D (kernel_size = (200), filters = 10, input_shape=(1644,1) , activation='relu'))
auc_model.add(layers.MaxPooling1D(pool_size = (50), strides=(10)))
auc_model.add(layers.Reshape((40, 35, 1)))
auc_model.add(layers.Conv2D(16, (3, 3), activation='relu'))
auc_model.add(layers.Conv2D(16, (3, 3), activation='relu'))
auc_model.add(layers.MaxPooling2D((2, 2)))
auc_model.add(layers.Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
auc_model.add(layers.Dense(1, activation='sigmoid'))
auc_model.compile(..., metrics=['accuracy', auc])
from tensorflow.keras.callbacks import EarlyStopping
target = y_tr.columns[0]
rows_tr = np.isfinite(y_tr[target]).values
rows_te = np.isfinite(y_te[target]).values
x_train = x_tr[rows_tr].reshape((x_tr[rows_tr].shape[0], 1644, 1))
x_test = x_te[rows_te].reshape((x_te[rows_te].shape[0], 1644, 1))
auc_model.fit(x_train, y_tr[target][rows_tr],
              validation_data=(x_test, y_te[target][rows_te]), epochs=5)
print('\n# Evaluate on test data')
results = auc_model.evaluate(x_test, y_te[target][rows_te], batch_size = 8, verbose=1)
I want to start my training process considering both accuracy and auc score. Thanks.
A metric only reports an evaluation of your model at each epoch. It does not change anything in your training.
If you want your model to take the AUC into account as well, you should modify your loss. Minimizing binary_crossentropy tends to maximize accuracy without regard to the AUC. This becomes more problematic when you have an imbalanced data set, i.e. one heavily skewed class.
If you want it really only for the metric, you can see this post:
How to compute Receiving Operating Characteristic (ROC) and AUC in keras?
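To see why accuracy and AUC can disagree, here is a pure-Python sketch of what ROC AUC measures: the fraction of (positive, negative) pairs in which the positive sample gets the higher score. The data are made up and the helper is hypothetical, not the Keras or scikit-learn implementation.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Imbalanced toy data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.05, 0.45]

# Accuracy at a 0.5 threshold: every sample is predicted negative.
predictions = [1 if s >= 0.5 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(predictions, y_true)) / len(y_true)
auc = roc_auc(y_true, scores)
```

Here accuracy is 0.8 while the AUC is 0.5, which is no better than ranking the samples at random.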
But if you truly want your model to maximize the AUC, you should write a custom loss function in Keras and use it as the loss of your model.
There is a good discussion here:

What values are returned from model.evaluate() in Keras?

I've got multiple outputs from my model from multiple Dense layers. My model has 'accuracy' as the only metric in compilation. I'd like to know the loss and accuracy for each output. This is some part of my code.
scores = model.evaluate(X_test, [y_test_one, y_test_two], verbose=1)
When I printed out the scores, this is the result.
[0.7185557290413819, 0.3189622712272771, 0.39959345855771927, 0.8470299135229717, 0.8016634374641469]
What do these numbers represent?
I'm new to Keras and this might be a trivial question. However, I have read the docs from Keras but I'm still not sure.
Quoted from evaluate() method documentation:
Scalar test loss (if the model has a single output and no metrics) or
list of scalars (if the model has multiple outputs and/or metrics).
The attribute model.metrics_names will give you the display labels
for the scalar outputs.
Therefore, you can use the metrics_names property of your model to find out what each of those values corresponds to. For example:
from keras import layers
from keras import models
import numpy as np
input_data = layers.Input(shape=(100,))
out_1 = layers.Dense(1)(input_data)
out_2 = layers.Dense(1)(input_data)
model = models.Model(input_data, [out_1, out_2])
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
print(model.metrics_names)
outputs the following:
['loss', 'dense_1_loss', 'dense_2_loss', 'dense_1_mean_absolute_error', 'dense_2_mean_absolute_error']
which indicates what each of the numbers in the output of the evaluate method corresponds to.
Further, if you have many layers, then the names dense_1 and dense_2 might be a bit ambiguous. To resolve this ambiguity, you can assign names to your layers using the name argument of layers (not necessarily on all of them, only on the input and output layers):
# ...
out_1 = layers.Dense(1, name='output_1')(input_data)
out_2 = layers.Dense(1, name='output_2')(input_data)
# ...
which outputs a clearer description:
['loss', 'output_1_loss', 'output_2_loss', 'output_1_mean_absolute_error', 'output_2_mean_absolute_error']
We should be clear that the "loss" figure is an aggregate over the whole x_test array, not the loss of a single item. It is the total loss: the sum of the per-output losses (weighted by loss_weights if you set any), where each per-output loss is averaged over the test samples rather than summed. In the question's output, 0.3190 + 0.3996 ≈ 0.7186. x_test would contain your test data and y_test would contain your labels.
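As a sketch of how to read the list from the question, the values can be paired with metric names. The names below are hypothetical (the real ones come from model.metrics_names and depend on the output layer names); the numbers are the ones printed in the question.

```python
# Hypothetical names; in practice use model.metrics_names.
metrics_names = ["loss", "output_1_loss", "output_2_loss",
                 "output_1_acc", "output_2_acc"]
scores = [0.7185557290413819, 0.3189622712272771, 0.39959345855771927,
          0.8470299135229717, 0.8016634374641469]

results = dict(zip(metrics_names, scores))
for name, value in results.items():
    print(f"{name}: {value:.4f}")

# Assuming default loss weights and no regularization terms,
# the total loss is the sum of the per-output losses.
loss_gap = abs(results["loss"]
               - (results["output_1_loss"] + results["output_2_loss"]))
```

For these numbers the gap is far below rounding noise, which is consistent with the first value being the sum of the second and third.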

Constant Output and Prediction Syntax with LSTM Keras Network

I am new to neural networks and have two, probably pretty basic, questions. I am setting up a generic LSTM network to predict the future of a sequence, based on multiple features.
My training data therefore has the shape (number of training sequences, length of each sequence, number of features for each timestep).
Or to make it more specific, something like (2000, 10, 3).
I try to predict the value of one feature, not of all three.
If I make my Network deeper and/or wider, the only output I get is the constant mean of the values to be predicted. Take this setup for example:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z0)
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(128, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(..., trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=50, verbose=1)])
If I just use one layer, like:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(4, activation='softsign', recurrent_activation='softsign')(z0)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(..., trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=200, verbose=1)])
The predictions are somewhat reasonable, at least they are not constant anymore.
Why does that happen? Around 2000 samples are not that many, but in the case of overfitting, I would expect the predictions to match perfectly...
EDIT: Solved, as stated in the comments, it's just that Keras always expects Batches: Keras
When I use:
to get the prediction for the first sequence, I get a dimension error:
"Error when checking : expected input_1 to have 3 dimensions, but got array with shape (3, 3)"
I need to feed in an array of sequences like:
This is a workaround, but I am not really sure, whether this has any deeper meaning, or is just a syntax thing...
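The batch requirement can be illustrated with NumPy alone. In this sketch the array is a hypothetical single sequence of 3 timesteps with 3 features, matching the shape (3, 3) in the error message; a leading batch axis of size 1 turns it into the 3-D input the model expects.

```python
import numpy as np

# Hypothetical single sequence: 3 timesteps, 3 features.
sequence = np.arange(9, dtype=float).reshape(3, 3)

# Add a leading batch axis: (3, 3) -> (1, 3, 3).
batch_of_one = sequence[np.newaxis, ...]
# Equivalent: np.expand_dims(sequence, axis=0)

# Slicing with a range instead of an index also keeps the batch axis.
train_x = np.zeros((2000, 3, 3))   # stand-in for the training array
first_as_batch = train_x[0:1]      # shape (1, 3, 3), whereas train_x[0] is (3, 3)
```

It is a convention rather than anything deeper: predict always takes a batch, so a single sample is passed as a batch of one, and the result has a matching leading axis.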
This is because you have not normalised input data.
Any neural network model will initially have weights distributed around zero. Since your training dataset has all positive values, the model will try to adjust its weights to predict only positive values. However, with large unnormalised inputs the activation function (in your case softsign) saturates near 1, so the model can do little except adjust the bias. That is why you are getting an almost constant line around the average value of the dataset.
For this, you can use a general tool like sklearn to pre-process your data. If you are using a pandas DataFrame, something like this will help
data_df = (data_df - data_df.mean()) / data_df.std()
Or, to keep the normalisation parameters inside the model, you can consider adding a batch normalization layer to your model.
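For sequence data shaped (samples, timesteps, features), the same standardisation can be sketched in NumPy. The arrays here are random stand-ins, and computing the statistics on the training split only is an assumption about good practice rather than something stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: all-positive values, shape (samples, timesteps, features).
train_x = rng.uniform(50.0, 150.0, size=(2000, 10, 3))
test_x = rng.uniform(50.0, 150.0, size=(200, 10, 3))

# Per-feature statistics over the sample and time axes, from the training data only.
mean = train_x.mean(axis=(0, 1), keepdims=True)   # shape (1, 1, 3)
std = train_x.std(axis=(0, 1), keepdims=True)

train_scaled = (train_x - mean) / std
test_scaled = (test_x - mean) / std               # reuse the training statistics
```

If the target is scaled in the same way, predictions can be mapped back to the original units with the stored mean and std.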

