Change y_true in custom metrics - Python

I'm trying to implement a GAN in Keras, and I want to use the one-sided label smoothing trick, i.e. set the label of real images to 0.9 instead of 1. However, the built-in binary_accuracy metric no longer does the correct thing: it is always 0 for real images.
So I tried to implement my own metric in Keras. I want to convert every 0.9 label back to 1, but I'm new to Keras and don't know how to do that. Here's what I intend:
# Just pseudocode
def custom_metrics(y_true, y_pred):
    if K.equal(y_true, [[0.9]]):
        y_true = y_true + 0.1
    return metrics.binary_accuracy(y_true, y_pred)
How should I compare and change the y_true label? Thanks in advance!
EDIT:
The output of the following code is:
def custom_metrics(y_true, y_pred):
    print(K.shape(y_true))
    print(K.shape(y_pred))
    y_true = K.switch(K.equal(y_true, 0.9), K.ones_like(y_true), K.zeros_like(y_true))
    return metrics.binary_accuracy(y_true, y_pred)
Tensor("Shape:0", shape=(2,), dtype=int32)
Tensor("Shape_1:0", shape=(2,), dtype=int32)
ValueError: Shape must be rank 0 but is rank 2 for 'cond/Switch' (op: 'Switch') with input shapes: [?,?], [?,?].

You can use tf.where:
y_true = tf.where(K.equal(y_true, 0.9), tf.ones_like(y_true), tf.zeros_like(y_true))
Alternatively, you can use the keras.backend.switch function for that:
keras.backend.switch(condition, then_expression, else_expression)
Your custom metric function would then look something like this:
def custom_metrics(y_true, y_pred):
    y_true = K.switch(K.equal(y_true, 0.9), K.ones_like(y_true), K.zeros_like(y_true))
    return metrics.binary_accuracy(y_true, y_pred)
Test code:
def test_function(y_true):
    print(K.eval(y_true))
    y_true = K.switch(K.equal(y_true, 0.9), K.ones_like(y_true), K.zeros_like(y_true))
    print(K.eval(y_true))

y_true = K.variable(np.array([0, 0, 0, 0, 0, 0.9, 0.9, 0.9, 0.9, 0.9]))
test_function(y_true)
output:
[0. 0. 0. 0. 0. 0.9 0.9 0.9 0.9 0.9]
[0. 0. 0. 0. 0. 1. 1. 1. 1. 1.]
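One caveat worth adding (my note, not from the original answer): comparing floats for exact equality with K.equal(y_true, 0.9) can be brittle if the smoothed labels are produced by arithmetic rather than stored directly, since 0.9 has no exact float32 representation. A tolerance-based variant is a safer sketch:

import tensorflow as tf
from keras import backend as K
from keras import metrics

def custom_metrics(y_true, y_pred):
    # treat anything within a small tolerance of 0.9 as a smoothed "real" label
    is_smoothed = K.less(K.abs(y_true - 0.9), 1e-3)
    y_true = tf.where(is_smoothed, K.ones_like(y_true), K.zeros_like(y_true))
    return metrics.binary_accuracy(y_true, y_pred)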


Keras sparse_categorical_crossentropy: label Y dimension and value range

Can someone please explain the dimensionality logic for the input X and the class Y for the sparse_categorical_crossentropy loss function?
I checked both the Keras and tf2 docs and examples, as well as this post: Cross Entropy vs Sparse, but one point is not clear to me.
Does the Y vector need to be expanded to the same number of columns as the number of classes the model outputs (if I use a softmax output), or does Keras expand Y automatically?
In my case, I have 32x32 input images, and Y is an integer between 0 and 10.
So the input is (batch_size, h, w) and Y is (batch_size, 1) holding an integer value 0...10:
X = (73257, 32, 32)
Y = (73257, 1)
model.fit(X, Y, epochs=30, validation_split=0.10, batch_size=1, verbose=True)
The model itself is just a Sequential stack of Dense layers with a softmax output:
model = Sequential()
model.add(Dense(32, activation='relu',
                input_shape=input_shape,
                kernel_initializer='he_uniform',
                bias_initializer='ones'))
# bunch of Dense layers, then the softmax output
model.add(Dense(10, activation='softmax'))
The error I get is about dimensionality:
ValueError: Shape mismatch: The shape of labels (received (1, 1)) should equal the shape of logits except for the last dimension (received (1, 32, 10)).
Thank you.
As mentioned in that post, both categorical cross-entropy (cce) and sparse categorical cross-entropy (scc) compute the same loss; they differ only in the expected format of the true label Y. Simply put, if Y consists of integer labels you use scc, whereas if Y is one-hot encoded you use cce. So for scc the ground truth Y is mostly 1D, whereas for cce it is mostly 2D. For the ground truth:
- (num_of_samples, n_classes), one-hot encoded <- for cce (2D)
- (num_of_samples,), integer class labels <- for scc (1D)
For example, if we use the cifar10 data set, we can do
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data()

# train set / data
x_train = x_train.astype('float32') / 255
sparse = y_train
onehot = tf.keras.utils.to_categorical(y_train, num_classes=10)

print(sparse[:5])  # <--- (num_of_samples, n_class_int)
print(onehot[:5])  # <--- (num_of_samples, n_class_one_hot_encode)
[[6]
[9]
[9]
[4]
[1]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Now, let's define a simple model and train using the above both and see what happens.
def net():
    input = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(input, x)
    return model
Using cce
model = net()
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics='accuracy',
    optimizer='adam')

his = model.train_on_batch(x_train, onehot, return_dict=True)
print(his)
{'loss': 2.376708984375, 'accuracy': 0.09651999920606613}
one_hot_pred = model.predict(x_train)
print(onehot[0])
print(one_hot_pred[0])
print(onehot[0].shape)
print(one_hot_pred[0].shape)
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.1516315 0.1151238 0.11732318 0.10644271 0.08946694 0.1398355
0.05046898 0.04249624 0.11813554 0.06907552]
(10,)
(10,)
Now, using scc
model = net()
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics='accuracy',
    optimizer='adam')

his = model.train_on_batch(x_train, sparse, return_dict=True)
print(his)
{'loss': 2.331458806991577, 'accuracy': 0.10066000372171402}
sparse_pred = model.predict(x_train)
print(sparse[0])
print(sparse_pred[0])
print(sparse[0].shape)
print(sparse_pred[0].shape)
[6]
[0.07184976 0.08837385 0.06910037 0.12347631 0.09542189 0.09981853
0.11247937 0.06707954 0.14902702 0.12337337]
(1,)
(10,)
Observe that the gt and pred shapes for scc are (1,) and (10,). In this case, the loss computes the logarithm only for the output index that the ground truth points to. For example, the gt here is 6, so from pred the loss will compute only the logarithm of pred[6]. There are a few more details of this in the linked post.
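As a quick sanity check (a hand computation, using the sparse_pred[0] values printed above and gt = 6):

import numpy as np

# per-sample scc loss is just -log(pred[gt])
pred = np.array([0.07184976, 0.08837385, 0.06910037, 0.12347631, 0.09542189,
                 0.09981853, 0.11247937, 0.06707954, 0.14902702, 0.12337337])
print(-np.log(pred[6]))  # ~2.185: the loss looks only at index 6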

Problem with pooled gradients for class activation map (CAM)

I'm using Keras to create gradient class activation maps (Grad-CAM) to visualize my predictions, but for some of the images that I pass through, it returns zeros for the mean intensity of the gradient over a specific feature-map channel. I'm using DenseNet121 to classify between two classes.
I've followed the Grad-CAM guide that Francois Chollet wrote in his book Deep Learning with Python, and I had the exact same issue with the VGG16 model as well, so I'm assuming this is unrelated to the model I choose.
My model is as follows:
K.set_image_dim_ordering('tf')
dnet = DenseNet121(include_top=True, weights='imagenet', input_shape=(img_size, img_size, 3))
dnet.trainable=True
x = dnet.output
x = layers.Dense(100, activation='relu')(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(2, activation='sigmoid')(x)
model = Model(input=dnet.input, output=x)
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.08)
class_weight = {0:cwr, 1:1}
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model_hist = model.fit_generator(train, steps_per_epoch = 60, epochs = 2, class_weight = class_weight)
The code for the Grad CAM and associated heatmap:
img = load_img(load_path, target_size=(img_size, img_size))
img_tensor = img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
x = preprocess_input(img_tensor)
preds = model.predict(x)
t = np.argmax(preds[0])
img = cv2.imread(str(load_path))
llayer = model.output[:,t]
last_conv_layer = model.get_layer(conv)
grads = K.gradients(llayer, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input],[pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([img_tensor])
for i in range(len(pooled_grads_value)):
conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
heatmap /= (np.max(heatmap))
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
heatmap = np.uint8(255 * heatmap)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
cv2.imwrite('colormap.jpg', heatmap)
colormap = cv2.imread('colormap.jpg')
I've narrowed the issue down to pooled_grads_value. Printing this value gives:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
It also raises the following warning when I try to plot the associated heatmap:
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:85: RuntimeWarning: invalid value encountered in true_divide
An image that works properly returns a pooled_grads_value similar to:
[ 3.52818736e-16 -2.74286623e-17 -1.04039105e-16 2.26564966e-15
3.80025990e-16 -4.65492462e-16 -3.13070048e-16 -3.99670802e-16
-4.33913274e-16 -1.11373781e-16 -2.18853726e-16 -3.50514463e-16
1.03881816e-17 -6.30468010e-16 -5.57306545e-16 -1.23719115e-16
-3.93387115e-17 7.59074981e-16 -5.14396333e-16 -1.02742529e-16
-2.16168923e-16 1.81140590e-16 -6.42374594e-16 3.01582507e-17
-5.55568844e-17 -3.05854862e-16 3.26082836e-17 2.35498082e-16
7.86424100e-18 6.45698563e-16 -1.54681729e-16 1.11217808e-16]
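One thing the code above doesn't guard against (my note; it explains the warning, not the zero gradients themselves): when pooled_grads_value is all zeros, the weighted heatmap is all zeros too, so heatmap /= np.max(heatmap) divides by zero, which is exactly the "invalid value encountered in true_divide" warning. A guarded normalization, sketched with conv_layer_output_value from the code above:

import numpy as np

heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
max_val = np.max(heatmap)
if max_val > 0:
    heatmap /= max_val  # safe: at least one positive entry
else:
    print("All pooled gradients were zero; heatmap is uninformative for this image.")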

Custom metric in Keras to evaluate sign prediction

I am working on a regression problem. One of the performance metrics for this problem is "sign accuracy": I want to see whether the predicted value has the same sign as the true value. I know MSE somehow reflects how close the predicted value is to the true value, but I would like to see the sign accuracy during validation.
To be more specific, after training I check the accuracy the way shown below; I want a custom metric that computes the same thing during validation.
(np.multiply(predict_label,test_label)>0).sum()/float(predict_label.shape[0])
You can implement it in a similar way to accuracy:
def sign_accuracy(y_true, y_pred):
    return K.mean(K.greater(y_true * y_pred, 0.), axis=-1)
To test it:
y_true = np.random.rand(5, 1) - 0.5
y_pred = np.random.rand(5, 1) - 0.5
acc = K.eval(sign_accuracy(K.variable(y_true), K.variable(y_pred)))
print(y_true)
[[ 0.20410185]
[ 0.12085985]
[ 0.39697642]
[-0.28178138]
[-0.37796012]]
print(y_pred)
[[-0.38281826]
[ 0.14268927]
[ 0.19218624]
[ 0.21394845]
[ 0.04044269]]
print(acc)
[ 0. 1. 1. 0. 0.]
The mean over axis 0 is taken automatically by Keras when you call fit() or evaluate(), so you don't need to sum acc and divide it by y_pred.shape[0].
This metric can also be applied to multidimensional variables:
y_true = np.random.rand(5, 3) - 0.5
y_pred = np.random.rand(5, 3) - 0.5
acc = K.eval(sign_accuracy(K.variable(y_true), K.variable(y_pred)))
print(y_true)
[[ 0.02745352 -0.27927986 -0.47882833]
[-0.40950793 -0.16218984 0.19184008]
[ 0.25002487 -0.08455175 -0.03606459]
[ 0.09315503 -0.19825522 0.19801222]
[-0.32129431 -0.02256616 0.47799333]]
print(y_pred)
[[-0.06733171 0.18156806 0.28396574]
[ 0.04054056 -0.45898607 -0.10661648]
[-0.05162396 -0.34005141 -0.25910923]
[-0.26283177 0.01532359 0.33764032]
[ 0.2754057 0.26896232 0.23089488]]
print(acc)
[ 0. 0.33333334 0.66666669 0.33333334 0.33333334]
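To get this metric during training and validation, it can be passed to compile() like any built-in metric (a usage sketch; model, x_train, y_train, x_val, y_val stand in for your own objects):

model.compile(optimizer='adam', loss='mse', metrics=[sign_accuracy])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)

Keras will then report sign_accuracy and val_sign_accuracy after each epoch.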

How can I calculate the multi-label top-k precision with TensorFlow?

My task is to predict the five most probable tags in a sentence. Right now I've got unscaled logits from the output (dense) layer:
with tf.name_scope("output"):
    scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    predictions = tf.nn.top_k(self.scores, 5)  # should be the k highest scores
with tf.name_scope("accuracy"):
    labels = input_y  # its shape is (batch_size, num_classes)
    # calculate the top k accuracy
Now predictions look like [3,1,2,50,12] (3, 1, ... are the indices of the highest scores), while the labels are in "multi-hot" form: [0,1,0,1,1,0,...].
In Python, I can simply write:
correct_preds = [input_y[i] == 1 for i in predictions]
weighted = np.dot(correct_preds, [5, 4, 3, 2, 1])  # weighted by rank
recall = sum(correct_preds) / sum(input_y)
precision = sum(correct_preds) / len(correct_preds)
But in TensorFlow, what ops should I use to accomplish this?
Solution
I've coded up an example of how to do the calculations. All of the inputs in this example are coded as tf.constant, but of course you can substitute your own variables.
The main trick is the two matrix multiplications. The first multiplies input_y, reshaped to 2-D, by a [1x5] ones matrix called to_top5. The second multiplies correct_preds by the weighted_matrix.
Code
import tensorflow as tf

input_y = tf.constant([5, 2, 9, 1], dtype=tf.int32)
predictions = tf.constant([[9, 3, 5, 2, 1], [8, 9, 0, 6, 5], [1, 9, 3, 4, 5], [1, 2, 3, 4, 5]])

to_top5 = tf.constant([[1, 1, 1, 1, 1]], dtype=tf.int32)
input_y_for_top5 = tf.matmul(tf.reshape(input_y, [-1, 1]), to_top5)
correct_preds = tf.cast(tf.equal(input_y_for_top5, predictions), dtype=tf.float16)

weighted_matrix = tf.constant([[5.], [4.], [3.], [2.], [1.]], dtype=tf.float16)
weighted = tf.matmul(correct_preds, weighted_matrix)

recall = tf.reduce_sum(correct_preds) / tf.cast(tf.reduce_sum(input_y), tf.float16)
precision = tf.reduce_sum(correct_preds) / tf.constant(5.0, dtype=tf.float16)

# Run tensorflow and print the result
with tf.Session() as sess:
    print("\n\n=============\n\n")
    print("\ninput_y_for_top5")
    print(sess.run(input_y_for_top5))
    print("\ncorrect_preds")
    print(sess.run(correct_preds))
    print("\nweighted")
    print(sess.run(weighted))
    print("\nrecall")
    print(sess.run(recall))
    print("\nprecision")
    print(sess.run(precision))
    print("\n\n=============\n\n")
Output
=============
input_y_for_top5
[[5 5 5 5 5]
[2 2 2 2 2]
[9 9 9 9 9]
[1 1 1 1 1]]
correct_preds
[[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]]
weighted
[[ 3.]
[ 0.]
[ 4.]
[ 5.]]
recall
0.17651
precision
0.6001
=============
Summary
The above example shows a batch size of 4.
The first sample has a y_label of 5, which means that class index 5 is the correct label for that sample. Furthermore, the prediction for the first sample is [9,3,5,2,1], which means the prediction function thinks class 9 is the most likely, class 3 the next most likely, and so on.
Let's say we want an example with a batch size of 3; then use the following code:
input_y = tf.constant([5, 2, 9], dtype=tf.int32)
predictions = tf.constant([[9, 3, 5, 2, 1], [8, 9, 0, 6, 5], [1, 9, 3, 4, 5]])
If we substitute these lines into the program above, we can see that it indeed calculates everything correctly for a batch size of 3.
Inspired by wontonimo's answer above, I implemented a method using matrix ops, tf.reshape, and tf.gather. The label tensor is "multi-hot", e.g. [[0,1,0,1],[1,0,0,1]]. The prediction tensor is obtained with tf.nn.top_k and looks like [[3,1],[0,1]]. Here is the code:
top_k_pred = tf.nn.top_k(logits, 5)
tmp1 = tf.reshape(tf.range(batch_size) * num_classes, (-1, 1))
idx_incre = top_k_pred[1] + tf.concat([tmp1] * 5, 1)
correct_preds = tf.gather(tf.reshape(y_label, (-1,)), tf.reshape(idx_incre, (-1,)))
correct_preds = tf.reshape(correct_preds, (batch_size, 5))
weighted = correct_preds * [5, 4, 3, 2, 1]  # weight each rank, broadcasting over the batch
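To make the index arithmetic concrete, here is the same flatten-offset-gather trick in plain NumPy (a verification sketch with made-up labels and top-k indices):

import numpy as np

batch_size, num_classes, k = 2, 4, 2
y_label = np.array([[0, 1, 0, 1],
                    [1, 0, 0, 1]])  # multi-hot labels
top_k_idx = np.array([[3, 1],
                      [0, 1]])      # indices returned by top_k

# offset each row's indices into the flattened label vector, then gather
offsets = (np.arange(batch_size) * num_classes).reshape(-1, 1)
flat_idx = (top_k_idx + offsets).reshape(-1)
correct_preds = y_label.reshape(-1)[flat_idx].reshape(batch_size, k)
print(correct_preds)  # [[1 1]
                      #  [1 0]]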

TensorFlow binary classifier outputs predictions for 3 classes instead of 2?

When I print out the predictions, the output includes 3 separate classes 0, 1, and 2, but I only give it 2 separate classes in the training set, 0 and 1. I'm not sure why this is happening. I'm trying to elaborate on a tutorial from the TensorFlow Machine Learning Cookbook; this is based on the last example of Chapter 2, if anyone has access to it. Note, there are some errors, but that may be incompatibility with the older version from the text.
Anyway, I am trying to develop a very rigid structure when building my models so I can get it ingrained in muscle memory. I am instantiating the tf.Graph beforehand for each tf.Session of a set of computations and also setting the number of threads to use. Note, I am using TensorFlow 1.0.1 with Python 3.6.1, so the f"formatstring{var}" syntax won't work if you have an older version of Python.
Where I am getting confused is the last step of the prediction, under the # Accuracy Predictions section. Why am I getting 3 classes for my classification, and why is my accuracy so poor for such a simple classification? I am fairly new at this type of model-based machine learning, so I'm sure it's some syntax error or assumption I have made. Is there an error in my code?
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import multiprocessing

# Set number of CPUs to use
tf_max_threads = tf.ConfigProto(intra_op_parallelism_threads=multiprocessing.cpu_count())

# Data
seed = 0
size = 50
x = np.concatenate((np.random.RandomState(seed).normal(-1, 1, size),
                    np.random.RandomState(seed).normal(2, 1, size)))
y = np.concatenate((np.repeat(0, size),
                    np.repeat(1, size)))

# Containers
loss_data = list()
A_data = list()

# Graph
G_6 = tf.Graph()
n = 25

# Iterations
n_iter = 5000

# Train / Test Set
tr_ratio = 0.8
tr_idx = np.random.RandomState(seed).choice(x.size, round(tr_ratio*x.size), replace=False)
te_idx = np.array(list(set(range(x.size)) - set(tr_idx)))

# Build Graph
with G_6.as_default():
    # Placeholders
    pH_x = tf.placeholder(tf.float32, shape=[None, 1], name="pH_x")
    pH_y_hat = tf.placeholder(tf.float32, shape=[None, 1], name="pH_y_hat")
    # Train Set
    x_train = x[tr_idx].reshape(-1, 1)
    y_train = y[tr_idx].reshape(-1, 1)
    # Test Set
    x_test = x[te_idx].reshape(-1, 1)
    y_test = y[te_idx].reshape(-1, 1)
    # Model
    A = tf.Variable(tf.random_normal(mean=10, stddev=1, shape=[1], seed=seed), name="A")
    model = tf.multiply(pH_x, A)
    # Loss
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=pH_y_hat))
    with tf.Session(graph=G_6, config=tf_max_threads) as sess:
        sess.run(tf.global_variables_initializer())
        # Optimizer
        op = tf.train.GradientDescentOptimizer(0.03)
        train_step = op.minimize(loss)
        # Train linear model
        for i in range(n_iter):
            idx_random = np.random.RandomState(i).choice(x_train.size, size=n)
            x_tr = x[idx_random].reshape(-1, 1)
            y_tr = y[idx_random].reshape(-1, 1)
            sess.run(train_step, feed_dict={pH_x: x_tr, pH_y_hat: y_tr})
            # Iterations
            A_iter = sess.run(A)[0]
            loss_iter = sess.run(loss, feed_dict={pH_x: x_tr, pH_y_hat: y_tr}).mean()
            # Append
            loss_data.append(loss_iter)
            A_data.append(A_iter)
            # Log (to_precision is a user-defined formatting helper, not shown)
            if (i + 1) % 1000 == 0:
                print(f"Step #{i + 1}:\tA = {A_iter}", f"Loss = {to_precision(loss_iter)}", sep="\t")
        print()
        # Accuracy Predictions
        A_result = sess.run(A)
        y_ = tf.squeeze(tf.round(tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=pH_y_hat)))
        correct_predictions = tf.equal(y_, pH_y_hat)
        accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
        print(sess.run(y_, feed_dict={pH_x: x_train, pH_y_hat: y_train}))
        print("Training:",
              f"Accuracy = {sess.run(accuracy, feed_dict={pH_x: x_train, pH_y_hat: y_train})}",
              f"Shape = {x_train.shape}", sep="\t")
        print("Testing:",
              f"Accuracy = {sess.run(accuracy, feed_dict={pH_x: x_test, pH_y_hat: y_test})}",
              f"Shape = {x_test.shape}", sep="\t")

# Plot path
with plt.style.context("seaborn-whitegrid"):
    fig, ax = plt.subplots(nrows=3, figsize=(6, 6))
    pd.Series(loss_data).plot(ax=ax[0], label="loss", legend=True)
    pd.Series(A_data).plot(ax=ax[1], color="red", label="A", legend=True)
    ax[2].hist(x[:size], np.linspace(-5, 5), label="class_0", color="red")
    ax[2].hist(x[size:], np.linspace(-5, 5), label="class_1", color="blue")
    alphas = np.linspace(0, 0.5, len(A_data))
    for i in range(0, len(A_data), 100):
        alpha = alphas[i]
        a = A_data[i]
        ax[2].axvline(a, alpha=alpha, linestyle="--", color="black")
    ax[2].legend(loc="upper right")
    fig.suptitle("training-process", fontsize=15, y=0.95)
Output Results:
Step #1000: A = 6.72 Loss = 1.13
Step #2000: A = 3.93 Loss = 0.58
Step #3000: A = 2.12 Loss = 0.319
Step #4000: A = 1.63 Loss = 0.331
Step #5000: A = 1.58 Loss = 0.222
[ 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 2.
0. 0. 2. 0. 2. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 0. 0. 1. 0.
1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0. 0. 0.]
Training: Accuracy = 0.475 Shape = (80, 1)
Testing: Accuracy = 0.5 Shape = (20, 1)
Your model doesn't do classification
You have a linear regression model: your output variable (model = tf.multiply(pH_x, A)) yields, for each input, a single scalar value with an arbitrary range. That's generally what you'd have for a model that predicts some numeric value, not for a classifier.
Afterwards, you treat it as if it contained a typical n-ary classifier output (e.g. by passing it to sigmoid_cross_entropy_with_logits), but it does not match that function's expectations: in that case, the model should output multiple values per input datapoint (e.g. 2 in your case), each corresponding to the unnormalized score of one class, often then passed through a softmax function to normalize them.
Alternatively, you may want a binary classifier model that outputs a single value of 0 or 1 depending on the class; in that case, you want something like the logistic function after the multiplication, and that would need a different loss function, something like a simple mean squared difference rather than sigmoid_cross_entropy_with_logits.
Currently, the model as written looks like a mash of two different, incompatible tutorials.
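As an illustration of the binary route, here is a minimal sketch in the question's TF 1.x style (my illustration, with hypothetical placeholder names; note it keeps sigmoid_cross_entropy_with_logits, which already handles a single logit per example, rather than swapping in a squared-error loss). The key point is that predictions come from rounding sigmoid(logits), never from rounding the loss value, so they can only ever be 0 or 1:

import tensorflow as tf

x_ph = tf.placeholder(tf.float32, shape=[None, 1])
y_ph = tf.placeholder(tf.float32, shape=[None, 1])
A = tf.Variable(tf.random_normal(shape=[1]))

logits = x_ph * A                                   # raw score, arbitrary range
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y_ph))
train_step = tf.train.GradientDescentOptimizer(0.03).minimize(loss)

y_pred = tf.round(tf.nn.sigmoid(logits))            # strictly 0 or 1
accuracy = tf.reduce_mean(tf.cast(tf.equal(y_pred, y_ph), tf.float32))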
