Costumizing loss function in keras with condition - python

I want to setup a keras model (tensorflow backend) for a multiclassification problem with 4 different classes. I have both labeled and unlabeled data.
I have worked out the case in which I only train with the labeled data and my model looks something like this:
# create model
inputs = keras.Input(shape=(len(config.variables), ))
X = layers.Dense(units=200, activation="relu")(inputs)
output = layers.Dense(units=4, activation="softmax", name="output")(X)
model = keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer=optimizers.Adam(1e-4), loss=loss_function, metrics=["accuracy"])
# train model
model.fit(
x=train_data,
y=train_class_labels,
batch_size=200,
epochs=200,
verbose=2,
validation_split=0.2,
sample_weight = class_weights
)
I have functioning models with to different losses namely categorical_crossentropy and sparse_categorical_crossentropy, and depending on the loss function my train_class_labels where in one-hot representation (e.g. [ [0,1,0,0], [0,0,0,1], ...]) or in the integer representation (e.g. [0,0,2,1,0,3, ...]) and everything worked fine. class_weights is some weight vector ([0.78, 1,34, ...])
Now for my further plans I need to include the unlabeled data in the training process but I need it to be ignored by the loss function.
What I have tried:
setting the labels from the unlabeled data to [0,0,0,0] when using categorical_crossentropy as a loss, because i thought then my unlabeled data would be ignored by the loss function. Somehow this changed the predictions after training.
I also tried setting the weights from the unlabeled data to 0 but that did have an effect either
I concluded that I need to somehow mark me unlabeled data and customize my loss function so that it can be told to ignore those samples. Something like
def custom_loss(y_true, y_pred):
if y_true == labeled data:
return normal loss function
if y_true == unlabeled data:
return 0
Those are some snippets that I have found but they do not seem to work:
def custom_loss(y_true, y_pred):
loss = losses.sparse_categorical_crossentropy(y_true, y_pred)
return K.switch(K.flatten(K.equal(y_true, -1)), K.zeros_like(loss), loss)
def custom_loss2(y_true, y_pred):
idx = tf.not_equal(y_true, -1)
y_true = tf.boolean_mask(y_true, idx)
y_pred = tf.boolean_mask(y_pred, idx)
return losses.sparse_categorical_crossentropy(y_true, y_pred)
In those examples I set the labels from the unlabeled data to -1 so train_class_labels would look something like this: [0,-1,2,0,3, ... ]
But when using the first loss function I just get Nans and when using the second one I get the following error:
Invalid argument: logits and labels must have the same first dimension, got logits shape [1,5000] and labels shape [5000]

I think that setting the labels to [0,0,0,0] would be just fine. Because the loss is calculated by sum of the log losses of your instances per class (in your case the loss would be 0 for instances with no label).
I don't understand why you are inserting non labeled data in your training in a supervised setting.
I think that the differences that you obtain are due to the batch size and to the gradient step. If there are instances that do not contribute to the gradient descent, the loss calculated would be different than before, and then you get the difference in prediction.
Basically there would be less informative instances per batch.
If you use as batch size the size of all the dataset there would be no difference from a previous training without the unlabeled instances (but always with a training with batch size = size of the dataset)

Related

Python Keras weighted accuracy metric is much different than regular accuracy metric

I am training a Transformer model for Time Series Classification. To check the results, I am using a Baseline model which uses the previous target as the next prediction. I am using a data generator to handle the data. The dataset is imbalanced so I am using sample_weights to deal with this, so the data generator outputs 3 variables: inputs, labels, sample_weights.
I have tried setting the sample_weights to all 1's in order to test that things are working well. The baseline model produces identical results with weighted accuracy and regular accuracy, which is expected. However for the Transformer, I am seeing completely different values for the weighted accuracy and regular accuracy, even though the sample_weights are all 1's. Since the sample weights are all 1's I would expect the weighted accuracy to be the same as the regular accuracy, Why are these different?
It seems like the regular metric is normalized to 1, while the weighted metric is normalized to 100, but why would these be different in this case?
Code:
Function from data generator class to get sample weights
def get_sample_weights(self, inputs, labels):
''' Obtains sample weights for any number of classes.
NOTE: sample_weights pertain a weighting to each label
'''
# get initial sample weights
sample_weights = tf.ones_like(labels, dtype=tf.float64)
# get classes and counts for each one
class_counts = np.bincount(self.train_df.price_change)
total = class_counts.sum()
n_classes = len(class_counts)
weights = tf.constant([1, 1, 1], dtype=tf.float64)
for idx, count in enumerate(class_counts):
# compute weight
# weight = total / (n_classes*count)
weight = weights[idx]
# update weight value
sample_weights = tf.where(tf.equal(labels, float(idx)),
weight,
sample_weights)
return inputs, labels, sample_weights
get baseline results
baseline = Baseline(label_index=single_gen.column_indices['price_change'])
baseline.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'],
weighted_metrics=['accuracy'])
train_metrics = baseline.evaluate(single_gen.train))
val_metrics = baseline.evaluate(single_gen.valid))
Get results with Transformer
transformer_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
metrics=['sparse_categorical_accuracy'],
weighted_metrics=['sparse_categorical_accuracy'])
history = transformer_model.fit(aapl_gen.train,
epochs=2,
validation_data=aapl_gen.valid)
train_metrics = transformer_model.evaluate(data_gen.train)
val_metrics = transformer_model.evaluate(data_gen.valid)

How to make predictions on new dataset with tensorflow's gradient tape

While I'm able to understand how to use model.fit(x_train, y_train), I can't figure out how to make predictions on new data using tensorflow's gradient tape. My github repository with runnable code (up to an error) can be found here. What is currently working is that I get the trained model "network_output", however it appears that with gradient tape, argmax is being used on the model itself, where I'm used to model.fit() taking the test data as an input:
network_output = trained_network(input_images,input_number)
preds = np.argmax(network_output, axis=1)
Where "input_images" is an ndarray: (20,3,3,1) and "input_number" is an ndarray: (20,5).
Now I'm taking network_output as the trained model and would like to use it to predict similarly typed data of test_images, and test_number respectively.
The error 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'predict' here:
predicted_number = network_output.predict(test_images)
Which is because I don't know how to use the tape to make predictions. However once the prediction works I would guess I can compare the resulting "predicted_number" against the "test_number" as would usually be done using the model.fit method.
acc = 0
for i in range(len(test_images)):
if (predicted_number[i] == test_number[i]):
acc += 1
print("Accuracy: ", acc / len(input_images) * 100, "%")
In order to obtain prediction I usually iterate through batches manually like this:
predictions = []
for batch in range(num_batch):
logits = trained_network(x_test[batch * batch_size: (batch + 1) * batch_size], training=False)
# first obtain probabilities
# (if the last layer of the network has no activation, otherwise skip the softmax here)
prob = tf.nn.softmax(logits)
# putting back together predictions for all batches
predictions.extend(tf.argmax(input=prob, axis=1))
If you don't have a lot of data you can skip the loop, this is faster than using predict because you directly invoke the __call__ method of the model:
logits = trained_network(x_test, training=False)
prob = tf.nn.softmax(logits)
predictions = tf.argmax(input=prob, axis=1)
Finally you could also use predict. In this case the batches are handled automatically. It is easier to use when you have lots of data since you don't have to create a loop to interate through batches. The result is a numpy array of predictions. In can be used like this:
predictions = trained_network.predict(x_test) # you can set a batch_size if you want
What you're doing wrong is this part:
network_output = trained_network(input_images,input_number)
predicted_number = network_output.predict(test_images)
You have to call predict directly on your model trained_network.

Multiple outcome values for simple neural network. What activate function to use

Hi I'm trying to build a simple neural network with tensorflow, where I give the model the training_data, which contains the standard values and i give it the target_data, which is the result I want it to have if the predicted value is near one of those numbers.
For example, if I give the y_test a value of 3.5, the model would predict and give a number close to 4. So the condition would say it was a lightsmoker. I searched a bit for activation functions and I learned I can't use sigmoid for what I want to do. I'm quite new on this matter. What i've done so far it's by error and trial.
import random
import tensorflow as tf
import numpy as np
training_data=[]
for i in range(0,5):
training_data.append([random.uniform(0,0.2944)])
for i in range(0,5):
training_data.append([random.uniform(0.2944,1.7394)])
for i in range(0,5):
training_data.append([random.uniform(1.7394,3.2394)])
for i in range(0,5):
training_data.append([random.uniform(3.2394,6)])
target_data=[]
for i in range(0,5):
target_data.append([1])
for i in range(0,5):
target_data.append([2])
for i in range(0,5):
target_data.append([3])
for i in range(0,5):
target_data.append([4])
y_test= np.array([100])
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(len(target_data),input_dim=1,activation='softmax'))
model.add(tf.keras.layers.Dense(1,activation='relu'))
model.compile( loss='mean_squared_error',
optimizer='adam',
metrics=['accuracy'])
training_data = np.asarray(training_data)
target_data = np.asarray(target_data)
model.fit(training_data, target_data, epochs=50, verbose=0)
target_pred= model.predict(y_test)
target_pred=float(target_pred)
print("X=%s, Predicted=%s" % (y_test, target_pred))
if( 0<= target_pred <= 1.5):
print("\nNon-Smoker")
elif(1.5<= target_pred <2.5):
print("\nPassive Smoker")
elif(2.5<= target_pred <3.5 ):
print("Lghtsmoker")
else:
print("Smoker\n")
Here is a helpful guide to using activation functions in the final layer as well as corresponding losses for different type of problems.
In your case, I am assuming you are working with a regression task with arbitrary values (any float value as output, not restricted between 0 to 1 or -1 to 1). So, skip the activation function and keep mse or mean_squared_error as your loss function.
EDIT:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(3,input_shape=(1,),activation='relu'))
model.add(tf.keras.layers.Dense(1))
You are defining your problem as a regression problem where the result of model.predict is a linear value. For that kind of situation the last layer in your model is a linear layer that does not have an activation function. For this kind of problem your loss as mse is fine. Now you could elect to define your problem as a classification problem. Where you have 3 classes, Non-Smoker, Passive-Smoker and Light smoker. Now in that case, your target data in training is not a number in the numerical sense but an integer that indicates which class the training sample represents. For example you could have Non_Smoker with the label 0, Passive_Smoker with the label 1 and Light_Smoker with the label 2. Now the last layer in your model would use a softmax activation function. In model.compile your loss would be sparse_categorical_crossentropy because your labels are integers. If you one-hot encode your labels, for example Non_Smoker coded as 100, Light_Smoker as 010 and Passive_Smoker coded as 001 then your loss fuction would be categorical_cross_entropy. Now when you ran model.predict on a test sample it will produce a list containing 3 probabilities. The first in the list is the probability for class 0 - Non_Smoker, second is the probability for class 1 Light Smoker and the third is the probability of the third class Passive_Smoker. Now what you do is use np.argmax to find which index has the highest probability value and that is then the model's prediction.

keras model.evaluate() does not show loss

I've created a neural network of the following form in keras:
from keras.layers import Dense, Activation, Input
from keras import Model
input_dim_v = 3
hidden_dims=[100, 100, 100]
inputs = Input(shape=(input_dim_v,))
net = inputs
for h_dim in hidden_dims:
net = Dense(h_dim)(net)
net = Activation("elu")(net)
outputs = Dense(self.output_dim_v)(net)
model_v = Model(inputs=inputs, outputs=outputs)
model_v.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
Later, I train it on single examples using model_v.train_on_batch(X[i],y[i]).
To test, whether the neural network is becoming a better function approximator, I wanted to evaluate the model on the accumulated X and y (in my case, X and y grow over time) periodically. However, when I call model_v.evaluate(X, y), only the characteristic progress bars appear in the console, but neither the loss value nor the mse-metric (which are the same in this case) are printed.
How can I change that?
The loss and metric values are not shown in the progress bar of evaluate() method. Instead, they are returned as the output of the evaluate() method and therefore you can print them:
for i in n_iter:
# ... get the i-th batch or sample
# ... train the model using the `train_on_batch` method
# evaluate the model on whole or part of test data
loss_metric = model.evaluate(test_data, test_labels)
print(loss_metric)
According to the documentation, if your model has multiple outputs and/or metrics, you can use model.metric_names attribute to find out what the values in loss_metric correspond to.

Get top-k predictions from tensorflow

I am relatively new in machine learning especially when it comes to implementing algorithms. I am using python and tensorflow library to implement a neural network to train on a dataset which has about 20 classes. I am able to train and get predictions successfully but I have a question,
Is it possible to get top k classes along with their probabilities using tensorflow instead of just a single prediction?
If it is possible how can this be done? Thanks for your guidance.
Update 01:
I am adding code of what I am doing. So I build a neural network with 3 layers having tanh, sigmoid, & sigmoid respectively as activation functions for the hidden layers and softmax for output layer. The code for training and prediction is as follows:
y_pred = None
with tf.Session() as sess:
sess.run(init)
for epoch in range(training_epochs):
# running the training_epoch numbered epoch
_,cost = sess.run([optimizer,cost_function],feed_dict={X:tr_features,Y:tr_labels})
cost_history = np.append(cost_history,cost)
# predict results based on the trained model
y_pred = sess.run(tf.argmax(y_,1),feed_dict={X: ts_features})
Right now y_pred is a list of class labels for each test example of ts_features. But instead of getting 1 single class label for each test example I am hoping to get top-k predictions for each example each of the k-predictions accompanied by some kind of probability.
Using tf.nn.top_k():
top_k_values, top_k_indices = tf.nn.top_k(predictions, k=k)
If predictions is a vector of probabilities per class (i.e. predictions[i] = prediction probability for class i), then top_k_values will contain the k highest probabilities in predictions, and top_k_indices will contain the indices of these probabilities, i.e. the corresponding classes.
Supposing that in your code, y_ is the vector of predicted probabilities per class:
k = 3 # replace with your value
# Instead of `y_pred`:
y_k_probs, y_k_pred = sess.run(
tf.nn.top_k(y_, k=k), feed_dict={X: ts_features})

Categories

Resources