Neural Network - convert model output to the predicted target class - python

I am building a neural network from scratch in python.
In the dataset I am using for testing, the features are all numeric (57 features) and the target variable is categorical (10 classes, already converted to integers 0-9, but I can use another encoding).
Everything seems to be working, except that I am quite stuck on how to compare my model output with the y_true value to compute the error. I have 10 classes for the target variable, and what I get as output is a 10-element array for each observation instead of a single class prediction per sample.
Can someone give me a simple way to convert my output to a single y_predicted value that's comparable with the y_true?
I am trying not to use any libraries except for Numpy and pandas, so using Keras SparseCategoricalCrossentropy() is not an option.

So neural networks don't return class labels as such. They return the probability of the input belonging to each class. Naturally, these probabilities sum to 1.
Suppose you have 4 classes: A, B, C and D.
If the true label of an input is B, the NN output will not look like this:
output = [0, 1, 0, 0]
It will look like this:
output = [0.1, 0.6, 0.2, 0.1]
Where the chance of the input belonging to class A is 10%, class B is 60%, class C is 20% and class D is 10%.
You can easily make sure this array sums to 1 by dividing each element by the sum of all the elements. Then you can use NumPy's argmax to get the index of the highest probability.
If you don't need the probabilities, you can skip the normalization and apply NumPy's argmax directly to get the index of the class with the highest score.
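For example, a minimal NumPy sketch of both steps (the output values and labels here are made up):

import numpy as np

# raw network outputs for 3 samples and 4 classes (made-up numbers)
outputs = np.array([[0.1, 0.6, 0.2, 0.1],
                    [2.0, 1.0, 0.5, 0.5],
                    [0.3, 0.3, 3.4, 1.0]])

# optional: normalize each row so it sums to 1 (only needed if you want probabilities)
probs = outputs / outputs.sum(axis=1, keepdims=True)

# predicted class index per sample (same result with or without normalization)
y_pred = np.argmax(probs, axis=1)

# y_pred is now directly comparable with the integer-encoded true labels
y_true = np.array([1, 0, 2])
accuracy = np.mean(y_pred == y_true)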

Related

What are the actual class labels while using SparseCategoricalCrossEntropy loss for multiclass classification in keras?

I am trying to use: [SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy) for multiclass classification
This gives me the last dimension as the number of classes (N_CLASSES), but I want to retrieve the actual class labels from the predictions.
Basically, if I have 5 classes (N_CLASSES=5), then I have 5 columns, each containing the probability of a class. But I don't know which column belongs to which actual label. How do I retrieve the actual class labels?
For example, if my actual class labels are [1.03, 2.07, -2.09, -974, 366], then from the output of shape (None, 5) how do I know which column represents which class?
Note: I cannot use CategoricalCrossEntropy and pass in the one-hot encoded actual target representation due to memory issues.
Any help will be really appreciated
It is actually pretty simple. Let's assume your model outputs the predictions = [1.03, 2.07, -2.09, -974, 366]. These 5 numbers represent your model's confidence that your input data corresponds to each of the 5 different classes. If you then apply np.argmax to your predictions, which returns the index of the max value in predictions:
np.argmax(predictions)
you will get the index 4. Assuming that each label in your dataset is an integer between 0 and 4, and since you are using SparseCategoricalCrossEntropy, you can say that your model is most confident that your input data belongs to class 4 (whatever class 4 may be). I hope you get the idea.
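A minimal sketch of that idea, reusing the numbers above (the class_names list is purely hypothetical, to show how you could map the index back to a readable label you keep yourself):

import numpy as np

predictions = np.array([1.03, 2.07, -2.09, -974, 366])

# with sparse integer labels 0..N_CLASSES-1, the argmax index *is* the predicted label
predicted_label = np.argmax(predictions)   # -> 4

# optional: map the integer label back to a name you defined yourself (hypothetical)
class_names = ["class_0", "class_1", "class_2", "class_3", "class_4"]
print(predicted_label, class_names[predicted_label])

For a whole batch of predictions of shape (None, 5), np.argmax(predictions, axis=-1) gives one label per row.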

Autoencoder for Tabular Data with Discrete Values

I want to use an autoencoder for dimension reduction in Keras. The input is a table whose columns contain discrete values 0, 1, 2, 3, 4 (each of these numbers represents a category). Each subject has a label 0/1 to show sick/healthy. Now I have two questions:
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
I don't know if this kind of input variables need normalization (and if the answer is yes, how?)
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
The activation in the last layer should be sigmoid, and you should use the binary_crossentropy loss function for training.
I don't know if this kind of input variables need normalization (and if the answer is yes, how?)
It depends on the nature of the discrete values you mentioned. As you know, the inputs to a neural network represent the "intensity" of each neuron; higher values mean the neuron is more active. So categorical values as input to a NN only make sense if they map to a continuous range. For example, if excellent=3, good=2, bad=1, terrible=0, it's okay to feed these values to a NN because it makes sense to calculate f(wx+b) (the intensity of the neuron), as a value of 1.5 means somewhere between bad and good.
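For the ordinal case, a simple mapping is enough; a sketch with pandas (the column name and categories are made up):

import pandas as pd

df = pd.DataFrame({"rating": ["good", "terrible", "excellent", "bad"]})

# ordered categories -> integers that preserve the ordering
ordinal_map = {"terrible": 0, "bad": 1, "good": 2, "excellent": 3}
df["rating_encoded"] = df["rating"].map(ordinal_map)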
However, if the categorical values are purely nominal, with no relationship between them (for example: apple=1, orange=2, banana=3), it really doesn't make sense to calculate f(wx+b). In this case, what would a value of 1.5 mean? For this type of data as input to a NN you should convert them to a binary (one-hot) encoding. For example, if you have only 3 fruits you can encode them this way:
apple = [1, 0, 0]
orange = [0, 1, 0]
banana = [0, 0, 1]
For this binary conversion, Keras has a utility function: to_categorical.
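A short sketch of that conversion (note that to_categorical expects integer codes starting at 0, so the fruits are re-coded as 0, 1, 2 here; the exact import path can differ between Keras versions):

import numpy as np
from tensorflow.keras.utils import to_categorical

# integer-coded fruits: apple=0, orange=1, banana=2
labels = np.array([0, 2, 1, 1])

one_hot = to_categorical(labels, num_classes=3)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 1. 0.]]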

What does predict (Keras) return?

I mean, I know what it returns in general, but what I really don't know is how to interpret it in a case like the following. It gives me this output: [0.238 0.762] in a model which has only binary outputs [0, 1].
So I know these are the probabilities of each class for the given input, but which value corresponds to which class? [0, 1] or [1, 0]?
Predict returns the neural network's outputs at the last layer. These are not necessarily probabilities; it simply depends on what you used in your neural network architecture. The simple answer is that you can run
model.predict(x) > 0.5
That should work in most cases. The NN will optimize towards the best solution, but all of the values within are continuous, so unless your problem is very easily separable you will rarely get an output that is exactly binary.
To answer your question: [0.238 0.762], unless the model was trained strangely, most likely corresponds to [0, 1].
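A small sketch covering both shapes of output (assuming model is already trained and x is a batch of inputs):

import numpy as np

probs = model.predict(x)

if probs.shape[-1] == 1:
    # single sigmoid output of shape (batch, 1): threshold at 0.5
    y_pred = (probs > 0.5).astype(int).ravel()
else:
    # two-column output like [0.238, 0.762]: pick the column with the larger value
    y_pred = np.argmax(probs, axis=-1)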
For binary classification, the first column is the probability of class 0 and the second is the probability of class 1. You can check that in the Keras code base, L257-L260 (here), from keras.wrappers.scikit_learn.KerasClassifier:
# check if binary classification
if probs.shape[1] == 1:
    # first column is probability of class 0 and second is of class 1
    probs = np.hstack([1 - probs, probs])
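The same conversion in isolation, with a made-up single-column prediction:

import numpy as np

# predicted probability of class 1, shape (batch, 1)
probs = np.array([[0.762], [0.10]])

# stack [P(class 0), P(class 1)] side by side, as the wrapper does
probs = np.hstack([1 - probs, probs])
# [[0.238 0.762]
#  [0.9   0.1  ]]
np.argmax(probs, axis=1)   # -> [1, 0]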

TensorFlow: Sample Integers from Gumbel Softmax

I am implementing a program to sample integers from a categorical distribution, where each integer is associated with a probability. I need to ensure that this program is differentiable, so that back propagation can be applied. I found tf.contrib.distributions.RelaxedOneHotCategorical which is very close to what I am trying to achieve.
However, the sample method of this class returns a one-hot vector instead of an integer. How can I write a program that is both differentiable and returns an integer/scalar instead of a vector?
The reason that RelaxedOneHotCategorical is actually differentiable is connected to the fact that it returns a softmax vector of floats instead of the argmax int index. If all you want is the index of the maximal element, you might as well use Categorical.
You can take the dot product of the relaxed one-hot vector with a vector [1 2 3 4 ... n]. The result gives you the desired scalar.
For instance, if your one-hot vector is [0 0 0 1], then dot([0 0 0 1], [1 2 3 4]) gives you 4, which is what you are looking for.
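In TensorFlow that dot product could look roughly like this (a sketch; note the index vector starts at 1, so the result is the 1-based position of the active class):

import tensorflow as tf

relaxed_sample = tf.constant([0.01, 0.02, 0.03, 0.94])   # nearly one-hot sample
indices = tf.range(1, 5, dtype=tf.float32)                # [1., 2., 3., 4.]

# differentiable "soft index": a weighted average of the positions
soft_index = tf.reduce_sum(relaxed_sample * indices)      # close to 4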
You can't get what you want in a differentiable manner because argmax isn't differentiable, which is why the Gumbel-Softmax distribution was created in the first place. It allows you, for instance, to use the outputs of a language model as inputs to a discriminator in a generative adversarial network, because the activation approaches a one-hot vector as the temperature decreases.
If you simply need to retrieve the maximal element at inference or testing time, you can use tf.math.argmax. But there's no way to do that in a differentiable manner.

Tensorflow convert predicted values to binary

I created a neural network that is supposed to classify a person as making either more or less than 50k. When I output a prediction, I get values like [ 2.06434059 -2.0643425 ], but I need them to be in the form [1, 0] or [0, 1]. Is there any TensorFlow function that will convert the predictions, or do I have to do it manually?
Thanks in advance
Take the softmax of your output, e.g.:
output = tf.nn.softmax(input)
It will convert the values to a probability distribution where they sum to one.
Otherwise, if you just want a hard {1, 0} output, just take the position of the larger value.
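A sketch of both options, using the raw values from the question (TF 2.x style):

import tensorflow as tf

logits = tf.constant([[2.06434059, -2.0643425]])

# probability distribution that sums to 1, roughly [[0.984, 0.016]]
probs = tf.nn.softmax(logits)

# hard prediction: index of the larger value, then one-hot if you need the [1, 0] form
pred = tf.argmax(logits, axis=1)        # -> [0]
one_hot = tf.one_hot(pred, depth=2)     # -> [[1., 0.]]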
