I created a neural network that is supposed to classify a person as either making more than 50k or less. When I output a prediction, I get values like [ 2.06434059 -2.0643425 ]. But I need them to be in [1, 0] or [0, 1]. Is there any tensorflow function that will convert the predictions, or do I have to do it manually?
Take the softmax of your output, e.g.:
output = tf.nn.softmax(input)
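This converts the values into a probability distribution whose entries sum to one.
Otherwise, if you just want [1, 0] or [0, 1], take the argmax of the output (the index of the largest value) and one-hot encode it. You can skip the softmax for this, because it does not change which value is largest.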
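A minimal NumPy sketch of both options, using the raw output from the question. The helper names softmax and to_one_hot are hypothetical. In TensorFlow the equivalent would be tf.nn.softmax for the probabilities and something like tf.one_hot(tf.argmax(logits, axis=-1), depth=2) for the hard labels.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def to_one_hot(logits):
    # argmax gives the index of the largest score; softmax would not change it
    idx = np.argmax(logits, axis=-1)
    # Row idx of the identity matrix is the one-hot vector for that index
    return np.eye(logits.shape[-1], dtype=int)[idx]

preds = np.array([[2.06434059, -2.0643425]])  # raw output from the question
probs = softmax(preds)       # roughly [[0.98, 0.02]], sums to 1
one_hot = to_one_hot(preds)  # [[1, 0]]
print(probs, one_hot)
```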
Here: https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/, in the section "LSTM Network for Regression", the author inverts the predictions inside the LSTM-RNN code. If I remove those lines of code, the resulting predictions are useless; the model does not seem to predict anything. So my question is: what does the code that inverts the predictions actually do, and why is it used?
In time series forecasting, raw data often have large values. In load forecasting, for example, the load at each moment is typically in the tens of thousands. To speed up the convergence of the model, we generally normalize the original data, for example with MinMaxScaler, which rescales all values to the range [0, 1].
Note that after the data are normalized, the model's predictions will also lie in [0, 1] (if the model converges well). These predictions cannot be used directly, because real load values are not in [0, 1], so we need to undo the normalization on the prediction results, i.e. call inverse_transform. If you remove those lines, the predictions are not wrong; they are simply still on the normalized scale, which is why they look meaningless next to the original data.
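A small sketch of the normalize / inverse-normalize round trip. To stay dependency-free it uses a hypothetical SimpleMinMaxScaler class that mimics the fit, transform and inverse_transform methods of scikit-learn's MinMaxScaler with its default [0, 1] range. The load values and the "predictions" are made up for illustration.

```python
import numpy as np

class SimpleMinMaxScaler:
    """Hypothetical stand-in for sklearn's MinMaxScaler (default range [0, 1])."""

    def fit(self, x):
        self.min_ = x.min(axis=0)
        self.range_ = x.max(axis=0) - self.min_
        return self

    def transform(self, x):
        # Map each column linearly onto [0, 1]
        return (x - self.min_) / self.range_

    def inverse_transform(self, x_scaled):
        # Undo the mapping: back to the original units (e.g. load values)
        return x_scaled * self.range_ + self.min_

# Made-up load values in the tens of thousands, one column
load = np.array([[31250.0], [28900.0], [35400.0], [30100.0]])
scaler = SimpleMinMaxScaler().fit(load)
scaled = scaler.transform(load)               # what the model is trained on

# Pretend these came out of model.predict on scaled inputs
pred_scaled = np.array([[0.5], [0.9]])
pred = scaler.inverse_transform(pred_scaled)  # [[32150.], [34750.]]
print(pred)
```

In real code the same fitted scaler (the one used on the target values) is the one whose inverse_transform you call on the model output.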
I am building a neural network from scratch in Python.
In the dataset I am using for testing, the features are all numeric (57 features) and the target variable is categorical (10 classes, already converted to integers 0-9, but I can use another encoding).
Everything seems to be working, except that I am stuck on how to compare my model output with the y_true value to compute the error. I have 10 classes for the target variable, and what I get as output is a 10-element array for each observation instead of a single value/class for each sample.
Can someone give me a simple way to convert my output to a single y_predicted value that is comparable with y_true?
I am trying not to use any libraries except NumPy and pandas, so using Keras SparseCategoricalCrossentropy() is not an option.
Neural networks don't return class labels as such. A classifier returns a score for each class, and when the last layer is a softmax, these scores are the probabilities of the input belonging to each class and sum to 1.
Suppose you have 4 classes: A, B, C and D.
If the true label of an input is B, the NN output will not look like this:
output = [0, 1, 0, 0]
It will look like this:
output = [0.1, 0.6, 0.2, 0.1]
Here the predicted chance of the input belonging to class A is 10%, class B is 60%, class C is 20% and class D is 10%.
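If your last layer does not already produce probabilities, apply a softmax: exponentiate each element and divide by the sum of the exponentials, so that the array sums to 1. (Simply adding the elements and dividing each one by the sum only works when all outputs are non-negative.) Then use numpy argmax to get the index of the highest probability; that index is your y_predicted and can be compared directly with y_true.
If you don't need the probabilities, you can skip the normalization and apply numpy argmax directly to the raw output, since softmax does not change which element is largest.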
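A from-scratch sketch using only NumPy, on made-up scores for 4 classes: softmax, argmax to get y_pred, accuracy against y_true, and the mean negative log-probability of the true class, which is essentially what Keras' SparseCategoricalCrossentropy computes for integer labels. The last lines give the gradient of that loss with respect to the raw scores, the usual starting point for backpropagation when softmax and cross-entropy are combined.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax; subtracting the row max avoids overflow in exp()
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up raw outputs: 3 samples x 4 classes, plus integer labels 0..3
scores = np.array([[0.2, 2.0, 0.5, -1.0],
                   [1.5, 0.1, 0.3, 0.0],
                   [0.0, 0.2, 0.1, 3.0]])
y_true = np.array([1, 0, 2])

probs = softmax(scores)
y_pred = probs.argmax(axis=1)          # [1, 0, 3], directly comparable with y_true
accuracy = np.mean(y_pred == y_true)   # 2 of 3 correct

# Sparse categorical cross-entropy: mean of -log(probability of the true class)
n = len(y_true)
eps = 1e-12                            # guards against log(0)
loss = -np.mean(np.log(probs[np.arange(n), y_true] + eps))

# Gradient of that loss w.r.t. the raw scores (softmax + cross-entropy)
grad = probs.copy()
grad[np.arange(n), y_true] -= 1.0
grad /= n
print(y_pred, accuracy, loss)
```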
I want to use an autoencoder for dimension reduction in Keras. The input is a table whose columns contain discrete values 0, 1, 2, 3, 4 (each number represents a category). Each subject has a label 0/1 indicating sick/healthy. I have two questions:
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
Does this kind of input variable need normalization (and if so, how)?
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
Use a sigmoid activation in the last layer together with the binary_crossentropy loss for training. This assumes the reconstructed inputs lie in [0, 1], e.g. after the one-hot encoding described below.
Does this kind of input variable need normalization (and if so, how)?
It depends on the nature of the discrete values you mentioned. Inputs to a neural network represent the "intensity" of each input neuron; higher values mean the neuron is more active. So categorical values as input to a NN only make sense if they are ordered and map onto a continuous range. For example, if excellent=3, good=2, bad=1, terrible=0, it's okay to feed these values to a NN, because calculating f(wx+b) (the intensity of the neuron) is meaningful: a value of 1.5 means somewhere between bad and good.
However, if the categorical values are purely nominal, with no ordering between them (for example apple=1, orange=2, banana=3), it doesn't really make sense to calculate f(wx+b). In this case, what would a value of 1.5 mean? For this type of data, you should convert the values to a binary (one-hot) encoding before feeding them to a NN. For example, if you have only 3 fruits you can encode them this way:
apple = [1, 0, 0]
orange = [0, 1, 0]
banana = [0, 0, 1]
For this binary conversion, Keras has a utility function: to_categorical.
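A NumPy sketch of both encodings for a table like the one in the question (the table values are made up). For integer codes, np.eye(n)[codes] does the same job as keras.utils.to_categorical; the one-hot vectors of each row are then flattened so that each subject becomes one input vector for the autoencoder. With inputs in [0, 1] like these, a sigmoid last layer with binary_crossentropy is a natural fit.

```python
import numpy as np

# Made-up table: 3 subjects x 2 categorical columns, values in 0..4
table = np.array([[0, 3],
                  [4, 1],
                  [2, 2]])
n_categories = 5

# Nominal data: one-hot each cell, then flatten each subject's row
one_hot = np.eye(n_categories, dtype=np.float32)[table]  # shape (3, 2, 5)
features = one_hot.reshape(len(table), -1)               # shape (3, 10)

# Ordinal data: if the codes are ordered, scaling to [0, 1] is enough
ordinal = table / (n_categories - 1)
print(features)
print(ordinal)
```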
I know what predict returns in general, but what I really don't understand is the following example: it gives me the output [0.238 0.762] for a model that has only binary outputs [0, 1].
So I know these are the probabilities of each class for the given input, but which value corresponds to which class? Is it [0, 1] or [1, 0]?
predict returns the outputs of the network's last layer. These are not necessarily probabilities; that depends on what you used in your architecture (for example, a softmax or sigmoid last layer). The simple answer is that you can run
model.predict(x) > 0.5
That should work in most cases. The NN optimizes toward the best solution, but all of the values within it are continuous, so unless your problem is very easily separable you will rarely get an output that is fully binary.
To answer your question: unless the model was trained strangely, [0.238 0.762] most likely means [0, 1].
For binary classification, the first column is the probability of class 0 and the second is the probability of class 1. You can check this in the Keras code base, lines 257-260 of keras.wrappers.scikit_learn.KerasClassifier:
# check if binary classification
if probs.shape[1] == 1:
    # first column is probability of class 0 and second is of class 1
    probs = np.hstack([1 - probs, probs])
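A small NumPy sketch of what that snippet implies, on made-up outputs: a single sigmoid column P(class 1) is expanded to [P(class 0), P(class 1)], so [0.238, 0.762] maps to class 1, and argmax over the two columns agrees with thresholding the second column at 0.5.

```python
import numpy as np

# Made-up single-column sigmoid outputs: P(class 1) for two samples
probs = np.array([[0.762], [0.1]])

# Same expansion as the KerasClassifier snippet above
if probs.shape[1] == 1:
    probs = np.hstack([1 - probs, probs])  # column 0: P(class 0), column 1: P(class 1)

labels = probs.argmax(axis=1)                    # [1, 0]
labels_thresh = (probs[:, 1] > 0.5).astype(int)  # same labels via a 0.5 threshold
print(probs, labels, labels_thresh)
```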
I am implementing a program to sample integers from a categorical distribution, where each integer is associated with a probability. I need this program to be differentiable, so that backpropagation can be applied. I found tf.contrib.distributions.RelaxedOneHotCategorical, which is very close to what I am trying to achieve.
However, the sample method of this class returns a one-hot vector instead of an integer. How can I write a program that is both differentiable and returns an integer/scalar instead of a vector?
RelaxedOneHotCategorical is differentiable precisely because it returns a softmax vector of floats instead of an integer argmax index. If all you want is the index of the maximal element, you might as well use Categorical.
You can take the dot product of the relaxed one-hot vector with the vector [1 2 3 4 ... n]. The result is a scalar that stays differentiable; it equals an integer index only when the vector is exactly one-hot, and otherwise it is a weighted average of the indices.
For instance, if your one-hot vector is [0 0 0 1], then dot([0 0 0 1], [1 2 3 4]) gives 4, which is what you are looking for.
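A NumPy sketch of the dot-product trick on an exact one-hot vector and a made-up relaxed vector. With the exact one-hot vector it returns the index; with a relaxed sample it returns a weighted average of the indices. In TensorFlow the same idea would be roughly tf.reduce_sum(sample * tf.range(1, n + 1, dtype=sample.dtype), axis=-1). Note that tf.contrib was removed in TensorFlow 2, where this distribution lives in TensorFlow Probability as tfp.distributions.RelaxedOneHotCategorical.

```python
import numpy as np

indices = np.arange(1, 5)                  # [1 2 3 4]

hard = np.array([0.0, 0.0, 0.0, 1.0])      # exact one-hot
soft = np.array([0.05, 0.05, 0.1, 0.8])    # made-up relaxed sample

hard_index = hard @ indices   # 4.0
soft_index = soft @ indices   # about 3.65, a weighted average of the indices
print(hard_index, soft_index)
```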
You can't get exactly what you want in a differentiable manner, because argmax isn't differentiable; that is why the Gumbel-Softmax distribution was created in the first place. It allows you, for instance, to use the outputs of a language model as inputs to a discriminator in a generative adversarial network, because the relaxed sample approaches a one-hot vector as the temperature decreases.
If you simply need the index of the maximal element at inference or test time, you can use tf.math.argmax. But there's no way to do that in a differentiable manner.
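A NumPy sketch of the Gumbel-Softmax relaxation with fixed noise, showing the relaxed sample sharpening toward a one-hot vector as the temperature decreases, and the hard index at inference being just the argmax of the noisy logits (the Gumbel-max trick, which gives an exact categorical sample but no useful gradient). The probabilities and seed are made up, and gumbel_softmax is a hypothetical helper, not a library function.

```python
import numpy as np

def gumbel_softmax(logits, gumbel_noise, temperature):
    # Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)
    y = (logits + gumbel_noise) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
# Gumbel(0, 1) noise via -log(-log(U)); clipping U avoids log(0)
u = np.clip(rng.uniform(size=logits.shape), 1e-12, 1 - 1e-12)
noise = -np.log(-np.log(u))

for t in (5.0, 1.0, 0.1):
    sample = gumbel_softmax(logits, noise, t)
    print(t, sample.round(3))  # largest entry grows toward 1 as t shrinks

# At inference time, the hard (non-differentiable) index is just the argmax
hard_index = int(np.argmax(logits + noise))
print(hard_index)
```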