I am implementing a program to sample integers from a categorical distribution, where each integer is associated with a probability. I need the program to be differentiable, so that backpropagation can be applied. I found tf.contrib.distributions.RelaxedOneHotCategorical, which is very close to what I am trying to achieve.
However, the sample method of this class returns a one-hot vector instead of an integer. How can I write a program that is both differentiable and returns an integer/scalar instead of a vector?
RelaxedOneHotCategorical is differentiable precisely because it returns a softmax vector of floats instead of the argmax integer index. If all you want is the index of the maximal element, you might as well use Categorical.
You can take the dot product of the relaxed one-hot vector with a vector [1, 2, 3, ..., n]. The result gives you the desired scalar.
For instance, if your one-hot vector is [0, 0, 0, 1], then dot([0, 0, 0, 1], [1, 2, 3, 4]) gives you 4, which is what you are looking for.
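A rough sketch of that trick (TF 1.x syntax, matching the tf.contrib reference in the question; the probabilities, temperature and the 1-based index vector are purely illustrative):

import tensorflow as tf

probs = [0.1, 0.2, 0.3, 0.4]
dist = tf.contrib.distributions.RelaxedOneHotCategorical(temperature=0.1, probs=probs)
sample = dist.sample()                         # soft one-hot vector, e.g. roughly [0.01, 0.02, 0.03, 0.94]
index_values = tf.constant([1., 2., 3., 4.])   # one value per class
scalar = tf.reduce_sum(sample * index_values)  # differentiable "integer-like" scalar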
You can't get what you want in a differentiable manner because argmax isn't differentiable, which is why the Gumbel-Softmax distribution was created in the first place. It allows you, for instance, to use the outputs of a language model as inputs to a discriminator in a generative adversarial network, because the activation approaches a one-hot vector as the temperature decreases.
If you simply need to retrieve the maximal element at inference or testing time, you can use tf.math.argmax. But there's no way to do that in a differentiable manner.
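For instance (a toy snippet; output_vector stands in for whatever probability or soft one-hot vector your model produces):

import tensorflow as tf

output_vector = tf.constant([0.05, 0.05, 0.1, 0.8])
predicted_index = tf.math.argmax(output_vector, axis=-1)   # -> 3, but no gradient flows through argmax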
Related
I am trying to write a model that outputs a vector of length N consisting of the labels -1, 0 and 1. Each label represents one of three decisions for the system participants (wireless devices), so the vector describes a system state that is then passed on to an optimization problem in the next step. Because of the fixed problem formulation that expects this output vector, using the labels 0, 1 and 2 instead is not possible.
I came across this tanh function to supply the -1, 0 and 1 values:
1.5 * backend.tanh(alpha * x) + 0.5 * (backend.tanh(-(3 / alpha) * x)) from here, and I was wondering how exactly the output layer and the penultimate layer should be built to supply this vector of labels {-1, 0, 1}. I tried using the above function in the output layer of a simple Iris classifier, but this resulted in terrible accuracy compared to what I achieved with labels 0, 1, 2 and a softmax output layer.
It doesn't seem like the outputs are actually "numerically related", for lack of a better term. Meaning, the labels could just as well be "left", "right", "up". So I think your best bet is to have 3 output nodes in the final layer with a softmax activation function, each node representing one of the three labels, and to train with a cross-entropy loss function.
If your training data currently has the targets as -1/0/1, you should one-hot encode them so that each target is a vector of length 3. So label 0 would become [0, 1, 0].
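A minimal sketch of what that could look like (the layer sizes and the 4-feature input, taken from the Iris example in the question, are just placeholders):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# map labels -1/0/1 to one-hot vectors of length 3: -1 -> [1,0,0], 0 -> [0,1,0], 1 -> [0,0,1]
labels = np.array([-1, 0, 1, 1, -1])
y = to_categorical(labels + 1, num_classes=3)

model = Sequential([
    Dense(32, activation="relu", input_shape=(4,)),   # 4 input features as in Iris
    Dense(3, activation="softmax"),                   # one output node per label
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# after training, map the argmax index back to -1/0/1:
# predictions = model.predict(x_test).argmax(axis=-1) - 1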
I am building a neural network from scratch in python.
In the dataset I am using for testing, the features are all numeric (57 features) and the target variable is categorical (10 classes, already converted to numeric 0-9, but I can use another encoding).
Everything seems to be working, except that I am quite stuck on how to compare my model output with the y_true value to compute the error. I have 10 classes for the target variable, and what I get as output is an array of 10 elements for each observation, instead of a single value/classification for each sample.
Can someone give me a simple way to convert my output to a single y_predicted value that's comparable with the y_true?
I am trying not to use any libraries except for Numpy and pandas, so using Keras SparseCategoricalCrossentropy() is not an option.
So neural networks don't return the class labels as we know them. They return the probabilities of the input belonging to each of the classes. Naturally, these probabilities sum up to 1.
Suppose you have 4 classes: A, B, C and D.
If the true label of an input is B, the NN output will not look like this:
output = [0, 1, 0, 0]
It will look like this:
output = [0.1, 0.6, 0.2, 0.1]
Where the chance of input belonging to class A is 10%, class B is 60%, class C is 20% and class D is 10%.
You can easily make sure that this array sums to 1 by adding up the elements and dividing each element by the sum. Then, you can use numpy's argmax to get the index of the highest probability.
If you don't need the probabilities, you can skip the normalisation step and apply numpy's argmax directly to get the index of the class with the highest probability.
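For example, with the output above:

import numpy as np

output = np.array([0.1, 0.6, 0.2, 0.1])   # network output for one sample
probs = output / output.sum()             # optional: normalise so it sums to 1
y_predicted = np.argmax(probs)            # -> 1, i.e. class B

# for a whole batch of shape (n_samples, n_classes):
# y_predicted = np.argmax(outputs, axis=1)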
I am trying to create a custom loss function in Keras for a generator that generates a matrix. The matrix consists of a large number of elements and a small number of centers. Centers have a high value compared to the other elements: elements have values < 0.1, while centers should reach values > 0.5. It is important that the centers are at exactly the correct indices, while it is less important to fit the remaining elements. That is why I am trying to create a loss that does the following:
select all elements from y_true where the value is > 0.5; in numpy I would do indices = np.argwhere(y_true > 0.5)
compare the values at the given indices for y_true and y_pred, something like loss = K.square(y_pred[indices] - y_true[indices])
select all other elements: indices_low = np.argwhere(y_true < 0.5)
do the same as in step 2 and save it, e.g. as loss_low
return a weighted loss, i.e. return loss*100 + loss_low, simply to give a higher weight to the more important data
However, I cannot find a way to achieve this with the Keras backend. I have found a question about tf.where and tried to look for something similar to my problem, but there seems to be nothing like tf.argwhere (I can't find it in the docs, nor by browsing the net/SO). So how can I achieve this?
Note that the number and positions of centers can vary, and the generator is bad from the start, so it will not generate any centers, or will generate far more than there really should be; that is why I think I can't simply use tf.where. I might be incorrect here as I am new to custom loss functions; any thoughts are welcome.
EDIT
After all, it seems K.tf.where was exactly what I was looking for, so I tried it out:
def custom_mse():
    def mse(y_true, y_pred):
        indices = K.tf.where(y_true > 0.5)
        loss = K.square(y_true[indices] - y_pred[indices])
        indices = K.tf.where(y_true < 0.5)
        loss_low = K.square(y_true[indices] - y_pred[indices])
        return 100 * loss + loss_low
    return mse
but this keeps throwing an error:
ValueError: Shape must be rank 1 but is rank 3 for 'loss_1/Generator_loss/strided_slice' (op: 'StridedSlice') with input shapes: [?,?,?,?], [1,?,4], [1,?,4], [1].
How can I use the where output?
After a while I finally found the correct solution, so it might help somebody in the future:
Firstly, my approach was biased by my long-time work with numpy and Pandas, so I expected that tensor elements could be addressed as y_true[indices]; in fact there are built-in functions tf.gather and tf.gather_nd for getting elements of a tensor. However, since the number of elements differs between the two losses, I can't use this approach, because adding the losses together would lead to an incompatible-size error.
This led me to a different approach, thanks to this Q&A. Studying the code in the accepted answer, I found that you can use tf.where not only to get indices, but also to apply masks to your tensors. The final solution for my problem is then to apply two masks to the input tensor and calculate two losses, one counting the loss for the higher values and one counting the loss for the lower values, and then multiply the loss that should have the higher weight:
def custom_mse():
    def mse(y_true, y_pred):
        # mask for the centers (values above 0.5) and the loss on them
        great = K.tf.greater(y_true, 0.5)
        loss = K.square(tf.where(great, y_true, tf.zeros(tf.shape(y_true))) - tf.where(great, y_pred, tf.zeros(tf.shape(y_pred))))
        # mask for the remaining (low-value) elements and the loss on them
        lower = K.tf.less(y_true, 0.5)
        loss_low = K.square(tf.where(lower, y_true, tf.zeros(tf.shape(y_true))) - tf.where(lower, y_pred, tf.zeros(tf.shape(y_pred))))
        # weight the loss on the centers 100x more
        return 100 * loss + loss_low
    return mse
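For reference, on a more recent TensorFlow/Keras the same masking idea can be written without the K.tf prefix; this is only a sketch under that assumption, not tested against the original model:

import tensorflow as tf

def custom_mse():
    def mse(y_true, y_pred):
        zeros = tf.zeros_like(y_true)
        high_diff = tf.where(y_true > 0.5, y_true, zeros) - tf.where(y_true > 0.5, y_pred, zeros)
        low_diff = tf.where(y_true < 0.5, y_true, zeros) - tf.where(y_true < 0.5, y_pred, zeros)
        # weight the loss on the centers (high values) 100x more than the rest
        return 100.0 * tf.square(high_diff) + tf.square(low_diff)
    return mse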
I want to use an autoencoder for dimension reduction in Keras. The input is a table whose columns contain the discrete values 0, 1, 2, 3, 4 (each number denotes a category). Each subject has a label 0/1 indicating sick/healthy. Now I have two questions:
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
I don't know if this kind of input variables need normalization (and if the answer is yes, how?)
Which activation function should I use in the last layer? Shall I use a combination of sigmoid and ReLU?
The activation in the last layer should be sigmoid, and you should use the binary_crossentropy loss function for training.
I don't know if this kind of input variables need normalization (and if the answer is yes, how?)
It depends on the nature of the discrete values you mentioned. As you know, the inputs to a neural network represent the "intensity" of each neuron; higher values mean the neuron is more intensive/active. So, categorical values as input to a NN only make sense if they map to a continuous range. For example, if excellent=3, good=2, bad=1, terrible=0, it's okay to feed these values to a NN, because it makes sense to compute f(wx+b) (the intensity of the neuron): a value of 1.5 means somewhere between bad and good.
However, if the categorical values are purely nominal, without any relationship between them (for example: apple=1, orange=2, banana=3), it really doesn't make sense to compute f(wx+b). In this case, what would a value of 1.5 mean? For this type of data as input to a NN, you should convert it to a binary (one-hot) encoding. For example, if you have only 3 fruits you can encode them this way:
apple = [1, 0, 0]
orange = [0, 1, 0]
banana = [0, 0, 1]
For this binary conversion, Keras has a utility function: to_categorical.
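For example (with the fruit ids shifted to start at 0, since to_categorical expects non-negative class indices starting from 0):

from keras.utils import to_categorical

fruits = [0, 1, 2, 1]                       # apple=0, orange=1, banana=2
encoded = to_categorical(fruits, num_classes=3)
# encoded -> [[1,0,0], [0,1,0], [0,0,1], [0,1,0]]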
I know what predict returns in principle, but what I really don't understand is an output like the following example: it gives me [0.238 0.762] in a model which has only binary outputs [0, 1].
So I know these are the probabilities of each class for the given input, but which value corresponds to which class, [0, 1] or [1, 0]?
predict returns the neural network's outputs at the last layer. These are not necessarily probabilities; it simply depends on what you used in your neural network architecture. The simple answer is that you can run
model.predict(x) > 0.5
That should work in most cases. The NN will optimize to approach the best solution, but all of the values within are continuous, so unless your problem is very easily separable you will rarely get an output that is fully binary.
To answer your question: unless the model was trained strangely, [0.238 0.762] most likely means a probability of 0.238 for class 0 and 0.762 for class 1, i.e. the order [0, 1].
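So if you want the class label rather than a probability, something like this (where model and x are your trained model and inputs) does it:

import numpy as np

probs = model.predict(x)               # e.g. [[0.238, 0.762]]
labels = np.argmax(probs, axis=-1)     # -> [1], i.e. class 1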
For binary classification, the first column is the probability of class 0 and second is of class 1. You can check that in keras code base L257-L260 (here) from keras.wrappers.scikit_learn.KerasClassifier:
# check if binary classification
if probs.shape[1] == 1:
    # first column is probability of class 0 and second is of class 1
    probs = np.hstack([1 - probs, probs])