I followed this tutorial on GAN - https://github.com/adeshpande3/Generative-Adversarial-Networks/blob/master/Generative%20Adversarial%20Networks%20Tutorial.ipynb
I want to use the trained discriminator for calculating probabilities of test images(I trained on images which represent a certain set, and want to check the probability the test image resembles that set.) I used the following code - (after reloading the model)
newP= sess.run(Dx, feed_dict={x_placeholder: dataset2})
print("prob: " + str(newP)
But It is not giving probabilities, some random floats >1. How to use the trained discrimanator for finding probabilities?
Use, prob = tf.nn.sigmoid(Dx) for your probabilities. Since Dx outputs a single value between 0-1, softmax for a single output will always be 1.(exp(Dx)/exp(Dx) = 1)
Related
I am training 2 autoencoders with 2 separate input paths jointly and I would like to randomly set one of the input paths to zero.
I use tensorflow with keras backend (functional API).
I am computing a joint loss (sum of two losses) for backpropagation.
A -> A' & B ->B'
loss => l2(A,A')+l2(B,B')
networks taking A and B are connected in latent space.
I would like to randomly set A or B to zero and compute the loss only on the corresponding path, meaning if input path A is set to zero loss be computed only by using outputs of only path B and vice versa; e.g.:
0 -> A' & B ->B'
loss: l2(B,B')
How do I randomly set input path to zero? How do I write a callback which does this?
Maybe try the following:
import random
def decision(probability):
return random.random() < probability
Define a method that makes a random decision based on a certain probability x and make your loss calculation depend on this decision.
if current_epoch == random.choice(epochs):
keep_mask = tf.ones_like(A.input, dtype=float32)
throw_mask = tf.zeros_like(A.input, dtype=float32)
if decision(probability=0.5):
total_loss = tf.reduce_sum(reconstruction_loss_a * keep_mask
+ reconstruction_loss_b * throw_mask)
else:
total_loss = tf.reduce_sum(reconstruction_loss_a * throw_mask
+ reconstruction_loss_b * keep_mask)
else:
total_loss = tf.reduce_sum(reconstruction_loss_a + reconstruction_loss_b)
I assume that you do not want to set one of the paths to zero every time you update your model parameters, as then there is a risk that one or even both models will not be sufficiently trained. Also note that I use the input of A to create zero_like and one_like tensors as I assume that both inputs have the same shape; if this is not the case, it can easily be adjusted.
Depending on what your goal is, you may also consider replacing your input of A or B with a random tensor e.g. tf.random.normal based on a random decision. This creates noise in your model, which may be desirable, as your model would be forced to look into the latent space to try reconstruct your original input. This means precisely that you still calculate your reconstruction loss with A.input and A.output, but in reality your model never received the A.input, but rather the random tensor.
Note that this answer serves as a simple conceptual example. A working example with Tensorflow can be found here.
You can set an input to 0 simply:
A = A*random.choice([0,1])
This code can be used inside a loss function
When the outputs (prediction) are the probabilities coming from a Softmax function, and the training target is one-hot type, how do we compare those two different kinds of data to calculate the accuracy?
(the number of training data classified correctly) / (the number of the total training data) *100%
Usually, we assign the class label with highest probability in the output of the soft max function as the label.
preds is list of probability of the label
index=np.argmax(preds)
it will return index of that label from which class it belong
I'd like to use a neural network to predict a scalar value which is the sum of a function of the input values and a random value (I'm assuming gaussian distribution) whose variance also depends on the input values. Now I'd like to have a neural network that has two outputs - the first output should approximate the deterministic part - the function, and the second output should approximate the variance of the random part, depending on the input values. What loss function do I need to train such a network?
(It would be nice if there was an example with Python for Tensorflow, but I'm also interested in general answers. I'm also not quite clear how I could write something like in Python code - none of the examples I found so far show how to address individual outputs from the loss function.)
You can use dropout for that. With a dropout layer you can make several different predictions based on different settings of which nodes dropped out. Then you can simply count the outcomes and interpret the result as a measure for uncertainty.
For details, read:
Gal, Yarin, and Zoubin Ghahramani. "Dropout as a bayesian approximation: Representing model uncertainty in deep learning." international conference on machine learning. 2016.
Since I've found nothing simple to implement, I wrote something myself, that models that explicitly: here is a custom loss function that tries to predict mean and variance. It seems to work but I'm not quite sure how well that works out in practice, and I'd appreciate feedback. This is my loss function:
def meanAndVariance(y_true: tf.Tensor , y_pred: tf.Tensor) -> tf.Tensor :
"""Loss function that has the values of the last axis in y_true
approximate the mean and variance of each value in the last axis of y_pred."""
y_pred = tf.convert_to_tensor(y_pred)
y_true = math_ops.cast(y_true, y_pred.dtype)
mean = y_pred[..., 0::2]
variance = y_pred[..., 1::2]
res = K.square(mean - y_true) + K.square(variance - K.square(mean - y_true))
return K.mean(res, axis=-1)
The output dimension is twice the label dimension - mean and variance of each value in the label. The loss function consists of two parts: a mean squared error that has the mean approximate the mean of the label value, and the variance that approximates the difference of the value from the predicted mean.
When using dropout to estimate the uncertainty (or any other stochastic regularization method), make sure to also checkout our recent work on providing a sampling-free approximation of Monte-Carlo dropout.
https://arxiv.org/pdf/1908.00598.pdf
We essentially follow ur idea. Treat the activations as random variables and then propagate mean and variance using error propagation to the output layer. Consequently, we obtain two outputs - the mean and the variance.
I am trying to follow the udacity tutorial on tensorflow where I came across the following two lines for word embedding models:
# Look up embeddings for inputs.
embed = tf.nn.embedding_lookup(embeddings, train_dataset)
# Compute the softmax loss, using a sample of the negative labels each time.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases,
embed, train_labels, num_sampled, vocabulary_size))
Now I understand that the second statement is for sampling negative labels. But the question is how does it know what the negative labels are? All I am providing the second function is the current input and its corresponding labels along with number of labels that I want to (negatively) sample from. Isn't there the risk of sampling from the input set in itself?
This is the full example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb
You can find the documentation for tf.nn.sampled_softmax_loss() here. There is even a good explanation of Candidate Sampling provided by TensorFlow here (pdf).
How does it know what the negative labels are?
TensorFlow will randomly select negative classes among all the possible classes (for you, all the possible words).
Isn't there the risk of sampling from the input set in itself?
When you want to compute the softmax probability for your true label, you compute: logits[true_label] / sum(logits[negative_sampled_labels]. As the number of classes is huge (the vocabulary size), there is very little probability to sample the true_label as a negative label.
Anyway, I think TensorFlow removes this possibility altogether when randomly sampling. (EDIT: #Alex confirms TensorFlow does this by default)
Candidate sampling explains how the sampled loss function is calculated:
Compute the loss function in a subset C of all training samples L, where C = T ⋃ S, T is the samples in target classes, and S is the randomly chosen samples in all classes.
The code you provided uses tf.nn.embedding_lookup to get the inputs [batch_size, dim] embed.
Then it uses tf.nn.sampled_softmax_loss to get the sampled loss function:
softmax_weights: A Tensor of shape [num_classes, dim].
softmax_biases: A Tensor of shape [num_classes]. The class biases.
embed: A Tensor of shape [batch_size, dim].
train_labels: A Tensor of shape [batch_size, 1]. The target classes T.
num_sampled: An int. The number of classes to randomly sample per batch. the numbed of classes in S.
vocabulary_size: The number of possible classes.
sampled_values: default to log_uniform_candidate_sampler
For one batch, the target samples are just train_labels (T). It chooses num_sampled samples from embed randomly (S) to be negative samples.
It will uniformly sample from embed respect to the softmax_wiehgt and softmax_bias. Since embed is embeddings[train_dataset] (of shape [batch_size, embedding_size]), if embeddings[train_dataset[i]] contains train_labels[i], it might be selected back, then it is not negative label.
According to Candidate sampling page 2, there are different types. For NCE and negative sampling, NEG=S, which may contain a part of T; for sampled logistic, sampled softmax, NEG = S-T explicitly delete T.
Indeed, it might be a chance of sampling from train_ set.
The tensorflow tutorial on language model allows to compute the probability of sentences :
probabilities = tf.nn.softmax(logits)
in the comments below it also specifies a way of predicting the next word instead of probabilities but does not specify how this can be done. So how to output a word instead of probability using this example?
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
loss = 0.0
for current_batch_of_words in words_in_dataset:
# The value of state is updated after processing each batch of words.
output, state = lstm(current_batch_of_words, state)
# The LSTM output can be used to make next word predictions
logits = tf.matmul(output, softmax_w) + softmax_b
probabilities = tf.nn.softmax(logits)
loss += loss_function(probabilities, target_words)
Your output is a TensorFlow list and it is possible to get its max argument (the predicted most probable class) with a TensorFlow function. This is normally the list that contains the next word's probabilities.
At "Evaluate the Model" from this page, your output list is y in the following example:
First we'll figure out where we predicted the correct label. tf.argmax
is an extremely useful function which gives you the index of the
highest entry in a tensor along some axis. For example, tf.argmax(y,1)
is the label our model thinks is most likely for each input, while
tf.argmax(y_,1) is the true label. We can use tf.equal to check if our
prediction matches the truth.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
Another approach that is different is to have pre-vectorized (embedded/encoded) words. You could vectorize your words (therefore embed them) with Word2vec to accelerate learning, you might want to take a look at this. Each word could be represented as a point in a 300 dimensions space of meaning, and you could find automatically the "N words" closest to the predicted point in space at the output of the network. In that case, the argmax way to proceed does not work anymore and you could probably compare on cosine similarity with the words you truly wanted to compare to, but for that I am not sure actually how does this could cause numerical instabilities. In that case y will not represent words as features, but word embeddings over a dimensionality of, let's say, 100 to 2000 in size according to different models. You could Google something like this for more info: "man woman queen word addition word2vec" to understand the subject of embeddings more.
Note: when I talk about word2vec here, it is about using an external pre-trained word2vec model to help your training to only have pre-embedded inputs and create embedding outputs. Those outputs' corresponding words can be re-figured out by word2vec to find the corresponding similar top predicted words.
Notice that the approach I suggest is not exact since it would be only useful to know if we predict EXACTLY the word that we wanted to predict. For a more soft approach, it would be possible to use ROUGE or BLEU metrics for evaluating your model in case you use sentences or something longer than a word.
You need to find the argmax of the probabilities, and translate the index back to a word by reversing the word_to_id map. To get this to work, you must save the probabilities in the model and then fetch them from the run_epoch function (you could also save just the argmax itself). Here's a snippet:
inverseDictionary = dict(zip(word_to_id.values(), word_to_id.keys()))
def run_epoch(...):
decodedWordId = int(np.argmax(logits))
print (" ".join([inverseDictionary[int(x1)] for x1 in np.nditer(x)])
+ " got" + inverseDictionary[decodedWordId] +
+ " expected:" + inverseDictionary[int(y)])
See full implementation here: https://github.com/nelken/tf
It is actually an advantage that the function returns a probability instead of the word itself. Since it is using a list of words, with the associated probabilities, you can do further processing, and increase the accuracy of your result.
To answer your question:
You can take the list of words, iterate though it , and make the program display the word with the highest probability.