Combine generator loss with GAN loss in Generative Adversarial Network - python

I'm currently trying to implement a GAN in Keras.
I want to use both the GAN (adversarial) loss and a generator reconstruction loss at the same time when I train the network, because I've found from some papers that this can improve performance.
It is a little like the loss function in the paper 'Multi-Scale Video Frame-Synthesis Network with Transitive Consistency Loss' (see the loss-function figure in that paper).
The original code with the GAN loss alone is like the following:
# Stack the generator and discriminator into one trainable model
self.generator = generator
self.discriminator = discriminator
self.gan = Sequential([generator, discriminator])
gen, dis, gendis = self.generator, self.discriminator, self.gan
# Train the stacked model on the adversarial loss alone
gendis.compile(optimizer=opt, loss='binary_crossentropy')
I'd like to add the generator loss on top of this, so I tried the following:
gendis.compile(optimizer=opt, loss={'generator_output': 'mse', 'model_2': 'binary_crossentropy'}, loss_weights=[1., 0.2])
But it doesn't work, and shows the error message:
ValueError: Unknown entry in loss dictionary: "generator_output". Only expected the following keys: ['model_2']
How can I add the generator loss into this training procedure?
Thanks a lot!

You may need to set the keys of the loss dict to the model's actual output names. A Sequential model only exposes a single output, which is why the only expected key is model_2; to attach a second loss to the generator's output you need a model that exposes both outputs. Can you print out the model summary?
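One way to make both losses addressable is to wrap the generator and discriminator in a functional Model that exposes the generated sample and the discriminator score as two outputs. Below is a minimal sketch, assuming a hypothetical input shape and reusing the generator, discriminator, and opt objects from the question; the keys of the loss dict should come from the names Keras actually assigned, which you can check via output_names:
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

gan_input = Input(shape=(100,))        # hypothetical latent/input shape
generated = generator(gan_input)       # output to train with MSE
validity = discriminator(generated)    # output to train with binary crossentropy

gendis = Model(inputs=gan_input, outputs=[generated, validity])
print(gendis.output_names)             # inspect the names Keras assigned

gendis.compile(optimizer=opt,
               loss={gendis.output_names[0]: 'mse',
                     gendis.output_names[1]: 'binary_crossentropy'},
               loss_weights={gendis.output_names[0]: 1.0,
                             gendis.output_names[1]: 0.2})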

Related

A Classifier Network Seems to be "Forgetting" older samples

This is a strange problem: Imagine a neural network classifier. It is a simple linear layer followed by a sigmoid activation, with an input size of 64 and an output size of 112. There are also 112 training samples, for each of which I expect the output to be a one-hot vector. The basic structure of the training loop is as follows, where samples is a list of (input_state, index) pairs:
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(64, 112), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(500):
    for input_state, index in samples:
        one_hot = torch.zeros(112).float()
        one_hot[index] = 1.0
        optimizer.zero_grad()
        prediction = model(input_state)
        loss = loss_fn(prediction, one_hot)
        loss.backward()
        optimizer.step()
This model does not perform well, but I don't think the problem is with the model itself so much as with how it's trained. Because the one_hot tensor is mostly zeros, I suspect the model simply gravitates toward predicting all zeros, which is exactly what is happening. The question becomes: how does this get solved? I tried averaging the loss over all the samples, to no avail. So what do I do?
So this is very embarrassing, but the answer actually lies in how I process my data. This is a text-input project, so I used basic Python lists to build up blocks of messages, but because every sample kept a reference to the same list, all of the inputs the net received were identical while the target output was different every time. I solved this problem with the copy method.
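For anyone hitting the same thing, here is a minimal sketch of the aliasing pitfall described above, with hypothetical data. Because every stored sample keeps a reference to the same list object, all inputs the net sees are identical:
block = []
samples = []
for index, message in enumerate(["hello", "world", "again"]):
    block.append(message)
    samples.append((block, index))         # bug: every sample shares the same list
    # fix: samples.append((block.copy(), index)) stores an independent snapshot

print(samples[0][0] is samples[1][0])      # True: all inputs are the same object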

How should I keep track of total loss while training a network with a batched dataset?

I am attempting to train a discriminator network by applying gradients to its optimizer. However, when I use a tf.GradientTape to find the gradients of the loss w.r.t. the trainable variables, None is returned. Here is the training loop:
def train_step():
    # Generate noisy seeds
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as disc_tape:
        pattern = generator(noise)
        pattern = tf.reshape(tensor=pattern, shape=(28, 28, 1))
        dataset = get_data_set(pattern)
        disc_loss = tf.Variable(shape=(1, 2), initial_value=[[0, 0]], dtype=tf.float32)
        disc_tape.watch(disc_loss)
        for batch in dataset:
            disc_loss.assign_add(discriminator(batch, training=True))
    disc_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
Code Description
The generator network generates a 'pattern' from noise. I then generate a dataset from that pattern by applying various convolutions to the tensor. The returned dataset is batched, so I iterate through it and keep track of my discriminator's loss by adding each batch's loss to the running total.
What I do know
tf.GradientTape returns None when there is no graph connection between the two variables. But isn't there a graph connection between the loss and the trainable variables? I believe my mistake has something to do with how I keep track of the loss in the disc_loss tf.Variable.
My Question
How do I keep track of loss while iterating through a batched dataset so that I may use it later to calculate gradients?
The base answer here is that the assign_add method of tf.Variable is not differentiable, so no gradient can be calculated between the variable disc_loss and the discriminator's trainable variables.
In this very specific case, the fix was to accumulate the loss with ordinary tensor addition:
disc_loss = disc_loss + discriminator(batch, training=True)
In future cases of similar problems, make sure that every operation performed while being watched by the gradient tape is differentiable.
This link has a list of differentiable and non-differentiable TensorFlow ops. I found it very useful.
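Putting the fix back into the training loop, a minimal sketch might look like this (assuming the same hypothetical generator, discriminator, BATCH_SIZE, noise_dim, and get_data_set from the question):
def train_step():
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as disc_tape:
        pattern = generator(noise)
        pattern = tf.reshape(tensor=pattern, shape=(28, 28, 1))
        dataset = get_data_set(pattern)
        disc_loss = tf.zeros((1, 2))       # plain tensor instead of a tf.Variable
        for batch in dataset:
            # ordinary addition stays on the tape, so gradients can flow
            disc_loss = disc_loss + discriminator(batch, training=True)
    disc_gradients = disc_tape.gradient(disc_loss, discriminator.trainable_variables)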

Keras multiple input, output, loss model

I am working on a super-resolution GAN and have some doubts about code I found on GitHub. In particular, the model has multiple inputs and multiple outputs, and there are two different loss functions.
In the following code, will the mse loss be applied to img_hr and fake_features?
# Build and compile the discriminator
self.discriminator = self.build_discriminator()
self.discriminator.compile(loss='mse',
                           optimizer=optimizer,
                           metrics=['accuracy'])

# Build the generator
self.generator = self.build_generator()

# High res. and low res. images
img_hr = Input(shape=self.hr_shape)
img_lr = Input(shape=self.lr_shape)

# Generate high res. version from low res.
fake_hr = self.generator(img_lr)

# Extract image features of the generated img
fake_features = self.vgg(fake_hr)

# For the combined model we will only train the generator
self.discriminator.trainable = False

# Discriminator determines validity of generated high res. images
validity = self.discriminator(fake_hr)

self.combined = Model([img_lr, img_hr], [validity, fake_features])
self.combined.compile(loss=['binary_crossentropy', 'mse'],
                      loss_weights=[1e-3, 1],
                      optimizer=optimizer)
"In the following code, will the mse loss be applied to img_hr and fake_features?"
From the documentation (https://keras.io/models/model/#compile):
"If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses."
In this case, the mse loss will be applied to fake_features, and the corresponding y_true is passed as part of self.combined.fit().
In neural networks, a loss is applied to the outputs of a network to measure "how wrong is this output?", so that you can take this value and minimize it via gradient descent and backpropagation.
Following this intuition, the losses in Keras are given as a list with the same length as the outputs of your model. They are applied to the output with the same index.
self.combined = Model([img_lr, img_hr], [validity, fake_features])
This gives you a model with 2 inputs (img_lr, img_hr) and 2 outputs (validity, fake_features). So combined.compile(loss=['binary_crossentropy', 'mse'], ...) uses binary crossentropy loss for validity and mean squared error for fake_features.
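To make the index matching concrete, here is a minimal sketch of one training call, with hypothetical batch arrays lr_batch and hr_batch and a hypothetical label shape. The targets are passed in the same order as the outputs, so valid_labels pairs with validity (binary crossentropy) and real_features pairs with fake_features (mse):
import numpy as np

batch_size = 16                               # hypothetical batch size
valid_labels = np.ones((batch_size, 1))       # "real" labels for the validity output
real_features = self.vgg.predict(hr_batch)    # VGG features of the real HR images

loss = self.combined.train_on_batch([lr_batch, hr_batch],
                                    [valid_labels, real_features])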

Approximation of a function with multi-dimensional output using a Keras neural network

As part of a project for my studies I want to try to approximate a function f: R^m -> R^n using a Keras neural network (which I am completely new to). The network seems to be learning up to some (admittedly unsatisfactory) point, but its predictions don't resemble the expected results in the slightest.
I have two numpy arrays containing the training data (the m-dimensional inputs to the function) and the training labels (the n-dimensional expected outputs of the function). I use them to train my Keras model (see below), which does seem to be learning on the provided data.
inputs = Input(shape=(m,))
hidden = Dense(100, activation='sigmoid')(inputs)
hidden = Dense(80, activation='sigmoid')(hidden)
outputs = Dense(n, activation='softmax')(hidden)

opti = tf.keras.optimizers.Adam(lr=0.001)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=opti,
              loss='poisson',
              metrics=['accuracy'])
model.fit(training_data, training_labels, verbose=2, batch_size=32, epochs=30)
When I call the evaluate method on my model with a set of test data and test labels, I get an apparent accuracy of more than 50%. However, when I use the predict method, the predictions do not resemble the expected results in the slightest. For example, the first ten entries of the expected output are:
[0., 0.08193582, 0.13141066, 0.13495408, 0.16852582, 0.2154705 ,
0.30517559, 0.32567417, 0.34073457, 0.37453226]
whereas the first ten entries of the predicted results are:
[3.09514281e-09, 2.20849714e-03, 3.84095078e-03, 4.99367528e-03,
6.06226595e-03, 7.18442770e-03, 8.96730460e-03, 1.03423093e-02, 1.16029680e-02, 1.31887039e-02]
Does this have something to do with the metrics I use? Could the results be normalized by Keras in some opaque way? Have I just used the wrong kind of model for the problem I want to solve? What does 'accuracy' mean, anyway?
Thank you in advance for your help, I am new to neural networks and have been stuck with this issue for several days.
The problem is with this line:
outputs = Dense(n, activation='softmax')(hidden)
We use a softmax activation only in classification problems, where we need a probability distribution over the classes as the output of the network. Softmax therefore ensures that the outputs sum to one and are non-zero (which is exactly what you are seeing). But the problem at hand doesn't look like a classification task; you are just trying to predict n continuous target variables, so use a linear activation function instead. Modify the above line to something like this:
outputs = Dense(n, activation='linear')(hidden)
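Putting it together, a minimal regression sketch might look like the following. Note that swapping poisson for mse and accuracy for mean absolute error are additional suggestions for a regression setup, not part of the fix above; m and n are hypothetical:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

m, n = 8, 10                                     # hypothetical dimensions
inputs = Input(shape=(m,))
hidden = Dense(100, activation='sigmoid')(inputs)
hidden = Dense(80, activation='sigmoid')(hidden)
outputs = Dense(n, activation='linear')(hidden)  # linear output for regression

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse',                        # regression loss instead of poisson
              metrics=['mae'])                   # accuracy is not meaningful here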

How does the tensorflow word2vec tutorial update embeddings?

This thread comes close: What is the purpose of weights and biases in tensorflow word2vec example?
But I am still missing something from my interpretation of this: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/examples/tutorials/word2vec/word2vec_basic.py
From what I understand, you feed the network the indices of target and context words from your dictionary.
_, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
average_loss += loss_val
The batch inputs are then looked up to return the vectors that were randomly generated at the beginning:
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

# Look up embeddings for inputs.
embed = tf.nn.embedding_lookup(embeddings, train_inputs)
Then an optimizer adjusts the weights and biases to best predict the label, as opposed to num_sampled random alternatives:
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

# Construct the SGD optimizer using a learning rate of 1.0.
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
My questions are as follows:
Where does the embeddings variable get updated? It appears to me that I could get the final result either by running the index of a word through the neural network, or by just taking the final_embeddings vectors and using those. But I do not understand where embeddings is ever changed from its random initialization.
If I were to draw this computation graph, what would it look like (or better yet, what is the best way to actually do so)?
Is this running all of the context/target pairs in the batch at once? Or one by one?
Embeddings: embeddings is a tf.Variable. It gets updated every time you do backprop, i.e. every time the optimizer is run together with the loss (see the sketch below).
Graph: Did you try saving the graph and displaying it in TensorBoard? Is that what you're looking for?
Batching: At least in the example you linked, batch processing is done using the function at line 96: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/examples/tutorials/word2vec/word2vec_basic.py#L96
Please correct me if I misunderstood your question.
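To see concretely why embeddings gets updated, here is a minimal TF1-style sketch matching the tutorial's API. minimize(loss) is just compute_gradients followed by apply_gradients, and the gradients cover every trainable variable the loss depends on, embeddings included (it is reached through the embedding_lookup):
optimizer = tf.train.GradientDescentOptimizer(1.0)

# List each trainable variable the optimizer will update, along with
# whether the loss actually produces a gradient for it.
grads_and_vars = optimizer.compute_gradients(loss)
for grad, var in grads_and_vars:
    print(var.name, grad is not None)   # 'embeddings' shows up with a gradient

# Equivalent to optimizer.minimize(loss):
train_op = optimizer.apply_gradients(grads_and_vars)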
