I am coding a WGAN in TensorFlow on the MNIST dataset and it works well, but I am finding it difficult to clip the weights of the discriminator model to [-0.01, 0.01] in TensorFlow. In Keras we can do weight clipping using:
for l in self.discriminator.layers:
    weights = l.get_weights()
    weights = [np.clip(w, -self.clip_value, self.clip_value) for w in weights]
    l.set_weights(weights)
I have found a TensorFlow doc for clipping the discriminator weights:
tf.contrib.gan.features.clip_discriminator_weights(
    optimizer,
    model,
    weight_clip
)
Other than this, not much is given on how to use this function.
# my tf code
def generator(z):
    h = tf.nn.relu(layer_mlp(z, "g1", [10, 128]))
    prob = tf.nn.sigmoid(layer_mlp(h, "g2", [128, 784]))
    return prob

def discriminator(x):
    h = tf.nn.relu(layer_mlp(x, "d1", [784, 128]))
    logit = layer_mlp(h, "d2", [128, 1])
    prob = tf.nn.sigmoid(logit)
    return prob

G_sample = generator(z)
D_real = discriminator(x)
D_fake = discriminator(G_sample)

D_loss = tf.reduce_mean(D_real) - tf.reduce_mean(D_fake)
G_loss = -tf.reduce_mean(D_fake)

for epoch in epochs:
    # training the model
Adding to Yaakov's answer, you can use tf.clip_by_value with trainable_variables, as shown in this repo: https://github.com/hcnoh/WGAN-tensorflow2
for w in model.discriminator.trainable_variables:
    w.assign(tf.clip_by_value(w, -clip_const, clip_const))
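For context, here is a minimal sketch of where that clipping loop could sit in a TF2-style training step. The names model, d_optimizer, d_loss_fn, and clip_const are assumptions about your own training code, not part of any particular API:

import tensorflow as tf

clip_const = 0.01

@tf.function
def train_discriminator_step(real_batch, z_batch):
    # compute the critic loss with your own loss function (assumed name)
    with tf.GradientTape() as tape:
        d_loss = d_loss_fn(model, real_batch, z_batch)
    grads = tape.gradient(d_loss, model.discriminator.trainable_variables)
    d_optimizer.apply_gradients(zip(grads, model.discriminator.trainable_variables))
    # clip the critic weights right after every update, as in the WGAN paper
    for w in model.discriminator.trainable_variables:
        w.assign(tf.clip_by_value(w, -clip_const, clip_const))
    return d_loss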
You can use the function below to implement clipping in TensorFlow:
tf.clip_by_value(
    t,
    clip_value_min,
    clip_value_max,
    name=None
)
Please refer to the links below on how to implement it in your code:
https://www.tensorflow.org/api_docs/python/tf/clip_by_value
https://github.com/wiseodd/generative-models/blob/master/GAN/wasserstein_gan/wgan_tensorflow.py
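Since the question's code is graph-mode (TF 1.x), here is a minimal sketch of how tf.clip_by_value is typically used there, following the pattern in the wgan_tensorflow.py example linked above. The variable scope 'd', the placeholders x and z, and the names X_mb, Z_mb, and n_iterations are assumptions about your setup:

clip_value = 0.01

# collect the discriminator variables (assumes they live under a "d..." scope)
theta_D = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='d')

# one assign op per variable; running clip_D clamps the weights in place
clip_D = [w.assign(tf.clip_by_value(w, -clip_value, clip_value)) for w in theta_D]

# maximize D_loss by minimizing its negative (RMSProp and 5e-5 as in the WGAN paper)
D_solver = tf.train.RMSPropOptimizer(learning_rate=5e-5).minimize(-D_loss, var_list=theta_D)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for it in range(n_iterations):
        _, d_loss_curr = sess.run([D_solver, D_loss], feed_dict={x: X_mb, z: Z_mb})
        sess.run(clip_D)  # clip after every discriminator update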
Related
I'm working on a custom transformer model where the training-step method goes like this:
# simplified version of my training method, where model = myTransformerModel()
for window in data:  # step through the data
    l1 = model(window)
    loss = torch.mean(l1)
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
    scheduler.step()
I'm trying to recreate this in TensorFlow; currently it looks like this:
for window in data:  # step through the data
    with tf.GradientTape() as tape:
        l1 = model.call(window)
        loss = tf.reduce_mean(l1)
    train = optimizer.minimize(loss, var_list=model.trainable_variables, tape=tape)
This works, but it causes the scheduler to step with every window, which throws off the learning rate.
I have also tried this in place of the minimize line:
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Is there a good way to make the TensorFlow model behave more like the PyTorch one? Is there a better way to implement my training steps with the GradientTape?
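A sketch of one way to mirror the PyTorch loop in TF2, using the explicit tape.gradient / apply_gradients pattern from above and controlling the learning rate manually instead of tying it to optimizer.iterations. The scheduler(epoch) function and num_epochs are hypothetical placeholders for your own schedule, and model and data are assumed from your code:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(window):
    with tf.GradientTape() as tape:
        l1 = model(window, training=True)
        loss = tf.reduce_mean(l1)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

for epoch in range(num_epochs):
    for window in data:
        loss = train_step(window)
    # advance the "scheduler" only where you choose to (e.g. once per epoch),
    # by setting a new learning rate instead of using a built-in schedule
    optimizer.learning_rate = scheduler(epoch)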
I am trying to implement YOLOv3 in TensorFlow. I have taken help from online repositories and was successful in converting the Darknet weights to TensorFlow and running inference.
Now, I am trying to train the model using the YOLO loss as implemented here.
I am using the following code snippet to do so:
with tf.name_scope('Loss_and_Detect'):
    yolo_loss = compute_loss(output, y_true, anchors, config.num_classes, print_loss=False)
    tf.summary.scalar('YOLO_loss', yolo_loss)
    variables = tf.trainable_variables()
    # Variables to be optimized by train_op if the pre-trained darknet-53 is used as is
    if config.pre_train:
        variables = variables[312:]  # Get the weights after the 52nd conv layer (darknet-53)
    # 5e-4 as used in the paper
    l2_loss = config.weight_decay * tf.add_n([tf.nn.l2_loss(tf.cast(v, dtype=tf.float32)) for v in variables])
    loss = yolo_loss + l2_loss
    tf.summary.scalar('L2_loss', l2_loss)
    tf.summary.scalar('Total_loss', loss)

# Define an optimizer for minimizing the computed loss
with tf.name_scope('Optimizer'):
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = optimizer.minimize(loss=loss, global_step=global_step, var_list=variables)
The problem is that my YOLO_loss is stuck at around 7-8 and the L2_loss keeps increasing.
Here is a snapshot of the TensorBoard with learning rate 1e-6 and exponential decay applied to it (decay_rate=0.8).
I cannot figure out what I am missing or doing wrong.
Any help is appreciated.
I'm following the code of a coursera assignment which implements a NER tagger using a bidirectional LSTM.
But I'm not able to understand how the embedding matrix is being updated. In the following code, build_layers has a variable embedding_matrix_variable which acts as an input to the LSTM. However, it's not getting updated anywhere.
Can you help me understand how embeddings are being trained?
def build_layers(self, vocabulary_size, embedding_dim, n_hidden_rnn, n_tags):
    initial_embedding_matrix = np.random.randn(vocabulary_size, embedding_dim) / np.sqrt(embedding_dim)
    embedding_matrix_variable = tf.Variable(initial_embedding_matrix, name='embedding_matrix', dtype=tf.float32)

    forward_cell = tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn, forget_bias=3.0),
        input_keep_prob=self.dropout_ph,
        output_keep_prob=self.dropout_ph,
        state_keep_prob=self.dropout_ph
    )
    backward_cell = tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn, forget_bias=3.0),
        input_keep_prob=self.dropout_ph,
        output_keep_prob=self.dropout_ph,
        state_keep_prob=self.dropout_ph
    )

    embeddings = tf.nn.embedding_lookup(embedding_matrix_variable, self.input_batch)

    (rnn_output_fw, rnn_output_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=forward_cell, cell_bw=backward_cell,
        dtype=tf.float32,
        inputs=embeddings,
        sequence_length=self.lengths
    )
    rnn_output = tf.concat([rnn_output_fw, rnn_output_bw], axis=2)

    self.logits = tf.layers.dense(rnn_output, n_tags, activation=None)

def compute_loss(self, n_tags, PAD_index):
    """Computes masked cross-entropy loss with logits."""
    ground_truth_tags_one_hot = tf.one_hot(self.ground_truth_tags, n_tags)
    loss_tensor = tf.nn.softmax_cross_entropy_with_logits(labels=ground_truth_tags_one_hot, logits=self.logits)
    mask = tf.cast(tf.not_equal(self.input_batch, PAD_index), tf.float32)
    self.loss = tf.reduce_mean(tf.reduce_sum(tf.multiply(loss_tensor, mask), axis=-1) / tf.reduce_sum(mask, axis=-1))
In TensorFlow, variables are not usually updated directly (i.e. by manually setting them to a certain value), but rather they are trained using an optimization algorithm and automatic differentiation.
When you define a tf.Variable, you are adding a node (that maintains a state) to the computational graph. At training time, if the loss node depends on the state of the variable that you defined, TensorFlow will compute the gradient of the loss function with respect to that variable by automatically following the chain rule through the computational graph. Then, the optimization algorithm will make use of the computed gradients to update the values of the trainable variables that took part in the computation of the loss.
Concretely, the code that you provide builds a TensorFlow graph in which the loss self.loss depends on the weights in embedding_matrix_variable (i.e. there is a path between these nodes in the graph), so TensorFlow will compute the gradient with respect to this variable, and the optimizer will update its values when minimizing the loss. It might be useful to inspect the TensorFlow graph using TensorBoard.
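To make that concrete, here is a small graph-mode sketch that could live in a training-setup method of the same class (the method name and the choice of Adam are assumptions). It shows that the embedding matrix is among the trainable variables and receives a gradient when the loss is minimized:

def perform_optimization(self, learning_rate):
    # hypothetical method name; any optimizer would behave the same way
    optimizer = tf.train.AdamOptimizer(learning_rate)

    # the embedding matrix shows up among the trainable variables...
    print([v.name for v in tf.trainable_variables() if 'embedding_matrix' in v.name])

    # ...and compute_gradients returns a (gradient, variable) pair for it,
    # so apply_gradients updates the embeddings at every training step
    grads_and_vars = optimizer.compute_gradients(self.loss)
    self.train_op = optimizer.apply_gradients(grads_and_vars)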
I'm using MXNet to train an XOR neural network, but the loss doesn't go down; it always stays above 0.5.
Below is my code, using MXNet 1.1.0 and Python 3.6 on OS X El Capitan 10.11.6.
I tried two loss functions, squared loss and softmax loss, and neither worked.
from mxnet import ndarray as nd
from mxnet import autograd
from mxnet import gluon
import matplotlib.pyplot as plt

X = nd.array([[0,0],[0,1],[1,0],[1,1]])
y = nd.array([0,1,1,0])

batch_size = 1
dataset = gluon.data.ArrayDataset(X, y)
data_iter = gluon.data.DataLoader(dataset, batch_size, shuffle=True)

plt.scatter(X[:, 1].asnumpy(), y.asnumpy())
plt.show()

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(2, activation="tanh"))
    net.add(gluon.nn.Dense(1, activation="tanh"))
net.initialize()

softmax_cross_entropy = gluon.loss.SigmoidBCELoss()  # SigmoidBinaryCrossEntropyLoss()
square_loss = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.3})

train_losses = []
for epoch in range(100):
    train_loss = 0
    for data, label in data_iter:
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
        train_loss += nd.mean(loss).asscalar()
    train_losses.append(train_loss)

plt.plot(train_losses)
plt.show()
I got this figured out somewhere else, so I'm going to post the answer here.
Basically, there was more than one issue in my original code.
1. Weight initialization. Notice that I used the default initialization
       net.initialize()
   which actually does
       net.initialize(initializer.Uniform(scale=0.07))
   Apparently these initial weights were too small, and the network could never get out of them. The fix is
       net.initialize(mx.init.Uniform(1))
   After doing this, the network could converge using sigmoid/tanh as the activation and L2Loss as the loss function, and it also worked with sigmoid and SigmoidBCELoss. However, it still didn't work with tanh and SigmoidBCELoss, which can be fixed by the second item below.

2. SigmoidBCELoss has to be used in one of these two scenarios in the output layer:
   2.1. Linear activation and SigmoidBCELoss(from_sigmoid=False);
   2.2. Non-linear activation and SigmoidBCELoss(from_sigmoid=True), where the output of the non-linear function falls into (0, 1).

In my original code, when I used SigmoidBCELoss I was using either all sigmoid or all tanh activations. So I just needed to change the activation in the output layer from tanh to sigmoid, and the network could converge; I can still keep tanh in the hidden layers.
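For reference, a minimal sketch of the fixed setup described above (scenario 2.2: sigmoid output with from_sigmoid=True, plus the larger uniform initialization):

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(2, activation="tanh"))     # hidden layer can stay tanh
    net.add(gluon.nn.Dense(1, activation="sigmoid"))  # output squashed into (0, 1)
net.initialize(mx.init.Uniform(1))                    # larger initial weights

loss_fn = gluon.loss.SigmoidBCELoss(from_sigmoid=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.3})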
Hope this helps!
I'm using a very simple NN with a normalized word2vec as input.
When running my training (mini-batch based), the training cost starts around 1020 and decreases to around 1000, but it never goes below that, and my accuracy is around 50%.
Why doesn't the cost decrease? How can I verify that the weight matrix is updated at each run?
apply_weights_OP = tf.matmul(X, weights, name="apply_weights")
add_bias_OP = tf.add(apply_weights_OP, bias, name="add_bias")
activation_OP = tf.nn.sigmoid(add_bias_OP, name="activation")

cost_OP = tf.nn.l2_loss(activation_OP - yGold, name="squared_error_cost")

optimizer = tf.train.AdamOptimizer(0.001)
global_step = tf.Variable(0, name='global_step', trainable=False)
training_OP = optimizer.minimize(cost_OP, global_step=global_step)

correct_predictions_OP = tf.equal(
    tf.argmax(activation_OP, 0),
    tf.argmax(yGold, 0)
)
accuracy_OP = tf.reduce_mean(tf.cast(correct_predictions_OP, "float"))

newCost, train_accuracy, _ = sess.run(
    [cost_OP, accuracy_OP, training_OP],
    feed_dict={
        X: trainX[indice_bas: indice_haut],
        yGold: trainY[indice_bas: indice_haut]
    }
)
Thanks
Try using cross-entropy instead of the L2 loss; also, there is no real point in having an activation function on your output layer.
The examples that ship with TensorFlow actually include a basic model that is very similar to what you are trying to do.
BTW: it might also be that the problem you are trying to learn is simply not solvable by a simple linear model (which is what you are building), so try a deeper model. Here is an example of a two-layer multilayer perceptron.
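As a minimal sketch of the suggested change, reusing the ops from the question: feed the pre-activation output (add_bias_OP) to a cross-entropy loss as logits and leave the sigmoid out of the loss path. This assumes yGold holds one-hot class labels; for a single binary output, tf.nn.sigmoid_cross_entropy_with_logits would be the equivalent choice:

# cross-entropy on the raw logits instead of L2 on the sigmoid output
cost_OP = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=yGold, logits=add_bias_OP),
    name="cross_entropy_cost")

optimizer = tf.train.AdamOptimizer(0.001)
global_step = tf.Variable(0, name='global_step', trainable=False)
training_OP = optimizer.minimize(cost_OP, global_step=global_step)

# predictions can still be taken from the logits, e.g. tf.argmax(add_bias_OP, 1)
# (assuming rows are examples and columns are classes)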