Import LSTM from Tensorflow to PyTorch by hand

Import LSTM from Tensorflow to PyTorch by hand - python

I am trying to import a pretrained Model from tensorflow to PyTorch. It takes a single input and maps it onto a single output.
Confusion comes up, when I try to import the LSTM weights
I read the weights and their variables from the file with the following function:
def load_tf_model_weights():
modelpath = 'models/model1.ckpt.meta'
with tf.Session() as sess:
tf.train.import_meta_graph(modelpath)
init = tf.global_variables_initializer()
sess.run(init)
vars = tf.trainable_variables()
W = sess.run(vars)
return W,vars
W,V = load_tf_model_weights()
Then I am inspecting the shapes of the weights
In [33]: [w.shape for w in W]
Out[33]: [(51, 200), (200,), (100, 200), (200,), (50, 1), (1,)]
furthermore the variables are defined as
In [34]: V
Out[34]:
[<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(51, 200) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(100, 200) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'weight:0' shape=(50, 1) dtype=float32_ref>,
<tf.Variable 'FCLayer/Variable:0' shape=(1,) dtype=float32_ref>]
So I can say that the first element of W defines the Kernel of an LSTM and the second element define its bias. According to this post, the shape for the Kernel is defined as
[input_depth + h_depth, 4 * self._num_units]
and the bias as [4 * self._num_units]. We already know that input_depth is 1. So we get, that h_depth and _num_units both have the value 50.
In pytorch my LSTMCell, to which I want to assign the weights, looks like this:
In [38]: cell = nn.LSTMCell(1,50)
In [39]: [p.shape for p in cell.parameters()]
Out[39]:
[torch.Size([200, 1]),
torch.Size([200, 50]),
torch.Size([200]),
torch.Size([200])]
The first two entries can be covered by the first value of W which has the shape (51,200). But the LSTMCell from Tensorflow yields only one bias of shape (200) while pytorch wants two of them
And by leaving the bias out I have weights left over:
cell2 = nn.LSTMCell(1,50,bias=False)
[p.shape for p in cell2.parameters()]
Out[43]: [torch.Size([200, 1]), torch.Size([200, 50])]
Thanks!

pytorch uses CuDNN's LSTM underlayer(even when you don't have CUDA, it still uses something compatible) thus it has one extra bias term.
So you can pick two numbers with their sum equal to 1(0 and 1, 1/2 and 1/2 or anything else) and set your pytorch biases as those numbers times TF's bias.
pytorch_bias_1 = torch.from_numpy(alpha * tf_bias_data)
pytorch_bias_2 = torch.from_numpy((1.0-alpha) * tf_bias_data)

Related

tf.GradientTape returning None for Grad-CAM

I'm trying to implement the following code to apply Grad-CAM to my model.
with tf.GradientTape() as tape:
real_y = tf.expand_dims(y_test_ballest[0], axis=0)
real_y = tf.reshape(real_y, shape=[1,1])
#real_y = tf.Variable(real_y, trainable=True)
value_x = x_test_ballest[0].reshape((-1, 41, 41, 1))
pred_y = model_Ballest(value_x)
model_loss = tf.keras.losses.binary_crossentropy(real_y, pred_y)
#model_loss = tf.Variable(model_loss, trainable=True)
conv_outputs = tf.Variable(gradModel(value_x), trainable=True)
model_gradients = tape.gradient(model_loss, conv_outputs)
Here:
y_test_ballest[0]
is a numpy.int32, with shape (), and
y_test_ballest[0]
is a numpy.ndarray of shape (41,41).
model_Ballest
is the complete model
with the following structure, and
gradModel
is the same model as "model_Ballest" until the "max_pooling2D_5" layer. This model lacks of the last layers. In other words, the model only takes the first 5 layers of "model_Ballest" with their weights.
The inputs of "tape.gradient()" have the following attributes:
model_loss
>> <tf.Tensor: shape=(1,), dtype=float32, numpy=array([----], dtype=float32)>
and
conv_outputs
>> <tf.Variable 'Variable:0' shape=(1, 8, 8, 32) dtype=float32, numpy=array([[[[----]]]])>
Here, the numpy arrays are real arrays but I skiped that data with "----" for the question.

Could I set a part of a tensor untrainable?

It's easy to set a tensor untrainable, trainable=False. But Could I set only part of a tensor untrainable?
Suppose I have a 2*2 tensor, I only want one element untrainable and the other three elements trainable.
Like this (I want the 1,1 element always to be zero, and the other three elements updated by optimizer)
untrainable trainable
trainable trainable
Thanks.

Short answer: you can't.
Longer answer: you can mimic that effect by setting part of the gradient to zero after the computation of the gradient so that part of the variable is never updated.
Here is an example:
import tensorflow as tf
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="sigmoid", input_shape=(2,), name="first"), tf.keras.layers.Dense(1,activation="sigmoid")])
X = tf.random.normal((1000,2))
y = tf.reduce_sum(X, axis=1)
ds = tf.data.Dataset.from_tensor_slices((X,y))
In that example, the first layer has a weight W of the following:
>>> model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[ 0.13573623, -0.68269 ],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
We then write the custom loop that will only update the first row of that weight W :
loss = tf.losses.MSE
opt = tf.optimizers.SDG(1.) # high learning rate to see the change
for xx,yy in ds.take(1):
with tf.GradientTape() as tape:
l = loss(model(xx),yy)
g = tape.gradient(l,model.get_layer("first").trainable_weights[0])
gradient_slice = g[:1] # first row
new_grad = tf.concat([gradient_slice, tf.zeros((1,2), dtype=tf.float32),], axis=0) # replacing the rest with zeros
opt.apply_gradients(zip([new_grad], [model.get_layer("first").trainable_weights[0]]))
And then, after running that loop, we can inspect the wieghts again:
model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[-0.08515069, -0.51738167],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
And only the first row changed.

TensorFlow session with two neural networks - TypeError: Fetch argument None has invalid type <class 'NoneType'>

I have seen similar problems described on stackoverflow, but anything I found fit/solve my problem.
I have some reinforcement learning task, when I want to use two neural networks to steers two degrees of freedom.
I have a code like this with two neural network:
def reset_graph(seed=42):
tf.reset_default_graph()
tf.set_random_seed(seed)
np.random.seed(seed)
reset_graph()
n_inputs = 10
n_hidden = 8
n_outputs = 3
learning_rate = 0.0025
initializer = tf.contrib.layers.variance_scaling_initializer()
X1 = tf.placeholder(tf.float32, shape=[None, n_inputs],name='X1')
hidden = tf.layers.dense(X1, 10, activation=tf.nn.tanh,name = 'hidden1', kernel_initializer=initializer)
logits1 = tf.layers.dense(hidden, n_outputs,name='logit1')
outputs1 = tf.nn.softmax(logits1,name='out1')
action1 = tf.multinomial(logits1, num_samples=1,name='action1')
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels= action1[0], logits=logits1,name='cross_e1')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate,name='opt1')
grads_and_vars = optimizer.compute_gradients(cross_entropy)
gradients = [grad for grad, variable in grads_and_vars]
gradient_placeholders = []
grads_and_vars_feed = []
for grad, variable in grads_and_vars:
gradient_placeholder = tf.placeholder(tf.float32)
gradient_placeholders.append(gradient_placeholder)
grads_and_vars_feed.append((gradient_placeholder, variable))
training_op = optimizer.apply_gradients(grads_and_vars_feed)
X2 = tf.placeholder(tf.float32, shape=[None, n_inputs],name='X2')
initializer2 = tf.contrib.layers.variance_scaling_initializer()
hidden2 = tf.layers.dense(X2, 10, activation=tf.nn.tanh,name='hidden2', kernel_initializer=initializer2)
logits2 = tf.layers.dense(hidden2, 3,name='logit2')
outputs2 = tf.nn.softmax(logits2,name='out2')
action2 = tf.multinomial(logits2, num_samples=1,name='action2')
cross_entropy2 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels= action2[0], logits=logits2,name='cross_e2')
optimizer2 = tf.train.GradientDescentOptimizer(learning_rate=0.002,name = 'opt2')
grads_and_vars2 = optimizer2.compute_gradients(cross_entropy2)
gradients2 = [grad2 for grad2, variable2 in grads_and_vars2]
gradient_placeholders2 = []
grads_and_vars_feed2 = []
for grad2, variable2 in grads_and_vars2:
gradient_placeholder2 = tf.placeholder(tf.float32)
gradient_placeholders2.append(gradient_placeholder2)
grads_and_vars_feed2.append((gradient_placeholder2, variable2))
training_op2 = optimizer2.apply_gradients(grads_and_vars_feed2)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
and when I run it by :
action_val,action_val2,gradients_val,gradients_val2 = sess.run([action,action2, gradients,gradients2], feed_dict={X1: obs.reshape(1, n_inputs),X2: obs.reshape(1, n_inputs)})
I have an error:
TypeError Traceback (most recent call last)
<ipython-input-70-fb66a94fa4dc> in <module>
50 reward, done, obs = agent.step(rotor_speeds)
51
---> 52 action_val,gradients_val,action_val2,gradients_val2 = sess.run([action, gradients,action2, gradients2], feed_dict={X1: obs.reshape(1, n_inputs),X2: obs.reshape(1, n_inputs)})
...
TypeError: Fetch argument None has invalid type <class 'NoneType'>
The problem is with gradients2. When I calculate other parts of graph it work fine. For example:
action_val,action_val2,gradients_val = sess.run(([action1,action2,gradients]),feed_dict={X1: obs.reshape(1, n_inputs),X2: obs.reshape(1, n_inputs)})
it works without problems.
Also, I wonder why on graph generated from above code, the hidden layer(hidden1) and logits (logit1) of first neural network are connected to second optimizer(opt2) , because I don't see this unwanted connections in the code. Maybe it is the reason of problem, but also I don't know how to change it.

Let's look at gradients2, since it seems to cause the error:
>>> gradients2
[None,
None,
None,
None,
<tf.Tensor 'gradients_1/hidden2/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 10) dtype=float32>,
<tf.Tensor 'gradients_1/hidden2/BiasAdd_grad/tuple/control_dependency_1:0' shape=(10,) dtype=float32>,
<tf.Tensor 'gradients_1/logit2/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 3) dtype=float32>,
<tf.Tensor 'gradients_1/logit2/BiasAdd_grad/tuple/control_dependency_1:0' shape=(3,) dtype=float32>]
The first 4 elements are None, which explains why sess.run() fails (you cannot evaluate None).
So why does gradients2 contain None values? These values come from grads_and_vars2 so let's look at it:
>>> grads_and_vars2
[(None, <tf.Variable 'hidden1/kernel:0' shape=(10, 10) dtype=float32_ref>),
(None, <tf.Variable 'hidden1/bias:0' shape=(10,) dtype=float32_ref>),
(None, <tf.Variable 'logit1/kernel:0' shape=(10, 3) dtype=float32_ref>),
(None, <tf.Variable 'logit1/bias:0' shape=(3,) dtype=float32_ref>),
(<tf.Tensor 'gradients_1/hidden2/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 10) dtype=float32>,
<tf.Variable 'hidden2/kernel:0' shape=(10, 10) dtype=float32_ref>),
(<tf.Tensor 'gradients_1/hidden2/BiasAdd_grad/tuple/control_dependency_1:0' shape=(10,) dtype=float32>,
<tf.Variable 'hidden2/bias:0' shape=(10,) dtype=float32_ref>),
(<tf.Tensor 'gradients_1/logit2/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 3) dtype=float32>,
<tf.Variable 'logit2/kernel:0' shape=(10, 3) dtype=float32_ref>),
(<tf.Tensor 'gradients_1/logit2/BiasAdd_grad/tuple/control_dependency_1:0' shape=(3,) dtype=float32>,
<tf.Variable 'logit2/bias:0' shape=(3,) dtype=float32_ref>)]
The None values correspond to the variables hidden1/kernel, hidden1/bias, logit1/kernel and logit1/bias. These are the parameters of the hidden1 and logit1 dense layers. What does this mean? Consider how grads_and_vars2 was computed:
grads_and_vars2 = optimizer2.compute_gradients(cross_entropy2)
So we are asking TensorFlow to compute the gradients of cross_entropy2 with regards to all the variables in the TensorFlow graph. This includes variables which cross_entropy2 does not depend on at all, such as the parameters of the hidden1 and logit1 dense layers. This is why there are some None values.
And since sess.run() does not allow you to evaluate None values, you get an error.
To get rid of this error, you can simply filter out the None values:
gradients = [grad for grad, variable in grads_and_vars if grad is not None]
...
gradients2 = [grad2 for grad2, variable2 in grads_and_vars2 if grad2 is not None]
The error should disappear.
As a side-note: I encourage you to check out TensorFlow 2 (with tf.keras): it's much easier to use.

How to build a tensor from 2 scalars in Tensorflow?

I have two scalars resulting from the following operations:
a = tf.reduce_sum(tensor1), b = tf.matmul(tf.transpose(tensor2), tensor3) this is a dot product since tensor2 and tensor3 have the same dimensions (1-D vectors). Since these tensors have shape [None, dim1] it becomes difficult to deal with the shapes.
I want to build a tensor that has shape (2,1) using a and b.
I tried tf.Tensor([a,b], dtype=tf.float64, value_index=0) but raises the error
TypeError: op needs to be an Operation: [<tf.Tensor 'Sum_5:0' shape=() dtype=float32>, <tf.Tensor 'MatMul_67:0' shape=(?, ?) dtype=float32>]
Any easier way to build that tensor/vector?

This would do probably. Change axis based on what you need
a = tf.constant(1)
b = tf.constant(2)
c = tf.stack([a,b],axis=0)
Output:
array([[1],
[2]], dtype=int32)

You can use concat or stack to achieve this:
import tensorflow as tf
t1 = tf.constant([1])
t2 = tf.constant([2])
c = tf.reshape(tf.concat([t1, t2], 0), (2, 1))
with tf.Session() as sess:
print sess.run(c)
In a similar way you can achieve it with tf.stack.

How to set different weight variables in multiple rnn layers in tensorflow?

Hi I'm writing a "many(input) to many(output)" rnn code to deal with character prediction. For this, I set my hidden layer parameter as follows. Three hidden layers and each of them has 100, 200, and 300 hidden units.
hidden layers= [100,200,300]
My code is this.
# parameters.
hiddenLayers = [100,200,300]
timeStep = 20 # sequence length
inputDimension = 38 (# of English alphabet + symbols)
outputDimension = 38 (# of English alphabet + symbols)
input_x = tf.placeholder(tyf.float64, [None, timeStep, inputDimension])
# make weights
w1 = tf.get_variable("w1",[hiddenUnits[0],hiddenUnits[1] ],initializer=tf.random_normal_initializer())
w2 = tf.get_variable("w2",[hiddenUnits[1],hiddenUnits[2] ],initializer=tf.random_normal_initializer())
w3 = tf.get_variable("w3",[hiddenUnits[2],outputDimension ],initializer=tf.random_normal_initializer())
# make biases
b1 = tf.get_variable("b1",[hiddenUnits[1]], initializer=tf.constant_initializer(0.0))
b2 = tf.get_variable("b2",[hiddenUnits[2]], initializer=tf.constant_initializer(0.0))
b3 = tf.get_variable("b3",[outputDimension], initializer=tf.constant_initializer(0.0))
def cell_generator(hiddenUnits):
return rnn_cell = rnn.BasicLSTMCell(hiddenUnits, forget_bias=1.0)
rnn_cell = rnn.MultiRNNCell([cell_generator(_) for _ in hiddenLayers])
outputs, states = tf.nn.dynamic_rnn(rnn_cell, input_x, dtype=tf.float64)
Now I have 3 hidden layers and each of them has different number of hidden units.
When I printed out "outputs" and "states" and they say liek this. (I didn't run a session.)
print(outputs)
Tensor("rnn/transpose:0", shape=(?, 20, 300), dtype=float64)
print(states)
(LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_2:0' shape=(?, 100) dtype=float64>, h=<tf.Tensor 'rnn/while/Exit_3:0' shape=(?, 100) dtype=float64>),
LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_4:0' shape=(?, 200) dtype=float64>, h=<tf.Tensor 'rnn/while/Exit_5:0' shape=(?, 200) dtype=float64>),
LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_6:0' shape=(?, 300) dtype=float64>, h=<tf.Tensor 'rnn/while/Exit_7:0' shape=(?, 300) dtype=float64>))
I thought that "output" should hold every output from 3 hidden layers but it has only last hidden layer so I got lost where to multiply 1st and 2nd weights. So now I have questions.
Can I use "states" variables which hold 3 hidden layers so that I can multiply my 3 weights and add 3 biases to each of the last states.
for example...
hidden2 = tf.matmul(state[0],w1) + b1
hidden3 = tf.matmul(state[1],w2) + b2
final_output = tf.matmul(state[2],w3) + b3
# and do the loss calculation for training...
Is there any way that I can use "output" variable in order to apply my weights and biases to that cell?
Or is there any other alternative ways that I can use my weights and biases? Maybe initialize weights and biases in the previous steps(before toss the basicRNNCell to the MultiRNNcell)?
I really want to set the different number of hidden layers and units and apply them with my predetermined weight and bias parameters. Please let me know if you have any ideas.
Thanks in advance.

I thought that "output" should hold every output from 3 hidden layers but it has only last hidden layer so I got lost where to multiply 1st and 2nd weights. So now I have questions.
By calling MultiRNNCell on 3 cells, you create a multilayered network, where each cell is a single layer (just like in normal fully connected networks you can have multiple layers), and each layer feeds into the next one, like:
input -> cell1 -> cell2 -> cell3 -> output
So the output of your RNN is the output of cell3, hence the (?, 20, 300) shape. The only variables you need to get the final output is w3 and b3, like
final_output = tf.nn.softmax(tf.matmul(outputs, w3) + b3)
which gives you the predicted distribution over your classes.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Import LSTM from Tensorflow to PyTorch by hand - python

Related

tf.GradientTape returning None for Grad-CAM

Could I set a part of a tensor untrainable?

TensorFlow session with two neural networks - TypeError: Fetch argument None has invalid type <class 'NoneType'>

How to build a tensor from 2 scalars in Tensorflow?

How to set different weight variables in multiple rnn layers in tensorflow?

Categories

Resources