I'm having differences of the outputs when comparing a model with its stored protobuf version (via this conversion script). For debugging I'm comparing both layers respectively. For the weights and the actual layer output during a test sequence I receive the identical outputs, thus I'm not sure how to access the hidden layers.
Here is how I load the layers
input = graph.get_tensor_by_name("lstm_1_input_1:0")
layer1 = graph.get_tensor_by_name("lstm_1_1/kernel:0")
layer2 = graph.get_tensor_by_name("lstm_1_1/recurrent_kernel:0")
layer3 = graph.get_tensor_by_name("time_distributed_1_1/kernel:0")
output = graph.get_tensor_by_name("activation_1_1/div:0")
Here is the way what I thought to show the respective elements.
show weights:
with tf.Session(graph=graph) as sess:
print sess.run(layer1)
print sess.run(layer2)
print sess.run(layer3)
show outputs:
with tf.Session(graph=graph) as sess:
y_out, l1_out, l2_out, l3_out = sess.run([output, layer1, layer2, layer3], feed_dict={input: X_test})
With this code sess.run(layer1) == sess.run(layer1,feed_dict={input:X_test}) which shouldn't be true.
Can someone help me out?
When you run sess.run(layer1), you're telling tensorflow to compute the value of layer1 tensor, which is ...
layer1 = graph.get_tensor_by_name("lstm_1_1/kernel:0")
... according to your definition. Note that LSTM kernel is the weights variable. It does not depend on the input, that's why you get the same result with sess.run(layer1, feed_dict={input:X_test}). It's not like tensorflow is computing the output if the input is provided -- it's computing the specified tensor(s), in this case layer1.
When does input matter then? When there is a dependency on it. For example:
sess.run(output). It simply won't work without an input, or any tensor that will allow to compute the input.
The optimization op, such as tf.train.AdapOptimizer(...).minimize(loss). Running this op will change layer1, but it also needs the input to do so.
Maybe you can try TensorBoard and examine your graph to find the outputs of the hidden layers.
Related
I am building a sequence to one model prediction using LSTM. My data has 4 input variables and 1 output variable which needs to be predicted. The data is a time series data. The total length of the data is 38265 (total number of timesteps). The total data is in a Data Frame of size 38265 *5
I want to use the previous 20 timesteps data of the 4 input variables to make prediction of my output variable. I am using the below code for this purpose.
model = Sequential()
model.add(LSTM(units = 120, activation ='relu', return_sequences = False,input_shape =
(train_in.shape[1],5)))
model.add(Dense(100,activation='relu'))
model.add(Dense(50,activation='relu'))
model.add(Dense(1))
I want to calculate the Jacobian of the output variable w.r.t the LSTM model function using tf.Gradient Tape .. Can anyone help me out with this??
The solution to segregate the Jacobian of the output with respect to the LSTM input can be done as follows:
Using tf.GradientTape(), we can compute the Jacobian arising from the gradient flow.
However for getting the Jacobian , the input needs to be in the form of tf.EagerTensor which is usually available when we want to see the Jacobian of the output (after executing y=model(x)). The following code snippet shares this idea:
#Get the Jacobian for each persistent gradient evaluation
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu'))
model.add(tf.keras.layers.Dense(2,activation='relu'))
x = tf.constant([[5., 6., 3.]])
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
# Forward pass
tape.watch(x)
y = model(x)
loss = tf.reduce_mean(y**2)
print('Gradients\n')
jacobian_wrt_loss=tape.jacobian(loss,x)
print(f'{jacobian_wrt_loss}\n')
jacobian_wrt_y=tape.jacobian(y,x)
print(f'{jacobian_wrt_y}\n')
But for getting intermediate outputs ,such as in this case, there have been many samples which use Keras. When we separate the outputs coming out from model.layers.output, we get the type to be a Keras.Tensor instead of an EagerTensor.
However for creating the Jacobian, we need the Eager Tensor. (After many failed attempts with #tf.function wrapping as eager execution is already present in TF>2.0)
So alternatively, an auxiliary model can be created with the layers required (in this case, just the Input and LSTM layers).The output of this model will be a tf.EagerTensor which will be useful for the Jacobian tensor creation. The following has been shown in this snippet:
#General Syntax for getting jacobians for each layer output
import numpy as np
import tensorflow as tf
tf.executing_eagerly()
x=tf.constant([[15., 60., 32.]])
x_inp = tf.keras.layers.Input(tensor=tf.constant([[15., 60., 32.]]))
model=tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_1'))
model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_2'))
aux_model=tf.keras.Sequential()
aux_model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_1'))
#model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
# Forward pass
tape.watch(x)
x_y = model(x)
act_y=aux_model(x)
print(x_y,type(x_y))
ops=[layer.output for layer in model.layers]
# ops=[layer.output for layer in model.layers]
# inps=[layer.input for layer in model.layers]
print('Jacobian of Full FFNN\n')
jacobian=tape.jacobian(x_y,x)
print(f'{jacobian[0]}\n')
print('Jacobian of FFNN with Just first Dense\n')
jacobian=tape.jacobian(act_y,x)
print(f'{jacobian[0]}\n')
Here I have used a simple FFNN consisting of 2 Dense layers, but I want to evaluate w.r.t the output of the first Dense layer. Hence I created an auxiliary model having just 1 Dense layer and determined the output of the Jacobian from it.
The details can be found here.
With the help from #Abhilash Majumder, I have done it this way. I am posting it here so that it might help someone in the future.
import numpy as np
import pandas as pd
import tensorflow as tf
tf.compat.v1.enable_eager_execution() #This will enable eager execution which is must.
tf.executing_eagerly() #check if eager execution is enabled or not. Should give "True"
data = pd.read_excel("FileName or Location ")
#My data is in the from of dataframe with 127549 rows and 5 columns(127549*5)
a = data[:20] #shape is (20,5)
b = data[50:70] # shape is (20,5)
A = [a,b] # making a list
A = np.array(A) # convert into array size (2,20,5)
At = tf.convert_to_tensor(A, np.float32) #convert into tensor
At.shape # TensorShape([Dimension(2), Dimension(20), Dimension(5)])
model = load_model('EKF-LSTM-1.h5') # Load the trained model
# I have a trained model which is shown in the question above.
# Output of this model is a single value
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
tape.watch(At)
y1 = model(At) #defining your output as a function of input variables
print(y1,type(y1)
#output
tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class
'tensorflow.python.framework.ops.EagerTensor'>
jacobian=tape.jacobian(y1,At) #jacobian of output w.r.t both inputs
jacobian.shape
Outupt
TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])
Here I calculated Jacobian w.r.t 2 inputs each of size (20,5). If you want to calculate w.r.t to only one input of size (20,5), then use this
jacobian=tape.jacobian(y1,At[0]) #jacobian of output w.r.t only 1st input in 'At'
jacobian.shape
Output
TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(20), Dimension(5)])
For those looking to compute the Jacobian over a series of inputs and outputs that are independent of each other for input[i], output[j], i != j, consider the batch_jacobian method.
This will reduce the number of dimensions in your computed Jacobian tensor by one and could be the difference between running out of memory and not.
See: batch_jacobian in the TensorFlow GradientTape docs.
Here is a simple tensorflow functional API model.
input1 = tf.keras.layers.Input(shape=(2,), dtype='float32')
output1 = tf.keras.layers.Dense(2)(input1)
model = tf.keras.Model(inputs=input1, outputs=output1)
In some examples of the functional API, output is obtained using model(), yet there is also model.predict().
With my example above, predict works:
model.predict([[[1.1, 2.2]]])
>> array([[1.8761028 , 0.20520687]], dtype=float32)
If I run just the model though, I get an error:
model([[[1.1, 2.2]]])
>> ... InvalidArgumentError: In[0] is not a matrix [Op:MatMul]
What is the difference and why is the error occuring?
Thanks,
Julian
The error states model() expects a matrix as input, where you've provided a list.
To solve this, just convert it to a matrix:
model(tf.Variable([[[1.1, 2.2]]]))
or
model(np.array([[[1.1, 2.2]]]))
On the difference between model() and model.predict()
The code you are referring to where "output is obtained using model()":
left_proba = model(obs[np.newaxis]) # <--- HERE
action = (tf.random.uniform([1, 1]) > left_proba)
y_target = tf.constant([[1.]]) - tf.cast(action, tf.float32)
loss = tf.reduce_mean(loss_fn(y_target, left_proba))
This is similar to your 2nd line of code:
output1 = tf.keras.layers.Dense(2)(input1)
How is this similar, you ask?
In your code, you create a new node in the graph of layers by calling a Dense layer on this input1 object.
The "layer call" action is like drawing an arrow from input1 to this layer you created.
You're "passing" the inputs to the dense layer, and out you get output1.
In the reference code, they treat model like a layer, and do a "layer call".
See the similarity?:
output = Dense(input)
left_proba = model(obs[...])
In turn this creates new nodes that perform other operations (in the 3 lines that follow).
This is useful when you want to take an existing model and use it as a component (or "layer") to build another, new model.
As for model inference, you will always do this via y = model.predict(x).
I am doing research and for an experiment I want to use gradients of a specific layer in the network with respect to the network's input( similar as guided backprop) as input to another network (classifier). The goal is to 'force' network to change 'attention' according to classifier, so those two networks should be trained simultaneously.
I implemented it on this way :
input_tensor = model.input
output_tensor = model.layers[-2].output
grad_calc = keras.layers.Lambda(lambda x:K.gradients(x,input_tensor)[0],output_shape=(256,256,3),trainable=False)(output_tensor)
pred = classifier(grad_calc)
out_model = Model(input_tensor,pred)
out_model.compile(loss='mse',optimizer=keras.optimizers.Adam(0.0001),metrics=['accuracy'])
Then, when I try to train the model
out_model.train_on_batch(imgs,np.zeros((imgs.shape[0],2)))
it is not working. It seems that it stucks there, nothing is happening (no error nor other message).
I am not sure is this right way to implement this, so I would be very thankful if someone with more experience can take a look and give me advice.
If I was trying to achieve that I would swith to plain Tensorflow and something along the lines:
#build model
input = tf.placeholder()
net = tf.layesr.conv2d(input, 12)
loss = tf.nn.l2_loss(net)
step = tf.train.AdamOptimizer().minimize(loss)
# now inspect your graph and select the gradient tensor you are looking for
for op in tf.get_default_graph.get_operations():
print(op.name)
grad = tf.get_default_graph().get_operation_by_name("enqueue")
with tf.Session as sess:
_, grad, input = sess.run([step, grad, input], ...)
# feed your grad and input into another network
I am trying to create a very simple neural network reading in information with the shape 1x2048 and to create a classification for two categories (object or not object). The graph structure however, deviates from what I believe to have coded. The dense layers should be included in the scope of "inner_layer" and should be receiving their input from the "input" placeholder. Instead, TF seems to be treating them as independent layers which do not receive any information from "input".
Also, when using trying to use tensorboard summaries I get an error telling me that I have not mentioned inserting inputs for the apparent placeholders of the dense layers. When omitting tensorboard, everything works as I expected it based on the code.
I have spent a lot of time trying to find the problem but I think I must be overlooking an something very basic.
The graph I get in tensorboard is on this image.
Which corresponds to the following code:
tf.reset_default_graph()
keep_prob = 0.5
# Graph Strcuture
## Placeholders for input
with tf.name_scope('input'):
x_ = tf.placeholder(tf.float32, shape = [None, transfer_values_train.shape[1]], name = "input1")
y_ = tf.placeholder(tf.float32, shape = [None, num_classes], name = "labels")
## Dense Layer one with 2048 nodes
with tf.name_scope('inner_layers'):
first_layer = tf.layers.dense(x_, units = 2048, activation=tf.nn.relu, name = "first_dense")
dropout_layer = tf.nn.dropout(first_layer, keep_prob, name = "dropout_layer")
#readout layer, without softmax
y_conv = tf.layers.dense(dropout_layer, units = 2, activation=tf.nn.relu, name = "second_dense")
# Evaluation and training
with tf.name_scope('cross_entropy'):
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels = y_ , logits = y_conv),
name = "cross_entropy_layer")
with tf.name_scope('trainer'):
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
with tf.name_scope('accuracy'):
prediction = tf.argmax(y_conv, axis = 1)
correct_prediction = tf.equal(prediction, tf.argmax(y_, axis = 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Does anyone have an idea why the graph is so different from what you would expect based on the code?
The graph rendering in tensorboard may be a bit confusing (initially), but it's correct. Take a look at this picture where I've left only the inner_layers part of your graph:
You may notice that:
The first_dense and second_dense are actually the name scopes themselves (generated by tf.layers.dense function; see also this question).
Their input/output tensors are inside the inner_layers scope and wire correctly to the dropout_layer. Here, in each of dense layers, live the corresponding linear ops: MatMul, BiasAdd, Relu.
Both scopes also include the variables (kernel and bias each), that are shown separately from inner_layers. They encapsulate the ops related specifically to variable, such as read, assign, initialize, etc. The linear ops in first_dense depend on the variable ops of first_dense, and second_dense likewise.
The reason for this separation is that in distributed settings the variables are manages by a different task called parameter server. It's usually run on a different device (CPU as opposed to GPU), sometimes even on a different machine. In other words, for tensorflow the variable management is by design different from matrix computation.
Having said that, I'd love to see a mode in tensorflow that would not split the scope into variables and ops and keep them coupled.
Other than this the graph perfectly matches the code.
I build a LSTM like:
lstm_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True, activation=tf.nn.tanh)
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.5)
lstm_cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * 3, state_is_tuple=True)
Then i train the model, and save variables.
The next time i load saved variables and skip training, it gives me a different prediction.
If i change the output_keep_prob to 1, this model can always show me the same prediction, but if the output_keep_prob is less than 1, like 0.5, this model shows me different prediction every time.
So i guess if the DropoutWrapper leads to different output?
If so, how can i solve this problem?
Thanks
Try using the seed keyword argument to DropoutWrapper(...):
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.5, seed=42)
See the docs here for DropoutWrapper.__init__
Dropout will randomly activate a subset of your net, and is used during training for regularization. Because you've hardcoded dropout as 0.5, it means every time you run the net half your nodes will be randomly silenced, thus producing a different and random result.
You can sanity check this is what's happening by setting a seed, so that the same nodes will be 'randomly' silenced by dropout each time. However, what you probably want to do is make dropout a placeholder so that you can set it to 1.0 (ie keep all the nodes) during test time, which will produce the same output for each input deterministically.