Keras : Gradients of output w.r.t. input as input to classifier - python

I am doing research, and for an experiment I want to use the gradients of a specific layer in the network with respect to the network's input (similar to guided backprop) as input to another network (a classifier). The goal is to 'force' the network to change its 'attention' according to the classifier, so the two networks should be trained simultaneously.
I implemented it this way:
input_tensor = model.input
output_tensor = model.layers[-2].output
grad_calc = keras.layers.Lambda(lambda x:K.gradients(x,input_tensor)[0],output_shape=(256,256,3),trainable=False)(output_tensor)
pred = classifier(grad_calc)
out_model = Model(input_tensor,pred)
Then, when I try to train the model, it does not work. It seems to get stuck there and nothing happens (no error or other message).
I am not sure whether this is the right way to implement this, so I would be very thankful if someone with more experience could take a look and give me advice.

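If I were trying to achieve that, I would switch to plain TensorFlow and do something along these lines:
# build model (TensorFlow 1.x graph mode)
import tensorflow as tf
input = tf.placeholder(tf.float32, shape=[None, 256, 256, 3])  # shape taken from the question
net = tf.layers.conv2d(input, 12, kernel_size=3)  # kernel size is an arbitrary choice
loss = tf.nn.l2_loss(net)
step = tf.train.AdamOptimizer().minimize(loss)
# now inspect your graph and select the gradient tensor you are looking for
for op in tf.get_default_graph().get_operations():
    print(op.name)
# "enqueue" is only a stand-in; use the name of the gradient op you found above
grad = tf.get_default_graph().get_operation_by_name("enqueue")
with tf.Session() as sess:
    _, grad, input = sess.run([step, grad, input], ...)
# feed your grad and input into another network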
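For comparison, here is a minimal sketch of the same idea in TensorFlow 2.x eager mode, which the question does not use. It nests two tf.GradientTape contexts so that the input gradient is itself differentiable. The two small networks, the 8x8 image size and the loss are hypothetical stand-ins for the question's model and classifier, and the gradient is taken of the sum of the layer outputs, which is also what K.gradients does.

```python
import tensorflow as tf

# Hypothetical stand-ins for the question's `model` (up to the chosen layer)
# and `classifier`; the real networks would be much larger.
feature_net = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Conv2D(4, 3, padding="same", activation="tanh"),
    tf.keras.layers.GlobalAveragePooling2D(),
])
classifier = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)


def train_step(images, labels):
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    with tf.GradientTape() as outer_tape:
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(images)  # inputs are not variables, so watch them
            features = feature_net(images, training=True)
        # Gradient of the (summed) layer output w.r.t. the input image.
        # It is computed inside outer_tape, so it stays differentiable.
        saliency = inner_tape.gradient(features, images)
        logits = classifier(saliency, training=True)
        loss = loss_fn(labels, logits)
    # Both networks receive gradients from the classifier loss.
    variables = feature_net.trainable_variables + classifier.trainable_variables
    grads = outer_tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return saliency, loss
```

Calling train_step(images, labels) once per batch updates both networks together. The second-order gradient only carries information when the layer output is a nonlinear function of the input, which is why the stand-in uses tanh.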


How are keras tensors connected to layers that create them

In the book "Machine Learning with scikit-learn and Tensorflow" there's a code fragment I can't wrap my head around. Until that chapter, their models only used layers explicitly, either sequentially or through the functional API. But in chapter 16, there's this:
import tensorflow_addons as tfa
encoder_inputs = keras.layers.Input(shape=[None], dtype=np.int32)
decoder_inputs = keras.layers.Input(shape=[None], dtype=np.int32)
sequence_lengths = keras.layers.Input(shape=[], dtype=np.int32)
embeddings = keras.layers.Embedding(vocab_size, embed_size)
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)
encoder = keras.layers.LSTM(512, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_embeddings)
encoder_state = [state_h, state_c]
sampler = tfa.seq2seq.sampler.TrainingSampler()
decoder_cell = keras.layers.LSTMCell(512)
output_layer = keras.layers.Dense(vocab_size)
decoder = tfa.seq2seq.basic_decoder.BasicDecoder(decoder_cell, sampler,
                                                 output_layer=output_layer)
final_outputs, final_state, final_sequence_lengths = decoder(
    decoder_embeddings, initial_state=encoder_state,
    sequence_length=sequence_lengths)
Y_proba = tf.nn.softmax(final_outputs.rnn_output)
model = keras.models.Model(
    inputs=[encoder_inputs, decoder_inputs, sequence_lengths],
    outputs=[Y_proba])
And then he just runs the model in a standard way:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
X = np.random.randint(100, size=10*1000).reshape(1000, 10)
Y = np.random.randint(100, size=15*1000).reshape(1000, 15)
X_decoder = np.c_[np.zeros((1000, 1)), Y[:, :-1]]
seq_lengths = np.full([1000], 15)
history = model.fit([X, X_decoder, seq_lengths], Y, epochs=2)
I have trouble understanding the code starting at the line that creates the Embedding layer. The author creates an Embedding layer and immediately calls it on encoder_inputs and decoder_inputs. He then does basically the same with the LSTM layer, which he calls on the previously created encoder_embeddings, and the tensors returned by that call are used in the code slightly below. What I don't get is how those tensors are trained. It looks like he's not using the layers that create them in the model, but if so, how come the embeddings are learned and the whole model converges?
To understand the overall flow, you need to know what happens under the hood. Keras builds a graph of layers before any training takes place. Each time a layer is called on a symbolic tensor, such as embeddings(encoder_inputs), the call is recorded as a node in that graph. When you pass [encoder_inputs, decoder_inputs, sequence_lengths] as the inputs and [Y_proba] as the output, keras.models.Model traces the path between them and stores the resulting computational graph. No training happens at this point. model.compile() then attaches the loss and the optimizer to that graph.
Let me explain further. Suppose I want to compute a + b = c, b + d = 2 and finally c * d = 6 using a computational graph. TensorFlow first creates three nodes for these operations, as in the picture below.
TensorFlow does exactly the same thing when you pass your inputs and outputs. The picture above depicts the forward pass. The same graph is then used by TensorFlow for the backward pass; see the figure below.
So the computational graph is built first, and then that same graph is used to compute both the forward pass and the backward pass.
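Since the pictures are not available here, the following dependency-free sketch may help. It assumes a three-node graph, c = a + b, d = b + 1 and e = c * d, chosen only for illustration (these are not necessarily the exact expressions from the pictures). It evaluates the nodes in order for the forward pass, then walks the same nodes in reverse with the chain rule for the backward pass.

```python
def forward_backward(a, b):
    """Evaluate a tiny three-node graph, then backpropagate through it."""
    # Forward pass: each node is computed once, in dependency order.
    c = a + b      # node 1
    d = b + 1.0    # node 2
    e = c * d      # node 3, the output

    # Backward pass: visit the same nodes in reverse order.
    de_dc = d              # from e = c * d
    de_dd = c              # from e = c * d
    de_da = de_dc * 1.0    # a only reaches e through c
    de_db = de_dc * 1.0 + de_dd * 1.0  # b reaches e through both c and d
    return {"e": e, "de_da": de_da, "de_db": de_db}


result = forward_backward(2.0, 3.0)
print(result)  # {'e': 20.0, 'de_da': 4.0, 'de_db': 9.0}
```

The values stored during the forward pass (c and d) are exactly what the backward pass needs, which is why one graph serves both directions.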
That graph covers your complete model. How does this work in your case specifically? The model asks how Y_proba is produced. The graph consists of the operations and tensors created between the inputs and the outputs. The Embedding layer is created and applied to encoder_inputs and decoder_inputs to obtain encoder_embeddings and decoder_embeddings, respectively. The LSTM layer is applied to encoder_embeddings to produce encoder_outputs, state_h and state_c. The BasicDecoder layer then combines decoder_embeddings, encoder_state (constructed from state_h and state_c) and sequence_lengths to produce final_outputs, final_state and final_sequence_lengths. Finally, the softmax function is applied to the rnn_output of final_outputs to produce the final output Y_proba.
All the tensors named in the paragraph above are intermediate nodes in the computational graph.
So the graph starts at the inputs and leads down to Y_proba. While the graph is being built, the weights and other parameters of the model are also initialized. The graph is built once, which makes it easy to compute the forward and backward passes.
How are these layers trained and optimized for convergence?
When you specify inputs=[encoder_inputs, decoder_inputs, sequence_lengths] and outputs=[Y_proba], the model knows which intermediate layers are used to compute Y_proba from encoder_inputs, decoder_inputs and sequence_lengths. These are the Embedding and LSTM layers, as well as the LSTMCell, Dense and BasicDecoder layers (with the TrainingSampler used inside the decoder). They are automatically included in the computation graph of the model, which allows the optimizer to update their parameters during training.
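This can be checked directly on a much smaller model. The sketch below is not the book's model: the vocabulary size, layer sizes and random data are arbitrary assumptions. It reuses one Embedding layer on two inputs, confirms that the Model found that layer on its own, and confirms that one epoch of fit changes the embedding weights.

```python
import numpy as np
from tensorflow import keras

inputs_a = keras.layers.Input(shape=[None], dtype="int32")
inputs_b = keras.layers.Input(shape=[None], dtype="int32")
embeddings = keras.layers.Embedding(100, 8)  # one layer object, called twice
pooled_a = keras.layers.GlobalAveragePooling1D()(embeddings(inputs_a))
pooled_b = keras.layers.GlobalAveragePooling1D()(embeddings(inputs_b))
merged = keras.layers.Concatenate()([pooled_a, pooled_b])
outputs = keras.layers.Dense(1)(merged)
model = keras.models.Model(inputs=[inputs_a, inputs_b], outputs=[outputs])

# The Embedding layer was never handed to Model explicitly, yet it is tracked.
layer_is_tracked = any(layer is embeddings for layer in model.layers)

before = embeddings.get_weights()[0].copy()
model.compile(loss="mse", optimizer="adam")
X_a = np.random.randint(100, size=(32, 5))
X_b = np.random.randint(100, size=(32, 7))
y = np.random.rand(32, 1)
model.fit([X_a, X_b], y, epochs=1, verbose=0)
after = embeddings.get_weights()[0]

# The optimizer updated the shared embedding matrix.
weights_changed = not np.array_equal(before, after)
print(layer_is_tracked, weights_changed)
```

Because the layer is shared, its weight matrix appears only once in model.trainable_weights, and gradients from both branches are accumulated into it.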
This is an example of the Keras Functional API. In this style of defining a model, you write the blueprint first and use it later. Think of it like wiring a circuit: while you're connecting things, there's no electricity flowing through them (the electricity corresponds to data in our metaphor). Later, you turn on the power source, and electricity flows through.
This is how the Functional API works as well. First, let's read the last line:
model = keras.models.Model(
    inputs=[encoder_inputs, decoder_inputs, sequence_lengths],
    outputs=[Y_proba])
This says, "Hey Keras, I need a model whose inputs are encoder_inputs, decoder_inputs, and sequence_lengths, and they will eventually produce Y_proba." The details of how they produce this output are defined above. Let's look at the specific lines you're having trouble with:
embeddings = keras.layers.Embedding(vocab_size, embed_size)
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)
The first of these says, "Keras, give me a layer that will produce embeddings". embeddings is a layer object. The next two lines are the wiring that we talked about: you can connect layers before any data flows through them, and that's the crux of the Functional API. So the second line says, "Keras, encoder_inputs, which is an Input, will connect to (and go through) the Embedding layer I just created, and the result of that will be in a variable I call encoder_embeddings."
The rest of the code follows the same logic: you're connecting the wires together, before you compile your model and eventually fit it with the data.
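To make the wiring metaphor concrete, here is a toy sketch in plain Python. It is not how Keras is implemented internally. SymbolicTensor, ToyLayer and collect_layers are hypothetical names invented for this illustration. The only point is that a symbolic tensor can remember the layer that produced it, so a model can walk back from its outputs and find every layer involved.

```python
class SymbolicTensor:
    """A placeholder value that remembers which layer produced it."""

    def __init__(self, producer=None, parents=()):
        self.producer = producer      # None for model inputs
        self.parents = tuple(parents)


class ToyLayer:
    def __init__(self, name):
        self.name = name

    def __call__(self, *inputs):
        # No numbers are computed here; the call only records the wiring.
        return SymbolicTensor(producer=self, parents=inputs)


def collect_layers(outputs):
    """Walk back from the outputs and gather each layer exactly once."""
    found, seen, stack = [], set(), list(outputs)
    while stack:
        tensor = stack.pop()
        if id(tensor) in seen:
            continue
        seen.add(id(tensor))
        if tensor.producer is not None and tensor.producer not in found:
            found.append(tensor.producer)
        stack.extend(tensor.parents)
    return found


encoder_inputs = SymbolicTensor()
decoder_inputs = SymbolicTensor()
embeddings = ToyLayer("embedding")
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)
encoder_state = ToyLayer("lstm")(encoder_embeddings)
y_proba = ToyLayer("decoder")(decoder_embeddings, encoder_state)

layer_names = sorted(layer.name for layer in collect_layers([y_proba]))
print(layer_names)  # ['decoder', 'embedding', 'lstm']
```

The embedding layer is found even though only y_proba was handed to collect_layers, which mirrors why passing inputs and outputs to keras.models.Model is enough.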

How to Implement Siamese Network using pretrained CNNs in Keras?

I am developing a Siamese Network for Face Recognition using Keras for 224x224x3 sized images. The architecture of a Siamese Network is like this:
For the CNN model, I am thinking of using the InceptionV3 model, which is available pretrained in the keras.applications module.
#Assume all the other modules are imported correctly
from keras.applications.inception_v3 import InceptionV3
def return_siamese_net():
    model1=InceptionV3(include_top=False, weights="imagenet", input_tensor=left_input) #Left SubConvNet
    model2=InceptionV3(include_top=False, weights="imagenet", input_tensor=right_input) #Right SubConvNet
    #Do Something here
    distance_layer = #Do Something
    prediction = Dense(1,activation='sigmoid')(distance_layer) # Outputs 1 if the images match and 0 if it does not
    siamese_net = #Do Something
    return siamese_net
I get an error since the model is pretrained, and I am now stuck at implementing the distance layer for the twin network.
What should I add in between to make this Siamese Network work?
A very important note, before you use the distance layer, is to take into consideration that you have only one convolutional neural network.
The shared weights actually refer to only one convolutional neural network, and the weights are shared because the same weights are used when passing a pair of images (depending on the loss function used) in order to compute the features and subsequently the embeddings of each input image.
You would have only one neural network, and the block logic will need to look like:
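from keras import backend as K
from keras.layers import Input, Lambda, Dense
from keras.models import Model

def euclidean_distance(vectors):
    # Euclidean distance between the two embeddings, one value per pair
    (features_A, features_B) = vectors
    sum_squared = K.sum(K.square(features_A - features_B), axis=1, keepdims=True)
    # clamp before the square root to avoid a zero argument
    return K.sqrt(K.maximum(sum_squared, K.epsilon()))

image_A = Input(shape=...)
image_B = Input(shape=...)
# a single feature extractor, applied to both inputs, is what shares the weights
feature_extractor = get_feature_extractor_model(shape=...)
features_A = feature_extractor(image_A)
features_B = feature_extractor(image_B)
distance = Lambda(euclidean_distance)([features_A, features_B])
outputs = Dense(1, activation="sigmoid")(distance)
siamese_model = Model(inputs=[image_A, image_B], outputs=outputs)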
Of course, the feature extractor model can be a pretrained network from Keras/TensorFlow, with the output classification layer removed.
The main logic should be like the one above. If you want to use triplet loss, that would require three inputs (Anchor, Positive, Negative), but to begin with I would recommend sticking to the basics.
Also, it would be a good idea to consult this documentation:
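As an illustration, here is one way the pieces could fit together with InceptionV3 as the shared feature extractor. This is a sketch under assumptions: get_feature_extractor_model is a hypothetical helper (the answer does not define it), pooling="avg" is one possible way to obtain a flat embedding, and weights defaults to None here so that nothing is downloaded. Pass weights="imagenet" to get the pretrained network. The distance function uses plain TensorFlow ops instead of the Keras backend module.

```python
import tensorflow as tf
from tensorflow import keras


def get_feature_extractor_model(shape, weights=None):
    # Hypothetical helper: InceptionV3 without its classification head.
    # pooling="avg" turns the last feature map into one vector per image.
    return keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=shape, pooling="avg")


def euclidean_distance(vectors):
    features_a, features_b = vectors
    sum_squared = tf.reduce_sum(
        tf.square(features_a - features_b), axis=1, keepdims=True)
    return tf.sqrt(tf.maximum(sum_squared, 1e-7))


shape = (224, 224, 3)
image_a = keras.Input(shape=shape)
image_b = keras.Input(shape=shape)

# One network, called twice: this is what "shared weights" means.
feature_extractor = get_feature_extractor_model(shape)
features_a = feature_extractor(image_a)
features_b = feature_extractor(image_b)

distance = keras.layers.Lambda(euclidean_distance)([features_a, features_b])
outputs = keras.layers.Dense(1, activation="sigmoid")(distance)
siamese_model = keras.Model(inputs=[image_a, image_b], outputs=outputs)
```

Creating two separate InceptionV3 instances, as in the question, would give two independent sets of weights, so the branches would no longer be twins.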

Tensorflow: How to get the correct output of a pretrained Keras model with inputs from another Tensorflow model

I have a pre-trained Keras model "model_keras" and I want to use it in a loss function. The input of "model_keras" is the output of another TensorFlow model "model_tf" (a generative model). I'm trying to update the weights of "model_tf" by minimizing the loss. During the optimization, "model_keras" is only used for inference and will not be updated. My problem is that I'm not able to get the correct inference result from "model_keras", and because of this I'm not able to update "model_tf" correctly. The code is shown below:
def loss_func(input, target, model_keras): # the input is an output of another Tensorflow model.
    inference_res = model_keras(input)
    loss = tf.reduce_mean(inference_res-target)
    return loss
train_phase = tf.placeholder(tf.bool)
z = tf.placeholder(tf.float32, [None, 128])
y = tf.placeholder(tf.int32, [None])
t = tf.placeholder(tf.float32, [None, 10])
model_tf = Generator("generator") # Building the Tensorflow model "model_tf"
fake_img = model_tf(z, train_phase, y, NUMS_CLASS) # fake_img is the output of "model_tf" and will be served as the input of "model_keras"
model_keras = MyKerasModel("Vgg19") # Loading the pretrained Keras model
G_loss = loss_func(fake_img, t, model_keras)
G_opt = tf.train.AdamOptimizer(4e-4, beta1=0., beta2=0.9).minimize(G_loss, var_list=model_tf.var_list())
sess = tf.Session()
sess.run(G_opt, feed_dict={z: Z, train_phase: True, y: Y, t: target}) # Z, Y and target are numpy arrays.
I also tried to use model.predict(input) but got the ValueError: "When feeding symbolic tensors to a model, we expect the tensors to have a static batch size". The reason is that model.predict() expects real data (such as a NumPy array) rather than a symbolic tensor. However, since I want to update the weights of "model_tf", I need the loss function to be differentiable so that gradients can be computed. Therefore, I cannot just pass a NumPy array to "model_keras".
How can I get the correct output (inference_res) of "model_keras" in this case? The TensorFlow and Keras versions I'm using are 1.15 and 2.2.5, respectively.
If I understood your question, here is an idea. Pass your input to model_keras and name the output keras_y. Then freeze model_keras and append it to the end of model_tf, so that you have one big model consisting of model_tf followed by model_keras (the second part being frozen). Next, feed the inputs to this combined model and name the output model_y. Now you can compute the loss as loss_func(keras_y, model_y).
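The freezing step can be sketched as follows. This is a minimal illustration in TensorFlow 2.x eager mode rather than the 1.15 session style used in the question. The two tiny Dense networks are hypothetical stand-ins for model_tf and model_keras, and the squared-error loss and all-ones target are assumptions made only so the example runs.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins: a trainable generator and a frozen "pretrained" model.
generator = tf.keras.Sequential(
    [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(6)])
frozen_model = tf.keras.Sequential(
    [tf.keras.Input(shape=(6,)), tf.keras.layers.Dense(3)])
frozen_model.trainable = False  # inference only, never updated

optimizer = tf.keras.optimizers.Adam(1e-2)
z = tf.random.normal([5, 4])
target = tf.ones([5, 3])

gen_before = [w.copy() for w in generator.get_weights()]
frozen_before = [w.copy() for w in frozen_model.get_weights()]

with tf.GradientTape() as tape:
    fake = generator(z, training=True)
    # Calling the model on a tensor keeps the computation differentiable,
    # unlike model.predict(), which returns NumPy arrays.
    inference_res = frozen_model(fake, training=False)
    loss = tf.reduce_mean(tf.square(inference_res - target))

# Gradients flow through the frozen model but are applied to the generator only.
grads = tape.gradient(loss, generator.trainable_variables)
optimizer.apply_gradients(zip(grads, generator.trainable_variables))

generator_changed = any(
    not np.array_equal(b, a)
    for b, a in zip(gen_before, generator.get_weights()))
frozen_unchanged = all(
    np.array_equal(b, a)
    for b, a in zip(frozen_before, frozen_model.get_weights()))
print(generator_changed, frozen_unchanged)
```

Restricting the variable list to the generator plays the same role as var_list=model_tf.var_list() in the question's optimizer call.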

Difference between model(x) and model.predict(x)

Here is a simple tensorflow functional API model.
input1 = tf.keras.layers.Input(shape=(2,), dtype='float32')
output1 = tf.keras.layers.Dense(2)(input1)
model = tf.keras.Model(inputs=input1, outputs=output1)
In some examples of the functional API, output is obtained using model(), yet there is also model.predict().
With my example above, predict works:
model.predict([[[1.1, 2.2]]])
>> array([[1.8761028 , 0.20520687]], dtype=float32)
If I run just the model though, I get an error:
model([[[1.1, 2.2]]])
>> ... InvalidArgumentError: In[0] is not a matrix [Op:MatMul]
What is the difference and why is the error occurring?
The error states model() expects a matrix as input, where you've provided a list.
To solve this, just convert it to a matrix:
model(tf.Variable([[[1.1, 2.2]]]))
model(np.array([[[1.1, 2.2]]]))
On the difference between model() and model.predict()
The code you are referring to where "output is obtained using model()":
left_proba = model(obs[np.newaxis]) # <--- HERE
action = (tf.random.uniform([1, 1]) > left_proba)
y_target = tf.constant([[1.]]) - tf.cast(action, tf.float32)
loss = tf.reduce_mean(loss_fn(y_target, left_proba))
This is similar to your 2nd line of code:
output1 = tf.keras.layers.Dense(2)(input1)
How is this similar, you ask?
In your code, you create a new node in the graph of layers by calling a Dense layer on this input1 object.
The "layer call" action is like drawing an arrow from input1 to this layer you created.
You're "passing" the inputs to the dense layer, and out you get output1.
In the reference code, they treat model like a layer, and do a "layer call".
See the similarity?:
output = Dense(input)
left_proba = model(obs[...])
In turn this creates new nodes that perform other operations (in the 3 lines that follow).
This is useful when you want to take an existing model and use it as a component (or "layer") to build another, new model.
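As for model inference over a whole dataset or large NumPy arrays, you would normally do this via y = model.predict(x), which processes the input in batches and returns NumPy arrays. Calling model(x) directly returns a tensor, and it is the form to use inside a training loop or when building another model.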
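A short comparison on the model from the question may make the difference easier to see. This is a sketch assuming TensorFlow 2.x in eager mode and a two-dimensional float32 NumPy input, so the shape matches Input(shape=(2,)).

```python
import numpy as np
import tensorflow as tf

input1 = tf.keras.layers.Input(shape=(2,), dtype="float32")
output1 = tf.keras.layers.Dense(2)(input1)
model = tf.keras.Model(inputs=input1, outputs=output1)

x = np.array([[1.1, 2.2]], dtype="float32")  # one sample, two features

y_call = model(x)                        # a tf.Tensor, usable with GradientTape
y_predict = model.predict(x, verbose=0)  # a NumPy array, computed in batches

print(type(y_call).__name__, type(y_predict).__name__)
```

Both calls run the same weights, so the numbers agree. What differs is the return type and the fact that predict runs its own batching loop.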

TF Graph does not correspond to the code

I am trying to create a very simple neural network that reads in data with the shape 1x2048 and classifies it into two categories (object or not object). The graph structure, however, deviates from what I believe I have coded. The dense layers should be included in the scope of "inner_layers" and should receive their input from the "input" placeholder. Instead, TF seems to treat them as independent layers which do not receive any information from "input".
Also, when trying to use tensorboard summaries, I get an error telling me that I have not fed inputs for the apparent placeholders of the dense layers. When omitting tensorboard, everything works as I expected based on the code.
I have spent a lot of time trying to find the problem, but I think I must be overlooking something very basic.
The graph I get in tensorboard is on this image.
Which corresponds to the following code:
keep_prob = 0.5
# Graph Structure
## Placeholders for input
with tf.name_scope('input'):
    x_ = tf.placeholder(tf.float32, shape = [None, transfer_values_train.shape[1]], name = "input1")
    y_ = tf.placeholder(tf.float32, shape = [None, num_classes], name = "labels")
## Dense Layer one with 2048 nodes
with tf.name_scope('inner_layers'):
    first_layer = tf.layers.dense(x_, units = 2048, activation=tf.nn.relu, name = "first_dense")
    dropout_layer = tf.nn.dropout(first_layer, keep_prob, name = "dropout_layer")
    #readout layer, without softmax
    y_conv = tf.layers.dense(dropout_layer, units = 2, activation=tf.nn.relu, name = "second_dense")
# Evaluation and training
with tf.name_scope('cross_entropy'):
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels = y_ , logits = y_conv),
                                   name = "cross_entropy_layer")
with tf.name_scope('trainer'):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
with tf.name_scope('accuracy'):
    prediction = tf.argmax(y_conv, axis = 1)
    correct_prediction = tf.equal(prediction, tf.argmax(y_, axis = 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Does anyone have an idea why the graph is so different from what you would expect based on the code?
The graph rendering in tensorboard may be a bit confusing (initially), but it's correct. Take a look at this picture where I've left only the inner_layers part of your graph:
You may notice that:
The first_dense and second_dense nodes are actually name scopes themselves (generated by the tf.layers.dense function; see also this question).
Their input/output tensors are inside the inner_layers scope and wire correctly to the dropout_layer. This is where the linear ops of each dense layer live: MatMul, BiasAdd, Relu.
Both scopes also include the variables (kernel and bias each), which are shown separately from inner_layers. They encapsulate the ops related specifically to the variables, such as read, assign, initialize, etc. The linear ops in first_dense depend on the variable ops of first_dense, and likewise for second_dense.
The reason for this separation is that in distributed settings the variables are managed by a different task called the parameter server. It usually runs on a different device (CPU as opposed to GPU), sometimes even on a different machine. In other words, for tensorflow the variable management is by design different from matrix computation.
Having said that, I'd love to see a mode in tensorflow that would not split the scope into variables and ops and keep them coupled.
Other than this the graph perfectly matches the code.
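The naming behaviour behind this split can be reproduced in a few lines. This is a sketch assuming TensorFlow 2.x with the tf.compat.v1 graph API. The scope names mirror the question, but the layer is built by hand with get_variable instead of tf.layers.dense, and the 4x3 kernel shape is arbitrary.

```python
import tensorflow as tf

tf1 = tf.compat.v1

graph = tf.Graph()
with graph.as_default():
    with tf1.name_scope("inner_layers"):
        with tf1.variable_scope("first_dense"):
            # get_variable names depend on the variable scope only,
            # so the enclosing name scope is ignored for the variable.
            kernel = tf1.get_variable("kernel", shape=[4, 3])
        x = tf1.placeholder(tf.float32, [None, 4], name="x")
        # Ordinary ops do pick up the enclosing name scope.
        y = tf.matmul(x, kernel, name="matmul")

print(kernel.name)  # the variable lives outside inner_layers
print(y.name)       # the op lives inside inner_layers
```

TensorBoard groups nodes by these name prefixes, which is why the variables of a dense layer appear as a box outside inner_layers while the MatMul that uses them appears inside.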

