How to Implement Siamese Network using pretrained CNNs in Keras?


I am developing a Siamese Network for Face Recognition using Keras for 224x224x3 sized images. The architecture of a Siamese Network is like this:
For the CNN model, I am thinking of using the InceptionV3 model which is already pretrained in the Keras.applications module.
# Assume all the other modules are imported correctly
from keras.applications.inception_v3 import InceptionV3
IMG_SHAPE = (224, 224, 3)
def return_siamese_net():
    left_input = Input(IMG_SHAPE)
    right_input = Input(IMG_SHAPE)
    model1 = InceptionV3(include_top=False, weights="imagenet", input_tensor=left_input)  # Left SubConvNet
    model2 = InceptionV3(include_top=False, weights="imagenet", input_tensor=right_input)  # Right SubConvNet
    # Do Something here
    distance_layer = #Do Something
    prediction = Dense(1, activation='sigmoid')(distance_layer)  # Outputs 1 if the images match and 0 if they do not
    siamese_net = #Do Something
    return siamese_net
model = return_siamese_net()
I get an error because the model is pretrained, and I am now stuck on implementing the distance layer for the twin network.
What should I add in between to make this Siamese Network work?

A very important note, before you use the distance layer, is to take into consideration that you have only one convolutional neural network.
The shared weights actually refer to only one convolutional neural network, and the weights are shared because the same weights are used when passing a pair of images (depending on the loss function used) in order to compute the features and subsequently the embeddings of each input image.
You would have only one neural network, and the block logic will need to look like:
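from keras import backend as K
from keras.layers import Input, Lambda, Dense
from keras.models import Model
def euclidean_distance(vectors):
    # vectors holds the two embeddings produced by the shared network
    (features_A, features_B) = vectors
    sum_squared = K.sum(K.square(features_A - features_B), axis=1, keepdims=True)
    # clamp with epsilon so the square root stays numerically stable at zero distance
    return K.sqrt(K.maximum(sum_squared, K.epsilon()))
image_A = Input(shape=...)
image_B = Input(shape=...)
# one feature extractor, called on both inputs, so the weights are shared
feature_extractor = get_feature_extractor_model(shape=...)
features_A = feature_extractor(image_A)
features_B = feature_extractor(image_B)
distance = Lambda(euclidean_distance)([features_A, features_B])
outputs = Dense(1, activation="sigmoid")(distance)
siamese_model = Model(inputs=[image_A, image_B], outputs=outputs)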
Of course, the feature extractor model can be a pretrained network from Keras/TensorFlow, with the output classification layer removed.
The main logic should be like the one above. If you want to use triplet loss, that would require three inputs (Anchor, Positive, Negative), but to begin with I would recommend sticking to the basics.
Also, it would be a good idea to consult these tutorials:
https://www.pyimagesearch.com/2020/11/30/siamese-networks-with-keras-tensorflow-and-deep-learning/
https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d
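To make the shared-weights idea concrete, here is a minimal runnable sketch. Several things in it are assumptions made for illustration, not part of the answer above: the tiny `build_feature_extractor` CNN stands in for a pretrained base such as InceptionV3 with `include_top=False`, the input size is reduced to 64x64x3, and a small custom `EuclideanDistance` layer is used in place of the `Lambda` wrapper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG_SHAPE = (64, 64, 3)  # assumption: smaller than 224x224x3 to keep the sketch fast


class EuclideanDistance(layers.Layer):
    """Euclidean distance between two batches of embeddings."""

    def call(self, inputs):
        features_a, features_b = inputs
        sum_squared = tf.reduce_sum(
            tf.square(features_a - features_b), axis=1, keepdims=True
        )
        # Clamp before the square root to avoid an unstable gradient at zero
        return tf.sqrt(tf.maximum(sum_squared, 1e-7))


def build_feature_extractor(input_shape):
    # Hypothetical small CNN; a pretrained base could take its place
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, activation="relu")(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(16)(x)
    return Model(inputs, outputs, name="feature_extractor")


def build_siamese(input_shape):
    image_a = layers.Input(shape=input_shape)
    image_b = layers.Input(shape=input_shape)
    # A single extractor is called twice, so both branches share weights
    feature_extractor = build_feature_extractor(input_shape)
    features_a = feature_extractor(image_a)
    features_b = feature_extractor(image_b)
    distance = EuclideanDistance()([features_a, features_b])
    outputs = layers.Dense(1, activation="sigmoid")(distance)
    return Model(inputs=[image_a, image_b], outputs=outputs)


siamese_model = build_siamese(IMG_SHAPE)
batch = np.random.rand(2, *IMG_SHAPE).astype("float32")
predictions = siamese_model.predict([batch, batch], verbose=0)
```

Because the extractor is instantiated once, the model holds only one set of convolution and embedding weights, however many image pairs pass through it.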

Related

How are keras tensors connected to layers that create them

In the book "Machine Learning with scikit-learn and Tensorflow" there's a code fragment I can't wrap my head around. Until that chapter, their models only used layers explicitly, either in a sequential or in a functional fashion. But in chapter 16, there's this:
import tensorflow_addons as tfa
encoder_inputs = keras.layers.Input(shape=[None], dtype=np.int32)
decoder_inputs = keras.layers.Input(shape=[None], dtype=np.int32)
sequence_lengths = keras.layers.Input(shape=[], dtype=np.int32)
embeddings = keras.layers.Embedding(vocab_size, embed_size)
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)
encoder = keras.layers.LSTM(512, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_embeddings)
encoder_state = [state_h, state_c]
sampler = tfa.seq2seq.sampler.TrainingSampler()
decoder_cell = keras.layers.LSTMCell(512)
output_layer = keras.layers.Dense(vocab_size)
decoder = tfa.seq2seq.basic_decoder.BasicDecoder(decoder_cell, sampler,
                                                 output_layer=output_layer)
final_outputs, final_state, final_sequence_lengths = decoder(
    decoder_embeddings, initial_state=encoder_state,
    sequence_length=sequence_lengths)
Y_proba = tf.nn.softmax(final_outputs.rnn_output)
model = keras.models.Model(
    inputs=[encoder_inputs, decoder_inputs, sequence_lengths],
    outputs=[Y_proba])
And then he just runs the model in a standard way:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
X = np.random.randint(100, size=10*1000).reshape(1000, 10)
Y = np.random.randint(100, size=15*1000).reshape(1000, 15)
X_decoder = np.c_[np.zeros((1000, 1)), Y[:, :-1]]
seq_lengths = np.full([1000], 15)
history = model.fit([X, X_decoder, seq_lengths], Y, epochs=2)
I have trouble understanding the code starting at line 7. The author is creating an Embedding layer which he immediately calls on encoder_inputs and decoder_inputs, then he does basically the same with the LSTM layer that he calls on the previously created encoder_embeddings and tensors returned by this operation are used in the code slightly below. What I don't get here is how are those tensors trained? It looks like he's not using the layers creating them in the model, but if so, then how come the embeddings are learned and the whole model converges?

To understand the overall flow, you need to understand how things work under the hood. TensorFlow builds a graph when the model is defined. When you pass [encoder_inputs, decoder_inputs, sequence_lengths] as the inputs and [Y_proba] as the output, the model doesn't start training immediately; first it is built. Building here means that a computational graph is created and stored. That graph is already being recorded as each layer is called on a symbolic tensor, and keras.models.Model(...) collects the part of it that connects the inputs to the outputs; model.compile() then attaches the loss and the optimizer to it.
Let me explain further. Suppose I want to compute a + b = c, b + d = 2, and finally c * d = 6 using a computational graph. TensorFlow would first create three nodes for it, one per operation.
TensorFlow does exactly the same thing when you pass your inputs and outputs. The graph first describes the forward pass, and the same graph is then used by TensorFlow to do the backward pass.
So the computational graph is made first, and then that same graph is used to compute both the forward pass and the backward pass.
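As a rough illustration of one graph serving both passes, the sketch below evaluates a two-node graph by hand and then walks it in reverse with the chain rule. The expressions (c = a + b, e = c * d) and the input values are assumptions chosen for the example; TensorFlow automates exactly this bookkeeping.

```python
# Forward pass: evaluate each node of the graph in order.
a, b, d = 2.0, 1.0, 2.0     # illustrative input values (assumption)
c = a + b                   # node 1: addition
e = c * d                   # node 2: multiplication (the output)

# Backward pass: walk the same graph in reverse, applying the chain rule.
de_de = 1.0                 # gradient of the output with respect to itself
de_dc = de_de * d           # because e = c * d
de_dd = de_de * c
de_da = de_dc * 1.0         # because c = a + b
de_db = de_dc * 1.0

print(c, e)                 # 3.0 6.0
print(de_da, de_db, de_dd)  # 2.0 2.0 3.0
```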
The same mechanism produces the computational graph of your complete model. How does that work in your case specifically? The model asks how Y_proba was produced. The graph consists of the operations and tensors created between the inputs and the outputs. The Embedding layer is created and applied to the inputs encoder_inputs and decoder_inputs to obtain encoder_embeddings and decoder_embeddings, respectively. The LSTM layer is applied to encoder_embeddings to produce encoder_outputs, state_h, and state_c. These tensors are then passed as inputs to the BasicDecoder layer, which combines the decoder_embeddings, encoder_state (constructed from state_h and state_c), and sequence_lengths to produce final_outputs, final_state, and final_sequence_lengths. Finally, the softmax function is applied to the rnn_output of final_outputs to produce the final output Y_proba.
All the tensors named in the paragraph above are the intermediate nodes of the computational graph.
So the graph starts at the inputs and works its way down to Y_proba. While the graph is being built, the weights and other parameters of the model are also initialized. The graph is made once, which makes it easy to compute the forward pass and the backward pass afterwards.
How are these layers trained and optimized for convergence?
When you specify inputs=[encoder_inputs, decoder_inputs, sequence_lengths] and outputs=[Y_proba], the model knows the intermediate layers that are used to compute Y_proba from encoder_inputs, decoder_inputs and sequence_lengths. These intermediate layers are the Embedding and LSTM layers, as well as the BasicDecoder, which wraps the LSTMCell, the Dense output layer and the TrainingSampler (the sampler itself has no trainable parameters). These layers are automatically included in the computation graph of the model, allowing the optimizer to update their parameters during training.

This is an example of the Keras Functional API. In this style of defining a model, you write the blueprint first and use it later. Think of it like wiring a circuit: while you're connecting things, there's no electricity flowing through them (the electricity corresponds to data in our metaphor). Later, you turn on the power source, and electricity flows through.
This is how the Functional API works as well. First, let's read the last line:
model = keras.models.Model(
inputs=[encoder_inputs, decoder_inputs, sequence_lengths],
outputs=[Y_proba])
This says, "Hey Keras, I need a model whose inputs are encoder_inputs, decoder_inputs, and sequence_lengths, and they will eventually produce Y_proba." The details of how they produce this output are defined above. Let's look at the specific lines you're having trouble with:
embeddings = keras.layers.Embedding(vocab_size, embed_size)
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)
The first of these says, "Keras, give me a layer that will produce embeddings". embeddings is a layer object. The next two lines are the wiring that we talked about: you can connect layers ahead of time, before any data flows through them. That's the crux of the Functional API. So the second line says, "Keras, encoder_inputs, which is an Input, will connect to (and go through) the Embedding layer I just created, and the result of that will be in a variable I call encoder_embeddings."
The rest of the code follows the same logic: you're connecting the wires together, before you compile your model and eventually fit it with the data.
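One way to convince yourself that the wired layers really belong to the model is to inspect it after construction. The sketch below is a simplified stand-in for the book's code, under stated assumptions: a plain `LSTM` decoder replaces the `tfa.seq2seq` `BasicDecoder`, and the sizes (`vocab_size`, `embed_size`, 16 units) are illustrative.

```python
from tensorflow import keras

vocab_size, embed_size = 100, 8  # illustrative sizes (assumption)

encoder_inputs = keras.layers.Input(shape=[None], dtype="int32")
decoder_inputs = keras.layers.Input(shape=[None], dtype="int32")

# One Embedding layer object, wired to both inputs
embeddings = keras.layers.Embedding(vocab_size, embed_size)
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)

encoder = keras.layers.LSTM(16, return_state=True)
_, state_h, state_c = encoder(encoder_embeddings)

# Assumption: a plain LSTM decoder instead of tfa's BasicDecoder
decoder = keras.layers.LSTM(16, return_sequences=True)
decoder_outputs = decoder(decoder_embeddings, initial_state=[state_h, state_c])
Y_proba = keras.layers.Dense(vocab_size, activation="softmax")(decoder_outputs)

model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs],
                           outputs=[Y_proba])

# Every layer on the path from the inputs to Y_proba was picked up
layer_types = [type(layer).__name__ for layer in model.layers]
print(layer_types)
print(len(model.trainable_weights))
```

The Embedding layer appears once in `model.layers` even though it was called twice, and its weight matrix sits in `model.trainable_weights`, which is the list the optimizer updates during `fit`.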

How do you pass outputs of two different models as inputs to a another model in keras?

Here's the kind of model I want to make
I'm trying to make a sort of adversarial network. I have already trained a discriminator. Now I want to connect outputs of two separate VGG16 models to this discriminator. I'm trying to avoid retraining the discriminator and just simply load pre-trained weights, that is why I want to load the model and then somehow connect it with the two VGG16 models.
If there is a way I can load the weights for the discriminator part only somehow that would be great as well!
Apologies if my query is not in line with the guidelines; this is my second ever post on Stack Overflow.
EDIT: Here's the code I've written for the overall structure without loading weights, i.e. I will be training this from scratch.
# sat/street extractors are VGG16 networks that learn features
sat_fv = sat_extractor.layers[-1].output
str_fv = street_extractor.layers[-1].output
deep_sat_feat = Reshape((32, 32, 1))(sat_fv)
deep_street_feat = Reshape((32, 32, 1))(str_fv)
deep_merged = concatenate([deep_sat_feat, deep_street_feat], axis=3)
deep_act_1 = conv2d_block(deep_merged)
flat = Flatten()(deep_act_1)
dense = Dense(4096)(flat)
act = LeakyReLU()(dense)
dense = Dense(1024)(act)
act = LeakyReLU()(dense)
dense = Dense(128)(act)
act = LeakyReLU()(dense)
den_3 = Dense(1)(act)
act_5 = Activation("sigmoid")(den_3)
generator_model = Model(inputs=[sat_extractor.input,street_extractor.input], outputs=[sat_fv,str_fv,act_5])
discriminator_model = Model(inputs=[sat_extractor.input,street_extractor.input], outputs=[act_5])
generator_model.compile('Adam',loss=['categorical_crossentropy','categorical_crossentropy','binary_crossentropy'],loss_weights=[1/15,1/15,1/3])
discriminator_model.compile(loss="binary_crossentropy", optimizer=Adam(0.00001), metrics=["accuracy"])

Is it possible to create multiple instances of the same CNN that take in multiple images and are concatenated into a dense layer? (keras)

Similar to this question, I'm looking to have several image input layers that go through one larger CNN (e.g. XCeption minus dense layers), and then have the output of the one CNN across all images be concatenated into a dense layer.
Is this possible with Keras or is it even possible to train a network from the ground-up with this architecture?
I'm essentially looking to train a model that takes in a larger but fixed number of images per sample (i.e. 3+ image inputs with similar visual features), but not to explode the number of parameters by training several CNNs at once. The idea is to train only one CNN, that can be used for all the outputs. Having all images go into the same dense layers is important so the model can learn the associations across multiple images, which are always ordered based on their source.

You can easily achieve this using the Keras functional API the following way.
from tensorflow.keras import layers, models, applications
# Multiple inputs
in1 = layers.Input(shape=(128, 128, 3))
in2 = layers.Input(shape=(128, 128, 3))
in3 = layers.Input(shape=(128, 128, 3))
# CNN output: a single Xception instance, so its weights are shared by all inputs
cnn = applications.xception.Xception(include_top=False)
out1 = cnn(in1)
out2 = cnn(in2)
out3 = cnn(in3)
# Flattening the output for the dense layer
fout1 = layers.Flatten()(out1)
fout2 = layers.Flatten()(out2)
fout3 = layers.Flatten()(out3)
# Getting the dense output (one Dense layer, also shared)
dense = layers.Dense(100, activation='softmax')
dout1 = dense(fout1)
dout2 = dense(fout2)
dout3 = dense(fout3)
# Concatenating the final output
out = layers.Concatenate(axis=-1)([dout1, dout2, dout3])
# Creating the model
model = models.Model(inputs=[in1, in2, in3], outputs=out)
model.summary()
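If the goal is for the dense layers to see all images at once, the features can also be concatenated before the dense head. The following is a minimal sketch of that variant under stated assumptions: the tiny `build_shared_cnn` network is a hypothetical stand-in for a backbone such as Xception, the images are 32x32x3, and the head ends in a single sigmoid unit.

```python
import numpy as np
from tensorflow.keras import layers, models

IMG_SHAPE = (32, 32, 3)  # assumption: small images to keep the sketch light
NUM_IMAGES = 3


def build_shared_cnn(input_shape):
    # Hypothetical stand-in for a large backbone with include_top=False
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(8, 3, activation="relu")(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    return models.Model(inputs, x, name="shared_cnn")


cnn = build_shared_cnn(IMG_SHAPE)
image_inputs = [layers.Input(shape=IMG_SHAPE) for _ in range(NUM_IMAGES)]

# The same CNN object processes every image, so no parameters are duplicated
features = [cnn(image) for image in image_inputs]

# Concatenate first, then let the dense layers look across all images
merged = layers.Concatenate(axis=-1)(features)
hidden = layers.Dense(32, activation="relu")(merged)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
model = models.Model(inputs=image_inputs, outputs=outputs)

batch = [np.random.rand(4, *IMG_SHAPE).astype("float32") for _ in range(NUM_IMAGES)]
predictions = model.predict(batch, verbose=0)
```

The concatenation order follows the order of `image_inputs`, so a fixed ordering of the source images is preserved in the merged feature vector.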

Adding Dropout to testing/inference phase

I've trained the following model for some timeseries in Keras:
input_layer = Input(batch_shape=(56, 3864))
first_layer = Dense(24, input_dim=28, activation='relu',
                    activity_regularizer=None,
                    kernel_regularizer=None)(input_layer)
first_layer = Dropout(0.3)(first_layer)
second_layer = Dense(12, activation='relu')(first_layer)
second_layer = Dropout(0.3)(second_layer)
out = Dense(56)(second_layer)
model_1 = Model(input_layer, out)
Then I defined a new model with the trained layers of model_1 and added dropout layers with a different rate, drp, to it:
input_2 = Input(batch_shape=(56, 3864))
first_dense_layer = model_1.layers[1](input_2)
first_dropout_layer = model_1.layers[2](first_dense_layer)
new_dropout = Dropout(drp)(first_dropout_layer)
snd_dense_layer = model_1.layers[3](new_dropout)
snd_dropout_layer = model_1.layers[4](snd_dense_layer)
new_dropout_2 = Dropout(drp)(snd_dropout_layer)
output = model_1.layers[5](new_dropout_2)
model_2 = Model(input_2, output)
Then I'm getting the prediction results of these two models as follows:
result_1 = model_1.predict(test_data, batch_size=56)
result_2 = model_2.predict(test_data, batch_size=56)
I was expecting to get completely different results because the second model has new dropout layers and these two models are different (IMO), but that's not the case. Both are generating the same result. Why is that happening?

As I mentioned in the comments, the Dropout layer is turned off in the inference phase (i.e. test mode), so when you use model.predict() the Dropout layers are not active. However, if you would like to have a model that uses Dropout in both the training and inference phases, you can pass the training argument when calling the layer, as suggested by François Chollet:
# ...
new_dropout = Dropout(drp)(first_dropout_layer, training=True)
# ...
Alternatively, if you have already trained your model and now want to use it in inference mode while keeping the Dropout layers (and possibly other layers that behave differently in the training and inference phases, such as BatchNormalization) active, you can define a backend function that takes the model's inputs as well as the Keras learning phase (this relies on K.learning_phase(), which is only available in older Keras versions):
from keras import backend as K
func = K.function(model.inputs + [K.learning_phase()], model.outputs)
# to use it pass 1 to set the learning phase to training mode
outputs = func([input_arrays] + [1.])

Your question has a simple solution in the latest version of TensorFlow: you can set the training argument of the call method to True.
You can run code like this:
model(input, training=True)
With training=True, the Dropout layers stay active even though you are running inference.
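The difference is easy to observe on a toy model. The sketch below uses a hypothetical two-layer network with made-up sizes, not the model from the question: repeated calls with `training=False` agree, while repeated calls with `training=True` draw a fresh dropout mask each time.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy model; layer sizes are arbitrary
inputs = keras.layers.Input(shape=(20,))
x = keras.layers.Dense(64, activation="relu")(inputs)
x = keras.layers.Dropout(0.5)(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

data = tf.constant(np.random.rand(8, 20).astype("float32"))

# Dropout disabled: the forward pass is deterministic
inference_a = model(data, training=False).numpy()
inference_b = model(data, training=False).numpy()

# Dropout enabled: each call samples a new random mask
stochastic_a = model(data, training=True).numpy()
stochastic_b = model(data, training=True).numpy()

print(np.allclose(inference_a, inference_b))
print(np.allclose(stochastic_a, stochastic_b))
```

Averaging many `training=True` passes is the usual way such stochastic predictions are turned into an estimate with an uncertainty spread.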

As there are already some working code solutions above, I will simply add a few more details regarding dropout during inference to prevent confusion.
Based on the original paper, Dropout layers randomly turn off neurons during training (setting their outputs to zero) to reduce overfitting. However, once we finish training and start testing the model, we do not 'touch' any neurons, so all the units contribute to the decision at inference time. Because every unit is now active, the summed input to the next layer is larger than it was during training, when some units were dropped. To compensate, a scaling factor is applied to balance the network. To be more precise, if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p during the prediction stage.
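The scaling argument can be checked numerically. The NumPy sketch below compares the scheme described above (scale by p at test time) with the "inverted" variant that Keras' Dropout layer implements, where the kept units are scaled up by 1/p during training and nothing changes at test time. The constant activations and the value of `keep_prob` are assumptions chosen to make the averages easy to read.

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(100_000)  # pretend every unit outputs 1.0
keep_prob = 0.7                 # p: probability that a unit is retained

# Random mask: True where the unit is kept
mask = rng.random(activations.shape) < keep_prob

# Scheme from the original paper: drop while training, scale by p at test time
train_paper = activations * mask
test_paper = activations * keep_prob

# Inverted dropout: scale by 1/p while training, leave test time untouched
train_inverted = activations * mask / keep_prob
test_inverted = activations

print(train_paper.mean(), test_paper.mean())
print(train_inverted.mean(), test_inverted.mean())
```

In both schemes the average activation seen by the next layer matches between training and testing, which is the whole purpose of the scaling factor.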

How to design and train convolutional neural networks to estimate third band in an image to full accuracy

I am trying to estimate the third band (Blue) in an RGB image using convolutional neural networks. My design using Keras is a sequential model with a Convolution2D layer as the input layer, two hidden layers and an output neuron. If I want the loss (RMSE) to be zero, how should I change my model?
My model in Python goes like this:
import numpy as np
import skimage.io
import keras
in_image = skimage.io.imread('test.jpg')[0:50,0:50,:].astype(float)
data = in_image[:,:,0:2]  # first two bands are the inputs
target = in_image[:,:,2:3]  # third band (Blue) is the target
model1 = keras.models.Sequential()
model1.add(keras.layers.Convolution2D(50,(3,3),strides = (1,1),padding = "same",input_shape=(None,None,2))) # Convolution Layer
model1.add(keras.layers.Dense(50,activation = 'relu')) # Hidden Layer 1
model1.add(keras.layers.Dense(50,activation = 'sigmoid')) # Hidden Layer 2
model1.add(keras.layers.Dense(1)) # Output Layer
adadelta = keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
model1.compile(loss='mean_squared_error', optimizer=adadelta) # Compile the model
model1.fit(np.array([data]),np.array([target]),epochs = 5000)
estimated_band = model1.predict(np.array([data]))

Given your problem setup, it looks like you're trying to train a neural network on one image such that it is able to predict the blue channel of the image from the other two channels. Putting aside the usefulness of such an experiment, there are a few important things to get right when training neural networks, including:
1. learning rate
2. weight initialization
3. optimizer
4. model complexity
Yann LeCun's Efficient BackProp is a late-90s paper that talks about numbers 1, 2 and 3. Number 4 rests on the assumption that as the number of free parameters increases, at some point you'll be able to match each parameter to each output.
Note that achieving zero loss provides no guarantee of generalization, nor does it mean that your model will not generalize, as brilliantly described in a paper presented at ICLR.
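The point about model complexity can be seen even without a neural network. The sketch below is an assumption-laden toy, not the asker's setup: a linear model with more free parameters (20) than training samples (10) is fitted with least squares to random targets. It reaches essentially zero training error, yet the targets are pure noise, so the fit says nothing about new data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params = 10, 20  # more parameters than data points (assumption)

X_train = rng.normal(size=(n_samples, n_params))
y_train = rng.normal(size=n_samples)  # random targets: nothing real to learn

# Least-squares fit; with n_params > n_samples an exact fit exists
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_rmse = np.sqrt(np.mean((X_train @ weights - y_train) ** 2))

# Fresh random data drawn the same way
X_new = rng.normal(size=(n_samples, n_params))
y_new = rng.normal(size=n_samples)
new_rmse = np.sqrt(np.mean((X_new @ weights - y_new) ** 2))

print(train_rmse)  # effectively zero
print(new_rmse)    # not zero
```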
