I am trying to mimic this keras blog about fine tuning image classifiers. I would like to use the Inceptionv3 found on a fchollet repo.
Inception is a Model (functional API), so I can't just do model.add(top_model) which is reserved for Sequential.
How can I add combine two functional Models? Let's say I have
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions = Dense(4, name='final1')(x)
model1 = Model(input=inputs, output=predictions)
for the first model and
inputs_2 = Input(shape=(4,))
y = Dense(5)(l_inputs)
y = Dense(2, name='final2')(y)
predictions_2 = Dense(29)(y)
model2 = Model(input=inputs2, output=predictions2)
for the second. I now want an end-to-end that goes from inputs to predicions_2 and links predictions to inputs_2.
I tried using model1.get_layer('final1').output but I had a mismatch with types and I couldn't make it work.
I haven't tried this but according to the documentation functional models are callable, so you can do something like:
y = model2(model1(x))
where x is the data that goes to inputs and y is the result of predictions_2
I ran into this problem as well while fine tuning VGG16. Here's what worked for me and I imagine a similar approach can be taken for Inception V3. Tested on Keras 2.0.5 with Tensorflow 1.2 backend.
# NOTE: define the following variables
# top_model_weights_path
# num_classes
# dense_layer_1 = 4096
# dense_layer_2 = 4096
vgg16 = applications.VGG16(
include_top=False,
weights='imagenet',
input_shape=(224, 224, 3))
# Inspect the model
vgg16.summary()
# This shape has to match the last layer in VGG16 (without top)
dense_input = Input(shape=(7, 7, 512))
dense_output = Flatten(name='flatten')(dense_input)
dense_output = Dense(dense_layer_1, activation='relu', name='fc1')(dense_output)
dense_output = Dense(dense_layer_2, activation='relu', name='fc2')(dense_output)
dense_output = Dense(num_classes, activation='softmax', name='predictions')(dense_output)
top_model = Model(inputs=dense_input, outputs=dense_output, name='top_model')
# from: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)
block5_pool = vgg16.get_layer('block5_pool').output
# Now combine the two models
full_output = top_model(block5_pool)
full_model = Model(inputs=vgg16.input, outputs=full_output)
# set the first 15 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
# WARNING: this may not be applicable for Inception V3
for layer in full_model.layers[:15]:
layer.trainable = False
# Verify things look as expected
full_model.summary()
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
full_model.compile(
loss='binary_crossentropy',
optimizer=optimizers.SGD(lr=5e-5, momentum=0.9),
metrics=['accuracy'])
# Train the model...
I think there are 2 options depending on what you need:
(a) predictions_1 and predictions_2 matter for you. In this case, you can train a network with 2 outputs. Here an example derived from your post:
input_shape = [3, 20]
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions_1 = Dense(4, name='predictions_1')(x)
# here the predictions_1 just corresponds to your next layer's input
y = Dense(5)(predictions_1)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
# you specify here that you have 2 outputs
model = Model(input=inputs, output=[predictions_1, predictions_2])
For the .fit and .predict, you can find a lot of details in https://keras.io/getting-started/functional-api-guide/, section: Multi-input and multi-output models.
(b) you are only interested in predictions_2. In this case, you can just do:
input_shape = [3, 20]
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions_1 = Dense(4, name='predictions_1')(x)
# here the predictions_1 just corresponds to your next layer's input
y = Dense(5)(predictions_1)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
# you specify here that your only output is predictions_2
model = Model(input=inputs, output=predictions_2)
Now as regards inception_v3. You can define by yourself the architecture and modify the deep layers inside according to your needs (giving to these layers specific names in order to avoid keras naming them automatically).
After that, compile your model and loads weights (as in https://keras.io/models/about-keras-models/ see function load_weights(..., by_name=True))
# you can load weights for only the part that corresponds to the true
# inception_v3 architecture. The other part will be initialized
# randomly
model.load_weights("inception_v3.hdf5", by_name=True)
This should solve your problem. By the way, you can find extra information here: https://www.gradientzoo.com. The doc. explains several saving / loading / fine-tuning routines ;)
Update: if you do not want to redefine your model from scratch you can do the following:
input_shape = [3, 20]
# define model1 and model2 as you want
inputs1 = Input(shape=input_shape)
x = Flatten()(inputs1)
predictions_1 = Dense(4, name='predictions_1')(x)
model1 = Model(input=inputs1, output=predictions_1)
inputs2 = Input(shape=(4,))
y = Dense(5)(inputs2)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
model2 = Model(input=inputs2, output=predictions_2)
# then define functions returning the image of an input through model1 or model2
def give_model1():
def f(x):
return model1(x)
return f
def give_model2():
def g(x):
return model2(x)
return g
# now you can create a global model as follows:
inputs = Input(shape=input_shape)
x = model1(inputs)
predictions = model2(x)
model = Model(input=inputs, output=predictions)
Drawing from filitchp's answer above, assuming the output dimensions of model1 match the input dimensions of model2, this worked for me:
model12 = Model(inputs=inputs, outputs=model2(model1.output))
Related
I have tried to visualize the architecture of my neural network (see code below). I want to get something like this in terms of visualization.
but I didn't manage to do it. What package should I use or can anyone illustrate what would my network result in?
This is the code for my network:
new_img_size = 128
nbr_img = 3
delta_t = 10
min_pred = 10
image1 = Input(shape=(new_img_size, new_img_size, 3))
image2 = Input(shape=(new_img_size, new_img_size, 3))
y1 = BatchNormalization()(image1)
y1 = Flatten()(y1)
y1 = Dense(1024, activation='relu')(y1)
cnn1 = Model(inputs=image1, outputs=y1)
input_sequence1 = Input(shape=(nbr_img, new_img_size, new_img_size, 3))
lstm1 = TimeDistributed(cnn1)(input_sequence1)
lstm1 = LSTM(1024, activation='relu', return_sequences=False)(lstm1)
lstm1 = Dense(48, activation='relu')(lstm1)
y2 = BatchNormalization()(image2)
y2 = Flatten()(y2)
y2 = Dense(1024, activation='relu')(y2)
cnn2 = Model(inputs=image2, outputs=y2)
input_sequence2 = Input(shape=(nbr_img, new_img_size, new_img_size, 3))
lstm2 = TimeDistributed(cnn2)(input_sequence2)
lstm2 = LSTM(1024, activation='relu', return_sequences=False)(lstm2)
lstm2 = Dense(48, activation='relu')(lstm2)
merged = concatenate([lstm1, lstm2])
mlp = Dense(96, activation='relu')(merged)
mlp = Dense(48, activation='relu')(merged)
mlp = Dense(int(min_pred/delta_t), activation='linear')(mlp)
model = Model(inputs=[input_sequence1, input_sequence2], outputs=mlp)
model.compile(optimizer="Adam", loss='mse', metrics=['mae'])
Thank you for your help
EDIT
I have tried both
tf.keras.utils.plot_model
and netron and both give me this
I do find this useful but since I have a layer of TimeDistributed, I would like to see this in my plot as well. I don't want to see just the name "Time distributed", I would like to see how this layer is creating seperate CNN layers for each input image.
You can use plot_model to programmatically visualize the architecture https://www.tensorflow.org/api_docs/python/tf/keras/utils/plot_model
tf.keras.utils.plot_model(
model, to_file='model.png', show_shapes=False, show_dtype=False,
show_layer_names=True, rankdir='TB', expand_nested=False, dpi=96,
layer_range=None
)
Or, you can use netron to visualize the weight of the model including the architecture.
ref: https://github.com/lutzroeder/netron
I want to make a model like the below picture. (simplified)
So, practically, I want the weights with the same names to always have the same values during training. What I did was the code below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
example_train_features = np.arange(12000).reshape(1000, 12)
example_lanbels = np.random.randint(2, size=1000) #these data are just for illustration purposes
train_ds = tf.data.Dataset.from_tensor_slices((example_train_features, example_lanbels)).shuffle(buffer_size = 1000).batch(32)
dense1 = layers.Dense(1, activation="relu") #input shape:4
dense2 = layers.Dense(2, activation="relu") #input shape:1
dense3 = layers.Dense(1, activation="sigmoid") #input shape:6
feature_input = keras.Input(shape=(12,), name="features")
nodes_list = []
for i in range(3):
first_lvl_input = feature_input[i :: 4] ######## marked line
out1 = dense1(first_lvl_input)
out2 = dense2(out1)
nodes_list.append(out2)
joined = layers.concatenate(nodes_list)
final_output = dense3(joined)
model = keras.Model(inputs = feature_input, outputs = final_output, name="extrema_model")
compile_and_fit(model, train_ds, val_ds, patience=4)
model.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.RMSprop(),
metrics=keras.metrics.BinaryAccuracy())
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
But when I try to run this code I get this error:
MklConcatOp : Dimensions of inputs should match: shape[0][0]= 71 vs. shape[18][0] = 70
[[node extrema_model/concatenate_2/concat (defined at <ipython-input-373-5efb41d312df>:398) ]] [Op:__inference_train_function_15338]
(please don't pay attention to numbers as they are from my real code) this is because it gets the whole data including the labels as an input, but shouldn't Keras only feed the features itself? Anyway, if I write the marked line as below:
first_lvl_input = feature_input[i :12: 4]
it doesn't give me the above error anymore. But, then I get another error which I know why happens but I don't know how to resolve it.
InvalidArgumentError: Incompatible shapes: [4,1] vs. [32,1]
[[node gradient_tape/binary_crossentropy/logistic_loss/mul/BroadcastGradientArgs
(defined at <ipython-input-1-b82546367b3c>:398) ]] [Op:__inference_train_function_6098]
This is because keras is feeding again the whole batch array, whereas in Keras documentation it is written you shouldn't specify the batch dimension for the program as it understands itself, so I expected Keras to feed the data one by one for my code to work. So I appreciate any ideas on how to resolve this or on how to write a code for what I want. Thanks.
You can wrap the dense layers in timedistributed wrapper , and reshape your data to have three dimensions (1000,3,4)(batch, sequence, feature), so for each time step (=3 that replace your for loop code .) the four features will be multiplied with the same weights each time.
example_train_features = np.arange(12000).reshape(1000, 3, 4 )
example_lanbels = np.random.randint(2, size=1000) #these data are just for illustration purposes
train_ds = tf.data.Dataset.from_tensor_slices((example_train_features, example_lanbels)).shuffle(buffer_size = 1000).batch(32)
dense1 = layers.TimeDistributed(layers.Dense(1, activation="relu")) #input shape:4
dense2 =layers.TimeDistributed(layers.Dense(2, activation="relu")) #input shape:1
dense3 = layers.Dense(1, activation="sigmoid") #input shape:6
feature_input = keras.Input(shape=(3,4), name="features")
out1 = dense1(feature_input)
out2 = dense2(out1)
z = layers.Flatten()(out2)
final_output = dense3(z)
model = keras.Model(inputs = feature_input, outputs = final_output, name="extrema_model")
model.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.RMSprop(),
metrics=keras.metrics.BinaryAccuracy())
history = model.fit(train_ds, epochs=10)
I am trying to create a convolutional neural network that has two regression outputs, a score and a confidence. I have frozen the layers they have in common in the hopes that the addition of the confidence output doesn't change the score, but in my experiments it has. For the model with just the score, I used Xception and added a simple GlobalAveragePooling2D and Dense(512) layer then output a single number.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
optimizer = Adam(learning_rate=learning_rate)
model.compile(loss='mae', optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
Here is what the end of model.summary() looks like:
When I fit it, the model produces good results.
But when I try to add a second output the result of the first becomes much worse. The new model gets trained off tuples where is first number is the same as the first model and the second number is a confidence value. The model is very similar to the one above.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
score_x = Dense(512, activation='relu')(x)
score_out = Dense(1, activation='sigmoid', name='score_model')(score_x)
confidence_x = Dense(512, activation='relu')(x)
confidence_out = Dense(1, name='confidence_model')(confidence_x)
model = Model(inputs=base_model.input, outputs=[score_out, confidence_out])
for layer in base_model.layers:
layer.trainable = False
losses = {'score_model': 'mae', 'confidence_model': 'mae'}
loss_weights = {'score_model': 1, 'confidence_model': 1}
model.compile(loss=losses, loss_weights=loss_weights, optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
When I look at model.summary(), it has twice as many trainable parameters as the previous model, which is exactly what I was expecting. Everything looks right to me so far.
But when I train this model the performance on the score is much worse. I was thinking it would be the same (within stochastic variation). After the first epoch, the loss from the first model is around 0.125. The score_model_loss from the second model is around 0.554. Clearly I'm not completely separating the models. What am I missing?
Note: This answer will work well only because the layer that do the feature extraction are frozen. As #Akshay Sehgal stated in the comments :
optimizing for 2 goals together is actually a completely different problem than optimizing 2 independent goals separately
In that case, we are optimizing for 2 goals separately.
The easiest solution is probably to write a custom training loop with 2 tf.GradientTape, one for each goal. Lets consider this really simple example:
Dummy data
Let's create some random Data
import tensorflow as tf
X = tf.random.normal((1000,1))
y1= 3*X + 1
y2 = -2*X +2
ds = tf.data.Dataset.from_tensor_slices((X,y1,y2)).batch(10)
Creating a model with 2 outputs
In that example, I skip the feature extraction step, as a simple linear regression will work for the data. But as your feature extractor network is frozen, the example is similar.
inp = tf.keras.Input((1,))
dense_1 = tf.keras.layers.Dense(1, name="objective1")(inp)
dense_2 = tf.keras.layers.Dense(1, name="objective2")(inp)
model = tf.keras.Model(inputs=inp, outputs=[dense_1, dense_2])
# setting up the loss functions as well as the optimizer
opt = tf.optimizers.SGD()
loss_func1 = tf.losses.mean_squared_error
loss_func2 = tf.losses.mean_absolute_error
Note the name given to the two dense layers: I will use them later to retrieve the appropriate weights.
Getting the weights to optimize
We can use the name set before to retrieve the variable belonging to each objective :
var1, var2 = [],[]
for l in model.layers:
if "objective1" in l.name:
var1 += l.trainable_variables
if "objective2" in l.name:
var2 += l.trainable_variables
The training loop
You simply need to tapes, one for each objective. You can use different optimizer as well, if it makes the training better.
counter = 0
for x, y1, y2 in ds:
counter += 1
with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
pred1, pred2 = model(x)
loss1 = loss_func1(y1, pred1)
loss2 = loss_func2(y2, pred2)
grad1 = tape1.gradient(loss1, var1)
grad2 = tape2.gradient(loss2, var2)
opt.apply_gradients(zip(grad1, var1))
opt.apply_gradients(zip(grad2, var2))
if counter % 10:
print(f"Step : {counter}, objective1: {tf.reduce_mean(loss1)}, objective2: {tf.reduce_mean(loss2)}")
If we run the training, we get:
Step : 1, objective1: 4.609124183654785, objective2: 2.6634981632232666
[...]
Step : 99, objective1: 7.176481902227555e-14, objective2: 0.030187154188752174
The principle advantage training that way is that you just need to extract the features once for the two objectives.
EDIT: More clarification -
I have a pre-trained model file which I can load and pull model.layers and model.weights from. This model may have a complex set of interconnected layers.
I want to be able to use the model.layers or the model() file directly to append it to a layer in another neural network.
#Dummy model - this function is not available to me; only the model file
def model1():
inp = layers.Input((3,))
x = layers.Dense(4, activation='relu')(inp)
out = layers.Dense(2, activation='softmax')(x)
model = Model(inp, out)
return model
pretrained_model = model1() #I have THIS only!
L = pretrained_model.layers
print(L)
[<tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f915d6778b0>,
<tensorflow.python.keras.layers.core.Dense at 0x7f915d643790>,
<tensorflow.python.keras.layers.core.Dense at 0x7f915e124e50>]
I want to take the Dense layers L[1:] and add them to another architecture (Not the weights, just the layers). Something like below as #Anton has described in his solution.
inp = layers.Input((3,))
x = Dense(3, activation='relu')(inp)
m0 = get_layers(pretrained_model)(x) #<---
out = layers.Dense(2)(m0)
This should give me a model.summary() with 5 layers - inp, x, L[1], L[2], out
But I am unable to use the list of layers directly.
I can come up with a function that recreates a partial computation graph based on these layers but I am looking for something simpler.
I have already tried modifying the model1() function to work for me as below, which serves my purpose but assuming I only get a model file and with a massive number of layers, this will not be possible.
def model1(layer):
#inp = layers.Input((3,))
x = layers.Dense(4, activation='relu')(layer)
out = layers.Dense(2, activation='softmax')(x)
model = Model(inp, out)
return model.output
How can I use a model generator inside another model
We can use the generate model1() and replace
inp = layers.Input((3,))
x = Dense(3, activation='relu')(inp)
m0 = get_layers(pretrained_model)(x) # <---
out = layers.Dense(2)(m0)
with
inp = layers.Input((3,))
x = layers.Dense(3, activation='relu')(inp)
m0 = pretrained_model(x) # <---
out = layers.Dense(2)(m0)
and if we want a new model generator model2() that does that as a function
def model2(pretrained_model):
inp = layers.Input((3,), name='model2_input')
x = layers.Dense(3, activation='relu', name='model2_x')(inp)
m0 = pretrained_model(x)
out = layers.Dense(2, name='model2_out')(m0)
model = Model(inp, out, name='model2')
return model
second_model = model2()
If we look at the graph of second_model we can see that indeed it contains the layers of model1
We can generate the above image using
tf.keras.utils.plot_model(second_model, show_shapes=True, expand_nested=True)
I have used Keras and TensorFlow to classify the Fashion MNIST following this tutorial .
It uses the AdamOptimizer to find the value for model parameters that minimize the loss function of the network. The input for the network is a 2-D tensor with shape [28, 28], and output is a 1-D tensor with shape [10] which is the result of a softmax function.
Once the network has been trained, I want to use the optimizer for another task: find an input that maximizes one of the elements of the output tensor. How can this be done? Is it possible to do so using Keras or one have to use a lower level API?
Since the input is not unique for a given output, it would be even better if we could impose some constraints on the values the input can take.
The trained model has the following format
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
I feel you would want to backprop with respect to the input freezing all the weights to your model. What you could do is:
Add a dense layer after the input layer with the same dimensions as input and set it as trainable
Freeze all the other layers of your model. (except the one you added)
As an input, feed an identity matrix and train your model based on whatever output you desire.
This article and this post might be able to help you if you want to backprop based on the input instead. It's a bit like what you are aiming for but you can get the intuition.
It would be very similar to the way that filters of a Convolutional Network is visualized: we would do gradient ascent optimization in input space to maximize the response of a particular filter.
Here is how to do it: after training is finished, first we need to specify the output and define a loss function that we want to maximize:
from keras import backend as K
output_class = 0 # the index of the output class we want to maximize
output = model.layers[-1].output
loss = K.mean(output[:,output_class]) # get the average activation of our desired class over the batch
Next, we need to take the gradient of the loss we have defined above with respect to the input layer:
grads = K.gradients(loss, model.input)[0] # the output of `gradients` is a list, just take the first (and only) element
grads = K.l2_normalize(grads) # normalize the gradients to help having an smooth optimization process
Next, we need to define a backend function that takes the initial input image and gives the values of loss and gradients as outputs, so that we can use it in the next step to implement the optimization process:
func = K.function([model.input], [loss, grads])
Finally, we implement the gradient ascent optimization process:
import numpy as np
input_img = np.random.random((1, 28, 28)) # define an initial random image
lr = 1. # learning rate used for gradient updates
max_iter = 50 # number of gradient updates iterations
for i in range(max_iter):
loss_val, grads_val = func([input_img])
input_img += grads_val * lr # update the image based on gradients
Note that, after this process is finished, to display the image you may need to make sure that all the values in the image are in the range [0, 255] (or [0,1]).
After the hints Saket Kumar Singh gave in his answer, I wrote the following that seems to solve the question.
I create two custom layers. Maybe Keras offers already some classes that are equivalent to them.
The first on is a trainable input:
class MyInputLayer(keras.layers.Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyInputLayer, self).__init__(**kwargs)
def build(self, input_shape):
self.kernel = self.add_weight(name='kernel',
shape=self.output_dim,
initializer='uniform',
trainable=True)
super(MyInputLayer, self).build(input_shape)
def call(self, x):
return self.kernel
def compute_output_shape(self, input_shape):
return self.output_dim
The second one gets the probability of the label of interest:
class MySelectionLayer(keras.layers.Layer):
def __init__(self, position, **kwargs):
self.position = position
self.output_dim = 1
super(MySelectionLayer, self).__init__(**kwargs)
def build(self, input_shape):
super(MySelectionLayer, self).build(input_shape)
def call(self, x):
mask = np.array([False]*x.shape[-1])
mask[self.position] = True
return tf.boolean_mask(x, mask,axis=1)
def compute_output_shape(self, input_shape):
return self.output_dim
I used them in this way:
# Build the model
layer_flatten = keras.layers.Flatten(input_shape=(28, 28))
layerDense1 = keras.layers.Dense(128, activation=tf.nn.relu)
layerDense2 = keras.layers.Dense(10, activation=tf.nn.softmax)
model = keras.Sequential([
layer_flatten,
layerDense1,
layerDense2
])
# Compile the model
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
# ...
# Freeze the model
layerDense1.trainable = False
layerDense2.trainable = False
# Build another model
class_index = 7
layerInput = MyInputLayer((1,784))
layerSelection = MySelectionLayer(class_index)
model_extended = keras.Sequential([
layerInput,
layerDense1,
layerDense2,
layerSelection
])
# Compile it
model_extended.compile(optimizer=tf.train.AdamOptimizer(),
loss='mean_absolute_error')
# Train it
dummyInput = np.ones((1,1))
target = np.ones((1,1))
model_extended.fit(dummyInput, target,epochs=300)
# Retrieve the weights of layerInput
layerInput.get_weights()[0]
Interesting. Maybe a solution would be to feed all your data to the network and for each sample save the output_layer after softmax.
This way, for 3 classes, where you want to find the best input for class 1, you are looking for outputs where the first component is high. For example: [1 0 0]
Indeed the output means the probability, or the confidence of the network, for the sample being one of the classes.
Funny coincident I was just working on the same "problem". I'm interested in the direction of adversarial training etc. What I did was to insert a LocallyConnected2D Layer after the input and then train with data which is all one and has as targets the class of interest.
As model I use
batch_size = 64
num_classes = 10
epochs = 20
input_shape = (28, 28, 1)
inp = tf.keras.layers.Input(shape=input_shape)
conv1 = tf.keras.layers.Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal')(inp)
pool1 = tf.keras.layers.MaxPool2D((2, 2))(conv1)
drop1 = tf.keras.layers.Dropout(0.20)(pool1)
flat = tf.keras.layers.Flatten()(drop1)
fc1 = tf.keras.layers.Dense(128, activation='relu')(flat)
norm1 = tf.keras.layers.BatchNormalization()(fc1)
dropfc1 = tf.keras.layers.Dropout(0.25)(norm1)
out = tf.keras.layers.Dense(num_classes, activation='softmax')(dropfc1)
model = tf.keras.models.Model(inputs = inp , outputs = out)
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
model.summary()
after training I insert the new layer
def insert_intermediate_layer_in_keras(model,position, before_layer_id):
layers = [l for l in model.layers]
if(before_layer_id==0) :
x = new_layer
else:
x = layers[0].output
for i in range(1, len(layers)):
if i == before_layer_id:
x = new_layer(x)
x = layers[i](x)
else:
x = layers[i](x)
new_model = tf.keras.models.Model(inputs=layers[0].input, outputs=x)
return new_model
def fix_model(model):
for l in model.layers:
l.trainable=False
fix_model(model)
new_layer = tf.keras.layers.LocallyConnected2D(1, kernel_size=(1, 1),
activation='linear',
kernel_initializer='he_normal',
use_bias=False)
new_model = insert_intermediate_layer_in_keras(model,new_layer,1)
new_model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
and finally rerun training with my fake data.
X_fake = np.ones((60000,28,28,1))
print(Y_test.shape)
y_fake = np.ones((60000))
Y_fake = tf.keras.utils.to_categorical(y_fake, num_classes)
new_model.fit(X_fake, Y_fake, epochs=100)
weights = new_layer.get_weights()[0]
imshow(weights.reshape(28,28))
plt.show()
Results are not yet satisfying but I'm confident of the approach and guess I need to play around with the optimiser.