I am trying to create a convolutional neural network that has two regression outputs, a score and a confidence. I have frozen the layers they have in common in the hopes that the addition of the confidence output doesn't change the score, but in my experiments it has. For the model with just the score, I used Xception and added a simple GlobalAveragePooling2D and Dense(512) layer then output a single number.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
optimizer = Adam(learning_rate=learning_rate)
model.compile(loss='mae', optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
Here is what the end of model.summary() looks like:
When I fit it, the model produces good results.
But when I try to add a second output the result of the first becomes much worse. The new model gets trained off tuples where is first number is the same as the first model and the second number is a confidence value. The model is very similar to the one above.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
score_x = Dense(512, activation='relu')(x)
score_out = Dense(1, activation='sigmoid', name='score_model')(score_x)
confidence_x = Dense(512, activation='relu')(x)
confidence_out = Dense(1, name='confidence_model')(confidence_x)
model = Model(inputs=base_model.input, outputs=[score_out, confidence_out])
for layer in base_model.layers:
layer.trainable = False
losses = {'score_model': 'mae', 'confidence_model': 'mae'}
loss_weights = {'score_model': 1, 'confidence_model': 1}
model.compile(loss=losses, loss_weights=loss_weights, optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
When I look at model.summary(), it has twice as many trainable parameters as the previous model, which is exactly what I was expecting. Everything looks right to me so far.
But when I train this model the performance on the score is much worse. I was thinking it would be the same (within stochastic variation). After the first epoch, the loss from the first model is around 0.125. The score_model_loss from the second model is around 0.554. Clearly I'm not completely separating the models. What am I missing?
Note: This answer will work well only because the layer that do the feature extraction are frozen. As #Akshay Sehgal stated in the comments :
optimizing for 2 goals together is actually a completely different problem than optimizing 2 independent goals separately
In that case, we are optimizing for 2 goals separately.
The easiest solution is probably to write a custom training loop with 2 tf.GradientTape, one for each goal. Lets consider this really simple example:
Dummy data
Let's create some random Data
import tensorflow as tf
X = tf.random.normal((1000,1))
y1= 3*X + 1
y2 = -2*X +2
ds = tf.data.Dataset.from_tensor_slices((X,y1,y2)).batch(10)
Creating a model with 2 outputs
In that example, I skip the feature extraction step, as a simple linear regression will work for the data. But as your feature extractor network is frozen, the example is similar.
inp = tf.keras.Input((1,))
dense_1 = tf.keras.layers.Dense(1, name="objective1")(inp)
dense_2 = tf.keras.layers.Dense(1, name="objective2")(inp)
model = tf.keras.Model(inputs=inp, outputs=[dense_1, dense_2])
# setting up the loss functions as well as the optimizer
opt = tf.optimizers.SGD()
loss_func1 = tf.losses.mean_squared_error
loss_func2 = tf.losses.mean_absolute_error
Note the name given to the two dense layers: I will use them later to retrieve the appropriate weights.
Getting the weights to optimize
We can use the name set before to retrieve the variable belonging to each objective :
var1, var2 = [],[]
for l in model.layers:
if "objective1" in l.name:
var1 += l.trainable_variables
if "objective2" in l.name:
var2 += l.trainable_variables
The training loop
You simply need to tapes, one for each objective. You can use different optimizer as well, if it makes the training better.
counter = 0
for x, y1, y2 in ds:
counter += 1
with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
pred1, pred2 = model(x)
loss1 = loss_func1(y1, pred1)
loss2 = loss_func2(y2, pred2)
grad1 = tape1.gradient(loss1, var1)
grad2 = tape2.gradient(loss2, var2)
opt.apply_gradients(zip(grad1, var1))
opt.apply_gradients(zip(grad2, var2))
if counter % 10:
print(f"Step : {counter}, objective1: {tf.reduce_mean(loss1)}, objective2: {tf.reduce_mean(loss2)}")
If we run the training, we get:
Step : 1, objective1: 4.609124183654785, objective2: 2.6634981632232666
[...]
Step : 99, objective1: 7.176481902227555e-14, objective2: 0.030187154188752174
The principle advantage training that way is that you just need to extract the features once for the two objectives.
Related
i have the following Model
inputs = Input(shape=(8)) # 8 groessen als eingabe
x = Reshape((8, 1))(inputs)
# generator
x = Bidirectional(LSTM(32, return_sequences=True))(x)
x = Bidirectional(LSTM(64, return_sequences=True))(x)
generated = LSTM(4, return_sequences=True, activation="sigmoid")(x)
# rating
x = Bidirectional(LSTM(128, return_sequences=True))(generated)
x = Bidirectional(LSTM(128))(x)
x = Flatten()(x)
x = Dense(16, activation="relu")(x)
rating = Dense(8, activation="relu")(x)
model = Model(inputs=inputs, outputs=[rating, generated])
return model
i feed a sequence (8,) to the model, a generator should create a new sequence (8,4) out of this sequence that fulfill some conditions. There are a lot of outputs that could fulfil that condition, but my generator should just take one that it likes.
then i feed the generated sequence to the next layer, where i calculate the value (8,) of this output in order to be able to have gradients, when i apply my loss function
rating, generated = model(model_input, training=True)
calculated_rating = my_numeric_function(np.array(generated))
loss_1 = mse(model_input, rating) # the rating-loss
loss_2 = mse(calculated_rating , rating) # the loss between rating and generator
metric = mse(model_input, calculated_rating) # metric: difference between model input and real calculated
tape.gradient(loss_1+loss_2, model.trainable_weights)
my loss1 and loss2 is decreasing, but the metric (calculated_rating - generated) stays exactly the same.
it seems that in backpropagation my loss function does not allow to change the weights of the generator layers.
what may be the reason that the metrics does not decrease ? (it stays around 51-52)
EDIT: More clarification -
I have a pre-trained model file which I can load and pull model.layers and model.weights from. This model may have a complex set of interconnected layers.
I want to be able to use the model.layers or the model() file directly to append it to a layer in another neural network.
#Dummy model - this function is not available to me; only the model file
def model1():
inp = layers.Input((3,))
x = layers.Dense(4, activation='relu')(inp)
out = layers.Dense(2, activation='softmax')(x)
model = Model(inp, out)
return model
pretrained_model = model1() #I have THIS only!
L = pretrained_model.layers
print(L)
[<tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f915d6778b0>,
<tensorflow.python.keras.layers.core.Dense at 0x7f915d643790>,
<tensorflow.python.keras.layers.core.Dense at 0x7f915e124e50>]
I want to take the Dense layers L[1:] and add them to another architecture (Not the weights, just the layers). Something like below as #Anton has described in his solution.
inp = layers.Input((3,))
x = Dense(3, activation='relu')(inp)
m0 = get_layers(pretrained_model)(x) #<---
out = layers.Dense(2)(m0)
This should give me a model.summary() with 5 layers - inp, x, L[1], L[2], out
But I am unable to use the list of layers directly.
I can come up with a function that recreates a partial computation graph based on these layers but I am looking for something simpler.
I have already tried modifying the model1() function to work for me as below, which serves my purpose but assuming I only get a model file and with a massive number of layers, this will not be possible.
def model1(layer):
#inp = layers.Input((3,))
x = layers.Dense(4, activation='relu')(layer)
out = layers.Dense(2, activation='softmax')(x)
model = Model(inp, out)
return model.output
How can I use a model generator inside another model
We can use the generate model1() and replace
inp = layers.Input((3,))
x = Dense(3, activation='relu')(inp)
m0 = get_layers(pretrained_model)(x) # <---
out = layers.Dense(2)(m0)
with
inp = layers.Input((3,))
x = layers.Dense(3, activation='relu')(inp)
m0 = pretrained_model(x) # <---
out = layers.Dense(2)(m0)
and if we want a new model generator model2() that does that as a function
def model2(pretrained_model):
inp = layers.Input((3,), name='model2_input')
x = layers.Dense(3, activation='relu', name='model2_x')(inp)
m0 = pretrained_model(x)
out = layers.Dense(2, name='model2_out')(m0)
model = Model(inp, out, name='model2')
return model
second_model = model2()
If we look at the graph of second_model we can see that indeed it contains the layers of model1
We can generate the above image using
tf.keras.utils.plot_model(second_model, show_shapes=True, expand_nested=True)
I am building image classifier with localisation using CNN.
My CNN has image as input, however after last CONV layer i want to split it into two , one part for image classification, and next part for image localisation.
Needless to say one part should use mean squared error, another one should use binary binary_crossentropy. My structure is something like:
input_image = Input(shape=(IMG_W, IMG_H, 3))
# Layer 1
x = Conv2D(32, (3,3), strides=(1,1), padding='same', name='conv_1', use_bias=False)(input_image)
x = BatchNormalization(name='norm_1')(x)
x = LeakyReLU(alpha=0.1)(x)
# Layer 2
x = Conv2D(64, (3,3), strides=(1,1), padding='same', name='conv_2', use_bias=False)(x)
x = BatchNormalization(name='norm_2')(x)
x = LeakyReLU(alpha=0.1)(x)
now i want to divied it into two Dense (FC) layer
class_layer = x
class_layer = Dense(256,activation="relu")(class_layer)
class_layer = Dense(2,activation="softmax")(class_layer)
model_one = Model(input_image,class_layer)
model_one.compile(loss="binary_crossentrophy", optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
and layer for image localisation
x = Dense(1024,activation="relu")(x)
x = Dense(256,activation="relu")(x)
x = Dense(4,activation="relu")(x)
model = Model(input_image,x)
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
However how can i concat the layes so the result vector will be ( 2 + 4 ) ?
Can i even achieve splitting like this?
I know about model.concatenate However this should be called before compiling, so each part wouldnt have different loss function
Thanks for help and answers
You can initialize your model with multiple outputs, and specify losses for each of them. If you want your loss from model_one to have weight a, and the loss from model to have weight b, so your total loss would look like a*mse + b*binary_ce, then you would have something like
model = Model(input_image, [x, class_layer])
model.compile(loss=['mean_squared_error', 'binary_crossentropy'],
loss_weights=[a, b],
optimizer=keras.optimizers.Adam())
See the loss and loss_weights parameters in the documentation for Model.compile for more details https://keras.io/models/model/.
I am training a neural network using Keras and Theano, in which inputs have a format like:
[
[Situation features],
[Option 1 features],
[Option 2 features],
]
I want to train a model to predict how often each option will be chosen, by making the model learn how to score each option, and how the situation makes differences in score more or less important.
My model looks like:
option_inputs = [Input(shape=(NUM_FEATURES,), name='situation_input'),
Input(shape=(NUM_FEATURES,), name='option_input_0'),
Input(shape=(NUM_FEATURES,), name='option_input_1')]
situation_input_processing = Dense(5, activation='relu', name='situation_input_processing')
option_input_processing = Dense(20, activation='relu', name='option_input_processing')
diversity_neuron = Dense(1, activation='softplus', name='diversity_neuron')
scoring_neuron = Dense(1, activation='linear', name='scoring_neuron')
diversity_output = diversity_neuron(situation_input_processing(journey_inputs[0]))
scoring_outputs = [scoring_neuron(option_input_processing(option_input)) for option_input in option_inputs[1:2]]
logit_outputs = [Multiply()([diversity_output, scoring_output]) for scoring_output in scoring_outputs]
probability_outputs = Activation('softmax')(keras.layers.concatenate(logit_outputs, axis=-1))
model = Model(inputs=option_inputs, outputs=probability_outputs)
When trying to get probability_outputs, I get the error:
ValueError: Concatenate layer should be called on a list of inputs
The error seems to be triggered because logit_outputs is not built iterating through all 3 input feature collections, only out of 2 of them.
Any idea how to work around this problem?
Once the model is trained, I want to observe the outputs of diversity_neuron and scoring_neuron to learn how to extrapolate the scoring for arbitrary number of options and understand what drives diversity.
I have made these changes to workaround the problem:
I'm including the situation features in the beginning of each option
feature list
I have added a layer that can filter the situation
features from the option inputs. This is made by manually setting
weights of a non-trainable layer.
I then can iterate through all
inputs in any path of the network
The final code looks like:
option_inputs = [Input(shape=(NUM_FEATURES,), name='option_input_0'),
Input(shape=(NUM_FEATURES,), name='option_input_1')]
situation_input_filtering = Dense(NUM_SITUATION_FEATURES, activation='linear', name='situation_input_filtering')
situation_input_filtering.trainable = False
situation_input_processing = Dense(5, activation='relu', name='situation_input_processing')
option_input_processing = Dense(20, activation='relu', name='option_input_processing')
diversity_neuron = Dense(1, activation='sigmoid', name='diversity_neuron')
scoring_neuron = Dense(1, activation='linear', name='scoring_neuron')
diversity_outputs = [diversity_neuron(situation_input_processing(situation_input_filtering(option_input))) for
option_input in option_inputs]
scoring_outputs = [scoring_neuron(option_input_processing(option_input)) for option_input in option_inputs]
logit_outputs = [Multiply()([diversity_output, scoring_output]) for diversity_output, scoring_output in
zip(diversity_outputs, scoring_outputs)]
combined = keras.layers.concatenate(logit_outputs, axis=-1)
probability_outputs = Activation('softmax')(combined)
model = Model(inputs=option_inputs, outputs=probability_outputs)
model.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
mask_weights = np.zeros([NUM_FEATURES, NUM_SITUATION_FEATURES])
for i in xrange(NUM_situation_FEATURES):
mask_weights[i, i] = 1.0
for layer in model.layers:
if layer.name == 'situation_input_filtering':
layer.set_weights([mask_weights, np.zeros(NUM_SITUATION_FEATURES)])
I am trying to mimic this keras blog about fine tuning image classifiers. I would like to use the Inceptionv3 found on a fchollet repo.
Inception is a Model (functional API), so I can't just do model.add(top_model) which is reserved for Sequential.
How can I add combine two functional Models? Let's say I have
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions = Dense(4, name='final1')(x)
model1 = Model(input=inputs, output=predictions)
for the first model and
inputs_2 = Input(shape=(4,))
y = Dense(5)(l_inputs)
y = Dense(2, name='final2')(y)
predictions_2 = Dense(29)(y)
model2 = Model(input=inputs2, output=predictions2)
for the second. I now want an end-to-end that goes from inputs to predicions_2 and links predictions to inputs_2.
I tried using model1.get_layer('final1').output but I had a mismatch with types and I couldn't make it work.
I haven't tried this but according to the documentation functional models are callable, so you can do something like:
y = model2(model1(x))
where x is the data that goes to inputs and y is the result of predictions_2
I ran into this problem as well while fine tuning VGG16. Here's what worked for me and I imagine a similar approach can be taken for Inception V3. Tested on Keras 2.0.5 with Tensorflow 1.2 backend.
# NOTE: define the following variables
# top_model_weights_path
# num_classes
# dense_layer_1 = 4096
# dense_layer_2 = 4096
vgg16 = applications.VGG16(
include_top=False,
weights='imagenet',
input_shape=(224, 224, 3))
# Inspect the model
vgg16.summary()
# This shape has to match the last layer in VGG16 (without top)
dense_input = Input(shape=(7, 7, 512))
dense_output = Flatten(name='flatten')(dense_input)
dense_output = Dense(dense_layer_1, activation='relu', name='fc1')(dense_output)
dense_output = Dense(dense_layer_2, activation='relu', name='fc2')(dense_output)
dense_output = Dense(num_classes, activation='softmax', name='predictions')(dense_output)
top_model = Model(inputs=dense_input, outputs=dense_output, name='top_model')
# from: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)
block5_pool = vgg16.get_layer('block5_pool').output
# Now combine the two models
full_output = top_model(block5_pool)
full_model = Model(inputs=vgg16.input, outputs=full_output)
# set the first 15 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
# WARNING: this may not be applicable for Inception V3
for layer in full_model.layers[:15]:
layer.trainable = False
# Verify things look as expected
full_model.summary()
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
full_model.compile(
loss='binary_crossentropy',
optimizer=optimizers.SGD(lr=5e-5, momentum=0.9),
metrics=['accuracy'])
# Train the model...
I think there are 2 options depending on what you need:
(a) predictions_1 and predictions_2 matter for you. In this case, you can train a network with 2 outputs. Here an example derived from your post:
input_shape = [3, 20]
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions_1 = Dense(4, name='predictions_1')(x)
# here the predictions_1 just corresponds to your next layer's input
y = Dense(5)(predictions_1)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
# you specify here that you have 2 outputs
model = Model(input=inputs, output=[predictions_1, predictions_2])
For the .fit and .predict, you can find a lot of details in https://keras.io/getting-started/functional-api-guide/, section: Multi-input and multi-output models.
(b) you are only interested in predictions_2. In this case, you can just do:
input_shape = [3, 20]
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions_1 = Dense(4, name='predictions_1')(x)
# here the predictions_1 just corresponds to your next layer's input
y = Dense(5)(predictions_1)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
# you specify here that your only output is predictions_2
model = Model(input=inputs, output=predictions_2)
Now as regards inception_v3. You can define by yourself the architecture and modify the deep layers inside according to your needs (giving to these layers specific names in order to avoid keras naming them automatically).
After that, compile your model and loads weights (as in https://keras.io/models/about-keras-models/ see function load_weights(..., by_name=True))
# you can load weights for only the part that corresponds to the true
# inception_v3 architecture. The other part will be initialized
# randomly
model.load_weights("inception_v3.hdf5", by_name=True)
This should solve your problem. By the way, you can find extra information here: https://www.gradientzoo.com. The doc. explains several saving / loading / fine-tuning routines ;)
Update: if you do not want to redefine your model from scratch you can do the following:
input_shape = [3, 20]
# define model1 and model2 as you want
inputs1 = Input(shape=input_shape)
x = Flatten()(inputs1)
predictions_1 = Dense(4, name='predictions_1')(x)
model1 = Model(input=inputs1, output=predictions_1)
inputs2 = Input(shape=(4,))
y = Dense(5)(inputs2)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
model2 = Model(input=inputs2, output=predictions_2)
# then define functions returning the image of an input through model1 or model2
def give_model1():
def f(x):
return model1(x)
return f
def give_model2():
def g(x):
return model2(x)
return g
# now you can create a global model as follows:
inputs = Input(shape=input_shape)
x = model1(inputs)
predictions = model2(x)
model = Model(input=inputs, output=predictions)
Drawing from filitchp's answer above, assuming the output dimensions of model1 match the input dimensions of model2, this worked for me:
model12 = Model(inputs=inputs, outputs=model2(model1.output))