I am training a neural network using Keras and Theano, in which inputs have a format like:
[
[Situation features],
[Option 1 features],
[Option 2 features],
]
I want to train a model to predict how often each option will be chosen, by making the model learn how to score each option, and how the situation makes differences in score more or less important.
My model looks like:
option_inputs = [Input(shape=(NUM_FEATURES,), name='situation_input'),
Input(shape=(NUM_FEATURES,), name='option_input_0'),
Input(shape=(NUM_FEATURES,), name='option_input_1')]
situation_input_processing = Dense(5, activation='relu', name='situation_input_processing')
option_input_processing = Dense(20, activation='relu', name='option_input_processing')
diversity_neuron = Dense(1, activation='softplus', name='diversity_neuron')
scoring_neuron = Dense(1, activation='linear', name='scoring_neuron')
diversity_output = diversity_neuron(situation_input_processing(journey_inputs[0]))
scoring_outputs = [scoring_neuron(option_input_processing(option_input)) for option_input in option_inputs[1:2]]
logit_outputs = [Multiply()([diversity_output, scoring_output]) for scoring_output in scoring_outputs]
probability_outputs = Activation('softmax')(keras.layers.concatenate(logit_outputs, axis=-1))
model = Model(inputs=option_inputs, outputs=probability_outputs)
When trying to get probability_outputs, I get the error:
ValueError: Concatenate layer should be called on a list of inputs
The error seems to be triggered because logit_outputs is not built iterating through all 3 input feature collections, only out of 2 of them.
Any idea how to work around this problem?
Once the model is trained, I want to observe the outputs of diversity_neuron and scoring_neuron to learn how to extrapolate the scoring for arbitrary number of options and understand what drives diversity.
I have made these changes to workaround the problem:
I'm including the situation features in the beginning of each option
feature list
I have added a layer that can filter the situation
features from the option inputs. This is made by manually setting
weights of a non-trainable layer.
I then can iterate through all
inputs in any path of the network
The final code looks like:
option_inputs = [Input(shape=(NUM_FEATURES,), name='option_input_0'),
Input(shape=(NUM_FEATURES,), name='option_input_1')]
situation_input_filtering = Dense(NUM_SITUATION_FEATURES, activation='linear', name='situation_input_filtering')
situation_input_filtering.trainable = False
situation_input_processing = Dense(5, activation='relu', name='situation_input_processing')
option_input_processing = Dense(20, activation='relu', name='option_input_processing')
diversity_neuron = Dense(1, activation='sigmoid', name='diversity_neuron')
scoring_neuron = Dense(1, activation='linear', name='scoring_neuron')
diversity_outputs = [diversity_neuron(situation_input_processing(situation_input_filtering(option_input))) for
option_input in option_inputs]
scoring_outputs = [scoring_neuron(option_input_processing(option_input)) for option_input in option_inputs]
logit_outputs = [Multiply()([diversity_output, scoring_output]) for diversity_output, scoring_output in
zip(diversity_outputs, scoring_outputs)]
combined = keras.layers.concatenate(logit_outputs, axis=-1)
probability_outputs = Activation('softmax')(combined)
model = Model(inputs=option_inputs, outputs=probability_outputs)
model.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
mask_weights = np.zeros([NUM_FEATURES, NUM_SITUATION_FEATURES])
for i in xrange(NUM_situation_FEATURES):
mask_weights[i, i] = 1.0
for layer in model.layers:
if layer.name == 'situation_input_filtering':
layer.set_weights([mask_weights, np.zeros(NUM_SITUATION_FEATURES)])
Related
I want to make a model like the below picture. (simplified)
So, practically, I want the weights with the same names to always have the same values during training. What I did was the code below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
example_train_features = np.arange(12000).reshape(1000, 12)
example_lanbels = np.random.randint(2, size=1000) #these data are just for illustration purposes
train_ds = tf.data.Dataset.from_tensor_slices((example_train_features, example_lanbels)).shuffle(buffer_size = 1000).batch(32)
dense1 = layers.Dense(1, activation="relu") #input shape:4
dense2 = layers.Dense(2, activation="relu") #input shape:1
dense3 = layers.Dense(1, activation="sigmoid") #input shape:6
feature_input = keras.Input(shape=(12,), name="features")
nodes_list = []
for i in range(3):
first_lvl_input = feature_input[i :: 4] ######## marked line
out1 = dense1(first_lvl_input)
out2 = dense2(out1)
nodes_list.append(out2)
joined = layers.concatenate(nodes_list)
final_output = dense3(joined)
model = keras.Model(inputs = feature_input, outputs = final_output, name="extrema_model")
compile_and_fit(model, train_ds, val_ds, patience=4)
model.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.RMSprop(),
metrics=keras.metrics.BinaryAccuracy())
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
But when I try to run this code I get this error:
MklConcatOp : Dimensions of inputs should match: shape[0][0]= 71 vs. shape[18][0] = 70
[[node extrema_model/concatenate_2/concat (defined at <ipython-input-373-5efb41d312df>:398) ]] [Op:__inference_train_function_15338]
(please don't pay attention to numbers as they are from my real code) this is because it gets the whole data including the labels as an input, but shouldn't Keras only feed the features itself? Anyway, if I write the marked line as below:
first_lvl_input = feature_input[i :12: 4]
it doesn't give me the above error anymore. But, then I get another error which I know why happens but I don't know how to resolve it.
InvalidArgumentError: Incompatible shapes: [4,1] vs. [32,1]
[[node gradient_tape/binary_crossentropy/logistic_loss/mul/BroadcastGradientArgs
(defined at <ipython-input-1-b82546367b3c>:398) ]] [Op:__inference_train_function_6098]
This is because keras is feeding again the whole batch array, whereas in Keras documentation it is written you shouldn't specify the batch dimension for the program as it understands itself, so I expected Keras to feed the data one by one for my code to work. So I appreciate any ideas on how to resolve this or on how to write a code for what I want. Thanks.
You can wrap the dense layers in timedistributed wrapper , and reshape your data to have three dimensions (1000,3,4)(batch, sequence, feature), so for each time step (=3 that replace your for loop code .) the four features will be multiplied with the same weights each time.
example_train_features = np.arange(12000).reshape(1000, 3, 4 )
example_lanbels = np.random.randint(2, size=1000) #these data are just for illustration purposes
train_ds = tf.data.Dataset.from_tensor_slices((example_train_features, example_lanbels)).shuffle(buffer_size = 1000).batch(32)
dense1 = layers.TimeDistributed(layers.Dense(1, activation="relu")) #input shape:4
dense2 =layers.TimeDistributed(layers.Dense(2, activation="relu")) #input shape:1
dense3 = layers.Dense(1, activation="sigmoid") #input shape:6
feature_input = keras.Input(shape=(3,4), name="features")
out1 = dense1(feature_input)
out2 = dense2(out1)
z = layers.Flatten()(out2)
final_output = dense3(z)
model = keras.Model(inputs = feature_input, outputs = final_output, name="extrema_model")
model.compile(loss = tf.keras.losses.BinaryCrossentropy(),
optimizer = tf.keras.optimizers.RMSprop(),
metrics=keras.metrics.BinaryAccuracy())
history = model.fit(train_ds, epochs=10)
I am trying to create a convolutional neural network that has two regression outputs, a score and a confidence. I have frozen the layers they have in common in the hopes that the addition of the confidence output doesn't change the score, but in my experiments it has. For the model with just the score, I used Xception and added a simple GlobalAveragePooling2D and Dense(512) layer then output a single number.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
optimizer = Adam(learning_rate=learning_rate)
model.compile(loss='mae', optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
Here is what the end of model.summary() looks like:
When I fit it, the model produces good results.
But when I try to add a second output the result of the first becomes much worse. The new model gets trained off tuples where is first number is the same as the first model and the second number is a confidence value. The model is very similar to the one above.
base_model = Xception(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
score_x = Dense(512, activation='relu')(x)
score_out = Dense(1, activation='sigmoid', name='score_model')(score_x)
confidence_x = Dense(512, activation='relu')(x)
confidence_out = Dense(1, name='confidence_model')(confidence_x)
model = Model(inputs=base_model.input, outputs=[score_out, confidence_out])
for layer in base_model.layers:
layer.trainable = False
losses = {'score_model': 'mae', 'confidence_model': 'mae'}
loss_weights = {'score_model': 1, 'confidence_model': 1}
model.compile(loss=losses, loss_weights=loss_weights, optimizer=optimizer, metrics=['mse','mae'], run_eagerly=True)
When I look at model.summary(), it has twice as many trainable parameters as the previous model, which is exactly what I was expecting. Everything looks right to me so far.
But when I train this model the performance on the score is much worse. I was thinking it would be the same (within stochastic variation). After the first epoch, the loss from the first model is around 0.125. The score_model_loss from the second model is around 0.554. Clearly I'm not completely separating the models. What am I missing?
Note: This answer will work well only because the layer that do the feature extraction are frozen. As #Akshay Sehgal stated in the comments :
optimizing for 2 goals together is actually a completely different problem than optimizing 2 independent goals separately
In that case, we are optimizing for 2 goals separately.
The easiest solution is probably to write a custom training loop with 2 tf.GradientTape, one for each goal. Lets consider this really simple example:
Dummy data
Let's create some random Data
import tensorflow as tf
X = tf.random.normal((1000,1))
y1= 3*X + 1
y2 = -2*X +2
ds = tf.data.Dataset.from_tensor_slices((X,y1,y2)).batch(10)
Creating a model with 2 outputs
In that example, I skip the feature extraction step, as a simple linear regression will work for the data. But as your feature extractor network is frozen, the example is similar.
inp = tf.keras.Input((1,))
dense_1 = tf.keras.layers.Dense(1, name="objective1")(inp)
dense_2 = tf.keras.layers.Dense(1, name="objective2")(inp)
model = tf.keras.Model(inputs=inp, outputs=[dense_1, dense_2])
# setting up the loss functions as well as the optimizer
opt = tf.optimizers.SGD()
loss_func1 = tf.losses.mean_squared_error
loss_func2 = tf.losses.mean_absolute_error
Note the name given to the two dense layers: I will use them later to retrieve the appropriate weights.
Getting the weights to optimize
We can use the name set before to retrieve the variable belonging to each objective :
var1, var2 = [],[]
for l in model.layers:
if "objective1" in l.name:
var1 += l.trainable_variables
if "objective2" in l.name:
var2 += l.trainable_variables
The training loop
You simply need to tapes, one for each objective. You can use different optimizer as well, if it makes the training better.
counter = 0
for x, y1, y2 in ds:
counter += 1
with tf.GradientTape() as tape1, tf.GradientTape() as tape2:
pred1, pred2 = model(x)
loss1 = loss_func1(y1, pred1)
loss2 = loss_func2(y2, pred2)
grad1 = tape1.gradient(loss1, var1)
grad2 = tape2.gradient(loss2, var2)
opt.apply_gradients(zip(grad1, var1))
opt.apply_gradients(zip(grad2, var2))
if counter % 10:
print(f"Step : {counter}, objective1: {tf.reduce_mean(loss1)}, objective2: {tf.reduce_mean(loss2)}")
If we run the training, we get:
Step : 1, objective1: 4.609124183654785, objective2: 2.6634981632232666
[...]
Step : 99, objective1: 7.176481902227555e-14, objective2: 0.030187154188752174
The principle advantage training that way is that you just need to extract the features once for the two objectives.
I am having trouble finding a Generator architecture for a simple array.
The ideas is the following,
Generator takes the encoded input from the auto-encoder (trained earlier) which outputs an array of shape (200,).
My current generator Model:
def Generator():
input_layer = Input(shape=[200]) #(bs, 200)
d1 = Dense(165, activation = 'relu')(input_layer)
d2 = Dense(140, activation = 'relu')(d1)
d3 = Dense(100, activation = 'relu')(d2)
d4 = Dense(50, activation = 'relu')(d3)
d5 = Dense(25, activation = 'relu')(d4) #(bs, 25)
up1 = Dense(50, activation = 'relu')(d5)
cat = concatenate([up1,dense_4])
up2 = Dense(100, activation = 'relu')(cat)
cat = concatenate([up2, dense_3])
up3 = Dense(140, activation = 'relu')(cat)
cat = concatenate([up3, dense_2])
up4 = Dense(165, activation='relu')(cat)
cat = concatenate([up4, dense_1])
last = Dense(200, activation = 'relu')(cat)
return tf.keras.models.Model(inputs = input_layer, outputs=last)
The train steps, losses and cycleGAN tensorflow documentation are same.
Since I am using this generator for a cycle-GAN project, I thought i would get good results after 100 epochs, but it was the opposite.
I used a Generator without skip connections but the results were much worse than the one above.
I searched for RES-Net and U-Net Architecture but since they require Convolutional layers, I haven't tried them yet.
My Questions:
What recommendations do you have?
What type of skips connections should I use (add or concatenate)?
I am building image classifier with localisation using CNN.
My CNN has image as input, however after last CONV layer i want to split it into two , one part for image classification, and next part for image localisation.
Needless to say one part should use mean squared error, another one should use binary binary_crossentropy. My structure is something like:
input_image = Input(shape=(IMG_W, IMG_H, 3))
# Layer 1
x = Conv2D(32, (3,3), strides=(1,1), padding='same', name='conv_1', use_bias=False)(input_image)
x = BatchNormalization(name='norm_1')(x)
x = LeakyReLU(alpha=0.1)(x)
# Layer 2
x = Conv2D(64, (3,3), strides=(1,1), padding='same', name='conv_2', use_bias=False)(x)
x = BatchNormalization(name='norm_2')(x)
x = LeakyReLU(alpha=0.1)(x)
now i want to divied it into two Dense (FC) layer
class_layer = x
class_layer = Dense(256,activation="relu")(class_layer)
class_layer = Dense(2,activation="softmax")(class_layer)
model_one = Model(input_image,class_layer)
model_one.compile(loss="binary_crossentrophy", optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
and layer for image localisation
x = Dense(1024,activation="relu")(x)
x = Dense(256,activation="relu")(x)
x = Dense(4,activation="relu")(x)
model = Model(input_image,x)
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
However how can i concat the layes so the result vector will be ( 2 + 4 ) ?
Can i even achieve splitting like this?
I know about model.concatenate However this should be called before compiling, so each part wouldnt have different loss function
Thanks for help and answers
You can initialize your model with multiple outputs, and specify losses for each of them. If you want your loss from model_one to have weight a, and the loss from model to have weight b, so your total loss would look like a*mse + b*binary_ce, then you would have something like
model = Model(input_image, [x, class_layer])
model.compile(loss=['mean_squared_error', 'binary_crossentropy'],
loss_weights=[a, b],
optimizer=keras.optimizers.Adam())
See the loss and loss_weights parameters in the documentation for Model.compile for more details https://keras.io/models/model/.
I am trying to mimic this keras blog about fine tuning image classifiers. I would like to use the Inceptionv3 found on a fchollet repo.
Inception is a Model (functional API), so I can't just do model.add(top_model) which is reserved for Sequential.
How can I add combine two functional Models? Let's say I have
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions = Dense(4, name='final1')(x)
model1 = Model(input=inputs, output=predictions)
for the first model and
inputs_2 = Input(shape=(4,))
y = Dense(5)(l_inputs)
y = Dense(2, name='final2')(y)
predictions_2 = Dense(29)(y)
model2 = Model(input=inputs2, output=predictions2)
for the second. I now want an end-to-end that goes from inputs to predicions_2 and links predictions to inputs_2.
I tried using model1.get_layer('final1').output but I had a mismatch with types and I couldn't make it work.
I haven't tried this but according to the documentation functional models are callable, so you can do something like:
y = model2(model1(x))
where x is the data that goes to inputs and y is the result of predictions_2
I ran into this problem as well while fine tuning VGG16. Here's what worked for me and I imagine a similar approach can be taken for Inception V3. Tested on Keras 2.0.5 with Tensorflow 1.2 backend.
# NOTE: define the following variables
# top_model_weights_path
# num_classes
# dense_layer_1 = 4096
# dense_layer_2 = 4096
vgg16 = applications.VGG16(
include_top=False,
weights='imagenet',
input_shape=(224, 224, 3))
# Inspect the model
vgg16.summary()
# This shape has to match the last layer in VGG16 (without top)
dense_input = Input(shape=(7, 7, 512))
dense_output = Flatten(name='flatten')(dense_input)
dense_output = Dense(dense_layer_1, activation='relu', name='fc1')(dense_output)
dense_output = Dense(dense_layer_2, activation='relu', name='fc2')(dense_output)
dense_output = Dense(num_classes, activation='softmax', name='predictions')(dense_output)
top_model = Model(inputs=dense_input, outputs=dense_output, name='top_model')
# from: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)
block5_pool = vgg16.get_layer('block5_pool').output
# Now combine the two models
full_output = top_model(block5_pool)
full_model = Model(inputs=vgg16.input, outputs=full_output)
# set the first 15 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
# WARNING: this may not be applicable for Inception V3
for layer in full_model.layers[:15]:
layer.trainable = False
# Verify things look as expected
full_model.summary()
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
full_model.compile(
loss='binary_crossentropy',
optimizer=optimizers.SGD(lr=5e-5, momentum=0.9),
metrics=['accuracy'])
# Train the model...
I think there are 2 options depending on what you need:
(a) predictions_1 and predictions_2 matter for you. In this case, you can train a network with 2 outputs. Here an example derived from your post:
input_shape = [3, 20]
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions_1 = Dense(4, name='predictions_1')(x)
# here the predictions_1 just corresponds to your next layer's input
y = Dense(5)(predictions_1)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
# you specify here that you have 2 outputs
model = Model(input=inputs, output=[predictions_1, predictions_2])
For the .fit and .predict, you can find a lot of details in https://keras.io/getting-started/functional-api-guide/, section: Multi-input and multi-output models.
(b) you are only interested in predictions_2. In this case, you can just do:
input_shape = [3, 20]
inputs = Input(shape=input_shape)
x = Flatten()(inputs)
predictions_1 = Dense(4, name='predictions_1')(x)
# here the predictions_1 just corresponds to your next layer's input
y = Dense(5)(predictions_1)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
# you specify here that your only output is predictions_2
model = Model(input=inputs, output=predictions_2)
Now as regards inception_v3. You can define by yourself the architecture and modify the deep layers inside according to your needs (giving to these layers specific names in order to avoid keras naming them automatically).
After that, compile your model and loads weights (as in https://keras.io/models/about-keras-models/ see function load_weights(..., by_name=True))
# you can load weights for only the part that corresponds to the true
# inception_v3 architecture. The other part will be initialized
# randomly
model.load_weights("inception_v3.hdf5", by_name=True)
This should solve your problem. By the way, you can find extra information here: https://www.gradientzoo.com. The doc. explains several saving / loading / fine-tuning routines ;)
Update: if you do not want to redefine your model from scratch you can do the following:
input_shape = [3, 20]
# define model1 and model2 as you want
inputs1 = Input(shape=input_shape)
x = Flatten()(inputs1)
predictions_1 = Dense(4, name='predictions_1')(x)
model1 = Model(input=inputs1, output=predictions_1)
inputs2 = Input(shape=(4,))
y = Dense(5)(inputs2)
y = Dense(2)(y)
predictions_2 = Dense(29, name='predictions_2')(y)
model2 = Model(input=inputs2, output=predictions_2)
# then define functions returning the image of an input through model1 or model2
def give_model1():
def f(x):
return model1(x)
return f
def give_model2():
def g(x):
return model2(x)
return g
# now you can create a global model as follows:
inputs = Input(shape=input_shape)
x = model1(inputs)
predictions = model2(x)
model = Model(input=inputs, output=predictions)
Drawing from filitchp's answer above, assuming the output dimensions of model1 match the input dimensions of model2, this worked for me:
model12 = Model(inputs=inputs, outputs=model2(model1.output))