My model has two inputs, and I want to calculate the loss of the two inputs separately because the loss of input 2 has to be multiplied by a weight. Then I add up these two losses as the final loss for the model. The structure is roughly like this:
This is my model:
def final_loss(y_true, y_pred):
    loss = x_loss_value.output + y_model.output * weight
    return loss

def mymodel(input_shape):  # pooling = max or avg
    img_input1 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))
    img_input2 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))

    # for input1
    x = Conv2D(32, (3, 3), strides=(2, 2))(img_input1)
    x_dense = Dense(2, activation='softmax', name='predictions1')(x)
    x_loss_value = my_categorical_crossentropy_layer(x)[input1_y_true, input1_y_pred]
    x_model = Model(inputs=img_input1, outputs=x_loss_value)

    # for input2
    y = Conv2D(32, (3, 3), strides=(2, 2))(img_input2)
    y_dense = Dense(2, activation='softmax', name='predictions2')(y)
    y_loss_value = my_categorical_crossentropy_layer(y)[input2_y_true, input2_y_pred]
    y_model = Model(inputs=img_input2, outputs=y_loss_value)

    concat = concatenate([x_model.output, y_model.output])
    final_dense = Dense(2, activation='softmax')(concat)

    # Create model.
    model = Model(inputs=[img_input1, img_input2], outputs=final_dense)
    return model

model.compile(optimizer=optimizers.Adam(lr=1e-7), loss=final_loss, metrics=['accuracy'])
Most of the related solutions I found just customize the final loss and change the loss in Model.compile(loss=customize_loss).
However, I need to apply different losses to different inputs. I'm trying to use a customized layer like this, and get my loss value for the final loss calculation:
class my_categorical_crossentropy_layer1(Layer):

    def __init__(self, **kwargs):
        self.is_placeholder = True
        super(my_categorical_crossentropy_layer1, self).__init__(**kwargs)

    def my_categorical_crossentropy_loss(self, y_true, y_pred):
        y_pred = K.constant(y_pred) if not K.is_tensor(y_pred) else y_pred
        y_true = K.cast(y_true, y_pred.dtype)
        return K.categorical_crossentropy(y_true, y_pred, from_logits=from_logits)

    def call(self, y_true, y_pred):
        loss = self.my_categorical_crossentropy_loss(y_true, y_pred)
        self.add_loss(loss, inputs=(y_true, y_pred))
        return loss
But inside the Keras model I can't figure out how to get the y_true and y_pred of the current epoch/batch for my loss layer.
So I can't add x = my_categorical_crossentropy_layer()[y_true, y_pred] to my model.
Is there any way to do this kind of calculation inside a Keras model?
Further, can Keras access the previous epoch's training loss or validation loss during the training process?
I want to use the previous epoch's training loss as the weight in my final loss.
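Roughly, what I imagine is something like the sketch below (only an illustration of the idea, not working code from my model; prev_loss_weight and PrevLossCallback are hypothetical names):
from tensorflow import keras
import tensorflow.keras.backend as K

# hypothetical: a non-trainable variable holding the previous epoch's training loss
prev_loss_weight = K.variable(1.0)

def weighted_loss(y_true, y_pred):
    # categorical crossentropy scaled by the weight taken from the previous epoch
    return prev_loss_weight * keras.losses.categorical_crossentropy(y_true, y_pred)

class PrevLossCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # store this epoch's training loss so the next epoch uses it as the weight
        K.set_value(prev_loss_weight, logs.get('loss', 1.0))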
This is my proposal...
Yours is a double binary classification problem that you want to carry out with a single fit. The first thing to notice is dimensionality: your input is 4D while your target is 2D one-hot encoded, so your network needs something to reduce dimensionality, for example Flatten or global pooling. After that, you can fit a single model with two inputs, two outputs, and two losses. In your case the losses are weighted categorical_crossentropy. Keras lets you set the loss weights via the loss_weights parameter of compile: to reproduce the formula loss1*1 + loss2*W, set the weights to [1, W]. You can also specify a different loss for each output with loss=[loss1, loss2, ...]; the individual losses are combined linearly using the weights given in loss_weights.
below a working example
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, GlobalMaxPool2D, Dense
from tensorflow.keras.models import Model

input_shape = (28, 28, 3)
n_sample = 10

# create dummy data
X1 = np.random.uniform(0, 1, (n_sample,) + input_shape)  # 4d
X2 = np.random.uniform(0, 1, (n_sample,) + input_shape)  # 4d
y1 = tf.keras.utils.to_categorical(np.random.randint(0, 2, n_sample))  # 2d
y2 = tf.keras.utils.to_categorical(np.random.randint(0, 2, n_sample))  # 2d

def mymodel(input_shape, weight):
    img_input1 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))
    img_input2 = Input(shape=(input_shape[0], input_shape[1], input_shape[2], ))

    # for input1
    x = Conv2D(32, (3, 3), strides=(2, 2))(img_input1)
    x = GlobalMaxPool2D()(x)  # pass from 4d to 2d
    x = Dense(2, activation='softmax', name='predictions1')(x)

    # for input2
    y = Conv2D(32, (3, 3), strides=(2, 2))(img_input2)
    y = GlobalMaxPool2D()(y)  # pass from 4d to 2d
    y = Dense(2, activation='softmax', name='predictions2')(y)

    # Create model
    model = Model([img_input1, img_input2], [x, y])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'],
                  loss_weights=[1, weight])
    return model

weight = 0.3
model = mymodel(input_shape, weight)
model.summary()
model.fit([X1, X2], [y1, y2], epochs=2)
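As a quick sanity check of the weighting (a small usage sketch on top of the example above), the total loss reported by evaluate should be approximately predictions1_loss + weight * predictions2_loss:
results = dict(zip(model.metrics_names, model.evaluate([X1, X2], [y1, y2], verbose=0)))
print(results)  # 'loss' ~= results['predictions1_loss'] + weight * results['predictions2_loss']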
I am trying to develop an autoencoder that also uses the labels of the greyscale images I am trying to reconstruct. For that, I define a custom loss function.
However, when I try to run my code, I get this error:
TypeError: Keras symbolic inputs/outputs do not implement __len__.
You may be trying to pass Keras symbolic inputs/outputs to a TF API
that does not register dispatching, preventing Keras from
automatically converting the API call to a lambda layer in the
Functional Model. This error will also get raised if you try asserting
a symbolic input/output directly.
I confirmed that it's being produced at return total_loss (see below).
I already tried the solutions provided here (trying to disable eager execution) and here (changing math operations to tf.math versions + disabling eager execution). Similar questions provide more or less the same answers or use additional libraries which I don't use. However, none of the solutions work for me.
Here is the code I'm working with:
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow import keras
def joint_loss(imgs_true, imgs_pred, y_true, y_pred, reconstruction_weight,
               classification_weight):
    # imgs_true = original images (= keras.Input)
    # imgs_pred = reconstructed images of my autoencoder
    # y_true = true labels of my data (= keras.Input)
    # y_pred = predicted labels from my bottleneck layer
    # reconstruction_weight/classification_weight = explanation below in "hyperparameters"
    reconstruction_loss = tf.reduce_mean(tf.square(imgs_true - imgs_pred))
    classification_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(y_true, y_pred))
    total_loss = (tf.math.scalar_mul(reconstruction_weight, reconstruction_loss) +
                  tf.math.scalar_mul(classification_weight, classification_loss))
    return total_loss
# define function of the autoencoder that's to be optimized; give back validation loss
def create_and_train_autoencoder(encoding_dim, f1, f2, f3, f4, f5, k1, k2,
                                 num_epochs, bat_size, learning_rate,
                                 recon_weight, class_weight):
    # explanation for parameters in "hyperparameters" below
    # input for images and labels
    input_img = keras.Input(shape=(64, 128, 1))
    input_label = keras.Input(shape=(1,))
    # Encoding layers
    x = keras.layers.Conv2D(f1, (k1, k2), activation='relu', padding='same')(input_img)
    x = keras.layers.MaxPooling2D((2, 2), padding='same')(x)
    x = keras.layers.Conv2D(f2, (k1, k2), activation='relu', padding='same')(x)
    x = keras.layers.MaxPooling2D((2, 2), padding='same')(x)
    x = keras.layers.Conv2D(f3, (k1, k2), activation='relu', padding='same')(x)
    x = keras.layers.MaxPooling2D((2, 2), padding='same')(x)
    # Bottleneck
    encoded = keras.layers.Conv2D(encoding_dim, (k1, k2), activation='relu',
                                  padding='same')(x)
    # Define the classification branch
    encoded_flattened = keras.layers.Flatten()(encoded)
    encoded_flattened_dense1 = keras.layers.Dense(f4, activation='relu')(encoded_flattened)
    encoded_flattened_dense2 = keras.layers.Dense(f5, activation='relu')(encoded_flattened_dense1)
    label_output = keras.layers.Dense(1, activation='sigmoid')(encoded_flattened_dense2)
    # Decoding layers
    x = keras.layers.UpSampling2D((2, 2))(encoded)
    x = keras.layers.Conv2D(f3, (k1, k2), activation='relu', padding='same')(x)
    x = keras.layers.UpSampling2D((2, 2))(x)
    x = keras.layers.Conv2D(f2, (k1, k2), activation='relu', padding='same')(x)
    x = keras.layers.UpSampling2D((2, 2))(x)
    x = keras.layers.Conv2D(f1, (k1, k2), activation='relu', padding='same')(x)
    decoded = keras.layers.Conv2D(1, (k1, k2), activation='sigmoid', padding='same')(x)
    # create model and compile
    autoencoder = keras.Model([input_img, input_label], [decoded, label_output])
    autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                        loss=joint_loss(input_img, decoded, input_label, label_output,
                                        recon_weight, class_weight))
    # Create early stopping callback
    early_stopping = EarlyStopping(monitor='val_loss', patience=15,
                                   restore_best_weights=True)
    # fit
    history = autoencoder.fit([x_train, y_train], [x_train, y_train],
                              epochs=num_epochs,
                              batch_size=bat_size,
                              shuffle=True,
                              validation_data=([x_test, y_test], [x_test, y_test]),
                              callbacks=[early_stopping])
    return history.history['val_loss'][-1]

# execute the function
best_autoencoder = create_and_train_autoencoder(
    encoding_dim, f1, f2, f3, f4, f5, k1, k2, num_epochs, bat_size,
    learning_rate, reconstr_weight, class_weight)
A little bit of background if needed:
I have 51 (64,128) greyscale images in a (51,64,128) array as my x_train. I have 13 (64,128) greyscale images in a (13,64,128) array as my x_test. I have 51 (1,) labels in a (51,) array as my y_train. I have 13 (1,) labels in a (13,) array as my y_test. My labels are [0,1].
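In case the exact shapes matter, this is roughly how I make the arrays match the keras.Input shapes above (a sketch only; the dtype choices are mine):
import numpy as np

# add the channel axis expected by Conv2D: (51, 64, 128) -> (51, 64, 128, 1)
x_train = np.expand_dims(x_train, axis=-1).astype('float32')
x_test = np.expand_dims(x_test, axis=-1).astype('float32')
# labels as float32 column vectors for the sigmoid / cross-entropy head: (51,) -> (51, 1)
y_train = np.asarray(y_train, dtype='float32').reshape(-1, 1)
y_test = np.asarray(y_test, dtype='float32').reshape(-1, 1)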
For the sake of this example I'm using the following hyperparameters:
encoding_dim=14 # number filters for reduced dimension
f1 = 16 # number filters
f2=f3 = 8 # number filters
f4=f5 = 64 # number filters
k1=k2 = 3 # kernel size
num_epochs = 25 # number of epochs
bat_size = 32 # batch_size
learning_rate = 0.0001 # learning rate
reconstr_weight = 0.5 # parameter for manipulating joint loss
class_weight = 0.5 # parameter for manipulating joint loss
I'm using TensorFlow 2.7. I'm not using CUDA.
Any help is much appreciated!
In the function 'joint_loss' you are assigning a value to the 'total_loss' variable, but returning the function itself (which doesn't make any sense). I guess instead of:
return joint_loss
Your intention was:
return total_loss
It seems like model.add_loss() solves the problem. But can anyone explain to me why? Is it only because now my custom loss is a tensor instead of a function? And why wouldn't it work with my joint_loss being a function?
The full code at the according place would look like this:
# create model and compile
autoencoder = keras.Model([input_img, input_label], [decoded, label_output])
autoencoder.add_loss(joint_loss(input_img, decoded, input_label, label_output, recon_weight, class_weight))
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate))
The rest remains the same.
There are several problems with your loss function. What I mean is that the loss driven by model.fit() only receives y_true and y_pred for each batch, so a loss built directly from the model's symbolic inputs and intermediate outputs cannot access those local tensors of the TensorFlow graph at training time. The thing you are doing here, autoencoder.add_loss(joint_loss(input_img, decoded, input_label, label_output, recon_weight, class_weight)), is also not a good way to pass your loss function. The difference between model.compile(loss) and model.add_loss() is this: the model.compile(loss) method is used to specify the primary loss function for the model during training. This is the typical supervised-learning setup, where you have a set of inputs and corresponding labels, and you want to minimize the difference between the predicted outputs of the model and the true labels.
On the other hand, the model.add_loss() method is used to add additional loss terms to the model that are not directly tied to the supervised learning targets. This is useful in situations where you want to add regularization terms, such as L1 or L2 penalties, to the loss function. These additional loss terms are added on top of the primary loss specified in compile and are used to penalize certain model parameters in order to reduce overfitting.
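To make the distinction concrete, here is a minimal, self-contained sketch (a toy model, not the autoencoder above) showing the two mechanisms side by side:
import tensorflow as tf
from tensorflow import keras

inp = keras.Input(shape=(4,))
out = keras.layers.Dense(1)(inp)
model = keras.Model(inp, out)

# add_loss(...): a tensor built from symbolic layer outputs; it needs no labels and is
# simply added to whatever loss the model already has (regularization-style penalties)
model.add_loss(1e-4 * tf.reduce_sum(tf.square(out)))

# compile(loss=...): a callable of (y_true, y_pred), evaluated per batch against the targets
model.compile(optimizer='adam', loss=keras.losses.MeanSquaredError())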
Well, I have changed your code and used tf.GradientTape() to compute the loss, because in your case we cannot pass the input image through the model.compile() method on each iteration.
x_train = tf.random.normal((51,64,128,1))
x_test = tf.random.normal((13,64,128,1))
y_train = tf.random.uniform((51,1), 0,2, dtype=tf.int32)
y_test = tf.random.uniform((13,1), 0,2, dtype=tf.int32)
train_examples = tf.data.Dataset.from_tensor_slices((x_train,y_train))
test_examples = tf.data.Dataset.from_tensor_slices((x_test,y_test))
train_examples = train_examples.batch(1)
test_examples = test_examples.batch(1)
encoding_dim=14 # number filters for reduced dimension
f1 = 16 # number filters
f2=f3 = 8 # number filters
f4=f5 = 64 # number filters
k1=k2 = 3 # kernel size
num_epochs = 25 # number of epochs
bat_size = 32 # batch_size
learning_rate = 0.0001 # learning rate
reconstr_weight = 0.5 # parameter for manipulating joint loss
class_weight = 0.5 # parameter for manipulating joint loss
def joint_loss(imgs_true, imgs_pred, y_true, y_pred, reconstruction_weight, classification_weight):
    y_true = tf.cast(y_true, dtype=tf.float32)
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    reconstruction_loss = tf.reduce_mean(tf.square(imgs_true - imgs_pred))
    classification_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(y_true, y_pred))
    total_loss = (tf.math.scalar_mul(reconstruction_weight, reconstruction_loss) +
                  tf.math.scalar_mul(classification_weight, classification_loss))
    return total_loss
# define function of the autoencoder that's to be optimized; give back validation loss
tf.keras.backend.clear_session()
class AutoEncoder(tf.keras.Model):
    def __init__(self, encoding_dim, f1, f2, f3, f4, f5, k1, k2):
        super().__init__()
        self.conv2d_1 = keras.layers.Conv2D(f1, (k1, k2), activation='relu', padding='same')
        self.maxpool_1 = keras.layers.MaxPooling2D((2, 2), padding='same')
        self.conv2d_2 = keras.layers.Conv2D(f2, (k1, k2), activation='relu', padding='same')
        self.maxpool_2 = keras.layers.MaxPooling2D((2, 2), padding='same')
        self.conv2d_3 = keras.layers.Conv2D(f3, (k1, k2), activation='relu', padding='same')
        self.maxpool_3 = keras.layers.MaxPooling2D((2, 2), padding='same')
        self.bottelneck = keras.layers.Conv2D(encoding_dim, (k1, k2), activation='relu', padding='same')
        self.encoded_flattened = keras.layers.Flatten()
        self.encoded_flattened_dense1 = keras.layers.Dense(f4, activation='relu')
        self.encoded_flattened_dense2 = keras.layers.Dense(f5, activation='relu')
        self.label_output = keras.layers.Dense(1, activation='sigmoid')
        self.upsample_1 = keras.layers.UpSampling2D((2, 2))
        self.conv2d_4 = keras.layers.Conv2D(f3, (k1, k2), activation='relu', padding='same')
        self.upsample_2 = keras.layers.UpSampling2D((2, 2))
        self.conv2d_5 = keras.layers.Conv2D(f2, (k1, k2), activation='relu', padding='same')
        self.upsample_3 = keras.layers.UpSampling2D((2, 2))
        self.conv2d_6 = keras.layers.Conv2D(f1, (k1, k2), activation='relu', padding='same')
        self.decoded = keras.layers.Conv2D(1, (k1, k2), activation='sigmoid', padding='same')

    # explanation for parameters in "hyperparameters" below
    def call(self, input_img, input_label):
        # Encoding layers
        x = self.conv2d_1(input_img)
        x = self.maxpool_1(x)
        x = self.conv2d_2(x)
        x = self.maxpool_2(x)
        x = self.conv2d_3(x)
        x = self.maxpool_3(x)
        # Bottleneck
        encoded = self.bottelneck(x)
        # Define the classification branch
        encoded_flattened = self.encoded_flattened(encoded)
        encoded_flattened_dense1 = self.encoded_flattened_dense1(encoded_flattened)
        encoded_flattened_dense2 = self.encoded_flattened_dense2(encoded_flattened_dense1)
        label_output = self.label_output(encoded_flattened_dense2)
        # Decoding layers
        x = self.upsample_1(encoded)
        x = self.conv2d_4(x)
        x = self.upsample_2(x)
        x = self.conv2d_5(x)
        x = self.upsample_3(x)
        x = self.conv2d_6(x)
        decoded = self.decoded(x)
        return [decoded, label_output]

autoencoder = AutoEncoder(encoding_dim, f1, f2, f3, f4, f5, k1, k2)
autoencoder(x_train[0:1, :, :], y_train[:1, :])
Now, to train the model, we have to use a custom training loop with tf.GradientTape.
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
# Metric used to store the average loss per epoch
train_loss = tf.keras.metrics.Mean(name='train_loss')
@tf.function
def train_step(inp, tar_inp):
    with tf.GradientTape() as tape:
        decoded, label_output = autoencoder(inp, tar_inp)
        loss = joint_loss(inp, decoded, tar_inp, label_output, reconstr_weight, class_weight)
    gradients = tape.gradient(loss, autoencoder.trainable_variables)  # tape.gradient computes the gradients
    optimizer.apply_gradients(zip(gradients, autoencoder.trainable_variables))  # apply the gradient update
    train_loss(loss)  # keep adding the loss
import time

for epoch in range(20):
    start = time.time()
    train_loss.reset_states()
    for (batch, [inp, tar]) in enumerate(train_examples):
        train_step(inp, tar)
        if batch % 50 == 0:
            print(f'Epoch {epoch + 1} Batch {batch} Loss {train_loss.result():.4f}')
    print(f'Epoch {epoch + 1} Loss {train_loss.result():.4f}')
    print(f'Time taken for 1 epoch: {time.time() - start:.2f} secs\n')
You can add the test loss yourself.
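For example, a test loop could mirror train_step without the gradient tape (a sketch reusing the joint_loss and test_examples defined above):
test_loss = tf.keras.metrics.Mean(name='test_loss')

@tf.function
def test_step(inp, tar_inp):
    decoded, label_output = autoencoder(inp, tar_inp)
    loss = joint_loss(inp, decoded, tar_inp, label_output, reconstr_weight, class_weight)
    test_loss(loss)  # keep adding the loss

test_loss.reset_states()
for inp, tar in test_examples:
    test_step(inp, tar)
print(f'Test Loss {test_loss.result():.4f}')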
I've been trying to train an audio classification model. When I used SGD with learning_rate=0.01, momentum=0.0 and nesterov=False, I got the following loss and accuracy graphs:
I can't figure out what causes the instant decrease in loss at around epoch 750. I tried different learning rates, momentum values and their combinations, different batch sizes, initial layer weights, etc. to get a more reasonable graph, but no luck at all. So if you have any idea what causes this, please let me know.
The code I used for this training is below:
# MFCCs Model
x = tf.keras.layers.Dense(units=512, activation="sigmoid")(mfcc_inputs)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(units=256, activation="sigmoid")(x)
x = tf.keras.layers.Dropout(0.5)(x)
# Spectrograms Model
y = tf.keras.layers.Conv2D(32, kernel_size=(3,3), strides=(2,2))(spec_inputs)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Conv2D(64, kernel_size=(3,3), strides=(1,1), padding="same")(y)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Conv2D(64, kernel_size=(3,3), strides=(1,1), padding="same")(y)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Flatten()(y)
y = tf.keras.layers.Dense(units=256, activation="sigmoid")(y)
y = tf.keras.layers.Dropout(0.5)(y)
# Chroma Model
t = tf.keras.layers.Dense(units=512, activation="sigmoid")(chroma_inputs)
t = tf.keras.layers.Dropout(0.5)(t)
t = tf.keras.layers.Dense(units=256, activation="sigmoid")(t)
t = tf.keras.layers.Dropout(0.5)(t)
# Merge Models
concated = tf.keras.layers.concatenate([x, y, t])
# Dense and Output Layers
z = tf.keras.layers.Dense(64, activation="sigmoid")(concated)
z = tf.keras.layers.Dropout(0.5)(z)
z = tf.keras.layers.Dense(64, activation="sigmoid")(z)
z = tf.keras.layers.Dropout(0.5)(z)
z = tf.keras.layers.Dense(1, activation="sigmoid")(z)
mdl = tf.keras.Model(inputs=[mfcc_inputs, spec_inputs, chroma_inputs], outputs=z)
mdl.compile(optimizer=SGD(), loss="binary_crossentropy", metrics=["accuracy"])
mdl.fit([M_train, X_train, C_train], y_train, batch_size=8, epochs=1000, validation_data=([M_val, X_val, C_val], y_val), callbacks=[tensorboard_cb])
I'm not too sure myself, but as Frightera said, sigmoid activations in hidden layers can cause trouble since they are more sensitive to weight initialization, and if the weights aren't set well, the gradients can become very small. Perhaps the model eventually deals with the small sigmoid gradients and the loss finally decreases around epoch 750, but that is just my hypothesis. If ReLU doesn't work, try using LeakyReLU, since it doesn't have the dead-neuron effect that ReLU does.
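To illustrate the suggestion (a sketch only, shown for the first MFCC block; the same pattern would apply to the other hidden layers, keeping the final sigmoid output for binary_crossentropy):
# instead of: x = tf.keras.layers.Dense(units=512, activation="sigmoid")(mfcc_inputs)
x = tf.keras.layers.Dense(units=512)(mfcc_inputs)
x = tf.keras.layers.LeakyReLU(alpha=0.1)(x)  # alpha=0.1 is just a typical starting value, not tuned here
x = tf.keras.layers.Dropout(0.5)(x)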
I am building a key-point detection system of the human face. The goal is to have an image of the face be input into the model, and the model then detects anatomical landmarks in the image (eyes, nose) and outputs the pixel coordinates of the landmarks that are visible. There are three targets per landmark: x, y, visible. X and Y are the pixel coordinates, and visible is whether the landmark is in the image or not. The plan is to first have a binary cross entropy loss between predicted visibility and true visibility. Then, the second loss is a regression loss (I'm using MAPE) between the x,y coordinates and the targets. However, the regression loss would only be calculated for landmarks that are visible. The loss would look something like:
# Pseudo-code
def loss(y_true, y_pred):
    if y_true[2] == 1:
        # Probability that landmark is in image
        # Compute binary cross entropy loss
        # Compute MAPE regression loss
        Total_loss = Binary_loss + MAPE_loss
        return Total_loss
    else:
        Total_loss = Binary_loss
        return Total_loss
Once the loss function is written, how would I go about implementing it in code? I know how to create models for each problem (checking the coordinates, and separately checking the visibility), but I'm not sure exactly how to go about combining the two heads with the conditional loss function. How would I combine the layers (Conv, Flatten, Dense for each head) to get the desired output? Thank you!
EDIT:
I'm not able to upload the data, but here is an image of it. The first 9 columns are the coordinates and visibility of the landmarks. The last column is the corresponding image, which has been flattened. When I load the data for training, these are the steps I take:
###Read in data file
file = "Directory/file.csv"
train_data = pd.read_csv(file)
###Convert each coordinate column to type float64
train_data['xreye'] = train_data['xreye'].astype(np.float64)
...
###Convert image column to string type
train_data['Image'] = train_data['Image'].astype(str)
#Image is feature, other values are labels to predict later
#Image column values are strings, also some missing values, have to split
##string by space and append it and handle missing values
imag = []
for i in range(len(train_data)):
    img = train_data['Image'][i].split(' ')
    img = ['0' if x == '' else x for x in img]
    imag.append(img)

#Reshape and convert to float value
image_list = np.array(imag, dtype='uint8')
X_train = image_list.reshape(-1, 256, 256, 1)
####Get pixel coordinates and visibility targets
training = train_data[['xreye','yreye','reyev','xleye','yleye','leyev','xtsept','ytsept','tseptv']]
y_train = []
for i in range(len(train_data)):
    y = training.iloc[i, :]
    y_train.append(y)
y_train = np.array(y_train, dtype='float')
EDIT: Model code, loss function, and fit method.
###Loss function
visuals_mask = [False, False, True] * 3

def loss_func(y_true, y_pred):
    visuals_true = tf.boolean_mask(y_true, visuals_mask, axis=1)
    visuals_pred = tf.boolean_mask(y_pred, visuals_mask, axis=1)
    visuals_loss = tf.keras.losses.BinaryCrossentropy(visuals_true, visuals_pred)
    visuals_loss = tf.reduce_mean(visuals_loss)
    coords_true = tf.boolean_mask(y_true, ~np.array(visuals_mask), axis=1)
    coords_pred = tf.boolean_mask(y_pred, ~np.array(visuals_mask), axis=1)
    coords_loss = tf.keras.losses.MeanAbsolutePercentageError(coords_true, coords_pred)
    coords_loss = tf.reduce_mean(coords_loss)
    return coords_loss + visuals_loss
####Model code
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', padding='same', use_bias=False, input_shape=(256,256,1)))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3), activation='relu', padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(128, (3,3), activation='relu', padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(9, activation='linear'))
model.summary()
model.compile(optimizer='adam', loss=loss_func)
###Model fit
checkpointer = ModelCheckpoint('C:/Users/Cloud/.spyder-py3/x_y_shift/weights/vis_coords_TEST.hdf5', monitor='val_loss', verbose=1, mode = 'min', save_best_only=True)
out = model.fit(X_train,y_train,epochs=5,batch_size=4,validation_split=0.1, verbose=1, callbacks=[checkpointer])
I can't be sure because I don't have the data to reproduce the problem, but these are the steps in my head:
Use boolean masking to get indices 2, 5 and 8 from the output:
visuals_mask_ = [False, False, True] * 3
# in the loss function
visuals_true = tf.boolean_mask(y_true, visuals_mask_, axis=-1) # do the same with preds
Compute the loss for the visuals
visuals_loss = binary_crossentropy(visuals_true, visuals_pred) # use sparse if that's the case
Get the coordinates' outputs just like we did for the visuals, but with the reversed visuals_mask. I believe tf.boolean_mask(y_true, tf.math.logical_not(visuals_mask_), axis=-1) should work.
Compute MAPE for the rest (coords_true and coords_pred)
Get the means for both losses by tf.reduce_mean
Get sum of losses and return it
I hope these will provide some insight.
Edit:
I tried the following and it seems to be working:
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import binary_crossentropy, mean_absolute_percentage_error

y_true = tf.convert_to_tensor(np.random.rand(32, 9))
y_pred = tf.convert_to_tensor(np.random.rand(32, 9))

visuals_mask = [False, False, True] * 3

def loss_func(y_true, y_pred):
    visuals_true = tf.boolean_mask(y_true, visuals_mask, axis=1)
    visuals_pred = tf.boolean_mask(y_pred, visuals_mask, axis=1)
    visuals_loss = binary_crossentropy(visuals_true, visuals_pred)
    visuals_loss = tf.reduce_mean(visuals_loss)

    coords_true = tf.boolean_mask(y_true, ~np.array(visuals_mask), axis=1)
    coords_pred = tf.boolean_mask(y_pred, ~np.array(visuals_mask), axis=1)
    coords_loss = mean_absolute_percentage_error(coords_true, coords_pred)
    coords_loss = tf.reduce_mean(coords_loss)

    return coords_loss + visuals_loss

loss_func(y_true, y_pred)
What I assumed here is:
Your output actually has a length of 9 ((batch_size, 9)).
Custom loss calculations may differ in this demonstration and actual training because of eager execution.
Edit 2:
I tried it with this kind of model and it seems to work:
model = Sequential()
model.add(Conv2D(4, 10, data_format='channels_last', input_shape=(256, 256, 1)))
model.add(Flatten())
model.add(Dense(9, activation='sigmoid'))
model.compile('adam', loss=loss_func)
Suppose I have a training data set X, which is a 6000×51 matrix. The problem is a multi-output classification problem: the target matrix Y is 6000×10 and each column of the target matrix takes the value 0 or 1. I can define parameters based on the features as follows:
n = 10
p = X[:, :n]
a = X[:, n:2*n]
c = X[:, 2*n]
Suppose the prediction of my model is Prediction. I want to define a loss function as follows:
(-np.einsum('ij,ij ->i', p, y_test).mean()
 + 10 * np.mean(np.maximum(np.einsum('ij,ij ->i', a, Prediction) - c, 0)))
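Written out (with \hat{y} denoting Prediction, and the first term using y_test exactly as in the snippet above), the intended loss is:
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} p_{ij}\, y^{\text{test}}_{ij} \;+\; 10 \cdot \frac{1}{N}\sum_{i=1}^{N} \max\!\Big(\sum_{j=1}^{n} a_{ij}\,\hat{y}_{ij} - c_i,\; 0\Big)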
def knapsack_loss(X, n, cvc=1):
    input_a = X[:, n:2*n]
    input_a = np.float64(deepcopy(input_a))
    input_p = X[:, :n]
    input_p = np.float64(deepcopy(input_p))
    input_c = X[:, 2*n]
    input_c = np.float64(deepcopy(input_c))

    def loss(y_true, y_pred):
        picks = y_pred
        return (-1 * K.batch_dot(picks, input_p, 1)) + cvc * K.maximum(
            K.batch_dot(picks, input_a, 1) - input_c, 0)

    return loss
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(100, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss=knapsack_loss(X_train, n, cvc=1), optimizer='adam')
    return model

n_inputs, n_outputs = X.shape[1], Y.shape[1]
model = get_model(n_inputs, n_outputs)
model.fit(X_train, y_train, verbose=0, epochs=500)
When I run this code, I face the following error:
InvalidArgumentError: Incompatible shapes: [32] vs. [6000]
[[{{node training_14/Adam/gradients/loss_22/dense_55_loss/loss/MatMul_1_grad/BroadcastGradientArgs}}]]
I would be thankful if someone could correct it or provide a synthetic example.
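A hedged sketch of what might resolve the shape mismatch (the [32] is the batch size, while the [6000] is the full X_train captured when the loss was built): in TF 2.x the per-sample p, a and c columns can be sliced from the current batch inside the model and attached with add_loss, so the loss never sees the full training matrix. This is only an illustration under that assumption, not a verified fix:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n = 10
cvc = 1.0

def get_model(n_inputs, n_outputs):
    x_in = keras.Input(shape=(n_inputs,), name="features")  # p, a, c live in these columns
    h = layers.Dense(100, kernel_initializer='he_uniform', activation='relu')(x_in)
    picks = layers.Dense(n_outputs, activation='sigmoid')(h)

    # slice the coefficients from the current batch instead of closing over the full X_train
    p = x_in[:, :n]
    a = x_in[:, n:2 * n]
    c = x_in[:, 2 * n]
    profit = tf.reduce_sum(picks * p, axis=1)
    penalty = tf.maximum(tf.reduce_sum(picks * a, axis=1) - c, 0.0)

    model = keras.Model(x_in, picks)
    model.add_loss(tf.reduce_mean(-profit + cvc * penalty))
    model.compile(optimizer='adam')
    return model

model = get_model(X_train.shape[1], Y.shape[1])
model.fit(X_train, epochs=500, verbose=0)  # the loss is built from the inputs, so no targets are needed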
I have the following model
def get_model():
    epochs = 100
    learning_rate = 0.1
    decay_rate = learning_rate / epochs

    inp = keras.Input(shape=(64, 101, 1), name="inputs")
    x = layers.Conv2D(128, kernel_size=(3, 3), strides=(3, 3), padding="same")(inp)
    x = layers.Conv2D(256, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(150)(x)
    x = layers.Dense(150)(x)
    out1 = layers.Dense(40000, name="sf_vec")(x)
    out2 = layers.Dense(128, name="ls_weights")(x)

    model = keras.Model(inp, [out1, out2], name="2_out_model")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=decay_rate),  # if necessary, set back to 0.001
                  loss="mean_squared_error")
    keras.utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
    model.summary()
    return model
That is, I want to train my neural network based on the "mix" of the loss from the first output and the loss from the second output.
I train my neural network in this way:
model.fit(x_train, [sf_train, ls_filters_train], epochs=10)
and during the training ,for example, this is shown:
Epoch 10/10 -> loss: 0.0702 - sf_vec_loss: 0.0666 - ls_weights_loss: 0.0035
I'd like to know whether it is just a coincidence that the "loss" is nearly the sum of sf_vec_loss and ls_weights_loss, or whether Keras actually computes it this way.
Also, is the network being trained on the "loss" only?
Thank you in advance :)
Following the TensorFlow documentation...
from the loss argument:
If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses.
Remember also that you can weight the loss contributions of the different model outputs.
from the loss_weights argument:
The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients.
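Applied to your model, that would look something like the sketch below (replacing the compile call inside get_model; the 0.5 is only a placeholder weight), so the reported loss becomes 1.0 * sf_vec_loss + 0.5 * ls_weights_loss:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=decay_rate),
              loss={"sf_vec": "mean_squared_error", "ls_weights": "mean_squared_error"},
              loss_weights={"sf_vec": 1.0, "ls_weights": 0.5})
Without loss_weights, the printed loss is simply the sum of the two per-output losses, which matches the 0.0666 + 0.0035 ≈ 0.0702 in your training log, and it is this combined loss that the gradients are computed from.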