I want to replace the loss function related to my neural network during training, this is the network:
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dropout(0.25))
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dropout(0.5))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
so now I want to change tensorflow.keras.losses.categorical_crossentropy with another, so I made this:
model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training
but I have this error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
Why? How can I fix it? There is another way to change loss function?
Thanks
I'm currently working on google colab with Tensorflow and Keras and i was not able to recompile a model mantaining the weights, every time i recompile a model like this:
with strategy.scope():
model = hd_unet_model(INPUT_SIZE)
model.compile(optimizer=Adam(lr=0.01),
loss=tf.keras.losses.MeanSquaredError() ,
metrics=[tf.keras.metrics.MeanSquaredError()])
the weights gets resetted.
so i found an other solution, all you need to do is:
Get the model with the weights you want ( load it or something else )
gets the weights of the model like this:
weights = model.get_weights()
recompile the model ( to change the loss function )
set again the weights of the recompiled model like this:
model.set_weights(weights)
launch the training
i tested this method and it seems to work.
so to change the loss mid-Training you can:
Compile with the first loss.
Train of the first loss.
Save the weights.
Recompile with the second loss.
Load the weights.
Train on the second loss.
So, a straightforward answer I would give is: switch to pytorch if you want to play this kind of games. Since in pytorch you define your training and evaluation functions, it takes just an if statement to switch from a loss function to another one.
Also, I see in your code that you want to switch from cross_entropy to mean_square_error, the former is suitable for classification the latter for regression, so this is not really something you can do, in the code that follows I switched from mean squared error to mean squared logarithmic error, which are both loss suitable for regression.
Despite other answers offers solutions to your question (see change-loss-function-dynamically-during-training) it is not clear wether you can trust or not the results. Some people found that even with a customised function sometimes Keras keep training with the first loss.
Solution:
My solution is based on train_on_batch, which allows us to train a model in a for loop and therefore stop training it whenever we prefer to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see:Does recompiling a model re-initialize the weights?).
The dataset can be found here Boston housing dataset
# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from keras.losses import mean_squared_error, mean_squared_logarithmic_error
from matplotlib import pyplot
import matplotlib.pyplot as plt
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
y = dataset[:,13]
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)
# create model
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
batch_size = 25
# have to define manually a dict to store all epochs scores
history = {}
history['history'] = {}
history['history']['loss'] = []
history['history']['mean_squared_error'] = []
history['history']['mean_squared_logarithmic_error'] = []
history['history']['val_loss'] = []
history['history']['val_mean_squared_error'] = []
history['history']['val_mean_squared_logarithmic_error'] = []
# first compiling with mse
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
# define number of iterations in training and test
train_iter = round(trainX.shape[0]/batch_size)
test_iter = round(testX.shape[0]/batch_size)
for epoch in range(2):
# train iterations
loss, mse, msle = 0, 0, 0
for i in range(train_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = trainX[start:end,]
batchy = trainy[start:end,]
loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
loss += loss_
mse += mse_
msle += msle_
history['history']['loss'].append(loss/train_iter)
history['history']['mean_squared_error'].append(mse/train_iter)
history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
# test iterations
val_loss, val_mse, val_msle = 0, 0, 0
for i in range(test_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = testX[start:end,]
batchy = testy[start:end,]
val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
val_loss += val_loss_
val_mse += val_mse_
val_msle += msle_
history['history']['val_loss'].append(val_loss/test_iter)
history['history']['val_mean_squared_error'].append(val_mse/test_iter)
history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# recompiling the model with new loss
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
for epoch in range(2):
# train iterations
loss, mse, msle = 0, 0, 0
for i in range(train_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = trainX[start:end,]
batchy = trainy[start:end,]
loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
loss += loss_
mse += mse_
msle += msle_
history['history']['loss'].append(loss/train_iter)
history['history']['mean_squared_error'].append(mse/train_iter)
history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
# test iterations
val_loss, val_mse, val_msle = 0, 0, 0
for i in range(test_iter):
start = i*batch_size
end = i*batch_size + batch_size
batchX = testX[start:end,]
batchy = testy[start:end,]
val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
val_loss += val_loss_
val_mse += val_mse_
val_msle += msle_
history['history']['val_loss'].append(val_loss/test_iter)
history['history']['val_mean_squared_error'].append(val_mse/test_iter)
history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# Some plots to check what is going on
# loss function
pyplot.subplot(311)
pyplot.title('Loss')
pyplot.plot(history['history']['loss'], label='train')
pyplot.plot(history['history']['val_loss'], label='test')
pyplot.legend()
# Only mean squared error
pyplot.subplot(312)
pyplot.title('Mean Squared Error')
pyplot.plot(history['history']['mean_squared_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_error'], label='test')
pyplot.legend()
# Only mean squared logarithmic error
pyplot.subplot(313)
pyplot.title('Mean Squared Logarithmic Error')
pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
pyplot.legend()
plt.tight_layout()
pyplot.show()
The resulting plot confirm that the loss function is changing after the second epoch:
The drop in the loss function is due to the fact that the model is switching from normal mean squared error to the logarithmic one, which has much lower values. Printing the scores also prove that the used loss truly changed:
print(history['history']['loss'])
[599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
print(history['history']['mean_squared_error'])
[599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
print(history['history']['mean_squared_logarithmic_error'])
[8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]
In the first two epochs the values of loss are equal to ones of mean_square_error and during the third and fourth epochs the values becomes equal to the ones of mean_square_logarithmic_error, which is the new loss that was set. So it seems that using train_on_batch allows to change loss function, nevertheless I want to stress out again that this is basically what one should do on pytoch to achieve the same results, with the difference that the behaviour of pytorch (in this scenario and in my opinion) is more reliable.
Related
I use Python 3.7 and Keras 2.2.4. I created a Keras model with two output layers:
self.df_model = Model(inputs=input, outputs=[out1,out2])
As the loss history only returns one loss value per epoch, I want to get the loss of each output layer. How is it possible to get two loss values per epoch, one for each output layer?
Each model in Keras has a default History callback which stores all the loss and metric values of all the epochs, both the aggregate values as well as per output layer. This callback creates a History object which is returned when fit model is called and you can access all of these values by using the history property of that object (it is actually a dictionary):
history = model.fit(...)
print(history.history) # <-- a dict which contains all the loss and metric values per epoch
A minimal reproducible example:
from keras import layers
from keras import Model
import numpy as np
inp = layers.Input((1,))
out1 = layers.Dense(2, name="output1")(inp)
out2 = layers.Dense(3, name="output2")(inp)
model = Model(inp, [out1, out2])
model.compile(loss='mse', optimizer='adam')
x = np.random.rand(2, 1)
y1 = np.random.rand(2, 2)
y2 = np.random.rand(2, 3)
history = model.fit(x, [y1,y2], epochs=5)
print(history.history)
#{'loss': [1.0881365537643433, 1.084699034690857, 1.081269383430481, 1.0781562328338623, 1.0747418403625488],
# 'output1_loss': [0.87154925, 0.8690172, 0.86648905, 0.8641926, 0.8616721],
# 'output2_loss': [0.21658726, 0.21568182, 0.2147803, 0.21396361, 0.2130697]}
I'm trying to make a simple neural network with Keras, but my weights won't update after calling fit()
To test the model, I created a simple data set, called mem. mem is a deque of tuples. mem[i][0] gives a np.array of size inp_len of only ones or only zeros.
Here is my code:
inp_len = 5*3 + 3187*4
model = Sequential()
model.add(Dense(units=124, kernel_initializer='ones', input_shape = (inp_len,)))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(48, kernel_initializer='ones'))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(48, kernel_initializer='ones'))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(48, kernel_initializer='ones'))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=learning_rate, decay=learning_rate_decay))
batch_size = 20
batch_old = random.sample(mem, min(len(mem), batch_size))
for i_batch in range(len(batch_old)):
X = batch_old[i_batch][0].reshape(1,inp_len)
y = np.array([[X[0]]])
model.fit(X, y, epochs = 1, batch_size = 1)
I use 1 epoch and with a batch size of 1, because I want to use model.predict() in another part of the code with a different batch size.
Can someone please explain why model.get_weights()[0] keeps returning ones after fitting the model?
I am currently trying to build a model to classify whether or not the outcome of a given football match will be above or below 2.5 goals, based on the Home team, Away team & game league, using a tf.keras.Sequential model in TensorFlow 2.0RC.
The problem I am encountering is that my softmax results converge on [0.5,0.5] when using the model.predict method. What makes this odd is that my validation & test accuracy and losses are about 0.94 & 0.12 respectively after 1000 epochs of training, otherwise I would have put this down to an overfitting problem. I am aware that 1000 epochs is extremely likely to overfit, however, I want to understand why my accuracy increases until about 800 epochs in. My loss flattens at about 300 epochs.
I have tried to alter the number of layers, number of units in each layer, the activation functions, optimizers and loss functions, number of epochs and learning rates, but can only seem to increase the losses.
The results still seem to converge toward [0.5,0.5] regardless.
The full code can be viewed at https://github.com/AhmUgEk/tensorflow_football_predictions, but below is an extract showing model composition.
# Create Keras Sequential model:
model = keras.Sequential()
model.add(feature_layer) # Input processing layer.
model.add(Dense(units=32, activation='relu')) # Hidden Layer 1.
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
model.add(Dense(units=32, activation='relu')) # Hidden Layer 2.
model.add(Dropout(rate=0.4))
model.add(BatchNormalization())
model.add(Dense(units=2, activation='softmax')) # Output layer.
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss=keras.losses.MeanSquaredLogarithmicError(),
metrics=['accuracy']
)
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.0001),
loss=keras.losses.MeanSquaredLogarithmicError(),
metrics=['accuracy']
)
# Fit the model to the training dataset and validate against the
validation dataset between epochs:
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=1000,
callbacks=[tensorboard_callback]
)
I would expect to receive a result of [0.282, 0.718] for example for an input of:
model.predict_classes([np.array(['E0'], dtype='object'),
np.array(['Liverpool'], dtype='object'),
np.array(['Newcastle'], dtype='object')])[0]
but as per the above, receive a result of say [0.5, 0.5].
Am I missing something obvious here?
I had made some minor changes in the model. Now, I am not getting exactly [0.5, 0.5].
Result:
[[0.61482537 0.3851746 ]
[0.5121426 0.48785746]
[0.48058605 0.51941395]
[0.48913187 0.51086813]
[0.45480043 0.5451996 ]
[0.48933673 0.5106633 ]
[0.43431875 0.5656812 ]
[0.55314165 0.4468583 ]
[0.5365097 0.4634903 ]
[0.54371756 0.45628244]]
Implementation:
import datetime
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from gpu_limiter import limit_gpu
from pipe_functions import csv_to_df, dataframe_to_dataset
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.layers import BatchNormalization, Dense, DenseFeatures, Dropout, Input
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
import tensorflow.keras.backend as K
from tensorflow.data import Dataset
# Test GPU availability and instantiate memory growth limitation if True:
if tf.test.is_gpu_available():
print('GPU Available\n')
limit_gpu()
else:
print('Running on CPU')
df = csv_to_df("./csv_files")
# Format & organise imported data, making the "Date" column the new index:
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Div', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG']].dropna().set_index('Date').sort_index()
df['Over_2.5'] = (df['FTHG'] + df['FTAG'] > 2.5).astype(int)
df = df.drop(['FTHG', 'FTAG'], axis=1)
# Split data into training, validation and testing data:
# Note: random_state variable set to ensure reproducibility.
train, test = train_test_split(df, test_size=0.05, random_state=42)
train, val = train_test_split(train, test_size=0.05, random_state=42)
# print(df['Over_2.5'].value_counts()) # Check that data is balanced.
# Create datasets from train, val & test dataframes:
target_col = 'Over_2.5'
batch_size = 32
def df_to_dataset(features: np.ndarray, labels: np.ndarray, shuffle=True, batch_size=8) -> Dataset:
ds = Dataset.from_tensor_slices(({"feature": features}, {"target": labels}))
if shuffle:
ds = ds.shuffle(buffer_size=len(features))
ds = ds.batch(batch_size)
return ds
def get_feature_transform() -> DenseFeatures:
# Format features into feature columns to ensure data is in the correct format for feeding into the model:
feature_cols = []
for column in filter(lambda x: x != target_col, df.columns):
feature_cols.append(tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(
key=column, vocabulary_list=df[column].unique()), dimension=5))
return DenseFeatures(feature_cols)
# Transforms all features into dense tensors.
feature_transform = get_feature_transform()
train_features = feature_transform(dict(train)).numpy()
val_features = feature_transform(dict(val)).numpy()
test_features = feature_transform(dict(test)).numpy()
train_dataset = df_to_dataset(train_features, train[target_col].values, shuffle=True, batch_size=batch_size)
val_dataset = df_to_dataset(val_features, val[target_col].values, shuffle=True, batch_size=batch_size) # Shuffle not required to validation data.
test_dataset = df_to_dataset(test_features, test[target_col].values, shuffle=True, batch_size=batch_size) # Shuffle not required to test data.
# Create Keras Functional API:
# Create a feature layer from the feature columns, to be placed at the input layer of the model:
def build_model(input_shape: tuple) -> keras.Model:
input_layer = keras.Input(shape=input_shape, name='feature')
model = Dense(units=1028, activation='relu', kernel_initializer='normal', name='dense0')(input_layer) # Hidden Layer 1.
model = BatchNormalization(name='bc0')(model)
model = Dense(units=1028, activation='relu', kernel_initializer='normal', name='dense1')(model) # Hidden Layer 2.
model = Dropout(rate=0.1)(model)
model = BatchNormalization(name='bc1')(model)
model = Dense(units=100, activation='relu', kernel_initializer='normal', name='dense2')(model) # Hidden Layer 3.
model = Dropout(rate=0.25)(model)
model = BatchNormalization(name='bc2')(model)
model = Dense(units=50, activation='relu', kernel_initializer='normal', name='dense3')(model) # Hidden Layer 4.
model = Dropout(rate=0.4)(model)
model = BatchNormalization(name='bc3')(model)
output_layer = Dense(units=2, activation='softmax', kernel_initializer='normal', name='target')(model) # Output layer.
model = keras.Model(inputs=input_layer, outputs=output_layer, name='better-than-chance')
# Compile the model:
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.001),
loss='mse',
metrics=['accuracy']
)
return model
# # Create a TensorBoard log file (time appended) directory for every run of the model:
# directory = ".\\logs\\" + str(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# os.mkdir(directory)
# # Create a TensorBoard callback to log a record of model performance for every 1 epoch:
# tensorboard_callback = TensorBoard(log_dir=directory, histogram_freq=1, write_graph=True, write_images=True)
# Run "tensorboard --logdir .\logs" in anaconda prompt to review & compare logged results.
# Note: Make sure that the correct environment is activated before running.
model = build_model((train_features.shape[1],))
model.summary()
# checkpoint = ModelCheckpoint('model-{epoch:03d}.h5', verbose=1, monitor='val_loss',save_best_only=True, mode='auto')
# Fit the model to the training dataset and validate against the validation dataset between epochs:
model.fit(
train_dataset,
validation_data=val_dataset,
epochs=10)
# callbacks=[checkpoint]
# Saves and reloads model.
# model.save("./model.h5")
# model_from_saved = keras.models.load_model("./model.h5")
# Evaluate model accuracy against test dataset:
# scores, accuracy = model.evaluate(train_dataset)
# print('Accuracy:', accuracy)
##############
## OPTIONAL ##
##############
# DUBUGGING
# inp = model.input # input placeholder
# outputs = [layer.output for layer in model.layers] # all layer outputs
# functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# # Testing
# layer_outs = [func([test_features]) for func in functors]
# print(layer_outs)
# # # Form a prediction based on inputs:
prediction = model.predict({"feature": test_features[:10]})
print(prediction)
One thing you can do is to try some ensemble Learning methods like
RandomForest
and
XGBoost
and compare the results.
You should try is to add other Key Performance Indicators(KPI)s in
your data and then try to fit the model.
I have a Lstm model for sequence prediction,which is shown here:
def create_model(max_sequence_len, total_words):
input_len = max_sequence_len - 1
model = keras.models.Sequential()
model.add(layers.Embedding(total_words, 50, input_length=input_len))
model.add(layers.LSTM(50, input_shape=predictors[:1].shape))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(activation='softmax', units = total_words))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'], lr=0.01)
return model
model_sb = create_model(max_sequence_len, total_words)
history = model_sb.fit(X_train, y_train, epochs = 20 , shuffle = True, validation_split=0.3, )
and it works well but I want to take 2 output from my model who are the output with most probability in softmax dense layer.
for take them I can use this code:
predicted = model_sb.predict(test_sequence, verbose=1)
And then by this code find the first n high probability output:
y_sum = predicted.sum(axis=0)
ind = np.argpartition(y_sum, -n)[-n:]
ind[np.argsort(y_sum[ind])]
But I need to know the accuracy of my model if the output be one of these n output (with "or" condition)
Is there any package which help me?
I mean I don't want to evaluate my model with just one most probability output, I want to evaluate accuracy and loss by 2 high probability result.
This is called top-k accuracy, with k = 2 in your case. Keras already has an implementation of this accuracy:
from keras.metrics import top_k_categorical_accuracy
def my_acc(y_true, y_pred):
return top_k_categorical_accuracy(y_true, y_pred, k=2)
Then you pass this custom metric to your model:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[my_acc])
I am training a neural net model in Keras. I want to monitor the validation loss and stop the training when certain condition is attained.
I know I can use EarlyStopping to stop the training when there is no improvement in training for a given number of patience rounds.
I want to something different. I want to stop the training when the val_loss is going above a value say x after n rounds.
To make things clear, Let's say x in 0.5 and n is 50. I want to stop the model's training only if the epoch number is greater than 50 and val_loss is above 0.5.
How can I do this in Keras.?
You can define your own callback by inheriting from the Keras EarlyStopping callback and overriding it with your own logic:
from keras.callbacks import EarlyStopping # use as base class
class MyCallBack(EarlyStopping):
def __init__(self, threshold, min_epochs, **kwargs):
super(MyCallBack, self).__init__(**kwargs)
self.threshold = threshold # threshold for validation loss
self.min_epochs = min_epochs # min number of epochs to run
def on_epoch_end(self, epoch, logs=None):
current = logs.get(self.monitor)
if current is None:
warnings.warn(
'Early stopping conditioned on metric `%s` '
'which is not available. Available metrics are: %s' %
(self.monitor, ','.join(list(logs.keys()))), RuntimeWarning
)
return
# implement your own logic here
if (epoch >= self.min_epochs) & (current >= self.threshold):
self.stopped_epoch = epoch
self.model.stop_training = True
Small example to illustrate that it should work:
from keras.layers import Input, Dense
from keras.models import Model
import numpy as np
# Generate some random data
features = np.random.rand(100, 5)
labels = np.random.rand(100, 1)
validation_feat = np.random.rand(100, 5)
validation_labels = np.random.rand(100, 1)
# Define a simple model
input_layer = Input((5, ))
dense_layer = Dense(10)(input_layer)
output_layer = Dense(1)(dense_layer)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='mse', optimizer='sgd')
# Fit with custom callback
callbacks = [MyCallBack(threshold=0.001, min_epochs=10, verbose=1)]
model.fit(features, labels, validation_data=(validation_feat, validation_labels), callbacks=callbacks, epochs=100)