I am training a three-layer neural network with Keras:
model = models.Sequential()
# input_shape is only needed on the first layer
model.add(Conv2D(32, (3, 3), padding="same", strides=2,
                 input_shape=input_shape, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3), padding="same", strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(Conv2D(128, (3, 3), padding="same", strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(layers.Flatten())
model.add(layers.Dense(neurons, activation='relu', kernel_regularizer=l2(reg)))
model.add(Dropout(0.50))
model.add(Dense(2))
model.add(Activation("softmax"))
My data has two classes, and I am using sparse categorical cross entropy:
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
history = model.fit(x=X, y=y, batch_size=batch_size, epochs=epochs,
                    validation_data=(X_val, y_val), shuffle=True,
                    callbacks=callbacks, verbose=1)
My data has the following shape:
X: (232, 100, 150, 3)
y: (232,)
where X contains the images and each y is either 1 or 0, as integer labels, because I am using the sparse loss function.
The loss is very high for both training and validation, even when the training accuracy reaches 1! I get loss values over 20, which I understand is not reasonable.
If I train the model for a few epochs, output the predicted and true labels, and compute the categorical cross-entropy from them myself, the value I get is < 1, as expected, even when I do the calculation with Keras' own function (I switch to the categorical variant because the sparse one gives an error):
21/21 [==============================] - 7s 313ms/step - loss: 44.1764 - acc: 1.0000 - val_loss: 44.7084 - val_acc: 0.7857
import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()
pred = model.predict(x=X_val, batch_size=len(X_val))
loss = cce(true_categorical, pred)  # true_categorical: the one-hot encoded y_val
Categorical loss 0.6077293753623962
Is there a way to know exactly how this loss is calculated, and why the values are so high? The batch size is 8.
The loss printed by Keras is the total loss.
Regularization adds its own term to the loss, based on the values of the weights.
Since you have a lot of weights, you also have a lot of contributions to the total loss; that is why it's so big.
If you remove the regularization, you will see the final loss equal the categorical cross-entropy loss.
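You can check that contribution directly; a minimal sketch, assuming a TF2 / tf.keras model like the one above:

import tensorflow as tf

# model.losses collects the penalty tensors added by each kernel_regularizer
reg_loss = tf.add_n(model.losses)
print("Regularization contribution to the total loss:", float(reg_loss))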
Related
I have a dataset where I need to predict a target that is either 0 or 1.
It is also useful for me to know when a prediction is near 0 (e.g., 0.20) or near 1 (e.g., 0.89), and so on.
My model structure is this:
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1, strides=1))
model.add(LSTM(128, return_sequences=True, recurrent_dropout=0.2, activation='relu'))
model.add(Dense(128, activation="relu",
                kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dropout(0.4))
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1, strides=1))
model.add(LSTM(64, return_sequences=True, activation='relu'))
model.add(Dense(64, activation="relu",
                kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dropout(0.4))
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1, strides=1))
model.add(LSTM(32, return_sequences=True, recurrent_dropout=0.2, activation='relu'))
model.add(Dense(32, activation="relu",
                kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dropout(0.4))
model.add(BatchNormalization())
model.add(Dense(1, activation='linear'))
from keras.metrics import categorical_accuracy
model.compile(optimizer='rmsprop',loss="mse",metrics=['accuracy'])
model.fit(X_train,y_train,epochs=1000, batch_size=16, verbose=1, validation_split=0.1, callbacks=callback)
Summary of model is here: https://pastebin.com/Ba6ErEzj
The training output is:
Epoch 58/1000
277/277 [==============================] - 1s 5ms/step - loss: 0.2510 - accuracy: 0.4937 - val_loss: 0.2523 - val_accuracy: 0.4878
Epoch 59/1000
277/277 [==============================] - 1s 5ms/step - loss: 0.2515 - accuracy: 0.4941 - val_loss: 0.2504 - val_accuracy: 0.5122
How can I improve this? An accuracy around 0.50 on a 0-or-1 output is useless.
This is my Colab code.
To wrap up the suggestions (some already offered in the comments), with some justification...
Mistakes. You are in a binary classification setting, so:
Using MSE is wrong; you should use loss='binary_crossentropy'
In your last single-node layer, you should use activation='sigmoid'.
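A minimal sketch of those two fixes together, i.e. replacing the final Dense(1, activation='linear') layer and the compile call (everything else left as in the question):

model.add(Dense(1, activation='sigmoid'))  # single sigmoid unit for binary output
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])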
Best practices. Things like dropout, batch normalization, and kernel & bias regularizers are used for regularization, i.e. (roughly speaking) to avoid overfitting. They should not be used by default, and doing so is well known to prevent learning (as seems to be the case here):
Remove all dropout layers
Remove all batch normalization layers
Remove all kernel, bias, and activity regularizers.
You can consider adding some of these back step by step later, but only if you see signs of overfitting.
General advice. Nowadays the usual first choice of optimizer is Adam, so change to optimizer='adam' as a first approach.
That said, at the end of the day, everything depends on your data (both their quantity & quality) and the particular problem to be addressed. Experimentation is king (but keeping in mind the general principles stated above).
For some background, my dataset is roughly 75,000 images, 200x200 greyscale, with 26 classes (the letters of the alphabet). My model is:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(26, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=[tf.keras.metrics.CategoricalAccuracy()])
model.fit(X_train, y_train, epochs=1, batch_size=64, verbose=1, validation_data=(X_test, y_test))
The output of the model.fit is:
Train on 54600 samples, validate on 23400 samples
Epoch 1/1
54600/54600 [==============================] - 54s 984us/step - loss: nan - categorical_accuracy: 0.9964 - val_loss: nan - val_categorical_accuracy: 0.9996
99.9+ validation accuracy. When I run a test, it gets all the predictions incorrect. So I assume it is overfitting. Why is this happening despite adding the dropout layers? What other options do I have to fix this? Thank you!
The only way you would get all the predictions on a held-out test set incorrect while simultaneously getting almost 100% validation accuracy is if you have a data leak, i.e. your training data must contain the same images as your validation data (or images so similar as to be effectively identical).
Or the data in your test set is very different from your training and validation datasets.
To fix this, ensure that no single image exists in more than one of the datasets. Also ensure that the images are generally similar across sets: if training with cell-phone photos, do not then test with images taken with a DSLR or watermarked images pulled from Google.
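One way to check for an exact leak is to hash the raw image bytes across splits; a rough sketch, assuming the splits are numpy arrays named X_train and X_val (hypothetical names):

import hashlib

train_hashes = {hashlib.md5(img.tobytes()).hexdigest() for img in X_train}
leaks = sum(hashlib.md5(img.tobytes()).hexdigest() in train_hashes for img in X_val)
print(leaks, "validation images also appear byte-for-byte in the training set")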
It is also odd that your loss is nan. It may be due to using CategoricalAccuracy as the metric while your labels are sparse integers. To fix this, just set the metric to 'accuracy'; Keras will then dynamically pick the matching accuracy variant, one of binary, categorical, or sparse_categorical, based on the loss.
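The compile call would then look like:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])  # resolves to sparse_categorical_accuracy here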
Hope this helps.
It's definitely not overfitting; look at your loss, it's equal to nan. That means your gradients exploded during training. To see what's really happening, I recommend you look at the loss after every mini-batch and find the point at which it becomes nan.
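A minimal sketch of such per-batch logging with a tf.keras callback (the class name is mine):

import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        print("batch", batch, "loss:", logs["loss"])

# then: model.fit(..., callbacks=[BatchLossLogger()])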
I am applying a CNN model using Keras. I fed the detail coefficients of a level-5 discrete wavelet transform into the CNN as a 2D array of size (5, 3840). I would like to use the CNN to predict seizures. The problem is that my network is overfitting. Any suggestions on how to solve the overfitting problem?
input_shape=(1, 22, 5, 3844)
model = Sequential()
#C1
model.add(Conv3D(16, (22, 5, 5), strides=(1, 2, 2), padding='same', activation='relu',
                 data_format="channels_first", input_shape=input_shape))
model.add(keras.layers.MaxPooling3D(pool_size=(1, 2, 2), data_format="channels_first", padding='same'))
model.add(BatchNormalization())
#C2
model.add(Conv3D(32, (1, 3, 3), strides=(1, 1, 1), padding='same',
                 data_format="channels_first", activation='relu'))  # unsure whether to remove padding
model.add(keras.layers.MaxPooling3D(pool_size=(1, 2, 2), data_format="channels_first"))
model.add(BatchNormalization())
#C3
model.add(Conv3D(64, (1, 3, 3), strides=(1, 1, 1), padding='same',
                 data_format="channels_first", activation='relu'))  # unsure whether to remove padding
model.add(keras.layers.MaxPooling3D(pool_size=(1, 2, 2), data_format="channels_first", padding='same'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(256, activation='sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
opt_adam = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=opt_adam, metrics=['accuracy'])
return model
There are two frequently used regularization techniques to avoid over-fitting:
L1 & L2 Regularization: Regularizers let you apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated into the loss function that the network optimizes.
from keras import regularizers
model.add(Dense(64, input_dim=64,
kernel_regularizer=regularizers.l2(0.01),
activity_regularizer=regularizers.l1(0.01)))
Dropout: Dropout consists of randomly setting a fraction of input units to 0 at each update during training, which helps prevent over-fitting.
from keras.layers import Dropout
model.add(Dense(60, input_dim=60, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(Dense(30, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(Dense(1, activation='sigmoid'))
You can also use early stopping to interrupt training when the validation loss stops decreasing:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x, y, validation_split=0.2, callbacks=[early_stopping])
Additionally, you might want to consider data augmentation techniques such as cropping, padding, and horizontal flipping. With these techniques you can increase the diversity of the data available for training without actually collecting new data, capturing data invariances and reducing over-fitting:
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils

num_classes = 10  # CIFAR-10 has ten classes
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
datagen.fit(x_train)  # required for the featurewise statistics

model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32, epochs=epochs)
Steps to remove overfitting:
Reduce the number of units in your hidden layers.
I do not think you need a softmax layer right after a sigmoid layer; that is probably why your model is overfitting.
Try replacing the sigmoid layer with a Dense layer with relu activation, keeping the output of shape (n, 2) followed by your softmax layer.
Your learning rate is also very low, which would suggest the model should take a long time to find the minimum and hence underfit, but that is not happening here. This strengthens my suspicion that the sigmoid layer is the cause.
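A sketch of that suggested head, based on the layers in the question:

model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))  # relu instead of sigmoid
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))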
Note: First time posting. I've tried to be thorough in my description
I've been trying to set up what I thought would be a very simple CNN by following this tutorial:
https://machinelearningmastery.com/cnn-models-for-human-activity-recognition-time-series-classification/
My Xtrain dataset is a time series stored as a numpy array with 34396 rows (samples) and 600 columns (time steps). My Ytrain dataset is just an array containing the labels 0, 1, or 2 (as ints). I'm just trying to use the CNN to perform multi-class classification.
I'm running into errors like
Input 0 is incompatible with layer conv1d_39: expected ndim=3, found ndim=4
when input_shape=(n_timesteps,n_features,n_outputs)
or
Error when checking input: expected conv1d_40_input to have 3 dimensions, but got array with shape (34396, 600)
when input_shape=(n_timesteps,n_features)
I've been searching online for hours now but I can't seem to find a solution. I think it's a simple problem with my data format and the input_shape values, but I haven't been able to fix it.
I've tried setting input_shape to
(None, 600, 1)
(34396,600, 1)
(34396,600)
(None,600)
among various other combinations.
train_df = pd.read_csv('training.csv')
test_df = pd.read_csv('test.csv')
x_train=train_df.iloc[:,2:].values
y_train=train_df.iloc[:,1].values
x_test=test_df.iloc[:,2:].values
y_test=test_df.iloc[:,1].values
n_rows=len(x_train)
n_cols=len(x_train[0])
def evaluate_model(trainX, trainy, testX, testy):
verbose, epochs, batch_size = 0, 10, 32
n_timesteps, n_features, n_outputs = trainX.shape[0], trainX.shape[1], 3
print(n_timesteps, n_features, n_outputs)
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features,n_outputs)))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
# evaluate model
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
return accuracy
evaluate_model(x_train,y_train,x_test,y_test)
As given in the Keras docs for Conv1D: for example, input_shape=(10, 128) describes time-series sequences of 10 time steps with 128 features per step.
So in your case, since you have 600 time steps with 1 feature each, it should be input_shape=(600, 1).
You also have to feed your labels y as one-hot encoded vectors.
Working code
import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense
from keras.utils import to_categorical
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(600,1)))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam', metrics=['accuracy'])
x = np.random.randn(100,600)
y = np.random.randint(0,10, size=(100))
# Reshape to no:of sample, time_steps, 1 and convert y to one hot encoding
model.fit(x.reshape(100,600,1), to_categorical(y))
# Same as model.fit(np.expand_dims(x, 2), to_categorical(y))
Output:
Epoch 1/1
100/100 [===========================] - 0s 382us/step - loss: 2.3245 - acc: 0.0800
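Note that Dense(10), the random data, and the ten label values are just for a self-contained demo; for the three classes in the question, the output layer would be Dense(3) and the labels to_categorical(y, 3).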
I am attempting to train a simple convolutional neural network shown below.
model = Sequential()
model.add(Conv1D(32, 3, padding='same', input_shape=(700, 7)))
model.add(Activation('relu'))
model.add(Conv1D(32,3))
model.add(Activation('relu'))
model.add(MaxPooling1D())
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
I fit it for 100 epochs with a validation split of 0.2, on input data shaped [1000L, 700L, 7L]. Every single one of my epochs displayed the following:
loss: nan - acc:0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
So my question is: what went wrong, and how do I fix it? Is the problem with the network, or with how my data is being input and fitted to the model?