I tried to train my convolutional neural network using tensorflow and keras libraries. But the values of accuracy and val_accuracy didn't change the whole time. There is my neural network code:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle
X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))
X = X/255.0
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy",
optimizer="adam",
metrics=["accuracy"])
model.fit(X, y, batch_size=10, epochs=10, validation_split=0.1)
There is the creation of traning data, features and labels (X - features, y - labels)
def create_training_data():
for category in CATEGORIES:
path = os.path.join(DATADIR, category)
class_num = CATEGORIES.index(category)
for img in os.listdir(path):
try:
img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
training_data.append([new_array, class_num])
except Exception as e:
pass
create_training_data()
random.shuffle(training_data)
X = []
y = []
for features, label in training_data:
X.append(features)
y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y = np.array(y)
And this is the log of training:
2023-01-15 00:36:42.368335: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/10
70/70 [==============================] - 45s 619ms/step - loss: 0.3039 - accuracy: 0.9627 - val_loss: 0.1211 - val_accuracy: 0.9744
Epoch 2/10
70/70 [==============================] - 42s 600ms/step - loss: 0.1524 - accuracy: 0.9670 - val_loss: 0.1189 - val_accuracy: 0.9744
Epoch 3/10
70/70 [==============================] - 42s 600ms/step - loss: 0.1537 - accuracy: 0.9670 - val_loss: 0.1622 - val_accuracy: 0.9744
Epoch 4/10
70/70 [==============================] - 44s 627ms/step - loss: 0.1563 - accuracy: 0.9670 - val_loss: 0.1464 - val_accuracy: 0.9744
Epoch 5/10
70/70 [==============================] - 42s 604ms/step - loss: 0.1591 - accuracy: 0.9670 - val_loss: 0.1185 - val_accuracy: 0.9744
Epoch 6/10
70/70 [==============================] - 42s 605ms/step - loss: 0.1511 - accuracy: 0.9670 - val_loss: 0.1338 - val_accuracy: 0.9744
Epoch 7/10
70/70 [==============================] - 49s 698ms/step - loss: 0.1623 - accuracy: 0.9670 - val_loss: 0.1188 - val_accuracy: 0.9744
Epoch 8/10
70/70 [==============================] - 50s 709ms/step - loss: 0.1480 - accuracy: 0.9670 - val_loss: 0.1397 - val_accuracy: 0.9744
Epoch 9/10
70/70 [==============================] - 45s 637ms/step - loss: 0.1508 - accuracy: 0.9670 - val_loss: 0.1203 - val_accuracy: 0.9744
Epoch 10/10
70/70 [==============================] - 47s 665ms/step - loss: 0.1716 - accuracy: 0.9670 - val_loss: 0.1238 - val_accuracy: 0.9744
Process finished with exit code 0
What should I do to fix this problem?
There are a couple potential reasons as to why you are facing this:
Your dataset is far too small. If your validation set is tiny, there is a high probability that your model will get the same % of predictions correct/incorrect
There is a great imbalance in your dataset. If one class heavily outweighs another, your model will favor the majority class, and predict it no matter what, as that is what brings the optimal accuracy for the model.
From what I see, there is nothing wrong with your code, rather modifications that need to be made to the dataset itself.
Hmm accuracy and validation accuracy are high even on the first epoch. Try using a lower learning rate in the Adam optimizer say .0002, On the first epoch pay attention to the loss and validation loss as the batches are process. It should start low and gradually increase during the epoch.
Related
I am working on a image classification project and my model doesn't seem to train properly.
My dataset is made of 4000 images each with a shape of (120,120,3).
Test set represents 20% of the total dataset.
All images have been correctly labeled.
The images are normalized and one-hot encoded. For now I use only two targets, but I will add one more one I start getting decent results.
I use a batch size of 16
I want to use a CNN model.
My current model :
model = keras.models.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6,6), input_shape=(IMG_SIZE,IMG_SIZE,3), activation='relu',))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(filters=32, kernel_size=(5,5), activation='relu',))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(filters=64, kernel_size=(4,4), activation='relu',))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(filters=128, kernel_size=(3,3), activation='relu',))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(filters=256, kernel_size=(2,2), activation='relu',))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='nadam',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()
model summary gives :
Total params: 273,330
Trainable params: 273,330
Non-trainable params: 0
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',patience=10)
history = model.fit(x_train_sample, y_train_sample,
batch_size = BATCH_SIZE,
epochs = EPOCHS,
verbose = 1,
validation_data = (x_test, y_test)
,callbacks=[early_stop,PlotLossesKeras()])
When I run my model for 30 epochs, earlystopping triggers.
Epoch 1/30
43/43 [==============================] - 9s 205ms/step - loss: 0.1109 - accuracy: 0.9531 - val_loss: 0.5259 - val_accuracy: 0.8397
Epoch 2/30
43/43 [==============================] - 10s 231ms/step - loss: 0.0812 - accuracy: 0.9692 - val_loss: 0.5793 - val_accuracy: 0.8355
Epoch 3/30
43/43 [==============================] - 9s 219ms/step - loss: 0.1000 - accuracy: 0.9721 - val_loss: 0.5367 - val_accuracy: 0.8547
Epoch 4/30
43/43 [==============================] - 9s 209ms/step - loss: 0.0694 - accuracy: 0.9707 - val_loss: 0.6101 - val_accuracy: 0.8269
Epoch 5/30
43/43 [==============================] - 9s 203ms/step - loss: 0.0891 - accuracy: 0.9633 - val_loss: 0.6116 - val_accuracy: 0.8419
Epoch 6/30
43/43 [==============================] - 9s 210ms/step - loss: 0.0567 - accuracy: 0.9765 - val_loss: 0.4833 - val_accuracy: 0.8419
Epoch 7/30
43/43 [==============================] - 9s 218ms/step - loss: 0.0312 - accuracy: 0.9897 - val_loss: 1.4513 - val_accuracy: 0.8034
Epoch 8/30
43/43 [==============================] - 9s 213ms/step - loss: 0.0820 - accuracy: 0.9707 - val_loss: 0.5821 - val_accuracy: 0.8248
Epoch 9/30
43/43 [==============================] - 9s 222ms/step - loss: 0.0513 - accuracy: 0.9897 - val_loss: 0.8516 - val_accuracy: 0.8462
Epoch 10/30
43/43 [==============================] - 11s 246ms/step - loss: 0.0442 - accuracy: 0.9853 - val_loss: 0.7927 - val_accuracy: 0.8397
Epoch 11/30
43/43 [==============================] - 10s 222ms/step - loss: 0.0356 - accuracy: 0.9897 - val_loss: 0.7730 - val_accuracy: 0.8141
Epoch 12/30
43/43 [==============================] - 10s 232ms/step - loss: 0.0309 - accuracy: 0.9824 - val_loss: 0.9528 - val_accuracy: 0.8226
Epoch 13/30
43/43 [==============================] - 9s 220ms/step - loss: 0.0424 - accuracy: 0.9839 - val_loss: 1.2109 - val_accuracy: 0.8013
Epoch 14/30
43/43 [==============================] - 10s 228ms/step - loss: 0.0645 - accuracy: 0.9824 - val_loss: 0.5308 - val_accuracy: 0.8547
Epoch 15/30
43/43 [==============================] - 11s 259ms/step - loss: 0.0293 - accuracy: 0.9927 - val_loss: 0.9271 - val_accuracy: 0.8333
Epoch 16/30
43/43 [==============================] - 9s 217ms/step - loss: 0.0430 - accuracy: 0.9795 - val_loss: 0.6687 - val_accuracy: 0.8483
I have tried many different model architectures, changing number of layers, kernel size etc... I can't seem to figure out what is going wrong.
There are many possible reasons.
For starters, depending on your categories, you might want to consider using transfer learning to speed up your training process.
Your architecture looks reasonable and the training and validation loss seems right as well (overfitting is occurring).
Given that you've stated that you could have 3 categories and am currently only using 2, might there be a different distribution between your training set and your test set? That might be causing the model to be unable to generalise well.
For instance, your dataset contains of evenly distributed number of images of Cats, Dogs and Humans. You set 2 categories to train on and thus your model attempts to segment between humans and animals when it tries to validate, there is an uneven distribution in the training data causing the model to see insufficient training size of humans (33%)?
I have built a tensorflow model and am getting no change in my validation accuracy in different epochs, which makes me believe there is something wrong in my setup. Below is my code.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import regularizers
import tensorflow as tf
model = Sequential()
model.add(Conv2D(16, (3, 3), input_shape=(299, 299,3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
model.add(Conv2D(32, (3, 3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
model.add(Conv2D(64, (3, 3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
model.add(Conv2D(64, (3, 3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
# this converts our 3D feature maps to 1D feature vectors
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
batch_size=32
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
rescale=1./255,
# shear_range=0.2,
# zoom_range=0.2,
horizontal_flip=True)
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolers of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
'Documents/Training', # this is the target directory
target_size=(299, 299), #all images will be resized to 299
batch_size=batch_size,
class_mode='binary') # since we use binary_crossentropy loss, we need binary labels
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
'Documents/Dev',
target_size=(299, 299),
batch_size=batch_size,
class_mode='binary')
#w1 = tf.Variable(tf.truncated_normal([784, 30], stddev=0.1))
model.fit_generator(
train_generator,
steps_per_epoch=50 // batch_size,
verbose = 1,
epochs=10,
validation_data=validation_generator,
validation_steps=8 // batch_size)
Which when I run produces the following output. Anything I'm missing here as far as my architecture is concerned or data generation steps? I have referenced Tensorflow model accuracy not increasing and accuracy not increasing in tensorflow model to no avail yet.
Epoch 1/10
3/3 [==============================] - 2s 593ms/step - loss: 0.6719 - accuracy: 0.6250 - val_loss: 0.8198 - val_accuracy: 0.5000
Epoch 2/10
3/3 [==============================] - 2s 607ms/step - loss: 0.6521 - accuracy: 0.6667 - val_loss: 0.8518 - val_accuracy: 0.5000
Epoch 3/10
3/3 [==============================] - 2s 609ms/step - loss: 0.6752 - accuracy: 0.6250 - val_loss: 0.7129 - val_accuracy: 0.5000
Epoch 4/10
3/3 [==============================] - 2s 611ms/step - loss: 0.6841 - accuracy: 0.6250 - val_loss: 0.7010 - val_accuracy: 0.5000
Epoch 5/10
3/3 [==============================] - 2s 608ms/step - loss: 0.6977 - accuracy: 0.5417 - val_loss: 0.6551 - val_accuracy: 0.5000
Epoch 6/10
3/3 [==============================] - 2s 607ms/step - loss: 0.6508 - accuracy: 0.7083 - val_loss: 0.5752 - val_accuracy: 0.5000
Epoch 7/10
3/3 [==============================] - 2s 615ms/step - loss: 0.6596 - accuracy: 0.6875 - val_loss: 0.9326 - val_accuracy: 0.5000
Epoch 8/10
3/3 [==============================] - 2s 604ms/step - loss: 0.7022 - accuracy: 0.6458 - val_loss: 0.6976 - val_accuracy: 0.5000
Epoch 9/10
3/3 [==============================] - 2s 591ms/step - loss: 0.6331 - accuracy: 0.7292 - val_loss: 0.9571 - val_accuracy: 0.5000
Epoch 10/10
3/3 [==============================] - 2s 595ms/step - loss: 0.6085 - accuracy: 0.7292 - val_loss: 0.6029 - val_accuracy: 0.5000
Out[24]: <keras.callbacks.callbacks.History at 0x1ee4e3a8f08>
You are setting the training steps per epoch =50//32=1. So do you only have 50 training images? Similarly for validation you have steps = 8//32=0. Do you have only 8 validation images? When you execute the program how many images do the training and validation generators print out they have found? You will need more images than that. Try setting your batch size =1
The model that I am using is this:
from keras.layers import (Input, MaxPooling1D, Dropout,
BatchNormalization, Activation, Add,
Flatten, Conv1D, Dense)
from keras.models import Model
import numpy as np
class ResidualUnit(object):
"""References
----------
.. [1] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks,"
arXiv:1603.05027 [cs], Mar. 2016. https://arxiv.org/pdf/1603.05027.pdf.
.. [2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. https://arxiv.org/pdf/1512.03385.pdf
"""
def __init__(self, n_samples_out, n_filters_out, kernel_initializer='he_normal',
dropout_rate=0.8, kernel_size=17, preactivation=True,
postactivation_bn=False, activation_function='relu'):
self.n_samples_out = n_samples_out
self.n_filters_out = n_filters_out
self.kernel_initializer = kernel_initializer
self.dropout_rate = dropout_rate
self.kernel_size = kernel_size
self.preactivation = preactivation
self.postactivation_bn = postactivation_bn
self.activation_function = activation_function
def _skip_connection(self, y, downsample, n_filters_in):
"""Implement skip connection."""
# Deal with downsampling
if downsample > 1:
y = MaxPooling1D(downsample, strides=downsample, padding='same')(y)
elif downsample == 1:
y = y
else:
raise ValueError("Number of samples should always decrease.")
# Deal with n_filters dimension increase
if n_filters_in != self.n_filters_out:
# This is one of the two alternatives presented in ResNet paper
# Other option is to just fill the matrix with zeros.
y = Conv1D(self.n_filters_out, 1, padding='same',
use_bias=False,
kernel_initializer=self.kernel_initializer
)(y)
return y
def _batch_norm_plus_activation(self, x):
if self.postactivation_bn:
x = Activation(self.activation_function)(x)
x = BatchNormalization(center=False, scale=False)(x)
else:
x = BatchNormalization()(x)
x = Activation(self.activation_function)(x)
return x
def __call__(self, inputs):
"""Residual unit."""
x, y = inputs
n_samples_in = y.shape[1]
downsample = n_samples_in // self.n_samples_out
n_filters_in = y.shape[2]
y = self._skip_connection(y, downsample, n_filters_in)
# 1st layer
x = Conv1D(self.n_filters_out, self.kernel_size, padding='same',
use_bias=False,
kernel_initializer=self.kernel_initializer
)(x)
x = self._batch_norm_plus_activation(x)
if self.dropout_rate > 0:
x = Dropout(self.dropout_rate)(x)
# 2nd layer
x = Conv1D(self.n_filters_out, self.kernel_size, strides=downsample,
padding='same', use_bias=False,
kernel_initializer=self.kernel_initializer
)(x)
if self.preactivation:
x = Add()([x, y]) # Sum skip connection and main connection
y = x
x = self._batch_norm_plus_activation(x)
if self.dropout_rate > 0:
x = Dropout(self.dropout_rate)(x)
else:
x = BatchNormalization()(x)
x = Add()([x, y]) # Sum skip connection and main connection
x = Activation(self.activation_function)(x)
if self.dropout_rate > 0:
x = Dropout(self.dropout_rate)(x)
y = x
return [x, y]
# ----- Model ----- #
kernel_size = 16
kernel_initializer = 'he_normal'
signal = Input(shape=(1000, 12), dtype=np.float32, name='signal')
age_range = Input(shape=(6,), dtype=np.float32, name='age_range')
is_male = Input(shape=(1,), dtype=np.float32, name='is_male')
x = signal
x = Conv1D(64, kernel_size, padding='same', use_bias=False,
kernel_initializer=kernel_initializer
)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x, y = ResidualUnit(512, 128, kernel_size=kernel_size,
kernel_initializer=kernel_initializer
)([x, x])
x, y = ResidualUnit(256, 196, kernel_size=kernel_size,
kernel_initializer=kernel_initializer
)([x, y])
x, y = ResidualUnit(64, 256, kernel_size=kernel_size,
kernel_initializer=kernel_initializer
)([x, y])
x, _ = ResidualUnit(16, 320, kernel_size=kernel_size, kernel_initializer=kernel_initializer
)([x, y])
x = Flatten()(x)
diagn = Dense(2, activation='sigmoid', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)
model.summary()
# ----- Train ----- #
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
loss = 'binary_crossentropy'
lr = 0.001
batch_size = 64
opt = Adam(learning_rate=0.001)
callbacks = [ReduceLROnPlateau(monitor='val_loss',
factor=0.1,
patience=7,
min_lr=lr / 100)]
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
history = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=70,
initial_epoch=0,
validation_split=0.1,
shuffle='batch',
callbacks=callbacks,
verbose=1)
# Save final result
model.save("./final_model_middle_one.hdf5")
When I substitute the use of Keras with tf.keras, which I need to use the qkeras library, the model doesn't learn and gets stuck at a much lower accuracy at every iteration. What could be causing this?
When I use keras the accuracy start high at 83% and slightly increases during training.
Train on 17340 samples, validate on 1927 samples
Epoch 1/70
17340/17340 [==============================] - 33s 2ms/step - loss: 0.3908 - accuracy: 0.8314 - val_loss: 0.3283 - val_accuracy: 0.8710
Epoch 2/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3641 - accuracy: 0.8416 - val_loss: 0.3340 - val_accuracy: 0.8612
Epoch 3/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3525 - accuracy: 0.8483 - val_loss: 0.3847 - val_accuracy: 0.8550
Epoch 4/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3354 - accuracy: 0.8563 - val_loss: 0.4641 - val_accuracy: 0.8215
Epoch 5/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3269 - accuracy: 0.8590 - val_loss: 0.7172 - val_accuracy: 0.7870
Epoch 6/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3202 - accuracy: 0.8630 - val_loss: 0.3599 - val_accuracy: 0.8617
Epoch 7/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3101 - accuracy: 0.8678 - val_loss: 0.2659 - val_accuracy: 0.8934
Epoch 8/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3058 - accuracy: 0.8688 - val_loss: 0.5683 - val_accuracy: 0.8293
Epoch 9/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.2980 - accuracy: 0.8739 - val_loss: 0.3442 - val_accuracy: 0.8643
Epoch 10/70
7424/17340 [===========>..................] - ETA: 17s - loss: 0.2966 - accuracy: 0.8707
When I use tf.keras the accuracy starts at 50% and does not increase considerably during training:
Epoch 1/70
271/271 [==============================] - 30s 110ms/step - loss: 0.9325 - accuracy: 0.5093 - val_loss: 0.6973 - val_accuracy: 0.5470 - lr: 0.0010
Epoch 2/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8424 - accuracy: 0.5157 - val_loss: 0.6660 - val_accuracy: 0.6528 - lr: 0.0010
Epoch 3/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8066 - accuracy: 0.5213 - val_loss: 0.6441 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 4/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7884 - accuracy: 0.5272 - val_loss: 0.6649 - val_accuracy: 0.6559 - lr: 0.0010
Epoch 5/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7888 - accuracy: 0.5368 - val_loss: 0.6899 - val_accuracy: 0.5760 - lr: 0.0010
Epoch 6/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7617 - accuracy: 0.5304 - val_loss: 0.6641 - val_accuracy: 0.6533 - lr: 0.0010
Epoch 7/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7485 - accuracy: 0.5333 - val_loss: 0.6450 - val_accuracy: 0.6544 - lr: 0.0010
Epoch 8/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7431 - accuracy: 0.5382 - val_loss: 0.6599 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 9/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7336 - accuracy: 0.5421 - val_loss: 0.6532 - val_accuracy: 0.6554 - lr: 0.0010
Epoch 10/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7274 - accuracy: 0.5379 - val_loss: 0.6753 - val_accuracy: 0.6492 - lr: 0.0010
The lines that have been changed between the two trials are the lines where I import keras modules by adding 'tensorflow.' in front of them. I don't know why the results would be so different, possibly due to different default values of certain parameters?
It might be related to how the accuracy metric is computed in keras vs tf.keras. As far as I can tell the accuracy function is usually used when you have one-hot-encoded output. However, it seems that you are outputting two values [A, B] with a sigmoid function applied to each value.
Since I don't know the labels you're using, there might be two cases:
a) You want to predict A or B. If sos I would change the activation function to softmax
b) You want to predict between A or not A and B or not B. In this case I would modify the output tensor shape to have two heads, each with two values: head_A = [A, not_A] and head_B = [B, not_B]. I would then hot-encode the labels respectively and then I would assume you could use the accuracy metric.
Alternatively, you can create a custom metric that is appropriate to your output shape.
I have a similar (same?) problem, I was manipulating some examples from Kaggle, and was unable to save the model using keras. After much Googling I realised that I needed to use tensorflow.keras. This solved my problem, but the 60000 data items I have and was using for training dropped to a reported 1875. Although the error was still 10%.
1875 * 32 = 60000.
This is my fit.
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, verbose=True,
callbacks=[early_stopping_monitor])
1539/1875 [=======================>......] - ETA: 3s - loss: 0.4445 - accuracy: 0.8418
It turns out that fit defaults to a batch size of 32. If I increase the batch size to 64 I get half the reported data sets, which makes sense:
model.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), epochs=epochs, verbose=True,
callbacks=[early_stopping_monitor])
938/938 [==============================] - 16s 17ms/step - loss: 0.4568 - accuracy: 0.8388
I noticed from your code that you've set batch_size to 64, and your reported data items reduce from 17340 to 271 which is about a 64th, this must also affect your accuracy due to the data you are using.
From the docs here: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential
batch_size
Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of a dataset, generators, or keras.utils.Sequence instances (since they generate batches).
From the Keras docs: https://keras.rstudio.com/reference/fit.html, it also says that the batch size defaults to 32, it must just be reported differently when training the model.
Hope this helps.
Value of val_acc does not change over the epochs.
Summary:
I'm using a pre-trained (ImageNet) VGG16 from Keras;
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
Database from ISBI 2016 (ISIC) - which is a set of 900 images of skin lesion used for binary classification (malignant or benign) for training and validation, plus 379 images for testing -;
I use the top dense layers of VGG16 except the last one (that classifies over 1000 classes), and use a binary output with sigmoid function activation;
conv_base.layers.pop() # Remove last one
conv_base.trainable = False
model = models.Sequential()
model.add(conv_base)
model.add(layers.Dense(1, activation='sigmoid'))
Unlock the dense layers setting them to trainable;
Fetch the data, which are in two different folders, one named "malignant" and the other "benign", within the "training data" folder;
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
folder = 'ISBI2016_ISIC_Part3_Training_Data'
batch_size = 20
full_datagen = ImageDataGenerator(
rescale=1./255,
#rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
validation_split = 0.2, # 20% validation
horizontal_flip=True)
train_generator = full_datagen.flow_from_directory( # Found 721 images belonging to 2 classes.
folder,
target_size=(224, 224),
batch_size=batch_size,
subset = 'training',
class_mode='binary')
validation_generator = full_datagen.flow_from_directory( # Found 179 images belonging to 2 classes.
folder,
target_size=(224, 224),
batch_size=batch_size,
subset = 'validation',
shuffle=False,
class_mode='binary')
model.compile(loss='binary_crossentropy',
optimizer=optimizers.SGD(lr=0.001), # High learning rate
metrics=['accuracy'])
history = model.fit_generator(
train_generator,
steps_per_epoch=721 // batch_size+1,
epochs=20,
validation_data=validation_generator,
validation_steps=180 // batch_size+1,
)
Then I fine-tune it with 100 more epochs and lower learning rate, setting the last convolutional layer to trainable.
I've tried many things such as:
Changing the optimizer (RMSprop, Adam and SGD);
Removing the top dense layers of the pre-trained VGG16 and adding mine;
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Shuffle=True in validation_generator;
Changing batch size;
Varying the learning rate (0.001, 0.0001, 2e-5).
The results are similar to the following:
Epoch 1/100
37/37 [==============================] - 33s 900ms/step - loss: 0.6394 - acc: 0.7857 - val_loss: 0.6343 - val_acc: 0.8101
Epoch 2/100
37/37 [==============================] - 30s 819ms/step - loss: 0.6342 - acc: 0.8107 - val_loss: 0.6342 - val_acc: 0.8101
Epoch 3/100
37/37 [==============================] - 30s 822ms/step - loss: 0.6324 - acc: 0.8188 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 4/100
37/37 [==============================] - 31s 840ms/step - loss: 0.6346 - acc: 0.8080 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 5/100
37/37 [==============================] - 31s 833ms/step - loss: 0.6395 - acc: 0.7843 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 6/100
37/37 [==============================] - 31s 829ms/step - loss: 0.6334 - acc: 0.8134 - val_loss: 0.6340 - val_acc: 0.8101
Epoch 7/100
37/37 [==============================] - 31s 834ms/step - loss: 0.6334 - acc: 0.8134 - val_loss: 0.6340 - val_acc: 0.8101
Epoch 8/100
37/37 [==============================] - 31s 829ms/step - loss: 0.6342 - acc: 0.8093 - val_loss: 0.6339 - val_acc: 0.8101
Epoch 9/100
37/37 [==============================] - 31s 849ms/step - loss: 0.6330 - acc: 0.8147 - val_loss: 0.6339 - val_acc: 0.8101
Epoch 10/100
37/37 [==============================] - 30s 812ms/step - loss: 0.6332 - acc: 0.8134 - val_loss: 0.6338 - val_acc: 0.8101
Epoch 11/100
37/37 [==============================] - 31s 839ms/step - loss: 0.6338 - acc: 0.8107 - val_loss: 0.6338 - val_acc: 0.8101
Epoch 12/100
37/37 [==============================] - 30s 807ms/step - loss: 0.6334 - acc: 0.8120 - val_loss: 0.6337 - val_acc: 0.8101
Epoch 13/100
37/37 [==============================] - 32s 852ms/step - loss: 0.6334 - acc: 0.8120 - val_loss: 0.6337 - val_acc: 0.8101
Epoch 14/100
37/37 [==============================] - 31s 826ms/step - loss: 0.6330 - acc: 0.8134 - val_loss: 0.6336 - val_acc: 0.8101
Epoch 15/100
37/37 [==============================] - 32s 854ms/step - loss: 0.6335 - acc: 0.8107 - val_loss: 0.6336 - val_acc: 0.8101
And goes on the same way, with constant val_acc = 0.8101.
When I use the test set after finishing training, the confusion matrix gives me 100% correct on benign lesions (304) and 0% on malignant, as so:
Confusion Matrix
[[304 0]
[ 75 0]]
What could I be doing wrong?
Thank you.
VGG16 was trained on RGB centered data. Your ImageDataGenerator does not enable featurewise_center, however, so you're feeding your net with raw RGB data. The VGG convolutional base can't process this to provide any meaningful information, so your net ends up universally guessing the more common class.
In general, when you see this type of problem (your net exclusively guessing the most common class), it means that there's something wrong with your data, not with the net. It can be caused by a preprocessing step like this or by a significant portion of "poisoned" anomalous training data that actively harms the training process.
I wrote this code a few days ago and I had a few bugs but with some help, I was able to fix them. The Model is not learning. I tried different batch sizes, different amount of epochs, different activation functions, checked my data a few times for flaws I wasn't able to find any. It is due in a week or so for a school project. Any help will be very much valued.
Here is the code.
from keras.layers import Dense, Input, Concatenate, Dropout
from sklearn.preprocessing import MinMaxScaler
from keras.models import Model
from keras.layers import LSTM
import tensorflow as tf
import NetworkRequest as NR
import ParseNetworkRequest as PNR
import numpy as np
def buildModel():
_Price = Input(shape=(1, 1))
_Volume = Input(shape=(1, 1))
PriceLayer = LSTM(128)(_Price)
VolumeLayer = LSTM(128)(_Volume)
merged = Concatenate(axis=1)([PriceLayer, VolumeLayer])
Dropout(0.2)
dense1 = Dense(128, input_dim=2, activation='relu', use_bias=True)(merged)
Dropout(0.2)
dense2 = Dense(64, input_dim=2, activation='relu', use_bias=True)(dense1)
Dropout(0.2)
output = Dense(1, activation='softmax', use_bias=True)(dense2)
opt = tf.keras.optimizers.Adam(learning_rate=1e-3, decay=1e-6)
_Model = Model(inputs=[_Price, _Volume], output=output)
_Model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
return _Model
if __name__ == '__main__':
api_key = "47BGPYJPFN4CEC20"
stock = "DJI"
Index = ['4. close', '5. volume']
RawData = NR.Initial_Network_Request(api_key, stock)
Closing = PNR.Parse_Network_Request(RawData, Index[0])
Volume = PNR.Parse_Network_Request(RawData, Index[1])
Length = len(Closing)
scalar = MinMaxScaler(feature_range=(0, 1))
Closing_scaled = scalar.fit_transform(np.reshape(Closing[:-1], (-1, 1)))
Volume_scaled = scalar.fit_transform(np.reshape(Volume[:-1], (-1, 1)))
Labels_scaled = scalar.fit_transform(np.reshape(Closing[1:], (-1, 1)))
Train_Closing = Closing_scaled[:int(0.9 * Length)]
Train_Closing = np.reshape(Train_Closing, (Train_Closing.shape[0], 1, 1))
Train_Volume = Volume_scaled[:int(0.9 * Length)]
Train_Volume = np.reshape(Train_Volume, (Train_Volume.shape[0], 1, 1))
Train_Labels = Labels_scaled[:int((0.9 * Length))]
Train_Labels = np.reshape(Train_Labels, (Train_Labels.shape[0], 1))
# -------------------------------------------------------------------------------------------#
Test_Closing = Closing_scaled[int(0.9 * Length):(Length - 1)]
Test_Closing = np.reshape(Test_Closing, (Test_Closing.shape[0], 1, 1))
Test_Volume = Volume_scaled[int(0.9 * Length):(Length - 1)]
Test_Volume = np.reshape(Test_Volume, (Test_Volume.shape[0], 1, 1))
Test_Labels = Labels_scaled[int(0.9 * Length):(Length - 1)]
Test_Labels = np.reshape(Test_Labels, (Test_Labels.shape[0], 1))
Predict_Closing = Closing_scaled[-1]
Predict_Closing = np.reshape(Predict_Closing, (Predict_Closing.shape[0], 1, 1))
Predict_Volume = Volume_scaled[-1]
Predict_Volume = np.reshape(Predict_Volume, (Predict_Volume.shape[0], 1, 1))
Predict_Label = Labels_scaled[-1]
Predict_Label = np.reshape(Predict_Label, (Predict_Label.shape[0], 1))
model = buildModel()
model.fit(
[
Train_Closing,
Train_Volume
],
[
Train_Labels
],
validation_data=(
[
Test_Closing,
Test_Volume
],
[
Test_Labels
]
),
epochs=10,
batch_size=Length
)
This is the output when I run it.
Using TensorFlow backend.
2020-01-01 16:31:47.905012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199985000 Hz
2020-01-01 16:31:47.906105: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x49214f0 executing computations on platform Host. Devices:
2020-01-01 16:31:47.906137: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
/home/martin/PycharmProjects/MarketPredictor/Model.py:26: UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=[<tf.Tenso..., outputs=Tensor("de...)`
_Model = Model(inputs=[_Price, _Volume], output=output)
Train on 4527 samples, validate on 503 samples
Epoch 1/10
4527/4527 [==============================] - 1s 179us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 2/10
4527/4527 [==============================] - 0s 41us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 3/10
4527/4527 [==============================] - 0s 42us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 4/10
4527/4527 [==============================] - 0s 42us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 5/10
4527/4527 [==============================] - 0s 43us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 6/10
4527/4527 [==============================] - 0s 39us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 7/10
4527/4527 [==============================] - 0s 42us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 8/10
4527/4527 [==============================] - 0s 39us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 9/10
4527/4527 [==============================] - 0s 42us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Epoch 10/10
4527/4527 [==============================] - 0s 38us/step - loss: 0.4716 - accuracy: 2.2090e-04 - val_loss: 0.6772 - val_accuracy: 0.0000e+00
Process finished with exit code 0
The loss is high, and the accuracy is 0.
Please help.
You're using activation functions and metrics made for a classification task, not a stock forecasting task (with a continuous target).
For continuous targets, your final activation layer should be linear. Metrics should be mse or mae, not accuracy.
accuracy would only be satisfied is the dji prediction is exactly equal to the actual price. Since dji has at least 7 digits, it's nearly impossible.
Here's my suggestion:
Use a simpler network: Not sure how big is your dataset, but sometimes using dense. layer isn't helpful. Looks like the weights of there intermediate layers are not changing at all. Try the model with just one dense layer.
Reduce dropout: Try with using one dropout layer with Dropout(0.1).
Adam defaults: Start with using adam optimizer with its default parameters.
Metric selection: As mentioned by Nicolas's answer, use a regression metric instead of accuracy.