Tensorflow image classification binary crossentropy loss is negative - python

I'm new to Tensorflow. I followed some tutorials with a provided dataset and wanted to try something on my own. I decided I'd try to classify Magic the Gathering sets. Each card has a symbol in different colors on it: Black, Gold and so on.
The colors don't matter, just the different symbols. So I created a dataset of 3 different sets (so 3 different symbols) and got around 15'000 images like this. Some are a little bit rotated, some have an X and Y offset, just to get some different images.
Then I adapted the tutorial on the tensorflow website for image classification. Instead of two classes I wanted to try three:
batch_size = 250
epochs = 3
IMG_HEIGHT = 55
IMG_WIDTH = 55
train_image_generator = ImageDataGenerator(rescale=1./255)
validation_image_generator = ImageDataGenerator(rescale=1./255)
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
directory=train_dir,
shuffle=True,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode='binary')
val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
directory=validation_dir,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode='binary')
model = Sequential([
Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
MaxPooling2D(),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D(),
Conv2D(64, 3, padding='same', activation='relu'),
MaxPooling2D(),
Flatten(),
Dense(512, activation='relu'),
Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit_generator(
train_data_gen,
steps_per_epoch=total_train // batch_size,
epochs=epochs,
validation_data=val_data_gen,
validation_steps=total_val // batch_size,
callbacks=[cp_callback]
)
But my loss is negative and I don't get good accuracy after training. What did I mess up? Is the model used in the tutorial not suitable for my use case? Or is there an error in the code because I used three classes instead of two?

The model from the tutorial was built for binary classification (only two classes, cat or dog). You want to classify three classes, not two, so the architecture needs a small change. Your last layer should be:
Dense(3, activation='softmax')
Three neurons because you have three classes, and softmax activation because you want the outputs to be valid probabilities that sum to 1. To compile the model, use categorical_crossentropy instead of binary_crossentropy and make sure your labels are one-hot encoded. For your ImageDataGenerator, pass class_mode='categorical' to .flow_from_directory(), which yields one-hot labels. The negative loss comes from the labels: with three class folders and class_mode='binary' the generator still emits the class indices 0, 1 and 2, and binary cross-entropy is only guaranteed to be non-negative for targets between 0 and 1.
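To see why the loss can go negative, it helps to evaluate the binary cross-entropy formula by hand. The sketch below is plain Python, not Keras' implementation; the helper name binary_crossentropy and the sample probability 0.9 are illustrative assumptions. It assumes the generator hands the third class a target of 2, and shows that such a target pushes the loss below zero.

```python
import math

def binary_crossentropy(y_true, p):
    """Per-sample binary cross-entropy for target y_true and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Valid binary targets (0 or 1) always give a non-negative loss.
loss_valid = binary_crossentropy(1, 0.9)

# A class index of 2 makes the (1 - y_true) factor negative,
# so the loss drops below zero once p is large enough.
loss_invalid = binary_crossentropy(2, 0.9)

print(loss_valid, loss_invalid)
```

With one-hot labels and categorical_crossentropy every target value is 0 or 1 again, so the loss stays non-negative.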

Related

CNN layers confusion

I have this model which is attempting to classify cats and dogs:
model = Sequential([Conv2D(128, kernel_size=(3,3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
MaxPooling2D(pool_size=(2,2)),
Conv2D(64, kernel_size=(3,3), activation='relu'),
MaxPooling2D(pool_size=(2,2)),
Flatten(),
Dense(32, activation='relu'),
Dense(2, activation='softmax')]) # pick between 2 different possible outputs
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
and then attempting to run the model like so:
history = model.fit(x=train_data_gen, steps_per_epoch=total_train//batch_size,
epochs=epochs, batch_size=batch_size,
validation_data=val_data_gen,
validation_steps=total_val//batch_size)
however, I get this ValueError:
ValueError: `logits` and `labels` must have the same shape, received ((None, 2) vs (None, 1)).
If I change the last dense layer to have a dimensionality of 1, then this runs, but I want binary classification with 2 output units and a softmax over them, so that I can analyze the testing data.
How do I fix my train_data_gen in order to match the dimensionality, as it is a keras.preprocessing.image.DirectoryIterator object defined like so:
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
directory=train_dir,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode='binary')
Is there a way to reshape this object so my model runs correctly? I can't find one for this object type, and I'm not sure whether I need to convert it into a NumPy array or tensor first. Also, how do I choose the dimensionality/filter arguments in these models? I went with 128, 64, 32, halving each time, because this is what I saw online, but an explanation of why these values are picked would greatly help me out. Thank you in advance for the help!
Thanks Jay Mody for your response. I was away on holiday and taking a break from this project, but yes what you suggested was correct and useful to actually understand what I was doing. I also wanted to mention a few other errors I had which led to a worse/useless model performance.
The steps_per_epoch and validation_steps arguments were not totally incorrect, but they produced weird graphs that I didn't see in other online examples.
I learned how they are implemented from an online guide (link not preserved); in my case I substituted the training image and validation image counts as the corresponding sizes. Now my graph looks like so: (image not preserved)
I also played around with the filters arguments of my model, and found an online resource (link not preserved) helpful. My model now looks like this, and works well:
model = Sequential([Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
MaxPooling2D(pool_size=(2,2)),
Conv2D(32, kernel_size=(3,3), activation='relu'),
MaxPooling2D(pool_size=(2,2)),
Flatten(),
Dense(128, activation='relu'),
Dense(1, activation='sigmoid')]) # one sigmoid unit: probability of class 1
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
Hope this helps others, this whole field still confuses me but we go on :)
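For anyone who does want the two-unit softmax output from the question: the ValueError arises because class_mode='binary' yields one label per image while Dense(2) produces two values per image. Switching the generator to class_mode='categorical' (together with loss='categorical_crossentropy') makes the shapes agree. The plain-Python sketch below only illustrates the two label layouts; the helper to_one_hot and the sample labels are made up for this example.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class indices into one-hot rows."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]

# Hypothetical batch of class indices (for example 0 = cat, 1 = dog).
binary_labels = [0, 1, 1, 0]

# One value per image, shape (4, 1): matches Dense(1, activation='sigmoid').
column_labels = [[float(v)] for v in binary_labels]

# Two values per image, shape (4, 2): matches Dense(2, activation='softmax').
one_hot_labels = to_one_hot(binary_labels, 2)

print(column_labels)
print(one_hot_labels)
```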

Interpreting the output of a DNN using binary cross entropy as loss

I have a TensorFlow image classification DNN that uses binary cross-entropy as its loss and the corresponding label mode "binary" in the tf.keras.preprocessing.image_dataset_from_directory call. When I train the model and run inference on images, the prediction outputs are something like [[-3.5601902]] or [[2.1026382]]. How am I to interpret these values to find out which of the two classes the model is assigning the image to? I think the answer would be an implementation of a softmax function, but I'm not getting it right.
Call to tf.keras.preprocessing.image_dataset_from_directory:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
images_directory,
label_mode="binary",
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
images_directory,
label_mode="binary",
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
and the model
model = Sequential([
layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
Also any input on the model would be appreciated.
It is a bit unclear what you are trying to achieve.
If you are doing binary classification (which I believe you are), the size of your output layer should not be num_classes; it should be 1, with a sigmoid activation function. If you do that, also drop from_logits=True from the loss, since the model then outputs probabilities rather than logits. The output p is then the probability of class 1, and 1 - p the probability of class 0. It seems you mixed a multi-class approach with binary classification.
As for the values you are seeing now, they are logits: raw scores before the sigmoid. A negative logit corresponds to a probability below 0.5 (class 0) and a positive logit to a probability above 0.5 (class 1).
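As a small sketch of how such logits can be read, the snippet below applies a sigmoid to the two values quoted in the question and thresholds at 0.5. It is plain Python and assumes the model has a single output unit trained with from_logits=True; the helper name sigmoid and the 0.5 threshold are choices made for this illustration.

```python
import math

def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# The two prediction values quoted in the question.
logits = [-3.5601902, 2.1026382]

# Probability that each image belongs to class 1.
probabilities = [sigmoid(z) for z in logits]

# Threshold at 0.5 to get a class index.
predicted = [1 if p >= 0.5 else 0 for p in probabilities]

print(probabilities, predicted)
```

Inside TensorFlow the same conversion is available as tf.nn.sigmoid, and the class index can be mapped back to a folder name through the class_names attribute of the dataset returned by image_dataset_from_directory.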

Keras CNN accuracy is either static or way too high for image classification

I'm trying to implement a Convolutional Neural Network that can detect whether a person is wearing glasses or not. Unfortunately, I keep getting very strange results no matter which exact settings I use for learning rate, the specific optimizer, etc. With most settings, I notice that the accuracy of my model doesn't change after the second epoch and gets stuck at around 0.56 (which is close to the ratio of one label, 2700 images, compared to the other label, 2200 images). In other runs, with slightly different settings, the accuracy suddenly rockets to about 0.9 and keeps increasing. In both cases, however, the model predicts the exact same classification ('with glasses') each time (even on images that were in the training/validation set), always with a confidence level of 100% (the label is exactly 1 each time).
I'm not all that experienced with Neural Networks for image classification so I wasn't quite sure how to figure out the issue. I tried printing some values from my dataset and their respective labels, and the labels do contain both labels (0s and 1s). Therefore, I assume it's probably an issue with my model but I can't really figure out much myself. I've tried different optimizers (Adam, SGD mostly), smaller and bigger learning rates, different momentum values, less/more convolutional layers and different parameters for the padding and kernel_initializer, different batch sizes... It's still stuck with either the very quickly improving accuracy or the static one.
My code looks as follows:
#parameters
batch_size = 16
img_height = 180
img_width = 180
num_classes = 2
epochs = 10
#training data
train_db = tf.keras.preprocessing.image_dataset_from_directory(
r"D:\archive\faces",
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width), color_mode = "grayscale",
batch_size=batch_size)
#validation data
val_db = tf.keras.preprocessing.image_dataset_from_directory(
r"D:\archive\faces",
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width), color_mode = "grayscale",
batch_size=batch_size)
#speeds up the model training
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_db = train_db.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_db = val_db.cache().prefetch(buffer_size=AUTOTUNE)
#establishing the model
model = Sequential([
layers.experimental.preprocessing.Rescaling(1./255),
layers.Conv2D(16, (3,3), activation='relu', input_shape=(img_width, img_height, 1), kernel_initializer='he_uniform', padding='same'),
layers.MaxPooling2D(2, 2),
layers.Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', padding='same'),
layers.MaxPooling2D(2,2),
layers.Conv2D(64, (3,3), activation='relu', kernel_initializer='he_uniform', padding='same'),
layers.MaxPooling2D(2,2),
layers.Conv2D(64, (3,3), activation='relu', kernel_initializer='he_uniform', padding='same'),
layers.MaxPooling2D(2,2),
layers.Conv2D(64, (3,3), activation='relu', kernel_initializer='he_uniform', padding='same'),
layers.MaxPooling2D(2,2),
layers.Flatten(),
layers.Dense(512, activation='relu', kernel_initializer='he_uniform'),
layers.Dense(1, activation='sigmoid')
])
#different optimizer options
opt = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
opt2 = tf.keras.optimizers.Adam(learning_rate=0.001)
#compiling the model
model.compile(
optimizer=opt,
loss=tf.losses.BinaryCrossentropy(),
metrics='accuracy')
#training the model
model.fit(train_db,validation_data=val_db,epochs=epochs)
Since you are using loss=tf.losses.BinaryCrossentropy(), you need to add label_mode='binary' to both image_dataset_from_directory calls (train_db and val_db). For val_db, also add shuffle=False.
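A quick sanity check for the "stuck" runs described in the question: an accuracy close to the share of the larger class usually suggests the network is emitting the same class for every image. The sketch below computes that majority-class baseline from the image counts quoted in the question; the dictionary keys are placeholder names, not the actual folder names.

```python
# Image counts quoted in the question; the key names are placeholders.
counts = {"label_a": 2700, "label_b": 2200}

total = sum(counts.values())

# Accuracy of a model that always predicts the larger class.
majority_baseline = max(counts.values()) / total

print(round(majority_baseline, 3))
```

The result is about 0.55, which is close to the plateau reported in the question, so comparing the training accuracy against this number is a cheap way to tell whether the model has learned anything beyond the class balance.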

How to improve my CNN ? high and constant validation error

I am working on predicting a score of how fat cows are, based on images of the cows.
I applied a CNN to estimate the value, which lies between 0 and 5 (the dataset I have only contains values between 2.25 and 4).
I am using 4 convolutional layers and 3 hidden dense layers.
I actually have 2 problems:
1/ I get a training error of 0.05, but after 3-5 epochs the validation error stays at about 0.33.
2/ The values predicted by my network are between 2.9 and 3.3, which is too narrow compared with the dataset range. Is this normal?
How can I improve my model?
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(512, 424,1)),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(input_shape=(512, 424)),
tf.keras.layers.Dense(256, activation=tf.nn.relu),
tf.keras.layers.Dense(128, activation=tf.nn.relu),
tf.keras.layers.Dense(64, activation=tf.nn.relu),
tf.keras.layers.Dense(1, activation='linear')
])
Learning curve: (image not preserved)
Prediction: (image not preserved)
This looks like a case of overfitting. You can try the following:
1. Shuffle the data by passing shuffle=True to cnn_model.fit:
history = cnn_model.fit(x=X_train_reshaped,
                        y=y_train,
                        batch_size=512,
                        epochs=epochs, callbacks=[callback],
                        verbose=1, validation_data=(X_test_reshaped, y_test),
                        validation_steps=10, steps_per_epoch=steps_per_epoch, shuffle=True)
2. Use early stopping:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
3. Use regularization (you can try l1 or l1_l2 regularization as well):
from tensorflow.keras.regularizers import l2
Regularizer = l2(0.001)
# kernel_size is passed as a tuple; Conv2D(64, 3, 3) would set strides=3 instead
cnn_model.add(Conv2D(64, (3, 3), input_shape=(28, 28, 1), activation='relu', data_format='channels_last',
                     activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
cnn_model.add(Dense(units=10, activation='sigmoid',
                    activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
4. Try BatchNormalization.
5. Perform image data augmentation using ImageDataGenerator.
6. If the pixel values are not normalized, dividing them by 255 also helps.
7. Finally, if there is still no change, you can try pre-trained models such as ResNet or VGG.
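To make the early-stopping suggestion more concrete, here is a plain-Python illustration of the patience rule: training stops once the monitored validation loss has failed to improve for a given number of consecutive epochs. This is a simplified sketch and not the Keras implementation; the function name stopping_epoch and the validation losses are invented for the example.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: they improve until epoch 3, then stagnate.
val_losses = [0.50, 0.40, 0.35, 0.34, 0.36, 0.35, 0.37, 0.38]

print(stopping_epoch(val_losses, patience=3))
```

A larger patience tolerates noisier validation curves at the cost of training longer after the best epoch.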

WiFi gesture recognition with a CNN (deep learning, machine learning, Python)

I have a dataset that looks like this:
(7500, 200, 30, 3)
That is 7500 samples, each a tensor of shape (200, 30, 3), derived from CSI data (a kind of WiFi data used for gesture recognition). There are 150 different labels (gestures), and the aim is to classify them.
I used a Keras CNN for classification and ran into severe overfitting:
def create_DL_model():
    # input layer
    csi = Input(shape=(200, 30, 3))
    # first feature extractor
    x = Conv2D(64, kernel_size=3, activation='relu', name='layer1-01')(csi)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2), name='layer1-02')(x)
    x = Conv2D(64, kernel_size=3, activation='relu', name='layer1-03')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2), name='layer1-04')(x)
    x = BatchNormalization()(x)
    x = Conv2D(64, kernel_size=3, activation='relu', name='layer1-05', padding='same')(x)
    x = Conv2D(32, kernel_size=3, activation='relu', name='layer1-06', padding='same')(x)
    x = Conv2D(64, (3, 3), padding='same', activation='relu', name='layer-01')(x)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2), name='layer-02')(x)
    x = Conv2D(32, (3, 3), padding='same', activation='relu', name='layer-03')(x)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2), name='layer-04')(x)
    x = Flatten()(x)
    x = Dense(16, activation='relu')(x)
    keras.layers.Dropout(.50, seed=1)
    probability = Dense(150, activation='softmax')(x)
    model = Model(inputs=csi, outputs=probability)
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
    return model
As you can see, I used dropout on the dense layer, early stopping and batch normalization to fight overfitting, but the problem remains.
After cross-validation I get an accuracy of around 70% (some papers report 90%; with 150 labels that is a really great result, but they used meta-learning, which I could not use). Is there any approach you can recommend?
Many thanks.
The accuracy vs. epoch graph points to overfitting in your model. This is likely due to the small number of training samples (7500/150 = 50 per class). One possible solution is data augmentation, which lets you build a powerful image classifier from very few training examples.
Data structure
Store your data according to the following structure:
data/
    train/
        class1/
            class1_img001.jpg
            class1_img002.jpg
        class2/
            class2_img001.jpg
            class2_img002.jpg
        ...
        class150/
            class150_img001.jpg
            class150_img002.jpg
            ...
    validation/
        class1/
            class1_img001.jpg
            class1_img002.jpg
        class2/
            class2_img001.jpg
            class2_img002.jpg
        ...
        class150/
            class150_img001.jpg
            class150_img002.jpg
            ...
You can do:
from keras.layers import (Input, Conv2D, BatchNormalization, MaxPooling2D,
                          MaxPool2D, Flatten, Dense, Dropout)
from keras.models import Model

def create_DL_model(img_height, img_width, channel):
    # input layer
    csi = Input(shape=(img_height, img_width, channel))
    # first feature extractor
    x = Conv2D(64, kernel_size=3, activation='relu', name='layer1-01')(csi)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2), name='layer1-02')(x)
    x = Conv2D(64, kernel_size=3, activation='relu', name='layer1-03')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2), name='layer1-04')(x)
    x = BatchNormalization()(x)
    x = Conv2D(64, kernel_size=3, activation='relu', name='layer1-05', padding='same')(x)
    x = Conv2D(32, kernel_size=3, activation='relu', name='layer1-06', padding='same')(x)
    x = Conv2D(64, (3, 3), padding='same', activation='relu', name='layer-01')(x)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2), name='layer-02')(x)
    x = Conv2D(32, (3, 3), padding='same', activation='relu', name='layer-03')(x)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2), name='layer-04')(x)
    x = Flatten()(x)
    x = Dense(16, activation='relu')(x)
    # connect the dropout layer to the graph; creating it without calling it on x has no effect
    x = Dropout(0.50, seed=1)(x)
    probability = Dense(150, activation='softmax')(x)
    model = Model(inputs=csi, outputs=probability)
    return model
from keras.preprocessing.image import ImageDataGenerator
batch_size = 32
img_height = 200
img_width = 30
channel = 3
model = create_DL_model(img_height, img_width, channel)
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    'data/train',  # this is the target directory
    target_size=(img_height, img_width),  # all images will be resized
    batch_size=batch_size,
    class_mode='categorical')  # one-hot labels, to match categorical_crossentropy
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit_generator(
train_generator,
steps_per_epoch=7500// batch_size,
epochs=50,
validation_data=validation_generator,
validation_steps=YOUR_VALIDATION_SIZE// batch_size) # use YOUR_VALIDATION_SIZE as per your validation data
model.save('model-e50-b32.h5') # always save your weights after training or during training
With only 7500 training images, you can also try reducing the number of convolutional layers while monitoring accuracy and loss.
The above code is not tested. Please share your errors for further advice.
More about data augmentation and how to apply it is here (link not preserved).
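When deciding how many layers to remove, it can help to trace how the feature-map size shrinks through the network from the question. The sketch below does that arithmetic in plain Python, assuming the Keras defaults used in that code: stride 1 and padding='valid' for convolutions without an explicit padding argument, and 2x2 pooling that rounds down. The helper names conv_valid and pool are made up for this illustration.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1

def pool(size, window=2):
    """Output size of max pooling with a window and stride of 2."""
    return size // window

height, width = 200, 30

# layer1-01 (valid conv) followed by layer1-02 (pool)
height, width = pool(conv_valid(height)), pool(conv_valid(width))
# layer1-03 (valid conv) followed by layer1-04 (pool)
height, width = pool(conv_valid(height)), pool(conv_valid(width))
# layer1-05, layer1-06 and layer-01 use padding='same' and keep the size; layer-02 pools
height, width = pool(height), pool(width)
# layer-03 uses padding='same' and keeps the size; layer-04 pools
height, width = pool(height), pool(width)

# The last convolution has 32 filters, so this is the length of the Flatten output.
flattened = height * width * 32

print(height, width, flattened)
```

Under these assumptions the 30-wide axis has collapsed to a single column by the last pooling layer (12 x 1 feature maps, 384 flattened values), which is one reason a shallower stack, or fewer pooling steps along the short axis, may be worth trying.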
